ElasticSearch index vs search time analyzer

I'm running into a problem that makes me think I don't fully understand index-time vs search-time analysis in ElasticSearch 5.5.

Let's say I have a basic index for a person with just a name and a state. For simplicity I have set al => alabama as the only state synonym.

PUT people
{
  "mappings": {
    "person": {
      "properties": {
        "name": {
          "type": "text"
        },
        "state": {
          "type": "text",
          "analyzer": "us_state"
        }
      }
    }
  },
  "settings": {
    "analysis": {
      "filter": {
        "state_synonyms": {
          "type": "synonym",
          "synonyms": "al => alabama"
        }
      },
      "analyzer": {
        "us_state": {
          "filter": [
            "standard",
            "lowercase",
            "state_synonyms"
          ],
          "type": "custom",
          "tokenizer": "standard"
        }
      }
    }
  }
}

My understanding is that when I index a document, the state field data will be indexed in its expanded synonym form. This can be tested by running:

GET people/_analyze
{
  "text": "al",
  "field": "state"
}

which returns

{
  "tokens": [
    {
      "token": "alabama",
      "start_offset": 0,
      "end_offset": 2,
      "type": "SYNONYM",
      "position": 0
    }
  ]
}

Looks good, let's index a document:

POST people/person
{
  "name": "dave",
  "state": "al"
}

And perform a search:

GET people/person/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "term": {
            "state": "al"
          }
        }
      ]
    }
  }
}

which returns nothing:

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}

I would expect the al in my search to be run through the same us_state analyzer and match my document. However, the search does work if I change my query to:

"term": { "state": "alabama" }

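As a rough illustration of the behaviour described above, here is a toy model in Python. It is not Elasticsearch code: the names `analyze`, `build_index`, `term_query` and `match_query` are hypothetical, and whitespace splitting stands in for the standard tokenizer. It assumes, as the Elasticsearch documentation describes, that a term query looks up the supplied value as-is in the inverted index, while an analyzed query such as match first runs the query text through the field's analyzer.

```python
# Toy model of index-time vs search-time analysis; not Elasticsearch code.
SYNONYMS = {"al": "alabama"}  # mirrors the "al => alabama" synonym rule


def analyze(text):
    """Rough stand-in for the us_state analyzer:
    split on whitespace, lowercase, then apply synonyms."""
    tokens = text.lower().split()
    return [SYNONYMS.get(tok, tok) for tok in tokens]


def build_index(docs):
    """Build an inverted index: analyzed token -> set of document ids."""
    index = {}
    for doc_id, state in docs.items():
        for token in analyze(state):
            index.setdefault(token, set()).add(doc_id)
    return index


def term_query(index, value):
    """Look up the value exactly as given, with no analysis."""
    return index.get(value, set())


def match_query(index, text):
    """Analyze the query text first, then look up each resulting token."""
    hits = set()
    for token in analyze(text):
        hits |= index.get(token, set())
    return hits


index = build_index({1: "al"})          # document 1 is "dave" with state "al"
print(term_query(index, "al"))          # no hits: only "alabama" was indexed
print(term_query(index, "alabama"))     # hits document 1
print(match_query(index, "al"))         # hits document 1: "al" is analyzed first
```

In this model the stored token is "alabama", so an unanalyzed lookup of "al" finds nothing, which is consistent with the results shown above. It suggests that an analyzed query such as match on the state field, rather than term, is what would send "al" through the us_state analyzer at search time.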

Source: https://stackoverflow.com/questions/50614230
Updated: 2022-08-25 08:08

Best answer

The most important thing in your naming convention is consistency. You can figure out pretty much any naming convention as long as it is sane and consistent.

That being said, I would probably be more verbose in my names in this case. Paths might be good enough, but I would rather see UserRoutes.js, UserModel.js and maybe even UserLib.js based on your examples.

In some of my node.js projects, I have even taken to not using a .js extension. My routes for instance would be user.routes. It is easy enough to change the syntax highlighting in editors based on different extensions.
