ElasticSearch - terms aggregation split by whitespace

I have a bunch of Elasticsearch documents that contain information about job ads. I'm trying to aggregate on the attributes.Title field to count the "experience" terms that appear in the job postings, e.g. Junior, Senior, Lead. Instead, I get buckets that match the title as a whole rather than each word in the title field, e.g. "Junior Java Developer", "Senior .NET Analyst".
How can I tell Elasticsearch to split the aggregation on each word in the title, as opposed to matching the value of the whole field?
Later I would like to expand the query to also extract the "skill level" and "role", but it would also be fine if the buckets contained all the words in the field, as long as each word gets its own bucket.
Current query:
GET /jobs/_search
{
  "query": {
    "simple_query_string" : {
        "query": "Java",
        "fields": ["attributes.Title"]
    }
  },
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "attributes.Title.keyword"
      }
    }
  }
}
 
Unwanted Output: 
{
  ...
  "hits": {
    "total": 63,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "group_by_state": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 14,
      "buckets": [{
          "key": "Junior Java Tester",
          "doc_count": 6
        },{
          "key": "Senior Java Lead",
          "doc_count": 6
        },{
          "key": "Intern Java Tester",
          "doc_count": 5
        },
        ...
      ]
    }
  }
}
 
Desired Output: 
{
  ...
  "hits": {
    "total": 63,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "group_by_state": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 14,
      "buckets": [{
          "key": "Junior",
          "doc_count": 12
        },{
          "key": "Senior",
          "doc_count": 8
        },{
          "key": "Tester",
          "doc_count": 5
        },{
          "key": "Intern",
          "doc_count": 5
        },{
          "key": "Analyst",
          "doc_count": 5
        },
        ...
      ]
    }
  }
}
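
To make the desired result concrete, here is a minimal Python sketch. It is not an Elasticsearch solution: it only splits each title on whitespace and counts how many titles contain each word, which is what a per-word doc_count would mean. The function name word_buckets and the three sample titles (taken from the unwanted output above) are illustrative assumptions.

```python
from collections import Counter


def word_buckets(titles, size=10):
    """Return per-word buckets: how many titles contain each word."""
    counts = Counter()
    for title in titles:
        # Use a set so a word is counted once per title, like a doc_count.
        counts.update(set(title.split()))
    # Highest count first; ties are broken alphabetically for a stable order.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": word, "doc_count": n} for word, n in ordered[:size]]


# Hypothetical sample data, one entry per document.
titles = ["Junior Java Tester", "Senior Java Lead", "Intern Java Tester"]
buckets = word_buckets(titles)
for bucket in buckets:
    print(bucket)
```

Within Elasticsearch itself, per-word buckets generally come from aggregating on an analyzed text field rather than on the .keyword sub-field. That typically requires fielddata to be enabled on the text field, and the bucket keys then reflect the analyzer's output (lowercased with the standard analyzer), so treat this as a direction to verify against your mapping rather than a drop-in fix.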

Source: https://stackoverflow.com/questions/46669017

Updated: 2024-01-21 12:01

Best answer

Have a look at Visual Studio Debug and Release Modes.
Release Mode
When an assembly is built in release mode, the compiler performs all available optimisations so that the resulting executables and libraries run as efficiently as possible. This mode should be used for completed and tested software that is to be released to end users. The drawback of release mode is that, whilst the generated code is usually faster and smaller, it is much harder to inspect with debugging tools.
Debug Mode
Debug mode is used whilst developing software. When an assembly is compiled in debug mode, additional symbolic information is embedded and the code is not optimised. This means that the compiler's output is generally larger, slower and less efficient. However, a debugger can be attached to the running program, so the code can be stepped through whilst the values of internal variables are monitored.
