首页 \ 问答 \ Solr:如何突出显示整个搜索短语?(Solr: how to highlight the whole search phrase only?)

Solr:如何突出显示整个搜索短语?(Solr: how to highlight the whole search phrase only?)

我需要执行短语搜索。 在搜索结果我得到确切的词组匹配,但看着突出显示的部分,我看到这个短语被标记化,即当我搜索“第1天”时,我得到这个词:

<arr name="post">
  <str><em>Day</em> <em>1</em>   We have begun a new adventure! An early morning (4:30 a.m.) has found me meeting with</str>
</arr>

这就是我想要得到的结果:

    <arr name="post">
  <str><em>Day 1</em>   We have begun a new adventure! An early morning (4:30 a.m.) has found me meeting with</str>
</arr>

我在做的查询是这样的:管理控制台:

q = day 1 
fq = post:"day 1" OR title:"day 1"
hl = true
hl.fl =title,post

选择Q =天+ 1&FQ =张贴%3A%22天+ 1%22 + OR +标题%3A%22天+ 1%22重量= XML&缩进=真HL =真hl.fl =标题%2Cpost&hl.simple.pre =%3Cem%3E&HL .simple.post =%3C%2Fem%3E

这些是我的领域:

     <field name="post" type="text_general" indexed="true" stored="true" required="true" multiValued="false" />
      <field name="post" type="text_general" indexed="true" stored="true" required="true" multiValued="false" />

这是我的fied类型text_general的solr模式部分:

    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />

    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.GreekStemFilterFactory"/>
    <filter class="solr.GreekLowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

B)我可以在突出部分看到更多令人不安的结果,即突出显示不是预期的整个单词,而是单个片段:在.where you get to see all of Athens ... <em>Day</em> 2 - Carmens我不想要在突出显示的部分查看此结果(仅需要查看“第1天”这两个词)。 有任何想法吗 ?

我正在阅读Solr亮点部分,但是......真的......甚至没有一个例子!


A I need to perform a phrase search. On the search results Im getting the exact phrase matches but looking at the highlighted parts I see that the phrase are tokenized i.e This is what I get when I search for the prase "Day 1" :

<arr name="post">
  <str><em>Day</em> <em>1</em>   We have begun a new adventure! An early morning (4:30 a.m.) has found me meeting with</str>
</arr>

This is what I want to receive as a result:

    <arr name="post">
  <str><em>Day 1</em>   We have begun a new adventure! An early morning (4:30 a.m.) has found me meeting with</str>
</arr>

The query I m doing is this: Admin console:

q = day 1 
fq = post:"day 1" OR title:"day 1"
hl = true
hl.fl =title,post

select?q=day+1&fq=post%3A%22day+1%22+OR+title%3A%22day+1%22&wt=xml&indent=true&hl=true&hl.fl=title%2Cpost&hl.simple.pre=%3Cem%3E&hl.simple.post=%3C%2Fem%3E

Theese are my fields:

     <field name="post" type="text_general" indexed="true" stored="true" required="true" multiValued="false" />
      <field name="post" type="text_general" indexed="true" stored="true" required="true" multiValued="false" />

This is the solr schema section for my fied type text_general:

    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />

    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.GreekStemFilterFactory"/>
    <filter class="solr.GreekLowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

B) I can see in the highlight section more disturbing results i.e highlighting not the whole word as expected but single fragments: .where you get to see all of Athens ... <em>Day</em> 2 - Carmens I dont want to see this result in the highlighted section (Only need to see both words "Day 1"). Any ideas ?

I m reading the Solr highlight section but .. really... there is not even 1 example!!!


原文:https://stackoverflow.com/questions/25930180
更新时间:2024-01-27 20:01

最满意答案

with open(src, newline='') as file:
    r = csv.reader(file, delimiter=';')
    for line in r:
        if len(line[0]) ==2 and line[0].isalpha() and line[16]=='15%':
            print(line) #Or whatever it is you want to do

没有正则表达式真的需要,但r'[a-zA-Z]{2}'也可以工作


with open(src, newline='') as file:
    r = csv.reader(file, delimiter=';')
    for line in r:
        if len(line[0]) ==2 and line[0].isalpha() and line[16]=='15%':
            print(line) #Or whatever it is you want to do

No regex really necessary, but r'[a-zA-Z]{2}' could also work

相关问答

更多

相关文章

更多

最新问答

更多
  • 您如何使用git diff文件,并将其应用于同一存储库的副本的本地分支?(How do you take a git diff file, and apply it to a local branch that is a copy of the same repository?)
  • 将长浮点值剪切为2个小数点并复制到字符数组(Cut Long Float Value to 2 decimal points and copy to Character Array)
  • OctoberCMS侧边栏不呈现(OctoberCMS Sidebar not rendering)
  • 页面加载后对象是否有资格进行垃圾回收?(Are objects eligible for garbage collection after the page loads?)
  • codeigniter中的语言不能按预期工作(language in codeigniter doesn' t work as expected)
  • 在计算机拍照在哪里进入
  • 使用cin.get()从c ++中的输入流中丢弃不需要的字符(Using cin.get() to discard unwanted characters from the input stream in c++)
  • No for循环将在for循环中运行。(No for loop will run inside for loop. Testing for primes)
  • 单页应用程序:页面重新加载(Single Page Application: page reload)
  • 在循环中选择具有相似模式的列名称(Selecting Column Name With Similar Pattern in a Loop)
  • System.StackOverflow错误(System.StackOverflow error)
  • KnockoutJS未在嵌套模板上应用beforeRemove和afterAdd(KnockoutJS not applying beforeRemove and afterAdd on nested templates)
  • 散列包括方法和/或嵌套属性(Hash include methods and/or nested attributes)
  • android - 如何避免使用Samsung RFS文件系统延迟/冻结?(android - how to avoid lag/freezes with Samsung RFS filesystem?)
  • TensorFlow:基于索引列表创建新张量(TensorFlow: Create a new tensor based on list of indices)
  • 企业安全培训的各项内容
  • 错误:RPC失败;(error: RPC failed; curl transfer closed with outstanding read data remaining)
  • C#类名中允许哪些字符?(What characters are allowed in C# class name?)
  • NumPy:将int64值存储在np.array中并使用dtype float64并将其转换回整数是否安全?(NumPy: Is it safe to store an int64 value in an np.array with dtype float64 and later convert it back to integer?)
  • 注销后如何隐藏导航portlet?(How to hide navigation portlet after logout?)
  • 将多个行和可变行移动到列(moving multiple and variable rows to columns)
  • 提交表单时忽略基础href,而不使用Javascript(ignore base href when submitting form, without using Javascript)
  • 对setOnInfoWindowClickListener的意图(Intent on setOnInfoWindowClickListener)
  • Angular $资源不会改变方法(Angular $resource doesn't change method)
  • 在Angular 5中不是一个函数(is not a function in Angular 5)
  • 如何配置Composite C1以将.m和桌面作为同一站点提供服务(How to configure Composite C1 to serve .m and desktop as the same site)
  • 不适用:悬停在悬停时:在元素之前[复制](Don't apply :hover when hovering on :before element [duplicate])
  • 常见的python rpc和cli接口(Common python rpc and cli interface)
  • Mysql DB单个字段匹配多个其他字段(Mysql DB single field matching to multiple other fields)
  • 产品页面上的Magento Up出售对齐问题(Magento Up sell alignment issue on the products page)