Solr: how to highlight the whole search phrase only?

A) I need to perform a phrase search. The search results contain the exact phrase matches, but in the highlighted fragments the phrase is tokenized. This is what I get when I search for the phrase "Day 1":
<arr name="post">
  <str><em>Day</em> <em>1</em>   We have begun a new adventure! An early morning (4:30 a.m.) has found me meeting with</str>
</arr>
 
This is what I want to receive instead:
    <arr name="post">
  <str><em>Day 1</em>   We have begun a new adventure! An early morning (4:30 a.m.) has found me meeting with</str>
</arr>
 
This is the query I'm running, as entered in the Admin console:
q = day 1 
fq = post:"day 1" OR title:"day 1"
hl = true
hl.fl =title,post
 
select?q=day+1&fq=post%3A%22day+1%22+OR+title%3A%22day+1%22&wt=xml&indent=true&hl=true&hl.fl=title%2Cpost&hl.simple.pre=%3Cem%3E&hl.simple.post=%3C%2Fem%3E 
These are my fields:
     <field name="post" type="text_general" indexed="true" stored="true" required="true" multiValued="false" />
      <field name="post" type="text_general" indexed="true" stored="true" required="true" multiValued="false" />
 
This is the Solr schema section for my field type text_general:
    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />

    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.GreekStemFilterFactory"/>
    <filter class="solr.GreekLowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
 
B) The highlighting section also contains more disturbing results, where only a single term is highlighted instead of the whole phrase: .where you get to see all of Athens ... <em>Day</em> 2 - Carmens
I don't want to see this result in the highlighted section (I only need to see both words, "Day 1"). Any ideas?
I'm reading the Solr highlighting documentation, but there is not even one example.
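
One detail of the request above is worth illustrating: Solr builds highlights from the terms of the main query (q, or hl.q if set), not from the filter queries in fq. With q = day 1, the terms "day" and "1" are highlighted independently, which would explain a fragment such as "<em>Day</em> 2". The sketch below is only an illustration under that reading, not a confirmed fix: it moves the phrase query from fq into q and sets hl.usePhraseHighlighter explicitly. The helper name build_highlight_query is hypothetical. Whether both words end up inside one <em> pair still depends on the highlighter implementation in use, which this sketch does not address.

```python
from urllib.parse import urlencode


def build_highlight_query(phrase, fields=("title", "post")):
    """Build a Solr select URL whose main query (q) is the quoted phrase.

    Hypothetical helper: the field names come from the question, but the
    function itself is an illustration only.
    """
    # Quote the phrase so Solr parses it as a phrase query, not two terms.
    quoted = '"{}"'.format(phrase.replace('"', '\\"'))
    params = {
        # Highlighting follows q, so the phrase query goes here, not in fq.
        "q": " OR ".join("{}:{}".format(field, quoted) for field in fields),
        "hl": "true",
        "hl.fl": ",".join(fields),
        # Ask the highlighter to honor phrase boundaries.
        "hl.usePhraseHighlighter": "true",
        "hl.simple.pre": "<em>",
        "hl.simple.post": "</em>",
        "wt": "xml",
        "indent": "true",
    }
    return "select?" + urlencode(params)


if __name__ == "__main__":
    print(build_highlight_query("day 1"))
```

The resulting string can be appended to the core's base URL in the same way as the select request shown in the question.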

Source: https://stackoverflow.com/questions/25930180

Updated: 2024-01-27 20:01

Best answer

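import csv

src = 'data.csv'  # hypothetical path; the original answer does not define src

with open(src, newline='') as file:
    r = csv.reader(file, delimiter=';')
    for line in r:
        # Keep rows whose first field is exactly two letters and whose 17th field is '15%'
        if len(line[0]) == 2 and line[0].isalpha() and line[16] == '15%':
            print(line)  # Or whatever it is you want to do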
 
No regex is really necessary, but r'[a-zA-Z]{2}' would also work.
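
For comparison, the regex variant mentioned above might look like the following minimal sketch. The function name matching_rows, the rate parameter, and the sample data are assumptions made for illustration. Two differences from the isalpha() version are worth noting: re.fullmatch is used so that the pattern must cover the whole field (which replaces the explicit length check), and [a-zA-Z] accepts ASCII letters only, whereas str.isalpha() also accepts non-ASCII letters.

```python
import csv
import io
import re

# Exactly two ASCII letters; fullmatch() below makes the pattern cover the whole field.
TWO_LETTERS = re.compile(r'[a-zA-Z]{2}')


def matching_rows(lines, rate='15%'):
    """Yield rows whose first field is two letters and whose 17th field equals rate."""
    reader = csv.reader(lines, delimiter=';')
    for row in reader:
        # Guard against short rows before indexing the 17th field.
        if len(row) > 16 and TWO_LETTERS.fullmatch(row[0]) and row[16] == rate:
            yield row


if __name__ == "__main__":
    filler = ';'.join(['x'] * 15)  # made-up values for fields 1..15
    sample = io.StringIO(
        'DE;' + filler + ';15%\n'
        'DEU;' + filler + ';15%\n'
        'FR;' + filler + ';20%\n'
    )
    for row in matching_rows(sample):
        print(row)
```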
