Solr configuration for autocompletion implementation with PHP
How do I have to index my data and configure Solr and my search options so that autocompletion (like Google's) is possible with the following requirements?
Products:
- We have products with titles, descriptions and IDs, e.g. the title: toshiba tecra s1: centrino 1.5 ghz/xp pro/15.0" tft/40 gb/256 mb+256mb/cd-rw-dvd-rom/lan/wi-fi
- These products, or fields of them, have to be indexed so that the following is possible, regardless of the case in which a user types the search term (e.g. TOSHIBA or tOSHiba).
- If a user enters the first three characters "tos", at most 20 results should appear in the autocomplete box, each showing the complete title (phrase), e.g. "toshiba tecra s1: centrino 1.5 ghz/xp pro/15.0" tft/40 gb/256 mb+256mb/cd-rw-dvd-rom/lan/wi-fi".
- If a user enters two terms, e.g. "toshiba tecra", the search result must be more precise and show only documents that contain the contiguous terms "toshiba tecra".
It would be great to get any hints on what kind of tokenizer, search component etc. to use.
I'm using Solr version 3.5.
Thank you for your thoughts, Ramo
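The matching behavior these requirements describe can be sketched outside Solr, in plain Python. This is only an illustration of the desired semantics, not a Solr configuration. The function name `suggest`, the choice to match at word starts, and the sample titles are all assumptions made for the example:

```python
def suggest(titles, query, limit=20):
    """Return up to `limit` full titles matching `query`, ignoring case.

    Assumption: a title matches when the query appears in it as a
    contiguous run of text that begins at the start of a word.
    """
    needle = " ".join(query.lower().split())
    if not needle:
        return []
    matches = []
    for title in titles:
        haystack = " ".join(title.lower().split())
        # A match must start at position 0 or right after a space.
        if haystack.startswith(needle) or (" " + needle) in haystack:
            matches.append(title)
            if len(matches) == limit:
                break
    return matches


# Hypothetical sample data.
titles = [
    "toshiba tecra s1: centrino 1.5 ghz/xp pro",
    "Toshiba Satellite pro",
    "tecra docking station for toshiba",
]
print(suggest(titles, "tOS"))
print(suggest(titles, "toshiba tecra"))
```

With this sample data, the three-character query matches every title, while the two-term query narrows the result to the one title containing the phrase.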
Source: https://stackoverflow.com/questions/8459570
Best answer
You can use -1 to always get the last part rather than the second part.
df['c'] = df['b'].apply(lambda x: x.split("'")[-1])
print(df)
#    a        b      c
# 0  1     ciao   ciao
# 1  2    hotel  hotel
# 2  3  l'hotel  hotel
However, keep in mind that for strings with two or more apostrophes this keeps only the text after the last one (but your requirement doesn't specify what to do in these cases anyway).
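To make the apostrophe caveat concrete, here is a minimal sketch in plain Python. The helper names `after_last_apostrophe` and `after_first_apostrophe` are hypothetical, and the second helper is only one possible way to treat strings with several apostrophes, a case the question leaves unspecified:

```python
def after_last_apostrophe(text):
    # Same logic as the answer's lambda: keep what follows the last apostrophe.
    return text.split("'")[-1]


def after_first_apostrophe(text):
    # Alternative: split only once, so later apostrophes stay in the result.
    return text.split("'", 1)[-1]


for value in ["ciao", "l'hotel", "l'arc d'or"]:
    print(value, "->", after_last_apostrophe(value), "|", after_first_apostrophe(value))
```

Either helper can be passed to df['b'].apply(...) in place of the lambda.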
Related Q&A
-
A more concise/flexible approach using df.apply: df.b = df.b.str[1:].apply('s{}s'.format) print(df) id b 0 1 shis is string1s 1 1 shis is string2s 2 1 shis is string3s 3 1 shis is string4s And, to replace only the first occurrence of t, use pd.Series.str.replace: df.b = df.b.str.rep ...
-
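I would suggest not relying on pandas directly here. Instead, parse the file by opening it and processing it line by line, build a list of dicts, and use that to create the DataFrame: with open('yourfile.txt','r') as f: content = f.read().splitlines() state = None l_dict = [] for line in content: if '[edit]' in line: state = line.split('[')[0] else: l ...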
-
How to remove part of string ahead of special character in a column in Pandas? [2024-01-16]
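You can use -1 to always get the last part rather than the second part. df['c'] = df['b'].apply(lambda x: x.split("'")[-1]) print(df) # a b c # 0 1 ciao ciao # 1 2 hotel hotel # 2 3 l'hotel hotel However, keep in mind that for strings with two or more apostrophes this keeps only the text after the last one (but your requirement doesn't specify what to do in these cases anyway).
-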
You can use replace: df['column1'].str.replace(r"\{.*\}", "", regex=True) Out[385]: 0 a 1 b 2 c 3 d Name: column1, dtype: object
-
How to remove strings before a numeric value in a pandas dataframe column? [2022-03-07]
Let's use a regular expression and extract: df['Column A'] = df['Column A'].str.extract(r'(\d+.+$)') Output: 0 251 St. Louis Apt.54 1 123 Orange Drive 2 171 Poplar street 3 11th street 4 77 yorkshire avenue Name: Column A, dtype: object The regular expression says to get a group of ...
-
Why would you read the data this way in the first place? Couldn't you read it as two columns? Anyway, this can be done; take a look: In [35]: df=pd.DataFrame({'Consultation':['CONSULTATION 15.00', 'CONSULTATION 10.00', 'CONSULTATION 18.00', 'CONSULTATION 0.00', 'CONSULTATION 18.00']}) In [36]: import re In [37] ...
-
//Here you will have 4 elements $parts = explode ('/', $string); //this takes the first 3 elements list ($first, $second, $third) = $parts; //Here you can see the desired result var_dump (implode ('/',array($first, $second, $third))); Output: string ...
-
How to remove following symbol/special character in string using jquery ¶ [2023-06-24]
var s = "This is a string that contains ¶, a special character."; s = s.replace(/¶/g, ""); Output: "This is a string that contains , a special character." This removes all occurrences of the character. There is no need for jQuery; this is plain JavaScript as provided by the browser. https://developer.mozilla.org/en/JavaScript/Refer ...
-
Pandas DataFrame: Remove everything after a non-digit character [2022-06-04]
You can use extract: df.result = df.result.str.extract(r'(\d+)', expand=False) print(df) time result 1 09:00 52 2 10:00 62 3 11:00 57 4 12:00 30 5 13:00 46
-
That is not a regular apostrophe. You need something more like this: mystring = mystring.Replace("\x92", "");