Elasticsearch Python - Index Analyzer & Search Analyzer
I am using python api - http://elasticsearch-py.readthedocs.org
How can I change the index analyzer and tokenizer for an index? Thanks.
I found suggestions to change the mapping of the index, but there was no documentation on how to do that from python.
Partial Search using Analyzer in ElasticSearch shows settings for n-gram-analyzer but no code to implement it in python.
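For reference, a minimal sketch of how a custom n-gram analyzer could be defined and applied through elasticsearch-py. This is not from the original post: the index name `my_index`, field `title`, and the exact analyzer parameters are illustrative assumptions, and the accepted mapping body varies across Elasticsearch versions.

```python
# Sketch (illustrative, not from the original post): a settings body that
# defines an n-gram tokenizer/analyzer and applies it to a text field,
# with a different analyzer at search time.
settings = {
    "settings": {
        "analysis": {
            "tokenizer": {
                # hypothetical n-gram tokenizer configuration
                "ngram_tokenizer": {"type": "ngram", "min_gram": 2, "max_gram": 3}
            },
            "analyzer": {
                "ngram_analyzer": {
                    "type": "custom",
                    "tokenizer": "ngram_tokenizer",
                    "filter": ["lowercase"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "title": {
                "type": "text",
                "analyzer": "ngram_analyzer",    # index-time analyzer
                "search_analyzer": "standard",   # search-time analyzer
            }
        }
    },
}

# With a running cluster, this body would be passed at index creation:
# from elasticsearch import Elasticsearch
# es = Elasticsearch()
# es.indices.create(index="my_index", body=settings)
```

Note that changing analysis settings on an *existing* index generally requires closing it first (`es.indices.close`), applying `es.indices.put_settings`, then reopening it (`es.indices.open`); creating a new index with the settings up front avoids that dance.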
Source: https://stackoverflow.com/questions/21507165
Best answer
Option 1: `isin`

```python
df2[~df2.Email.isin(df1.Email)]

          Email
4  dddd@abc.com
5  dddd@abc.com
6  3333@abc.com
```

Option 2: `query`

```python
df2.query('Email not in @df1.Email')

          Email
4  dddd@abc.com
5  dddd@abc.com
6  3333@abc.com
```

Option 3: `merge`

`pd.DataFrame.merge` with `indicator=True` lets you see which dataframe each row came from. We can then filter on it.

```python
df2.merge(df1, 'outer', indicator=True).query('_merge == "left_only"').drop('_merge', 1)

           Email
20  dddd@abc.com
21  dddd@abc.com
22  3333@abc.com
```
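The three options above can be checked end to end with a small self-contained script. The sample frames below are hypothetical (chosen so the unmatched rows match the answer's output); the real `df1`/`df2` come from the original question. Note that `drop(columns=...)` is used here instead of the answer's positional `drop('_merge', 1)`, since the positional `axis` argument is deprecated in recent pandas.

```python
import pandas as pd

# Hypothetical sample data: df1 holds the known emails, df2 holds a
# superset with three rows that have no match in df1.
df1 = pd.DataFrame({'Email': ['aaaa@abc.com', 'bbbb@abc.com', 'cccc@abc.com']})
df2 = pd.DataFrame({'Email': ['aaaa@abc.com', 'bbbb@abc.com', 'cccc@abc.com',
                              'cccc@abc.com', 'dddd@abc.com', 'dddd@abc.com',
                              '3333@abc.com']})

# Option 1: boolean mask with isin, negated with ~
opt1 = df2[~df2.Email.isin(df1.Email)]

# Option 2: query string; @df1 references the local variable
opt2 = df2.query('Email not in @df1.Email')

# Option 3: outer merge with indicator, keep rows present only in df2
opt3 = (df2.merge(df1, how='outer', indicator=True)
           .query('_merge == "left_only"')
           .drop(columns='_merge'))

# All three approaches select the same rows
print(opt1.Email.tolist())  # → ['dddd@abc.com', 'dddd@abc.com', '3333@abc.com']
```

All three yield the same rows; `isin` is the most direct, while the `merge` variant generalizes to multi-column keys.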