Elasticsearch and Lucene document limit

The stats API reports a document count of about 700 million for our Elasticsearch installation, while the count API reports about 27 million actual documents. We understand that the difference comes from nested documents: the stats API counts all of them.
In the Lucene documentation we read that a shard has a hard limit of about 2 billion documents. Should I worry that Elasticsearch is about to hit that limit? Or should I monitor the figure from the count API instead?
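To make the arithmetic in the question concrete, here is a minimal sketch that compares per-shard document counts with the Lucene cap. It assumes the cap is Lucene's `IndexWriter.MAX_DOCS` (`Integer.MAX_VALUE - 128`) and that it applies to each shard separately. The shard names, the five-way split, and the helper `shard_usage` are hypothetical and not taken from the question.

```python
# Lucene's per-index (per-shard) cap: IndexWriter.MAX_DOCS = Integer.MAX_VALUE - 128.
LUCENE_MAX_DOCS = 2_147_483_647 - 128


def shard_usage(docs_per_shard):
    """Return the fraction of the Lucene cap used by each shard."""
    return {shard: count / LUCENE_MAX_DOCS for shard, count in docs_per_shard.items()}


# Hypothetical layout: ~700 million Lucene documents (nested ones included)
# spread evenly over 5 primary shards.
example = {f"shard-{i}": 140_000_000 for i in range(5)}

usage = shard_usage(example)
worst = max(usage.values())
print(f"fullest shard uses {worst:.1%} of the per-shard cap")
```

The counts fed into such a check would need to be per-shard figures that include nested documents, since the question notes that the stats API counts all of them.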

Source: https://stackoverflow.com/questions/41018120

Updated: 2024-01-25 17:01

Best answer

So you want to sum the values for ill (and healthy) patients.
Here is how you can handle ill patients; the same pattern applies to healthy ones. Instead of incrementing a counter with Icount += 1, append each value to a list, e.g.
i_list = []
for row in csv_data:
    # skip blank rows; keep only positive values from column index 13
    if row and int(row[13]) > 0:
        i_list.append(int(row[13]))
    ...

Icount = len(i_list)
# guard against an empty list to avoid ZeroDivisionError
IPavg = sum(i_list) / Icount if Icount else None
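The list-based pattern above can be wrapped in a small reusable helper. This is only a sketch: the function name `average_where`, the two-column sample data, and the use of column index 1 instead of 13 are assumptions made for illustration.

```python
import csv
import io


def average_where(rows, col, keep):
    """Average int(row[col]) over non-empty rows whose value satisfies keep()."""
    values = [int(row[col]) for row in rows if row and keep(int(row[col]))]
    # Return None rather than dividing by zero when nothing matched.
    return sum(values) / len(values) if values else None


# Hypothetical sample data: an id column and a value column, with one blank line.
sample = io.StringIO("a,3\nb,0\n\nc,5\n")
rows = list(csv.reader(sample))

# Same condition as in the answer: keep values greater than zero.
ill_avg = average_where(rows, col=1, keep=lambda v: v > 0)
print(ill_avg)
```

Passing the condition as a function means a second group can be handled by calling the helper again with a different `keep`, instead of duplicating the loop.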
