Elasticsearch indexing fails after successful Nutch crawl
I'm not sure why, but Nutch 1.13 is failing to index data into ES (v2.3.3). The crawl itself works fine, but when it comes time to index into ES it gives me this error message:
Indexer: java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:865)
    at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:147)
    at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:230)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:239)
Right before that, it prints this:
elastic.bulk.close.timeout : elastic timeout for the last bulk in seconds. (default 600)
I'm not sure whether the timeout has anything to do with the job failing.
I have run Nutch v1.10 many times with no problems and only recently decided to upgrade. I never saw this error before the upgrade.
EDIT: After closer inspection of the error message:
Error running: /home/david/tutorials/nutch/nutch-1.13/runtime/local/bin/nutch index -Delastic.server.url=http://localhost:9300/search-index/ searchcrawl//crawldb -linkdb searchcrawl//linkdb searchcrawl//segments/20170519125546
It seems to be failing on that particular segment. What does that mean? I only know the basics of how to use Nutch and am by no means an expert. Is it failing on a link?
Original: https://stackoverflow.com/questions/44074819
Best answer
Rearrange the terms of 1 - pchisq(3.841459, 1, 10.50742) = 0.9 into pchisq(3.841459, 1, x) - 0.1 = 0, then wrap abs around the left-hand side to construct a function to minimize:
    optim( 1, function(x) abs(pchisq(3.841459, 1, x) - 0.1) )
    #-------
    $par
    [1] 10.50742

    $value
    [1] 1.740301e-08

    $counts
    function gradient
          56       NA

    $convergence
    [1] 0

    $message
    NULL
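For one degree of freedom, the same number can be cross-checked without R. The sketch below is not part of the original answer. It is written in Python with only the standard library, and it assumes that a noncentral chi-square variable with df = 1 and noncentrality ncp is distributed as (Z + sqrt(ncp))^2 for a standard normal Z. The helper names norm_cdf, ncchisq_cdf_df1 and solve_ncp are hypothetical, and plain bisection stands in for optim.

```python
import math


def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ncchisq_cdf_df1(q, ncp):
    """CDF of a noncentral chi-square with df = 1 (assumption: X = (Z + sqrt(ncp))^2)."""
    r, s = math.sqrt(q), math.sqrt(ncp)
    # P((Z + s)^2 <= q) = P(-r - s <= Z <= r - s)
    return norm_cdf(r - s) - norm_cdf(-r - s)


def solve_ncp(q, target, lo=0.0, hi=20.0, tol=1e-10):
    """Bisection for the ncp at which the CDF at q equals target.

    The CDF decreases as ncp grows, so values above the target
    mean the root lies to the right of the midpoint.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if ncchisq_cdf_df1(q, mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


ncp = solve_ncp(3.841459, 0.1)
print(ncp)
```

The printed value should land close to the 10.50742 reported by optim. Bisection is a reasonable substitute here because the CDF is monotone in ncp, so it crosses 0.1 only once on the bracket [0, 20].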
To do a sensitivity analysis, you can serially alter the values of the other parameters:
    for( crit.val in seq(2.5, 3.5, by=0.1)) {
        print( optim( 1, function(x) abs(pchisq(crit.val, 1, x) - 0.1),
                      method="Brent", lower=0, upper=20)$par)}
    [1] 8.194852
    [1] 8.375145
    [1] 8.553901
    [1] 8.731204
    [1] 8.907135
    [1] 9.081764
    [1] 9.255156
    [1] 9.427372
    [1] 9.598467
    [1] 9.768491
    [1] 9.937492
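The sweep over critical values can also be sketched in Python as a rough cross-check of the R output. This is an illustration under stated assumptions rather than the answer's method: it only covers df = 1, where the noncentral chi-square CDF can be written with the standard normal CDF, and it uses bisection on the bracket [0, 20] in place of optim with method="Brent". The names cdf_df1 and ncp_for are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def cdf_df1(q, ncp):
    """Noncentral chi-square CDF for df = 1, assuming X = (Z + sqrt(ncp))^2."""
    r, s = math.sqrt(q), math.sqrt(ncp)
    return STD_NORMAL.cdf(r - s) - STD_NORMAL.cdf(-r - s)


def ncp_for(crit_val, target=0.1, lo=0.0, hi=20.0, tol=1e-10):
    """Bisection for the ncp where the CDF at crit_val equals target."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # The CDF falls as ncp rises, so move right while still above target.
        if cdf_df1(crit_val, mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Same grid as seq(2.5, 3.5, by=0.1) in the R loop.
crit_vals = [round(2.5 + 0.1 * i, 1) for i in range(11)]
results = [(c, ncp_for(c)) for c in crit_vals]

for crit_val, ncp in results:
    print(f"{crit_val:.1f}  {ncp:.6f}")
```

Each row pairs a critical value with the noncentrality parameter that leaves 10% of the mass below it, which should track the eleven values printed by the R loop.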
Related Q&A
Chi-square distribution with c#? [2023-04-25]
A few years ago I ported the Perl module Statistics::Distributions to JavaScript. Among other distributions, it implements chi-square. It is very lightweight and simple. You can find the implementation at http://statistics-distributions-js.googlecode.com/files/statistics-distributions-001.js. Porting it to C# should not be too hard. Or you could try something like http://jint.codeplex.com/ and see whether you can run the JavaScri ... -
Automate Chi-square across columns [2019-11-17]
You can use lapply to loop over the variables. myTests <- lapply(data[-length(data)], function(x) chisq.test(table(x, data$m1))) This returns a named list, with the changing variable as the name of each list item. names(myTests) [1] "v1.1" "v1.2" "v1.3" "v1.4" "v1.5" Then access each one with myTests[[1]] or myTests[["v1.1"]]. These return Pearson's Chi-sq ... -
Chi-square code's order [2021-10-11]
Add it in the code for o_m in range(20,40): c = 0 for i in range(len(z)): -
How to use dplyr to perform descriptives and a chi-square test within function [2022-07-29]
How about this? desc_chi <- function(dataset, group_var) { group_var <- enquo(group_var) dataset %>% group_by(!!group_var) %>% summarise(n = n()) %>% mutate(chisq_pval = chisq.test(n)$p.value) } mtcars %>% desc_chi(cyl) ... -
The problem is in the second sentence of the question: "S / sig^2 has a noncentral chi-square distribution with degrees of freedom = n and noncentrality parameter = n*mu^2". That noncentrality parameter is incorrect. It should be n*(mu/sig)^2. The standard definition of the noncentral chi-square distribution is that it is the distribution of the sum of squares of normal variables with mean mu and standard deviation 1. You computed S using normal variables with standard deviation sig. Let's write that distribution as N(mu, sig**2). Using the location-scale property of the normal distribution, we have N(mu, sig**2) = mu + sig*N(0, 1) = sig*(mu/sig + N(0,1)) = sig*N(m ...
-
Use ODS EXCLUDE to exclude tables you don't want to see in your output. Conversely, you can use ODS SELECT to display only the tables of interest. Table Names are ODS tab ...
-
Do you want a table? Try this. simulated <- quantile(logLi, c(0.8, 0.9, 0.95, 0.99)) chisquare <- qchisq(c(0.8, 0.9, 0.95, 0.99), df = 1) rbind(simulated, chisquare) The code for the plot is as follows. a <- hist(logLi, freq=FALSE, xlim = c(0,4), breaks = seq(0, ceiling(max(logLi)), by = ...
-
Rearrange terms in: 1 - pchisq(3.841459, 1, 10.50742) = 0.9 and wrap abs around the result to construct a minimization function: optim( 1, function(x) abs(pchisq(3.841459, 1, x) - 0.1) ) #------- $par [1] 10.50742 $value [1] 1.740301e-08 $counts function gradient 56 NA $convergence [1] 0 $message NUL ...
-
You should debug your program; you would find that there is an overflow in the loop at k = 149. For k = 148, the value of Bruch is 3.3976725289e+304. The next calculation of Bruch overflows. The fix is to code for i := 1 to k do Bruch := Bruch / (f + 2 * i); Summand := power(chi, 2 * k) * Bruch; With this change, you get the value IntegralChi(138.609137,4) = 1.76835197E-7 after the 156th iteration. Note that your calculation (even for this simple ...
-
Apart from the cell count < 5 issue, in my experience both the R and Python implementations of statistical tests often enable various corrections by default (which are supposed to improve on the basic method). Turning off the correction seems to make the scipy p-value match R: scipy.stats.chi2_contingency(np.array([[1, 2], [3, 4]]), correction=False) Out[6]: # p-val = 0.778159 (0.079365079365079388, 0.77815968617616582, 1, array([[ 1.2, 1.8], ...