Elasticsearch indexing fails after successful Nutch crawl

I'm not sure why, but Nutch 1.13 is failing to index the data to ES (v2.3.3). The crawl itself runs fine, but when it comes time to index to ES it gives me this error message:
Indexer: java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:865)
at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:147)
at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:230)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:239)
 
Right before that it prints this:
elastic.bulk.close.timeout : elastic timeout for the last bulk in seconds. (default 600)
 
I'm not sure whether the timeout has anything to do with the job failing.
I ran Nutch v1.10 many times with no problems and only recently decided to upgrade. This error never appeared before the upgrade.
EDIT: After closer inspection of the error message: 
    Error running:
  /home/david/tutorials/nutch/nutch-1.13/runtime/local/bin/nutch index -Delastic.server.url=http://localhost:9300/search-index/ searchcrawl//crawldb -linkdb searchcrawl//linkdb searchcrawl//segments/20170519125546
 
It seems to be failing on that particular segment. What does that mean? I only know the basics of using Nutch and am by no means an expert. Is it failing on a link?

Source: https://stackoverflow.com/questions/44074819

Updated: 2022-03-21 22:03

Best answer

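Rearrange the terms of 1 - pchisq(3.841459, 1, 10.50742) = 0.9 into pchisq(3.841459, 1, x) - 0.1 = 0, then wrap abs() around the left-hand side to get a function to minimize over the noncentrality parameter x:
# The default Nelder-Mead method warns on one-dimensional problems but still converges here.
optim(1, function(x) abs(pchisq(3.841459, 1, x) - 0.1))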
#-------
$par
[1] 10.50742

$value
[1] 1.740301e-08

$counts
function gradient 
      56       NA 

$convergence
[1] 0

$message
NULL
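
Because the rearranged equation has the form f(x) = 0, it can also be handed to a root finder instead of a minimizer. The following is only a sketch of that alternative, not part of the original answer: the helper name solve_ncp, its arguments, and the search interval [0, 100] are assumptions chosen for illustration.

```r
# Hypothetical helper (not from the original answer): find the noncentrality
# parameter that yields a target power, using root finding instead of optim().
solve_ncp <- function(crit.val, df = 1, power = 0.9, upper = 100) {
  # Power = 1 - pchisq(crit.val, df, ncp), so the root of f is the ncp we want.
  f <- function(ncp) pchisq(crit.val, df, ncp) - (1 - power)
  # f is positive at ncp = 0 and negative for large ncp, so the interval
  # [0, upper] brackets the root (assuming upper is large enough).
  uniroot(f, lower = 0, upper = upper, tol = 1e-9)$root
}

crit <- qchisq(0.95, 1)           # the 3.841459 critical value used above
ncp <- solve_ncp(crit)
print(ncp)
print(1 - pchisq(crit, 1, ncp))   # achieved power, should be close to 0.9
```

The root should land near the $par value reported by optim() above. uniroot() avoids the non-smooth abs() objective, at the cost of needing an interval on which the function changes sign.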
 
For a sensitivity analysis, vary the other parameters one value at a time, for example the critical value:
for (crit.val in seq(2.5, 3.5, by = 0.1)) {
  # method = "Brent" requires finite bounds; search the ncp in [0, 20]
  print(optim(1,
              function(x) abs(pchisq(crit.val, 1, x) - 0.1),
              method = "Brent", lower = 0, upper = 20)$par)
}
[1] 8.194852
[1] 8.375145
[1] 8.553901
[1] 8.731204
[1] 8.907135
[1] 9.081764
[1] 9.255156
[1] 9.427372
[1] 9.598467
[1] 9.768491
[1] 9.937492
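
If the swept values are needed for plotting or further analysis, the same loop can be collected into a data frame rather than printed. This is a minimal sketch using the same optim() call as above; the names crit.vals and sens are illustrative and not from the original answer.

```r
# Same sweep as above, but keep each critical value next to its solved ncp.
crit.vals <- seq(2.5, 3.5, by = 0.1)

ncp <- sapply(crit.vals, function(cv) {
  optim(1,
        function(x) abs(pchisq(cv, 1, x) - 0.1),
        method = "Brent", lower = 0, upper = 20)$par
})

# 'sens' is a hypothetical name for the sensitivity table.
sens <- data.frame(crit.val = crit.vals, ncp = ncp)
print(sens)
```

Each row pairs one critical value with the noncentrality parameter that gives 90% power, so the ncp column should match the values printed by the loop.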

Related questions

Chi-square distribution with c#? [2023-04-25]

Automate Chi-square across columns [2019-11-17]

Chi-square code's order [2021-10-11]

How to use dplyr to perform descriptives and a chi-square test within function [2022-07-29]

Scipy Non-central Chi-Squared Random Variable [2023-07-21]

In SAS, Fisher's exact without Chi-square statistics? [2022-01-02]

difficulty to plot a hist of chi-square~(1) over a simulated chi-square confidence interval [2022-03-03]

non-central chi-square probability and non-centrality parameter [2023-03-11]

Code for Chi-square distribution function in Delphi [2022-03-02]

different p-value for chi-square test in python and R [2023-10-23]