Nutch: Job Failed

I have a problem running the Nutch inject step. This is the command I am running:
bin/nutch inject bin/crawl/crawldb bin/urls
After running the command above, I get the following error:
Injector: starting at 2014-04-02 13:02:29
Injector: crawlDb: bin/crawl/crawldb
Injector: urlDir: bin/urls/seed.txt
Injector: Converting injected urls to crawl db entries.
Injector: total number of urls rejected by filters: 2
Injector: total number of urls injected after normalization and filtering: 0
Injector: Merging injected urls into crawl db.
Injector: overwrite: false
Injector: update: false
Injector: java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
    at org.apache.nutch.crawl.Injector.inject(Injector.java:294)
    at org.apache.nutch.crawl.Injector.run(Injector.java:316)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.crawl.Injector.main(Injector.java:306)
 
I am running Nutch for the first time. I have checked that Solr and Nutch are installed properly.
The details below are from the log file:
java.io.IOException: The temporary job-output directory file:/usr/share/apache-nutch-1.8/bin/crawl/crawldb/1639805438/_temporary doesn't exist!
    at org.apache.hadoop.mapred.FileOutputCommitter.getWorkPath(FileOutputCommitter.java:250)
    at org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:244)
    at org.apache.hadoop.mapred.MapFileOutputFormat.getRecordWriter(MapFileOutputFormat.java:46)
    at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.<init>(ReduceTask.java:449)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:491)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:421)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
2014-04-02 12:54:46,251 ERROR crawl.Injector - Injector: java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
    at org.apache.nutch.crawl.Injector.inject(Injector.java:294)
    at org.apache.nutch.crawl.Injector.run(Injector.java:316)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.crawl.Injector.main(Injector.java:306)

Source: https://stackoverflow.com/questions/22804590

Updated: 2022-03-16 08:03

Best answer

I assume that your class loading works by pushing class files into the class loader. You should rather let the class loader pull them. To explain what I mean by this, let us look at a short example of normal Java class loading:
class Main {
  public static void main(String[] args) {
    new B();
  }
}
class B extends A { }
class A { }
 
When creating a new B(), the class loader of Main effectively executes classLoader.loadClass("B"). At this point, B's superclass A is not yet loaded, and the class loader cannot know in advance that B has A as its superclass. The class loader therefore takes responsibility for loading A itself, by calling classLoader.loadClass("A") on itself before the loading of B is completed.
Now assume that the class loader knows about neither A nor B, but has a method classLoader.inject(String, byte[]) for explicitly loading classes it receives from an external entity. The following calling sequence would then fail:
classLoader.inject("B", bBytes);
classLoader.inject("A", aBytes);
 
because while loading B, the class loader would not yet know about A. 
What you need to do when implementing your own class loader is to store the class bytes in some sort of map and to implement the class loader's findClass method along these lines:
protected Class<?> findClass(String name) throws ClassNotFoundException {
  byte[] bytes = map.get(name);
  if (bytes != null) {
    return defineClass(name, bytes, 0, bytes.length);
  } else {
    throw new ClassNotFoundException(name);
  }
}
 
By allowing the class loader to determine the loading order, you avoid this problem altogether. 
To be more precise, you need to do the manipulation and the loading in two separate steps. A pseudo-algorithm would look something like this:
Enumeration<JarEntry> entries = jarFile.entries();
MyClassLoader classLoader = new MyClassLoader();
// First we generate ALL classes that the class loader is supposed to load.
// We then make these classes accessible to the class loader.
while (entries.hasMoreElements()) {
  JarEntry element = entries.nextElement();
  if (element.getName().endsWith(".class")) {
     // Class Manipulation via ASM
     classLoader.addClass( ... );
  }
}
// The first loop has exhausted the enumeration, so obtain a fresh one.
entries = jarFile.entries();
// Now that the class loader knows about all classes that are to be loaded
// we trigger the loading process. That way, the class loader can query
// itself about ANY class that it should know.
while (entries.hasMoreElements()) {
  JarEntry element = entries.nextElement();
  if (element.getName().endsWith(".class")) {
     classLoader.loadClass( ... );
  }
}
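
The pull-based approach can be tried end to end with a small sketch. Everything named here is hypothetical and only for illustration: MapClassLoader, PullLoadingDemo, minimalClass and the requested list are not part of any library. Instead of reading a jar and manipulating classes with ASM, the sketch hand-writes two stub class files (B extends A, with no fields or methods) so that it stays self-contained. It assumes that no classes named A or B are already on the classpath. B is registered before A, and loading still succeeds because the class loader asks itself for A while it defines B.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical loader: bytecode is registered up front, classes are defined only on request.
class MapClassLoader extends ClassLoader {
  private final Map<String, byte[]> map = new HashMap<>();
  // Records the order in which findClass is asked for classes.
  final List<String> requested = new ArrayList<>();

  void addClass(String name, byte[] bytes) {
    map.put(name, bytes);
  }

  @Override
  protected Class<?> findClass(String name) throws ClassNotFoundException {
    requested.add(name);
    byte[] bytes = map.get(name);
    if (bytes == null) {
      throw new ClassNotFoundException(name);
    }
    // While B is being defined, the JVM calls loadClass("A") on this same loader.
    return defineClass(name, bytes, 0, bytes.length);
  }
}

class PullLoadingDemo {
  // Writes a minimal class file "name extends superName" without fields or methods.
  // Both names are JVM internal names, e.g. "A" or "java/lang/Object".
  static byte[] minimalClass(String name, String superName) {
    try {
      ByteArrayOutputStream buffer = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(buffer);
      out.writeInt(0xCAFEBABE);   // magic number
      out.writeShort(0);          // minor version
      out.writeShort(52);         // major version (Java 8)
      out.writeShort(5);          // constant pool count = number of entries + 1
      out.writeByte(7); out.writeShort(2);        // #1 Class, name at #2
      out.writeByte(1); out.writeUTF(name);       // #2 Utf8 (length-prefixed)
      out.writeByte(7); out.writeShort(4);        // #3 Class, name at #4
      out.writeByte(1); out.writeUTF(superName);  // #4 Utf8 (length-prefixed)
      out.writeShort(0x0021);     // access flags: ACC_PUBLIC | ACC_SUPER
      out.writeShort(1);          // this_class  -> #1
      out.writeShort(3);          // super_class -> #3
      out.writeShort(0);          // interfaces_count
      out.writeShort(0);          // fields_count
      out.writeShort(0);          // methods_count
      out.writeShort(0);          // attributes_count
      return buffer.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  static String run() {
    MapClassLoader loader = new MapClassLoader();
    // Step 1: make all classes known to the loader; B is registered before A on purpose.
    loader.addClass("B", minimalClass("B", "A"));
    loader.addClass("A", minimalClass("A", "java/lang/Object"));
    try {
      // Step 2: trigger loading; the loader pulls A by itself when B needs it.
      Class<?> b = loader.loadClass("B");
      return b.getName() + " extends " + b.getSuperclass().getName()
          + "; findClass order: " + loader.requested;
    } catch (ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(run());
  }
}
```

The requested list shows the order in which the loader was asked for classes: the request for A arrives while B is still being defined, which is exactly the situation a push-style inject("B", ...) call cannot handle.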
