Error while running Nutch on a Hadoop multi-cluster environment

I am running Nutch on a Hadoop multi-cluster environment.

Hadoop throws an error when Nutch is executed with the following command:

$ bin/hadoop jar /home/nutch/nutch/runtime/deploy/nutch-1.5.1.job org.apache.nutch.crawl.Crawl urls -dir urls -depth 1 -topN 5

Error: Exception in thread "main" java.io.IOException: Not a file: hdfs://master:54310/user/nutch/urls/crawldb
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:170)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:515)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:753)
    at com.bdc.dod.dashboard.BDCQueryStatsViewer.run(BDCQueryStatsViewer.java:829)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at com.bdc.dod.dashboard.BDCQueryStatsViewer.main(BDCQueryStatsViewer.java:796)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:155)

I tried the possible fixes I could find and resolved the other issues, such as setting http.agent.name in the /local/conf path. I had installed this setup earlier and it ran smoothly.

Can anybody suggest a solution?

By the way, I followed link for installing and running.
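
One possible reading of the error, offered as an assumption rather than a confirmed diagnosis: the command passes urls both as the seed directory and as the -dir crawl directory, so the crawl would create urls/crawldb inside the seed directory, and the job that reads the seed files then finds a directory where it expects a file. The sketch below guards against that by requiring two different paths. The helper name check_crawl_dirs and the directory name crawl are hypothetical; the script only prints the command instead of running it, so it can be tried without a cluster.

```shell
#!/bin/sh
# Hypothetical helper: refuse to build the crawl command when the seed
# directory and the -dir crawl directory are the same path.
check_crawl_dirs() {
    seed_dir=$1
    crawl_dir=$2
    if [ "$seed_dir" = "$crawl_dir" ]; then
        echo "seed dir and crawl dir are both '$seed_dir'; pick a separate -dir" >&2
        return 1
    fi
    return 0
}

SEED_DIR=urls     # seed directory, as in the original command
CRAWL_DIR=crawl   # assumed name; any path other than the seed directory

if check_crawl_dirs "$SEED_DIR" "$CRAWL_DIR"; then
    # Print the command rather than executing it.
    echo bin/hadoop jar /home/nutch/nutch/runtime/deploy/nutch-1.5.1.job \
        org.apache.nutch.crawl.Crawl "$SEED_DIR" -dir "$CRAWL_DIR" -depth 1 -topN 5
fi
```

If urls/crawldb already exists on HDFS from the earlier run, it would presumably also need to be moved out of the seed directory before retrying.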


Source: https://stackoverflow.com/questions/13514429
Updated: 2022-03-24 16:03

Best answer

I don't really understand the need to check if I am in the UI or background thread. What if I skip the checks?

You use the Dispatcher and check whether you're on the UI or a background thread because WPF requires that controls be accessed only on the thread they were created on. This is because the controls are not thread-safe. If you're not doing multithreading (i.e. all of your code is on the main thread), then you don't have to worry about this. WinForms has the same limitation.

If you try to access a control from a different thread than the one it was created on, you'll get an InvalidOperationException.

Also, the code below... does not really do anything, e.g. set values. So if I am not debugging, will it do anything at all?

When compiling a release build, Debug.Assert (and your VerifyCalledOnUIThread method) will not even appear in the code, so no, nothing will happen.
