我如何构建/运行这个简单的Mahout程序而不会出现异常?(How do I build/run this simple Mahout program without getting exceptions?)
我想运行我在Mahout In Action中找到的这段代码:
package org.help; import java.io.IOException; import java.util.ArrayList; import java.util.List; import org.apache.hadoop.conf.Configuration; import org.apache.hadoop.fs.FileSystem; import org.apache.hadoop.fs.Path; import org.apache.hadoop.io.SequenceFile; import org.apache.hadoop.io.Text; import org.apache.mahout.math.DenseVector; import org.apache.mahout.math.NamedVector; import org.apache.mahout.math.VectorWritable; public class SeqPrep { public static void main(String args[]) throws IOException{ List<NamedVector> apples = new ArrayList<NamedVector>(); NamedVector apple; apple = new NamedVector(new DenseVector(new double[]{0.11, 510, 1}), "small round green apple"); apples.add(apple); Configuration conf = new Configuration(); FileSystem fs = FileSystem.get(conf); Path path = new Path("appledata/apples"); SequenceFile.Writer writer = new SequenceFile.Writer(fs, conf, path, Text.class, VectorWritable.class); VectorWritable vec = new VectorWritable(); for(NamedVector vector : apples){ vec.set(vector); writer.append(new Text(vector.getName()), vec); } writer.close(); SequenceFile.Reader reader = new SequenceFile.Reader(fs, new Path("appledata/apples"), conf); Text key = new Text(); VectorWritable value = new VectorWritable(); while(reader.next(key, value)){ System.out.println(key.toString() + " , " + value.get().asFormatString()); } reader.close(); } }
我编译它:
$ javac -classpath :/usr/local/hadoop-1.0.3/hadoop-core-1.0.3.jar:/home/hduser/mahout/trunk/core/target/mahout-core-0.8-SNAPSHOT.jar:/home/hduser/mahout/trunk/core/target/mahout-core-0.8-SNAPSHOT-job.jar:/home/hduser/mahout/trunk/core/target/mahout-core-0.8-SNAPSHOT-sources.jar -d myjavac/ SeqPrep.java
我打开它:
$ jar -cvf SeqPrep.jar -C myjavac/ .
现在我想在我的本地hadoop节点上运行它。 我试过了:
hadoop jar SeqPrep.jar org.help.SeqPrep
但我得到:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/mahout/math/Vector at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:247) at org.apache.hadoop.util.RunJar.main(RunJar.java:149)
所以我尝试使用libjars参数:
$ hadoop jar SeqPrep.jar org.help.SeqPrep -libjars /home/hduser/mahout/trunk/core/target/mahout-core-0.8-SNAPSHOT.jar -libjars /home/hduser/mahout/trunk/core/target/mahout-core-0.8-SNAPSHOT-job.jar -libjars /home/hduser/mahout/trunk/core/target/mahout-core-0.8-SNAPSHOT-sources.jar -libjars /home/hduser/mahout/trunk/math/target/mahout-math-0.8-SNAPSHOT.jar -libjars /home/hduser/mahout/trunk/math/target/mahout-math-0.8-SNAPSHOT-sources.jar
并得到同样的问题。 我不知道还有什么要尝试。
我最终的目标是能够将hadoop fs上的一个.csv文件读入一个稀疏矩阵,然后将它乘以一个随机向量。
编辑:看起来像Razvan得到它(注意:请参阅下面的另一种方法来做到这一点,不会混乱你的hadoop安装)。 以供参考:
$ find /usr/local/hadoop-1.0.3/. |grep mah /usr/local/hadoop-1.0.3/./lib/mahout-core-0.8-SNAPSHOT-tests.jar /usr/local/hadoop-1.0.3/./lib/mahout-core-0.8-SNAPSHOT.jar /usr/local/hadoop-1.0.3/./lib/mahout-core-0.8-SNAPSHOT-job.jar /usr/local/hadoop-1.0.3/./lib/mahout-core-0.8-SNAPSHOT-sources.jar /usr/local/hadoop-1.0.3/./lib/mahout-math-0.8-SNAPSHOT-sources.jar /usr/local/hadoop-1.0.3/./lib/mahout-math-0.8-SNAPSHOT-tests.jar /usr/local/hadoop-1.0.3/./lib/mahout-math-0.8-SNAPSHOT.jar
接着:
$hadoop jar SeqPrep.jar org.help.SeqPrep small round green apple , small round green apple:{0:0.11,1:510.0,2:1.0}
编辑:我试图做到这一点,而不是将mahout jar复制到hadoop lib /
$ rm /usr/local/hadoop-1.0.3/lib/mahout-*
然后当然是:
hadoop jar SeqPrep.jar org.help.SeqPrep Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/mahout/math/Vector at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:247) at org.apache.hadoop.util.RunJar.main(RunJar.java:149) Caused by: java.lang.ClassNotFoundException: org.apache.mahout.math.Vector at java.net.URLClassLoader$1.run(URLClassLoader.java:202) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:190) at java.lang.ClassLoader.loadClass(ClassLoader.java:306) at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
当我尝试mahout工作文件时:
$hadoop jar ~/mahout/trunk/core/target/mahout-core-0.8-SNAPSHOT-job.jar org.help.SeqPrep Exception in thread "main" java.lang.ClassNotFoundException: org.help.SeqPrep at java.net.URLClassLoader$1.run(URLClassLoader.java:202) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:190) at java.lang.ClassLoader.loadClass(ClassLoader.java:306) at java.lang.ClassLoader.loadClass(ClassLoader.java:247) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:247) at org.apache.hadoop.util.RunJar.main(RunJar.java:149)
如果我尝试包含我制作的.jar文件:
$ hadoop jar ~/mahout/trunk/core/target/mahout-core-0.8-SNAPSHOT-job.jar SeqPrep.jar org.help.SeqPrep Exception in thread "main" java.lang.ClassNotFoundException: SeqPrep.jar
编辑:显然,我一次只能发送一个jar到hadoop。 这意味着我需要将我创建的类添加到mahout核心作业文件中:
~/mahout/trunk/core/target$ cp mahout-core-0.8-SNAPSHOT-job.jar mahout-core-0.8-SNAPSHOT-job.jar_backup ~/mahout/trunk/core/target$ cp ~/workspace/seqprep/bin/org/help/SeqPrep.class . ~/mahout/trunk/core/target$ jar uf mahout-core-0.8-SNAPSHOT-job.jar SeqPrep.class
接着:
~/mahout/trunk/core/target$ hadoop jar mahout-core-0.8-SNAPSHOT-job.jar org.help.SeqPrep Exception in thread "main" java.lang.ClassNotFoundException: org.help.SeqPrep
编辑:好吧,现在我可以做到这一点,而不会弄乱我的hadoop安装。 我在之前的编辑中更新错误.jar。 它应该是:
~/mahout/trunk/core/target$ jar uf mahout-core-0.8-SNAPSHOT-job.jar org/help/SeqPrep.class
然后:
~/mahout/trunk/core/target$ hadoop jar mahout-core-0.8-SNAPSHOT-job.jar org.help.SeqPrep small round green apple , small round green apple:{0:0.11,1:510.0,2:1.0}
I would like to run this code which I found in Mahout In Action:
package org.help; import java.io.IOException; import java.util.ArrayList; import java.util.List; import org.apache.hadoop.conf.Configuration; import org.apache.hadoop.fs.FileSystem; import org.apache.hadoop.fs.Path; import org.apache.hadoop.io.SequenceFile; import org.apache.hadoop.io.Text; import org.apache.mahout.math.DenseVector; import org.apache.mahout.math.NamedVector; import org.apache.mahout.math.VectorWritable; public class SeqPrep { public static void main(String args[]) throws IOException{ List<NamedVector> apples = new ArrayList<NamedVector>(); NamedVector apple; apple = new NamedVector(new DenseVector(new double[]{0.11, 510, 1}), "small round green apple"); apples.add(apple); Configuration conf = new Configuration(); FileSystem fs = FileSystem.get(conf); Path path = new Path("appledata/apples"); SequenceFile.Writer writer = new SequenceFile.Writer(fs, conf, path, Text.class, VectorWritable.class); VectorWritable vec = new VectorWritable(); for(NamedVector vector : apples){ vec.set(vector); writer.append(new Text(vector.getName()), vec); } writer.close(); SequenceFile.Reader reader = new SequenceFile.Reader(fs, new Path("appledata/apples"), conf); Text key = new Text(); VectorWritable value = new VectorWritable(); while(reader.next(key, value)){ System.out.println(key.toString() + " , " + value.get().asFormatString()); } reader.close(); } }
I compile it with:
$ javac -classpath :/usr/local/hadoop-1.0.3/hadoop-core-1.0.3.jar:/home/hduser/mahout/trunk/core/target/mahout-core-0.8-SNAPSHOT.jar:/home/hduser/mahout/trunk/core/target/mahout-core-0.8-SNAPSHOT-job.jar:/home/hduser/mahout/trunk/core/target/mahout-core-0.8-SNAPSHOT-sources.jar -d myjavac/ SeqPrep.java
I jar it:
$ jar -cvf SeqPrep.jar -C myjavac/ .
Now I'd like to run it on my local hadoop node. I've tried:
hadoop jar SeqPrep.jar org.help.SeqPrep
But I get:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/mahout/math/Vector at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:247) at org.apache.hadoop.util.RunJar.main(RunJar.java:149)
So I tried using the libjars parameter:
$ hadoop jar SeqPrep.jar org.help.SeqPrep -libjars /home/hduser/mahout/trunk/core/target/mahout-core-0.8-SNAPSHOT.jar -libjars /home/hduser/mahout/trunk/core/target/mahout-core-0.8-SNAPSHOT-job.jar -libjars /home/hduser/mahout/trunk/core/target/mahout-core-0.8-SNAPSHOT-sources.jar -libjars /home/hduser/mahout/trunk/math/target/mahout-math-0.8-SNAPSHOT.jar -libjars /home/hduser/mahout/trunk/math/target/mahout-math-0.8-SNAPSHOT-sources.jar
and got the same problem. I don't know what else to try.
My eventual goal is to be able to read a .csv file on the hadoop fs into a sparse matrix and then multiply it by a random vector.
edit: Looks like Razvan got it (note: see below for another way to do this that does not mess with your hadoop installation). For reference:
$ find /usr/local/hadoop-1.0.3/. |grep mah /usr/local/hadoop-1.0.3/./lib/mahout-core-0.8-SNAPSHOT-tests.jar /usr/local/hadoop-1.0.3/./lib/mahout-core-0.8-SNAPSHOT.jar /usr/local/hadoop-1.0.3/./lib/mahout-core-0.8-SNAPSHOT-job.jar /usr/local/hadoop-1.0.3/./lib/mahout-core-0.8-SNAPSHOT-sources.jar /usr/local/hadoop-1.0.3/./lib/mahout-math-0.8-SNAPSHOT-sources.jar /usr/local/hadoop-1.0.3/./lib/mahout-math-0.8-SNAPSHOT-tests.jar /usr/local/hadoop-1.0.3/./lib/mahout-math-0.8-SNAPSHOT.jar
and then:
$hadoop jar SeqPrep.jar org.help.SeqPrep small round green apple , small round green apple:{0:0.11,1:510.0,2:1.0}
edit: I'm trying to do this without copying the mahout jars into the hadoop lib/
$ rm /usr/local/hadoop-1.0.3/lib/mahout-*
and then of course:
hadoop jar SeqPrep.jar org.help.SeqPrep Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/mahout/math/Vector at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:247) at org.apache.hadoop.util.RunJar.main(RunJar.java:149) Caused by: java.lang.ClassNotFoundException: org.apache.mahout.math.Vector at java.net.URLClassLoader$1.run(URLClassLoader.java:202) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:190) at java.lang.ClassLoader.loadClass(ClassLoader.java:306) at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
and when I try the mahout job file:
$hadoop jar ~/mahout/trunk/core/target/mahout-core-0.8-SNAPSHOT-job.jar org.help.SeqPrep Exception in thread "main" java.lang.ClassNotFoundException: org.help.SeqPrep at java.net.URLClassLoader$1.run(URLClassLoader.java:202) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:190) at java.lang.ClassLoader.loadClass(ClassLoader.java:306) at java.lang.ClassLoader.loadClass(ClassLoader.java:247) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:247) at org.apache.hadoop.util.RunJar.main(RunJar.java:149)
If I try to include the .jar file I made:
$ hadoop jar ~/mahout/trunk/core/target/mahout-core-0.8-SNAPSHOT-job.jar SeqPrep.jar org.help.SeqPrep Exception in thread "main" java.lang.ClassNotFoundException: SeqPrep.jar
edit: Apparently I can only send one jar at a time to hadoop. This means I need to add the class I made into the mahout core job file:
~/mahout/trunk/core/target$ cp mahout-core-0.8-SNAPSHOT-job.jar mahout-core-0.8-SNAPSHOT-job.jar_backup ~/mahout/trunk/core/target$ cp ~/workspace/seqprep/bin/org/help/SeqPrep.class . ~/mahout/trunk/core/target$ jar uf mahout-core-0.8-SNAPSHOT-job.jar SeqPrep.class
And then:
~/mahout/trunk/core/target$ hadoop jar mahout-core-0.8-SNAPSHOT-job.jar org.help.SeqPrep Exception in thread "main" java.lang.ClassNotFoundException: org.help.SeqPrep
edit: Ok, now I can do it without messing with my hadoop installation. I was updating the .jar wrong in that previous edit. It should be:
~/mahout/trunk/core/target$ jar uf mahout-core-0.8-SNAPSHOT-job.jar org/help/SeqPrep.class
then:
~/mahout/trunk/core/target$ hadoop jar mahout-core-0.8-SNAPSHOT-job.jar org.help.SeqPrep small round green apple , small round green apple:{0:0.11,1:510.0,2:1.0}
原文:https://stackoverflow.com/questions/11479600