首页 \ 问答 \ 我如何构建/运行这个简单的Mahout程序而不会出现异常?(How do I build/run this simple Mahout program without getting exceptions?)

我如何构建/运行这个简单的Mahout程序而不会出现异常?(How do I build/run this simple Mahout program without getting exceptions?)

我想运行我在Mahout In Action中找到的这段代码:

package org.help;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.mahout.math.DenseVector;
import org.apache.mahout.math.NamedVector;
import org.apache.mahout.math.VectorWritable;

public class SeqPrep {

    public static void main(String args[]) throws IOException{

        List<NamedVector> apples = new ArrayList<NamedVector>();

        NamedVector apple;

        apple = new NamedVector(new DenseVector(new double[]{0.11, 510, 1}), "small round green apple");        

        apples.add(apple);

        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path("appledata/apples");

        SequenceFile.Writer writer = new SequenceFile.Writer(fs,  conf, path, Text.class, VectorWritable.class);

        VectorWritable vec = new VectorWritable();
        for(NamedVector vector : apples){
            vec.set(vector);
            writer.append(new Text(vector.getName()), vec);
        }
        writer.close();

        SequenceFile.Reader reader = new SequenceFile.Reader(fs, new Path("appledata/apples"), conf);

        Text key = new Text();
        VectorWritable value = new VectorWritable();
        while(reader.next(key, value)){
            System.out.println(key.toString() + " , " + value.get().asFormatString());
        }
        reader.close();

    }

}

我编译它:

$ javac -classpath :/usr/local/hadoop-1.0.3/hadoop-core-1.0.3.jar:/home/hduser/mahout/trunk/core/target/mahout-core-0.8-SNAPSHOT.jar:/home/hduser/mahout/trunk/core/target/mahout-core-0.8-SNAPSHOT-job.jar:/home/hduser/mahout/trunk/core/target/mahout-core-0.8-SNAPSHOT-sources.jar -d myjavac/ SeqPrep.java

我打开它:

$ jar -cvf SeqPrep.jar -C myjavac/ .

现在我想在我的本地hadoop节点上运行它。 我试过了:

 hadoop jar SeqPrep.jar org.help.SeqPrep

但我得到:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/mahout/math/Vector
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:247)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:149)

所以我尝试使用libjars参数:

$ hadoop jar SeqPrep.jar org.help.SeqPrep -libjars /home/hduser/mahout/trunk/core/target/mahout-core-0.8-SNAPSHOT.jar -libjars /home/hduser/mahout/trunk/core/target/mahout-core-0.8-SNAPSHOT-job.jar -libjars /home/hduser/mahout/trunk/core/target/mahout-core-0.8-SNAPSHOT-sources.jar -libjars /home/hduser/mahout/trunk/math/target/mahout-math-0.8-SNAPSHOT.jar -libjars /home/hduser/mahout/trunk/math/target/mahout-math-0.8-SNAPSHOT-sources.jar

并得到同样的问题。 我不知道还有什么要尝试。

我最终的目标是能够将hadoop fs上的一个.csv文件读入一个稀疏矩阵,然后将它乘以一个随机向量。

编辑:看起来像Razvan得到它(注意:请参阅下面的另一种方法来做到这一点,不会混乱你的hadoop安装)。 以供参考:

$ find /usr/local/hadoop-1.0.3/. |grep mah
/usr/local/hadoop-1.0.3/./lib/mahout-core-0.8-SNAPSHOT-tests.jar
/usr/local/hadoop-1.0.3/./lib/mahout-core-0.8-SNAPSHOT.jar
/usr/local/hadoop-1.0.3/./lib/mahout-core-0.8-SNAPSHOT-job.jar
/usr/local/hadoop-1.0.3/./lib/mahout-core-0.8-SNAPSHOT-sources.jar
/usr/local/hadoop-1.0.3/./lib/mahout-math-0.8-SNAPSHOT-sources.jar
/usr/local/hadoop-1.0.3/./lib/mahout-math-0.8-SNAPSHOT-tests.jar
/usr/local/hadoop-1.0.3/./lib/mahout-math-0.8-SNAPSHOT.jar

接着:

$hadoop jar SeqPrep.jar org.help.SeqPrep

small round green apple , small round green apple:{0:0.11,1:510.0,2:1.0}

编辑:我试图做到这一点,而不是将mahout jar复制到hadoop lib /

$ rm /usr/local/hadoop-1.0.3/lib/mahout-*

然后当然是:

hadoop jar SeqPrep.jar org.help.SeqPrep

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/mahout/math/Vector
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:247)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:149)
Caused by: java.lang.ClassNotFoundException: org.apache.mahout.math.Vector
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)

当我尝试mahout工作文件时:

$hadoop jar ~/mahout/trunk/core/target/mahout-core-0.8-SNAPSHOT-job.jar org.help.SeqPrep

Exception in thread "main" java.lang.ClassNotFoundException: org.help.SeqPrep
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:247)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:149)

如果我尝试包含我制作的.jar文件:

$ hadoop jar ~/mahout/trunk/core/target/mahout-core-0.8-SNAPSHOT-job.jar SeqPrep.jar org.help.SeqPrep

Exception in thread "main" java.lang.ClassNotFoundException: SeqPrep.jar

编辑:显然,我一次只能发送一个jar到hadoop。 这意味着我需要将我创建的类添加到mahout核心作业文件中:

~/mahout/trunk/core/target$ cp mahout-core-0.8-SNAPSHOT-job.jar mahout-core-0.8-SNAPSHOT-job.jar_backup

~/mahout/trunk/core/target$ cp ~/workspace/seqprep/bin/org/help/SeqPrep.class .

~/mahout/trunk/core/target$ jar uf mahout-core-0.8-SNAPSHOT-job.jar SeqPrep.class

接着:

~/mahout/trunk/core/target$ hadoop jar mahout-core-0.8-SNAPSHOT-job.jar org.help.SeqPrep

Exception in thread "main" java.lang.ClassNotFoundException: org.help.SeqPrep

编辑:好吧,现在我可以做到这一点,而不会弄乱我的hadoop安装。 我在之前的编辑中更新错误.jar。 它应该是:

~/mahout/trunk/core/target$ jar uf mahout-core-0.8-SNAPSHOT-job.jar org/help/SeqPrep.class

然后:

~/mahout/trunk/core/target$ hadoop jar mahout-core-0.8-SNAPSHOT-job.jar org.help.SeqPrep

small round green apple , small round green apple:{0:0.11,1:510.0,2:1.0}

I would like to run this code which I found in Mahout In Action:

package org.help;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.mahout.math.DenseVector;
import org.apache.mahout.math.NamedVector;
import org.apache.mahout.math.VectorWritable;

public class SeqPrep {

    public static void main(String args[]) throws IOException{

        List<NamedVector> apples = new ArrayList<NamedVector>();

        NamedVector apple;

        apple = new NamedVector(new DenseVector(new double[]{0.11, 510, 1}), "small round green apple");        

        apples.add(apple);

        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path("appledata/apples");

        SequenceFile.Writer writer = new SequenceFile.Writer(fs,  conf, path, Text.class, VectorWritable.class);

        VectorWritable vec = new VectorWritable();
        for(NamedVector vector : apples){
            vec.set(vector);
            writer.append(new Text(vector.getName()), vec);
        }
        writer.close();

        SequenceFile.Reader reader = new SequenceFile.Reader(fs, new Path("appledata/apples"), conf);

        Text key = new Text();
        VectorWritable value = new VectorWritable();
        while(reader.next(key, value)){
            System.out.println(key.toString() + " , " + value.get().asFormatString());
        }
        reader.close();

    }

}

I compile it with:

$ javac -classpath :/usr/local/hadoop-1.0.3/hadoop-core-1.0.3.jar:/home/hduser/mahout/trunk/core/target/mahout-core-0.8-SNAPSHOT.jar:/home/hduser/mahout/trunk/core/target/mahout-core-0.8-SNAPSHOT-job.jar:/home/hduser/mahout/trunk/core/target/mahout-core-0.8-SNAPSHOT-sources.jar -d myjavac/ SeqPrep.java

I jar it:

$ jar -cvf SeqPrep.jar -C myjavac/ .

Now I'd like to run it on my local hadoop node. I've tried:

 hadoop jar SeqPrep.jar org.help.SeqPrep

But I get:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/mahout/math/Vector
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:247)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:149)

So I tried using the libjars parameter:

$ hadoop jar SeqPrep.jar org.help.SeqPrep -libjars /home/hduser/mahout/trunk/core/target/mahout-core-0.8-SNAPSHOT.jar -libjars /home/hduser/mahout/trunk/core/target/mahout-core-0.8-SNAPSHOT-job.jar -libjars /home/hduser/mahout/trunk/core/target/mahout-core-0.8-SNAPSHOT-sources.jar -libjars /home/hduser/mahout/trunk/math/target/mahout-math-0.8-SNAPSHOT.jar -libjars /home/hduser/mahout/trunk/math/target/mahout-math-0.8-SNAPSHOT-sources.jar

and got the same problem. I don't know what else to try.

My eventual goal is to be able to read a .csv file on the hadoop fs into a sparse matrix and then multiply it by a random vector.

edit: Looks like Razvan got it (note: see below for another way to do this that does not mess with your hadoop installation). For reference:

$ find /usr/local/hadoop-1.0.3/. |grep mah
/usr/local/hadoop-1.0.3/./lib/mahout-core-0.8-SNAPSHOT-tests.jar
/usr/local/hadoop-1.0.3/./lib/mahout-core-0.8-SNAPSHOT.jar
/usr/local/hadoop-1.0.3/./lib/mahout-core-0.8-SNAPSHOT-job.jar
/usr/local/hadoop-1.0.3/./lib/mahout-core-0.8-SNAPSHOT-sources.jar
/usr/local/hadoop-1.0.3/./lib/mahout-math-0.8-SNAPSHOT-sources.jar
/usr/local/hadoop-1.0.3/./lib/mahout-math-0.8-SNAPSHOT-tests.jar
/usr/local/hadoop-1.0.3/./lib/mahout-math-0.8-SNAPSHOT.jar

and then:

$hadoop jar SeqPrep.jar org.help.SeqPrep

small round green apple , small round green apple:{0:0.11,1:510.0,2:1.0}

edit: I'm trying to do this without copying the mahout jars into the hadoop lib/

$ rm /usr/local/hadoop-1.0.3/lib/mahout-*

and then of course:

hadoop jar SeqPrep.jar org.help.SeqPrep

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/mahout/math/Vector
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:247)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:149)
Caused by: java.lang.ClassNotFoundException: org.apache.mahout.math.Vector
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)

and when I try the mahout job file:

$hadoop jar ~/mahout/trunk/core/target/mahout-core-0.8-SNAPSHOT-job.jar org.help.SeqPrep

Exception in thread "main" java.lang.ClassNotFoundException: org.help.SeqPrep
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:247)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:149)

If I try to include the .jar file I made:

$ hadoop jar ~/mahout/trunk/core/target/mahout-core-0.8-SNAPSHOT-job.jar SeqPrep.jar org.help.SeqPrep

Exception in thread "main" java.lang.ClassNotFoundException: SeqPrep.jar

edit: Apparently I can only send one jar at a time to hadoop. This means I need to add the class I made into the mahout core job file:

~/mahout/trunk/core/target$ cp mahout-core-0.8-SNAPSHOT-job.jar mahout-core-0.8-SNAPSHOT-job.jar_backup

~/mahout/trunk/core/target$ cp ~/workspace/seqprep/bin/org/help/SeqPrep.class .

~/mahout/trunk/core/target$ jar uf mahout-core-0.8-SNAPSHOT-job.jar SeqPrep.class

And then:

~/mahout/trunk/core/target$ hadoop jar mahout-core-0.8-SNAPSHOT-job.jar org.help.SeqPrep

Exception in thread "main" java.lang.ClassNotFoundException: org.help.SeqPrep

edit: Ok, now I can do it without messing with my hadoop installation. I was updating the .jar wrong in that previous edit. It should be:

~/mahout/trunk/core/target$ jar uf mahout-core-0.8-SNAPSHOT-job.jar org/help/SeqPrep.class

then:

~/mahout/trunk/core/target$ hadoop jar mahout-core-0.8-SNAPSHOT-job.jar org.help.SeqPrep

small round green apple , small round green apple:{0:0.11,1:510.0,2:1.0}

原文:https://stackoverflow.com/questions/11479600
更新时间:2023-07-01 11:07

相关文章

更多

最新问答

更多
  • 您如何使用git diff文件,并将其应用于同一存储库的副本的本地分支?(How do you take a git diff file, and apply it to a local branch that is a copy of the same repository?)
  • 将长浮点值剪切为2个小数点并复制到字符数组(Cut Long Float Value to 2 decimal points and copy to Character Array)
  • OctoberCMS侧边栏不呈现(OctoberCMS Sidebar not rendering)
  • 页面加载后对象是否有资格进行垃圾回收?(Are objects eligible for garbage collection after the page loads?)
  • codeigniter中的语言不能按预期工作(language in codeigniter doesn' t work as expected)
  • 在计算机拍照在哪里进入
  • 使用cin.get()从c ++中的输入流中丢弃不需要的字符(Using cin.get() to discard unwanted characters from the input stream in c++)
  • No for循环将在for循环中运行。(No for loop will run inside for loop. Testing for primes)
  • 单页应用程序:页面重新加载(Single Page Application: page reload)
  • 在循环中选择具有相似模式的列名称(Selecting Column Name With Similar Pattern in a Loop)
  • System.StackOverflow错误(System.StackOverflow error)
  • KnockoutJS未在嵌套模板上应用beforeRemove和afterAdd(KnockoutJS not applying beforeRemove and afterAdd on nested templates)
  • 散列包括方法和/或嵌套属性(Hash include methods and/or nested attributes)
  • android - 如何避免使用Samsung RFS文件系统延迟/冻结?(android - how to avoid lag/freezes with Samsung RFS filesystem?)
  • TensorFlow:基于索引列表创建新张量(TensorFlow: Create a new tensor based on list of indices)
  • 企业安全培训的各项内容
  • 错误:RPC失败;(error: RPC failed; curl transfer closed with outstanding read data remaining)
  • C#类名中允许哪些字符?(What characters are allowed in C# class name?)
  • NumPy:将int64值存储在np.array中并使用dtype float64并将其转换回整数是否安全?(NumPy: Is it safe to store an int64 value in an np.array with dtype float64 and later convert it back to integer?)
  • 注销后如何隐藏导航portlet?(How to hide navigation portlet after logout?)
  • 将多个行和可变行移动到列(moving multiple and variable rows to columns)
  • 提交表单时忽略基础href,而不使用Javascript(ignore base href when submitting form, without using Javascript)
  • 对setOnInfoWindowClickListener的意图(Intent on setOnInfoWindowClickListener)
  • Angular $资源不会改变方法(Angular $resource doesn't change method)
  • 在Angular 5中不是一个函数(is not a function in Angular 5)
  • 如何配置Composite C1以将.m和桌面作为同一站点提供服务(How to configure Composite C1 to serve .m and desktop as the same site)
  • 不适用:悬停在悬停时:在元素之前[复制](Don't apply :hover when hovering on :before element [duplicate])
  • 常见的python rpc和cli接口(Common python rpc and cli interface)
  • Mysql DB单个字段匹配多个其他字段(Mysql DB single field matching to multiple other fields)
  • 产品页面上的Magento Up出售对齐问题(Magento Up sell alignment issue on the products page)