BigInsights on cloud - Class org.apache.oozie.action.hadoop.SparkMain not found
I'm trying to execute the spark oozie example on the oozie_spark branch against a BigInsights for Apache Hadoop basic cluster.
The workflow.xml looks like this:
<workflow-app xmlns='uri:oozie:workflow:0.5' name='SparkWordCount'>
    <start to='spark-node' />
    <action name='spark-node'>
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>${master}</master>
            <name>Spark-Wordcount</name>
            <class>org.apache.spark.examples.WordCount</class>
            <jar>${hdfsSparkAssyJar},${hdfsWordCountJar}</jar>
            <spark-opts>--conf spark.driver.extraJavaOptions=-Diop.version=4.2.0.0</spark-opts>
            <arg>${inputDir}/FILE</arg>
            <arg>${outputDir}</arg>
        </spark>
        <ok to="end" />
        <error to="fail" />
    </action>
    <kill name="fail">
        <message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name='end' />
</workflow-app>
The configuration.xml:
<configuration>
    <property>
        <name>master</name>
        <value>local</value>
    </property>
    <property>
        <name>queueName</name>
        <value>default</value>
    </property>
    <property>
        <name>user.name</name>
        <value>default</value>
    </property>
    <property>
        <name>nameNode</name>
        <value>default</value>
    </property>
    <property>
        <name>jobTracker</name>
        <value>default</value>
    </property>
    <property>
        <name>jobDir</name>
        <value>/user/snowch/test</value>
    </property>
    <property>
        <name>inputDir</name>
        <value>/user/snowch/test/input</value>
    </property>
    <property>
        <name>outputDir</name>
        <value>/user/snowch/test/output</value>
    </property>
    <property>
        <name>hdfsWordCountJar</name>
        <value>/user/snowch/test/lib/OozieWorkflowSparkGroovy.jar</value>
    </property>
    <property>
        <name>oozie.wf.application.path</name>
        <value>/user/snowch/test</value>
    </property>
    <property>
        <name>hdfsSparkAssyJar</name>
        <value>/iop/apps/4.2.0.0/spark/jars/spark-assembly.jar</value>
    </property>
</configuration>
However, the error I see in the YARN logs is:
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], exception invoking main(), java.lang.ClassNotFoundException: Class org.apache.oozie.action.hadoop.SparkMain not found
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.oozie.action.hadoop.SparkMain not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
    at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:234)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:380)
    at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:301)
    at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:187)
    at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:230)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: Class org.apache.oozie.action.hadoop.SparkMain not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2193)
    ... 13 more
I've looked for SparkMain in the spark-assembly jar:
$ hdfs dfs -get /iop/apps/4.2.0.0/spark/jars/spark-assembly.jar
$ jar tf spark-assembly.jar | grep -i SparkMain
And in the spark-examples jar:
$ jar tf /usr/iop/4.2.0.0/spark/lib/spark-examples-1.6.1_IBM_4-hadoop2.7.2-IBM-12.jar | grep SparkMain
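Neither grep finds the class. For what it's worth, my understanding is that org.apache.oozie.action.hadoop.SparkMain is shipped in the Oozie Spark sharelib (an oozie-sharelib-spark jar), not in spark-assembly or spark-examples, so it only reaches the launcher classpath when the job enables the system libpath. A sketch of what I would expect to add to the job properties (property names are the standard Oozie ones; I have not verified this on BigInsights):

```properties
# Standard Oozie properties (assumed, not BigInsights-specific):
# pull jars from the Oozie system sharelib, which is where
# oozie-sharelib-spark*.jar (containing SparkMain) normally lives.
oozie.use.system.libpath=true
oozie.action.sharelib.for.spark=spark
```

On a stock Oozie install, `oozie admin -shareliblist spark` should list the jars the Spark action would receive, which would confirm whether SparkMain is available in the sharelib on this cluster.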
I've seen another question similar to this one, but this question is specifically about BigInsights on cloud.
Source: https://stackoverflow.com/questions/38899561