Hadoop Java Program -files Feature Test
2019-03-28 13:18 | Source: Internet
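I had always used Hadoop Streaming, where the -file option is very practical: it ships files such as configuration files along with the job at submission time. When I later looked for the equivalent of -file for Java programs, I spent a lot of effort and could never get a test to pass.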
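Eventually I found that GenericOptionsParser parses a set of generic command-line options and acts on them; for example, when it sees -files, it ships the listed files to the nodes that run the mappers. Testing showed that the -files option must come immediately after "hadoop jar" and the jar file name, before the program's own arguments. The Streaming help output documents this convention: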
Usage: $HADOOP_HOME/bin/hadoop jar \
$HADOOP_HOME/hadoop-streaming.jar [options]
Options:
-input <path> DFS input file(s) for the Map step
-output <path> DFS output directory for the Reduce step
-mapper <cmd|JavaClassName> The streaming command to run
-combiner <cmd|JavaClassName> The streaming command to run
-reducer <cmd|JavaClassName> The streaming command to run
-file <file> File/dir to be shipped in the Job jar file.
Deprecated. Use generic option "-files" instead
-inputformat TextInputFormat(default)|SequenceFileAsTextInputFormat|JavaClassName Optional.
-outputformat TextOutputFormat(default)|JavaClassName Optional.
-partitioner JavaClassName Optional.
-numReduceTasks <num> Optional.
-inputreader <spec> Optional.
-cmdenv <n>=<v> Optional. Pass env.var to streaming commands
-mapdebug <path> Optional. To run this script when a map task fails
-reducedebug <path> Optional. To run this script when a reduce task fails
-io <identifier> Optional.
-lazyOutput Optional. Lazily create Output
-verbose
Generic options supported are
-conf <configuration file> specify an application configuration file
-D <property=value> use value for given property
-fs <local|namenode:port> specify a namenode
-jt <local|jobtracker:port> specify a job tracker
-files <comma separated list of files> specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars> specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives> specify comma separated archives to be unarchived on the compute machines.
The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]
A Java program run with hadoop must follow the same argument convention, especially for generic options such as -D, -libjars, and -files.
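To see why the position matters, here is a simplified, Hadoop-free model of the splitting step. It is a sketch under stated assumptions, not GenericOptionsParser's real code: it assumes every generic option is followed by exactly one value and that parsing stops at the first argument that is not a generic option (it does not handle forms such as -Dkey=value). The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified model of separating generic options from program
// arguments. This is NOT Hadoop's GenericOptionsParser implementation.
class GenericArgsSketch {
    // Generic options from the usage text above; assumed to take one value each.
    static final Set<String> GENERIC_OPTIONS = new HashSet<>(Arrays.asList(
            "-conf", "-D", "-fs", "-jt", "-files", "-libjars", "-archives"));

    // Returns what is left for the program after leading generic options are consumed.
    static String[] remainingArgs(String[] args) {
        int i = 0;
        // Consume "-option value" pairs only while they appear at the front.
        while (i + 1 < args.length && GENERIC_OPTIONS.contains(args[i])) {
            i += 2;
        }
        // From the first non-generic argument on, everything is passed through,
        // including any generic option that appears later.
        return Arrays.copyOfRange(args, i, args.length);
    }

    public static void main(String[] unused) {
        String[] good = {"-files", "test_upload_file", "/user/lmc/tmp/input", "/user/lmc/tmp/output"};
        String[] late = {"/user/lmc/tmp/input", "/user/lmc/tmp/output", "-files", "test_upload_file"};
        // prints [/user/lmc/tmp/input, /user/lmc/tmp/output]
        System.out.println(Arrays.toString(remainingArgs(good)));
        // prints [/user/lmc/tmp/input, /user/lmc/tmp/output, -files, test_upload_file]
        System.out.println(Arrays.toString(remainingArgs(late)));
    }
}
```

In this model the second ordering leaves four arguments, which is exactly the case that an "otherArgs.length != 2" check in the program rejects.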
Test code:
package wordcount.com.cn;
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
@SuppressWarnings("deprecation")
public class WordCount {
    static class SimpleMapper extends Mapper<LongWritable, Text, Text, Text> {
        // Simple test with no business logic: cache every line of the shipped file.
        private final List<String> lines = new ArrayList<String>();

        @Override
        public void setup(Context context) throws IOException {
            // The name must match the file passed with -files; the test shows it is
            // readable from the task's working directory under that name.
            try (BufferedReader reader = new BufferedReader(new FileReader("test_upload_file"))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    lines.add(line);
                }
            }
            System.out.println(lines);
        }

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Emit every cached line for each input record.
            for (String line : lines) {
                context.write(new Text("key"), new Text(line));
            }
        }
    }

    static class SimpleReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            for (Text value : values) {
                context.write(key, value);
            }
        }
    }

    public static void main(String[] args)
            throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        // GenericOptionsParser consumes generic options such as -files and
        // returns only the program's own arguments.
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        for (String s : otherArgs) {
            System.out.println(s);
        }
        if (otherArgs.length != 2) {
            System.err.println("Usage: hadoop jar WordCount.jar -files test_upload_file <input> <output>");
            System.exit(2);
        }
        Job job = new Job(conf); // deprecated constructor, hence @SuppressWarnings above
        job.setJarByClass(WordCount.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        // Map-only job: with 0 reduce tasks, SimpleReducer is never invoked.
        job.setNumReduceTasks(0);
        job.setMapperClass(SimpleMapper.class);
        job.setReducerClass(SimpleReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
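Because the job is map-only and the mapper ignores the content of its input records, its output is easy to predict. The following Hadoop-free sketch mimics that behavior under stated assumptions: each map() call emits ("key", line) for every line of the shipped file, and with zero reduce tasks the default TextOutputFormat writes each pair as key, tab, value. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical local simulation of the map-only test job above.
// No Hadoop classes are used; this only models the mapper's logic.
class MapOnlySimulation {
    // inputRecords: one entry per input line (one map() call each).
    // sideFileLines: the lines of test_upload_file cached in setup().
    static List<String> simulate(List<String> inputRecords, List<String> sideFileLines) {
        List<String> output = new ArrayList<>();
        for (String ignoredRecord : inputRecords) { // the record's content is never used
            for (String line : sideFileLines) {
                output.add("key" + "\t" + line);    // assumed TextOutputFormat layout: key TAB value
            }
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> out = simulate(Arrays.asList("a", "b", "c"), Arrays.asList("x", "y"));
        System.out.println(out.size()); // 6
    }
}
```

So N input lines and an M-line side file should give N × M output lines, spread over one part-m-NNNNN file per map task; counting the output lines is a cheap way to confirm that the side file really reached the tasks.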
Running the test:
hadoop jar WordCount.jar -files test_upload_file /user/lmc/tmp/input /user/lmc/tmp/output
The test passed. Success!
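When the job is submitted from a script or another Java program, the same ordering can be enforced by construction. The helper below is a hypothetical sketch (its name and signature are not part of Hadoop); it assumes, as in the command above, that the jar's manifest names the main class, and it joins multiple side files into the single comma-separated value that -files expects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not a Hadoop API) that assembles a command line in the
// order: bin/hadoop command [genericOptions] [commandOptions].
class HadoopCommandSketch {
    static List<String> buildJarCommand(String jar, List<String> files, List<String> programArgs) {
        List<String> cmd = new ArrayList<>(Arrays.asList("hadoop", "jar", jar));
        // Generic options go first, right after the jar...
        if (!files.isEmpty()) {
            cmd.add("-files");
            cmd.add(String.join(",", files)); // a single comma-separated value
        }
        // ...and the program's own arguments (input and output paths) go last.
        cmd.addAll(programArgs);
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = buildJarCommand("WordCount.jar",
                Arrays.asList("test_upload_file"),
                Arrays.asList("/user/lmc/tmp/input", "/user/lmc/tmp/output"));
        // prints: hadoop jar WordCount.jar -files test_upload_file /user/lmc/tmp/input /user/lmc/tmp/output
        System.out.println(String.join(" ", cmd));
    }
}
```

The resulting list can be passed to java.lang.ProcessBuilder, which takes the arguments as separate strings and so avoids shell quoting problems.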