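By default, each Hadoop Reducer produces one output file, and the files are named part-r-00000, part-r-00001, and so on. To control the output file names yourself, or to have each Reducer write more than one output file, use the MultipleOutputs class. MultipleOutputs derives the output file name from the output record's key and value (output key and output value) or from an arbitrary string. Files are generally named in the form name-r-nnnnn, where name is any name set by the program and nnnnn is the partition number.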
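The naming pattern described above can be sketched in plain Java. This is only an illustration of the name-r-nnnnn format, not Hadoop source code; the class name OutputNameDemo and the helper fileName are hypothetical names introduced for this example.

```java
import java.util.Locale;

// Illustrative sketch only: OutputNameDemo and fileName are hypothetical,
// they are not part of the Hadoop API.
class OutputNameDemo {

    // Builds "<name>-r-<nnnnn>", where nnnnn is the partition number
    // left-padded with zeros to five digits.
    static String fileName(String name, int partition) {
        return String.format(Locale.ROOT, "%s-r-%05d", name, partition);
    }

    public static void main(String[] args) {
        System.out.println(fileName("part", 0)); // default name, first reducer
        System.out.println(fileName("part", 1)); // default name, second reducer
        System.out.println(fileName("1901", 3)); // custom name, partition 3
    }
}
```

Running main prints part-r-00000, part-r-00001 and 1901-r-00003, one per line, matching the default and custom naming forms mentioned in the text.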

How to use MultipleOutputs:
Using MultipleOutputs takes the following four steps:

1. Declare a MultipleOutputs field in the Reducer
    private MultipleOutputs<NullWritable, Text> multipleOutputs;

2. Initialize MultipleOutputs in the Reducer's setup method
    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        multipleOutputs = new MultipleOutputs<NullWritable, Text>(context);
    }

3. Control the output in the reduce method
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        for (Text value : values) {
            // The third argument (baseOutputPath) is taken from the key,
            // so each distinct key is written to its own file.
            multipleOutputs.write(NullWritable.get(), value, key.toString());
        }
    }

4. Close MultipleOutputs in the cleanup method
    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        multipleOutputs.close();
    }

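Note: the third parameter of multipleOutputs.write(key, value, baseOutputPath) specifies where the output goes, relative to the output directory specified by the user. If baseOutputPath does not contain the file separator "/", the output files are named baseOutputPath-r-nnnnn (the name-r-nnnnn form). If it does contain "/", for example baseOutputPath = "029070-99999/1901/part", then the output file is ...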
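To make the two cases concrete, here is a minimal plain-Java sketch of how a baseOutputPath might map to a file under the job output directory. It assumes that the -r-nnnnn suffix is simply appended to baseOutputPath and that everything before the last "/" becomes a subdirectory; that is my reading of the behavior described in the note, so verify it against your Hadoop version. BaseOutputPathDemo, relativeFile and subdirectory are hypothetical names, not Hadoop APIs.

```java
import java.util.Locale;

// Illustrative sketch only, under the assumptions stated in the lead-in.
// None of these names exist in Hadoop.
class BaseOutputPathDemo {

    // File path relative to the job output directory:
    // baseOutputPath followed by "-r-" and the zero-padded partition number.
    static String relativeFile(String baseOutputPath, int partition) {
        return String.format(Locale.ROOT, "%s-r-%05d", baseOutputPath, partition);
    }

    // Subdirectory implied by baseOutputPath, or "" when it has no "/".
    static String subdirectory(String baseOutputPath) {
        int slash = baseOutputPath.lastIndexOf('/');
        return slash < 0 ? "" : baseOutputPath.substring(0, slash);
    }

    public static void main(String[] args) {
        // No separator: the file sits directly in the output directory.
        System.out.println(relativeFile("029070-99999", 0));

        // With separators: the leading components act as subdirectories.
        String base = "029070-99999/1901/part";
        System.out.println(subdirectory(base));
        System.out.println(relativeFile(base, 0));
    }
}
```

Under these assumptions, a key-derived baseOutputPath such as the station and year in the example groups records into one subdirectory per station and year, while the final component ("part") plays the role of name in name-r-nnnnn.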

