
Getting an exception in a WordCount program in Hadoop

I get this exception when trying to run my first program on Hadoop (I am using the new Hadoop API on version 0.20.2). From searching the web, it looks like most people hit this problem when they did not set MapperClass and ReducerClass in the job configuration. I checked, though, and my code looks fine. I would really appreciate any help.
java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:871)
package com.test.wc;
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable,Text,Text,IntWritable> {

public void Map(LongWritable key,Text value,Context ctx) throws IOException , InterruptedException {
    String line = value.toString();
    for(String word:line.split("\\W+")) {
        if(word.length()> 0){
            ctx.write(new Text(word), new IntWritable(1));
        }
    }
}
}


package com.test.wc;
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
public class WordCountReducer extends Reducer<Text,IntWritable,Text,IntWritable> {

public void reduce(Text key, Iterable<IntWritable> values, Context ctx) throws IOException,InterruptedException {
 int wordCount = 0;
    for(IntWritable value:values)
    {
        wordCount+=value.get();
    }
    ctx.write(key,new IntWritable(wordCount));
}

}


package com.test.wc;
import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class WordCountJob {
public static void main(String args[]) throws IOException, InterruptedException, ClassNotFoundException{
    if(args.length!=2){
        System.out.println("invalid usage");
        System.exit(-1);
    }

    Job job = new Job();
    job.setJarByClass(WordCountJob.class);
    job.setJobName("WordCountJob");



    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    job.setMapperClass(WordCountMapper.class);
    job.setReducerClass(WordCountReducer.class);

    //job.setCombinerClass(WordCountReducer.class);

    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(IntWritable.class);

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);


    System.exit(job.waitForCompletion(true) ? 0:1);

}
}
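
One likely cause, offered as an assumption rather than a confirmed diagnosis: the mapper above declares Map with a capital M, so it does not override Mapper.map. Hadoop's default map implementation passes the input key and value through unchanged, which would emit LongWritable keys where the job expects Text. The sketch below reproduces that mechanism without Hadoop; BaseMapper, BrokenMapper, FixedMapper and firstOutput are hypothetical stand-ins, not Hadoop classes.

```java
import java.util.ArrayList;
import java.util.List;

public class OverrideDemo {
    // Hypothetical stand-in for Hadoop's Mapper: the default map()
    // emits the input key unchanged (identity behaviour).
    static class BaseMapper {
        public void map(long key, String value, List<Object> out) {
            out.add(key);
        }

        // The framework only ever calls map(), never Map().
        public final void run(long key, String value, List<Object> out) {
            map(key, value, out);
        }
    }

    // Mirrors the question: capital-M Map is a brand-new method,
    // so the inherited identity map() still runs.
    static class BrokenMapper extends BaseMapper {
        public void Map(long key, String value, List<Object> out) {
            out.add(value);
        }
    }

    // Lower-case map with @Override replaces the default; the annotation
    // makes the compiler reject a misspelled method name.
    static class FixedMapper extends BaseMapper {
        @Override
        public void map(long key, String value, List<Object> out) {
            out.add(value);
        }
    }

    // Runs one record through the mapper and returns the first emitted object.
    static Object firstOutput(BaseMapper mapper) {
        List<Object> out = new ArrayList<>();
        mapper.run(0L, "hello", out);
        return out.get(0);
    }

    public static void main(String[] args) {
        // prints "Long": the default map() ran, emitting the input key
        System.out.println(firstOutput(new BrokenMapper()).getClass().getSimpleName());
        // prints "String": the overriding map() ran
        System.out.println(firstOutput(new FixedMapper()).getClass().getSimpleName());
    }
}
```

If this is indeed the cause, renaming Map to map in WordCountMapper and adding @Override should make the job emit Text keys; the annotation would also have turned the original typo into a compile-time error.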

Source: https://stackoverflow.com/questions/16003237

Updated: 2022-07-24 09:07

Best answer

ROWNUM is the safest way to prevent optimizer transformations and ensure type safety. Using ROWNUM makes Oracle think the row order matters, and prevents things like predicate pushing and view merging.
select *
from
(
   select id, value, rownum --Add ROWNUM for type safety.
   from eav
   where attr like 'sal%' 
)
where to_number(value) > 5000;
 
There are other ways to do this, but none of them are reliable. Don't bother with simple inline views, common table expressions, CASE, predicate ordering, or hints. I have seen all of those common methods fail.
 
The best long-term solution is to alter the EAV table to have a different column for each type, as I describe in this answer. Fix this now or future developers will curse your name when they have to write complex queries to avoid type errors.
