Getting the exception in WordCount Program in Hadoop

I hit this exception when trying to run my first program on Hadoop (I am using the new Hadoop API on version 0.20.2). From searching the web, it looks like most people who ran into this problem had not set the MapperClass and ReducerClass in the job configuration. I checked, and my code does set both. I would really appreciate it if someone could help me out.

java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:871)

package com.test.wc;
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    public void Map(LongWritable key, Text value, Context ctx) throws IOException, InterruptedException {
        String line = value.toString();
        for (String word : line.split("\\W+")) {
            if (word.length() > 0) {
                ctx.write(new Text(word), new IntWritable(1));
            }
        }
    }
}


package com.test.wc;
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    public void reduce(Text key, Iterable<IntWritable> values, Context ctx) throws IOException, InterruptedException {
        int wordCount = 0;
        for (IntWritable value : values) {
            wordCount += value.get();
        }
        ctx.write(key, new IntWritable(wordCount));
    }
}


package com.test.wc;
import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class WordCountJob {
    public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
        if (args.length != 2) {
            System.out.println("invalid usage");
            System.exit(-1);
        }

        Job job = new Job();
        job.setJarByClass(WordCountJob.class);
        job.setJobName("WordCountJob");

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);

        //job.setCombinerClass(WordCountReducer.class);

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
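
A likely cause, judging only from the code above: the mapper declares `Map` with a capital M, so it never overrides `Mapper.map`. Hadoop then runs the inherited default `map`, which passes each input key and value through unchanged, and the `LongWritable` offset key collides with the `Text` key class set on the job. The sketch below reproduces that pitfall in plain Java without Hadoop. `BaseMapper`, `BrokenMapper`, `FixedMapper`, and the string-tagged output are hypothetical stand-ins for illustration, not Hadoop classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Hadoop's Mapper: the default map() is an identity
// step that forwards the record under its original (Long) key type.
class BaseMapper {
    void map(long key, String value, List<String> out) {
        out.add("Long:" + value);
    }

    // Plays the role of the framework: it always calls map() (lowercase).
    final List<String> run(String... lines) {
        List<String> out = new ArrayList<>();
        long offset = 0;
        for (String line : lines) {
            map(offset, line, out);
            offset += line.length() + 1;
        }
        return out;
    }
}

// Mirrors the question: "Map" is a brand-new method, so it is never called.
class BrokenMapper extends BaseMapper {
    void Map(long key, String value, List<String> out) {
        for (String word : value.split("\\W+")) {
            if (!word.isEmpty()) {
                out.add("Text:" + word);
            }
        }
    }
}

// Lowercase name plus @Override: the compiler verifies this replaces map().
class FixedMapper extends BaseMapper {
    @Override
    void map(long key, String value, List<String> out) {
        for (String word : value.split("\\W+")) {
            if (!word.isEmpty()) {
                out.add("Text:" + word);
            }
        }
    }
}

class OverridePitfallDemo {
    public static void main(String[] args) {
        // Identity default ran: the Long-keyed record passes through untouched.
        System.out.println(new BrokenMapper().run("hello world")); // [Long:hello world]
        // Override ran: one Text-keyed record per word.
        System.out.println(new FixedMapper().run("hello world"));  // [Text:hello, Text:world]
    }
}
```

In the real mapper, renaming `Map` to `map` and annotating it with `@Override` should make the compiler reject any future signature mismatch instead of letting the job fail at runtime.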

Source: https://stackoverflow.com/questions/16003237
Updated: 2022-07-24 09:07

Best answer

ROWNUM is the safest way to prevent optimizer transformations and ensure type safety. Using ROWNUM makes Oracle think the row order matters, and prevents things like predicate pushing and view merging.

select *
from
(
   select id, value, rownum --Add ROWNUM for type safety.
   from eav
   where attr like 'sal%' 
)
where to_number(value) > 5000;

There are other ways to do this but none of them are reliable. Don't bother with simple inline views, common table expressions, CASE, predicate ordering, or hints. Those common methods are not reliable and I have seen them all fail.


The best long-term solution is to alter the EAV table to have a different column for each type, as I describe in this answer. Fix this now or future developers will curse your name when they have to write complex queries to avoid type errors.
