Using DBInputFormat and DBOutputFormat in Hadoop

2019-03-28 12:56 | Source: Internet

I. Background

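To let MapReduce access relational databases (MySQL, Oracle) directly, Hadoop provides two classes, DBInputFormat and DBOutputFormat. DBInputFormat reads the rows of a database table as input to a MapReduce job, whose output can then be written to HDFS. DBOutputFormat writes the result set produced by a MapReduce job into a database table.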

Recommended reading:

Reading and writing MySQL data with MapReduce in Hadoop http://www.linuxidc.com/Linux/2013-07/88117.htm

II. Technical details

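1. DBInputFormat (using MySQL as the example). First create the table:

CREATE TABLE studentinfo (
    id INTEGER NOT NULL PRIMARY KEY,
    name VARCHAR(32) NOT NULL);

2. Because version 0.20 does not support DBInputFormat and DBOutputFormat very well, this example uses version 0.19 to show how the two classes are used.

3. DBInputFormat is used as follows: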

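import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.lib.IdentityReducer;
import org.apache.hadoop.mapred.lib.db.DBConfiguration;
import org.apache.hadoop.mapred.lib.db.DBInputFormat;
import org.apache.hadoop.mapred.lib.db.DBWritable;

public class DBInput {
    // DROP TABLE IF EXISTS `hadoop`.`studentinfo`;
    // CREATE TABLE studentinfo (
    // id INTEGER NOT NULL PRIMARY KEY,
    // name VARCHAR(32) NOT NULL);

    // One row of the studentinfo table.
    public static class StudentinfoRecord implements Writable, DBWritable {
        int id;
        String name;

        // Hadoop creates records by reflection, so a no-argument constructor is required.
        public StudentinfoRecord() {
        }

        // Writable: deserialize the record from Hadoop's binary format.
        public void readFields(DataInput in) throws IOException {
            this.id = in.readInt();
            this.name = Text.readString(in);
        }

        // Writable: serialize the record, in the same field order as readFields.
        public void write(DataOutput out) throws IOException {
            out.writeInt(this.id);
            Text.writeString(out, this.name);
        }

        // DBWritable: fill the record from one row of the query result.
        public void readFields(ResultSet result) throws SQLException {
            this.id = result.getInt(1);
            this.name = result.getString(2);
        }

        // DBWritable: bind the record to the parameters of an INSERT statement.
        public void write(PreparedStatement stmt) throws SQLException {
            stmt.setInt(1, this.id);
            stmt.setString(2, this.name);
        }

        public String toString() {
            return this.id + " " + this.name;
        }
    }

    // Must be static: Hadoop instantiates the mapper by reflection, which
    // fails for a non-static inner class.
    public static class DBInputMapper extends MapReduceBase implements
            Mapper<LongWritable, StudentinfoRecord, LongWritable, Text> {
        public void map(LongWritable key, StudentinfoRecord value,
                OutputCollector<LongWritable, Text> collector, Reporter reporter)
                throws IOException {
            collector.collect(new LongWritable(value.id), new Text(value.toString()));
        }
    }

    public static void main(String[] args) throws IOException {
        JobConf conf = new JobConf(DBInput.class);
        // Ship the JDBC driver jar (stored in HDFS) to the tasks' classpath.
        DistributedCache.addFileToClassPath(new Path(
                "/lib/mysql-connector-java-5.1.0-bin.jar"), conf);

        conf.setMapperClass(DBInputMapper.class);
        conf.setReducerClass(IdentityReducer.class);

        conf.setMapOutputKeyClass(LongWritable.class);
        conf.setMapOutputValueClass(Text.class);
        conf.setOutputKeyClass(LongWritable.class);
        conf.setOutputValueClass(Text.class);

        conf.setInputFormat(DBInputFormat.class);
        FileOutputFormat.setOutputPath(conf, new Path("/hua01"));
        // JDBC driver class, connection URL, user name, password.
        DBConfiguration.configureDB(conf, "com.mysql.jdbc.Driver",
                "jdbc:mysql://192.168.3.244:3306/hadoop", "hua", "hadoop");
        String[] fields = { "id", "name" };
        // Record class, table name, conditions (none), ORDER BY column, columns to read.
        DBInputFormat.setInput(conf, StudentinfoRecord.class, "studentinfo",
                null, "id", fields);

        JobClient.runJob(conf);
    }
}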

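a) The fields of the StudentinfoRecord class correspond to the table columns, and the class implements both the Writable and DBWritable interfaces.

Methods that implement Writable: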

public void readFields(DataInput in) throws IOException {
    this.id = in.readInt();
    this.name = Text.readString(in);
}
public void write(DataOutput out) throws IOException {
    out.writeInt(this.id);
    Text.writeString(out, this.name);
}
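The two Writable methods must handle the fields in the same order, otherwise a record cannot be read back correctly. The round trip can be tried without a cluster. The following is a minimal sketch that uses only the JDK: the class RecordRoundTrip and its copy method are hypothetical, and writeUTF/readUTF stand in for Text.writeString/Text.readString, so the byte layout is not the one Hadoop produces.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for StudentinfoRecord that needs no Hadoop jars.
class RecordRoundTrip {
    int id;
    String name;

    // Writes id first, then name.
    void write(DataOutput out) throws IOException {
        out.writeInt(id);
        out.writeUTF(name); // assumption: replaces Text.writeString(out, name)
    }

    // Reads the fields back in exactly the order write() emitted them.
    void readFields(DataInput in) throws IOException {
        id = in.readInt();
        name = in.readUTF(); // assumption: replaces Text.readString(in)
    }

    // Serializes a record to bytes and reads it back into a fresh object.
    static RecordRoundTrip copy(RecordRoundTrip src) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            src.write(new DataOutputStream(buffer));
            RecordRoundTrip dst = new RecordRoundTrip();
            dst.readFields(new DataInputStream(
                    new ByteArrayInputStream(buffer.toByteArray())));
            return dst;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        RecordRoundTrip r = new RecordRoundTrip();
        r.id = 7;
        r.name = "alice";
        RecordRoundTrip back = copy(r);
        System.out.println(back.id + " " + back.name); // prints "7 alice"
    }
}
```

Swapping the two calls in readFields makes the round trip fail, which is the kind of mismatch to check for when a job reads corrupted records.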

Methods that implement DBWritable:

public void readFields(ResultSet result) throws SQLException {
    this.id = result.getInt(1);
    this.name = result.getString(2);
}
public void write(PreparedStatement stmt) throws SQLException {
    stmt.setInt(1, this.id);
    stmt.setString(2, this.name);
}
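The numeric indexes in write(PreparedStatement) refer to the ? placeholders of a parameterized statement, so they have to follow the order of the column list. The following minimal sketch shows how such an INSERT statement could be assembled from a table name and a field list. InsertSketch and buildInsert are hypothetical names, and the sketch illustrates the idea rather than reproducing the SQL that Hadoop generates.

```java
// Hypothetical helper, not part of Hadoop.
class InsertSketch {
    // Builds "INSERT INTO <table> (f1, f2, ...) VALUES (?, ?, ...)".
    // Placeholder i (1-based) belongs to fields[i - 1].
    static String buildInsert(String table, String... fields) {
        StringBuilder sql = new StringBuilder("INSERT INTO ").append(table).append(" (");
        StringBuilder marks = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                sql.append(", ");
                marks.append(", ");
            }
            sql.append(fields[i]);
            marks.append('?');
        }
        return sql.append(") VALUES (").append(marks).append(')').toString();
    }

    public static void main(String[] args) {
        // prints "INSERT INTO studentinfo (id, name) VALUES (?, ?)"
        System.out.println(buildInsert("studentinfo", "id", "name"));
    }
}
```

With this statement, stmt.setInt(1, id) fills the id column and stmt.setString(2, name) fills the name column, matching the write method above.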

b) The value type passed to the Mapper is StudentinfoRecord.

c) Configure the database connection and read the data of the studentinfo table:

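// JDBC driver class, connection URL, user name, password.
DBConfiguration.configureDB(conf, "com.mysql.jdbc.Driver",
        "jdbc:mysql://192.168.3.244:3306/hadoop", "hua", "hadoop");
String[] fields = { "id", "name" };
// Record class, table name, conditions (null means no WHERE clause), ORDER BY column, columns to read.
DBInputFormat.setInput(conf, StudentinfoRecord.class, "studentinfo", null, "id", fields);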
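One way to picture what the setInput arguments mean is the SELECT statement they describe. The sketch below assembles such a statement and adds a LIMIT/OFFSET window, on the assumption that each map task reads one window of the ordered result, which is why a stable ORDER BY column such as the primary key matters. SelectSketch, buildSelect, and the window size of 100 are hypothetical, and the exact SQL Hadoop generates may differ.

```java
// Hypothetical helper, not part of Hadoop.
class SelectSketch {
    // Builds a SELECT for one window of rows; conditions and orderBy may be null.
    static String buildSelect(String table, String conditions, String orderBy,
                              long limit, long offset, String... fields) {
        StringBuilder sql = new StringBuilder("SELECT ");
        sql.append(String.join(", ", fields));
        sql.append(" FROM ").append(table);
        if (conditions != null && !conditions.isEmpty()) {
            sql.append(" WHERE (").append(conditions).append(")");
        }
        if (orderBy != null && !orderBy.isEmpty()) {
            sql.append(" ORDER BY ").append(orderBy);
        }
        // Assumed splitting scheme: one LIMIT/OFFSET window per map task.
        sql.append(" LIMIT ").append(limit).append(" OFFSET ").append(offset);
        return sql.toString();
    }

    public static void main(String[] args) {
        // Same arguments as the setInput call: no conditions, ordered by id.
        // prints "SELECT id, name FROM studentinfo ORDER BY id LIMIT 100 OFFSET 0"
        System.out.println(buildSelect("studentinfo", null, "id", 100, 0, "id", "name"));
    }
}
```

The column order in the field list is also the order in which readFields(ResultSet) sees the columns, so index 1 is id and index 2 is name.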

Continued on the next page: http://www.linuxidc.com/Linux/2013-07/88119p2.htm
