Hadoop HelloWorld Examples - Working with the Hadoop FileSystem in Java

2019-03-28 12:55 | Source: Internet

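Until now, all of my Hadoop file operations went through the command line. Beyond the basics, though, I often need to modify HDFS directly from Java code, so today I am practicing that.

Here is a simple demo that copies the contents of one HDFS file into another HDFS file.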

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShortestPath {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Load core-site.xml explicitly (the path is specific to this installation).
        conf.addResource(new Path("/usr/local/hadoop/conf/core-site.xml"));

        // The two lines below are quite useful when debugging Configuration; see reference [3].
        // System.out.println(conf.getRaw("fs.default.name"));
        // System.out.println(conf.toString());

        FileSystem fs = FileSystem.get(conf);

        // Both paths are resolved against the current HDFS working directory.
        Path src = new Path(fs.getWorkingDirectory(), "input/data");
        Path dst = new Path(fs.getWorkingDirectory(), "testInput/copyData.txt");

        // try-with-resources closes both streams even if the copy fails.
        try (BufferedReader br = new BufferedReader(
                     new InputStreamReader(fs.open(src), StandardCharsets.UTF_8));
             FSDataOutputStream out = fs.create(dst)) {
            String line;
            while ((line = br.readLine()) != null) {
                // Write encoded bytes: writeBytes(String) would drop the high byte of every char.
                out.write((line + "\n").getBytes(StandardCharsets.UTF_8));
            }
        }
    }
}
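One detail of the line-by-line copy deserves a closer look. FSDataOutputStream extends java.io.DataOutputStream, and DataOutputStream.writeBytes(String) keeps only the low eight bits of each character. That is harmless for pure ASCII data but silently corrupts anything else. The sketch below uses only plain java.io streams (no HDFS involved) to show the difference; the class name WriteBytesDemo and its two helper methods are made up for this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: compares writeBytes(String) with writing UTF-8 bytes.
final class WriteBytesDemo {

    // Writes the text with writeBytes, which discards the high byte of each char.
    static byte[] viaWriteBytes(String text) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeBytes(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Writes the text as explicitly encoded UTF-8 bytes.
    static byte[] viaUtf8(String text) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        String line = "\u6570\u636e"; // two Chinese characters, written as Unicode escapes

        byte[] lossy = viaWriteBytes(line);
        byte[] utf8 = viaUtf8(line);

        // writeBytes emits one byte per char; UTF-8 needs three bytes for each of these chars.
        System.out.println("writeBytes length: " + lossy.length);
        System.out.println("UTF-8 length:      " + utf8.length);
        System.out.println("UTF-8 round trip ok: "
                + line.equals(new String(utf8, StandardCharsets.UTF_8)));
    }
}
```

For this reason, a text copy on HDFS is safer when it decodes the input with an explicit charset and writes encoded bytes, or when it avoids decoding altogether and copies raw bytes.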

The same copy can also be done with IOUtils, for example:

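import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class ShortestPath {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Load core-site.xml explicitly (the path is specific to this installation).
        conf.addResource(new Path("/usr/local/hadoop/conf/core-site.xml"));

        // System.out.println(conf.getRaw("fs.default.name"));
        // System.out.println(conf.toString());

        FileSystem fs = FileSystem.get(conf);

        // Both paths are resolved against the current HDFS working directory.
        FSDataInputStream in = fs.open(new Path(fs.getWorkingDirectory(), "input/data"));
        FSDataOutputStream out = fs.create(new Path(fs.getWorkingDirectory(), "testInput/copyData.txt"));

        // Copies raw bytes, so no character decoding is involved.
        // This three-argument overload also closes both streams when it finishes,
        // so no explicit close() calls are needed afterwards.
        IOUtils.copyBytes(in, out, conf);
    }
}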
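To make clearer what the IOUtils call does, here is a rough plain-Java equivalent: read into a fixed-size buffer, write out exactly what was read, and close both streams at the end. This is a sketch built on java.io only, not Hadoop's actual source. The names CopyBytesSketch and copyAndClose are invented for the example, and 4096 is used because it is the default of the `io.file.buffer.size` setting that the three-argument copyBytes reads from the Configuration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a buffered stream-to-stream copy.
final class CopyBytesSketch {

    // Copies all bytes from in to out through a buffer, closes both streams,
    // and returns the number of bytes copied.
    static long copyAndClose(InputStream in, OutputStream out, int bufferSize) {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        try (InputStream src = in; OutputStream dst = out) {
            byte[] buffer = new byte[bufferSize];
            long total = 0;
            int n;
            // read() returns -1 at end of stream; otherwise n bytes were filled.
            while ((n = src.read(buffer)) != -1) {
                dst.write(buffer, 0, n);
                total += n;
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "line one\nline two\n".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();

        long copied = copyAndClose(new ByteArrayInputStream(data), sink, 4096);

        System.out.println(copied + " bytes copied");
        System.out.println(new String(sink.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

Because the copy works on raw bytes, the output is identical to the input regardless of the file's encoding, which is one reason to prefer this approach over the line-by-line version for a plain file copy.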

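The line

conf.addResource(new Path("/usr/local/hadoop/conf/core-site.xml"));

in the code above puzzled me quite a bit. I had always assumed that Configuration automatically loads default files such as core-site.xml in its constructor, but in my setup that did not seem to happen. What confused me even more is that calling toString() on the Configuration shows more than one core-site.xml among its resources. I am a beginner and not familiar with the configuration files, so if anyone knows the details, please explain.

A likely explanation: new Configuration() does register core-default.xml and core-site.xml as default resources, but it looks them up on the classpath only. If the Hadoop conf directory is not on the classpath (for example, when the program is launched with plain java rather than through the hadoop script), the site file is not found there, so it has to be added by path. The explicitly added path then appears in toString() next to the default entry of the same name.

System.out.println(conf.toString());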

For other file operations, such as deleting files, see references [1] and [2]; they all work in much the same way.

Reference

[1] Hadoop: The Definitive Guide (PDF edition), http://www.linuxidc.com/Linux/2012-01/51182.htm

[2] http://eclipse.sys-con.com/node/1287801/mobile

[3] http://www.opensourceconnections.com/2013/03/24/hdfs-debugging-wrong-fs-expected-file-exception/
