Running Hadoop Locally on a Linux System
2019-03-28 14:11 | Source: Internet
Running Hadoop locally on Linux turns out to be quite different from running it on Windows under a Cygwin-simulated Linux environment: on Linux, insufficient permissions will simply stop Hadoop from running at all. Running as the root user avoids that problem, so here is my walkthrough, using Hadoop 0.18.0.
First, edit the Hadoop configuration file hadoop-env.sh and set JAVA_HOME:

# The java implementation to use. Required.
export JAVA_HOME="/usr/java/jdk1.6.0_07"

Next, switch to the root user and log in to 127.0.0.1 over ssh:

[www.linuxidc.com@www.linuxidc.com hadoop-0.18.0]$ su root
Password:
[root@www.linuxidc.com hadoop-0.18.0]# ssh localhost
root@localhost's password:
Last login: Wed Sep 24 19:25:21 2008 from localhost.localdomain
[root@www.linuxidc.com ~]#

Next, prepare the input data: under the hadoop-0.18.0 directory, create a directory named my-input and put seven TXT files in it, each containing space-separated English words.

Then change into the hadoop-0.18.0 directory and run the WordCount word-frequency example:

[root@www.linuxidc.com hadoop-0.18.0]# bin/hadoop jar hadoop-0.18.0-examples.jar wordcount my-input my-output

The run produces the following output:
08/09/25 16:32:39 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
08/09/25 16:32:40 INFO mapred.FileInputFormat: Total input paths to process : 7
08/09/25 16:32:40 INFO mapred.FileInputFormat: Total input paths to process : 7
08/09/25 16:32:41 INFO mapred.JobClient: Running job: job_local_0001
08/09/25 16:32:41 INFO mapred.FileInputFormat: Total input paths to process : 7
08/09/25 16:32:41 INFO mapred.FileInputFormat: Total input paths to process : 7
08/09/25 16:32:41 INFO mapred.MapTask: numReduceTasks: 1
08/09/25 16:32:41 INFO mapred.MapTask: io.sort.mb = 100
08/09/25 16:32:42 INFO mapred.JobClient: map 0% reduce 0%
08/09/25 16:32:44 INFO mapred.MapTask: data buffer = 79691776/99614720
08/09/25 16:32:44 INFO mapred.MapTask: record buffer = 262144/327680
08/09/25 16:32:45 INFO mapred.MapTask: Starting flush of map output
08/09/25 16:32:45 INFO mapred.MapTask: bufstart = 0; bufend = 3262; bufvoid = 99614720
08/09/25 16:32:45 INFO mapred.MapTask: kvstart = 0; kvend = 326; length = 327680
08/09/25 16:32:45 INFO mapred.MapTask: Index: (0, 26, 26)
08/09/25 16:32:45 INFO mapred.MapTask: Finished spill 0
08/09/25 16:32:45 INFO mapred.LocalJobRunner: file:/home/www.linuxidc.com/hadoop-0.18.0/my-input/e.txt:0+1957
08/09/25 16:32:45 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000000_0' done.
08/09/25 16:32:45 INFO mapred.TaskRunner: Saved output of task 'attempt_local_0001_m_000000_0' to file:/home/www.linuxidc.com/hadoop-0.18.0/my-output
08/09/25 16:32:46 INFO mapred.MapTask: numReduceTasks: 1
08/09/25 16:32:46 INFO mapred.MapTask: io.sort.mb = 100
08/09/25 16:32:46 INFO mapred.JobClient: map 100% reduce 0%
08/09/25 16:32:46 INFO mapred.MapTask: data buffer = 79691776/99614720
08/09/25 16:32:46 INFO mapred.MapTask: record buffer = 262144/327680
08/09/25 16:32:46 INFO mapred.MapTask: Starting flush of map output
08/09/25 16:32:46 INFO mapred.MapTask: bufstart = 0; bufend = 3262; bufvoid = 99614720
08/09/25 16:32:46 INFO mapred.MapTask: kvstart = 0; kvend = 326; length = 327680
08/09/25 16:32:46 INFO mapred.MapTask: Index: (0, 26, 26)
08/09/25 16:32:46 INFO mapred.MapTask: Finished spill 0
08/09/25 16:32:46 INFO mapred.LocalJobRunner: file:/home/www.linuxidc.com/hadoop-0.18.0/my-input/a.txt:0+1957
08/09/25 16:32:46 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000001_0' done.
08/09/25 16:32:46 INFO mapred.TaskRunner: Saved output of task 'attempt_local_0001_m_000001_0' to file:/home/www.linuxidc.com/hadoop-0.18.0/my-output
08/09/25 16:32:46 INFO mapred.MapTask: numReduceTasks: 1
08/09/25 16:32:46 INFO mapred.MapTask: io.sort.mb = 100
08/09/25 16:32:47 INFO mapred.MapTask: data buffer = 79691776/99614720
08/09/25 16:32:47 INFO mapred.MapTask: record buffer = 262144/327680
08/09/25 16:32:47 INFO mapred.MapTask: Starting flush of map output
08/09/25 16:32:47 INFO mapred.MapTask: bufstart = 0; bufend = 16845; bufvoid = 99614720
08/09/25 16:32:47 INFO mapred.MapTask: kvstart = 0; kvend = 1684; length = 327680
08/09/25 16:32:47 INFO mapred.MapTask: Index: (0, 42, 42)
08/09/25 16:32:47 INFO mapred.MapTask: Finished spill 0
08/09/25 16:32:47 INFO mapred.LocalJobRunner: file:/home/www.linuxidc.com/hadoop-0.18.0/my-input/b.txt:0+10109
08/09/25 16:32:47 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000002_0' done.
08/09/25 16:32:47 INFO mapred.TaskRunner: Saved output of task 'attempt_local_0001_m_000002_0' to file:/home/www.linuxidc.com/hadoop-0.18.0/my-output
08/09/25 16:32:47 INFO mapred.MapTask: numReduceTasks: 1
08/09/25 16:32:47 INFO mapred.MapTask: io.sort.mb = 100
08/09/25 16:32:48 INFO mapred.MapTask: data buffer = 79691776/99614720
08/09/25 16:32:48 INFO mapred.MapTask: record buffer = 262144/327680
08/09/25 16:32:48 INFO mapred.MapTask: Starting flush of map output
08/09/25 16:32:48 INFO mapred.MapTask: bufstart = 0; bufend = 3312; bufvoid = 99614720
08/09/25 16:32:48 INFO mapred.MapTask: kvstart = 0; kvend = 331; length = 327680
08/09/25 16:32:48 INFO mapred.MapTask: Index: (0, 72, 72)
08/09/25 16:32:48 INFO mapred.MapTask: Finished spill 0
08/09/25 16:32:48 INFO mapred.LocalJobRunner: file:/home/www.linuxidc.com/hadoop-0.18.0/my-input/d.txt:0+1987
08/09/25 16:32:48 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000003_0' done.
08/09/25 16:32:48 INFO mapred.TaskRunner: Saved output of task 'attempt_local_0001_m_000003_0' to file:/home/www.linuxidc.com/hadoop-0.18.0/my-output
08/09/25 16:32:48 INFO mapred.MapTask: numReduceTasks: 1
08/09/25 16:32:48 INFO mapred.MapTask: io.sort.mb = 100
08/09/25 16:32:49 INFO mapred.MapTask: data buffer = 79691776/99614720
08/09/25 16:32:49 INFO mapred.MapTask: record buffer = 262144/327680
08/09/25 16:32:49 INFO mapred.MapTask: Starting flush of map output
08/09/25 16:32:49 INFO mapred.MapTask: bufstart = 0; bufend = 3262; bufvoid = 99614720
08/09/25 16:32:49 INFO mapred.MapTask: kvstart = 0; kvend = 326; length = 327680
08/09/25 16:32:49 INFO mapred.MapTask: Index: (0, 26, 26)
08/09/25 16:32:49 INFO mapred.MapTask: Finished spill 0
08/09/25 16:32:49 INFO mapred.LocalJobRunner: file:/home/www.linuxidc.com/hadoop-0.18.0/my-input/g.txt:0+1957
08/09/25 16:32:49 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000004_0' done.
08/09/25 16:32:49 INFO mapred.TaskRunner: Saved output of task 'attempt_local_0001_m_000004_0' to file:/home/www.linuxidc.com/hadoop-0.18.0/my-output
08/09/25 16:32:49 INFO mapred.MapTask: numReduceTasks: 1
08/09/25 16:32:49 INFO mapred.MapTask: io.sort.mb = 100
08/09/25 16:32:49 INFO mapred.MapTask: data buffer = 79691776/99614720
08/09/25 16:32:49 INFO mapred.MapTask: record buffer = 262144/327680
08/09/25 16:32:49 INFO mapred.MapTask: Starting flush of map output
08/09/25 16:32:49 INFO mapred.MapTask: bufstart = 0; bufend = 3262; bufvoid = 99614720
08/09/25 16:32:49 INFO mapred.MapTask: kvstart = 0; kvend = 326; length = 327680
08/09/25 16:32:50 INFO mapred.MapTask: Index: (0, 26, 26)
08/09/25 16:32:50 INFO mapred.MapTask: Finished spill 0
08/09/25 16:32:50 INFO mapred.LocalJobRunner: file:/home/www.linuxidc.com/hadoop-0.18.0/my-input/c.txt:0+1957
08/09/25 16:32:50 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000005_0' done.
08/09/25 16:32:50 INFO mapred.TaskRunner: Saved output of task 'attempt_local_0001_m_000005_0' to file:/home/www.linuxidc.com/hadoop-0.18.0/my-output
08/09/25 16:32:50 INFO mapred.MapTask: numReduceTasks: 1
08/09/25 16:32:50 INFO mapred.MapTask: io.sort.mb = 100
08/09/25 16:32:50 INFO mapred.MapTask: data buffer = 79691776/99614720
08/09/25 16:32:50 INFO mapred.MapTask: record buffer = 262144/327680
08/09/25 16:32:50 INFO mapred.MapTask: Starting flush of map output
08/09/25 16:32:50 INFO mapred.MapTask: bufstart = 0; bufend = 3306; bufvoid = 99614720
08/09/25 16:32:50 INFO mapred.MapTask: kvstart = 0; kvend = 330; length = 327680
08/09/25 16:32:50 INFO mapred.MapTask: Index: (0, 50, 50)
08/09/25 16:32:50 INFO mapred.MapTask: Finished spill 0
08/09/25 16:32:50 INFO mapred.LocalJobRunner: file:/home/www.linuxidc.com/hadoop-0.18.0/my-input/f.txt:0+1985
08/09/25 16:32:50 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000006_0' done.
08/09/25 16:32:50 INFO mapred.TaskRunner: Saved output of task 'attempt_local_0001_m_000006_0' to file:/home/www.linuxidc.com/hadoop-0.18.0/my-output
08/09/25 16:32:51 INFO mapred.ReduceTask: Initiating final on-disk merge with 7 files
08/09/25 16:32:51 INFO mapred.Merger: Merging 7 sorted segments
08/09/25 16:32:51 INFO mapred.Merger: Down to the last merge-pass, with 7 segments left of total size: 268 bytes
08/09/25 16:32:51 INFO mapred.LocalJobRunner: reduce > reduce
08/09/25 16:32:51 INFO mapred.TaskRunner: Task 'attempt_local_0001_r_000000_0' done.
08/09/25 16:32:51 INFO mapred.TaskRunner: Saved output of task 'attempt_local_0001_r_000000_0' to file:/home/www.linuxidc.com/hadoop-0.18.0/my-output
08/09/25 16:32:51 INFO mapred.JobClient: Job complete: job_local_0001
08/09/25 16:32:51 INFO mapred.JobClient: Counters: 11
08/09/25 16:32:51 INFO mapred.JobClient: File Systems
08/09/25 16:32:51 INFO mapred.JobClient: Local bytes read=953869
08/09/25 16:32:51 INFO mapred.JobClient: Local bytes written=961900
08/09/25 16:32:51 INFO mapred.JobClient: Map-Reduce Framework
08/09/25 16:32:51 INFO mapred.JobClient: Reduce input groups=7
08/09/25 16:32:51 INFO mapred.JobClient: Combine output records=21
08/09/25 16:32:51 INFO mapred.JobClient: Map input records=7
08/09/25 16:32:51 INFO mapred.JobClient: Reduce output records=7
08/09/25 16:32:51 INFO mapred.JobClient: Map output bytes=36511
08/09/25 16:32:51 INFO mapred.JobClient: Map input bytes=21909
08/09/25 16:32:51 INFO mapred.JobClient: Combine input records=3649
08/09/25 16:32:51 INFO mapred.JobClient: Map output records=3649
08/09/25 16:32:51 INFO mapred.JobClient: Reduce input records=21
Finally, view the processed results:

[root@www.linuxidc.com hadoop-0.18.0]# cat my-output/part-00000
apache  1826
baketball  1
bash  1813
fax  2
find  1
hash  1
www.linuxidc.com  5
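For intuition, the counting that the WordCount job performs can be approximated with plain shell tools (this is only an illustrative stand-in, not Hadoop; demo-input and its contents are made-up sample data):

```shell
# "Map": split each line into one word per line; "reduce": count identical words.
mkdir -p demo-input
printf 'apache bash apache\n' > demo-input/a.txt
printf 'apache fax bash\n'    > demo-input/b.txt
cat demo-input/*.txt | tr -s ' ' '\n' | sort | uniq -c | sort -rn
```

With this sample data it prints something like "3 apache", "2 bash", "1 fax", one pair per line, mirroring the word/count pairs in part-00000.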
Related Q&A
Q: What basic skills should a computer technician learn these days? [2022-07-28]
Q: Laptop with 3 GB RAM running Win7; I want to set up PHP + Linux + Hadoop [2021-11-29]
A: Who still configures the environment by hand? Just use XAMPP.
A: You can't see the files in the local system; they have effectively been uploaded to HDFS, which stores them in its own format, so you can't open them directly.
Q: What software cannot run on a Linux system? [2022-04-19]
A: Linux is not well suited to gaming; it supports some single-player titles, but online games mostly won't work. As for software, Windows programs can't simply be copied over and run, but many cross-platform applications have Linux versions, and even when a program has no Linux version you can usually find a Linux alternative for whatever feature you need.
Q: How do I inspect the Hadoop setup on a Linux system? [2022-08-30]
A: The namenode is the master; one machine must run the namenode service. If you only need to stop a datanode, run jps, find its process ID, and kill it. Note that if you kill the namenode, the entire Hadoop cluster goes down.
Q: What software can run on a Linux system? [2023-08-24]
A: A lot, far more than you might expect. For example, web browsers:
Windows: Internet Explorer, Netscape/Mozilla, Opera [proprietary], Firefox, etc.
Linux: 1) Netscape/Mozilla; 2) Galeon; 3) Konqueror; 4) Opera [proprietary]; 5) Firefox; 6) Nautilus; 7) Epiphany; 8) Links (with the "-g" switch); 9) Dillo; 10) Encompass.
Command-line browsers: 1) Links; 2) ...
Q: Does Hadoop have to run on Linux? [2022-10-02]
A: More or less, yes: whether you start from Linux or from Windows, setting up a cluster environment requires a Linux system. On Linux that goes without saying; on Windows there are basically two options: set up a virtual machine and install Linux in it, or simulate a Linux environment with Cygwin.
A: It seems the behaviour was due to low memory. The moment I cleaned up some space on my system and repeated the steps, it ran successfully. However, I'm not 100% sure about it.
Q: Hadoop filesystem reads the Linux filesystem instead of HDFS? [2022-02-10]
A: This will happen if a valid Hadoop configuration is not found. For example, if you run hadoop fs -ls and no configuration is found in the default location, you will see the Linux filesystem. You can test this by passing the -conf option after the "hadoop" command, e.g. ...
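In releases of that era, the default filesystem is controlled by the fs.default.name property (in hadoop-site.xml for 0.18.x, core-site.xml in later versions); when it is unset, paths fall back to the local filesystem. A minimal sketch, where the host and port are placeholder assumptions:

```xml
<configuration>
  <!-- Point the default filesystem at HDFS instead of file:/// -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```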