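As mentioned in the previous article (http://www.linuxidc.com/Linux/2012-10/72857.htm ), the biggest obstacle is that the pipes static and dynamic libraries bundled with Hadoop are built for Linux, not for MacOS. To use pipes on MacOS, the libraries have to be recompiled. The build procedure is described in the previous post.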

Beyond that, there is not much to add, so I will simply list the code.

The contents of hadoopWordCountPipe.cpp are as follows:

// Hadoop Pipes header files
#include "hadoop/Pipes.hh"
#include "hadoop/TemplateFactory.hh"
#include "hadoop/StringUtils.hh"
#include <string>
#include <vector>
using namespace std;

// Counter group and counter names reported by the job
const string WORDCOUNT = "WORDCOUNT";
const string INPUT_WORDS = "INPUT_WORDS";
const string OUTPUT_WORDS = "OUTPUT_WORDS";

// Mapper: splits each input line on spaces and emits (word, "1")
class WordCountMap: public HadoopPipes::Mapper
{
public:
    HadoopPipes::TaskContext::Counter * inputWords;

    WordCountMap (HadoopPipes::TaskContext & context)
    {
        inputWords = context.getCounter (WORDCOUNT, INPUT_WORDS);
    }

    void map (HadoopPipes::MapContext & context)
    {
        vector<string> words = HadoopUtils::splitString (context.getInputValue(), " ");
        for (size_t i = 0; i < words.size(); i++)
        {
            context.emit (words[i], "1");
        }
        // Count how many words this mapper has seen
        context.incrementCounter (inputWords, words.size());
    }
};

// Reducer: sums the counts emitted for each word
class WordCountReduce: public HadoopPipes::Reducer
{
public:
    HadoopPipes::TaskContext::Counter * outputWords;

    WordCountReduce (HadoopPipes::TaskContext & context)
    {
        outputWords = context.getCounter (WORDCOUNT, OUTPUT_WORDS);
    }

    void reduce (HadoopPipes::ReduceContext & context)
    {
        int sum = 0;
        while (context.nextValue())
        {
            sum += HadoopUtils::toInt (context.getInputValue());
        }
        context.emit (context.getInputKey(), HadoopUtils::toString (sum));
        // One distinct word written per reduce call
        context.incrementCounter (outputWords, 1);
    }
};

int main (int argc, char * argv[])
{
    // Hand control to the Pipes runtime, which drives the mapper and reducer
    return HadoopPipes::runTask (HadoopPipes::TemplateFactory<WordCountMap, WordCountReduce>());
}
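
Since the Pipes program only runs inside a Hadoop job, it can help to check the word-count logic on its own first. Below is a minimal sketch that mimics the map and reduce steps in plain C++11, without any Hadoop headers. The function names mapLine and countWords are made up for this illustration and are not part of the Pipes API. The splitting here skips empty tokens, which is an assumption and may not match HadoopUtils::splitString in every edge case.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the map step: split one line on single spaces
// and produce a (word, 1) pair per non-empty token.
std::vector<std::pair<std::string, int> > mapLine(const std::string& line)
{
    std::vector<std::pair<std::string, int> > pairs;
    std::size_t start = 0;
    while (start < line.size()) {
        std::size_t end = line.find(' ', start);
        if (end == std::string::npos) {
            end = line.size();
        }
        if (end > start) {  // skip empty tokens caused by repeated spaces
            pairs.push_back(std::make_pair(line.substr(start, end - start), 1));
        }
        start = end + 1;
    }
    return pairs;
}

// Hypothetical stand-in for shuffle plus reduce: group the pairs by word
// and sum the counts, as WordCountReduce does for each key.
std::map<std::string, int> countWords(const std::vector<std::string>& lines)
{
    std::map<std::string, int> totals;
    for (std::size_t i = 0; i < lines.size(); i++) {
        const std::vector<std::pair<std::string, int> > pairs = mapLine(lines[i]);
        for (std::size_t j = 0; j < pairs.size(); j++) {
            totals[pairs[j].first] += pairs[j].second;
        }
    }
    return totals;
}
```

Calling countWords on a few sample lines and printing the resulting map gives a quick reference to compare against the output of the real job on the same input.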

The contents of the Makefile are as follows:

HADOOP_INSTALL="/Volumes/Data/Works/Hadoop/hadoop-0.20.2"
PLATFORM=Mac_OS_X-x86_64-64
CC = g++
CPPFLAGS = -m64 -I$(HADOOP_INSTALL)/c++/$(PLATFORM)/include

# The recipe line below must begin with a tab character
hadoopWordCountPipe: hadoopWordCountPipe.cpp
	$(CC) $(CPPFLAGS) $< -Wall -L$(HADOOP_INSTALL)/c++/$(PLATFORM)/lib -lhadooppipes -lhadooputils -lpthread -g -O2 -o $@
