A Small Hadoop Program Written in Python

2019-03-28 13:33 | Source: Internet

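This program was written for Python 2.3. Python versions differ, so it may need changes elsewhere; the print statements below, for example, are Python 2 syntax.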

Mapper:

import sys

line_number = 0
tab_number = 0
pv_number = 0
clk_number = 0
if_compressed_tested = 0
if_compressed = 0

#functions:
def compressed_stat(line):
    global line_number
    global tab_number
    global pv_number
    global clk_number
    try:
        line_number += 1
        line_split_list = line.split("\t")
        line_split_list_size = len(line_split_list)
        tab_number += (line_split_list_size - 1)
        index = 1
        while index < line_split_list_size:
            pv_clk_list = line_split_list[index].strip().split(" ")
            pv_number += int(pv_clk_list[0])
            clk_number += int(pv_clk_list[1])
            index += 1
    except (ValueError, IndexError):  # IndexError: a field has no click count
        print line,"\tERROR"

def before_compress_stat(line):
    global line_number
    global pv_number
    global clk_number
    try:
        line_number += 1
        line = line.strip()
        line_split_list = line.split(" ")
        pv_number += int(line_split_list[0])
        clk_number += int(line_split_list[1])
    except (ValueError, IndexError):  # IndexError: the line has no click count
        print line,"\tERROR"
#end functions

for line in sys.stdin:
    try:
        line = line.strip()
        if if_compressed_tested == 0:
            if_compressed_tested = 1
            if line.find("\t") > 0:
                if_compressed = 1
        if if_compressed == 0:
            before_compress_stat(line)
        else:
            compressed_stat(line)
    except ValueError:
        pass
# Emit one summary line per mapper; the field count tells the reducer the format.
if if_compressed == 1:
    print ("%d %d %d %d" % (line_number, tab_number, pv_number, clk_number))
else:
    print ("%d %d %d" % (line_number, pv_number, clk_number))
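The mapper above targets Python 2.3. As a rough sketch of the same counting logic in Python 3 syntax, the function below folds the per-line work into one pure function. The name `map_lines`, the returned error list, and the input layout described in the comments (a leading key field followed by tab-separated `pv clk` pairs) are assumptions read off the code, not something the original states.

```python
def map_lines(lines):
    """Summarize input lines roughly the way the mapper above does.

    Returns (summary, errors): the summary line the mapper would print,
    and the input lines that could not be parsed.
    """
    line_number = tab_number = pv_number = clk_number = 0
    compressed = None  # decided once, from the first line, as in the original
    errors = []

    for raw in lines:
        line = raw.strip()
        if compressed is None:
            compressed = line.find("\t") > 0
        line_number += 1
        try:
            if compressed:
                fields = line.split("\t")
                tab_number += len(fields) - 1
                # fields[0] is skipped, mirroring the original loop that
                # starts at index 1 (assumed here to be a key column).
                for field in fields[1:]:
                    parts = field.strip().split(" ")
                    pv_number += int(parts[0])
                    clk_number += int(parts[1])
            else:
                parts = line.split(" ")
                pv_number += int(parts[0])
                clk_number += int(parts[1])
        except (ValueError, IndexError):
            # The original prints the line followed by "\tERROR";
            # this sketch collects it instead.
            errors.append(line)

    if compressed:
        summary = "%d %d %d %d" % (line_number, tab_number, pv_number, clk_number)
    else:
        summary = "%d %d %d" % (line_number, pv_number, clk_number)
    return summary, errors
```

Keeping the logic in a function rather than at module level makes it possible to check the counts on a few sample lines before submitting the script to a cluster.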

Reducer:

import sys

line_number = 0
tab_number = 0
pv_number = 0
clk_number = 0
if_compressed_tested = 0
if_compressed = 0

def compressed_stat(line):
    global line_number
    global tab_number
    global pv_number
    global clk_number
    pv_clk_list = line.split(" ")
    if len(pv_clk_list) != 4:
        print line,"\tERROR"
    else:
        line_number += int(pv_clk_list[0])
        tab_number += int(pv_clk_list[1])
        pv_number += int(pv_clk_list[2])
        clk_number += int(pv_clk_list[3])

def before_compress_stat(line):
    global line_number
    global pv_number
    global clk_number
    pv_clk_list = line.split(" ")
    if len(pv_clk_list) != 3:
        print line,"\tERROR"
    else:
        line_number += int(pv_clk_list[0])
        pv_number += int(pv_clk_list[1])
        clk_number += int(pv_clk_list[2])
#

for line in sys.stdin:
    try:
        line = line.strip()
        if line.count("ERROR") > 0:
            print line
            continue

        if if_compressed_tested == 0:
            if_compressed_tested = 1
            if len(line.split(" ")) == 4:
                if_compressed = 1
            elif len(line.split(" ")) == 3:
                if_compressed = 0
            else:
                print line,"\tERROR"
                continue

        if if_compressed == 0:
            before_compress_stat(line)
        else:
            compressed_stat(line)
    except ValueError:
        print line, "\tERROR"

if if_compressed == 0:
    print "LINE_NUMBER:",line_number,"TOTAL_PV_NUMBER:",pv_number, "TOTAL_CLK_NUMBER:",clk_number
else:
    print "LINE_NUMBER:",line_number,"TAB_NUMBER",tab_number,"TOTAL_PV_NUMBER:",pv_number, "TOTAL_CLK_NUMBER:",clk_number
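The reducer's job is to add up the summary lines emitted by each mapper. A minimal sketch of that aggregation in Python 3 syntax follows; `reduce_lines` is a hypothetical name, and unlike the original, which fixes the format from the very first line it sees, this version fixes it from the first well-formed line.

```python
def reduce_lines(lines):
    """Sum mapper summary lines column by column.

    Returns (totals, passthrough): the summed columns (None if no valid
    line was seen) and the lines that were forwarded as errors.
    """
    totals = None
    passthrough = []

    for raw in lines:
        line = raw.strip()
        # Lines already flagged by a mapper are forwarded unchanged.
        if "ERROR" in line:
            passthrough.append(line)
            continue

        parts = line.split(" ")
        if totals is None:
            # 3 fields: line, pv, clk. 4 fields: line, tab, pv, clk.
            if len(parts) not in (3, 4):
                passthrough.append(line + "\tERROR")
                continue
            totals = [0] * len(parts)

        if len(parts) != len(totals):
            passthrough.append(line + "\tERROR")
            continue
        try:
            values = [int(p) for p in parts]
        except ValueError:
            passthrough.append(line + "\tERROR")
            continue
        totals = [t + v for t, v in zip(totals, values)]

    return totals, passthrough
```

For example, `reduce_lines(["2 4 6", "1 10 20"])` sums two three-field summaries into one set of totals, which the caller can then label the way the final print statements above do.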
