首页 \ 教程 \ hadoop

知识点

hadoop

Hadoop lzo文件的并行map处理

Hadoop集群lzo文件的并行map处理

使用Hadoop的lzo问题

Hadoop Core 学习笔记(二) lzo文件的写入和读取

Hadoop集群上使用Lzo压缩

Hadoop 用FileStatus类来查看HDFS中文件或目录的元信息

Hadoop集群lzo的安装

Hadoop LZO 安装教程

使Hadoop支持Splittable压缩lzo

Hadoop集群内lzo的安装与配置

ProtocolBuffer 结合 LZO在 Hadoop中的使用

Hadoop集群内lzo的安装与配置详解

ProtocolBuffer和LZO技术Hadoop系统上的使用

linux中查看文件或者目录大小

Hadoop Lzo 源码分析之分片/切片原理

查看Hadoop里的LZO文件的内容

2019-03-28 13:20|来源: 网络

最近常常需要查看LZO文件里面的内容，这些文件通常很大，放在hdfs上。我没有好的方法，我以前偶尔查看其中内容都是直接get到本地然后用lzop解压缩然后再more的。这样做当你偶尔使用的时候即使文件稍微大点，也许也是可以接受的。但现在我需要常常grep里面的内容，就不那么欢乐了。

所以写了个shell脚本lzoc[ lzo cat]，用来专门查看ＨＤＦＳ里ＬＺＯ文件的内容，正常情况下它不输出任何多余的东西，这样就可以和more 、 head、tail等工具一起结合使用了。

代码如下:

它有三个选项：

-c 指示删除已经存在当前目录的同名文件，这往往是为了删除旧的副本而制定的，
-d 指示最后阶段删除当前目录里中间文件，因为我们会把文件从hdfs中get出来
-i 指示输出一些交互信息，如果你cat出来的内容要用作它用，那么你不要使用这个选项

使用示例：

$./lzoc /user/Hadoop/output/filename.lzo | more

#! /bin/sh

#description:
# cat the lzo file on hadoop
filePath="" #full Path of the hadoop lzo file
lzoFileName="" #file with .lzo as extension after hadoop fs -get ....
fileName="" #file name without extension-name

deleteAfterExecute=N #has -c option, which indicates that old files should be deleted
deleteBeforeExecute=N #has -d option, which indicates that related files should be deleted in the final state
interactiveMsg=N #only the text of the file should print

if [ $# -lt 1 ]
then
echo "must has aleast one parameter, which is the fileName."
exit -1
else
#normal command style
eval filePath=\${$#} #get the last parameter, must guarantee that it is less then 9
lzoFileName=${filePath##*/}
fileName=${lzoFileName%.lzo*}
fi

#parase options
if [ $# -gt 1 ]
then
while getopts cdi OPTION
do
case $OPTION
in
c)
deleteBeforeExecute=Y;;
d)
deleteAfterExecute=Y;;
i)
interactiveMsg=Y;;
\?)
echo "illegal option:$OPTION";
exit -2;;
esac
done
fi

#delete old file if needed
if [ $deleteBeforeExecute == "Y" ]; then
if [ -e $fileName ]; then
echo "delete old file"
rm $fileName;
fi
if [ -e $lzoFileName ]; then
echo "delete old lzo file"
rm $lzoFileName
fi
fi

#make sure hadoop is on
which hadoop > /dev/null 2>&1
if [ $? -eq 1 ]; then
echo "Command not exist,hadoop may not have been started."
exit -3
fi

#make sure fileExist,should not be a directory
hadoop fs -test -e $filePath > /dev/null 2>&1
if [ $? -ne 0 ]; then
echo "No such file for directory:"$filePath
exit -4
fi

#can not cat a directory
hadoop fs -test -d $filePath > /dev/null 2>&1
if [ $? -eq 0 ]; then
echo "Can not cat a directory:"$filePath
exit -4
fi

#make sure lzop is installed
which lzop > /dev/null 2>&1
if [ $? -eq 1 ]; then
echo "Tool missed:lzop is not installed."
exit -5
fi

#test whether lzo file exist
if [ -e $lzoFileName ]; then
if [ $interactiveMsg == "Y" ]; then
echo "LZO file already exist."
fi
else
if [ $interactiveMsg == "Y" ]; then
echo "LZO file not exist."
fi
#get the file from hadoop
hadoop fs -get $filePath .
fi

#test whether file exist
if [ -e $fileName ]; then
if [ $interactiveMsg == "Y" ]; then
echo "File already exist."
fi
else
if [ $interactiveMsg == "Y" ]; then
echo "File not exist."
fi
#decomopress the lzo file
lzop -dv $lzoFileName > /dev/null 2>&1
fi

#clear

#cat the file
cat $fileName

#delete files in the final state is needed
if [ $deleteAfterExecute == "Y" ]
then
if [ -e $fileName ]; then
rm $fileName
fi
if [ -e $lzoFileName ]; then
rm $lzoFileName
fi
if [ $interactiveMsg == "Y" ]; then
echo "files has been deleted"
fi
fi

知识点

相关文章

最近更新

查看Hadoop里的LZO文件的内容

相关问答

hadoop怎么查看本地文件[2021-04-18]

为什么启动hadoop后在hdfs上系统里创建input的文件夹的命令没有用命令为hadoop[2023-07-08]

hadoop 函数包放在哪个文件里[2022-02-17]

如何查看文件损坏Hadoop[2022-03-03]

在Spark DataFrames中读取json行的LZO文件(Reading LZO file of json lines in Spark DataFrames)[2022-07-17]

在hadoop中查看gzip文件内容(View gzipped file content in hadoop)[2022-12-04]

用python打开一个lzo文件，而不用解压文件(Open an lzo file in python, without decompressing the file)[2022-10-26]

在AWS EMR上对Hadoop输出使用LZO时，是否会对文件（存储在S3中）进行索引以便将来自动拆分？(When using LZO on Hadoop output on AWS EMR, does it index the files (stored on S3) for future automatic splitting?)[2022-03-26]

在普通Java中用原始文件创建lzo文件和Lzo索引文件(Create lzo file and Lzo index file from raw file in plain Java)[2023-12-13]

pip安装python-lzo时找不到文件错误(File not found error when pip installing python-lzo)[2021-09-24]

知识点

相关文章

最近更新

查看Hadoop里的LZO文件的内容

相关问答

hadoop怎么查看本地文件[2021-04-18]

为什么启动hadoop后在hdfs上系统里创建input的文件夹的命令没有用 命令为hadoop[2023-07-08]

hadoop 函数包放在哪个文件里[2022-02-17]

如何查看文件损坏Hadoop[2022-03-03]

在Spark DataFrames中读取json行的LZO文件(Reading LZO file of json lines in Spark DataFrames)[2022-07-17]

在hadoop中查看gzip文件内容(View gzipped file content in hadoop)[2022-12-04]

用python打开一个lzo文件，而不用解压文件(Open an lzo file in python, without decompressing the file)[2022-10-26]

在AWS EMR上对Hadoop输出使用LZO时，是否会对文件（存储在S3中）进行索引以便将来自动拆分？(When using LZO on Hadoop output on AWS EMR, does it index the files (stored on S3) for future automatic splitting?)[2022-03-26]

在普通Java中用原始文件创建lzo文件和Lzo索引文件(Create lzo file and Lzo index file from raw file in plain Java)[2023-12-13]

pip安装python-lzo时找不到文件错误(File not found error when pip installing python-lzo)[2021-09-24]

为什么启动hadoop后在hdfs上系统里创建input的文件夹的命令没有用命令为hadoop[2023-07-08]