How to enable additional logging when running `hadoop fs` with MAPRFS?

When I run this command:

hadoop fs -copyFromLocal /tmp/1GB.img 'maprfs://maprfs.example.com/tmp/1GB.img'

I get the following errors.

2014-11-05 01:21:08,7669 ERROR Client fs/client/fileclient/cc/writebuf.cc:154 Thread: 240 FlushWrite failed: File 1GB.img, error: Invalid argument(22), pfid 4484.66.266002, off 65536, fid 5189.87.131376
14/11/05 01:21:08 ERROR fs.Inode: Write failed for file: /tmp/1GB.img, error: Invalid argument
14/11/05 01:21:08 ERROR fs.Inode: Marking failure for: /tmp/1GB.img, error: Invalid argument
14/11/05 01:21:08 ERROR fs.Inode: Throwing exception for: /tmp/1GB.img, error: Invalid argument
14/11/05 01:21:08 ERROR fs.Inode: Flush failed for file: /tmp/1GB.img, error: Invalid argument
14/11/05 01:21:08 ERROR fs.Inode: Marking failure for: /tmp/1GB.img, error: Invalid argument
14/11/05 01:21:08 ERROR fs.Inode: Throwing exception for: /tmp/1GB.img, error: Invalid argument
copyFromLocal: 4484.66.266002 /tmp/1GB.img (Invalid argument)

Can anyone suggest how to enable additional verbose/debug logging?

The above errors seem to be coming from the MapR Hadoop classes. It would be nice to enable more verbose logging in those packages, as well as in org.apache.*

I tried modifying /opt/mapr/conf/logging.properties but it didn't seem to help.
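
For reference, a minimal sketch of the stock Hadoop 1.x way to raise CLI verbosity, plus the MapR-specific property sometimes mentioned for the native client (the fs.mapr.trace setting is an assumption based on MapR client documentation and is not verified here):

# Standard Hadoop 1.x: raise the log4j root level for a single CLI invocation
HADOOP_ROOT_LOGGER=DEBUG,console hadoop fs -copyFromLocal /tmp/1GB.img 'maprfs://maprfs.example.com/tmp/1GB.img'

# Assumption: MapR's native fileclient trace level, passed as a generic -D option
hadoop fs -Dfs.mapr.trace=debug -copyFromLocal /tmp/1GB.img 'maprfs://maprfs.example.com/tmp/1GB.img'

HADOOP_ROOT_LOGGER only affects the Java-side log4j output; whether the fs.mapr.trace property surfaces more detail from the native fileclient (the writebuf.cc messages above) on this MapR version is unclear.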

BTW, running Hadoop 1.0.3 and MapR 3.1.1.26113.GA

thanks,

Fi

p.s. This is related to my question at http://answers.mapr.com/questions/11374/write-to-maprfs-with-hadoop-cli-fails-inside-docker-while-running-on-a-data-node#


Source: https://stackoverflow.com/questions/26748461

Accepted answer

Bash and grep

Loop over the search file and grep for each line, redirecting the result to a file named after the search string:

while read str; do grep -F "$str" infile > "$str".txt; done < search.txt

where infile is your large file. This results in the following files:

==> Value003.txt <==
line1"value001","value002","Value003"
line3"value001","value002","Value003"
line4"value001","value002","Value003"
line5"value001","value002","Value003"

==> Value007.txt <==
line2"Value004","Value005","Value006","Value007"
line6"Value004","Value005","Value006","Value007"

==> Value009.txt <==
line7"value010","value022","Value009"

Notice that this processes the very large file multiple times, and even though grep is fast, looping over a file with Bash is slow, so this is only viable if search.txt is relatively small.
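
As a side note, a slightly more defensive form of the same loop (a sketch): IFS= together with read -r keeps leading/trailing whitespace and backslashes in the search strings intact, and -- protects against strings that begin with a dash.

while IFS= read -r str; do
    grep -F -- "$str" infile > "$str".txt
done < search.txt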

Awk

To process the large file only once, you could iterate over it with awk, and for each line check if any of the strings match:

#!/usr/bin/awk -f

# Read search file into array
NR == FNR {
    searchstr[$0]
    next
}

{
    # Iterate over search strings
    for (str in searchstr) {
        # Print to file if matches
        if (index($0, str)) {
            print $0 > str ".txt"
            # next  # Uncomment if only one search string can occur per line
            # close(str ".txt") # Uncomment if there are too many open files
        }
    }
}

This has to be called as follows:

awk -f script.awk search.txt infile

In a less readable one-line version:

awk 'NR==FNR{ss[$0];next}{for(s in ss)if(index($0,s))print$0>s".txt"}' search.txt infile

Notice that some awks have a limit to the number of open filehandles [1], and others (GNU awk) can manage more but slow down beyond that limit – this depends on the size of your search.txt. If it becomes a problem, we can add close(str ".txt") to the if clause to close the file after each write.
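
For example, the one-liner with that change applied could look like this (a sketch). Note the switch from > to >>: after close(), reopening a file with > truncates it, which would silently discard earlier matches.

awk 'NR==FNR{ss[$0];next}{for(s in ss)if(index($0,s)){f=s".txt";print >> f;close(f)}}' search.txt infile

With >>, output files left over from a previous run are appended to instead of overwritten, so they may need to be deleted beforehand.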

If only one search string can occur on each line, we can uncomment the next statement in the loop.


[1] The original awk had a limit of 15 open files!
