How to enable additional logging when running `hadoop fs` with MAPRFS?

When I run this command:

hadoop fs -copyFromLocal /tmp/1GB.img 'maprfs://maprfs.example.com/tmp/1GB.img'

I get the following errors.

2014-11-05 01:21:08,7669 ERROR Client fs/client/fileclient/cc/writebuf.cc:154 Thread: 240 FlushWrite failed: File 1GB.img, error: Invalid argument(22), pfid 4484.66.266002, off 65536, fid 5189.87.131376
14/11/05 01:21:08 ERROR fs.Inode: Write failed for file: /tmp/1GB.img, error: Invalid argument
14/11/05 01:21:08 ERROR fs.Inode: Marking failure for: /tmp/1GB.img, error: Invalid argument
14/11/05 01:21:08 ERROR fs.Inode: Throwing exception for: /tmp/1GB.img, error: Invalid argument
14/11/05 01:21:08 ERROR fs.Inode: Flush failed for file: /tmp/1GB.img, error: Invalid argument
14/11/05 01:21:08 ERROR fs.Inode: Marking failure for: /tmp/1GB.img, error: Invalid argument
14/11/05 01:21:08 ERROR fs.Inode: Throwing exception for: /tmp/1GB.img, error: Invalid argument
copyFromLocal: 4484.66.266002 /tmp/1GB.img (Invalid argument)

Can anyone suggest how to enable additional verbose/debug logging?

The above errors seem to be coming from the MapR Hadoop classes. It would be nice to enable more verbose logging in those packages, as well as in org.apache.*

I tried modifying /opt/mapr/conf/logging.properties but it didn't seem to help.
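
For reference, a minimal sketch of the stock Hadoop 1.x way to raise CLI verbosity, plus the MapR-specific property sometimes mentioned for the native client (the fs.mapr.trace setting is an assumption based on MapR client documentation and is not verified here):

# Standard Hadoop 1.x: raise the log4j root level for a single CLI invocation
HADOOP_ROOT_LOGGER=DEBUG,console hadoop fs -copyFromLocal /tmp/1GB.img 'maprfs://maprfs.example.com/tmp/1GB.img'

# Assumption: MapR's native fileclient trace level, passed as a generic -D option
hadoop fs -Dfs.mapr.trace=debug -copyFromLocal /tmp/1GB.img 'maprfs://maprfs.example.com/tmp/1GB.img'

HADOOP_ROOT_LOGGER only affects the Java-side log4j output; whether the fs.mapr.trace property surfaces more detail from the native fileclient (the writebuf.cc messages above) on this MapR version is unclear.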

BTW, running Hadoop 1.0.3 and MapR 3.1.1.26113.GA

thanks,

Fi

p.s. This is related to my question at http://answers.mapr.com/questions/11374/write-to-maprfs-with-hadoop-cli-fails-inside-docker-while-running-on-a-data-node#


Source: https://stackoverflow.com/questions/26748461

Accepted answer

Bash and grep

Loop over the search file and grep for each line, redirecting the result to a file named after the search string:

while read str; do grep -F "$str" infile > "$str".txt; done < search.txt

where infile is your large file. This results in the following files:

==> Value003.txt <==
line1"value001","value002","Value003"
line3"value001","value002","Value003"
line4"value001","value002","Value003"
line5"value001","value002","Value003"

==> Value007.txt <==
line2"Value004","Value005","Value006","Value007"
line6"Value004","Value005","Value006","Value007"

==> Value009.txt <==
line7"value010","value022","Value009"

Notice that this processes the very large file multiple times, and even though grep is fast, looping over a file with Bash is slow, so this is only viable if search.txt is relatively small.
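
As a side note, a slightly more defensive form of the same loop (a sketch): IFS= together with read -r keeps leading/trailing whitespace and backslashes in the search strings intact, and -- protects against strings that begin with a dash.

while IFS= read -r str; do
    grep -F -- "$str" infile > "$str".txt
done < search.txt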

Awk

To process the large file only once, you could iterate over it with awk, and for each line check if any of the strings match:

#!/usr/bin/awk -f

# Read search file into array
NR == FNR {
    searchstr[$0]
    next
}

{
    # Iterate over search strings
    for (str in searchstr) {
        # Print to file if matches
        if (index($0, str)) {
            print $0 > str ".txt"
            # next  # Uncomment if only one search string can occur per line
            # close(str ".txt") # Uncomment if there are too many open files
        }
    }
}

This has to be called as follows:

awk -f script.awk search.txt infile

In a less readable one-line version:

awk 'NR==FNR{ss[$0];next}{for(s in ss)if(index($0,s))print$0>s".txt"}' search.txt infile

Notice that some awks have a limit to the number of open filehandles [1], and others (GNU awk) can manage more but slow down beyond that limit – this depends on the size of your search.txt. If it becomes a problem, we can add close(str ".txt") to the if clause to close the file after each write.
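
For example, the one-liner with that change applied could look like this (a sketch). Note the switch from > to >>: after close(), reopening a file with > truncates it, which would silently discard earlier matches.

awk 'NR==FNR{ss[$0];next}{for(s in ss)if(index($0,s)){f=s".txt";print >> f;close(f)}}' search.txt infile

With >>, output files left over from a previous run are appended to instead of overwritten, so they may need to be deleted beforehand.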

If only one search string can occur on each line, we can uncomment the next statement in the loop.


[1] The original awk had a limit of 15 open files!
