How to enable additional logging when running `hadoop fs` with MAPRFS?
When I run this command:
hadoop fs -copyFromLocal /tmp/1GB.img 'maprfs://maprfs.example.com/tmp/1GB.img'
I get the following errors.
2014-11-05 01:21:08,7669 ERROR Client fs/client/fileclient/cc/writebuf.cc:154 Thread: 240 FlushWrite failed: File 1GB.img, error: Invalid argument(22), pfid 4484.66.266002, off 65536, fid 5189.87.131376
14/11/05 01:21:08 ERROR fs.Inode: Write failed for file: /tmp/1GB.img, error: Invalid argument
14/11/05 01:21:08 ERROR fs.Inode: Marking failure for: /tmp/1GB.img, error: Invalid argument
14/11/05 01:21:08 ERROR fs.Inode: Throwing exception for: /tmp/1GB.img, error: Invalid argument
14/11/05 01:21:08 ERROR fs.Inode: Flush failed for file: /tmp/1GB.img, error: Invalid argument
14/11/05 01:21:08 ERROR fs.Inode: Marking failure for: /tmp/1GB.img, error: Invalid argument
14/11/05 01:21:08 ERROR fs.Inode: Throwing exception for: /tmp/1GB.img, error: Invalid argument
copyFromLocal: 4484.66.266002 /tmp/1GB.img (Invalid argument)
Can anyone suggest how to enable additional verbose/debug logging?
The above errors seem to be coming from the MAPR hadoop classes. It would be nice to enable more verbose logging in those packages, as well as org.apache.*
I tried modifying /opt/mapr/conf/logging.properties but it didn't seem to help.
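For what it's worth, two knobs are commonly used for this kind of thing; treat the MapR property name below as an assumption to verify against your MapR release. `HADOOP_ROOT_LOGGER` is the standard Hadoop environment variable that raises the log4j root logger (so it covers `org.apache.*` and anything else routed through log4j) for a single invocation, and `fs.mapr.trace` is the MapR client trace property that is supposed to govern the native fileclient messages like the `writebuf.cc` error above:

```shell
# Sketch only - verify both knobs against your Hadoop/MapR versions.
# HADOOP_ROOT_LOGGER is standard Hadoop; DEBUG,console raises the
# log4j root logger to DEBUG for this shell's hadoop invocations.
export HADOOP_ROOT_LOGGER=DEBUG,console

# fs.mapr.trace (assumption: MapR-specific client trace property)
# controls the native fileclient's verbosity.
hadoop fs -Dfs.mapr.trace=debug -copyFromLocal /tmp/1GB.img 'maprfs://maprfs.example.com/tmp/1GB.img'
```

This requires a reachable MapR cluster, so it is an invocation/config fragment rather than something runnable in isolation.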
BTW, running Hadoop 1.0.3 and MapR 3.1.1.26113.GA
thanks,
Fi
p.s. This is related to my question at http://answers.mapr.com/questions/11374/write-to-maprfs-with-hadoop-cli-fails-inside-docker-while-running-on-a-data-node#
Original: https://stackoverflow.com/questions/26748461
Best answer
Bash and grep
Looping over the search file and grepping for each line, redirecting the result to the properly named file:
while IFS= read -r str; do grep -F "$str" infile > "$str".txt; done < search.txt
where `infile` is your large file. This results in the following files:

==> Value003.txt <==
line1"value001","value002","Value003"
line3"value001","value002","Value003"
line4"value001","value002","Value003"
line5"value001","value002","Value003"

==> Value007.txt <==
line2"Value004","Value005","Value006","Value007"
line6"Value004","Value005","Value006","Value007"

==> Value009.txt <==
line7"value010","value022","Value009"
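As a self-contained illustration of the loop above (the sample data below is made up for the demo, not taken from the original post):

```shell
# Work in a scratch directory with hypothetical sample data.
cd "$(mktemp -d)"
printf '%s\n' Value003 Value007 > search.txt
printf '%s\n' \
    'line1"value001","value002","Value003"' \
    'line2"Value004","Value005","Value006","Value007"' \
    'line3"value001","value002","Value003"' > infile

# The loop from the answer: one grep pass over infile per search string,
# each result redirected to a file named after the string.
while IFS= read -r str; do
    grep -F "$str" infile > "$str".txt
done < search.txt

cat Value003.txt   # the two lines containing Value003
```

Note that this reads `infile` once per search string, which is exactly the cost the next paragraph warns about.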
Notice that this processes the very large file multiple times, and even though grep is fast, looping over a file with Bash is slow, so this is only viable if `search.txt` is relatively small.

Awk
To process the large file only once, you could iterate over it with awk, and for each line check if any of the strings match:
#!/usr/bin/awk -f

# Read search file into array
NR == FNR {
    searchstr[$0]
    next
}

{
    # Iterate over search strings
    for (str in searchstr) {
        # Print to file if matches
        if (index($0, str)) {
            print $0 > str ".txt"
            # next               # Uncomment if only one search string can occur per line
            # close(str ".txt")  # Uncomment if there are too many open files
        }
    }
}
This has to be called as follows:
awk -f script.awk search.txt infile
In a less readable one-line version:
awk 'NR==FNR{ss[$0];next}{for(s in ss)if(index($0,s))print $0 > s".txt"}' search.txt infile
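The one-liner can likewise be exercised on hypothetical sample data (file contents are illustrative; the redirection target is parenthesized here for portability across awk implementations):

```shell
# Scratch directory with made-up search strings and input lines.
cd "$(mktemp -d)"
printf '%s\n' Value003 Value007 > search.txt
printf '%s\n' \
    'line1"value001","value002","Value003"' \
    'line2"Value004","Value005","Value006","Value007"' > infile

# Single pass over infile: each matching line is appended to a file
# named after the search string it contains.
awk 'NR==FNR{ss[$0];next}{for(s in ss)if(index($0,s))print $0 > (s".txt")}' search.txt infile

cat Value007.txt   # the line containing Value007
```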
Notice that some awks have a limit to the number of open filehandles¹, and others (GNU awk) can manage more but slow down beyond that limit – this depends on the size of your `search.txt`. If it becomes a problem, we can add `close(str ".txt")` to the `if` clause to close the file after each write. If only one search string can occur on each line, we can uncomment the `next` statement in the loop.
¹ The original awk had a limit of 15 open files!