barplot - frequency should be based on a pattern in column and ratio (x axis)
Let me quickly explain the problem. Picture a dataset like this:
data <- data.frame(Amino.acid = c("TRPPS;PNSTED", "ERDDS", "PSRND", "SDEEN", "GSRTN"), log2.ratio = c(2.4, 0, -1, -2, -1))
In reality my list is much longer, say 12,000 rows. What I really want to do is get the frequency of a specific amino acid pattern and then plot the density against the log2 ratio. For example, the pattern R-X-X-S should be detected in the Amino.acid column. Sometimes a cell holds several sequences separated by ";", and the pattern analysis should be done on each of them.
I can only think of something ugly involving gsub and subset for lots of log2 ratios, but there should be an elegant solution (maybe with the density function?).
In the end I would like a plot of density (y) vs. log2 ratio (x) for the specific pattern, and another for all sequences that do not contain this pattern.
Source: https://stackoverflow.com/questions/13850071
Best answer
Assuming that your own command generates this stdout:
[java] ny4aproxy5.company.com,36079435
[java] ny4aproxy4.company.com,36079435
[java] ny4aproxy12.company.com,36079441
[java] ny4aproxy11.company.com,36079435
[java] ny4aproxy3.company.com,36079435
[java] ny4aproxy2.company.com,36079435
[java] ny4aproxy1.company.com,36079435
[java] ny4aproxy10.company.com,36079435
[java] ny4aproxy9.company.com,36079441
[java] ny4aproxy13.company.com,30079441
The following command returns only the caches that are below the threshold (less than 90% of the largest cache size):
yourcommand | awk -F, -v max=`yourcommand | awk -F, '{if ($2>max) max=$2}END {print max}'` '{if (($2/max)<0.9) print "outside threshold: " $0 }'
I've taken the liberty of changing the last cache number so that the example includes one value below 90%. Output:
outside threshold: [java] ny4aproxy13.company.com,30079441
Or, if you want to know all the percentages:
yourcommand | awk -F, -v max=`yourcommand | awk -F, '{if ($2>max) max=$2}END {print max}'` '{print $0, (max-$2)/max*100 }'
Output:
[java] ny4aproxy5.company.com,36079435 1.663e-05
[java] ny4aproxy4.company.com,36079435 1.663e-05
[java] ny4aproxy12.company.com,36079441 0
[java] ny4aproxy11.company.com,36079435 1.663e-05
[java] ny4aproxy3.company.com,36079435 1.663e-05
[java] ny4aproxy2.company.com,36079435 1.663e-05
[java] ny4aproxy1.company.com,36079435 1.663e-05
[java] ny4aproxy10.company.com,36079435 1.663e-05
[java] ny4aproxy9.company.com,36079441 0
[java] ny4aproxy13.company.com,30079441 16.63
Explanation:
yourcommand | awk
Pipes the stdout of your custom command to awk.
awk -F,
Declares the comma as the input field separator.
-v max=...
Since we need to go through the output twice, first to get the maximum and then to compare each line with it, the result of one awk invocation has to be fed to the other. The first awk command runs in backticks and finds the maximum; the -v flag passes that value to the second awk command as the variable 'max'. Note that this runs yourcommand twice.
{if ($2>max) max=$2}END {print max}
awk runs the first block for every input line, keeping the largest second field seen so far; the END block prints it once all input has been read.
{print $0, (max-$2)/max*100 }
Calculates the percentage difference from the maximum and prints the original row with the percentage appended.
if (($2/max)<0.9) print "outside threshold: " $0
Checks whether the cache size is at least 90% of the maximum. If not, prints the 'offending' line.
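If running the command twice is undesirable (it may be slow, or its output may change between runs), the same check can be done in a single pass. The following is only a sketch, not part of the original answer: it assumes the whole output fits in memory, and both the yourcommand stand-in and the check_threshold function name are hypothetical, introduced here for illustration.

```shell
#!/bin/sh
# Hypothetical stand-in for the real command; replace it with your own.
yourcommand() {
  cat <<'EOF'
[java] ny4aproxy5.company.com,36079435
[java] ny4aproxy12.company.com,36079441
[java] ny4aproxy13.company.com,30079441
EOF
}

# Single pass: store every line and its size while tracking the maximum,
# then report the lines below 90% of the maximum in the END block.
check_threshold() {
  awk -F, '
    { line[NR] = $0; size[NR] = $2 + 0; if (size[NR] > max) max = size[NR] }
    END {
      for (i = 1; i <= NR; i++)
        if (max > 0 && size[i] / max < 0.9)
          print "outside threshold: " line[i]
    }'
}

yourcommand | check_threshold
```

With the three sample lines above, only the ny4aproxy13 line is reported, since 30079441 is roughly 83% of the maximum 36079441.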