
barplot - frequency should be based on a pattern in column and ratio (x axis)

Let me quickly explain the problem. Picture a dataset like this:
data<- data.frame("Amino.acid" = c("TRPPS;PNSTED", "ERDDS", "PSRND", "SDEEN", "GSRTN"), 
                   "log2.ratio"=c(2.4,0,-1,-2,-1))
 
In reality my list is much longer, say 12,000 rows. What I really want to do is get the frequency of a specific amino acid pattern and then plot the density against the log2 ratio. For example, the pattern R-X-X-S should be detected in the Amino.acid column. Sometimes an entry holds two sequences separated by a ";", and the pattern analysis should be done for both parts.
I can only think of something ugly, like gsub and subset applied to lots of log2 ratios, but there should be an elegant solution (maybe with the density function?).
In the end I would like a plot of density (y) vs. log2 ratio (x) for the specific pattern, and another for all sequences that do not match this amino acid pattern.

Source: https://stackoverflow.com/questions/13850071

Updated: 2023-05-26 13:05

Best answer

Assuming that your own command generates this stdout: 
 [java] ny4aproxy5.company.com,36079435
 [java] ny4aproxy4.company.com,36079435
 [java] ny4aproxy12.company.com,36079441
 [java] ny4aproxy11.company.com,36079435
 [java] ny4aproxy3.company.com,36079435
 [java] ny4aproxy2.company.com,36079435
 [java] ny4aproxy1.company.com,36079435
 [java] ny4aproxy10.company.com,36079435
 [java] ny4aproxy9.company.com,36079441
 [java] ny4aproxy13.company.com,30079441
 
The following command returns only the caches that are under the threshold: 
yourcommand | awk -F, -v max=`yourcommand | awk -F, '{if ($2>max) max=$2}END {print max}'` '{if (($2/max)<0.9) print "outside threshold: " $0 }'
 
I've taken the liberty of changing the last cache number to give an illustrative example that falls below 90%. Output:
outside threshold:  [java] ny4aproxy13.company.com,30079441
 
Or, if you want to see all the percentages:
yourcommand | awk -F, -v max=`yourcommand | awk -F, '{if ($2>max) max=$2}END {print max}'` '{print $0, (max-$2)/max*100 }'
 
Output:  
 [java] ny4aproxy5.company.com,36079435 1.663e-05
 [java] ny4aproxy4.company.com,36079435 1.663e-05
 [java] ny4aproxy12.company.com,36079441 0
 [java] ny4aproxy11.company.com,36079435 1.663e-05
 [java] ny4aproxy3.company.com,36079435 1.663e-05
 [java] ny4aproxy2.company.com,36079435 1.663e-05
 [java] ny4aproxy1.company.com,36079435 1.663e-05
 [java] ny4aproxy10.company.com,36079435 1.663e-05
 [java] ny4aproxy9.company.com,36079441 0
 [java] ny4aproxy13.company.com,30079441 16.63
 
Explanation: 
yourcommand | awk: pipes the stdout of your custom command to awk.
awk -F,: declares a comma as the input field delimiter.
-v max=...: the output has to be read twice, first to find the maximum and then to compare each row against it. The first awk command (in backticks) computes the maximum, and its result is passed to the second awk command as the variable 'max' via the -v flag. Note that this runs yourcommand twice.
{if ($2>max) max=$2}END {print max}: runs for every input line to track the largest value in the second field, then prints it at the end.
{print $0, (max-$2)/max*100 }: calculates the percentage difference from max and prints the original row with the percentage appended.
if (($2/max)<0.9) print "outside threshold: " $0: checks whether the cache size is at least 90% of max. If not, prints the 'offending' line.
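
The two-pass approach above invokes yourcommand twice, which may be undesirable if the command is slow or its output can change between runs. As a rough alternative, here is a minimal single-pass sketch that buffers the rows inside awk and does the comparison in the END block. The function name check_caches and its optional ratio argument are hypothetical names chosen for illustration, not part of the original answer, and the sketch assumes the same "host,size" line format as the sample output.

```shell
# Hypothetical helper (illustrative name): reads "host,size" lines on stdin and
# prints the rows whose size is below a given fraction (default 0.9) of the
# largest size seen. The input is read only once.
check_caches() {
    awk -F, -v ratio="${1:-0.9}" '
        {
            line[NR] = $0          # keep the original row for printing later
            size[NR] = $2 + 0      # force the second field to a number
            if (NR == 1 || size[NR] > max) max = size[NR]
        }
        END {
            if (max <= 0) exit     # empty input or no positive size: nothing to compare
            for (i = 1; i <= NR; i++)
                if (size[i] / max < ratio + 0)
                    print "outside threshold: " line[i]
        }'
}

# Usage (yourcommand stands for your own command, as in the answer):
# yourcommand | check_caches 0.9
```

The trade-off is memory: all rows are held in awk arrays until the end of the input, which should be fine for a handful of cache lines but may matter for very large outputs.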
