barplot - frequency should be based on a pattern in column and ratio (x axis)

Let me quickly explain the problem. Picture a dataset like this:

data<- data.frame("Amino.acid" = c("TRPPS;PNSTED", "ERDDS", "PSRND", "SDEEN", "GSRTN"), 
                   "log2.ratio"=c(2.4,0,-1,-2,-1))

My real list is much longer, roughly 12,000 rows. What I want is the frequency of a specific amino acid pattern, and then a plot of density against the log2 ratio. For example, the pattern R-X-X-S should be detected in the Amino.acid column. Some entries hold several sequences separated by ";", and the pattern search should cover each of them.

I can only think of something ugly using gsub and subset over many log2 ratios, but there should be a more elegant solution (maybe with the density function?).

In the end I would like a plot of density (y) against log2 ratio (x) for the rows that match the specific pattern, and another for all rows that do not match it.


Source: https://stackoverflow.com/questions/13850071
Updated: 2023-05-26 13:05

Best answer

Assuming that your own command writes this to stdout:

 [java] ny4aproxy5.company.com,36079435
 [java] ny4aproxy4.company.com,36079435
 [java] ny4aproxy12.company.com,36079441
 [java] ny4aproxy11.company.com,36079435
 [java] ny4aproxy3.company.com,36079435
 [java] ny4aproxy2.company.com,36079435
 [java] ny4aproxy1.company.com,36079435
 [java] ny4aproxy10.company.com,36079435
 [java] ny4aproxy9.company.com,36079441
 [java] ny4aproxy13.company.com,30079441

The following command returns only the caches that are under the threshold:

yourcommand | awk -F, -v max=`yourcommand | awk -F, 'OFS=","{if ($2>max) max=$2}END {print max}'` '{if (($2/max)<0.9) print "outside threshold: " $0 }'

I've taken the liberty of changing the last cache number to give an indicative <90% example. Output:

outside threshold:  [java] ny4aproxy13.company.com,30079441

Or, if you want to know all the percentages:

yourcommand | awk -F, -v max=`yourcommand | awk -F, 'OFS=","{if ($2>max) max=$2}END {print max}'` '{print $0, (max-$2)/max*100 }'

Output:

 [java] ny4aproxy5.company.com,36079435 1.663e-05
 [java] ny4aproxy4.company.com,36079435 1.663e-05
 [java] ny4aproxy12.company.com,36079441 0
 [java] ny4aproxy11.company.com,36079435 1.663e-05
 [java] ny4aproxy3.company.com,36079435 1.663e-05
 [java] ny4aproxy2.company.com,36079435 1.663e-05
 [java] ny4aproxy1.company.com,36079435 1.663e-05
 [java] ny4aproxy10.company.com,36079435 1.663e-05
 [java] ny4aproxy9.company.com,36079441 0
 [java] ny4aproxy13.company.com,30079441 16.63
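
The same percentage listing can probably also be produced with a single awk invocation that buffers the rows and does the comparison in the END block, so the command only has to run once. The following is a minimal sketch, not the answer's own method: the `yourcommand` function is a hypothetical stand-in that prints three of the sample rows, and the `line`, `size`, and `report` names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical stand-in for the real command; prints three of the sample rows.
yourcommand() {
    cat <<'EOF'
 [java] ny4aproxy12.company.com,36079441
 [java] ny4aproxy11.company.com,36079435
 [java] ny4aproxy13.company.com,30079441
EOF
}

# Single pass: remember every row and its numeric second field,
# track the maximum while reading, then print the percentages at the end.
report=$(yourcommand | awk -F, '
    {
        line[NR] = $0            # original row, kept for printing later
        size[NR] = $2 + 0        # second field forced to a number
        if (size[NR] > max) max = size[NR]
    }
    END {
        for (i = 1; i <= NR; i++)
            print line[i], (max - size[i]) / max * 100
    }')

printf '%s\n' "$report"
```

The trade-off is memory: every row is held in an awk array until the input ends, which is fine for a short listing like this one but less attractive for very large outputs.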

Explanation:

yourcommand | awk : This bit pipes the stdout of your custom command to awk

awk -F,: declares the input delimiter as a comma

-v max=...: the output has to be read twice, first to find max and then to compare each row against it, so one awk invocation computes max and feeds it to the other. The first awk command runs inside backticks, and its result is passed to the second awk command as the variable 'max' via the -v flag.

{if ($2>max) max=$2}END {print max}: keeps the largest second field seen across all lines and prints it once the input ends

{print $0, (max-$2)/max*100 }: calculates the percentage difference from max and prints the original row with that percentage appended

if (($2/max)<0.9) print "outside threshold: " $0: checks whether the cache size is at least 90% of max. If not, prints the 'offending' line
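
The steps explained above can be sketched as a small script that captures the command output once and reuses it for both passes, instead of running the command twice. This is only an illustration under assumptions: `yourcommand` is a hypothetical stand-in that prints three of the sample rows, and the variable names `output`, `max`, and `below` are invented here.

```shell
#!/bin/sh
# Hypothetical stand-in for the real command; prints three of the sample rows.
yourcommand() {
    cat <<'EOF'
 [java] ny4aproxy12.company.com,36079441
 [java] ny4aproxy11.company.com,36079435
 [java] ny4aproxy13.company.com,30079441
EOF
}

# Run the command once and keep its output for both passes.
output=$(yourcommand)

# First pass: largest numeric value in the second comma-separated field.
max=$(printf '%s\n' "$output" |
    awk -F, '$2 + 0 > max { max = $2 + 0 } END { print max }')

# Second pass: hand max to awk with -v and report rows below 90% of it.
below=$(printf '%s\n' "$output" |
    awk -F, -v max="$max" '($2 / max) < 0.9 { print "outside threshold: " $0 }')

printf '%s\n' "$below"
```

Capturing the output first also guards against the two runs of the command returning different data, which the backtick version cannot rule out.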
