Efficient way to apply multiple filters to pandas DataFrame or Series

I have a scenario where a user wants to apply several filters to a Pandas DataFrame or Series object. Essentially, I want to efficiently chain a bunch of filtering (comparison operations) together that are specified at run-time by the user. 
The filters should be additive (i.e., each one applied should narrow the results further). 
I'm currently using reindex() but this creates a new object each time and copies the underlying data (if I understand the documentation correctly). So, this could be really inefficient when filtering a big Series or DataFrame. 
I'm thinking that using apply(), map(), or something similar might be better. I'm pretty new to Pandas though so still trying to wrap my head around everything. 
TL;DR 
I want to take a dictionary of the following form and apply each operation to a given Series object and return a 'filtered' Series object. 
relops = {'>=': [1], '<=': [1]}
 
Long Example 
I'll start with what I currently have, which filters only a single Series object. Below is the function I'm currently using: 
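def apply_relops(series, relops):
    """
    Apply a dictionary of relational operators to the given Series object.
    """
    # `ops` is the module-level table of operator functions defined below.
    for op, vals in relops.items():
        op_func = ops[op]
        for val in vals:
            filtered = op_func(series, val)  # boolean mask for this filter
            # Reindex on the labels of the rows that passed the filter.
            series = series.reindex(series[filtered].index)
    return series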
 
The user provides a dictionary with the operations they want to perform: 
>>> import pandas
>>> df = pandas.DataFrame({'col1': [0, 1, 2], 'col2': [10, 11, 12]})
>>> print(df)
   col1  col2
0     0    10
1     1    11
2     2    12

>>> from operator import le, ge
>>> ops = {'>=': ge, '<=': le}
>>> apply_relops(df['col1'], {'>=': [1]})
col1
1       1
2       2
Name: col1
>>> apply_relops(df['col1'], relops={'>=': [1], '<=': [1]})
col1
1       1
Name: col1
 
Again, the 'problem' with my above approach is that I think it copies the data at every intermediate step, and much of that copying is probably unnecessary. 
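For illustration, here is a minimal sketch of one way to avoid reindexing after every comparison: build a boolean mask per filter, AND the masks together, and index the Series once at the end. The function name `apply_relops_mask` is hypothetical and not part of pandas; the `ops` table is the same one used in the question. The final indexing step still creates a new Series, so this only removes the intermediate data copies; each intermediate object is a boolean mask.

```python
from functools import reduce
from operator import and_, ge, le

import pandas as pd

# Same operator table as in the question.
ops = {'>=': ge, '<=': le}


def apply_relops_mask(series, relops):
    """Combine every comparison into one boolean mask, then index once.

    Hypothetical helper for illustration; not a pandas API.
    """
    masks = [ops[op](series, val)
             for op, vals in relops.items()
             for val in vals]
    if not masks:
        return series  # no filters given, so nothing to narrow
    combined = reduce(and_, masks)  # element-wise AND of all masks
    return series[combined]


df = pd.DataFrame({'col1': [0, 1, 2], 'col2': [10, 11, 12]})
result = apply_relops_mask(df['col1'], {'>=': [1], '<=': [1]})
print(result.tolist())  # [1]
```

Because the masks are combined with AND, the order in which the dictionary yields its operators does not affect the result.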
Also, I would like to expand this so that the dictionary passed in can include the columns to operate on, and then filter an entire DataFrame based on that input dictionary. However, I'm assuming whatever works for the Series can be easily expanded to a DataFrame.
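As a rough sketch of that DataFrame extension, the input could map each column name to its own operator dictionary. The nested layout `{column: {operator: [values]}}` and the name `filter_frame` are assumptions made for this example; the question does not specify them.

```python
from operator import ge, le

import numpy as np
import pandas as pd

# Same operator table as in the question.
ops = {'>=': ge, '<=': le}


def filter_frame(df, col_relops):
    """Filter rows of df using {column: {operator: [values]}}.

    Hypothetical helper and input layout, for illustration only.
    """
    mask = np.ones(len(df), dtype=bool)  # start by keeping every row
    for col, relops in col_relops.items():
        for op, vals in relops.items():
            for val in vals:
                # Narrow the running mask in place with this comparison.
                mask &= ops[op](df[col], val).to_numpy()
    return df[mask]  # single indexing step at the end


df = pd.DataFrame({'col1': [0, 1, 2], 'col2': [10, 11, 12]})
filtered = filter_frame(df, {'col1': {'>=': [1]}, 'col2': {'<=': [11]}})
print(filtered)
```

With the sample frame above, only the row where `col1` is 1 and `col2` is 11 satisfies both conditions.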

Source: https://stackoverflow.com/questions/13611065

Updated: 2023-06-12 09:06

Best answer

You shouldn't be using go run to run your Go program. You should compile it with go build and then use Upstart to run that.
Use exec /path/to/your/binary instead.
Also see:
- Can't Start Golang Prog Via Upstart
- https://coderwall.com/p/iekaog
- https://groups.google.com/forum/m/#!topic/golang-nuts/uBrN-G7anKg (lots of examples)
