
Comparing numpy structured arrays

The quick problem 
I would like to be able to compare specific dtype fields from two numpy structured arrays that are guaranteed to have the same dtype. I would like to do this in a way that allows the compared fields to differ each time a function is called, based on the given inputs (i.e. I can't easily hard-code the comparisons for each individual field).
The long problem with examples 
I am trying to compare specific fields from two numpy structured arrays with the same dtype. For instance, say we have
import numpy as np
from io import BytesIO

a = np.genfromtxt(BytesIO('12 23 0|23.2|17.9|0\n12 23 1|13.4|16.9|0'.encode()),dtype=[('id','U7'),('pos',[('x',float),('y',float)]),('flag','U1')],delimiter='|')

b = np.genfromtxt(BytesIO(' |23.0|17.91|0'.encode()),dtype=[('id','U7'),('pos',[('x',float),('y',float)]),('flag','U1')],delimiter='|')
 
which gives  
In[156]: a
Out[154]: 
array([('12 23 0', (23.2, 17.9), '0'), ('12 23 1', (13.4, 16.9), '0')], 
      dtype=[('id', '<U7'), ('pos', [('x', '<f8'), ('y', '<f8')]), ('flag', '<U1')])
 
and  
In[153]: b
Out[151]: 
array([('', (23.0, 17.91), '0')], 
      dtype=[('id', '<U7'), ('pos', [('x', '<f8'), ('y', '<f8')]), ('flag', '<U1')])
 
Now let's say that I want to find any entries in a whose a['pos']['x'] field is greater than the b['pos']['x'] field and return these entries in a new numpy array. Something like this would work:
newArr = a[a["pos"]["x"]>b["pos"]["x"]]
 
Now imagine we want to keep only the entries in a where both the x and y fields are greater than their counterparts in b. This is fairly simple, as we could again do
newArr = a[np.array([a['pos']['x'] > b['pos']['x'], a['pos']['y'] > b['pos']['y']]).all(axis=0)]
 
which returns an empty array, which is the correct answer.
Now, however, imagine that we have a very complicated dtype for these arrays (say with 34 fields; see here for an example of the dtype I'm working with) and we want to be able to compare any of the fields, but likely not all of them (similar to the previous example, but with more dtype fields overall and more fields to compare). Further, what if the fields we want to compare can change from run to run, so that we can't really hard-code the comparisons the way I did above? That is the problem I am trying to solve.
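One way to avoid hard-coding the field names is to pass them in as data. The following is a minimal sketch, not a definitive implementation: the helpers get_field and rows_greater are hypothetical names introduced here, and the arrays are rebuilt with np.array instead of np.genfromtxt purely to keep the example self-contained. Each field is described by a tuple of names that is followed into the nested dtype, and the per-field results are combined with np.logical_and.reduce.

```python
from functools import reduce

import numpy as np

dtype = [('id', 'U7'), ('pos', [('x', float), ('y', float)]), ('flag', 'U1')]
a = np.array([('12 23 0', (23.2, 17.9), '0'),
              ('12 23 1', (13.4, 16.9), '0')], dtype=dtype)
b = np.array([('', (23.0, 17.91), '0')], dtype=dtype)


def get_field(arr, path):
    """Follow a sequence of field names into a (possibly nested) structured array."""
    return reduce(lambda sub, name: sub[name], path, arr)


def rows_greater(a, b, paths):
    """Boolean mask of the rows of `a` whose listed fields all exceed those of `b`.

    `paths` must contain at least one field path.
    """
    checks = [get_field(a, p) > get_field(b, p) for p in paths]
    return np.logical_and.reduce(checks)


# The list of fields can now be chosen at run time.
mask_x = rows_greater(a, b, [('pos', 'x')])
mask_xy = rows_greater(a, b, [('pos', 'x'), ('pos', 'y')])

print(a[mask_x])   # rows where x is greater
print(a[mask_xy])  # rows where both x and y are greater (none for this data)
```

Because every comparison is done on an ordinary float field, b (shape (1,)) broadcasts against a in the usual way and no ordering of whole structured records is needed.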
My current (unfinished) attempts at solutions 
Using masked arrays 
My first thought to solving this problem was to use masked arrays to select the data type fields that we want to compare. Something like this (assuming we can make all our comparisons the same): 
mask = np.ones(z.shape,dtype=[('id',bool),('pos',[('x',bool),('y',bool)]),('flag',bool)])
# unmask the x and y fields so we can compare them 
mask['pos']['x']=0
mask['pos']['y']=0

maskedA = np.ma.masked_array(a, mask=mask)
# We need to do this or the masked array gets angry (at least in python 3)
b.shape = (1,)

maskedB = np.ma.masked_array(b, mask=mask)
 
Now I would want to do something like 
test = (maskedA>maskedB).any(axis=1)
 
but this doesn't work because you can't compare structured arrays like this:
TypeError: unorderable types: MaskedArray() > MaskedArray()
 
I've also tried compressing the masked arrays  
test = (maskedA.compressed()>maskedB.compressed()).any(axis=1)
 
which results in a different error 
TypeError: ufunc 'logical_not' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
 
Now, I realize that the above errors are likely because I don't fully understand how structured and masked arrays work, but that is partially why I am asking this question. Is there any way to do something like this using masked arrays?
The solution I just thought of that will probably work and is probably better overall... 
The other option, which I just thought of while writing this up, is to do the comparisons while parsing the user's input to form array b anyway. It would really just mean adding a couple of lines to each conditional in the parser to do the comparison and combine the results into a numpy boolean array, which I could then use to extract the proper entries from a. Now that I think about it, this is probably the way to go.
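The parser-based idea could look roughly like the sketch below. It assumes the parser yields one (field path, comparison function, value) triple per user condition; that triple format, the conditions list, and the variable names are hypothetical and only illustrate the accumulation step. The array a is rebuilt with np.array so the example runs on its own.

```python
import operator

import numpy as np

dtype = [('id', 'U7'), ('pos', [('x', float), ('y', float)]), ('flag', 'U1')]
a = np.array([('12 23 0', (23.2, 17.9), '0'),
              ('12 23 1', (13.4, 16.9), '0')], dtype=dtype)

# Hypothetical output of the input parser: one entry per user condition.
conditions = [
    (('pos', 'x'), operator.gt, 23.0),   # pos.x > 23.0
    (('pos', 'y'), operator.lt, 17.91),  # pos.y < 17.91
]

# Start by keeping every row, then narrow the selection per condition.
keep = np.ones(a.shape, dtype=bool)
for path, compare, value in conditions:
    column = a
    for name in path:        # walk into nested fields such as pos -> x
        column = column[name]
    keep &= compare(column, value)

selected = a[keep]
print(selected)
```

Using functions from the operator module lets each condition carry its own comparison, so the conditions do not all have to be "greater than".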
The conclusion to my long and rambling problem. 
Although I think I have found a solution to this problem, I am still going to post this question, at least for a little while, to see (a) if anyone has any ideas about how to do logical comparisons with structured/masked numpy arrays, because I think it would be a useful thing to know, and (b) if anyone has a better idea than what I came up with. Note that you can very easily form a MWE by copying, line by line, the snippets in the "The long problem with examples" section, so I don't see any reason to take up more space by repeating them.

Source: https://stackoverflow.com/questions/34421249

Updated: 2022-09-28 13:09

Best answer

You can use sequence(first:next:) to compute powers of a by repeated multiplication, limit the (lazily evaluated) sequence with prefix(_:) to the desired number of entries, and then create an array from the truncated sequence. Example: 
let a = 0.5   // The base
let n = 4     // The maximal exponent

let series = Array(sequence(first: a, next: { $0 * a }).prefix(n))
print(series) // [0.5, 0.25, 0.125, 0.0625]
 
Another option can be to enumerate the sequence without creating an actual array: 
for x in sequence(first: a, next: { $0 * a }).prefix(n) {
    // do something with `x`
}
