Comparing numpy structured arrays
The quick problem
I would like to be able to compare specific dtype fields from two numpy structured arrays that are guaranteed to have the same dtype. I would like to do this in a way that allows the fields being compared to differ each time a function is called, based on the given inputs (i.e., I can't easily hard code the comparisons for each individual field).
The long problem with examples
I am trying to compare specific fields from two numpy structured arrays with the same dtype. For instance, say we have
import numpy as np
from io import BytesIO

a = np.genfromtxt(BytesIO('12 23 0|23.2|17.9|0\n12 23 1|13.4|16.9|0'.encode()),
                  dtype=[('id','U7'),('pos',[('x',float),('y',float)]),('flag','U1')],
                  delimiter='|')
b = np.genfromtxt(BytesIO(' |23.0|17.91|0'.encode()),
                  dtype=[('id','U7'),('pos',[('x',float),('y',float)]),('flag','U1')],
                  delimiter='|')
which gives
In[156]: a
Out[154]:
array([('12 23 0', (23.2, 17.9), '0'), ('12 23 1', (13.4, 16.9), '0')],
      dtype=[('id', '<U7'), ('pos', [('x', '<f8'), ('y', '<f8')]), ('flag', '<U1')])
and
In[153]: b
Out[151]:
array([('', (23.0, 17.91), '0')],
      dtype=[('id', '<U7'), ('pos', [('x', '<f8'), ('y', '<f8')]), ('flag', '<U1')])
Now let's say that I want to find any entries in a whose a['pos']['x'] field is greater than the b['pos']['x'] field, and return these entries in a new numpy array. Something like this would work:
newArr = a[a["pos"]["x"] > b["pos"]["x"]]
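A self-contained version of this single-field selection is sketched below. It builds a and b with np.array instead of genfromtxt, which is only an assumption to avoid I/O and encoding details. The dtype and values are copied from the example above.

```python
import numpy as np

# Same dtype and values as the genfromtxt example, built directly.
dt = [('id', 'U7'), ('pos', [('x', float), ('y', float)]), ('flag', 'U1')]
a = np.array([('12 23 0', (23.2, 17.9), '0'),
              ('12 23 1', (13.4, 16.9), '0')], dtype=dt)
b = np.array([('', (23.0, 17.91), '0')], dtype=dt)

# Indexing a nested field yields a plain float array, so '>' works
# element-wise; b's single row broadcasts against every row of a.
keep = a['pos']['x'] > b['pos']['x']
newArr = a[keep]
print(keep)          # [ True False]
print(newArr['id'])  # ['12 23 0']
```

The boolean mask, not the structured array itself, is what gets compared and used for indexing. The ordering operator is only ever applied to ordinary float arrays.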
Now imagine we want to keep only entries in a where both the x and y fields are greater than their counterparts in b. This is fairly simple, as we could again do
newArr = a[np.array([a['pos']['x'] > b['pos']['x'], a['pos']['y'] > b['pos']['y']]).all(axis=0)]
which returns an empty array, which is the correct answer.
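Stacking the masks and calling .all(axis=0) is equivalent to an element-wise AND of the individual masks. The sketch below makes that explicit, again on arrays built directly with the question's dtype and values.

```python
import numpy as np

dt = [('id', 'U7'), ('pos', [('x', float), ('y', float)]), ('flag', 'U1')]
a = np.array([('12 23 0', (23.2, 17.9), '0'),
              ('12 23 1', (13.4, 16.9), '0')], dtype=dt)
b = np.array([('', (23.0, 17.91), '0')], dtype=dt)

gt_x = a['pos']['x'] > b['pos']['x']  # [ True False]
gt_y = a['pos']['y'] > b['pos']['y']  # [False False]

# Element-wise AND of the two masks; same result as
# np.array([gt_x, gt_y]).all(axis=0).
keep = gt_x & gt_y
newArr = a[keep]
print(newArr.size)  # 0
```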
Now, however, imagine that we have a very complicated dtype for these arrays (say, with 34 fields; see here for an example of the dtype I'm working with), and we want to be able to compare any of them, but likely not all of them. This is similar to the previous example, but with more dtype fields overall and more of them that we want to compare. Further, what if the fields we want to compare can change from run to run, so that we can't really hard code the comparison the way I did above? That is the problem I am trying to find a solution to.
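One way to avoid hard coding is sketched below, under the assumption that every selected field uses the same comparison (>), as in the examples above. Each field is described as a path of names, for example ('pos', 'x'), and the per-field masks are folded together with np.logical_and.reduce. The helper names get_field and select_greater are hypothetical, not part of numpy.

```python
from functools import reduce

import numpy as np


def get_field(arr, path):
    """Follow a nested field path, e.g. ('pos', 'x') -> arr['pos']['x']."""
    return reduce(lambda sub, name: sub[name], path, arr)


def select_greater(a, b, paths):
    """Rows of `a` where every field in `paths` is greater than in `b`."""
    if not paths:
        return a  # nothing to compare, keep every row
    masks = [get_field(a, p) > get_field(b, p) for p in paths]
    # Collapse the per-field masks into one: True only where all comparisons hold.
    return a[np.logical_and.reduce(masks)]


dt = [('id', 'U7'), ('pos', [('x', float), ('y', float)]), ('flag', 'U1')]
a = np.array([('12 23 0', (23.2, 17.9), '0'),
              ('12 23 1', (13.4, 16.9), '0')], dtype=dt)
b = np.array([('', (23.0, 17.91), '0')], dtype=dt)

# The list of paths could be assembled from user input at run time.
print(select_greater(a, b, [('pos', 'x')])['id'])               # ['12 23 0']
print(select_greater(a, b, [('pos', 'x'), ('pos', 'y')]).size)  # 0
```

Top-level fields use a one-element path such as ('flag',). Only the list of paths changes between runs; the comparison code stays the same.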
My current (unfinished) attempts at solutions
Using masked arrays
My first thought for solving this problem was to use masked arrays to select the dtype fields that we want to compare. Something like this (assuming we can make all our comparisons the same):
mask = np.ones(a.shape, dtype=[('id',bool),('pos',[('x',bool),('y',bool)]),('flag',bool)])
# unmask the x and y fields so we can compare them
mask['pos']['x'] = 0
mask['pos']['y'] = 0
maskedA = np.ma.masked_array(a, mask=mask)
# We need to do this or the masked array gets angry (at least in python 3)
b.shape = (1,)
# b has a single row, so it gets a single-row slice of the mask
maskedB = np.ma.masked_array(b, mask=mask[:1])
Now I would want to do something like
test = (maskedA>maskedB).any(axis=1)
but this doesn't work, because you can't compare structured arrays like this:
TypeError: unorderable types: MaskedArray() > MaskedArray()
I've also tried compressing the masked arrays
test = (maskedA.compressed()>maskedB.compressed()).any(axis=1)
which results in a different error
TypeError: ufunc 'logical_not' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
Now, I realize that the above errors are likely because I don't fully understand how structured and masked arrays work but that is partially why I am asking this question. Is there any way to do something like this using masked arrays?
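One possible route is sketched below under stated assumptions; it is not a way to make > work on structured dtypes directly. Ordering operators are not defined for whole structured records, so the sketch first converts the compared sub-record into a plain float array with numpy.lib.recfunctions.structured_to_unstructured (available in NumPy 1.16 and later). The mask is then applied to columns rather than to fields.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

dt = [('id', 'U7'), ('pos', [('x', float), ('y', float)]), ('flag', 'U1')]
a = np.array([('12 23 0', (23.2, 17.9), '0'),
              ('12 23 1', (13.4, 16.9), '0')], dtype=dt)
b = np.array([('', (23.0, 17.91), '0')], dtype=dt)

# Turn the nested 'pos' record into a plain (n, 2) float array: columns [x, y].
ua = rfn.structured_to_unstructured(a['pos'])
ub = rfn.structured_to_unstructured(b['pos'])

# Mask the columns to ignore (True = masked); here only x is compared.
mask = np.zeros(ua.shape, dtype=bool)
mask[:, 1] = True
maskedA = np.ma.masked_array(ua, mask=mask)

# MaskedArray.all treats masked entries as True, so only unmasked columns
# decide each row. A fully masked row would come back masked, and
# filled(False) turns that into "do not keep".
keep = (maskedA > ub).all(axis=1).filled(False)
print(a[keep]['id'])  # ['12 23 0']
```

This only works when the compared fields share a numeric type that can be cast to one float array. String fields such as id or flag would need to be compared separately.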
The solution I just thought of that will probably work and is probably better overall...
So the other option that I just thought of while writing this up is to do the comparisons while I am parsing the user's input to form array b anyway. It would really just mean adding a couple of lines to each conditional in the parser to do the comparison and tack the results onto a numpy boolean array, which I could then use to extract the proper entries from a. Now that I think about it, this is probably the way to go.
The conclusion to my long and rambling problem
Despite the fact that I think I found a solution to this problem, I am still going to post this question, at least for a little while. I want to see (a) whether anyone has ideas about how to do logical comparisons with structured/masked numpy arrays, because I think that would be useful to know, and (b) whether anyone has a better idea than what I came up with. Note that you can very easily form an MWE by copying, line by line, the snippets in the "The long problem with examples" section, so I don't see any reason to take up more space by doing this.
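The parse-time idea from "The solution I just thought of..." could look roughly like the sketch below. The parser output format is purely hypothetical: (dotted field path, operator text, value) tuples such as ('pos.x', '>', 23.0). So are the OPS table and the build_mask name. Each parsed condition is compared against a and ANDed onto one running boolean mask.

```python
import operator
from functools import reduce

import numpy as np

# Hypothetical mapping from user-supplied operator text to a comparison.
OPS = {'>': operator.gt, '>=': operator.ge, '<': operator.lt,
       '<=': operator.le, '==': operator.eq, '!=': operator.ne}


def build_mask(a, conditions):
    """AND together one comparison per parsed (path, op, value) condition."""
    keep = np.ones(a.shape, dtype=bool)
    for path, op, value in conditions:
        field = reduce(lambda sub, name: sub[name], path.split('.'), a)
        keep &= OPS[op](field, value)  # tack this condition onto the mask
    return keep


dt = [('id', 'U7'), ('pos', [('x', float), ('y', float)]), ('flag', 'U1')]
a = np.array([('12 23 0', (23.2, 17.9), '0'),
              ('12 23 1', (13.4, 16.9), '0')], dtype=dt)

print(a[build_mask(a, [('pos.x', '>', 23.0), ('pos.y', '>', 17.91)])].size)  # 0
print(a[build_mask(a, [('pos.x', '>', 23.0)])]['id'])  # ['12 23 0']
```

Because each condition is evaluated as soon as it is parsed, the fields involved can differ on every run. This variant compares against the parsed values directly, so b never has to be built just for the comparison.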
Original: https://stackoverflow.com/questions/34421249
Best answer
You can use sequence(first:next:) to compute powers of a by repeated multiplication, limit the (lazily evaluated) sequence with prefix(_:) to the desired number of entries, and then create an array from the truncated sequence. Example:
let a = 0.5 // The base
let n = 4   // The maximal exponent
let series = Array(sequence(first: a, next: { $0 * a }).prefix(n))
print(series) // [0.5, 0.25, 0.125, 0.0625]
Another option is to enumerate the sequence without creating an actual array:
for x in sequence(first: a, next: { $0 * a }).prefix(n) {
    // do something with `x`
}