Comparing numpy structured arrays

The quick problem

I would like to be able to compare specific dtype fields from two numpy structured arrays that are guaranteed to have the same dtype. I would like to do this in a way that allows the fields being compared to differ each time a function is called, based on the given inputs (i.e. I can't easily hard-code the comparisons for each individual field).

The long problem with examples

I am trying to compare specific fields from two numpy structured arrays with the same dtype. For instance, say we have

import numpy as np
from io import BytesIO

a = np.genfromtxt(BytesIO('12 23 0|23.2|17.9|0\n12 23 1|13.4|16.9|0'.encode()),dtype=[('id','U7'),('pos',[('x',float),('y',float)]),('flag','U1')],delimiter='|')

b = np.genfromtxt(BytesIO(' |23.0|17.91|0'.encode()),dtype=[('id','U7'),('pos',[('x',float),('y',float)]),('flag','U1')],delimiter='|')

which gives

In[156]: a
Out[154]: 
array([('12 23 0', (23.2, 17.9), '0'), ('12 23 1', (13.4, 16.9), '0')], 
      dtype=[('id', '<U7'), ('pos', [('x', '<f8'), ('y', '<f8')]), ('flag', '<U1')])

and

In[153]: b
Out[151]: 
array([('', (23.0, 17.91), '0')], 
      dtype=[('id', '<U7'), ('pos', [('x', '<f8'), ('y', '<f8')]), ('flag', '<U1')])

Now let's say that I want to find any entries in a whose a['pos']['x'] field is greater than the b['pos']['x'] field and return these entries in a new numpy array. Something like this would work:

newArr = a[a["pos"]["x"]>b["pos"]["x"]]

Now imagine we want to keep only entries in a where both the x and y fields are greater than their counterparts in b. This is fairly simple as we could again do

newArr = a[np.array([a['pos']['x']>b['pos']['x'], a['pos']['y']>b['pos']['y']]).all(axis=0)]

which returns an empty array, which is the correct answer.

Now, however, imagine that we have a very complicated dtype for these arrays (say with 34 fields; see here for an example of the dtype I'm working with) and we want to be able to compare any of them, but likely not all of them. This is similar to the previous example, but with more dtype fields overall and more of them to compare. Further, what if the fields we want to compare can change from run to run, so that we can't really hard-code the comparison the way I did above? That is the problem I am trying to solve.
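One way to avoid hard-coding the fields is to pass them in as data. The following is only a minimal sketch under stated assumptions, not a definitive implementation: the helper names get_field and rows_where_all are hypothetical, each field is assumed to be given as a tuple of nested names such as ('pos', 'x'), and the arrays are rebuilt with np.array rather than genfromtxt so the snippet is self-contained.

```python
import operator
from functools import reduce

import numpy as np

dtype = [('id', 'U7'), ('pos', [('x', float), ('y', float)]), ('flag', 'U1')]
a = np.array([('12 23 0', (23.2, 17.9), '0'),
              ('12 23 1', (13.4, 16.9), '0')], dtype=dtype)
b = np.array([('', (23.0, 17.91), '0')], dtype=dtype)


def get_field(arr, path):
    """Follow a tuple of nested field names, e.g. ('pos', 'x') -> arr['pos']['x']."""
    return reduce(operator.getitem, path, arr)


def rows_where_all(a, b, paths, op=operator.gt):
    """Return the rows of `a` for which op(a[field], b[field]) holds for every path."""
    keep = np.ones(a.shape, dtype=bool)
    for path in paths:
        # b has a single row, so the comparison broadcasts against every row of a
        keep &= op(get_field(a, path), get_field(b, path))
    return a[keep]


# The list of paths can be built at run time from user input
only_x = rows_where_all(a, b, [('pos', 'x')])
both = rows_where_all(a, b, [('pos', 'x'), ('pos', 'y')])
print(only_x['id'])
print(len(both))
```

Because the paths are ordinary Python data, the same function covers one field or many without any change to the comparison code. Passing a different op (for example operator.lt) changes the comparison applied to all listed fields.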

My current (unfinished) attempts at solutions

Using masked arrays

My first thought for solving this problem was to use masked arrays to select the dtype fields that we want to compare. Something like this (assuming we can make all our comparisons the same):

mask = np.ones(z.shape,dtype=[('id',bool),('pos',[('x',bool),('y',bool)]),('flag',bool)])
# unmask the x and y fields so we can compare them 
mask['pos']['x']=0
mask['pos']['y']=0

maskedA = np.ma.masked_array(a, mask=mask)
# We need to do this or the masked array gets angry (at least in python 3)
b.shape = (1,)

maskedB = np.ma.masked_array(b, mask=mask)

Now I would want to do something like

test = (maskedA>maskedB).any(axis=1)

but this doesn't work because you can't compare structured arrays like this:

TypeError: unorderable types: MaskedArray() > MaskedArray()

I've also tried compressing the masked arrays

test = (maskedA.compressed()>maskedB.compressed()).any(axis=1)

which results in a different error

TypeError: ufunc 'logical_not' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

Now, I realize that the above errors are likely because I don't fully understand how structured and masked arrays work, but that is partially why I am asking this question. Is there any way to do something like this using masked arrays?

The solution I just thought of that will probably work and is probably better overall...

The other option, which I just thought of while writing this up, is to do the comparisons while parsing the user's input to form array b anyway. It would really just mean adding a couple of lines to each conditional in the parser to do the comparison and accumulate the results in a numpy boolean array that I could then use to extract the proper entries from a. Now that I think about it, this is probably the way to go.
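The parse-time idea might look roughly like the sketch below. It is an illustration only: the conditions list stands in for whatever the parser would produce (its layout of field path, operator, and value is an assumption), and the array a is rebuilt with np.array so the snippet runs on its own.

```python
import operator
from functools import reduce

import numpy as np

dtype = [('id', 'U7'), ('pos', [('x', float), ('y', float)]), ('flag', 'U1')]
a = np.array([('12 23 0', (23.2, 17.9), '0'),
              ('12 23 1', (13.4, 16.9), '0')], dtype=dtype)

# Hypothetical parser output: one (field path, operator, value) triple per condition
conditions = [
    (('pos', 'x'), operator.gt, 13.0),
    (('pos', 'y'), operator.lt, 17.5),
]

# Start with every row selected and narrow the selection one condition at a time
keep = np.ones(a.shape, dtype=bool)
for path, op, value in conditions:
    column = reduce(operator.getitem, path, a)  # e.g. a['pos']['x']
    keep &= op(column, value)

selected = a[keep]
print(selected['id'])
```

Unlike a single shared comparison, each condition carries its own operator, so mixed tests such as "x greater than" and "y less than" fit in the same loop.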

The conclusion to my long and rambling problem

Despite the fact that I think I found a solution to this problem, I am still going to post this question, at least for a little while, to see if (a) anyone has any ideas about how to do logical comparisons with structured/masked numpy arrays, because I think it would be a useful thing to know, and (b) anyone has a better idea than what I came up with. Note that you can very easily form a MWE by copying, line by line, the snippets in the "The long problem with examples" section, so I don't see any reason to take up more space by repeating them.


Source: https://stackoverflow.com/questions/34421249
Updated: 2022-09-28 13:09

Best answer

You can use sequence(first:next:) to compute powers of a by repeated multiplication, limit the (lazily evaluated) sequence with prefix(_:) to the desired number of entries, and then create an array from the truncated sequence. Example:

let a = 0.5   // The base
let n = 4     // The maximal exponent

let series = Array(sequence(first: a, next: { $0 * a }).prefix(n))
print(series) // [0.5, 0.25, 0.125, 0.0625]

Another option can be to enumerate the sequence without creating an actual array:

for x in sequence(first: a, next: { $0 * a }).prefix(n) {
    // do something with `x`
}
