Do thread creation operations imply happens-before relationships?
I know that locks can ensure happens-before relationships among threads. Does a thread creation operation itself imply a happens-before relationship? In other words, in the code below, can we be sure that the output at #2 is 1? Does this code have a data race?

```cpp
#include <iostream>
#include <thread>
using namespace std;

void func(int *ptr) {
    cout << *ptr << endl; // #2
}

int main() {
    int data = 1; // #1
    thread t(func, &data);
    t.join();
    return 0;
}
```
Source: https://stackoverflow.com/questions/49460385
Best answer
We can leverage NumPy broadcasting for a vectorized solution by simply comparing the start and end indices against a ranged array covering the number of columns. This gives us a mask of all the places in the output array that need to be assigned 1s. So, the solution would be something like this -

```python
ncols = z.shape[1]
r = np.arange(ncols)
mask = (index[:,0,None] <= r) & (index[:,1,None] >= r)
z[mask] = 1
```
Sample run -

```python
In [39]: index = np.array([[1,2],[2,4],[1,5],[5,6]])
    ...: z = np.zeros(shape=[4,10], dtype=np.float32)

In [40]: ncols = z.shape[1]
    ...: r = np.arange(z.shape[1])
    ...: mask = (index[:,0,None] <= r) & (index[:,1,None] >= r)
    ...: z[mask] = 1

In [41]: z
Out[41]:
array([[0., 1., 1., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 1., 1., 1., 0., 0., 0., 0., 0.],
       [0., 1., 1., 1., 1., 1., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 1., 1., 0., 0., 0.]], dtype=float32)
```
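To see why the comparison broadcasts correctly: `index[:,0,None]` has shape `(n, 1)` while `r` has shape `(ncols,)`, so NumPy broadcasts the comparison to `(n, ncols)`. A minimal self-contained check (reusing the sample `index` from above):

```python
import numpy as np

index = np.array([[1,2],[2,4],[1,5],[5,6]])
r = np.arange(10)

# (4,1) start/end columns vs (10,) range -> broadcast to (4,10)
starts = index[:,0,None]   # shape (4, 1)
ends = index[:,1,None]     # shape (4, 1)
mask = (starts <= r) & (ends >= r)

print(mask.shape)          # (4, 10)
print(mask[0].astype(int)) # row 0 covers columns 1..2 inclusive
```

Note that both endpoints are inclusive, which matches the sample output where row `[1,2]` sets columns 1 and 2.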
If z is always a zeros-initialized array, we can get the output directly from mask -

```python
z = mask.astype(int)
```

Sample run -

```python
In [37]: mask.astype(int)
Out[37]:
array([[0, 1, 1, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 1, 1, 1, 0, 0, 0, 0, 0],
       [0, 1, 1, 1, 1, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 1, 1, 0, 0, 0]])
```
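A quick sanity check (using the same sample `index` as above) that the direct `mask.astype(int)` route matches the in-place assignment:

```python
import numpy as np

index = np.array([[1,2],[2,4],[1,5],[5,6]])
r = np.arange(10)
mask = (index[:,0,None] <= r) & (index[:,1,None] >= r)

# route 1: assign into a zeros array
z = np.zeros((4, 10), dtype=np.float32)
z[mask] = 1

# route 2: cast the mask directly
z2 = mask.astype(int)

print(np.array_equal(z, z2))  # True
```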
Benchmarking

Comparing @hpaulj's foo0 and my foo4, as listed in @hpaulj's post, on sets with a variable number of columns. We start with 10 columns, as that is how the input sample was listed, while giving it a bigger number of rows, and then increase the number of columns up to 1000. Here are the timings -
```python
In [14]: ncols = 10
    ...: index = np.random.randint(0,ncols,(10000,2))
    ...: z = np.zeros(shape=[len(index),ncols], dtype=np.float32)

In [15]: %timeit foo0(z,index)
    ...: %timeit foo4(z,index)
100 loops, best of 3: 6.27 ms per loop
1000 loops, best of 3: 594 µs per loop

In [16]: ncols = 100
    ...: index = np.random.randint(0,ncols,(10000,2))
    ...: z = np.zeros(shape=[len(index),ncols], dtype=np.float32)

In [17]: %timeit foo0(z,index)
    ...: %timeit foo4(z,index)
100 loops, best of 3: 6.49 ms per loop
100 loops, best of 3: 2.74 ms per loop

In [38]: ncols = 300
    ...: index = np.random.randint(0,ncols,(1000,2))
    ...: z = np.zeros(shape=[len(index),ncols], dtype=np.float32)

In [39]: %timeit foo0(z,index)
    ...: %timeit foo4(z,index)
1000 loops, best of 3: 657 µs per loop
1000 loops, best of 3: 600 µs per loop

In [40]: ncols = 1000
    ...: index = np.random.randint(0,ncols,(1000,2))
    ...: z = np.zeros(shape=[len(index),ncols], dtype=np.float32)

In [41]: %timeit foo0(z,index)
    ...: %timeit foo4(z,index)
1000 loops, best of 3: 673 µs per loop
1000 loops, best of 3: 1.78 ms per loop
```
Thus, the best choice between the loopy approach and the broadcasting-based vectorized one depends on the number of columns in the problem set.
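foo0 and foo4 are defined in @hpaulj's post, which is not reproduced here. As a hypothetical illustration of the trade-off being measured, a loopy baseline and the broadcasting version might look like this (the names are from the benchmark above, but the loopy body is an assumption, not the original code):

```python
import numpy as np

def foo0(z, index):
    # assumed loopy baseline: one inclusive slice assignment per row
    for i, (start, end) in enumerate(index):
        z[i, start:end+1] = 1
    return z

def foo4(z, index):
    # broadcasting version from this answer
    r = np.arange(z.shape[1])
    mask = (index[:,0,None] <= r) & (index[:,1,None] >= r)
    z[mask] = 1
    return z

index = np.array([[1,2],[2,4],[1,5],[5,6]])
a = foo0(np.zeros((4,10), dtype=np.float32), index)
b = foo4(np.zeros((4,10), dtype=np.float32), index)
print(np.array_equal(a, b))  # True
```

The loop pays a per-row Python overhead, so it loses badly at many rows and few columns; the broadcast builds an (nrows, ncols) mask, so its temporary-array cost grows with the column count, which matches the crossover seen in the timings.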