Do thread creation operations imply happens-before relationships?
I know that locks can ensure happens-before relationships among threads. Does a thread creation operation itself imply a happens-before relationship? In other words, in the code below, can we be sure that the output at #2 is 1? Does this code have a data race?

```cpp
#include <iostream>
#include <thread>
using namespace std;

void func(int *ptr) {
    cout << *ptr << endl; // #2
}

int main() {
    int data = 1; // #1
    thread t(func, &data);
    t.join();
    return 0;
}
```
Source: https://stackoverflow.com/questions/49460385
Best answer
We can leverage NumPy broadcasting for a vectorized solution by simply comparing the start and end indices against a ranged array covering the number of columns. This gives us a mask of all the places in the output array that need to be assigned 1s. So, the solution would be something like this -

```python
ncols = z.shape[1]
r = np.arange(ncols)
mask = (index[:,0,None] <= r) & (index[:,1,None] >= r)
z[mask] = 1
```
Sample run -

```python
In [39]: index = np.array([[1,2],[2,4],[1,5],[5,6]])
    ...: z = np.zeros(shape=[4,10], dtype=np.float32)

In [40]: ncols = z.shape[1]
    ...: r = np.arange(z.shape[1])
    ...: mask = (index[:,0,None] <= r) & (index[:,1,None] >= r)
    ...: z[mask] = 1

In [41]: z
Out[41]:
array([[0., 1., 1., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 1., 1., 1., 0., 0., 0., 0., 0.],
       [0., 1., 1., 1., 1., 1., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 1., 1., 0., 0., 0.]], dtype=float32)
```
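To see why the comparison broadcasts correctly: `index[:,0,None]` has shape `(n, 1)` while `r` has shape `(ncols,)`, so NumPy broadcasts the comparison to `(n, ncols)`. A minimal self-contained check (reusing the sample `index` from above):

```python
import numpy as np

index = np.array([[1,2],[2,4],[1,5],[5,6]])
r = np.arange(10)

# (4,1) start/end columns vs (10,) range -> broadcast to (4,10)
starts = index[:,0,None]   # shape (4, 1)
ends = index[:,1,None]     # shape (4, 1)
mask = (starts <= r) & (ends >= r)

print(mask.shape)          # (4, 10)
print(mask[0].astype(int)) # row 0 covers columns 1..2 inclusive
```

Note that both endpoints are inclusive, which matches the sample output where row `[1,2]` sets columns 1 and 2.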
If z is always a zeros-initialized array, we can get the output directly from mask -

```python
z = mask.astype(int)
```

Sample run -

```python
In [37]: mask.astype(int)
Out[37]:
array([[0, 1, 1, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 1, 1, 1, 0, 0, 0, 0, 0],
       [0, 1, 1, 1, 1, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 1, 1, 0, 0, 0]])
```
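A quick sanity check (using the same sample `index` as above) that the direct `mask.astype(int)` route matches the in-place assignment:

```python
import numpy as np

index = np.array([[1,2],[2,4],[1,5],[5,6]])
r = np.arange(10)
mask = (index[:,0,None] <= r) & (index[:,1,None] >= r)

# route 1: assign into a zeros array
z = np.zeros((4, 10), dtype=np.float32)
z[mask] = 1

# route 2: cast the mask directly
z2 = mask.astype(int)

print(np.array_equal(z, z2))  # True
```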
Benchmarking

Comparing @hpaulj's foo0 and my foo4, as listed in @hpaulj's post, on sets with a variable number of columns. We start with 10 columns, as that is how the input sample was listed, while giving it a bigger number of rows, and then increase the number of columns up to 1000. Here are the timings -
```python
In [14]: ncols = 10
    ...: index = np.random.randint(0,ncols,(10000,2))
    ...: z = np.zeros(shape=[len(index),ncols], dtype=np.float32)

In [15]: %timeit foo0(z,index)
    ...: %timeit foo4(z,index)
100 loops, best of 3: 6.27 ms per loop
1000 loops, best of 3: 594 µs per loop

In [16]: ncols = 100
    ...: index = np.random.randint(0,ncols,(10000,2))
    ...: z = np.zeros(shape=[len(index),ncols], dtype=np.float32)

In [17]: %timeit foo0(z,index)
    ...: %timeit foo4(z,index)
100 loops, best of 3: 6.49 ms per loop
100 loops, best of 3: 2.74 ms per loop

In [38]: ncols = 300
    ...: index = np.random.randint(0,ncols,(1000,2))
    ...: z = np.zeros(shape=[len(index),ncols], dtype=np.float32)

In [39]: %timeit foo0(z,index)
    ...: %timeit foo4(z,index)
1000 loops, best of 3: 657 µs per loop
1000 loops, best of 3: 600 µs per loop

In [40]: ncols = 1000
    ...: index = np.random.randint(0,ncols,(1000,2))
    ...: z = np.zeros(shape=[len(index),ncols], dtype=np.float32)

In [41]: %timeit foo0(z,index)
    ...: %timeit foo4(z,index)
1000 loops, best of 3: 673 µs per loop
1000 loops, best of 3: 1.78 ms per loop
```
Thus, the best choice between the loopy approach and the broadcasting-based vectorized one depends on the number of columns in the problem set.
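foo0 and foo4 are defined in @hpaulj's post, which is not reproduced here. As a hypothetical illustration of the trade-off being measured, a loopy baseline and the broadcasting version might look like this (the names are from the benchmark above, but the loopy body is an assumption, not the original code):

```python
import numpy as np

def foo0(z, index):
    # assumed loopy baseline: one inclusive slice assignment per row
    for i, (start, end) in enumerate(index):
        z[i, start:end+1] = 1
    return z

def foo4(z, index):
    # broadcasting version from this answer
    r = np.arange(z.shape[1])
    mask = (index[:,0,None] <= r) & (index[:,1,None] >= r)
    z[mask] = 1
    return z

index = np.array([[1,2],[2,4],[1,5],[5,6]])
a = foo0(np.zeros((4,10), dtype=np.float32), index)
b = foo4(np.zeros((4,10), dtype=np.float32), index)
print(np.array_equal(a, b))  # True
```

The loop pays a per-row Python overhead, so it loses badly at many rows and few columns; the broadcast builds an (nrows, ncols) mask, so its temporary-array cost grows with the column count, which matches the crossover seen in the timings.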