My OpenCL code changes the output based on a seemingly noop

I'm running the same OpenCL kernel code on an Intel CPU and on an NVIDIA GPU, and the results are wrong on the former but right on the latter; the strange thing is that if I make some seemingly irrelevant changes, the output is as expected in both cases.

The goal of the function is to calculate the matrix multiplication between A (triangular) and B (regular), where the position of A in the operation is determined by the value of the variable left. The bug only appears when left is true and when the for loop iterates at least twice.

Here is a fragment of the code; for the sake of clarity it omits some bits that shouldn't affect the result.

__kernel void blas_strmm(int left, int upper, int nota, int unit, int row, int dim, int m, int n,
                         float alpha, __global const float *a, __global const float *b, __global float *c) {

  /* [...] */
  int ty = get_local_id(1);
  int y = ty + BLOCK_SIZE * get_group_id(1);
  int by = y;
  __local float Bs[BLOCK_SIZE][BLOCK_SIZE];
  /* [...] */

  for(int i=start; i<end; i+=BLOCK_SIZE) {
    if(left) {
      ay = i+ty;
      bx = i+tx;
    }   
    else {
      ax = i+tx;
      by = i+ty;
    }   

    barrier(CLK_LOCAL_MEM_FENCE);
    /* [...] (Load As) */
    if(bx >= m || by >= n)
      Bs[tx][ty] = 0;
    else
      Bs[tx][ty] = b[bx*n+by];
    barrier(CLK_LOCAL_MEM_FENCE);

    /* [...] (Calculate Csub) */
  }

  if(y < n && x < (left ? row : m)) // In bounds
    c[x*n+y] = alpha*Csub;
}

Now it gets weird.

As you can see, by always equals y if left is true. I checked (with some printfs, mind you) and left is always true, and the code on the else branch inside the loop is never executed. Nevertheless, if I remove or comment out the by = i+ty line there, the code works. Why? I don't know yet, but I thought it might be related to by not having the expected value assigned.

My train of thought led me to check whether there was ever a discrepancy between by and y, as they should always have the same value; I added a line that checked if by != y, but that comparison always returned false, as expected. So I went on and replaced the occurrence of by with y, so the line

if(bx >= m || by >= n)

transformed into

if(bx >= m || y >= n)

and it worked again, even though the variable by is still used, unchanged, three lines below.

With an open mind I tried some other things and found that the code works if I add the following line inside the loop, as long as it sits anywhere after the initial if/else and before the if condition mentioned above.

if(y >= n) left = 1;

The statement inside (left = 1) can be replaced with anything (a printf, another useless assignment, etc.), but the condition is more restrictive. Here are some examples that make the code output the correct values:

if(y >= n) left = 1;
if(y < n) left = 1;
if(y+1 < n+1) left = 1;
if(n > y) left = 1;

And some that don't work (note that m = n in the particular example I'm testing):

if(y >= n+1) left = 1;
if(y > n) left = 1;
if(y >= m) left = 1;
/* etc. */

That's where I am now. I have added a line that shouldn't affect the program at all, yet it makes it work. This magic solution is not satisfactory to me, and I would like to know what's happening inside my CPU and why.

Just to be sure I'm not forgetting anything, here is the full function code and a gist with example inputs and outputs.

Thank you very much.


Solution

Both users DarkZeros and sharpneli were right in their assumptions: the barriers inside the for loop weren't being hit the same number of times by every work-item. In particular, there was a bug involving the very first element of each local group that made it run one iteration fewer than the rest, causing undefined behaviour. It was painfully obvious in hindsight.

Thank you all for your answers and time.
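
The barrier mismatch described in the solution can be illustrated on the host with plain C. This is only a sketch under stated assumptions: the real start and end expressions were elided from the fragment above, so buggy_end and uniform_end below are hypothetical stand-ins that merely reproduce the symptom (the first work-item of a group running one iteration fewer). A barrier inside a loop must be reached by every work-item of the work-group on every iteration, so the trip count has to be identical across the group.

```c
#include <assert.h>
#include <stdbool.h>

#define BLOCK_SIZE 4

/* Number of times one work-item executes the loop body, and therefore
   the number of times it reaches each barrier placed inside the loop. */
static int trip_count(int start, int end) {
    int count = 0;
    for (int i = start; i < end; i += BLOCK_SIZE)
        ++count;
    return count;
}

/* HYPOTHETICAL buggy bound: local id 0 stops one block earlier than the
   other work-items. The original expression is not shown in the post. */
static int buggy_end(int ty, int dim) {
    return ty == 0 ? dim - BLOCK_SIZE : dim;
}

/* HYPOTHETICAL fixed bound: identical for every work-item in the group. */
static int uniform_end(int ty, int dim) {
    (void)ty; /* deliberately independent of the local id */
    return dim;
}

/* True when every work-item of a group would hit the barrier the same
   number of times for the given loop-bound function. */
static bool group_is_uniform(int (*end_of)(int, int), int dim) {
    assert(end_of);
    int expected = trip_count(0, end_of(0, dim));
    for (int ty = 1; ty < BLOCK_SIZE; ++ty)
        if (trip_count(0, end_of(ty, dim)) != expected)
            return false;
    return true;
}
```

With dim = 8, group_is_uniform(buggy_end, 8) is false, because local id 0 iterates once while the others iterate twice, whereas group_is_uniform(uniform_end, 8) is true. In a kernel, the usual way to get the second behaviour is to keep the loop bounds uniform across the group and guard only the loads and stores with per-work-item conditions, as the fragment already does for Bs.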


Original: https://stackoverflow.com/questions/19766922
Updated: 2022-04-14 14:04

Best answer

"payload.getName()"

As the error says, there is no property name on String.

If you have a String payload, you need to supply the file name some other way.


After some research, I was able to fix this. I sent the filename as FILE_NAME header while sending the message.

this.toSftpChannel.send(MessageBuilder.withPayload(fileContent).setHeader("FILE_NAME", filename)
                    .build());

In the flow I used

.fileNameExpression("headers.get('FILE_NAME')") 

to get the sent name and it worked perfectly.
