Strange performance behaviour with SSAO algorithm using OpenGL and GLSL

I'm working on an SSAO (Screen-Space Ambient Occlusion) implementation using the oriented-hemisphere technique.

I) The algorithm

This algorithm requires the following inputs:

  • One array containing precomputed samples (loaded before the main loop; in my example I use 64 samples oriented along the z axis).
  • One noise texture containing normalized rotation vectors, also oriented along the z axis (this texture is generated once).
  • Two textures from the GBuffer: the 'PositionSampler' and the 'NormalSampler', containing the positions and normal vectors in view space.
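
The kernel generation itself is not shown in this question. A minimal C++ sketch of one common way to build such a kernel, assuming random offsets inside the unit hemisphere around +z that are pulled toward the origin; `Vec3`, `MakeHemisphereKernel` and the seed are hypothetical names, and the actual engine may generate its samples differently:

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <random>

// Hypothetical plain vector type: three contiguous floats.
struct Vec3 { float x, y, z; };

// Hypothetical helper: builds N sample offsets inside the unit hemisphere
// oriented along +z (assumed layout, not taken from the original engine).
template <std::size_t N>
std::array<Vec3, N> MakeHemisphereKernel(unsigned seed = 1234u)
{
    std::mt19937 rng(seed);
    std::uniform_real_distribution<float> signedUnit(-1.0f, 1.0f);
    std::uniform_real_distribution<float> unit(0.0f, 1.0f);

    std::array<Vec3, N> kernel{};
    for (std::size_t i = 0; i < N; ++i)
    {
        // Random direction with z >= 0 (not a perfectly uniform distribution).
        Vec3 d{signedUnit(rng), signedUnit(rng), unit(rng)};
        float len = std::sqrt(d.x * d.x + d.y * d.y + d.z * d.z);
        if (len < 1e-6f) { d = Vec3{0.0f, 0.0f, 1.0f}; len = 1.0f; }

        // Random length below 1, with earlier samples kept closer to the origin.
        float t = static_cast<float>(i) / static_cast<float>(N);
        float scale = unit(rng) * (0.1f + 0.9f * t * t);

        kernel[i] = Vec3{d.x / len * scale, d.y / len * scale, d.z / len * scale};
    }
    return kernel;
}
```

The resulting array of 64 samples could then be uploaded to the `SSAOKernel` uniform, for example with `glUniform3fv`.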

Here's the fragment shader source code I use:

#version 400

/*
** Output color value.
*/
layout (location = 0) out vec4 FragColor;

/*
** Vertex inputs.
*/
in VertexData_VS
{
    vec2 TexCoords;

} VertexData_IN;

/*
** Projection matrix (view space -> clip space).
*/
uniform mat4 ProjMatrix;

/*
** GBuffer samplers.
*/
uniform sampler2D PositionSampler;
uniform sampler2D NormalSampler;

/*
** Noise sampler.
*/
uniform sampler2D NoiseSampler;

/*
** Noise texture viewport.
*/
uniform vec2 NoiseTexOffset;

/*
** Ambient light intensity.
*/
uniform vec4 AmbientIntensity;

/*
** SSAO kernel + size.
*/
uniform vec3 SSAOKernel[64];
uniform uint SSAOKernelSize;
uniform float SSAORadius;

/*
** Computes Orientation matrix.
*/
mat3 GetOrientationMatrix(vec3 normal, vec3 rotation)
{
    vec3 tangent = normalize(rotation - normal * dot(rotation, normal)); //Gram-Schmidt process
    vec3 bitangent = cross(normal, tangent);

    return (mat3(tangent, bitangent, normal)); //Orientation according to the normal
}

/*
** Fragment shader entry point.
*/
void main(void)
{
    float OcclusionFactor = 0.0f;

    vec3 gNormal_CS = normalize(texture(
        NormalSampler, VertexData_IN.TexCoords).xyz * 2.0f - 1.0f); //Normal vector in view space from GBuffer
    vec3 rotationVec = normalize(texture(NoiseSampler,
        VertexData_IN.TexCoords * NoiseTexOffset).xyz * 2.0f - 1.0f); //Rotation vector required for the Gram-Schmidt process

    vec3 Origin_VS = texture(PositionSampler, VertexData_IN.TexCoords).xyz; //Origin vertex in view space from GBuffer
    mat3 OrientMatrix = GetOrientationMatrix(gNormal_CS, rotationVec);

    for (uint idx = 0u; idx < SSAOKernelSize; idx++) //For each sample (64 iterations)
    {
        vec4 Sample_VS = vec4(Origin_VS + OrientMatrix * SSAOKernel[idx], 1.0f); //Sample translated in view space

        vec4 Sample_HS = ProjMatrix * Sample_VS; //Sample in homogeneous clip space
        vec3 Sample_CS = Sample_HS.xyz / Sample_HS.w; //Perspective divide (normalized device coordinates)
        vec2 texOffset = Sample_CS.xy * 0.5f + 0.5f; //Recover sample texture coordinates

        vec3 SampleDepth_VS = texture(PositionSampler, texOffset).xyz; //Sample depth in view space

        if (Sample_VS.z < SampleDepth_VS.z)
            if (length(Sample_VS.xyz - SampleDepth_VS) <= SSAORadius)
                OcclusionFactor += 1.0f; //Occlusion accumulation
    }
    OcclusionFactor = 1.0f - (OcclusionFactor / float(SSAOKernelSize));

    FragColor = vec4(OcclusionFactor);
    FragColor *= AmbientIntensity;
}

And here's the result (without the blur render pass):

[screenshot]

Up to this point everything seems to be correct.

II) The performance

Using the NSight debugger, I noticed some very strange behaviour concerning performance:

If I move my camera closer and closer to the dragon, performance drops drastically.

In my mind this should not happen, because the SSAO algorithm is applied in screen space and does not depend on, for example, the number of primitives in the dragon.

Here are 3 screenshots with 3 different camera positions (in all 3 cases, all 1024*768 pixel shader invocations run the same algorithm):

a) GPU idle: 40% (pixels impacted: 100%)

[screenshot]

b) GPU idle: 25% (pixels impacted: 100%)

[screenshot]

c) GPU idle: 2%! (pixels impacted: 100%)

[screenshot]

My rendering engine uses exactly 2 render passes in this example:

  • the Material pass (filling the position and normal samplers)
  • the Ambient pass (filling the SSAO texture)

I first thought the problem came from the combined cost of these two passes, but that is not the case: I added a condition in my client code to skip the Material pass when the camera is stationary. So when I took the 3 screenshots above, only the Ambient pass was executed, and the slowdown is not related to the Material pass. Another argument: if I remove the dragon mesh (leaving a scene with just the plane), the result is the same. The closer my camera is to the plane, the bigger the slowdown!

To me this behaviour is not logical! As I said above, in these 3 cases every pixel runs exactly the same pixel shader code!
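
One quantity that does change with camera distance, even when the shader code is identical, is how many pixels SSAORadius covers on screen, and therefore how far apart the PositionSampler fetches of the 64 samples land. Whether this explains the NSight numbers is only a guess, but it is cheap to check. A minimal C++ sketch, assuming a standard perspective projection; `ProjectedRadiusPixels` and its parameters are hypothetical names, not taken from the code above:

```cpp
#include <cmath>

// Approximate on-screen size, in pixels, of a view-space radius seen at a
// given positive view-space depth (assumes a standard perspective projection).
float ProjectedRadiusPixels(float radius, float viewDepth,
                            float fovYRadians, float viewportHeight)
{
    // Vertical focal term of the projection matrix: 1 / tan(fovY / 2).
    float focal = 1.0f / std::tan(0.5f * fovYRadians);

    // radius * focal / depth is the size in NDC units ([-1, 1] spans the
    // viewport), so half the viewport height converts it to pixels.
    return radius * focal / viewDepth * 0.5f * viewportHeight;
}
```

With a 90 degree vertical field of view and a 768-pixel-high viewport, a radius of 1 unit covers about 38 pixels at depth 10 and about 384 pixels at depth 1, so the fetches spread out roughly tenfold as the camera moves in.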

I then noticed another strange behaviour when I change one small piece of code directly in the fragment shader.

If I replace the line:

FragColor = vec4(OcclusionFactor);

with the line:

FragColor = vec4(1.0f, 1.0f, 1.0f, 1.0f);

the slowdown disappears!

This means that when the SSAO code still executes (I placed some breakpoints during execution to check) but OcclusionFactor is not used at the end to fill the final output color, there is no slowdown!

I think we can conclude that the problem does not come from the shader code before the line "FragColor = vec4(OcclusionFactor);"... I think.

How can you explain such behaviour?

I tried many combinations of code, both in the client code and in the fragment shader, but I can't find a solution to this problem! I'm really lost.

Thank you very much in advance for your help!


Source: https://stackoverflow.com/questions/31682173
Updated: 2023-06-08 15:06

Best answer

I found that my first stab at this same problem failed with this error message, and the solution was as given in another SO question, Jersey: com.sun.jersey.server.impl.template.ViewableMessageBodyWriter: I had forgotten to add the jersey-json module to my project.
