Material system: Implement GPU frustum culling #1137

VReaperV · 2024-05-12T18:42:39Z

Builds on #1105.

Implement frustum culling in compute shaders for the material system.

The culling works in 3 steps (performed in 3 different shaders):

1. At the start of the frame, in clearSurfaces_cp.glsl, all the atomic command counters for the next frame are cleared.
2. At the end of the frame, in cull_cp.glsl, every surface's bounding sphere is checked against the 5 frustum planes (the far plane is skipped because we always have it set to { 0, 0, 0, 0 } for some reason, and we set zFar to encompass the whole map anyway), and the corresponding enabled field in the surface commands buffer is set for the next frame.
3. Right afterwards, processSurfaces_cp.glsl goes over batches of 64 surfaces for all of the materials. If a material's surface count is not an integer multiple of 64, it is padded up to one with fake surface commands (all of their fields are always 0). Each material has a corresponding atomic counter in an array. The indirect commands from each enabled surface command are written into an indirect draw buffer. For each command written, the corresponding atomic counter is increased by 1, and the value it returns, added to a static material offset, is used as the indirect command's offset.
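To make the counter-based compaction in the first and last steps more concrete, here is a minimal CPU-side sketch in C++. It is not the shader code from this PR: the struct names, fields, and function names are hypothetical, and `std::atomic::fetch_add` only stands in for the GLSL atomic counter increment.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical CPU-side mirror of the buffers described above; names are illustrative only.
struct IndirectCommand {
    uint32_t count;
    uint32_t firstIndex;
};

struct SurfaceCommand {
    bool enabled;             // written by the cull step
    uint32_t materialID;      // selects the per-material counter
    IndirectCommand command;  // draw parameters copied into the indirect buffer
};

struct MaterialRange {
    uint32_t staticOffset;    // first slot of this material in the indirect buffer
};

// Step 1: reset every per-material counter before the commands are rebuilt.
inline void ClearCounters(std::vector<std::atomic<uint32_t>>& counters) {
    for (auto& counter : counters) {
        counter.store(0, std::memory_order_relaxed);
    }
}

// Step 3: append every enabled command to its material's region of the indirect buffer.
inline void ProcessSurfaces(const std::vector<SurfaceCommand>& surfaces,
                            const std::vector<MaterialRange>& materials,
                            std::vector<std::atomic<uint32_t>>& counters,
                            std::vector<IndirectCommand>& indirectBuffer) {
    for (const SurfaceCommand& surface : surfaces) {
        if (!surface.enabled) {
            continue;  // culled surface or all-zero padding command
        }
        assert(surface.materialID < materials.size());
        // fetch_add returns the value before the increment: the slot inside the material.
        const uint32_t slot =
            counters[surface.materialID].fetch_add(1, std::memory_order_relaxed);
        const uint32_t index = materials[surface.materialID].staticOffset + slot;
        assert(index < indirectBuffer.size());
        indirectBuffer[index] = surface.command;
    }
}
```

After `ProcessSurfaces` runs, each counter holds the number of visible commands for its material, which is the kind of value an indirect draw count would consume.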

Both of these work in groups of 64 because compute threads are launched in groups of 32 (warp, Nvidia) or 64 (wavefront, AMD). The threads that go past the last surface just return. Additionally, surface commands have to be processed in batches of 64 (surface batches) for this reason and because atomic counter arrays can only be accessed with a dynamically uniform integral expression: data sourced from a UBO with the global workgroup ID is such, while per-thread data wouldn't be.
Surface culling also requires all surface commands for every surface: a surface command here corresponds to a stage in the drawSurf's shader. The additional ones (set to id: 0) are the "fake" surface commands which are never actually used, but they have to be there because indirection is not possible at that point, since buffer writes have to be in dynamically uniform control flow.
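The batch arithmetic described above can be sketched as follows. The constant and function names are hypothetical; only the batch size of 64 comes from the description.

```cpp
#include <cassert>
#include <cstdint>

// Batch size from the description above; the name is illustrative.
constexpr uint32_t SURFACE_BATCH_SIZE = 64;

// Round a material's surface count up to the next multiple of the batch size.
constexpr uint32_t PaddedSurfaceCount(uint32_t surfaceCount) {
    return (surfaceCount + SURFACE_BATCH_SIZE - 1) / SURFACE_BATCH_SIZE * SURFACE_BATCH_SIZE;
}

// Number of 64-wide batches (workgroups) needed for a material.
constexpr uint32_t SurfaceBatchCount(uint32_t surfaceCount) {
    return PaddedSurfaceCount(surfaceCount) / SURFACE_BATCH_SIZE;
}

// Number of fake, all-zero surface commands appended as padding.
constexpr uint32_t PaddingCommandCount(uint32_t surfaceCount) {
    return PaddedSurfaceCount(surfaceCount) - surfaceCount;
}
```

For example, a material with 65 surfaces would occupy two batches and carry 63 padding commands.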

All of this is double buffered (MAX_FRAMES == 2) and holds information for MAX_VIEWS * MAX_FRAMES views.
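One way to picture the MAX_VIEWS * MAX_FRAMES layout is as a flat array of slots indexed by frame and view. The sketch below assumes a frame-major layout and a placeholder MAX_VIEWS value; neither is confirmed by the PR, only MAX_FRAMES == 2 is.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t MAX_FRAMES = 2;  // stated in the description above
constexpr uint32_t MAX_VIEWS = 4;   // placeholder value, assumed for illustration

// Hypothetical frame-major slot index: all views of frame 0, then all views of frame 1.
inline uint32_t ViewFrameSlot(uint32_t frameNumber, uint32_t view) {
    assert(view < MAX_VIEWS);
    return (frameNumber % MAX_FRAMES) * MAX_VIEWS + view;
}
```

With this layout, consecutive frames alternate between the two halves of the buffer, so the compute shaders can write one half while the draws read the other.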

Graph of how this system works:

VReaperV · 2024-05-13T06:35:52Z

I've fixed the culling now. Just need to deal with the memory issue.

VReaperV · 2024-05-13T06:38:45Z

Also made it work with r_lockPVS so it can be tested now by just doing /r_lockPVS 1 and looking around.

illwieckz · 2024-05-13T07:06:26Z

How do I enable GPU culling? I tried the branch but a look in Orbit shows me that it still uses CPU culling on my end.

I enabled r_useMaterialSystem but this seems to not be enough.

VReaperV · 2024-05-13T07:17:07Z

> How do I enable GPU culling? I tried the branch but a look in Orbit shows me that it still uses CPU culling on my end.
>
> I enabled r_useMaterialSystem but this seems to not be enough.

Do you have /r_arb_bindless_textures 1? I've disabled it by default because of the perf problems on AMD that we observed earlier. But it's required for the material system to work.
Also, entities still use CPU culling, but you should no longer see R_RecursiveWorldNode() being called if this is enabled.

VReaperV · 2024-05-13T07:25:31Z

Also, it might crash or something if there are surfaces with more than 4 stages; this can be patched by changing MAX_SURFACE_COMMANDS (both in cull_cp.glsl and Material.h) to a higher number, but a proper fix will require different shader permutations I think.

illwieckz · 2024-05-13T07:31:18Z

]/r_arb_bindless_textures 1
Unknown command r_arb_bindless_textures

If I do this it sets the cvar:

/set r_arb_bindless_textures 1

But I see no difference after enabling both this and r_useMaterialSystem and doing a vid_restart; the code still makes a lot of R_RecursiveWorldNode calls.

VReaperV · 2024-05-13T07:45:55Z

Sorry, got the wrong cvar name. r_arb_bindless_texture is the correct one.

illwieckz · 2024-05-13T07:51:08Z

Thanks.

I now get some flickering and lines like this (atcshd):

Warn: 0 15 
Warn: 15 15 
Warn: 30 2 
Warn: 32 2 
Warn: 34 1 
Warn: 35 1 
Warn: 36 1

And no R_RecursiveWorldNode call!

So I guess I've got this branch running! 👍️

VReaperV · 2024-05-13T08:05:52Z

> I now get some flickering

I got the same thing... I think there are somehow concurrent operations going on where processSurfaces_cp is running at the same time as some drawing commands, which is weird since I've set the memory barriers, and doing a glFlush() or using fences didn't help either... Maybe it's just interacting in some weird way with the indirect parameters extension.

> and lines like this (atcshd):
>
> Warn: 15 15
> Warn: 30 2
> Warn: 32 2
> Warn: 34 1
> Warn: 35 1
> Warn: 36 1

Ah, that's just some debug output :)

> And no R_RecursiveWorldNode call!
>
> So I guess I'm getting this branch running ! 👍️

Yup! I'm not sure how it's gonna be on performance right now as the rendering shaders have to wait for the compute ones to complete, but it should be better when I make it double- or triple-buffered either way.

VReaperV · 2024-05-14T10:02:07Z

Fixed most of the flickering issues now and made it double buffered. Some of the flickering issues still remain, but it's probably an off-by-1 error somewhere. Also now it crashes on map change again.

VReaperV · 2024-05-14T15:05:59Z

Made this properly double-buffered now: R_RenderView() will queue views for surface culling, then RB_RenderPostProcess() will dispatch computes for the queued views. I haven't tested that it works with multiple views yet, but I need to clean up and fix some things first before that.
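The queue-then-dispatch flow in this comment could look roughly like the sketch below. The class and its methods are hypothetical stand-ins, not the engine's actual types; the callback takes the place of the real compute dispatches.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the per-frame view queue.
class ViewCullQueue {
public:
    // Called while a view is being set up (the role R_RenderView() plays above).
    void QueueView(uint32_t viewID) { pending_.push_back(viewID); }

    // Called once at the end of the frame (the role of RB_RenderPostProcess()).
    // `dispatch` stands in for issuing the cull and process compute dispatches.
    template <typename Dispatch>
    uint32_t DispatchQueued(Dispatch&& dispatch) {
        uint32_t dispatched = 0;
        for (uint32_t viewID : pending_) {
            dispatch(viewID);
            ++dispatched;
        }
        pending_.clear();  // the next frame starts with an empty queue
        return dispatched;
    }

    bool Empty() const { return pending_.empty(); }

private:
    std::vector<uint32_t> pending_;
};
```

Deferring the dispatches to the end of the frame means the results are consumed by the next frame's draws, which is what lets the rendering shaders avoid waiting on the compute ones.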

VReaperV · 2024-05-14T22:20:27Z

I'm not getting any flickering now. Also fixed a crash and hopefully the incorrect memory accesses.

illwieckz · 2024-05-14T22:28:38Z

Excluding the missing fog, I see no visual glitch anymore in ATCSHD, and I don't experience GPU pagefaults anymore.

This is getting into good shape! 😃️

Framerate is around 370 fps on medium preset @ 4K with a Radeon PRO W7600 on Mesa 23.2.1 radeonsi. I usually get around 530 fps on master.

illwieckz · 2024-05-14T22:33:00Z

I get 400 fps on Mesa 24.0.7 (still 530 fps on master).

VReaperV · 2024-05-14T22:39:57Z

Hmm, I wonder if it's because of bindless textures still... Well, it's also still culling fewer surfaces right now because the far plane is ignored (we also have it as (0, 0, 0)) and because there's no occlusion culling here yet, so if you're looking e.g. towards one of the sides on atcs it will render all of the surfaces behind walls etc.

illwieckz · 2024-05-15T05:33:20Z

Yes, I test with the default spectator scene, which in ATCSHD means the whole outdoor area and the whole alien base are in line of sight.

VReaperV · 2024-05-15T07:15:25Z

Yea... I'll make a separate pr later for occlusion culling, that should fix that part :)

VReaperV · 2024-05-16T06:48:35Z

This works slightly faster for me now after removing some unnecessary branching.

VReaperV · 2024-05-18T07:21:26Z

Here's a more concise graph of how this culling works:

VReaperV · 2024-05-18T07:33:49Z

I've also made this work with multiple different views (i.e. portals) and moved defines to GLHeaders.

Surface commands will now use the minimal array size for the maximum amount of stages used on any compatible surface on the map (padded out to be a multiple of 4 for alignment). This required making the cull shader load after the map is loaded and there doesn't seem to be a better solution than that.
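The sizing rule described here (largest stage count on the map, rounded up to a multiple of 4) can be sketched as below. The function name and the input representation are hypothetical; the PR derives the value from the loaded map's surfaces.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: stagesPerSurface holds the stage count of each compatible surface.
inline uint32_t SurfaceCommandArraySize(const std::vector<uint32_t>& stagesPerSurface) {
    uint32_t maxStages = 0;
    for (uint32_t stages : stagesPerSurface) {
        maxStages = std::max(maxStages, stages);
    }
    // Round up to the next multiple of 4 for alignment.
    return (maxStages + 3u) & ~3u;
}
```

Because the result depends on the map, it can only be baked into the cull shader after the map has loaded, which matches the load-order constraint mentioned above.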

VReaperV · 2024-05-18T08:53:34Z

Frustum culling can now be toggled with r_gpuFrustumCulling.

VReaperV · 2024-07-27T11:24:58Z

Rebased on material branch now since I'm just waiting for someone to approve the changes in #1105 so I can merge them.

VReaperV · 2024-07-27T13:16:23Z

I've added a comment about MaterialPack here.

Add frustum culling in compute shaders to the material system. This will use sphere<>frustum culling and output the correct draw commands into the buffer for each viewframe (one view in any given frame buffered by the material system). Id 0 is reserved for no-command and will result in early return in the shader.
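As an illustration of the sphere-versus-frustum test and the reserved id 0, here is a CPU-side sketch in C++. It assumes plane normals point into the frustum and that a point is inside when `dot(normal, point) - dist >= 0`; that convention, the struct layouts, and the function names are assumptions, not taken from cull_cp.glsl.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Assumed plane form: inside when nx*x + ny*y + nz*z - dist >= 0 (normal points inward).
struct Plane {
    float nx, ny, nz, dist;
};

struct Sphere {
    float x, y, z, radius;
};

// Five planes are tested; the far plane is skipped, as in the description.
inline bool SphereInFrustum(const std::array<Plane, 5>& planes, const Sphere& sphere) {
    assert(sphere.radius >= 0.0f);
    for (const Plane& plane : planes) {
        const float signedDistance =
            plane.nx * sphere.x + plane.ny * sphere.y + plane.nz * sphere.z - plane.dist;
        if (signedDistance < -sphere.radius) {
            return false;  // entirely behind this plane, so outside the frustum
        }
    }
    return true;  // inside or intersecting every tested plane
}

// Id 0 is reserved for "no command": such entries are never enabled.
inline bool SurfaceCommandEnabled(uint32_t commandID, bool visible) {
    return commandID != 0 && visible;
}
```

A sphere test of this kind is conservative: a sphere near a frustum corner can pass every plane test while still being outside, which costs a few extra draws but never drops a visible surface.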

VReaperV · 2024-07-27T13:53:01Z

Added some comments.

VReaperV · 2024-07-29T08:41:48Z

This pr also fixes the issue seen in #1105 where some textures were selected incorrectly.

illwieckz

As said here:

#1180 (review)

I'm not sure I understand everything, but overall it looks good to me, and I confirm it fixes many bugs.

VReaperV force-pushed the gpu-frustum-culling branch 2 times, most recently from 936be43 to cc5bbf1, May 16, 2024 06:56

github-advanced-security bot found potential problems May 16, 2024
src/engine/renderer/gl_shader.h Fixed
src/engine/renderer/gl_shader.h Fixed
src/engine/renderer/gl_shader.h Fixed
src/engine/renderer/gl_shader.h Fixed

github-advanced-security bot found potential problems May 17, 2024
src/engine/renderer/Material.cpp Dismissed

github-advanced-security bot found potential problems May 17, 2024
src/engine/renderer/Material.cpp Dismissed

github-advanced-security bot found potential problems May 17, 2024
src/engine/renderer/Material.cpp Dismissed

VReaperV force-pushed the gpu-frustum-culling branch 2 times, most recently from 41fa83e to 3ad229d, May 18, 2024 06:26

VReaperV marked this pull request as ready for review May 18, 2024 06:26

VReaperV mentioned this pull request May 18, 2024

Implement compute shaders support, make fragment shader non-mandatory #1154

Merged

VReaperV changed the title ~~WIP: Material system: Implement GPU frustum culling~~ Material system: Implement GPU frustum culling May 18, 2024

VReaperV mentioned this pull request Jun 16, 2024

Material system: Implement GPU occlusion culling #1180

Merged

VReaperV force-pushed the gpu-frustum-culling branch from b4fbd11 to bfc9bff, July 16, 2024 22:03

VReaperV mentioned this pull request Jul 16, 2024

Implement material system #1105

Merged

VReaperV force-pushed the gpu-frustum-culling branch 5 times, most recently from 893d013 to 418997a, July 27, 2024 11:20

VReaperV force-pushed the gpu-frustum-culling branch 2 times, most recently from 37b6118 to a09b439, July 27, 2024 13:16

VReaperV force-pushed the gpu-frustum-culling branch from a09b439 to 97fca21, July 27, 2024 13:46

illwieckz added the T-Improvement (Improvement for an existing feature) and A-Renderer labels Jul 30, 2024

illwieckz approved these changes Jul 30, 2024

VReaperV merged commit 7920c1a into DaemonEngine:master Jul 31, 2024
9 checks passed

VReaperV deleted the gpu-frustum-culling branch July 31, 2024 11:24
