-
Notifications
You must be signed in to change notification settings - Fork 78
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add basic support for particle trails. #288
Conversation
c02ec40
to
ba58b9b
Compare
This commit implements simple fixed-length particle trails in Hanabi. They're stored in a ring buffer with a fixed capacity separate from the main particle buffer. Currently, for simplicity, trail particles are rendered as exact duplicates of the head particles. Nothing in this patch prevents this from being expanded further to support custom rendering for trail particles, including ribbons and trail-index-dependent rendering, in the future. The only reason why this wasn't implemented is to keep the size of this patch manageable, as it's quite large as it is. The size of the trail buffer is known as the `trail_capacity` and doesn't change over the lifetime of the effect. The length of each particle trail is known as the `trail_length` and can be altered at runtime. The interval at which new trail particles spawn is known as the `trail_period` and can likewise change at runtime. There are three primary reasons why particle trails are stored in a separate buffer from the head particles: 1. It's common to want a separate rendering for trail particles and head particles (e.g. the head particle may want to be some sort of particle with a short ribbon behind it), and so we need to separate the two so that they can be rendered in separate drawcalls. 2. Having a separate buffer allows us to skip the update phase for particle trails, enhancing performance. 3. Since trail particles are strictly LIFO, we can use a ring buffer instead of a freelist, which both saves memory (as no freelist needs to be maintained) and enhances performance (as an entire chunk of particles can be freed at once instead of having to do so one by one). The core of the implementation is the `render::effect_cache::TrailChunks` buffer. The long documentation comment attached to that structure explains the setup of the ring buffer and has a diagram. In summary, two parallel ring buffers are maintained on CPU and GPU. The GPU ring buffer has `trail_capacity` entries and stores the trail particles themselves, while the CPU one has `trail_length` entries and stores pointers to indices defining the boundaries of the chunks. A new example, `worms`, has been added in order to demonstrate simple use of trails. This example can be updated over time as new trail features are added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some initial remarks, I didn't finish but I'm not sure that render indirect instance_count
overflowing is safe, because there will be a draw call executed with the value before the modulo is applied, and who knows what the GPU driver will think about it?
Also minor comment:
It's common to want a separate rendering for trail particles and head particles (e.g. the head particle may want to be some sort of particle with a short ribbon behind it), and so we need to separate the two so that they can be rendered in separate drawcalls.
Well, the way I think this would be done ideally is with a separate sub-effect, one for the "head" and one separate for the "trail". And so, in the trail effect, you wouldn't really need to special case the first particle.
/// | ||
/// This is only used by the `vfx_indirect` compute shader. | ||
trail_render_stride: u32, | ||
__pad1: u32, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've seen RenderDoc complain that this struct is 20 bytes and it expected me to pass 32 bytes. However in practice 1) everything works without padding, and 2) I've not found a single line in the WGSL spec about padding being needed here. Did I miss something? Why did you add padding here?
@@ -236,6 +245,16 @@ pub(crate) struct GpuSpawnerParams { | |||
count: i32, | |||
/// Index of the effect into the indirect dispatch and render buffers. | |||
effect_index: u32, | |||
/// Whether we should create a trail particle this frame. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not "whether" (boolean), it's "how many" (count).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This value is actually either 0 or 1, so "whether" is intentional. Essentially it's just a boolean packed into a u32. I can change the wording if you like to make that clearer though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, makes sense. Can you please make it clear in the comment that this is a boolean 0/1?
/// Whether we should create a trail particle this frame. | ||
spawn_trail_particle: u32, | ||
/// Capacity of the trail buffer. | ||
trail_capacity: u32, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't that be hard-coded into the shader? I don't think we ever expect the trail capacity to be variable at runtime, nor to batch together effects with different trail capacity (so, which would require the same shader code, otherwise we can't batch)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not super important though, don't waste time on this.
if (spawner.spawn_trail_particle != 0) { | ||
let dest_index = trail_render_indirect.base_instance + | ||
atomicAdd(&trail_render_indirect.instance_count, 1u); | ||
trail_buffer.particles[dest_index % spawner.trail_capacity] = particle; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a problem here I think, we're always incrementing instance_count
, but we don't check that it's within the capacity. I saw there's a comment in GpuTrailRenderIndirect
about the shader performing the modulo, unfortunately the order of passes is "init, indirect, update, render" so here we're going to leave instance_count
at a potentially invalid value for the render pass, until it's finally corrected next frame (but could be once again broken by the next update, and wrong again by the time the next indirect render occurs).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The render pass handles overflow: https://github.com/djeedai/bevy_hanabi/pull/288/files#diff-bf68f6b0a654e214feea0412685a5f9ed901eab5d8174aed52a6c466858b082aR123 So you're right that yes, the instance index can overflow, but that's fine as the render pass always modulos it by the capacity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this will break on WASM, which is already likely breaking for another similar "liberty" I took somewhere else without realizing it. So I'd rather avoid this kind of thing.
See https://toji.dev/webgpu-best-practices/indirect-draws.html getting in Chrome validation on indirect calls.
/// Note that, because the trail particle buffer is a ring buffer, it's | ||
/// entirely possible for the bounds of `(base_index, base_index + | ||
/// instance_count)` to be beyond the boundaries of that buffer. This is | ||
/// expected behavior, and the shader will perform the modulo operation | ||
/// correctly to look the particle up in the buffer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes if this was fixed by the time the indirect render call is executed, but that doesn't seem to be the case. See other comment in the vfx_update.wgsl
shader.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See other comment: I believe this is harmless.
@djeedai Could you explain what you mean by the separate sub-effect? Like trails are a separate If that's what you mean, then I'll close this PR as this will require a complete rewrite. |
Don't. This is a longer term goal that's absolutely not reachable easily now, as it's missing too many pieces. So the current PR is fine. I just wanted to seed some discussion and think ahead. What I mean is a single |
All of that makes sense to me. I actually implemented a very crude version of that in the Hanabi Workshop, in the form of particle systems that combine multiple effects into one and are spawned as a group. As you mentioned, this functionality is necessary for many effects. I think it'd be best to implement trails in a way that doesn't require a massive rewrite when we move to nodes. Here's a strawperson short-term proposal. What if we had something like "particle groups", each of which had its own list of indices on GPU? A particle belongs to a single group for its lifetime. Each modifier would specify which group or groups it applies to. The spawner would likewise have be annotated with a set of groups that it spawns particles into. Each particle group would be rendered in a separate drawcall, allowing for differing mesh topology per group. By itself, this wouldn't provide any new functionality, but it would open the door for new functionality in the following ways:
In order to implement trails, we'd need the particle group infrastructure and the "Duplicate Particle" modifier. This would be more flexible than this PR and would be forward-compatible with nodes. I think it'd also quite possibly be less code than this PR, because it'd eliminate the necessity of the chunking infrastructure and the double ring buffers. What do you think? |
Thinking about it more, I'm actually unsure how you'd implement node graphs without some sort of "particle groups" system under the hood. You clearly can't have an output buffer per node; that would be way too much copying of indices. So you need to optimize the node graph into the minimum set of output buffers by counting the total number of paths through the DAG and assigning particles to each of the buffers as necessary. That's precisely equivalent to particle groups as I described them. So, to reiterate, my proposal would be: (1) implement particle groups; (2) implement enough features on top of particle groups to support trails; (3) add the node graph feature as a layer on top of particle groups. |
@@ -144,6 +144,10 @@ required-features = [ "bevy/bevy_winit", "bevy/bevy_pbr", "bevy/png", "3d" ] | |||
name = "2d" | |||
required-features = [ "bevy/bevy_winit", "bevy/bevy_sprite", "2d" ] | |||
|
|||
[[example]] | |||
name = "worms" | |||
required-features = [ "bevy/bevy_winit", "bevy/bevy_pbr", "3d" ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs bevy/png too for loading the circle asset.
Closing as #296 obsoletes this. |
This commit implements simple fixed-length particle trails in Hanabi. They're stored in a ring buffer with a fixed capacity separate from the main particle buffer. Currently, for simplicity, trail particles are rendered as exact duplicates of the head particles. Nothing in this patch prevents this from being expanded further to support custom rendering for trail particles, including ribbons and trail-index-dependent rendering, in the future. The only reason why this wasn't implemented is to keep the size of this patch manageable, as it's quite large as it is.
The size of the trail buffer is known as the
trail_capacity
and doesn't change over the lifetime of the effect. The length of each particle trail is known as thetrail_length
and can be altered at runtime. The interval at which new trail particles spawn is known as thetrail_period
and can likewise change at runtime.There are three primary reasons why particle trails are stored in a separate buffer from the head particles:
It's common to want a separate rendering for trail particles and head particles (e.g. the head particle may want to be some sort of particle with a short ribbon behind it), and so we need to separate the two so that they can be rendered in separate drawcalls.
Having a separate buffer allows us to skip the update phase for particle trails, enhancing performance.
Since trail particles are strictly LIFO, we can use a ring buffer instead of a freelist, which both saves memory (as no freelist needs to be maintained) and enhances performance (as an entire chunk of particles can be freed at once instead of having to do so one by one).
The core of the implementation is the
render::effect_cache::TrailChunks
buffer. The long documentation comment attached to that structure explains the setup of the ring buffer and has a diagram. In summary, two parallel ring buffers are maintained on CPU and GPU. The GPU ring buffer hastrail_capacity
entries and stores the trail particles themselves, while the CPU one hastrail_length
entries and stores pointers to indices defining the boundaries of the chunks.A new example,
worms
, has been added in order to demonstrate simple use of trails. This example can be updated over time as new trail features are added.