Add basic support for particle trails. #288

pcwalton · 2024-02-27T03:46:47Z

This commit implements simple fixed-length particle trails in Hanabi. They're stored in a ring buffer with a fixed capacity separate from the main particle buffer. Currently, for simplicity, trail particles are rendered as exact duplicates of the head particles. Nothing in this patch prevents this from being expanded further to support custom rendering for trail particles, including ribbons and trail-index-dependent rendering, in the future. The only reason why this wasn't implemented is to keep the size of this patch manageable, as it's quite large as it is.

The size of the trail buffer is known as the `trail_capacity` and doesn't change over the lifetime of the effect. The length of each particle trail is known as the `trail_length` and can be altered at runtime. The interval at which new trail particles spawn is known as the `trail_period` and can likewise change at runtime.

There are three primary reasons why particle trails are stored in a separate buffer from the head particles:

1. It's common to want separate rendering for trail particles and head particles (e.g. the head particle may want to be some sort of particle with a short ribbon behind it), so we need to separate the two so that they can be rendered in separate drawcalls.
2. Having a separate buffer allows us to skip the update phase for particle trails, enhancing performance.
3. Since trail particles are strictly FIFO (the oldest trail particles are always the first to expire), we can use a ring buffer instead of a freelist, which both saves memory (as no freelist needs to be maintained) and enhances performance (as an entire chunk of particles can be freed at once instead of having to do so one by one).

The core of the implementation is the `render::effect_cache::TrailChunks` buffer. The long documentation comment attached to that structure explains the setup of the ring buffer and has a diagram. In summary, two parallel ring buffers are maintained on CPU and GPU. The GPU ring buffer has `trail_capacity` entries and stores the trail particles themselves, while the CPU one has `trail_length` entries and stores pointers to indices defining the boundaries of the chunks.
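As a rough illustration of the CPU-side bookkeeping described above, here is a minimal sketch. It is not the actual `TrailChunks` type: the names `TrailRing`, `Chunk`, `push_chunk`, and `live_particles` are hypothetical, and the sketch assumes one chunk is recorded per trail spawn and that the oldest chunk is retired once `trail_length` chunks are live.

```rust
use std::collections::VecDeque;

/// Hypothetical CPU-side record of one chunk of trail particles.
/// `start` is an index into the GPU-side ring buffer and `len` is the
/// number of particles written for that chunk.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Chunk {
    pub start: u32,
    pub len: u32,
}

/// Illustrative bookkeeping only (an assumption, not the PR's type).
pub struct TrailRing {
    trail_capacity: u32,
    trail_length: usize,
    chunks: VecDeque<Chunk>,
    /// Next write position in the GPU-side ring buffer.
    head: u32,
}

impl TrailRing {
    pub fn new(trail_capacity: u32, trail_length: usize) -> Self {
        assert!(trail_capacity > 0 && trail_length > 0);
        Self {
            trail_capacity,
            trail_length,
            chunks: VecDeque::with_capacity(trail_length),
            head: 0,
        }
    }

    /// Records a new chunk of `len` particles. If `trail_length` chunks
    /// are already live, the oldest one is retired in a single step and
    /// returned, with no per-particle freelist work.
    pub fn push_chunk(&mut self, len: u32) -> Option<Chunk> {
        let retired = if self.chunks.len() == self.trail_length {
            self.chunks.pop_front()
        } else {
            None
        };
        self.chunks.push_back(Chunk { start: self.head, len });
        // Wrap the write position around the fixed-capacity buffer.
        self.head = (self.head + len) % self.trail_capacity;
        retired
    }

    /// Total number of trail particles covered by live chunks.
    pub fn live_particles(&self) -> u32 {
        self.chunks.iter().map(|c| c.len).sum()
    }
}
```

Retiring a chunk is a single `pop_front`, which is the "free an entire chunk at once" property the description attributes to the ring buffer.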

A new example, `worms`, has been added in order to demonstrate simple use of trails. This example can be updated over time as new trail features are added.

djeedai

Some initial remarks (I didn't finish the review): I'm not sure that letting the render indirect `instance_count` overflow is safe, because a draw call will be executed with the value before the modulo is applied, and who knows what the GPU driver will make of it?

Also minor comment:

> It's common to want a separate rendering for trail particles and head particles (e.g. the head particle may want to be some sort of particle with a short ribbon behind it), and so we need to separate the two so that they can be rendered in separate drawcalls.

Well, the way I think this would be done ideally is with a separate sub-effect, one for the "head" and one separate for the "trail". And so, in the trail effect, you wouldn't really need to special case the first particle.

djeedai · 2024-02-27T09:14:44Z

src/render/mod.rs

+    ///
+    /// This is only used by the `vfx_indirect` compute shader.
+    trail_render_stride: u32,
+    __pad1: u32,


I've seen RenderDoc complain that this struct is 20 bytes and it expected me to pass 32 bytes. However in practice 1) everything works without padding, and 2) I've not found a single line in the WGSL spec about padding being needed here. Did I miss something? Why did you add padding here?

djeedai · 2024-02-27T09:15:20Z

src/render/mod.rs

@@ -236,6 +245,16 @@ pub(crate) struct GpuSpawnerParams {
    count: i32,
    /// Index of the effect into the indirect dispatch and render buffers.
    effect_index: u32,
+    /// Whether we should create a trail particle this frame.


It's not "whether" (boolean), it's "how many" (count).

This value is actually either 0 or 1, so "whether" is intentional. Essentially it's just a boolean packed into a u32. I can change the wording if you like to make that clearer though.

Right, makes sense. Can you please make it clear in the comment that this is a boolean 0/1?

djeedai · 2024-02-27T09:16:40Z

src/render/mod.rs

+    /// Whether we should create a trail particle this frame.
+    spawn_trail_particle: u32,
+    /// Capacity of the trail buffer.
+    trail_capacity: u32,


Shouldn't that be hard-coded into the shader? I don't think we ever expect the trail capacity to vary at runtime, nor to batch together effects with different trail capacities (which would require them to share the same shader code, since otherwise we can't batch them).

It's not super important though, don't waste time on this.

djeedai · 2024-02-27T09:20:49Z

src/render/vfx_update.wgsl

+        if (spawner.spawn_trail_particle != 0) {
+            let dest_index = trail_render_indirect.base_instance +
+                atomicAdd(&trail_render_indirect.instance_count, 1u);
+            trail_buffer.particles[dest_index % spawner.trail_capacity] = particle;
+        }


There's a problem here I think, we're always incrementing instance_count, but we don't check that it's within the capacity. I saw there's a comment in GpuTrailRenderIndirect about the shader performing the modulo, unfortunately the order of passes is "init, indirect, update, render" so here we're going to leave instance_count at a potentially invalid value for the render pass, until it's finally corrected next frame (but could be once again broken by the next update, and wrong again by the time the next indirect render occurs).

The render pass handles overflow: https://github.com/djeedai/bevy_hanabi/pull/288/files#diff-bf68f6b0a654e214feea0412685a5f9ed901eab5d8174aed52a6c466858b082aR123 So you're right that yes, the instance index can overflow, but that's fine as the render pass always modulos it by the capacity.

I believe this will break on WASM, which is already likely breaking for another similar "liberty" I took somewhere else without realizing it. So I'd rather avoid this kind of thing.

See https://toji.dev/webgpu-best-practices/indirect-draws.html regarding the validation Chrome performs on indirect calls.
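To make the disagreement concrete, the following CPU-side sketch imitates the update-shader snippet with an atomic counter. It is only a simulation under stated assumptions: `TrailRenderIndirect` here is a stand-in struct, and `clamped_instance_count` is a hypothetical fix-up that is not part of this PR. It shows that the destination slot always stays in range, while the raw `instance_count` seen by a later indirect draw can exceed the capacity.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Stand-in for the indirect render arguments touched by the update pass.
pub struct TrailRenderIndirect {
    pub base_instance: u32,
    pub instance_count: AtomicU32,
}

/// Mirrors the shader snippet: the counter is incremented unconditionally
/// and only the destination slot is wrapped by the capacity.
pub fn push_trail_particle(indirect: &TrailRenderIndirect, trail_capacity: u32) -> u32 {
    let offset = indirect.instance_count.fetch_add(1, Ordering::Relaxed);
    (indirect.base_instance + offset) % trail_capacity
}

/// Hypothetical fix-up (an assumption, not in the PR): the instance count
/// an indirect draw would see if it were clamped before the render pass.
pub fn clamped_instance_count(indirect: &TrailRenderIndirect, trail_capacity: u32) -> u32 {
    indirect
        .instance_count
        .load(Ordering::Relaxed)
        .min(trail_capacity)
}
```

With a capacity of 4 and six pushes, every returned slot is below 4, but the raw counter reads 6. Whether that raw value is acceptable to a given backend is exactly the open question in this thread.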

djeedai · 2024-02-27T09:21:41Z

src/render/mod.rs

+/// Note that, because the trail particle buffer is a ring buffer, it's
+/// entirely possible for the bounds of `(base_index, base_index +
+/// instance_count)` to be beyond the boundaries of that buffer. This is
+/// expected behavior, and the shader will perform the modulo operation
+/// correctly to look the particle up in the buffer.


Yes, if this were fixed up by the time the indirect render call is executed, but that doesn't seem to be the case. See my other comment in the `vfx_update.wgsl` shader.

See other comment: I believe this is harmless.

pcwalton · 2024-02-27T09:43:53Z

@djeedai Could you explain what you mean by the separate sub-effect? Like trails are a separate EffectAsset, and the main effect just dumps new particles into the other effect? There would need to be a way to convert one particle to the other type, as well as a new attribute to store the index of the most recent particle in the other trail effect (so that they can form a linked list--this is important for ribbons.)

If that's what you mean, then I'll close this PR as this will require a complete rewrite.

djeedai · 2024-02-27T13:46:41Z

> If that's what you mean, then I'll close this PR as this will require a complete rewrite.

Don't. This is a longer term goal that's absolutely not reachable easily now, as it's missing too many pieces. So the current PR is fine. I just wanted to seed some discussion and think ahead.

What I mean is a single EffectAsset describes a single "visual effect" authored by an artist, and that "effect" can be composed of multiple sub-effects, each one with its own modifier stack etc. and possibly different outputs. And yes, that means there's a mechanism to trigger spawning between those sub-effects, likely without CPU intervention. Today the EffectAsset contains a single linear stack of modifiers for init/update/render, in that order, executed in sequence each frame. If an artist wants to have an effect with multiple visuals, they need multiple EffectAssets, and must somehow coordinate them, which is almost impossible. Tomorrow, we can imagine that there could be multiple init/update/render nodes organized in a graph (and not a stack) and connected to each other. That means, for example, we could have 2 sources of init/spawn feeding a single update (single particle buffer), so we can easily share a buffer for, say, spawning in two different locations, which is a feature many have asked for. Or have the same update / particle buffer feed into 2 different outputs (render a quad AND render a mesh, for each particle). In that vision, your "the head particle requires a different rendering" just becomes one output for the particle heads, and one output for the trails themselves. And the trails have no special casing for the first particle, because they only render the trail part.

pcwalton · 2024-02-27T19:09:28Z

All of that makes sense to me. I actually implemented a very crude version of that in the Hanabi Workshop, in the form of particle systems that combine multiple effects into one and are spawned as a group. As you mentioned, this functionality is necessary for many effects.

I think it'd be best to implement trails in a way that doesn't require a massive rewrite when we move to nodes.

Here's a strawperson short-term proposal. What if we had something like "particle groups", each of which had its own list of indices on GPU? A particle belongs to a single group for its lifetime. Each modifier would specify which group or groups it applies to. The spawner would likewise have to be annotated with a set of groups that it spawns particles into. Each particle group would be rendered in a separate drawcall, allowing for differing mesh topology per group. By itself, this wouldn't provide any new functionality, but it would open the door for new functionality in the following ways:

- We could add a "Duplicate Particle" modifier that duplicates a particle on a controllable schedule and assigns the duplicate to one or more groups. This would allow trails to be implemented. You'd have two groups, a "head" group and a "trail" group, and all the update modifiers would only apply to the "head" group. The only update modifier that the "trail" group would have would be a Kill Particle modifier based on age. This also immediately allows different rendering for heads and trails.
- We could add multiple spawners, each with its own set of groups that it spawns into, which would allow for multiple init sources, each with their own schedules.
- Eventually, we'd be able to compile an arbitrary node graph down to this "linear modifiers and particle group" format. We're going to have to compile the node graph into something, and the particle group format seems particularly efficient at minimizing the number of drawcalls and compute invocations. I think that every conceivable DAG can be compiled into the particle group format, so this provides a smooth migration internally.

In order to implement trails, we'd need the particle group infrastructure and the "Duplicate Particle" modifier. This would be more flexible than this PR and would be forward-compatible with nodes. I think it'd also quite possibly be less code than this PR, because it'd eliminate the necessity of the chunking infrastructure and the double ring buffers. What do you think?
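A minimal CPU-side sketch of how the proposal above could fit together, purely for illustration. Every name here (`GroupMask`, `Modifier`, `step`, the `HEAD` and `TRAIL` constants) is hypothetical, particles are one-dimensional to keep it short, and nothing below reflects an existing Hanabi API.

```rust
/// Bitmask of particle groups (bit i set = group i). Hypothetical.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct GroupMask(pub u32);

impl GroupMask {
    pub const fn single(group: u32) -> Self {
        GroupMask(1 << group)
    }
    pub const fn contains(self, group: u32) -> bool {
        self.0 & (1 << group) != 0
    }
}

pub const HEAD: u32 = 0;
pub const TRAIL: u32 = 1;

#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Particle {
    pub position: f32,
    pub age: f32,
    /// A particle belongs to a single group for its whole lifetime.
    pub group: u32,
}

/// Each modifier names the groups it applies to.
pub enum Modifier {
    Move { velocity: f32, groups: GroupMask },
    Duplicate { into_group: u32, groups: GroupMask },
    KillByAge { max_age: f32, groups: GroupMask },
}

/// Runs one update step. `duplicate_now` stands in for the
/// "controllable schedule" of the Duplicate Particle modifier.
pub fn step(particles: &mut Vec<Particle>, modifiers: &[Modifier], dt: f32, duplicate_now: bool) {
    for p in particles.iter_mut() {
        p.age += dt;
    }
    for m in modifiers {
        match *m {
            Modifier::Move { velocity, groups } => {
                for p in particles.iter_mut().filter(|p| groups.contains(p.group)) {
                    p.position += velocity * dt;
                }
            }
            Modifier::Duplicate { into_group, groups } => {
                if duplicate_now {
                    // Copy matching particles into the target group with a fresh age.
                    let copies: Vec<Particle> = particles
                        .iter()
                        .filter(|p| groups.contains(p.group))
                        .map(|p| Particle { age: 0.0, group: into_group, ..*p })
                        .collect();
                    particles.extend(copies);
                }
            }
            Modifier::KillByAge { max_age, groups } => {
                particles.retain(|p| !(groups.contains(p.group) && p.age > max_age));
            }
        }
    }
}
```

In this sketch the trail group is touched only by `KillByAge`, so trail particles stay where they were dropped, which matches the "head group gets the update modifiers, trail group only gets an age-based kill" arrangement described above.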

pcwalton · 2024-02-28T04:19:40Z

Thinking about it more, I'm actually unsure how you'd implement node graphs without some sort of "particle groups" system under the hood. You clearly can't have an output buffer per node; that would be way too much copying of indices. So you need to optimize the node graph into the minimum set of output buffers by counting the total number of paths through the DAG and assigning particles to each of the buffers as necessary. That's precisely equivalent to particle groups as I described them.

So, to reiterate, my proposal would be: (1) implement particle groups; (2) implement enough features on top of particle groups to support trails; (3) add the node graph feature as a layer on top of particle groups.
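The path-counting step mentioned above can be sketched as follows. This covers only the counting, not the assignment of particles to buffers, and it assumes the graph is given as adjacency lists with nodes already numbered in topological order; the function name `count_paths` is made up for this example.

```rust
/// Counts distinct source-to-sink paths in a DAG.
///
/// `edges[n]` lists the successors of node `n`. Assumption: nodes are
/// numbered in topological order, so every edge goes from a lower index
/// to a higher one.
pub fn count_paths(edges: &[Vec<usize>]) -> u64 {
    let n = edges.len();

    // A node with no predecessor is a source (e.g. an init/spawn node).
    let mut has_pred = vec![false; n];
    for (node, succs) in edges.iter().enumerate() {
        for &s in succs {
            assert!(s > node && s < n, "edges must follow topological order");
            has_pred[s] = true;
        }
    }

    // paths[v] = number of distinct paths from any source to v.
    let mut paths = vec![0u64; n];
    let mut total = 0u64;
    for node in 0..n {
        if !has_pred[node] {
            paths[node] = 1;
        }
        let here = paths[node];
        if edges[node].is_empty() {
            // A node with no successor is a sink (e.g. a render output).
            total += here;
        }
        for &s in &edges[node] {
            paths[s] += here;
        }
    }
    total
}
```

For a graph with two init nodes feeding one update node that feeds two outputs, `count_paths(&[vec![2], vec![2], vec![3, 4], vec![], vec![]])` evaluates to 4, one per init/output combination.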

djeedai · 2024-02-28T18:44:16Z

Cargo.toml

@@ -144,6 +144,10 @@ required-features = [ "bevy/bevy_winit", "bevy/bevy_pbr", "bevy/png", "3d" ]
 name = "2d"
 required-features = [ "bevy/bevy_winit", "bevy/bevy_sprite", "2d" ]

+[[example]]
+name = "worms"
+required-features = [ "bevy/bevy_winit", "bevy/bevy_pbr", "3d" ]


This needs bevy/png too for loading the circle asset.

pcwalton · 2024-03-05T10:05:59Z

Closing as #296 obsoletes this.

pcwalton force-pushed the trails branch 2 times, most recently from c02ec40 to ba58b9b Compare February 27, 2024 04:00

pcwalton mentioned this pull request Feb 27, 2024

Wishlist #215

Open

11 tasks

pcwalton force-pushed the trails branch from ba58b9b to ecd1d65 Compare February 27, 2024 04:34

pcwalton force-pushed the trails branch from ecd1d65 to 76019d6 Compare February 27, 2024 09:25

djeedai reviewed Feb 27, 2024

djeedai reviewed Feb 28, 2024

pcwalton mentioned this pull request Mar 5, 2024

Implement particle groups, enabling simple particle trails. #296

Merged

pcwalton closed this Mar 5, 2024
