Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add basic support for particle trails. #288

Closed
wants to merge 1 commit into from

Conversation

pcwalton
Copy link
Contributor

This commit implements simple fixed-length particle trails in Hanabi. They're stored in a ring buffer with a fixed capacity separate from the main particle buffer. Currently, for simplicity, trail particles are rendered as exact duplicates of the head particles. Nothing in this patch prevents this from being expanded further to support custom rendering for trail particles, including ribbons and trail-index-dependent rendering, in the future. The only reason why this wasn't implemented is to keep the size of this patch manageable, as it's quite large as it is.

The size of the trail buffer is known as the trail_capacity and doesn't change over the lifetime of the effect. The length of each particle trail is known as the trail_length and can be altered at runtime. The interval at which new trail particles spawn is known as the trail_period and can likewise change at runtime.

There are three primary reasons why particle trails are stored in a separate buffer from the head particles:

  1. It's common to want a separate rendering for trail particles and head particles (e.g. the head particle may want to be some sort of particle with a short ribbon behind it), and so we need to separate the two so that they can be rendered in separate drawcalls.

  2. Having a separate buffer allows us to skip the update phase for particle trails, enhancing performance.

  3. Since trail particles are strictly LIFO, we can use a ring buffer instead of a freelist, which both saves memory (as no freelist needs to be maintained) and enhances performance (as an entire chunk of particles can be freed at once instead of having to do so one by one).

The core of the implementation is the
render::effect_cache::TrailChunks buffer. The long documentation comment attached to that structure explains the setup of the ring buffer and has a diagram. In summary, two parallel ring buffers are maintained on CPU and GPU. The GPU ring buffer has trail_capacity entries and stores the trail particles themselves, while the CPU one has trail_length entries and stores pointers to indices defining the boundaries of the chunks.

A new example, worms, has been added in order to demonstrate simple use of trails. This example can be updated over time as new trail features are added.

@pcwalton pcwalton force-pushed the trails branch 2 times, most recently from c02ec40 to ba58b9b Compare February 27, 2024 04:00
@pcwalton pcwalton mentioned this pull request Feb 27, 2024
11 tasks
This commit implements simple fixed-length particle trails in Hanabi.
They're stored in a ring buffer with a fixed capacity separate from the
main particle buffer. Currently, for simplicity, trail particles are
rendered as exact duplicates of the head particles. Nothing in this
patch prevents this from being expanded further to support custom
rendering for trail particles, including ribbons and
trail-index-dependent rendering, in the future. The only reason why this
wasn't implemented is to keep the size of this patch manageable, as it's
quite large as it is.

The size of the trail buffer is known as the `trail_capacity` and
doesn't change over the lifetime of the effect. The length of each
particle trail is known as the `trail_length` and can be altered at
runtime. The interval at which new trail particles spawn is known as the
`trail_period` and can likewise change at runtime.

There are three primary reasons why particle trails are stored in a
separate buffer from the head particles:

1. It's common to want a separate rendering for trail particles and head
   particles (e.g. the head particle may want to be some sort of
   particle with a short ribbon behind it), and so we need to separate
   the two so that they can be rendered in separate drawcalls.

2. Having a separate buffer allows us to skip the update phase for
   particle trails, enhancing performance.

3. Since trail particles are strictly LIFO, we can use a ring buffer
   instead of a freelist, which both saves memory (as no freelist needs
   to be maintained) and enhances performance (as an entire chunk of
   particles can be freed at once instead of having to do so one by
   one).

The core of the implementation is the
`render::effect_cache::TrailChunks` buffer. The long documentation
comment attached to that structure explains the setup of the ring buffer
and has a diagram. In summary, two parallel ring buffers are maintained
on CPU and GPU. The GPU ring buffer has `trail_capacity` entries and
stores the trail particles themselves, while the CPU one has
`trail_length` entries and stores pointers to indices defining the
boundaries of the chunks.

A new example, `worms`, has been added in order to demonstrate simple
use of trails. This example can be updated over time as new trail
features are added.
Copy link
Owner

@djeedai djeedai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some initial remarks, I didn't finish but I'm not sure that render indirect instance_count overflowing is safe, because there will be a draw call executed with the value before the modulo is applied, and who knows what the GPU driver will think about it?

Also minor comment:

It's common to want a separate rendering for trail particles and head particles (e.g. the head particle may want to be some sort of particle with a short ribbon behind it), and so we need to separate the two so that they can be rendered in separate drawcalls.

Well, the way I think this would be done ideally is with a separate sub-effect, one for the "head" and one separate for the "trail". And so, in the trail effect, you wouldn't really need to special case the first particle.

///
/// This is only used by the `vfx_indirect` compute shader.
trail_render_stride: u32,
__pad1: u32,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've seen RenderDoc complain that this struct is 20 bytes and it expected me to pass 32 bytes. However in practice 1) everything works without padding, and 2) I've not found a single line in the WGSL spec about padding being needed here. Did I miss something? Why did you add padding here?

@@ -236,6 +245,16 @@ pub(crate) struct GpuSpawnerParams {
count: i32,
/// Index of the effect into the indirect dispatch and render buffers.
effect_index: u32,
/// Whether we should create a trail particle this frame.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not "whether" (boolean), it's "how many" (count).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This value is actually either 0 or 1, so "whether" is intentional. Essentially it's just a boolean packed into a u32. I can change the wording if you like to make that clearer though.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, makes sense. Can you please make it clear in the comment that this is a boolean 0/1?

/// Whether we should create a trail particle this frame.
spawn_trail_particle: u32,
/// Capacity of the trail buffer.
trail_capacity: u32,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't that be hard-coded into the shader? I don't think we ever expect the trail capacity to be variable at runtime, nor to batch together effects with different trail capacity (so, which would require the same shader code, otherwise we can't batch)?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not super important though, don't waste time on this.

Comment on lines +80 to +84
if (spawner.spawn_trail_particle != 0) {
let dest_index = trail_render_indirect.base_instance +
atomicAdd(&trail_render_indirect.instance_count, 1u);
trail_buffer.particles[dest_index % spawner.trail_capacity] = particle;
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a problem here I think, we're always incrementing instance_count, but we don't check that it's within the capacity. I saw there's a comment in GpuTrailRenderIndirect about the shader performing the modulo, unfortunately the order of passes is "init, indirect, update, render" so here we're going to leave instance_count at a potentially invalid value for the render pass, until it's finally corrected next frame (but could be once again broken by the next update, and wrong again by the time the next indirect render occurs).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The render pass handles overflow: https://github.com/djeedai/bevy_hanabi/pull/288/files#diff-bf68f6b0a654e214feea0412685a5f9ed901eab5d8174aed52a6c466858b082aR123 So you're right that yes, the instance index can overflow, but that's fine as the render pass always modulos it by the capacity.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this will break on WASM, which is already likely breaking for another similar "liberty" I took somewhere else without realizing it. So I'd rather avoid this kind of thing.

See https://toji.dev/webgpu-best-practices/indirect-draws.html getting in Chrome validation on indirect calls.

Comment on lines +333 to +337
/// Note that, because the trail particle buffer is a ring buffer, it's
/// entirely possible for the bounds of `(base_index, base_index +
/// instance_count)` to be beyond the boundaries of that buffer. This is
/// expected behavior, and the shader will perform the modulo operation
/// correctly to look the particle up in the buffer.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes if this was fixed by the time the indirect render call is executed, but that doesn't seem to be the case. See other comment in the vfx_update.wgsl shader.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See other comment: I believe this is harmless.

@pcwalton
Copy link
Contributor Author

@djeedai Could you explain what you mean by the separate sub-effect? Like trails are a separate EffectAsset, and the main effect just dumps new particles into the other effect? There would need to be a way to convert one particle to the other type, as well as a new attribute to store the index of the most recent particle in the other trail effect (so that they can form a linked list--this is important for ribbons.)

If that's what you mean, then I'll close this PR as this will require a complete rewrite.

@djeedai
Copy link
Owner

djeedai commented Feb 27, 2024

If that's what you mean, then I'll close this PR as this will require a complete rewrite.

Don't. This is a longer term goal that's absolutely not reachable easily now, as it's missing too many pieces. So the current PR is fine. I just wanted to seed some discussion and think ahead.

What I mean is a single EffectAsset describes a single "visual effect" authored by an artist, and that "effect" can be composed of multiple sub-effects, each one with its own modifier stack etc. and possibly different outputs. And yes, that means there's a mechanism to trigger spawning between those sub-effects, likely without CPU intervention. Today the EffectAsset contains a single linear stack of modifiers for init/update/render, in that order, executed in sequence each frame. If an artists wants to have an effect with multiple visuals, they need multiple EffectAsset, and somehow coordinate them, which is almost impossible. Tomorrow, we can imagine that there could be multiple init/update/render nodes organized in a graph (and not a stack) and connected to each other. That means for example we could have 2 sources of init/spawn feeding a single update (single particle buffer), so we can easily share a buffer for, say, spawning in two different locations, which is a feature many have asked. Or, have a same update / particle buffer feed into 2 different outputs (render a quad AND render a mesh, for each particle). In that vision, your "the head particle requires a different rendering" just becomes 1 output for the particle heads, and one output for the trails themselves. And the trails have no special casing for the first particle, because they only render the trail part.

@pcwalton
Copy link
Contributor Author

pcwalton commented Feb 27, 2024

All of that makes sense to me. I actually implemented a very crude version of that in the Hanabi Workshop, in the form of particle systems that combine multiple effects into one and are spawned as a group. As you mentioned, this functionality is necessary for many effects.

I think it'd be best to implement trails in a way that doesn't require a massive rewrite when we move to nodes.

Here's a strawperson short-term proposal. What if we had something like "particle groups", each of which had its own list of indices on GPU? A particle belongs to a single group for its lifetime. Each modifier would specify which group or groups it applies to. The spawner would likewise have be annotated with a set of groups that it spawns particles into. Each particle group would be rendered in a separate drawcall, allowing for differing mesh topology per group. By itself, this wouldn't provide any new functionality, but it would open the door for new functionality in the following ways:

  1. We could add a "Duplicate Particle" modifier that duplicates a particle on a controllable schedule and assigns the duplicate to one or more groups. This would allow trails to be implemented. You'd have two groups, a "head" group and a "trail" group, and all the update modifiers would only apply to the "head" group. The only update modifier that the "trail" group would have would be a Kill Particle modifier based on age. This also immediately allows different rendering for heads and trails.

  2. We could add multiple spawners, each with its own set of groups that it spawns into, which would allow for multiple init sources, each with their own schedules.

  3. Eventually, we'd be able to compile an arbitrary node graph down to this "linear modifiers and particle group" format. We're going to have to compile the node graph into something, and the particle group format seems particularly efficient at minimizing the number of drawcalls and compute invocations. I think that every conceivable DAG can be compiled into the particle group format, so this provides a smooth migration internally.

In order to implement trails, we'd need the particle group infrastructure and the "Duplicate Particle" modifier. This would be more flexible than this PR and would be forward-compatible with nodes. I think it'd also quite possibly be less code than this PR, because it'd eliminate the necessity of the chunking infrastructure and the double ring buffers. What do you think?

@pcwalton
Copy link
Contributor Author

pcwalton commented Feb 28, 2024

Thinking about it more, I'm actually unsure how you'd implement node graphs without some sort of "particle groups" system under the hood. You clearly can't have an output buffer per node; that would be way too much copying of indices. So you need to optimize the node graph into the minimum set of output buffers by counting the total number of paths through the DAG and assigning particles to each of the buffers as necessary. That's precisely equivalent to particle groups as I described them.

So, to reiterate, my proposal would be: (1) implement particle groups; (2) implement enough features on top of particle groups to support trails; (3) add the node graph feature as a layer on top of particle groups.

@@ -144,6 +144,10 @@ required-features = [ "bevy/bevy_winit", "bevy/bevy_pbr", "bevy/png", "3d" ]
name = "2d"
required-features = [ "bevy/bevy_winit", "bevy/bevy_sprite", "2d" ]

[[example]]
name = "worms"
required-features = [ "bevy/bevy_winit", "bevy/bevy_pbr", "3d" ]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs bevy/png too for loading the circle asset.

@pcwalton
Copy link
Contributor Author

pcwalton commented Mar 5, 2024

Closing as #296 obsoletes this.

@pcwalton pcwalton closed this Mar 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants