Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Instanced BoundingBoxRenderer #15911

Open
wants to merge 8 commits into
base: master
Choose a base branch
from
Open

Instanced BoundingBoxRenderer #15911

wants to merge 8 commits into from

Conversation

kzhsw
Copy link
Contributor

@kzhsw kzhsw commented Nov 29, 2024

Background

Currently in BoundingBoxRenderer, it renders in a loop, calling engine.drawElementsType for each bounding box in renderList, making every bounding box 1 or 2 draw call.
Since it renders on web, it would suffer the same performance issue like meshes when draw calls increases.

Proposal

Since only the worldMatrix changes for different bounding boxes, it should benefit from instancing, like thin instances for meshes.

  1. Allocate buffers for instancing.
  2. Loop renderList and fill buffers.
  3. Copy buffers to gpu.
  4. Render instanced.

With instancing the draw call needed for each BoundingBoxRenderer could be reduced to 1 or 2 (in case this.showBackLines enabled)

Example:
https://playground.babylonjs.com/#37HN68#14

Alternatives

Since there are very few[1] vertex needed, and there is already vectorsWorld in BoundingBox, which updates every time world matrix updates, the computation of matrices can be skipped, and reuse computation result of vectorsWorld to rebuild vertex buffer every time it renders. In this case _indexBuffer needs to be reconstructed is count of bounding box mismatches.

[1]: 24 points in case of CreateBoxVertexData, or 8 points if constructed manually, since uvs and normals are not needed for rendering as lines

Compatibility

  1. It's disabled by default for backwards compatibility.
  2. It could result in a difference in rendering result if
    showBackLines enabled, because drawing the black/white part of each box one after the other can be different from drawing the black part of all boxes and then the white part.
  3. Events of onBeforeBoxRenderingObservable and onAfterBoxRenderingObservable would only be triggered once
    for one rendering, instead of once every bounding box. Events would be triggered with a dummy box to keep backwards compatibility.

References

Forum post: https://forum.babylonjs.com/t/batched-or-instanced-boundingboxrenderer/54977

## Background

Currently in BoundingBoxRenderer, it renders in a loop, calling `engine.drawElementsType` for each bounding box in `renderList`, making every bounding box 1 or 2 [draw call](https://doc.babylonjs.com/features/featuresDeepDive/scene/optimize_your_scene/#reducing-draw-calls).
Since it renders on web, it would suffer the same performance issue like meshes when draw calls increases.

## Proposal

Since only the `worldMatrix` changes for different bounding boxes, it should benefit from instancing, like [thin instances](https://doc.babylonjs.com/features/featuresDeepDive/mesh/copies/thinInstances) for meshes.

1. Allocate buffers for instancing.
2. Loop renderList and fill buffers.
3. Copy buffers to gpu.
4. Render instanced.

With instancing the draw call needed for each BoundingBoxRenderer could be reduced to 1 or 2 (in case `this.showBackLines` enabled)

Example:
https://playground.babylonjs.com/#37HN68#4

## Alternatives

Since there are very few[1] vertex needed, and there is already [`vectorsWorld`](https://github.com/BabylonJS/Babylon.js/blob/73ab4d666c0506a5fe551ce852a23e0e8ec0d15a/packages/dev/core/src/Culling/boundingBox.ts#L43) in `BoundingBox`, which [updates](https://github.com/BabylonJS/Babylon.js/blob/73ab4d666c0506a5fe551ce852a23e0e8ec0d15a/packages/dev/core/src/Culling/boundingBox.ts#L166) every time world matrix updates, the computation of matrices can be skipped, and reuse computation result of `vectorsWorld` to rebuild vertex buffer every time it renders. In this case `_indexBuffer` needs to be reconstructed is count of bounding box mismatches.

[1]: 24 points in case of `CreateBoxVertexData`, or 8 points if constructed manually, since uvs and normals are not needed for rendering as lines

## Compatibility

1. It's disabled by default for backwards compatibility.
2. It could result in a difference in rendering result if
`showBackLines` enabled, because drawing the black/white part of each box one after the other can be different from drawing the black part of all boxes and then the white part.
3. Events of `onBeforeBoxRenderingObservable` and `onAfterBoxRenderingObservable` would only be triggered once
for one rendering, instead of once every bounding box. Events would be triggered with a dummy box to keep backwards compatibility.

## References

Forum post: <https://forum.babylonjs.com/t/batched-or-instanced-boundingboxrenderer/54977>
@bjsplat
Copy link
Collaborator

bjsplat commented Nov 29, 2024

Please make sure to label your PR with "bug", "new feature" or "breaking change" label(s).
To prevent this PR from going to the changelog marked it with the "skip changelog" label.

@bjsplat
Copy link
Collaborator

bjsplat commented Nov 29, 2024

@bjsplat
Copy link
Collaborator

bjsplat commented Nov 29, 2024

@bjsplat
Copy link
Collaborator

bjsplat commented Nov 29, 2024

Copy link
Contributor

@Popov72 Popov72 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work!

packages/dev/core/src/Rendering/boundingBoxRenderer.ts Outdated Show resolved Hide resolved
* If true, the buffer would be updated instead of recreated when rendering a frame,
* this could reduce memory pressure.
*/
public updatable = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need this variable? Just keep the "updatable=true" case in the code, as I can't see why a user would want to set updatable=false (recreating the vertex buffer every frame will likely always be slower than updating it).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docs of Mesh said that using static buffer could have better performances, but in recent tests on a chrome, recreating it leads to more memory pressure, so this is kept and defaults to true

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, performance is better when static=true if you create the buffers once and reuse them. But in your case, you recreate the buffers every frame when updatable=false. So even if the buffers are created with static=false, it's probably slower than creating the buffers once with static=false and updating them afterwards. You should run some tests to make sure, as we don't want to keep a code path that will never be used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I'll run some profiling on chrome and firefox, but I does not have access to safari profiler. Also, since the buffer would be recreated when size change or useInstances change, the buffer recreation code is needed, so there is few extra code for this option.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary:

Configuration Browser _renderInstanced Heap
updatable=true Chrome 65.1% 102 MB
Firefox 69% 156.33 MB
updatable=false Chrome 69.0% 103 MB
Firefox 72% 154.20 MB
Details

updatable=true:
Chrome:
_renderInstanced: 65.1%
Heap: 102MB
performance
buttom-up
heap

Firefox:
_renderInstanced: 69%
Heap: 156.33MB
call tree
flame graph
heap

updatable=false:
Chrome:
_renderInstanced: 69.0%
Heap: 103MB
performance
buttom-up
heap

Firefox:
_renderInstanced: 72%
Heap: 154.20MB
call tree
flame graph
heap

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let s not make upadtable a configuration and optimize for the most common case as it adds extra complexity and maintenance.

packages/dev/core/src/Rendering/boundingBoxRenderer.ts Outdated Show resolved Hide resolved
packages/dev/core/src/Rendering/boundingBoxRenderer.ts Outdated Show resolved Hide resolved
packages/dev/core/src/Rendering/boundingBoxRenderer.ts Outdated Show resolved Hide resolved
packages/dev/core/src/Rendering/boundingBoxRenderer.ts Outdated Show resolved Hide resolved
packages/dev/core/src/Rendering/boundingBoxRenderer.ts Outdated Show resolved Hide resolved
packages/dev/core/src/Rendering/boundingBoxRenderer.ts Outdated Show resolved Hide resolved
packages/dev/core/src/Rendering/boundingBoxRenderer.ts Outdated Show resolved Hide resolved
packages/dev/core/src/Rendering/boundingBoxRenderer.ts Outdated Show resolved Hide resolved
packages/dev/core/src/Rendering/boundingBoxRenderer.ts Outdated Show resolved Hide resolved
packages/dev/core/src/Rendering/boundingBoxRenderer.ts Outdated Show resolved Hide resolved
packages/dev/core/src/Rendering/boundingBoxRenderer.ts Outdated Show resolved Hide resolved
packages/dev/core/src/Rendering/boundingBoxRenderer.ts Outdated Show resolved Hide resolved
packages/dev/core/src/Rendering/boundingBoxRenderer.ts Outdated Show resolved Hide resolved
Apply suggestions from code review

Co-authored-by: Popov72 <[email protected]>
@@ -90,6 +91,15 @@ Object.defineProperty(AbstractMesh.prototype, "showBoundingBox", {
configurable: true,
});

const mat1 = Matrix.Identity();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we rename mat1 vec and vec2 to better let the devs know they are temp values ? also whey not using babylon TMP values MathTmp.Vector3[0] ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra work needs to be done to evaluate that the use of TmpVectors does not conflict with other internal uses, maybe in getWorldMatrix and other places, also, MathTmp is not exported.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NP, so lets rename only

* If true, the buffer would be updated instead of recreated when rendering a frame,
* this could reduce memory pressure.
*/
public updatable = true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let s not make upadtable a configuration and optimize for the most common case as it adds extra complexity and maintenance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants