Brush is a 3D reconstruction engine, using Gaussian splatting. It aims to be highly portable, flexible and fast. 3D reconstruction should be accessible to everyone!
Brush works on a wide range of systems: macOS/Windows/Linux, AMD/Nvidia cards, Android, and in a browser. To achieve this, it uses WebGPU-compatible tech, like the Burn machine learning framework, which has a portable wgpu backend.
This project is currently still a proof of concept: it doesn't yet implement the many extensions to Gaussian splatting that have been developed, and performance is not yet optimal.
Try the (experimental) web demo
NOTE: This currently only works on desktop Chrome 129+ (as of Oct 2024). Firefox and Safari will hopefully be supported soon, but currently even Firefox Nightly and Safari Technology Preview do not work.
The demo can load pretrained ply splats, and can load datasets to train on. Currently only two formats are supported. A .zip file containing:
- An images & sparse folder with COLMAP data
- A .json and images, like the nerfstudio format.
  - You can specify a custom transforms_train.json and transforms_eval.json split.
While training you can interact with the scene and see the training dynamics live, and compare the current rendering to training / eval views as the training progresses.
While training, additional data can be visualized with the excellent rerun. To install rerun on your machine, please follow their instructions. Open the ./brush_blueprint.rbl in the viewer for best results.
brush_android_compressed.mp4
Training on a Pixel 7
Machine learning for real-time rendering has a lot of potential, but most popular ML tools don't align well with it. Rendering requires low latency and usually involves dynamic shapes, and it's not pleasant to attempt to ship apps with large PyTorch/Jax/CUDA deps calling out to Python in a rendering loop. The usual fix is to write separate training and inference applications. Brush, on the other hand, is written in Rust using wgpu and burn, so it can produce simple dependency-free binaries and can run on nearly all devices.
Install Rust 1.81+ and run cargo run or cargo run --release. You can run tests with cargo test --all. Brush uses the wonderful rerun for additional visualizations while training.
It currently requires rerun 0.19 however, which isn't released yet.
Simply cargo run or cargo run --release from the workspace root.
Note: Linux has not yet been tested but should work. Windows works well, but currently only on Vulkan.
This project uses trunk to build for the web. Install trunk, and then run trunk serve or trunk serve --release to run a development server.
WebGPU is still a new standard, and as such, only the latest versions of Chrome work currently. Firefox nightly should work but unfortunately crashes currently.
The public web demo is registered for the subgroups origin trial. To run the web demo for yourself, please enable the "Unsafe WebGPU support" flag in Chrome.
To build on Android, see the more detailed README instructions at crates/brush-android.
Brush should work on iOS, but there is currently no project set up to do so.
Brush is split into various crates. A quick overview of the different responsibilities:
- brush-render is the main crate that pulls together the kernels into rendering functions.
- brush-train has code to actually train Gaussians, and handles larger scale optimizations like splitting/cloning gaussians etc.
- brush-train-loop is the default training loop using brush-train.
- brush-app handles the UI and integrating the training loop. This is also the binary target for the web, and macOS/Windows/Linux.
- brush-android handles running on Android.
- brush-wgsl handles some kernel inspection for generating CPU-side structs and interacting with naga-oil to handle shader imports.
- brush-dataset handles importing different training data formats.
- brush-prefix-sum and brush-sort are only compute kernels and should be largely independent of Brush (other than brush-wgsl).
- rrfd is a small extension of rfd.
The kernels are written in a "sparse" style, that is, only work for visible gaussians is done, though the final calculated gradients are dense. Brush uses a GPU radix sort based on FidelityFX (see crates/brush-sort). The sorting is done in two parts: first splats are sorted only by depth, then they are sorted by their tile ID. This saves some sorting time compared to sorting by depth and tile ID at the same time.
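The two-part sort described above can be sketched on the CPU. This is only an illustration, not Brush's GPU implementation: the `Splat` struct and the function names are hypothetical, and Rust's stable slice sort stands in for the radix sort. It assumes the second pass is stable, so the depth order established by the first pass survives within each tile.

```rust
/// Hypothetical projected splat: a depth and the IDs of the screen tiles it overlaps.
struct Splat {
    depth: f32,
    tiles: Vec<u32>,
}

/// Two-part ordering: sort splats by depth, then stably sort intersections by tile ID.
fn two_pass(splats: &[Splat]) -> Vec<(u32, usize)> {
    // Pass 1: order splat indices front to back.
    let mut order: Vec<usize> = (0..splats.len()).collect();
    order.sort_by(|&a, &b| splats[a].depth.total_cmp(&splats[b].depth));

    // Emit one (tile, splat index) intersection per overlapped tile, in depth order.
    let mut hits: Vec<(u32, usize)> = Vec::new();
    for &i in &order {
        for &tile in &splats[i].tiles {
            hits.push((tile, i));
        }
    }

    // Pass 2: stable sort on the tile ID only; depth order within a tile is preserved.
    hits.sort_by_key(|&(tile, _)| tile);
    hits
}

/// Reference: a single sort on the combined (tile ID, depth) key.
fn combined(splats: &[Splat]) -> Vec<(u32, usize)> {
    let mut hits: Vec<(u32, usize)> = Vec::new();
    for (i, splat) in splats.iter().enumerate() {
        for &tile in &splat.tiles {
            hits.push((tile, i));
        }
    }
    hits.sort_by(|a, b| {
        a.0.cmp(&b.0)
            .then(splats[a.1].depth.total_cmp(&splats[b.1].depth))
    });
    hits
}
```

For splats with distinct depths, `two_pass` and `combined` return the same list. The second pass only has to compare tile IDs, which is where a radix sort can save work because its keys are shorter.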
Compatibility with WebGPU does bring some challenges, even with (the excellent) wgpu.
- WebGPU lacks native atomic floating point additions, and a software CAS loop has to be used.
- GPU readbacks have to be async on WebGPU. A rendering pass can't do this unless the whole rendering becomes async, which has its own perils, and isn't great for a UI. The reference tile renderer requires reading back the number of "intersections" (each visible tile of a gaussian is one intersection), but this is not feasible. This is worked around by assuming a worst case. To reduce the number of tiles, the rasterizer culls away unused tiles by intersecting the gaussian ellipses with the screen-space tiles.
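The CAS loop mentioned in the first point above can be illustrated with a CPU analogue. This is a minimal sketch, not the WGSL kernel code used in Brush: it stores the float's bit pattern in an `AtomicU32` and retries a compare-exchange until no other thread has modified the value in between. The names `atomic_add_f32` and `accumulate_in_parallel` are made up for this example.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Adds `value` to the f32 whose bits are stored in `cell`; returns the previous value.
fn atomic_add_f32(cell: &AtomicU32, value: f32) -> f32 {
    let mut old_bits = cell.load(Ordering::Relaxed);
    loop {
        // Compute the new value from the bits we last observed.
        let new_bits = (f32::from_bits(old_bits) + value).to_bits();
        // Publish it only if nobody changed the cell in the meantime.
        match cell.compare_exchange_weak(
            old_bits,
            new_bits,
            Ordering::AcqRel,
            Ordering::Relaxed,
        ) {
            Ok(_) => return f32::from_bits(old_bits),
            // Lost the race (or failed spuriously): retry with the fresh bits.
            Err(current) => old_bits = current,
        }
    }
}

/// Spawns `threads` workers that each add 1.0 to a shared cell `adds_per_thread` times.
fn accumulate_in_parallel(threads: usize, adds_per_thread: usize) -> f32 {
    let cell = AtomicU32::new(0.0f32.to_bits());
    std::thread::scope(|s| {
        for _ in 0..threads {
            s.spawn(|| {
                for _ in 0..adds_per_thread {
                    atomic_add_f32(&cell, 1.0);
                }
            });
        }
    });
    f32::from_bits(cell.load(Ordering::Relaxed))
}
```

Under contention each add may retry several times, which is the cost of emulating the operation in software rather than using a native floating point atomic.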
The WGSL kernels use naga_oil to manage imports. brush-wgsl additionally does some reflection to generate Rust code to send uniform data to a kernel. In the future, it might be possible to port the kernels to Burn's new CubeCL language, which is much more ergonomic and would allow generating CUDA / ROCm kernels. It might also be possible to integrate with George Kopanas' Slang kernels.
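To give a rough idea of what a CPU-side struct for uniform data can look like, here is a hand-written sketch. It is not the output of brush-wgsl: the `Uniforms` struct, its fields, and `to_bytes` are hypothetical, and it assumes a WGSL struct of two `vec2<u32>` members followed by a `vec4<f32>`, which packs into 32 bytes without padding.

```rust
/// Hypothetical CPU-side mirror of an assumed WGSL uniform struct:
///   struct Uniforms { img_size: vec2<u32>, tile_bounds: vec2<u32>, background: vec4<f32> }
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
struct Uniforms {
    img_size: [u32; 2],    // byte offset 0
    tile_bounds: [u32; 2], // byte offset 8
    background: [f32; 4],  // byte offset 16 (vec4 requires 16-byte alignment)
}

impl Uniforms {
    /// Size in bytes of the struct as laid out in the uniform buffer.
    const SIZE: usize = 32;

    /// Serializes the fields as little-endian 32-bit words, in declaration order.
    fn to_bytes(&self) -> [u8; Self::SIZE] {
        let words: [u32; 8] = [
            self.img_size[0],
            self.img_size[1],
            self.tile_bounds[0],
            self.tile_bounds[1],
            self.background[0].to_bits(),
            self.background[1].to_bits(),
            self.background[2].to_bits(),
            self.background[3].to_bits(),
        ];
        let mut out = [0u8; Self::SIZE];
        for (chunk, word) in out.chunks_exact_mut(4).zip(words) {
            chunk.copy_from_slice(&word.to_le_bytes());
        }
        out
    }
}
```

The resulting byte array is what would be uploaded to a uniform buffer. Generating such structs from the shader source keeps the CPU and GPU layouts from drifting apart.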
Rendering performance is expected to be very competitive with gSplat, while training performance is still a bit slower. You can run some benchmarks using cargo bench. The splatting forward and backward kernels are faster than the legacy gSplat kernels, as they use some new techniques for better performance, but they haven't yet been compared to the more recent gSplat kernels. End-to-end training performance is also still slower, due to other overheads.
For additional profiling, you can use tracy and run with cargo run --release --features=tracy.
Quality is similar, but for now still somewhat lagging behind the original GS implementation. This is likely due to some suboptimal splitting/cloning heuristics.
| Scene | Brush | GS paper |
|---|---|---|
| Bicycle@7k | 23.4 | 23.604 |
| Garden@7k | 26.1 | 26.245 |
| Stump@7k | 25.3 | 25.709 |
gSplat, for their reference version of the kernels
Peter Hedman, George Kopanas & Bernhard Kerbl, for the many discussions & pointers.
The Burn team, for help & improvements to Burn along the way
Raph Levien, for the original version of the GPU radix sort.
This is not an official Google product. This repository is a forked public version of the google-research repository