Merge branch 'perf-tests'
Scthe committed Aug 26, 2024
2 parents 501f019 + 91f8e51 commit 3acd5df
Showing 33 changed files with 941 additions and 137 deletions.
29 changes: 15 additions & 14 deletions README.md
@@ -31,13 +31,15 @@ https://github.com/user-attachments/assets/02859b92-a940-42b6-8381-dcac4b81b4d4
* The second pass is dispatched for every tile and blends its hair segments in a front-to-back order. Done by dividing each depth bin into slices, assigning segments to each, and blending.
* It uses a task queue internally. Each "processor" grabs the next tile from a list once it's done with the current tile.
* Separate [strand-space shading calculation](https://youtu.be/ool2E8SQPGU?si=T0YirLDpKp83CjD2&t=1339). Instead of calculating shading for every pixel, I precalculate the values for every strand. You can select how many points are shaded for each strand. The last point always fades to transparency for a nice, thin tip.
* **Kajiya-Kay diffuse, Marschner specular.** Although I do not calculate depth maps for lights, so TT lobe's weight is 0 by default. I like how the current initial scene looks and reconfiguring lights is booooring!
* **Kajiya-Kay diffuse, Marschner specular.** However, I do not calculate depth maps for lights, so TT lobe's weight is 0 by default. I like how the current initial scene looks and reconfiguring lights is booooring!
* **Fake multiple scattering** [like in UE5](https://blog.selfshadow.com/publications/s2016-shading-course/karis/s2016_pbs_epic_hair.pdf#page=39). See "Physically based hair shading in Unreal" by Brian Karis slide 39 if SIGGRAPH does not allow link.
* **Fake attenuation** mimicking [Beer–Lambert law](https://en.wikipedia.org/wiki/Beer%E2%80%93Lambert_law).
* It also **casts and receives shadows as well as AO**. You can also randomize some settings for each strand.
* [LOD](https://youtu.be/ool2E8SQPGU?si=Zv-1N5Y4-nWvlB6v&t=1643) - the user has a strand% slider. In a production system, you would automate this and increase hair width with distance. The randomization happens [in my blender exporter](scripts/tfx_exporter.py).
* [LOD](https://youtu.be/ool2E8SQPGU?si=Zv-1N5Y4-nWvlB6v&t=1643). The user has a strand% slider. In a production system, you would automate this and increase hair width with distance. The randomization happens [in my blender exporter](scripts/tfx_exporter.py).
* [Tile sort](https://youtu.be/ool2E8SQPGU?si=85yOaqCmYkUR9nHL&t=1803). Ensures stable frametimes. Sorting is approximate (buckets).
* Blender exporter for the older Blender hair system. It's actually the same file format as I've used in my TressFX ports ([1](https://github.com/Scthe/TressFX-OpenGL), [2](https://github.com/Scthe/WebFX), [3](https://github.com/Scthe/Rust-Vulkan-TressFX)).
* Uses [Sintel Lite 2.57b](http://www.blendswap.com/blends/view/7093) by BenDansie as a 3D model. There were no changes to "make it work" or optimize. Only selecting how many points per each strand.
* You might notice that Sintel's hair is less dense than the one showcased in FIFA. This is actually not good as it means we have to process more depth bins/slices till the pixel/tile saturates. Reminds me of similar nonobvious tradeoffs from [Nanite WebGPU](https://github.com/Scthe/nanite-webgpu/tree/master). On the other hand, the tile pass is cheaper.
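
The front-to-back blending with an early exit once a pixel saturates, as described in the list above, can be sketched roughly as follows. This is a CPU-side TypeScript illustration and not the shader from this repository; `blendFrontToBack`, the `RGBA` tuple, and the `saturationAlpha` threshold are hypothetical names picked for the example.

```typescript
// Hypothetical sketch: blend hair samples sorted front-to-back.
// Each sample is [r, g, b, alpha] with non-premultiplied color.
type RGBA = [number, number, number, number];

function blendFrontToBack(
  samples: RGBA[],
  saturationAlpha = 0.999 // assumed threshold, not taken from the repo
): { color: RGBA; used: number } {
  let r = 0;
  let g = 0;
  let b = 0;
  let a = 0;
  let used = 0;
  for (const [sr, sg, sb, sa] of samples) {
    // Early out: nothing behind a saturated pixel can contribute.
    if (a >= saturationAlpha) break;
    // Only the still-uncovered fraction (1 - a) is visible to this sample.
    const w = (1 - a) * sa;
    r += w * sr;
    g += w * sg;
    b += w * sb;
    a += w;
    used++;
  }
  return { color: [r, g, b, a], used };
}
```

Sparse hair produces low-alpha samples, so more of them are consumed before the loop can exit. That matches the tradeoff noted for Sintel's hair above.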

### Features: Physics simulation

@@ -66,22 +68,21 @@ Check [src/constants.ts](src/constants.ts) for full documentation.

I'm using Robin Taillandier and Jon Valdes's presentation ["Every Strand Counts: Physics and Rendering Behind Frostbite’s Hair"](https://www.youtube.com/watch?v=ool2E8SQPGU) as a reference point.

* No skinning to triangles. If a character has a beard, it should move based on the underlying mesh.
* There is a [pass that takes all strands and writes their shaded values](https://youtu.be/ool2E8SQPGU?si=HKPzUIWsHh75qBps&t=1333) (in strand-space) into a buffer. I do this for every strand, Frostbite only for visible ones. This pass is entirely separate from rasterization.
* No hair color from texture. The shading pass has the `strandIdx`, so it's a matter of fetching uv and sampling texture.
* Frostbite uses a software rasterizer to write to a depth (and maybe normal) buffer. This is a bit of a problem because of how software rasterizers work. So I re-render the hair using a hardware rasterizer just for depth and normals. Only the color is software rasterized.
* **No skinning to triangles.** If a character has a beard, it should move based on the underlying mesh.
* We both have a [pass that takes all strands and writes their shaded values](https://youtu.be/ool2E8SQPGU?si=HKPzUIWsHh75qBps&t=1333) (in strand-space) into a buffer. I do this for every strand, **Frostbite only for visible ones**.
* **No hair color from texture.** The shading pass has the `strandIdx`, so it's a matter of fetching uv and sampling texture. This tech was not needed for my demo app.
* **Frostbite uses a software rasterizer to write to a depth (and maybe normal) buffer.** This is a problem because of how software rasterizers work. **So I re-render the hair using a hardware rasterizer just for depth and normals.** Only the color is software rasterized.
* Depth is not a problem (just an atomic op on a separate buffer), normals are. However, the Frostbite presentation does not mention normals. Don't they need them for AO or other stuff? Hair shading can omit AO (I even have supplementary [Beer–Lambert law](https://en.wikipedia.org/wiki/Beer%E2%80%93Lambert_law) attenuation). But what about the skin from which the hair grows? Is it faked in diffuse texture? Or is the hair always dense?
* I also use a hardware rasterizer to render hair into shadow maps. Again, it's not complicated, but someone would have to spend time writing it. And I can't be bothered.
* No pre-sorting of tiles, which can result in some frames taking a bit longer than others.
* No curly hair subdivisions.
* The algorithm they use is part of my Blender exporter. In Blender, each hair is a spline. I convert it to equidistant points. Although implementing this in software rasterizer is *a bit* different.
* No specialized support for [headgear](https://youtu.be/ool2E8SQPGU?si=aAFV_WnUwxJPoIRM&t=2071) like headbands. In Frostbite it requires content authoring to mark selected points as non-dynamic.
* No automatic LODs. Instead, you have a slider that works [exactly like Frostbite's system](https://youtu.be/ool2E8SQPGU?si=NTmreF8azhRz4sVB&t=1646). I randomize the strand order in my Blender exporter.
* A different set of constraints. We both have stretch/length constraints and colliders (both Signed Distance Fields and primitives).
* **No curly hair subdivisions.**
* The algorithm they use is part of my Blender exporter. In Blender, each hair is a spline. I convert it to equidistant points. However, implementing this in software rasterizer is *a bit* different.
* **No specialized support for [headgear](https://youtu.be/ool2E8SQPGU?si=aAFV_WnUwxJPoIRM&t=2071) like headbands.** Frostbite requires content authoring to mark selected points as non-dynamic.
* **LOD is manual instead of automatic.** Frostbite [automatically calculates rendered strand count](https://youtu.be/ool2E8SQPGU?si=NTmreF8azhRz4sVB&t=1646). I give you control over this parameter.
* **I simulate all hair strands. Frostbite can choose how much and interpolate the rest.**
* **A different set of constraints.** We both have stretch/length constraints and colliders (both Signed Distance Fields and primitives).
* I have extra global shape constraints, based on my experience with [TressFX](https://github.com/Scthe/Rust-Vulkan-TressFX). I assume that Frostbite also has this, but maybe under a different term (like "shape matching")?
* Frostbite has a global length constraint.
* We have different implementations for local shape constraints. Mine is based on "A Triangle Bending Constraint Model for Position-Based Dynamics" - [Kelager10](http://image.diku.dk/kenny/download/kelager.niebe.ea10.pdf).
* I simulate all hair strands. Frostbite can choose how much and interpolate the rest.

Some things were not explained in the presentation, so I gave my best guess. E.g. the aero grid update step takes wind and colliders as input. But does it do fluid simulation for nice turbulence and vortexes? Possible, but not likely. I just mark 3 regions: lull (inside the mesh), half-lull (grid point is shielded by a collider, half strength), and full strength.
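
A minimal sketch of the three-region classification (lull, half-lull, full strength) might look like the code below. It assumes a single sphere collider and a normalized wind direction; the function name `windScale`, the `Sphere` type, and the ray-versus-sphere shielding test are illustrative assumptions, not the code used in this repository.

```typescript
// Hypothetical sketch: classify one aero grid point against one sphere collider.
type Vec3 = [number, number, number];
interface Sphere {
  center: Vec3;
  radius: number;
}

const sub = (a: Vec3, b: Vec3): Vec3 => [a[0] - b[0], a[1] - b[1], a[2] - b[2]];
const dot = (a: Vec3, b: Vec3): number => a[0] * b[0] + a[1] * b[1] + a[2] * b[2];

/** Returns 0 (lull), 0.5 (half-lull) or 1 (full strength). `windDir` must be normalized. */
function windScale(point: Vec3, windDir: Vec3, collider: Sphere): number {
  const toCenter = sub(collider.center, point);
  const dist2 = dot(toCenter, toCenter);
  const r2 = collider.radius * collider.radius;
  // Inside the collider: no wind at all.
  if (dist2 < r2) return 0;

  // Cast a ray from the grid point towards where the wind comes from.
  const upwind: Vec3 = [-windDir[0], -windDir[1], -windDir[2]];
  const t = dot(toCenter, upwind); // distance along the ray to the point closest to the center
  if (t > 0) {
    const closest2 = dist2 - t * t; // squared distance from the ray to the sphere center
    // The ray hits the sphere, so the collider shields this grid point.
    if (closest2 < r2) return 0.5;
  }
  return 1;
}
```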

@@ -91,7 +92,7 @@ Ofc. I cannot rival Frostbite's performance. I am a single person and I have muc
## Usage

* Firefox does not support WebGPU. Use Chrome instead.
* Use the `[W, S, A, D]` keys to move and `[Z, SPACEBAR]` to fly up or down. `[Shift]` to move faster. `[E]` to toggle depth pyramid debug mode.
* Use the `[W, S, A, D]` keys to move and `[Z, SPACEBAR]` to fly up or down. `[Shift]` to move faster.
* As all browsers enforce VSync, use the "Profile" button for accurate timings.

### Running the app locally
3 changes: 2 additions & 1 deletion deno.json
@@ -2,7 +2,8 @@
"tasks": {
"start": "DENO_NO_PACKAGE_JSON=1 && deno run --allow-read=. --allow-write=. --unstable-webgpu src/index.deno.ts",
"compile": "DENO_NO_PACKAGE_JSON=1 && deno compile --allow-read=. --allow-write=. --unstable-webgpu src/index.deno.ts",
"test": "DENO_NO_PACKAGE_JSON=1 && deno test --allow-read=. --allow-write=. --unstable-webgpu src"
"test": "DENO_NO_PACKAGE_JSON=1 && deno test --allow-read=. --allow-write=. --unstable-webgpu src",
"testSort": "DENO_NO_PACKAGE_JSON=1 && deno test --allow-read=. --allow-write=. --unstable-webgpu src/passes/swHair/hairTileSortPass.test.ts"
},
"imports": {
"png": "https://deno.land/x/[email protected]/mod.ts",
3 changes: 3 additions & 0 deletions makefile
@@ -15,6 +15,9 @@ run:
test:
$(DENO) task test

testSort:
$(DENO) task testSort

# Generate .exe
compile:
$(DENO) task compile
24 changes: 19 additions & 5 deletions src/constants.ts
@@ -43,12 +43,18 @@ type RGBColor = [number, number, number];

export const DISPLAY_MODE = {
FINAL: 0,
/** Hair tiles using segment count per-tile buffer */
TILES: 1,
HW_RENDER: 2,
USED_SLICES: 3,
DEPTH: 4,
NORMALS: 5,
AO: 6,
/** Hair tiles using PPLL */
TILES_PPLL: 2,
/** Hardware rasterizer */
HW_RENDER: 3,
/** HairFinePass' slices per pixel. Not super accurate due to per pixel/tile early-out optimizations */
USED_SLICES: 4,
/** zBuffer clamped to sensible values */
DEPTH: 5,
NORMALS: 6,
AO: 7,
};

export type HairFile =
Expand Down Expand Up @@ -236,6 +242,10 @@ export const CONFIG = {
*/
invalidTilesPerSegmentThreshold: 64,

////// SORT PASS
sortBuckets: 64,
sortBucketSize: 16,

////// FINE PASS
/** This is like slices per pixel in original Frostbite presentation, but the slices are inside each depth bin */
slicesPerPixel: 8,
@@ -247,6 +257,10 @@ export const CONFIG = {
finePassWorkgroupSizeX: 1,
/** Where to store the PPLL slice heads data */
sliceHeadsMemory: 'workgroup' as SliceHeadsMemory,
/** Given distance between pixel and strand, how to calculate alpha? Can be linear 0-1 from strand edge to middle. Or quadratic (faster, denser, but more error prone and 'blocky'). */
alphaQuadratic: false,
/** Alpha comes from pixel's distance to strand. Multiply to make strands "fatter". Faster pixel/tile convergence at the cost of anti-aliased (fuzzy) edges. */
alphaMultipler: 1.1,

////// LOD
lodRenderPercent: 100, // LOD %. Fun fact, performance is NOT linear. Range [0..100]
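
As a rough illustration of how the two alpha settings above could interact, here is a small sketch. It is an assumption about the math, not the shader code from this commit: `strandAlpha`, `halfWidth`, and the `1 - t * t` form of the quadratic falloff are hypothetical. The last parameter mirrors the `alphaMultipler` setting.

```typescript
// Hypothetical sketch: alpha from the pixel's distance to the strand's center line.
function strandAlpha(
  distance: number, // distance from the pixel to the strand center
  halfWidth: number, // half of the strand width, same units as `distance`
  quadratic = false, // mirrors CONFIG.alphaQuadratic
  alphaMultiplier = 1.1 // mirrors CONFIG.alphaMultipler
): number {
  // t = 0 in the middle of the strand, t = 1 at (or beyond) its edge.
  const t = Math.min(Math.max(distance / halfWidth, 0), 1);
  // Linear falloff, or an assumed quadratic falloff that stays denser near the middle.
  const base = quadratic ? 1 - t * t : 1 - t;
  // The multiplier makes strands "fatter"; clamp so alpha never exceeds 1.
  return Math.min(base * alphaMultiplier, 1);
}
```

With a multiplier above 1, pixels near the strand center reach full opacity sooner, which lets a pixel or tile saturate after fewer segments.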
13 changes: 7 additions & 6 deletions src/passes/README.md
@@ -19,12 +19,13 @@ Passes:
2. [ShadowMapPass](shadowMapPass) to update shadow map. Has separate `GPURenderPipeline` for meshes and hair. Uses a hardware rasterizer for hair, but you should change this if you have extra time.
3. [DrawMeshesPass](drawMeshes) draws solid objects. This also includes a special code for the ball collider.
4. [HairTilesPass](swHair/hairTilesPass.ts) software rasterizes hair segments into tiles. Or, to be more precise, into each tile's depth bins. Dispatches a thread for each hair segment.
5. [HairFinePass](swHair/hairFinePass.ts) software rasterizes each tile and writes the final pixel colors into the buffer. It contains the main part of the order-independent transparency implementation. It uses a task queue internally. Each "processor" grabs the next tile from a list once it's done with the current one. Dispatches a thread for each processor.
6. [HairCombinePass](hairCombine) writes the software-rasterized hair into the HDR texture. Has special code for debug modes.
7. Update depth and normal buffers using [hardware rasterizer](hwHair).
8. [AoPass](aoPass) - GTAO.
9. [HairShadingPass](hairShadingPass) updates the shading for each hair strand. Requires AO and normals. Dispatches a thread for each shading point on each hair strand.
1. You might consider moving this before the software rasterizer if you want.
5. [HairTileSortPass](swHair/hairTileSortPass.ts) sorts the tiles by the segment count (decreasing order). Used to better balance workload. The sorting is approximate (based on buckets).
6. [HairFinePass](swHair/hairFinePass.ts) software rasterizes each tile and writes the final pixel colors into the buffer. It contains the main part of the order-independent transparency implementation. It uses a task queue internally. Each "processor" grabs the next tile from a list once it's done with the current one. Dispatches a thread for each processor.
7. [HairCombinePass](hairCombine) writes the software-rasterized hair into the HDR texture. Has special code for debug modes.
8. Update depth and normal buffers using [hardware rasterizer](hwHair).
9. [AoPass](aoPass) - GTAO.
10. [HairShadingPass](hairShadingPass) updates the shading for each hair strand. Requires AO and normals. Dispatches a thread for each shading point on each hair strand.
1. You might consider moving this before the software rasterizer if you want.
3. Finish
1. [DrawGizmoPass](drawGizmo) renders the move gizmo for the ball collider.
2. [DrawSdfColliderPass](drawSdfCollider) and [DrawGridDbgPass](drawGridDbg) are debug views for physics simulation.
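
The approximate, bucket-based sort used by HairTileSortPass can be pictured with the CPU-side sketch below. It is only an illustration under stated assumptions: that each bucket spans `sortBucketSize` segments, that the last bucket collects everything above the range, and that empty tiles are skipped. The function name `approxSortTiles` is hypothetical and the real pass runs on the GPU.

```typescript
// Hypothetical sketch: order tile indices by segment count, decreasing, using buckets.
function approxSortTiles(
  segmentCounts: number[], // segment count for each tile, indexed by tile index
  buckets = 64, // mirrors CONFIG sortBuckets
  bucketSize = 16 // mirrors CONFIG sortBucketSize (assumed: segments per bucket)
): number[] {
  const bucketOf = (count: number): number =>
    Math.min(Math.floor(count / bucketSize), buckets - 1);

  // 1. Count how many non-empty tiles fall into each bucket.
  const sizes = new Array<number>(buckets).fill(0);
  for (const count of segmentCounts) {
    if (count > 0) sizes[bucketOf(count)]++;
  }

  // 2. Prefix sum, starting from the heaviest bucket so it lands first in the output.
  const offsets = new Array<number>(buckets).fill(0);
  let total = 0;
  for (let b = buckets - 1; b >= 0; b--) {
    offsets[b] = total;
    total += sizes[b];
  }

  // 3. Scatter tile indices. The order inside a bucket is not sorted further.
  const sorted = new Array<number>(total);
  segmentCounts.forEach((count, tileIdx) => {
    if (count > 0) sorted[offsets[bucketOf(count)]++] = tileIdx;
  });
  return sorted;
}
```

Feeding the heaviest tiles to the task queue first means no "processor" is left with a huge tile at the very end of the frame, which is what keeps frametimes stable.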
22 changes: 22 additions & 0 deletions src/passes/_shared/shared.ts
@@ -77,3 +77,25 @@ export const useDepthStencilAttachment = (
depthStoreOp,
};
};

// TODO [LOW] use everywhere
export const createComputePipeline = (
  device: GPUDevice,
  passClass: PassClass,
  shaderText: string,
  name = '',
  mainFn = 'main'
): GPUComputePipeline => {
  const shaderModule = device.createShaderModule({
    label: labelShader(passClass, name),
    code: shaderText,
  });
  return device.createComputePipeline({
    label: labelPipeline(passClass, name),
    layout: 'auto',
    compute: {
      module: shaderModule,
      entryPoint: mainFn,
    },
  });
};
2 changes: 2 additions & 0 deletions src/passes/hairCombine/hairCombinePass.ts
@@ -91,6 +91,7 @@ export class HairCombinePass {
hairTilesBuffer,
hairTileSegmentsBuffer,
hairRasterizerResultsBuffer,
hairSegmentCountPerTileBuffer,
} = ctx;
const b = SHADER_PARAMS.bindings;

@@ -99,6 +100,7 @@ export class HairCombinePass {
bindBuffer(b.tilesBuffer, hairTilesBuffer),
bindBuffer(b.tileSegmentsBuffer, hairTileSegmentsBuffer),
bindBuffer(b.rasterizeResultBuffer, hairRasterizerResultsBuffer),
bindBuffer(b.segmentCountPerTile, hairSegmentCountPerTileBuffer),
]);
};
}
29 changes: 24 additions & 5 deletions src/passes/hairCombine/hairCombinePass.wgsl.ts
@@ -5,13 +5,15 @@ import * as SHADER_SNIPPETS from '../_shaderSnippets/shaderSnippets.wgls.ts';
import { BUFFER_HAIR_TILE_SEGMENTS } from '../swHair/shared/hairTileSegmentsBuffer.ts';
import { BUFFER_HAIR_RASTERIZER_RESULTS } from '../swHair/shared/hairRasterizerResultBuffer.ts';
import { SHADER_TILE_UTILS } from '../swHair/shaderImpl/tileUtils.wgsl.ts';
import { BUFFER_SEGMENT_COUNT_PER_TILE } from '../swHair/shared/segmentCountPerTileBuffer.ts';

export const SHADER_PARAMS = {
bindings: {
renderUniforms: 0,
tilesBuffer: 1,
tileSegmentsBuffer: 2,
rasterizeResultBuffer: 3,
segmentCountPerTile: 4,
},
};

@@ -31,6 +33,7 @@ ${RenderUniformsBuffer.SHADER_SNIPPET(b.renderUniforms)}
${BUFFER_HAIR_TILES_RESULT(b.tilesBuffer, 'read')}
${BUFFER_HAIR_TILE_SEGMENTS(b.tileSegmentsBuffer, 'read')}
${BUFFER_HAIR_RASTERIZER_RESULTS(b.rasterizeResultBuffer, 'read')}
${BUFFER_SEGMENT_COUNT_PER_TILE(b.segmentCountPerTile, 'read')}
@vertex
@@ -62,8 +65,8 @@ fn main_fs(
let tileXY = getHairTileXY_FromPx(fragPositionPx);
let displayMode = getDisplayMode();
if (displayMode == DISPLAY_MODE_TILES) {
result.color = renderTileSegmentCount(viewportSizeU32, tileXY);
if (displayMode == DISPLAY_MODE_TILES || displayMode == DISPLAY_MODE_TILES_PPLL) {
result.color = renderTileSegmentCount(displayMode, viewportSizeU32, tileXY);
} else {
var color = vec4f(0.0, 0.0, 0.0, 1.0);
@@ -95,19 +98,26 @@ fn getDebugTileColor(tileXY: vec2u) -> vec4f {
}
fn renderTileSegmentCount(
displayMode: u32,
viewportSize: vec2u,
tileXY: vec2u
) -> vec4f {
var color = vec4f(0.0, 0.0, 0.0, 1.0);
// output: segment count in each tile normalized by UI provided value
let maxSegmentsCount = getDbgTileModeMaxSegments();
let segments = getSegmentCountInTiles(viewportSize, maxSegmentsCount, tileXY);
var segments = 0u;
if (displayMode == DISPLAY_MODE_TILES) {
segments = getSegmentCountInTiles_Count(viewportSize, maxSegmentsCount, tileXY);
} else {
segments = getSegmentCountInTiles_PPLL(viewportSize, maxSegmentsCount, tileXY);
}
color.r = f32(segments) / f32(maxSegmentsCount);
color.g = 1.0 - color.r;
// dbg: tile bounds
// let tileIdx: u32 = getHairTileIdx(viewportSize, tileXY, 0u);
// let tileIdx: u32 = getHairTileDepthBinIdx(viewportSize, tileXY, 0u);
// color.r = f32((tileIdx * 17) % 33) / 33.0;
// color.a = 1.0;
@@ -117,7 +127,7 @@ fn renderTileSegmentCount(
return color;
}
fn getSegmentCountInTiles(
fn getSegmentCountInTiles_PPLL(
viewportSize: vec2u,
maxSegmentsCount: u32,
tileXY: vec2u
@@ -142,4 +152,13 @@ fn getSegmentCountInTiles(
return count;
}
fn getSegmentCountInTiles_Count(
viewportSize: vec2u,
maxSegmentsCount: u32,
tileXY: vec2u
) -> u32 {
let tileIdx = getHairTileIdx(viewportSize, tileXY);
return _hairSegmentCountPerTile[tileIdx];
}
`;
2 changes: 2 additions & 0 deletions src/passes/passCtx.ts
@@ -30,4 +30,6 @@ export interface PassCtx {
hairTilesBuffer: GPUBuffer;
hairTileSegmentsBuffer: GPUBuffer;
hairRasterizerResultsBuffer: GPUBuffer;
hairTileListBuffer: GPUBuffer;
hairSegmentCountPerTileBuffer: GPUBuffer;
}
6 changes: 5 additions & 1 deletion src/passes/renderUniformsBuffer.ts
@@ -36,6 +36,7 @@ export class RenderUniformsBuffer {
const DISPLAY_MODE_FINAL = ${DISPLAY_MODE.FINAL}u;
const DISPLAY_MODE_TILES = ${DISPLAY_MODE.TILES}u;
const DISPLAY_MODE_TILES_PPLL = ${DISPLAY_MODE.TILES_PPLL}u;
const DISPLAY_MODE_HW_RENDER = ${DISPLAY_MODE.HW_RENDER}u;
const DISPLAY_MODE_USED_SLICES = ${DISPLAY_MODE.USED_SLICES}u;
const DISPLAY_MODE_DEPTH = ${DISPLAY_MODE.DEPTH}u;
@@ -398,7 +399,10 @@ export class RenderUniformsBuffer {
const hr = CONFIG.hairRender;
let extraData = 0;

if (c.displayMode === DISPLAY_MODE.TILES) {
if (
c.displayMode === DISPLAY_MODE.TILES ||
c.displayMode === DISPLAY_MODE.TILES_PPLL
) {
extraData = hr.dbgTileModeMaxSegments;
} else if (c.displayMode === DISPLAY_MODE.USED_SLICES) {
extraData = hr.dbgSlicesModeMaxSlices;