Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

QoL improvement for displaying GPU profiling information #59

Open
DBouma opened this issue Jan 20, 2022 · 5 comments
Open

QoL improvement for displaying GPU profiling information #59

DBouma opened this issue Jan 20, 2022 · 5 comments
Labels
enhancement New feature or request

Comments

@DBouma
Copy link

DBouma commented Jan 20, 2022

The problem

Currently if you want to display GPU timestamps in puffin, you are forced to open a stream and manually convert all your measured timings from GPU timeline to the associated CPU timeline. Since GPU work is always executed an arbitrary time after a submit call, we would either need to:
A: Wait on a fence before proceeding to the next frame to associate GPU timings with the current measured frame
B: Read back n frames back and display that into the current puffin frame
C: Modifying a previously written frame with the measured GPU timings

In case A, puffin displays the total CPU frame plus the time elapsed from cpu work submission to gpu work execution, along with the total GPU frame. While technically correct, this results in a total frame time that is significantly higher compared to the actual frame time of an app, which is not super useful for profiling a total frametime since work overlaps in reality. This is also not an option in general as this forces single buffering.

In case B, puffin displays the measured frame n on the timeline before puffin's measured cpu frame times. This is also technically correct but again suffers from the same issue that case A has (total frame time not being super useful). However this allows for double/triple buffering without too much trouble.

Case C is not something that we are using puffin for, since we use it for realtime visualization of CPU and GPU workloads. This could however be worked around with by allowing a frame visualization delay, but that would be a different Issue alltogether.

Suggestion

I believe puffin would greatly benefit from the ability to separate timelines. Currently it's separated per thread but still shares the same timeline, thus sharing the total execution time. But for GPU timings it makes more sense to be able to define a separate timeline that has it's own "frame time", while potentially still being able to associate CPU and GPU work.

Alternatives

We currently have a little workaround locally that shifts the measured GPU timings to the start of the current puffin frame, essentially working around the total frametime if needed. However this workaround is not ideal and does not give the correct context to where the GPU workload was fired off.

@VZout
Copy link
Member

VZout commented Jan 25, 2022

I think this would be a great improvement! Possibly behind a feature toggle? If your gonna give this a shot and run into problems feel free to reach out and we can look at solutions together.

@MarijnS95
Copy link
Contributor

One solution we discussed was to display a total duration "per thread" stream, ie. right after puffin::ThreadInfo::name. This will still poison the total frame times used in the "recent" and "slowest" frame view though, which appear to be much more complicated to untangle.

For recombining a CPU recording with the GPU timestamps we envision some small refactors to be able to take out the GlobalReporter (the FrameData with perhaps multiple streams it recorded) at the end of a frame and substitute it with a new/empty one. The stream will be held on to until the GPU frame is complete, at which point everything is translated and sent to a sink at once.

Finally, for mapping the GPU timeline to the CPU timeline we currently perform some improper tricks such as calling puffin::now_ns() at around the same time we "calibrate" the clock through the respective graphics APIs (not that it matters when GPU and CPU timelines are shifted to overlap). A better solution is to be able to map the CLOCK_MONOTONIC or QueryPerformanceCounter result from those graphics APIs to puffin::now_ns() directly, but even if we get access to puffins static value for std::time::UNIX_EPOCH.elapsed() there's no way to take the other raw value out of Instant::now().
Fortunately though it seems we can easily replace ThreadProfiler::now_ns with a custom function that returns CLOCK_MONOTONIC/QueryPerformanceCounter directly 😄

@VZout
Copy link
Member

VZout commented Jan 25, 2022

We currently do the same trick with puffin::now_ns(). Would definitely be nice to replace that with something proper. A function that returns the clock seems like an easy solution.

I'm not too familiar with the code base yet but what your suggesting makes sense to me. @emilk Do you have any opinions around this?

@MarijnS95
Copy link
Contributor

MarijnS95 commented Jan 25, 2022

I'd implement the syscalls that feed Instant by hand (as mentioned above the underlying raw value is not available) in now_ns(), so that we have direct access to the raw value that matches what the graphics APIs provide.

Next we provide the UNIX start timestamp with the raw clock timestamp publicly to the user, they can then do the calculation and translate a GPU time with clock calibration to CLOCK_MONOTONIC/QueryPerformanceCounter from the graphics API first, then to puffin by adding the UNIX start time.

I can PR this if everyone agrees on this solution :)

@emilk
Copy link
Collaborator

emilk commented Jan 26, 2022

I believe puffin would greatly benefit from the ability to separate timelines. Currently it's separated per thread but still shares the same timeline, thus sharing the total execution time. But for GPU timings it makes more sense to be able to define a separate timeline that has it's own "frame time", while potentially still being able to associate CPU and GPU work.

If you want separate time-lines, then it sounds like you should just report CPU times and GPU times to two different places! Use puffin::GlobalReporter for the CPU time (for convenience with the macros), and then report GPU times into its own puffin::FrameView. When visualizing, use two separate puffin_egui::ProfilerUi 🤷‍♂️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants