-
Notifications
You must be signed in to change notification settings - Fork 204
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GPU tracing and providers #124
Comments
UIforETW already uses multiple ETL files and merges them - three of them in the case of heap tracing. However that is mostly just an implementation detail - tracing from multiple providers can be done to one or many ETL files (kernel and user-mode providers are separate, but that is the only enforced separation). The real question is what is the benefit? I'm not at all opposed to increasing the set of DX providers, especially if it helps diagnose some issues, but randomly adding providers without understanding what data they are supplying and how high their data rate will be comes with some risks. Any ideas which of these add value? Note that if there is a specific rare scenario that needs investigation then custom providers can be added in the settings dialog. The GPU tracing option should have a reasonable set of defaults for GPU tracing. |
I don't know what all of those exactly do, but some are kind of important such as the This is also why I was talking about adding the presets of
It also mentions Based on the
In the kernel etl file
|
The kernel providers listed are indeed already recorded - they are key to many types of analysis. Those six providers seem reasonable, although I'll need to test to make sure they don't have too high a data rate (whatever that means). I've also seen problems with OS dependencies on the graphics providers before - I want UIforETW to continue working on Windows 7. |
Any news on this? It would be amazing to be able to use this instead of the log.cmd which comes with GPUView, especially given the keylogger in UIforETW. Oh, and to answer your question about why this is useful: https://graphics.stanford.edu/~mdfisher/GPUView.html and https://www.anandtech.com/show/6857/amd-stuttering-issues-driver-roadmap-fraps/4 show good examples. |
You can already add custom providers in the Settings dialog. There are places to enter extra kernel flags/providers, kernel stack walks, and extra user mode providers. But, that is not the most intuitive or friendly way of configuring GPU providers. It would be reasonable to change the GPU Tracing checkbox to a Combo box dropdown with choices like None, Min, Medium, and Full. If None was the default and Min (roughly) matched the current settings then Medium and Full/Maximum could include whatever they want since they are opt-in. But I haven't done any work on this and I don't know when I might. A pull request would be welcomed. |
Sorry for the delay in replying - I took a break over Christmas, lost my browser tabs and the rest is (not in my browser) history. Anyway Merry Christmas and a Happy New Year to you! LOL So, It took me a little (a lot of) fiddling to get this working but I did get there in the end. Essentially, I collected all the variables in the log.cmd, concatenated them all, removed the duplicates and redundancies, and was left with the string of provider GUIDs. I believe there are some redundancies here with the provider names used by the GUI, and there are definitely some conflicts, but that is resolved by disabling the setting in the GUI and just letting the GUIDs I've extracted define those options. This is the configuration I'm using: Extra kernel mode flags: user mode providers: Two important settings to note: GPU tracing from the main UI must be unticked. Something about it conflicts with the GUIDs provided. I will be honest and say I can't remember what, but this was necessary. Tracing to file is mandatory. Circular memory buffers do not work. This option has also been removed from the log.cmd, I cannot find any documentation on it, so I don't honestly know why it isn't working. Hope this helps! Is there somewhere I should look in the codebase, to see if perhaps I could submit a patch? |
Thanks for looking in to this. If you want to create a PR that fixes up the GPU tracing options to use the new-and-improved settings then that would be great. If you search for bGPUTracing_ in https://github.com/google/UIforETW/blob/master/UIforETW/UIforETWDlg.cpp then you should find all of the relevant code. I seem to remember that the required providers are different on Windows 7 so we would either need to test there or make no changes for that OS. I guess the GPU tracing option needs to be disabled for circular-buffer tracing, although it does seem quite odd that that combination doesn't work. The 'stack' options tend to be quite expensive, in terms of CPU and buffer size, so I wonder if those should be omitted. I guess it depends on how frequently the stacks are actually used. I doubt that GPUView depends on them. |
It looks like you included the DXC_ALL provider in the additional user mode
providers. I'd caution against setting these as the default GPU tracing
options since it may result in changes in behavior by the DX runtime or the
GPU itself. (Like forcing a STABLE_POWER_STATE)
…On Mon, Jul 8, 2019 at 6:59 AM NomDeMorte ***@***.***> wrote:
Sorry for the delay in replying - I took a break over Christmas, lost my
browser tabs and the rest is (not in my browser) history. Anyway Merry
Christmas and a Happy New Year to you! LOL
So, It took me a little (a lot of) fiddling to get this working but I did
get there in the end. Essentially, I collected all the variables in the
log.cmd, concatenated them all, removed the duplicates and redundancies,
and was left with the string of provider GUIDs. I believe there are some
redundancies here with the provider names used by the GUI, and there are
definitely some conflicts, but that is resolved by disabling the setting in
the GUI and just letting the GUIDs I've extracted define those options.
This is the configuration I'm using:
[image: image]
<https://user-images.githubusercontent.com/45681208/60815198-70dd3b00-a1da-11e9-8159-d397e0cb5b1e.png>
[image: image]
<https://user-images.githubusercontent.com/45681208/60815259-95d1ae00-a1da-11e9-893b-e2def1259665.png>
Extra kernel mode flags: LOADER+PROC_THREAD+CSWITCH+DISPATCHER+POWER
user mode providers:
802ec45a-1e99-4b83-9920-87c98277ba9d:0x10FFFF:5:'stack'+a688ee40-d8d9-4736-b6f9-6b74935ba3b1:0xffff:5+CA11C036-0102-4A2D-A6AD-F03CFED5D3C9:0xf:6:'stack'+5d8087dd-3a9b-4f56-90df-49196cdc4f11:0xffffffffffffffff:6:'stack'+db6f6ddb-ac77-4e88-8253-819df9bbf140:0xffffffffffffffff:6:'stack'+7E7D3382-023C-43cb-95D2-6F0CA6D70381:0x1+DX:0x9+531A35AB-63CE-4BCF-AA98-F88C7A89E455:0xffff:4+ee685ec4-8270-4b08-9e4e-8b356f48f92f:0:1
Two important settings to note:
GPU tracing from the main UI must be unticked. Something about it
conflicts with the GUIDs provided. I will be honest and say I can't
remember what, but this was necessary.
Tracing to file is mandatory. Circular memory buffers do not work. This
option has also been removed from the log.cmd, I cannot find any
documentation on it, so I don't honestly know why it isn't working.
Hope this helps! Is there somewhere I should look in the codebase, to see
if perhaps I could submit a patch?
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#124?email_source=notifications&email_token=AAW2K3FYIZ5GAWANJ3H35QDP6NB23A5CNFSM4FYQJAZ2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODZNFK2A#issuecomment-509236584>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAW2K3GQMJPDOMB6T2A63OLP6NB23ANCNFSM4FYQJAZQ>
.
|
What provider level is recommended for general tracing? We don't need the most intense recording available, just enough to diagnose some basic CPU issues. The main thing I watch for is being able to see GPU utilization by process in WPA - the current settings seem to work for that. |
Thanks for the heads up. It's entirely possible that I'm doing this entirely wrong. I'm a dev but ETW is not in my skillset at all... Is it the flags or the 'stack' option that cause this? I guess the issue here is that the logger handling is different for the batch file and for UIforETW. I knew it was a square peg in a round hole but I figured so long as I kept the user and kernel providers separated I'd be OK - obviously I'm going to need to be more careful. Bear with me while I butcher this - If there's documentation explaining the difference between enabling the provider and capturing its state, I can't find it, UIforETW uses the same approach as the batch file does for the 'longhaul' switch: GPUView's log.cmd for all other normal modes is quite different: What I've done, is mash all the user providers together, and where the same provider was used twice it failed, so I took the one with the most verbose flags, which was always the second set that was intended to be used with the -capturestate switch I could just omit the duplicates by taking the ones with the less verbose flags from the first set... But perhaps that will cause problems by using them with the capturestate switch? Either way, it's not going to match what the batch file does. Or perhaps it would be worthwhile, for compatibility with GPUView, for UIforETW to allow two sets of custom user providers, one for the session with the -on switch and one for the -capturestate switch? Please pardon me if I'm being annoying, I'm aware I'm out of my field here. |
I have no idea why that is done. The whole -capturestate dance is something I do not understand. I think that you can safely do a -capturestate on all of the user-mode providers and that is what I would default to doing unless that doesn't work. log.cmd seems to like using multiple tracing sessions. Lots of tracing systems seem to like that. It can let you specify different amounts of buffering for different providers but that seems like a bad idea because then they wrap at different times. UIforETW uses a kernel session and a user-mode session and I believe that that is the best design. As with -capturestate I think the best thing to do is to put them all in one session as long as it works. If the same provider is specified twice with different flags then the -start flags are probably the relevant ones. Doing a -capturestate with a more inclusive set of flags probably causes the extra bits to be ignored. That is, it is probably a bug. Alternately you could OR the two flag sets together, but that would mean capturing more data. Using the -start settings for -capturestate as well seems more sane. I think that log.cmd is old, crufty, buggy and inconsistent. |
From reading the docs it appears that -capturestate is only needed with -buffering. That is, it is only needed with circular-buffer tracing. If log.cmd doesn't handle that then -capturestate is unnecessary (and UIforETW should be cleaned up to not use -capturestate in that case). https://docs.microsoft.com/en-us/windows-hardware/test/wpt/start |
Hey I fixed the notification hurrah! Thanks for working with me on this
Sure seems that way. Honestly I think it's trying to do something beyond a batch file's effective scope. It's almost like they need some kind of UI for ETW ;) It also seems as though the whole thing was built to handle the restrictons of a far older PC, where memory and disk space were at a far greater premium - which would explain the conservative buffering at the expense of buffer wrapping out of sync. ironically, log.cmd uses the same buffer size for everything so having the two sets of usermode sessions each with a differnt set of capturestates seems entirely redundant. Accordingly, I agree that it would be good to make it work the UIforETW way of having one usermode session and one kernel. Do you know of any documentation of what the flags actually mean? This might give me a more educated perspective on what flags are actually needed from the perspective of analysing 3d applications (games). The MS doco says to "See providers for a list of valid flags." And of course those commands output a list of providers with no mention of the flags at all.
Well, this is what I also thought, but @jeche seems to know of a pitfall there. I'm not sure if it would be OK if I used differnt flags, or dropped the 'state' parameter - which I honestly do not know the prupose of thanks to the aforementioned lack of documentation. Again, thanks for your help with this. The whole toolchain is basically voodoo magic to me. |
I have never seen documentation on the flags. In the case of Chrome's ETW provider we mapped some flag bits to chrome://tracing categories and you can see that mapping by looking in the Chrome or UIforETW source code, or you can just use the list of categories in the settings box. One way of decoding the flags is to record a trace with all flags and then look in WPA under Trace->System Configuration-> Trace Statistics in the Keyword column. That Keyword is ANDed with the flags and if the result is non-zero then the event is emitted. In the case of GPU tracing just copying the flags used for the -start command in the MIN configuration is probably best. Ignore the -capturestate case because I think that is irrelevant for tracing to file, should use exactly the same flags, and should (probably) be done in a separate place - but that is a UIforETW bug (issue #136). I think @jeche was saying that some DXC_ALL options got pulled in instead of DXC_MIN, leading to overly aggressive tracing that could affect behavior. I didn't read anything about -capturestate into that. And yeah, batch files for general-purpose ETW tracing are a mess. |
Hi,
It seems that the latest version of GPUView now ships with a log.wprp that can be used with Windows Performance Recorder which is apparently the official GUI for capturing ETW traces. (I'm now trying to install the latest Windows ADK to check this) |
OK so it's not just me that failed to find the flags doco, I feel better now. That flag masking keyworks is an outstanding tip, thank you. How do you even know that? :) I feel that it would be good to have that drop-down list you mentioned, or at least take the default configuration of NORMAL if we have to choose just one - practically all of the documentation around just says to run log.cmd with no parameters, and it's also the configuration used by nvidia for tracing game performance so I expect it should be sufficiently lightweight as to not interfere even with that load. The only documentation I've seen that refers to using MIN is this blog - although that's about the best GPUView documentation available so it does carry quite some weight. I do know that VERBOSE lives up to it's name and I doubt it would be necessary. For reference, GPUview describes them as: Log m | Minimum | Log only the minimum basic events I had some other things to add but lectem has just dropped a massive knowledge bomb so I'll come back to that later hah Looks like having three sessions rather than two might be in order after all... |
FYI, we (the DirectX team) found that Normal logging can have somewhat significant impact for some applications, largely due to stack capture. If all you're looking for are performance metrics, then I'd suggest using log.cmd with the "Present" mode (there's an equivalent mode in the wprp file as well). This is the minimum set of providers that will let you do full performance analysis, tracing application framerates, as well as other metrics like latencies. Check out PresentMon for a tool that can consume that minimal set of providers and produce performance metrics. |
Please, don't be sorry, that information is golden. Thanks so much! As I mentioned above, this new information explains why three sessions is necessary and on top of that now we actually know what -capturestate does and why the flags don't conflict. It's great to see that the toolchain is being updated. I know that one major issue with the log file was handling different locales and I didn't see any easy way around that without heading down the GUI route. I'm sure this will be well received in those countries. @randomascii is having three sessions in UIforETW out of the question? As good a tool as WPR is, it's not as useful for me as UIforETW is, I really make use of the key/mouse logging. @jenatali Thanks to you too! Man this thread is officially the single best knowledge base for GPUView online now, not even joking. It's interesting to know about that performance impact, the only clue I had was Matt's blog (linked above) where he mentioned that the impact was very low - but that was quite some time ago, so it's good to get some more updated information. Since some of what I'm doing is bug hunting (especially related to networking and memory and their effect on the game), I'm essentially trying to capture a mystery, so my previous approach has been 'capture it all and filter it later" but obviously I'm going to need to lay off the gas a little. That being said, I can't help but wonder if using present mode would be sufficient for such a purpose. I used PresentMon and OCAT in the past and stumbled across UIforETW when I was looking for something that would allow me to trace keyboard inputs. You guys are a goldmine. |
I used to work on the team that produces the ETW toolset so I got some training. That plus years of use and exploring. It is not at all impossible to do three ETW streams in UIforETW, I just wasn't clear on why it was necessary. It's just a bit more code to create them, and I guess a few new failure modes, but nothing too bad. It would also be great to have an explanation (from @jenatali or @Lectem) on when the --capturestate commands should be issued (at the start or the end) and whether GPU tracing really is incompatible with circular-buffer tracing. Any advice for dealing with Windows 7 would be useful also, although I'm about ready to stop caring about that. The GPU tracing check-box could easily be changed to a drop-box with a few different options if that is helpful. |
I'm attaching the WPRP version of the log file, since it looks like it inadvertently didn't make it into the public ADK. As far as where the capturestate commands should go, log.cmd puts them at the beginning, but we have a circular logging version that puts them at the end. The high-level is that GPUView needs an event that either came from object creation, or from a capturestate command, for objects that are referenced by other events. If you try to emit a capturestate at the beginning of a circular log, then it can get erased by the time you finish the trace. But if you capturestate at the end, then you have to walk through the entire trace before you can really start parsing the results (which GPUView does anyway). One of my explicit goals with PresentMon was to avoid requiring a capturestate, and to be able to do everything with realtime events. For Win7 support, the only difference is that the graphics kernel provider is a different GUID. You can see the PresentMon change for adding Win7 support. |
@randomascii I figured that the best way I could help with this, would be to get out of the way ;) But, it's been a little while, so I thought maybe I should ask if there is anything I can do to help? |
This isn't a high priority task for me so it could be months before I get around to it, if ever. If somebody else wants to create a PR then I'd be happy to look at it. |
The GPU tracing option only adds the DX provider, while GPU View uses a lot more providers/states, would you be interested in having GPUView presets added ?
From the batch file
log.cmd
of GpuView, we can see the following providers can be used:I know that this batch file uses 3 different files however, and then merges them, I am unsure if it would be supported by UIforETW
This
log.cmd
file can be found underC:\Program Files (x86)\Windows Kits\10\Windows Performance Toolkit\gpuview\log.cmd
The text was updated successfully, but these errors were encountered: