Debugger: Overhaul symbol table parsing (add .mdebug and .sndata support) and add inspector GUI for complex data types #10224

Merged Aug 27, 2024 (5 commits)

chaoticgd
Contributor

@chaoticgd chaoticgd commented Nov 4, 2023

Description of Changes

I have imported symbol table parsing, symbol database and C++ AST code from CCC, entirely replacing the existing symbol map class, and implemented a GUI to inspect data structures in memory using this information.

The original commit history is on the data_inspector_backup branch.

New screenshot:

Screenshot_20240314_184024

Old screenshot:

Screenshot_20231104_003704

Some highlights:

  • CCC library rewritten to support importing symbols from multiple different symbol tables at once (e.g. some translation units will lack debugging symbols but will have linker symbols) and user-defined symbols.
    • The new symbol database uses a handle system for accessing symbols. This is useful for retaining persistent references to symbols.
    • Support for MIPS debug (.mdebug), SNDLL (.sndata) and ELF (.symtab) symbol tables. For information about these formats, see the readme linked above.
    • This takes up quite a bit more memory (on the order of 100MB) than what was there before when a full .mdebug symbol table is present. If it's too much it could be possible to only load the symbols when the debugger is opened, but I haven't implemented that so far. See #11901.
    • Do comment on whether it would be better for me to put this in pcsx2/DebugTools/ccc/ like I have now with the PCSX2 license headers, or in 3rdparty/ccc/ with my own license headers. The original library is MIT so the latter might make it easier for me to copy changes back and forth? It probably doesn't matter too much, so whatever is convenient is probably fine. I've moved it.
  • New SymbolGuardian class that controls access to the symbol database.
    • The SymbolMap class has been entirely removed and replaced with this.
    • When an ELF file is loaded a worker thread is created to parse the symbol tables. Parsing a fully loaded .mdebug symbol table usually takes about a second (on my PC anyway).
      • One odd side effect of this is that if new code is loaded in the meantime (e.g. loading from a savestate), the result of analysis may depend on how long parsing the symbol table takes. I suspect a good solution to this would be to let the user scan for functions on demand, but that seems slightly out of scope for this PR. I could make ScanForFunctions use the code from the ELF instead. Please comment on what would be preferable.
      • Some built-in types are automatically added to the database so that even without debug symbols the symbol trees are somewhat useful.
  • The symbol trees in the debugger GUI can be used to inspect complex data types in memory.
    • Uses Qt model/view system.
    • Options for adding/removing/modifying symbols using dialogs and delegates.
    • Options for grouping symbols by module/ELF section/source file.
    • Function hashing system to detect symbols that are no longer valid and gray them out.
  • Added the GNU demangler to replace the Avast one that was there before.
    • This is required to support the GCC 2.x mangling scheme used in lots and lots of games.
    • Based on code from GCC 13.2.0 with the old GCC 2.x demangler (circa 2018) ported back into it.
    • Includes my own fix for a memory safety issue with the cplus_demangle_opname function.
    • A readme is included that describes how I came up with the code that I did.
  • The disassembly manager doesn't really work very well with the new symbol database. For now I've patched it up so it at least mostly works.
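
To illustrate the function hashing idea from the list above, here is a minimal sketch: hash a function's bytes when its symbol is imported, then re-hash later and flag the symbol if the code no longer matches. The `FunctionSymbol` struct, the helper names, and the choice of FNV-1a are all assumptions made for this example; they are not taken from CCC or PCSX2, and the hash used by the PR may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record for a function symbol; not the CCC or PCSX2 type.
struct FunctionSymbol
{
    uint32_t address = 0;       // Start offset of the function in the memory buffer.
    uint32_t size = 0;          // Size of the function body in bytes.
    uint32_t original_hash = 0; // Hash of the code when the symbol was imported.
    bool matches_memory = true; // False means a GUI could gray the symbol out.
};

// 32-bit FNV-1a, chosen only for brevity (an assumption for this sketch).
inline uint32_t HashBytes(const uint8_t* data, size_t size)
{
    uint32_t hash = 2166136261u;
    for (size_t i = 0; i < size; i++)
    {
        hash ^= data[i];
        hash *= 16777619u;
    }
    return hash;
}

// Record the hash of the function's code at import time.
inline void RecordOriginalHash(FunctionSymbol& function, const std::vector<uint8_t>& memory)
{
    assert(static_cast<size_t>(function.address) + function.size <= memory.size());
    function.original_hash = HashBytes(memory.data() + function.address, function.size);
    function.matches_memory = true;
}

// Re-hash the code currently in memory and flag the symbol if it changed.
inline void UpdateValidity(FunctionSymbol& function, const std::vector<uint8_t>& memory)
{
    assert(static_cast<size_t>(function.address) + function.size <= memory.size());
    const uint32_t current = HashBytes(memory.data() + function.address, function.size);
    function.matches_memory = (current == function.original_hash);
}
```

A caller would run `RecordOriginalHash` once per function after the symbol table is parsed and `UpdateValidity` whenever the code in memory may have been replaced, for example after loading a savestate.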

Progress:

  • Refactor CCC to have acceptable error handling for integration with PCSX2.
  • Import symbol table parsing and AST code from CCC.
  • Create a data inspector window in the Qt GUI.
  • Create a QAbstractItemModel to inspect data structures in memory using this type information.
  • Implement the globals tab in the data inspector window.
  • Implement the stack tab in the data inspector window.
  • Rewrite CCC so it can replace the existing SymbolMap class.
  • Implement support for other kinds of symbol tables (.symtab, SNDLL) in CCC.
  • Merge the data inspector window into the debugger window.
  • Replace the SymbolMap class with the SymbolDatabase class from CCC.
  • Various improvements (better editor widgets, VS project files, etc).
  • Rebase everything.

Possible future extensions:

  • Support for DWARF 1 symbol tables, as were written out by the Metrowerks compiler. This would allow for many more games to be supported by the data inspector. I think the current architecture I have with the C++ AST would allow for this (with some changes), but it would still probably be a long-term goal.
  • Watch window. This was originally going to be part of this PR, but I've decided to skip it for now. Ideally it would have an expression parser for a C-like language.
  • JSON import/export. This would allow people to save their user-defined symbols, and to transfer certain symbols (e.g. data types) between games.
  • Undo/redo support.
  • Parser for a subset of C++. This would allow for new data types to be created by the user. Something like this would probably do.
  • Import symbol tables from IRX modules and SNDLL files. Hook the loader functions, import/delete the symbols on the fly.

Rationale behind Changes

Many games shipped with debug symbols in this format, so this would be an extremely useful addition for the modding communities of said games.

Suggested Testing Steps

Some games with .mdebug symbols (data types, functions, globals):

  • Alex Ferguson's Player Manager 2001
  • European Tennis Pro
  • Fatal Frame
  • Go Go Golf
  • Hard Hitter Tennis
  • Jet X2O
  • MTV Music Generator 2
  • Orange Pocket: Root
  • Sega Soccer Slam
  • The Sims
  • The Weakest Link

A game with SNDLL symbols (functions, globals):

  • Ratchet & Clank: Size Matters

@chaoticgd chaoticgd marked this pull request as draft November 4, 2023 00:39
@chaoticgd chaoticgd changed the title Add .mdebug symbol table parser and data inspector window Debugger: Add .mdebug symbol table parser and data inspector window Nov 4, 2023
@refractionpcsx2
Member

refractionpcsx2 commented Nov 8, 2023

I don't know if anybody else agrees, but is there any reason this couldn't be in the debugger? I feel like they are kind of doing similar jobs since we have function symbols in there, and it would be nice to be able to right click on a variable and do "Go to in memory view" etc, all these features just feel like they would be at home as part of the debugger, rather than a separate screen.

Great features though!

Edit: just to clarify what I was thinking, it was to have it as a tab or something, so it doesn't add extra clutter to the debugger. Not all of it on one screen, that might be a bit much!

@chaoticgd
Contributor Author

chaoticgd commented Nov 8, 2023

I don't know if anybody else agrees, but is there any reason this couldn't be in the debugger?

There isn't any particular reason, especially now memory consumption is down a lot. That said, I wanted to focus on getting the functionality working first, then worry about integrating it with the existing debugger window. If anyone has more specific design ideas, certainly feel free to suggest them.

One constraint I have is that I think the watch window should at least be accessible alongside the rest of the debugger, so people can observe how values change as they step through code, just like with a regular GUI debugger. I think having it as a tab at the bottom would make sense, maybe with the option to pop it out as its own window.

Perhaps Qt Dock Widgets or KDDockWidgets would make sense here?

@refractionpcsx2
Member

Yeah no worries.

I believe Fobes was thinking about changing the debugger so it was separate (docked?) windows, because the debugger is a bit cramped, so it would be nice to split it out, but I wasn't expecting you to go that far with scope lol

@chaoticgd
Contributor Author

Got the stack tab working, with some caveats.

Screenshot_20231109_091444

The biggest issue is that for some functions the live ranges seem to be invalid.

It should be possible to display the values of parameters as well as locals, although the registers stored in the symbol table seem to be the registers used in the body of the function, not the prologue, so maybe there's something to be done there to make that more user friendly if you just break on the first instruction of a function.

@chaoticgd
Contributor Author

Some further thoughts about integrating this with the debugger window:

One challenge I see is how we're going to approach the multiple sources of truth that are the two different symbol tables. I see two different approaches to deal with this:

  • Implement different tree models for the different symbol tables.
  • Implement support for the standard ELF symbol table in CCC, and have it process both symbol tables into the same data structure in memory.

The latter would probably be cleaner, although also more invasive.

@refractionpcsx2
Member

I agree the latter sounds like a better option, try to keep the symbols unique, we don't really want multiple sources of truth if we can help it, but this is going to be true if it's integrated with the debugger or not.

@chaoticgd
Contributor Author

Little update: I'm still working on this. Building the new data structures and porting my existing code to use them is taking some time. I'll copy it all into this PR when it's all relatively stable.

@github-actions github-actions bot added the Dependencies Pull requests that update a dependency file label Jan 7, 2024
@chaoticgd chaoticgd changed the title Debugger: Add .mdebug symbol table parser and data inspector window Debugger: Overhaul symbol table parsing (add .mdebug and .sndata support) and add inspector GUI for complex data types Jan 7, 2024
@chaoticgd chaoticgd marked this pull request as ready for review March 14, 2024 18:56
@chaoticgd
Contributor Author

This is finally ready for review and testing. Take as long as you want, I'm aware it's ballooned in scope quite a lot.

@Daniel-McCarthy
Contributor

Daniel-McCarthy commented Mar 15, 2024

Couple things I've found while giving it a more thorough test:

  • It's crashing for me if I try to delete a local or parameter.
  • Creating a parameter and local should be blocked if the cpu is dead, as typing an address in the dialog to create one during that state will cause a crash.
  • Right click -> Go to in Memory View isn't quite working as expected. It doesn't set the location in memory on my end (the function in MemoryView isn't getting called when the signal is supposed to be emitted).

    If you right click a row on the Name or Value column and do Go to in Memory View, it doesn't change the position in memory. It appears to work if you right click on the Location column, but not because the action itself works for that column: any left or right click on the Location column immediately moves the memory position to it correctly.

Just wanted to give a heads up on these.

Edit: All mentioned issues are now resolved, great work!

@weirdbeardgame
Contributor

The earlier issue I was having with Fatal Frame 1 is resolved.

@weirdbeardgame
Contributor

image
In Okami, it looks like the symbol table died

@Risae
Contributor

Risae commented Mar 15, 2024

Fixes #10692 (Function column labels)
Partially fixes #10678 (Rename Function)

Contributor

@Daniel-McCarthy Daniel-McCarthy left a comment

All issues I ran into are resolved. LGTM, haven't found any new issues. 👍
Great work on this

@chaoticgd
Contributor Author

chaoticgd commented Mar 17, 2024

I have actually found another issue, a recent regression with enum editors (actually the view wasn't updating correctly, which was confusing me). Also wondering about what the best way to get Codacy to stop complaining is. I need my imaginary green check mark back!

Edit: These issues have been fixed.

@chaoticgd
Contributor Author

That should be all the issues found so far sorted. I want all the people who regularly work on the debugger to have a look through the symbol guardian/symbol database stuff to make sure it's to their satisfaction.

As for the Okami thing, it looks like the game's symbol table is obfuscated so there's nothing to be done.

@chaoticgd
Contributor Author

Aand of course there were more bugs, which are now fixed. I'll continue playing about with it to see if I can find any more.

@chaoticgd
Contributor Author

I've been thinking, and I've come to the conclusion that the current design for the SymbolGuardian class is unnecessarily complicated. Specifically how the worker thread that loads the symbol table(s) takes a lock on the symbol database for that entire duration, which leads to there being TryRead/BlockingRead/TryReadWrite/BlockingReadWrite functions, which are probably not all required.
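
For readers unfamiliar with this pattern, the following is a rough sketch of what a guardian with try/blocking access functions could look like. It assumes a `std::shared_mutex` and callback-based access; the class name `GuardianSketch`, the stand-in `SymbolDatabase` struct, and the exact signatures are hypothetical and are not the PR's actual SymbolGuardian interface.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <vector>

// Stand-in for the real symbol database; the contents are hypothetical.
struct SymbolDatabase
{
    std::vector<std::string> function_names;
};

// Hypothetical guardian: every access runs inside a callback while a lock is held.
class GuardianSketch
{
public:
    using ReadCallback = std::function<void(const SymbolDatabase&)>;
    using ReadWriteCallback = std::function<void(SymbolDatabase&)>;

    // Returns false without running the callback if the lock cannot be taken,
    // e.g. because a loader thread currently holds it exclusively.
    bool TryRead(const ReadCallback& callback) const
    {
        assert(callback);
        std::shared_lock<std::shared_mutex> lock(m_mutex, std::try_to_lock);
        if (!lock.owns_lock())
            return false;
        callback(m_database);
        return true;
    }

    // Waits until no writer holds the lock, then runs the callback.
    void BlockingRead(const ReadCallback& callback) const
    {
        assert(callback);
        std::shared_lock<std::shared_mutex> lock(m_mutex);
        callback(m_database);
    }

    // Takes exclusive access, so readers are kept out for the whole callback.
    void BlockingReadWrite(const ReadWriteCallback& callback)
    {
        assert(callback);
        std::unique_lock<std::shared_mutex> lock(m_mutex);
        callback(m_database);
    }

private:
    mutable std::shared_mutex m_mutex;
    SymbolDatabase m_database;
};
```

If a loader held the exclusive lock for the entire parse, every GUI refresh would have to go through `TryRead` and cope with failure, which is the kind of complexity the comment above describes.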

A better design would be to build the symbol table in a separate SymbolDatabase object and then merge them at the end. I think I strayed away from this originally because I thought it would cause problems with conflicting symbol handles, but I've come up with a solution to that: Make the ccc::SymbolList<>::m_next_handle member variables static std::atomic<u32>. I'm not sure why, but it took me until today to think of this.
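
A minimal sketch of the shared-counter idea, assuming handles are plain 32-bit integers: because the counter is a static atomic shared by every list of a given symbol type, a list built on a worker thread can be merged into the live one without handle collisions. Only the `static std::atomic<u32>` counter comes from the comment above; `SymbolListSketch`, `Add`, `Find`, and `MergeFrom` are hypothetical names for this example and are not the CCC API.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

using SymbolHandle = uint32_t;

// Hypothetical simplification of a per-type symbol list; not the CCC class.
template <typename SymbolType>
class SymbolListSketch
{
public:
    // Allocates a handle from a counter shared by every list of this symbol type.
    SymbolHandle Add(SymbolType symbol)
    {
        const SymbolHandle handle = s_next_handle.fetch_add(1, std::memory_order_relaxed);
        m_symbols.emplace(handle, std::move(symbol));
        return handle;
    }

    // Returns nullptr if the handle does not refer to a symbol in this list.
    const SymbolType* Find(SymbolHandle handle) const
    {
        auto iterator = m_symbols.find(handle);
        return iterator != m_symbols.end() ? &iterator->second : nullptr;
    }

    // Moves all symbols out of `other`, keeping their handles unchanged.
    void MergeFrom(SymbolListSketch& other)
    {
        for (auto& [handle, symbol] : other.m_symbols)
        {
            [[maybe_unused]] const bool inserted =
                m_symbols.emplace(handle, std::move(symbol)).second;
            assert(inserted); // Handles come from one shared counter, so none collide.
        }
        other.m_symbols.clear();
    }

private:
    std::map<SymbolHandle, SymbolType> m_symbols;
    static inline std::atomic<SymbolHandle> s_next_handle{1};
};
```

With this arrangement, the worker thread could fill a temporary list while the GUI keeps using the live one, and only the final `MergeFrom` call would need to happen under a lock.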

So far all the changes made since I said the PR was ready for review were fairly minor, although if this PR is going to take a long time to review anyway I think I may as well make the improvements immediately.

I haven't thought through how exactly all the aspects of this will work, I just want to get this down for now.

@stenzek
Contributor

stenzek commented May 4, 2024

Is it possible to squash the commits down into a smaller number? For example, one to add the new demangler, another to swap over to it, etc.

It's quite difficult to review 252 commits over hundreds of files, and it's also not rebase-mergeable (which is our preferred technique).

@chaoticgd
Contributor Author

chaoticgd commented May 4, 2024

Is it possible to squash the commits down into a smaller number? For example, one to add the new demangler, another to swap over to it, etc.

It's quite difficult to review 252 commits over hundreds of files, and it's also not rebase-mergeable (which is our preferred technique).

I'll look into that, although I'll have to learn more about rebasing first. I didn't previously spend too much time on that since the other contributors said it would be squashed anyway.

@stenzek
Contributor

stenzek commented May 4, 2024

For smaller PRs, yeah, we can just squash them on merge. But with 50k lines added, I'd prefer to split it up, to make bisecting issues easier.

@fjtrujy
Contributor

fjtrujy commented Jul 16, 2024

Hello @chaoticgd really nice feature!
If you need help with the rebase or squashing I can help you with that
Cheers

@chaoticgd
Contributor Author

chaoticgd commented Jul 17, 2024

Hello @chaoticgd really nice feature! If you need help with the rebase or squashing I can help you with that Cheers

I'd like to try fixing the history myself first, I plan to resume work on it soon.

@F0bes F0bes marked this pull request as draft July 29, 2024 23:01
@fjtrujy
Contributor

fjtrujy commented Aug 21, 2024

Hello @chaoticgd really nice feature! If you need help with the rebase or squashing I can help you with that Cheers

I'd like to try fixing the history myself first, I plan to resume work on it soon.

Now that I see you are active again here...

One thing important to highlight when fixing the git history (I suppose you are aware).
If you run a rebase -i, all the "merge commits" will disappear, and the more often you perform a "merge from master", the more difficult rewriting the history will be.

Plenty of times when merging from master we need to resolve conflicts, and those fixes end up in the "merge commits", so there is a real risk of losing these changes when running rebases.

Cheers

This library doesn't support the demangling scheme used by GCC 2.x
compilers and hence doesn't work in lots of cases.

This is the symbol table parser that I'm replacing the existing ELF
symbol table parser with. It supports STABS symbols in .mdebug sections
as well as ELF symbols and SNDLL symbols.

It includes its own symbol database, and an AST which facilitates
debugging tools that let the user inspect complex data structures with
full type information.

More information is provided in the included readme.

This new class uses the CCC library I added in the last commit and
parses the symbol tables on a worker thread.

This code is taken from GCC 13.2.0 with a number of modifications
applied. See the included readme for more information.

This adds three new tabs in the debugger: The Globals tab, the Locals
tab and the Parameters tab. In addition, it rewrites the Functions tab.

All four of these tabs use the new symbol tree widgets and the
associated model. This allows the user to inspect complex data
structures in memory with full type information.

Lastly, new dialogs have been added for creating symbols.

@chaoticgd
Contributor Author

chaoticgd commented Aug 26, 2024

I've recreated the git commit history here. These commits have all been run through the CI checks.

If these new commits are satisfactory I can force push them over the data_inspector branch. The old history will still be available on this branch.

There is a diff between the two. The old SymbolMap class is still included in the non-rebased branch due to an error on my part. Other than that, there are some tiny changes I decided to make while reading through the diffs.

@F0bes
Member

F0bes commented Aug 26, 2024

Yeah, that new branch looks fine. Feel free to force push this branch. Once that's done I'll ask some people to test and we can get this merged.

@chaoticgd
Contributor Author

Okay, I've pushed it.

@chaoticgd chaoticgd marked this pull request as ready for review August 26, 2024 23:15
Contributor

@weirdbeardgame weirdbeardgame left a comment

Was able to read and write to the ene_wrk and plyr_wrk structs in Fatal Frame 1

Contributor

@dreamsyntax dreamsyntax left a comment

Verified in GR2004;
A great improvement

@fjtrujy
Contributor

fjtrujy commented Aug 27, 2024

Amazing! Well done!

Member

@F0bes F0bes left a comment

Tested homebrew, it shows all of my globals (and reminded me of how poorly I made that thing years ago 😅)
Tested some games. I don't have any that have the mdebug or sndata support but I did not find any regressions.
No compiler warnings.
Good work.

@F0bes F0bes merged commit 79dbc27 into PCSX2:master Aug 27, 2024
12 checks passed
@lightningterror
Contributor

lightningterror commented Aug 27, 2024

@F0bes
Member

F0bes commented Aug 27, 2024

Oh, I checked the diff of a single commit and didn't see anything that touched that code, oops.
