Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix assembling instructions with unknown/don't care context bits #7195

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

plucia-mitre
Copy link
Contributor

Without this change, if unspecified context bits are provided to the assembler they are defaulted to 0 and the resulting context is used to filter for valid assembly instructions. After this change unspecified bits are kept as unspecified through the assembly process possibly providing additional valid assembly results.

This change is needed to help support Pickled Canary's support for context.

Submitting on behalf of a colleague: Will R

Without this change, if unspecified context bits are provided to the assembler they are defaulted to 0 and the resulting context is used to filter for valid assembly instructions. After this change unspecified bits are kept as unspecified through the assembly process possibly providing more valid assembly results.
@plucia-mitre plucia-mitre marked this pull request as draft November 15, 2024 18:06
@ryanmkurtz ryanmkurtz added the Status: Triage Information is being gathered label Nov 15, 2024
@plucia-mitre
Copy link
Contributor Author

We've just pushed some updated test cases which illustrate (at least some of) the current limitations of this implementation.

The proposed solution in this pull request currently may produce unexpected results when the assembler resolves ^instruction phases/root recursion. The restore instruction in MIPS16 illustrates this issue, as it may be encoded with 16-bits or extended to 32-bits depending on the operands. This extension/prefix is implemented with an ^instruction phase across the board in mips16.sinc. For example, with restore 0x1b8,ra,s0-s1, we only expect to receive 32-bit encodings from the assembler (due to the value of the literal operand).

However, when the context provided to the assembler only has ISA_MODE and RELP set to one (all other context bits unknown), half of the encoding is lost. The result is 0x6477, when we expect something akin to 0xf0306477. Additionally, the same 0xf0306477 disassembles to restore 0x38,ra,s0-s1, which is not what we gave the assembler. We suspect this issue has to do with the implementation and usage of the context transition graph required for pure recursive resolution (AssemblyContextGraph.java). It appears that this class was designed for complete input contexts (see computeOptimalApplications()), and would require some alterations to support unknown context bits.

We have started to experiment with possible fixes, but we'd greatly appreciate any tips about the right way to proceed.

@nsadeveloper789
Copy link
Contributor

It's been a while since I've read through this code, but yes, the troubles you're having around ^instruction sound very familiar. If I could have my way, I'd ban ^instruction (or any production of the form A->A) and require Slaspec authors to specify an alternative start symbol, but alas.

I think part of the core issue is that context bits get (ab)used for all sorts of interesting purposes, and believe me, I'm no stranger to abusing context bits.... That said, I suspect what you intend to get by leaving the context unspecified is "all possible encodings that might appear in a binary." Though that context may change wildly during disassembly of a single instruction, there's actually only so many possible input contexts when starting an instruction. We have a notion of "default context" and "global sets". The latter is also sometimes called "context commits." When the disassembler starts, it uses the default context for the processor as input. Some instructions in the Slaspec may use the "globalset" function in its context transitions. Those will place context changes that persist at a given address. When the disassembler encounters that address it applies the change to the input context for that instruction. So, what I might suggest:

  1. Scrape all the .pspec files that go with the same .slaspec file (you may have to "go up" to the .ldefs files). I think you can avoid scraping the actual files by just querying the DefaultLanguageService, and accessing the various attributes on each language returned. You should then be able to collect all the possible "default contexts." For that part, I think you can use ProgramContextImpl.
  2. This part may not be necessary, but perhaps scrape for possible globalset-based transitions that could change the input context. It looks like ContextCommit is a subclass of ContextChange, and those are accessible via Constructor.getContextChanges(). Consider whether you must be cognizant of the target address. To try to get a handle on how its fields are interpreted, examine SleighParserContext.addCommit and applyCommits.
  3. Once you have those, you should be able to collect all the possible input contexts. You'll probably just have to provide each one of them, fully defined, in a separate call to the assembler. It's maybe possible you can compress the set of contexts by leaving some bits undefined. Or maybe you can short-circuit the exhaustion in (2) above by just saying "if there's a globalset that can affect this bit, I'll just leave it undefined." In this way, only select bits are undefined. Maybe that would improve things?

Hope this helps.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Triage Information is being gathered
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants