Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

decompiler: fix loops with shifted structure offset #6718

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

kkots
Copy link

@kkots kkots commented Jul 15, 2024

Ghidra decompiler is not displaying some array accesses as well as it could. For example, if our structures are defined this way:

struct ListStruct {
    ListElement* Data;
    int ArrayNum;
};

struct ListElement {
    int Element1;
    int Element2;
    int Element3;
};

, and we are given code:

        MOV        EAX,dword ptr [ESP + 0x18]
        MOV        EBX,EAX
        DEC        EBX
        JS         LAB_00487d2f
        LEA        ESI,[EBX + EBX*0x2]
        ADD        ESI,ESI
        ADD        ESI,ESI
        LAB_StartOfLoop
        MOV        EAX,dword ptr [ESP + 0x14]
        CMP        dword ptr [ESI + EAX*0x1 + 0x4],EBP
        JZ LAB_OverStuff
        // do stuff
        LAB_OverStuff
        DEC EBX
        SUB ESI,0xc
        TEST EBX,EBX
        JNS LAB_StartOfLoop

, then decompiler produces this result:

      iStack_88c = List.ArrayNum - 1;
      iVar2 = iStack_88c * 0xc;
      if (-1 < iStack_88c) {
        do {
          if (*(int *)((int)&(List.Data)->Element2 + iVar2) == 0) {
            // do stuff
          }
          iStack_88c = iStack_88c + -1;
          iVar2 = iVar2 + -0xc;
        } while (-1 < iStack_88c);
      }

While it should (with this fix) produce this result, which is much neater and readable:

iVar2 = iStack_88c;
for (; -1 < iStack_88c; iStack_88c = iStack_88c + -1) {
    if (List.Data[iVar2].Element2 == 0) {
        // do stuff
    }
    iVar2 = iVar2 + -1;
}

The issue was caused by RulePtrArith not finding the multiplication operation (by 0xc in this case) in the pointer arithmetic expression ESI + EAX*0x1 + 0x4 (ESI is the shifted structure offset here, EAX is List.Data and 0x4 is the offset to Element2). By letting RulePtrArith recognize certain cases through CPUI_MULTIEQUAL and transforming expressions, it can create a situation where it can find the multiplier and produce better results. The expression transformation involves taking the 0xc in this case out of the -= operator and the place where iVar2 is initialized and duplicating it in every place it's used. When multiplication by 0xc is right there in the pointer arithmetic expression, RulePtrArith can find it and generate CPUI_PTRADD (addition of a number to a pointer).
Another issue was ActionNodeJoin creating CPUI_MULTIEQUALs whose output type was unknown or undefined and then later RulePropagateCopy was taking those and putting them inside CPUI_PTRADD expressions in place of the base pointer which produced now an expression where the base pointer has unknown type, and then RulePtraddUndo would see that there's a CPUI_PTRADD in which the base pointer has size that doesn't match the original size with which CPUI_PTRADD was created, so it converts the CPUI_PTRADD back to CPUI_INT_ADD (addition of two integers). This is fixed by assigning a type to the output of CPUI_MULTIEQUAL in ActionNodeJoin because everyone else seems to just be doing their job.

EDIT: It seems this "shifted structure offset" rule that transforms expressions cannot be applied if the variable that holds the "shifted offset" is stored on the stack and there's a function call in the loop body, because that function call, if passed any arguments, could potentially write to it (you never know what it was passed and whether it's a pointer to that stack location). So it only works when the "shifted offset" variable is in a register.
EDIT: Even if the function was not passed any variables, guess what, global variables exist, and those could point to that stack location, you never know. So any presence of calls in the loop makes using this rule impossible if the offset is stored on the stack.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature: Decompiler Status: Triage Information is being gathered
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants