Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

C#: Fix join order in cs/zipslip #78

Merged
merged 2 commits into from
Aug 9, 2024
Merged

Conversation

MathiasVP
Copy link
Collaborator

This PR fixes a bad join order in cs/zipslip. @LWSimpkins was kind enough to provide me with a log file that showed the problematic predicate:

Starting to evaluate predicate _Call::Call.getArgument/1#dispred#28f02664#bff_102#join_rhs_params#join_rhs/3@b06f61j1
[2024-08-08 19:32:06] (876s) Sequence stamp origin is -6152927877850464690
[2024-08-08 19:32:06] (876s) Pausing evaluation to evict 4.11GiB ARRAYS at sequence stamp o+0
[2024-08-08 19:32:07] (876s) Unpausing evaluation: 102.98GiB forgotten: 102.98GiB UNREACHABLE (115351 items up to o+-2)
[2024-08-08 19:34:34] (1023s) Pausing evaluation to evict 4.11GiB ARRAYS at sequence stamp o+169351
[2024-08-08 19:34:34] (1023s) Unpausing evaluation: 88.81GiB forgotten: 88.81GiB UNREACHABLE (78441 items up to o+169348)
[2024-08-08 19:36:27] (1136s) Pausing evaluation to evict 4.11GiB ARRAYS at sequence stamp o+316777
[2024-08-08 19:36:27] (1136s) Unpausing evaluation: 78.14GiB forgotten: 78.14GiB UNREACHABLE (68998 items up to o+316774)
[2024-08-08 19:37:52] (1222s) Pausing evaluation to evict 4.11GiB ARRAYS at sequence stamp o+448813
[2024-08-08 19:37:52] (1222s) Unpausing evaluation: 71.35GiB forgotten: 71.35GiB UNREACHABLE (63009 items up to o+448811)
[2024-08-08 19:39:31] (1320s) Pausing evaluation to evict 4.11GiB ARRAYS at sequence stamp o+565602
[2024-08-08 19:39:31] (1321s) Unpausing evaluation: 60.93GiB forgotten: 60.93GiB UNREACHABLE (53814 items up to o+565600)

The first thing to do in this case is to notice the #bff in the predicate name. This means the compiler has "optimized" the predicate and inserted additional context into the predicate from the call site in order to reduce the size of the predicate (this is called the "magic" optimization since it may sometimes magically make predicates evaluate much more efficiently). However, the additional context has a risk of being joined with in an inefficient way (just like any other conjuncts one writes in a predicate). Furthermore, there's a risk that any performance tricks we do will be undone by the compiler as part of this optimization. So the first commit disables this optimization for the getAnArgument predicate.

Furthermore, I also did the same to the ComparisonTest.getAnArgument predicate since I saw bad magic in there as well. That wasn't a performance problem in itself, though. Just a looming threat that could have caused a future problem 😅

When we run the code from the first commit the code is still slow. Running codeql query compile "codeql\csharp\ql\src\Security Features\CWE-022\ZipSlip.ql" --dump-dil --dump-ra allows us to better see the code that we end up executing. Doing that, we see the body of the predicate that's slow:

EVALUATE NONRECURSIVE RELATION:
  SYNTHETIC `_Call::Call.getArgument/1#dispred#28f02664_102#join_rhs_params#join_rhs`(entity arg0, entity arg1, entity arg2) :- 
    SENTINEL params
    SENTINEL `Call::Call.getArgument/1#dispred#28f02664_102#join_rhs`
    {2} r1 = SCAN params OUTPUT In.3, In.0 'arg1'
    {3}    | JOIN WITH `Call::Call.getArgument/1#dispred#28f02664_102#join_rhs` ON FIRST 1 OUTPUT Rhs.1 'arg0', Lhs.1 'arg1', Rhs.2 'arg2'
    return r1

The SYNTHETIC keyword means the compiler has generated this predicate. Let's search for all the places where this is used. Luckily, there's just 1 place:

  rec `ZipSlipQuery::SanitizerMethodCall.getFilePathArgument/0#dispred#5fc00ba7`(entity this, entity result) :- 
    SENTINEL `Call::MethodCall.getTarget/0#dispred#fd6fd5d9`
    SENTINEL `Call::Call.getArgument/1#dispred#28f02664`
    [[[ BASE CASE ]]]
    {2} r1 = JOIN ZipSlipQuery::RootSanitizerMethodCall#acedb750 WITH `Expr::QualifiableExpr.getQualifier/0#dispred#a475453f#bf` ON FIRST 1 OUTPUT Lhs.0 'this', Rhs.1 'result'
    return r1

    [[[ SEMINAIVE VARIANT - STANDARD ORDER]]]
    {2} r2 = SCAN ZipSlipQuery::WrapperSanitizerMethodCall#cdaba707#reorder_1_0#prev_delta OUTPUT In.1 'this', In.0
    {4}    | JOIN WITH `_Call::Call.getArgument/1#dispred#28f02664_102#join_rhs_params#join_rhs` ON FIRST 1 OUTPUT Lhs.1, Rhs.1, Lhs.0 'this', Rhs.2 'result'
    {2}    | JOIN WITH ZipSlipQuery::AbstractWrapperSanitizerMethod#class#fa38d971#prev ON FIRST 2 OUTPUT Lhs.2 'this', Lhs.3 'result'

    {2} r3 = SCAN ZipSlipQuery::AbstractWrapperSanitizerMethod#class#fa38d971#prev_delta OUTPUT In.1, In.0
    {2}    | JOIN WITH params_03#join_rhs ON FIRST 1 OUTPUT Lhs.1, Rhs.1
    {2}    | JOIN WITH ZipSlipQuery::WrapperSanitizerMethodCall#cdaba707#reorder_1_0#prev ON FIRST 1 OUTPUT Rhs.1 'this', Lhs.1
    {2}    | JOIN WITH `Call::Call.getArgument/1#dispred#28f02664` ON FIRST 2 OUTPUT Lhs.0 'this', Rhs.2 'result'

    {2} r4 = r2 UNION r3
    {2}    | AND NOT `ZipSlipQuery::SanitizerMethodCall.getFilePathArgument/0#dispred#5fc00ba7#prev`(FIRST 2)
    return r4

Aha! So it's generated because of the ZipSlipQuery::SanitizerMethodCall.getFilePathArgument predicate. Let's look at the DIL for that:

incremental
`ZipSlipQuery::SanitizerMethodCall.getFilePathArgument/0#dispred#5fc00ba7`(
  /* ZipSlipQuery::SanitizerMethodCall */ interned entity this,
  /* Expr::Expr */ entity result
)
{
  [base_case]
  ZipSlipQuery::RootSanitizerMethodCall#acedb750(this) and
  `Expr::QualifiableExpr.getQualifier/0#dispred#a475453f#bf`(this, result)
  [recursive_case]
  (
    exists(
      /* ZipSlipQuery::AbstractWrapperSanitizerMethod */ interned entity wrapperMethod,
      /* Variable::Parameter */ interned entity call_result#3
     |
      exists(interned int call_result#2 |
        exists(
          interned dontcare string _,
          /* @type_or_ref */ interned dontcare entity _1,
          interned dontcare int _2,
          /* @parameterizable */ interned dontcare entity _3,
          /* @parameter */ interned dontcare entity _4
         |
          params(call_result#3, _, _1, call_result#2, _2, _3, _4)
        ) and
        `Call::Call.getArgument/1#dispred#28f02664`(this, call_result#2, result)
      ) and
      delta previous rec ZipSlipQuery::WrapperSanitizerMethodCall#cdaba707(this,
        wrapperMethod) and
      previous rec ZipSlipQuery::AbstractWrapperSanitizerMethod#class#fa38d971(wrapperMethod,
        call_result#3)
    )
    or
    exists(interned int call_result#2 |
      exists(
        /* ZipSlipQuery::AbstractWrapperSanitizerMethod */ interned entity wrapperMethod
       |
        exists(/* Variable::Parameter */ interned entity call_result#3 |
          exists(
            interned dontcare string _,
            /* @type_or_ref */ interned dontcare entity _1,
            interned dontcare int _2,
            /* @parameterizable */ interned dontcare entity _3,
            /* @parameter */ interned dontcare entity _4
           |
            params(call_result#3, _, _1, call_result#2, _2, _3, _4)
          ) and
          delta previous rec ZipSlipQuery::AbstractWrapperSanitizerMethod#class#fa38d971(wrapperMethod,
            call_result#3)
        ) and
        previous rec ZipSlipQuery::WrapperSanitizerMethodCall#cdaba707(this,
          wrapperMethod)
      ) and
      `Call::Call.getArgument/1#dispred#28f02664`(this, call_result#2, result)
    )
  ) and
  not(
    previous rec `ZipSlipQuery::SanitizerMethodCall.getFilePathArgument/0#dispred#5fc00ba7`(this,
      result)
  )
}

If you squent a bit you will notice that the body of _Call::Call.getArgument/1#dispred#28f02664_102#join_rhs_params#join_rhs corresponds to this part of the DIL:

exists(interned int call_result#2 |
  exists(
    interned dontcare string _,
    /* @type_or_ref */ interned dontcare entity _1,
    interned dontcare int _2,
    /* @parameterizable */ interned dontcare entity _3,
    /* @parameter */ interned dontcare entity _4
   |
    params(call_result#3, _, _1, call_result#2, _2, _3, _4)
  ) and
  `Call::Call.getArgument/1#dispred#28f02664`(this, call_result#2, result)
)

and here we the problem: we're joining params and Call::Call.getArgument on the argument index 😱 That's a really really bad join order.

By factoring out some parts of the predicate we ensure that we're joining on both this and index. This results in much better code:

exists(interned int index |
  delta previous rec `ZipSlipQuery::WrapperSanitizerMethodCall.paramFilePathIndex/1#dispred#c49ea256`(this,
    index) and
  `Call::Call.getArgument/1#dispred#28f02664`(this, index, result)
)

Now, we join on both the index and the MethodCall (i.e., this) which is much better.

@LWSimpkins LWSimpkins merged commit aeaca1d into main Aug 9, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants