Solidity binding fixes driven by Sanctuary #1149

ggiraldez · 2024-11-15T00:03:44Z

This PR contains fixes to bindings for issues found while trying to resolve all references in Sanctuary, along with test cases to verify them. It also improves the documentation of the rules and includes readability improvements (renames and rearrangements). The built-in types are also standardized on a PascalCase naming convention.

Most notably, it introduces extension scopes to support resolving attached functions set via `using` directives. These extension scopes need to be accessible at any point during the resolution of a symbol stack if the originating reference is located in a lexical scope that is affected by a `using` directive. This forced a refactor of the lexical scope node structure, introducing a new `.extended_scope` available in nodes where references may need to resolve to attached functions (i.e. function bodies, constant initialization expressions, etc.). Also, for Solidity < 0.7.0 `using` directives are inherited, so we need to propagate the new extension scopes to all sub-contracts relative to where the directive is located.

The extension scope mechanism is implemented using the jump to scope and scope stack features of the stack graphs. The extended scope would optionally push the extension scope for the current contract and then continue resolution through the normal lexical scope. This effectively doubles the search space in the graph when performing a resolution, and this happens every time we inject a new extension scope to the scope stack. This has the potential to exponentially increase the running times and memory requirements. This is a known caveat which will be addressed in a future PR.
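To make the cost concrete, here is a rough sketch of the growth described above. It is a toy model, not the stack graphs implementation: it assumes every extended scope crossed during a resolution forks the search into two branches (push the extension scope, or continue without it). The function name `candidate_paths` and its parameter are hypothetical.

```rust
/// Toy model: every extended scope crossed during resolution forks the
/// search into two branches (push the extension scope, or skip it).
/// `extended_scopes_crossed` is a hypothetical parameter for illustration.
fn candidate_paths(extended_scopes_crossed: u32) -> u64 {
    // Each fork doubles the number of partial paths to explore.
    2u64.pow(extended_scopes_crossed)
}

fn main() {
    for n in [0, 1, 4, 10] {
        println!("{n} extended scopes -> {} candidate paths", candidate_paths(n));
    }
}
```

Under this model, ten nested injections already mean on the order of a thousand partial paths for a single reference, which is why the caveat matters for large contracts.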


And also provide alternative paths with and without propagating the dynamic scope. Otherwise, since scopes accumulate on the stack, it's possible we'll need to resolve an attached function with the wrong dynamic scope at the top of the scope stack.

Both contracts and libraries can optionally push the dynamic scope when traversing to the parent lexical scope (i.e. the source unit). Libraries can also optionally push their name to correctly bind internal types which were extended (with `using`) by their qualified name (i.e. `Lib.Type`).


OmarTawfik · 2024-11-19T07:08:34Z

crates/solidity/inputs/language/bindings/rules.msgb

@@ -797,54 +1081,82 @@ inherit .lexical_scope
    | [FixedKeyword]
    | [UfixedKeyword]
 )] {
-  let @elementary.symbol = (format "%{}" (source-text @keyword))
+  var symbol = (source-text @keyword)


Q: since these involve string matching and formatting, should we have specific queries instead?

- dedicated queries for uint, int, ufixed, fixed that each check for the single value.
- another general query for the others that don't need string matching.

Can we do this without enumerating all the possible values in the general query? That will result in string matching anyway, just in a different place. I don't see how it's possible otherwise, given that we don't have a negation operator for queries.

Aside: there are a couple of cases where having a negation query operator would have been useful. I remember @AntonyBlakey asking me about that, and at the time I didn't have any specific use cases. But I've now seen a handful, this one being one of them.

A negation operator would be easy, as you probably know. Although the semantics ... would it mean 'does not appear as a child' or is it negative lookahead. I think t-s is the former.

Yeah, the semantics especially in combination with other operators can be hard to design.

> That will result in string matching anyway, just in a different place.

But we won't be matching all values for all these kinds, right? I wonder if it makes a perf impact..
Additionally, for readability/maintainability reasons, I'm not sure what we gain by combining them.

    @elementary [ElementaryType @keyword ([BoolKeyword])] {
      let @elementary.symbol = "%bool"
    }
    @elementary [ElementaryType @keyword ([ByteKeyword])] {
      let @elementary.symbol = "%byte"
    }
    @elementary [ElementaryType @keyword ([FixedKeyword])] {
      let symbol = (source-text @keyword)
      if (symbol == "fixed") {
        let @elementary.symbol = "%fixed128x18"
      } else {
        let @elementary.symbol = (format "%{}" symbol)
      }
    }
    // ...etc

Oh, I now see what you mean. That looks reasonable. I'll change that.
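For readers less familiar with the graph DSL, the keyword-to-symbol mapping this thread converges on can be sketched in plain Rust. This is only an illustration: the function name `elementary_symbol` is hypothetical, and the handling of `uint`, `int` and `ufixed` is an assumption based on Solidity's own aliases (`uint` is `uint256`, `int` is `int256`, `ufixed` is `ufixed128x18`), by analogy with the `fixed` case shown above.

```rust
/// Sketch of the keyword-to-symbol mapping discussed in this thread.
/// Assumption: bare aliases resolve to their canonical sized names,
/// mirroring the `fixed` -> `%fixed128x18` case from the snippet above.
fn elementary_symbol(keyword: &str) -> String {
    let canonical = match keyword {
        "uint" => "uint256",
        "int" => "int256",
        "fixed" => "fixed128x18",
        "ufixed" => "ufixed128x18",
        // Every other keyword (bool, address, uint8, bytes32, ...) is used as-is.
        other => other,
    };
    // Built-in type symbols carry a `%` prefix so they cannot clash
    // with user-defined names.
    format!("%{canonical}")
}

fn main() {
    for keyword in ["bool", "fixed", "uint", "uint8"] {
        println!("{keyword} -> {}", elementary_symbol(keyword));
    }
}
```

Only the four alias keywords need a string comparison here, which is the same split proposed for the queries: dedicated handling for the aliases and one general path for everything else.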


OmarTawfik · 2024-11-19T07:15:10Z

crates/solidity/inputs/language/src/definition.rs

                    enabled = From("0.5.0")
                ),
                BuiltInFunction(
                    name = "encode",
-                    parameters = ["$args"],
+                    parameters = ["$Types"],


in order to generate a usable built_ins.sol, I wonder if we should add parameter names to all of these?

I haven't done that out of laziness, but I can add the names.

This case though, is one of those that cannot be written in valid Solidity, because the %Types argument here is a varargs .... There are other similar cases.

Aha. I think there are a few inconsistencies:

- We need a valid Solidity type, to be able to display to the users/bind in the future. Maybe we can use an array instead of the varargs, similar to how TypeScript type system does it?
- I realized that it is not consistent, since Yul parameters are untyped. Our BuiltInFunction type doesn't differentiate between them..
- Very few signatures have a "storage location" like memory. If we don't have this data, maybe we can remove the location from all of them? until we actually have a use for it, and can validate/test its impact.

> Very few signatures have a "storage location" like memory. If we don't have this data, maybe we can remove the location from all of them? until we actually have a use for it, and can validate/test its impact.

I took this from the Solidity docs. IIUC you need storage location only for types that don't fit an EVM register. I think all instances of such parameters are annotated for the built-ins.

> I realized that it is not consistent, since Yul parameters are untyped. Our BuiltInFunction type doesn't differentiate between them..

Hmm... good point. I didn't foresee this, but it's a problem for defining the Yul built-ins.

> We need a valid Solidity type, to be able to display to the users/bind in the future. Maybe we can use an array instead of the varargs, similar to how TypeScript type system does it?

An array sounds good, but of what type?

OmarTawfik · 2024-11-19T07:16:27Z

crates/solidity/inputs/language/src/definition.rs

@@ -6886,15 +6877,20 @@ codegen_language_macros::compile!(Language(
        ),
        BuiltInType(
            name = "$bytes",


nit: $Bytes?

Yep, that's a leftover. I even switched to %Array but didn't notice this one. Will change it.

Hmm... on second thought, this is different from %Array and more similar to %address which remains in all lowercase. The rationale behind it is the rules that push the symbol types build them directly from the keywords. But maybe we should change those as well...

Changing the other elementary types to produce title case variants is not trivial. There is no usable string transformation function in the graph builder. We could implement one, but I'm not sure it's worth it.

> Changing the other elementary types to produce title case variants is not trivial.

we already define pascal_case and a bunch of other _case helpers in crates/infra/utils/src/codegen/tera.rs.
If we are moving to split .msgb files, and using templates to render them, maybe we can use it? It would only apply to the rules we specifically put in built_ins.msgb.
IIUC, that can also be used to bind to an Address type instead of %address.

IIRC, this is common in many other "std" libraries. For example, C#'s string keyword would bind to a System.String type.
One other idea if we are brainstorming: define a full keywords_to_types mapping here, and use it in the automatically-generated built_ins.msgb, rather than special casing things like this here, and also rules like this.

Not a blocker for this PR though.

> If we are moving to split .msgb files, and using templates to render them, maybe we can use it? It would only apply to the rules we specifically put in built_ins.msgb.

I'm a bit wary of exploding the number of rules. Generating them should be easy through templating, as you say, but then we need to parse and execute them. If we need to generate one stanza for each of the elementary types (which we do to change the case of the generated symbol), that's a lot of new rules.

In that case, adding a graph function to transform the string would be much more efficient.

> IIUC, that can also be used to bind to an Address type instead of %address.

This would not be possible anyway, cause we could be conflicting with a user-defined Address struct/contract/whatever. We still need to prefix built-in types with % to avoid conflicts with user definitions.

OmarTawfik · 2024-11-19T07:17:00Z

crates/solidity/inputs/language/src/definition.rs

@@ -6961,7 +6981,7 @@ codegen_language_macros::compile!(Language(
            functions = []
        ),
        BuiltInType(
-            name = "$typeContractType",
+            name = "$ContractTypeType",


I found a few TypeType. Is this expected?

Yes, these are the types returned by the type() expression, hence the double Type. But I see a conflict here, because we also have UserTypeType which is the type of a user-defined value type. I'm thinking I'll rename that one to UDVTType or similar.

I renamed UserTypeType to UserDefinedValueType.

Sounds good. I also see three remaining IntTypeType, ContractTypeType, and InterfaceTypeType.

Yes, those remaining are the types returned by type() expressions. I renamed UserTypeType because that wasn't something returned by type().

OmarTawfik · 2024-11-19T07:22:33Z

crates/solidity/testing/snapshots/bindings_output/yul/slot_suffix/generated/0.4.11-success.txt

+   │                                   │   
+   │                                   ╰─── def: 3
+   │ 
+ 4 │             let s := sload(data_slot)


shouldn't this be resolved to a slot property in the built-ins? rather than the data parameter?

We don't have a built-ins struct to represent the .offset and .slot for Yul. I'll add it. But in this case I'm not sure, because data_slot is equivalent to data.slot so there are two bindings there. I think it may be confusing to bind this to the built-in and not data.

I added a built-in struct to define the possible members of Yul external values. Binding to it is hard-coded in the rules, whenever we encounter a YulPath with a member access.

One thing I noticed is that the definition of YulPath does not have version restrictions, but it's only available since 0.7.0 (from what I can test with Remix). Is this worth addressing in the grammar?

> because data_slot is equivalent to data.slot so there are two bindings there.

Yes, and IIUC, both should be bound to the slot built-in property (of "int" type), right? not data (of type bytes).
WDYT?

> the definition of YulPath does not have version restrictions, but it's only available since 0.7.0

let x.y.z := 0 is legal in 0.6.12. Am I missing something?

Sorry, I wasn't clear. Yes it's legal, but x.y.z as a whole is the identifier. Shouldn't that be parsed as a YulIdentifier? I can work around it in the binding rules by creating the reference/definition on the YulPath instead for < 0.7.0 though.

Never mind, I was misreading the spec. x.y.z is parsed as a YulIdentifier, specifically from 0.5.8 up to (but not including) 0.7.0. I added a test case to verify that it's bound correctly.

> Yes, and IIUC, both should be bound to the slot built-in property (of "int" type), right? not data (of type bytes).
> WDYT?

Ok, makes sense. I'll change it.
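For illustration, the `data_slot` / `data.slot` equivalence discussed in this thread can be sketched outside the binding rules as a small Rust helper. The function name `split_legacy_suffix` is hypothetical and this is not how the .msgb rules implement it; it only shows which part of the identifier refers to the variable and which part names the built-in member.

```rust
/// Splits a legacy Yul identifier such as `data_slot` into the variable
/// name and the member it stands for (`slot` or `offset`).
/// Returns `None` when the identifier carries no such suffix.
/// Hypothetical helper, for illustration only.
fn split_legacy_suffix(identifier: &str) -> Option<(&str, &str)> {
    for member in ["slot", "offset"] {
        let suffix = format!("_{member}");
        if let Some(base) = identifier.strip_suffix(suffix.as_str()) {
            // A bare `_slot` has no variable part, so it is not a match.
            if !base.is_empty() {
                return Some((base, member));
            }
        }
    }
    None
}

fn main() {
    println!("{:?}", split_legacy_suffix("data_slot"));
    println!("{:?}", split_legacy_suffix("data"));
}
```

With that split, the suffix part is what gets bound to the built-in `slot` or `offset` member, matching what `data.slot` does in newer versions.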

OmarTawfik · 2024-11-19T07:26:09Z

crates/solidity/testing/snapshots/bindings_output/contracts/call_public_getter/input.sol

+}
+contract Test {
+    function test(Base base) public {
+        base.value().x;


solc produces an error for this code. Not sure I'm following?

TypeError: Member "x" not found or not visible after argument-dependent lookup in int256.

Wow, ok. I clearly didn't test this in Remix and the semantics are totally unexpected to me. I thought the getter function would return the same struct but that's not the case. The documentation is also fuzzy. This complicates things and I'll need to re-think a bit how we're binding public getters.

I introduced a fix for the simplest case: a struct that has a single field. In that case, the getter would return the value of such field. If the struct has more fields, Solidity will generate a getter that returns a tuple. That case is not relevant for name binding, since you cannot chain calls to tuple results (since tuples are not a real type).

There is an additional caveat: if the only field in the struct is not an elementary type, then it cannot be returned in a getter. We don't handle that case and blindly bind to the type of the first field.
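The decision described in the two paragraphs above can be summarized with a short sketch. It is written in Rust for illustration only (the real logic lives in the binding rules), and the names `GetterResult` and `struct_getter_result` are hypothetical. It deliberately mirrors the stated caveat: the single field's type is used blindly, without checking whether it is an elementary type.

```rust
/// What a chained member access on a public getter call can bind to.
/// Hypothetical type, for illustration only.
#[derive(Debug, PartialEq)]
enum GetterResult<'a> {
    /// Single-field struct: the getter yields that field's value.
    FieldType(&'a str),
    /// Several fields: the getter yields a tuple, which cannot be chained.
    Tuple,
}

/// Decides what the getter of a public struct-typed state variable returns,
/// given the declared types of the struct's fields in order.
fn struct_getter_result<'a>(field_types: &[&'a str]) -> Option<GetterResult<'a>> {
    match field_types {
        // No fields: nothing to bind to.
        [] => None,
        // Caveat from the thread: the field type is not checked here.
        [only] => Some(GetterResult::FieldType(*only)),
        _ => Some(GetterResult::Tuple),
    }
}

fn main() {
    println!("{:?}", struct_getter_result(&["int256"]));
    println!("{:?}", struct_getter_result(&["uint256", "bytes3"]));
}
```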

For more complex cases such as the one presented in Solidity documentation, I don't think we can do anything and will need to be handled in later passes.

For reference, the complex example referred to is:

    // SPDX-License-Identifier: GPL-3.0
    pragma solidity >=0.4.0 <0.9.0;

    contract Complex {
        struct Data {
            uint a;
            bytes3 b;
            mapping(uint => uint) map;
            uint[3] c;
            uint[] d;
            bytes e;
        }
        mapping(uint => mapping(bool => Data[])) public data;
    }

Which generates this getter:

    function data(uint arg1, bool arg2, uint arg3)
        public
        returns (uint a, bytes3 b, bytes memory e)
    {
        a = data[arg1][arg2][arg3].a;
        b = data[arg1][arg2][arg3].b;
        e = data[arg1][arg2][arg3].e;
    }

> For more complex cases such as the one presented in Solidity documentation, I don't think we can do anything and will need to be handled in later passes.

But does that mean that we still bind the getter identifier to the correct definition?
If so, I think we can accept that for now, and add it as a GitHub issue, or maybe in the "limitations" list so that we don't forget about it later.

Yes, we still bind the getter. I'll definitely add this to the limitations document.

A public getter will never return a complex data type, and a special getter function is generated for arrays, mappings and structs. Arrays and mappings we were already handling, but this commit makes that more explicit. For structs, we bind the type of the first field. There is a caveat here: if the first field is not a type that can be flattened and returned in a public getter we're still binding it, instead of the first field that can be returned. But I don't see a way to be 100% correct here.

ggiraldez · 2024-11-21T21:22:12Z

Re-running the tests against Sanctuary I noticed some things broke with the last few changes. Expect a couple more commits added to this PR.

We are usually binding `this` to the enclosing contract type, but it can also be used in library code. For this special occasion, bind it to a built-in. This will need to be resolved later to the actual contract instance by the backend.

This happens when there is special member access in some Yul identifier (like `x.slot` or `x.offset`). I think this issue was introduced when unreserving the `address` keyword since that changed the structure of `YulPath`. There is a proper test case in NomicFoundation#1149 already, but without this fix running Sanctuary with existing rules crashes.


ggiraldez added 30 commits November 14, 2024 16:30

Add test case which broke with query engine not skipping trivia in one-or-more operator

aa18f14

Make using directives inherited in Solidity < 0.7.0

42bfde6

Resolve elementary types in expressions and add their built-ins

634b33b

- `string` and `bytes` are exported as built-in variables resolving to types that provide a `concat` function - `address` can be used as a function to cast a parameter to `address` to eg. retrieve the balance

Bind constants as library members

458855c

Propagate dynamic scopes when escaping a contract's lexical scope

bcd0d1f

This makes resolving to attached functions on types even when the reference of those types happen in a different lexical scope.

Bind elementary casting and struct construction to the resulting type

915fbd8

Fix calling parent constructors in constructors

f2ab055

Bind arithmetic, logical, bit-wise, comparison, etc. expressions

13cf7e5

Propagate the dynamic scope with resolving contract bases

5d8c379

Define a new scope for internal state variables for derived contracts

ddac849

Resolve using dynamic scope from an interface/contract casting

a5f3eda

Resolve attached functions when applied to namespace qualified types

901501a

Public getters are used as functions, so bind them as such

1d566b4

Functions can be called in derived contracts by qualifying with the name

a38e8bd

Add support for binding unnamed function declarations

3aab65d

Support old var declarations

9d620d7

Make inherited state vars accessible in all parent contracts

7daa1f4

Normalize value type aliases to bind attached functions correctly

0d2802f

Support binding literal hex addresses

58ced15

Prior to 0.5.0, this could be used as an address

438cb81

Add binding test cases for most of the remaining Sanctuary issues

9cbb4fc

_slot/_offset suffixes for storage variables in Solidity < 0.7.0

f7655e7

Bind to the output of both operands for all arithmetic and bitwise operations

b3bcf54

Bind legacy call options .value() and .gas() on Solidity < 0.7.0

d6b7174

Bind user defined types .wrap() and .unwrap()

5ec110b

Allow binding attached functions on typed casted expressions

ac13986

Applying a function call with a type will always return a value of that type, so a symbol stack `type,()` is equivalent to `type,@typeof`. Reflect on the binding entry point of the using clause.

Bind modifiers in libraries

6a560d5

Bind legacy constructor parent invocations

4df6b54

These are parsed as modifiers, and they need a similar treatment as parent constructor calls in new constructor definitions.

OmarTawfik requested changes Nov 19, 2024


OmarTawfik reviewed Nov 19, 2024


ggiraldez added 4 commits November 19, 2024 19:32

Bind members of Yul external variables

4579182

Rename built-in type for User Defined Value Types

2b3ae36

Add names to most parameters of built-in functions

1163b07

ggiraldez requested a review from OmarTawfik November 20, 2024 18:00

ggiraldez marked this pull request as draft November 21, 2024 21:22

ggiraldez added 5 commits November 22, 2024 12:39

Bind inherited public getters

c6b73c6

Fix scope bindings for modifiers in libraries

a015082

Bind attached functions when there's a using .. for * in a library

fc929d3

Allow Yul functions to access external constants

90a4f26

ggiraldez marked this pull request as ready for review November 22, 2024 18:57

ggiraldez mentioned this pull request Nov 25, 2024

Bind type(...).min and type(...).max to the operand type #1158

Open

ggiraldez mentioned this pull request Nov 25, 2024

Add bindings test to Sanctuary #1159

Open

ggiraldez added 5 commits November 25, 2024 14:47

this and super don't generate references

d5823c0

This avoids some oddities, such as `this` used in libraries which we previously had to bind to an artificial built-in.

Refactor elementary types handling in binding rules

4768079

Bind x_slot in Solidity <= 0.5.0 to the slot built-in

5ed161b

Add test case with Yul identifiers with dots

a6b610c

Tweak some parameters of built-in functions

4c19da0

ggiraldez force-pushed the sanctuary-fixes-only branch from 40e6da0 to 4c19da0 Compare November 26, 2024 23:54

ggiraldez added 4 commits November 26, 2024 19:41

Merge remote-tracking branch 'upstream/main' into sanctuary-fixes-only

b62e37e

Update snapshots after merge

f6d3113

Update performance test counts and ignore built-ins in them

a865d10

Fix formatting

c278a6c

ggiraldez mentioned this pull request Nov 27, 2024

Inject extension scopes while running the resolution algorithm #1170

Draft
