Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Solidity binding fixes driven by Sanctuary #1149

Open
wants to merge 67 commits into
base: main
Choose a base branch
from

Conversation

ggiraldez
Copy link
Contributor

@ggiraldez ggiraldez commented Nov 15, 2024

This PR contains fixes to bindings for issues that were found by trying to resolve all references in Sanctuary, as well as test cases to verify them. It also improves documentation of the rules and improvements (renames and rearrangements) for readability. The built-in types are also standardized with a PascalCase naming convention.

Most notably, it introduces extension scopes to support resolving attached function set via using directives. These extension scopes need to be accessible at any point during the resolution of a symbol stack if the originating reference is located in a lexical scope which is affected by a using directive. This forced a refactor of the lexical scope node structure, introducing a new .extended_scope available in nodes where references may need to resolve to attached functions (ie. function bodies, constant initialization expressions, etc.) Also, for Solidity < 0.7.0 using directives are inherited, so we need to propagate the new extension scopes for all sub-contracts relative to where the directive is located.

The extension scope mechanism is implemented using the jump to scope and scope stack features of the stack graphs. The extended scope would optionally push the extension scope for the current contract and then continue resolution through the normal lexical scope. This effectively doubles the search space in the graph when performing a resolution, and this happens every time we inject a new extension scope to the scope stack. This has the potential to exponentially increase the running times and memory requirements. This is a known caveat which will be addressed in a future PR.

- `string` and `bytes` are exported as built-in variables resolving to types
that provide a `concat` function
- `address` can be used as a function to cast a parameter to `address` to eg.
retrieve the balance
This makes resolving to attached functions on types even when the reference of
those types happen in a different lexical scope.
And also provide alternative paths with and without propagating the dynamic
scope. Otherwise, since scope accumulate on the stack it's possible we'll need
to resolve an attached function with the wrong dynamic scope at the top of the
scope stack.
Both contracts and libraries can optionally push the dynamics scope when
traversing to the parent lexical scope (ie. the source unit). But for libraries,
they can also optionally push their name to correctly bind internal types which
were extended (with `using`) by their qualified name (ie. `Lib.Type`).
Applying a function call with a type will always return a value of that type, so
a symbol stack `type,()` is equivalent to `type,@typeof`. Reflect on the binding
entry point of the using clause.
These are parsed as modifiers, and they need a similar treatment as parent
constructor calls in new constructor definitions.
@@ -797,54 +1081,82 @@ inherit .lexical_scope
| [FixedKeyword]
| [UfixedKeyword]
)] {
let @elementary.symbol = (format "%{}" (source-text @keyword))
var symbol = (source-text @keyword)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: since these involve string matching and formatting, should we have specific queries instead?

  • dedicated queries for uint, int, ufixed, fixed that each check for the single value.
  • another general query for the others that don't need string matching.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we do this without enumerating all the possible values in the general query? That will result in string matching anyway, just in a different place. I don't see how it's possible otherwise, given that we don't have a negation operator for queries.

Aside: there are a couple of cases where having a negation query operator would have been useful. I remember @AntonyBlakey asking me about that, and at the time I didn't have any specific use cases. But I've now seen a handful, this one being one of them.

Copy link
Contributor

@AntonyBlakey AntonyBlakey Nov 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A negation operator would be easy, as you probably know. Although the semantics ... would it mean 'does not appear as a child' or is it negative lookahead. I think t-s is the former.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the semantics especially in combination with other operators can be hard to design.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That will result in string matching anyway, just in a different place.

But we won't be matching all values for all these kinds, right? I wonder if it makes a perf impact..
Additionally, for readability/maintainability reasons, I'm not sure what we gain by combining them.

@elementary [ElementaryType @keyword ([BoolKeyword])] {
  let @elementary.symbol = "%bool"
}

@elementary [ElementaryType @keyword ([ByteKeyword])] {
  let @elementary.symbol = "%byte"
}

@elementary [ElementaryType @keyword ([FixedKeyword])] {
  let symbol = (source-text @keyword)
  if (symbol == "fixed") {
    let @elementary.symbol = "%fixed128x18"
  } else {
    let @elementary.symbol = (format "%{}" symbol)
  }
}

// ...etc

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I now see what you mean. That looks reasonable. I'll change that.

enabled = From("0.5.0")
),
BuiltInFunction(
name = "encode",
parameters = ["$args"],
parameters = ["$Types"],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in order to generate a usable built_ins.sol, I wonder if we should add parameter names to all of these?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't done that out of laziness, but I can add the names.

This case though, is one of those that cannot be written in valid Solidity, because the %Types argument here is a varargs .... There are other similar cases.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha. I think there are a few inconsistencies:

  • We need a valid Solidity type, to be able to display to the users/bind in the future. Maybe we can use an array instead of the varargs, similar to how TypeScript type system does it?
  • I realized that it is not consistent, since Yul parameters are untyped. Our BuiltInFunction type doesn't differentiate between them..
  • Very few signatures have a "storage location" like memory. If we don't have this data, maybe we can remove the location from all of them? until we actually have a use for it, and can validate/test its impact.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very few signatures have a "storage location" like memory. If we don't have this data, maybe we can remove the location from all of them? until we actually have a use for it, and can validate/test its impact.

I took this from the Solidity docs. IIUC you need storage location only for types that don't fit an EVM register. I think all instances of such parameters are annotated for the built-ins.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realized that it is not consistent, since Yul parameters are untyped. Our BuiltInFunction type doesn't differentiate between them..

Hmm... good point. I didn't foresee this, but it's a problem for defining the Yul built-ins.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a valid Solidity type, to be able to display to the users/bind in the future. Maybe we can use an array instead of the varargs, similar to how TypeScript type system does it?

An array sounds good, but of what type?

@@ -6886,15 +6877,20 @@ codegen_language_macros::compile!(Language(
),
BuiltInType(
name = "$bytes",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: $Bytes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that's a leftover. I even switched to %Array but didn't notice this one. Will change it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... on second thought, this is different from %Array and more similar to %address which remains in all lowercase. The rationale behind it is the rules that push the symbol types build them directly from the keywords. But maybe we should change those as well...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing the other elementary types to produce title case variants is not trivial. There is no usable string transformation function in the graph builder. We could implement one, but I'm not sure it's worth it.

Copy link
Collaborator

@OmarTawfik OmarTawfik Nov 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing the other elementary types to produce title case variants is not trivial.

we already define pascal_case and a bunch of other _case helpers in crates/infra/utils/src/codegen/tera.rs.
If we are moving to split .msgb files, and using templates to rendering them, maybe we can use it? It would only apply to the rules we specifically put in built_ins.msgb.
IIUC, that can also be used to bind to an Address type instead of %address.

IIRC, this is common in many other "std" libraries. For example, C#'s string keyword would bind to a System.String type.
One other idea if we are brainstorming: define a full keywords_to_types mapping here, and use it in the automatically-generated built_ins.msgb, rather than special casing things like this here, and also rules like this.

Not blocker for this PR though.

Copy link
Contributor Author

@ggiraldez ggiraldez Nov 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are moving to split .msgb files, and using templates to rendering them, maybe we can use it? It would only apply to the rules we specifically put in built_ins.msgb.

I'm a bit wary of exploding the number of rules. Generating them should easy through templating as you say, but then we need to parse them and execute them. If we need to generate one stanza for each of the elementary types (which we do to change the case of the generated symbol), that's a lot of new rules.

In that case, adding a graph function to transform the string would be much more efficient.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, that can also be used to bind to an Address type instead of %address.

This would not be possible anyway, cause we could be conflicting with a user-defined Address struct/contract/whatever. We still need to prefix built-in types with % to avoid conflicts with user definitions.

@@ -6961,7 +6981,7 @@ codegen_language_macros::compile!(Language(
functions = []
),
BuiltInType(
name = "$typeContractType",
name = "$ContractTypeType",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found a few TypeType. Is this expected?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, these are the types returned by the type() expression, hence the double Type. But I see a conflict here, because we also have UserTypeType which is the type of a user-defined value type. I'm thinking I'll rename that one to UDVTType or similar.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed UserTypeType to UserDefinedValueType.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. I also see three remaining IntTypeType, ContractTypeType, and InterfaceTypeType.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, those remaining are the types returned by type() expressions. I renamed UserTypeType because that wasn't something returned by type().

│ │
│ ╰─── def: 3
4 │ let s := sload(data_slot)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this be resolved to a slot property in the built-ins? rather than the data parameter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have a built-ins struct to represent the .offset and .slot for Yul. I'll add it. But in this case I'm not sure, because data_slot is equivalent to data.slot so there are two bindings there. I think it may be confusing to bind this to the built-in and not data.

Copy link
Contributor Author

@ggiraldez ggiraldez Nov 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a built-in struct to define the possible members of Yul external values. Binding to it is hard-coded in the rules, whenever we encounter a YulPath with a member access.

One thing I noticed is that the definition of YulPath does not have version restrictions, but it's only available since 0.7.0 (from what I can test with Remix). Is this worth addressing in the grammar?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because data_slot is equivalent to data.slot so there are two bindings there.

Yes, and IIUC, both should be bound to the slot built-in property (of "int" type), right? not data (of type bytes).
WDYT?

the definition of YulPath does not have version restrictions, but it's only available since 0.7.0

let x.y.z := 0 is legal in 0.6.12. Am I missing something?

Copy link
Contributor Author

@ggiraldez ggiraldez Nov 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I wasn't clear. Yes it's legal, but x.y.z as a whole is the identifier. Shouldn't that be parsed as a YulIdentifier? I can work around it in the binding rules by creating the reference/definition on the YulPath instead for < 0.7.0 though.

Nevermind, I was misreading the spec. x.y.z is parsed as a YulIdentifier, specifically between 0.5.8 and <0.7.0. I added a test case to verify that it's bound correctly.

Copy link
Contributor Author

@ggiraldez ggiraldez Nov 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and IIUC, both should be bound to the slot built-in property (of "int" type), right? not data (of type bytes).
WDYT?

Ok, makes sense. I'll change it.

}
contract Test {
function test(Base base) public {
base.value().x;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

solc produces an error for this code. Not sure I'm following?

TypeError: Member "x" not found or not visible after argument-dependent lookup in int256.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, ok. I clearly didn't test this in Remix and the semantics are totally unexpected to me. I thought the getter function would return the same struct but that's not the case. The documentation is also fuzzy. This complicates things and I'll need to re-think a bit how we're binding public getters.

Copy link
Contributor Author

@ggiraldez ggiraldez Nov 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I introduced a fix for the simplest case: a struct that has a single field. In that case, the getter would return the value of such field. If the struct has more fields, Solidity will generate a getter that returns a tuple. That case is not relevant for name binding, since you cannot chain calls to tuple results (since tuples are not a real type).

There is an additional caveat: if the only field in the struct is not an elementary type, then it cannot be returned in a getter. We don't handle that case and blindly bind to the type of the first field.

For more complex cases such as the one presented in Solidity documentation, I don't think we can do anything and will need to be handled in later passes.

For reference, the complex example referred is:

// SPDX-License-Identifier: GPL-3.0
pragma solidity >=0.4.0 <0.9.0;

contract Complex {
    struct Data {
        uint a;
        bytes3 b;
        mapping(uint => uint) map;
        uint[3] c;
        uint[] d;
        bytes e;
    }
    mapping(uint => mapping(bool => Data[])) public data;
}

Which generates this getter:

function data(uint arg1, bool arg2, uint arg3)
    public
    returns (uint a, bytes3 b, bytes memory e)
{
    a = data[arg1][arg2][arg3].a;
    b = data[arg1][arg2][arg3].b;
    e = data[arg1][arg2][arg3].e;
}

Copy link
Collaborator

@OmarTawfik OmarTawfik Nov 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For more complex cases such as the one presented in Solidity documentation, I don't think we can do anything and will need to be handled in later passes.

But does that mean that we still bind the getter identifier to the correct definition?
If so, I think we can accept that for now, and add it as a GitHub issue, or maybe in the "limitations" list so that we don't forget about it later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we still bind the getter. I'll definitely add this to the limitations document.

A public getter will never return a complex data type, and a special getter
function is generated for arrays, mappings and structs. Arrays and mappings we
were already handling, but this commit makes that more explicit. For structs, we
bind the type of the first field. There is a caveat here: if the first field is
not a type that can be flattened and returned in a public getter we're still
binding it, instead of the first field that can be returned. But I don't see a
way to be 100% correct here.
@ggiraldez
Copy link
Contributor Author

Re-running the tests against Sanctuary I noticed some things broke with the last few changes. Expect a couple more commits added to this PR.

@ggiraldez ggiraldez marked this pull request as draft November 21, 2024 21:22
We are usually binding `this` to the enclosing contract type, but it can also be
used in library code. For this special ocasion, bind it to a built-in. This will
need to be resolved later to the actual contract instance by the backend.
@ggiraldez ggiraldez marked this pull request as ready for review November 22, 2024 18:57
ggiraldez added a commit to manastech/slang that referenced this pull request Nov 25, 2024
This happens when there is special member access in some Yul identifier (like
`x.slot` or `x.offset`). I think this issue was introduced when unreserving the
`address` keyword since that changed the structure of `YulPath`. There is a
proper test case in NomicFoundation#1149 already, but without this fix running Sanctuary with
existing rules crashes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants