-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Solidity binding fixes driven by Sanctuary #1149
base: main
Are you sure you want to change the base?
Conversation
…e-or-more operator
- `string` and `bytes` are exported as built-in variables resolving to types that provide a `concat` function - `address` can be used as a function to cast a parameter to `address` to eg. retrieve the balance
This makes resolving to attached functions on types even when the reference of those types happen in a different lexical scope.
And also provide alternative paths with and without propagating the dynamic scope. Otherwise, since scope accumulate on the stack it's possible we'll need to resolve an attached function with the wrong dynamic scope at the top of the scope stack.
Both contracts and libraries can optionally push the dynamics scope when traversing to the parent lexical scope (ie. the source unit). But for libraries, they can also optionally push their name to correctly bind internal types which were extended (with `using`) by their qualified name (ie. `Lib.Type`).
Applying a function call with a type will always return a value of that type, so a symbol stack `type,()` is equivalent to `type,@typeof`. Reflect on the binding entry point of the using clause.
These are parsed as modifiers, and they need a similar treatment as parent constructor calls in new constructor definitions.
@@ -797,54 +1081,82 @@ inherit .lexical_scope | |||
| [FixedKeyword] | |||
| [UfixedKeyword] | |||
)] { | |||
let @elementary.symbol = (format "%{}" (source-text @keyword)) | |||
var symbol = (source-text @keyword) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Q: since these involve string matching and formatting, should we have specific queries instead?
- dedicated queries for
uint
,int
,ufixed
,fixed
that each check for the single value. - another general query for the others that don't need string matching.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we do this without enumerating all the possible values in the general query? That will result in string matching anyway, just in a different place. I don't see how it's possible otherwise, given that we don't have a negation operator for queries.
Aside: there are a couple of cases where having a negation query operator would have been useful. I remember @AntonyBlakey asking me about that, and at the time I didn't have any specific use cases. But I've now seen a handful, this one being one of them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A negation operator would be easy, as you probably know. Although the semantics ... would it mean 'does not appear as a child' or is it negative lookahead. I think t-s is the former.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the semantics especially in combination with other operators can be hard to design.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That will result in string matching anyway, just in a different place.
But we won't be matching all values for all these kinds, right? I wonder if it makes a perf impact..
Additionally, for readability/maintainability reasons, I'm not sure what we gain by combining them.
@elementary [ElementaryType @keyword ([BoolKeyword])] {
let @elementary.symbol = "%bool"
}
@elementary [ElementaryType @keyword ([ByteKeyword])] {
let @elementary.symbol = "%byte"
}
@elementary [ElementaryType @keyword ([FixedKeyword])] {
let symbol = (source-text @keyword)
if (symbol == "fixed") {
let @elementary.symbol = "%fixed128x18"
} else {
let @elementary.symbol = (format "%{}" symbol)
}
}
// ...etc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I now see what you mean. That looks reasonable. I'll change that.
enabled = From("0.5.0") | ||
), | ||
BuiltInFunction( | ||
name = "encode", | ||
parameters = ["$args"], | ||
parameters = ["$Types"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in order to generate a usable built_ins.sol
, I wonder if we should add parameter names to all of these?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't done that out of laziness, but I can add the names.
This case though, is one of those that cannot be written in valid Solidity, because the %Types
argument here is a varargs ...
. There are other similar cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aha. I think there are a few inconsistencies:
- We need a valid Solidity type, to be able to display to the users/bind in the future. Maybe we can use an array instead of the varargs, similar to how TypeScript type system does it?
- I realized that it is not consistent, since Yul parameters are untyped. Our
BuiltInFunction
type doesn't differentiate between them.. - Very few signatures have a "storage location" like
memory
. If we don't have this data, maybe we can remove the location from all of them? until we actually have a use for it, and can validate/test its impact.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very few signatures have a "storage location" like memory. If we don't have this data, maybe we can remove the location from all of them? until we actually have a use for it, and can validate/test its impact.
I took this from the Solidity docs. IIUC you need storage location only for types that don't fit an EVM register. I think all instances of such parameters are annotated for the built-ins.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realized that it is not consistent, since Yul parameters are untyped. Our BuiltInFunction type doesn't differentiate between them..
Hmm... good point. I didn't foresee this, but it's a problem for defining the Yul built-ins.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need a valid Solidity type, to be able to display to the users/bind in the future. Maybe we can use an array instead of the varargs, similar to how TypeScript type system does it?
An array sounds good, but of what type?
@@ -6886,15 +6877,20 @@ codegen_language_macros::compile!(Language( | |||
), | |||
BuiltInType( | |||
name = "$bytes", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: $Bytes
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, that's a leftover. I even switched to %Array
but didn't notice this one. Will change it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... on second thought, this is different from %Array
and more similar to %address
which remains in all lowercase. The rationale behind it is the rules that push the symbol types build them directly from the keywords. But maybe we should change those as well...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing the other elementary types to produce title case variants is not trivial. There is no usable string transformation function in the graph builder. We could implement one, but I'm not sure it's worth it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing the other elementary types to produce title case variants is not trivial.
we already define pascal_case
and a bunch of other _case
helpers in crates/infra/utils/src/codegen/tera.rs
.
If we are moving to split .msgb
files, and using templates to rendering them, maybe we can use it? It would only apply to the rules we specifically put in built_ins.msgb
.
IIUC, that can also be used to bind to an Address
type instead of %address
.
IIRC, this is common in many other "std" libraries. For example, C#'s string
keyword would bind to a System.String
type.
One other idea if we are brainstorming: define a full keywords_to_types
mapping here, and use it in the automatically-generated built_ins.msgb
, rather than special casing things like this here, and also rules like this.
Not blocker for this PR though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are moving to split .msgb files, and using templates to rendering them, maybe we can use it? It would only apply to the rules we specifically put in built_ins.msgb.
I'm a bit wary of exploding the number of rules. Generating them should easy through templating as you say, but then we need to parse them and execute them. If we need to generate one stanza for each of the elementary types (which we do to change the case of the generated symbol), that's a lot of new rules.
In that case, adding a graph function to transform the string would be much more efficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, that can also be used to bind to an Address type instead of %address.
This would not be possible anyway, cause we could be conflicting with a user-defined Address
struct/contract/whatever. We still need to prefix built-in types with %
to avoid conflicts with user definitions.
@@ -6961,7 +6981,7 @@ codegen_language_macros::compile!(Language( | |||
functions = [] | |||
), | |||
BuiltInType( | |||
name = "$typeContractType", | |||
name = "$ContractTypeType", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found a few TypeType
. Is this expected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, these are the types returned by the type()
expression, hence the double Type
. But I see a conflict here, because we also have UserTypeType
which is the type of a user-defined value type. I'm thinking I'll rename that one to UDVTType
or similar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I renamed UserTypeType
to UserDefinedValueType
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. I also see three remaining IntTypeType
, ContractTypeType
, and InterfaceTypeType
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, those remaining are the types returned by type()
expressions. I renamed UserTypeType
because that wasn't something returned by type()
.
│ │ | ||
│ ╰─── def: 3 | ||
│ | ||
4 │ let s := sload(data_slot) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't this be resolved to a slot
property in the built-ins? rather than the data
parameter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't have a built-ins struct to represent the .offset
and .slot
for Yul. I'll add it. But in this case I'm not sure, because data_slot
is equivalent to data.slot
so there are two bindings there. I think it may be confusing to bind this to the built-in and not data
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a built-in struct to define the possible members of Yul external values. Binding to it is hard-coded in the rules, whenever we encounter a YulPath
with a member access.
One thing I noticed is that the definition of YulPath
does not have version restrictions, but it's only available since 0.7.0 (from what I can test with Remix). Is this worth addressing in the grammar?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because
data_slot
is equivalent todata.slot
so there are two bindings there.
Yes, and IIUC, both should be bound to the slot
built-in property (of "int" type), right? not data
(of type bytes
).
WDYT?
the definition of
YulPath
does not have version restrictions, but it's only available since0.7.0
let x.y.z := 0
is legal in 0.6.12
. Am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I wasn't clear. Yes it's legal, but x.y.z
as a whole is the identifier. Shouldn't that be parsed as a YulIdentifier
? I can work around it in the binding rules by creating the reference/definition on the YulPath
instead for < 0.7.0 though.
Nevermind, I was misreading the spec. x.y.z
is parsed as a YulIdentifier
, specifically between 0.5.8 and <0.7.0. I added a test case to verify that it's bound correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, and IIUC, both should be bound to the slot built-in property (of "int" type), right? not data (of type bytes).
WDYT?
Ok, makes sense. I'll change it.
} | ||
contract Test { | ||
function test(Base base) public { | ||
base.value().x; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
solc produces an error for this code. Not sure I'm following?
TypeError: Member "x" not found or not visible after argument-dependent lookup in int256.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, ok. I clearly didn't test this in Remix and the semantics are totally unexpected to me. I thought the getter function would return the same struct
but that's not the case. The documentation is also fuzzy. This complicates things and I'll need to re-think a bit how we're binding public getters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I introduced a fix for the simplest case: a struct that has a single field. In that case, the getter would return the value of such field. If the struct has more fields, Solidity will generate a getter that returns a tuple. That case is not relevant for name binding, since you cannot chain calls to tuple results (since tuples are not a real type).
There is an additional caveat: if the only field in the struct is not an elementary type, then it cannot be returned in a getter. We don't handle that case and blindly bind to the type of the first field.
For more complex cases such as the one presented in Solidity documentation, I don't think we can do anything and will need to be handled in later passes.
For reference, the complex example referred is:
// SPDX-License-Identifier: GPL-3.0
pragma solidity >=0.4.0 <0.9.0;
contract Complex {
struct Data {
uint a;
bytes3 b;
mapping(uint => uint) map;
uint[3] c;
uint[] d;
bytes e;
}
mapping(uint => mapping(bool => Data[])) public data;
}
Which generates this getter:
function data(uint arg1, bool arg2, uint arg3)
public
returns (uint a, bytes3 b, bytes memory e)
{
a = data[arg1][arg2][arg3].a;
b = data[arg1][arg2][arg3].b;
e = data[arg1][arg2][arg3].e;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For more complex cases such as the one presented in Solidity documentation, I don't think we can do anything and will need to be handled in later passes.
But does that mean that we still bind the getter identifier to the correct definition?
If so, I think we can accept that for now, and add it as a GitHub issue, or maybe in the "limitations" list so that we don't forget about it later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we still bind the getter. I'll definitely add this to the limitations document.
A public getter will never return a complex data type, and a special getter function is generated for arrays, mappings and structs. Arrays and mappings we were already handling, but this commit makes that more explicit. For structs, we bind the type of the first field. There is a caveat here: if the first field is not a type that can be flattened and returned in a public getter we're still binding it, instead of the first field that can be returned. But I don't see a way to be 100% correct here.
Re-running the tests against Sanctuary I noticed some things broke with the last few changes. Expect a couple more commits added to this PR. |
We are usually binding `this` to the enclosing contract type, but it can also be used in library code. For this special ocasion, bind it to a built-in. This will need to be resolved later to the actual contract instance by the backend.
This happens when there is special member access in some Yul identifier (like `x.slot` or `x.offset`). I think this issue was introduced when unreserving the `address` keyword since that changed the structure of `YulPath`. There is a proper test case in NomicFoundation#1149 already, but without this fix running Sanctuary with existing rules crashes.
This avoids some oddities, such as `this` used in libraries which we previously had to bind to an artificial built-in.
40e6da0
to
4c19da0
Compare
This PR contains fixes to bindings for issues that were found by trying to resolve all references in Sanctuary, as well as test cases to verify them. It also improves documentation of the rules and improvements (renames and rearrangements) for readability. The built-in types are also standardized with a PascalCase naming convention.
Most notably, it introduces extension scopes to support resolving attached function set via
using
directives. These extension scopes need to be accessible at any point during the resolution of a symbol stack if the originating reference is located in a lexical scope which is affected by ausing
directive. This forced a refactor of the lexical scope node structure, introducing a new.extended_scope
available in nodes where references may need to resolve to attached functions (ie. function bodies, constant initialization expressions, etc.) Also, for Solidity < 0.7.0using
directives are inherited, so we need to propagate the new extension scopes for all sub-contracts relative to where the directive is located.The extension scope mechanism is implemented using the jump to scope and scope stack features of the stack graphs. The extended scope would optionally push the extension scope for the current contract and then continue resolution through the normal lexical scope. This effectively doubles the search space in the graph when performing a resolution, and this happens every time we inject a new extension scope to the scope stack. This has the potential to exponentially increase the running times and memory requirements. This is a known caveat which will be addressed in a future PR.