-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: implement support for functions #351
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This commit extends the syntax and semantics for functions in AirScript: * You can now define functions that take and produce values, using the `fn` keyword, see below for an example. * You can now call value-producing functions in expresssion position, i.e. as operands to a binary operator, as an iterable for use by comprehensions, etc. * Functions are strongly typed, and are type checked to ensure that both the function itself, and all call sites, are correctly typed. Example: ``` fn madd3(a: felt[3], b: felt) -> felt { let d = [c * b for c in a] return sum(d) } ``` In this example: * Two parameters are bound, `a` and `b`, a vector of 3 felts, and a felt, respectively * The return type is declared to be felt * The body of the function can be arbitrarily complex, i.e. you can define variables, comprehensions, etc. * Not illustrated here, but all of the usual global bindings (i.e. trace columns, public inputs, random values, etc.) are in scope and can be referenced. Things you cannot do with functions: * A function _must_ return a value, i.e. it cannot have an empty body * A function _may not_ contain constraints * You may call functions within a function, but recursion is not supported, i.e. an error will be raised if a call is made to a function which is already on the call stack
This commit does two things: 1. Modifies the AST to allow `let` in expression position (scalar or otherwise). 2. Refactors the constant propagation and inlining passes to properly handle occurrances of `let` in expression position, and make use of this new capability to simplify inlining of certain syntax nodes. In particular, the inliner now makes liberal use of this new flexibility in the AST, in order to expand syntax nodes in expression position. Such nodes, with the introduction of functions, can have arbitrarily complex expansions, and with this change those expansions can now be done in-place, rather than attempting to "lift" expressions that may produce block-like expansions into the nearest containing statement block, requiring expensive let-tree rewrites. In fact, it became clear during the testing and implementation of functions that without the ability to expand the syntax tree in this manner, it would be virtually impossible to correctly inline a full AirScript program. For example, consider the following: ``` trace_columns { main: [clk, a, b[4]] } fn fold_vec(a: felt[4]) -> felt { let m = a[0] * a[1] let n = m * a[2] let o = n * a[3] return o } integrity_constraints { let o = a let m = fold_vec(b) enf o = m } ``` After inlining (the old way), we would get (commentary inline): ``` integrity_constraints { # This binding will be shadowed by the binding of the same name from # the inlined body of `fold_vec` let o = a # This `m` is the one inlined from `fold_vec` let m = b[0] * b[1] let n = m * b[2] # This `o` is the one inlined from `fold_vec`, and now shadows the `o` # bound in the original `integrity_constraints` let o = n * b[3] # The inliner moved the original binding of `m` into the innermost # `let` body, so that it can bind the value "returned" from # `fold_vec`, as expected. Because of this, it shadows the `m` that # was inlined from `fold_vec`, and no one is the wiser because the # semantics of the original code are preserved let m = o # This constraint is now incorrect, as the binding, `o`, we intended # to constrain has been shadowed by a different `o`. enf o = m } ``` To summarize, the original inliner needed to split the current block at the statement being expanded/inlined, and move code in two directions: "lifting" inlined statements into the current block before the split, and "lowering" the statements after the split by placing them in the innermost `let` block. This was necessary so that the result of the inlined function (or more generally, any expression with a block-like expansion, e.g. comprehensions) could be bound to the name used in the original source code, with all of the necessary bindings in scope so that the expression that was bound would correctly evaluate during codegen. As we can see, the result of this is that an expanded/inlined syntax node could introduce bindings that would shadow other bindings in scope, and change the behavior of the resulting program (as demonstrated above). This commit allows bindings to be introduced anywhere that an expression is valid. This has the effect of no longer requiring code motion just to support let-bound variables in an inlined/expanded expression. This simplifies the inliner quite a bit.
6 tasks
bobbinth
approved these changes
Jun 26, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Thank you! I did a relatively shallow review of the parser, but I did verify that Winterfell codegen works correctly (MASM codegen should be OK too as we have a test which compares identical constraints written with and without functions).
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This PR introduces a new type of function to the AirScript language: "pure" functions. To summarize:
return
statementreturn
statementreturn
is the keyword for a new statement of the formreturn EXPR
, whereEXPR
is the expression which evaluates to the value to be used as the output of the function. A barereturn
is not valid syntax.Example
The example above demonstrates the key features of the new syntax. A few notes:
felt
,felt[N]
, orfelt[N][M]
whereN
andM
are placeholders for the corresponding dimension of the given aggregate type (vectors and matrices, respectively)integrity_constraints
orboundary_constraints
, except the constraints themselves as mentioned above (i.e. theenf
keyword is not allowed in this context at all)This PR supercedes #344
Internally, the support for functions also comes with some refactoring of the AST, so as to allow us to represent arbitrarily complex expansions of syntax nodes in expression position. Previously, the AST was designed to forbid representing syntax that was not supported by the surface language. However, due to the requirements of the inlining phase, it has been necessary to relax this to some degree, and this PR takes that even further, now allowing the use of the
Let
syntax node in bothExpr
andScalarExpr
as an explicit variant.Despite this additional flexibility though, it is still the case that the surface language does not have syntax for this. Furthermore, we explicitly do not support the use of
Let
in expression position until after semantic analysis; and while supported during constant propagation, we would not expect to seeLet
in this position until after the inlining pass has been applied.The point being: do not look at the changes to the AST as representing changes to the language syntax. What is supported by the language is determined by the language grammar/parser, and the AST itself is considerably less restrictive so as to allow for arbitrary transformations during compilation.
@bobbinth I've added some tests, as well as built on top of those initially implemented by @tohrnii. I'm not certain how best to evaluate the codegen side of things, so I'll defer to you on that. However, the codegen tests written by @tohrnii do pass, so I'm assuming those are good.