Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: implement support for functions #351

Merged
merged 6 commits into from
Aug 9, 2024
Merged

Conversation

bitwalker
Copy link
Contributor

This PR introduces a new type of function to the AirScript language: "pure" functions. To summarize:

  • Pure functions take zero or more parameters, and return a result of scalar or aggregate type.
  • Pure functions must declare the type of all parameters and the result. These types are checked to ensure that uses of the parameters are consistent, and that all callers are passing values of the correct type.
  • Pure functions MUST return a value, using the return statement
  • Correspondingly, they MAY NOT have an empty body, and at minimum must contain a return statement
  • return is the keyword for a new statement of the form return EXPR, where EXPR is the expression which evaluates to the value to be used as the output of the function. A bare return is not valid syntax.
  • Pure functions MAY NOT contain constraints. This is reserved for evaluator functions, which have slightly different syntax and semantics.
  • Like evaluator functions, pure functions may be placed in library modules and imported for use in any program linked to that library.

Example

fn madd3(a: felt[3], b: felt) -> felt {
    let d = [c * b for c in a]
    return sum(d)
}

The example above demonstrates the key features of the new syntax. A few notes:

  • Valid types are felt, felt[N], or felt[N][M] where N and M are placeholders for the corresponding dimension of the given aggregate type (vectors and matrices, respectively)
  • Function bodies can contain all the things you'd expect to find in integrity_constraints or boundary_constraints, except the constraints themselves as mentioned above (i.e. the enf keyword is not allowed in this context at all)

This PR supercedes #344


Internally, the support for functions also comes with some refactoring of the AST, so as to allow us to represent arbitrarily complex expansions of syntax nodes in expression position. Previously, the AST was designed to forbid representing syntax that was not supported by the surface language. However, due to the requirements of the inlining phase, it has been necessary to relax this to some degree, and this PR takes that even further, now allowing the use of the Let syntax node in both Expr and ScalarExpr as an explicit variant.

Despite this additional flexibility though, it is still the case that the surface language does not have syntax for this. Furthermore, we explicitly do not support the use of Let in expression position until after semantic analysis; and while supported during constant propagation, we would not expect to see Let in this position until after the inlining pass has been applied.

The point being: do not look at the changes to the AST as representing changes to the language syntax. What is supported by the language is determined by the language grammar/parser, and the AST itself is considerably less restrictive so as to allow for arbitrary transformations during compilation.

@bobbinth I've added some tests, as well as built on top of those initially implemented by @tohrnii. I'm not certain how best to evaluate the codegen side of things, so I'll defer to you on that. However, the codegen tests written by @tohrnii do pass, so I'm assuming those are good.

This commit extends the syntax and semantics for functions in AirScript:

* You can now define functions that take and produce values, using the
  `fn` keyword, see below for an example.
* You can now call value-producing functions in expresssion position,
  i.e. as operands to a binary operator, as an iterable for use by
  comprehensions, etc.
* Functions are strongly typed, and are type checked to ensure that both
  the function itself, and all call sites, are correctly typed.

Example:

```
fn madd3(a: felt[3], b: felt) -> felt {
    let d = [c * b for c in a]
    return sum(d)
}
```

In this example:

* Two parameters are bound, `a` and `b`, a vector of 3 felts, and a
  felt, respectively
* The return type is declared to be felt
* The body of the function can be arbitrarily complex, i.e. you can
  define variables, comprehensions, etc.
* Not illustrated here, but all of the usual global bindings (i.e. trace
  columns, public inputs, random values, etc.) are in scope and can be
  referenced.

Things you cannot do with functions:

* A function _must_ return a value, i.e. it cannot have an empty body
* A function _may not_ contain constraints
* You may call functions within a function, but recursion is not
  supported, i.e. an error will be raised if a call is made to a
  function which is already on the call stack
This commit does two things:

1. Modifies the AST to allow `let` in expression position (scalar or
   otherwise).
2. Refactors the constant propagation and inlining passes to properly
   handle occurrances of `let` in expression position, and make use of
   this new capability to simplify inlining of certain syntax nodes.

In particular, the inliner now makes liberal use of this new flexibility
in the AST, in order to expand syntax nodes in expression position. Such
nodes, with the introduction of functions, can have arbitrarily complex
expansions, and with this change those expansions can now be done in-place,
rather than attempting to "lift" expressions that may produce block-like
expansions into the nearest containing statement block, requiring expensive
let-tree rewrites.

In fact, it became clear during the testing and implementation of
functions that without the ability to expand the syntax tree in this
manner, it would be virtually impossible to correctly inline a full
AirScript program. For example, consider the following:

```
trace_columns {
    main: [clk, a, b[4]]
}

fn fold_vec(a: felt[4]) -> felt {
    let m = a[0] * a[1]
    let n = m * a[2]
    let o = n * a[3]
    return o
}

integrity_constraints {
    let o = a
    let m = fold_vec(b)
    enf o = m
}
```

After inlining (the old way), we would get (commentary inline):

```
integrity_constraints {
    # This binding will be shadowed by the binding of the same name from
    # the inlined body of `fold_vec`
    let o = a

    # This `m` is the one inlined from `fold_vec`
    let m = b[0] * b[1]
    let n = m * b[2]
    # This `o` is the one inlined from `fold_vec`, and now shadows the `o`
    # bound in the original `integrity_constraints`
    let o = n * b[3]

    # The inliner moved the original binding of `m` into the innermost
    # `let` body, so that it can bind the value "returned" from
    # `fold_vec`, as expected. Because of this, it shadows the `m` that
    # was inlined from `fold_vec`, and no one is the wiser because the
    # semantics of the original code are preserved
    let m = o
    # This constraint is now incorrect, as the binding, `o`, we intended
    # to constrain has been shadowed by a different `o`.
    enf o = m
}
```

To summarize, the original inliner needed to split the current block at
the statement being expanded/inlined, and move code in two directions:
"lifting" inlined statements into the current block before the split,
and "lowering" the statements after the split by placing them in the
innermost `let` block. This was necessary so that the result of the
inlined function (or more generally, any expression with a block-like
expansion, e.g. comprehensions) could be bound to the name used in the
original source code, with all of the necessary bindings in scope so
that the expression that was bound would correctly evaluate during
codegen.

As we can see, the result of this is that an expanded/inlined syntax node
could introduce bindings that would shadow other bindings in scope, and
change the behavior of the resulting program (as demonstrated above).

This commit allows bindings to be introduced anywhere that an expression
is valid. This has the effect of no longer requiring code motion just to
support let-bound variables in an inlined/expanded expression. This
simplifies the inliner quite a bit.
@bitwalker bitwalker added enhancement New feature or request parser labels Jun 21, 2024
@bitwalker bitwalker requested a review from bobbinth June 21, 2024 00:50
@bitwalker bitwalker self-assigned this Jun 21, 2024
Copy link
Contributor

@bobbinth bobbinth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Thank you! I did a relatively shallow review of the parser, but I did verify that Winterfell codegen works correctly (MASM codegen should be OK too as we have a test which compares identical constraints written with and without functions).

parser/src/parser/grammar.lalrpop Outdated Show resolved Hide resolved
@bitwalker bitwalker merged commit 0f5feff into next Aug 9, 2024
4 checks passed
@bitwalker bitwalker deleted the bitwalker/functions branch August 9, 2024 06:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request parser
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants