Skip to content

Project: Improve AST infrastructure

Robin Sommer edited this page Oct 19, 2021 · 3 revisions

Goals

The HILTI/Spicy AST infrastructure has become pretty complex, inconsistent at times, and generally hard to maintain & extend. There are a few areas here that are worth looking into for improving the state of affairs.

The main objectives for this work are:

  • Reduce code complexity and hard-to-trace control flow & runtime behavior.
  • Improve performance of AST processing.
  • Improve build times for the toolchain itself

Update: Some of this is done, see notes below.

Type resolution

[This has been done in #984.]

During AST processing, we’re computing a number of types dynamically right now through type::Computed and expression::TypeWrapped (todo: more ways?), often by providing lambdas that will be evaled at later points to then return a type. This structure makes it very hard to understand when exactly a type will be available, and what dependencies its inference has.

We should instead generally store all types as children of an AST node. If a type is unknown at first, let’s set the type's child node to a new type type::Unresolved at first. We then add a new AST pass TypeResolver that on each run resolves as many of these as possible by replacing the unresolved types with actual types where it can. Eventually all types should become resolved that way.

One other specific issue that our current type resolution/checking has: we’re doing quite a few isA<T> or tryAs<T> right now. Sometimes these may miss if the AST node being examined has not been fully resolved yet.

Spicy to HILTI translation.

[This has been done in #984.]

Currently, the Spicy-to-HILTI translation is just another AST pass during the main iteration loop that will execute multiple times as we keep going over the AST, each time translating what it now can. This structure makes it hard to comprehend when exactly which part of the Spicy code makes it over into HILTI, and when exactly we can rely on seeing the final, pure HILTI AST. This approach is necessary at the moment because the type resolution happens incrementally: the translation can fully complete only once all Spicy-side types are known, but due to various type dependencies, it takes the AST a while to get there.

Once we improve type resolution per above, it should become possible to break the AST processing into three distinct phases:

  1. Fully resolve the Spicy AST, without performing any translation to HILTI yet.
  2. Translate the Spicy AST into a pure HILTI AST.
  3. Fully resolve the HILTI AST.

Risk: This is a bit speculative, we’ll need to see if we run into further problems with this change, like needing the current approach for resolving other things than types (e.g., operators). Hard to predict without trying.

AST matching

[This is still pending.]

Currently it can be challenging (or at least cumbersome) to identify any structural properties of an AST subtree, in particular for expressions (e.g., “is this a constant string?”). Part of the problem here are in particular type coercions as they can “hide” the actual subtree that they enclose. We should revisit (1) how we represent coercions, and (2) more generally see if we can develop helpers for easier AST matching.

Visitor infrastructure

[This is still pending.]

The current visitor infrastructure depends on autogenerated code containing lots of if const-based type dispatching. While that all works, it’s pretty intransparent what’s happening there behind the scenes, and it leads to slow compile times.

We should investigate other forms of implementing our visitors, including traditional virtual methods. Note, however, that our visitor have a couple of characteristics that we’ll either need to retain or solve differently:

  • When dispatching for a node, more one than one visitor method may execute: we are matching a node against all its type-erased “parent classes" as well (e.g., type::String triggers operator() for type::String, Type, and Node).

  • We overload separately for const and non-const instances. This is a bit of an artifact from earlier versions, and may not be that useful anymore; should probably switch to just non-const.

Once we have a new structure for visitors, we should be able to introduce them incrementally: replace visitors one by one with new versions, maybe focussing on Spicy-side visitors first to limit the effort until we are sure we're doing the right thing.

Performance

[This is still pending.]

We should look out for more ways to improve the performance of AST processing. Ideas:

  • Checking two types for equality is expensive currently because for composite types we need to recursively go through any of their child types. If we uniqued types globally so that we could guarantee that there’s only ever exactly one instance of each specific type, we could use pointer comparison instead (LLVM implements type comparison this way).