Sequent Core is a GHC plugin library based on a sequent calculus. It includes:
- A set of datatypes for a language expressing function code as interactions between values and their contexts (continuations)
- A library for writing GHC optimizer plugins using the Sequent Core language in place of the built-in Core language
- Example plugins written using Sequent Core
At the moment, this library is highly experimental. We would appreciate any feedback at [email protected].
Like most functional languages and λ-calculi, GHC's Core language emphasizes values. An insight of the study of sequent calculi is that just as important are contexts—we can see computation as an interaction between a value and its context. Since the properties of a term, such as strictness, often depend on its context, giving them equal footing in an AST gives an analysis more information where it's needed, potentially allowing for significant savings in the size and complexity of optimization passes and other transforms.
This library reformulates the Core language as a sequent calculus, which we call Sequent Core. We provide faithful translations between Core and Sequent Core so that passes operating on Sequent Core can be first-class citizens alongside traditional Core-to-Core passes. To this end, we also provide a simple wrapper for writing a GHC plugin using only the Sequent Core representation of module code.
After type-checking, GHC distills Haskell code down to a basic language, simply called Core. A typical bit of (simplified) Core code might be this:
case f x y of
True -> g x
False -> y
If this expression is evaluated, what is the first thing that will happen? This
is not a simple question, especially in Haskell. As this case
is strict, we
know that the scrutinee f x y
must be evaluated. (In fact, in Core (but not
in Haskell), every case
is strict.) As function calls are strict in the
function being called, this forces f x
and, in turn, f
. So the first action
is to evaluate f
.
In analyzing Core code, generally one proceeds by recursion, top to bottom. But
the particular context of the current expression is often crucial—above, we
needed to know that f x y
is being forced to conclude that f
will be
evaluated. Thus the recursive calls need to maintain information about the outer
structure (much as a zipper does). But each recursive function then has to
invent its own representation of a context! Thus simple tasks that recognize
patterns in the code often require substantial ad-hoc bookkeeping.
We would prefer to write the above code like this:
compute p. <f | $ x
; $ y
; case of
True -> <g | $ x; ret p >
False -> <y | ret p>>
We will explicate this syntax shortly, but the basic idea is apparent: We
perform a computation in some context we call p
. The top
of the term says what to do first, namely evaluate f
. Next, we apply that
value to x
, and then apply that value to y
. Then, we perform a case
analysis on that result. In the True
case, we evaluate g
, apply x
to it,
then return; in the False
case, we evaluate y and return it directly. In
either case, the return sends the returned value to the context p
.
The essential term in Sequent Core is the command. It encompasses:
- A term—usually a variable or a lambda
- A continuation—the context of the term, i.e. the operations being performed on it
- A list of bindings in scope, each naming a suspended computation
The general syntax we use to represent a command looks like this, where t
is
a term, k
is a continuation, and bs
is a series of bindings (some of
which may be mutually recursive blocks):
let bs in <t | k>
If there are no bindings, we leave off the let ... in
.
Just as with CPS, each λ-abstraction has an extra argument to bind a continuation---the context in which the function is called. Passing a value to this continuation thus determines the result of the function call.
Applying a function to an argument is accomplished with an App
continuation, which
specifies an argument for the computed function. We use the notation $ t; k
,
where t
is some argument and k
is the outer continuation. Thus, if the
context is bound as p
, then
f x y
is expressed as:
<f | $ x; $ y; ret p>.
This can be read as “Take f
and apply it to x
, then apply that to y
, then
return that to p
.”
The argument specified is some term. However, a term can be a Compute
form,
which wraps a command. Since our semantics is call-by-name, this command will
remain unevaluated until forced.
So a slightly more complicated example would be:
f (g x) (h y)
which becomes:
<f | $ compute q. <g | $ x; ret q>
; $ compute q. <h | $ y; ret q>
; ret p>
To use a value of a data type, one performs a case analysis. In Sequent Core, we
represent this by a Case
continuation. Just as an App
is an application
missing the function, a Case
is a case expression missing the scrutinee. The
right-hand side of each case is a command.
case f x of
Left y -> g y
Right z -> h z
becomes:
<f | $ x
; case of
Left y -> <g | $ y; ret p>
Right z -> <h | $ z; ret p>>
As before, Sequent Core emphasizes the execution order by putting the first
action, the application of f
to x
, on top.
A command describes an interaction between a value and a continuation. It is an
action in progress. Of course, usually a Haskell program has a number of
suspended computations as well, waiting to be activated on demand. Since these
let
-bound computations are not taking part, we don't include them in the
term or computation part of the command; instead, each command carries a list
of bindings with it. Hence:
let x = f y in g x z
becomes:
let
x = compute q. <f | $ y; ret q>
in
<g | $ x; $ z; ret p>
There are a few odds and ends in Core that we must deal with in Sequent Core, since we aim to translate back and forth faithfully:
- Types
For simpler data structures, Core includes types as expressions. This allows, for instance, type abstraction and application to use the sameLam
andApp
constructors as for terms. Thus we includeType
as a constructor for Terms; it acts the same way it does in Core. - Coercions
Similarly, Core includes coercion terms for doing type-safe casts, so they are Terms in Sequent Core as well. - Casts
Coercions are used bycast
expressions: The Core expressione `cast` γ
is operationally the same ase
, but its type is altered according toγ
. We express a cast using a continuation, so ift
is a term,compute p. <t | cast γ; ret p>
is the Sequent Core form oft `cast` γ
. - Ticks
Finally, Core includes ticks, which are essentially markers for bookkeeping in the profiler. These wrap expressions, so we include them as continuations in a similar manner to casts.
Our data types divide the constructors of the Core
datatype into the three
types, Term
, Kont
, and Command
. Thus
the Sequent Core syntax is closely related to Core, making the translation
relatively simple. Here are all the constructors of the original Core
type,
showing where we put each one:
Constructor | |||
---|---|---|---|
Var | Term | ||
Lit | Term | ||
App | Kont | ||
Let | Command | ||
Case | Kont | ||
Cast | Kont | ||
Tick | Kont | ||
Type | Term | ||
Coercion | Term |
(As noted above, an application will be a simple Term if the function is a data constructor and there are enough arguments to saturate it.)
To get a feel for Sequent Core, let us consider a simple function,
this tail-recursive sum
:
sum :: [Int] -> Int
sum = sum' 0
where
sum' :: Int -> [Int] -> Int
sum' a [] = a
sum' a (x:xs) = sum' (a+x) xs
Here is the Core that is generated by the desugarer (much simplified; to get
more gory details, use GHC's -ddump-ds
option):
Main.sum =
letrec {
sum' :: Int -> [Int] -> Int
sum' =
\ (a :: Int) (ds :: [Int]) ->
case ds of _ {
[] -> a;
: x xs ->
sum' (+ @Int $fNumInt a x) xs
}; } in
sum' (I# 0#)
Largely the structure remains intact, though Core rewrites while
as let
,
makes recursive bindings explicit, etc. Also note that (+)
is called with
four arguments—types and constraints are turned into explicit arguments in
Core. Finally, note that the zero is explicitly boxed; Core makes boxing and
unboxing of primitives explicit as well.
Now for the Sequent Core version:
Main.sum =
letrec
{
sum' :: Int -> [Int] -> Int
sum' =
\ (a :: Int) (ds :: [Int]) | (p :: Cont# Int) ->
<ds
| case as _ of
[] -> <a | ret p>
x : xs ->
<sum'
| $ compute (q :: Cont# Int).
<+
| $ @Int
; $ $fNumInt
; $ a
; $ x
; ret q>
; $ xs
; ret p>>
}
in compute (p :: Cont# (Int -> Int)).
<sum' | $ (compute q. <I# | $ 0#; ret q>)
; ret p>>
The Language.SequentCore.Plugin
module provides a new way of writing GHC
optimization plugins: Rather than write a function taking Core and producing
Core, you can write your plugin as a function taking Sequent Core and producing
Sequent Core, then use the sequentPass
function to wrap it as a regular
Core-to-Core pass.
This package includes a simple plugin, Language.SequentCore.Dump
, which does
nothing but pretty-print the Sequent Core code to the console. (It is
essentially ddump-ds
but it outputs Sequent Core rather than Core.) There are
three definitions in the module:
plugin :: Plugin
install :: [CommandLineOption] -> [CoreToDo] -> CoreM [CoreToDo]
showSequentCore :: [SeqCoreBind] -> CoreM [SeqCoreBind]
First, there is the plugin
record:
plugin :: Plugin
plugin = defaultPlugin {
installCoreToDos = install
}
Any GHC plugin must export a value called plugin
of type Plugin
(exported
from the GhcPlugins
library; see the imports in Dump.hs
). The
defaultPlugin
record has suitable defaults, so we need only hook in our code.
The installCoreToDos
field is a hook that allows the plugin to modify the
Core-to-Core pipeline, so that's what we set.
Next is the install
function:
install :: [CommandLineOption] -> [CoreToDo] -> CoreM [CoreToDo]
install _ todos =
do reinitializeGlobals
return $ newPass : todos
where
newPass = CoreDoPluginPass "sequent-core-dump" passFunc
passFunc = sequentPass showSequentCore
The install
function must take a list of command-line arguments (we ignore
these here) and a list of Core-to-Core passes, and return a modified list of
passes. First, it must call GHC's reinitializeGlobals
(as a workaround for a
Windows-specific bug). Then it returns the pipeline with a new pass stuck onto
the front. The pass is a CoreDoPluginPass
, which we give the (arbitrary) name
"sequent-core-dump"
, defined by the function passFunc
. Here is where the
Sequent Core library steps in: The sequentPass
function, exported by
Language.SequentCore.Plugin
, wraps showSequentCore
so that it operates on
Core instead of Sequent Core.
Finally, we have the implementation of the pass itself:
showSequentCore :: [SeqCoreBind] -> CoreM [SeqCoreBind]
showSequentCore bs = do
putMsg (ppr_binds_top bs)
return bs
As advertised, the pass simply dumps the module's bindings to the console, then
returns them unchanged. Now, if you have installed this package, you can see the
Sequent Core for some module Module.hs
by compiling it with the -fplugin
flag:
$ ghc -c Module.hs -fplugin=Language.SequentCore.Dump
Hopefully this (and the Haddock documentation) should get you started writing optimization passes using Sequent Core. As this is all in a very early state, we would appreciate any feedback or ideas at [email protected]. Thanks!
Luke Maurer
Paul Downen
Iavor S. Diatchki