Introduction
For the larger part of my career as a compiler engineer, I've written C++ code, initially working on the GCC compiler, and then mostly LLVM. A few years back I worked on an interpreter (for a DSL) written in OCaml and went on to write a compiler for the same language in OCaml, targeting LLVM. More recently, as part of my day job, I'm now working on a compiler for the Sway language, written in Rust.
With that background, as a personal project, I ventured to start writing an extensible compiler framework in Rust. The design and ideas are mostly based on the MLIR framework. Extensible here means that the compiler does not have a fixed set of operations (opcodes) or type system, but instead can be (almost) arbitrarily extended.
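To make "extensible" concrete, here is a minimal sketch of an open set of operations. Every name in it (`Op`, `OpRegistry`, `ConstantOp`, `ShiftOp`, the `demo` dialect) is hypothetical and chosen for illustration; it is not pliron's actual API. The point is only that the framework stores a table of registered operations instead of a fixed opcode enum.

```rust
use std::collections::HashMap;

/// Hypothetical operation interface; NOT pliron's actual API.
trait Op {
    /// Textual form, starting with "<dialect>.<opcode>".
    fn print(&self) -> String;
    /// Check op-specific invariants.
    fn verify(&self) -> Result<(), String>;
}

/// An op contributed by a "test" dialect.
struct ConstantOp {
    value: i64,
}

impl Op for ConstantOp {
    fn print(&self) -> String {
        format!("test.constant {}", self.value)
    }
    fn verify(&self) -> Result<(), String> {
        Ok(())
    }
}

/// An op contributed by a second, made-up dialect, added without
/// changing any of the code above.
struct ShiftOp {
    amount: i64,
}

impl Op for ShiftOp {
    fn print(&self) -> String {
        format!("demo.shift {}", self.amount)
    }
    fn verify(&self) -> Result<(), String> {
        if (0..64).contains(&self.amount) {
            Ok(())
        } else {
            Err(format!("shift amount {} out of range", self.amount))
        }
    }
}

/// Constructor signature that each dialect registers for its ops.
type OpCtor = fn(i64) -> Box<dyn Op>;

/// The framework only knows this table, not any concrete op.
#[derive(Default)]
struct OpRegistry {
    ctors: HashMap<String, OpCtor>,
}

impl OpRegistry {
    fn register(&mut self, name: &str, ctor: OpCtor) {
        self.ctors.insert(name.to_string(), ctor);
    }
    /// Returns None when no dialect has registered `name`.
    fn create(&self, name: &str, arg: i64) -> Option<Box<dyn Op>> {
        self.ctors.get(name).map(|ctor| ctor(arg))
    }
}

fn make_constant(value: i64) -> Box<dyn Op> {
    Box::new(ConstantOp { value })
}

fn make_shift(amount: i64) -> Box<dyn Op> {
    Box::new(ShiftOp { amount })
}

fn main() {
    let mut registry = OpRegistry::default();
    registry.register("test.constant", make_constant);
    registry.register("demo.shift", make_shift);

    let op = registry.create("test.constant", 0).expect("registered op");
    println!("{}", op.print());
}
```

A real framework also needs extensible types, attributes and interfaces; the sketch covers only operations.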
A "hello world" IR in pliron, when printed, looks like this:
builtin.module @bar {
^block_1v1():
builtin.func @foo: builtin.function<() -> (builtin.int<si64>)> {
^entry_block_2v1():
c0_op_3v1_res0 = test.constant builtin.integer <0x0: builtin.int<si64>>;
test.return c0_op_3v1_res0
}
}
As with MLIR, `module`, `func`, `constant` and `return` are operations, prefixed with their dialect names. This code declares a module `bar` containing a function `foo` that returns a constant `0`.
For compilers, static analyzers and other related tools written in Rust, today, the only way to adopt MLIR is by wrapping around MLIR's C bindings. Such an endeavour however comes at a cost: debugging is hard. Here's an illustrative example:
// Built against LLVM Debug 18.1.2
1│#include <mlir-c/IR.h>
2│
3│int main() {
4│ MlirContext ctx = mlirContextCreate();
5│ MlirStringRef filname = mlirStringRefCreateFromCString("foo.mlir");
6│ MlirLocation loc = mlirLocationFileLineColGet(ctx, filname, 1, 1);
7│
8│ MlirModule module1 = mlirModuleCreateEmpty(loc);
9│ MlirOperation opr1 = mlirModuleGetOperation(module1);
10│
11│ // mlirOperationDestroy(opr1);
12│
13│ mlirOperationDump(opr1);
14│ MlirOperation opr2 = mlirOperationClone(opr1);
15│ mlirOperationDump(opr2);
16│
17│ return 0;
18│}
Running this code prints the following:
module {
}
module {
}
If line 11 is uncommented, then the following is printed and the program crashes.
"builtin.module"() ({
}) : () -> ()
test.out: llvm-project/mlir/lib/IR/Region.cpp:79: void
mlir::Region::cloneInto(mlir::Region *, Region::iterator, mlir::IRMapping
&): Assertion `this != dest && "cannot clone region into itself"' failed.
Aborted (core dumped)
What makes this hard to debug?
- Even after `opr1` was destroyed, dumping it actually works, giving the impression that all is fine at that point.
- The crash message provides no information as to why it happened. In a large program, to be able to debug this, a developer must be familiar with MLIR internals, which isn't common for Rust programmers using this API. Often Rust programmers may not even be fluent in C++.
The type-system exposed by the llvm-c (or mlir-c) API is fundamentally weaker than what can be natively expressed in Rust (or even C++).
As an example, the C++ API of LLVM provides an `IntegerType::getBitWidth` method. Its counterpart in the C-API is `LLVMGetIntTypeWidth(LLVMTypeRef IntegerTy)`. The argument here is a generic `LLVMTypeRef`. Thus the type-system does not prevent us from calling this function with a type other than `IntegerType`. In the best case (with a debug-build), this hits an assert at runtime, but otherwise we end up with a non-deterministic value or a crash.
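The difference can be sketched in plain Rust without linking against LLVM. Here `RawType` is a hypothetical stand-in for an untyped handle such as `LLVMTypeRef`, and `raw_int_type_width` and `IntType` are made-up names; none of this is the llvm-c API. The untyped function accepts any type and can only fail at runtime, while the typed wrapper makes the failure explicit at the single point of conversion.

```rust
use std::convert::TryFrom;

/// Hypothetical stand-in for an untyped handle such as LLVMTypeRef.
#[derive(Clone, Copy, Debug, PartialEq)]
enum RawType {
    Integer { bits: u32 },
    Float,
}

/// C-API style: any RawType is accepted, misuse is only caught at runtime.
fn raw_int_type_width(ty: RawType) -> u32 {
    match ty {
        RawType::Integer { bits } => bits,
        // A real C API would assert (debug build) or misbehave here.
        other => panic!("not an integer type: {:?}", other),
    }
}

/// Typed wrapper: a value of this type is always an integer type.
#[derive(Clone, Copy, Debug, PartialEq)]
struct IntType {
    bits: u32,
}

impl TryFrom<RawType> for IntType {
    type Error = String;

    /// The only place where a non-integer type can be rejected.
    fn try_from(ty: RawType) -> Result<Self, Self::Error> {
        match ty {
            RawType::Integer { bits } => Ok(IntType { bits }),
            other => Err(format!("expected an integer type, got {:?}", other)),
        }
    }
}

impl IntType {
    /// Cannot be called on a non-integer type: the compiler rules it out.
    fn bit_width(self) -> u32 {
        self.bits
    }
}

fn main() {
    let i64_ty = RawType::Integer { bits: 64 };
    println!("untyped width = {}", raw_int_type_width(i64_ty));

    let typed = IntType::try_from(i64_ty).expect("integer type");
    println!("typed width = {}", typed.bit_width());

    if let Err(msg) = IntType::try_from(RawType::Float) {
        println!("rejected: {}", msg);
    }
}
```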
To overcome the type-system limitation, projects such as inkwell define Rust types over the llvm-c types to provide a safer API. This however has limitations because we cannot always validate the inputs to an llvm-c function. For example, GEP indices cannot be validated before we construct a GEP, leading to possible crashes.
This problem is further amplified by the fact that the llvm-c API does not expose many functionalities that are available in the C++ API. For example, when constructing an `ArrayType`, one could call `ArrayType::isValidElementType` with the C++ API to pre-validate the element type. But this is not available in the C-API. Similarly, the LLVM community was reluctant to expose `GetElementPtrInst::getIndexedType`, a public C++ method, in the LLVM-C API. Without this, we'll need to re-implement the method if we want to validate the indices before building a GEP.
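To give a feel for what such a re-implementation involves, here is a simplified sketch over a toy type model. `Ty` and `indexed_type` are hypothetical names, the indices are taken to be the ones that step into the aggregate, and many details of the real LLVM method (the leading pointer index, vectors, non-constant indices) are deliberately left out.

```rust
use std::convert::TryFrom;

/// Toy type model; a hypothetical stand-in for LLVM's type hierarchy.
#[derive(Clone, Debug, PartialEq)]
enum Ty {
    Int(u32),
    Array(Box<Ty>, u64),
    Struct(Vec<Ty>),
}

/// Walk `indices` into `base` and return the type that is reached,
/// or None if some index cannot be applied to the type at that step.
fn indexed_type(base: &Ty, indices: &[u64]) -> Option<Ty> {
    let mut cur = base;
    for &idx in indices {
        cur = match cur {
            // Any index into an array yields the element type
            // (this sketch does not bounds-check array indices).
            Ty::Array(elem, _) => &**elem,
            // A struct index must name an existing field.
            Ty::Struct(fields) => fields.get(usize::try_from(idx).ok()?)?,
            // Scalars cannot be indexed into.
            Ty::Int(_) => return None,
        };
    }
    Some(cur.clone())
}

fn main() {
    // struct { i32, [4 x i8] }
    let s = Ty::Struct(vec![
        Ty::Int(32),
        Ty::Array(Box::new(Ty::Int(8)), 4),
    ]);

    // Field 1, then an array element.
    println!("{:?}", indexed_type(&s, &[1, 2]));
    // There is no field 2, so a wrapper could refuse to build the GEP.
    println!("{:?}", indexed_type(&s, &[2]));
}
```

A safe wrapper could run a check like this before calling into the C API and return an error instead of risking a crash. The cost is that the logic has to be kept in sync with LLVM by hand.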
More importantly, the C-API is limited in the higher-level compiler functionality that it provides. For example:
- The MLIR-C API does not provide means to create new dialects, operations, types or interfaces, but rather use what is already defined in the MLIR codebase.
- The LLVM-C API does not provide direct access to the many analyses / transformations that are available in LLVM. One cannot get the dominator tree of a function or do a SCEV analysis from the C-API, for example.
In other words, the C-API is designed to interact with the compiler, not to extend it.
Finally, and obviously, the static memory safety guarantees of Rust are lost when interacting with a C++ library: they hold only for the Rust code outside the wrappers. A framework written natively in (safe) Rust keeps those guarantees throughout.
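As a small illustration of what is lost, here is the earlier destroy-then-use scenario rewritten against a hypothetical, natively written `Operation` type (a stand-in, not an MLIR or pliron type). Because `destroy` takes its argument by value, any later use of `opr1` is a compile-time error and never reaches runtime.

```rust
/// Hypothetical owned IR handle; a stand-in, not an MLIR or pliron type.
struct Operation {
    name: String,
}

impl Operation {
    /// Render the operation as text.
    fn dump(&self) -> String {
        format!("{} {{\n}}", self.name)
    }

    /// Deep-copy the operation into a new, independently owned value.
    fn clone_op(&self) -> Operation {
        Operation {
            name: self.name.clone(),
        }
    }
}

/// Takes the operation by value, so the caller gives up ownership.
fn destroy(op: Operation) {
    drop(op);
}

fn main() {
    let opr1 = Operation {
        name: "module".to_string(),
    };
    let opr2 = opr1.clone_op();
    println!("{}", opr1.dump());

    destroy(opr1);

    // Uncommenting either line below is rejected at compile time
    // (error[E0382]: borrow of moved value: `opr1`).
    // println!("{}", opr1.dump());
    // let opr3 = opr1.clone_op();

    println!("{}", opr2.dump());
}
```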
In its current state, pliron is a compiler infrastructure and not yet a useful compiler. In other words, the tools and data-structures to represent an IR (or multiple dialects of them) are mostly there, but there are no useful algorithms (analyses / optimizations) implemented yet. We have a proof-of-concept LLVM-IR dialect that is capable of representing a simple Fibonacci program.
In its current state, pliron only demonstrates that it is possible (and practical) to write an MLIR-like extensible compiler in Rust.
There is plenty of work left to enable production use of pliron.
- Provide a proof-of-concept dialect for the cranelift IR.
- Complete the LLVM dialect.
- Support for symbol tables.
- Generation of print and parse functions for operations, types and attributes based on a meta-language in derive macros. See discussion and a possible syntax.
- Integrate suitable APInt and APFloat libraries for numeric constants.

and a whole lot more ...
- A comparison of pliron with other compiler frameworks