| title | category | categoryindex | index |
|---|---|---|---|
| Overview | Compiler Internals | 200 | 100 |

There are several artifacts involved in the development of F#:
- FSharp.Compiler.Service (docs, source). Contains all logic for F# compilation - including parsing, syntax tree processing, typechecking, constraint solving, optimizations, IL importing, IL writing, pretty printing of F# constructs, and F# metadata format processing - and the F# compiler APIs for tooling.
- The F# compiler executable, called `fsc`, which is called as a console app. It sets the .NET GC into batch mode and then invokes `FSharp.Compiler.Service` with command-line arguments.
- The FSharp.Core Library, called `FSharp.Core`. Contains all primitive F# types and logic for how they interact, core data structures and library functions for operating on them, structured printing logic, units of measure for scientific programming, core numeric functionality, F# quotations, F# type reflection logic, and asynchronous programming types and logic.
- The F# Interactive tool, called `fsi`. A REPL for F# that supports execution and pretty-printing of F# code and results, loading F# script files, referencing assemblies, and referencing packages from NuGet.

`FSharp.Compiler.Service` is by far the largest of these components and contains nearly all logic that `fsc` and `fsi` use. It is the primary subject of this guide.

The following are the key data formats and internal data representations of the F# compiler code in its various configurations:
- Input source files. Read as Unicode text, or binary for referenced assemblies.
- Input command-line arguments. See CompilerOptions.fs for the full code implementing the arguments table. Command-line arguments are also accepted by the F# Compiler Service API in project specifications, and as optional input to F# Interactive.
- Tokens, see pars.fsy, lex.fsl, LexHelpers.fs and related files.
- Abstract Syntax Tree (AST), see SyntaxTree.fs, the untyped syntax tree resulting from parsing.
- Typed Abstract Syntax Tree (Typed Tree), see TypedTree.fs, TypedTreeBasics.fs, TypedTreeOps.fs, and related files. The typed, bound syntax tree including both type/module definitions and their backing expressions, resulting from type checking and the subject of successive phases of optimization and representation change.
- Type checking context/state, see for example `TcState` in ParseAndCheckInputs.fsi and its constituent parts, particularly `TcEnv` in CheckExpressions.fsi and `NameResolutionEnv` in NameResolution.fsi. A set of tables representing the available names, assemblies etc. in scope during type checking, plus associated information.
- Abstract IL, the output of code generation, then used for binary generation, and the input format when reading .NET assemblies, see `ILModuleDef` in il.fsi.
- The .NET Binary format (with added "pickled" F# Metadata resource), the final output of fsc.exe, see the ECMA 335 specification and the ilread.fs and ilwrite.fs binary reader/generator implementations. The added F# metadata is stored in a binary resource, see TypedTreePickle.fs.
- The incrementally emitted .NET reflection assembly, the incremental output of fsi.exe. See ilreflect.fs.

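To make the first two internal representations in the list above more concrete, here is a minimal sketch of a token stream and an untyped syntax tree for a tiny arithmetic language. Every name in it (`Token`, `Expr`, `lex`, `parse`) is hypothetical and chosen for illustration; the real definitions generated from lex.fsl and pars.fsy, and the types in SyntaxTree.fs, are far richer.

```fsharp
// Toy model of two representations: a token stream and an untyped syntax tree.
// All names are hypothetical and unrelated to the real compiler types.

type Token =
    | INT of int
    | PLUS
    | STAR
    | LPAREN
    | RPAREN

type Expr =
    | Const of int
    | Add of Expr * Expr
    | Mul of Expr * Expr

/// Turn source text into a token list (integers, '+', '*', parentheses).
let lex (text: string) : Token list =
    let rec go i acc =
        if i >= text.Length then
            List.rev acc
        else
            match text.[i] with
            | ' ' -> go (i + 1) acc
            | '+' -> go (i + 1) (PLUS :: acc)
            | '*' -> go (i + 1) (STAR :: acc)
            | '(' -> go (i + 1) (LPAREN :: acc)
            | ')' -> go (i + 1) (RPAREN :: acc)
            | c when System.Char.IsDigit c ->
                // Consume the whole run of digits as one INT token.
                let mutable j = i
                while j < text.Length && System.Char.IsDigit text.[j] do
                    j <- j + 1
                go j (INT (int (text.Substring(i, j - i))) :: acc)
            | c -> failwithf "unexpected character '%c' at offset %d" c i
    go 0 []

/// Parse tokens into an Expr; '*' binds tighter than '+'.
let parse (tokens: Token list) : Expr =
    let rec parseSum tokens =
        let lhs, rest = parseProduct tokens
        match rest with
        | PLUS :: rest' ->
            let rhs, rest'' = parseSum rest'
            Add (lhs, rhs), rest''
        | _ -> lhs, rest
    and parseProduct tokens =
        let lhs, rest = parseAtom tokens
        match rest with
        | STAR :: rest' ->
            let rhs, rest'' = parseProduct rest'
            Mul (lhs, rhs), rest''
        | _ -> lhs, rest
    and parseAtom tokens =
        match tokens with
        | INT n :: rest -> Const n, rest
        | LPAREN :: rest ->
            match parseSum rest with
            | inner, RPAREN :: rest' -> inner, rest'
            | _ -> failwith "expected ')'"
        | _ -> failwith "expected an integer or '('"
    match parseSum tokens with
    | expr, [] -> expr
    | _ -> failwith "unexpected trailing tokens"

let tree = parse (lex "1 + 2 * 3")
printfn "%A" tree
```

The point of the sketch is only the shape of the data: flat tokens first, then a tree that carries structure but no type information yet. Types are attached in the later Typed Tree representation.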
The following are the most relevant parts of the F# compiler tooling, making up the "engine" and API surface area of `FSharp.Compiler.Service`.

- The incremental project build engine state in IncrementalBuild.fsi/IncrementalBuild.fs, a part of the F# Compiler Service API.
- The corresponding APIs wrapping and accessing these structures in the public-facing `FSharp.Compiler.Service` API and Symbol API.
- The F# Compiler Service Caches, the various caches maintained by an instance of an `FSharpChecker`.

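As a rough illustration of the public API surface mentioned above, the following script sketch creates an `FSharpChecker` and asks it to parse a snippet held in memory. It assumes a recent `FSharp.Compiler.Service` package resolved from NuGet by F# Interactive; member names and signatures have changed between package versions, so treat this as a sketch rather than a reference. The file name `Sample.fsx` is hypothetical and no such file needs to exist.

```fsharp
// Run with `dotnet fsi`. Assumes a recent FSharp.Compiler.Service package.
#r "nuget: FSharp.Compiler.Service"

open FSharp.Compiler.CodeAnalysis
open FSharp.Compiler.Text

// A single checker instance owns the service caches, so it is normally shared.
let checker = FSharpChecker.Create()

// Hypothetical file name; the source text is supplied from memory.
let fileName = "Sample.fsx"
let source = SourceText.ofString "let x = 1 + 2"

let parsingOptions =
    { FSharpParsingOptions.Default with SourceFiles = [| fileName |] }

// Parsing only: this yields the untyped syntax tree, with no type checking.
let parseResults =
    checker.ParseFile(fileName, source, parsingOptions)
    |> Async.RunSynchronously

printfn "Parse had errors: %b" parseResults.ParseHadErrors
printfn "%A" parseResults.ParseTree
```

Type checking and symbol queries go through other members of the same checker instance and need project options in addition to the source text.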
The following is a diagram of how the different phases of the F# compiler work:

```mermaid
stateDiagram-v2
    state "Compilation phases" as Flow {
        Lexing: Lexing
        Parsing: Parsing
        Import: Import
        Typechecking: Type checking
        Codegen: Code generation
        Emit: IL emit
        Inputs --> Lexing: Source and signature files
        Inputs --> Import: References
        Lexing --> Parsing
        Parsing --> Typechecking
        Import --> Typechecking
        Typechecking --> Codegen
        Codegen --> Emit
        state Lexing {
            BasicLexing: Basic Lexing
            WhitespaceSensitiveLexing: Whitespace Sensitive Lexing
            [*] --> BasicLexing
            BasicLexing --> WhitespaceSensitiveLexing: A token stream from input source text.
            WhitespaceSensitiveLexing --> [*]: A token stream, augmented per the F# Language Specification.
        }
        state Parsing {
            Parser: Parsing
            [*] --> Parser
            Parser --> [*]: AST per the grammar in the F# Language Specification.
        }
        state Import {
            Resolving: Resolving references
            ImportNET: Importing .NET references
            ImportFS: Importing F# references
            [*] --> Resolving
            Resolving --> ImportNET
            Resolving --> ImportFS
            ImportNET --> [*]
            ImportFS --> [*]
        }
        state Typechecking {
            SequentialTypechecking: Sequentially type checking files
            PatternMatchCompilation: Pattern match compilation
            ConstraintSolving: Constraint solving
            PostInferenceChecks: Post inference checks
            [*] --> SequentialTypechecking
            SequentialTypechecking --> PatternMatchCompilation
            PatternMatchCompilation --> ConstraintSolving
            ConstraintSolving --> PostInferenceChecks
            PostInferenceChecks --> [*]
        }
        state Codegen {
            QuotationTranslation: Quotation translation
            Optimization: Optimization
            Codegeneration: Code generation
            AbstractILRewrite: Abstract IL rewriting
            [*] --> QuotationTranslation
            QuotationTranslation --> Optimization
            Optimization --> Codegeneration
            Codegeneration --> AbstractILRewrite
            AbstractILRewrite --> [*]
        }
        state Emit {
            Binary: Binary emit
            Reflection: Reflection emit
            Output: Output (assembly, references, PDBs, etc.)
            [*] --> Binary
            [*] --> Reflection
            Binary --> Output
            Reflection --> Output
        }
    }
```

The following are the key phases and high-level logical operations of the F# compiler code in its various configurations:
- Basic lexing. Produces a token stream from input source file text. F# uses the FsLex tool to process a declarative specification of the tokenizer in lex.fsl. This compiles the tokenizer specification to a number of tables which are then interpreted by the code in prim-lexing.fs (see also prim-lexing.fsi).
- White-space sensitive lexing. Accepts and produces a token stream, augmenting it per the F# Language Specification.
- Parsing. Accepts a token stream and produces an AST per the grammar in the F# Language Specification. F# uses the FsYacc tool to process a declarative specification of the parser in pars.fsy. This compiles the grammar to a number of tables which are then interpreted by the code in prim-parsing.fs (see also prim-parsing.fsi).
- Resolving references. For the .NET SDK, references are generally resolved explicitly by external tooling. A legacy path remains for old .NET Framework references, including in scripting. See ReferenceResolver.fs for the abstract definition of compiler reference resolution. See LegacyMSBuildReferenceResolver.fs for reference resolution used by the .NET Framework F# compiler when running on .NET Framework. See SimulatedMSBuildReferenceResolver.fs for reference resolution when not using the .NET Framework F# compiler. See DependencyManager for reference resolution and package management used in `fsi`.
- Importing referenced .NET binaries, see import.fsi/import.fs. Accepts file references and produces a Typed Tree node for each referenced assembly, including information about its type definitions (and type forwarders if any).
- Importing referenced F# binaries and optimization information as Typed Tree data structures, see TypedTreePickle.fs. Accepts binary data and produces Typed Tree nodes for each referenced assembly, including information about its type/module/function/member definitions.
- Sequentially type checking files, see CheckDeclarations.fsi/CheckDeclarations.fs. Accepts an AST plus a type checking context/state and produces new Typed Tree nodes incorporated into an updated type checking state, plus additional Typed Tree Expression nodes used during code generation. A key part of this is checking syntactic types and expressions, see CheckExpressions.fsi/CheckExpressions.fs including the state held across the checking of a file (see `TcFileState`) and the environment active as we traverse declarations and expressions (see `TcEnv`).
- Pattern match compilation, see PatternMatchCompilation.fsi/PatternMatchCompilation.fs. Accepts a subset of checked Typed Tree nodes representing F# pattern matching and produces Typed Tree expressions implementing the pattern matching. Called during type checking as each construct involving pattern matching is processed.
- Constraint solving, see ConstraintSolver.fsi/ConstraintSolver.fs. A constraint solver state is maintained during type checking of a single file, and constraints are progressively asserted (i.e. added to this state). Fresh inference variables are generated and variables are eliminated (solved). Variables are also generalized at various language constructs, or explicitly declared, making them "rigid". Called during type checking as each construct is processed.
- Post-inference type checks, see PostInferenceChecks.fsi/PostInferenceChecks.fs. Called at the end of type checking/inference for each file. A range of checks that can only be enforced after type checking on a file is complete, such as analysis when using `byref<'T>` or other `IsByRefLike` structs.
- Quotation translation, see QuotationTranslator.fsi/QuotationTranslator.fs/QuotationPickler.fsi/QuotationPickler.fs. Generates the stored information for F# quotation nodes, generated from the Typed Tree expression structures of the F# compiler. Quotations are ultimately stored as binary data plus some added type references. "ReflectedDefinition" quotations are collected and stored in a single blob.
- Optimization phases, primarily the "Optimize" (peephole/inlining) and "Top Level Representation" (lambda lifting) phases, see Optimizer.fsi/Optimizer.fs and InnerLambdasToTopLevelFuncs.fsi/InnerLambdasToTopLevelFuncs.fs and LowerCalls.fs. Each of these takes Typed Tree nodes for types and expressions and either modifies the nodes in place or produces new Typed Tree nodes. These phases are orchestrated in CompilerOptions.fs.
- Code generation, see IlxGen.fsi/IlxGen.fs. Accepts Typed Tree nodes and produces Abstract IL nodes, sometimes applying optimizations.
- Abstract IL code rewriting, see EraseClosures.fs and EraseUnions.fs. Eliminates some constructs by rewriting Abstract IL nodes.
- Binary emit, see ilwrite.fsi/ilwrite.fs.
- Reflection-Emit, see ilreflect.fs.

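The constraint solving phase in the list above is described as asserting constraints into a solver state, generating fresh inference variables, and solving them. A minimal sketch of that idea, using plain first-order unification over a toy type language, might look like the following. All names (`Ty`, `SolverState`, `freshVar`, `resolve`, `unify`) are hypothetical, and the technique shown is textbook unification rather than the algorithm in ConstraintSolver.fs, which also deals with members, subtyping, units of measure, generalization, and more.

```fsharp
// Toy constraint solving by unification. Names are hypothetical; this is not
// the compiler's own solver.

type Ty =
    | TInt
    | TBool
    | TFun of Ty * Ty
    | TVar of int // an inference variable, identified by a number

/// Solver state: solutions found so far plus a counter for fresh variables.
type SolverState =
    { Solutions: Map<int, Ty>
      NextVar: int }

let emptyState = { Solutions = Map.empty; NextVar = 0 }

/// Generate a fresh inference variable.
let freshVar (state: SolverState) : Ty * SolverState =
    TVar state.NextVar, { state with NextVar = state.NextVar + 1 }

/// Replace solved variables by their solutions, recursively.
let rec resolve (state: SolverState) (ty: Ty) : Ty =
    match ty with
    | TVar v ->
        match Map.tryFind v state.Solutions with
        | Some solved -> resolve state solved
        | None -> ty
    | TFun (a, b) -> TFun (resolve state a, resolve state b)
    | TInt | TBool -> ty

/// Does variable `v` occur inside `ty`? Used to reject infinite types.
let rec occurs (state: SolverState) (v: int) (ty: Ty) : bool =
    match resolve state ty with
    | TVar w -> v = w
    | TFun (a, b) -> occurs state v a || occurs state v b
    | TInt | TBool -> false

/// Assert the constraint `a = b`, returning the updated state or an error.
let rec unify (state: SolverState) (a: Ty) (b: Ty) : Result<SolverState, string> =
    match resolve state a, resolve state b with
    | TInt, TInt | TBool, TBool -> Ok state
    | TVar v, TVar w when v = w -> Ok state
    | TVar v, t | t, TVar v ->
        if occurs state v t then
            Error "infinite type"
        else
            Ok { state with Solutions = Map.add v t state.Solutions }
    | TFun (a1, r1), TFun (a2, r2) ->
        unify state a1 a2 |> Result.bind (fun s -> unify s r1 r2)
    | x, y -> Error (sprintf "type mismatch: %A vs %A" x y)

// Usage: solve ('a -> int) = (bool -> 'b).
let alpha, s1 = freshVar emptyState
let beta, s2 = freshVar s1

match unify s2 (TFun (alpha, TInt)) (TFun (TBool, beta)) with
| Ok solved ->
    printfn "alpha = %A, beta = %A" (resolve solved alpha) (resolve solved beta)
| Error message -> printfn "error: %s" message
```

The state is threaded through each call rather than mutated, which keeps the sketch easy to follow. It says nothing about how the real solver manages its state.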
These phases and transformations are used to build the following:

- The F# Compiler Service API, see the Symbol API and Service API.
- The F# Interactive Shell, see fsi.fs.
- The F# Compiler Shell, see fsc.fs and fscmain.fs.

The F# compiler is bootstrapped. That is, an existing F# compiler is used to build a "proto" compiler from the current source code. That "proto" compiler is then used to compile itself, producing a "final" compiler. This ensures the final compiler is compiled with all relevant optimizations and fixes.
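The two bootstrap stages described above can be modelled with a small sketch. It only tracks which compiler built which; the record fields and the `build` function are hypothetical and do not correspond to anything in the repository's build scripts, where each stage is a real compiler process.

```fsharp
// Toy model of the bootstrap stages. All names are hypothetical.

type Compiler =
    { Name: string          // label for this compiler build
      SourceVersion: string // which source tree it was compiled from
      BuiltBy: string }     // name of the compiler that compiled it

/// "Compile" the given source version using an existing (host) compiler.
let build (name: string) (sourceVersion: string) (host: Compiler) : Compiler =
    { Name = name
      SourceVersion = sourceVersion
      BuiltBy = host.Name }

// An already-released compiler, assumed to be installed on the machine.
let existingCompiler =
    { Name = "existing"
      SourceVersion = "previous release"
      BuiltBy = "unknown" }

// Stage 1: the existing compiler builds the current source into "proto".
let protoCompiler = build "proto" "current source" existingCompiler

// Stage 2: "proto" rebuilds the same source, producing "final".
let finalCompiler = build "final" "current source" protoCompiler

printfn "%s was built by %s from %s"
    finalCompiler.Name finalCompiler.BuiltBy finalCompiler.SourceVersion
```

In this model both `protoCompiler` and `finalCompiler` come from the same source tree, and only the final one is itself compiled by a compiler built from that tree.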

`FSharp.Build.dll` and `Microsoft.FSharp.targets` give MSBuild support for F# projects (`.fsproj`) and contain the targets. Although not strictly part of the F# compiler, they are essential for using F# in all contexts for .NET, aside from some more targeted scripting scenarios. The targets expose things like the `CoreCompile` and `Fsc` tasks called by MSBuild.