Skip to content

Latest commit

 

History

History
185 lines (134 loc) · 17.8 KB

overview.md

File metadata and controls

185 lines (134 loc) · 17.8 KB
title category categoryindex index
Overview
Compiler Internals
200
100

Overview

There are several artifacts involved in the development of F#:

  • FSharp.Compiler.Service (docs, source). Contains all logic for F# compilation - including parsing, syntax tree processing, typechecking, constraint solving, optimizations, IL importing, IL writing, pretty printing of F# constructs, and F# metadata format processing - and the F# compiler APIs for tooling.

  • The F# compiler executable, called fsc, which is called as a console app. It sets the .NET GC into batch mode and then invokes FSharp.Compiler.Service with command-line arguments.

  • The FSharp.Core Library, called FSharp.Core. Contains all primitive F# types and logic for how they interact, core data structures and library functions for operating on them, structured printing logic, units of measure for scientific programming, core numeric functionality, F# quotations, F# type reflection logic, and asynchronous programming types and logic.

  • The F# Interactive tool, called fsi. A REPL for F# that supports execution and pretty-printing of F# code and results, loading F# script files, referencing assemblies, and referencing packages from NuGet.

The FSharp.Compiler.Service is by far the largest of these components and contains nearly all logic that fsc and fsi use. It is the primary subject of this guide.

Key compiler data formats and representations

The following are the key data formats and internal data representations of the F# compiler code in its various configurations:

  • Input source files Read as Unicode text, or binary for referenced assemblies.

  • Input command-line arguments See CompilerOptions.fs for the full code implementing the arguments table. Command-line arguments are also accepted by the F# Compiler Service API in project specifications, and as optional input to F# Interactive.

  • Tokens, see pars.fsy, lex.fsl, LexHelpers.fs and related files.

  • Abstract Syntax Tree (AST), see SyntaxTree.fs, the untyped syntax tree resulting from parsing.

  • Typed Abstract Syntax Tree (Typed Tree), see TypedTree.fs, TypedTreeBasics.fs, TypedTreeOps.fs, and related files. The typed, bound syntax tree including both type/module definitions and their backing expressions, resulting from type checking and the subject of successive phases of optimization and representation change.

  • Type checking context/state, see for example TcState in ParseAndCheckInputs.fsi and its constituent parts, particularly TcEnv in CheckExpressions.fsi and NameResolutionEnv in NameResolution.fsi. A set of tables representing the available names, assemblies etc. in scope during type checking, plus associated information.

  • Abstract IL, the output of code generation, then used for binary generation, and the input format when reading .NET assemblies, see ILModuleDef in il.fsi.

  • The .NET Binary format (with added "pickled" F# Metadata resource), the final output of fsc.exe, see the ECMA 335 specification and the ilread.fs and ilwrite.fs binary reader/generator implementations. The added F# metadata is stored in a binary resource, see TypedTreePickle.fs.

  • The incrementally emitted .NET reflection assembly, the incremental output of fsi.exe. See ilreflect.fs.

Key constructs and APIs for F# tooling

The following are the most relevant parts of the F# compiler tooling, making up the "engine" and API surface area of FSharp.Compiler.Service.

Key compiler phases

The following is a diagram of how the different phases of the F# compiler work:

stateDiagram-v2
    state "Compilation phases" as Flow {
      Lexing: Lexing
      Parsing: Parsing
      Import: Import
      Typechecking: Type checking
      Codegen: Code generation
      Emit: IL emit
      Inputs --> Lexing: Source and signature files
      Inputs --> Import: References
      Lexing --> Parsing
      Parsing --> Typechecking
      Import --> Typechecking
      Typechecking --> Codegen
      Codegen --> Emit
      state Lexing {
          BasicLexing: Basic Lexing
          WhitespaceSensitiveLexing: Whitespace Sensitive Lexing
          [*] --> BasicLexing
          BasicLexing --> WhitespaceSensitiveLexing: A token stream from input source text.
          WhitespaceSensitiveLexing --> [*]: A token stream, augmented per the F# Language Specification.
      }
      state Parsing {
          Parser: Parsing
          [*] --> Parser
          Parser --> [*]: AST per the grammar in the F# Language Specification.
      }
      state Import {
          Resolving: Resolving references
          ImportNET: Importing .NET references
          ImportFS: Importing F# references
          [*] --> Resolving
          Resolving --> ImportNET
          Resolving --> ImportFS
          ImportNET --> [*]
          ImportFS --> [*]
      }
      state Typechecking {
          SequentialTypechecking: Sequentially type checking files
          PatternMatchCompilation: Pattern match compilation
          ConstraintSolving: Constraint solving
          PostInferenceChecks: Post inference checks
          [*] --> SequentialTypechecking
          SequentialTypechecking --> PatternMatchCompilation
          PatternMatchCompilation --> ConstraintSolving
          ConstraintSolving --> PostInferenceChecks
          PostInferenceChecks --> [*]
      }
      state Codegen {
          QuotationTranslation: Quotation translation
          Optimization: Optimization
          Codegeneration: Code generation
          AbstractILRewrite: Abstract IL rewriting
          [*] --> QuotationTranslation
          QuotationTranslation --> Optimization
          Optimization --> Codegeneration
          Codegeneration --> AbstractILRewrite
          AbstractILRewrite --> [*]
      }
      state Emit {
          Binary: Binary emit
          Reflection: Reflection emit
          Output: Output (assembly, references, PDBs, etc.)
          [*] --> Binary
          [*] --> Reflection
          Binary --> Output
          Reflection --> Output
      }
  }
Loading

The following are the key phases and high-level logical operations of the F# compiler code in its various configurations:

  • Basic lexing. Produces a token stream from input source file text. F# uses the FsLex tool to process a declarative specification of the tokenizer in lex.fsl. This compiles the tokenizer specification to a number of tables which are then interpreted by the code in prim-lexing.fs (see also prim-lexing.fsi.

  • White-space sensitive lexing. Accepts and produces a token stream, augmenting per the F# Language Specification.

  • Parsing. Accepts a token stream and produces an AST per the grammar in the F# Language Specification. F# uses the FsYacc tool to process a declarative specification of the parser in pars.fsy. This compiles the grammar to a number of tables which are then interpreted by the code in prim-parsing.fs (see also prim-parsing.fsi.

  • Resolving references. For .NET SDK generally references are resolved explicitly by external tooling. There is a legacy aspect to this if references use old .NET Framework references including for scripting. See ReferenceResolver.fs for the abstract definition of compiler reference resolution. See LegacyMSBuildReferenceResolver.fs for reference resolution used by the .NET Framework F# compiler when running on .NET Framework. See SimulatedMSBuildReferenceResolver.fs when not using the .NET Framework F# compiler. See DependencyManager for reference resolution and package management used in fsi.

  • Importing referenced .NET binaries, see import.fsi/import.fs. Accepts file references and produces a Typed Tree node for each referenced assembly, including information about its type definitions (and type forwarders if any).

  • Importing referenced F# binaries and optimization information as Typed Tree data structures, see TypedTreePickle.fs. Accepts binary data and produces Typed Tree nodes for each referenced assembly, including information about its type/module/function/member definitions.

  • Sequentially type checking files, see CheckDeclarations.fsi/CheckDeclarations.fs. Accepts an AST plus a type checking context/state and produces new Typed Tree nodes incorporated into an updated type checking state, plus additional Typed Tree Expression nodes used during code generation. A key part of this is checking syntactic types and expressions, see CheckExpressions.fsi/CheckExpressions.fs including the state held across the checking of a file (see TcFileState) and the environment active as we traverse declarations and expressions (see TcEnv).

  • Pattern match compilation, see PatternMatchCompilation.fsi/PatternMatchCompilation.fs. Accepts a subset of checked Typed Tree nodes representing F# pattern matching and produces Typed Tree expressions implementing the pattern matching. Called during type checking as each construct involving pattern matching is processed.

  • Constraint solving, see ConstraintSolver.fsi/ConstraintSolver.fs. A constraint solver state is maintained during type checking of a single file, and constraints are progressively asserted (i.e. added to this state). Fresh inference variables are generated and variables are eliminated (solved). Variables are also generalized at various language constructs, or explicitly declared, making them "rigid". Called during type checking as each construct is processed.

  • Post-inference type checks, see PostInferenceChecks.fsi/PostInferenceChecks.fs. Called at the end of type checking/inference for each file. A range of checks that can only be enforced after type checking on a file is complete, such as analysis when using byref<'T> or other IsByRefLike structs.

  • Quotation translation, see QuotationTranslator.fsi/QuotationTranslator.fs/QuotationPickler.fsi/QuotationPickler.fs. Generates the stored information for F# quotation nodes, generated from the Typed Tree expression structures of the F# compiler. Quotations are ultimately stored as binary data plus some added type references. "ReflectedDefinition" quotations are collected and stored in a single blob.

  • Optimization phases, primarily the "Optimize" (peephole/inlining) and "Top Level Representation" (lambda lifting) phases, see Optimizer.fsi/Optimizer.fs and InnerLambdasToTopLevelFuncs.fsi/InnerLambdasToTopLevelFuncs.fs and LowerCalls.fs. Each of these takes Typed Tree nodes for types and expressions and either modifies the nodes in place or produces new Typed Tree nodes. These phases are orchestrated in CompilerOptions.fs

  • Code generation, see IlxGen.fsi/IlxGen.fs. Accepts Typed Tree nodes and produces Abstract IL nodes, sometimes applying optimizations.

  • Abstract IL code rewriting, see EraseClosures.fs and EraseUnions.fs. Eliminates some constructs by rewriting Abstract IL nodes.

  • Binary emit, see ilwrite.fsi/ilwrite.fs.

  • Reflection-Emit, see ilreflect.fs.

These and transformations used to build the following:

Bootstrapping

The F# compiler is bootstrapped. That is, an existing F# compiler is used to build a "proto" compiler from the current source code. That "proto" compiler is then used to compile itself, producing a "final" compiler. This ensures the final compiler is compiled with all relevant optimizations and fixes.

FSharp.Build

FSharp.Build.dll and Microsoft.FSharp.targets give MSBuild support for F# projects (.fsproj) and contain the targets. Although not strictly part of the F# compiler, they are essential for using F# in all contexts for .NET, aside from some more targeted scripting scenarios. The targets expose things like the CoreCompile and Fsc tasks called by MSBuild.