This crate contains utilities used to generate code for parsing and compiling various font tables. For an in-depth overview of what code we generate and how it works, see the codegen-tour document.
The basics:
- Inputs live in resources/codegen_inputs.
- To run the code generator:

      # Rebuild all the things (normal use case)
      $ cargo run --bin=codegen resources/codegen_plan.toml

      # Process a single file
      $ cargo run --bin=codegen file $mode $input
      $ cargo run --bin=codegen file parse resources/codegen_inputs/cmap.rs > read-fonts/generated/generated_cmap.rs

  where $input is the path to an input file, and $mode is one of 'parse' or 'compile', which generates the code for the read-fonts or write-fonts crate, respectively. This prints the output to stdout; you can redirect it elsewhere as desired.
- Inputs are more commonly run through a 'codegen plan', which describes the inputs and their destinations. The default plan lives in resources/codegen_plan.toml.
- Outputs are written into $crate/generated/generated_$name.rs (where $crate is one of read-fonts or write-fonts).
- These output files (which are not in the module tree) are included with the include! macro into a corresponding module, generally in $crate/src/tables/$name.rs.
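The output naming convention above can be sketched as a small helper. This is only an illustration: `generated_path` and its parameters are hypothetical names, not part of the codegen tool.

```rust
use std::path::PathBuf;

/// Build the conventional location of a generated file,
/// `$crate/generated/generated_$name.rs`.
/// Hypothetical helper: the real tool takes its destinations from the plan.
fn generated_path(krate: &str, name: &str) -> PathBuf {
    PathBuf::from(krate)
        .join("generated")
        .join(format!("generated_{name}.rs"))
}

fn main() {
    let path = generated_path("read-fonts", "cmap");
    println!("{}", path.display());

    // A hand-written module such as read-fonts/src/tables/cmap.rs would then
    // pull the generated file in with a line along these lines (the relative
    // path is resolved from the file containing the macro call):
    //
    //     include!("../../generated/generated_cmap.rs");
}
```

Keeping the generated files outside the module tree means the hand-written module stays the only place where manual code for a table lives.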
To add a new table:
- Create a new codegen input file in resources/codegen_inputs. The name of this file is not important, but in general it should be the name of the corresponding table in the spec. Each top-level table (table with a tag) gets its own file. To assist with creating this file, you may use the preprocessor; see below.
- Add a task in resources/codegen_plan.toml to generate an output in read-fonts/generated.
- Add a module corresponding to the new table to the read-fonts crate. In general this means adding a new file in read-fonts/src/tables, and adding an entry in read-fonts/src/tables.rs. The module should include! the generated file.
- Run the codegen tool with $ cargo run --bin=codegen resources/codegen_plan.toml, then run cargo check to see if there are any errors.
- If there are any errors, add attributes to your table as appropriate. Look at other tables for examples.
- Update read-fonts/src/table_provider.rs to provide a getter for your table.
- Update otexplorer to add support for your table. Run the otexplorer tool, and ensure it is producing reasonable output.
- Repeat this process for the write-fonts crate.
It is possible that in adding a table you will need to modify the codegen code itself, for instance to add a new attribute.
This can be a fiddly process. In general, the workflow is something like this:
- Update codegen_inputs/test.rs to include an input matching the input you are trying to support.
- Make a modification to the codegen code.
- Run $ cargo run --bin=codegen resources/test_plan.toml && cargo test to see if the generated code compiles, and inspect the output to confirm it works as intended.
- Repeat the edit/test cycle until you are satisfied.
To speed up writing of the codegen inputs, there is a preprocessor, which takes a simple text input and does basic reformatting into the expected input format.
The text in the preprocessor inputs (which live in resources/raw_tables) is copied directly from the Microsoft OpenType® docs; it is then augmented with links to the original documentation, and a few basic annotations to indicate the type of the object (record/table/flags/enums).
Inputs to the preprocessor look like this:
    /// an optional comment for each top-level item
    @table Gpos1_0
    uint16 majorVersion Major version of the GPOS table, = 1
    uint16 minorVersion Minor version of the GPOS table, = 0
    Offset16 scriptListOffset Offset to ScriptList table, from beginning of GPOS table
    Offset16 featureListOffset Offset to FeatureList table, from beginning of GPOS table
    Offset16 lookupListOffset Offset to LookupList table, from beginning of GPOS table

    /// Part of [Name1]
    @record LangTagRecord
    uint16 length Language-tag string length (in bytes)
    Offset16 langTagOffset Language-tag string offset from start of storage area (in bytes).

    /// [Axis value table flags](https://docs.microsoft.com/en-us/typography/opentype/spec/stat#flags).
    @flags(u16) AxisValueTableFlags
    0x0001 OLDER_SIBLING_FONT_ATTRIBUTE If set, this axis value table provides axis value information
    0x0002 ELIDABLE_AXIS_VALUE_NAME If set, do something else

    @enum(u16) GlyphClassDef
    1 Base Base glyph (single character, spacing glyph)
    2 Ligature Ligature glyph (multiple character, spacing glyph)
    3 Mark Mark glyph (non-spacing combining glyph)
    4 Component Component glyph (part of single character, spacing glyph)
- All objects are separated by a newline, and begin with @OBJECT_TYPE.
- record & table are currently interchangeable, but this may change, and you should follow the spec.
- enum & flags require an explicit format.
- The preprocessor does not handle lifetimes, which will need to be added manually.
- It also does not add annotations, which are necessary in any non-trivial case.
- You will generally need to do some cleanup.

Run the preprocessor like this:

    $ cargo run --bin preprocessor resources/raw_tables/my_table.txt > resources/codegen_inputs/my_table.rs
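To make the input format more concrete, here is a minimal sketch of how lines in that format could be split into their parts. It is not the preprocessor's actual implementation; `Line`, `split_word` and `parse_line` are hypothetical names, and the sketch assumes columns are separated by whitespace.

```rust
/// One parsed line of a preprocessor input (illustrative type only).
#[derive(Debug, PartialEq)]
enum Line<'a> {
    /// An object header such as `@table Gpos1_0` or `@flags(u16) Foo`.
    Header { kind: &'a str, format: Option<&'a str>, name: &'a str },
    /// A body line: the leading column is a type (tables/records) or a
    /// value (flags/enums), followed by a name and the documentation text.
    Item { lead: &'a str, name: &'a str, doc: &'a str },
}

/// Split off the first whitespace-delimited word, tolerating repeated spaces.
fn split_word(s: &str) -> Option<(&str, &str)> {
    let s = s.trim_start();
    if s.is_empty() {
        return None;
    }
    match s.split_once(char::is_whitespace) {
        Some((word, rest)) => Some((word, rest.trim_start())),
        None => Some((s, "")),
    }
}

/// Parse a single line; blank lines and `///` comments yield `None`.
fn parse_line(line: &str) -> Option<Line<'_>> {
    let line = line.trim();
    if line.is_empty() || line.starts_with("///") {
        return None;
    }
    if let Some(rest) = line.strip_prefix('@') {
        let (decl, rest) = split_word(rest)?;
        let (name, _) = split_word(rest)?;
        // `flags(u16)` carries an explicit format; `table` does not.
        let (kind, format) = match decl.split_once('(') {
            Some((kind, fmt)) => (kind, Some(fmt.strip_suffix(')')?)),
            None => (decl, None),
        };
        return Some(Line::Header { kind, format, name });
    }
    let (lead, rest) = split_word(line)?;
    let (name, doc) = split_word(rest)?;
    Some(Line::Item { lead, name, doc: doc.trim_end() })
}

fn main() {
    for line in ["@enum(u16) GlyphClassDef", "1 Base Base glyph (single character, spacing glyph)"] {
        println!("{:?}", parse_line(line));
    }
}
```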
The codegen tool reads in a file in Rust-like syntax, and generates the final Rust source.
To run the tool on a single input:
    $ cargo run --bin=codegen file parse resources/codegen_inputs/my_table.rs
This will write the generated source to stdout; you can redirect it as desired.
Codegen inputs can be annotated with various table and field attributes that inform how the code is generated. These use the same syntax as proc-macro attributes.
The following annotations are supported on top-level objects:
- #[skip_font_write]: if present, we will not generate a FontWrite implementation for this type. This is useful if a type needs some manual processing before it can be compiled.
- #[skip_from_obj]: if present, we will not generate a FromObjRef implementation for this type.
- #[read_args(name: type,+)]: if present, this type will be given an implementation of FontReadWithArgs, expecting the provided arguments. The provided names will be available to other attributes on this type, as if they were fields on the type itself.
- #[generic_offset(T)]: indicates that this type contains an offset with a generic target, for which we will add a PhantomData field. This is used for common tables that contain offsets which point to different concrete types depending on the containing table, such as the Layout subtable shared between GPOS and GSUB.
- #[write_fonts_only]: indicates that this table should only be generated for write-fonts (i.e. should be ignored in read-fonts).
- #[validate(method)]: provides a method to perform additional pre-compilation validation for this type. The method must be manually implemented on the type, with the signature fn(&self, &mut ValidationCtx).
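As an illustration of the #[validate(method)] signature, here is a sketch of what a manually implemented validation method might look like. The `ValidationCtx` below is a stand-in defined only for this example (its `report` method is an assumption, not the real type's API), and `MyTable` and `check_counts` are hypothetical names.

```rust
/// Stand-in for the real validation context; defined here only so the
/// example is self-contained. Its API is an assumption.
#[derive(Default)]
struct ValidationCtx {
    errors: Vec<String>,
}

impl ValidationCtx {
    fn report(&mut self, msg: impl Into<String>) {
        self.errors.push(msg.into());
    }
}

/// Hypothetical compile-side table whose input would carry
/// `#[validate(check_counts)]`.
struct MyTable {
    class_count: u16,
    records: Vec<u16>,
}

impl MyTable {
    /// Matches the required shape `fn(&self, &mut ValidationCtx)`.
    fn check_counts(&self, ctx: &mut ValidationCtx) {
        if self.records.len() != usize::from(self.class_count) {
            ctx.report("number of records does not match class_count");
        }
    }
}

fn main() {
    let table = MyTable { class_count: 3, records: vec![1, 2] };
    let mut ctx = ValidationCtx::default();
    table.check_counts(&mut ctx);
    println!("{} validation error(s)", ctx.errors.len());
}
```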
The following annotations are supported on fields:
- #[nullable]: only allowed on offsets or arrays of offsets, and indicates that this field is allowed to be null. This changes the behaviour of getters, as well as validation and compilation code.
- #[since_version(version)]: indicates that a field only exists in a given version of the table. The version may be either a single integer literal (#[since_version(1)]), or a major.minor pair (#[since_version(1.1)]).
- #[if_flag($field, Flags::SOME_FLAG)]: indicates that a given field is only present if a particular flag is set on the named field. The field is expected to be a bitset with a contains method.
- #[if_cond($field, Flags::SOME_FLAG_A, Flags::SOME_FLAG_B, ...)]: indicates that a given field is only present if at least one of the listed flags is set on the named field. The field is expected to be a bitset with a contains method.
- #[skip_getter]: if present, we will not generate a getter for this field. Used on things like padding fields.
- #[offset_getter(method name)]: only allowed on offsets or arrays of offsets. If present, we will not generate a method that resolves this offset, but will instead expect that one will be implemented manually, and will have the provided name.
- #[offset_data(method name)]: only on offset fields. If present, the provided 'method name' must be implemented, and must return FontData that will be used to resolve this offset. Used in places where offsets are not resolved from the base of the containing table. Uncommon.
- #[offset_adjustment(expr)]: related to the above, but for encoding: the provided expression must evaluate to a u32, which will be subtracted from the computed offset during compilation.
- #[version]: may only be supplied for one field. If present, this field is treated as the 'version', used when determining the availability of versioned fields.
- #[format = x]: indicates that this field is the format field of a multi-format table, and that it has the provided format value.
- #[count(arg)] and #[count(fn_name(arg, +))]: this annotation has two forms. The simple form accepts a single argument, which can be either the token .. (meaning all remaining data, and only valid on the last field in a table), the name of a field (preceded by the $ token) or a literal integer. The less-simple form begins with a function identifier, and then one or more arguments, comma separated. Currently accepted function identifiers are 'add', 'subtract', 'add_multiply', 'multiply_add', 'half', 'map_delta_size', and 'delta_value_count'.
- #[compile(arg)]: if present, this field will not be included in the compile type. The value may be either the literal 'skip', or an expression that evaluates to the field's type: the skip case is only expected in cases where there is a manual FontWrite impl, and the field does not make sense on the compile type.
- #[compile_with(method_name)]: specify custom compilation behaviour. This attribute lets you name a method that will be called to get some type that will be used to compile this field. This may be any type that implements the FontWrite trait; this can be used in cases where the logic to compile a given type requires some custom implementation.
- #[compile_type(type)]: specify an alternate type to be used in the struct generated for this type.
- #[default(expr)]: specify a value that will be used in the implementation of Default for the containing type. Unlike with #[compile], this value is set when the type is created, and can be manually modified by the user.
- #[read_with(args,+)]: specify that this field's type needs to be read with FontReadWithArgs, and passed the provided args. Args is a comma separated list of fields or input args to the type.
- #[read_offset_with(args,+)]: on offsets or arrays of offsets, indicates that the type referenced by this offset needs to be passed the provided args when it is read.
- #[validate(arg)]: arg is either the literal 'skip' or the name of a method. If the name of a method, that method will be called during validation, and can implement custom validation logic.
- #[traverse_with(method name)]: uncommon/hacky: provides a method name that will be called in traversal code to get the FieldType for this field. To skip traversing this field, you can use the 'skip' keyword (#[traverse_with(skip)]).
- #[to_owned(expr)]: uncommon/hacky: provide an expression that will be used in FromObjRef to convert the parse type to the compile type.
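The semantics of a few of these attributes can be sketched in plain Rust. This is not the generated code: every name below is hypothetical, and two behaviours are assumptions made for illustration, namely that #[since_version] means "this version or later" and that a null offset is encoded as 0.

```rust
/// Hypothetical flags type; `#[if_flag]` only requires a `contains` method.
#[derive(Clone, Copy)]
struct Flags(u16);

impl Flags {
    const HAS_EXTRA: Flags = Flags(0x0001);

    fn contains(self, other: Flags) -> bool {
        self.0 & other.0 == other.0
    }
}

/// `#[since_version(1.1)]`: assumed here to mean the field is available when
/// the table's (major, minor) version is at least the given one. Tuples
/// compare lexicographically, so (2, 0) counts as later than (1, 1).
fn available_since(version: (u16, u16), since: (u16, u16)) -> bool {
    version >= since
}

/// `#[nullable]`: a getter returns `None` for a null offset (assumed to be 0)
/// instead of trying to resolve it.
fn resolve_nullable(raw_offset: u16) -> Option<usize> {
    (raw_offset != 0).then(|| usize::from(raw_offset))
}

/// `#[offset_adjustment(expr)]`: the adjustment is subtracted from the
/// computed offset; `checked_sub` guards against underflow in this sketch.
fn adjusted_offset(computed: u32, adjustment: u32) -> Option<u32> {
    computed.checked_sub(adjustment)
}

fn main() {
    let flags = Flags(0x0003);
    println!("extra field present: {}", flags.contains(Flags::HAS_EXTRA));
    println!("v1.0 has 1.1 field: {}", available_since((1, 0), (1, 1)));
    println!("null offset: {:?}", resolve_nullable(0));
    println!("adjusted: {:?}", adjusted_offset(20, 4));
}
```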
There is also the concept of a 'codegen plan', which is a simple toml file describing a number of different operations to be run in parallel. This is intended to be the general mechanism by which codegen is run.
See ../resources/codegen_plan.toml for an example.
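The idea of a plan as a list of independent operations run in parallel can be sketched as follows. This is a rough model under stated assumptions: the `Task` fields, `Mode`, `run_task` and `run_plan` are hypothetical and do not describe the actual toml schema or the tool's internals, and `run_task` only formats a description instead of generating code.

```rust
use std::thread;

/// Which crate the output is generated for (illustrative).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Mode {
    Parse,
    Compile,
}

/// One entry of a hypothetical plan: an input and its destination.
struct Task {
    mode: Mode,
    source: &'static str,
    target: &'static str,
}

/// Placeholder for running the generator on one input.
fn run_task(task: &Task) -> String {
    format!("{:?}: {} -> {}", task.mode, task.source, task.target)
}

/// Run every task on its own scoped thread and collect results in plan order.
fn run_plan(tasks: &[Task]) -> Vec<String> {
    thread::scope(|scope| {
        // Spawn everything first so the tasks actually run concurrently.
        let handles: Vec<_> = tasks
            .iter()
            .map(|task| scope.spawn(move || run_task(task)))
            .collect();
        handles
            .into_iter()
            .map(|handle| handle.join().expect("task panicked"))
            .collect()
    })
}

fn main() {
    let plan = [
        Task {
            mode: Mode::Parse,
            source: "resources/codegen_inputs/cmap.rs",
            target: "read-fonts/generated/generated_cmap.rs",
        },
        Task {
            mode: Mode::Compile,
            source: "resources/codegen_inputs/cmap.rs",
            target: "write-fonts/generated/generated_cmap.rs",
        },
    ];
    for line in run_plan(&plan) {
        println!("{line}");
    }
}
```

Because each task reads one input and writes one output, the tasks share no state, which is what makes running them in parallel straightforward.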