
Custom cross check language specification

ahomescu edited this page Oct 24, 2017 · 16 revisions

In many cases, we can add identical cross-checks to the original C code and the transpiled Rust code, e.g., when the C code translates directly into equivalent Rust code. However, this is not always the case, and we need to handle mismatches such as:

  • Type mismatches between C and Rust, e.g., a C const char* (with or without an attached length parameter) being translated to a str. Additionally, if a string+length value pair (with the types const char* and size_t) gets translated to a single str, we may want to omit the cross-check on the length parameter.
  • Whole functions added or removed by the transpiler or refactoring tool, e.g., helpers.

This list is not exhaustive; other kinds of mismatches may come up.

To handle all these cases, we need a language that lets us add new cross-checks, or modify or delete existing ones.

The cross-check language

In its proposed form, the cross-check metadata would be stored as a JSON (or TOML, or some other text format) encoding of an array of configuration entries. Each entry describes the configuration of one cross-check in every language.

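An example configuration file for a function foo that takes 3 arguments in C and 2 in Rust would look something like the following (the // comments are explanatory only; standard JSON does not allow comments):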

[
  { "c": {                 // Cross-check configuration for the C variant
      "key": "main.c:foo", // Which cross-check this entry is for (see below)
      "args": [            // How to cross-check each argument
         "default",        // Use the default cross-check setting for the first argument
         "skip",           // Skip cross-checking the second argument
         "default"
      ],
      "return": "skip"     // Skip cross-checking the return value
    },
    "rust": {              // Cross-check configuration for the Rust variant
      "key": "main.rs:foo",
      "args": [
         "default",        // The Rust version of foo only has 2 arguments
         "default"
      ],
      "return": "skip"
    }
  }
]
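
To make the shape of an entry concrete, here is a minimal Rust sketch of an in-memory model for the example above. All names (CheckSetting, FunctionConfig, Entry, example_entry) are hypothetical and not part of any existing cross-checker; only the "default" and "skip" settings from the example are modeled, and the entry is built by hand rather than parsed from JSON.

```rust
/// How a single value (argument or return value) is cross-checked.
/// Only the two settings used in the example are modeled here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CheckSetting {
    Default,
    Skip,
}

impl CheckSetting {
    /// Parse the string form used in the configuration file.
    pub fn parse(s: &str) -> Option<CheckSetting> {
        match s {
            "default" => Some(CheckSetting::Default),
            "skip" => Some(CheckSetting::Skip),
            _ => None,
        }
    }
}

/// Configuration of one function for one language (hypothetical layout).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FunctionConfig {
    pub key: String,
    pub args: Vec<CheckSetting>,
    pub ret: CheckSetting,
}

impl FunctionConfig {
    /// Indices of the arguments that are still cross-checked.
    pub fn checked_args(&self) -> Vec<usize> {
        self.args
            .iter()
            .enumerate()
            .filter(|(_, s)| **s == CheckSetting::Default)
            .map(|(i, _)| i)
            .collect()
    }
}

/// One configuration entry: the same logical check, described per language.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Entry {
    pub c: FunctionConfig,
    pub rust: FunctionConfig,
}

/// The example entry for foo, written out by hand.
pub fn example_entry() -> Entry {
    use CheckSetting::{Default, Skip};
    Entry {
        c: FunctionConfig {
            key: "main.c:foo".to_string(),
            args: vec![Default, Skip, Default],
            ret: Skip,
        },
        rust: FunctionConfig {
            key: "main.rs:foo".to_string(),
            args: vec![Default, Default],
            ret: Skip,
        },
    }
}
```

With this model, the C variant checks arguments 0 and 2 and the Rust variant checks arguments 0 and 1, so both sides emit two argument checks even though their signatures differ.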

Inline vs external configuration

The simplest way to link each configuration entry to its corresponding cross-check is to write the entries inline in the C/Rust source code. However, this approach could clutter the Rust code and make it difficult to maintain. It would also make it difficult to distribute the Rust code without the associated cross-checks, or to distribute the two as separate packages and merge them on the client side.

In the current implementation of the Rust cross-checker, configuration settings are passed to the enclosing scope's #[cross_check] attribute, e.g.:

#[cross_check(yes, name=foo)]
fn bar() { }

#[cross_check(yes, id=0x1234)]
fn baz() { }

An alternative solution is to store all cross-check configuration entries in separate text files that contain only the cross-check information. This solves the aforementioned problems, but comes with its own challenges. Whenever the C or Rust code changes, whether manually or through automated tools, the corresponding cross-check metadata has to be kept in sync. Additionally, we would need a mechanism to map configuration entries to their corresponding cross-checks, which is discussed in the next section.

Configuration keys

If we store cross-check configuration separately from the cross-checks themselves, we need a mapping between cross-checks and configuration entries. If we look at this mapping as a key-value store, then the cross-checks are the keys and the entries are values. We have several ways of encoding the keys:

  • If we only place cross-checks at clearly identifiable source code locations, e.g., functions, we can just reference them by name and scope. For example, the cross-check configuration for function foo in file main.c could use the key main.c:foo. This should work well for C code, but might run into issues with C++ or Rust constructs like anonymous namespaces and lambdas.

  • Alternatively, we can use the exact location as the key, i.e., the file name + line number, e.g., main.rs:12. This cannot handle more than one cross-check per line, and also requires keeping the line numbers updated.

  • If we can establish that every function of interest gets at least a local symbol in the object file, we could use the file + symbol name as the key, with or without C++/Rust name mangling.

  • We could assign each cross-check a UUID, and use that as the key. To make this work, we would also need to tag each cross-check in the source code with its UUID, like this:

#[cross_check(uuid="00000000-0000-0000-0000-000000000000")]
fn baz() { }
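
The key encodings listed above can be sketched as a single Rust enum together with a string-keyed store. This is only an illustration: CheckKey, render, and register are hypothetical names, and a real design would presumably pick one encoding rather than mix them. The sketch also shows why location keys cannot distinguish two cross-checks on the same line: the second registration overwrites the first.

```rust
use std::collections::HashMap;

/// The key encodings discussed above, as one hypothetical enum.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum CheckKey {
    /// File plus function name, rendered as `main.c:foo`.
    Named { file: String, name: String },
    /// File plus line number, rendered as `main.rs:12`.
    Location { file: String, line: u32 },
    /// File plus (possibly mangled) symbol name.
    Symbol { file: String, symbol: String },
    /// A UUID attached to the cross-check in the source code.
    Uuid(String),
}

impl CheckKey {
    /// Render the key as the string stored in the configuration file.
    pub fn render(&self) -> String {
        match self {
            CheckKey::Named { file, name } => format!("{}:{}", file, name),
            CheckKey::Location { file, line } => format!("{}:{}", file, line),
            CheckKey::Symbol { file, symbol } => format!("{}:{}", file, symbol),
            CheckKey::Uuid(uuid) => uuid.clone(),
        }
    }
}

/// Store `entry` under `key`. Returns the entry that was previously stored
/// under the same rendered key, if any, which signals a key collision.
pub fn register(
    store: &mut HashMap<String, String>,
    key: &CheckKey,
    entry: &str,
) -> Option<String> {
    store.insert(key.render(), entry.to_string())
}
```

Registering two different entries under the location key for line 12 of main.rs returns the first entry on the second call, which a tool could report as an ambiguous key.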

There are some potential issues with using the paths of source files as components of the keys. All the paths must be relative to either the configuration file, or to some other specified location, e.g., the project top-level directory. The latter case implies that both the C and Rust compilers are aware of this top-level location, so they can use it to build the lookup keys for the cross-checks. Additionally, the compilers need to be able to locate the cross-check configuration files.
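
As a rough illustration of the path handling described above, the following Rust sketch builds a file:name key with the file path made relative to a known root, such as the project top-level directory. The function name relative_key and the choice of '/' as the separator are assumptions made for this example. The comparison is purely lexical; it does not resolve symlinks or ".." components.

```rust
use std::path::Path;

/// Build a `file:name` key with the file path made relative to `root`.
/// Returns None if `source` is not located under `root`.
pub fn relative_key(root: &Path, source: &Path, name: &str) -> Option<String> {
    // strip_prefix fails when `source` does not start with `root`.
    let rel = source.strip_prefix(root).ok()?;
    // Join the components with '/' so that every tool renders the same
    // key regardless of the platform's native path separator.
    let parts: Vec<String> = rel
        .components()
        .map(|c| c.as_os_str().to_string_lossy().into_owned())
        .collect();
    Some(format!("{}:{}", parts.join("/"), name))
}
```

Both the C and the Rust tooling would have to agree on the same root and rendering for their keys to match the ones stored in the configuration file.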

Any suggestions and comments on how to handle this are appreciated.