[RFC] Revamp interface for enum representations #3050

Open
pvdrz opened this issue Dec 10, 2024 · 0 comments

Summary

Introduce a new, extensible interface for choosing multiple enum representations for the same type. In a nutshell, we would introduce a new --enum-style CLI argument, with an equivalent Builder method, which would supersede the existing --constified-enum-module, --bitfield-enum, --newtype-enum, --newtype-global-enum, --rustified-enum and --rustified-non-exhaustive-enum CLI arguments and their respective Builder method counterparts.

This interface would allow users to pick multiple enum representations for a single C enum and to enable different features for each representation.

Motivation

The main motivation is the lack of a "silver bullet" representation for C enums in Rust.

Currently there are two upcoming enum representations (#2908 and #2980) that interact with the existing representations in non-trivial ways. In particular, #2908 introduces extensions to the existing --rustified-enum representation. These extensions generate safe and unsafe conversions between the C enum values and the "rustified" enum values to avoid unsoundness issues.

At the same time, the existing interface has become increasingly bloated, as each new representation requires a new CLI flag and a new method, even when it is essentially an old representation with one extra feature. An example of this is --newtype-enum and --newtype-global-enum, where the only difference is the namespacing of the constants for each variant.

Guide-level explanation

Bindgen can map C/C++ enums into Rust in different ways. The way bindgen maps enums can be customized using the Builder::enum_style method, which receives a sequence of EnumRepresentations and a regex pattern:

impl Builder {
    /// Apply the provided representations to the C enums whose name matches
    /// the provided regex pattern.
    pub fn enum_style<I, P>(
        mut self,
        representations: I,
        pattern: P,
    ) -> Self
    where
        I: IntoIterator<Item=EnumRepresentation>,
        P: AsRef<str>; 
}

/// This is just `EnumVariation` with a new name for clarity.
pub enum EnumRepresentation {
    /// Represent a C enum using a Rust enum.
    Rust {
        /// Indicates whether the Rust enum should be `#[non_exhaustive]`.
        non_exhaustive: bool,
    },
    /// Represent a C enum using a newtype over the enum's ctype.
    NewType {
        /// Indicates whether the newtype will have bitwise operators.
        bitfield: bool,
        /// Indicates if the variants will be represented as global
        /// constants instead of being inside an `impl` block of the newtype.
        global: bool,
    },
    /// Represent a C enum using a ctype constant for each variant.
    Const {
        /// Indicates whether the generated constants will be inside a module
        /// with the same name as the enum.
        module: bool,
    },
}

When this method is used, bindgen will generate each of the provided representations for every C enum whose name matches the provided regex pattern.
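
As a rough illustration of how a call to the proposed method might look, here is a minimal, self-contained sketch. The `Builder` below is a hypothetical stand-in that only records each request (it is not `bindgen::Builder`), the `enum_styles` field and the `example` function are invented for this sketch, and `EnumRepresentation` is copied from the definition above.

```rust
/// Copy of the proposed `EnumRepresentation` from this RFC.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum EnumRepresentation {
    Rust { non_exhaustive: bool },
    NewType { bitfield: bool, global: bool },
    Const { module: bool },
}

/// Hypothetical stand-in for `bindgen::Builder`; it only records requests.
#[derive(Default)]
pub struct Builder {
    /// One entry per `enum_style` call: (representations, regex pattern).
    pub enum_styles: Vec<(Vec<EnumRepresentation>, String)>,
}

impl Builder {
    /// Same signature as the proposed method; the body is an assumption.
    pub fn enum_style<I, P>(mut self, representations: I, pattern: P) -> Self
    where
        I: IntoIterator<Item = EnumRepresentation>,
        P: AsRef<str>,
    {
        self.enum_styles.push((
            representations.into_iter().collect(),
            pattern.as_ref().to_owned(),
        ));
        self
    }
}

/// Requests a non-exhaustive Rust enum plus plain constants for `Foo`,
/// and a bitfield newtype for every enum whose name matches `.*Flags`.
pub fn example() -> Builder {
    Builder::default()
        .enum_style(
            [
                EnumRepresentation::Rust { non_exhaustive: true },
                EnumRepresentation::Const { module: false },
            ],
            "Foo",
        )
        .enum_style(
            [EnumRepresentation::NewType { bitfield: true, global: false }],
            ".*Flags",
        )
}
```

Taking `I: IntoIterator` lets callers pass an array, a `Vec` or any other iterator of representations without an intermediate allocation on their side.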

This interface has a CLI equivalent, the --enum-style argument, which takes values of the form <REPRS>=<REGEX>, where <REGEX> is a regex pattern and <REPRS> is a comma-separated sequence of enum representations. Each enum representation consists of a name optionally followed by a parenthesized, comma-separated list of features:

  • rust(non_exhaustive?): See EnumRepresentation::Rust.
  • newtype(bitfield?, global?): See EnumRepresentation::NewType.
  • const(module?): See EnumRepresentation::Const.
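
One way the `<REPRS>=<REGEX>` syntax could be parsed is sketched below. This is only an illustration under assumptions: the exact grammar (parentheses around features, optional whitespace, splitting at the first `=`) is inferred from the list above, and the function names `split_top_level`, `parse_repr` and `parse_enum_style` are hypothetical. The main subtlety is that commas separate both representations and features, so only commas outside parentheses may split the representation list.

```rust
/// Copy of the proposed `EnumRepresentation` from this RFC.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum EnumRepresentation {
    Rust { non_exhaustive: bool },
    NewType { bitfield: bool, global: bool },
    Const { module: bool },
}

/// Splits `s` on commas that are not nested inside parentheses.
fn split_top_level(s: &str) -> Vec<&str> {
    let mut parts = Vec::new();
    let mut depth = 0usize;
    let mut start = 0;
    for (i, c) in s.char_indices() {
        match c {
            '(' => depth += 1,
            ')' => depth = depth.saturating_sub(1),
            ',' if depth == 0 => {
                parts.push(&s[start..i]);
                start = i + 1;
            }
            _ => {}
        }
    }
    parts.push(&s[start..]);
    parts
}

/// Parses one representation, e.g. `rust` or `newtype(bitfield, global)`.
fn parse_repr(s: &str) -> Result<EnumRepresentation, String> {
    let s = s.trim();
    let (name, features) = match s.split_once('(') {
        Some((name, rest)) => {
            let inner = rest
                .strip_suffix(')')
                .ok_or_else(|| format!("missing `)` in `{s}`"))?;
            let features: Vec<&str> = inner
                .split(',')
                .map(str::trim)
                .filter(|f| !f.is_empty())
                .collect();
            (name.trim(), features)
        }
        None => (s, Vec::new()),
    };
    // Features accepted by each representation, as listed in the RFC.
    let allowed: &[&str] = match name {
        "rust" => &["non_exhaustive"],
        "newtype" => &["bitfield", "global"],
        "const" => &["module"],
        _ => return Err(format!("unknown representation `{name}`")),
    };
    if let Some(bad) = features.iter().find(|f| !allowed.contains(*f)) {
        return Err(format!("unknown feature `{bad}` for `{name}`"));
    }
    let has = |feature: &str| features.iter().any(|f| *f == feature);
    Ok(match name {
        "rust" => EnumRepresentation::Rust { non_exhaustive: has("non_exhaustive") },
        "newtype" => EnumRepresentation::NewType {
            bitfield: has("bitfield"),
            global: has("global"),
        },
        _ => EnumRepresentation::Const { module: has("module") },
    })
}

/// Parses a full argument of the form `<REPRS>=<REGEX>`.
pub fn parse_enum_style(arg: &str) -> Result<(Vec<EnumRepresentation>, String), String> {
    // Split at the first `=` so that the regex itself may contain `=`.
    let (reprs, pattern) = arg
        .split_once('=')
        .ok_or_else(|| format!("expected `<REPRS>=<REGEX>`, got `{arg}`"))?;
    let reprs = split_top_level(reprs)
        .into_iter()
        .map(parse_repr)
        .collect::<Result<Vec<_>, _>>()?;
    Ok((reprs, pattern.to_owned()))
}
```

With this sketch, `parse_enum_style("rust(non_exhaustive),const=Foo")` yields a non-exhaustive Rust enum and plain constants for the pattern `Foo`, while an input such as `rust(global)=Foo` is rejected because `global` is not a feature of `rust`.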

Reference-level explanation

This feature would be fairly self-contained, and its only interaction would be with the existing enum representation features.

Internally, the RegexSets for each enum representation would still be stored separately. However, a declarative macro would be used to generate both EnumRepresentation and a constant slice with all the possible values that EnumRepresentation could have. With the current representations that would be:

const ALL_REPRS: &[EnumRepresentation] = &[
    Rust { non_exhaustive: false },
    Rust { non_exhaustive: true },
    NewType { bitfield: false, global: false },
    NewType { bitfield: true, global: false },
    NewType { bitfield: false, global: true },
    NewType { bitfield: true, global: true },
    Const { module: false },
    Const { module: true },
]; 

This constant would allow us to generate another slice of type &[(EnumRepresentation, RegexSet)], which would replace all the existing fields of BindgenOptions related to enum representation: iterating over it would allow us to choose the right representations for each enum.
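
The lookup described above can be sketched as follows. This is a simplified illustration, not the proposed implementation: `PatternSet` is a hypothetical stand-in for bindgen's `RegexSet` that matches names by exact string comparison to keep the example dependency-free, the table is a `Vec` rather than a slice, and the helper names `empty_table`, `enum_style` and `representations_for` are invented for this sketch.

```rust
/// Copy of the proposed `EnumRepresentation` from this RFC.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum EnumRepresentation {
    Rust { non_exhaustive: bool },
    NewType { bitfield: bool, global: bool },
    Const { module: bool },
}

/// Hypothetical stand-in for `RegexSet`: exact string matching only.
#[derive(Default)]
pub struct PatternSet {
    patterns: Vec<String>,
}

impl PatternSet {
    pub fn insert(&mut self, pattern: &str) {
        self.patterns.push(pattern.to_owned());
    }

    pub fn matches(&self, name: &str) -> bool {
        self.patterns.iter().any(|p| p == name)
    }
}

/// Every possible value of `EnumRepresentation`, as listed in the RFC.
const ALL_REPRS: &[EnumRepresentation] = &[
    EnumRepresentation::Rust { non_exhaustive: false },
    EnumRepresentation::Rust { non_exhaustive: true },
    EnumRepresentation::NewType { bitfield: false, global: false },
    EnumRepresentation::NewType { bitfield: true, global: false },
    EnumRepresentation::NewType { bitfield: false, global: true },
    EnumRepresentation::NewType { bitfield: true, global: true },
    EnumRepresentation::Const { module: false },
    EnumRepresentation::Const { module: true },
];

/// Builds one (initially empty) pattern set per possible representation.
pub fn empty_table() -> Vec<(EnumRepresentation, PatternSet)> {
    ALL_REPRS
        .iter()
        .map(|&repr| (repr, PatternSet::default()))
        .collect()
}

/// Registers `pattern` under each of the requested representations.
pub fn enum_style(
    table: &mut [(EnumRepresentation, PatternSet)],
    reprs: &[EnumRepresentation],
    pattern: &str,
) {
    for (repr, set) in table.iter_mut() {
        if reprs.contains(repr) {
            set.insert(pattern);
        }
    }
}

/// Returns every representation whose pattern set matches `enum_name`,
/// in the order of `ALL_REPRS`.
pub fn representations_for(
    table: &[(EnumRepresentation, PatternSet)],
    enum_name: &str,
) -> Vec<EnumRepresentation> {
    table
        .iter()
        .filter(|(_, set)| set.matches(enum_name))
        .map(|(repr, _)| *repr)
        .collect()
}
```

Because the table is driven by `ALL_REPRS`, adding a feature or a representation only changes the list of values; the registration and lookup code stays the same.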

With this approach, adding a new feature to an existing representation or adding a new representation would require fewer changes and should be easier to maintain.

Drawbacks

The main drawback is the fact that this is a breaking change, as it would deprecate the existing interface for enum representation.

Another drawback is related to allowing multiple representations for a single C enum, as the current behavior of bindgen is to choose one documented option by default. That is, if the user calls bindgen with --rustified-enum Foo and --constified-enum Foo, bindgen chooses only the Rust representation for Foo. With the new interface, --enum-style rust,const=Foo would generate both representations. This is a breaking change and might cause unexpected behavior for users who rely on bindgen choosing one of the two.

Additionally, allowing multiple representations for a single C enum would make bindgen more likely to generate invalid Rust code. For example, calling bindgen with --enum-style rust,newtype=Foo would produce both a Rust enum and a newtype for Foo, which would cause a name collision.

Finally, the heavy reliance on macros makes this code less intuitive for new contributors.

Rationale and alternatives

An alternative would be simply to not implement this RFC, which would keep the enum representation interface prone to bloat and increasingly difficult to maintain.

Unresolved questions

It is currently unclear whether we should prevent the generation of invalid Rust code by adding extra checks that guarantee incompatible representations are never used for the same C enum.
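
If such checks were added, one possible shape is sketched below. It rests on an assumption taken from the rust,newtype=Foo example in the Drawbacks section: only the `Rust` and `NewType` representations are treated as defining a type named after the C enum, so requesting more than one of them for the same enum is reported as a conflict. The method `defines_named_type` and the function `find_conflict` are hypothetical names, and the real set of incompatible combinations may differ.

```rust
/// Copy of the proposed `EnumRepresentation` from this RFC.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum EnumRepresentation {
    Rust { non_exhaustive: bool },
    NewType { bitfield: bool, global: bool },
    Const { module: bool },
}

impl EnumRepresentation {
    /// Assumption for this sketch: only `Rust` and `NewType` emit a type
    /// that carries the name of the C enum.
    fn defines_named_type(self) -> bool {
        matches!(self, Self::Rust { .. } | Self::NewType { .. })
    }
}

/// Returns the first pair of requested representations that would both
/// define a type with the enum's name, or `None` if there is no such pair.
pub fn find_conflict(
    reprs: &[EnumRepresentation],
) -> Option<(EnumRepresentation, EnumRepresentation)> {
    let mut first = None;
    for &repr in reprs.iter().filter(|r| r.defines_named_type()) {
        match first {
            None => first = Some(repr),
            Some(prev) => return Some((prev, repr)),
        }
    }
    None
}
```

A check like this could run once per `enum_style` call (or per `--enum-style` argument) and turn a confusing compile error in the generated bindings into an early diagnostic.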

Future possibilities

The main advantage of this design is how easily it can be extended with new representations or new features. Examples of this are #2908 and #2980, which could be integrated by adding new fields to EnumRepresentation::Rust and by adding a new variant to EnumRepresentation, respectively.

This interface would be easily representable if bindgen were to adopt a configuration format like TOML, as enum styles could be represented by arrays:

[[enum-style]]
pattern = "foo"
representations = ["rust", "const"]

cc @emilio @jbaublitz
