Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using rustified enums in function signatures is (almost?) always wrong #3051

Open
pvdrz opened this issue Dec 10, 2024 · 8 comments
Open

Using rustified enums in function signatures is (almost?) always wrong #3051

pvdrz opened this issue Dec 10, 2024 · 8 comments

Comments

@pvdrz
Copy link
Contributor

pvdrz commented Dec 10, 2024

So this is an issue that came up from #2908. If you have the following headers:

typedef enum State {Working = 1, Failed = 0} State; 

takes_state(enum State state);
enum State returns_state();

Runing bindgen --rustified-enum=State will produce this code:

#[repr(u32)]
#[derive(Debug, Copy, Clone, Hash, PartialEq, Eq)]
pub enum State {
    Working = 1,
    Failed = 0,
}
extern "C" {
    pub fn takes_state(state: State);
}
extern "C" {
    pub fn returns_state() -> State;
}

Sadly, using takes_state or returns_state can cause undefined behavior. This is documented behavior but it could easily be improved by adding appropriate conversion methods for State (which is part of the purpose of #2908). Additionally, bindgen could expose the underlying type of the C enum and use this type instead of State.

Essentially, I would expect bindgen to generate:

#[repr(u32)]
#[derive(Debug, Copy, Clone, Hash, PartialEq, Eq)]
pub enum State {
    Working = 1,
    Failed = 0,
}

pub type State_ctype = ::std::os::raw::c_uint;

// This can be optionally generated (#2908)
impl State {
    const unsafe fn from_ctype_unchecked(v: State_ctype) -> Self {
        std::mem::transmute(v)
    }
}

// This can be optionally generated
impl From<State> for State_ctype { 
    fn from(value: State) -> Self {
        value as Self
    }
}

// This can be optionally generated (#2908)
impl TryFrom<State_ctype> for State {
    type Err = InvalidState;
    
    fn try_from(value: State_ctype) -> Result<Self, Self::Err> {
        match value {
            0 => Ok(Self::Working),
            1 => Ok(Self::Failed),
             _ => Err(InvalidState(value)),
        }
    }
}

// Here we no longer use `State` but `State_ctype`.
extern "C" {
    pub fn takes_state(state: State_ctype);
}
extern "C" {
    pub fn returns_state() -> State_ctype;
}

The main advantages of this approach is its flexibility as the user can explicitly opt-in for the conversion methods according to the safety guarantees. If your C library is kind enough to never produce invalid values for an enum, then you can codify that behavior

// Safety: `returns_state` is always safe to call and it is guaranteed to produce a valid `State`.
unsafe { State::from_ctype_unchecked(returns_state())) }

However, if you're unsure, you can deal with this accordingly and expose a Rust function that acknowledges this:

fn returns_state() -> Result<State, <State as TryFrom<State_ctype>>::Err> {
    // Safety: `returns_state` is always safe to call
    unsafe { bindings::returns_state() }.try_into()
}

The additional improvement over the current behavior is whentakes_state and returns_state are used in conjunction:

unsafe { takes_state(returns_state()) }

With the current behavior returns_state can create an invalid value for State, which is UB. Then this is passed to takes_state and "hopefully" all works out. By using the State_ctype all the unsafety would only happen on the C side.

Cc @emilio @jbaublitz @ojeda

@ojeda
Copy link
Contributor

ojeda commented Dec 10, 2024

Is this #2646, but about changing the current --rustified-enum? i.e. is it about changing the existing behavior instead of creating a new kind of enum?

Cc @tgross35 @y86-dev in case they are interested (from https://rust-for-linux.zulipchat.com/#narrow/channel/288089-General/topic/Enum.20handling)

@pvdrz
Copy link
Contributor Author

pvdrz commented Dec 10, 2024

Is this #2646, but about changing the current --rustified-enum? i.e. is it about changing the existing behavior instead of creating a new kind of enum?

this is only about changing --rustified-enum so it uses the ctype of the enum instead of the enum in function signatures (and probably everywhere else as well?) and exposes the ctype as a type alias.

I think #2646 goes more in the direction of #3050 where you want to generate multiple representations for a single C enum.

@pvdrz
Copy link
Contributor Author

pvdrz commented Dec 10, 2024

Maybe a reasonable way forward here would be to implement first #3050 and then add an extra feature to the Rust enum representations where the ctype is used instead of the enum itself (use_ctype?). So you'd do something like

bindgen --enum-style const,rust(safe_conversion,unsafe_conversion,use_ctype)=<REGEX>

@ojeda
Copy link
Contributor

ojeda commented Dec 10, 2024

(and probably everywhere else as well?)

Yeah, I think it should be done everywhere. Otherwise you may get e.g. a type from C that contains the enum with an outside value.

I think #2646 goes more in the direction of #3050 where you want to generate multiple representations for a single C enum.

So, to clarify, #2646 came from that Zulip thread (https://rust-for-linux.zulipchat.com/#narrow/channel/288089-General/topic/Enum.20handling) which was about generating conversion functions from a raw "C" type (i.e. without UB) to a Rust enum. This came up because in many cases what we want to end up with is an (exhaustive) Rust enum, but we want to remain sound, yet avoid writing manually all the boilerplate code, with all that it entails (e.g. mistakes over time, since C APIs may change at any time within the kernel).

It is true that the issue mentions both constified_enum and rustified_enum, but I think that is because @tgross35 wanted to explain it was kind of a mix between the two, i.e. we want the raw type alias in signatures etc., similar to what constified_enum does, but we want the Rust enums as well, like rustified_enum, and we want the conversion methods from one to the other, since they can all be autogenerated.

So the fundamental idea is about generating all that boilerplate in bindgen, including conversion functions, to be able to get Rust enums without UB. Now, that can be done with a "new mode" altogether, or changing how the existing ones behave, or allowing for multiple representations, etc. It is not clear what the best approach is, but perhaps we should centralize the discussion in a single place, i.e. on how to provide a way to solve that core issue.

(Please @tgross35 et al correct me if I misremember!)

@tgross35
Copy link
Contributor

tgross35 commented Dec 10, 2024

I don't know what would be best here. I think the most important thing is that there should be some form that uses Rust enums but uses an integer for everything extern. I'm not sure what the feeling is on changing the existing rustified_enum but it does seem like this would have been a better default, if we could go back in time, considering the current state is a pretty quiet yet severe footgun.

From and TryFrom aren't strictly necessary but do make things easier to work with. There probably isn't that much need to have the const integers (e.g. const foo_one: foo = 1; that I included in #2646), that was just for convenience. So I think that what is proposed in the top post here is reasonable, either as a fix to rustified_enum or as its own option.

One note, it is probably a bit cleaner if the C type is just in the Rust enum's impl block.

impl State {
    pub type CType = core::ffi::c_int;
    pub const C_WORKING: Self::CType = 1; // if applicable
}

@y86-dev
Copy link

y86-dev commented Dec 10, 2024

So, to clarify, #2646 came from that Zulip thread (https://rust-for-linux.zulipchat.com/#narrow/channel/288089-General/topic/Enum.20handling) which was about generating conversion functions from a raw "C" type (i.e. without UB) to a Rust enum. This came up because in many cases what we want to end up with is an (exhaustive) Rust enum, but we want to remain sound, yet avoid writing manually all the boilerplate code, with all that it entails (e.g. mistakes over time, since C APIs may change at any time within the kernel).

One important part is that we want to exhaustively match on the enum coming from the C side. Because we want to ensure a compilation error when a C developer adds/removes/renames a variant without checking the rust side. Then we get notified "this doesn't build for me!" instead of silently introducing a bug that will haunt us in the future.

@pvdrz
Copy link
Contributor Author

pvdrz commented Dec 10, 2024

One note, it is probably a bit cleaner if the C type is just in the Rust enum's impl block.

impl State {
pub type CType = core::ffi::c_int;
pub const C_WORKING: Self::CType = 1; // if applicable
}

I think inherent associated types are unstable, so StateCType would be defined globally. The constants themselves would be fine I think (cc @jbaublitz).

I'm curious about the reason to have these raw C_.* constants in the first place. Once you have the rust enum you can do variant as CType to obtain them. Is there any context where that would be too cumbersome?

@tgross35
Copy link
Contributor

One note, it is probably a bit cleaner if the C type is just in the Rust enum's impl block.
impl State {
pub type CType = core::ffi::c_int;
pub const C_WORKING: Self::CType = 1; // if applicable
}

I think inherent associated types are unstable, so StateCType would be defined globally. The constants themselves would be fine I think (cc @jbaublitz).

You're right :( I thought we had those for a while.

I'm curious about the reason to have these raw C_.* constants in the first place. Once you have the rust enum you can do variant as CType to obtain them. Is there any context where that would be too cumbersome?

I was originally just thinking for convenience when calling the C methods with constant values, but indeed this doesn't gain much over casting the Rust type.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants