-
Notifications
You must be signed in to change notification settings - Fork 707
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add option to generate safe and unsafe conversions for rustified enums #2908
base: main
Are you sure you want to change the base?
Conversation
@tgross35 @ojeda I could use some guidance with some of the behavior you would like to see in corner cases for this PR. Specifically I'm interested in the cases where constants that get generated even in the rustified case conflict with the constants @tgross35 requested in the issue. I have to investigate further, but I think I encountered a case with anonymous enums where this could theoretically happen. How do we want to address this? Another prefix for the constants? Something else? |
efd0708
to
e2b79ac
Compare
I guess that could be a possibility, |
bindgen-cli/options.rs
Outdated
#[arg(long, value_name = "REGEX")] | ||
rustified_non_exhaustive_enum: Vec<String>, | ||
/// Mark any enum whose name matches the provided regex as a Rust enum. This parameter takes | ||
/// options in the shape REGEX or REGEX=OPTIONS where OPTIONS can be a comma separated list of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this disallows a regex without options but with a =
character, right? I guess we don't need that, but just noting it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's correct. If you provide no equal sign, it uses the empty option set.
pub type Planet_ctype = ::std::os::raw::c_uint; | ||
pub const Planet_earth: Planet_ctype = 0; | ||
pub const Planet_mars: Planet_ctype = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the PR is generating the constants for (some of) the existing Rust enum
styles -- not sure if that is what others were expecting or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, so are you saying that a non exhaustive rustified enum shouldn't generate constants? I'm happy to only do it if either try_from_raw
or from_raw_unchecked
is enabled.
8c41d55
to
52851c7
Compare
So I think the only CI failure remaining is the case where the generated constant for rustified enums conflicts with an existing constant already generated by bindgen in previous versions. This may be resolved by your PR revision comment of only generating constants in cases where either |
390527d
to
699f11c
Compare
It is up to the maintainers (I don't really know what they would prefer or their policies! :) -- I added that comment mainly because I was surprised by all the tests changing, i.e. I thought we were adding a new feature/mode/... that wouldn't necessarily change existing output/users. In any case, if we are hitting a case here when the constants are added, does it mean that, if you enable |
The test that is consistently failing is the |
699f11c
to
797ac8b
Compare
@jbaublitz I think I'm still leaning towards the option of putting those constants as associated constants under the |
@pvdrz I'll give it a shot. Part of the problem there is just a naming problem. For example, in most cases the variant of the enum will be named the same thing as the constant without some sort of suffix or prefix. Because they're constants it may just be sufficient to rely on the case being different if we convert all of the constants to upper case in keeping with the Rust const case convention. However, if users decide to define an enum variant as upper case, I forsee some conflicts happening. I'll try running my initial try through CI and I'll see what fails. |
What is the type generated in functions receiving/returning each kind of There is a |
Both2 = -1, | ||
Both3, | ||
}; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could be a good idea to add a few functions here, e.g. enum Plain f(enum Plain);
and so on, to double-check the generated type in expectations
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oof, it appears after adding some tests that it is converting it to the type that will result in undefined behavior. I'll have to take a look at how to avoid that. I think the code to handle that is in a place I haven't touched, so I'll have to track down where that's happening.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is what I feared, yeah. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So is what you're recommending only doing this in the case of TryFrom generation like with the constants?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure what you mean -- do you have an example?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe, one alternative to roll this change slowly would be to add another flag like --use-enum-ctypes-on-fns
or --return-enum-ctypes
(name is a WIP 🤣) that when enabled, makes all functions return the ctype instead of the rustified enum. We could keep it disabled by default and eventually enable it by default as unsoundness is a pretty good basis for it.
@jbaublitz @ojeda what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I've been thinking something: Let's say we have two functions (...)
How would a user call takes_plain
with a value that is not in the Rust-side enum
? i.e. a C function may need to be used with more values too. Of course, perhaps the bindgen
user should avoid this kind of enum
in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would a user call
takes_plain
with a value that is not in the Rust-sideenum
? i.e. a C function may need to be use with more values too.
I think the only case where a Rust user would call takes_plain
with a value outside the rustified enum would be if there is a common pattern in the C library like:
// This value could be outside of `Plain`
enum Plain value = returns_plain();
// But that's OK because this function already assumes that this could happen
// and preemptively checks that internally
takes_plain(value);
So doing that from the rust side would be weird as you'd have:
unsafe extern "C" {
fn returns_plain() -> Plain_ctype;
fn takes_plain(plain: Plain);
}
// This value could be outside of `Plain`
let value: Plain_ctype = unsafe { returns_plain() };
// But here, we have to do an unsafe transmute that might cause UB
// even though the call to the C function would be fine.
unsafe { takes_plain(std::mem::transmute(value)) };
However, if you just use the Plain_ctype
everywhere, that's not cumbersome anymore
unsafe extern "C" {
fn returns_plain() -> Plain_ctype;
fn takes_plain(plain: Plain_ctype);
}
// This value could be outside of `Plain`
let value: Plain_ctype = unsafe { returns_plain() };
// No weird transmute
unsafe { takes_plain(value) };
Of course, perhaps the
bindgen
user should avoid this kind ofenum
in this case.
Yes maybe in that case you'd be better not using --rustified-enum
at all. However, I could see a case where you'd want to offer both the rustified enum and safe wrappers to those functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(My previous comment was intended to reply to your "two functions", rather than the --use-enum-ctypes-on-fns
one; our messages crossed :)
However, if you just use the
Plain_ctype
everywhere, that's not cumbersome anymore
Exactly, that is why I think using the Rust enum
for parameters may not be great, unless the expectation is that users avoid it in some cases. So they would probably still not use it as their "default" or "catch all" case, i.e. --... *
.
However, I could see a case where you'd want to offer both the rustified enum and safe wrappers to those functions.
Hmm... What do you mean by safe wrappers? i.e. the raw functions would still need to be marked unsafe, even if we took Rust enum
s, so the user would still need to wrap it.
Having said that, I would like to eventually have a [[safe]]
attribute on the C side that allows us to do things like that -- for the kernel for instance, we are now at the point were starting to add annotations on the C side for several things may be OK for some subsystems at least. Similarly, we could have a [[enum ...]]
thing that gives bindgen
this information without having to maintain the lists outside the source code. I don't know if nowadays libclang
preserves unknown attributes -- if so, we could already do it without having to modify and wait for Clang.
The other extreme is to have an "easy-to-use" mode, where by default all is provided: the raw C type and its constants, plus a Rustified enum
type, plus a non-exhaustive one, plus the conversion functions (fallible, infallible and unsafe) to convert to both of those... Then users can use whatever they think it is best for each enum
, without having to know about the bindgen
kinds, and it is sound. Then when they are sure how they use each enum
, they can restrict it to one kind via the arguments (or via attributes if we ever get there) so that they avoid bindgen
generating code that will not be needed. So it would be easy to start using, and allow people to customize later. But it would be more "expensive" by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(My previous comment was intended to reply to your "two functions", rather than the
--use-enum-ctypes-on-fns
one; our messages crossed :)However, if you just use the
Plain_ctype
everywhere, that's not cumbersome anymoreExactly, that is why I think using the Rust
enum
for parameters may not be great, unless the expectation is that users avoid it in some cases. So they would probably still not use it as their "default" or "catch all" case, i.e.--... *
.
Yeah I think that the extra convenience of using the rustified enum would only work if you want to expose the enum itself. And if you want to expose safe bindings for functions that use the c enum, then you have to do those yourself accordingly to the invariants of the library itself.
However, I could see a case where you'd want to offer both the rustified enum and safe wrappers to those functions.
Hmm... What do you mean by safe wrappers? i.e. the raw functions would still need to be marked unsafe, even if we took Rust
enum
s, so the user would still need to wrap it.
That's true, even if the function takes the rustified enum then you still have to figure out its safety requirements and expose it accordingly. So you'd still have to do things like
mod bindings {
unsafe extern "C" {
pub fn takes_plain(plain: Plain);
}
}
/// Safety: It is safe, believe me.
pub fn takes_plain(plain: Plain) {
unsafe { bindings::takes_plain(plain) }
}
So, having to add an extra plain.to_raw_ctype()
or whatever to obtain the C value seems like little effort. Maybe @emilio or @kulp have a better idea on why bindings of functions that take C enums become functions that take Rustified enums instead of using the integer type directly as they have been around for longer.
Having said that, I would like to eventually have a
[[safe]]
attribute on the C side that allows us to do things like that -- for the kernel for instance, we are now at the point were starting to add annotations on the C side for several things may be OK for some subsystems at least. Similarly, we could have a[[enum ...]]
thing that givesbindgen
this information without having to maintain the lists outside the source code. I don't know if nowadayslibclang
preserves unknown attributes -- if so, we could already do it without having to modify and wait for Clang.
That sounds like a sensible addition. In the meantime, it is already possible to embed html-like tags in the comments of the headers and bindgen can read those to introduce custom behavior.
The other extreme is to have an "easy-to-use" mode, where by default all is provided: the raw C type and its constants, plus a Rustified
enum
type, plus a non-exhaustive one, plus the conversion functions (fallible, infallible and unsafe) to convert to both of those... Then users can use whatever they think it is best for eachenum
, without having to know about thebindgen
kinds, and it is sound. Then when they are sure how they use eachenum
, they can restrict it to one kind via the arguments (or via attributes if we ever get there) so that they avoidbindgen
generating code that will not be needed. So it would be easy to start using, and allow people to customize later. But it would be more "expensive" by default.
Yeah ideally you should be able to decide if a C enum should be translated in different ways without interference between each representation. I think the API would need some bike-shedding, specially from the CLI side where there isn't much flexibility. Maybe the idea of having a bindgen.toml
file where we can store more structured information would be useful here as you could do something like
[[enum_style]]
regex = "Foo"
rustified = { generate = true, try_from_raw = true, from_raw_unchecked = true }
constified = true
Then anything that matches the Foo
regex would get all of those immediately. Maybe a CLI equivalent could be --enum-style "Foo=rustified(try_from_raw,from_raw_unchecked),constified"
It would also be nice to be able to replace the multitude of --newtype-enum
--newtype-global-enum
, --constified-enum
, etc, by a single option that can be extended. This would also give @jbaublitz some flexibility as it wouldn't be necessary to include the raw type constants with this feature as those would be enabled automatically if you pass the extra constified
style.
} | ||
} | ||
impl FromRawUnchecked { | ||
pub type FromRawUnchecked_ctype = ::std::os::raw::c_uint; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jbaublitz maybe this type should just be ctype
as it is already inside the impl
block
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just read that having type
on non-trait impl
blocks is unstable: rust-lang/rust#8995
So sadly we have to keep this outside the impl
block without changing the name.
} | ||
impl FromRawUnchecked { | ||
pub type FromRawUnchecked_ctype = ::std::os::raw::c_uint; | ||
pub const FRU1: FromRawUnchecked_ctype = 6; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and then here the type could be just FromRawUnchecked::ctype
or even Self::ctype
.
Edit: ignore (see #2908 (comment))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it might make sense to add a suffix to the names of these constants (like _raw
) to avoid misuses
@jbaublitz @ojeda the discussion we've had so far here makes me think that we need to figure out a better way to integrate this feature as it has revealed some problems we have with the current enum representations we handle in bindgen. I'm going to draft a proposal on how to rework the enum-style API bindgen has to address some of the shortcomings we've found here and if that proposal makes it into Thanks for all your effort here, I've really enjoyed the discussion and I hope we can get this shipped soon. |
Okay sounds good! Do you want me to address the concerns you brought up while I wait? |
So, for now I think the safest would be to just remove the raw ctype constants from this implementation. I think you can still access them by doing the respective Regarding the function signatures, I wouldn't touch that on this PR either. We can introduce that change later. |
39f1b4a
to
9012360
Compare
Just a note that I've spent today trying to track down this failure, but I've been unable to reproduce it on my version of libclang. Tomorrow I'm going to try to reproduce on the version of libclang in CI. |
9012360
to
8581938
Compare
Okay @pvdrz, it's passing CI and I think I've made all of the requested changes. Any additional feedback or is this good to go? |
Closes #2646