From 851817340b281e649d160e96e7250cbb90cd8543 Mon Sep 17 00:00:00 2001 From: Zhixing Zhang Date: Thu, 24 Oct 2024 14:18:24 -0700 Subject: [PATCH 1/9] Initial writeup --- text/0000-layout-packed-aligned.md | 176 +++++++++++++++++++++++++++++ 1 file changed, 176 insertions(+) create mode 100644 text/0000-layout-packed-aligned.md diff --git a/text/0000-layout-packed-aligned.md b/text/0000-layout-packed-aligned.md new file mode 100644 index 00000000000..7274da7fa32 --- /dev/null +++ b/text/0000-layout-packed-aligned.md @@ -0,0 +1,176 @@ +- Feature Name: `c_layout_packed_aligned` +- Start Date: 2024-10-24 s +- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000) +- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/100743) + +# Summary +[summary]: #summary + +This RFC makes it legal to have `#[repr(C)]` structs that are: +- Both packed and aligned. +- Packed, and transitively contain `#[repr(align)]` types. + +It also introduces `#[repr(system)]` which is designed for interoperability with operating system APIs. +It has the same behavior as `#[repr(C)]` except on `*-pc-windows-gnu` targets where it uses the msvc layout +rules instead. + +# Motivation +[motivation]: #motivation + +This RFC enables the following struct definitions: + +```rs +#[repr(C, packed(2), align(4))] +struct Foo { // Alignment = 4, Size = 8 + a: u8, // Offset = 0 + b: u32, // Offset = 2 +} +``` + +This is commonly needed when Rust is being used to interop with existing C and C++ code bases, which may contain +unaligned types. For example, in `clang` it is possible to create the following type definition, and there is +currently no easy way to create a matching Rust type: + +```cpp +struct __attribute__((packed, aligned(4))) MyStruct { + uint8_t a; + uint32_t b; +}; +``` + +Currently `#[repr(packed(_))]` structs cannot transitively contain #[repr(align(_))] structs due to differing behavior between msvc and gcc/clang. 
+However, in most cases, the user would expect `#[repr(C)]` to produce a struct layout matching the same type as defined by the current target. + +# Guide-level explanation +[guide-level-explanation]: #guide-level-explanation + +## `#[repr(C)]` +When `align` and `packed` attributes exist on the same type, or when `packed` structs transitively contains `align` types, +the resulting layout matches the current compilation target. + +For example, given: +```c +#[repr(C, align(4))] +struct Foo(u8); +#[repr(C, packed(1))] +struct Bar(Foo); +``` +`align_of::<Bar>()` would be 4 for `*-pc-windows-msvc` and 1 for everything else. + + +## `#[repr(system)]` +When `align` and `packed` attributes exist on the same type, or when `packed` structs transitively contains `align` types, +the resulting layout matches the current compilation target. + +For example, given: +```c +#[repr(C, align(4))] +struct Foo(u8); +#[repr(C, packed(1))] +struct Bar(Foo); +``` +`align_of::<Bar>()` would be 4 for `*-pc-windows-msvc` and `*-pc-windows-gnu`. It would be 1 for everything else. + + + +# Reference-level explanation +[reference-level-explanation]: #reference-level-explanation + +In the following paragraphs, "Decreasing M to N" means: +``` +if M > N { + M = N +} +``` + +"Increasing M to N" means: +``` +if M < N { + M = N +} +``` + + +`#[repr(align(N))]` increases the base alignment of a type to be N. + +`#[repr(packed(M))]` decreases the alignment of the struct fields to be M. Because the base alignment of the type +is defined as the maximum of the alignment for any fields, this also has the indirect result of decreasing the base +alignment of the type to be M. + +When the align and packed modifiers are applied on the same type as `#[repr(align(N), packed(M))]`, +the alignment of the struct fields are decreased to be M. Then, the base alignment of the type is +increased to be N. 
+ +When a `#[repr(packed(M))]` struct transitively contains a field with `#[repr(align(N))]` type, +- The field is first `pad_to_align`. Then, the field is added to the struct with alignment decreased to M. The packing requirement overrides the alignment requirement. (GCC, `#[repr(Rust)]`, `#[repr(C)]` on gnu targets, `#[repr(system)]` on non-windows targets) +- The field is added to the struct with alignment increased to N. The alignment requirement overrides the packing requirement. (MSVC, `#[repr(C)]` on msvc targets, `#[repr(system)]` on windows targets) + +# Drawbacks +[drawbacks]: #drawbacks + +Historically the meaning of `#[repr(C)]` has been somewhat ambiguous. When someone puts `#[repr(C)]` on their struct, their intention could be one of three things: +1. Having a target-independent and stable representation of the data structure for storage or transmission. +2. FFI with C and C++ libraries compiled for the same target. +3. Interoperability with operating system APIs. + +Today, `#[repr(C)]` is being used for all 3 scenarios because the user cannot create a `#[repr(C)]` struct with ambiguous layout between targets. However, this also means +that there exists some C layouts that cannot be specified using `#[repr(C)]`. + +This RFC addresses use case 2 with `#[repr(C)]` and use case 3 with `#[repr(system)]`. For use case 1, people will have to seek alternative solutions such as `crABI` or +protobuf. However, it could be a footgun if people continue to use `#[repr(C)]` for use case 1. + + + +# Rationale and alternatives +[rationale-and-alternatives]: #rationale-and-alternatives + +This RFC clarifies that: +- `repr(C)` must interoperate with the C compiler for the target. +- `repr(system)` must interoperate with the operating system APIs for the target. +- Similar to Clang, `repr(C)` does not guarantee consistent layout between targets. 
+ +Alternatively, we could create syntax that allows the user to specify exactly which semantics to use when packed structs transitively contain aligned fields. +For example, a new attribute `#[repr(align_override_packed(N))]` could be used when the desired behavior is for the child to override the parent alignment. + +`#[repr(align(N))]` and `#[repr(packed)]` could be used together to get the opposite behavior, where the parent/outer alignment wins. + +Explicitly specifying the pack/align semantics has the drawback of complicating FFI. For example, you might need two different definition files depending on the target. + +Therefore, a stable layout across compilation targets should be deferred to future work. + + + + +# Prior art +[prior-art]: #prior-art + +Clang matches the Windows ABI for `x86_64-pc-windows-msvc` and matches the GCC ABI for `x86_64-pc-windows-gnu`. + +MinGW always uses the GCC ABI. + +We already have both `C` and `system` [calling conventions](https://doc.rust-lang.org/beta/nomicon/ffi.html#foreign-calling-conventions) +to support differing behavior on `x86_windows` and `x86_64_windows`. 
+ + +This issue was introduced in the [original implementation](https://github.com/rust-lang/rust/issues/33158) of `#[repr(packed(N))]` and has since undergone extensive community discussion: +- [#[repr(align(N))] fields not allowed in #[repr(packed(M>=N))] structs](https://github.com/rust-lang/rust/issues/100743) +- [repr(C) does not always match the current target's C toolchain (when that target is windows-msvc)](https://github.com/rust-lang/unsafe-code-guidelines/issues/521) +- [repr(C) is unsound on MSVC targets](https://github.com/rust-lang/rust/issues/81996) +- [E0587 error on packed and aligned structures from C](https://github.com/rust-lang/rust/issues/59154) +- [E0587 error on packed and aligned structures from C (bindgen)](https://github.com/rust-lang/rust-bindgen/issues/1538) +- [Support for both packed and aligned (in repr(C)](https://github.com/rust-lang/rust/issues/118018) +- [bindgen wanted features & bugfixes (Rust-for-Linux)](https://github.com/Rust-for-Linux/linux/issues/353) +- [packed type cannot transitively contain a #[repr(align)] type](https://github.com/rust-lang/rust-bindgen/issues/2179) +- [structure layout using __aligned__ attribute is incorrect](https://github.com/rust-lang/rust-bindgen/issues/867) + + +# Unresolved questions +[unresolved-questions]: #unresolved-questions + +None for now. + + +# Future possibilities +[future-possibilities]: #future-possibilities + +People who need a stable struct layout that is consistent across targets would be directed to use `crABI`. 
From 3d4419820a3a2bd1f88b8855f8f2ab9415b56bb9 Mon Sep 17 00:00:00 2001 From: Zhixing Zhang Date: Thu, 24 Oct 2024 14:27:33 -0700 Subject: [PATCH 2/9] Update PR numbers --- text/0000-layout-packed-aligned.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/text/0000-layout-packed-aligned.md b/text/0000-layout-packed-aligned.md index 7274da7fa32..693af102253 100644 --- a/text/0000-layout-packed-aligned.md +++ b/text/0000-layout-packed-aligned.md @@ -1,7 +1,7 @@ -- Feature Name: `c_layout_packed_aligned` -- Start Date: 2024-10-24 s -- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000) -- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/100743) +- Feature Name: `layout_packed_aligned` +- Start Date: 2024-10-24 +- RFC PR: [rust-lang/rfcs#3718](https://github.com/rust-lang/rfcs/pull/3718) +- Rust Issue: [rust-lang/rust#100743](https://github.com/rust-lang/rust/issues/100743) # Summary [summary]: #summary From e77c6be23d218a0202b156acfbc9f02cc206e86a Mon Sep 17 00:00:00 2001 From: Zhixing Zhang Date: Thu, 24 Oct 2024 14:30:02 -0700 Subject: [PATCH 3/9] update --- text/0000-layout-packed-aligned.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/text/0000-layout-packed-aligned.md b/text/0000-layout-packed-aligned.md index 693af102253..9c88b726f62 100644 --- a/text/0000-layout-packed-aligned.md +++ b/text/0000-layout-packed-aligned.md @@ -46,7 +46,7 @@ However, in most cases, the user would expect `#[repr(C)]` to produce a struct l ## `#[repr(C)]` When `align` and `packed` attributes exist on the same type, or when `packed` structs transitively contains `align` types, -the resulting layout matches the current compilation target. +the resulting layout matches the target toolchain ABI. 
For example, given: ```c @@ -60,7 +60,7 @@ struct Bar(Foo); ## `#[repr(system)]` When `align` and `packed` attributes exist on the same type, or when `packed` structs transitively contains `align` types, -the resulting layout matches the current compilation target. +the resulting layout matches the target OS ABI. For example, given: ```c From af3cc3067948ff3de32a80fe49cffb17caf75fc6 Mon Sep 17 00:00:00 2001 From: Zhixing Zhang Date: Fri, 25 Oct 2024 09:00:03 -0700 Subject: [PATCH 4/9] Update --- text/0000-layout-packed-aligned.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/text/0000-layout-packed-aligned.md b/text/0000-layout-packed-aligned.md index 9c88b726f62..6c1e28d7fae 100644 --- a/text/0000-layout-packed-aligned.md +++ b/text/0000-layout-packed-aligned.md @@ -64,9 +64,9 @@ the resulting layout matches the target OS ABI. For example, given: ```c -#[repr(C, align(4))] +#[repr(system, align(4))] struct Foo(u8); -#[repr(C, packed(1))] +#[repr(system, packed(1))] struct Bar(Foo); ``` `align_of::<Bar>()` would be 4 for `*-pc-windows-msvc` and `*-pc-windows-gnu`. It would be 1 for everything else. From 175d29fabbe696ab2a54a8c6ef9e23542da766fb Mon Sep 17 00:00:00 2001 From: Zhixing Zhang Date: Tue, 29 Oct 2024 09:05:11 -0700 Subject: [PATCH 5/9] Updates --- text/0000-layout-packed-aligned.md | 16 +++++++++++----- 1 file changed, 11 insertions(+), 5 deletions(-) diff --git a/text/0000-layout-packed-aligned.md b/text/0000-layout-packed-aligned.md index 6c1e28d7fae..8cb8be18c59 100644 --- a/text/0000-layout-packed-aligned.md +++ b/text/0000-layout-packed-aligned.md @@ -38,8 +38,9 @@ struct __attribute__((packed, aligned(4))) MyStruct { }; ``` -Currently `#[repr(packed(_))]` structs cannot transitively contain #[repr(align(_))] structs due to differing behavior between msvc and gcc/clang. -However, in most cases, the user would expect `#[repr(C)]` to produce a struct layout matching the same type as defined by the current target. 
+Currently, `#[repr(packed(_))]` structs cannot be `#[repr(align(_))]` or transitively contain `#[repr(align(_))]` types. Attempting to do so results in a [hard error](https://doc.rust-lang.org/nightly/error_codes/E0588.html). + +This behavior was added in the [original implementation](https://github.com/rust-lang/rust/issues/33158) of `#[repr(packed)]` due to concerns over differing behavior between msvc and gcc/clang. This makes it cumbersome or even impossible to produce C-compatible struct layouts in Rust when the corresponding C types were annotated with both `packed` and `aligned`. # Guide-level explanation [guide-level-explanation]: #guide-level-explanation @@ -71,7 +72,11 @@ struct Bar(Foo); ``` `align_of::<Bar>()` would be 4 for `*-pc-windows-msvc` and `*-pc-windows-gnu`. It would be 1 for everything else. - +## `#[repr(Rust)]` +When `align(N)` and `packed(M)` attributes exist on the same type, or when `packed` structs contain `aligned` fields, +the type will have a base alignment of `N`, while the struct fields will be laid out as if their alignment was +decreased to `M`. However, in general Rust is free to reorder +these fields for optimization purposes, and the only guarantee is that the fields will maintain a minimum alignment of `M`. # Reference-level explanation [reference-level-explanation]: #reference-level-explanation @@ -101,8 +106,9 @@ When the align and packed modifiers are applied on the same type as `#[repr(alig the alignment of the struct fields are decreased to be M. Then, the base alignment of the type is increased to be N. -When a `#[repr(packed(M))]` struct transitively contains a field with `#[repr(align(N))]` type, -- The field is first `pad_to_align`. Then, the field is added to the struct with alignment decreased to M. The packing requirement overrides the alignment requirement. 
(GCC, `#[repr(Rust)]`, `#[repr(C)]` on gnu targets, `#[repr(system)]` on non-windows targets) +When a `#[repr(packed(M))]` struct transitively contains a field with `#[repr(align(N))]` type, depending on the +target triplet, either: +- The field is first `pad_to_align`. Then, the field is added to the struct with alignment decreased to M. The packing requirement overrides the alignment requirement. (GCC, `#[repr(Rust)]`, `#[repr(C)]` on gnu targets, `#[repr(system)]` on non-windows targets), or - The field is added to the struct with alignment increased to N. The alignment requirement overrides the packing requirement. (MSVC, `#[repr(C)]` on msvc targets, `#[repr(system)]` on windows targets) # Drawbacks From a53aab1607fefe1399622670e1e3034b4ee7e0bb Mon Sep 17 00:00:00 2001 From: Zhixing Zhang Date: Tue, 29 Oct 2024 09:08:33 -0700 Subject: [PATCH 6/9] Updates --- text/0000-layout-packed-aligned.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/text/0000-layout-packed-aligned.md b/text/0000-layout-packed-aligned.md index 8cb8be18c59..831a7315759 100644 --- a/text/0000-layout-packed-aligned.md +++ b/text/0000-layout-packed-aligned.md @@ -74,8 +74,8 @@ struct Bar(Foo); ## `#[repr(Rust)]` When `align(N)` and `packed(M)` attributes exist on the same type, or when `packed` structs contain `aligned` fields, -the type will have a base alignment of `N`, while the struct fields will be laid out as if their alignment was -decreased to `M`. However, in general Rust is free to reorder +the type will have its base alignment increased to `N`, while the struct fields will be laid out as if their +alignments were decreased to `M`. However, in general Rust is free to reorder these fields for optimization purposes, and the only guarantee is that the fields will maintain a minimum alignment of `M`. 
# Reference-level explanation From 5190399bddb30e734096d79be0696457cc0bda35 Mon Sep 17 00:00:00 2001 From: Zhixing Zhang Date: Mon, 4 Nov 2024 14:26:23 -0800 Subject: [PATCH 7/9] Update text/0000-layout-packed-aligned.md Co-authored-by: Christopher Durham --- text/0000-layout-packed-aligned.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/text/0000-layout-packed-aligned.md b/text/0000-layout-packed-aligned.md index 831a7315759..8ec7edf2a75 100644 --- a/text/0000-layout-packed-aligned.md +++ b/text/0000-layout-packed-aligned.md @@ -108,8 +108,8 @@ increased to be N. When a `#[repr(packed(M))]` struct transitively contains a field with `#[repr(align(N))]` type, depending on the target triplet, either: -- The field is first `pad_to_align`. Then, the field is added to the struct with alignment decreased to M. The packing requirement overrides the alignment requirement. (GCC, `#[repr(Rust)]`, `#[repr(C)]` on gnu targets, `#[repr(system)]` on non-windows targets), or -- The field is added to the struct with alignment increased to N. The alignment requirement overrides the packing requirement. (MSVC, `#[repr(C)]` on msvc targets, `#[repr(system)]` on windows targets) +- The field is added to the struct with alignment decreased to M. The packing requirement overrides the alignment requirement. (This is the case for GCC, `#[repr(Rust)]`, `#[repr(C)]` on gnu targets, and `#[repr(system)]` on non-windows targets.) +- The field is added to the struct with alignment decreased to M and then increased to N. The alignment requirement overrides the packing requirement. (This is the case for MSVC, `#[repr(C)]` on msvc targets, `#[repr(system)]` on windows targets.) 
# Drawbacks [drawbacks]: #drawbacks From a12370d073b82d77d9eabcc74704935bad3bae8e Mon Sep 17 00:00:00 2001 From: Zhixing Zhang Date: Tue, 12 Nov 2024 09:44:08 -0800 Subject: [PATCH 8/9] Update drawbacks --- text/0000-layout-packed-aligned.md | 17 ++++++++++++++++- 1 file changed, 16 insertions(+), 1 deletion(-) diff --git a/text/0000-layout-packed-aligned.md b/text/0000-layout-packed-aligned.md index 8ec7edf2a75..aa5f0524b25 100644 --- a/text/0000-layout-packed-aligned.md +++ b/text/0000-layout-packed-aligned.md @@ -114,7 +114,9 @@ target triplet, either: # Drawbacks [drawbacks]: #drawbacks -Historically the meaning of `#[repr(C)]` has been somewhat ambiguous. When someone puts `#[repr(C)]` on their struct, their intention could be one of three things: +Although [https://doc.rust-lang.org/reference/type-layout.html#the-c-representation](the Rust reference) documents the meaning +of repr(C) quite clearly (types are laid out linearly, according to a fixed algorithm.), when you see `#[repr(C)]` in code, +its meaning can be somewhat ambiguous. When someone puts `#[repr(C)]` on their struct, their intention could be one of three things: 1. Having a target-independent and stable representation of the data structure for storage or transmission. 2. FFI with C and C++ libraries compiled for the same target. 3. Interoperability with operating system APIs. @@ -125,6 +127,19 @@ that there exists some C layouts that cannot be specified using `#[repr(C)]`. This RFC addresses use case 2 with `#[repr(C)]` and use case 3 with `#[repr(system)]`. For use case 1, people will have to seek alternative solutions such as `crABI` or protobuf. However, it could be a footgun if people continue to use `#[repr(C)]` for use case 1. 
+It is worth noting that while this RFC does require people to stop treating `repr(C)` as a linear layout and to treat it instead as an +ABI compatibility layout, our intention is not to propose a breaking change: `packed` structs were previously banned from +transitively containing `aligned` fields, so in most cases existing `repr(C)` structs will be laid out in exactly the same +way as they were before. However, due to an oversight in the current implementation of the Rust compiler, the restriction +can actually be +[circumvented](https://github.com/rust-lang/rust/issues/100743#issuecomment-1229343705) using generics. Applications +using this pattern to circumvent the restriction will see a change in the struct layout on MSVC targets. + +This RFC alone still doesn't make `repr(C)` fully match the target (MSVC) toolchain in all cases; the other known +divergences are enums with overflowing discriminants and how a field of type `[T; 0]` is handled. So while this does +improve parity, there are still edge cases to keep track of for now. These cases shall be addressed +in future RFCs. 
+ # Rationale and alternatives From 2ed92ef75ad416edd5f950d3742ef7944542675f Mon Sep 17 00:00:00 2001 From: Zhixing Zhang Date: Thu, 21 Nov 2024 11:29:13 -0800 Subject: [PATCH 9/9] Update 0000-layout-packed-aligned.md --- text/0000-layout-packed-aligned.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/text/0000-layout-packed-aligned.md b/text/0000-layout-packed-aligned.md index aa5f0524b25..3cf6bb0074b 100644 --- a/text/0000-layout-packed-aligned.md +++ b/text/0000-layout-packed-aligned.md @@ -114,8 +114,8 @@ target triplet, either: # Drawbacks [drawbacks]: #drawbacks -Although [https://doc.rust-lang.org/reference/type-layout.html#the-c-representation](the Rust reference) documents the meaning -of repr(C) quite clearly (types are laid out linearly, according to a fixed algorithm.), when you see `#[repr(C)]` in code, +Although [The Rust reference](https://doc.rust-lang.org/reference/type-layout.html#the-c-representation) documents the meaning +of repr(C) quite clearly (types are laid out linearly, according to a fixed algorithm), when you see `#[repr(C)]` in code, its meaning can be somewhat ambiguous. When someone puts `#[repr(C)]` on their struct, their intention could be one of three things: 1. Having a target-independent and stable representation of the data structure for storage or transmission. 2. FFI with C and C++ libraries compiled for the same target.
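The alignment rules in the reference-level explanation of this series can be sketched as a small layout calculator. This is only a minimal model under stated assumptions, not rustc's implementation: the names `Rule`, `Field`, `Layout` and `layout` are hypothetical, fields are laid out in declaration order with no reordering, and a field's transitive `align(N)` is passed in by hand as `repr_align`. It reproduces the `Foo` example from the Motivation section and the `Bar(Foo)` example from the guide-level explanation.

```rust
/// Which of the two rules applies (hypothetical names, not part of the RFC).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Rule {
    /// Packing overrides alignment (the GCC-style rule).
    PackedWins,
    /// Alignment overrides packing (the MSVC-style rule).
    AlignWins,
}

/// A field described by its size, its type's alignment, and the `N` of an
/// `align(N)` attribute on its type, if any.
#[derive(Clone, Copy)]
struct Field {
    size: usize,
    align: usize,
    repr_align: Option<usize>,
}

#[derive(Debug, PartialEq)]
struct Layout {
    size: usize,
    align: usize,
    offsets: Vec<usize>,
}

/// "Decreasing m to n" as defined in the RFC text.
fn decrease(m: usize, n: usize) -> usize {
    if m > n { n } else { m }
}

/// "Increasing m to n" as defined in the RFC text.
fn increase(m: usize, n: usize) -> usize {
    if m < n { n } else { m }
}

/// Round `offset` up to the next multiple of `align`.
fn round_up(offset: usize, align: usize) -> usize {
    (offset + align - 1) / align * align
}

/// Lay out fields in declaration order under `packed(M)` and/or `align(N)`.
fn layout(fields: &[Field], packed: Option<usize>, align: Option<usize>, rule: Rule) -> Layout {
    let mut offset = 0;
    let mut struct_align = 1;
    let mut offsets = Vec::new();
    for f in fields {
        let mut a = f.align;
        if let Some(m) = packed {
            // packed(M) decreases the field alignment to M.
            a = decrease(a, m);
            // Under the MSVC-style rule, an aligned field is then increased to N.
            if rule == Rule::AlignWins {
                if let Some(n) = f.repr_align {
                    a = increase(a, n);
                }
            }
        }
        offset = round_up(offset, a);
        offsets.push(offset);
        offset += f.size;
        struct_align = increase(struct_align, a);
    }
    // align(N) on the struct itself increases the base alignment to N.
    if let Some(n) = align {
        struct_align = increase(struct_align, n);
    }
    Layout { size: round_up(offset, struct_align), align: struct_align, offsets }
}

fn main() {
    // #[repr(C, packed(2), align(4))] struct Foo { a: u8, b: u32 }
    let a = Field { size: 1, align: 1, repr_align: None };
    let b = Field { size: 4, align: 4, repr_align: None };
    let foo = layout(&[a, b], Some(2), Some(4), Rule::PackedWins);
    println!("Foo: {:?}", foo);

    // #[repr(C, align(4))] struct Foo(u8); #[repr(C, packed(1))] struct Bar(Foo);
    let inner = Field { size: 4, align: 4, repr_align: Some(4) };
    println!("Bar, packing wins: {:?}", layout(&[inner], Some(1), None, Rule::PackedWins));
    println!("Bar, alignment wins: {:?}", layout(&[inner], Some(1), None, Rule::AlignWins));
}
```

Keeping the rule as an explicit parameter, instead of deriving it from a target triple, keeps the sketch independent of how `#[repr(C)]` and `#[repr(system)]` would map targets to rules.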