Bad codegen for cloning a boxed array #41160
Comments
Seems like inlining does not work well enough. |
EDIT: Not very relevant, since MIR optimisations are not used at the moment anyhow.
So I tried to look into this with my limited knowledge of the compiler internals, both with the clone function from the first post in this issue and with a test function that simply returns a boxed array: https://gist.github.com/oyvindln/bda3c6efb9c428c8bcf12acee8f3f223
Using For the test code in the first post in this issue, the assembly output seems identical with and without MIR inlining, though the MIR is a bit different. The MIR output seems to suggest that the copy propagation pass doesn't manage to turn this into this:
|
A workaround for this issue, while box syntax is still unstable, would be for the compiler to special case Box::new(x) to box x. This may also have some use to avoid stack overflows with large boxed values for non-optimised builds. |
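Until something like that exists, one way to keep a large value off the stack on stable Rust is to build it on the heap from the start. The sketch below is not the special-casing proposed above but a different technique: it allocates through Vec and converts the boxed slice into a boxed array with the standard TryFrom conversion. The name boxed_zeroes is made up for illustration, and the generated code has not been inspected here.

```rust
use std::convert::TryInto;

/// Hypothetical helper (not from the issue): builds a zeroed Box<[u16; 100]>
/// without first creating a [u16; 100] value in the source.
pub fn boxed_zeroes() -> Box<[u16; 100]> {
    // vec! allocates the buffer on the heap and fills it there.
    let slice: Box<[u16]> = vec![0u16; 100].into_boxed_slice();
    // Box<[T]> -> Box<[T; N]> succeeds only when the length matches N.
    let result: Result<Box<[u16; 100]>, _> = slice.try_into();
    match result {
        Ok(array) => array,
        Err(_) => unreachable!("the slice was created with exactly 100 elements"),
    }
}
```

The length check happens at run time, which is the price of going through a slice; for a fixed, known length the error branch can never be taken.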
On the Rust side, copy propagation of arrays in MIR is not happening simply because it is not implemented for the "Repeat" (array?) rvalue MIR type. As for the inlining pass, I don't know if more should be done. I don't know what's keeping LLVM from optimising this properly. |
Is it possible to fix this by using MIR copy propagation for now? |
So I found that adding *& evades the extra stack copy:

pub fn create_boxed_array_deref() -> Box<[u16; 100]> {
    Box::new(*&[0u16; 100])
}

(This doesn't work with references.)

pub fn create_boxed_array_deref() -> Box<[u16; 100]> {
    const A: [u16; 100] = [0; 100];
    Box::new(A)
}

With a struct containing an array (at least a simple one), there's some similar behaviour:

pub struct WrappedArray {
    pub val: [u16; 100],
}

impl WrappedArray {
    fn new() -> WrappedArray {
        WrappedArray {
            val: [0; 100],
        }
    }
}

impl Default for WrappedArray {
    #[inline(always)]
    fn default() -> WrappedArray {
        WrappedArray {
            val: [100; 100],
        }
    }
}

impl Copy for WrappedArray {}

impl Clone for WrappedArray {
    fn clone(&self) -> WrappedArray { *self }
}

pub fn boxed_struct() -> Box<WrappedArray> {
    // Each expression below is an alternative function body; try one at a time.

    // Makes stack copies:
    Box::new(WrappedArray::new())
    Box::new(WrappedArray::new().clone())
    Box::new(WrappedArray { val: [0; 100] })

    // Does not make stack copies:
    const A: WrappedArray = WrappedArray { val: [0; 100] };
    Box::new(A)
    Box::new(WrappedArray { val: [0; 100] }.clone())
    Box::new(WrappedArray::default())
    Box::new(*&WrappedArray { val: [0; 100] })
    // And anything with box syntax
}

So there seems to be some oddness with copy propagation of array types. I couldn't get any of this to work when copying from a reference like in the original example (other than by calling Box::clone()). |
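To compare these variants one at a time (for example in a compiler explorer), it may help to give each its own function. The following is a minimal sketch that splits three of the cases above into separate functions; the function names are made up for illustration, and Clone/Copy are derived instead of hand-written to keep it short. It only shows that the variants compile and produce the same value; it makes no claim about which ones copy through the stack on any particular compiler version.

```rust
#[derive(Clone, Copy)]
pub struct WrappedArray {
    pub val: [u16; 100],
}

// Struct literal passed directly (reported above as making stack copies).
pub fn boxed_from_literal() -> Box<WrappedArray> {
    Box::new(WrappedArray { val: [0; 100] })
}

// Value taken from a const (reported above as not making stack copies).
pub fn boxed_from_const() -> Box<WrappedArray> {
    const A: WrappedArray = WrappedArray { val: [0; 100] };
    Box::new(A)
}

// Dereference of a temporary reference (reported above as not making stack copies).
pub fn boxed_from_deref() -> Box<WrappedArray> {
    Box::new(*&WrappedArray { val: [0; 100] })
}
```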
This promotes the array to a static variable rather than initialising it on the stack. |
There seem to be some issues with eliminating array temporaries in general. Something as simple as this:

#[inline]
fn write_v(val: [u64; 30], to: &mut [u64; 30]) {
    *to = val;
}

pub fn write_to_array(to: &mut [u64; 30]) {
    write_v([55; 30], to);
    // This optimizes fine on the other hand:
    // *to = [55; 30];
}

EDIT: Even simpler:

pub fn write_to_array(to: &mut [u64; 30]) {
    let v = [55; 30];
    *to = v;
} |
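When the goal is only to fill an existing array with one value, the temporary can be avoided at the source level by writing through the reference instead of assigning a freshly built array. This is a sketch using the standard slice method fill; the function name is hypothetical, and the resulting assembly has not been compared against the versions above.

```rust
/// Hypothetical variant of write_to_array that never names an array temporary.
pub fn write_to_array_in_place(to: &mut [u64; 30]) {
    // fill comes from the slice API; the array reference coerces to a slice.
    to.fill(55);
}
```

This does not address the missed optimization itself; it only avoids the pattern that triggers it.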
Suspecting it's the loop that fills the array that's causing issues:
Running the llvm optimizer with |
It looks like both clang and gcc have issues with similar things in C++:
I would think making Box::new an intrinsic/compiler built-in might make sense here, as that would help avoid blowing up the stack in unoptimized code (otherwise 3x the size of the type will be reserved on the stack), though that doesn't solve the underlying issue. |
C++ does not have an issue when returning large arrays from functions, thanks to copy elision (https://en.cppreference.com/w/cpp/language/copy_elision):
Using an assignment loop would cause an extra memset at the beginning of the function:
Rust unfortunately has the extra memcpy regardless:
True copy elision would definitely be a killer feature in Rust, because one would no longer have to box anything for the sole purpose of avoiding the extra memcpy. If such a feature were implemented, it would be interesting to see the performance boost in existing Rust applications. |
Update: The above issue appears to be specific to the array initialization shorthand in Rust. I have filed it separately as #56882. |
https://rust.godbolt.org/z/dab87z Optimized well on nightly as a result of #82806. |
For this code (https://is.gd/JbfHAI), the generated code copies the data to the stack first, before allocating and copying it again to the heap.