Bad codegen for cloning a boxed array #41160

arthurprs · 2017-04-08T14:09:24Z

#![crate_type = "lib"]
pub extern fn clone(bits: &mut Box<[u64; 1024]>) -> Box<[u64; 1024]> {
  Box::new(**bits)
}

It copies data to the stack first, before allocating and copying again to the heap.

.cfi_offset rbx, -16
	mov	rsi, qword ptr [rdi]
	lea	rdi, [rsp]
	mov	edx, 8192
	call	memcpy@PLT
	mov	edi, 8192
	mov	esi, 8
	call	__rust_allocate@PLT
	mov	rbx, rax
	test	rbx, rbx
	je	.LBB0_2
	lea	rsi, [rsp]
	mov	edx, 8192
	mov	rdi, rbx
	call	memcpy@PLT
	mov	rax, rbx
	add	rsp, 8192
	pop	rbx
	ret

The text was updated successfully, but these errors were encountered:

nagisa · 2017-04-08T18:07:40Z

Seems like inlining does not work good enough. box **bits produces good enough code.

oyvindln · 2017-05-11T14:19:18Z

EDIT: Not very relevant since mir optimisations are not used at the moment anyhow.

So I tried to look into this with my limited knowledge of the compiler internals, both with the clone function from the first post in this issue, and a simple test function that simply returns a boxed array:

https://gist.github.com/oyvindln/bda3c6efb9c428c8bcf12acee8f3f223

Using -Z mir-opt-level=3, which enables mir inlining does seem to get rid of one memcpy call with my simplified test code, but not the second. If I understand the assembly correctly, it looks like without mir inlining, there is actually reserved space for 2 arrays on the stack (.cfi_def_cfa_offset is 416, vs 216), and one is memcpied to the other one. So, even though it looks to be inlined by LLVM, LLVM isn't able to get rid of the temporaries. The function using box syntax instead avoids the temporary stack usage entirely.

For the test code in the first post in this issue, the assembly output seems identical with and without mir inlining, though the mir is a bit different. The mir output seems to suggest that the copy propagation pass doesn't manage to turn this into this:

DEBUG:rustc_mir::transform::copy_prop: Considering destination local: _1
DEBUG:rustc_mir::transform::copy_prop: Can't copy-propagate local: source use is not an assignment

log of the pass

oyvindln · 2017-05-11T15:57:37Z

A workaround for this issue, while box syntax is still unstable, would be for the compiler to special case Box::new(x) to box x. This may also have some use to avoid stack overflows with large boxed values for non-optimised builds.

oyvindln · 2017-05-13T20:45:42Z

For the rust side, copy propagation of arrays in MIR is not happening simply because it is not implemented for the "Repeat" (array?) rvalue mir type. As for the inlining pass, I don't know if more should be done.

I don't know what's keeping LLVM from optimising this properly.

arthurprs · 2017-05-19T10:36:25Z

Is it possible to fix this by using mir copy propagation for now?

oyvindln · 2017-08-30T09:42:06Z

So I found that adding *& like this (forces a move?):

pub fn create_boxed_array_deref() -> Box<[u16; 100]> {
    Box::new(*&[0u16; 100])
}

evades the extra stack copy. (This doesn't work with references.)
Using a const value as the parameter also avoids it:

pub fn create_boxed_array_deref() -> Box<[u16; 100]> {
    const A: [u16; 100] = [0;100];
    Box::new(A)
}

With an struct containing an array (at least a simple one), it there's some similar behavoiur:

pub struct WrappedArray {
    pub val: [u16; 100]
}

impl WrappedArray {
    fn new() -> WrappedArray {
        WrappedArray {
            val: [0; 100],
        }
    }
}

impl Default for WrappedArray {
    #[inline(always)]
    fn default() -> WrappedArray {
        WrappedArray {
            val: [100; 100],
        }
    }
}

impl Copy for WrappedArray {}

impl Clone for WrappedArray {
    fn clone(&self) -> WrappedArray { *self }
}

pub fn boxed_struct() -> Box<WrappedArray> {
    // Makes stack copies:
    Box::new(WrappedArray::new())
    Box::new(WrappedArray::new().clone())
    Box::new(WrappedArray { val: [0; 100] })

   // Does not make stack copies:
   const A: WrappedArray = WrappedArray { val: [0; 100] };
   Box::new(A)
   Box::new(WrappedArray { val: [0; 100] }.clone())
   Box::new(WrappedArray::default())
   Box::new(*&WrappedArray { val: [0; 100] })
   // And anything with box syntax
}

So there seems to be some oddness with copy-propagation of array types. I couldn't get any of this to work when copying from a reference like in the original example (other than doing box::clone()).

nagisa · 2017-08-30T10:11:07Z

So I found that adding *& like this (forces a move?):

This promotes the array to a static variable rather than initialising it on the stack.

oyvindln · 2017-09-02T11:09:36Z

There seems to be some issues with eliminating array temporaries in general. Something as simple as this:

example

#[inline]
fn write_v(val: [u64;30], to: &mut [u64;30]) {
    *to = val;
}

pub fn write_to_array(to: &mut [u64;30]) {
    write_v([55;30], to);
    // This optimizes fine on the other hand:
    // *to = [55;30];
}

is enough to give the optimizer issues and results in a stack temporary.

EDIT: Even simpler:

pub fn write_to_array(to: &mut [u64;30]) {
    let v = [55;30];
    *to = v;
}

oyvindln · 2017-09-03T08:42:31Z

Suspecting it's the loop that fills the array that's causing issues: -Z llvm-debug gives


Can't analyze slices for alloca:   %v = alloca [30 x i64], align 8
  A pointer to this alloca escaped by:
    %6 = icmp eq i64* %5, %3

Running the llvm optimizer with -O3 directly on llvm-ir ouput from rustc with -C opt-level=1 get's rid of the stack temporary completely, so maybe there is a pass ordering issue here?
EDIT: Seems like that is due to the loop being unrolled.

oyvindln · 2017-09-09T16:47:46Z

It looks like both clang and gcc have issues with similar things in C++:
https://godbolt.org/g/dAs2dr

I would think making Box::new an intrinsic/compiler-built in might make sense here, as that would help avoid blowing up the stack in unoptimized code (otherwise there will be 3x the size of the type reserved on the stack.) though that doesn't solve the underlying issue.

isegal · 2018-12-16T03:39:58Z

C++ does not have an issue when returning large arrays from functions due to copy elision (https://en.cppreference.com/w/cpp/language/copy_elision):
Ex: https://gcc.godbolt.org/z/Ci6jWs

Using an assignment loop would cause an extra memset in the beginning of the function:
https://gcc.godbolt.org/z/KKRjes

Rust unfortunately has the extra memcpy regardless:
https://gcc.godbolt.org/z/JapD8c

True copy elision would definitely be a killer feature in Rust because one would no longer have to worry about having to box anything for the sole purpose of avoiding the extra memcpy. Suppose such feature were to be implemented, it would be interesting to see the performance boost of existing Rust applications.

isegal · 2018-12-16T17:20:50Z

Update: The above issue appears to be specific to the array initialization shorthand in Rust. I have filed it separately #56882

nikic · 2021-03-13T20:37:28Z

https://rust.godbolt.org/z/dab87z Optimized well on nightly as a result of #82806.

arthurprs changed the title ~~Bad code codegen for cloning a boxed array~~ Bad codegen for cloning a boxed array Apr 8, 2017

nagisa mentioned this issue May 8, 2017

Boxed array initialisation with Box::new does not seem to optimise properly #41831

Closed

Mark-Simulacrum added A-codegen Area: Code generation I-slow Issue: Problems and improvements with respect to performance of generated code. labels Jun 23, 2017

Mark-Simulacrum added C-enhancement Category: An issue proposing an enhancement or a PR with one. and removed C-enhancement Category: An issue proposing an enhancement or a PR with one. labels Jul 26, 2017

arthurprs mentioned this issue Oct 9, 2017

Optimizations not applied when separate variables are used #45126

Closed

oyvindln mentioned this issue Jan 27, 2018

LZOxide too large for Windows Frommi/miniz_oxide#21

Closed

abusch mentioned this issue Mar 1, 2018

Future plans for this crate Twinklebear/light_arena#2

Closed

arthurprs mentioned this issue Apr 19, 2018

Box::new(expr) first puts expr on the stack, then copies. #50047

Closed

oyvindln mentioned this issue Jun 27, 2018

Tracking issue for box_syntax #49733

Closed

aweinstock314 mentioned this issue Nov 21, 2019

Consider supporting non-Copy scalar types. dimforge/nalgebra#282

Open

jonas-schievink added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Jan 31, 2020

nikic closed this as completed Mar 13, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bad codegen for cloning a boxed array #41160

Bad codegen for cloning a boxed array #41160

arthurprs commented Apr 8, 2017

nagisa commented Apr 8, 2017

oyvindln commented May 11, 2017 •

edited

Loading

oyvindln commented May 11, 2017

oyvindln commented May 13, 2017

arthurprs commented May 19, 2017

oyvindln commented Aug 30, 2017 •

edited

Loading

nagisa commented Aug 30, 2017

oyvindln commented Sep 2, 2017 •

edited

Loading

oyvindln commented Sep 3, 2017 •

edited

Loading

oyvindln commented Sep 9, 2017

isegal commented Dec 16, 2018 •

edited

Loading

isegal commented Dec 16, 2018

nikic commented Mar 13, 2021

Bad codegen for cloning a boxed array #41160

Bad codegen for cloning a boxed array #41160

Comments

arthurprs commented Apr 8, 2017

nagisa commented Apr 8, 2017

oyvindln commented May 11, 2017 • edited Loading

oyvindln commented May 11, 2017

oyvindln commented May 13, 2017

arthurprs commented May 19, 2017

oyvindln commented Aug 30, 2017 • edited Loading

nagisa commented Aug 30, 2017

oyvindln commented Sep 2, 2017 • edited Loading

oyvindln commented Sep 3, 2017 • edited Loading

oyvindln commented Sep 9, 2017

isegal commented Dec 16, 2018 • edited Loading

isegal commented Dec 16, 2018

nikic commented Mar 13, 2021

oyvindln commented May 11, 2017 •

edited

Loading

oyvindln commented Aug 30, 2017 •

edited

Loading

oyvindln commented Sep 2, 2017 •

edited

Loading

oyvindln commented Sep 3, 2017 •

edited

Loading

isegal commented Dec 16, 2018 •

edited

Loading