Should "pairs of scalars" use LLVM vectors? #48426

eddyb · 2018-02-22T11:48:28Z

As seen on reddit, we could pass e.g. (f32, f32) not as two floats, but as one <2 x float>.
It's unclear how well LLVM would be able to deal with it.

cc @dotdash @alexcrichton


hanna-kruppe · 2018-02-22T12:23:35Z

I'm going to assume this would only affect parameter passing, and that the callee would immediately extractelement the vector into two scalars. Otherwise this seems like too big a burden on our codegen (and most code would still need the extractelements, but ad hoc when needed instead of once on entry).
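As a rough source-level model of that assumption (the real change would live in rustc's LLVM IR generation, not in user code), the two shapes might look like the sketch below. The function names are hypothetical, and `[f32; 2]` only stands in for `<2 x float>` here; Rust does not actually pass arrays as LLVM vectors.

```rust
/// Today's shape: each scalar of the pair is its own argument.
#[inline(never)]
pub fn scale_scalars(x: f32, y: f32, k: f32) -> (f32, f32) {
    (x * k, y * k)
}

/// Hypothetical shape: the pair arrives packed (modeling `<2 x float>`)
/// and is split into scalars once, on entry.
#[inline(never)]
pub fn scale_packed(pair: [f32; 2], k: f32) -> [f32; 2] {
    // Models the two `extractelement`s done immediately on entry.
    let [x, y] = pair;
    // The body itself still works on plain scalars.
    let (rx, ry) = (x * k, y * k);
    // Models rebuilding the vector for the return value.
    [rx, ry]
}
```

Both functions compute the same result; the only difference is the unpacking and repacking at the boundary, which is the extra IR discussed in this comment.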

There are two aspects here: impact on optimizations and impact on backend/codegen.

For the former, I estimate that unless the value really is a logical vector and used as such (as in the linked post), this just introduces redundant vector creations and destructuring at no benefit. More IR is not good of course, and because these are dictated by the function signature, they can't easily be removed. Additionally, they might hamper inference of attributes such as nocapture (for pointers), returned, range metadata, etc. because there's no way to attach these to the individual vector elements. That could potentially harm some interprocedural optimizations.

The advantage for optimizations is really just that if you do vector-esque operations on the elements, instcombine or the SLP vectorizer may fold the "extract - do scalar ops - build vector" sequence into a vector op. But I don't think this will be very useful:

- it is flaky: it only tilts the heuristics in favor of doing the optimization because there's no need to create the vector from scalars, but if you do anything with the pair that can't be done as a vector operation, it might decide to not vectorize anything
- it would only help for vectors of length two
- we're gaining better support for explicitly using SIMD anyway
- small functions like the one in the linked discussion should be inlined anyway, and once they are inlined, any difference that the calling convention might have made is erased

As for the backend, unless the target has a suitable vector register, the value will have to be legalized (in this case, most likely split into scalars or smaller vectors, or promoted to a longer vector). Having illegal values in the function signature is a bit more tricky because these don't just pass through the general SelectionDAG legalizer (which is generally quite robust) but are also inspected by the calling convention code, which may not be equipped to handle such types if Clang doesn't use them in function signatures. I've certainly seen more bugs there than when legalizing types within a function, especially in less mainstream backends (i.e., pretty much everything except x86 and Arm).

However, when a suitable vector register is available (or the backend legalizes it well) then this would generally pass the pair in that register, which would be nice for avoiding stack traffic. But then again, passing in a vector register means you need extra instructions if the scalars in the pair ever need to, well, be in scalar registers. I expect that to be the most common case.

tl;dr Passing vectors maybe helps optimizing literal vector types of length two a bit, and passes them a bit more efficiently, but in all other cases makes the IR more opaque, risks unearthing backend bugs, and likely causes redundant data traffic before and after calls.

hanna-kruppe · 2018-02-22T12:29:59Z

I guess having extra registers for passing pairs could still be beneficial for functions that receive a scalar pair and pass it on without ever operating on the contained scalars (i.e., they just forward it, or only copy it around). Not sure how common that is, though, and it still requires traffic between vector and scalar registers when the pair is built by the caller and when it's finally used by other code.

gnzlbg · 2018-03-28T12:14:50Z

@eddyb how is this not just a bug in LLVM's SLP vectorizer?

hanna-kruppe · 2018-03-28T12:38:31Z

@gnzlbg The vectorizers can't change the function's ABI, which is the core of the issue here. You're right that within a function you can recognize and vectorize straight-line code with data parallelism, but if the data is not yet in the right format for that, you pay for converting to and from the vector representation.

gnzlbg · 2018-03-28T12:47:39Z

> The vectorizers can't change the function's ABI which is the core of the issue here.

Aren't both ABIs equivalent in this case? If so, LLVM should be able to leverage that. Otherwise that would be another LLVM bug.

For the case that @eddyb was talking about everything is inlined, so the ABI shouldn't really matter.

hanna-kruppe · 2018-03-28T13:41:14Z

ABI is a huge, complicated mess, but the short story is this: if we emit a function in IR that takes two float arguments, then on x86 those two arguments are passed in two different xmm registers, whereas if it takes one <2 x float> argument, it's passed in one xmm register. So the ABI is different, end of story.

> For the case that @eddyb was talking about everything is inlined, so the ABI shouldn't really matter.

Which case are you referring to? The reddit discussion focuses on a trivial function in isolation (taken from the blog post discussed, which contains a few more examples but they're all similarly trivial):

pub struct Vec2 {
    pub x: f32,
    pub y: f32
}

pub fn add (a: Vec2, b: Vec2) -> Vec2 {
    Vec2 {
        x: a.x + b.x,
        y: a.y + b.y
    }
}
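To make the ABI question concrete, here is a minimal sketch of that function with a call that is asked to stay out of line. The `#[inline(never)]` attribute, the derives, and the `sum_of_two` caller are assumptions added for illustration; they are not from the blog post.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Vec2 {
    pub x: f32,
    pub y: f32,
}

// Assumption: `#[inline(never)]` is added only to ask the compiler to keep
// this call out of line, so that argument passing is actually exercised.
#[inline(never)]
pub fn add(a: Vec2, b: Vec2) -> Vec2 {
    Vec2 {
        x: a.x + b.x,
        y: a.y + b.y,
    }
}

// Hypothetical caller: how `a` and `b` reach `add` is decided by the
// calling convention, not by any vectorizer running inside `add`.
pub fn sum_of_two() -> Vec2 {
    add(Vec2 { x: 1.0, y: 2.0 }, Vec2 { x: 3.0, y: 4.0 })
}
```

Once `add` is inlined into `sum_of_two`, there is no call left and the choice of calling convention stops mattering for this pair of functions.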

gnzlbg · 2018-03-28T14:50:02Z

> Which case are you referring to?

This: https://godbolt.org/g/CyiLFs

Independently of the ABI specified for add (which is a problem on its own), what bothers me is that in the foo function LLVM does not transform that scalar code to a single addps.

hanna-kruppe · 2018-03-28T14:59:32Z

OK, good to know, but I don't think that's in any way related to the ABI decision that this issue is about.

gnzlbg · 2018-03-28T15:07:04Z

I think it is very relevant. If the code were optimized properly, which ABI we choose would not matter for crate-private or #[inline] functions. It would only matter for crate-public non-#[inline] functions when ThinLTO or FatLTO are disabled.

The functions in the reddit post are all tiny, I would expect them to often be inlined, or marked with #[inline] when exported by a crate, and thus to not be affected by this.
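A small sketch of the three categories I mean; all names here are hypothetical, and whether inlining actually happens still depends on the optimization level and the inliner's heuristics:

```rust
pub struct Pair(pub f32, pub f32);

// Crate-private: every caller is visible to the optimizer, so the call can
// be inlined or its signature adjusted without affecting other crates.
fn sum_private(p: Pair) -> f32 {
    p.0 + p.1
}

// Exported with `#[inline]`: the body is made available to downstream
// crates, so they can inline it instead of making a real call.
#[inline]
pub fn sum_inline(p: Pair) -> f32 {
    p.0 + p.1
}

// Exported without `#[inline]`: without LTO, downstream crates typically
// make a real call, so this is the case where the ABI choice is visible.
pub fn sum_exported(p: Pair) -> f32 {
    p.0 + p.1
}

pub fn use_all() -> f32 {
    sum_private(Pair(1.0, 2.0)) + sum_inline(Pair(3.0, 4.0)) + sum_exported(Pair(5.0, 6.0))
}
```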

hanna-kruppe · 2018-03-28T15:10:27Z

This issue matters for every call (that passes a scalar pair) that isn't inlined, of which there are many. What you describe has a completely different root cause, and likely a completely different fix. So I don't see the point in discussing it here; it should be a separate issue.

gnzlbg · 2018-03-28T15:21:55Z

I fully agree with this:

> The advantage for optimizations is really just that if you do vector-esque operations on the elements, instcombine or the SLP vectorizer may fold the "extract - do scalar ops - build vector" sequence into a vector op. But I don't think this will be very useful

steveklabnik · 2019-07-23T13:13:33Z

Triage: not sure if anything is going on here, or if this issue is pulling its weight. That is, this feels like a small discussion from a year ago, with no clear resolution.

gnzlbg · 2019-07-23T13:50:53Z

I think that the current default is better, and therefore we should not change it.

The current default is unsurprising: user writes scalar code and gets scalar semantics, while users that explicitly use logical vectors get vector semantics. One can change from one to the other and what happens is explicit in the code.
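For illustration, here is a sketch of what "explicit in the code" can look like, assuming an x86_64 target and the `std::arch` SSE intrinsics. The function names are hypothetical, the pair is padded to four lanes only because `__m128` holds four floats, and other targets fall back to the scalar version.

```rust
/// Scalar code: the user gets scalar semantics.
pub fn add_pair_scalar(a: [f32; 2], b: [f32; 2]) -> [f32; 2] {
    [a[0] + b[0], a[1] + b[1]]
}

/// Explicit vector code: the user opts into a vector add.
#[cfg(target_arch = "x86_64")]
pub fn add_pair_explicit(a: [f32; 2], b: [f32; 2]) -> [f32; 2] {
    use std::arch::x86_64::{_mm_add_ps, _mm_loadu_ps, _mm_storeu_ps};

    // Pad each pair to the four lanes of a 128-bit register.
    let pa = [a[0], a[1], 0.0, 0.0];
    let pb = [b[0], b[1], 0.0, 0.0];
    let mut out = [0.0f32; 4];

    // SAFETY: SSE is part of the x86_64 baseline, each pointer refers to
    // four valid `f32` values, and the unaligned load/store intrinsics have
    // no alignment requirement.
    unsafe {
        let sum = _mm_add_ps(_mm_loadu_ps(pa.as_ptr()), _mm_loadu_ps(pb.as_ptr()));
        _mm_storeu_ps(out.as_mut_ptr(), sum);
    }
    [out[0], out[1]]
}

/// Fallback for other targets so the sketch still compiles there.
#[cfg(not(target_arch = "x86_64"))]
pub fn add_pair_explicit(a: [f32; 2], b: [f32; 2]) -> [f32; 2] {
    add_pair_scalar(a, b)
}
```

Switching between the two is a visible change at the call site, which is the property I would like to keep.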

If we change the defaults to use vectors for pairs, at best, the user code becomes faster, and at worst we generate extra IR that might end up producing slower scalar code.

workingjubilee · 2022-07-06T03:20:13Z

I agree that this issue is not "pulling its weight" and that it rests on a somewhat shaky basis. There has been some fussing about with the ABI recently in order to undo performance regressions that were induced by previously trying to play too many games with LLVM's understanding of the code; see #85265.

There is almost certainly some profit to be had, but it requires a more sophisticated plan-of-attack which would be, essentially, off-topic here.

CryZe mentioned this issue Feb 22, 2018

Struct Layout Optimization in Rust 1.18 regressed C ABI #48433

Closed

eddyb added the A-codegen (Area: Code generation) label and removed the A-LLVM (Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.) label Feb 22, 2018

jkordish added the C-enhancement Category: An issue proposing an enhancement or a PR with one. label Apr 18, 2018

workingjubilee closed this as not planned Jul 6, 2022
