-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Should "pairs of scalars" use LLVM vectors? #48426
Comments
I'm going to assume this would only impact parameter passing and the callee would immediately There's two aspects here, impact on optimizations and impact on backend/codegen. For the former, I estimate that unless the value really is a logical vector and used as such (as in the linked post), this just introduces redundant vector creations and destructuring at no benefit. More IR is not good of course, and because these are dictated by the function signature, they can't easily be removed. Additionally, they might hamper inference of attributes such as The advantage for optimizations is really just that if you do vector-esque operations on the elements, instcombine or the SLP vectorizer may fold the "extract - do scalar ops - build vector" sequence into a vector op. But I don't think this will be very useful:
As for the backend, unless the target has a suitable vector register, the value will have to be legalized (in this case, most likely split into scalars or smaller vectors, or promoted to a longer vector). Having illegal values in the function signature is a bit more tricky because these don't just pass through the general SelectionDAG legalizer (which is generally quite robust) but are also inspected by the calling convention code, which may not be equipped to handle such types if Clang doesn't use them in function signatures. I've certainly seen more bugs there than when legalizing types within a function, especially in less mainstream backends (i.e., pretty much everything except x86 and Arm). However, when a suitable vector register is available (or the backend legalizes it well) then this would generally pass the pair in that register, which would be nice for avoiding stack traffic. But then again, passing in a vector register means you need extra instructions if the scalars in the pair ever need to, well, be in scalar registers. I expect that to be the most common case. tl;dr Passing vectors maybe helps optimizing literal vector types of length two a bit, and passes them a bit more efficiently, but in all other cases makes the IR more opaque, risks unearthing backend bugs, and likely causes redundant data traffic before and after calls. |
I guess having extra registers for passing pairs could still be beneficial for functions that receive a scalar pair and forward it without ever operating on the contained scalars (i.e., just forwards it, or only copies it around). Not sure how common that is, though, and it still requires traffic between vector and scalar registers when the pair is built by the caller and when it's finally used by other code. |
@eddyb how isn't this just a bug in LLVM SLP vectorizer? |
@gnzlbg The vectorizers can't change the function's ABI which is the core of the issue here. You're right that within a function you can recognize and vectorize straight-line code with data parallelism, but if the data is not yet in the right format for that, you pay for converting to and from the vector representaton. |
Aren't both ABIs equivalent in this case? If so, LLVM should be able to leverage that. Otherwise that would be another LLVM bug. For the case that @eddyb was talking about everything is inlined, so the ABI shouldn't really matter. |
ABI is a huge complicated mess but the short story is, if we emit a function in IR that takes two
Which case are you referring to? The reddit discussion focuses on a trivial function in isolation (taken from the blog post discussed, which contains a few more examples but they're all similarly trivial): pub struct Vec2 {
pub x: f32,
pub y: f32
}
pub fn add (a: Vec2, b: Vec2) -> Vec2 {
Vec2 {
x: a.x + b.x,
y: a.y + b.y
}
} |
This: https://godbolt.org/g/CyiLFs Independently of the ABI specified for |
OK, good to know, but I don't think that's in any way related to the ABI decision that this issue is about. |
I think it is very relevant. If the code would optimize properly, which ABI we choose would not matter for crate private or The functions in the reddit post are all tiny, I would expect them to often be inlined, or marked with |
This issue matters for every call (that passes a scalar pair) that isn't inlined, of which there are many. What you describe has a completely different root cause, and likely a completely different fix. So I don't see the point in discussing it here, it should be a separate issue. |
I fully agree with this:
|
Triage: not sure if anything is going on here, or if this issue is pulling its weight. That is, this feels like a small discussion from a year ago, with no clear resolution. |
I think that the current default is better, and therefore we should not change it. The current default is unsurprising: user writes scalar code and gets scalar semantics, while users that explicitly use logical vectors get vector semantics. One can change from one to the other and what happens is explicit in the code. If we change the defaults to use vectors for pairs, at best, the user code becomes faster, and at worst we generate extra IR that might end up producing slower scalar code. |
I agree that this issue is not "pulling its weight" and it is grounded on a somewhat shaky basis. There has been some fussing about with the ABI recently in order to undo performance regressions that were induced by previously trying to play too many games with LLVM's understanding of the code, see #85265 There is almost certainly some profit to be had, but it requires a more sophisticated plan-of-attack which would be, essentially, off-topic here. |
As seen on reddit, we could pass e.g.
(f32, f32)
not as twofloat
s, but one<2 x float>
.It's unclear how well LLVM would be able to deal with it.
cc @dotdash @alexcrichton
The text was updated successfully, but these errors were encountered: