-
Notifications
You must be signed in to change notification settings - Fork 82
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement special swizzles for masks and remove {to,from}_bitmask_vector
#423
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good enough, though I think we should re-add conversions from arbitrary sized Mask
from/to integers to make >64 element vectors more ergonomic, maybe when Rust finally gains uint<N>
types.
Is this a bug in the code, or in codegen? If the code behaves correctly with Miri but gives the wrong result with codegen then that seems like a critical bug to me that needs investigation. |
unsafe { | ||
// Compute the bitmask | ||
let mut bytes: <LaneCount<N> as SupportedLaneCount>::BitMask = | ||
core::intrinsics::simd::simd_bitmask(self.0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This intrinsic still has some other uses... are those working fine with non-power-of-2?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Answered here:
We still use the intrinsic, but currently (not yet synced to rust-lang/rust) we only use integer return types
Are vectors with more than 64 elements a thing? Miri currently supports simd_bitmask and simd_select_bitmask only for vectors up to size 64, which I thought was sufficient since this crate also only goes up to 64. |
Did things break only on big-endian or also on little-endian? |
yes, e.g. RISC-V V theoretically supports vectors with thousands of elements. NEC's SX-Aurora supports
portable-simd only supports 64 elements only because that's where we stopped for now since we currently have to implement some things for each size manually, the long term plan is for the max size to be much larger. This is kinda like how Rust used to only implement |
Bitmask vectors are a bit of a compromise, to allow bitmasks to be useful on vectors with length >64. They are wasteful, since they are
Simd<u8, N>
, 8x bigger than they need to be in order to work well with generics.Additionally, in #422, I'm finding that bitmask vector codegen doesn't work for non-powers-of-two, and even with a workaround, seems to have incorrect codegen on a handful of architectures.
In this PR I add an
extract
swizzle, and implement all of the special swizzles for masks, which I believe allows us to remove bitmask vectors. With a hypothetical 128-element vector, you could do: