Move probing logic into control module #578
base: master
Conversation
Force-pushed d5038e4 to d987f62
    /// Finds an empty slot in the group to insert a new item.
    #[inline]
    pub(crate) fn empty(&self) -> Option<usize> {
This is a combination of `RawTableInner::find_insert_slot_in_group` and `RawTableInner::fix_insert_slot`.
    ///
    /// This is equivalent to `empty().is_some()`.
    #[inline]
    pub(crate) fn has_empty(&self) -> bool {
This was used in `RawTableInner::find_or_find_insert_slot_inner`, `RawTableInner::find_inner`, and `RawIterHashInner::next`.
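As a rough illustration of what `empty` and `has_empty` compute, here is a scalar (non-SIMD) sketch; the `EMPTY` tag value and the slice-based signatures are assumptions for the example, not hashbrown's actual representation:

```rust
// Assumed tag value for the sketch: hashbrown's EMPTY control tag is all ones.
pub const EMPTY: u8 = 0b1111_1111;

/// Hypothetical scalar stand-in for the SIMD group search: the index of the
/// first EMPTY control byte in the group, if any.
pub fn empty(group: &[u8]) -> Option<usize> {
    group.iter().position(|&tag| tag == EMPTY)
}

/// `has_empty` is then exactly `empty().is_some()`, as the doc comment says.
pub fn has_empty(group: &[u8]) -> bool {
    empty(group).is_some()
}
```

The real implementation does this with a single group-wide SIMD comparison rather than a byte loop, but the observable result is the same.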
    }

    impl<T> GroupProbeItems<T> {
        /// Return the index of the first item that satisfies the predicate.
        pub(crate) fn find(self, mut f: impl FnMut(&T) -> bool) -> Option<usize> {
This is taken from `RawTableInner::find_or_find_insert_slot_inner`.
    }

    /// Gets the inner slice of groups.
    fn control_slice(&self) -> &[Tag] {
This method basically indicated a bug in the existing group code: we assume that there are always this many initialized tags in the control data, but that's not what `Group::static_empty` gave us: it only gave us a single group, rather than the two this would imply.

So, to help ensure that methods like this always work, and so we don't have to waltz around the concept of "we probably won't read that many groups," the static group allocations were doubled.
Actually a single group is sufficient. We only need `GROUP_SIZE - 1` trailing bytes, and the static empty table has 1 bucket. Hence a total of `GROUP_SIZE` control bytes.

Also, since `control_slice` is only used by debugging code, it only needs to include the first `bucket_mask + 1` control bytes. The trailing bytes are a mirror of the first bytes, or EMPTY in the case of tables smaller than the group size.
So, I agree that we technically only need this many bytes, but all the existing code relies on `num_ctrl_bytes`, which is defined as `bucket_mask + 1 + Group::WIDTH`, not `bucket_mask + Group::WIDTH`. So, in all cases, you need at least two groups.

I'm open to changing this, but I figured that the empty table case wasn't really worth optimising for as long as it didn't allocate anything.
So I went back and double-checked the implementation. We only need `Group::WIDTH - 1` tail bytes for probing, but we actually need `Group::WIDTH` padding bytes for `set_ctrl` (the last byte is only written but never read), hence the current definition of `num_ctrl_bytes`.

So the correct length for the static allocation should be `1 + Group::WIDTH`.
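To make the sizes concrete, here is a sketch assuming a 16-byte group width (the actual width is target-dependent), using the `num_ctrl_bytes` definition quoted above:

```rust
/// Group width for the sketch; real hashbrown picks this per target
/// (e.g. 16 for SSE2, 8 for the generic fallback).
pub const GROUP_WIDTH: usize = 16;

/// The allocation size under discussion: one tag per bucket plus a full
/// group of padding, matching the quoted definition
/// `bucket_mask + 1 + Group::WIDTH`.
pub fn num_ctrl_bytes(bucket_mask: usize) -> usize {
    bucket_mask + 1 + GROUP_WIDTH
}
```

For the static empty table (1 bucket, so `bucket_mask = 0`) this gives `1 + GROUP_WIDTH` bytes, i.e. one byte more than a single group, which is the correct static-allocation length the comment above arrives at.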
    }

    /// Verifies that the control slice is valid.
    fn assert_at_least_one_empty(&self) {
I added this while I was debugging; ultimately it's not needed, but it feels nice to have anyway. If you want, I can be more aggressive about making sure this code is optimized out.
This assert will incorrectly pass on tables smaller than the group size, since the tail bytes are EMPTY. Hence the recommendation for `control_bytes` to ignore the trailing bytes.
Wouldn't the tail bytes being empty satisfy the condition that at least one of the bytes is empty?
The problem is that it would satisfy this condition even if the table was actually full.
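To make the pitfall concrete, here is a sketch (with an assumed `EMPTY` value and a plain byte slice standing in for the control data) of a completely full 4-bucket table whose group-width tail padding is still EMPTY:

```rust
// Assumed tag value for the sketch.
pub const EMPTY: u8 = 0xFF;

/// The naive check: scan the whole control slice, tail padding included.
pub fn whole_slice_has_empty(ctrl: &[u8]) -> bool {
    ctrl.iter().any(|&t| t == EMPTY)
}

/// The recommended check: only look at the `bucket_mask + 1` real buckets.
pub fn real_buckets_have_empty(ctrl: &[u8], bucket_mask: usize) -> bool {
    ctrl[..bucket_mask + 1].iter().any(|&t| t == EMPTY)
}
```

With control bytes `[0x12, 0x34, 0x56, 0x78, EMPTY, EMPTY, EMPTY, EMPTY]` (4 full buckets, 4 padding bytes), the whole-slice check still reports an empty slot even though every real bucket is occupied; restricting the scan to the first `bucket_mask + 1` bytes gives the right answer.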
        self.control_slice()
    );
    }

    self.iter += 1;
This is the bug that made me think my code wasn't working at first: I was checking if `self.stride > self.bucket_mask`, when really, I should be checking if `self.stride > self.bucket_mask * Group::WIDTH`.

Except… that can overflow. So, since this is all for a debug assertion anyway, I decided to just add an extra field when debug assertions are enabled that tracks how many iterations have passed. That way, we can debug "infinite" loops, but we don't actually take the performance hit in release mode.
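A minimal sketch of that debug-only counter, with hypothetical field and method names (the real probe state and stride update differ):

```rust
/// Probe state with a loop guard that only exists in debug builds.
pub struct Probe {
    pub stride: usize,
    pub bucket_mask: usize,
    #[cfg(debug_assertions)]
    iter: usize,
}

impl Probe {
    pub fn new(bucket_mask: usize) -> Self {
        Probe {
            stride: 0,
            bucket_mask,
            #[cfg(debug_assertions)]
            iter: 0,
        }
    }

    pub fn advance(&mut self) {
        self.stride += 1; // placeholder; the real stride update is quadratic
        // The counter and its check compile away entirely in release mode,
        // so there is no performance hit outside debug builds.
        #[cfg(debug_assertions)]
        {
            self.iter += 1;
            // Visiting every bucket once is enough to find an empty slot,
            // so exceeding the bucket count means the probe is looping.
            debug_assert!(
                self.iter <= self.bucket_mask + 1,
                "probe loop did not terminate"
            );
        }
    }
}
```

Because the field itself is behind `#[cfg(debug_assertions)]`, release builds don't even carry the extra `usize`, unlike an approach that merely skips the check.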
    @@ -181,11 +143,6 @@ impl TableLayout {
        }
    }

    /// A reference to an empty bucket into which an item can be inserted.
I decided to delete `InsertSlot` because I think it's a bad API: there's not much stopping you from creating an invalid insertion slot, and it doesn't really gain you anything by making the type different from an ordinary index. We can revisit this type of API for `HashTable` if it seems worthwhile in the future.

FWIW, I also don't like `Bucket` either, but that one stays for now since it's mostly separate from the probing logic.
Force-pushed 0d28e9b to e8b1ac6
    Some((_, ptr)) => {
        // FIXME: this is the worst API ever oh no oh dear I hate it
        // FIXME: use NonNull::add if MSRV > 1.80.0. hopefully this is
        // deleted before then…
        // SAFETY: Bucket wants our pointer to be offset by one, so,
        // make it so…
        let ptr = unsafe { NonNull::new_unchecked(ptr.as_ptr().add(1)) };
        Some(Bucket { ptr })
Shout outs to miri for helping me find this issue.

So, okay. I didn't want to fix all usages of `Bucket` in this PR and make them all actually put the pointer in the correct place, since that's a big change. But I'm also not changing `ProbeItems` to return a pointer to somewhere other than where the item is. That's clearly the wrong answer.

If it's actually a big enough deal for performance, I'll try and fix the bucket API to be less bad. But that API is on the chopping block and hopefully I can get to it before we want to do a release anyway.
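A minimal sketch of the offset-by-one adjustment in the snippet above, using a plain array in place of the table's allocation; the helper name is hypothetical:

```rust
use std::ptr::NonNull;

/// Given a pointer to an element, produce the one-past-the-element pointer
/// that `Bucket` expects to store.
pub fn one_past(elem: NonNull<u32>) -> NonNull<u32> {
    // SAFETY: the caller guarantees `elem` points into an allocation with
    // room for the one-past-the-element address, so the offset stays in
    // bounds of the same allocation (the rule miri enforces).
    unsafe { NonNull::new_unchecked(elem.as_ptr().add(1)) }
}
```

Recovering the element then subtracts one again, which is why handing `Bucket` a pointer directly at the item (without the `add(1)`) makes it read the wrong slot.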
Force-pushed e8b1ac6 to d652353
src/control/group/mod.rs (outdated)
    //
    // I attempted an implementation on ARM using NEON instructions, but it
    // turns out that most NEON instructions have multi-cycle latency, which in
    // the end outweighs any gains over the generic implementation.
Oops, I forgot to delete this paragraph when I added the new NEON implementation.
    @@ -30,6 +30,12 @@ impl Tag {
        self.0 & 0x01 != 0
    }

    /// Checks whether a control value is EMPTY.
    #[inline]
    pub(crate) const fn is_empty(self) -> bool {
This method isn't needed; you can just do `== Tag::EMPTY`. It will give you more efficient code too.
Is there a reason why `special_is_empty` doesn't do that, then?
Yes, there is a good reason for having it implemented this way. In the insertion code we use this code:

    self.table.growth_left -= old_ctrl.special_is_empty() as usize;

Because `special_is_empty` only looks at the low bit, this becomes branchless and compiles down to 2 instructions: `and` & `sub`.
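For context, a sketch of why the low-bit test works. The tag values here are stated as assumptions for the example: EMPTY is all ones, DELETED sets only the high bit, and full tags have the high bit clear, so among the two "special" values only EMPTY has the low bit set:

```rust
pub const EMPTY: u8 = 0b1111_1111;
pub const DELETED: u8 = 0b1000_0000;

/// Precondition: the tag is already known to be special (EMPTY or DELETED).
/// A single `and` then distinguishes the two without a compare.
pub fn special_is_empty(tag: u8) -> bool {
    tag & 0x01 != 0
}

/// The branchless bookkeeping from the insertion code quoted above:
/// `and` to extract the bit, `sub` to apply it.
pub fn update_growth_left(growth_left: usize, old_ctrl: u8) -> usize {
    growth_left - special_is_empty(old_ctrl) as usize
}
```

A full `== Tag::EMPTY` comparison would be correct here too, but it forces the compiler to materialize a boolean from a compare rather than reusing the low bit directly.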
    ///
    /// The given bucket mask must be equal to the power-of-two number of buckets minus one,
    /// and the given control pointer must be a valid hash table.
    pub(crate) unsafe fn new(ctrl: NonNull<Tag>, bucket_mask: usize, hash: u64) -> Probe {
All non-generic functions need to have `#[inline]`. And hot generic functions should as well, since it causes them to be replicated in all codegen units they are referenced by instead of relying on LTO to inline them.
    self.assert_at_least_one_empty();
    loop {
        // SAFETY: We always return an item from the iterator.
        let group_probe = unsafe { self.next().unwrap_unchecked() };
Rather than using `unwrap_unchecked` everywhere, it would be better to not use the `Iterator` trait here and just have an inherent `next` method.
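A sketch of that suggestion: an inherent method returning `usize` directly encodes in the signature that the probe sequence never ends, so callers need no `unwrap_unchecked`. The names and the scalar stride here are hypothetical; hashbrown's real probe sequence steps by whole groups:

```rust
/// Hypothetical probe state over a power-of-two table.
pub struct ProbeSeq {
    pos: usize,
    stride: usize,
    bucket_mask: usize,
}

impl ProbeSeq {
    pub fn new(bucket_mask: usize) -> Self {
        ProbeSeq { pos: 0, stride: 0, bucket_mask }
    }

    /// Deliberately an inherent method, not `Iterator::next`: the return
    /// type is `usize`, not `Option<usize>`, because probing is infinite.
    pub fn next_pos(&mut self) -> usize {
        let pos = self.pos;
        // Quadratic (triangular-number) probing; on power-of-two tables
        // this visits every bucket exactly once per cycle.
        self.stride += 1;
        self.pos = (self.pos + self.stride) & self.bucket_mask;
        pos
    }
}
```

With the `Iterator` trait, every call site has to discharge the `Option` even though `None` is impossible, which is exactly what forces the `unwrap_unchecked` in the snippet above.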
Some overall thoughts on this file: this is a lot of code that is generic over `T` which really doesn't need to be. This can severely hurt compilation times. It would be better to keep all of the probing logic only using `usize` and have a separate `get`/`get_mut` on the `RawTable` to access elements by index.

Basically, I would expect this entire file to only deal with `usize` indices and not contain any mention of `T`. Perhaps it would be easier if you first ripped out the `Bucket` API and replaced it with indices before cleaning up the internals.
I'm not entirely sure if I'm misunderstanding this, but I only made things generic over `T` when I absolutely had to: the only things that involve `T` either directly return pointers to `T` or require calling functions that take `&T`, both of which can't be made non-generic. The majority of the probing logic isn't generic at all, and operates just on `usize`.
My main concern is that `GroupProbeItems` and `ProbeItems` are generic over `T` and have methods that contain a lot of the searching logic, which is going to be replicated for each `T`. I think it would be better if all the code in this file only dealt with `usize` indices and instead let the caller deal with mapping these indices to values in the data array. Similarly, methods like `find_full` can take a `FnMut(usize) -> bool` closure where the closure deals with getting the element for a given index.
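A sketch of the suggested shape, with a hypothetical `find_full` and a boolean slice standing in for the control bytes: the search itself touches only `usize` indices, and the caller's closure is the only place `T` appears, so only that thin closure is monomorphized per element type:

```rust
/// Hypothetical index-only search: find the first occupied bucket whose
/// element satisfies the caller's predicate.
pub fn find_full(
    occupied: &[bool],                 // stand-in for "bucket is full" info
    mut eq: impl FnMut(usize) -> bool, // caller maps index -> element compare
) -> Option<usize> {
    (0..occupied.len()).find(|&i| occupied[i] && eq(i))
}
```

A caller with a data array of some concrete `T` would pass `|i| data[i] == key`, keeping every mention of `T` outside the probing module.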
☔ The latest upstream changes (presumably #586) made this pull request unmergeable. Please resolve the merge conflicts.
Force-pushed d652353 to 262899b
So, I've gotten back into this, and I am realising that … However, I am going to see what tinier bits of this I can merge in that will be less intrusive, like the double-width static group allocation, since that's technically removing some potential UB.
Force-pushed 262899b to 267d3e4
This is the follow-up to #568, which should be merged first. (That's why this is marked as a draft.)

Essentially, this pulls the bulk of the probing logic out of `RawTable` and into iterators which can more easily be reasoned about in isolation. The code isn't identical to that in `RawTable`, but it seems to pass all the tests, so I'm assuming that it all works correctly.