Move probing logic into control module #578

Draft: clarfonthey wants to merge 1 commit into master from the control-module branch
Conversation

clarfonthey (Contributor):

This is the follow-up to #568, which should be merged first. (That's why this is marked as a draft.)

Essentially, this pulls the bulk of the probing logic out of RawTable and into iterators that can be reasoned about more easily in isolation. The code isn't identical to that in RawTable, but it seems to pass all the tests, so I'm assuming it all works correctly.

@clarfonthey force-pushed the control-module branch 2 times, most recently from d5038e4 to d987f62 on October 15, 2024 at 00:36

/// Finds an empty slot in the group to insert a new item.
#[inline]
pub(crate) fn empty(&self) -> Option<usize> {
clarfonthey (Contributor Author):

This is a combination of RawTableInner::find_insert_slot_in_group and RawTableInner::fix_insert_slot.
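For a sense of the shape this takes, here is a rough sketch (not the PR's exact code: the group, index, and bucket_mask fields are assumed, and the small-table wraparound handling that fix_insert_slot performed is elided):

    #[inline]
    pub(crate) fn empty(&self) -> Option<usize> {
        // Take the first EMPTY tag in the loaded group, if any, and turn its
        // in-group bit position into a table-wide bucket index.
        let bit = self.group.match_empty().lowest_set_bit()?;
        Some((self.index + bit) & self.bucket_mask)
    }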

///
/// This is equivalent to `empty().is_some()`.
#[inline]
pub(crate) fn has_empty(&self) -> bool {
clarfonthey (Contributor Author):

This was used in RawTableInner::find_or_find_insert_slot_inner, RawTableInner::find_inner, and RawIterHashInner::next.
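As a sketch (again assuming a group field holding the loaded Group), the check boils down to something like:

    #[inline]
    pub(crate) fn has_empty(&self) -> bool {
        // True if any tag in the group is EMPTY; cheaper than `empty()`
        // because no bit position needs to be extracted.
        self.group.match_empty().any_bit_set()
    }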

}
impl<T> GroupProbeItems<T> {
/// Return the index of the first item that satisfies the predicate.
pub(crate) fn find(self, mut f: impl FnMut(&T) -> bool) -> Option<usize> {
clarfonthey (Contributor Author):

This is taken from RawTableInner::find_or_find_insert_slot_inner.

}

/// Gets the inner slice of groups.
fn control_slice(&self) -> &[Tag] {
clarfonthey (Contributor Author):

This method basically exposed a bug in the existing group code: we assume that there are always this many initialized tags in the control data, but that's not what Group::static_empty gave us; it only gave us a single group, rather than the two this would imply.

So, to help ensure that methods like this always work, and so we don't have to waltz around the concept of "we probably won't read that many groups," the static group allocations were doubled.
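A rough sketch of what the doubled static allocation looks like (illustrative only; the real naming and alignment details may differ):

    pub(crate) const fn static_empty() -> &'static [Tag; Group::WIDTH * 2] {
        // The zero-sized `[Group; 0]` field forces the alignment of a `Group`.
        #[repr(C)]
        struct AlignedTags {
            _align: [Group; 0],
            tags: [Tag; Group::WIDTH * 2],
        }
        const ALIGNED_TAGS: AlignedTags = AlignedTags {
            _align: [],
            tags: [Tag::EMPTY; Group::WIDTH * 2],
        };
        &ALIGNED_TAGS.tags
    }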

Member:

Actually a single group is sufficient. We only need GROUP_SIZE - 1 trailing bytes, and the static empty table has 1 bucket. Hence a total of GROUP_SIZE control bytes.

Also since control_slice is only used by debugging code, it only needs to include the first bucket_mask + 1 control bytes. The trailing bytes are a mirror of the first bytes, or EMPTY in the case of tables smaller than the group size.

clarfonthey (Contributor Author):

So, I agree that we technically only need this many bytes, but all the existing code relies on num_ctrl_bytes, which is defined as bucket_mask + 1 + Group::WIDTH, not bucket_mask + Group::WIDTH. So, in all cases, you need at least two groups.

I'm open to changing this, but I figured that the empty table case wasn't really worth optimising for as long as it didn't allocate anything.
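For reference, the definition being discussed is essentially:

    /// One control byte per bucket, plus a full trailing group used for
    /// wraparound reads and writes.
    fn num_ctrl_bytes(&self) -> usize {
        self.bucket_mask + 1 + Group::WIDTH
    }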

Member:

So I went back and double-checked the implementation. We only need Group::WIDTH - 1 tail bytes for probing, but we actually need Group::WIDTH padding bytes for set_ctrl (the last byte is only written but never read), hence the current definition of num_ctrl_bytes.

So the correct length for the static allocation should be 1 + Group::WIDTH.

}

/// Verifies that the control slice is valid.
fn assert_at_least_one_empty(&self) {
clarfonthey (Contributor Author):

I added this while I was debugging; it's ultimately not needed, but it feels nice to have anyway. If you want, I can be more aggressive about making sure this code is optimized out.
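Roughly, the helper amounts to the following (a sketch; the exact message and formatting differ):

    fn assert_at_least_one_empty(&self) {
        debug_assert!(
            self.control_slice().iter().any(|tag| tag.is_empty()),
            "probing can only terminate if at least one slot is empty: {:?}",
            self.control_slice()
        );
    }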

Member:

This assert will incorrectly pass on tables smaller than the group size since the tail bytes are EMPTY. Hence the recommendation for control_slice to ignore the trailing bytes.

clarfonthey (Contributor Author):

Wouldn't the tail bytes being empty satisfy the condition that at least one of the bytes is empty?

Member:

The problem is that it would satisfy this condition even if the table was actually full.

self.control_slice()
);
}
self.iter += 1;
clarfonthey (Contributor Author):

This is the bug that made me think my code wasn't working at first: I was checking if self.stride > self.bucket_mask, when really, I should be checking if self.stride > self.bucket_mask * Group::WIDTH.

Except… that can overflow. So, since this is all for a debug assertion anyway, I decided to just add an extra field when debug assertions are enabled that tracks how many iterations have passed. That way, we can debug "infinite" loops, but we don't actually take the performance hit in release mode.
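A sketch of that approach (the method name and the exact bound are illustrative):

    pub(crate) struct Probe {
        bucket_mask: usize,
        // ... position, stride, and so on ...
        /// Number of groups visited so far; only tracked in debug builds.
        #[cfg(debug_assertions)]
        iter: usize,
    }

    impl Probe {
        /// Called once per group visited by the probe sequence.
        fn count_iteration(&mut self) {
            #[cfg(debug_assertions)]
            {
                self.iter += 1;
                // The sequence visits each group at most once before it must
                // find an empty slot, so exceeding the number of groups means
                // the loop would never terminate.
                debug_assert!(
                    self.iter <= self.bucket_mask / Group::WIDTH + 1,
                    "probe sequence did not terminate"
                );
            }
        }
    }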

@@ -181,11 +143,6 @@ impl TableLayout {
}
}

/// A reference to an empty bucket into which an item can be inserted.
clarfonthey (Contributor Author) commented Oct 15, 2024:

I decided to delete InsertSlot because I think it's a bad API: there's not much stopping you from creating an invalid insertion slot, and making the type different from an ordinary index doesn't really gain you anything. We can revisit this kind of API for HashTable if it seems worthwhile in the future.

FWIW, I don't like Bucket either, but that one stays for now since it's mostly separate from the probing logic.

@clarfonthey force-pushed the control-module branch 3 times, most recently from 0d28e9b to e8b1ac6 on October 15, 2024 at 01:20
Comment on lines +3646 to +3655
Some((_, ptr)) => {
// FIXME: this is the worst API ever oh no oh dear I hate it
// FIXME: use NonNull::add if MSRV > 1.80.0. hopefully this is
// deleted before then…
// SAFETY: Bucket wants our pointer to be offset by one, so,
// make it so…
let ptr = unsafe { NonNull::new_unchecked(ptr.as_ptr().add(1)) };
Some(Bucket { ptr })
clarfonthey (Contributor Author) commented Oct 15, 2024:

Shout outs to miri for helping me find this issue.

So, okay. I didn't want to fix all usages of Bucket in this PR and make them all actually put the pointer in the correct place since that's a big change. But I'm also not changing ProbeItems to return a pointer to somewhere other than where the item is. That's clearly the wrong answer.

If it's actually a big enough deal for performance, I'll try and fix the bucket API to be less bad. But that API is on the chopping block and hopefully I can get to it before we want to do a release anyway.

//
// I attempted an implementation on ARM using NEON instructions, but it
// turns out that most NEON instructions have multi-cycle latency, which in
// the end outweighs any gains over the generic implementation.
Member:

Oops, I forgot to delete this paragraph when I added the new NEON implementation.

@@ -30,6 +30,12 @@ impl Tag {
self.0 & 0x01 != 0
}

/// Checks whether a control value is EMPTY.
#[inline]
pub(crate) const fn is_empty(self) -> bool {
Member:

This method isn't needed; you can just do == Tag::EMPTY. It will give you more efficient code too.

clarfonthey (Contributor Author):

Is there a reason why special_is_empty doesn't do that, then?

Member:

Yes, there is a good reason for having it implemented this way. In the insertion code we have:

self.table.growth_left -= old_ctrl.special_is_empty() as usize;

Because special_is_empty only looks at the low bit, this becomes branchless and compiles down to two instructions: an `and` and a `sub`.
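For context, the low-bit trick works because of how the special tag values are encoded (a sketch based on the usual constants, EMPTY = 0b1111_1111 and DELETED = 0b1000_0000; the real method may also debug-assert that the tag is special):

    #[inline]
    pub(crate) const fn special_is_empty(self) -> bool {
        // Of the two special values, only EMPTY has the low bit set, so a
        // single AND distinguishes them and the `as usize` cast feeds
        // straight into the subtraction.
        self.0 & 0x01 != 0
    }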

///
/// The given bucket mask must be equal to the power-of-two number of buckets minus one,
/// and the given control pointer must be a valid hash table.
pub(crate) unsafe fn new(ctrl: NonNull<Tag>, bucket_mask: usize, hash: u64) -> Probe {
Member:

All non-generic functions need to have #[inline]. Hot generic functions should as well, since #[inline] causes them to be replicated in all the codegen units that reference them instead of relying on LTO to inline them.

self.assert_at_least_one_empty();
loop {
// SAFETY: We always return an item from the iterator.
let group_probe = unsafe { self.next().unwrap_unchecked() };
Member:

Rather than using unwrap_unchecked everywhere, it would be better to not use the Iterator trait here and just have an inherent next method.
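Something along these lines, as a sketch of the suggestion (names and fields are hypothetical): since the probe sequence never ends, an inherent method can return the next group's starting index directly, and the unwrap_unchecked calls disappear.

    impl GroupProbe {
        /// Returns the current group's starting bucket and advances the
        /// probe sequence; there is no `None` case to unwrap.
        #[inline]
        fn next_group(&mut self) -> usize {
            let position = self.position;
            // Triangular probing: each step advances by one more group width,
            // visiting every group once before wrapping around.
            self.stride += Group::WIDTH;
            self.position = (self.position + self.stride) & self.bucket_mask;
            position
        }
    }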

Member:

Some overall thoughts on this file:

This is a lot of code that is generic over T but really doesn't need to be, which can severely hurt compilation times. It would be better to keep all of the probing logic working only with usize and have separate get/get_mut methods on RawTable to access elements by index.

Basically, I would expect this entire file to only deal with usize indices and not contain any mention of T. Perhaps it would be easier if you first ripped out the Bucket API and replaced it with indices before cleaning up the internals.

clarfonthey (Contributor Author):

I'm not entirely sure if I'm misunderstanding this, but I only made things generic over T when I absolutely had to: the only things that involve T either directly return pointers to T or require calling functions that take &T, neither of which can be made non-generic. The majority of the probing logic isn't generic at all and operates just on usize.

Member:

My main concern is that GroupProbeItems and ProbeItems are generic over T and have methods containing a lot of the searching logic, which is going to be replicated for each T. I think it would be better if all the code in this file dealt only with usize indices and let the caller handle mapping those indices to values in the data array. Similarly, methods like find_full could take an FnMut(usize) -> bool closure that handles getting the element for a given index.
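A sketch of that shape (the method, field, and closure names here are hypothetical): the probing side hands out bucket indices, and only the caller ever touches T.

    impl GroupProbeItems {
        /// Returns the index of the first matching full bucket for which
        /// `eq` returns true; `eq` only ever sees indices, so this type
        /// needs no knowledge of `T`.
        #[inline]
        fn find_full(self, mut eq: impl FnMut(usize) -> bool) -> Option<usize> {
            // `matches` would be the bitmask of tags equal to the hash's tag.
            let GroupProbeItems { matches, index, bucket_mask, .. } = self;
            for bit in matches {
                let i = (index + bit) & bucket_mask;
                if eq(i) {
                    return Some(i);
                }
            }
            None
        }
    }

    // Caller side, where T is known, something like:
    // probe_items.find_full(|index| is_match(unsafe { table.bucket(index).as_ref() }))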

bors (Contributor) commented Nov 13, 2024:

☔ The latest upstream changes (presumably #586) made this pull request unmergeable. Please resolve the merge conflicts.

clarfonthey (Contributor Author):

So, I've gotten back into this, and I'm realising that cargo llvm-lines is reporting double what it did for the previous commit, which is strange, because it's not due to monomorphisation. I'm going to investigate a bit.

However, I'm going to see which smaller, less intrusive pieces of this I can merge first, like the double-width static group allocation, since that technically removes some potential UB.
