Move probing logic into control module #578

Draft: clarfonthey wants to merge 1 commit into master from the control-module branch
Conversation

clarfonthey (Contributor):

This is the follow-up to #568, which should be merged first. (That's why this is marked as a draft.)

Essentially, this pulls the bulk of the probing logic out of RawTable and into iterators that can be reasoned about more easily in isolation. The code isn't identical to that in RawTable, but it seems to pass all the tests, so I'm assuming it all works correctly.

@clarfonthey force-pushed the control-module branch 2 times, most recently from d5038e4 to d987f62 on October 15, 2024 at 00:36

/// Finds an empty slot in the group to insert a new item.
#[inline]
pub(crate) fn empty(&self) -> Option<usize> {
clarfonthey (Contributor Author):

This is a combination of RawTableInner::find_insert_slot_in_group and RawTableInner::fix_insert_slot.
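For a sense of the shape this takes, here is a rough sketch (not the PR's exact code: the group, index, and bucket_mask fields are assumed, and the small-table wraparound handling that fix_insert_slot performed is elided):

    #[inline]
    pub(crate) fn empty(&self) -> Option<usize> {
        // Take the first EMPTY tag in the loaded group, if any, and turn its
        // in-group bit position into a table-wide bucket index.
        let bit = self.group.match_empty().lowest_set_bit()?;
        Some((self.index + bit) & self.bucket_mask)
    }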

///
/// This is equivalent to `empty().is_some()`.
#[inline]
pub(crate) fn has_empty(&self) -> bool {
clarfonthey (Contributor Author):

This was used in RawTableInner::find_or_find_insert_slot_inner, RawTableInner::find_inner, and RawIterHashInner::next.
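As a sketch (again assuming a group field holding the loaded Group), the check boils down to something like:

    #[inline]
    pub(crate) fn has_empty(&self) -> bool {
        // True if any tag in the group is EMPTY; cheaper than `empty()`
        // because no bit position needs to be extracted.
        self.group.match_empty().any_bit_set()
    }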

}
impl<T> GroupProbeItems<T> {
/// Return the index of the first item that satisfies the predicate.
pub(crate) fn find(self, mut f: impl FnMut(&T) -> bool) -> Option<usize> {
clarfonthey (Contributor Author):

This is taken from RawTableInner::find_or_find_insert_slot_inner.

}

/// Gets the inner slice of groups.
fn control_slice(&self) -> &[Tag] {
clarfonthey (Contributor Author):

This method basically exposed a bug in the existing group code: we assume that there are always this many initialized tags in the control data, but that's not what Group::static_empty gave us; it only gave us a single group, rather than the two this would imply.

So, to help ensure that methods like this always work, and so we don't have to waltz around the concept of "we probably won't read that many groups," the static group allocations were doubled.
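A rough sketch of what the doubled static allocation looks like (illustrative only; the real naming and alignment details may differ):

    pub(crate) const fn static_empty() -> &'static [Tag; Group::WIDTH * 2] {
        // The zero-sized `[Group; 0]` field forces the alignment of a `Group`.
        #[repr(C)]
        struct AlignedTags {
            _align: [Group; 0],
            tags: [Tag; Group::WIDTH * 2],
        }
        const ALIGNED_TAGS: AlignedTags = AlignedTags {
            _align: [],
            tags: [Tag::EMPTY; Group::WIDTH * 2],
        };
        &ALIGNED_TAGS.tags
    }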

Member:

Actually a single group is sufficient. We only need GROUP_SIZE - 1 trailing bytes, and the static empty table has 1 bucket. Hence a total of GROUP_SIZE control bytes.

Also since control_slice is only used by debugging code, it only needs to include the first bucket_mask + 1 control bytes. The trailing bytes are a mirror of the first bytes, or EMPTY in the case of tables smaller than the group size.

clarfonthey (Contributor Author):

So, I agree that we technically only need this many bytes, but all the existing code relies on num_ctrl_bytes, which is defined as bucket_mask + 1 + Group::WIDTH, not bucket_mask + Group::WIDTH. So, in all cases, you need at least two groups.

I'm open to changing this, but I figured that the empty table case wasn't really worth optimising for as long as it didn't allocate anything.
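For reference, the definition being discussed is essentially:

    /// One control byte per bucket, plus a full trailing group used for
    /// wraparound reads and writes.
    fn num_ctrl_bytes(&self) -> usize {
        self.bucket_mask + 1 + Group::WIDTH
    }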

Member:

So I went back and double-checked the implementation. We only need Group::WIDTH - 1 tail bytes for probing, but we actually need Group::WIDTH padding bytes for set_ctrl (the last byte is only written but never read), hence the current definition of num_ctrl_bytes.

So the correct length for the static allocation should be 1 + Group::WIDTH.

}

/// Verifies that the control slice is valid.
fn assert_at_least_one_empty(&self) {
clarfonthey (Contributor Author):

I added this while I was debugging; it's ultimately not needed, but it feels nice to have anyway. If you want, I can be more aggressive about making sure this code is optimized out.
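Roughly, the helper amounts to the following (a sketch; the exact message and formatting differ):

    fn assert_at_least_one_empty(&self) {
        debug_assert!(
            self.control_slice().iter().any(|tag| tag.is_empty()),
            "probing can only terminate if at least one slot is empty: {:?}",
            self.control_slice()
        );
    }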

Member:

This assert will incorrectly pass on tables smaller than the group size since the tail bytes are EMPTY. Hence the recommendation for control_slice to ignore the trailing bytes.

clarfonthey (Contributor Author):

Wouldn't the tail bytes being empty satisfy the condition that at least one of the bytes is empty?

Member:

The problem is that it would satisfy this condition even if the table was actually full.

self.control_slice()
);
}
self.iter += 1;
clarfonthey (Contributor Author):

This is the bug that made me think my code wasn't working at first: I was checking if self.stride > self.bucket_mask, when really, I should be checking if self.stride > self.bucket_mask * Group::WIDTH.

Except… that can overflow. So, since this is all for a debug assertion anyway, I decided to just add an extra field when debug assertions are enabled that tracks how many iterations have passed. That way, we can debug "infinite" loops, but we don't actually take the performance hit in release mode.
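A sketch of that approach (the method name and the exact bound are illustrative):

    pub(crate) struct Probe {
        bucket_mask: usize,
        // ... position, stride, and so on ...
        /// Number of groups visited so far; only tracked in debug builds.
        #[cfg(debug_assertions)]
        iter: usize,
    }

    impl Probe {
        /// Called once per group visited by the probe sequence.
        fn count_iteration(&mut self) {
            #[cfg(debug_assertions)]
            {
                self.iter += 1;
                // The sequence visits each group at most once before it must
                // find an empty slot, so exceeding the number of groups means
                // the loop would never terminate.
                debug_assert!(
                    self.iter <= self.bucket_mask / Group::WIDTH + 1,
                    "probe sequence did not terminate"
                );
            }
        }
    }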

@@ -181,11 +143,6 @@ impl TableLayout {
}
}

/// A reference to an empty bucket into which an item can be inserted.
clarfonthey (Contributor Author) commented Oct 15, 2024:

I decided to delete InsertSlot because I think it's a bad API: there's not much stopping you from creating an invalid insertion slot, and making the type different from an ordinary index doesn't really gain you anything. We can revisit this kind of API for HashTable if it seems worthwhile in the future.

FWIW, I don't like Bucket either, but that one stays for now since it's mostly separate from the probing logic.

@clarfonthey force-pushed the control-module branch 3 times, most recently from 0d28e9b to e8b1ac6 on October 15, 2024 at 01:20
Comment on lines +3646 to +3655
Some((_, ptr)) => {
// FIXME: this is the worst API ever oh no oh dear I hate it
// FIXME: use NonNull::add if MSRV > 1.80.0. hopefully this is
// deleted before then…
// SAFETY: Bucket wants our pointer to be offset by one, so,
// make it so…
let ptr = unsafe { NonNull::new_unchecked(ptr.as_ptr().add(1)) };
Some(Bucket { ptr })
clarfonthey (Contributor Author) commented Oct 15, 2024:

Shout outs to miri for helping me find this issue.

So, okay. I didn't want to fix all usages of Bucket in this PR and make them all actually put the pointer in the correct place since that's a big change. But I'm also not changing ProbeItems to return a pointer to somewhere other than where the item is. That's clearly the wrong answer.

If it's actually a big enough deal for performance, I'll try and fix the bucket API to be less bad. But that API is on the chopping block and hopefully I can get to it before we want to do a release anyway.

//
// I attempted an implementation on ARM using NEON instructions, but it
// turns out that most NEON instructions have multi-cycle latency, which in
// the end outweighs any gains over the generic implementation.
Member:

Oops, I forgot to delete this paragraph when I added the new NEON implementation.

@@ -30,6 +30,12 @@ impl Tag {
self.0 & 0x01 != 0
}

/// Checks whether a control value is EMPTY.
#[inline]
pub(crate) const fn is_empty(self) -> bool {
Member:

This method isn't needed; you can just do == Tag::EMPTY. It will give you more efficient code too.

clarfonthey (Contributor Author):

Is there a reason why special_is_empty doesn't do that, then?

Member:

Yes, there is a good reason for having it implemented this way. In the insertion code we have:

self.table.growth_left -= old_ctrl.special_is_empty() as usize;

Because special_is_empty only looks at the low bit, this becomes branchless and compiles down to two instructions: an `and` and a `sub`.
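For context, the low-bit trick works because of how the special tag values are encoded (a sketch based on the usual constants, EMPTY = 0b1111_1111 and DELETED = 0b1000_0000; the real method may also debug-assert that the tag is special):

    #[inline]
    pub(crate) const fn special_is_empty(self) -> bool {
        // Of the two special values, only EMPTY has the low bit set, so a
        // single AND distinguishes them and the `as usize` cast feeds
        // straight into the subtraction.
        self.0 & 0x01 != 0
    }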

///
/// The given bucket mask must be equal to the power-of-two number of buckets minus one,
/// and the given control pointer must be a valid hash table.
pub(crate) unsafe fn new(ctrl: NonNull<Tag>, bucket_mask: usize, hash: u64) -> Probe {
Member:

All non-generic functions need to have #[inline]. Hot generic functions should as well, since #[inline] causes them to be replicated in all the codegen units that reference them instead of relying on LTO to inline them.

self.assert_at_least_one_empty();
loop {
// SAFETY: We always return an item from the iterator.
let group_probe = unsafe { self.next().unwrap_unchecked() };
Member:

Rather than using unwrap_unchecked everywhere, it would be better to not use the Iterator trait here and just have an inherent next method.
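Something along these lines, as a sketch of the suggestion (names and fields are hypothetical): since the probe sequence never ends, an inherent method can return the next group's starting index directly, and the unwrap_unchecked calls disappear.

    impl GroupProbe {
        /// Returns the current group's starting bucket and advances the
        /// probe sequence; there is no `None` case to unwrap.
        #[inline]
        fn next_group(&mut self) -> usize {
            let position = self.position;
            // Triangular probing: each step advances by one more group width,
            // visiting every group once before wrapping around.
            self.stride += Group::WIDTH;
            self.position = (self.position + self.stride) & self.bucket_mask;
            position
        }
    }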

Member:

Some overall thoughts on this file:

This is a lot of code that is generic over T but really doesn't need to be, which can severely hurt compilation times. It would be better to keep all of the probing logic working only with usize and have separate get/get_mut methods on RawTable to access elements by index.

Basically, I would expect this entire file to only deal with usize indices and not contain any mention of T. Perhaps it would be easier if you first ripped out the Bucket API and replaced it with indices before cleaning up the internals.

clarfonthey (Contributor Author):

I'm not entirely sure if I'm misunderstanding this, but I only made things generic over T when I absolutely had to: the only things that involve T either directly return pointers to T or require calling functions that take &T, neither of which can be made non-generic. The majority of the probing logic isn't generic at all and operates just on usize.

Member:

My main concern is that GroupProbeItems and ProbeItems are generic over T and have methods containing a lot of the searching logic, which is going to be replicated for each T. I think it would be better if all the code in this file dealt only with usize indices and let the caller handle mapping those indices to values in the data array. Similarly, methods like find_full could take an FnMut(usize) -> bool closure that handles getting the element for a given index.
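A sketch of that shape (the method, field, and closure names here are hypothetical): the probing side hands out bucket indices, and only the caller ever touches T.

    impl GroupProbeItems {
        /// Returns the index of the first matching full bucket for which
        /// `eq` returns true; `eq` only ever sees indices, so this type
        /// needs no knowledge of `T`.
        #[inline]
        fn find_full(self, mut eq: impl FnMut(usize) -> bool) -> Option<usize> {
            // `matches` would be the bitmask of tags equal to the hash's tag.
            let GroupProbeItems { matches, index, bucket_mask, .. } = self;
            for bit in matches {
                let i = (index + bit) & bucket_mask;
                if eq(i) {
                    return Some(i);
                }
            }
            None
        }
    }

    // Caller side, where T is known, something like:
    // probe_items.find_full(|index| is_match(unsafe { table.bucket(index).as_ref() }))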

bors (Contributor) commented Nov 13, 2024:

☔ The latest upstream changes (presumably #586) made this pull request unmergeable. Please resolve the merge conflicts.

clarfonthey (Contributor Author):

So, I've gotten back into this, and I'm realising that cargo llvm-lines is reporting double what it did for the previous commit, which is strange, because it's not due to monomorphisation. I'm going to investigate a bit.

However, I'm going to see which smaller, less intrusive pieces of this I can merge first, like the double-width static group allocation, since that technically removes some potential UB.
