Add low-level `HashTable` API #466

Amanieu · 2023-08-31T17:12:53Z

The primary use case for this type over HashMap or HashSet is to support types that do not implement the Hash and Eq traits, but instead require additional data not contained in the key itself to compute a hash and compare two elements for equality.

HashTable has some similarities with RawTable, but has a completely safe API. It is intended as a replacement for the existing raw entry API, with the intent of deprecating the latter and eventually removing it.

Examples of when this can be useful include:

- An IndexMap implementation where indices into a Vec are stored as elements in a HashTable<usize>. Hashing and comparing the elements requires indexing the associated Vec to get the actual value referred to by the index.
- Avoiding re-computing a hash when it is already known.
- Mutating the key of an element in a way that doesn't affect its hash.

To achieve this, HashTable methods that search for an element in the table require a hash value and equality function to be explicitly passed in as arguments. The method will then iterate over the elements with the given hash and call the equality function on each of them, until a match is found.
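A minimal sketch of that calling convention in plain `std` Rust follows. It is not hashbrown's implementation: `ToyTable` (16 fixed buckets, no resizing), `IndexSet`, `insert_full` and `get_index_of` are hypothetical names chosen for illustration. It models the IndexMap-style case from the list above: the table only stores `usize` indices, the caller's `eq` closure reaches into a side `Vec` to compare the actual values, and each hash is computed once and reused for both the lookup and the insert.

```rust
use std::collections::hash_map::RandomState;
use std::hash::BuildHasher;

/// Hypothetical toy table: it never hashes or compares elements itself.
/// Every lookup receives a precomputed `hash` and an `eq` closure.
/// (16 fixed buckets, no resizing; illustration only.)
struct ToyTable<T> {
    buckets: Vec<Vec<(u64, T)>>,
}

impl<T> ToyTable<T> {
    fn new() -> Self {
        ToyTable { buckets: (0..16).map(|_| Vec::new()).collect() }
    }

    fn bucket(&self, hash: u64) -> usize {
        (hash % self.buckets.len() as u64) as usize
    }

    /// Calls `eq` only on elements stored with the same hash, until one matches.
    fn find(&self, hash: u64, mut eq: impl FnMut(&T) -> bool) -> Option<&T> {
        self.buckets[self.bucket(hash)]
            .iter()
            .filter(|(h, _)| *h == hash)
            .map(|(_, v)| v)
            .find(|v| eq(*v))
    }

    /// Inserts without looking for an equal element; that is the caller's job.
    fn insert_unique(&mut self, hash: u64, value: T) {
        let b = self.bucket(hash);
        self.buckets[b].push((hash, value));
    }
}

/// Hypothetical IndexSet-like wrapper: the table stores `usize` indices, and
/// both hashing and equality go through the side `Vec` to reach the strings.
struct IndexSet {
    entries: Vec<String>,
    indices: ToyTable<usize>,
    state: RandomState,
}

impl IndexSet {
    fn new() -> Self {
        IndexSet { entries: Vec::new(), indices: ToyTable::new(), state: RandomState::new() }
    }

    /// Returns the position of `value`, appending it if it is not present yet.
    fn insert_full(&mut self, value: &str) -> usize {
        // The hash is computed once and reused for both the lookup and the insert.
        let hash = self.state.hash_one(value);
        let entries = &self.entries;
        if let Some(&i) = self.indices.find(hash, |&i| entries[i] == value) {
            return i;
        }
        let i = self.entries.len();
        self.entries.push(value.to_string());
        self.indices.insert_unique(hash, i);
        i
    }

    fn get_index_of(&self, value: &str) -> Option<usize> {
        let hash = self.state.hash_one(value);
        self.indices.find(hash, |&i| self.entries[i] == value).copied()
    }
}
```

The table itself never needs its elements to implement `Hash` or `Eq`; only the caller knows how to turn an index into something hashable and comparable.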

Zoxc · 2023-09-02T20:00:39Z

I wonder if it would make sense to have eq and hasher utility methods for when you have Eq or Hash available. Passing different closures to methods is a bit of a code generation trap. It could help in some cases for that.
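One possible reading of this suggestion, sketched with `std` only: small constructor functions that produce the `hasher` and `eq` arguments whenever `Hash`/`Eq` are available. `make_hasher` and `equivalent` are hypothetical names used for illustration, not a public hashbrown API.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hash};

/// Hypothetical helper: builds a `hasher: impl Fn(&T) -> u64` argument
/// from a `BuildHasher`, for element types that already implement `Hash`.
fn make_hasher<T: Hash, S: BuildHasher>(state: &S) -> impl Fn(&T) -> u64 + '_ {
    move |value| state.hash_one(value)
}

/// Hypothetical helper: builds an `eq: impl FnMut(&T) -> bool` argument
/// for element types that already implement `Eq`.
fn equivalent<T: Eq>(key: &T) -> impl Fn(&T) -> bool + '_ {
    move |candidate| candidate == key
}
```

Because `equivalent::<K>` returns the same closure type for a given `K`, every call site that passes it to a generic lookup method shares one monomorphized instance, whereas each distinct closure literal written inline gets its own.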

JustForFun88 · 2023-09-04T06:54:09Z

> I wonder if it would make sense to have eq and hasher utility methods for when you have Eq or Hash available. Passing different closures to methods is a bit of a code generation trap. It could help in some cases for that.

I think it's hard to predict what type users will want to store in a table: it could be T, (T1, T2), or (T1, T2, ..., Tn).

beviu · 2023-09-04T11:28:28Z

In both HashMap and HashTable, only one entry "view" (Entry or RawEntry) structure can exist at a time. If you want to manipulate multiple entries at the same time, it seems like you have to use the raw Bucket and RawTable API.

I think that this is not possible to fix, though: it is similar to arrays, where there can only be one mutable reference to an element at a time (although there is split_at_mut in that case).

The use case I was wondering about from #450 is when you want to modify an entry's key (possibly changing its hash), but also check whether the new key is already in the map before actually doing so:

- If the check is done before looking up the entry for the old key, then the bucket for the new key is looked up twice.
- If the check is done after looking up the entry for the old key, then the bucket for the old key is looked up twice.
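The slice analogy can be made concrete. With `std` only, two disjoint mutable references into one slice come from `split_at_mut`, which lets the borrow checker see that the two halves cannot overlap. `get_two_mut` is a hypothetical helper, not a std or hashbrown function; a table-level method that hands out several `&mut T` at once has to rule out the same kind of aliasing.

```rust
/// Hypothetical helper: mutable references to two distinct elements at once.
/// Returns `None` if the indices are equal or out of bounds.
fn get_two_mut<T>(slice: &mut [T], i: usize, j: usize) -> Option<(&mut T, &mut T)> {
    if i == j || i >= slice.len() || j >= slice.len() {
        return None;
    }
    if i < j {
        // `left` holds index i, `right[0]` is index j.
        let (left, right) = slice.split_at_mut(j);
        Some((&mut left[i], &mut right[0]))
    } else {
        // `right[0]` is index i, `left` holds index j.
        let (left, right) = slice.split_at_mut(i);
        Some((&mut right[0], &mut left[j]))
    }
}
```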

bors · 2023-09-05T13:30:28Z

☔ The latest upstream changes (presumably #468) made this pull request unmergeable. Please resolve the merge conflicts.

cuviper

I played with actually converting indexmap to this. Apart from get_many_mut noted below, I think I would also need something like Occupied/VacantEntry::into_table(self) -> &mut HashTable, because my OccupiedEntry::remove methods need to adjust other indices in the table.
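To make the index-adjustment problem concrete: with a swap_remove-style removal (the strategy indexmap's `swap_remove` uses), the last entry moves into the freed position, so the table slot that stored the old last index must be rewritten, which needs table access while the removal is in progress. This sketch uses `std` only; `Indices` is a hypothetical flat-list stand-in for a `HashTable<usize>`, and `IndexSet`, `insert`, `get_index_of` and `swap_remove_index` are illustrative names, not indexmap's code.

```rust
use std::collections::hash_map::RandomState;
use std::hash::BuildHasher;

/// Hypothetical stand-in for a `HashTable<usize>`: a flat list of
/// (hash, index) pairs, searched by hash plus a caller-supplied `eq`.
struct Indices {
    slots: Vec<(u64, usize)>,
}

impl Indices {
    fn find(&self, hash: u64, mut eq: impl FnMut(usize) -> bool) -> Option<usize> {
        self.slots.iter().filter(|&&(h, _)| h == hash).map(|&(_, i)| i).find(|&i| eq(i))
    }

    fn find_mut(&mut self, hash: u64, mut eq: impl FnMut(usize) -> bool) -> Option<&mut usize> {
        self.slots.iter_mut().filter(|(h, _)| *h == hash).map(|(_, i)| i).find(|i| eq(**i))
    }

    fn remove(&mut self, hash: u64, index: usize) {
        self.slots.retain(|&(h, i)| !(h == hash && i == index));
    }
}

/// Hypothetical IndexSet-like type: entries keep their hash, the table keeps indices.
struct IndexSet {
    entries: Vec<(u64, String)>,
    indices: Indices,
    state: RandomState,
}

impl IndexSet {
    fn new() -> Self {
        IndexSet { entries: Vec::new(), indices: Indices { slots: Vec::new() }, state: RandomState::new() }
    }

    fn insert(&mut self, value: &str) -> usize {
        let hash = self.state.hash_one(value);
        let entries = &self.entries;
        if let Some(i) = self.indices.find(hash, |i| entries[i].1 == value) {
            return i;
        }
        let i = self.entries.len();
        self.entries.push((hash, value.to_string()));
        self.indices.slots.push((hash, i));
        i
    }

    fn get_index_of(&self, value: &str) -> Option<usize> {
        let hash = self.state.hash_one(value);
        self.indices.find(hash, |i| self.entries[i].1 == value)
    }

    fn swap_remove_index(&mut self, pos: usize) -> Option<String> {
        if pos >= self.entries.len() {
            return None;
        }
        // 1. Drop the table slot that points at `pos`.
        let hash = self.entries[pos].0;
        self.indices.remove(hash, pos);
        // 2. Move the last entry into `pos`.
        let last = self.entries.len() - 1;
        let (_, removed) = self.entries.swap_remove(pos);
        // 3. The moved entry's table slot still says `last`; rewrite it to `pos`.
        if pos != last {
            let moved_hash = self.entries[pos].0;
            if let Some(slot) = self.indices.find_mut(moved_hash, |i| i == last) {
                *slot = pos;
            }
        }
        Some(removed)
    }
}
```

With a real `HashTable`, step 1 would go through an occupied entry, and step 3 needs the table again afterwards, which is why getting the `&mut HashTable` back out of the entry matters.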

src/table.rs

Zoxc · 2023-09-12T06:02:42Z

It would be useful to have fallible variants of entry and insert_unchecked. They can have better code generation as they don't need to panic.

Optimize hash map operations in the query system

This optimizes hash map operations in the query system by explicitly passing hashes and using more optimal operations. `find_or_find_insert_slot` in particular saves a hash table lookup over `entry`. It's not yet available in a safe API, but will be in rust-lang/hashbrown#466.

| Benchmark | Before: Time | After: Time | After: % |
|---|---:|---:|---:|
| 🟣 clap:check | 1.6189s | 1.6129s | -0.37% |
| 🟣 hyper:check | 0.2353s | 0.2337s | -0.67% |
| 🟣 regex:check | 0.9344s | 0.9289s | -0.59% |
| 🟣 syn:check | 1.4693s | 1.4652s | -0.28% |
| 🟣 syntex_syntax:check | 5.6606s | 5.6439s | -0.30% |
| Total | 9.9185s | 9.8846s | -0.34% |
| Summary | 1.0000s | 0.9956s | -0.44% |

r? `@cjgillot`

Amanieu · 2023-09-22T06:18:15Z

> It would be useful to have fallible variants of entry and insert_unchecked. They can have better code generation as they don't need to panic.

You mean fallible allocations? Isn't that already addressed by try_reserve?

src/table.rs

Zoxc · 2023-09-22T17:39:13Z

> You mean fallible allocations? Isn't that already addressed by try_reserve?

I mean variants which won't grow, like Vec::push_within_capacity.

Amanieu · 2023-09-23T02:10:25Z

> I mean variants which won't grow, like Vec::push_within_capacity.

I could see an insert_unchecked_within_capacity, but this doesn't make sense for entry since the first thing it does is reserve(1). I'll leave it for a future PR, though.

Zoxc · 2023-09-23T03:05:43Z

It does make sense for entry_within_capacity too. It would just return None if there isn't capacity for 1 insertion.
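A sketch of the within-capacity idea, mirroring the shape of `Vec::push_within_capacity` (which returns `Result<(), T>` and hands the value back instead of reallocating). `FixedTable` and `insert_unchecked_within_capacity` are hypothetical names, and the toy stores elements in a flat `Vec` rather than a real hash table; only the no-grow contract is the point.

```rust
/// Hypothetical fixed-capacity table illustrating a "within capacity" insert:
/// instead of growing (and possibly panicking on allocation failure), the
/// call hands the value back when there is no room for one more element.
struct FixedTable<T> {
    items: Vec<(u64, T)>,
    capacity: usize,
}

impl<T> FixedTable<T> {
    fn with_capacity(capacity: usize) -> Self {
        // `Vec::with_capacity` reserves at least `capacity` slots up front,
        // so pushes below that limit never reallocate.
        FixedTable { items: Vec::with_capacity(capacity), capacity }
    }

    /// Like an unchecked insert, it does not look for an existing equal
    /// element; unlike one, it returns `Err(value)` instead of growing.
    fn insert_unchecked_within_capacity(&mut self, hash: u64, value: T) -> Result<(), T> {
        if self.items.len() == self.capacity {
            return Err(value);
        }
        self.items.push((hash, value));
        Ok(())
    }
}
```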

src/table.rs

JustForFun88 · 2023-09-24T14:17:25Z

src/table.rs

+        eq: impl FnMut(&T) -> bool,
+        hasher: impl Fn(&T) -> u64,
+    ) -> Entry<'_, T, A> {
+        match self.raw.find_or_find_insert_slot(hash, eq, hasher) {


I'm playing with HashTable and came across a small problem. Can we directly provide some version of the find_or_find_insert_slot (checked insert) function?
It's very annoying that every time I need to insert a value I have to use the entry syntax (which is not that cheap) or use find + insert_unchecked, which is slow.

It could look something like this:

    pub fn insert<V>(
        &mut self,
        hash: u64,
        value: T,
        mut eq: impl FnMut(&T, &T) -> bool,
        hasher: impl Fn(&T) -> u64,
        replace: impl FnOnce(&mut T, T) -> V,
    ) -> Option<V> {
        match self
            .raw
            .find_or_find_insert_slot(hash, |found_val| eq(found_val, &value), hasher)
        {
            // An equal element exists: let the caller decide how to merge.
            Ok(bucket) => Some(replace(unsafe { bucket.as_mut() }, value)),
            // No equal element: write into the free slot found by the same probe.
            Err(slot) => {
                unsafe {
                    self.raw.insert_in_slot(hash, slot, value);
                }
                None
            }
        }
    }

It could then be used like this:

    pub struct NewMap<K, V, S = DefaultHashBuilder, A: Allocator = Global> {
        pub(crate) hash_builder: S,
        pub(crate) table: HashTable<(K, V), A>,
    }

    impl<K, V, S, A> NewMap<K, V, S, A>
    where
        K: Eq + core::hash::Hash,
        S: core::hash::BuildHasher,
        A: Allocator,
    {
        pub fn insert(&mut self, k: K, v: V) -> Option<V> {
            let hash = make_hash::<K, S>(&self.hash_builder, &k);
            let hasher = make_hasher::<_, V, S>(&self.hash_builder);
            self.table.insert(
                hash,
                (k, v),
                |found, new| found.0 == new.0,
                hasher,
                |(_, val_ref), (_, val)| core::mem::replace(val_ref, val),
            )
        }
    }

entry already maps directly to find_or_find_insert_slot. You can then use Entry::insert to unconditionally overwrite an existing value, or Entry::or_insert to only insert a new value if an old one doesn't already exist.
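A toy model of why `entry` already covers the checked-insert case: one probe sequence either finds a matching element or stops at the first empty slot, and the entry remembers which, so `insert` (overwrite) and `or_insert` (keep existing) need no second lookup. This is a hypothetical sketch with linear probing; hashbrown's real probing uses SwissTable-style groups of control bytes, and `Probe`, `Entry` and the method bodies only imitate the behaviour described above.

```rust
/// Hypothetical open-addressing toy (linear probing, fixed size, no removal).
struct Probe<T> {
    slots: Vec<Option<(u64, T)>>,
}

enum Entry<'a, T> {
    Occupied(&'a mut T),
    Vacant { slot: &'a mut Option<(u64, T)>, hash: u64 },
}

impl<T> Probe<T> {
    fn with_slots(n: usize) -> Self {
        Probe { slots: (0..n).map(|_| None).collect() }
    }

    /// Ok(index of the match) or Err(index of the first empty slot on the
    /// probe path). Panics if the toy table is full and nothing matched.
    fn find_or_find_insert_slot(&self, hash: u64, mut eq: impl FnMut(&T) -> bool) -> Result<usize, usize> {
        let n = self.slots.len();
        for step in 0..n {
            let i = (hash as usize).wrapping_add(step) % n;
            match &self.slots[i] {
                None => return Err(i),
                Some((h, v)) => {
                    if *h == hash && eq(v) {
                        return Ok(i);
                    }
                }
            }
        }
        panic!("toy table is full");
    }

    /// A single probe; the resulting entry remembers where to write.
    fn entry(&mut self, hash: u64, eq: impl FnMut(&T) -> bool) -> Entry<'_, T> {
        match self.find_or_find_insert_slot(hash, eq) {
            Ok(i) => Entry::Occupied(&mut self.slots[i].as_mut().unwrap().1),
            Err(i) => Entry::Vacant { slot: &mut self.slots[i], hash },
        }
    }
}

impl<'a, T> Entry<'a, T> {
    /// Overwrites an existing element, or fills the vacant slot.
    fn insert(self, value: T) -> &'a mut T {
        match self {
            Entry::Occupied(v) => {
                *v = value;
                v
            }
            Entry::Vacant { slot, hash } => &mut slot.insert((hash, value)).1,
        }
    }

    /// Keeps an existing element; only fills the vacant slot.
    fn or_insert(self, value: T) -> &'a mut T {
        match self {
            Entry::Occupied(v) => v,
            Entry::Vacant { slot, hash } => &mut slot.insert((hash, value)).1,
        }
    }
}
```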



Co-authored-by: Josh Stone <[email protected]>

Amanieu · 2023-10-19T16:29:46Z

@bors r+

bors · 2023-10-19T16:29:49Z

📌 Commit b533626 has been approved by Amanieu

It is now in the queue for this repository.

Amanieu · 2023-10-19T16:30:11Z

@bors r+

bors · 2023-10-19T16:30:12Z

📌 Commit 9556bf4 has been approved by Amanieu

It is now in the queue for this repository.

bors · 2023-10-19T16:30:55Z

⌛ Testing commit 9556bf4 with merge ef84e09...

bors · 2023-10-19T16:51:42Z

☀️ Test successful - checks-actions
Approved by: Amanieu
Pushing ef84e09 to master...


Amanieu force-pushed the hashtable branch from 42ee24a to 763ccf4 on August 31, 2023 17:16

This was referenced Aug 31, 2023

Add a insert_unique_hashed_nocheck method #452

Closed

Add reinsert to RawOccupiedEntryMut #450

Closed

feat: add return val to replace_bucket_with replace_entry_with #419

Open

cuviper reviewed Sep 6, 2023

src/table.rs (outdated)

src/table.rs (outdated)

cuviper reviewed Sep 7, 2023

src/table.rs

cuviper reviewed Sep 7, 2023

src/table.rs (outdated)

Zoxc mentioned this pull request Sep 11, 2023

Optimize hash map operations in the query system rust-lang/rust#115747

Closed

Amanieu force-pushed the hashtable branch 3 times, most recently from 98e2f78 to c33ce2d on September 22, 2023 06:27

cuviper reviewed Sep 22, 2023

src/table.rs (outdated)

Zoxc reviewed Sep 23, 2023

src/table.rs (outdated)

Zoxc reviewed Sep 23, 2023

src/table.rs (outdated)

JustForFun88 reviewed Sep 24, 2023

Amanieu force-pushed the hashtable branch from 56fdb91 to a2cc3c7 on September 25, 2023 09:36

Amanieu added 3 commits October 19, 2023 00:44

Add HashTable::get_many_mut

a2b8f18

Add Send and Sync for hash_table::OccupiedEntry

3b8426e

Amanieu and others added 5 commits October 19, 2023 00:44

Add the ability to recover the original HashTable from an entry

06ba464

Update src/table.rs

cce9925

Co-authored-by: Josh Stone <[email protected]>

Make HashTable::find_entry return AbsentEntry on failure

05bee57

Rename insert_unchecked to insert_unique

878b5bf

Minor cleanups

cbbb823

Amanieu force-pushed the hashtable branch from a2cc3c7 to cbbb823 on October 19, 2023 16:06

Fix rustdoc warnings

9556bf4

Amanieu force-pushed the hashtable branch from b533626 to 9556bf4 on October 19, 2023 16:30

bors merged commit ef84e09 into rust-lang:master Oct 19, 2023
25 checks passed

bors mentioned this pull request Oct 19, 2023

Initial implementation of try_get_many #408

Draft
