0.4.1 (#19)
* add missing constructor methods to SegmentedHashMap (#17)

* add SegmentedHashMap BuildHasher constructors

* SegmentedHashMap#with_hasher: default capacity and number of segments,
  user-specified BuildHasher instance
* SegmentedHashMap#with_capacity_and_hasher: default number of segments,
  user-specified capacity and BuildHasher instance

* correctly implement SegmentedHashMap#default

I'm not exactly sure what the previous implementation would've done, but
it sure wouldn't have been what you wanted. At the very least, the
segment shift would've been wrong.

* add default test case

This will catch a faultily-implemented Default, such as for the previous
version of SegmentedHashMap

* Rewrite or update all documentation (#18)

* update SegmentedHashMap module and struct docs

A fair amount of this is copied from the documentation of
std::collections::HashMap.

* update SegmentedHashMap constructor documentation

* update SegmentedHashMap#capacity documentation

Note to self: update the segment module documentation to explain what
a segment is. Also, update the crate documentation to explain the
lockfree hash table algorithm.

* update SegmentedHashMap#get* docs

The standard library is a little terser when referring to the
requirements on `Q`, which is a nice change.

* update SegmentedHashMap#insert* docs

* update SegmentedHashMap#insert_*or_modify* docs

goodness is this a lot of functions to maintain

* update SegmentedHashMap#remove* docs

* update SegmentedHashMap#remove_*if* docs

* update SegmentedHashMap#modify docs

* update HashMap documentation

this is 100% copy-pasted from the SegmentedHashMap documentation I
wrote, except when it shouldn't have been.

* update segment module documentation

The new documentation explains the basics of segmented hash tables, as
well as the likely performance wins and losses for using it instead of
the regular hash table.

* update crate-level documentation

Describe, in *excruciating* detail, the gory details of the hash table
algorithm.

* update README.md

Shill for SegmentedHashMap a little bit

* bump version to 0.4.1
Gregory-Meyer authored Mar 11, 2020
1 parent cf85f62 commit b7770a0
Showing 7 changed files with 650 additions and 639 deletions.
2 changes: 1 addition & 1 deletion Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "cht"
version = "0.4.0"
version = "0.4.1"
authors = ["Gregory Meyer <[email protected]>"]
edition = "2018"
description = "Lockfree resizeable concurrent hash table."
9 changes: 6 additions & 3 deletions README.md
@@ -4,15 +4,18 @@
[![docs.rs](https://docs.rs/cht/badge.svg)](https://docs.rs/cht)
[![Travis CI](https://travis-ci.com/Gregory-Meyer/cht.svg?branch=master)](https://travis-ci.com/Gregory-Meyer/cht)

cht provides a lockfree hash table that supports concurrent lookups, insertions,
and deletions.
cht provides a lockfree hash table that supports fully concurrent lookups,
insertions, modifications, and deletions. The table may also be concurrently
resized to allow more elements to be inserted. cht also provides a segmented
hash table using the same lockfree algorithm for increased concurrent write
performance.

## Usage

In your `Cargo.toml`:

```toml
cht = "^0.3.0"
cht = "^0.4.1"
```

Then in your code:
72 changes: 68 additions & 4 deletions src/lib.rs
@@ -22,14 +22,78 @@
// CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.

//! Lockfree resizeable concurrent hash table.
//! Lockfree hash tables.
//!
//! The hash table in this crate was inspired by
//! [a blog post by Jeff Phreshing], which describes the implementation of a
//! hash table in [Junction].
//! The hash tables in this crate are, at their core, open addressing hash
//! tables implemented with boxed buckets. At the core of each table is a
//! bucket array, which consists of a vector of atomic pointers to buckets,
//! an atomic pointer to the next bucket array, and an epoch number. In the
//! context of this crate, an atomic pointer is a nullable pointer that is
//! accessed and manipulated using atomic memory operations. Each bucket
//! consists of a key and a possibly-uninitialized value.
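The layout described above can be sketched with standard-library atomics. The names `Bucket` and `BucketArray` and the field layout here are illustrative only; the crate's actual types (and its use of epoch-based memory reclamation) differ:

```rust
use std::mem::MaybeUninit;
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

// A bucket owns a key and a possibly-uninitialized value slot. In a real
// implementation the value is only initialized for live (non-tombstone)
// entries.
pub struct Bucket<K, V> {
    pub key: K,
    pub value: MaybeUninit<V>,
}

// A bucket array: a fixed-size vector of atomic, nullable bucket pointers,
// an atomic pointer to the next (larger) bucket array, and an epoch number.
pub struct BucketArray<K, V> {
    pub buckets: Vec<AtomicPtr<Bucket<K, V>>>,
    pub next: AtomicPtr<BucketArray<K, V>>,
    pub epoch: usize,
}

impl<K, V> BucketArray<K, V> {
    pub fn with_capacity(capacity: usize, epoch: usize) -> Self {
        BucketArray {
            // Every slot starts out as a null pointer, meaning "empty".
            buckets: (0..capacity)
                .map(|_| AtomicPtr::new(ptr::null_mut()))
                .collect(),
            next: AtomicPtr::new(ptr::null_mut()),
            epoch,
        }
    }
}

fn main() {
    let array: BucketArray<String, u32> = BucketArray::with_capacity(8, 0);
    // All bucket pointers start out null, and there is no next array yet.
    assert!(array
        .buckets
        .iter()
        .all(|p| p.load(Ordering::Relaxed).is_null()));
    assert!(array.next.load(Ordering::Relaxed).is_null());
    println!("capacity = {}", array.buckets.len());
}
```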
//!
//! The key insight into making the hash table resizeable is to incrementally
//! copy buckets from the old bucket array to the new bucket array. As buckets
//! are copied between bucket arrays, their pointers in the old bucket array are
//! CAS'd with a null pointer that has a sentinel bit set. If the CAS fails,
//! that thread must read the bucket pointer again and retry copying it into the
//! new bucket array. If at any time a thread reads a bucket pointer with the
//! sentinel bit set, that thread knows that a new (larger) bucket array has
//! been allocated. That thread will then immediately attempt to copy all
//! buckets to the new bucket array. It is possible to implement an algorithm in
//! which a subset of buckets are relocated per-thread; such an algorithm has
//! not been implemented for the sake of simplicity.
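Because buckets are boxed, their addresses are aligned, which leaves the low bits of every bucket pointer free to carry flags such as the sentinel bit. A minimal sketch of this pointer-tagging idea, with hypothetical bit assignments (the crate's actual tag values may differ):

```rust
// Tag bits packed into the low bits of aligned bucket pointers.
// These particular bit positions are illustrative, not the crate's.
const SENTINEL: usize = 0b001; // "this slot was relocated to a new array"
const TOMBSTONE: usize = 0b010; // "this entry was removed"
const BORROWED: usize = 0b100; // "this entry was copied from an old array"

fn with_tag(ptr: usize, tag: usize) -> usize {
    ptr | tag
}

fn has_tag(ptr: usize, tag: usize) -> bool {
    ptr & tag != 0
}

// Strip all tag bits to recover the actual bucket address.
fn untagged(ptr: usize) -> usize {
    ptr & !(SENTINEL | TOMBSTONE | BORROWED)
}

fn main() {
    // A resizing thread replaces a copied slot with null + sentinel:
    // readers that see it know to look in the new bucket array.
    let redirected = with_tag(0, SENTINEL);
    assert!(has_tag(redirected, SENTINEL));

    // A hypothetical 8-byte-aligned bucket address with a removal tag.
    let addr: usize = 0x1000;
    let removed = with_tag(addr, TOMBSTONE);
    assert!(has_tag(removed, TOMBSTONE));
    assert_eq!(untagged(removed), addr);
    println!("ok");
}
```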
//!
//! Bucket pointers that have been copied from an old bucket array into a new
//! bucket array are marked with a borrowed bit. If a thread copies a bucket
//! from an old bucket array into a new bucket array, then fails to CAS the
//! bucket pointer in the old bucket array, it attempts to CAS the bucket
//! pointer in the new bucket array that it previously inserted.
//! in the new bucket array does *not* have the borrowed tag bit set, that
//! thread knows that the value in the new bucket array was modified more
//! recently than the value in the old bucket array. To avoid discarding updates
//! to the new bucket array, a thread will never replace a bucket pointer that
//! has the borrowed tag bit set with one that does not. To see why this is
//! necessary, consider the case where a bucket pointer is copied into the new
//! array, removed from the new array by a second thread, then copied into the
//! new array again by a third thread.
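The rule above reduces to a single check a copier thread makes before finishing a copy. A sketch, again with a hypothetical borrowed-bit layout:

```rust
// Illustrative tag bit; the crate's actual layout may differ.
const BORROWED: usize = 0b100;

// A copier finishing a bucket copy may install its (borrowed-tagged) copy
// over the current pointer in the new array only if that pointer still
// carries the borrowed bit. An untagged pointer was written by a live
// insert, remove, or modify in the new array, so it is strictly newer
// than anything copied from the old array and must not be clobbered.
fn copier_may_replace(current: usize) -> bool {
    current & BORROWED != 0
}

fn main() {
    let post_resize_update = 0x2000; // untagged: written directly to the new array
    let borrowed_copy = 0x3000 | BORROWED; // installed by a resize copy

    assert!(!copier_may_replace(post_resize_update));
    assert!(copier_may_replace(borrowed_copy));
    println!("ok");
}
```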
//!
//! Mutating operations are, at their core, an atomic compare-and-swap (CAS) on
//! a bucket pointer. Insertions CAS null pointers and bucket pointers with
//! matching keys, modifications CAS bucket pointers with matching keys, and
//! removals CAS non-tombstone bucket pointers. Tombstone bucket pointers are
//! bucket pointers with a tombstone bit set as part of a removal; this
//! indicates that the bucket's value has been moved from and will be destroyed
//! if it has not been already.
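The common shape of these mutating operations is one compare-and-swap on a slot's atomic pointer. A stripped-down sketch of a single insertion attempt into an empty slot (real insertions also match keys, handle tags, and retry on failure; names here are hypothetical):

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

// One CAS attempt of an insertion: succeed only if the slot is still null
// (empty). On failure, a real implementation re-reads the slot and decides
// whether to retry, probe the next slot, or overwrite a matching key.
pub fn try_insert_empty<T>(slot: &AtomicPtr<T>, new: *mut T) -> bool {
    slot.compare_exchange(ptr::null_mut(), new, Ordering::AcqRel, Ordering::Acquire)
        .is_ok()
}

fn main() {
    let slot: AtomicPtr<u32> = AtomicPtr::new(ptr::null_mut());

    let first = Box::into_raw(Box::new(42u32));
    assert!(try_insert_empty(&slot, first)); // first CAS wins the empty slot

    let second = Box::into_raw(Box::new(7u32));
    assert!(!try_insert_empty(&slot, second)); // slot occupied; CAS fails

    // Clean up the raw allocations (a real table defers this with
    // epoch-based reclamation so concurrent readers stay safe).
    unsafe {
        drop(Box::from_raw(slot.load(Ordering::Acquire)));
        drop(Box::from_raw(second));
    }
    println!("ok");
}
```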
//!
//! As previously mentioned, removing an entry from the hash table results in
//! that bucket pointer having a tombstone bit set. Insertions cannot
//! displace a tombstone bucket unless their key compares equal, so once an
//! entry is inserted into the hash table, the specific index it is assigned to
//! will only ever hold entries whose keys compare equal. Without this
//! restriction, resizing operations could result in the old and new bucket
//! arrays being temporarily inconsistent. Consider the case where one thread,
//! as part of a resizing operation, copies a bucket into a new bucket array
//! while another thread removes and replaces that bucket from the old bucket
//! array. If the new bucket has a non-matching key, what happens to the bucket
//! that was just copied into the new bucket array?
//!
//! Tombstone bucket pointers are typically not copied into new bucket arrays.
//! The exception is the case where a bucket pointer was copied to the new
//! bucket array, then CAS on the old bucket array fails because that bucket has
//! been replaced with a tombstone. In this case, the tombstone bucket pointer
//! will be copied over to reflect the update without displacing a key from its
//! bucket.
//!
//! This hash table algorithm was inspired by [a blog post by Jeff Preshing]
//! that describes the implementation of the Linear hash table in [Junction], a
//! C++ library of concurrent data structures. Additional inspiration was drawn
//! from the lockfree hash table described by Cliff Click in [a tech talk] given
//! at Google in 2007.
//!
//! [a blog post by Jeff Preshing]: https://preshing.com/20160222/a-resizable-concurrent-map/
//! [Junction]: https://github.com/preshing/junction
//! [a tech talk]: https://youtu.be/HJ-719EGIts
pub mod map;
pub mod segment;
Expand Down
