Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HashTable iterator over entries #599

Open
TylerBloom opened this issue Dec 11, 2024 · 4 comments
Open

HashTable iterator over entries #599

TylerBloom opened this issue Dec 11, 2024 · 4 comments

Comments

@TylerBloom
Copy link

TylerBloom commented Dec 11, 2024

Hello,
I have a crate that creates a key-to-key map (i.e. a "double-sided" map). This is implemented with two maps that index into each other. I want to implement an iterator similar to hash_table::ExtractIf, but I must provide an reference to both the left and right elements to the predicate and have that iterator yield both elements.

The problem is that there is no direct way to solve this with the existing APIs to the HashTable and still make the borrow checker happy. The best solution that I've come up with the existing APIs is to use HashTable::extract_if on one table to "smuggle out" elements from the other side and then pull the smuggled item out with Iterator::map. This works but is far from ideal:

let push = Rc::new(RefCell::new(None));
let pull = push_cache.clone();
let right_set = &mut self.right_set;
self.left_set
    .extract_if(move |left| {
        let Ok(right_entry) =
            right_set.find_entry(left.hash, hash_and_id(left.hash, left.id))
        else {
            // NOTE: This shouldn't happen.
            return false;
        };
        if f(&left.value, &right_entry.get().value) {
            push.borrow_mut().insert(right_entry.remove().0);
            true
        } else {
            false
        }
    })
    .map(move |l| {
        let Some(r) = pull.borrow_mut().take() else {
            panic!()
        };
        (l.value, r.value)
    })

However, if there were an iterator over the hash table that yielded OccupiedEntrys rather than just references, this iterator would be trivial to construct. I could zip the two iterators together, pass the references to the predicate, and, if true, call OccupiedEntry::remove on both entries.

Would such an iterator be possible to implement for HashTable?

@clarfonthey
Copy link
Contributor

So, just to be clear, you want an iterator that yields OccupiedEntry for occupied entries and… something else, for vacant entries?

Note that we can't return VacantEntry because that requires storing a key, and we won't know what the key is for empty entries. Perhaps Option<OccupiedEntry> or something similar would work.

In this case, does it even need to be OccupiedEntry? Would it be possible to just return Option<(&K, &V)> for occupied entries?

@Amanieu
Copy link
Member

Amanieu commented Dec 11, 2024

I think the solution for this is the same as #588: some form of cursor API which allows you to advance through the table while choosing at each step whether to remove the element.

The API could look something like this (very rough draft):

struct ExtractCursor<'a>;

impl ExtractCursor<'_> {
    fn get(&self) -> &T;
	fn get_mut(&mut self) -> &mut T;
    fn remove(&mut self) -> T;
    fn next(&mut self) -> bool;
}

@TylerBloom
Copy link
Author

TylerBloom commented Dec 11, 2024

So, just to be clear, you want an iterator that yields OccupiedEntry for occupied entries and… something else, for vacant entries?

Sorry, I should have been more clear. I only want to iterate over the occupied entries so that I can (optionally) remove values from them. I do not have a need to iterate over the vacant entries (nor do I know how one would do that). To put this into context, the code sample I provided could be rewritten (loosely) like this with an iterator over entries:

// NOTE: Each table contains wrapper structs that contain data to point 
self.left_set.iter_entries()
             // Get the corresponding entry from the right table
             .map(|l| (l, self.right_set.find_entry(l.hash, hash_and_id(l.hash, l.id)).unwrap())))
             // Pass it through the predicate (similar to `extract_if` or `retain`)
             .filter(|(l, r)| predicate(l.get(), r.get()))
             // Call remove on the entries because they passed through and yield both items
             .map(|(l, r)| (l.remove(), r.remove()))

The cursor API could also work. The solution would look a lot different, but that's fine by me.

@cuviper
Copy link
Member

cuviper commented Dec 11, 2024

Note that the current OccupiedEntry holds a &mut HashTable, which means we cannot have multiple entries in existence at the same time -- the Iterator API is too flexible for what you're trying to do. A cursor avoids that problem by only ever having one instance for the table, and you can extract items without your Item counting as a table borrow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants