Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HashMap reports zero capacity without freeing memory #79178

Open
Diggsey opened this issue Nov 18, 2020 · 6 comments · May be fixed by rust-lang/hashbrown#436
Open

HashMap reports zero capacity without freeing memory #79178

Diggsey opened this issue Nov 18, 2020 · 6 comments · May be fixed by rust-lang/hashbrown#436
Labels
A-collections Area: `std::collection` C-bug Category: This is a bug. T-libs Relevant to the library team, which will review and decide on the PR/issue.

Comments

@Diggsey
Copy link
Contributor

Diggsey commented Nov 18, 2020

It's generally expected that when std collections return a capacity of zero, that they are not backed by any allocation. I believe this was also true for the HashMap prior to hashbrown, although I could be wrong.

In any case, whilst HashMap::capacity() is documented as a lower bound I still find it extremely surprising that it reports a capacity of zero without freeing its memory.

You can reproduce the issue with the following program (using mockalloc to detect the leaked memory):

use std::alloc::System;
use std::collections::hash_map::{DefaultHasher, HashMap};
use std::hash::BuildHasherDefault;

use mockalloc::Mockalloc;

#[global_allocator]
static ALLOCATOR: Mockalloc<System> = Mockalloc(System);

const NUMBERS: &[usize] = &[
    25, 38, 41, 42, 89, 115, 184, 237, 273, 286, 300, 326, 377, 413, 482, 536, 602, 650, 702, 746,
    750, 807, 810, 836, 960, 979, 982, 1007,
];

fn main() {
    let mut guard =
        HashMap::<String, (), _>::with_hasher(BuildHasherDefault::<DefaultHasher>::default());
    mockalloc::assert_allocs(|| {
        for &n in NUMBERS {
            let k = n.to_string();
            guard.insert(k, ());
        }
        for &n in NUMBERS {
            let k = n.to_string();
            guard.remove(&k);
            if guard.len() * 3 < guard.capacity() {
                guard.shrink_to_fit();
            }
        }
        eprintln!("Capacity: {}", guard.capacity());
        //guard.shrink_to_fit();
    });
}

Given these lines:

if guard.len() * 3 < guard.capacity() {
    guard.shrink_to_fit();
}

I expected all the memory to be reclaimed by the end of the loop. However, this is not the case: the memory is only reclaimed if shrink_to_fit() is called an additional time.

IMO, the fix should be two-fold:

  1. When the last element of a HashMap is removed, it should automatically clear out any tombstones so that the capacity() method is accurate again.

  2. A new method should be introduced to obtain the upper bound on the capacity. At the moment code which attempts to shrink under-utilized hashmaps is broken, and it's not possible to fix this code without a new method.

@Diggsey Diggsey added the C-bug Category: This is a bug. label Nov 18, 2020
@jyn514 jyn514 added T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. A-collections Area: `std::collection` labels Nov 18, 2020
@poliorcetics
Copy link
Contributor

I don’t exactly remember the API for hashbrown but I believe it doesn’t offer those capabilities so the work will have to be done there before being added to alloc/std

@gowrizrh
Copy link

gowrizrh commented Feb 5, 2021

@Diggsey I looked into this, and it kind of seems intentional?

Given these lines:

if guard.len() * 3 < guard.capacity() {
    guard.shrink_to_fit();
}

I expected all the memory to be reclaimed by the end of the loop. However, this is not the case: the memory is only reclaimed if shrink_to_fit() is called an additional time.

That condition is never true, and is not executed at all. So the line that's commented out is the first time it is actually called.

Regardless, hashbrown never drops (well as of now atleast) when an item is requested to be removed. It erases the element from the inner table, but does not drop it.

So when the last guard.shrink_to_capacity() is called (for the first time) it seems to work just fine.

It's generally expected that when std collections return a capacity of zero, that they are not backed by any allocation. I believe this was also true for the HashMap prior to hashbrown, although I could be wrong.

I was curious so I wanted to test this out so I tried creating a Vec, added some numbers and removed it and the capacity never reduced as it did with the HashMap (which maybe expected)

but, interestingly in the above code example, when replacing the HashMap with a HashSet (which also internally uses hashbrown) the capacity returned changes on each run?

Update: Nevermind, I completely missed setting the default hasher.

@Diggsey
Copy link
Contributor Author

Diggsey commented Feb 5, 2021

That condition is never true, and is not executed at all. So the line that's commented out is the first time it is actually called.

Yes... that's the issue. The expression should evaluate to true, but it's not because there is a situation where hashbrown reports a capacity of zero, but the backing memory has not been freed.

Regardless, hashbrown never drops (well as of now atleast) when an item is requested to be removed.

That's not the issue: the issue is that capacity can report zero when the backing memory is still allocated. Users of standard collections expect capacity == 0 to mean that the memory is freed.

So when the last guard.shrink_to_capacity() is called (for the first time) it seems to work just fine.

The method is shrink_to_fit: it doesn't make sense for this method to do anything if the capacity is already zero: there should be nothing left to shrink.

@gowrizrh
Copy link

gowrizrh commented Feb 5, 2021

Yes... that's the issue. The expression should evaluate to true, but it's not because there is a situation where hashbrown reports a capacity of zero, but the backing memory has not been freed.

Why should that evaluate to true? Are you saying that the capacity should reduce appropriately when items are removed?

@Diggsey
Copy link
Contributor Author

Diggsey commented Feb 5, 2021

Why should that evaluate to true? Are you saying that the capacity should reduce appropriately when items are removed?

The way hashbrown works, the capacity already reduces (in some cases) when items are removed, and that's what causes the issue.

The logic of the condition is this: "if the hashmap is less than 1/3 full, then shrink it to fit the number of items".

At some point whilst removing items, this should result in the hashmap becoming empty, since at some point the number of items should hit zero whilst the capacity is non-zero (in which case the hashmap is less than 1/3 full, since 0 < 1/3).

The problem is that very occasionally, when you remove the last item from the hashmap, the capacity reported by the map also drops to zero, even though no memory has been freed.

@gowrizrh
Copy link

gowrizrh commented Feb 5, 2021

Ah! gotcha, sorry I was understanding this the wrong way around

@Enselic Enselic added T-libs Relevant to the library team, which will review and decide on the PR/issue. and removed T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Dec 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-collections Area: `std::collection` C-bug Category: This is a bug. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants