Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use a memory limited hashset with LocalVocab #1570

Open
wants to merge 16 commits into
base: master
Choose a base branch
from

Conversation

s1dharth-s
Copy link
Collaborator

  • Added CustomHashSetWithMemoryLimit: a wrapper class around absl::node_hash_set which tracks the memory used by the elements of the hashset.
  • Modified LocalVocab to use the wrapper hashset implementation.

Copy link

codecov bot commented Oct 21, 2024

Codecov Report

Attention: Patch coverage is 91.17647% with 6 lines in your changes missing coverage. Please review.

Project coverage is 89.20%. Comparing base (1bcfeeb) to head (a73c84a).

Files with missing lines Patch % Lines
src/util/HashSet.h 90.76% 4 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1570      +/-   ##
==========================================
- Coverage   89.21%   89.20%   -0.01%     
==========================================
  Files         372      373       +1     
  Lines       34723    34789      +66     
  Branches     3915     3919       +4     
==========================================
+ Hits        30979    31035      +56     
- Misses       2471     2480       +9     
- Partials     1273     1274       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@joka921 joka921 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A first round of reviews.

src/engine/LocalVocab.cpp Show resolved Hide resolved
src/util/HashSet.h Outdated Show resolved Hide resolved
src/util/HashSet.h Outdated Show resolved Hide resolved
src/util/HashSet.h Outdated Show resolved Hide resolved
src/util/HashSet.h Outdated Show resolved Hide resolved
@@ -32,4 +37,88 @@ template <class T,
using HashSetWithMemoryLimit =
absl::flat_hash_set<T, HashFct, EqualElem, Alloc>;

template <class T,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The HashSet has a Slot array which might have more slots than elements and might rehash with size doubling or so .
Also take this into account, it requires some more interaction with the absl code, but the documentation is good and the interface is all there.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added the logic to track the slot sizes. But I do not yet know how to deal with the size doubling.

src/util/HashSet.h Show resolved Hide resolved
src/parser/LiteralOrIri.h Show resolved Hide resolved
@@ -90,5 +91,7 @@ class Literal {
static Literal literalWithoutQuotes(
std::string_view rdfContentWithoutQuotes,
std::optional<std::variant<Iri, std::string>> descriptor = std::nullopt);

size_t getDynamicMemoryUsage() const { return content_.capacity(); }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually you can figure out if the small buffer optimization applies.
(basically check if &content <= content.data() < (&content + sizeof(content)).
(but again, not supejr important).

src/engine/LocalVocab.h Outdated Show resolved Hide resolved
src/util/HashSet.h Outdated Show resolved Hide resolved
@joka921 joka921 marked this pull request as ready for review November 7, 2024 12:37
Copy link
Member

@joka921 joka921 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A preliminary round of reviews.

src/util/HashSet.h Outdated Show resolved Hide resolved
src/util/HashSet.h Outdated Show resolved Hide resolved
src/util/HashSet.h Outdated Show resolved Hide resolved
src/util/HashSet.h Outdated Show resolved Hide resolved
src/util/HashSet.h Outdated Show resolved Hide resolved
test/LocalVocabTest.cpp Outdated Show resolved Hide resolved
src/engine/LocalVocab.h Outdated Show resolved Hide resolved
src/engine/LocalVocab.h Outdated Show resolved Hide resolved
src/parser/Iri.h Outdated
// memory usage as this does not currently take into account small string
// optimization of `std::string`
size_t getDynamicMemoryUsage() const {
return sizeof(std::string) + iri_.capacity();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have to make clear, whether the hash set expects only the dynamic part of the memory usage, without the sizeof() part (probably yes, as this is managed differently by a lot of containers).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed this from Iri.h and Literal.h. I think the allocator (I am not sure if I could use the exisiting AllocatorWithLimit or if I need to write a new one??) could keep track of the static memory while the SizeGetters just return the dynamic memory usage.


// Try to allocate the amount of memory requested
void increaseMemoryUsed(ad_utility::MemorySize amount) {
memoryLeft_.ptr()->wlock()->decrease_if_enough_left_or_throw(amount);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a little wasteful, as we always have to obtain a global mutex.
I think for a follow-up (or preparation) PR,
You could implement a wrapper around the memoryLeftThreadsafe object, that stores a (single-threaded) small pool of memory and only goes to the global wlock() when that pool is exhausted.
Currently increase (1), increase(1), increase(1) needs a global synchronization for each of the three inserts, which seems wastefult.
(But as you have abstracted away all the memory handling, that is easy to integrate.)

…the memory usage instead of `ad_utility::MemorySize`
…ry used by its `AllocationMemoryLeftThreadSafe` object.
…iteralOrIri` objects. Also added comments.
…. Change the default template argument of `SizeGetter` from `SizeOfSizeGetter` to `DefaultValueSizeGetter`.
…ables used to count slot size to have same name. Initialize `memoryUsed_` and `currentNumSlots_` in class.
@sparql-conformance
Copy link

Copy link

sonarcloud bot commented Nov 12, 2024

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants