low memory population inquiry #25

slimsag · 2021-08-20T20:40:41Z

Hello - super interesting work on the binary fuse filters. I'm enjoying them a lot. I noticed the readme says:

The construction of a binary fuse filter is fast but it needs a fair amount of temporary memory: plan for about 24 bytes of memory per set entry. It is possible to construct a binary fuse filter with almost no temporary memory, but the construction is then somewhat slower.

I'm exploring application of binary fuse filters in search engines (using them as an ngram set lookup). I previously was investigating moving the memory allocations from the population code ot instead utilize mmaped system memory as a means to reduce the physical memory requirements of populating filters - but to hear you think it may be possible to accept a slower construction time with almost no temporary memory is extremely interesting to me.

I am curious if anyone has had more in-depth thoughts about how this would be approached, or tried an implementation of this? If not I will likely try my hand at it, just figured I'd ask in case anyone already had and I might be spared a few hours :)

lemire · 2021-08-20T20:48:57Z

@thomasmueller do you want to answer @slimsag ?

lemire · 2021-08-24T14:06:30Z

I am sure that @thomasmueller will chime in at some point.

The short story is that it is possible to do various tradeoffs with respect to temporary memory usage. We published what we feel is the best default. We expect that it should work very well.

For the time being, we decided not to push alternative constructions.

hskun · 2021-11-06T10:18:25Z

I also need the low memory population.
I'm doing a project to process about 2^26 items hash use binary_fuse16.
population need lots memory.
please help.

thomasmueller · 2021-11-06T16:34:07Z

I'm sorry for the delay! Yes, memory usage can be reduced a lot, to the point where you basically only need 64 bits per key. There are two ways I know of:

(A) Split the set into 16 subsets (chunks), using 4 bits of the key (or hash of the key). This idea is very old, and once you understand how it works, it is quite easy (I think)... So, instead of 1 filter, you have 16 smaller filters. You can construct them one-by-one so that you need less temporary memory (basically reduce the memory usage by a factor of 16). Disadvantage: lookup may be a tiny bit slower, because you now have 16 filters instead of just 1. Well, it's might not actually be slower - you have to test it. A second advantage: you can construct multiple smaller filters concurrently, if you have enough memory.

(B) Using a special construction algorithm that preserves memory, at the expense of being slower. For the binary fuse filter, construction will be maybe half as fast than now (I don't have the exact numbers). Lookup is unchanged. I did work on a prototype implementation.

Approach (A) sounds more useful. It is relatively simple, and has the added advantage of the parallel construction. Let me know if you have any question, or if you are really interested in approach (B).

Regards,
Thomas

slimsag · 2021-11-07T00:32:24Z

Do I understand correctly that approach A above also requires no change to the xorfilter code, it merely requires constructing 16 independent xor filters (with smaller bit size)? That's smart

thomasmueller · 2021-11-07T20:24:34Z

Yes, approach A requires no change. The 16 independent xor filters would have have the same "bits per entry" size as one large filter.

hskun · 2021-11-10T00:38:05Z

(B) Using a special construction algorithm that preserves memory, at the expense of being slower. For the binary fuse filter, construction will be maybe half as fast than now (I don't have the exact numbers). Lookup is unchanged. I did work on a prototype implementation.

Hi Thomas,
thanks your explain.
I really interested B ,can you help?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

low memory population inquiry #25

low memory population inquiry #25

slimsag commented Aug 20, 2021

lemire commented Aug 20, 2021

lemire commented Aug 24, 2021

hskun commented Nov 6, 2021

thomasmueller commented Nov 6, 2021

slimsag commented Nov 7, 2021

thomasmueller commented Nov 7, 2021

hskun commented Nov 10, 2021

low memory population inquiry #25

low memory population inquiry #25

Comments

slimsag commented Aug 20, 2021

lemire commented Aug 20, 2021

lemire commented Aug 24, 2021

hskun commented Nov 6, 2021

thomasmueller commented Nov 6, 2021

slimsag commented Nov 7, 2021

thomasmueller commented Nov 7, 2021

hskun commented Nov 10, 2021