-
Notifications
You must be signed in to change notification settings - Fork 29
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Select vs. ranked for succinct counting blocked bloom filter #28
Comments
Yes, documentation is needed... I'm currently away, and won't have time this week sorry. But next week should be ok. |
So, the "succinct counting blocked Bloom filter", as the name implies, is a blocked Bloom filter, and at the same time a counting Bloom filter. The data structure contains 3 areas:
There are two ways to succinctly count how often the bits in the first area were set: using "rank", and using "select". Using "select" may be a bit easier to explain. Say, in the Bloom filter area, we have a block of 64 bits as follows: 00000000 00000000 00000000 00000000 00000000 00000000 00000001 00000001 So only bit 0 and bit 8 are set. The question is, how often (the count). This is stored in a block of 64 bits in the second area. Only bits that are set in the first area need to be counted, so we only need 2 counters. Those are encoded in the number of zeros between two bit that are set. E.g. for counts 1 (for Bloom filter bit 0) and 3 (for Bloom filter bit 8), it is: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00001001 If we increment the count for bit 0 from 1 to 2, we shift all the bits to the left, and get: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00010010 If we decrement the count for bit 0 from 1 to 0, we reset the low bit and shift, so we get: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000100 This was for "select". (I called it "select" because the count is read using the "select" operation). "Rank" is similar to a https://en.wikipedia.org/wiki/Hash_array_mapped_trie. For "rank", we count the number of bits that are set in the Bloom filter area (in this case it's 2). So the lowest 2 bits are used to say whether the count is higher than 1. For those that are set, we need one more bit in this area, to the left, until there are none left. For counts 1 (for Bloom filter bit 0) and 3 (for Bloom filter bit 8), it is: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000110 If we increment the count for bit 0 from 1 to 2, we set the last bit and shift, to get: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00001011 Both "rank" and "select" use roughly the same space (I think rank uses a bit less on average). The overflow area is needed if a block in the counter area grows too large, in which case a pointer to the overflow area is stored, and the highest bit is set. Well I see it is very, very hard to explain, and understand. I'm afraid I need to give up here... In order to explain this, more time, and more examples are needed... |
What is the difference between the select version and the ranked version? Thanks.
The text was updated successfully, but these errors were encountered: