You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
While the xor filters are what is new and exciting about the contents of this repository, the other filters are good and there are tradeoffs which make them useable. In order to make an accessibly library out of this work, I think it needs a simple API with feature discoverability.
There are a few considerations:
Mutability - currently modeled with checks and mutator methods which can throw, rather than types
Some filters support addition of elements after construction (supportsAdd/add)
Some filters support removal of elements after construction (supportsRemove/remove)
Bits per key - Some filter implementations parameterise this but others don't, and it feels like an implementation detail that this aspect is not orthogonal to the concept of filter type. At the moment, there is the FilterType enum which is handy if you already know what you want and understand the implementations. Constructing a filter this way requires a setting parameter (roughly the number of bits per key) which is unnecessary when the filter implementation has the number of bits per key "baked in", such as Xor8/Xor16.
Memory consumption during construction all filters require users to buffer keys into a potentially large long[] prior to construction, which will cause problems in some VM applications.
Some filters here could easily avoid doing this, and could be constructed from an iterator type (e.g. Bloom, Cuckoo).
XorN filters require the user to first construct an array of keys, then itself creates another copy of the keys called reverseOrder, which is an unavoidable part of the construction algorithm, but is quite costly in memory. An iterator type could still be used instead of a long[] for the user input to reduce the total cost of construction here.
Seeds For verification purposes it is advantageous that wherever/whenever a filter is built from the same keys, the bitwise result is the same. I don't believe this is possible without externalising the seed to the user as a deterministic iterator type.
I propose new interfaces:
interfaceFilter {
booleanmayContain(longkey);
// serialisation can be specified at this level
}
interfaceMutableFilterextendsFilter {
voidadd(longkey);
}
interfaceRemovableFilterextendMutableFilter {
voidremove(longkey);
}
I propose a simple builder style API which exposes these concerns and provides sensible defaults.
The type of filter would be first choice the user needs to make (bloom/xor/cuckoo/xorplus).
Depending on this choice, the user is forced to accept building mutable/immutable/removable filters: you can't try to build an xor filter and end up with a remove or an add method. The user can then provide a number of bits per key, falling back to sensible values when not supplied, and according to a basic fallback policy whenever the precise choice isn't possible. The user can provide a seed iterator, which falls back to something similar to calls to Hash.randomSeed(), and this would replace the calls to new Random().nextLong() everywhere. The complicated taxonomy of bloom filter variants can take parameters in a similar manner.
This would look something like
varxorBuilder = Filter.xor()
.withBitsPerKey(9) // promotes to 16
.withSeedProvider(newSplittableRandom(0)::nextLong);
...
// get a succinct counting blocked bloom filtervarbloomBuilder = Filter.bloom()
.withBitsPerKey(11)// just pass this on to the bloom filter implementation
.blocked()
.counting();
For building, the user gets a choice between passing arrays and an LongIterator:
Yes I agree. I don't have time currently to give a detailed answer however... One thing to consider is try to separate the interfaces from the implementation, for OSGi / backward compatibility. I'm not saying we need to support OSGi currently... But having to import implementation classes in interface definitions sounds wrong. So the builders should be separated from the interfaces I think.
While the xor filters are what is new and exciting about the contents of this repository, the other filters are good and there are tradeoffs which make them useable. In order to make an accessibly library out of this work, I think it needs a simple API with feature discoverability.
There are a few considerations:
supportsAdd
/add
)supportsRemove
/remove
)FilterType
enum which is handy if you already know what you want and understand the implementations. Constructing a filter this way requires asetting
parameter (roughly the number of bits per key) which is unnecessary when the filter implementation has the number of bits per key "baked in", such asXor8
/Xor16
.long[]
prior to construction, which will cause problems in some VM applications.Bloom
,Cuckoo
).XorN
filters require the user to first construct an array of keys, then itself creates another copy of the keys calledreverseOrder
, which is an unavoidable part of the construction algorithm, but is quite costly in memory. An iterator type could still be used instead of along[]
for the user input to reduce the total cost of construction here.I propose new interfaces:
I propose a simple builder style API which exposes these concerns and provides sensible defaults.
The type of filter would be first choice the user needs to make (bloom/xor/cuckoo/xorplus).
Depending on this choice, the user is forced to accept building mutable/immutable/removable filters: you can't try to build an xor filter and end up with a remove or an add method. The user can then provide a number of bits per key, falling back to sensible values when not supplied, and according to a basic fallback policy whenever the precise choice isn't possible. The user can provide a seed iterator, which falls back to something similar to calls to
Hash.randomSeed()
, and this would replace the calls tonew Random().nextLong()
everywhere. The complicated taxonomy of bloom filter variants can take parameters in a similar manner.This would look something like
For building, the user gets a choice between passing arrays and an
LongIterator
:The text was updated successfully, but these errors were encountered: