-
Notifications
You must be signed in to change notification settings - Fork 70
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FEA] Strongly filtered CAGRA #480
Comments
Related: BFKNN as a strongly-filtered CAGRA replacement #252 |
In general, in order to achieve a good recall when the filtering rate is high, it is necessary to traverse more nodes. For example, if the the filtering rate is very high at 99%, it will be difficult to achieve a recall similar to that without filtering unless you traverse roughly 100 times as many nodes. To do this, the size of the hash table would also need to be increased by a factor of 100, but this is not very practical. We have just submitted a PR for a new multi-CTA algorithm that can reduce the size of the hash table by a factor of 1/32 to 1/64, so I think it would be good to use this new multi-CTA algorithm to address this issue. |
@anaruse what would you think about using a dataset-long bitset in place of the hashmap here as an experiment? The bitset I guess will have to stay in global memory; do you think the performance will be unreasonably bad? |
Using a dataset-long bitset would also be a good approach. Perhaps performance is not an issue when using a bitset, it is the memory usage that should be a concern. I think it is a good idea to use bitset when the memory usage of the hash table exceeds the memory usage of the bitset. |
CAGRA has been observed to yield low recall when filtering is enabled, especially when the ratio of filtered-out values is high. This can be related in part to #208 and #472 , but there also may be fundamental reasons for the lower recall.
This feature request tracks the progress and suggestions to enable high-recall strongly filtered CAGRA.
As an experiment, I suggest to try the following tweaks, enabled by a boolean search parameter:
The text was updated successfully, but these errors were encountered: