Add filtering for CAGRA to C API #452

Open
wants to merge 11 commits into
base: branch-25.02

ajit283
Contributor

@ajit283 ajit283 commented Nov 8, 2024

Adds the CAGRA filtering feature to the C API, using a DLPack tensor as the blocklist.

@ajit283 ajit283 requested a review from a team as a code owner November 8, 2024 16:11

copy-pr-bot bot commented Nov 8, 2024

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions bot added the cpp label Nov 8, 2024
@cjnolet cjnolet added improvement Improves an existing functionality non-breaking Introduces a non-breaking change labels Nov 20, 2024
Member

@benfred benfred left a comment

thanks for the PR! It'd be great to get this functionality into the cagra c-api

cpp/include/cuvs/neighbors/cagra.h (outdated, resolved)
Member

@benfred benfred left a comment

lgtm!

Member

benfred commented Nov 29, 2024

/merge

Member

benfred commented Nov 29, 2024

/ok to test

cuvs::neighbors::cagra::search(
*res_ptr, search_params, *index_ptr, queries_mds, neighbors_mds, distances_mds);
} else if (filter.type == BITSET) {
using filter_mdspan_type = raft::device_vector_view<std::uint32_t, int64_t, raft::row_major>;
Contributor Author

@ajit283 ajit283 Nov 29, 2024

I think the build fails because of the std::uint32_t as the first type argument. The type arguments should be <index_t, index_t>, see raft bitset.hpp:

template <typename bitset_t = uint32_t, typename index_t = uint32_t>
struct bitset {
  static constexpr index_t bitset_element_size = sizeof(bitset_t) * 8;

  /**
   * @brief Construct a new bitset object with a list of indices to unset.
   *
   * @param res RAFT resources
   * @param mask_index List of indices to unset in the bitset
   * @param bitset_len Length of the bitset
   * @param default_value Default value to set the bits to. Default is true.
   */
  bitset(const raft::resources& res,
         raft::device_vector_view<const index_t, index_t> mask_index,
         index_t bitset_len,
         bool default_value = true);
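
To illustrate the mismatch described above, here is a minimal host-only sketch. The names `vector_view_sketch` and `bitset_sketch` are hypothetical stand-ins, not RAFT types, and `index_t = int64_t` is an assumption about how the bitset is instantiated in this PR. It only shows that a view whose element type is `std::uint32_t` cannot be passed where the constructor expects a view of `const index_t`.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for a non-owning vector view; NOT raft::device_vector_view.
template <typename ElementT, typename IndexT>
struct vector_view_sketch {
  ElementT* data;
  IndexT size;
};

// Mirrors only the shape of the quoted constructor: the index list must be
// a view over `const index_t`, indexed by `index_t`.
template <typename bitset_t = std::uint32_t, typename index_t = std::uint32_t>
struct bitset_sketch {
  bitset_sketch(vector_view_sketch<const index_t, index_t> mask_index, index_t bitset_len) {}
};

// Assumption: the bitset is instantiated with a 64-bit index type.
using sketch_i64 = bitset_sketch<std::uint32_t, std::int64_t>;

// View type as written in the diff: element type uint32_t, index type int64_t.
using view_from_diff = vector_view_sketch<std::uint32_t, std::int64_t>;
// View type the constructor asks for when index_t is int64_t.
using view_expected = vector_view_sketch<const std::int64_t, std::int64_t>;

// Compile-time checks: only the <const index_t, index_t> view is accepted.
constexpr bool diff_view_accepted =
  std::is_constructible_v<sketch_i64, view_from_diff, std::int64_t>;
constexpr bool expected_view_accepted =
  std::is_constructible_v<sketch_i64, view_expected, std::int64_t>;
```

In this sketch `diff_view_accepted` is `false` and `expected_view_accepted` is `true`, which matches the suggestion that both type arguments should be `index_t`.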

Contributor

Hmm actually I missed this. The function you are using here is creating a bitset from a list of indices and I don't think it is the workflow that we expect.
The C++ function accepts a bitset_view, not a bitset, so at this point the memory for the bitset should already be allocated, and we just need to pass along the pointer and the length of the bitset. The C function should also assume that the filter given as input is a bitset that is already allocated and filled, rather than a list of neighbors to filter. So the filter taken as a parameter in this function should be handled as a bitset_view object.
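
As a rough host-side sketch of the workflow described here: the caller packs the filter into 32-bit words ahead of time, and the search side only wraps the existing pointer and length in a non-owning view. `BitsetViewSketch` and `make_filter_words` are hypothetical names for illustration, not RAFT or cuVS APIs, and the convention that a set bit means "row is kept" is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical non-owning view: pointer + length, no allocation, no copy.
// This is NOT raft's bitset_view; it only mirrors the idea.
struct BitsetViewSketch {
  const std::uint32_t* words;  // caller-owned storage, already filled
  std::int64_t n_bits;         // number of valid bits (one per dataset row)

  static constexpr std::int64_t kBitsPerWord = sizeof(std::uint32_t) * 8;

  // Number of 32-bit words needed to hold n_bits bits.
  std::int64_t n_words() const { return (n_bits + kBitsPerWord - 1) / kBitsPerWord; }

  // True if bit i is set. Assumed convention: set bit = row is kept.
  bool test(std::int64_t i) const {
    return ((words[i / kBitsPerWord] >> (i % kBitsPerWord)) & 1u) != 0;
  }
};

// Caller-side packing: start with every row allowed, then clear the blocked rows.
inline std::vector<std::uint32_t> make_filter_words(std::int64_t n_bits,
                                                    const std::vector<std::int64_t>& blocked) {
  const std::int64_t bits = BitsetViewSketch::kBitsPerWord;
  std::vector<std::uint32_t> words(static_cast<std::size_t>((n_bits + bits - 1) / bits),
                                   0xFFFFFFFFu);
  for (std::int64_t i : blocked) {
    words[static_cast<std::size_t>(i / bits)] &= ~(std::uint32_t{1} << (i % bits));
  }
  return words;
}
```

With this split, the function receiving the filter never sees the list of blocked indices; it only receives `words.data()` and the bit count, which is the "transfer the pointer and the length" step mentioned above.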

cpp/test/neighbors/ann_cagra_c.cu (outdated, resolved)
Contributor

@lowener lowener left a comment

LGTM

Contributor

lowener commented Dec 4, 2024

/ok to test

@ajit283 ajit283 changed the base branch from branch-24.12 to branch-25.02 December 10, 2024 11:51
Member

cjnolet commented Dec 10, 2024

/ok to test

  */
 cuvsError_t cuvsCagraSearch(cuvsResources_t res,
                             cuvsCagraSearchParams_t params,
                             cuvsCagraIndex_t index,
                             DLManagedTensor* queries,
                             DLManagedTensor* neighbors,
-                            DLManagedTensor* distances);
+                            DLManagedTensor* distances,
+                            cuvsFilter filter);
Contributor

This API change needs to be propagated to:

  • the python package
  • the example C project (cuvs/example/c)
  • probably the rust package

Labels
cpp improvement Improves an existing functionality non-breaking Introduces a non-breaking change
Projects
Status: In Progress
4 participants