[RELEASE] cuvs v24.12 #523

GPUtester · 2024-12-06T19:13:52Z

❄️ Code freeze for `branch-24.12` and v24.12 release

What does this mean?

Only critical/hotfix level issues should be merged into branch-24.12 until release (merging of this PR).

What is the purpose of this PR?

Update documentation
Allow testing for the new release
Enable a means to merge branch-24.12 into main for the release

Forward-merge branch-24.10 into branch-24.12

Authors: - Micka (https://github.com/lowener) - Corey J. Nolet (https://github.com/cjnolet) - rhdong (https://github.com/rhdong) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: #352

Forward-merge branch-24.10 into branch-24.12

Contributes to rapidsai/build-planning#94 Authors: - Kyle Edwards (https://github.com/KyleFromNVIDIA) Approvers: - James Lamb (https://github.com/jameslamb) URL: #392

1) Replace the domain name from `raft` to `cuvs` to avoid confusion when using tools such as NSYS to analyze the program timeline. 2) Use C++17 feature `__has_include` instead of a CMake script to find out if NVTX available in the benchmark executable. It turns out our CMake check has been not reliable due to not finding include directories correctly. Authors: - Artem M. Chirkin (https://github.com/achirkin) Approvers: - Micka (https://github.com/lowener) URL: #400

Branch 24.12 merge 24.10

Contributes to rapidsai/build-planning#106 Proposes specifying the RAPIDS version in `conda install` calls that install CI artifacts, to reduce the risk of CI jobs picking up artifacts from other releases. Authors: - James Lamb (https://github.com/jameslamb) Approvers: - Mike Sarahan (https://github.com/msarahan) URL: #406

Add a command-line option to disable the CUDA event/stream synchronization on each iteration. Only one sync is done per benchmark loop in this case instead. As a result, the measured QPS is observed due to: 1) A small `cudaEventSynchronize` is removed from the benchmark loop; 2) If a GPU algorithm has little to no sync between the GPU and CPU, the kernel launch latency and other CPU overheads are completely hidden. The new option is experimental and disabled by default. Authors: - Artem M. Chirkin (https://github.com/achirkin) Approvers: - Tamas Bela Feher (https://github.com/tfeher) URL: #405

This package is available in `dependencies.yaml`, but due to an oversight was not added to conda metas. Authors: - Divye Gala (https://github.com/divyegala) Approvers: - Dante Gama Dessavre (https://github.com/dantegd) - James Lamb (https://github.com/jameslamb) URL: #408

Remove a collection of unneccesarily complex CMake logic. Major change is that we explicitly opt-in to building the C API bindings by default since it is a hard requirement for our python bindings, and the project has numerous conditions to disable it. Authors: - Robert Maynard (https://github.com/robertmaynard) Approvers: - Ben Frederickson (https://github.com/benfred) URL: #416

This attempts to rename `cuvs_bench` to `cuvs-bench` and similarly for the CPU package. This follows from this thread: rapidsai/docker#715 (comment) Authors: - Bradley Dice (https://github.com/bdice) Approvers: - James Lamb (https://github.com/jameslamb) - Dante Gama Dessavre (https://github.com/dantegd) URL: #417

Fixes an issue in CI computations of sccache hit rates. See #414 for details.

Follow-up PR to address feedback: rapidsai/raft#2474 (comment) Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Kyle Edwards (https://github.com/KyleFromNVIDIA) URL: #422

I noticed it was missing while switching Milvus to cuVS Authors: - Micka (https://github.com/lowener) - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: #461

`.view()` is required Authors: - tsuki (https://github.com/enp1s0) Approvers: - Micka (https://github.com/lowener) URL: #484

This PR adds an option to build the full HNSW hierarchy on the CPU when converting a CAGRA index to an hnswlib index. This lets us enable an `extend()` API. For hnswlib: 1. Update to `v0.7.0` 2. Remove dependency as symbols are compiled within DSO Authors: - Divye Gala (https://github.com/divyegala) Approvers: - Bradley Dice (https://github.com/bdice) - Corey J. Nolet (https://github.com/cjnolet) URL: #465

Wheel-building CI jobs are failing like this: > Checking final_dist/cuvs_cu12-25.2.0a26-cp310-cp310-manylinux_2_28_aarch64.whl: FAILED due to warnings > WARNING `long_description` missing. > Error: Process completed with exit code 1. ([build link](https://github.com/rapidsai/cuvs/actions/runs/12133882036)) Looks like the root cause is a combination of the following: * there was a new `twine` release (6.x) 3 days ago: https://pypi.org/project/twine/#history * it contains pypa/twine#1168, which makes `twine check --strict` fail if the wheel's `long_description` is empty * the `cuvs` wheel README (used as the wheel `long_description`) is empty This proposes adding a small README, with just 2 sentences copied from the project's root-level README, to get past that check. ## Notes for Reviewers The `long_description` becomes the project homepage when a project is hosted on PyPI. The wheels produced from this repo aren't currently being published to pypi.org so this change won't be seen there, but a more user-friendly README should be added if/when we decide to publish `cuvs-cu{11,12}` to pypi.org. ref: rapidsai/build-planning#70

This notebook is adapting the Question Retrieval nb to use Milvus. It can serve as a good example on how to do Bulk ingest, how to use cuVS, and especially CAGRA+HNSW on Milvus Authors: - Micka (https://github.com/lowener) - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: #451

Ref : #472 ## The cause of the bug The bitonic sort was used on an array that was not a power of 2 long. In the current search implementation, the bitonic sort is used to move the invalid elements to the end of the buffer as: https://github.com/rapidsai/cuvs/blob/5062594138a40231475299c7bac61083b0669fd1/cpp/src/neighbors/detail/cagra/search_single_cta_kernel-inl.cuh#L758-L763 https://github.com/rapidsai/cuvs/blob/5062594138a40231475299c7bac61083b0669fd1/cpp/src/neighbors/detail/cagra/search_single_cta_kernel-inl.cuh#L644-L649 The problem is that the (max) array length (=`MAX_ITOPK + MAX_CANDIDATES`) is not always the power of two. These bitonic sorts are called even if no elements are filtered out unless `cuvs::neighbors::filtering::none_sample_filter` is specified as the filter, so #472 occurs. ## Fix This PR changes the filtering process so that the bitonic sort is not used to move the invalid elements to the end of the buffer. Authors: - tsuki (https://github.com/enp1s0) Approvers: - Artem M. Chirkin (https://github.com/achirkin) URL: #489

Let me know if this is out of scope for cuVS! Authors: - Ajit Mistry (https://github.com/ajit283) - Ben Frederickson (https://github.com/benfred) - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Corey J. Nolet (https://github.com/cjnolet) - Ben Frederickson (https://github.com/benfred) URL: #311

@cjnolet

Non-blocking / stream-ordered dynamic batching as a new index type. ## API This PR implements dynamic batching as a new index type, mirroring the API of other indices. * [_building is wrapping_] Building the index means creating a lightweight wrapper on top of an existing index and initializing necessary components, such as IO batch buffers and synchronization primitives. * [_type erasure_] The underlying/upstream index type is erased once the dynamic_batching wrapper is created, i.e. there's no way to recover the original search index type or parameters. * [_explicit control over batching_] To allow multiple user requests group into a dynamic batch request, the users must use copies of the same dynamic batching index (the user-facing index type is a thin wrapper on top of a shared pointer, hence the copy is shallow and cheap). The search function is thread-safe. ## Feature: stream-ordered dynamic batching Non-blocking / stream-ordered dynamic batching means the batching does not involve synchronizing with a GPU stream. The control is returned to the user as soon as the necessary work is submitted to the GPU. This entails a few good-to-know features: 1. The dynamic batching index has the same blocking properties as the upstream index: if the upstream index does not involve stream sync during search, that the dynamic batching index does not involve it as well (otherwise, the dynamic batching search obviously waits till the upstream search synchronizes under the hood). 2. It's responsibility of the user to synchronize the stream before getting the results back - even if the upstream index search does not need it (the batch results are scattered back to the request threads in a post-processing kernel). 3. If the upstream index does not synchronize during search, the dynamic batching index can group the queries even in a single-threaded application (_try it with --no-lap-sync option in the ann-bench benchmarks_). Overall, stream-ordered dynamic batching makes it easy to modify existing cuVS indexes, because the wrapped index has the same execution behavior as the upstream index. ## Work-in-progress TODO - [x] Add dynamic batching option to more indices in ann-bench - [x] Add tests - [x] **(postponed to 25.02)** Do proper benchmarking and possibly fine-tune the inter-thread communication - [x] Review the API side (`cpp/include/cuvs/neighbors/dynamic_batching.hpp`) [ready for review CC @cjnolet] - [x] Review the algorithm side (`cpp/src/neighbors/detail/dynamic_batching.cuh`) [ready for preliminary review: requests for algorithm docsting/clarifications are especially welcome] Authors: - Artem M. Chirkin (https://github.com/achirkin) Approvers: - Tamas Bela Feher (https://github.com/tfeher) - Corey J. Nolet (https://github.com/cjnolet) URL: #261

First draft for scalar quantization. WIP status: * only int8_t target type * quantile computation inefficient (via sampling & sorting) Authors: - Malte Förster (https://github.com/mfoerste4) - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Tamas Bela Feher (https://github.com/tfeher) - Corey J. Nolet (https://github.com/cjnolet) URL: #494

Skip some checks involving hard-coded offsets into the data when the number of records in the checked PQ list is smaller than needed. Authors: - Artem M. Chirkin (https://github.com/achirkin) Approvers: - Tamas Bela Feher (https://github.com/tfeher) URL: #512

Due to the tight integration between cuvs and raft, we need to ensure that cuvs is updated for rapidsai/raft#2503 or builds of cuvs that rely on cloning raft will get an incompatible version of cutlass due to raft's update. Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Kyle Edwards (https://github.com/KyleFromNVIDIA) - Ben Frederickson (https://github.com/benfred) URL: #516

Partially addresses #455 Authors: - Corey J. Nolet (https://github.com/cjnolet) - Ben Frederickson (https://github.com/benfred) Approvers: - Ben Frederickson (https://github.com/benfred) - Kyle Edwards (https://github.com/KyleFromNVIDIA) - Bradley Dice (https://github.com/bdice) URL: #463

@cjnolet

…466) PR allows calculating ground truth for cuvs-bench on CPU systems. Current version uses a simple NumPy brute force, perhaps we should consider using faiss? cc @cjnolet @divyegala Authors: - Dante Gama Dessavre (https://github.com/dantegd) - Corey J. Nolet (https://github.com/cjnolet) - Divye Gala (https://github.com/divyegala) Approvers: - Corey J. Nolet (https://github.com/cjnolet) - Kyle Edwards (https://github.com/KyleFromNVIDIA) URL: #466

copy-pr-bot · 2024-12-06T19:13:55Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

review-notebook-app · 2024-12-06T19:13:58Z

Check out this pull request on

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

raydouglass and others added 30 commits September 19, 2024 12:06

DOC v24.12 Updates [skip ci]

6f72fe7

Merge pull request #340 from rapidsai/branch-24.10

585e51a

Forward-merge branch-24.10 into branch-24.12

Merge pull request #345 from rapidsai/branch-24.10

69bd39b

Forward-merge branch-24.10 into branch-24.12

Merge pull request #349 from rapidsai/branch-24.10

345bb62

Forward-merge branch-24.10 into branch-24.12

Merge pull request #351 from rapidsai/branch-24.10

531f0d1

Forward-merge branch-24.10 into branch-24.12

Merge pull request #354 from rapidsai/branch-24.10

ad2e3a6

Forward-merge branch-24.10 into branch-24.12

Merge pull request #356 from rapidsai/branch-24.10

b0b8b53

Forward-merge branch-24.10 into branch-24.12

Merge pull request #360 from rapidsai/branch-24.10

224c85e

Forward-merge branch-24.10 into branch-24.12

Fix Question Retrieval notebook (#352)

2ad6397

Authors: - Micka (https://github.com/lowener) - Corey J. Nolet (https://github.com/cjnolet) - rhdong (https://github.com/rhdong) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: #352

Merge pull request #361 from rapidsai/branch-24.10

aa717ec

Forward-merge branch-24.10 into branch-24.12

Merge pull request #362 from rapidsai/branch-24.10

845f88e

Forward-merge branch-24.10 into branch-24.12

Merge pull request #366 from rapidsai/branch-24.10

2c43f7c

Forward-merge branch-24.10 into branch-24.12

Merge pull request #371 from rapidsai/branch-24.10

260bfb4

Forward-merge branch-24.10 into branch-24.12

Merge pull request #373 from rapidsai/branch-24.10

40f4a58

Forward-merge branch-24.10 into branch-24.12

Merge pull request #374 from rapidsai/branch-24.10

5d729d3

Forward-merge branch-24.10 into branch-24.12

Merge pull request #377 from rapidsai/branch-24.10

2b884ca

Forward-merge branch-24.10 into branch-24.12

Merge pull request #379 from rapidsai/branch-24.10

bf404b0

Forward-merge branch-24.10 into branch-24.12

Merge pull request #381 from rapidsai/branch-24.10

330cb0b

Forward-merge branch-24.10 into branch-24.12

Merge pull request #384 from rapidsai/branch-24.10

aeacfdf

Forward-merge branch-24.10 into branch-24.12

Prune workflows based on changed files (#392)

397e56e

Contributes to rapidsai/build-planning#94 Authors: - Kyle Edwards (https://github.com/KyleFromNVIDIA) Approvers: - James Lamb (https://github.com/jameslamb) URL: #392

Merge branch-24.10 into branch-24.12

0f2f933

Merge pull request #407 from benfred/branch-24.12-merge-24.10

587e997

Branch 24.12 merge 24.10

Use Python for sccache hit rate computation. (#420)

009bb8d

Fixes an issue in CI computations of sccache hit rates. See #414 for details.

Use environment variables in cache hit rate computation. (#422)

e7f1085

Follow-up PR to address feedback: rapidsai/raft#2474 (comment) Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Kyle Edwards (https://github.com/KyleFromNVIDIA) URL: #422

lowener and others added 13 commits November 25, 2024 20:09

Add serialization API to brute-force (#461)

e1359e1

I noticed it was missing while switching Milvus to cuVS Authors: - Micka (https://github.com/lowener) - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: #461

[Doc] Fix CAGRA search sample code (#484)

5062594

`.view()` is required Authors: - tsuki (https://github.com/enp1s0) Approvers: - Micka (https://github.com/lowener) URL: #484

GPUtester requested review from a team as code owners December 6, 2024 19:13

GPUtester requested review from bdice and removed request for a team December 6, 2024 19:13

github-actions bot added ci cpp CMake Python labels Dec 6, 2024

Update Changelog [skip ci]

cf2885c

raydouglass merged commit b5eed5e into main Dec 11, 2024
5 of 6 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[RELEASE] cuvs v24.12 #523

[RELEASE] cuvs v24.12 #523

GPUtester commented Dec 6, 2024

copy-pr-bot bot commented Dec 6, 2024

review-notebook-app bot commented Dec 6, 2024

[RELEASE] cuvs v24.12 #523

[RELEASE] cuvs v24.12 #523

Conversation

GPUtester commented Dec 6, 2024

❄️ Code freeze for branch-24.12 and v24.12 release

What does this mean?

What is the purpose of this PR?

copy-pr-bot bot commented Dec 6, 2024

review-notebook-app bot commented Dec 6, 2024

❄️ Code freeze for `branch-24.12` and v24.12 release