Merge branch-24.02 into branch-24.04 #38

Closed
150 changes: 141 additions & 9 deletions README.md
@@ -1,19 +1,14 @@
# <div align="left"><img src="https://rapids.ai/assets/images/rapids_logo.png" width="90px"/>&nbsp;cuVS: Vector Search and Clustering on the GPU</div>

### NOTE: cuVS is currently being

## Contents
<hr>

1. [Useful Resources](#useful-resources)
2. [What is cuVS?](#what-is-cuvs)
3. [Getting Started](#getting-started)
4. [Installing cuVS](#installing)
3. [Installing cuVS](#installing)
4. [Getting Started](#getting-started)
5. [Contributing](#contributing)
6. [References](#references)

<hr>

## Useful Resources

- [cuVS Reference Documentation](https://docs.rapids.ai/api/cuvs/stable/): API Documentation.
@@ -26,15 +21,152 @@

## What is cuVS?

cuVS contains many algorithms for running approximate nearest neighbors and clustering on the GPU.
cuVS contains state-of-the-art implementations of several algorithms for running approximate nearest neighbors and clustering on the GPU. It can be used directly or through the various databases and other libraries that have integrated it. The primary goal of cuVS is to simplify the use of GPUs for vector similarity search and clustering.

**Please note** that cuVS is a new library mostly derived from the approximate nearest neighbors and clustering algorithms in the [RAPIDS RAFT](https://github.com/rapidsai/raft) library of data mining primitives. RAPIDS RAFT currently contains the most fully-featured versions of the approximate nearest neighbors and clustering algorithms in cuVS. We are in the process of migrating these algorithms from RAFT to cuVS; if you are unsure which library to use, please consider the following:
1. RAFT contains C++ and Python APIs for all of the approximate nearest neighbors and clustering algorithms.
2. cuVS has growing support for multiple languages, including C, C++, Python, and Rust. We will be adding more language support to cuVS in the future but will not be improving the language support for RAFT.
3. Once all of RAFT's approximate nearest neighbors and clustering algorithms are moved to cuVS, the RAFT APIs will be deprecated and eventually removed altogether. Once removed, RAFT will become a lightweight header-only library. In the meantime, there's no harm in using RAFT if support for additional languages is not needed.

## Installing cuVS

cuVS comes with pre-built packages that can be installed through [conda](https://conda.io/projects/conda/en/latest/user-guide/getting-started.html#managing-python). Different packages are available for the different languages supported by cuVS:

| Python | C++ | C | Rust |
|--------|-----|---|------|
| `pycuvs`| `libcuvs` | `libcuvs_c` | `cuvs-rs` |

### Stable release

It is recommended to use [mamba](https://mamba.readthedocs.io/en/latest/installation/mamba-installation.html) to install the desired packages. The following command installs the Python package; you can replace `pycuvs` with any of the packages in the table above:
```bash
mamba install -c conda-forge -c nvidia -c rapidsai pycuvs
```

### Nightlies
If installing a version that has not yet been released, the `rapidsai` channel can be replaced with `rapidsai-nightly`:
```bash
mamba install -c conda-forge -c nvidia -c rapidsai-nightly pycuvs=24.02*
```

Please see the [Build and Install Guide](docs/source/build.md) for more information on installing cuVS and building from source.
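
A minimal sketch of the package substitution described above, assuming a Bash-compatible shell. `cuvs_install_cmd` is a hypothetical helper, not part of cuVS: it looks up the package for a language in the table and prints (rather than runs) the matching `mamba` command, optionally against the nightly channel with a version pin, so the command can be reviewed first.

```bash
# Hypothetical helper (not shipped with cuVS): print the mamba command for a
# language from the package table, optionally using another channel and pin.
cuvs_install_cmd() {
    local lang=$1 channel=${2:-rapidsai} pin=${3:-}
    local pkg
    case "$lang" in
        python) pkg=pycuvs ;;
        c++|cpp) pkg=libcuvs ;;
        c) pkg=libcuvs_c ;;
        rust) pkg=cuvs-rs ;;
        *) echo "unknown language: $lang" >&2; return 1 ;;
    esac
    # ${pin:+=${pin}} appends "=<pin>" only when a pin was given.
    echo "mamba install -c conda-forge -c nvidia -c ${channel} ${pkg}${pin:+=${pin}}"
}
```

For example, `cuvs_install_cmd python rapidsai-nightly '24.02*'` prints the nightly command shown above; the pin is quoted so the shell does not try to expand the `*`.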

## Getting Started

The following code snippets build an approximate nearest neighbors index with the CAGRA algorithm.

### Python API

```python
from cuvs.neighbors import cagra

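# load_data() is a placeholder for your own loader; it should return a
# float32 device array (for example, a CuPy array) of shape (n_samples, n_features).
dataset = load_data()
index_params = cagra.IndexParams()

index = cagra.build_index(index_params, dataset)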
```

### C++ API

```c++
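#include <cuvs/neighbors/cagra.hpp>
#include <raft/core/device_mdspan.hpp>
#include <raft/core/device_resources.hpp>

using namespace cuvs::neighbors;

// load_dataset() is a placeholder for your own loader; it should return a
// row-major device matrix view of shape (n_samples, n_features).
raft::device_matrix_view<const float, int64_t> dataset = load_dataset();
raft::device_resources res;

cagra::index_params index_params;

auto index = cagra::build(res, index_params, dataset);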
```

For more examples of the C++ APIs, refer to the [cpp/examples](https://github.com/rapidsai/cuvs/tree/HEAD/cpp/examples) directory in the codebase.

### C API

```c
#include <cuvs/core/c_api.h>
#include <cuvs/neighbors/cagra.h>

cuvsResources_t res;
cuvsCagraIndexParams_t index_params;
cuvsCagraIndex_t index;

// load_dataset() is a placeholder that fills in a DLPack tensor
// describing your dataset.
DLManagedTensor dataset;
load_dataset(&dataset);

cuvsResourcesCreate(&res);
cuvsCagraIndexParamsCreate(&index_params);
cuvsCagraIndexCreate(&index);

cuvsCagraBuild(res, index_params, &dataset, index);

cuvsCagraIndexDestroy(index);
cuvsCagraIndexParamsDestroy(index_params);
cuvsResourcesDestroy(res);
```

## Installing cuVS

## Contributing

If you are interested in contributing to the cuVS library, please read our [Contributing guidelines](docs/source/contributing.md). Refer to the [Developer Guide](docs/source/developer_guide.md) for details on the developer guidelines, workflows, and principles.

## References

When citing cuVS generally, please consider referencing this GitHub repository.
```bibtex
@misc{rapidsai,
title={Rapidsai/cuVS: Vector Search and Clustering on the GPU.},
url={https://github.com/rapidsai/cuvs},
journal={GitHub},
publisher={Nvidia RAPIDS},
author={Rapidsai},
year={2024}
}
```

If citing CAGRA, please consider the following bibtex:
```bibtex
@misc{ootomo2023cagra,
title={CAGRA: Highly Parallel Graph Construction and Approximate Nearest Neighbor Search for GPUs},
author={Hiroyuki Ootomo and Akira Naruse and Corey Nolet and Ray Wang and Tamas Feher and Yong Wang},
year={2023},
eprint={2308.15136},
archivePrefix={arXiv},
primaryClass={cs.DS}
}
```

If citing the k-selection routines, please consider the following bibtex:
```bibtex
@proceedings{10.1145/3581784,
title = {SC '23: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis},
year = {2023},
isbn = {9798400701092},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
abstract = {Started in 1988, the SC Conference has become the annual nexus for researchers and practitioners from academia, industry and government to share information and foster collaborations to advance the state of the art in High Performance Computing (HPC), Networking, Storage, and Analysis.},
location = {Denver, CO, USA}
}
```

If citing the nearest neighbors descent API, please consider the following bibtex:
```bibtex
@inproceedings{10.1145/3459637.3482344,
author = {Wang, Hui and Zhao, Wan-Lei and Zeng, Xiangxiang and Yang, Jianye},
title = {Fast K-NN Graph Construction by GPU Based NN-Descent},
year = {2021},
isbn = {9781450384469},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3459637.3482344},
doi = {10.1145/3459637.3482344},
abstract = {NN-Descent is a classic k-NN graph construction approach. It is still widely employed in machine learning, computer vision, and information retrieval tasks due to its efficiency and genericness. However, the current design only works well on CPU. In this paper, NN-Descent has been redesigned to adapt to the GPU architecture. A new graph update strategy called selective update is proposed. It reduces the data exchange between GPU cores and GPU global memory significantly, which is the processing bottleneck under GPU computation architecture. This redesign leads to full exploitation of the parallelism of the GPU hardware. In the meantime, the genericness, as well as the simplicity of NN-Descent, are well-preserved. Moreover, a procedure that allows to k-NN graph to be merged efficiently on GPU is proposed. It makes the construction of high-quality k-NN graphs for out-of-GPU-memory datasets tractable. Our approach is 100-250\texttimes{} faster than the single-thread NN-Descent and is 2.5-5\texttimes{} faster than the existing GPU-based approaches as we tested on million as well as billion scale datasets.},
booktitle = {Proceedings of the 30th ACM International Conference on Information \& Knowledge Management},
pages = {1929–1938},
numpages = {10},
keywords = {high-dimensional, nn-descent, gpu, k-nearest neighbor graph},
location = {Virtual Event, Queensland, Australia},
series = {CIKM '21}
}
```
10 changes: 5 additions & 5 deletions build.sh
@@ -18,7 +18,7 @@ ARGS=$*
# scripts, and that this script resides in the repo dir!
REPODIR=$(cd $(dirname $0); pwd)

VALIDARGS="clean libcuvs python docs tests template clean --uninstall -v -g -n --compile-static-lib --allgpuarch --no-nvtx --show_depr_warn --incl-cache-stats --time -h"
VALIDARGS="clean libcuvs python docs tests examples clean --uninstall -v -g -n --compile-static-lib --allgpuarch --no-nvtx --show_depr_warn --incl-cache-stats --time -h"
HELP="$0 [<target> ...] [<flag> ...] [--cmake-args=\"<args>\"] [--cache-tool=<tool>] [--limit-tests=<targets>] [--build-metrics=<filename>]
where <target> is:
clean - remove all existing build artifacts and configuration (start over)
@@ -27,7 +27,7 @@ HELP="$0 [<target> ...] [<flag> ...] [--cmake-args=\"<args>\"] [--cache-tool=<to
python - build the cuvs Python package
docs - build the documentation
tests - build the tests
template - build the example CUVS application template
examples - build the examples

and <flag> is:
-v - verbose build mode
@@ -433,10 +433,10 @@ if hasArg docs; then
fi

################################################################################
# Initiate build for example CUVS application template (if needed)
# Initiate build for c++ examples (if needed)

if hasArg template; then
pushd ${REPODIR}/cpp/template
if hasArg examples; then
pushd ${REPODIR}/cpp/examples
./build.sh
popd
fi
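
The `if hasArg examples` check above depends on matching a target as a whole word among the command-line arguments. A simplified sketch of that pattern, assuming whole-word matching: `has_arg` and `validate_args` are hypothetical names, not the functions in cuVS's `build.sh`, and only the `--name=value` options listed in `HELP` are special-cased.

```bash
# Simplified sketch (not the actual build.sh code): whole-word matching of
# targets and flags against a space-separated list of valid arguments.
VALIDARGS="clean libcuvs python docs tests examples clean --uninstall -v -g -n --compile-static-lib --allgpuarch --no-nvtx --show_depr_warn --incl-cache-stats --time -h"

# has_arg TARGET ARGS...: succeed if TARGET is one of ARGS.
has_arg() {
    local target=$1
    shift
    # Padding with spaces stops "example" from matching inside "examples".
    echo " $* " | grep -q -- " ${target} "
}

# validate_args ARGS...: reject anything not listed in VALIDARGS.
validate_args() {
    local a
    for a in "$@"; do
        case "$a" in
            # Value-carrying options from HELP are skipped in this sketch.
            --cmake-args=*|--cache-tool=*|--limit-tests=*|--build-metrics=*) continue ;;
        esac
        if ! echo " ${VALIDARGS} " | grep -q -- " ${a} "; then
            echo "Invalid option: ${a}" >&2
            return 1
        fi
    done
    return 0
}
```

In this sketch, `validate_args examples -v` succeeds while `validate_args template` fails, mirroring the rename of the `template` target to `examples`.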
8 changes: 2 additions & 6 deletions ci/build_docs.sh
@@ -18,15 +18,11 @@ rapids-print-env

rapids-logger "Downloading artifacts from previous jobs"
CPP_CHANNEL=$(rapids-download-conda-from-s3 cpp)
PYTHON_CHANNEL=$(rapids-download-conda-from-s3 python)
#PYTHON_CHANNEL=$(rapids-download-conda-from-s3 python)

rapids-mamba-retry install \
--channel "${CPP_CHANNEL}" \
--channel "${PYTHON_CHANNEL}" \
libcuvs \
libcuvs-headers \
cuvs \
raft-dask
libcuvs

export RAPIDS_VERSION_NUMBER="24.02"
export RAPIDS_DOCS_DIR="$(mktemp -d)"
28 changes: 14 additions & 14 deletions ci/build_python.sh
@@ -15,22 +15,22 @@ rapids-print-env

rapids-logger "Begin py build"

CPP_CHANNEL=$(rapids-download-conda-from-s3 cpp)
#CPP_CHANNEL=$(rapids-download-conda-from-s3 cpp)

version=$(rapids-generate-version)
git_commit=$(git rev-parse HEAD)
export RAPIDS_PACKAGE_VERSION=${version}
echo "${version}" > VERSION
#version=$(rapids-generate-version)
#git_commit=$(git rev-parse HEAD)
#export RAPIDS_PACKAGE_VERSION=${version}
#echo "${version}" > VERSION

package_dir="python"
for package_name in cuvs raft-dask; do
underscore_package_name=$(echo "${package_name}" | tr "-" "_")
sed -i "/^__git_commit__/ s/= .*/= \"${git_commit}\"/g" "${package_dir}/${package_name}/${underscore_package_name}/_version.py"
done
#package_dir="python"
#for package_name in cuvs; do
# underscore_package_name=$(echo "${package_name}" | tr "-" "_")
# sed -i "/^__git_commit__/ s/= .*/= \"${git_commit}\"/g" "${package_dir}/${package_name}/${underscore_package_name}/_version.py"
#done

# TODO: Remove `--no-test` flags once importing on a CPU
# node works correctly
rapids-conda-retry mambabuild \
--no-test \
--channel "${CPP_CHANNEL}" \
conda/recipes/cuvs
#rapids-conda-retry mambabuild \
# --no-test \
# --channel "${CPP_CHANNEL}" \
# conda/recipes/cuvs
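
Although this change comments the version-stamping loop out, the two text transforms it performs can be checked in isolation. The helper names below are hypothetical; the commands are the same `tr` and `sed` invocations as the script, and `sed -i` without a backup suffix assumes GNU sed, as the script itself does.

```bash
# Sketch of the two transforms in the (now commented-out) loop above.

# Map a package name such as "raft-dask" to its module directory "raft_dask".
to_module_name() {
    echo "$1" | tr "-" "_"
}

# Rewrite the __git_commit__ assignment in a _version.py file in place
# (GNU sed syntax for -i).
stamp_git_commit() {
    local version_file=$1 commit=$2
    sed -i "/^__git_commit__/ s/= .*/= \"${commit}\"/g" "${version_file}"
}
```

Only lines starting with `__git_commit__` are touched, so other assignments in `_version.py` are left as they are.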
2 changes: 1 addition & 1 deletion ci/build_wheel_cuvs.sh
@@ -4,6 +4,6 @@
set -euo pipefail

# Set up skbuild options. Enable sccache in skbuild config options
export SKBUILD_CONFIGURE_OPTIONS="-DRAFT_BUILD_WHEELS=ON -DDETECT_CONDA_ENV=OFF -DFIND_RAFT_CPP=OFF"
export SKBUILD_CONFIGURE_OPTIONS="-DCUVS_BUILD_WHEELS=ON -DDETECT_CONDA_ENV=OFF -DFIND_CUVS_CPP=OFF"

#ci/build_wheel.sh cuvs python/cuvs
6 changes: 3 additions & 3 deletions ci/release/update-version.sh
@@ -1,5 +1,5 @@
#!/bin/bash
# Copyright (c) 2020-2023, NVIDIA CORPORATION.
# Copyright (c) 2020-2024, NVIDIA CORPORATION.
########################
# RAFT Version Updater #
########################
@@ -38,7 +38,7 @@ function sed_runner() {

sed_runner "s/set(RAPIDS_VERSION .*)/set(RAPIDS_VERSION \"${NEXT_SHORT_TAG}\")/g" cpp/CMakeLists.txt
sed_runner "s/set(RAPIDS_VERSION .*)/set(RAPIDS_VERSION \"${NEXT_SHORT_TAG}\")/g" cpp/template/cmake/thirdparty/fetch_rapids.cmake
sed_runner "s/set(RAFT_VERSION .*)/set(RAFT_VERSION \"${NEXT_FULL_TAG}\")/g" cpp/CMakeLists.txt
sed_runner "s/set(CUVS_VERSION .*)/set(CUVS_VERSION \"${NEXT_FULL_TAG}\")/g" cpp/CMakeLists.txt
sed_runner 's/'"cuvs_version .*)"'/'"cuvs_version ${NEXT_FULL_TAG})"'/g' python/cuvs/CMakeLists.txt
sed_runner 's/'"branch-.*\/RAPIDS.cmake"'/'"branch-${NEXT_SHORT_TAG}\/RAPIDS.cmake"'/g' fetch_rapids.cmake

@@ -85,7 +85,7 @@ sed_runner "s/RAPIDS_VERSION_NUMBER=\".*/RAPIDS_VERSION_NUMBER=\"${NEXT_SHORT_TA

sed_runner "/^PROJECT_NUMBER/ s|\".*\"|\"${NEXT_SHORT_TAG}\"|g" cpp/doxygen/Doxyfile

sed_runner "/^set(RAFT_VERSION/ s|\".*\"|\"${NEXT_SHORT_TAG}\"|g" docs/source/build.md
sed_runner "/^set(CUVS_VERSION/ s|\".*\"|\"${NEXT_SHORT_TAG}\"|g" docs/source/build.md
sed_runner "s|branch-[0-9][0-9].[0-9][0-9]|branch-${NEXT_SHORT_TAG}|g" docs/source/build.md
sed_runner "/rapidsai\/raft/ s|branch-[0-9][0-9].[0-9][0-9]|branch-${NEXT_SHORT_TAG}|g" docs/source/developer_guide.md
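
The `sed_runner` calls above rewrite version strings by regular expression, which is why the `RAFT_VERSION` to `CUVS_VERSION` rename matters: `cpp/CMakeLists.txt` uses `${CUVS_VERSION}`, so the old pattern would no longer match anything. A minimal sketch, assuming a hypothetical `bump_cmake_versions` helper (the real `sed_runner` definition is not shown in this diff) that applies the same two expressions; it uses `sed -i.bak` plus cleanup so it works with both GNU and BSD sed.

```bash
# Hypothetical helper mirroring two of the sed_runner calls above.
# short_tag is e.g. 24.04 and full_tag e.g. 24.04.00.
bump_cmake_versions() {
    local file=$1 short_tag=$2 full_tag=$3
    sed -i.bak "s/set(RAPIDS_VERSION .*)/set(RAPIDS_VERSION \"${short_tag}\")/g" "${file}"
    sed -i.bak "s/set(CUVS_VERSION .*)/set(CUVS_VERSION \"${full_tag}\")/g" "${file}"
    # Remove the backup file that -i.bak leaves behind.
    rm -f "${file}.bak"
}
```

Run against a scratch file containing `set(RAPIDS_VERSION "24.02")` and `set(CUVS_VERSION "24.02.00")`, it rewrites both lines, whereas the pre-change `RAFT_VERSION` expression would leave the `CUVS_VERSION` line untouched.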

10 changes: 5 additions & 5 deletions ci/test_python.sh
@@ -20,18 +20,18 @@ set -u

rapids-logger "Downloading artifacts from previous jobs"
CPP_CHANNEL=$(rapids-download-conda-from-s3 cpp)
PYTHON_CHANNEL=$(rapids-download-conda-from-s3 python)
#PYTHON_CHANNEL=$(rapids-download-conda-from-s3 python)

RAPIDS_TESTS_DIR=${RAPIDS_TESTS_DIR:-"${PWD}/test-results"}
RAPIDS_COVERAGE_DIR=${RAPIDS_COVERAGE_DIR:-"${PWD}/coverage-results"}
mkdir -p "${RAPIDS_TESTS_DIR}" "${RAPIDS_COVERAGE_DIR}"

rapids-print-env

rapids-mamba-retry install \
--channel "${CPP_CHANNEL}" \
--channel "${PYTHON_CHANNEL}" \
libcuvs #cuvs
#rapids-mamba-retry install \
# --channel "${CPP_CHANNEL}" \
## --channel "${PYTHON_CHANNEL}" \
# libcuvs #cuvs

rapids-logger "Check GPU usage"
nvidia-smi
4 changes: 2 additions & 2 deletions ci/test_wheel_cuvs.sh
@@ -4,8 +4,8 @@
set -euo pipefail

mkdir -p ./dist
RAPIDS_PY_CUDA_SUFFIX="$(rapids-wheel-ctk-name-gen ${RAPIDS_CUDA_VERSION})"
RAPIDS_PY_WHEEL_NAME="cuvs_${RAPIDS_PY_CUDA_SUFFIX}" rapids-download-wheels-from-s3 ./dist
#RAPIDS_PY_CUDA_SUFFIX="$(rapids-wheel-ctk-name-gen ${RAPIDS_CUDA_VERSION})"
#RAPIDS_PY_WHEEL_NAME="cuvs_${RAPIDS_PY_CUDA_SUFFIX}" rapids-download-wheels-from-s3 ./dist

## echo to expand wildcard before adding `[extra]` requires for pip
#python -m pip install $(echo ./dist/cuvs*.whl)[test]
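
The "echo to expand wildcard" comment above refers to a shell subtlety: written as `./dist/cuvs*.whl[test]`, the `[test]` suffix becomes part of the glob (a bracket expression matching one of `t`, `e`, or `s`), so nothing matches and, with default shell options, pip would receive the literal pattern. Expanding the wildcard inside `$(echo ...)` first and appending `[test]` afterwards avoids this. A sketch with a hypothetical helper name, assuming exactly one matching wheel:

```bash
# Hypothetical helper: expand a wheel glob first, then append a pip extra.
wheel_with_extra() {
    local pattern=$1 extra=$2
    # $pattern is deliberately unquoted so the shell expands the glob here.
    echo "$(echo ${pattern})[${extra}]"
}
```

If several wheels match, only the last path receives the extra, which is the same behavior as the one-liner in the script.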
@@ -1,5 +1,5 @@
#!/usr/bin/env bash
# Copyright (c) 2022-2023, NVIDIA CORPORATION.
# Copyright (c) 2022-2024, NVIDIA CORPORATION.

# Just building template so we verify it uses libraft.so and fail if it doesn't build
./build.sh template
./build.sh examples
6 changes: 3 additions & 3 deletions conda/recipes/libcuvs/meta.yaml
@@ -195,9 +195,9 @@ outputs:
home: https://rapids.ai/
license: Apache-2.0
summary: libcuvs tests
- name: libcuvs-template
- name: libcuvs-examples
version: {{ version }}
script: build_libcuvs_template.sh
script: build_libcuvs_examples.sh
build:
script_env: *script_env
number: {{ GIT_DESCRIBE_NUMBER }}
@@ -241,4 +241,4 @@ outputs:
about:
home: https://rapids.ai/
license: Apache-2.0
summary: libcuvs template
summary: libcuvs examples
4 changes: 2 additions & 2 deletions cpp/CMakeLists.txt
@@ -28,12 +28,12 @@ set(lang_list "CXX")

if(NOT BUILD_CPU_ONLY)
include(rapids-cuda)
rapids_cuda_init_architectures(cuVS)
rapids_cuda_init_architectures(CUVS)
list(APPEND lang_list "CUDA")
endif()

project(
cuVS
CUVS
VERSION ${CUVS_VERSION}
LANGUAGES ${lang_list}
)
File renamed without changes.
8 changes: 4 additions & 4 deletions cpp/template/README.md → cpp/examples/README.md
@@ -1,14 +1,14 @@
# Example CUVS Project Template
# cuVS C++ Examples

This template project provides a drop-in sample to either start building a new application with, or using CUVS in an existing CMake project.

First, please refer to our [installation docs](https://docs.rapids.ai/api/cuvs/stable/build.html#cuda-gpu-requirements) for the minimum requirements to use CUVS.
First, please refer to our [installation docs](https://docs.rapids.ai/api/cuvs/stable/build.html#cuda-gpu-requirements) for the minimum requirements to use cuVS.

Once the minimum requirements are satisfied, this example template application can be built with the provided `build.sh` script. This is a bash script that calls the appropriate CMake commands, so you can look into it to see the typical CMake based build workflow.

This directory (`CUVS_SOURCE/cpp/template`) can be copied directly in order to build a new application with CUVS.
This directory (`CUVS_SOURCE/cpp/examples`) can be copied directly in order to build a new application with CUVS.

CUVS can be integrated into an existing CMake project by copying the contents in the `configure rapids-cmake` and `configure cuvs` sections of the provided `CMakeLists.txt` into your project, along with `cmake/thirdparty/get_cuvs.cmake`.
cuVS can be integrated into an existing CMake project by copying the contents in the `configure rapids-cmake` and `configure cuvs` sections of the provided `CMakeLists.txt` into your project, along with `cmake/thirdparty/get_cuvs.cmake`.

Make sure to link against the appropriate CMake targets. Use `cuvs::cuvs` to utilize the shared library.

2 changes: 1 addition & 1 deletion cpp/template/build.sh → cpp/examples/build.sh
@@ -1,6 +1,6 @@
#!/bin/bash

# Copyright (c) 2023, NVIDIA CORPORATION.
# Copyright (c) 2023-2024, NVIDIA CORPORATION.

# cuvs empty project template build script
