This repository has been archived by the owner on Feb 29, 2024. It is now read-only.

multi-gpu-docs #42

Open
wants to merge 10 commits into
base: main
Changes from 3 commits
Binary file added docs/icicle/image.png
64 changes: 64 additions & 0 deletions docs/icicle/multi-gpu.md
@@ -0,0 +1,64 @@
# Multi GPU with ICICLE

:::info

If you are looking for the Multi GPU API documentation, refer to the [Rust bindings](./rust-bindings/multi-gpu.md) page.

:::

Large input sizes are a given in ZK compute. Today it is not rare to see circuits requiring MSMs of size 2^25 or more, and inputs of that size push even high-end GPUs to their limits. To scale and run such large circuits, we need to combine multiple GPUs.

Multi GPU programming is the practice of developing software that runs on multiple GPU devices. There are many ways to approach writing such software. This documentation covers how you can develop with ICICLE for multiple GPUs.


## Approaches to Multi GPU programming

There are many [different strategies](https://github.com/NVIDIA/multi-gpu-programming-models) available for implementing multi GPU; however, they broadly fall into two categories.


### GPU Server approach

This approach usually involves one or more CPUs opening threads to read from and write to multiple GPUs. You can think of it as a scaled-up host-device model.

![alt text](image.png)

This approach won't let us tackle larger computation sizes, but it will let us run multiple computations in parallel that we wouldn't be able to load onto a single GPU at the same time.


For example, say you had to compute two MSMs of size 2^20 on a GPU with 16GB of VRAM. You would normally have to perform them one after the other. However, if you double the number of GPUs in your system, you can run them in parallel.
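
A minimal sketch of this scheduling idea, assuming CPU threads as stand-ins for GPUs. The names `toy_msm`, `run_jobs`, `Job` and the modulus `P` are hypothetical and not part of ICICLE, and the toy MSM works on integers modulo a prime rather than on elliptic-curve points. Each simulated device gets one worker thread that processes its queue in order, so with one device the two jobs run back to back, and with two devices they run in parallel.

```rust
use std::thread;

/// Modulus for the toy arithmetic (an assumption made for illustration only).
const P: u64 = 1_000_000_007;

/// One job: a list of scalars and a list of bases of the same length.
type Job = (Vec<u64>, Vec<u64>);

/// Stand-in for an MSM: the sum of scalar * base modulo `P`.
/// A real MSM works on elliptic-curve points; this only models the workload shape.
fn toy_msm(scalars: &[u64], bases: &[u64]) -> u64 {
    scalars
        .iter()
        .zip(bases)
        .fold(0u64, |acc, (s, b)| (acc + (s % P) * (b % P)) % P)
}

/// Distributes jobs round-robin over `device_count` simulated devices and
/// returns `(device_id, result)` for each job, in the original job order.
fn run_jobs(jobs: Vec<Job>, device_count: usize) -> Vec<(usize, u64)> {
    assert!(device_count > 0, "at least one device is required");

    // Job i goes to the queue of device i % device_count.
    let mut queues: Vec<Vec<(usize, Job)>> = (0..device_count).map(|_| Vec::new()).collect();
    for (i, job) in jobs.into_iter().enumerate() {
        queues[i % device_count].push((i, job));
    }

    // One worker thread per device; jobs within a queue run one after the other.
    let handles: Vec<_> = queues
        .into_iter()
        .enumerate()
        .map(|(device_id, queue)| {
            thread::spawn(move || {
                // Real code would select `device_id` on this thread before launching GPU work.
                queue
                    .into_iter()
                    .map(|(i, (scalars, bases))| (i, device_id, toy_msm(&scalars, &bases)))
                    .collect::<Vec<_>>()
            })
        })
        .collect();

    let mut results: Vec<(usize, usize, u64)> = handles
        .into_iter()
        .flat_map(|h| h.join().expect("worker thread panicked"))
        .collect();
    results.sort_by_key(|&(i, _, _)| i);
    results.into_iter().map(|(_, device_id, r)| (device_id, r)).collect()
}
```

Calling `run_jobs(jobs, 1)` assigns every job to device 0, while `run_jobs(jobs, 2)` places the second job on device 1. The computed values are the same either way; only the placement changes.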



### Inter GPU approach

This is a more sophisticated approach to multi GPU computation. Using technologies such as [GPUDirect, NCCL, NVSHMEM](https://www.nvidia.com/en-us/on-demand/session/gtcspring21-cwes1084/) and NVLink, it is possible to combine multiple GPUs and split a single computation among the different devices.


This approach requires redesigning the algorithm at the software level so that it can be split among devices. In some cases, to keep latency to a minimum, special inter-GPU connections are installed on the server so that GPUs can communicate with each other directly.
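
To illustrate what "splitting a computation among devices" means, here is a minimal sketch that models each device as a CPU thread. The names `toy_msm`, `split_msm` and the modulus `P` are hypothetical and not part of ICICLE, and the arithmetic uses integers modulo a prime instead of elliptic-curve points. A real inter-GPU implementation would exchange the partial results over NCCL, NVSHMEM or a similar transport rather than by joining threads.

```rust
use std::thread;

/// Modulus for the toy arithmetic (an assumption made for illustration only).
const P: u64 = 1_000_000_007;

/// Stand-in for an MSM: the sum of scalar * base modulo `P`.
fn toy_msm(scalars: &[u64], bases: &[u64]) -> u64 {
    scalars
        .iter()
        .zip(bases)
        .fold(0u64, |acc, (s, b)| (acc + (s % P) * (b % P)) % P)
}

/// Splits one toy MSM into at most `device_count` chunks, computes the partial
/// sums in parallel (one thread per simulated device), then combines them.
fn split_msm(scalars: &[u64], bases: &[u64], device_count: usize) -> u64 {
    assert_eq!(scalars.len(), bases.len(), "inputs must have equal length");
    assert!(device_count > 0, "at least one device is required");

    // Ceiling division, with a minimum of 1 so that `chunks` never receives 0.
    let chunk = ((scalars.len() + device_count - 1) / device_count).max(1);

    thread::scope(|scope| {
        let handles: Vec<_> = scalars
            .chunks(chunk)
            .zip(bases.chunks(chunk))
            .map(|(s, b)| scope.spawn(move || toy_msm(s, b)))
            .collect();

        // The sum is linear, so adding the partial results gives the full result.
        handles
            .into_iter()
            .map(|h| h.join().expect("worker thread panicked"))
            .fold(0u64, |acc, partial| (acc + partial) % P)
    })
}
```

The split works here because the sum can be evaluated chunk by chunk and recombined with a single addition per chunk. Algorithms without that property need a deeper redesign before they can be distributed this way.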


# Writing ICICLE Code for Multi GPUs

For the moment we have taken the GPU Server approach: we assume you have a machine with multiple GPUs and you wish to run some computation on each GPU.

To dive deeper and learn about the API, check out the docs for the different ICICLE APIs:

- [Rust Multi GPU APIs](./rust-bindings/multi-gpu.md)
- C++ Multi GPU APIs


## Best practices

- Never hardcode device IDs. If you want your software to take advantage of all GPUs on a machine, use methods such as `get_device_count` to support an arbitrary number of GPUs.

- Launch one thread per GPU. To avoid nasty errors and hard-to-read code, we suggest creating a dedicated thread for every GPU task you wish to launch. This makes your code more manageable, easier to read, and more performant.
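
Both practices can be sketched together. The block below is a model only: `mock_get_device_count`, `mock_set_device` and `mock_get_device` are hypothetical stand-ins that imitate the per-thread "current device" with a thread-local variable, so the example runs without a GPU. They are not the ICICLE functions, and the device count of 4 is an arbitrary assumption.

```rust
use std::cell::Cell;
use std::thread;

thread_local! {
    // Hypothetical model of the per-thread "current device"; starts at device 0.
    static CURRENT_DEVICE: Cell<usize> = Cell::new(0);
}

/// Stand-in for `get_device_count`. The value 4 is fixed only because this is a model.
fn mock_get_device_count() -> usize {
    4
}

/// Stand-in for `set_device`: binds the calling thread to `device_id`.
fn mock_set_device(device_id: usize) -> Result<(), String> {
    if device_id >= mock_get_device_count() {
        return Err(format!("invalid device id {}", device_id));
    }
    CURRENT_DEVICE.with(|d| d.set(device_id));
    Ok(())
}

/// Stand-in for `get_device`: returns the device bound to the calling thread.
fn mock_get_device() -> usize {
    CURRENT_DEVICE.with(|d| d.get())
}

/// Spawns one thread per available device, with no hardcoded IDs, and returns
/// the device each thread ended up bound to.
fn bind_one_thread_per_device() -> Vec<usize> {
    let handles: Vec<_> = (0..mock_get_device_count())
        .map(|device_id| {
            thread::spawn(move || {
                mock_set_device(device_id).expect("failed to set device");
                // GPU work for `device_id` would be launched here.
                mock_get_device()
            })
        })
        .collect();

    handles
        .into_iter()
        .map(|h| h.join().expect("worker thread panicked"))
        .collect()
}
```

Because the device selection is stored per thread in this model, the spawning thread keeps its own current device untouched while each worker is bound to a different one.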

## ZKContainer support for multi GPUs

Multi GPU support should work with ZK-Containers by simply defining which devices the Docker container should interact with:

```sh
docker run -it --gpus '"device=0,2"' zk-container-image
```

If you wish to expose all GPUs:

```sh
docker run --gpus all zk-container-image
```
157 changes: 157 additions & 0 deletions docs/icicle/rust-bindings/multi-gpu.md
@@ -0,0 +1,157 @@
# Multi GPU APIs

To learn more about the theory of Multi GPU programming, refer to [this part](../multi-gpu.md) of the documentation.

## Device management API

To streamline device management, the `icicle-cuda-runtime` package offers methods for dealing with devices.

#### [`set_device`](https://github.com/vhnatyk/icicle/blob/275eaa99040ab06b088154d64cfa50b25fbad2df/wrappers/rust/icicle-cuda-runtime/src/device.rs#L6)

Sets the current CUDA device by its ID. Calling `set_device` binds the current thread to the given CUDA device.

**Parameters:**

- `device_id: usize`: The ID of the device to set as the current device. Device IDs start from 0.

**Returns:**

- `CudaResult<()>`: An empty result indicating success if the device is set successfully. In case of failure, returns a `CudaError`.

**Errors:**

- Returns a `CudaError` if the specified device ID is invalid or if a CUDA-related error occurs during the operation.

**Example:**

```rust
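// Module path follows the linked source file (icicle-cuda-runtime/src/device.rs).
use icicle_cuda_runtime::device::set_device;

let device_id = 0; // Device ID to set
match set_device(device_id) {
    Ok(()) => println!("Device set successfully."),
    Err(e) => eprintln!("Failed to set device: {:?}", e),
}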
```

#### [`get_device_count`](https://github.com/vhnatyk/icicle/blob/275eaa99040ab06b088154d64cfa50b25fbad2df/wrappers/rust/icicle-cuda-runtime/src/device.rs#L10)

Retrieves the number of CUDA devices available on the machine.

**Returns:**

- `CudaResult<usize>`: The number of available CUDA devices. On success, contains the count of CUDA devices. On failure, returns a `CudaError`.

**Errors:**

- Returns a `CudaError` if a CUDA-related error occurs during the retrieval of the device count.

**Example:**

```rust
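// Module path follows the linked source file (icicle-cuda-runtime/src/device.rs).
use icicle_cuda_runtime::device::get_device_count;

match get_device_count() {
    Ok(count) => println!("Number of devices available: {}", count),
    Err(e) => eprintln!("Failed to get device count: {:?}", e),
}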
```

#### [`get_device`](https://github.com/vhnatyk/icicle/blob/275eaa99040ab06b088154d64cfa50b25fbad2df/wrappers/rust/icicle-cuda-runtime/src/device.rs#L15)

Retrieves the ID of the current CUDA device.

**Returns:**

- `CudaResult<usize>`: The ID of the current CUDA device. On success, contains the device ID. On failure, returns a `CudaError`.

**Errors:**

- Returns a `CudaError` if a CUDA-related error occurs during the retrieval of the current device ID.

**Example:**

```rust
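// Module path follows the linked source file (icicle-cuda-runtime/src/device.rs).
use icicle_cuda_runtime::device::get_device;

match get_device() {
    Ok(device_id) => println!("Current device ID: {}", device_id),
    Err(e) => eprintln!("Failed to get current device: {:?}", e),
}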
```

## Device context API

The `DeviceContext` is embedded into `NTTConfig`, `MSMConfig` and `PoseidonConfig`, meaning you can simply pass a `device_id` to your existing config and the same computation will be triggered on a different device automatically.

#### [`DeviceContext`](https://github.com/vhnatyk/icicle/blob/eef6876b037a6b0797464e7cdcf9c1ecfcf41808/wrappers/rust/icicle-cuda-runtime/src/device_context.rs#L11)

Represents the configuration of a CUDA device, encapsulating the device's stream, ID, and memory pool. The default device is always `0`, unless configured otherwise.

```rust
pub struct DeviceContext<'a> {
    pub stream: &'a CudaStream,
    pub device_id: usize,
    pub mempool: CudaMemPool,
}
```

##### Fields

- **`stream: &'a CudaStream`**

A reference to a `CudaStream`. This stream is used for executing CUDA operations. By default, it points to a null stream, which is CUDA's default execution stream.

- **`device_id: usize`**

The index of the GPU currently in use. The default value is `0`, indicating the first GPU in the system.

- **`mempool: CudaMemPool`**

Represents the memory pool used for CUDA memory allocations. The default is set to a null pointer, which signifies the use of the default CUDA memory pool.

##### Implementation Notes

- The `DeviceContext` structure is cloneable and can be debugged, facilitating easier logging and duplication of contexts when needed.


#### [`DeviceContext::default_for_device(device_id: usize) -> DeviceContext<'static>`](https://github.com/vhnatyk/icicle/blob/eef6876b037a6b0797464e7cdcf9c1ecfcf41808/wrappers/rust/icicle-cuda-runtime/src/device_context.rs#L30C12-L30C30)

Provides a `DeviceContext` for the given device with default settings for everything else, ideal for straightforward setups.

#### Parameters

- **`device_id: usize`**: The ID of the device for which to create the context.

#### Returns

A `DeviceContext` instance configured with:
- The default stream (`null_mut()`).
- The provided `device_id`.
- The default memory pool (`null_mut()`).


#### [`check_device(device_id: i32)`](https://github.com/vhnatyk/icicle/blob/eef6876b037a6b0797464e7cdcf9c1ecfcf41808/wrappers/rust/icicle-cuda-runtime/src/device_context.rs#L42)

Validates that the specified `device_id` matches the ID of the currently active device, ensuring operations are targeted correctly.

#### Parameters

- **`device_id: i32`**: The device ID to verify against the currently active device.

#### Behavior

- **Panics** if the `device_id` does not match the active device's ID, preventing cross-device operation errors.

#### Example

```rust
let device_id: i32 = 0; // Example device ID
check_device(device_id);
// Ensures that the current context is correctly set for the specified device ID.
```
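
The interplay between a per-device context and the device check can be sketched without a GPU. Everything in the block below is a simplified model: `ModelContext`, `mock_set_device`, `mock_check_device` and `run_with_context` are hypothetical names, and the real `DeviceContext` also carries a stream and a memory pool that are omitted here.

```rust
use std::cell::Cell;

thread_local! {
    // Hypothetical model of the per-thread "current device"; starts at device 0.
    static CURRENT_DEVICE: Cell<usize> = Cell::new(0);
}

/// Stand-in for `set_device`: binds the calling thread to `device_id`.
fn mock_set_device(device_id: usize) {
    CURRENT_DEVICE.with(|d| d.set(device_id));
}

/// Simplified model of a device context that keeps only the device ID.
#[derive(Debug, Clone, PartialEq)]
struct ModelContext {
    device_id: usize,
}

impl ModelContext {
    /// Mirrors the idea of `default_for_device`: defaults everywhere except the ID.
    fn default_for_device(device_id: usize) -> Self {
        ModelContext { device_id }
    }
}

/// Stand-in for `check_device`: panics if `device_id` is not the active device.
fn mock_check_device(device_id: i32) {
    let current = CURRENT_DEVICE.with(|d| d.get());
    assert!(
        device_id >= 0 && device_id as usize == current,
        "device mismatch: context is for device {}, active device is {}",
        device_id,
        current
    );
}

/// Models an operation that validates its context before doing any work.
fn run_with_context(ctx: &ModelContext) -> usize {
    mock_check_device(ctx.device_id as i32);
    // The actual computation would be launched here.
    ctx.device_id
}
```

In this model, calling `mock_set_device(1)` followed by `run_with_context(&ModelContext::default_for_device(1))` succeeds, while passing a context built for device 0 on the same thread panics. That is the class of cross-device mistake the check is meant to catch early.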


## A Multi GPU example

todo


20 changes: 18 additions & 2 deletions sidebars.js
@@ -30,9 +30,20 @@ module.exports = {
id: "icicle/golang-bindings",
},
{
type: "doc",
type: "category",
label: "Rust bindings",
id: "icicle/rust-bindings",
link: {
type: `doc`,
id: "icicle/rust-bindings",
},
collapsed: true,
items: [
{
type: "doc",
label: "Multi GPU Support",
id: "icicle/rust-bindings/multi-gpu",
}
]
},
{
type: "category",
@@ -60,6 +71,11 @@ module.exports = {
}
],
},
{
type: "doc",
label: "Multi GPU Support",
id: "icicle/multi-gpu",
},
{
type: "doc",
label: "Supporting additional curves",