From 590d26ab4d61b785789ee6bacae29c37337e6703 Mon Sep 17 00:00:00 2001
From: Jiaming Yuan
Date: Tue, 11 Apr 2023 18:15:55 +0800
Subject: [PATCH] Add document about main guard. (#1157)

Close https://github.com/rapidsai/dask-cuda/issues/1152 .

Authors:
  - Jiaming Yuan (https://github.com/trivialfis)
  - Lawrence Mitchell (https://github.com/wence-)

Approvers:
  - Lawrence Mitchell (https://github.com/wence-)

URL: https://github.com/rapidsai/dask-cuda/pull/1157
---
 docs/source/examples/best-practices.rst | 4 +---
 docs/source/examples/ucx.rst            | 6 +++---
 docs/source/quickstart.rst              | 4 ++++
 3 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/docs/source/examples/best-practices.rst b/docs/source/examples/best-practices.rst
index 84cc78b88..2de3809c8 100644
--- a/docs/source/examples/best-practices.rst
+++ b/docs/source/examples/best-practices.rst
@@ -9,9 +9,7 @@ When choosing between two multi-GPU setups, it is best to pick the one where mos
 `DGX `_, a cloud instance with `multi-gpu options `_ , a high-density GPU HPC instance, etc. This is done for two reasons:

 - Moving data between GPUs is costly and performance decreases when computation stops due to communication overheads, Host-to-Device/Device-to-Host transfers, etc
-- Multi-GPU instances often come with accelerated networking like `NVLink `_. These accelerated
-networking paths usually have much higher throughput/bandwidth compared with traditional networking *and* don't force and Host-to-Device/Device-to-Host transfers. See
-`Accelerated Networking`_ for more discussion
+- Multi-GPU instances often come with accelerated networking like `NVLink `_. These accelerated networking paths usually have much higher throughput/bandwidth compared with traditional networking *and* don't force and Host-to-Device/Device-to-Host transfers. See `Accelerated Networking`_ for more discussion.

 .. code-block:: python

diff --git a/docs/source/examples/ucx.rst b/docs/source/examples/ucx.rst
index 6230caf67..18c569ff1 100644
--- a/docs/source/examples/ucx.rst
+++ b/docs/source/examples/ucx.rst
@@ -69,7 +69,7 @@ To start a Dask scheduler using UCX with automatic configuration and one GB of R
 .. note::
     The ``interface="ib0"`` is intentionally specified above to ensure RDMACM is used in systems that support InfiniBand. On systems that don't support InfiniBand or where RDMACM isn't required, the ``interface`` argument may be omitted or specified to listen on a different interface.

-    We specify ``UCX_MEMTYPE_REG_WHOLE_ALLOC_TYPES=cuda`` above for optimal performance with InfiniBand, see details `here `_. If not using InfiniBand, that option may be omitted. In UCX 1.12 and newer, that option is default and may be omitted as well even when using InfiniBand.
+    We specify ``UCX_MEMTYPE_REG_WHOLE_ALLOC_TYPES=cuda`` above for optimal performance with InfiniBand, see details `here `__. If not using InfiniBand, that option may be omitted. In UCX 1.12 and newer, that option is default and may be omitted as well even when using InfiniBand.

 Workers
 ^^^^^^^
@@ -86,7 +86,7 @@ To start workers with automatic UCX configuration and an RMM pool of 14GB per GP
 .. note::
     Analogous to the scheduler setup, the ``interface="ib0"`` is intentionally specified above to ensure RDMACM is used in systems that support InfiniBand. On systems that don't support InfiniBand or where RDMACM isn't required, the ``interface`` argument may be omitted or specified to listen on a different interface.

-    We specify ``UCX_MEMTYPE_REG_WHOLE_ALLOC_TYPES=cuda`` above for optimal performance with InfiniBand, see details `here `_. If not using InfiniBand, that option may be omitted. In UCX 1.12 and newer, that option is default and may be omitted as well even when using InfiniBand.
+    We specify ``UCX_MEMTYPE_REG_WHOLE_ALLOC_TYPES=cuda`` above for optimal performance with InfiniBand, see details `here `__. If not using InfiniBand, that option may be omitted. In UCX 1.12 and newer, that option is default and may be omitted as well even when using InfiniBand.

 Client
 ^^^^^^
@@ -122,7 +122,7 @@ Alternatively, the ``with dask.config.set`` statement from the example above may
     We specify ``UCX_MEMTYPE_REG_WHOLE_ALLOC_TYPES=cuda`` above for optimal performance with InfiniBand, see details `here `_. If not using InfiniBand, that option may be omitted. In UCX 1.12 and newer, that option is default and may be omitted as well even when using InfiniBand.

 ``dask cuda worker`` with Manual Configuration
-------------------------------------------
+----------------------------------------------

 When using ``dask cuda worker`` with UCX communication and manual configuration, the scheduler, workers, and client must all be started manually, each using the same UCX configuration.

diff --git a/docs/source/quickstart.rst b/docs/source/quickstart.rst
index c5592b439..c42bd4837 100644
--- a/docs/source/quickstart.rst
+++ b/docs/source/quickstart.rst
@@ -16,6 +16,10 @@ To create a Dask-CUDA cluster using all available GPUs and connect a Dask.distri
     cluster = LocalCUDACluster()
     client = Client(cluster)

+.. tip::
+
+    Be sure to include an ``if __name__ == "__main__":`` block when using :py:class:`dask_cuda.LocalCUDACluster` in a standalone Python script. See `standalone Python scripts `_ for more details.
+
 ``dask cuda worker``
 --------------------
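
The tip added to ``quickstart.rst`` can be illustrated with a minimal sketch of a standalone script. It assumes Dask-CUDA and at least one GPU are available. The ``main`` function name, the ``find_spec`` availability check, and the toy ``sum`` task are illustrative choices and are not part of this patch.

```python
import importlib.util


def main():
    # Assumption for this sketch: skip cleanly when Dask-CUDA is not installed.
    if importlib.util.find_spec("dask_cuda") is None:
        print("dask_cuda is not installed; nothing to run")
        return None

    # Deferred imports so the check above can run on any machine.
    from dask.distributed import Client
    from dask_cuda import LocalCUDACluster

    # Both objects are context managers, so the cluster and client are
    # shut down when the block exits.
    with LocalCUDACluster() as cluster, Client(cluster) as client:
        # Toy task, only to show that the cluster accepts work.
        return client.submit(sum, [1, 2, 3]).result()


if __name__ == "__main__":
    # Worker processes may re-import this module when they start; the guard
    # keeps them from trying to create a cluster of their own.
    print(main())
```

Keeping all cluster setup inside ``main()`` means that importing the module has no side effects, which is the property the guard relies on.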