The main objective of (GPU) Mekong is to provide a simplified path to scale out the execution of GPU programs from one GPU to almost any number of GPUs, regardless of whether the GPUs are located within a single host or distributed across a cloud or cluster. Unlike existing solutions, this work proposes to maintain the GPU’s native programming model, which relies on bulk-synchronous, thread-collective execution; that is, no hybrid solutions such as OpenCL/CUDA programs combined with message passing are required. As a result, we can maintain the simplicity and efficiency of GPU computing in the scale-out case, together with high productivity and performance.
Homepage: www.gpumekong.org
Note: this is a work in progress.
GPU Mekong is an external project to LLVM/clang, so for the most part follow the LLVM Getting Started Guide, with the following changes:
- Before running cmake, clone mekong into your llvm/tools directory, just like clang, e.g.:
$ cd where-you-want-llvm-to-live
$ cd llvm/tools
$ git clone https://github.com/unihd-ceg/mekong-cuda mekong
- Apply the patch llvm-enable-mekong.patch:
$ cd where-you-want-llvm-to-live
$ cd llvm
$ patch -p1 < tools/mekong/llvm-enable-mekong.diff
You should now be able to compile and install LLVM as usual.
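If the cmake configuration does not seem to pick up Mekong, a quick check that the source tree has the layout assumed by the steps above may help. The following is a minimal sketch of such a check, not part of Mekong itself; the function name check_mekong_tree is hypothetical, and it only tests for the llvm/tools/clang and llvm/tools/mekong directories described above.

```shell
# Hypothetical helper (not shipped with Mekong): verify that clang and
# mekong both live under llvm/tools before configuring with cmake.
# Usage: check_mekong_tree where-you-want-llvm-to-live
check_mekong_tree() {
    llvm_tools="$1/llvm/tools"
    for tool in clang mekong; do
        if [ ! -d "$llvm_tools/$tool" ]; then
            echo "missing: $llvm_tools/$tool" >&2
            return 1
        fi
    done
    echo "ok: clang and mekong found in $llvm_tools"
}
```

The check deliberately looks only at directories, so it does not depend on the exact name of the patch file.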
Mekong adds a new tool to the bin directory: the compiler driver "mekc". It supports a subset of the gpucc arguments and orchestrates the compilation pipeline. CUDA linking directives are added automatically, except for the linker search path, which must be given with -L.
So, to compile an application myapp.cu into the binary myapp, use the following command:
$ mekc myapp.cu -o myapp -L /usr/local/cuda/lib64
For details about gpucc CUDA compilation, consult the LLVM document "Compiling CUDA with clang".
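When building several applications, the mekc command above can be wrapped in a small shell function. This is a minimal sketch under stated assumptions: the function name mekc_build is hypothetical, the CUDA libraries are assumed to live in $CUDA_HOME/lib64 (CUDA_HOME is a common convention, not something this README says Mekong reads), and the MEKC variable is a hypothetical override that allows a dry run with MEKC=echo.

```shell
# Hypothetical wrapper around the mekc invocation shown above.
# Assumptions: CUDA libraries are in $CUDA_HOME/lib64 (default
# /usr/local/cuda), and MEKC may override the driver for a dry run.
mekc_build() {
    src="$1"                       # e.g. myapp.cu
    out="${2:-${src%.cu}}"         # default output: source name without .cu
    cuda_libdir="${CUDA_HOME:-/usr/local/cuda}/lib64"
    # mekc adds the CUDA link directives itself; only the search path is passed.
    ${MEKC:-mekc} "$src" -o "$out" -L "$cuda_libdir"
}
```

For example, MEKC=echo mekc_build myapp.cu prints the command line that would be run instead of invoking the compiler.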
Two new compilation steps are added after preprocessing but before compiling: "polyhedral analysis" and "rewriting". Analogous to the -c switch, there is a new switch -A that terminates compilation after analysis and a switch -R that terminates compilation after rewriting the source code.
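The mapping from pipeline stage to stop switch can be captured in a small helper, which may be handy when inspecting intermediate results. This is a hypothetical sketch: mekc_stop_after and the MEKC override are illustrative names, and the files each stage produces are not described in this README, so no output names are assumed.

```shell
# Hypothetical helper: stop the mekc pipeline after a given stage,
# using the switches described above:
#   analysis -> -A, rewrite -> -R, compile -> -c
# MEKC may override the driver (e.g. for a dry run).
mekc_stop_after() {
    stage="$1"
    shift
    case "$stage" in
        analysis) stop_flag=-A ;;
        rewrite)  stop_flag=-R ;;
        compile)  stop_flag=-c ;;
        *) echo "unknown stage: $stage" >&2; return 2 ;;
    esac
    ${MEKC:-mekc} "$stop_flag" "$@"
}
```

For example, mekc_stop_after rewrite myapp.cu runs mekc -R myapp.cu and stops once the source code has been rewritten.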
Applications compiled with Mekong generally do not require configuration. However, some environment variables influence the behavior of the Mekong system, and therefore of the application, at runtime:
- CUDA_VISIBLE_DEVICES: Although not used by Mekong directly, this variable determines the set of visible (= utilized) GPUs. Valid values are comma-separated lists of GPU indices starting at 0. E.g., to enable only the first four available GPUs, use CUDA_VISIBLE_DEVICES=0,1,2,3.
- MELOGLEVEL: Set to a number from 0 to 8 to enable logging. Currently only log levels 3 through 7 are in use. Higher numbers correspond to more detailed, verbose logging.
- MEDEBUG: Can be set to notransfers to disable all transfers for buffer synchronization, or to nopatterns to disable memory access patterns and transfers. These options are used for timing analysis.
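A launch script can generate the GPU list and catch typos in these variables before starting the application. The helpers below are a minimal sketch; first_n_gpus and check_mekong_env are hypothetical names, and the accepted values (MELOGLEVEL from 0 to 8, MEDEBUG set to notransfers or nopatterns) are taken only from the list above, so Mekong itself may treat other values differently.

```shell
# Hypothetical helpers for the runtime variables described above.

# first_n_gpus N prints the indices 0..N-1 as a comma-separated list,
# the format expected by CUDA_VISIBLE_DEVICES.
first_n_gpus() {
    n="$1"
    i=0
    list=""
    while [ "$i" -lt "$n" ]; do
        list="${list:+$list,}$i"
        i=$((i + 1))
    done
    printf '%s\n' "$list"
}

# check_mekong_env returns 1 if MELOGLEVEL or MEDEBUG is set to a value
# outside those listed above; unset or empty values are accepted.
check_mekong_env() {
    case "${MELOGLEVEL-}" in
        ""|[0-8]) ;;
        *) echo "MELOGLEVEL must be 0..8" >&2; return 1 ;;
    esac
    case "${MEDEBUG-}" in
        ""|notransfers|nopatterns) ;;
        *) echo "MEDEBUG must be notransfers or nopatterns" >&2; return 1 ;;
    esac
}

# Example (not executed here): run myapp on the first four GPUs with logging.
#   export MELOGLEVEL=5
#   check_mekong_env && CUDA_VISIBLE_DEVICES=$(first_n_gpus 4) ./myapp
```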