This is a Prometheus Exporter for exporting NVIDIA GPU metrics. It uses the NVIDIA Go NVML bindings for the NVIDIA Management Library (NVML), a C-based API for monitoring NVIDIA GPU devices. Unlike some other similar exporters, it does not call the nvidia-smi binary.
This Exporter is a fork of https://github.com/mindprince/nvidia_gpu_prometheus_exporter with the following main changes:
- added parsing of /run/gpustat/XX to obtain the jobid and uid of the user running on the GPU. Slurm scripts that take advantage of this are available on the jobstats website.
- switched from the upstream Go bindings to the NVIDIA Go NVML bindings
- added support for MIG instance autodetection and stats
To build the exporter, run, e.g.:
go build
The exporter requires the following:
- access to the NVML library (libnvidia-ml.so.1).
- access to the GPU devices.
To make sure that the exporter can access the NVML libraries, either add them to the search path for shared libraries or set LD_LIBRARY_PATH to point to their location.
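As a quick sanity check before starting the exporter, the following minimal Go sketch looks for libnvidia-ml.so.1 in the directories listed in LD_LIBRARY_PATH. It is not part of the exporter; the helper names candidatePaths and findLibrary are hypothetical, and the check only covers LD_LIBRARY_PATH, not the system-wide search path configured for the dynamic linker.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// nvmlLibName is the shared library name listed in the requirements above.
const nvmlLibName = "libnvidia-ml.so.1"

// candidatePaths (hypothetical helper) expands an LD_LIBRARY_PATH-style value
// into the full paths where the library would be looked up.
// Empty entries are skipped in this sketch.
func candidatePaths(ldLibraryPath, libName string) []string {
	var paths []string
	for _, dir := range filepath.SplitList(ldLibraryPath) {
		if dir == "" {
			continue
		}
		paths = append(paths, filepath.Join(dir, libName))
	}
	return paths
}

// findLibrary (hypothetical helper) returns the first candidate path that
// exists on disk.
func findLibrary(ldLibraryPath, libName string) (string, bool) {
	for _, p := range candidatePaths(ldLibraryPath, libName) {
		if _, err := os.Stat(p); err == nil {
			return p, true
		}
	}
	return "", false
}

func main() {
	if p, ok := findLibrary(os.Getenv("LD_LIBRARY_PATH"), nvmlLibName); ok {
		fmt.Println("found", p)
		return
	}
	// A miss here is not conclusive: the library may still be resolvable
	// through the system's default shared library search path.
	fmt.Println(nvmlLibName, "not found via LD_LIBRARY_PATH")
}
```

If the library is reported as missing but the exporter still starts, the library is most likely being resolved through the system search path instead.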
By default, the metrics are exposed on port 9445. This can be changed using the -web.listen-address flag.
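To illustrate how a flag of this shape behaves, here is a minimal Go sketch using the standard flag package. It is not the exporter's actual source: the function name parseListenAddress, the flag set name, and the ":9445" default form (all interfaces, port 9445) are assumptions made for the example.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// parseListenAddress (hypothetical helper) reads -web.listen-address from
// args. The ":9445" default is an assumption derived from the documented
// default port; the real exporter may define it differently.
func parseListenAddress(args []string) (string, error) {
	fs := flag.NewFlagSet("exporter-sketch", flag.ContinueOnError)
	addr := fs.String("web.listen-address", ":9445", "address on which to expose metrics")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	return *addr, nil
}

func main() {
	addr, err := parseListenAddress(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	fmt.Println("metrics would be exposed on", addr)
}
```

With a flag defined this way, passing -web.listen-address=:9500 (or -web.listen-address :9500) would move the listener to port 9500, while omitting the flag keeps the default.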