Skip to content

Commit

Permalink
Adds documentation for configuring Slurm (#324)
Browse files Browse the repository at this point in the history
* Outlines slurm setup docs

* Fills in outline
  • Loading branch information
djperrefort authored Jun 19, 2024
1 parent bff37b9 commit c428b6c
Showing 1 changed file with 86 additions and 1 deletion.
87 changes: 86 additions & 1 deletion docs/install/slurm.md
Original file line number Diff line number Diff line change
@@ -1,3 +1,88 @@
# Configuring Slurm

This documentation is coming soon.
Keystone is agnostic to most Slurm settings, and requires minimal modification to a running Slurm cluster.
However, certain fairshare features are incompatible with the Keystone accounting system and must be disabled.

## Enable Resource Tracking

Slurm categorizes system usage in terms of trackable resources (TRES).
To impose a usage limit on a computational resource, that resource must be represented in Slurm as a TRES.
Resource tracking for common TRES' like CPU, memory, and energy usage is enabled by default.
However, administrators may wish to extend the default list to enable limits on additional resource types (e.g., for GPU usage).

The `AccountingStorageTRES` setting is used to extend the default list of TRES values stored in the Slurm database.
See the official [Surm documentation](https://slurm.schedmd.com/tres.html#conf) on `slurm.conf` settings for details.

??? Example "Example: Tracking GPU usage"

The following example enables tracking for GPU resources in addition to the default TRES tracked by slurm.

```
AccountingStorageTRES=gres/gpu
```

??? Example "Example: Tracking GPU and IOP"

The following example extends the default TRES list with resources called GPU and iop1.

```
AccountingStorageTRES=gres/gpu,license/iop1
```

## Disable Usage Decay

Slurm defaults to using the [multifactor priority plugin](https://slurm.schedmd.com/priority_multifactor.html) to schedule jobs.
This can be confirmed by checking the Slurm `PriorityType` setting.

```bash
scontrol show config | grep PriorityType
```

When using the multifactor plugin, the `PriorityDecayHalfLife` and `PriorityUsageResetPeriod` settings need to be disabled.
Leaving either of these features enabled will cause Slurm to periodically reduce an account's recorded resource usage, causing inaccuracies in resource allocation limits.

```
PriorityDecayHalfLife=00:00:00
PriorityUsageResetPeriod=NONE
```

!!! Important

Disabling the `PriorityDecayHalfLife` and `PriorityUsageResetPeriod` values may affect your Slurm fairshare policy.
Administrators should adjust the rest of their fairshair policy settings as they see appropriate.

## Configuring Charging Rates

Keystone enforces allocation limits in units of billable TRES.
The total billable TRES for a given job is determined using the allocated resources in addition to a series of billing weights:

$$
Billable = \sum_{T} \, \left ( W_{T} * A_{T} \right )
$$

The billing weights are set on a per-partition basis using the `TRESBillingWeights` setting.
Billing weights are definable in variety of units.
See the [Slurm documentation](https://slurm.schedmd.com/tres.html) for more details.

??? Example "Example: Billing for CPU"

The following example only charges users for CPU resources.

```
PartitionName=partition_name TRESBillingWeights="CPU=1.0"
```

??? Example "Example: Billing for CPU and GPU"

The following example charges users for GPU resources at twice the rate of CPU resources.

```
PartitionName=partition_name TRESBillingWeights="CPU=1.0,GRES/gpu=2.0"
```

To ensure the total allocated resources are calculated correctly, the `MAX_TRES` flag must be enabled.
Doing so ensures the billable TRES include individual TRES on a node (e.g. cpus, mem, gres) plus the sum of all global TRES (e.g. licenses).

```
PriorityFlags=MAX_TRES
```

0 comments on commit c428b6c

Please sign in to comment.