-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Adds documentation for configuring Slurm (#324)
* Outlines slurm setup docs * Fills in outline
- Loading branch information
1 parent
bff37b9
commit c428b6c
Showing
1 changed file
with
86 additions
and
1 deletion.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,88 @@ | ||
# Configuring Slurm | ||
|
||
This documentation is coming soon. | ||
Keystone is agnostic to most Slurm settings, and requires minimal modification to a running Slurm cluster. | ||
However, certain fairshare features are incompatible with the Keystone accounting system and must be disabled. | ||
|
||
## Enable Resource Tracking | ||
|
||
Slurm categorizes system usage in terms of trackable resources (TRES). | ||
To impose a usage limit on a computational resource, that resource must be represented in Slurm as a TRES. | ||
Resource tracking for common TRES' like CPU, memory, and energy usage is enabled by default. | ||
However, administrators may wish to extend the default list to enable limits on additional resource types (e.g., for GPU usage). | ||
|
||
The `AccountingStorageTRES` setting is used to extend the default list of TRES values stored in the Slurm database. | ||
See the official [Surm documentation](https://slurm.schedmd.com/tres.html#conf) on `slurm.conf` settings for details. | ||
|
||
??? Example "Example: Tracking GPU usage" | ||
|
||
The following example enables tracking for GPU resources in addition to the default TRES tracked by slurm. | ||
|
||
``` | ||
AccountingStorageTRES=gres/gpu | ||
``` | ||
|
||
??? Example "Example: Tracking GPU and IOP" | ||
|
||
The following example extends the default TRES list with resources called GPU and iop1. | ||
|
||
``` | ||
AccountingStorageTRES=gres/gpu,license/iop1 | ||
``` | ||
|
||
## Disable Usage Decay | ||
|
||
Slurm defaults to using the [multifactor priority plugin](https://slurm.schedmd.com/priority_multifactor.html) to schedule jobs. | ||
This can be confirmed by checking the Slurm `PriorityType` setting. | ||
|
||
```bash | ||
scontrol show config | grep PriorityType | ||
``` | ||
|
||
When using the multifactor plugin, the `PriorityDecayHalfLife` and `PriorityUsageResetPeriod` settings need to be disabled. | ||
Leaving either of these features enabled will cause Slurm to periodically reduce an account's recorded resource usage, causing inaccuracies in resource allocation limits. | ||
|
||
``` | ||
PriorityDecayHalfLife=00:00:00 | ||
PriorityUsageResetPeriod=NONE | ||
``` | ||
|
||
!!! Important | ||
|
||
Disabling the `PriorityDecayHalfLife` and `PriorityUsageResetPeriod` values may affect your Slurm fairshare policy. | ||
Administrators should adjust the rest of their fairshair policy settings as they see appropriate. | ||
|
||
## Configuring Charging Rates | ||
|
||
Keystone enforces allocation limits in units of billable TRES. | ||
The total billable TRES for a given job is determined using the allocated resources in addition to a series of billing weights: | ||
|
||
$$ | ||
Billable = \sum_{T} \, \left ( W_{T} * A_{T} \right ) | ||
$$ | ||
|
||
The billing weights are set on a per-partition basis using the `TRESBillingWeights` setting. | ||
Billing weights are definable in variety of units. | ||
See the [Slurm documentation](https://slurm.schedmd.com/tres.html) for more details. | ||
|
||
??? Example "Example: Billing for CPU" | ||
|
||
The following example only charges users for CPU resources. | ||
|
||
``` | ||
PartitionName=partition_name TRESBillingWeights="CPU=1.0" | ||
``` | ||
|
||
??? Example "Example: Billing for CPU and GPU" | ||
|
||
The following example charges users for GPU resources at twice the rate of CPU resources. | ||
|
||
``` | ||
PartitionName=partition_name TRESBillingWeights="CPU=1.0,GRES/gpu=2.0" | ||
``` | ||
|
||
To ensure the total allocated resources are calculated correctly, the `MAX_TRES` flag must be enabled. | ||
Doing so ensures the billable TRES include individual TRES on a node (e.g. cpus, mem, gres) plus the sum of all global TRES (e.g. licenses). | ||
|
||
``` | ||
PriorityFlags=MAX_TRES | ||
``` |