AWS ParallelCluster v3.8.0
enrico-usai released this 19 Dec 17:40 · 18 commits to release-3.8 since this release
We're excited to announce the release of AWS ParallelCluster 3.8.0.
Upgrade
How to upgrade?
sudo pip install --upgrade aws-parallelcluster
ENHANCEMENTS
- Add support for EC2 Capacity Blocks for ML.
- Add support for Rocky Linux 8 as `CustomAmi` created through the `build-image` process. No public official ParallelCluster Rocky Linux 8 AMI is made available at this time.
- Add `Scheduling/ScalingStrategy` parameter to control the cluster scaling strategy to use when launching EC2 instances for Slurm compute nodes. Possible values are `all-or-nothing`, `greedy-all-or-nothing`, `best-effort`, with `all-or-nothing` being the default.
- Add `HeadNode/SharedStorageType` parameter to use EFS storage instead of NFS exports from the head node root volume for intra-cluster shared file system resources: ParallelCluster, Intel, Slurm, and `/home` data. This enhancement reduces the load on the head node networking.
- Allow for mounting `home` as an EFS or FSx external shared storage via the `SharedStorage` section of the config file.
- Add new parameter `SlurmSettings/MungeKeySecretArn` to allow the use of an external user-defined MUNGE key from AWS Secrets Manager.
- Add `Monitoring/Alarms/Enabled` parameter to toggle Amazon CloudWatch Alarms for the cluster.
- Add head node alarms to monitor EC2 health checks, CPU utilization and the overall status of the head node, and add them to the CloudWatch Dashboard created with the cluster.
- Add support for Data Repository Associations when using `PERSISTENT_2` as `DeploymentType` for a managed FSx for Lustre.
- Add `Scheduling/SlurmSettings/Database/DatabaseName` parameter to allow users to specify a custom name for the database on the database server to be used for Slurm accounting.
- Make `InstanceType` an optional configuration parameter when configuring `CapacityReservationTarget/CapacityReservationId` in the compute resource.
- Add possibility to specify a prefix for IAM roles and policies created by ParallelCluster API.
- Add possibility to specify a permissions boundary to be applied for IAM roles and policies created by ParallelCluster API.
- Add support for il-central-1 region.
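
As a rough illustration of how several of the new parameters listed above might appear together in a cluster configuration file, here is a minimal sketch. It is a fragment, not a complete or validated configuration: required sections such as `Region`, `Image`, networking, and queue definitions are omitted. The `Efs` value for `SharedStorageType` is an assumption based on the description above, and the secret ARN is a placeholder. Check the configuration reference for the exact accepted values before using it.

```yaml
# Fragment only: Region, Image, networking and SlurmQueues are omitted.
HeadNode:
  # Assumed value name: use EFS instead of NFS exports from the head node.
  SharedStorageType: Efs
Scheduling:
  Scheduler: slurm
  # One of: all-or-nothing (default), greedy-all-or-nothing, best-effort.
  ScalingStrategy: best-effort
  SlurmSettings:
    # Placeholder ARN of a user-defined MUNGE key stored in AWS Secrets Manager.
    MungeKeySecretArn: "arn:aws:secretsmanager:<region>:<account-id>:secret:<secret-name>"
Monitoring:
  Alarms:
    # Toggle the CloudWatch Alarms created for the cluster.
    Enabled: true
```

Since `all-or-nothing` is the default scaling strategy, the `ScalingStrategy` key only needs to be set when one of the other two behaviors is wanted.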
CHANGES
- Upgrade Slurm to 23.02.7 (from 23.02.6).
- Upgrade NVIDIA driver to version 535.129.03.
- Upgrade CUDA Toolkit to version 12.2.2.
- Use Open Source NVIDIA GPU drivers (OpenRM) as NVIDIA kernel module for Linux instead of NVIDIA closed source module.
- Remove support of `all_or_nothing_batch` configuration parameter in the Slurm resume program, in favor of the new `Scheduling/ScalingStrategy` cluster configuration.
- Changed cluster alarms naming convention to '[cluster-name]-[component-name]-[metric]'.
- Change default EBS volume types in ADC regions from `gp2` to `gp3`, for both the root and additional volumes.
- The optional permissions boundary for the ParallelCluster API is now applied to every IAM role created by the API infrastructure.
- Upgrade EFA installer to `1.29.1`.
  - Efa-driver: `efa-2.6.0-1`
  - Efa-config: `efa-config-1.15-1`
  - Efa-profile: `efa-profile-1.5-1`
  - Libfabric-aws: `libfabric-aws-1.19.0-1`
  - Rdma-core: `rdma-core-46.0-1`
  - Open MPI: `openmpi40-aws-4.1.6-1`
- Upgrade GDRCopy to version 2.4 in all supported OSes, except for CentOS 7 where version 2.3.1 is used.
- Upgrade `aws-cfn-bootstrap` to version 2.0-28.
- Add support for Python 3.10 in aws-parallelcluster-batch-cli.
BUG FIXES
- Fix inconsistent scaling configuration after cluster update rollback when modifying the list of instance types declared in the Compute Resources.
- Fix users SSH keys generation when switching users without root privilege in clusters integrated with an external LDAP server through cluster configuration files.
- Fix disabling Slurm power save mode when setting `ScaledownIdletime = -1`.
- Fix hard-coded path to Slurm installation dir in `update_slurm_database_password.sh` script for Slurm Accounting.