AWS ParallelCluster v3.8.0

@enrico-usai enrico-usai released this 19 Dec 17:40
· 18 commits to release-3.8 since this release

We're excited to announce the release of AWS ParallelCluster 3.8.0.

Upgrade

How to upgrade?

sudo pip install --upgrade aws-parallelcluster

ENHANCEMENTS

  • Add support for EC2 Capacity Blocks for ML.
  • Add support for Rocky Linux 8 as a CustomAmi created through the build-image process. No official public ParallelCluster Rocky Linux 8 AMI is available at this time.
  • Add Scheduling/ScalingStrategy parameter to control the cluster scaling strategy to use when launching EC2 instances for Slurm compute nodes.
    Possible values are all-or-nothing, greedy-all-or-nothing, best-effort, with all-or-nothing being the default.
  • Add HeadNode/SharedStorageType parameter to use EFS storage instead of NFS exports from the head node root volume
    for intra-cluster shared file system resources: ParallelCluster, Intel, Slurm, and /home data. This enhancement reduces the load on the head node networking.
  • Allow mounting /home as external EFS or FSx shared storage via the SharedStorage section of the config file.
  • Add new parameter SlurmSettings/MungeKeySecretArn to allow the use of an external user-defined MUNGE key stored in AWS Secrets Manager.
  • Add Monitoring/Alarms/Enabled parameter to toggle Amazon CloudWatch Alarms for the cluster.
  • Add head node alarms to monitor EC2 health checks, CPU utilization and the overall status of the head node, and add them to the CloudWatch Dashboard created with the cluster.
  • Add support for Data Repository Associations when using PERSISTENT_2 as DeploymentType for a managed FSx for Lustre file system.
  • Add Scheduling/SlurmSettings/Database/DatabaseName parameter to allow users to specify a custom name for the database on the database server to be used for Slurm accounting.
  • Make InstanceType an optional configuration parameter when configuring CapacityReservationTarget/CapacityReservationId in the compute resource.
  • Allow specifying a prefix for IAM roles and policies created by the ParallelCluster API.
  • Allow specifying a permissions boundary to be applied to IAM roles and policies created by the ParallelCluster API.
  • Add support for il-central-1 region.
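
As a rough illustration of how several of the new parameters above might appear together in a cluster configuration file, here is a partial sketch. It is a fragment, not a complete or validated configuration: the ARN, file system ID, storage name, and database name are hypothetical placeholders, the Efs value for SharedStorageType is assumed from the description above, and the other fields that the Database section needs are omitted.

```yaml
# Partial sketch only; required sections such as Region, Image, and
# networking settings are intentionally left out.
HeadNode:
  SharedStorageType: Efs             # assumed value; selects EFS instead of NFS exports from the head node
Scheduling:
  Scheduler: slurm
  ScalingStrategy: best-effort       # all-or-nothing (default) | greedy-all-or-nothing | best-effort
  SlurmSettings:
    # Hypothetical placeholder ARN of a Secrets Manager secret holding the MUNGE key
    MungeKeySecretArn: arn:aws:secretsmanager:us-east-1:111122223333:secret:example-munge-key
    Database:
      DatabaseName: example_accounting_db   # hypothetical name; other Database fields omitted
SharedStorage:
  - Name: external-home              # hypothetical name
    StorageType: Efs
    MountDir: /home
    EfsSettings:
      FileSystemId: fs-0123456789abcdef0    # placeholder ID of an existing file system
Monitoring:
  Alarms:
    Enabled: true                    # toggles Amazon CloudWatch Alarms for the cluster
```

Check the exact accepted values and required companion fields against the 3.8.0 configuration reference before using any of these settings.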

CHANGES

  • Upgrade Slurm to 23.02.7 (from 23.02.6).
  • Upgrade NVIDIA driver to version 535.129.03.
  • Upgrade CUDA Toolkit to version 12.2.2.
  • Use Open Source NVIDIA GPU drivers (OpenRM) as NVIDIA kernel module for Linux instead of NVIDIA closed source module.
  • Remove support for the all_or_nothing_batch configuration parameter in the Slurm resume program, in favor of the new Scheduling/ScalingStrategy cluster configuration parameter.
  • Change cluster alarms naming convention to '[cluster-name]-[component-name]-[metric]'.
  • Change default EBS volume types in ADC regions from gp2 to gp3, for both the root and additional volumes.
  • The optional permissions boundary for the ParallelCluster API is now applied to every IAM role created by the API infrastructure.
  • Upgrade EFA installer to 1.29.1.
    • Efa-driver: efa-2.6.0-1
    • Efa-config: efa-config-1.15-1
    • Efa-profile: efa-profile-1.5-1
    • Libfabric-aws: libfabric-aws-1.19.0-1
    • Rdma-core: rdma-core-46.0-1
    • Open MPI: openmpi40-aws-4.1.6-1
  • Upgrade GDRCopy to version 2.4 in all supported OSes, except for CentOS 7, where version 2.3.1 is used.
  • Upgrade aws-cfn-bootstrap to version 2.0-28.
  • Add support for Python 3.10 in aws-parallelcluster-batch-cli.

BUG FIXES

  • Fix inconsistent scaling configuration after cluster update rollback when modifying the list of instance types declared in the Compute Resources.
  • Fix SSH key generation for users when switching users without root privileges in clusters integrated with an external LDAP server through cluster configuration files.
  • Fix disabling Slurm power save mode when setting ScaledownIdletime = -1.
  • Fix hard-coded path to Slurm installation dir in update_slurm_database_password.sh script for Slurm Accounting.
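
For context on the ScaledownIdletime fix listed above, the setting might look like the following in the cluster configuration file. This is a minimal fragment under the assumption that the parameter sits under Scheduling/SlurmSettings; all other sections are omitted.

```yaml
# Fragment only: with this value, Slurm power save mode is expected to be disabled,
# which is the behavior the fix above restores.
Scheduling:
  Scheduler: slurm
  SlurmSettings:
    ScaledownIdletime: -1
```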