Releases: aws/aws-parallelcluster
AWS ParallelCluster v3.1.2
We're excited to announce the release of AWS ParallelCluster 3.1.2
Upgrade
How to upgrade?
sudo pip install --upgrade aws-parallelcluster
CHANGES
- Upgrade Slurm to version 21.08.6.
BUG FIXES
- Fix the update of the `/etc/hosts` file on compute nodes when a cluster is deployed in subnets without internet access.
- Fix compute nodes bootstrap by waiting for ephemeral drives initialization before joining the cluster.
AWS ParallelCluster v3.1.1
We're excited to announce the release of AWS ParallelCluster 3.1.1
Upgrade
How to upgrade?
sudo pip install --upgrade aws-parallelcluster
ENHANCEMENTS
- Add support for multi-user cluster environments by integrating with Active Directory (AD) domains managed via AWS Directory Service.
- Enable cluster creation in subnets with no internet access.
- Add abbreviated flags for `cluster-name` (`-n`), `region` (`-r`), `image-id` (`-i`) and `cluster-configuration`/`image-configuration` (`-c`) to the CLI (see the example after this list).
- Add support for multiple compute resources with the same instance type per queue.
- Add support for `UseEc2Hostnames` in the cluster configuration file. When set to `true`, EC2 default hostnames (e.g. ip-1-2-3-4) are used for compute nodes (see the sketch after this list).
- Add support for GPU scheduling with Slurm on ARM instances with NVIDIA cards. Install NVIDIA drivers and CUDA library for ARM.
- Add `parallelcluster:compute-resource-name` tag to LaunchTemplates used by compute nodes.
- Add support for `NEW_CHANGED_DELETED` as a value of the FSx for Lustre `AutoImportPolicy` option.
- Explicitly set the cloud-init datasource to be EC2. This saves boot time for Ubuntu and CentOS platforms.
- Improve Security Groups created within the cluster to allow inbound connections from custom security groups when the `SecurityGroups` parameter is specified for the head node and/or queues.
- Build Slurm with `slurmrestd` support.
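For illustration, here is how the abbreviated flags can be combined; cluster name, image name, region and file names below are placeholders:

```bash
# Create a cluster using the short flags for cluster-name (-n), region (-r)
# and cluster-configuration (-c); all values are placeholders.
pcluster create-cluster -n my-cluster -r us-east-1 -c cluster-config.yaml

# Build a custom image using the short flags for image-id (-i) and
# image-configuration (-c).
pcluster build-image -i my-custom-image -r us-east-1 -c image-config.yaml
```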
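A minimal sketch of where `UseEc2Hostnames` is expected to live in the cluster configuration; the nesting under `Scheduling/SlurmSettings/Dns` reflects my understanding of the schema and is worth verifying against the documentation for your version:

```yaml
Scheduling:
  Scheduler: slurm
  SlurmSettings:
    Dns:
      # When true, compute nodes use EC2 default hostnames (e.g. ip-1-2-3-4).
      UseEc2Hostnames: true
```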
CHANGES
- Upgrade Slurm to version 21.08.5.
- Upgrade NICE DCV to version 2021.3-11591.
- Upgrade NVIDIA driver to version 470.103.01.
- Upgrade CUDA library to version 11.4.4.
- Upgrade NVIDIA Fabric manager to version 470.103.01.
- Upgrade Intel MPI Library to 2021.4.0.441.
- Upgrade PMIx to version 3.2.3.
- Disable package update at instance launch time on Amazon Linux 2.
- Enable the possibility to suppress the `SlurmQueues` and `ComputeResources` length validators.
- Use the compute resource name rather than the instance type in the compute fleet Launch Template name.
- Disable EC2 ImageBuilder enhanced image metadata when building ParallelCluster custom images.
- Remove dumping of failed compute nodes to `/home/logs/compute`. Compute node log files are available in CloudWatch and in EC2 console logs.
BUG FIXES
- Redirect stderr and stdout to the CLI log file to prevent unwanted text from polluting the `pcluster` CLI output.
- Fix exporting of cluster logs when no prefix is specified; previously the logs were exported to a `None` prefix.
- Fix rollback not being performed in case of cluster update failure.
- Do not configure GPUs in Slurm when the NVIDIA driver is not installed.
- Fix `ecs:ListContainerInstances` permission in `BatchUserRole`.
- Fix `RootVolume` schema for the `HeadNode` by raising an error if an unsupported `KmsKeyId` is specified.
- Fix `EfaSecurityGroupValidator`. Previously, it could produce false failures when custom security groups were provided and EFA was enabled.
- Fix FSx metrics not being displayed in the CloudWatch Dashboard.
AWS ParallelCluster v3.0.3
We're excited to announce the release of AWS ParallelCluster 3.0.3
Upgrade
How to upgrade?
sudo pip install --upgrade aws-parallelcluster
CHANGES
- Disable the `log4j-cve-2021-44228-hotpatch` service on Amazon Linux to avoid incurring potential performance degradation.
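For reference, a generic way to check the state of that service on a running Amazon Linux 2 node (plain systemd commands, not part of the ParallelCluster CLI):

```bash
# Show whether the log4j hotpatch service exists and whether it is enabled.
systemctl status log4j-cve-2021-44228-hotpatch --no-pager
systemctl is-enabled log4j-cve-2021-44228-hotpatch
```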
AWS ParallelCluster v2.11.4
We're excited to announce the release of AWS ParallelCluster 2.11.4
Upgrade
How to upgrade?
sudo pip install aws-parallelcluster==2.11.4
CHANGES
- CentOS 8 is no longer supported (EOL on December 31st, 2021).
- Upgrade Slurm to version 20.11.8.
- Upgrade Cinc Client to version 17.2.29.
- Upgrade NICE DCV to version 2021.2-11190.
- Upgrade NVIDIA driver to version 470.82.01.
- Upgrade CUDA library to version 11.4.3.
- Upgrade NVIDIA Fabric manager to 470.82.01.
- Disable package updates at instance launch time on Amazon Linux 2.
- Disable unattended package updates on Ubuntu.
- Install Python 3 version of `aws-cfn-bootstrap` scripts on CentOS 7 and Ubuntu 18.04, aligning with Ubuntu 20.04 and Amazon Linux 2.
BUG FIXES
- Disable update of the `ec2_iam_role` parameter.
- Fix `CpuOptions` configuration in the LaunchTemplate for t2 instances.
AWS ParallelCluster v3.0.2
We're excited to announce the release of AWS ParallelCluster 3.0.2
Upgrade
How to upgrade?
sudo pip install --upgrade aws-parallelcluster
CHANGES
- Upgrade EFA installer to version 1.14.1. Thereafter, EFA enables GDR support by default on supported instance type(s).
  ParallelCluster does not reinstall EFA during node start. Previously, EFA was reinstalled if `GdrSupport` had been
  turned on in the configuration file. The `GdrSupport` parameter has no effect and should no longer be used.
  - EFA configuration: `efa-config-1.9-1`
  - EFA profile: `efa-profile-1.5-1`
  - EFA kernel module: `efa-1.14.2`
  - RDMA core: `rdma-core-37.0`
  - Libfabric: `libfabric-1.13.2`
  - Open MPI: `openmpi40-aws-4.1.1-2`
BUG FIXES
- Fix an issue that prevented cluster names from starting with the `parallelcluster-` prefix.
AWS ParallelCluster v2.11.3
We're excited to announce the release of AWS ParallelCluster 2.11.3
Upgrade
How to upgrade?
sudo pip3 install "aws-parallelcluster<3.0" --upgrade --user
CHANGES
- Upgrade EFA installer to version 1.14.1. Thereafter, EFA enables GDR support by default on supported instance type(s).
  ParallelCluster does not reinstall EFA during node start. Previously, EFA was reinstalled if `enable_efa_gdr` had been
  turned on in the configuration file.
  - EFA configuration: `efa-config-1.9-1`
  - EFA profile: `efa-profile-1.5-1`
  - EFA kernel module: `efa-1.14.2`
  - RDMA core: `rdma-core-37.0`
  - Libfabric: `libfabric-1.13.2`
  - Open MPI: `openmpi40-aws-4.1.1-2`
- Include tags from cluster configuration file in the RunInstances dry runs performed during configuration validation.
BUG FIXES
- Fix issues with the create custom AMI functionality:
  - SGE download URL no longer reachable. Use the Debian repository to download the SGE source archive.
  - Outdated CA certificates used by Cinc. Update the ca-certificates package during AMI build time.
- Fix cluster update when using a proxy setup.
AWS ParallelCluster v3.0.1
We're excited to announce the release of AWS ParallelCluster 3.0.1
Upgrade
How to upgrade?
sudo pip install --upgrade aws-parallelcluster
ENHANCEMENTS
- Add `pcluster3-config-converter` CLI command to convert a cluster configuration from the ParallelCluster 2 to the ParallelCluster 3 format (see the example after this list).
- The region parameter is now retrieved from the provider chain, thus supporting the use of profiles and defaults specified in the `~/.aws/config` file.
- Export `ParallelClusterApiInvokeUrl` and `ParallelClusterApiUserRole` in the CloudFormation API Stack so they can be used by cross-stack references.
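A sketch of the conversion workflow; the file paths are placeholders and the flag names should be confirmed with `pcluster3-config-converter --help` for your installed version:

```bash
# Convert a ParallelCluster 2 (INI) configuration into the ParallelCluster 3 (YAML) format.
pcluster3-config-converter \
  --config-file ~/.parallelcluster/config \
  --output-file cluster-config.yaml
```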
CHANGES
- Drop support for SysVinit. Only Systemd is supported.
- Include tags from cluster configuration file in the RunInstances dry runs performed during configuration validation.
- Allow '*' character in the configuration of S3Access/BucketName.
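To illustrate the '*' support, here is a fragment of a cluster configuration granting the head node access to a set of buckets; the nesting under `HeadNode/Iam/S3Access` reflects my reading of the ParallelCluster 3 schema, and the bucket name is a placeholder:

```yaml
HeadNode:
  Iam:
    S3Access:
      # '*' can now be used inside BucketName, e.g. to match a bucket-name prefix.
      - BucketName: my-team-data-*
```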
BUG FIXES
- Pin to the transitive dependencies resulting from the dependency on connexion.
- Fix cleanup of ECR resources when API infrastructure template is deleted.
- Fix supervisord service not enabled on Ubuntu. This was causing supervisord not to be started on instance reboot.
- Update ca-certificates package during AMI build time and have Cinc use the updated CA certificates bundle.
- Close stderr before exiting from pcluster CLI commands to avoid BrokenPipeError for processes that close the other end of the stdout pipe.
AWS ParallelCluster v3.0.0
We're excited to announce the release of AWS ParallelCluster 3.0.0
Upgrade
How to upgrade?
sudo pip install --upgrade aws-parallelcluster
ENHANCEMENTS
- Add support for pcluster actions (e.g., `create-cluster`, `update-cluster`, `delete-cluster`) through HTTP endpoints
  with Amazon API Gateway.
- Revamp custom AMI creation and management by leveraging EC2 Image Builder. This also includes the implementation of
  `build-image`, `delete-image`, `describe-image` and `list-images` commands to manage custom ParallelCluster images.
- Add `list-official-images` command to describe ParallelCluster official AMIs.
- Add `export-cluster-logs`, `list-cluster-logs` and `get-cluster-log-events` commands to retrieve both CloudWatch Logs
  and CloudFormation Stack Events. Add `export-image-logs`, `list-image-logs` and `get-image-log-events` commands to
  retrieve both Image Builder Logs and CloudFormation Stack Events (see the example after this list).
- Enable the possibility to restart / reboot the head node, also for instance types with instance store.
  These operations remain the responsibility of the user, who must manage the status of the cluster while operating
  on the head node, e.g. by stopping the compute fleet first.
- Add support to use an existing Private Route53 Hosted Zone when using Slurm as scheduler.
- Add the possibility to configure the instance profile as an alternative to configuring the IAM role for the head node and for
  each compute queue.
- Add the possibility to configure IAM role, profile and policies for the head node and for each compute queue.
- Add the possibility to configure different security groups for each queue.
- Allow full control over the name of CloudFormation stacks created by ParallelCluster by removing the `parallelcluster-` prefix.
- Add multiple queues and compute resources support for `pcluster configure` when the scheduler is Slurm.
- Add a prompt for the availability zone in `pcluster configure` automated subnet creation.
- Add configuration `HeadNode / Imds / Secured` to enable/disable restricted access to the Instance Metadata Service (IMDS);
  a sketch follows this list.
- Implement a scaling protection mechanism with the Slurm scheduler: the compute fleet is automatically set to 'PROTECTED'
  state in case recurrent failures are encountered when provisioning nodes.
- Add `--suppress-validators` and `--validation-failure-level` parameters to `create` and `update` commands.
- Add support for associating an existing Elastic IP to the head node.
- Extend limits for supported number of Slurm queues (10) and compute resources (5).
- Encrypt root EBS volumes and shared EBS volumes by default. Note that if the scheduler is AWS Batch, the root volumes
of the compute nodes cannot be encrypted by ParallelCluster.
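A few of the new commands in action, as a sketch; cluster, image and bucket names are placeholders, and flag spellings are worth double-checking with `pcluster <command> --help`:

```bash
# Build and inspect a custom ParallelCluster image with EC2 Image Builder.
pcluster build-image --image-id my-custom-image --image-configuration image-config.yaml
pcluster describe-image --image-id my-custom-image
pcluster list-official-images

# Export a cluster's CloudWatch logs and CloudFormation stack events to an S3 bucket you own.
pcluster export-cluster-logs --cluster-name my-cluster --bucket my-log-bucket
```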
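A minimal sketch of the new IMDS setting on the head node; my understanding is that `Secured` defaults to true and restricts IMDS access on the head node to a limited set of superusers:

```yaml
HeadNode:
  Imds:
    # Set to false to leave Instance Metadata Service access unrestricted on the head node.
    Secured: true
```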
CHANGES
- Upgrade EFA installer to version 1.13.0
  - EFA configuration: `efa-config-1.9`
  - EFA profile: `efa-profile-1.5`
  - EFA kernel module: `efa-1.13.0`
  - RDMA core: `rdma-core-35`
  - Libfabric: `libfabric-1.13.0`
  - Open MPI: `openmpi40-aws-4.1.1-2`
- Upgrade NICE DCV to version 2021.1-10851.
- Upgrade Slurm to version 20.11.8.
- Upgrade NVIDIA driver to version 470.57.02.
- Upgrade CUDA library to version 11.4.0.
- Upgrade Cinc Client to version 17.2.29.
- Upgrade Python runtime used by Lambda functions in AWS Batch integration to python3.8.
- Remove support for SGE and Torque schedulers.
- Remove support for CentOS 8.
- Change format and syntax of the configuration file used to create the cluster, from INI to YAML. A cluster configuration
  file now only includes the definition of a single cluster (see the minimal sketch after this list).
- Remove `--cluster-template`, `--extra-parameters` and `--tags` parameters for the `create` command.
- Remove `--cluster-template`, `--extra-parameters`, `--reset-desired` and `--yes` parameters for the `update` command.
- Remove `--config` parameter for `delete`, `status`, `start`, `stop`, `instances` and `list` commands.
- Remove possibility to specify aliases for the `ssh` command in the configuration file.
- Distribute AWS Batch commands: `awsbhosts`, `awsbkill`, `awsbout`, `awsbqueues`, `awsbstat` and `awsbsub` as a
  separate `aws-parallelcluster-awsbatch-cli` PyPI package.
- Add timestamp suffix to the CloudWatch Log Group name created for the cluster.
- Remove `pcluster-config` CLI utility.
- Remove `amis.txt` file.
- Remove additional EBS volume attached to the head node by default.
- Change NICE DCV session storage path to `/home/{UserName}`.
- Create a single ParallelCluster S3 bucket for each AWS region rather than for each cluster.
- Adopt inclusive language:
  - Rename MasterServer to HeadNode in CLI outputs.
  - Rename variable exported in the AWS Batch job environment from MASTER_IP to PCLUSTER_HEAD_NODE_IP.
  - Rename all CFN outputs from Master* to HeadNode*.
  - Rename NodeType and tags from Master to HeadNode.
- Rename tags (Note: the following tags are crucial for ParallelCluster scaling logic):
  - `aws-parallelcluster-node-type` -> `parallelcluster:node-type`
  - `ClusterName` -> `parallelcluster:cluster-name`
  - `aws-parallelcluster-attributes` -> `parallelcluster:attributes`
  - `Version` -> `parallelcluster:version`
- Remove tag: `Application`.
- Remove runtime creation method of custom ParallelCluster AMIs.
- Retain CloudWatch logs on cluster deletion by default. If you want to delete the logs during cluster deletion, set
  `Monitoring / Logs / CloudWatch / RetainOnDeletion` to False in the configuration file.
- Remove instance store software encryption option (`encrypted_ephemeral`) and rely on default hardware encryption provided
  by NVMe instance store volumes.
- Add tag 'Name' to every shared storage with the value specified in the shared storage name config.
- Remove installation of MPICH and FFTW packages.
- Remove Ganglia support.
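To make the new YAML format concrete, here is a minimal sketch of a single-cluster configuration for the Slurm scheduler. All names, subnet IDs and instance types are placeholders, the exact set of required properties should be checked against the ParallelCluster 3 documentation, and the `Monitoring` section shows where the `RetainOnDeletion` setting described above lives:

```yaml
# Minimal sketch of a ParallelCluster 3 cluster configuration (YAML).
Region: us-east-1
Image:
  Os: alinux2
HeadNode:
  InstanceType: t3.medium
  Networking:
    SubnetId: subnet-0123456789abcdef0   # placeholder
  Ssh:
    KeyName: my-key                      # placeholder EC2 key pair
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: queue1
      ComputeResources:
        - Name: compute1
          InstanceType: c5.xlarge
          MinCount: 0
          MaxCount: 10
      Networking:
        SubnetIds:
          - subnet-0123456789abcdef0     # placeholder
Monitoring:
  Logs:
    CloudWatch:
      # Logs are retained on cluster deletion by default; set to false to delete them.
      RetainOnDeletion: true
```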
AWS ParallelCluster v2.11.2
We're excited to announce the release of AWS ParallelCluster 2.11.2
Upgrade
How to upgrade?
sudo pip install --upgrade aws-parallelcluster
CHANGES
- When using a custom AMI with a preinstalled EFA package, no actions are taken at node bootstrap time if GPUDirect RDMA is enabled. The original EFA package deployment is preserved, as it is during the createami process.
- Upgrade EFA installer to version 1.13.0
  - Update `rdma-core` to v35.0.
  - Update `libfabric` to v1.13.0amzn1.0.
BUG FIXES
- Lock the version of the `nvidia-fabricmanager` package to the installed NVIDIA drivers to prevent updates and misalignments.
- Slurm: fix an issue that prevented powering-up nodes from being correctly reset after a stop and start of the cluster.
AWS ParallelCluster v2.11.1
We're excited to announce the release of AWS ParallelCluster 2.11.1
Upgrade
How to upgrade?
sudo pip install --upgrade aws-parallelcluster
CHANGES
- Restore the `noatime` option, which has a positive impact on the performance of NFS filesystems.
- Upgrade EFA installer to version 1.12.3
  - EFA configuration: `efa-config-1.9` (from `efa-config-1.8-1`)
  - EFA kernel module: `efa-1.13.0` (from `efa-1.12.3`)
BUG FIXES
- Pin to version 1.247347 of the CloudWatch agent due to performance impact of latest CW agent version 1.247348.
- Avoid failures when building SGE using an instance type with 32 or more vCPUs.