Enabling CI on EuroHPC and HPC Systems with ReFrame, Apptainer, Slurm REST API and GitHub Actions: Part 1 #2208
prudhomm
In High-Performance Computing (HPC), Continuous Integration (CI) has to do more than guard code quality: it also has to verify performance and scalability at large scale. CI is a staple of standard software development, but adapting it to HPC environments, and EuroHPC systems in particular, poses specific challenges. Large-scale runs and scalability tests are resource-intensive and cannot be executed as frequently as typical CI jobs. We therefore rely on scheduled CI runs or manual triggers via workflow_dispatch in GitHub Actions to balance the need for extensive testing against resource constraints. This blog post explores how ReFrame and the Slurm REST API enable CI on EuroHPC systems, allowing critical performance checks while respecting the demands of HPC workflows.
To do this, we developed a workflow based on four main tools:
Apptainer, formerly known as Singularity, is a containerization solution designed specifically for HPC applications. It allows users to create portable and reproducible computing environments, crucial for consistent testing across different systems.
ReFrame is a framework for writing regression tests for HPC systems. It simplifies writing and running tests, particularly those that assess performance and scalability on these complex systems.
The Slurm REST API provides a programmable interface to the Slurm Workload Manager, enabling automated job submission, monitoring, and management directly from CI workflows.
Finally, GitHub Actions is the automation platform for our CI workflows. It runs automated testing and deployment processes directly from a GitHub repository, which makes it an integral tool for implementing CI/CD in the complex environment of HPC systems.
Combined with Apptainer, ReFrame, and the Slurm REST API, it forms a comprehensive solution for addressing the unique challenges of CI in HPC contexts.
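To make the interplay of these tools more concrete, here is a minimal Python sketch of how a CI job might prepare a job submission for the Slurm REST API. It is an illustration under stated assumptions, not the workflow described in this series: the host name, the API version, the partition, the container image and the check directory are all hypothetical placeholders, and the exact payload schema depends on the slurmrestd version deployed at your site. The sketch only builds the HTTP request; it does not send it.

```python
import json
import os
import urllib.request

# Assumptions (not from the post): the URL and API version below are
# placeholders. Check your site's slurmrestd OpenAPI spec for the real schema.
SLURM_URL = "https://slurm.example.org:6820"  # hypothetical endpoint
API_VERSION = "v0.0.39"                       # assumed; site-dependent

# Hypothetical batch script: run ReFrame checks inside an Apptainer image.
JOB_SCRIPT = """#!/bin/bash
apptainer exec image.sif reframe -c checks/ -r
"""


def build_submit_request(user, token, script, name, partition, workdir):
    """Build (but do not send) a job-submission request for slurmrestd."""
    payload = {
        "script": script,
        "job": {
            "name": name,
            "partition": partition,
            "current_working_directory": workdir,
            # The expected environment format differs between API versions
            # (mapping vs. list of strings); a list is assumed here.
            "environment": ["PATH=/usr/bin:/bin"],
        },
    }
    return urllib.request.Request(
        url=f"{SLURM_URL}/slurm/{API_VERSION}/job/submit",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # JWT authentication headers used by slurmrestd
            "X-SLURM-USER-NAME": user,
            "X-SLURM-USER-TOKEN": token,
        },
        method="POST",
    )


if __name__ == "__main__":
    # In CI, the token would come from a repository secret.
    request = build_submit_request(
        user=os.environ.get("USER", "ci-user"),
        token=os.environ.get("SLURM_JWT", ""),
        script=JOB_SCRIPT,
        name="ci-reframe",
        partition="debug",      # hypothetical partition name
        workdir="/tmp",
    )
    print(request.get_method(), request.full_url)
```

In a real workflow, a GitHub Actions step would pass the prepared request to urllib.request.urlopen and then poll the job state through the same API. Building the request separately from sending it keeps the payload easy to inspect and test without access to a cluster.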
In Part 2, I will describe our complete workflow in more detail.
Acknowledgements
Funded by the European Union. This work has received funding from the European High Performance Computing Joint Undertaking (JU) and Poland, Germany, Spain, Hungary, France, Greece under grant agreement number: 101093457.
This publication expresses the opinions of the authors and not necessarily those of the EuroHPC JU and Associated Countries which are not responsible for any use of the information contained in this publication.