Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
1 parent
fe0c26e
commit 82e246f
Showing
18 changed files
with
754 additions
and
2,814 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
--- | ||
title: "Docs" | ||
description: "" | ||
summary: "" | ||
date: 2023-09-07T16:12:03+02:00 | ||
lastmod: 2023-09-07T16:12:03+02:00 | ||
draft: false | ||
toc: true | ||
--- | ||
|
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,31 @@ | ||
--- | ||
title: "Introduction to OpenCHAMI" | ||
description: "What is OpenCHAMI?" | ||
summary: "" | ||
date: 2024-03-07T16:12:03+02:00 | ||
lastmod: 2024-03-07T16:12:03+02:00 | ||
draft: false | ||
toc: true | ||
--- | ||
|
||
In 2023, a group of some of the largest HPC sites came together to invest in a future open source system manager that could bridge the worlds of cloud and HPC. They formed a consortium and established governance to guide the development of that system manager. At early board meetings, the team decided on core concepts and used them to establish the name of the project. | ||
|
||
* **Open** - The community is more important than any included software and all parts of the solution should be freely available for anyone to build an HPC system or customize to meet the needs of their system or site. | ||
* **Composable** - Rather than a fully integrated system, the goal of OpenCHAMI is to be modular. Not all sites or systems will need the same set of software, and it should be possible to replace software components as needed. | ||
* **Heterogeneous** - Sites with multiple kinds of hardware should be able to manage them all with the same resilient and scalable infrastructure. OpenCHAMI makes no design assumptions that force a single system image, single architecture, or even a single High Speed Interconnect that must be shared across all nodes. | ||
* **Adaptable** - The community values constant evolution of system management software. This is true at the individual system scale where adaptability means resiliency and stability. It is also true as the industry evolves to embrace new technologies and the solution itself needs to adapt. | ||
* **Management Infrastructure** - Describing the solution in this way narrows the scope of OpenCHAMI to activities in the management plane of HPC systems. Sysadmins need a stable base upon which they can choose the best Operating Systems and user interaction methods for their users. | ||
|
||
|
||
|
||
## First steps | ||
|
||
A core team at Los Alamos National Laboratory took the lead on software development. Their first goal was to strip back much of CSM to the bare minimum needed to boot ten nodes. Within a few months, they had established the core repositories and were able to share their progress at SC23 in Denver. The lightweight solution involved just a few microservices and support services and was able to discover and boot ten nodes. [Learn more here...](https://github.com/OpenCHAMI/lanl-demo-sc23) | ||
|
||
Following the initial demonstration, additional teams have been adding resources to the effort with their own goals. Work continues on using OpenCHAMI to manage clusters at the US National Laboratories. At the same time, sites are working to manage HPC systems from the public cloud with OpenCHAMI on GKE. Other sites are collaborating on managing multiple HPC clusters with the same OpenCHAMI installation. | ||
|
||
In May of 2024, the team will showcase progress at the International Supercomputing Conference in Germany. Installation times, boot times, and overall scale have improved considerably in the last six months. | ||
|
||
Join us as we plan the next steps in our journey! | ||
|
||
|
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,17 +1,17 @@ | ||
--- | ||
title: "Software" | ||
description: "" | ||
summary: "" | ||
date: 2024-03-07T16:12:03+02:00 | ||
lastmod: 2024-03-07T16:12:03+02:00 | ||
draft: false | ||
weight: 999 | ||
toc: true | ||
seo: | ||
title: "" # custom title (optional) | ||
description: "" # custom description (recommended) | ||
canonical: "" # custom canonical URL (optional) | ||
noindex: false # false (default) or true | ||
weight: 50 | ||
toc: false | ||
pinned: false | ||
homepage: false | ||
--- | ||
|
||
# OpenCHAMI Documentation | ||
{{< callout context="tip" title="Did you know?" icon="rocket" >}} | ||
The [OpenCHAMI quickstart](https://github.com/openchami/deployment-recipes/) has been used to boot over 600 compute nodes in about five minutes, including POST! | ||
{{< /callout >}} | ||
|
||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,46 @@ | ||
--- | ||
title: "Architecture" | ||
date: 2024-03-07T16:12:03+02:00 | ||
lastmod: 2024-03-07T16:12:03+02:00 | ||
draft: false | ||
weight: 50 | ||
toc: false | ||
pinned: false | ||
homepage: false | ||
--- | ||
|
||
|
||
## Philosophy | ||
|
||
The OpenCHAMI architectural concepts share a lot with the original UNIX concepts. Tools should do one thing well and provide useful inputs and outputs to interoperate with other tools. For tools that interact on the same system, UNIX pipes and plain text remain core to interoperability. For tools that interact across distributed systems and potentially across the internet, we add a few useful constraints. | ||
|
||
* REST | ||
* TLS | ||
* JWT | ||
* JSON/YAML | ||
|
||
## Early design decisions | ||
|
||
In establishing the governance and charter of OpenCHAMI, the board and technical steering committee made a few foundational decisions. | ||
|
||
1. All development must be publicly available through the [OpenCHAMI GitHub Organization](https://github.com/OpenCHAMI), including meeting notes and design discussions. | ||
1. The software development effort must start with MIT-licensed microservices from Cray System Management (CSM), which was developed for the first exascale-class supercomputers. | ||
1. Initial development must be focused on containerized microservices with REST APIs and cloud-like authentication/authorization. | ||
1. The system must operate well for traditional, highly parallel, shared memory workloads. | ||
1. The system must support new types of workloads that are being developed to support Machine Learning, Model Training, and Inference. | ||
1. The system must support the evolving concept of HPC multitenancy. | ||
|
||
## Third Party Services | ||
|
||
Most of the components found in a deployment of OpenCHAMI are not part of the OpenCHAMI project. Common open-source software like `dnsmasq` and `haproxy` has much larger developer and user communities, and it fulfills HPC needs without customization. There are also plenty of resources for teams that would prefer alternatives to each of the recommended third-party applications. | ||
|
||
Where OpenCHAMI microservices do need to be specific, we favor, but do not require, a set of common technologies. | ||
|
||
## OpenCHAMI services tend to: | ||
|
||
* be HTTPS microservices | ||
* run as containers | ||
* be configured at runtime through flags and environment variables | ||
* be based on Go 1.21 and the Wolfi base containers from Chainguard | ||
* leverage bearer tokens for decentralized authentication and authorization | ||
* use go-chi with its robust middleware support |
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
--- | ||
title: "Cloud-Init: Standard node personalization for the cloud" | ||
description: "" | ||
summary: "" | ||
date: 2024-03-21T00:00:00+00:00 | ||
lastmod: 2024-03-21T00:00:00+00:00 | ||
draft: false | ||
weight: 800 | ||
toc: true | ||
--- | ||
|
||
|
||
Cloud-Init is a de facto standard in the cloud world. When you launch a virtual machine through a cloud provider, you may specify simple identity information as well as scripts that will be executed as part of the initialization of the instance. In many cases, this is the only post-boot configuration required. We use it for the same reason in OpenCHAMI. Our cloud-init server provides identity and configuration information based on the contents of SMD, in a form suitable for the cloud-init client that is included with most Linux distributions. |
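
To make the two kinds of payload concrete, here is a minimal Go sketch that renders `meta-data` (identity) and `user-data` (a `#cloud-config` document with a `runcmd` list) for one node, in the shape the cloud-init client's NoCloud datasource consumes. The `NodeIdentity` type and its fields are hypothetical stand-ins for inventory data; this is not how the OpenCHAMI cloud-init server is implemented.

```go
package main

import (
	"fmt"
	"strings"
)

// NodeIdentity is a hypothetical record standing in for what an
// inventory service might know about a node.
type NodeIdentity struct {
	InstanceID string
	Hostname   string
	RunCmds    []string // shell commands to run once at first boot
}

// metaData renders the identity keys that cloud-init reads from meta-data.
func metaData(n NodeIdentity) string {
	return fmt.Sprintf("instance-id: %s\nlocal-hostname: %s\n", n.InstanceID, n.Hostname)
}

// userData renders a cloud-config document. Commands are emitted as
// double-quoted strings via %q, which is adequate for simple commands
// but is not a general YAML encoder.
func userData(n NodeIdentity) string {
	var b strings.Builder
	b.WriteString("#cloud-config\n")
	if len(n.RunCmds) > 0 {
		b.WriteString("runcmd:\n")
		for _, c := range n.RunCmds {
			fmt.Fprintf(&b, "  - %q\n", c)
		}
	}
	return b.String()
}

func main() {
	n := NodeIdentity{
		InstanceID: "i-0001",
		Hostname:   "nid0001",
		RunCmds:    []string{"echo hello"},
	}
	fmt.Print(metaData(n))
	fmt.Print(userData(n))
}
```

A server built along these lines would look up the requesting node in its inventory and return the two documents over HTTP.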
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
title: "Deploy OpenHPC" | ||
description: "" | ||
summary: "Deploying Alma Linux with OpenHPC on the compute nodes" | ||
date: 2023-09-07T16:04:48+02:00 | ||
lastmod: 2023-09-07T16:04:48+02:00 | ||
draft: false | ||
weight: 810 | ||
toc: true | ||
seo: | ||
title: "" # custom title (optional) | ||
description: "" # custom description (recommended) | ||
canonical: "" # custom canonical URL (optional) | ||
noindex: false # false (default) or true | ||
--- |