diff --git a/assets/scss/common/_variables-custom.scss b/assets/scss/common/_variables-custom.scss index d2280cc..040e272 100644 --- a/assets/scss/common/_variables-custom.scss +++ b/assets/scss/common/_variables-custom.scss @@ -1 +1,9 @@ // Put your custom SCSS variables here + +$container-max-widths: ( + sm: 540px, + md: 720px, + lg: 1240px, + xl: 1241px, + xxl: 1320px +); \ No newline at end of file diff --git a/content/docs/software/architecture/_index.md b/content/docs/software/architecture/_index.md new file mode 100644 index 0000000..bd417f0 --- /dev/null +++ b/content/docs/software/architecture/_index.md @@ -0,0 +1,11 @@ +--- +title: "Architecture" +date: 2024-03-07T16:12:03+02:00 +lastmod: 2024-03-07T16:12:03+02:00 +draft: false +weight: 50 +toc: true +pinned: false +homepage: false +--- + diff --git a/content/docs/software/architecture/design_decisions.md b/content/docs/software/architecture/design_decisions.md new file mode 100644 index 0000000..f0cd412 --- /dev/null +++ b/content/docs/software/architecture/design_decisions.md @@ -0,0 +1,19 @@ +--- +title: "Early Design Decisions" +date: 2024-04-07T16:12:03+02:00 +lastmod: 2024-04-07T16:12:03+02:00 +draft: false +weight: 50 +toc: true +pinned: false +homepage: false +--- + +In establishing the governance and charter of OpenCHAMI, the board and technical steering committee made a few foundational decisions. + +1. All development must be publicly available through the [OpenCHAMI GitHub Organization](https://github.com/OpenCHAMI), including meeting notes and design discussions. +1. The software development effort must start with MIT-licensed microservices from Cray System Management (CSM), which was developed for the first exascale-class supercomputers. +1. Initial development must be focused on containerized microservices with REST APIs and cloud-like authentication/authorization. +1. The system must operate well for traditional, highly parallel, shared memory workloads. +1. 
The system must support new types of workloads that are being developed to support Machine Learning, Model Training, and Inference. +1. The system must support the evolving concept of HPC multitenancy. \ No newline at end of file diff --git a/content/docs/software/architecture/index.md b/content/docs/software/architecture/index.md deleted file mode 100644 index 58d8462..0000000 --- a/content/docs/software/architecture/index.md +++ /dev/null @@ -1,46 +0,0 @@ ---- -title: "Architecture" -date: 2024-03-07T16:12:03+02:00 -lastmod: 2024-03-07T16:12:03+02:00 -draft: false -weight: 50 -toc: false -pinned: false -homepage: false ---- - - -## Philosophy - -The OpenCHAMI archtectural concepts share a lot with the original UNIX concepts. Tools should do one thing well and provide useful inputs and outputs to interoperate with other tools. With tools that interact on the same system, UNIX pipes and plain text remain core to interoperability. For tools that interact across distributed systems and potentially across the internet, we add additional useful constratins. - -* REST -* TLS -* JWT -* json/yaml - -## Early design decisions - -In establishing the governance and charter of OpenCHAMI, the board and technical steering committee made a few foundational decisions. - -1. All development must be publically available through the [OpenCHAMI Github Organization](https://github.com/OpenCHAMI) including meeting notes and design discussions. -1. The software development effort must start with MIT-licensed microservices from the Cray System Manager(CSM) which was developed for the first Exascale Class Supercomputers. -1. Initial development must be focused on containerized microservices with REST APIS and cloud-like authentication/authorization. -1. The system must operate well for traditional, highly parallel, shared memory workloads. -1. The system must support new types of workloads that are being developed to support Machine Learning, Model Training, and Inference. -1. 
The system must support the evoloving concept of HPC multitenancy. - -## Third Party Services - -Most of the components found in a deployment of OpenCHAMI are not part of the OpenCHAMI project. Common open source-software like `dnsmasq` and `haproxy` have much larger developer and user communities, and they fulfill HPC needs without customization. There are also plenty of resources for teams that would prefer alternatives to each of the recommended third party applications. - -Where OpenCHAMI microservices do need to be specific, we favor, but do not require, a set of common technologies. - -## OpenCHAMI services tend to: - -* be HTTPS Microservices -* run as containers -* be configured at runtime through flags and environment variables -* be based on go 1.21 and the wolfi base containers from chainguard -* leverage bearer tokens for decentralized authentication and authorization -* use go-chi with its robust middleware support \ No newline at end of file diff --git a/content/docs/software/architecture/philosophy.md b/content/docs/software/architecture/philosophy.md new file mode 100644 index 0000000..0b75900 --- /dev/null +++ b/content/docs/software/architecture/philosophy.md @@ -0,0 +1,51 @@ +--- +title: "Design Philosophy" +date: 2024-04-07T16:12:03+02:00 +lastmod: 2024-04-07T16:12:03+02:00 +draft: false +weight: 50 +toc: true +pinned: false +homepage: false +--- +{{< callout context="note" title="Service Philosophy" icon="file-certificate" >}} +## OpenCHAMI services tend to: + +* be HTTPS microservices +* run as containers +* be configured at runtime through flags and environment variables +* be based on Go 1.21 and the Wolfi base containers from Chainguard +* leverage bearer tokens for decentralized authentication and authorization +* use go-chi with its robust middleware support +{{< /callout >}} + +The OpenCHAMI architectural concepts share a lot with the original UNIX concepts. 
Tools should do one thing well and provide useful inputs and outputs to interoperate with other tools. + +## Cloud Design Patterns + +The UNIX philosophy remains core to most software development and is as relevant today to containerized microservices as it was for `sed` and `awk` in their early incarnations. The technologies and design patterns available today are different, and so the expressions of those concepts are different. + +### The single container pattern + +Like the first principle of UNIX philosophy, a program should do one thing and do it well. In microservice development, each container should do one thing and do it well. That means separating the long-running services, like web servers, from the scripts that are used to support those long-running services. This shifts the focus away from trying to containerize all aspects of an operation in one container and instead focuses on externalizing communication and configuration of services. + +## Container inputs and outputs + +If the first principle is to do one thing well, which can be implemented with a single container or runtime, we need to follow up with a second design pattern to address the useful inputs and outputs. In the UNIX world, pipes and text files are ubiquitous, but those both have some drawbacks in distributed systems that may evolve at different rates over time. Modern containerized development extends the philosophy with practical tools to improve the speed and reliability of development in an inherently distributed system. + +Use structured data where possible. Text processing is expensive and brittle. Updates to the way one tool produces text need to be mirrored in any tools that consume text. Structured data is much more forgiving. When a program produces `yaml` or `json` data instead of plain text, other tools that interact with it can target the data rather than the program itself. 
This "loose coupling" between the program that produces the data and the program that reads it allows both programs to evolve at different speeds while remaining interoperable. + +### The Sidecar Pattern + +In containerized development, a sidecar is a container that operates in support of another container. For example, if a program needs to re-read a configuration file as it changes, it is common to have a sidecar responsible for the update of that configuration file. In this example, it is also common for the sidecar to signal the main container/process when a reload is necessary. Keep in mind that many programs designed to operate in containers tend to avoid configuration files altogether. + +To use a concrete + +### Runtime Configuration + + +* REST +* TLS +* JWT +* json/yaml + diff --git a/content/docs/software/architecture/third_party_services.md b/content/docs/software/architecture/third_party_services.md new file mode 100644 index 0000000..b6472ea --- /dev/null +++ b/content/docs/software/architecture/third_party_services.md @@ -0,0 +1,14 @@ +--- +title: "Third Party Services" +date: 2024-04-07T16:12:03+02:00 +lastmod: 2024-04-07T16:12:03+02:00 +draft: false +weight: 50 +toc: true +pinned: false +homepage: false +--- + +Most of the components found in a deployment of OpenCHAMI are not part of the OpenCHAMI project. Common open-source software like `dnsmasq` and `haproxy` have much larger developer and user communities, and they fulfill HPC needs without customization. There are also plenty of resources for teams that would prefer alternatives to each of the recommended third-party applications. + +Where OpenCHAMI microservices do need to be specific, we favor, but do not require, a set of common technologies. 
\ No newline at end of file diff --git a/content/guides/add_nodes.md b/content/guides/add_nodes.md new file mode 100644 index 0000000..fdee5f2 --- /dev/null +++ b/content/guides/add_nodes.md @@ -0,0 +1,53 @@ +--- +title: "Add Nodes" +description: "" +summary: "Add Nodes" +date: 2024-04-07T16:04:48+02:00 +lastmod: 2024-04-07T16:04:48+02:00 +draft: false +weight: 510 +toc: true +seo: + title: "" # custom title (optional) + description: "" # custom description (recommended) + canonical: "" # custom canonical URL (optional) + noindex: false # false (default) or true +--- + +Adding nodes to an OpenCHAMI system can happen either through discovery with [Magellan](/docs/software/magellan) or by manually creating the nodes using the API. In this tutorial, we'll show you how to manually interact with the API to add one or more nodes to the system. + +Regardless of the tool you choose, you'll need access to an OpenCHAMI deployment, a valid token, and the certificate so that your client can verify the connection. + +{{< details "Your token and certificate" >}} +The quickstart repository has a set of bash functions for obtaining the certificate and token needed for these examples: +```bash +source bash_functions.sh +get_ca_cert > cacert.pem +ACCESS_TOKEN=$(gen_access_token) +echo $ACCESS_TOKEN +``` +{{< /details >}} + + +## Using curl to read the list of nodes + +Curl is a useful tool for interacting with HTTP APIs. It provides plenty of feedback through flags and allows for extensive customization. Assuming the system name you've chosen for your cluster is `foobar` and you've updated your `/etc/hosts` file to resolve the URL below correctly, you can use curl as described below to verify that your certificates and token are working properly. + +```bash +curl --cacert cacert.pem -H "Authorization: Bearer $ACCESS_TOKEN" https://foobar.openchami.cluster/hsm/v2/State/Components +``` + +### Common errors + +1. 
`curl: (6) Could not resolve host: foobar.openchami.cluster` + This indicates that your local system cannot match the domain name to the IP address of your installation. Check your /etc/hosts file and update it if necessary. + +1. `token is unauthorized` + This indicates that something isn't working with the access token in your Authorization header. First confirm that the header is being specified correctly. It's important that the header matches precisely: `"Authorization: Bearer <token>"`, where `<token>` is a very long string. + + + + +## Adding nodes with ochami-cmdline + +Coming soon! \ No newline at end of file diff --git a/content/guides/deploy_openhpc.md b/content/guides/deploy_openhpc.md index d6cbfd6..f750b58 100644 --- a/content/guides/deploy_openhpc.md +++ b/content/guides/deploy_openhpc.md @@ -1,3 +1,4 @@ +--- title: "Deploy OpenHPC" description: "" summary: "Deploying Alma Linux with OpenHPC on the compute nodes" @@ -11,4 +12,6 @@ seo: description: "" # custom description (recommended) canonical: "" # custom canonical URL (optional) noindex: false # false (default) or true ---- \ No newline at end of file +--- + +Coming Soon! \ No newline at end of file diff --git a/content/guides/docker_tour.md b/content/guides/docker_tour.md new file mode 100644 index 0000000..899ecbd --- /dev/null +++ b/content/guides/docker_tour.md @@ -0,0 +1,70 @@ +--- +title: "Docker Tour" +description: "" +date: 2024-04-07T16:04:48+02:00 +lastmod: 2024-04-07T16:04:48+02:00 +draft: false +weight: 100 +toc: true +seo: + title: "" # custom title (optional) + description: "" # custom description (recommended) + canonical: "" # custom canonical URL (optional) + noindex: false # false (default) or true +--- + +The [quickstart](/guides/getting_started/) is designed to launch quickly so developers and sysadmins can get familiar with the system. It makes many assumptions about a small system that may not be valid for your site. 
The `docker compose` environment and all the concepts may not be familiar to you. This tour is meant to provide developers with a starting point when trying to make changes. + +## What is Docker Compose? + +Docker Compose is part of the Docker ecosystem. It is built around the [compose file format](https://docs.docker.com/compose/compose-file/), which is partially supported by other tools. It uses the underlying docker runtime to deploy containerized applications in a prescribed order and with various elements shared between containers. + +### How does the quickstart use Docker Compose? + +The quickstart compose files define the services, volumes, and networks necessary for establishing a containerized system for managing an HPC cluster. Services that need to communicate with each other share networks. Containers that need to share files do so through volumes. Containers define their own healthchecks and their own service dependencies to ensure one service doesn't start until another process is complete or a dependent service is running. The whole process takes about a minute, even when downloading container images for the first time. + +## Docker Volumes + +Docker volumes are specified in a top-level construct `volumes` within the compose format, and used by individual services. They must be defined before they can be used. Volumes exist as containers for files on the host running the docker compose project. From an HPC background, you can think of them as shared mounts. In the following example, we define two volumes. When docker compose reads this configuration, it creates an empty volume for each of these names and keeps track of which containers are allowed to either read or write to the volume. They do not exist on disk in a way that is easily browsable by the sysadmin. Instead, they are opaque references to temporary directories that are maintained by the docker daemon. 
Sysadmins with control over the docker daemon may attach and detach these volumes with advanced docker commands. Volumes follow their own lifecycles which are separate from the container lifecycles. Once a volume is created, it exists until the sysadmin deletes it. Volumes even survive restarts of the docker daemon as well as restarts of the server. In our quickstart, we specifically delete them using the `--volumes` flag passed to `docker compose down`. Without the `--volumes` flag, the databases and certificates of previous runs would persist from experiment to experiment. + +In the following example, we are creating two volumes with the intention of using them to create files through one container and read them in a different container without having to create services to copy those files around. + +* The first empty volume is called `step-root-ca` which is used in OpenCHAMI to hold the CA bundle needed to verify all locally signed certificates. The certificate authority writes to it and all other containers can mount it with the `:ro` flag to read the certificate. +* The second empty volume is called `haproxy-certs`. This volume holds the certificates that our API gateway (haproxy) needs for SSL termination. Haproxy itself doesn't have the capacity to request certs. We rely on a sidecar which interacts with the certificate authority to generate and renew the SSL certificates as needed. + +```yaml +volumes: + step-root-ca: + haproxy-certs: +``` + +Containers specify which volumes they need access to in a `volumes:` section of their service definition. + +## Docker Networks + +Docker networks are specified in a top-level construct `networks:` within the compose format, and used by individual services. They must be defined before they can be used. Containers that share a network may communicate freely across any port, using just the name of the service as the hostname. 
It is helpful to think of each docker network as a shared localhost network between containers. These networks also provide a degree of isolation between services. Without a specific directive to expose a port or service, it is inaccessible outside its network(s). + +In the following example excerpt, we show a chain of services. `hydra` is our secure service for managing credentials. We don't want other services calling it directly. We built `opaal` specifically to implement only the very few client operations necessary for our system. All access from outside docker compose to internal services must go through `haproxy` which has access to the outside world. `haproxy` has access to both `internal` and `external` networks which allows it to act as a proxy for `opaal`. + +```yaml +networks: + external: + internal: + hydra-only: + +services: + hydra: + networks: + - hydra-only + + opaal: + networks: + - hydra-only + - internal + + haproxy: + networks: + - internal + - external + +``` \ No newline at end of file diff --git a/content/guides/getting_started.md b/content/guides/getting_started.md index 62de554..9ff173e 100644 --- a/content/guides/getting_started.md +++ b/content/guides/getting_started.md @@ -46,6 +46,8 @@ docker exec -it step-ca step ca root > cacert.pem # Use curl to confirm that everything is working curl --cacert cacert.pem https://foobar.openchami.cluster/login ``` + +Explore the environment on [GitHub](https://github.com/openchami/deployment-recipes/tree/main/lanl/). {{< /callout >}} ### Dependencies and Assumptions @@ -66,6 +68,11 @@ This quickstart makes a few assumptions about the target operating system and is Now that you've got a set of containers up and running that provide OpenCHAMI services, it's time to use them. We've got a set of administration guides and user guides for you to choose from. 
{{< card-grid >}} +{{< link-card + title="Docker Compose Tour" + description="Learn just enough docker compose to explore our quickstart files" + href="/guides/docker_tour/" +>}} {{< link-card title="Run a job" description="Deploy slurm and run a simple job" diff --git a/content/guides/install_slurm.md b/content/guides/install_slurm.md index 46f5fb8..ed0498e 100644 --- a/content/guides/install_slurm.md +++ b/content/guides/install_slurm.md @@ -12,4 +12,6 @@ seo: description: "" # custom description (recommended) canonical: "" # custom canonical URL (optional) noindex: false # false (default) or true ---- \ No newline at end of file +--- + +Coming soon! \ No newline at end of file diff --git a/layouts/_default/_markup/render-codeblock-mermaid.html b/layouts/_default/_markup/render-codeblock-mermaid.html new file mode 100644 index 0000000..b3fb4dc --- /dev/null +++ b/layouts/_default/_markup/render-codeblock-mermaid.html @@ -0,0 +1,6 @@ +
+    {{- .Inner | safeHTML }}
+
+{{ .Page.Store.Set "hasMermaid" true }} + + \ No newline at end of file diff --git a/layouts/partials/footer/script-footer-custom.html b/layouts/partials/footer/script-footer-custom.html index 4411a70..8b83364 100644 --- a/layouts/partials/footer/script-footer-custom.html +++ b/layouts/partials/footer/script-footer-custom.html @@ -11,3 +11,10 @@ {{ partial "footer/esbuild" (dict "src" "js/gallery.js" "load" "async" "transpile" false) -}} {{ end -}} */}} +{{ if .Store.Get "hasMermaid" }} + +{{ end }} +