docs(admin): add first draft (#9)
Co-authored-by: Alvaro Gonzalez <[email protected]>
uniqueg and lvarin authored Jun 30, 2023
1 parent 276b506 commit d2a0f95
Showing 3 changed files with 308 additions and 0 deletions.
278 changes: 278 additions & 0 deletions docs/guides/guide-admin/index.md
@@ -1,4 +1,282 @@
# Administrator guide

Welcome to the systems administrator guide to the [ELIXIR Cloud][elixir-cloud].
Whether you would like to onboard your data or compute center, set up your own
[GA4GH][ga4gh]-based cloud or simply play around with our compute and storage
solutions, this is the right place to get you off the ground.

## General deployment notes

Most of our services (see our [GitHub organization][elixir-cloud-aai-github]
for a comprehensive list) come with [Helm](https://helm.sh/) charts for
deployment on Cloud Native infrastructure and [Docker
Compose](https://docs.docker.com/compose/) configurations for
testing/development deployments. If you do not have experience with these
technologies, please find some brief primers with references to additional
documentation below.

### Using Helm

[Helm][helm] is an IaC tool that is described as the "package manager for
Kubernetes". It allows the management of the lifecycle of a Kubernetes
application, i.e., its deployment, configuration, upgrade, retirement, etc.
Applications are packaged into "Charts". Using Helm Charts allows us to
version control an application and therefore follow its evolution over time,
make identical copies (e.g., development, staging, production), make
predictable upgrades, and share/publish the application.

Some useful Helm commands to manage a Chart are:

- `helm create`: Create a Helm Chart
- `helm install`: Install an application
- `helm upgrade`: Upgrade an application
- `helm uninstall`: Uninstall an application
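
As a rough sketch of how these commands fit together over the lifetime of an
application, the following script walks through them in order. The chart name
`my-chart`, the release name `my-release` and the namespace `my-namespace` are
placeholders (assumptions, not taken from any of our services), and the `run`
helper is hypothetical. By default, the script only prints the commands; set
`DRY_RUN=0` to run them against the cluster of your current context.

```sh
#!/bin/sh
# Sketch of a Helm Chart lifecycle. All names below are placeholders.
CHART=my-chart          # hypothetical chart name
RELEASE=my-release      # hypothetical release name
NAMESPACE=my-namespace  # hypothetical namespace

# Print each command instead of running it, unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

run helm create "$CHART"                                # scaffold a new Chart
run helm install -n "$NAMESPACE" "$RELEASE" "./$CHART"  # first deployment
run helm upgrade -n "$NAMESPACE" "$RELEASE" "./$CHART"  # roll out changes
run helm uninstall -n "$NAMESPACE" "$RELEASE"           # retire the application
```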

### Using Docker Compose

Most of our services provide a [Docker Compose][docker-compose] configuration
file for easy deployment of the software on a local machine. If the [Docker
Engine][docker-engine] and [Docker Compose][docker-compose] are already
installed on your system, it is as simple as cloning the service's Git
repository, changing into the folder where the Docker Compose file resides
(typically `docker-compose.yml` in a repository's root directory) and running
the following:

```sh
docker-compose up -d
```

!!! note "Non-standard name or location of config file"

    The command will be different if the Docker Compose config file is _not_ in
    the current working directory and/or is _not_ called `docker-compose.yml`.
    In that case, pass the path to the file with the `-f` option.

The `docker-compose up` command brings the service up. The `-d` (or
`--detach`) option starts the app in detached mode, i.e., all containers that
Compose creates run in the background.
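
If the config file has a non-standard name or location (see the note above),
it can be passed explicitly with the `-f` (or `--file`) option. A minimal
sketch, in which the helper `compose_up` and the example path
`deployment/docker-compose.dev.yml` are hypothetical:

```sh
# Hypothetical helper: start a deployment from an explicitly named Compose file.
compose_up() {
  compose_file=$1
  if [ ! -f "$compose_file" ]; then
    echo "Compose file not found: $compose_file" >&2
    return 1
  fi
  # -f/--file points Compose at a config file outside the default lookup
  docker-compose -f "$compose_file" up -d
}

# Example call (adjust the path to your repository layout):
# compose_up deployment/docker-compose.dev.yml
```

A deployment started this way needs the same `-f` option when it is stopped
with `docker-compose down`.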

In order to stop the deployment, simply run:

```sh
docker-compose down
```

## Onboarding your compute center

Follow the instructions below to onboard your compute node with the [ELIXIR
Cloud][elixir-cloud]. Afterwards, your compute cluster will be accessible
through the [GA4GH][ga4gh] Task Execution Service ([TES][ga4gh-tes]) API and,
optionally, available in the ELIXIR Cloud compute network.

### Deploying compute

Depending on whether you have a Cloud Native cluster or an HPC/HTC cluster, you
will need to follow the instructions for deploying [TESK][tesk] or
[Funnel][funnel] below, respectively.

#### Deploying TESK

[TESK][tesk] uses the Kubernetes Batch API ([Jobs][k8s-jobs]) to schedule
execution of TES tasks. This means that it should be possible to deploy TESK in
any flavor of Kubernetes, but tests are currently only performed with
[Kubernetes][k8s], [OpenShift][openshift], and [Minikube][minikube]. Follow
these instructions if you wish to deploy a TES endpoint on your Cloud Native
cluster, and please let us know if you deploy TESK on any new and interesting
platform.

TESK currently does not use any other storage (DB) than Kubernetes itself.
[Persistent Volume Claims][k8s-pvc] (PVCs) are used as temporary storage to
handle input and output files of a task and pass them between the executors of
a task. Because these volumes are shared between executors, your cluster will
need to provide a ReadWriteMany [StorageClass][k8s-storage-class]. Commonly
used storage classes are [NFS][nfs] and [CephFS][cephfs]. Note that PVCs are
destroyed immediately after task completion!
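
One way to get a first indication that a storage class supports ReadWriteMany
is to request a small test volume. The sketch below only writes a manifest;
the claim name `rwx-test`, the file name `rwx-test-pvc.yaml` and the storage
class name `nfs-client` are assumptions that you will need to adapt to your
cluster.

```sh
# Write a test PVC manifest that requests a ReadWriteMany volume.
# STORAGE_CLASS is a placeholder; list the classes available on your
# cluster with `kubectl get storageclass`.
STORAGE_CLASS="${STORAGE_CLASS:-nfs-client}"

cat > rwx-test-pvc.yaml <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rwx-test
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ${STORAGE_CLASS}
  resources:
    requests:
      storage: 1Gi
EOF
```

Apply the manifest with `kubectl apply -f rwx-test-pvc.yaml` and inspect the
claim with `kubectl get pvc rwx-test`. Depending on the volume binding mode of
the storage class, the claim may stay `Pending` until a pod uses it. Remove the
test claim afterwards with `kubectl delete -f rwx-test-pvc.yaml`.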

Here is an overview of TESK's architecture:

<div>
<a href="https://github.com/elixir-cloud-aai/TESK">
<img src="images/tesk_architecture.png" alt="TESK architecture" width="627"/>
</a>
</div>

A [Helm][helm] chart is provided for the convenient deployment of TESK. The
chart is available in the [TESK code repository][tesk-helm].

Follow these steps:

1. [Install Helm][helm-install]
2. Clone the [TESK repository][tesk]:

```sh
git clone https://github.com/elixir-cloud-aai/TESK.git
```

3. Find the Helm chart at `charts/tesk` and change into that directory
4. Edit file [`values.yaml`][tesk-helm-values]
(see [notes](#notes-for-editing-chart-values) below)
5. Log into the cluster and install TESK with:

```sh
helm install -n TESK-NAMESPACE TESK-DEPLOYMENT-NAME . \
-f secrets.yaml \
-f values.yaml
```

* Replace `TESK-NAMESPACE` with the name of the namespace where you want to
install TESK. If the namespace is not specified, the default namespace will
be used.
* The argument provided for `TESK-DEPLOYMENT-NAME` will be used by Helm to
refer to the deployment, for example when upgrading or deleting the
deployment. You can choose whichever name you like.

You should now have a working TESK instance!

##### Notes for editing chart values

The [TESK deployment documentation][tesk-docs-deploy] contains a
[description of every value][tesk-docs-deploy-values]. Briefly, the most
important are:

1. `host_name`: The host name under which the API will be served.
2. `storageClass`: Specify the storage class. If left empty, TESK will use the
default one configured in the Kubernetes cluster.
3. `auth.mode`: Enable (`auth`) or disable (`noauth`; default) authentication.
When enabled, the OIDC client credentials **must** be provided in a file
`secrets.yaml`, in the following format:

```yaml
auth:
  client_id: <client_id>
  client_secret: <client_secret>
```

4. `ftp`: Which FTP credentials mode to use. Two options are supported:
`.classic_ftp_secret` for basic authentication (username and password) or
`.netrc_secret` for using a [`.netrc`][netrc] file.

For the classic approach, you must write in `values.yaml`:

```yaml
ftp:
  classic_ftp_secret: ftp-secret
```

And in the file `secrets.yaml`, write down the username and password as:

```yaml
ftp:
  username: <username>
  password: <password>
```

For the `.netrc` approach, create a `.netrc` file in the `ftp` folder with
the connection details in the correct format.

5. `clusterType`: The Kubernetes flavor of your cluster. Currently supported:
`kubernetes` (default) and `openshift`.
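
For the `.netrc` approach mentioned in item 4, an entry in a `.netrc` file
consists of the tokens `machine`, `login` and `password`. A minimal sketch,
assuming that it is run from the chart directory; the host name
`ftp.example.org` and the credentials are placeholders:

```sh
# Create ftp/.netrc with placeholder connection details.
mkdir -p ftp
cat > ftp/.netrc <<'EOF'
machine ftp.example.org
login <username>
password <password>
EOF

# The file contains a password: restrict access to the owner.
chmod 600 ftp/.netrc
```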

!!! warning "Careful"
    When creating a `secrets.yaml` file, ensure that the file is never shared
    or committed to a code repository!
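
One way to reduce the risk described in the warning is to restrict the
permissions of the secrets file and to add it to `.gitignore`. The sketch below
assumes that the chart directory is part of a Git working tree and that the
file is called `secrets.yaml`, as in the `helm install` command above:

```sh
# Keep the secrets file private and out of version control.
SECRETS_FILE=secrets.yaml

touch "$SECRETS_FILE"
chmod 600 "$SECRETS_FILE"  # owner may read and write, nobody else

# Add the file to .gitignore unless an identical entry already exists.
touch .gitignore
grep -qxF "$SECRETS_FILE" .gitignore || echo "$SECRETS_FILE" >> .gitignore
```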

#### Deploying Funnel

Follow these instructions if you wish to deploy a TES endpoint in front of your
HPC/HTC cluster (currently tested with [Slurm][slurm] and [OpenPBS][openpbs]).

1. Make sure the build dependencies `make` and [Go 1.11+][go-install] are
installed, `GOPATH` is set and `$GOPATH/bin` is added to `PATH`.

For example, on Ubuntu this can be achieved via:

```sh
sudo apt update
sudo apt install make golang-go
export GOPATH=/your/desired/path
export PATH="$GOPATH/bin:$PATH"
go version  # should report version 1.11 or later
```

2. Clone the repository:

```sh
git clone https://github.com/ohsu-comp-bio/funnel.git
```

3. Build Funnel:

```sh
cd funnel
make
```

4. Test the installation by starting the Funnel server with:

```sh
funnel server run
```

If all works, Funnel should be ready for deployment on your HPC/HTC.

##### Slurm

For the use of Funnel with Slurm, make sure the following conditions are met:

1. The `funnel` binary must be placed on a server with access to Slurm.
2. A config file must be created and placed on the same server. [This
file][funnel-config-slurm] can be used as a starting point.
3. If you would like to deploy Funnel as a systemd service,
[this file][funnel-config-slurm-service] can be used as a template. Set the
correct paths to the `funnel` binary and config file.

If successful, Funnel should be listening on port `8080`.
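
A quick way to verify this is to query the TES task list over HTTP. The sketch
below assumes that `curl` is installed and that Funnel serves the TES API under
`/v1/tasks` on the given host and port; the helper name `check_tes` is
hypothetical.

```sh
# Hypothetical helper: succeed if a TES endpoint answers at the given base URL.
check_tes() {
  base_url=${1:-http://localhost:8080}
  # -f: fail on HTTP errors, -sS: quiet but show errors, --max-time: do not hang
  curl -fsS --max-time 5 "$base_url/v1/tasks" > /dev/null
}

if check_tes http://localhost:8080; then
  echo "Funnel is reachable"
else
  echo "Funnel is not reachable on port 8080"
fi
```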

##### OpenPBS

!!! warning "Under construction"
    More info coming soon...


### Deploying storage

Follow the instructions below to connect your TES endpoint to one or more
ELIXIR Cloud storage solutions. The currently supported solutions are:

- [MinIO][minio] (Amazon S3)
- [`vsftpd`][vsftpd] (FTP)

!!! note "Other storage solutions"

    Other S3 and FTP implementations may work but have not been tested.

#### Deploying MinIO (Amazon S3)

In order to deploy the [MinIO][minio] server, follow the [official
documentation][minio-docs-k8s]. It is very simple.

If you are deploying MinIO to OpenShift, you may find this
[Minio-OpenShift][minio-deploy-openshift-template] template useful.

#### Deploying `vsftpd` (FTP)

Many guides for deploying [`vsftpd`][vsftpd] are available online, for
example [this one][vsftpd-deploy]. Keep the following two considerations in
mind:

1. Secure FTP support must be activated with `ssl_enable=YES`.
2. For onboarding with the ELIXIR Cloud, the server currently needs one
account with a specific username and password. Please [contact
us][elixir-cloud-aai-email] for details.
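
The first consideration can be checked with a few lines of shell. This is a
sketch only: the helper `vsftpd_ssl_enabled` is hypothetical, and
`/etc/vsftpd.conf` is a common but not universal location of the config file.

```sh
# Hypothetical helper: succeed if ssl_enable=YES is set in a vsftpd config file.
# vsftpd expects `option=value` without spaces around the `=` sign.
vsftpd_ssl_enabled() {
  conf=${1:-/etc/vsftpd.conf}
  grep -q '^ssl_enable=YES[[:space:]]*$' "$conf" 2>/dev/null
}

if vsftpd_ssl_enabled /etc/vsftpd.conf; then
  echo "secure FTP is enabled"
else
  echo "ssl_enable=YES not found"
fi
```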

### Registering your TES service

We are currently working on implementing access control mechanisms and
providing a user interface for the [ELIXIR Cloud
Registry][elixir-cloud-registry]. Once available, we will add registration
instructions here. For now, please let us know about your new TES endpoint by
[email][elixir-cloud-aai-email].

## Custom cloud deployments

!!! warning "Under construction"
    More info coming soon...
1 change: 1 addition & 0 deletions includes/abbreviations.md
@@ -4,6 +4,7 @@
*[FOSS]: Free & Open Source Software
*[GA4GH]: The Global Alliance for Genomics and Health is a policy-framing and technical standards-setting organization, seeking to enable responsible genomic data sharing within a human rights framework.
*[GSoC]: Google Summer of Code
*[IaC]: Infrastructure as Code
*[LIMS]: Laboratory Information Management System
*[NBDC]: National Bioscience Database Center
*[TES]: GA4GH Task Execution Service API
29 changes: 29 additions & 0 deletions includes/references.md
@@ -6,12 +6,16 @@
[bh-denbi]: <https://www.denbi.de/de-nbi-events/1454-biohackathon-germany>
[bh-elixir]: <https://www.biohackathon-europe.org/>
[bh-mena]: <https://cbrcconferences.kaust.edu.sa/bio-hackathon-2023>
[cephfs]: <https://docs.ceph.com/en/quincy/cephfs/>
[contributor-covenant]: <https://www.contributor-covenant.org>
[conv-commits]: <https://www.conventionalcommits.org/en/v1.0.0-beta.2/#specification>
[conv-commits-blog]: <https://nitayneeman.com/posts/understanding-semantic-commit-messages-using-git-and-angular/>
[conv-commits-lint]: <https://github.com/conventional-changelog/commitlint>
[cwl-tes]: <https://github.com/ohsu-comp-bio/cwl-tes>
[docker-compose]: <https://docs.docker.com/compose/>
[docker-engine]: <https://docs.docker.com/engine/>
[elixir]: <https://elixir-europe.org/>
[elixir-cloud]: <https://elixir-cloud.dcc.sib.swiss/>
[elixir-cloud-aai]: <https://elixir-cloud.dcc.sib.swiss/>
[elixir-cloud-aai-contributors]: <https://elixir-cloud.dcc.sib.swiss/contributors>
[elixir-cloud-aai-github]: <https://github.com/elixir-cloud-aai/>
@@ -28,6 +32,8 @@
[elixir-cloud-services]: <https://github.com/elixir-cloud-aai/elixir-cloud-aai/blob/dev/resources/resources.md>
[fair]: <https://www.go-fair.org/fair-principles/>
[funnel]: <https://ohsu-comp-bio.github.io/funnel/>
[funnel-config-slurm]: <https://raw.githubusercontent.com/lvarin/test-funnel-slurm/main/funnel_config.yml>
[funnel-config-slurm-service]: <https://raw.githubusercontent.com/ohsu-comp-bio/funnel/52ef90fb76e620226f2af1bca5d14d35e1c4ad4a/deployments/systemd/funnel-server.service>
[ga4gh]: <https://ga4gh.org/>
[ga4gh-cloud]: <https://ga4gh-cloud.github.io/>
[ga4gh-dps]: <https://www.ga4gh.org/how-we-work/driver-projects/>
@@ -48,6 +54,8 @@
[github-merge-squash]: <https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/incorporating-changes-from-a-pull-request/about-pull-request-merges#squash-and-merge-your-commits>
[github-merge-rebase]: <https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/incorporating-changes-from-a-pull-request/about-pull-request-merges#rebase-and-merge-your-commits>
[github-pr]: <https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request>
[go-gopath]: <https://go.dev/doc/gopath_code#GOPATH>
[go-install]: <https://go.dev/doc/install>
[good-issues]: <https://medium.com/nyc-planning-digital/writing-a-proper-github-issue-97427d62a20f>
[good-bug-reports]: <http://testthewebforward.org/docs/bugs.html>
[gsoc]: <https://summerofcode.withgoogle.com/>
@@ -62,8 +70,14 @@
[gsoc-ga4gh]: <https://summerofcode.withgoogle.com/organizations/6274606475771904/>
[gsoc-stipends]: <https://developers.google.com/open-source/gsoc/help/student-stipends>
[gsoc-timeline]: <https://developers.google.com/open-source/gsoc/timeline>
[helm]: <https://helm.sh/>
[helm-install]: <https://helm.sh/docs/intro/install/>
[issue-tracker-example]: <https://github.com/elixir-cloud-aai/elixir-cloud-aai.github.io/issues>
[jsdoc]: <https://jsdoc.app/index.html>
[k8s]: <https://kubernetes.io/>
[k8s-jobs]: <https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion>
[k8s-pvc]: <https://kubernetes.io/docs/concepts/storage/persistent-volumes/#persistentvolumeclaims>
[k8s-storage-class]: <https://kubernetes.io/docs/concepts/storage/storage-classes/>
[linkedin-vani]: <https://www.linkedin.com/in/vani-s-78701315b/>
[linkedin-sarthak]: <https://www.linkedin.com/in/sarthakgupta072/>
[linkedin-akash]: <https://www.linkedin.com/in/akash-saini-ak7778/>
@@ -72,7 +86,15 @@
[linkedin-ayush]: <https://www.linkedin.com/in/ayush-kumar-514a17197/>
[linkedin-lakshya]: <https://www.linkedin.com/in/lakshyaagarg/>
[linkedin-suyash]: <https://www.linkedin.com/in/sgalpha01/>
[minikube]: <https://minikube.sigs.k8s.io/>
[minio]: <https://min.io/>
[minio-deploy-openshift-template]: <https://github.com/CSCfi/Minio-OpenShift>
[minio-docs-k8s]: <https://min.io/docs/minio/kubernetes/upstream/index.html>
[nextflow]: <https://www.nextflow.io/>
[netrc]: <https://www.gnu.org/software/inetutils/manual/html_node/The-_002enetrc-file.html>
[nfs]: <https://en.wikipedia.org/wiki/Network_File_System>
[openpbs]: <https://www.openpbs.org/>
[openshift]: <https://www.redhat.com/en/technologies/cloud-computing/openshift>
[osi]: <https://opensource.org/>
[py]: <https://www.python.org/>
[py-black]: <https://github.com/psf/black>
@@ -88,6 +110,13 @@
[py-pytest]: <https://docs.pytest.org/en/latest/>
[py-typing]: <https://docs.python.org/3/library/typing.html>
[sem-ver]: <https://semver.org/>
[slurm]: <https://slurm.schedmd.com/>
[snakemake]: <https://snakemake.readthedocs.io/en/stable/>
[snakemake-docs]: <https://snakemake.readthedocs.io/en/stable/executing/cloud.html#executing-a-snakemake-workflow-via-ga4gh-tes>
[tesk]: <https://github.com/elixir-cloud-aai/TESK>
[tesk-docs-deploy]: <https://github.com/elixir-cloud-aai/TESK/blob/master/charts/tesk/README.md>
[tesk-docs-deploy-values]: <https://github.com/elixir-cloud-aai/TESK/tree/master/charts/tesk#description-of-values>
[tesk-helm]: <https://github.com/elixir-cloud-aai/TESK/tree/master/charts/tesk>
[tesk-helm-values]: <https://github.com/elixir-cloud-aai/TESK/blob/master/charts/tesk/values.yaml>
[vsftpd]: <https://security.appspot.com/vsftpd.html>
[vsftpd-deploy]: <https://www.digitalocean.com/community/tutorials/how-to-set-up-vsftpd-for-a-user-s-directory-on-ubuntu-20-04>
