This is a proof of concept showing how to get working EFA (Elastic Fabric Adapter) support in OpenShift.

Prerequisites:

- a working OpenShift cluster on Amazon Web Services with an available kubeconfig
- `aws-cli` configured with proper AWS credentials
- Hugepages configured and reserved: `manifests/hugepages.yaml`
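
  As a rough, hedged sketch (not the repo's actual file), hugepages can be reserved on OpenShift workers with a MachineConfig that adds kernel arguments; the size and count below are placeholder assumptions:

  ```yaml
  # Hedged sketch only -- see manifests/hugepages.yaml for the real settings.
  # Reserves 1024 x 2Mi hugepages on worker nodes via kernel arguments.
  apiVersion: machineconfiguration.openshift.io/v1
  kind: MachineConfig
  metadata:
    name: 50-worker-hugepages
    labels:
      machineconfiguration.openshift.io/role: worker
  spec:
    kernelArguments:
      - hugepagesz=2M
      - hugepages=1024
  ```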
- Memlock ulimits set to unlimited (hard and soft): `manifests/unlimited-memlock.yaml`
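
  One way to achieve this (a hedged sketch; the repo's manifest may use a different mechanism) is a MachineConfig that drops a systemd override for CRI-O, so containers inherit an unlimited memlock limit:

  ```yaml
  # Hedged sketch only -- see manifests/unlimited-memlock.yaml for the real settings.
  apiVersion: machineconfiguration.openshift.io/v1
  kind: MachineConfig
  metadata:
    name: 50-worker-unlimited-memlock
    labels:
      machineconfiguration.openshift.io/role: worker
  spec:
    config:
      ignition:
        version: 3.2.0
      systemd:
        units:
          - name: crio.service
            dropins:
              - name: 10-memlock.conf
                contents: |
                  [Service]
                  LimitMEMLOCK=infinity
  ```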
- A DaemonSet running to expose EFA capabilities to pods: `manifests/efa-k8s-device-plugin.yml`
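
  A trimmed, hedged sketch of such a DaemonSet (namespace and image are assumptions; see `manifests/efa-k8s-device-plugin.yml` for the real definition):

  ```yaml
  # Hedged sketch only -- the manifest shipped in this repo may differ.
  apiVersion: apps/v1
  kind: DaemonSet
  metadata:
    name: aws-efa-k8s-device-plugin
    namespace: kube-system
  spec:
    selector:
      matchLabels:
        name: aws-efa-k8s-device-plugin
    template:
      metadata:
        labels:
          name: aws-efa-k8s-device-plugin
      spec:
        hostNetwork: true
        containers:
          - name: aws-efa-k8s-device-plugin
            # image repository/tag are assumptions; use whatever the repo manifest pins
            image: 602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/aws-efa-k8s-device-plugin
            securityContext:
              privileged: true
            volumeMounts:
              - name: device-plugin
                mountPath: /var/lib/kubelet/device-plugins
        volumes:
          - name: device-plugin
            hostPath:
              path: /var/lib/kubelet/device-plugins
  ```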
- Node Feature Discovery operator
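
  The operator can be installed through OLM; a hedged Subscription sketch (channel, package, and namespace are assumptions that vary by OpenShift version, and installing from the web console OperatorHub works just as well):

  ```yaml
  # Hedged sketch only.
  apiVersion: operators.coreos.com/v1alpha1
  kind: Subscription
  metadata:
    name: nfd
    namespace: openshift-operators
  spec:
    channel: stable
    name: nfd
    source: redhat-operators
    sourceNamespace: openshift-marketplace
  ```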
- Patched MPI operator (modified to keep using `kubectl exec` instead of [rs]sh): `manifests/mpi-operator.yaml`
- Provide AWS ECR registry credentials (example for us-east-1 and us-west-2):

      us_east_1_token=$(echo AWS:$(/usr/local/bin/aws ecr get-login-password --region us-east-1) | base64 -w0)
      us_west_2_token=$(echo AWS:$(/usr/local/bin/aws ecr get-login-password --region us-west-2) | base64 -w0)

  The registries used are `994408522926.dkr.ecr.us-east-1.amazonaws.com` and `602401143452.dkr.ecr.us-west-2.amazonaws.com`.
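
  How the tokens are consumed is not shown above; one common pattern (an assumption, not necessarily what this repo scripts) is to splice them into a `kubernetes.io/dockerconfigjson` pull secret, since base64 of `AWS:<password>` is exactly the `auth` value that format expects:

  ```yaml
  # Hedged sketch only -- substitute the tokens generated above.
  apiVersion: v1
  kind: Secret
  metadata:
    name: ecr-pull-secret
  type: kubernetes.io/dockerconfigjson
  stringData:
    .dockerconfigjson: |
      {
        "auths": {
          "994408522926.dkr.ecr.us-east-1.amazonaws.com": { "auth": "<us_east_1_token>" },
          "602401143452.dkr.ecr.us-west-2.amazonaws.com": { "auth": "<us_west_2_token>" }
        }
      }
  ```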
- Container images used for the actual MPI runs: `container/Dockerfile-[intel|openmpi]`. Prebuilt images are available at `quay.io/cgament/mpi:efa` and `quay.io/cgament/mpi:intel`.
- We are using the benchmarks for MPI over InfiniBand, Omni-Path, Ethernet/iWARP, and RoCE from Ohio State University (the OSU Micro-Benchmarks), specifically the `osu_latency` test.

Example MPIJobs:

- `jobs/mpi-tcp.yaml` -- an MPI job running over TCP, used to determine network connectivity between nodes
- `jobs/mpi-support.yaml` -- will just show the detected EFA support in the pod logs
- `jobs/mpi-latency.yaml` -- latency benchmark using OpenMPI
- `jobs/mpi-intel.yaml` -- latency benchmark using IntelMPI
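
For orientation, here is a heavily trimmed, hedged sketch of what an MPIJob along the lines of `jobs/mpi-latency.yaml` could look like; the apiVersion, image, launch command, and resource names are assumptions, so refer to the actual files under `jobs/`:

```yaml
# Hedged sketch only -- see jobs/mpi-latency.yaml for the real definition.
apiVersion: kubeflow.org/v1alpha2
kind: MPIJob
metadata:
  name: mpi-latency
spec:
  slotsPerWorker: 1
  mpiReplicaSpecs:
    Launcher:
      replicas: 1
      template:
        spec:
          containers:
            - name: launcher
              image: quay.io/cgament/mpi:efa
              # assumed invocation of the OSU latency benchmark
              command: ["mpirun", "-np", "2", "osu_latency"]
    Worker:
      replicas: 2
      template:
        spec:
          containers:
            - name: worker
              image: quay.io/cgament/mpi:efa
              resources:
                limits:
                  # resource name advertised by the EFA device plugin (assumption)
                  vpc.amazonaws.com/efa: 1
                  hugepages-2Mi: 1Gi
                  memory: 4Gi
```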