
Machine Learning eXchange (MLX)

Data and AI Assets Catalog and Execution Engine

MLX allows upload, registration, execution, and deployment of:

  • AI pipelines and pipeline components
  • Models
  • Datasets
  • Notebooks

Additionally, it provides:

  • Automated sample pipeline code generation to execute registered models, datasets, and notebooks
  • Pipelines engine powered by Kubeflow Pipelines on Tekton, core of Watson Pipelines
  • Components registry for Kubeflow Pipelines
  • Dataset management by Datashim
  • Preregistered Datasets from Data Asset Exchange (DAX) and Models from Model Asset Exchange (MAX)
  • Serving engine by KFServing
  • Model Metadata schemas

1. Prerequisites

Quickstart (MLX Asset Catalog Only)

Cluster Deployment (MLX Asset Catalog and Execution Engine)

  • An existing Kubernetes cluster, version 1.17+
  • A minimum capacity of 8 vCPUs and 16 GB RAM for MLX
  • If you are using IBM Cloud, follow the appropriate instructions for standing up your Kubernetes cluster using IBM Cloud Public
  • If you are using OpenShift on IBM Cloud, follow the instructions for standing up your IBM Cloud Red Hat OpenShift cluster
  • kustomize v3.0+ installed (see the quick check below)
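
A quick way to sanity-check the cluster deployment prerequisites (a rough sketch; exact output formats depend on your client and server versions):

# Confirm the cluster is reachable and the server version is 1.17+
kubectl version
kubectl get nodes -o wide
# Confirm kustomize is installed and is v3.0 or later
kustomize version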

2. Deployment

To get MLX up and running with the asset catalog only, follow the Quickstart Guide, which uses Docker Compose (sketched below).
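
As a rough illustration of the quickstart flow (the repository URL and directory layout below are assumptions; the Quickstart Guide is the authoritative reference):

# Clone the MLX repository and bring up the asset catalog with Docker Compose
git clone https://github.com/machine-learning-exchange/mlx.git
cd mlx/quickstart   # assumed location of the docker-compose setup
docker-compose up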

For a full deployment, use an Operator based on the Kubeflow Operator architecture.
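
The deployment documentation has the exact steps; as a minimal sketch that mirrors the delete commands in section 6 (the KfDef manifest file name below is an assumption):

# Install the Kubeflow Operator
OPERATOR_NAMESPACE=operators
kubectl create ns ${OPERATOR_NAMESPACE}
kubectl apply -f deploy/crds/kfdef.apps.kubeflow.org_kfdefs_crd.yaml
kubectl apply -f deploy/service_account.yaml -n ${OPERATOR_NAMESPACE}
kubectl apply -f deploy/operator.yaml -n ${OPERATOR_NAMESPACE}

# Deploy MLX by creating a KfDef instance (manifest name is illustrative)
kubectl create ns kubeflow
kubectl apply -f mlx-kfdef.yaml -n kubeflow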

3. Access the MLX UI

  1. By default, the MLX UI is available at <public-ip-of-node>:30380/os

To find the public IP of a node of your cluster, run:

kubectl get node -o wide

Look for the EXTERNAL-IP column.
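
Alternatively, this one-liner prints the first node's external IP directly (assuming the node exposes an ExternalIP address):

kubectl get nodes -o jsonpath='{.items[0].status.addresses[?(@.type=="ExternalIP")].address}'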

  2. If you are on an OpenShift cluster, you can also make use of the Istio Ingress Gateway Route. You can find it in the OpenShift Console or via the CLI:
oc get route -n istio-system
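
To print only the route's hostname (assuming the route is named istio-ingressgateway, which is the common default):

oc get route -n istio-system istio-ingressgateway -o jsonpath='{.spec.host}'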

4. Import Data and AI Assets in MLX Catalog

Import data and AI assets using MLX's catalog importer

5. Usage Steps

  1. Pipelines

  2. Components

  3. Models

  4. Notebooks

  5. Datasets

6. Delete MLX

  • Delete the MLX deployment, i.e. the KfDef instance:
kubectl delete kfdef -n kubeflow --all

Note that the user profile namespaces created by the profile-controller will not be deleted. The ${KUBEFLOW_NAMESPACE} created outside of the operator will not be deleted either.
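
To see what is left behind before cleaning it up manually (the profiles resource assumes the Kubeflow Profile CRD is still installed):

kubectl get profiles
kubectl get ns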

  • Delete the Kubeflow Operator:
kubectl delete -f deploy/operator.yaml -n ${OPERATOR_NAMESPACE}
kubectl delete clusterrolebinding kubeflow-operator
kubectl delete -f deploy/service_account.yaml -n ${OPERATOR_NAMESPACE}
kubectl delete -f deploy/crds/kfdef.apps.kubeflow.org_kfdefs_crd.yaml
kubectl delete ns ${OPERATOR_NAMESPACE}
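
To confirm the operator's resources are gone, a quick sanity check:

kubectl get crds | grep kfdef          # should return nothing
kubectl get ns ${OPERATOR_NAMESPACE}   # should report NotFound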

7. Troubleshooting

  • When deleting the Kubeflow deployment, some mutatingwebhookconfigurations are cluster-wide resources and may not be removed because their owner is not the KfDef instance. To remove them, run the following:

     kubectl delete mutatingwebhookconfigurations admission-webhook-mutating-webhook-configuration
     kubectl delete mutatingwebhookconfigurations inferenceservice.serving.kubeflow.org
     kubectl delete mutatingwebhookconfigurations istio-sidecar-injector
     kubectl delete mutatingwebhookconfigurations katib-mutating-webhook-config
     kubectl delete mutatingwebhookconfigurations mutating-webhook-configurations
     kubectl delete mutatingwebhookconfigurations cache-webhook-kubeflow
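
    To check whether any of these webhook configurations are still present:

      kubectl get mutatingwebhookconfigurations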
  • If you don't see any sample pipelines, or you receive Failed to establish a new connection messages, IBM Cloud NFS storage might be taking too long to provision, which causes the storage and backend microservices to time out. In this case, run the commands below to restart the pods.

     # Replace kubeflow with the KFP namespace
     NAMESPACE=kubeflow
     kubectl get pods -n ${NAMESPACE:-kubeflow}
     kubectl delete pod -n ${NAMESPACE:-kubeflow} $(kubectl get pods -n ${NAMESPACE:-kubeflow} -l app=ml-pipeline | grep ml-pipeline | awk '{print $1;exit}')
     kubectl delete pod -n ${NAMESPACE:-kubeflow} $(kubectl get pods -n ${NAMESPACE:-kubeflow} -l app=ml-pipeline-persistenceagent | grep ml-pipeline | awk '{print $1;exit}')
     kubectl delete pod -n ${NAMESPACE:-kubeflow} $(kubectl get pods -n ${NAMESPACE:-kubeflow} -l app=ml-pipeline-ui | grep ml-pipeline | awk '{print $1;exit}')
     kubectl delete pod -n ${NAMESPACE:-kubeflow} $(kubectl get pods -n ${NAMESPACE:-kubeflow} -l app=ml-pipeline-scheduledworkflow | grep ml-pipeline | awk '{print $1;exit}')
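
    Once the pods are recreated, verify that they come back to the Running state:

      kubectl get pods -n ${NAMESPACE:-kubeflow} | grep ml-pipeline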

    Then redeploy the bootstrapper to properly populate the default assets. Remember to insert the IBM GitHub token if you want to retrieve any assets from IBM GitHub.

     vim bootstrapper/bootstrap.yaml # Insert the IBM Github Token
     kubectl delete -f bootstrapper/bootstrap.yaml -n $NAMESPACE
     kubectl apply -f bootstrapper/bootstrap.yaml -n $NAMESPACE
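
    To follow the bootstrapper's progress, watch the pods in the namespace (no label filter is assumed here, since pod names vary):

      kubectl get pods -n ${NAMESPACE:-kubeflow} -w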
  • Additional troubleshooting tips for IBM Cloud are available on the wiki page.