In machine learning, it is common to run a sequence of algorithms to process and learn from data. The entire ML lifecycle can be represented as a sequence of tasks and input/output dependencies in a workflow (aka DAG). This can all be packaged into what is called a pipeline.
ML pipelines are:
- A consistent way for data scientists to consume the multiple phases and projects of the AI lifecycle
- A composable layer, so different parts of the AI lifecycle can be snapped together like Legos
- A one-stop shop for anyone interested in training, validating, deploying, and monitoring AI models
The simplest way to create a pipeline is with Kubeflow, which provides a Python SDK (kfp) to create and execute pipelines:
- Package your custom components into Docker images, or use already-registered components by pulling their YAML specifications
- Create a Python class for each component that describes how it interacts with its Docker container
- Define the pipeline as a Python function
- Compile the pipeline; this produces a YAML file compressed as a .tar.gz (see the sketch after this list)
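The sketch below ties these steps together using the Kubeflow Pipelines (kfp) SDK with its v1-style API. The component file, container image, pipeline name, and parameter are illustrative assumptions rather than anything prescribed above.

```python
# Minimal sketch with the Kubeflow Pipelines SDK (v1-style API).
# The component file, container image, and paths are illustrative assumptions.
import kfp
import kfp.dsl as dsl
import kfp.components as comp

# Reuse a registered component by loading its YAML specification
# (hypothetical local file).
preprocess_op = comp.load_component_from_file('preprocess_component.yaml')

# Describe a custom component packaged as a Docker image.
def train_op(data_path: str) -> dsl.ContainerOp:
    return dsl.ContainerOp(
        name='train-model',
        image='my-registry/train:latest',  # assumed custom image
        arguments=['--data', data_path],
    )

# Define the pipeline as a Python function.
@dsl.pipeline(name='example-pipeline',
              description='Preprocess data, then train a model.')
def example_pipeline(raw_data_path: str = '/data/raw.csv'):
    preprocess_task = preprocess_op(raw_data_path)
    # .output is valid when the loaded component declares a single output;
    # adjust to the component's actual output names otherwise.
    train_op(preprocess_task.output)

# Compile the pipeline; this produces a YAML file compressed as a .tar.gz.
if __name__ == '__main__':
    kfp.compiler.Compiler().compile(example_pipeline, 'example_pipeline.tar.gz')
```

The resulting example_pipeline.tar.gz package is what gets uploaded in the steps below.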
For an in-depth guide to creating pipelines, take a look at the Kubeflow Pipelines documentation.
- Click on the "Pipelines" link in left hand navigation panel
- Click on "UPLOAD A PIPELINE"
- Select a file to upload (Must be tar.gz or tgz format)
- Enter a name for the pipeline; Otherwise a default will be given
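If you would rather script the upload than use the UI, the Kubeflow Pipelines client can upload the compiled package directly. The host URL, file name, and pipeline name below are assumptions; adjust them to your deployment.

```python
import kfp

# Connect to the Kubeflow Pipelines API; the host URL is an assumption.
client = kfp.Client(host='http://localhost:8080/pipeline')

# Upload the compiled .tar.gz package under a chosen (illustrative) name.
pipeline = client.upload_pipeline('example_pipeline.tar.gz',
                                  pipeline_name='example-pipeline')
print(pipeline.id)
```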
- Under the "Pipelines" tab, select the pipeline
- This will take you to pipeline "view" page
- Click "Create run", and fill out the run details and parameters
- Clicking "Start" will kick off the run
To monitor pipeline runs:
- A history of finished and currently running pipelines will show up under the Experiments view in the "All runs" tab
- Clicking on a run's row will show the progress and output of each run (the same information can be queried from the SDK, as sketched below)
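Run history and status can also be inspected from the SDK; the host URL and page size here are illustrative.

```python
import kfp

# Connect to the Kubeflow Pipelines API; the host URL is an assumption.
client = kfp.Client(host='http://localhost:8080/pipeline')

# List recent runs with their current status (e.g. Running, Succeeded, Failed).
response = client.list_runs(page_size=10)
for run in response.runs or []:
    print(run.name, run.status)
```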
You can find sample pipelines in the Machine Learning Exchange catalog.