Merge pull request #460 from GoogleCloudPlatform/cl_extra
Add cl extra solution
Showing 8 changed files with 902 additions and 0 deletions.
397 changes: 397 additions & 0 deletions
...ooks/kubeflow_pipelines/pipelines/challenge_labs_solution_extra/challenge_lab_extra.ipynb
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,397 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": { | ||
"tags": [] | ||
}, | ||
"source": [ | ||
"# KFP Challenge Extra\n", | ||
"\n", | ||
"This extra challenge lab shows how to add KFP task specific metrics artifacts and widgets, extending the pipeline we created in the challenge lab 3.<br>\n", | ||
"Since we already added a batch prediction task in the challenge lab 2, we use its result to compute metrics.\n", | ||
"\n", | ||
"Please see the [cls_metrics.py](./pipeline_vertex/cls_metrics.py) and [pipeline.py](./pipeline_vertex/pipeline.py) for the detail.\n", | ||
"\n", | ||
"### Reference\n", | ||
"- KFP ClassificationMetrics artifact: https://kubeflow-pipelines.readthedocs.io/en/2.0.0b6/source/dsl.html#kfp.dsl.ClassificationMetrics\n", | ||
"- BQTable artifact: https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-2.8.0/api/artifact_types.html#google_cloud_pipeline_components.types.artifact_types.BQTable" | ||
] | ||
}, | ||
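{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"To illustrate the kind of component this lab adds, here is a minimal sketch of a lightweight KFP component that logs a confusion matrix to a `ClassificationMetrics` output artifact. The component name and the hard-coded matrix are hypothetical and for illustration only; the actual implementation lives in [cls_metrics.py](./pipeline_vertex/cls_metrics.py).\n", | ||
"\n", | ||
"```python\n", | ||
"from kfp.dsl import ClassificationMetrics, Output, component\n", | ||
"\n", | ||
"\n", | ||
"@component(base_image=\"python:3.10\")\n", | ||
"def log_confusion_matrix(  # hypothetical component for illustration\n", | ||
"    metrics: Output[ClassificationMetrics],\n", | ||
"):\n", | ||
"    # In the real pipeline the matrix would be computed from the\n", | ||
"    # batch prediction results; here we log a fixed 2x2 example.\n", | ||
"    metrics.log_confusion_matrix(\n", | ||
"        categories=[\"class_0\", \"class_1\"],\n", | ||
"        matrix=[[90, 10], [5, 95]],\n", | ||
"    )\n", | ||
"```\n", | ||
"\n", | ||
"When the pipeline runs, Vertex AI Pipelines renders this artifact as a confusion-matrix widget on the detail page of the task that produced it." | ||
] | ||
}, | ||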
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Setup" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"tags": [] | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"from datetime import datetime\n", | ||
"\n", | ||
"from google.cloud import aiplatform" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"tags": [] | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"REGION = \"us-central1\"\n", | ||
"PROJECT_ID = !(gcloud config get-value project)\n", | ||
"PROJECT_ID = PROJECT_ID[0]" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"tags": [] | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"# Set `PATH` to include the directory containing KFP CLI\n", | ||
"PATH = %env PATH\n", | ||
"%env PATH=/home/jupyter/.local/bin:{PATH}" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"### Build the trainer image" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"tags": [] | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"ARTIFACT_REGISTRY_DIR = \"asl-artifact-repo\"\n", | ||
"IMAGE_NAME = \"trainer_image_covertype_vertex\"\n", | ||
"IMAGE_TAG = \"latest\"\n", | ||
"TRAINING_CONTAINER_IMAGE_URI = f\"us-docker.pkg.dev/{PROJECT_ID}/{ARTIFACT_REGISTRY_DIR}/{IMAGE_NAME}:{IMAGE_TAG}\"\n", | ||
"TRAINING_CONTAINER_IMAGE_URI" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"The notebook assumes the training container is already created under `us-docker.pkg.dev/{PROJECT_ID}/asl-artifact-repo/trainer_image_covertype_vertex`. You can check if it exists with the command below.\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"tags": [] | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"!gcloud artifacts docker images list \\\n", | ||
"--include-tags us-docker.pkg.dev/$PROJECT_ID/$ARTIFACT_REGISTRY_DIR/$IMAGE_NAME \\\n", | ||
"--filter TAGS:$IMAGE_TAG" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"If the image doesn't exists, remove the comment out below and build it." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"tags": [] | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"# !gcloud builds submit --timeout 15m --tag $TRAINING_CONTAINER_IMAGE_URI trainer_image_vertex" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"To match the ml framework version we use at training time while serving the model, we will have to supply the following serving container to the pipeline:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"tags": [] | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"SERVING_CONTAINER_IMAGE_URI = (\n", | ||
" \"us-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.1-0:latest\"\n", | ||
")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"**Note:** If you change the version of the training ml framework you'll have to supply a serving container with matching version (see [pre-built containers for prediction](https://cloud.google.com/vertex-ai/docs/predictions/pre-built-containers))." | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Compile and run the pipeline" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Let stat by defining the environment variables that will be passed to the pipeline compiler:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"tags": [] | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"ARTIFACT_STORE = f\"gs://{PROJECT_ID}-kfp-artifact-store\"\n", | ||
"PIPELINE_ROOT = f\"{ARTIFACT_STORE}/pipeline\"\n", | ||
"DATA_ROOT = f\"{ARTIFACT_STORE}/data\"\n", | ||
"\n", | ||
"TRAINING_FILE_PATH = f\"{DATA_ROOT}/training/dataset.csv\"\n", | ||
"VALIDATION_FILE_PATH = f\"{DATA_ROOT}/validation/dataset.csv\"\n", | ||
"\n", | ||
"TIMESTAMP = datetime.now().strftime(\"%Y%m%d%H%M%S\")\n", | ||
"BASE_OUTPUT_DIR = f\"{ARTIFACT_STORE}/models/{TIMESTAMP}\"\n", | ||
"\n", | ||
"%env PIPELINE_ROOT={PIPELINE_ROOT}\n", | ||
"%env PROJECT_ID={PROJECT_ID}\n", | ||
"%env REGION={REGION}\n", | ||
"%env SERVING_CONTAINER_IMAGE_URI={SERVING_CONTAINER_IMAGE_URI}\n", | ||
"%env TRAINING_CONTAINER_IMAGE_URI={TRAINING_CONTAINER_IMAGE_URI}\n", | ||
"%env TRAINING_FILE_PATH={TRAINING_FILE_PATH}\n", | ||
"%env VALIDATION_FILE_PATH={VALIDATION_FILE_PATH}\n", | ||
"%env BASE_OUTPUT_DIR={BASE_OUTPUT_DIR}" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Let us make sure that the `ARTIFACT_STORE` has been created, and let us create it if not:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"tags": [] | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"!gsutil ls | grep ^{ARTIFACT_STORE}/$ || gsutil mb -l {REGION} {ARTIFACT_STORE}" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"**Note:** In case the artifact store was not created and properly set before hand, you may need\n", | ||
"to run in **CloudShell** the following command to allow Vertex AI to access it:\n", | ||
"\n", | ||
"```\n", | ||
"PROJECT_ID=$(gcloud config get-value project)\n", | ||
"PROJECT_NUMBER=$(gcloud projects list --filter=\"name=$PROJECT_ID\" --format=\"value(PROJECT_NUMBER)\")\n", | ||
"gcloud projects add-iam-policy-binding $PROJECT_ID \\\n", | ||
" --member=\"serviceAccount:[email protected]\" \\\n", | ||
" --role=\"roles/storage.objectAdmin\"\n", | ||
"```" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"#### Use the CLI compiler to compile the pipeline" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"We compile the pipeline from the Python file we generated into a JSON description using the following command:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"tags": [] | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"PIPELINE_YAML = \"covertype_kfp_pipeline.yaml\"" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"tags": [] | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"!kfp dsl compile --py pipeline_vertex/pipeline.py --output $PIPELINE_YAML" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"**Note:** You can also use the Python SDK to compile the pipeline:\n", | ||
"\n", | ||
"```python\n", | ||
"from kfp import compiler\n", | ||
"\n", | ||
"compiler.Compiler().compile(\n", | ||
" pipeline_func=create_pipeline, \n", | ||
" package_path=PIPELINE_YAML,\n", | ||
")\n", | ||
"\n", | ||
"```" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"The result is the pipeline file. " | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"tags": [] | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"!head {PIPELINE_YAML}" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"### Deploy and run the pipeline package" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"tags": [] | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"EXPERIMENT_NAME = \"kfp-covertype-experiment\"" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"tags": [] | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"aiplatform.init(\n", | ||
" project=PROJECT_ID,\n", | ||
" location=REGION,\n", | ||
" experiment=EXPERIMENT_NAME,\n", | ||
" experiment_tensorboard=False,\n", | ||
")\n", | ||
"\n", | ||
"pipeline = aiplatform.PipelineJob(\n", | ||
" display_name=\"covertype_kfp_pipeline_challenge_lab\",\n", | ||
" template_path=PIPELINE_YAML,\n", | ||
" enable_caching=True,\n", | ||
")\n", | ||
"\n", | ||
"pipeline.submit(experiment=EXPERIMENT_NAME)" | ||
] | ||
}, | ||
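{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"`pipeline.submit()` returns immediately rather than blocking on the run. If you want to wait for completion and then inspect the metrics logged to the experiment, a minimal sketch (assuming the pipeline run logs metrics to the experiment) is:\n", | ||
"\n", | ||
"```python\n", | ||
"# Block until the pipeline run reaches a terminal state.\n", | ||
"pipeline.wait()\n", | ||
"\n", | ||
"# Fetch the experiment's runs and their logged metrics as a DataFrame.\n", | ||
"experiment_df = aiplatform.get_experiment_df(EXPERIMENT_NAME)\n", | ||
"experiment_df.head()\n", | ||
"```\n", | ||
"\n", | ||
"The `ClassificationMetrics` widgets themselves are rendered on the pipeline run's page in the Vertex AI console, under the task that produced them." | ||
] | ||
}, | ||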
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Copyright 2024 Google LLC\n", | ||
"\n", | ||
"Licensed under the Apache License, Version 2.0 (the \"License\");\n", | ||
"you may not use this file except in compliance with the License.\n", | ||
"You may obtain a copy of the License at\n", | ||
"\n", | ||
" https://www.apache.org/licenses/LICENSE-2.0\n", | ||
"\n", | ||
"Unless required by applicable law or agreed to in writing, software\n", | ||
"distributed under the License is distributed on an \"AS IS\" BASIS,\n", | ||
"WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", | ||
"See the License for the specific language governing permissions and\n", | ||
"limitations under the License." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [] | ||
} | ||
], | ||
"metadata": { | ||
"environment": { | ||
"kernel": "python3", | ||
"name": "tf2-gpu.2-12.m119", | ||
"type": "gcloud", | ||
"uri": "us-docker.pkg.dev/deeplearning-platform-release/gcr.io/tf2-gpu.2-12:m119" | ||
}, | ||
"kernelspec": { | ||
"display_name": "Python 3 (Local)", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.10.14" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 4 | ||
} |