Commit

Merge pull request #460 from GoogleCloudPlatform/cl_extra
Add cl extra solution
takumiohym authored May 15, 2024
2 parents 5d10e7c + 268015a commit 2450574
Showing 8 changed files with 902 additions and 0 deletions.
@@ -0,0 +1,397 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"tags": []
},
"source": [
"# KFP Challenge Extra\n",
"\n",
"This extra challenge lab shows how to add KFP task specific metrics artifacts and widgets, extending the pipeline we created in the challenge lab 3.<br>\n",
"Since we already added a batch prediction task in the challenge lab 2, we use its result to compute metrics.\n",
"\n",
"Please see the [cls_metrics.py](./pipeline_vertex/cls_metrics.py) and [pipeline.py](./pipeline_vertex/pipeline.py) for the detail.\n",
"\n",
"### Reference\n",
"- KFP ClassificationMetrics artifact: https://kubeflow-pipelines.readthedocs.io/en/2.0.0b6/source/dsl.html#kfp.dsl.ClassificationMetrics\n",
"- BQTable artifact: https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-2.8.0/api/artifact_types.html#google_cloud_pipeline_components.types.artifact_types.BQTable"
]
},
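{
"cell_type": "markdown",
"metadata": {},
"source": [
"To illustrate the artifact API before diving into the code, here is a minimal sketch (not the actual `cls_metrics.py` component) of a KFP component that declares a `ClassificationMetrics` output and logs a confusion matrix to it. The component name, category labels, and matrix values are hypothetical placeholders; in this lab the real component computes them from the batch prediction results:\n",
"\n",
"```python\n",
"from kfp.dsl import ClassificationMetrics, Output, component\n",
"\n",
"\n",
"@component(base_image=\"python:3.10\")\n",
"def log_cls_metrics(metrics: Output[ClassificationMetrics]):\n",
"    # Hypothetical values; cls_metrics.py derives these from the\n",
"    # batch prediction table instead.\n",
"    metrics.log_confusion_matrix(\n",
"        categories=[\"class_a\", \"class_b\"],\n",
"        matrix=[[90, 10], [5, 95]],\n",
"    )\n",
"```\n",
"\n",
"The logged matrix is rendered as a widget on the task's output artifact in the Vertex AI Pipelines UI."
]
},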
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from datetime import datetime\n",
"\n",
"from google.cloud import aiplatform"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"REGION = \"us-central1\"\n",
"PROJECT_ID = !(gcloud config get-value project)\n",
"PROJECT_ID = PROJECT_ID[0]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# Set `PATH` to include the directory containing KFP CLI\n",
"PATH = %env PATH\n",
"%env PATH=/home/jupyter/.local/bin:{PATH}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Build the trainer image"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"ARTIFACT_REGISTRY_DIR = \"asl-artifact-repo\"\n",
"IMAGE_NAME = \"trainer_image_covertype_vertex\"\n",
"IMAGE_TAG = \"latest\"\n",
"TRAINING_CONTAINER_IMAGE_URI = f\"us-docker.pkg.dev/{PROJECT_ID}/{ARTIFACT_REGISTRY_DIR}/{IMAGE_NAME}:{IMAGE_TAG}\"\n",
"TRAINING_CONTAINER_IMAGE_URI"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The notebook assumes the training container is already created under `us-docker.pkg.dev/{PROJECT_ID}/asl-artifact-repo/trainer_image_covertype_vertex`. You can check if it exists with the command below.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"!gcloud artifacts docker images list \\\n",
"--include-tags us-docker.pkg.dev/$PROJECT_ID/$ARTIFACT_REGISTRY_DIR/$IMAGE_NAME \\\n",
"--filter TAGS:$IMAGE_TAG"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If the image doesn't exists, remove the comment out below and build it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# !gcloud builds submit --timeout 15m --tag $TRAINING_CONTAINER_IMAGE_URI trainer_image_vertex"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To match the ml framework version we use at training time while serving the model, we will have to supply the following serving container to the pipeline:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"SERVING_CONTAINER_IMAGE_URI = (\n",
" \"us-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.1-0:latest\"\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Note:** If you change the version of the training ml framework you'll have to supply a serving container with matching version (see [pre-built containers for prediction](https://cloud.google.com/vertex-ai/docs/predictions/pre-built-containers))."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Compile and run the pipeline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let stat by defining the environment variables that will be passed to the pipeline compiler:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"ARTIFACT_STORE = f\"gs://{PROJECT_ID}-kfp-artifact-store\"\n",
"PIPELINE_ROOT = f\"{ARTIFACT_STORE}/pipeline\"\n",
"DATA_ROOT = f\"{ARTIFACT_STORE}/data\"\n",
"\n",
"TRAINING_FILE_PATH = f\"{DATA_ROOT}/training/dataset.csv\"\n",
"VALIDATION_FILE_PATH = f\"{DATA_ROOT}/validation/dataset.csv\"\n",
"\n",
"TIMESTAMP = datetime.now().strftime(\"%Y%m%d%H%M%S\")\n",
"BASE_OUTPUT_DIR = f\"{ARTIFACT_STORE}/models/{TIMESTAMP}\"\n",
"\n",
"%env PIPELINE_ROOT={PIPELINE_ROOT}\n",
"%env PROJECT_ID={PROJECT_ID}\n",
"%env REGION={REGION}\n",
"%env SERVING_CONTAINER_IMAGE_URI={SERVING_CONTAINER_IMAGE_URI}\n",
"%env TRAINING_CONTAINER_IMAGE_URI={TRAINING_CONTAINER_IMAGE_URI}\n",
"%env TRAINING_FILE_PATH={TRAINING_FILE_PATH}\n",
"%env VALIDATION_FILE_PATH={VALIDATION_FILE_PATH}\n",
"%env BASE_OUTPUT_DIR={BASE_OUTPUT_DIR}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let us make sure that the `ARTIFACT_STORE` has been created, and let us create it if not:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"!gsutil ls | grep ^{ARTIFACT_STORE}/$ || gsutil mb -l {REGION} {ARTIFACT_STORE}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Note:** In case the artifact store was not created and properly set before hand, you may need\n",
"to run in **CloudShell** the following command to allow Vertex AI to access it:\n",
"\n",
"```\n",
"PROJECT_ID=$(gcloud config get-value project)\n",
"PROJECT_NUMBER=$(gcloud projects list --filter=\"name=$PROJECT_ID\" --format=\"value(PROJECT_NUMBER)\")\n",
"gcloud projects add-iam-policy-binding $PROJECT_ID \\\n",
" --member=\"serviceAccount:[email protected]\" \\\n",
" --role=\"roles/storage.objectAdmin\"\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Use the CLI compiler to compile the pipeline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We compile the pipeline from the Python file we generated into a JSON description using the following command:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"PIPELINE_YAML = \"covertype_kfp_pipeline.yaml\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"!kfp dsl compile --py pipeline_vertex/pipeline.py --output $PIPELINE_YAML"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Note:** You can also use the Python SDK to compile the pipeline:\n",
"\n",
"```python\n",
"from kfp import compiler\n",
"\n",
"compiler.Compiler().compile(\n",
" pipeline_func=create_pipeline, \n",
" package_path=PIPELINE_YAML,\n",
")\n",
"\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The result is the pipeline file. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"!head {PIPELINE_YAML}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Deploy and run the pipeline package"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"EXPERIMENT_NAME = \"kfp-covertype-experiment\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"aiplatform.init(\n",
" project=PROJECT_ID,\n",
" location=REGION,\n",
" experiment=EXPERIMENT_NAME,\n",
" experiment_tensorboard=False,\n",
")\n",
"\n",
"pipeline = aiplatform.PipelineJob(\n",
" display_name=\"covertype_kfp_pipeline_challenge_lab\",\n",
" template_path=PIPELINE_YAML,\n",
" enable_caching=True,\n",
")\n",
"\n",
"pipeline.submit(experiment=EXPERIMENT_NAME)"
]
},
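{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Note:** `pipeline.submit()` returns without waiting for the run to finish. If you want the notebook to block until the run completes before inspecting the metrics widgets in the Vertex AI console, a minimal sketch is:\n",
"\n",
"```python\n",
"# Block until the pipeline run reaches a terminal state.\n",
"pipeline.wait()\n",
"```"
]
},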
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright 2024 Google LLC\n",
"\n",
"Licensed under the Apache License, Version 2.0 (the \"License\");\n",
"you may not use this file except in compliance with the License.\n",
"You may obtain a copy of the License at\n",
"\n",
" https://www.apache.org/licenses/LICENSE-2.0\n",
"\n",
"Unless required by applicable law or agreed to in writing, software\n",
"distributed under the License is distributed on an \"AS IS\" BASIS,\n",
"WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
"See the License for the specific language governing permissions and\n",
"limitations under the License."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"environment": {
"kernel": "python3",
"name": "tf2-gpu.2-12.m119",
"type": "gcloud",
"uri": "us-docker.pkg.dev/deeplearning-platform-release/gcr.io/tf2-gpu.2-12:m119"
},
"kernelspec": {
"display_name": "Python 3 (Local)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.14"
}
},
"nbformat": 4,
"nbformat_minor": 4
}