Showing 33 changed files with 479 additions and 445 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -12,4 +12,7 @@ node_modules/ | |
.serverless/ | ||
|
||
# Env files | ||
*.env | ||
*.env | ||
|
||
# Python | ||
venv/ |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,67 +1,96 @@ | ||
# Serverless MLOps | ||
|
||
In this example, we train and deploy a binary classification inference API using serverless computing resources (job+container). We use object storage resources to store data and training artifacts. We use container registry to store docker images. | ||
In this example, we train and deploy a binary classification inference model using Scaleway Serverless. To do this, we use the following resources: | ||
|
||
## Use case: Bank Telemarketing | ||
1. Serverless Job for training | ||
2. Serverless Job to populate data in S3 | ||
3. Serverless Container for inference | ||
|
||
### Context | ||
We use object storage to share data between the two. | ||
|
||
## Context | ||
|
||
In this example we use a bank telemarketing dataset to predict if a client would engage in a term deposit subscription. | ||
|
||
This dataset records marketing phone calls made to clients. The outcome of the phone call is shown in the `y` column: | ||
|
||
We use a bank telemarketing dataset to predict whether a client would subscribe to a term deposit. This dataset records marketing phone calls made to clients. The outcome of the phone call is shown in the `y` column: | ||
* `0` : no subscription | ||
* `1` : subscription | ||
|
||
### Data Source | ||
## Data Source | ||
|
||
The dataset exists in several versions. It is openly published [here](http://archive.ics.uci.edu/dataset/222/bank+marketing) on the UCI Machine Learning Repository and is close to the one analyzed in the following research work: | ||
|
||
* [Moro et al., 2014] S. Moro, P. Cortez and P. Rita. A Data-Driven Approach to Predict the Success of Bank Telemarketing. Decision Support Systems, Elsevier, 62:22-31, June 2014 | ||
|
||
We use the dataset labelled in the source as `bank-additional-full.csv`. Download and extract this file, rename it to `bank_telemarketing.csv`, then put it under this [directory](./s3/data-store/data/). | ||
## Running the example | ||
|
||
## How to deploy your MLOps pipeline on Scaleway Cloud? | ||
|
||
### Step A: Create cloud resources for the ML pipeline | ||
### Step 1. Provision resources with Terraform | ||
|
||
Create a `.env` file in each of the `jobs/data-loader-job` and `jobs/ml-job` directories and fill them in as follows: | ||
Set your Scaleway access key, secret key and project ID in environment variables: | ||
|
||
```text | ||
SCW_ACCESS_KEY=<access-key> | ||
SCW_SECRET_KEY=<secret-key> | ||
```console | ||
export TF_VAR_access_key=<your-access-key> | ||
export TF_VAR_secret_key=<your-secret-key> | ||
export TF_VAR_project_id=<your-project-id> | ||
|
||
cd terraform | ||
terraform init | ||
terraform plan | ||
terraform apply | ||
``` | ||
|
||
Create `.tfvars` file in `/terraform` directory and put variable values in it: | ||
### Step 2. Run the data and training Jobs | ||
|
||
*At the time of writing, the Scaleway CLI does not support Jobs, so we use a Python script* | ||
|
||
``` | ||
region = "fr-par" | ||
access_key = "<access-key>" | ||
secret_key = "<secret_key>" | ||
project_id = "<project_id>" | ||
data_file = "bank_telemarketing.csv" | ||
model_object = "classifier.pkl" | ||
image_version = "v1" | ||
cd scripts | ||
python3 -m venv venv | ||
source venv/bin/activate | ||
pip install -r requirements.txt | ||
python3 run upload | ||
python3 run training | ||
``` | ||
|
||
Then perform: | ||
You can then check your Job runs in the [Jobs Console](https://console.scaleway.com/serverless-jobs/jobs). | ||
|
||
### Step 3. Use the inference API | ||
|
||
```bash | ||
cd terraform | ||
terraform init | ||
terraform plan -var-file=testing.tfvars | ||
terraform apply -var-file=testing.tfvars | ||
``` | ||
export INFERENCE_URL=$(terraform output -raw endpoint) | ||
### Step B: Define and run a job to ship data from public source to s3 | ||
curl -X POST \ | ||
-H "Content-Type: application/json" \ | ||
-d @inference/example.json \ | ||
$INFERENCE_URL | ||
``` | ||
|
||
Use the console to define and run the data loader job using image pushed to Scaleway registry. | ||
## Local testing | ||
|
||
cf. this [readme](./jobs/data-loader-job/README.md) | ||
To test the example locally you can use [Docker Compose](https://docs.docker.com/compose/install/). | ||
|
||
### Step C: Define and run the ML job to train classifier | ||
``` | ||
# Build the containers locally | ||
docker compose build | ||
Use the console to define and run the ML job using the image pushed to the Scaleway registry. | ||
# Run the data job | ||
docker compose run data | ||
cf. this [readme](./jobs/ml-job/README.md) | ||
# Run the training | ||
docker compose run training | ||
### Step D: Call your serverless container to (re)load model and to get inference results | ||
# Start the inference server | ||
docker compose up inference | ||
``` | ||
|
||
Access the inference API locally: | ||
|
||
cf. this [readme](./containers/inference-api/README.md) | ||
``` | ||
curl -X POST \ | ||
-H "Content-Type: application/json" \ | ||
-d @inference/example.json \ | ||
http://localhost:8080 | ||
``` |
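The curl call above can also be made from Python. The following is a minimal sketch using only the standard library; the helper names `build_inference_request` and `predict` are hypothetical, and the response is returned as raw text because the diff does not show the response format of the inference API.

```python
import json
import urllib.request


def build_inference_request(url: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request equivalent to the curl command."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(url: str, payload: dict, timeout: float = 10.0) -> str:
    """Send the request and return the raw response body as text.

    The response schema is not shown in the source, so no parsing is attempted.
    """
    request = build_inference_request(url, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For local testing, one would load `inference/example.json` with `json.load` and call `predict("http://localhost:8080", payload)` while the inference container is running.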
This file was deleted.
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
dataset/ |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
FROM python:3.12-slim-bookworm | ||
|
||
WORKDIR /app | ||
|
||
RUN apt-get update && apt-get install -y \ | ||
curl \ | ||
unzip | ||
|
||
RUN pip install --upgrade pip | ||
COPY requirements.txt . | ||
RUN pip install -r requirements.txt | ||
|
||
COPY . . | ||
|
||
CMD ["python", "main.py"] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,57 @@ | ||
import boto3 | ||
import os | ||
import urllib.request | ||
import zipfile | ||
|
||
DATA_DIR = "dataset" | ||
|
||
ZIP_URL = "http://archive.ics.uci.edu/static/public/222/bank+marketing.zip" | ||
ZIP_DOWNLOAD_PATH = os.path.join(DATA_DIR, "downloaded.zip") | ||
NESTED_ZIP_PATH = os.path.join(DATA_DIR, "bank-additional.zip") | ||
|
||
DATA_FILE = "bank-additional.csv" | ||
DATA_CSV_PATH = os.path.join(DATA_DIR, "bank-additional", DATA_FILE) | ||
|
||
|
||
def main(): | ||
"""Pulls file from source, and uploads to a target S3 bucket""" | ||
|
||
# Download the zip | ||
os.makedirs(DATA_DIR, exist_ok=True) | ||
urllib.request.urlretrieve(ZIP_URL, ZIP_DOWNLOAD_PATH) | ||
|
||
# Extract | ||
with zipfile.ZipFile(ZIP_DOWNLOAD_PATH, "r") as fh: | ||
fh.extractall(DATA_DIR) | ||
|
||
# Remove original zip | ||
os.remove(ZIP_DOWNLOAD_PATH) | ||
|
||
# Extract zips within the zip | ||
with zipfile.ZipFile(NESTED_ZIP_PATH) as fh: | ||
fh.extractall(DATA_DIR) | ||
|
||
access_key = os.environ["SCW_ACCESS_KEY"] | ||
secret_key = os.environ["SCW_SECRET_KEY"] | ||
bucket_name = os.environ["S3_BUCKET_NAME"] | ||
region_name = os.environ["SCW_REGION"] | ||
s3_url = f"https://s3.{region_name}.scw.cloud" | ||
|
||
s3 = boto3.resource( | ||
"s3", | ||
region_name=region_name, | ||
use_ssl=True, | ||
endpoint_url=s3_url, | ||
aws_access_key_id=access_key, | ||
aws_secret_access_key=secret_key, | ||
) | ||
|
||
bucket = s3.Bucket(name=bucket_name) | ||
bucket.upload_file( | ||
Filename=DATA_CSV_PATH, | ||
Key=DATA_FILE, | ||
) | ||
|
||
|
||
if __name__ == "__main__": | ||
main() |
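The download step in this script relies on the archive containing a second zip that holds the CSV. The two-stage extraction can be sketched in isolation as follows. The helper names `extract_nested` and `make_demo_archive` are hypothetical, and the demo archive is a tiny stand-in built locally that only mirrors the layout the script assumes (`bank-additional.zip` containing `bank-additional/bank-additional.csv`), so nothing is downloaded.

```python
import os
import tempfile
import zipfile


def extract_nested(outer_zip: str, nested_name: str, target_dir: str) -> None:
    """Extract an outer zip, then extract the nested zip it contains."""
    os.makedirs(target_dir, exist_ok=True)

    # First pass: unpack the outer archive into the target directory
    with zipfile.ZipFile(outer_zip) as outer:
        outer.extractall(target_dir)

    # Second pass: unpack the nested archive, then remove it
    nested_path = os.path.join(target_dir, nested_name)
    with zipfile.ZipFile(nested_path) as inner:
        inner.extractall(target_dir)
    os.remove(nested_path)


def make_demo_archive(path: str) -> None:
    """Build a small zip-within-a-zip that mimics the assumed layout."""
    inner_path = path + ".inner"
    with zipfile.ZipFile(inner_path, "w") as inner:
        inner.writestr("bank-additional/bank-additional.csv", "age;y\n44;no\n")
    with zipfile.ZipFile(path, "w") as outer:
        outer.write(inner_path, arcname="bank-additional.zip")
    os.remove(inner_path)
```

Unlike the script above, this sketch also deletes the nested zip once it has been unpacked; that is a design choice of the sketch, not something the commit does.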
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
boto3==1.33.2 | ||
requests==2.31.0 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,37 @@ | ||
version: "3" | ||
|
||
services: | ||
data: | ||
build: | ||
context: ./data | ||
depends_on: | ||
- minio | ||
|
||
training: | ||
build: | ||
context: ./training | ||
depends_on: | ||
- minio | ||
|
||
inference: | ||
build: | ||
context: ./inference | ||
ports: | ||
- "8080:80" | ||
depends_on: | ||
- minio | ||
|
||
minio: | ||
image: minio/minio | ||
ports: | ||
- "9000:9000" | ||
- "9001:9001" | ||
volumes: | ||
- minio_storage:/data | ||
environment: | ||
MINIO_ROOT_USER: example | ||
MINIO_ROOT_PASSWORD: example | ||
command: server --console-address ":9001" /data | ||
|
||
volumes: | ||
minio_storage: {} |
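The Compose file starts a MinIO server for local runs, but this diff does not show how the jobs are pointed at it. One possible approach, sketched below, is to let an environment variable override the S3 endpoint. The variable name `S3_ENDPOINT_URL` and the helper `s3_client_kwargs` are assumptions made for illustration; the other variable names and the Scaleway URL pattern are taken from the data job above.

```python
import os


def s3_client_kwargs(env=None) -> dict:
    """Build keyword arguments for boto3.resource("s3", ...).

    Falls back to the Scaleway endpoint unless S3_ENDPOINT_URL
    (a hypothetical override, not defined in the source) is set.
    """
    env = os.environ if env is None else env
    region = env["SCW_REGION"]
    endpoint = env.get("S3_ENDPOINT_URL") or f"https://s3.{region}.scw.cloud"
    return {
        "region_name": region,
        "endpoint_url": endpoint,
        # A local MinIO container is typically reached over plain HTTP
        "use_ssl": endpoint.startswith("https://"),
        "aws_access_key_id": env["SCW_ACCESS_KEY"],
        "aws_secret_access_key": env["SCW_SECRET_KEY"],
    }
```

With this in place, a job could call `boto3.resource("s3", **s3_client_kwargs())`. For the Compose setup above, the override would presumably be `http://minio:9000`, with the access and secret keys set to the MinIO root user and password.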
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,22 @@ | ||
{ | ||
"age": 44, | ||
"job": "blue-collar", | ||
"marital": "married", | ||
"education": "basic.4y", | ||
"default": "unknown", | ||
"housing": "yes", | ||
"loan": "no", | ||
"contact": "cellular", | ||
"month": "aug", | ||
"day_of_week": "thu", | ||
"duration": 210, | ||
"campaign": 1, | ||
"pdays": 999, | ||
"previous": "0", | ||
"poutcome": "nonexistent", | ||
"emp_var_rate": 1.4, | ||
"cons_price_idx": 93.444, | ||
"cons_conf_idx": -36.1, | ||
"euribor3m": 4.963, | ||
"nr_employed": 5228.1 | ||
} |
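In the example payload, `previous` is sent as the string `"0"` while the other numeric fields are JSON numbers. Whether the inference API expects this is not visible in the diff, so the following is only a sketch of a check that flags such fields before a request is sent. The function name `numeric_strings` is hypothetical.

```python
def numeric_strings(payload: dict) -> list:
    """Return the keys whose values are strings that parse as numbers."""
    suspicious = []
    for key, value in payload.items():
        if not isinstance(value, str):
            continue
        try:
            float(value)
        except ValueError:
            # Ordinary categorical values such as "married" end up here
            continue
        suspicious.append(key)
    return suspicious
```

Running this on the example payload would report only `previous`; categorical values such as `"unknown"` or `"nonexistent"` are not flagged because they do not parse as numbers.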