
Part 1
Shillaker committed Dec 1, 2023
1 parent 23e983f commit defd1d7
Showing 33 changed files with 479 additions and 445 deletions.
5 changes: 4 additions & 1 deletion .gitignore
@@ -12,4 +12,7 @@ node_modules/
.serverless/

# Env files
*.env

# Python
venv/
99 changes: 64 additions & 35 deletions jobs/ml-ops/README.md
@@ -1,67 +1,96 @@
# Serverless MLOps

In this example, we train and deploy a binary classification inference model using Scaleway Serverless. To do this, we use the following resources:

1. Serverless Job for training
2. Serverless Job to populate data in S3
3. Serverless Container for inference

We use object storage to share data between these resources.

## Context

We use a bank telemarketing dataset to predict whether a client will subscribe to a term deposit.

This dataset records marketing phone calls made to clients. The outcome of each call is shown in the `y` column:
* `0` : no subscription
* `1` : subscription
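
Once the data Job below has downloaded the dataset, the label balance can be inspected quickly. A minimal sketch with pandas; the path and semicolon separator match the file handled by `data/main.py`:

```python
import pandas as pd

# The UCI CSV is semicolon-delimited; the outcome column is `y`
df = pd.read_csv("dataset/bank-additional/bank-additional.csv", sep=";")
print(df["y"].value_counts(normalize=True))
```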

## Data Source

The dataset is open source and has several published versions; we use the one published [here](http://archive.ics.uci.edu/dataset/222/bank+marketing) on the UCI Machine Learning repository. It is close to the dataset analyzed in the following research work:

* [Moro et al., 2014] S. Moro, P. Cortez and P. Rita. A Data-Driven Approach to Predict the Success of Bank Telemarketing. Decision Support Systems, Elsevier, 62:22-31, June 2014

## Running the example

### Step 1. Provision resources with Terraform

Set your Scaleway access key, secret key and project ID in environment variables:

```console
export TF_VAR_access_key=<your-access-key>
export TF_VAR_secret_key=<your-secret-key>
export TF_VAR_project_id=<your-project-id>

cd terraform
terraform init
terraform plan
terraform apply
```

### Step 2. Run the data and training Jobs

*At the time of writing, the Scaleway CLI does not support Jobs, so we use a Python script.*

```console
cd scripts
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python3 run upload
python3 run training
```

You can then check your Job runs in the [Jobs Console](https://console.scaleway.com/serverless-jobs/jobs).
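
The `scripts` directory itself is not shown in this commit. Purely as an illustration, a job-trigger script along these lines could use the Scaleway Python SDK; the class, method, and ID names below are assumptions, not the example's actual code:

```python
import sys

# Assumption: the official Scaleway Python SDK (`pip install scaleway`)
from scaleway import Client
from scaleway.jobs.v1alpha1 import JobsV1Alpha1API

# Hypothetical mapping from CLI argument to the job definition IDs created by Terraform
JOB_DEFINITIONS = {
    "upload": "<data-job-definition-id>",
    "training": "<training-job-definition-id>",
}


def main():
    command = sys.argv[1]
    client = Client.from_config_file_and_env()
    jobs = JobsV1Alpha1API(client)
    # Start a run of the chosen job definition
    run = jobs.start_job_definition(job_definition_id=JOB_DEFINITIONS[command])
    print(run)


if __name__ == "__main__":
    main()
```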

### Step 3. Use the inference API

```console
# Run from the terraform/ directory, where the state is stored
export INFERENCE_URL=$(terraform output endpoint)

curl -X POST \
  -H "Content-Type: application/json" \
  -d @inference/example.json \
  $INFERENCE_URL
```

## Local testing

To test the example locally you can use [Docker Compose](https://docs.docker.com/compose/install/).

```console
# Build the containers locally
docker compose build

# Run the data job
docker compose run data

# Run the training
docker compose run training

# Start the inference server
docker compose up inference
```

Access the inference API locally:

```console
curl -X POST \
  -H "Content-Type: application/json" \
  -d @inference/example.json \
  http://localhost:8080
```
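
The same request can be sent from Python instead of curl; a small sketch using `requests` (already pinned in the data job's requirements):

```python
import json

import requests

# Load the sample payload shipped with the example
with open("inference/example.json") as fh:
    payload = json.load(fh)

# POST to the locally running inference container (port 8080, per docker-compose.yml)
response = requests.post("http://localhost:8080", json=payload)
print(response.status_code, response.text)
```
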
15 changes: 0 additions & 15 deletions jobs/ml-ops/containers/inference-api/README.md

This file was deleted.

43 changes: 0 additions & 43 deletions jobs/ml-ops/containers/inference-api/main.py

This file was deleted.

1 change: 1 addition & 0 deletions jobs/ml-ops/data/.gitignore
@@ -0,0 +1 @@
dataset/
16 changes: 16 additions & 0 deletions jobs/ml-ops/data/Dockerfile
@@ -0,0 +1,16 @@
FROM python:3.12-slim-bookworm

WORKDIR /app

# Tools needed to fetch and unpack the dataset
RUN apt-get update
RUN apt-get install -y \
    curl \
    unzip

# Install Python dependencies before copying the source to make better use of layer caching
RUN pip install --upgrade pip
COPY requirements.txt .
RUN pip install -r requirements.txt

COPY . .

CMD ["python", "main.py"]
57 changes: 57 additions & 0 deletions jobs/ml-ops/data/main.py
@@ -0,0 +1,57 @@
import boto3
import os
import urllib.request
import zipfile

DATA_DIR = "dataset"

ZIP_URL = "http://archive.ics.uci.edu/static/public/222/bank+marketing.zip"
ZIP_DOWNLOAD_PATH = os.path.join(DATA_DIR, "downloaded.zip")
NESTED_ZIP_PATH = os.path.join(DATA_DIR, "bank-additional.zip")

DATA_FILE = "bank-additional.csv"
DATA_CSV_PATH = os.path.join(DATA_DIR, "bank-additional", DATA_FILE)


def main():
    """Pulls the data file from its source and uploads it to a target S3 bucket"""

    # Download the zip
    os.makedirs(DATA_DIR, exist_ok=True)
    urllib.request.urlretrieve(ZIP_URL, ZIP_DOWNLOAD_PATH)

    # Extract
    with zipfile.ZipFile(ZIP_DOWNLOAD_PATH, "r") as fh:
        fh.extractall(DATA_DIR)

    # Remove original zip
    os.remove(ZIP_DOWNLOAD_PATH)

    # Extract zips within the zip
    with zipfile.ZipFile(NESTED_ZIP_PATH) as fh:
        fh.extractall(DATA_DIR)

    # Read Scaleway credentials and bucket settings from the environment
    access_key = os.environ["SCW_ACCESS_KEY"]
    secret_key = os.environ["SCW_SECRET_KEY"]
    bucket_name = os.environ["S3_BUCKET_NAME"]
    region_name = os.environ["SCW_REGION"]
    s3_url = f"https://s3.{region_name}.scw.cloud"

    s3 = boto3.resource(
        "s3",
        region_name=region_name,
        use_ssl=True,
        endpoint_url=s3_url,
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
    )

    bucket = s3.Bucket(name=bucket_name)
    bucket.upload_file(
        Filename=DATA_CSV_PATH,
        Key=DATA_FILE,
    )


if __name__ == "__main__":
    main()
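
To sanity-check the upload, the bucket contents can be listed with the same connection settings; a sketch assuming the same environment variables as the job above:

```python
import os

import boto3

# Same connection settings as the data job above
region_name = os.environ["SCW_REGION"]
s3 = boto3.resource(
    "s3",
    region_name=region_name,
    use_ssl=True,
    endpoint_url=f"https://s3.{region_name}.scw.cloud",
    aws_access_key_id=os.environ["SCW_ACCESS_KEY"],
    aws_secret_access_key=os.environ["SCW_SECRET_KEY"],
)

# The uploaded CSV should appear under the key "bank-additional.csv"
for obj in s3.Bucket(name=os.environ["S3_BUCKET_NAME"]).objects.all():
    print(obj.key, obj.size)
```
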
2 changes: 2 additions & 0 deletions jobs/ml-ops/data/requirements.txt
@@ -0,0 +1,2 @@
boto3==1.33.2
requests==2.31.0
37 changes: 37 additions & 0 deletions jobs/ml-ops/docker-compose.yml
@@ -0,0 +1,37 @@
version: "3"

services:
  data:
    build:
      context: ./data
    depends_on:
      - minio

  training:
    build:
      context: ./training
    depends_on:
      - minio

  inference:
    build:
      context: ./inference
    ports:
      - 8080:80
    depends_on:
      - minio

  minio:
    image: minio/minio
    ports:
      - "9000:9000"
      - "9001:9001"
    volumes:
      - minio_storage:/data
    environment:
      MINIO_ROOT_USER: example
      MINIO_ROOT_PASSWORD: example
    command: server --console-address ":9001" /data

volumes:
  minio_storage: {}
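
For local runs, the MinIO service stands in for object storage. A sketch of pointing `boto3` at it with the root credentials from the compose file (local testing only; the bucket name is an assumption):

```python
import boto3

# Endpoint and root credentials taken from docker-compose.yml (never use these in production)
s3 = boto3.resource(
    "s3",
    region_name="fr-par",  # arbitrary value; MinIO does not check the region
    endpoint_url="http://localhost:9000",
    aws_access_key_id="example",
    aws_secret_access_key="example",
)

# Create the bucket the jobs expect before running them (name is an assumption)
s3.create_bucket(Bucket="mlops-data")
```
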
jobs/ml-ops/inference/Dockerfile
@@ -2,9 +2,9 @@ FROM python:3.12-slim-bookworm

WORKDIR /app

# Install dependencies before copying the source, so code changes don't invalidate the dependency layer
RUN pip install --upgrade pip
COPY requirements.txt .
RUN pip install -r requirements.txt

COPY . .

CMD ["uvicorn", "main:app", "--proxy-headers", "--host", "0.0.0.0", "--port", "80"]
jobs/ml-ops/inference/main.py
Expand Up @@ -28,17 +28,21 @@ class ClientProfile(BaseModel):
    nr_employed: float


def clean_profile(profile: ClientProfile) -> pd.DataFrame:
    """Removes rows with missing value(s)"""

    profile_json = profile.model_dump()

    cleaned = pd.DataFrame(index=[0], data=profile_json)
    cleaned = cleaned.dropna()

    return cleaned


def transform_data(data: pd.DataFrame) -> pd.DataFrame:
    """
    Transforms categorical variables of the dataset into 0/1 indicators.
    Adds missing categorical variables that are by default false (0).
    """

    # use the same category for basic education sub-categories
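
The `transform_data` docstring describes one-hot encoding against a fixed column set. A minimal sketch of that idea, not the module's actual implementation; `MODEL_COLUMNS` is a hypothetical stand-in for the column list the trained classifier expects:

```python
import pandas as pd

# Hypothetical stand-in for the exact columns the trained classifier expects
MODEL_COLUMNS = ["age", "duration", "job_admin.", "job_blue-collar", "marital_married"]


def to_indicators(data: pd.DataFrame) -> pd.DataFrame:
    # Expand categorical (string) columns into 0/1 indicator columns
    encoded = pd.get_dummies(data)
    # Add indicator columns the model expects but the sample lacks (filled with 0),
    # and drop any columns the model does not know about
    return encoded.reindex(columns=MODEL_COLUMNS, fill_value=0)
```
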
22 changes: 22 additions & 0 deletions jobs/ml-ops/inference/example.json
@@ -0,0 +1,22 @@
{
    "age": 44,
    "job": "blue-collar",
    "marital": "married",
    "education": "basic.4y",
    "default": "unknown",
    "housing": "yes",
    "loan": "no",
    "contact": "cellular",
    "month": "aug",
    "day_of_week": "thu",
    "duration": 210,
    "campaign": 1,
    "pdays": 999,
    "previous": "0",
    "poutcome": "nonexistent",
    "emp_var_rate": 1.4,
    "cons_price_idx": 93.444,
    "cons_conf_idx": -36.1,
    "euribor3m": 4.963,
    "nr_employed": 5228.1
}
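
As a quick schema check, the sample payload can be parsed with the `ClientProfile` model from the inference service. A sketch only; the import path is an assumption and requires the inference source on `PYTHONPATH`:

```python
import json

# Hypothetical import path: assumes jobs/ml-ops/inference is on PYTHONPATH
from main import ClientProfile

with open("inference/example.json") as fh:
    raw = json.load(fh)

# Pydantic validates and coerces fields (e.g. the string "0" for `previous`
# becomes an int if the model declares it as one)
profile = ClientProfile(**raw)
print(profile.model_dump())
```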