MLOps_Zoomcamp_Study/01-intro at main · Hokfu/MLOps_Zoomcamp_Study


Environment Preparation

A Linux environment is important for developing and automating machine learning models with MLOps practices. We have a variety of options:

AWS EC2
Compute Engine ( Google )
GitHub Codespaces

Using GitHub Codespaces

Docker is preinstalled on GitHub Codespaces, so we only need to install Anaconda. In the root directory, run:

wget https://repo.anaconda.com/archive/Anaconda3-2023.09-0-Linux-x86_64.sh
bash Anaconda3-2023.09-0-Linux-x86_64.sh
conda install jupyter

To check the conda path in the environment, run this command in the terminal:

which conda

To set the path, open the ~/.bashrc file with nano and add the following line:

export PATH="<YourAnacondaPath>/bin:$PATH"

Save the file and run the following in the project repo to apply the changes:

source ~/.bashrc

We can test whether we are using the correct interpreter with the following command:

which python
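The same check can also be made from inside Python. This is a minimal sketch using only the standard library. The helper name `describe_interpreter` is hypothetical, and which paths it reports depends on how Anaconda was installed above.

```python
import shutil
import sys


def describe_interpreter():
    """Return the running interpreter path and what is found on PATH."""
    return {
        # Absolute path of the interpreter executing this script
        "running": sys.executable,
        # First `python` / `conda` found on PATH, or None if not found
        "python_on_path": shutil.which("python"),
        "conda_on_path": shutil.which("conda"),
    }


if __name__ == "__main__":
    for name, path in describe_interpreter().items():
        print(f"{name}: {path}")
```

If the PATH change took effect, the reported paths should point inside the Anaconda directory.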

What I studied

We trained a machine learning model on the New York Taxi Dataset. In this module, we only practiced a maturity level 0 model, without any MLOps aspects.

Data Preparation

We used the pickup location ID, the drop-off location ID, and the trip distance as features. Our target variable is the trip duration, which we got by subtracting the pickup datetime from the drop-off datetime. We call the total_seconds() method on the resulting timedelta object to get the total seconds, then divide by 60 to get total minutes.
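The duration calculation described above can be sketched with pandas as follows. The column names `pickup_datetime` and `dropoff_datetime` and the two sample rows are assumptions made for illustration; the real dataset's columns may be named differently.

```python
import pandas as pd

# Hypothetical column names and made-up rows, for illustration only
df = pd.DataFrame({
    "pickup_datetime": pd.to_datetime(["2023-01-01 08:00:00", "2023-01-01 09:15:00"]),
    "dropoff_datetime": pd.to_datetime(["2023-01-01 08:12:30", "2023-01-01 10:00:00"]),
})

# Subtracting two datetime columns gives a timedelta column
delta = df["dropoff_datetime"] - df["pickup_datetime"]

# total_seconds() converts each timedelta to seconds; divide by 60 for minutes
df["duration"] = delta.dt.total_seconds() / 60

print(df["duration"].tolist())  # [12.5, 45.0]
```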

Removing Outliers

We can look at the mean, median, mode, and standard deviation of the trip durations, and experiment with the following code. The purpose of these experiments is to detect outliers in the target variable.

df.duration.describe(percentiles=[list])

((df.duration>=1) & (df.duration<=60)).mean()

Most of the trips last between 1 and 60 minutes, so we removed outliers by keeping only those trips.
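A small end-to-end sketch of the outlier check and filter. The durations are made up and the percentile values are an assumption, since the README does not say which ones were used.

```python
import pandas as pd

# Made-up durations in minutes, for illustration only
df = pd.DataFrame({"duration": [0.5, 5.0, 12.0, 30.0, 59.0, 75.0, 240.0, 8.0]})

# Summary statistics with extra percentiles (these values are an assumption)
print(df.duration.describe(percentiles=[0.95, 0.98, 0.99]))

# Boolean mask of trips lasting between 1 and 60 minutes
in_range = (df.duration >= 1) & (df.duration <= 60)

# The mean of a boolean mask is the fraction of rows that are True
print(in_range.mean())  # 0.625

# Keep only the trips inside the range
df_filtered = df[in_range]
```

The mean of the mask shows what share of the data survives the filter before any rows are dropped.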

Model Training

Regression models such as linear regression, lasso, and ridge were trained and validated with RMSE scores.
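A minimal sketch of this comparison with scikit-learn. The data is synthetic, and the split and `alpha` values are arbitrary placeholders, so this does not reproduce the notebook's features, preprocessing, or results.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error

# Synthetic data for illustration only: one feature with a noisy linear target
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X[:, 0] + 5.0 + rng.normal(0, 1.0, size=200)

# Simple split: first 150 rows for training, last 50 for validation
X_train, X_val = X[:150], X[150:]
y_train, y_val = y[:150], y[150:]

# alpha values are arbitrary placeholders, not tuned
models = {
    "linear": LinearRegression(),
    "lasso": Lasso(alpha=0.01),
    "ridge": Ridge(alpha=1.0),
}

rmse = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_val)
    # RMSE is the square root of the mean squared error
    rmse[name] = float(np.sqrt(mean_squared_error(y_val, y_pred)))
    print(f"{name}: RMSE = {rmse[name]:.3f}")
```

Because the target here is trip duration in minutes, RMSE is also in minutes, which makes the scores easy to compare across models.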
