Hello everybody!
Welcome to your first Data Science lecture!
In this course, you will get an overview of what our job involves. Course materials are in the resources/courses/ folder: the theoretical part is in the slides/ subfolder, and the applied part is in the notebooks/ subfolder.
This lecture covers Data Science basics in Python: data analysis and manipulation with pandas and NumPy, and supervised learning with scikit-learn. You will also discover some visualizations with matplotlib and seaborn.
This course will be delivered by Yannick & Paul, data scientists at Betclic group.
All the steps numbered 0 are optional: if you have already done them, go directly to the first step.
Download the Python installer for your operating system from the following webpage. Choose a release whose maintenance status is "security" (3.6.x or 3.7.x) to avoid compatibility issues.
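Once Python is installed, you can confirm which interpreter your terminal picks up. The snippet below is a minimal sketch: the `is_supported` helper and the file name `check_version.py` are hypothetical, and the allowed versions simply mirror the 3.6.x / 3.7.x recommendation above.

```python
import sys

# Minor releases recommended above (3.6.x or 3.7.x); adjust if the course requirements change.
SUPPORTED = {(3, 6), (3, 7)}


def is_supported(version_info=sys.version_info):
    """Return True if the (major, minor) pair is one of the recommended releases."""
    return (version_info[0], version_info[1]) in SUPPORTED


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    if not is_supported():
        print("Warning: this is not a 3.6.x or 3.7.x release; some dependencies may not install.")
```

Save it as `check_version.py` and run `python check_version.py` in the same terminal you will use for the rest of the setup.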
Everything is explained on this webpage.
Register on Kaggle. Then, join the Titanic competition.
Open a terminal and run the following command:
git clone https://github.com/paulsteffen-lab/datascience-courses.git
In the same terminal, change into the folder created by the previous command, and install all the dependencies listed in the requirements.txt file with pip:
cd datascience-courses/
pip install -r requirements.txt
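To check that the installation worked, a quick sketch like the one below looks for the libraries this course relies on. It assumes requirements.txt includes pandas, NumPy, scikit-learn, matplotlib and seaborn (the libraries named at the top of this page); the `missing_modules` helper is hypothetical. Note that scikit-learn is imported under the name `sklearn`.

```python
import importlib.util

# Import names of the libraries mentioned in this course
# (assumed to be listed in requirements.txt). scikit-learn is imported as "sklearn".
COURSE_MODULES = ["pandas", "numpy", "sklearn", "matplotlib", "seaborn"]


def missing_modules(names=COURSE_MODULES):
    """Return the module names that cannot be found in the current environment."""
    # find_spec locates a top-level module without importing it; it returns None if absent.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All course libraries are installed.")
```

If something is reported missing, rerun the pip command above with the same `python`/`pip` pair you used to check the version.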
First, get a kaggle.json file containing your Kaggle username and key by following the "API credentials" procedure on the following webpage, and place this file as described there.
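The Kaggle command-line tool usually reads its credentials from a `.kaggle` folder in your home directory (for example `~/.kaggle/kaggle.json`), unless the `KAGGLE_CONFIG_DIR` environment variable points elsewhere. If you prefer to script this step, here is a minimal sketch; the `install_kaggle_credentials` helper is hypothetical, and the source path depends on where your browser saved the file. Always defer to the official instructions if they differ.

```python
import json
import os
import shutil
from pathlib import Path


def install_kaggle_credentials(src, home=None):
    """Copy a downloaded kaggle.json into <home>/.kaggle/ and restrict its permissions.

    `src` is wherever your browser saved the file (e.g. a Downloads folder);
    `home` defaults to the current user's home directory.
    """
    src = Path(src)
    # Sanity check: the file should hold the two fields the Kaggle tool needs.
    creds = json.loads(src.read_text())
    missing = {"username", "key"} - set(creds)
    if missing:
        raise ValueError("kaggle.json is missing: " + ", ".join(sorted(missing)))

    base = Path(home) if home is not None else Path.home()
    target_dir = base / ".kaggle"
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "kaggle.json"
    shutil.copyfile(str(src), str(target))
    # Owner read/write only, so other users cannot read your API key.
    # (On Windows, chmod only toggles the read-only flag.)
    os.chmod(str(target), 0o600)
    return target
```

For example, `install_kaggle_credentials(Path.home() / "Downloads" / "kaggle.json")` would copy the file from a typical download folder.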
Then, change into the data subfolder and download the Titanic data with the following commands:
cd resources/data/
kaggle competitions download -c titanic
Finally, unzip titanic.zip:
unzip titanic.zip -d titanic
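If the `unzip` command is not available on your system (for example on a default Windows install), Python's standard `zipfile` module can do the same job. This is a sketch; the `extract_archive` helper is hypothetical. At the time of writing, the Titanic archive contains train.csv, test.csv and gender_submission.csv, but check the competition's data page if the list differs.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path="titanic.zip", dest="titanic"):
    """Python equivalent of `unzip titanic.zip -d titanic`; returns the extracted file names."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(str(zip_path)) as archive:
        archive.extractall(str(dest))
        names = archive.namelist()
    return sorted(names)


if __name__ == "__main__":
    # Run from resources/data/, like the commands above.
    for name in extract_archive():
        print(name)
```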