
ODS #wheretostart recommendations

This article collects public knowledge on how to start learning the field commonly referred to as Data Science. We describe what is called DS, which spheres and disciplines it covers, and suggest recipes for getting familiar with the field.

Data Science is a term describing everything related to processing, storing and mining information. This includes different disciplines, courses and spheres of our life. A general understanding (helicopter overview) can be gained from Google's brilliant web comic (~5 min) and Vas3k's great post Machine Learning for Everyone (~15 min).

There is no way to be taught to be a data scientist, but you can learn how to become one yourself. There is no single right way, but there is a way adopted by a number of data scientists, and it runs through online courses (MOOCs). With this article we aim to answer these common questions:

Where to start with data science?
How to become a data scientist?

We look up to the awesome resource Teach Yourself CS and aim to provide useful and actionable insights on how to learn the skills and gain the knowledge needed to get onto the data-driven path. However, if you don't like our guide, some alternatives are included in the appendix.

Buzzwords disclaimer

We want to provide the most inclusive and open information, so we do not explicitly distinguish skills between what one would call different job specialisations: ML Engineers, Data Engineers, Deep Learning specialists, Data Analysts, etc.

TL;DR:

We assume that you have at least some programming background. If not, you can turn to the aforementioned Teach Yourself CS to learn the basics of programming and algorithms.

Then get at least an overview of the topics below, ideally by studying the suggested courses and/or watching the videos. If you need a learning path, you can follow the one suggested by our fellow community member or any of the alternative articles listed at the end of the article.

General Math for Data Science

Why it matters: you need to know the fundamental math to understand what is happening at the low level.

Book: Introduction to Applied Linear Algebra – Vectors, Matrices, and Least Squares

Web page: The Matrix Calculus You Need For Deep Learning

Courses:

Playlists:

Mathematics for Machine Learning: Linear Algebra by Imperial College London
Discrete Math (Full Course: Sets, Logic, Proofs, Probability, Graph Theory, etc) by Dr. Trefor Bazett

Computer Science and different courses on DS

Why it matters: you need to know how to code.

Courses list: Open Source Society University

Statistics

Why it matters: research can go wrong if you don't check for fundamental flaws.

Courses:

Course from Stanford

General Machine Learning

Why it matters: it covers the general concepts of how computers can generalize.

Courses:

ODS Machine Learning Course (course page, Kaggle intro post)
Andrew Ng’s Machine Learning
CS229 @ Stanford
COMS W4995 Applied Machine Learning
Google's crash-course on ML
(Recommended for coders with at least 1 year of experience) Introduction to Machine Learning for Coders, with an intro post

General Neural Networks / Deep Learning

Why it matters: neural networks are sometimes unreasonably effective.

Courses:

Natural Language Processing

Why it matters: NLP makes it possible to perceive sentiment, extract knowledge, and perform search and machine translation.

Courses:

CS224d: Deep Learning for Natural Language Processing from Stanford
A Code-First Introduction to Natural Language Processing by fast.ai
Coursera NLP specialization

Computer Vision

Why it matters: CV makes it possible to classify images, segment them, identify objects and process visual information.

Course: CS231n: Convolutional Neural Networks for Visual Recognition from Stanford

Reinforcement learning

Why it matters: Reinforcement Learning (RL) covers self-driving / autonomous vehicles as well as any other agents acting in an environment.

Courses:

Graph Learning

Why it matters: graphs are the best way to model relationships in your data (friendships, particle interactions, object positions, etc.).

Courses:

Surveys:

Practice:

Data Engineering

Data Engineering is about turning Data Science work (thoughts, insights and research) into a production project. It means using computers efficiently and building reliable distributed architectures for data conversion, ETL, and batch and stream processing.

Books:

Blogs:

Courses:

Big Data Analysis with Scala and Spark
MIT 6.824: Distributed Systems
Big Data Specialization
Importing Data in Python (Part 1, Part 2)

Certifications:

Top 13 data engineer and data architect certifications | CIO

Some common questions answered

There are some common pitfalls and 'hacks' that any data scientist will encounter. Below is a cherry-picked collection of great articles on these matters:

Feature engineering and Selection: A Practical Approach

Online book on how to work on feature engineering.

Link: https://bookdown.org/max/FES/

Bayesian Statistics explained to Beginners in Simple English

Some #entrylevel material that may still be useful to review, because repetitio est mater studiorum.

Link: https://www.analyticsvidhya.com/blog/2016/06/bayesian-statistics-beginners-simple-english/
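To make the idea of updating a belief with data concrete, here is a minimal sketch of a Beta-Binomial update for a coin-toss style experiment. The function names, the uniform prior and the counts are our own illustrative assumptions and are not taken from the linked article.

```python
def beta_binomial_update(alpha, beta, successes, failures):
    """Return the Beta posterior parameters after observing binomial data.

    With a Beta(alpha, beta) prior on the success probability, the posterior
    is Beta(alpha + successes, beta + failures).
    """
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Hypothetical example: uniform prior Beta(1, 1), then 7 heads and 3 tails.
prior = (1.0, 1.0)
posterior = beta_binomial_update(*prior, successes=7, failures=3)

print("posterior parameters:", posterior)
print("posterior mean:", beta_mean(*posterior))
```

The prior acts like pseudo-counts: a stronger prior (larger alpha and beta) needs more data to move the posterior mean.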

P-value, explained, one more time with demos

The article not only gives a great explanation of what a #pvalue is, but also shows how it works and how it can be used to draw correct conclusions.

Link: https://www.freecodecamp.org/news/what-is-statistical-significance-p-value-defined-and-how-to-calculate-it/
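One way to build intuition for p-values is a permutation test: shuffle the group labels many times and count how often the shuffled difference is at least as large as the observed one. The sketch below is our own illustration, not the method of the linked article; the data, the function name and the number of permutations are assumptions.

```python
import random


def permutation_p_value(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns the estimated probability of seeing a difference at least as
    large as the observed one if group labels were assigned at random.
    """
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        new_a, new_b = pooled[:len(a)], pooled[len(a):]
        diff = abs(sum(new_a) / len(new_a) - sum(new_b) / len(new_b))
        # Small tolerance so floating-point noise does not hide ties.
        if diff >= observed - 1e-12:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical measurements for two groups.
group_a = [5.1, 4.9, 5.3, 5.0, 5.2]
group_b = [6.0, 6.2, 5.9, 6.1, 6.3]
p_value = permutation_p_value(group_a, group_b)
print("p-value:", p_value)
```

A small p-value only says the observed difference would be rare under random labelling; it does not measure the size or practical importance of the effect.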

🥇Parameter optimization in neural networks.

Play with three interactive visualizations and develop your intuition for optimizing model parameters.

Link: https://www.deeplearning.ai/ai-notes/optimization/
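As a companion to the visualizations, here is a minimal sketch of plain gradient descent on a one-dimensional quadratic. The objective, learning rate and step count are our own assumptions, chosen only to show how the update rule behaves.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient and return the final parameter."""
    w = start
    for _ in range(steps):
        w -= learning_rate * grad(w)
    return w


def loss_gradient(w):
    """Gradient of the toy loss f(w) = (w - 3)^2, whose minimum is at w = 3."""
    return 2.0 * (w - 3.0)


w_min = gradient_descent(loss_gradient, start=0.0)
print("estimated minimum:", w_min)

# A learning rate that is too large overshoots further on every step.
w_diverged = gradient_descent(loss_gradient, start=0.0, learning_rate=1.1)
print("with a too-large learning rate:", w_diverged)
```

For this toy loss each step multiplies the distance to the minimum by |1 - 2 * learning_rate|, which is why 0.1 converges and 1.1 diverges.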

Probabilistic foundations of econometrics: part 1

Great intro into #statistics basics.

Link: https://freakonometrics.hypotheses.org/57649

Implementing Transfer Learning in PyTorch

Fine-tuning and feature extraction with PyTorch

Link: https://medium.com/analytics-vidhya/transfer-learning-in-pytorch-f7736598b1ed

Yet another good intro to the difference between artificial neural networks and biological ones.

If you're getting started in Data Science, you need to start with the basic building block of Neural Networks: a Perceptron. This link is a good place to start understanding what it is.

Link: https://towardsdatascience.com/the-differences-between-artificial-and-biological-neural-networks-a8b46db828b7
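To show what a Perceptron does in practice, here is a minimal sketch of the classic perceptron learning rule trained on the logical AND function. The data set, learning rate and epoch count are our own assumptions for illustration.

```python
def predict(weights, bias, x):
    """Step activation: output 1 if the weighted sum is positive, else 0."""
    activation = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if activation > 0 else 0


def train_perceptron(samples, labels, learning_rate=1.0, epochs=20):
    """Perceptron rule: nudge the weights by the prediction error on each sample."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = y - predict(weights, bias, x)
            weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
            bias += learning_rate * error
    return weights, bias


# Logical AND is linearly separable, so the perceptron rule can learn it.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]
weights, bias = train_perceptron(samples, labels)
predictions = [predict(weights, bias, x) for x in samples]
print("predictions:", predictions)
```

A single perceptron can only separate classes with a straight line (or hyperplane), so it cannot learn XOR; stacking units into layers removes that limit.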

Time series basics

A time series is data whose points carry timestamps. Some might think that #timeseries are mostly used in algorithmic trading, but they are also often used in malware detection, network data analysis, or any other field dealing with a flow of time-labeled data. These two resources provide a deep yet accessible #introduction to #TS analysis.

Github: https://github.com/akshaykapoor347/Time-series-modeling-basics

Data Camp presentation: https://s3.amazonaws.com/assets.datacamp.com/production/course_5702/slides/chapter3.pdf
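A minimal sketch of what "points with timestamps" looks like in code, together with a trailing moving average, one of the simplest smoothing operations. The hourly readings and the function name are our own assumptions and do not come from the linked resources.

```python
from datetime import datetime, timedelta


def rolling_mean(series, window):
    """Trailing moving average over a list of (timestamp, value) pairs.

    The series is assumed to be sorted by timestamp. Each output point keeps
    the timestamp of the last observation in its window.
    """
    result = []
    for i in range(window - 1, len(series)):
        values = [value for _, value in series[i - window + 1 : i + 1]]
        result.append((series[i][0], sum(values) / window))
    return result


# Hypothetical hourly readings with one spike.
start = datetime(2024, 1, 1)
readings = [10, 12, 11, 30, 12, 11]
series = [(start + timedelta(hours=h), float(v)) for h, v in enumerate(readings)]

smoothed = rolling_mean(series, window=3)
for timestamp, value in smoothed:
    print(timestamp.isoformat(), round(value, 2))
```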

Repo on signals filtering

It is better to study the Kalman filter in advance, because knowing about it can save lots of time.

Github: Link
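For a first feel of the idea, here is a minimal sketch of a one-dimensional Kalman filter that tracks a roughly constant value from noisy measurements. It is our own simplified illustration, not code from the repository; the variances and the measurements are assumptions.

```python
def kalman_1d(measurements, process_var, measurement_var,
              initial_estimate=0.0, initial_var=1.0):
    """Scalar Kalman filter for a state that is assumed to stay constant.

    Returns the list of filtered estimates, one per measurement.
    """
    estimate, variance = initial_estimate, initial_var
    estimates = []
    for z in measurements:
        # Predict: the state does not move, but our uncertainty grows a bit.
        variance += process_var
        # Update: blend prediction and measurement according to the gain.
        gain = variance / (variance + measurement_var)
        estimate += gain * (z - estimate)
        variance *= 1.0 - gain
        estimates.append(estimate)
    return estimates


# Hypothetical noisy readings of a quantity whose true value is 5.0.
measurements = [4.6, 5.3, 5.1, 4.8, 5.4, 4.9, 5.2, 4.7, 5.0, 5.1]
estimates = kalman_1d(measurements, process_var=1e-3, measurement_var=0.5)
print("last estimate:", estimates[-1])
```

The gain decides how much each new measurement is trusted: a large measurement variance gives a small gain and smoother, slower estimates.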

Hitchhiker’s guide to Exploratory Data Analysis

Exploratory Data Analysis is the stage of finding out the distribution of the data, its volume, the number of missing values and all the other characteristics of the available dataset.

Part 1: https://towardsdatascience.com/hitchhikers-guide-to-exploratory-data-analysis-6e8d896d3f7e

Part 2: https://towardsdatascience.com/hitchhikers-guide-to-exploratory-data-analysis-part-2-36ab72201e1d
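The checks named above (volume, missing values, distributions) can be sketched in a few lines of pandas. The tiny data frame and its column names are made up for illustration; real EDA would start from your own dataset.

```python
import pandas as pd

# Hypothetical dataset with a few missing values.
df = pd.DataFrame({
    "age": [23, 35, float("nan"), 41, 29],
    "city": ["Riga", "Oslo", "Oslo", None, "Riga"],
    "income": [32000.0, 54000.0, 47000.0, float("nan"), 39000.0],
})

# Volume: number of rows and columns.
n_rows, n_cols = df.shape

# Missing values per column.
missing_per_column = df.isna().sum()

# Distribution summary (count, mean, std, quartiles) of numeric columns.
numeric_summary = df.describe()

# Frequency of each category in a non-numeric column.
city_counts = df["city"].value_counts()

print(n_rows, n_cols)
print(missing_per_column)
print(numeric_summary)
print(city_counts)
```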

GANs from Scratch 1: A deep introduction

Great introduction and tutorial, with code in PyTorch and TensorFlow.

Link: https://medium.com/ai-society/gans-from-scratch-1-a-deep-introduction-with-code-in-pytorch-and-tensorflow-cb03cdcdba0f

Classification and Loss Evaluation - Softmax and Cross Entropy Loss

Nice notes on softmax cross-entropy loss and how to implement it in NumPy.

Link: https://deepnotes.io/softmax-crossentropy
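For orientation, here is a minimal NumPy sketch of softmax, the cross-entropy loss, and the gradient of the loss with respect to the logits. It is our own version written under common conventions (integer class labels, loss averaged over the batch) and may differ in detail from the linked notes.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax; subtracting the row maximum avoids overflow in exp."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def cross_entropy(probs, labels):
    """Mean negative log-probability assigned to the correct class."""
    n = probs.shape[0]
    return -np.log(probs[np.arange(n), labels]).mean()


def softmax_cross_entropy_grad(probs, labels):
    """Gradient of the mean loss w.r.t. the logits: (probs - one_hot) / n."""
    n = probs.shape[0]
    grad = probs.copy()
    grad[np.arange(n), labels] -= 1.0
    return grad / n


# Hypothetical batch of two samples and three classes.
logits = np.array([[2.0, 1.0, 0.1],
                   [0.0, 0.0, 0.0]])
labels = np.array([0, 2])

probs = softmax(logits)
loss = cross_entropy(probs, labels)
grad = softmax_cross_entropy_grad(probs, labels)
print("loss:", loss)
```

The gradient has the compact form "predicted probabilities minus one-hot labels", which is one reason softmax and cross-entropy are usually paired.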

Simple comic on how #ML works from #Google

Make sure you save the link (or this message) to show it to people without a strong technical background, for it is one of the best and clearest explanations there is.

Link: https://cloud.google.com/products/ai/ml-comic-1/

Python Data Science Handbook: Essential Tools for Working with Data by Jake VanderPlas

This book provides basic knowledge about using NumPy, Pandas, Matplotlib and Scikit-Learn with Jupyter Notebook for beginners from scratch. The link below leads to a repository containing the entire book and training materials.

Link: https://github.com/jakevdp/PythonDataScienceHandbook

Hands on ML notebook series

A series of Jupyter notebooks that walk you through the fundamentals of Machine Learning and Deep Learning in Python using Scikit-Learn and TensorFlow.

Link: https://github.com/ageron/handson-ml
