This project is the result of iteration throughout the 2017 NFL season. It has lived through many different implementations, and some remnants of those earlier approaches remain. At the moment, the remaining TODOs are:
- Complete modularization into functionality categories according to Project Organization below
- Eliminate deprecated manual optimization, replace with AutoML pipeline
- Clean up remaining docstrings
- Determine and eliminate redundancy
- Implement a TensorFlow CNN to see whether its results are superior or inferior
The basic idea of the project is to:
- Gather historical stats for QBs and RBs - two positions that arguably have the most impact on the outcome of NFL games
- Stats gathered:
QB | RB |
---|---|
Team | Team |
Completions | Rushing Attempts |
Attempts | Rushing Yards |
Yards | Rushing TDs |
TDs | Rushing Long |
Interceptions | Receiving Targets |
Sacks | Receiving Yards |
Sack Yards | Receiving TDs |
QB Rating | Receiving Long |
Completion Long | Total Touches |
Rushing Attempts | Week |
Rushing Yards | Season |
Rushing TDs | Opposing Team |
Rushing Long | Score |
Fumbles | Score against |
Fumbles Lost | Win/Loss |
Week | |
Season | |
Opposing Team | |
Score | |
Score against | |
Win/Loss | |
- Generate cumulative summation totals for each game for each player (see the first sketch after this list)
- Merge the QB and RB data on a per-game basis
- Format the data to make it ML-friendly (done by mapping players and teams to IDs in a PostgreSQL table)
- Feed this merged data into an AutoML pipeline that iterates over hyperparameters and models
- Gather the starting RBs and QBs in the games you want to predict (has to be done close to gametime for accurate starter information)
- Pull the cumulative summation data for each starting player and merge the QB and RB data on a per-game basis, formatted to match the training data
- Use the optimal model output of the AutoML pipeline to predict the NFL games of interest
- Use the confidence value for each prediction as a percentage scale to recommend how much money to bet on each game (see the second sketch after this list)
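
As a minimal sketch of the cumulative summation step, assuming pandas and illustrative column names (`player_id`, `pass_yards`, etc.) rather than the project's actual schema:

```python
import pandas as pd

# Hypothetical per-game QB stat lines; the column names are placeholders,
# not the project's actual schema.
qb_games = pd.DataFrame({
    "player_id":  [101, 101, 101, 202, 202],
    "season":     [2017, 2017, 2017, 2017, 2017],
    "week":       [1, 2, 3, 1, 2],
    "pass_yards": [312, 275, 198, 240, 305],
    "pass_tds":   [2, 1, 0, 3, 2],
})

# Accumulate in chronological order so each row carries the player's
# season-to-date totals through that week.
qb_games = qb_games.sort_values(["player_id", "season", "week"])
stat_cols = ["pass_yards", "pass_tds"]
qb_games[[f"cum_{c}" for c in stat_cols]] = (
    qb_games.groupby(["player_id", "season"])[stat_cols].cumsum()
)
print(qb_games)
```

The equivalent RB frame would be built the same way and then merged with the QB rows on week, season, and team to form one row per game.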
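And a sketch of the final step, turning per-game confidence values into stake recommendations. The proportional-allocation rule and the 0.5 threshold here are assumptions for illustration, not the project's exact formula; the confidences would come from the trained model (e.g. a `predict_proba`-style output):

```python
def recommend_stakes(confidences, bankroll=100.0, threshold=0.5):
    """Split a bankroll across games in proportion to how far each predicted
    win probability sits above the threshold (illustrative rule only)."""
    edges = [max(p - threshold, 0.0) for p in confidences]
    total_edge = sum(edges)
    if total_edge == 0:
        return [0.0] * len(confidences)
    return [bankroll * edge / total_edge for edge in edges]

# Three games with predicted home-win probabilities:
print(recommend_stakes([0.72, 0.55, 0.48], bankroll=200.0))
# -> roughly [162.96, 37.04, 0.0]
```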
The project requires credentials to be stored in config.txt. At the moment, only an S3 bucket or an attached PostgreSQL DB is supported, so one of those should be set up prior to usage.
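
The exact keys that config.txt must contain are not documented here, so the layout below is only a hypothetical sketch covering both supported backends; the key names are placeholders:

```
# Hypothetical config.txt layout; the key names are placeholders, not
# necessarily what the code reads.
[s3]
aws_access_key_id = YOUR_ACCESS_KEY
aws_secret_access_key = YOUR_SECRET_KEY
bucket = your-bucket-name

[postgres]
host = localhost
port = 5432
dbname = nfl
user = your_user
password = your_password
```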
Generally, the workflow above can be seen in use in src/test/2017_retro_code.py, which is what was used for the 2017 season retrospective. The only difference is that the model was pre-generated from an AutoML pipeline in order to skip the training process.
For a sense of how I came to this point, here is a timeline of the project:
Since there were too many iterations to denote in the timeline, the Data Format Testing cases consisted of:
- Merging different numbers and types of players per game:
Test Case | Players Included Per Game |
---|---|
1 | Starting Home QB, Starting Home RB1 |
2 | Starting Home QB, Starting Home RB1, Starting Home RB2 |
3 | Starting Home QB, Starting Home RB1, Starting Away QB, Starting Away RB1 |
4 | Starting Home QB, Starting Home RB1, Starting Home RB2, Starting Away QB, Starting Away RB1, Starting Away RB2 |
5 | Starting Home QB, Starting Home RB, Starting Home WR1 |
6 | Starting Home QB, Starting Home RB1, Starting Home WR1, Starting Away QB, Starting Away RB1, Starting Away WR1 |
- Using cumulative summation, rolling average, or raw stats for training (see the sketch after this list)
- Iterating over which stats to use for each position
- Determining inter- and intra-season weights
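
To make the second bullet concrete, here is a small pandas sketch of the three feature variants; the 3-game window and column names are placeholders:

```python
import pandas as pd

# One player's per-game rushing yards (illustrative numbers).
games = pd.DataFrame({
    "week":       [1, 2, 3, 4, 5],
    "rush_yards": [88, 120, 45, 97, 132],
})

# Cumulative summation, rolling average, and raw stats side by side.
games["cum_rush_yards"] = games["rush_yards"].cumsum()
games["roll3_rush_yards"] = games["rush_yards"].rolling(3, min_periods=1).mean()
# "Raw stats" would just use games["rush_yards"] as-is.
print(games)
```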
Project Organization:
├── LICENSE
├── Makefile <- Makefile with commands like `make data` or `make train`
├── README.md <- The top-level README for developers using this project.
│
├── models <- Trained and serialized models, model predictions, or model summaries
│
├── notebooks <- Jupyter notebooks. Naming convention is a number (for ordering),
│ the creator's initials, and a short `-` delimited description, e.g.
│ `1.0-jqp-initial-data-exploration`.
│
├── references <- Data dictionaries, manuals, and all other explanatory materials.
│
├── reports <- Generated analysis as HTML, PDF, LaTeX, etc.
│ └── figures <- Generated graphics and figures to be used in reporting
│
├── requirements.txt <- The requirements file for reproducing the analysis environment, e.g.
│ generated with `pip freeze > requirements.txt`
│
├── src <- Source code for use in this project.
│ ├── __init__.py <- Makes src a Python module
│ │
│ ├── data <- Scripts to download or generate data
│ │ └── make_dataset.py
│ │
│ ├── models <- Scripts to train models and then use trained models to make
│ │ │ predictions
│ │ ├── predict_model.py
│ │ └── train_model.py
│ │
│ └── visualization <- Scripts to create exploratory and results oriented visualizations
│ └── visualize.py
│
└── tox.ini <- tox file with settings for running tox; see tox.testrun.org