Authors
Kevin T. Chu
- 1.2. Directory Structure
- 1.3. Template Files
- 2.1. Setting Up
- 2.2. Conventions
- 2.3. Environment
This project template is intended to support data science projects that utilize Jupyter notebooks for experimentation and reporting. The design of the template is based on the blog article "Jupyter Notebook Best Practices for Data Science" by Jonathan Whitmore.
Features include:
- Compatible with standard version control software.
- Automatically saves HTML and `*.py` versions of Jupyter notebooks to facilitate review of both (1) data science results and (2) implementation code.
- Supports common data science workflows (for both individuals and teams).
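
This README does not show how the automatic HTML and `*.py` export is wired up. One common approach is a Jupyter post-save hook in `jupyter_notebook_config.py`. The sketch below assumes the classic Notebook server's `FileContentsManager.post_save_hook` setting and that the `jupyter nbconvert` command is on the path; the helper name `nbconvert_commands` is hypothetical and the template's actual configuration may differ.

```python
import os
from subprocess import check_call


def nbconvert_commands(notebook_filename):
    """Return the nbconvert commands that export a notebook to *.py and HTML."""
    return [
        ["jupyter", "nbconvert", "--to", "script", notebook_filename],
        ["jupyter", "nbconvert", "--to", "html", notebook_filename],
    ]


def post_save(model, os_path, contents_manager):
    """Post-save hook: export each saved notebook next to its .ipynb file."""
    if model["type"] != "notebook":
        return  # ignore plain files and directories
    directory, filename = os.path.split(os_path)
    for command in nbconvert_commands(filename):
        # Run in the notebook's directory so the exports land beside it.
        check_call(command, cwd=directory or None)


# Assumption: inside jupyter_notebook_config.py, where `c` is the config
# object that Jupyter provides, the hook would be registered with:
# c.FileContentsManager.post_save_hook = post_save
```

Because the exported files sit next to each notebook, reviewers can diff the `*.py` version for code changes and open the HTML version to read results without starting a notebook server.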
- Python
- autoenv
- virtualenv
- virtualenvwrapper
README.markdown
requirements.txt
config/
data/
lab-notebook/
reports/
src/
- `README.markdown`: this file
- `requirements.txt`: `pip` requirements file containing Python packages for data science, testing, and assessing code quality
- `config`: directory containing template configuration files (e.g., `autoenv` configuration file)
- `data`: directory where project data should be placed. Note: data placed in this directory does not necessarily need to be committed to the git repository. For projects with large datasets, committing the data to the git repository is discouraged.
- `lab-notebook`: directory containing Jupyter notebooks used for experimentation and development. Jupyter notebooks saved in this directory should (1) have a single author and (2) be dated.
- `lib`: directory containing source code developed to support the project
- `reports`: directory containing Jupyter notebooks that present and record final results. Jupyter notebooks saved in this directory should be polished, contain final analysis results, and be the work product of the entire data science team.
Template files and directories are indicated by the 'template' suffix. These files and directories are intended to simplify setting up the lab notebook. When appropriate, they should be renamed (with the 'template' suffix removed).
- Create a Python virtual environment for the project.

      $ mkvirtualenv -p /PATH/TO/PYTHON PROJECT_NAME

- Install the required Python packages.

      $ pip install -r requirements.txt

- Set up autoenv.
  - Copy `config/env` to `.env` in the project root directory.
  - Set the template variables in `.env` (indicated by `{{ }}` notation).
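
The autoenv step (copy `config/env` to `.env`, then set the `{{ }}` variables) can also be scripted. The following is a minimal sketch, not part of the template: the function names `render_env` and `write_env` are hypothetical, and it assumes placeholders take the form `{{ NAME }}` with a single word-like name between the braces.

```python
import re
from pathlib import Path

# Matches placeholders such as {{ NAME }} or {{NAME}} (assumed form).
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_env(template_text, values):
    """Replace each {{ NAME }} placeholder with values[NAME]."""
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for template variable {name!r}")
        return values[name]

    return PLACEHOLDER.sub(substitute, template_text)


def write_env(project_root, values):
    """Copy config/env to .env in the project root, filling in placeholders."""
    root = Path(project_root)
    template_text = (root / "config" / "env").read_text()
    (root / ".env").write_text(render_env(template_text, values))
```

For example, `write_env(".", {"PROJECT_NAME": "my-project"})` would fill in a placeholder called `PROJECT_NAME`; that variable name is only an illustration, so check `config/env` for the names the template actually uses. Raising `KeyError` for an unfilled placeholder keeps a half-configured `.env` from being written silently.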
- Jupyter notebooks in the `lab-notebook` directory should be named using the following convention: `YYYY-MM-DD-AUTHOR_INITIALS-BRIEF_DESCRIPTION.ipynb`.
  - Example: `2019-01-17-KTC-information_theory_analysis.ipynb`
- Depending on the nature of the project, it may be useful to organize lab notebook entries into sub-directories (e.g., by team member, by sub-project).
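
The naming convention is easy to apply and check programmatically. The sketch below is illustrative only: the function names are hypothetical, and it assumes (based on the example filename) that the description is lowercased with words joined by underscores and that author initials are letters only.

```python
import re
from datetime import date

# YYYY-MM-DD-AUTHOR_INITIALS-BRIEF_DESCRIPTION.ipynb
# Note: this checks the shape of the name only, not that the date is valid.
NOTEBOOK_NAME = re.compile(
    r"^(\d{4})-(\d{2})-(\d{2})-([A-Za-z]+)-([A-Za-z0-9_]+)\.ipynb$"
)


def notebook_filename(day, author_initials, description):
    """Build a lab-notebook filename from a date, initials, and description."""
    slug = "_".join(description.lower().split())
    return f"{day.isoformat()}-{author_initials.upper()}-{slug}.ipynb"


def follows_convention(filename):
    """Return True if filename matches the lab-notebook naming convention."""
    return NOTEBOOK_NAME.match(filename) is not None
```

Calling `notebook_filename(date(2019, 1, 17), "KTC", "information theory analysis")` reproduces the example name above, and `follows_convention` could be run over the `lab-notebook` directory to flag files that do not match.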
- TODO
- `autoenv`
  - `DATA_DIR`
- aliases
  - `jn`
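
This section is still marked TODO, so how `DATA_DIR` is meant to be used is not specified here. As a sketch, assuming `autoenv` exports `DATA_DIR` as an environment variable when the project directory is entered, notebook code could resolve the data directory as follows. The function name `get_data_dir` and the fallback to a relative `data` directory are hypothetical.

```python
import os
from pathlib import Path


def get_data_dir(default="data"):
    """Return the data directory from DATA_DIR, or a fallback if it is unset."""
    return Path(os.environ.get("DATA_DIR", default))
```

Reading the location from the environment keeps absolute paths out of notebooks, so the same notebook can run on different machines.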
To create a new Jupyter notebook:

- Change to the directory where the Jupyter notebook should be saved.

      $ cd NOTEBOOK_DIR

- Launch the Jupyter Notebook App.

      $ jupyter notebook

- Use the menu under the `New` button to create a new Jupyter notebook.

To open an existing Jupyter notebook:

- Change to the directory containing the Jupyter notebook.

      $ cd NOTEBOOK_DIR

- Launch the Jupyter Notebook App with the specified notebook file.

      $ jupyter notebook NOTEBOOK_FILE.ipynb

- J. Whitmore. "Jupyter Notebook Best Practices for Data Science" (2016/09).