velexi-research/INTERNSHIP-2020-VT

 
 


Velexi Template: Data Science Project

Authors
Kevin T. Chu


Table of Contents

  1. Overview

    1.1. Software Dependencies

    1.2. Directory Structure

    1.3. Template Files

  2. Usage

    2.1. Setting Up

    2.2. Conventions

    2.3. Environment

    2.4. Using Jupyter Notebook

  3. References


1. Overview

This project template supports data science projects that use Jupyter notebooks for experimentation and reporting. Its design is based on the blog article "Jupyter Notebook Best Practices for Data Science" by Jonathan Whitmore.

Features include:

  • Compatible with standard version control software.

  • Automatically saves HTML and *.py versions of Jupyter notebooks to facilitate review of both (1) data science results and (2) implementation code.

  • Supports common data science workflows (for both individuals and teams).
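The automatic export of HTML and *.py versions can be pictured as a post-save hook in the Jupyter configuration. The following is only a sketch, assuming the classic Notebook server's `FileContentsManager.post_save_hook` setting and the `jupyter nbconvert` command; the helper name `export_commands` is hypothetical, and the template's actual configuration may differ.

```python
# Sketch of a post-save hook, assumed to live in jupyter_notebook_config.py.
import os
import subprocess


def export_commands(os_path):
    """Return the notebook's directory and the nbconvert commands to run in it."""
    directory, filename = os.path.split(os_path)
    return directory, [
        ["jupyter", "nbconvert", "--to", "script", filename],  # *.py version
        ["jupyter", "nbconvert", "--to", "html", filename],    # HTML version
    ]


def post_save(model, os_path, contents_manager):
    """Export a notebook after each save; other file types are ignored."""
    if model["type"] != "notebook":
        return
    directory, commands = export_commands(os_path)
    for command in commands:
        subprocess.check_call(command, cwd=directory or None)


# In jupyter_notebook_config.py the hook would be registered with:
# c.FileContentsManager.post_save_hook = post_save
```

With a hook like this, every save writes the exported files next to the notebook, so reviewers can diff the *.py file and read the HTML file without opening Jupyter.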

1.1. Software Dependencies

Base Requirements

  • Python

Recommended Python Packages

  • autoenv
  • virtualenv
  • virtualenvwrapper

1.2. Directory Structure

README.markdown
requirements.txt
config/
data/
lab-notebook/
reports/
src/
  • README.markdown: this file

  • requirements.txt: pip requirements file containing Python packages for data science, testing, and assessing code quality

  • config: directory containing template configuration files (e.g., autoenv configuration file)

  • data: directory where project data should be placed. Note: data placed in this directory does not necessarily need to be committed to the git repository. For projects with large datasets, committing the data to the git repository is discouraged.

  • lab-notebook: directory containing Jupyter notebooks used for experimentation and development. Jupyter notebooks saved in this directory should (1) have a single author and (2) be dated.

  • src: directory containing source code developed to support the project

  • reports: directory containing Jupyter notebooks that present and record final results. Jupyter notebooks saved in this directory should be polished, contain final analysis results, and be the work product of the entire data science team.

1.3. Template Files

Template files and directories are indicated by the 'template' suffix. They are intended to simplify setting up the lab notebook. When appropriate, rename them to remove the 'template' suffix.


2. Usage

2.1. Setting Up

  • Create a Python virtual environment for the project.

    $ mkvirtualenv -p /PATH/TO/PYTHON PROJECT_NAME
  • Install required Python packages.

    $ pip install -r requirements.txt
  • Set up autoenv.

    • Copy config/env to .env in project root directory.

    • Set template variables in .env (indicated by {{ }} notation).
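The two autoenv steps above can also be scripted. This is a minimal sketch, assuming the placeholders in config/env have the form `{{ NAME }}`; the function names are hypothetical, and the variable name `DATA_DIR` in the usage note is only an example taken from the Environment section.

```python
import re
from pathlib import Path

# Matches placeholders such as {{ DATA_DIR }} (assumed placeholder format).
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_template(text, values):
    """Replace each {{ NAME }} placeholder with values[NAME]."""
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value provided for template variable {name!r}")
        return str(values[name])

    return PLACEHOLDER.sub(substitute, text)


def write_env_file(project_root, values):
    """Copy config/env to .env in the project root, filling in placeholders."""
    root = Path(project_root)
    template = (root / "config" / "env").read_text()
    env_path = root / ".env"
    env_path.write_text(render_template(template, values))
    return env_path
```

For example, `write_env_file(".", {"DATA_DIR": "/path/to/data"})` would create .env from config/env, and a placeholder without a supplied value raises `KeyError` rather than being left in the file silently.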

2.2. Conventions

lab-notebook directory

  • Jupyter notebooks in the lab-notebook directory should be named using the following convention: YYYY-MM-DD-AUTHOR_INITIALS-BRIEF_DESCRIPTION.ipynb.

    • Example: 2019-01-17-KTC-information_theory_analysis.ipynb
  • Depending on the nature of the project, it may be useful to organize lab notebook entries into sub-directories (e.g., by team member, by sub-project).
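The naming convention above is easy to generate and check programmatically. The sketch below is illustrative only: the function names are hypothetical, and the details inferred from the example (uppercase initials, a lowercase description with underscores) are assumptions rather than stated rules.

```python
import re
from datetime import date

# YYYY-MM-DD-AUTHOR_INITIALS-BRIEF_DESCRIPTION.ipynb
NOTEBOOK_NAME = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})-(?P<initials>[A-Z]+)-(?P<description>\w+)\.ipynb$"
)


def notebook_filename(day, initials, description):
    """Build a lab-notebook filename from a date, initials, and a description."""
    slug = "_".join(description.lower().split())
    return f"{day.isoformat()}-{initials.upper()}-{slug}.ipynb"


def is_valid_notebook_name(filename):
    """Return True if filename follows the convention and has a real date."""
    match = NOTEBOOK_NAME.match(filename)
    if match is None:
        return False
    try:
        date.fromisoformat(match.group("date"))
    except ValueError:
        return False
    return True
```

Calling `notebook_filename(date(2019, 1, 17), "KTC", "information theory analysis")` reproduces the example filename given above.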

2.3. Environment

  • TODO

  • autoenv

    • DATA_DIR
  • aliases

    • jn
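Since this section is still marked TODO, the following is only a guess at how `DATA_DIR` might be used from a notebook. It assumes that autoenv exports `DATA_DIR` as an environment variable pointing at the project's data directory; the helper names and the fallback to a relative `data` directory are hypothetical.

```python
import os
from pathlib import Path


def data_dir(default="data"):
    """Return the data directory from DATA_DIR, or a fallback if it is unset."""
    return Path(os.environ.get("DATA_DIR", default))


def data_path(filename):
    """Return the path of a file inside the data directory."""
    return data_dir() / filename
```

Resolving paths through the environment variable would keep notebooks free of hard-coded, machine-specific locations.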

2.4. Using Jupyter Notebook

Creating a New Jupyter Notebook

  1. Change to the directory where the Jupyter notebook should be saved.

    $ cd NOTEBOOK_DIR
  2. Launch the Jupyter Notebook App.

    $ jupyter notebook
  3. Use the menu under the New button to create a new Jupyter Notebook.

Opening an Existing Jupyter Notebook

  1. Change to the directory containing the Jupyter notebook.

    $ cd NOTEBOOK_DIR
  2. Launch the Jupyter Notebook App with the specified notebook file.

    $ jupyter notebook NOTEBOOK_FILE.ipynb

3. References

