-
1.1. Project Organization
1.2. References
-
2.1. Important Notes
2.2. Setup Steps
The DermaML project explores the use of machine learning approaches for predicting a person's age from an image of her/his hand.
- TODO: List of key project references
├── README.md <- this file
├── LICENSE <- license for the contents of the project
├── NOTICE <- copyright notice for the contents of the project
├── Makefile <- Makefile containing useful shortcuts (`make`
│ rules). Use `make help` to show the list of
│ available rules.
├── pyproject.toml <- Python project metadata file (e.g., Python package
│ dependencies)
├── poetry.lock <- Poetry lockfile
├── bin/ <- scripts and programs
├── data/ <- project data. See "Project Conventions" section for
│ │ data organization and processing conventions.
│ ├── final/ <- data directly used to generate key results
│ │ (e.g., data for figures in reports)
│ ├── processed/ <- processed data that is ready for use
│ └── raw/ <- source data in its original format
├── docs/ <- project documentation
│ ├── api/ <- source code documentation (generated by `make docs`)
│ └── references/ <- reference materials (e.g., research articles)
├── extras/ <- additional files and references that may be useful
│ │ for the project
│ └── quick-references/ <- quick references for software tools that support
├── notebooks/ <- research notes and Jupyter notebooks. See "Project
│ Conventions" section for notebook conventions.
├── reports/ <- research reports
├── src/ <- project source code
└── tests/ <- project test code
Note: this project uses `poetry` to manage Python package dependencies.
-
Prerequisites
-
Install Git.
-
Install Python. Recommendation: use `pyenv` to configure the project to use a specific version of Python.
-
This project currently requires a Python version >= 3.9 and < 3.12.
-
Recommendation. If your default Python version is not in this range, use `pyenv` to install Python 3.11 and configure it as the Python version for the project.
$ pyenv install 3.11
$ pyenv local 3.11
-
-
Install Poetry 1.2 (or greater).
-
Optional. Install direnv.
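To confirm that the active interpreter satisfies the Python version requirement listed in the prerequisites (>= 3.9 and < 3.12), a small check along the following lines may help. This is only a sketch: the function name `python_version_supported` is hypothetical and not part of the project.

```python
import sys

# Bounds taken from the requirement stated above: >= 3.9 and < 3.12.
MIN_VERSION = (3, 9)
MAX_VERSION_EXCLUSIVE = (3, 12)


def python_version_supported(version=None):
    """Return True if (major, minor) lies in the supported range.

    Hypothetical helper for illustration; defaults to the running interpreter.
    """
    if version is None:
        version = sys.version_info[:2]
    return MIN_VERSION <= tuple(version[:2]) < MAX_VERSION_EXCLUSIVE


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "supported" if python_version_supported() else "NOT supported"
    print(f"Python {current} is {status} for this project")
```

Running the script with the interpreter selected by `pyenv local` shows whether that interpreter falls inside the supported range.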
-
-
Set up a dedicated virtual environment for the project. Any of the common virtual environment options (e.g., `venv`, `direnv`, `conda`) should work. Below are instructions for setting up a `direnv` or `poetry` environment.
Note: to avoid conflicts between virtual environments, only one method should be used to manage the virtual environment.
-
`direnv` Environment. Note: `direnv` manages the environment for both Python and the shell.
-
Prerequisite. Install `direnv`.
-
Copy `extras/dot-envrc` to the project root directory, and rename it to `.envrc`.
$ cd $PROJECT_ROOT_DIR
$ cp extras/dot-envrc .envrc
-
Grant permission to direnv to execute the .envrc file.
$ direnv allow
-
-
`poetry` Environment. Note: `poetry` only manages the Python environment (it does not manage the shell environment).
-
Create a `poetry` environment that uses a specific Python executable. For instance, if `python3` is on your `PATH`, the following command creates (or activates if it already exists) a Python virtual environment that uses `python3`.
$ poetry env use python3
For commands to use other Python executables for the virtual environment, see the Poetry Quick Reference.
-
-
-
Upgrade `pip` to the latest released version.
$ pip install --upgrade pip
-
Install the Python packages required for the project.
$ poetry install
Known Issues
-
For virtual environments not created with `poetry` (e.g., `direnv`), a system- or user-level installation of `poetry` might fail (e.g., if paths to Python packages required by `poetry` are missing from the `PYTHONPATH` environment variable in the virtual environment). To avoid having to manually modify `PYTHONPATH`, install `poetry` within the virtual environment before running `poetry install`:
$ pip install poetry
-
On ARM-based Macs, some Python dependencies may require additional steps to install correctly.
-
`pycaret`. Pycaret depends on packages that may fail to install if the OpenMP libraries are not available on the system. To remedy this issue, install the `libomp` Homebrew package.
$ brew install libomp
-
-
-
Download project data from remote storage.
$ dvc pull
-
Do research! Make discoveries! Advance knowledge!
-
All data should be placed in the `data` directory.
-
Data should be organized into the following subdirectories.
-
`raw`: source data in its original format. Data in this directory should never be modified.
-
`processed`: processed data that is ready for use. All data in this directory should be generated by a deterministic, automatable process (possibly multi-step) that uses only raw data as input.
-
`final`: data directly used to generate key results (e.g., data for figures in reports). All data in this directory should be generated by a deterministic, automatable process (possibly multi-step) that uses processed and/or raw data as input.
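The data-flow rules above (`raw` is never written, `processed` is built only from `raw`, and `final` is built from `processed` and/or `raw`) can be expressed as a small check. The following is a minimal sketch, not project code: the helper names `data_stage` and `check_step` and the file names in the usage note are hypothetical, and paths are assumed to be relative to the project root.

```python
from pathlib import Path

# Stages each output stage may read from, following the conventions above.
ALLOWED_INPUTS = {
    "raw": set(),                    # raw data is never generated or modified
    "processed": {"raw"},            # processed data uses only raw data
    "final": {"processed", "raw"},   # final data uses processed and/or raw data
}


def data_stage(path, data_dir="data"):
    """Return 'raw', 'processed', or 'final' for a path under data_dir, else None."""
    parts = Path(path).parts
    if len(parts) >= 2 and parts[0] == data_dir and parts[1] in ALLOWED_INPUTS:
        return parts[1]
    return None


def check_step(inputs, output, data_dir="data"):
    """Return True if a processing step respects the data conventions."""
    out_stage = data_stage(output, data_dir)
    if out_stage is None or out_stage == "raw":
        # Outputs must land in processed/ or final/; raw/ is read-only.
        return False
    return all(data_stage(p, data_dir) in ALLOWED_INPUTS[out_stage] for p in inputs)
```

For example, `check_step(["data/raw/hands.csv"], "data/processed/hands.parquet")` passes, while a step that writes into `data/raw/` or builds `processed` data from `final` data is rejected.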
-
- Depending on the nature of the project, it may be useful to organize the `notebooks` directory into sub-directories (e.g., by team member, by sub-project).
-
Research notes should be placed in the `notebooks` directory and should be named using the following conventions:
-
YYYY-MM-DD-AUTHOR_INITIALS-BRIEF_DESCRIPTION.md
or
-
YYYY-MM-AUTHOR_INITIALS-Notes.md
where the year and month indicate the month that the notes were written.
The time period covered by each set of research notes should be adjusted to match the pace of the project (which may change over time). For instance, if updates are made only a few times a year, it is reasonable to omit the month from the file name: `YYYY-AUTHOR_INITIALS-Notes.md`.
-
-
When a non-trivial modification is made to an existing entry, the modification date should be indicated in a "last updated" line that immediately follows the entry header. For instance:
_Last Updated_: 2022-05-31
-
Jupyter notebooks should be placed in the `notebooks` directory and should be named using the following convention: `YYYY-MM-DD-AUTHOR_INITIALS-BRIEF_DESCRIPTION.ipynb`, where the date used for the notebook is approximately the date the original experiment was performed.
- Example:
2019-01-17-KC-information_theory_analysis.ipynb
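The notebook naming convention can be checked mechanically. The sketch below is illustrative only: the helper name `parse_notebook_name` is hypothetical, and it assumes that author initials are written as uppercase ASCII letters, which the convention itself does not state.

```python
import re
from datetime import date

# Pattern for YYYY-MM-DD-AUTHOR_INITIALS-BRIEF_DESCRIPTION.ipynb.
# Assumption: initials consist of uppercase ASCII letters only.
NOTEBOOK_NAME = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})"
    r"-(?P<initials>[A-Z]+)"
    r"-(?P<description>[^/]+)\.ipynb$"
)


def parse_notebook_name(filename):
    """Return (date, initials, description), or None if the name does not conform."""
    match = NOTEBOOK_NAME.match(filename)
    if match is None:
        return None
    try:
        # Rejects impossible dates such as 2019-13-40.
        experiment_date = date.fromisoformat(match["date"])
    except ValueError:
        return None
    return experiment_date, match["initials"], match["description"]
```

Applied to the example name above, the function returns the date 2019-01-17, the initials `KC`, and the description `information_theory_analysis`.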
-
Notebook Modifications
-
When minor modifications are made to a notebook (e.g., code updates that do not materially change the results, addition of a few related experiments), use a "History" block (in Markdown format) to document the changes. Example:
### History
#### 2022-05-31
- Replaced `seaborn.distplot()` with `seaborn.histplot()` because `distplot()` has been deprecated.
-
When significant changes are made to a notebook (e.g., major modifications to algorithms, addition of experiments to explore a new direction), the modified notebook should be saved to a new file with a name constructed from the modification date and the initials of the person who made the modifications.
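One way to construct the new file name for a significantly modified notebook is sketched below. The helper name `revised_notebook_name` and the initials `AB` in the usage note are hypothetical. Keeping the original brief description in the new name is also an assumption, since the convention only specifies the modification date and the modifier's initials.

```python
from datetime import date


def revised_notebook_name(original_name, modified_on, modifier_initials):
    """Build a new notebook file name from the modification date and initials.

    Assumes original_name follows
    YYYY-MM-DD-AUTHOR_INITIALS-BRIEF_DESCRIPTION.ipynb and reuses its
    description (an assumption, not a stated project rule).
    """
    # Split into year, month, day, initials, and the remainder (description + extension).
    parts = original_name.split("-", 4)
    if len(parts) != 5 or not original_name.endswith(".ipynb"):
        raise ValueError(f"unexpected notebook name: {original_name!r}")
    description_with_extension = parts[4]
    return f"{modified_on.isoformat()}-{modifier_initials}-{description_with_extension}"
```

For instance, `revised_notebook_name("2019-01-17-KC-information_theory_analysis.ipynb", date(2022, 5, 31), "AB")` gives `2022-05-31-AB-information_theory_analysis.ipynb`, leaving the original notebook untouched under its old name.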
-