This repository demonstrates a monorepo architecture using Poetry. It manages multiple projects in a single repository and shares common code between them.
- Dependencies are managed with a `pyproject.toml` file in each directory.
- Common code is packaged in the `shared` directory and referenced by each project.
- Common code is referenced via relative paths; it is not packaged as a wheel.
- Development environment setup and deployment image builds are managed with a Makefile and a Dockerfile.
- Local package dependencies are handled without scripts as much as possible; those steps are described in the Makefile and Dockerfile instead.
- Each project has its own GitHub Actions workflow (CI) file.
- ML models are trained in the CI/CD pipeline.
This monorepo is intended to serve the following use cases:
- Manage dependencies with Poetry.
- Manage multiple projects in a single repository.
- Share common code that is not dependent on a specific project.
- Deploy each project as a Docker image.
- Develop inside a Devcontainer as well.
- Train ML models triggered by Pull Requests.
Here's a breakdown of the top-level items in our directory structure:
- `projects`: Houses each individual project.
- `shared`: Contains the common code shared across multiple projects.
- `pyproject.toml`: The root configuration file where development tools, such as the Flake8 linter and the Black formatter, are specified.
- `.github/workflows`: Contains the GitHub Actions workflows. Each project has its own workflow file.
The structure is as follows:
```
python-monorepo
├── .flake8
├── .gitignore
├── .github
│   └── workflows
│       └── project-a.yml
├── README.md
├── poetry.lock
├── pyproject.toml
├── projects
│   ├── project-a
│   │   ├── .devcontainer
│   │   │   ├── Dockerfile
│   │   │   └── devcontainer.json
│   │   ├── Dockerfile
│   │   ├── Makefile
│   │   ├── poetry.lock
│   │   ├── pyproject.toml
│   │   ├── run.py
│   │   ├── src
│   │   │   ├── __init__.py
│   │   │   └── pipeline.py
│   │   └── tests
│   │       ├── __init__.py
│   │       └── test_pipeline.py
│   └── project-b
│       ├── src
│       │   └── __init__.py
│       └── tests
│           └── __init__.py
└── shared
    ├── pkg1
    │   ├── __pycache__
    │   ├── pkg1
    │   │   ├── __init__.py
    │   │   └── base_pipeline.py
    │   ├── poetry.lock
    │   └── pyproject.toml
    └── pkg2
        ├── pkg2
        │   ├── __init__.py
        │   └── calc.py
        ├── poetry.lock
        └── pyproject.toml
```
To add local packages to your project, add them to the `pyproject.toml` file:

```sh
cd projects/project-xxx
poetry add ../../shared/pkg-yyy/
```
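Running `poetry add` with a directory path records a path dependency in `pyproject.toml`. Depending on your Poetry version and the flags used, the resulting entry looks roughly like this (using the `pkg-yyy` placeholder from above):

```toml
[tool.poetry.dependencies]
pkg-yyy = { path = "../../shared/pkg-yyy", develop = true }
# develop = true keeps the package editable during local development;
# whether `poetry add` writes it for you depends on the Poetry version and flags.
```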
Then add a step to the Makefile that copies the shared package into the Docker build context:

```make
# Copy the shared package written in pyproject.toml
cp -r ../../shared/pkg-yyy $(TMP_DIR)/pkg-yyy
```
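For context, here is a rough sketch of how that copy step might sit inside a `build` target. `TMP_DIR` comes from the snippet above; the `IMAGE_NAME` variable and the staging steps are illustrative assumptions, not this repository's exact Makefile:

```make
TMP_DIR    := tmp
IMAGE_NAME := project-xxx

build:
	rm -rf $(TMP_DIR) && mkdir -p $(TMP_DIR)
	# Stage the project files into a single Docker build context
	cp -r Dockerfile pyproject.toml poetry.lock run.py src $(TMP_DIR)/
	# Copy the shared package written in pyproject.toml
	cp -r ../../shared/pkg-yyy $(TMP_DIR)/pkg-yyy
	docker build -t $(IMAGE_NAME) $(TMP_DIR)
```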
Also add the package to the Dockerfile:

```dockerfile
# Copy the shared package written in pyproject.toml
COPY ./pkg-yyy/ /shared/pkg-yyy/
```
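Putting it together, a minimal sketch of the Dockerfile could look like the following, assuming the project lands in `/app` so that the relative path `../../shared/pkg-yyy` from `pyproject.toml` resolves to `/shared/pkg-yyy` inside the image (the base image and Poetry invocation are assumptions, not the repository's exact file):

```dockerfile
FROM python:3.10-slim

RUN pip install poetry

# Copy the shared package written in pyproject.toml
COPY ./pkg-yyy/ /shared/pkg-yyy/

WORKDIR /app
COPY ./pyproject.toml ./poetry.lock ./
# From /app, the path dependency ../../shared/pkg-yyy resolves to /shared/pkg-yyy
RUN poetry install --no-root

COPY ./src/ ./src/
COPY ./run.py ./
CMD ["poetry", "run", "python", "run.py"]
```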
To set up a development environment, navigate to the desired project directory and run `make dep`:

```sh
cd projects/project-xxx
make dep
```
Alternatively, you could use a Devcontainer.
For deployment, navigate to the project directory and run `make build`:

```sh
cd projects/project-xxx
make build
```
To run tests, use the following commands:

```sh
cd projects/project-xxx
poetry run python -m pytest
```
Alternatively, you can run tests inside a Docker container:

```sh
cd projects/project-xxx
make test
```
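A rough sketch of what the `dep` and `test` targets could look like; the `docker run` invocation and `IMAGE_NAME` variable are illustrative assumptions:

```make
dep:
	# Relative path dependencies in pyproject.toml resolve within the monorepo checkout
	poetry install

test: build
	# Run the test suite inside the image produced by `make build`
	docker run --rm $(IMAGE_NAME) poetry run python -m pytest
```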
In Continuous Integration (CI), it is crucial to run tests when a Pull Request is created and when merging into the main branch.

Here, we use `make build` and `make test` to perform builds and tests. Workflows are created on a per-project basis, and builds and tests run only for projects with changes in their respective `src` directories. This avoids running build/test for every project on every change, which saves significant time. Moreover, because the Makefile standardizes the build and test command interface across projects, the workflow files stay very simple.
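As a sketch, a per-project workflow such as `.github/workflows/project-a.yml` might look like this; the trigger paths and job steps are illustrative assumptions:

```yaml
name: project-a

on:
  pull_request:
    paths:
      - "projects/project-a/src/**"  # run only when this project's code changes

jobs:
  build-and-test:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: projects/project-a
    steps:
      - uses: actions/checkout@v4
      - run: make build
      - run: make test
```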
In the CI/CD pipeline, we can train ML models triggered by Pull Request comments. The comment acts as a command and takes hyperparameters as arguments.
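For example, a Pull Request comment like the following could trigger a training run (the `/train` command and parameter names are hypothetical):

```
/train model.lr=0.01 trainer.max_epochs=10
```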
These hyperparameters are passed through to Hydra as command-line overrides, overwriting the default config values.
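A minimal sketch of how `run.py` could consume such overrides with Hydra; the config schema (`model.lr`, `trainer.max_epochs`) is assumed, not taken from this repository:

```python
# run.py -- minimal Hydra entry point (the config schema is hypothetical)
import hydra
from omegaconf import DictConfig


@hydra.main(config_path="conf", config_name="config", version_base=None)
def main(cfg: DictConfig) -> None:
    # cfg reflects any command-line overrides, e.g.:
    #   poetry run python run.py model.lr=0.01 trainer.max_epochs=10
    print(cfg.model.lr, cfg.trainer.max_epochs)


if __name__ == "__main__":
    main()
```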
This is useful for training models in a reproducible manner and for sharing the results with team members.