This repository has been archived by the owner on Sep 24, 2024. It is now read-only.

RD2024-10: Dev workflow setup and guides #5

Merged
merged 13 commits into from
Jan 22, 2024
Changes from 9 commits
26 changes: 18 additions & 8 deletions .github/workflows/pr_checks.yaml → .github/workflows/main.yaml
@@ -1,34 +1,44 @@
name: PR Checks
name: Tests

on: [push]
on:
push:
branches:
- "main"
pull_request:
branches:
- "**"

jobs:
pytest_ruff:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4

- name: Set up Python 3.10
uses: actions/setup-python@v4
uses: actions/setup-python@v5
with:
python-version: "3.10"

- name: Install poetry build tool
run: |
pip install poetry

- name: Install test dependencies
run: |
pip install -r requirements/test.txt
poetry install --only dev
continue-on-error: true

- name: Lint with Ruff
run: |
ruff --output-format=github .
poetry run ruff --output-format=github .
continue-on-error: false

- name: Install full dependencies
run: |
pip install ".[all]"
poetry install
continue-on-error: true

- name: Run unit tests
run: |
pytest
poetry run pytest
continue-on-error: false
5 changes: 4 additions & 1 deletion .gitignore
@@ -157,7 +157,10 @@ cython_debug/
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/
.idea/

# Ruff
.ruff_cache

# Poetry
poetry.lock
33 changes: 33 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,33 @@
# Contributing

## Code style

This repository uses [Ruff](https://docs.astral.sh/ruff/) for Python formatting and linting.
Ruff should be installed automatically in your environment as part of the package's
development dependencies.

You can execute Ruff by calling `ruff --fix .` or `ruff format .` from the workspace root.
Ruff will pick up the configuration defined in the `pyproject.toml` file automatically.

## Testing a development branch

`flamingo` is intended to be installed as a pip requirement in the runtime environment of a Ray job.
However, when developing the package locally it is desirable to be able to test your branch
by running jobs from it before publishing a new library version.
This is possible by submitting your Ray job with a runtime environment that points to your local,
in-development copy of the `flamingo` repo.

To do so, follow these steps:
1. Export a copy of the package dependencies by running
`poetry export --without-hashes --with finetuning,evaluation -o requirements.txt`.
This will create a `requirements.txt` file in the repository that contains the dependencies
for the `finetuning` and `evaluation` job groups.
2. In your Ray runtime environment, specify the following:
- `py_modules`: Local path to the `flamingo` module folder (located at `src/flamingo` in the workspace).
- `pip`: Local path to the `requirements.txt` file generated above.
3. Submit your job with an entrypoint command that invokes `flamingo` directly as a module,
e.g., `python -m flamingo run finetuning --config config.yaml`.
This is necessary because `py_modules` simply uploads the `flamingo` module
but does not install its entrypoint in the environment path.
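
The steps above can be sketched in Python as follows. This is a minimal illustration rather than part of the library: `build_runtime_env` and `ENTRYPOINT` are hypothetical names, and the paths assume that `requirements.txt` was exported to the repository root and that the module lives at `src/flamingo`.

```python
from pathlib import Path


def build_runtime_env(repo_root: Path) -> dict:
    """Build a Ray runtime environment that points at a local flamingo checkout.

    Hypothetical helper for illustration; assumes `requirements.txt` was
    exported to the repository root as described in step 1.
    """
    return {
        # Upload the in-development module instead of installing a release
        "py_modules": [str(repo_root / "src" / "flamingo")],
        # Path to the exported requirements file, installed on the cluster
        "pip": str(repo_root / "requirements.txt"),
    }


# py_modules does not install the `flamingo` CLI, so invoke it as a module
ENTRYPOINT = "python -m flamingo run finetuning --config config.yaml"

if __name__ == "__main__":
    env = build_runtime_env(Path.cwd())
    print(env)
    # With a reachable cluster, the job could then be submitted with:
    #   from ray.job_submission import JobSubmissionClient
    #   client = JobSubmissionClient("http://<cluster-address>:8265")
    #   client.submit_job(entrypoint=ENTRYPOINT, runtime_env=env)
```

The submission call is left as a comment so the sketch runs without a Ray cluster; the cluster address is a placeholder you would need to supply.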

An example of this workflow can be found in the `examples/dev_workflow.ipynb` notebook.
24 changes: 19 additions & 5 deletions README.md
@@ -4,16 +4,30 @@
<img src="https://github.com/mozilla-ai/flamingo/blob/main/assets/flamingo.png" width="300">
</p>

## Installation
## Getting started

Install the package for local development in your chosen Python environment by running:
### Installation

This project is built using the [Poetry](https://python-poetry.org/docs/) build tool.
Follow the [installation guide](https://python-poetry.org/docs/#installation)
to install Poetry into your local Python environment.

Once Poetry is installed, you can install `flamingo` for development by running:

```
pip install -e ".[all]"
poetry lock
poetry install
```

Dependency groups are defined for the logical job groups accessible from the library.
See `pyproject.toml` for exact information.
This will install an editable version of the package along with all of its dependency groups.
Poetry should recognize your active virtual environment during installation
and install the package dependencies there.

The `pyproject.toml` file defines dependency groups for the logical job types in the package.
Individual dependency groups can be installed by running
`poetry install --with <group1>,<group2>` or `poetry install --only <group>`.
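
As a rough sketch of how these flags compose, the snippet below assembles the corresponding `poetry install` command lines. The helper `poetry_install_command` is hypothetical and only builds the argument list; the group names (`finetuning`, `evaluation`, `dev`) are the ones defined in `pyproject.toml`.

```python
import shlex


def poetry_install_command(with_groups=(), only_groups=()):
    """Return the `poetry install` argument list for the requested groups.

    Hypothetical helper for illustration; it does not run Poetry itself.
    """
    cmd = ["poetry", "install"]
    if with_groups:
        # Install the main dependencies plus the listed groups
        cmd += ["--with", ",".join(with_groups)]
    if only_groups:
        # Install only the listed groups
        cmd += ["--only", ",".join(only_groups)]
    return cmd


print(shlex.join(poetry_install_command(with_groups=["finetuning", "evaluation"])))
print(shlex.join(poetry_install_command(only_groups=["dev"])))
```

The resulting list can be passed to `subprocess.run` if you want to script the installation.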

See the [contributing](CONTRIBUTING.md) guide for more information on development workflows.

### Python version

116 changes: 116 additions & 0 deletions examples/dev_workflow.ipynb
@@ -0,0 +1,116 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "123e34e9-70f8-42ab-b790-b59ddc01b1f3",
"metadata": {},
"source": [
"# Development Workflow"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "8c0f15ed-77dc-44ce-adb6-d1b59368f03c",
"metadata": {},
"outputs": [],
"source": [
"# Required imports (after flamingo is installed in your environment)\n",
"import os\n",
"from pathlib import Path\n",
"from ray.job_submission import JobSubmissionClient\n",
"\n",
"import flamingo"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "969884e5-d815-42d9-9d4e-3b8f890657e2",
"metadata": {},
"outputs": [],
"source": [
"# Create a submission client bound to a Ray cluster\n",
"# Note: You will likely have to update the cluster address shown below\n",
"client = JobSubmissionClient(\"http://10.146.174.91:8265\")"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "3258bb97-d3c6-4fee-aa0c-962c1411eaa7",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"PosixPath('/Users/sfriedowitz/workspace/flamingo/src/flamingo')"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Determine local module path for the flamingo repo\n",
"flamingo_module = Path(flamingo.__file__).parent\n",
"flamingo_module"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b81b36be-35ce-4398-a6d4-ac1f719f5c95",
"metadata": {},
"outputs": [],
"source": [
"# Construct the runtime environment for your job submission\n",
"# py_modules contains the path to the local flamingo module directory\n",
"# pip contains an export of the dependencies for the flamingo package (see CONTRIBUTING.md)\n",
"runtime_env = {\n",
" \"working_dir\": \"/path/to/directory/with/finetuning_config.yaml\",\n",
" \"env_vars\": {\"WANDB_API_KEY\": os.environ[\"WANDB_API_KEY\"]}, # If running a job that uses W&B\n",
" \"py_modules\": [str(flamingo_module)],\n",
" \"pip\": \"/path/to/flamingo/requirements.txt\"\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4bd300f9-b863-4413-bd3a-430601656816",
"metadata": {},
"outputs": [],
"source": [
"# Submit the job to the Ray cluster\n",
"# Note: flamingo is invoked by 'python -m flamingo' since the CLI is not installed in the environment\n",
"client.submit_job(\n",
" entrypoint=\"python -m flamingo run finetuning --config finetuning_config.yaml\",\n",
" runtime_env=runtime_env\n",
")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
46 changes: 33 additions & 13 deletions pyproject.toml
@@ -1,26 +1,46 @@
[build-system]
requires = ["setuptools", "setuptools-scm"]
build-backend = "setuptools.build_meta"
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"

[project]
[tool.poetry]
name = "flamingo"
version = "0.1.0"
description = "Ray-centric job library for training and evaluation"
description = "Ray-centric job library for training and evaluation."
repository = "https://github.com/mozilla-ai/flamingo"
readme = "README.md"
requires-python = ">=3.10,<3.11"
dynamic = ["dependencies", "optional-dependencies"]
authors = []
packages = [{ include = "flamingo", from = "src" }]

[tool.setuptools.dynamic]
dependencies = { file = ["requirements/ray.txt", "requirements/core.txt"] }
[tool.poetry.dependencies]
python = ">=3.10,<3.11"
click = "8.1.7"
torch = "2.1.2"
scipy = "1.10.1"
wandb = "0.16.2"
protobuf = "3.20.0"
pydantic = "1.10.14"
pydantic-yaml = "1.2.0"
ray = { version = "2.8.0", extras = ["default"] }

optional-dependencies.ludwig = { file = "requirements/ludwig.txt" }
[tool.poetry.dev-dependencies]
ruff = "0.1.7"
pytest = "7.4.3"
pytest-cov = "4.1.0"
jupyter = "1.0.0"

optional-dependencies.test = { file = "requirements/test.txt" }
[tool.poetry.group.finetuning.dependencies]
datasets = "2.16.1"
transformers = "4.36.2"
accelerate = "0.26.1"
peft = "0.7.1"
trl = "0.7.10"
bitsandbytes = "0.42.0"

# TODO: Resolve dependency conflicts with ludwig
optional-dependencies.all = { file = ["requirements/test.txt"] }
[tool.poetry.group.evaluation.dependencies]
lm-eval = "0.4.0"
einops = "0.7.0"

[project.scripts]
[tool.poetry.scripts]
flamingo = "flamingo.__main__:cli"

[tool.pytest.ini_options]
19 changes: 0 additions & 19 deletions requirements/core.txt

This file was deleted.

1 change: 0 additions & 1 deletion requirements/ludwig.txt

This file was deleted.

1 change: 0 additions & 1 deletion requirements/ray.txt

This file was deleted.

3 changes: 0 additions & 3 deletions requirements/test.txt

This file was deleted.

9 changes: 0 additions & 9 deletions src/flamingo/__main__.py
@@ -29,15 +29,6 @@ def run_finetuning(config: str) -> None:
run_finetuning(config)


@run.command("ludwig", help="Run a Ludwig training job.")
@click.option("--config", type=str)
@click.option("--dataset", type=str)
def run_ludwig(config: str, dataset: str) -> None:
from flamingo.jobs.ludwig import run_ludwig

run_ludwig(config, dataset)


@run.command("lm-harness", help="Run an lm-harness LLM evaluation job.")
@click.option("--config", type=str)
def run_lm_harness(config: str) -> None:
3 changes: 0 additions & 3 deletions src/flamingo/jobs/ludwig/__init__.py

This file was deleted.

8 changes: 0 additions & 8 deletions src/flamingo/jobs/ludwig/entrypoint.py

This file was deleted.