This repository has been archived by the owner on Sep 24, 2024. It is now read-only.

RD2024-10: Dev workflow setup and guides #5

Merged 13 commits on Jan 22, 2024

26 changes: 18 additions & 8 deletions .github/workflows/pr_checks.yaml → .github/workflows/main.yaml
@@ -1,34 +1,44 @@
-name: PR Checks
+name: Tests

-on: [push]
+on:
+  push:
+    branches:
+      - "main"
+  pull_request:
+    branches:
+      - "**"

 jobs:
   pytest_ruff:
     runs-on: ubuntu-latest
     steps:
-      - uses: actions/checkout@v3
+      - uses: actions/checkout@v4

       - name: Set up Python 3.10
-        uses: actions/setup-python@v4
+        uses: actions/setup-python@v5
         with:
           python-version: "3.10"

+      - name: Install poetry build tool
+        run: |
+          pip install poetry
+
       - name: Install test dependencies
         run: |
-          pip install -r requirements/test.txt
+          poetry install --only dev
         continue-on-error: true

       - name: Lint with Ruff
         run: |
-          ruff --output-format=github .
+          poetry run ruff --output-format=github .
         continue-on-error: false

       - name: Install full dependencies
         run: |
-          pip install ".[all]"
+          poetry install
         continue-on-error: true

       - name: Run unit tests
         run: |
-          pytest
+          poetry run pytest
         continue-on-error: false
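
The `continue-on-error` flags above decide which steps can fail the job: a failing install step is tolerated, while a failing lint or test step fails the run and skips the steps after it. The sketch below models that control flow in Python so the same sequence can be reasoned about locally. `Step`, `CI_STEPS`, `shell_runner`, and `run_steps` are hypothetical names introduced for illustration; they are not part of GitHub Actions or of this repository, and the model ignores features such as `if:` conditions.

```python
import subprocess
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Step:
    """One workflow step: display name, shell command, and failure policy."""

    name: str
    command: str
    continue_on_error: bool = False  # steps without the flag fail the job


# Commands copied from the updated workflow in this diff.
CI_STEPS = [
    Step("Install poetry build tool", "pip install poetry"),
    Step("Install test dependencies", "poetry install --only dev", continue_on_error=True),
    Step("Lint with Ruff", "poetry run ruff --output-format=github ."),
    Step("Install full dependencies", "poetry install", continue_on_error=True),
    Step("Run unit tests", "poetry run pytest"),
]


def shell_runner(command: str) -> int:
    """Run a command through the shell and return its exit code."""
    return subprocess.run(command, shell=True).returncode


def run_steps(steps: List[Step], runner: Callable[[str], int] = shell_runner) -> bool:
    """Run steps in order and return True if the job would succeed.

    A non-zero exit code stops the run unless the step tolerates errors.
    """
    for step in steps:
        exit_code = runner(step.command)
        if exit_code != 0 and not step.continue_on_error:
            print(f"FAILED: {step.name}")
            return False
    return True
```

Calling `run_steps(CI_STEPS)` from the repository root would execute the real commands. Passing a stub `runner` instead makes the control flow easy to check without installing anything.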
5 changes: 4 additions & 1 deletion .gitignore
@@ -157,7 +157,10 @@ cython_debug/
 # be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
 # and can be added to the global gitignore or merged into this file. For a more nuclear
 # option (not recommended) you can uncomment the following to ignore the entire idea folder.
-#.idea/
+.idea/

+# Ruff
+.ruff_cache
+
 # Poetry
 poetry.lock
45 changes: 45 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,45 @@
# Contributing

## Code style

This repository uses [Ruff](https://docs.astral.sh/ruff/) for Python formatting and linting.
Ruff should be installed automatically in your environment as part of the package's
development dependencies.

You can run the linter with automatic fixes by calling `ruff --fix .`, or the formatter by calling `ruff format .`, from the workspace root.
Ruff will pick up the configuration defined in the `pyproject.toml` file automatically.

## Testing a development branch

`flamingo` is intended to be installed as a pip requirement in the runtime environment of a Ray job.
However, it is often desirable to test local branches on Ray before publishing a new version of the library.
This is possible by submitting a Ray job with a runtime environment that points to your
development branch of the `flamingo` repo.

To do so, follow these steps:

1. Export a copy of the package dependencies by running:

```
poetry export --without-hashes --with finetuning,evaluation -o requirements.txt
```

This command creates a `requirements.txt` file in the repository
that contains the dependencies for the `finetuning` and `evaluation` job groups.

2. When submitting a job to the cluster, specify the following in the Ray runtime environment:

- `py_modules`: Local path to the `flamingo` module folder (located at `src/flamingo` in the workspace).
- `pip`: Local path to the `requirements.txt` file generated above.

3. Submit your job with an entrypoint command that invokes `flamingo` directly as a module, e.g.:

```
python -m flamingo run finetuning --config config.yaml
```

This is necessary because `py_modules` uploads the `flamingo` module
but does not install its entrypoint in the environment path.

An example of this workflow can be found in the `examples/dev_workflow.ipynb` notebook.
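
The steps above can be sketched in Python as follows. This is a minimal illustration rather than a helper shipped with the repository: `build_dev_runtime_env` and `build_module_entrypoint` are hypothetical names, and the paths passed to them are placeholders. The `py_modules` and `pip` keys are the Ray runtime environment fields named in step 2.

```python
from pathlib import Path
from typing import Any, Dict


def build_dev_runtime_env(module_dir: Path, requirements_file: Path) -> Dict[str, Any]:
    """Assemble a Ray runtime environment dict for testing a local branch.

    module_dir: local path to the flamingo module folder (src/flamingo).
    requirements_file: path to the requirements.txt exported by Poetry.
    """
    return {
        # Uploads the local module so the job imports your development code.
        "py_modules": [str(module_dir)],
        # Dependencies are installed from the exported requirements file.
        "pip": str(requirements_file),
    }


def build_module_entrypoint(job_type: str, config_file: str) -> str:
    """Build an entrypoint that invokes flamingo as a module.

    The flamingo console script is not on the path when the package is
    shipped via py_modules, so `python -m flamingo` is used instead.
    """
    return f"python -m flamingo run {job_type} --config {config_file}"


# Placeholder paths, relative to the workspace root.
runtime_env = build_dev_runtime_env(Path("src/flamingo"), Path("requirements.txt"))
entrypoint = build_module_entrypoint("finetuning", "config.yaml")
print(runtime_env)
print(entrypoint)
```

The resulting dict and string can be passed as `runtime_env` and `entrypoint` to `JobSubmissionClient.submit_job`, as the notebook does. The config file must also be reachable by the job, for example through a `working_dir` entry like the one used in the notebook.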

56 changes: 46 additions & 10 deletions README.md
@@ -4,20 +4,56 @@
 <img src="https://github.com/mozilla-ai/flamingo/blob/main/assets/flamingo.png" width="300">
 </p>

-## Installation
+## Getting started

-Install the package for local development in your chosen Python environment by running:
+### Minimum Python version
+
+This library is developed with the same Python version as the Ray cluster
+to avoid dependency/syntax errors when executing code remotely.
+Currently, installation requires a Python version in the range `[3.10, 3.11)` to match the global
+cluster environment (the Ray cluster is running 3.10.8).
+
+### Installation
+
+This project is built using the [Poetry](https://python-poetry.org/docs/) build tool.
+First, install Poetry in your local environment via
 ```
-pip install -e ".[all]"
+curl -sSL https://install.python-poetry.org | python3 - -y
 ```
+or see the [installation guide](https://python-poetry.org/docs/#installation)
+for more instructions.

-Dependency groups are defined for the logical job groups accessible from the library.
-See `pyproject.toml` for exact information.
+Once Poetry is installed, you can install `flamingo` for development by running
+```
+poetry lock
+poetry install
+```
+This will install an editable version of the package along with all of its dependency groups.
+Poetry should recognize your active virtual environment during installation
+and install the package dependencies there.

-### Python version
+The `pyproject.toml` file defines dependency groups for the logical job types in the package.
+Individual dependency groups can be installed by running
+`poetry install --with <group1>,<group2>` or `poetry install --only <group>`.

-This library is developed with the same Python version as the Ray cluster
-to avoid dependency/syntax errors when executing code remotely.
-Currently, installation requires at least Python 3.10 to match the global
-cluster environment (Ray cluster is running 3.10.8).
+See the [contributing](CONTRIBUTING.md) guide for more information on development workflows.

+### Usage
+
+`flamingo` exposes a simple CLI with a few commands, one for each Ray job type.
+Jobs are expected to take as input a YAML configuration file
+that contains all necessary parameters/settings for the work.
+See the `examples/configs` folder for examples of the configuration structure.
+
+Once installed in your environment, usage is as follows:
+```
+# Simple test
+flamingo run simple --config simple_config.yaml
+
+# LLM finetuning
+flamingo run finetuning --config finetuning_config.yaml
+
+# LLM evaluation
+flamingo run lm-harness --config lm_harness_config.yaml
+```
+When submitting a job to Ray, the above commands should be used as your job entrypoints.
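
The half-open range `[3.10, 3.11)` given under "Minimum Python version" can be checked at runtime with a few lines of Python. This is a sketch only: `is_supported_python` and the two bound constants are hypothetical names and are not part of `flamingo`.

```python
import sys
from typing import Tuple

# Half-open range [3.10, 3.11): any 3.10.x is accepted, 3.11 and later are not.
MIN_VERSION: Tuple[int, int] = (3, 10)
MAX_VERSION_EXCLUSIVE: Tuple[int, int] = (3, 11)


def is_supported_python(major: int, minor: int) -> bool:
    """Return True if the (major, minor) version falls inside the range."""
    return MIN_VERSION <= (major, minor) < MAX_VERSION_EXCLUSIVE


# Report on the interpreter that is running this snippet.
current = ".".join(str(part) for part in sys.version_info[:3])
if is_supported_python(sys.version_info.major, sys.version_info.minor):
    print(f"Python {current} is within the supported range")
else:
    print(f"Python {current} is outside the supported range [3.10, 3.11)")
```

Comparing `(major, minor)` tuples avoids the pitfalls of comparing version strings, where `"3.9"` would sort after `"3.10"`.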
1 change: 1 addition & 0 deletions examples/configs/simple_config.yaml
@@ -0,0 +1 @@
magic_number: 42
161 changes: 161 additions & 0 deletions examples/dev_workflow.ipynb
@@ -0,0 +1,161 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "123e34e9-70f8-42ab-b790-b59ddc01b1f3",
"metadata": {},
"source": [
"# Development Workflow"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "8c0f15ed-77dc-44ce-adb6-d1b59368f03c",
"metadata": {},
"outputs": [],
"source": [
"# Required imports\n",
"import os\n",
"from pathlib import Path\n",
"from ray.job_submission import JobSubmissionClient\n",
"\n",
"# flamingo should be installed in your development environment\n",
"import flamingo"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "969884e5-d815-42d9-9d4e-3b8f890657e2",
"metadata": {},
"outputs": [],
"source": [
"# Create a submission client bound to a Ray cluster\n",
"# Note: You will likely have to update the cluster address shown below\n",
"client = JobSubmissionClient(\"http://10.146.174.91:8265\")"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "3258bb97-d3c6-4fee-aa0c-962c1411eaa7",
"metadata": {},
"outputs": [],
"source": [
"# Determine local module path for the flamingo repo\n",
"flamingo_module = Path(flamingo.__file__).parent"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "1db3b9aa-99a4-49d9-8773-7b91ccf89c85",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"SimpleJobConfig(magic_number=42)"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Load and inspect the config file\n",
"# Not mandatory for job submission, but helpful when debugging\n",
"from flamingo.jobs.simple import SimpleJobConfig\n",
"\n",
"CONFIG_DIR = Path(\"configs\")\n",
"CONFIG_FILE = \"simple_config.yaml\"\n",
"\n",
"config = SimpleJobConfig.from_yaml_file(CONFIG_DIR / CONFIG_FILE)\n",
"config"
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "b81b36be-35ce-4398-a6d4-ac1f719f5c95",
"metadata": {},
"outputs": [],
"source": [
"# Construct the runtime environment for your job submission\n",
"# py_modules contains the path to the local flamingo module directory\n",
"# pip contains an export of the dependencies for the flamingo package (see CONTRIBUTING.md for how to generate)\n",
"runtime_env = {\n",
"    \"working_dir\": str(CONFIG_DIR),\n",
"    \"env_vars\": {\"WANDB_API_KEY\": os.environ[\"WANDB_API_KEY\"]},  # If running a job that uses W&B\n",
"    \"py_modules\": [str(flamingo_module)],\n",
"    \"pip\": \"/path/to/flamingo/requirements.txt\"\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "4bd300f9-b863-4413-bd3a-430601656816",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"2024-01-20 15:32:25,620\tINFO dashboard_sdk.py:385 -- Package gcs://_ray_pkg_ba0036a72fdb32af.zip already exists, skipping upload.\n",
"2024-01-20 15:32:25,814\tINFO dashboard_sdk.py:385 -- Package gcs://_ray_pkg_8f96eb40a239b233.zip already exists, skipping upload.\n"
]
},
{
"data": {
"text/plain": [
"'raysubmit_tWfixDMGHavrhHPF'"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Submit the job to the Ray cluster\n",
"# Note: flamingo is invoked by 'python -m flamingo' since the CLI is not installed in the environment\n",
"client.submit_job(\n",
"    entrypoint=f\"python -m flamingo run simple --config {CONFIG_FILE}\",\n",
"    runtime_env=runtime_env\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3c82892d-bcdf-42e6-b95e-2393e01ab7d6",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}