Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace custom build system with Poetry #563

Open
eecavanna opened this issue Jun 19, 2024 · 8 comments · May be fixed by #792
Open

Replace custom build system with Poetry #563

eecavanna opened this issue Jun 19, 2024 · 8 comments · May be fixed by #792
Assignees
Labels
cleanup 🧹 Related to cleaning up code, documentation, etc. enhancement New feature or request

Comments

@eecavanna
Copy link
Collaborator

Background

Currently, the Runtime uses a custom dependency management solution that works something like this:

  1. Update a requirements/*.in file: main.in for production dependencies and dev.in for development dependencies
  2. Run the command $ make update-deps, which — under the hood — runs these commands, which update the main.txt and dev.txt files

    nmdc-runtime/Makefile

    Lines 6 to 16 in f0e8096

    update-deps:
    # --allow-unsafe pins packages considered unsafe: distribute, pip, setuptools.
    pip install --upgrade pip-tools pip setuptools
    pip-compile --upgrade --build-isolation \
    --allow-unsafe --resolver=backtracking --strip-extras \
    --output-file requirements/main.txt \
    requirements/main.in
    pip-compile --allow-unsafe --upgrade --build-isolation \
    --allow-unsafe --resolver=backtracking --strip-extras \
    --output-file requirements/dev.txt \
    requirements/dev.in
  3. Run the command $ make init, which — under the hood — runs these commands, which install the packages listed in main.txt and dev.txt into the current Python environment

Pros:

  • It is already implemented
  • (Add yours here...)

Cons:

  • People sometimes forget step 2
  • (Add yours here...)

Proposal

Switch to using Poetry.

If the Runtime used Poetry, the above process would become:

  1. Update the pyproject.toml file (there are different sections of the file for production versus development dependencies)
  2. Run $ poetry install to install the packages listed in the file — this will generate a poetry.lock file if one doesn't already exist (commit this file to the repo once it exists)
@eecavanna eecavanna added enhancement New feature or request backlog Not assigned to a sprint or not completed during a planned sprint. Needs to be reprioritized. decision needed tag issues that need to be unblocked by decisions labels Jun 19, 2024
@dwinston
Copy link
Collaborator

Sure, let's pursue this. The main wrinkle is reproducible Docker builds, which may just mean adding a

poetry export -f requirements.txt -o requirements.txt

step (https://pythonspeed.com/articles/pipenv-docker/)?

some more context that informed my initial decision wrt pip-tools:

@eecavanna eecavanna removed the decision needed tag issues that need to be unblocked by decisions label Jun 19, 2024
@eecavanna
Copy link
Collaborator Author

OK, thanks.

I was under the impression having the poetry.lock file in the repo (and not having or using a requirements.txt file anywhere, at any time) would suffice, as long as dependencies are installed via poetry install. The docs for poetry install (source) say:

The install command reads the pyproject.toml file from the current project, resolves the dependencies, and installs them.

If there is a poetry.lock file in the current directory, it will use the exact versions from there instead of resolving them. This ensures that everyone using the library will get the same versions of the dependencies.

Based on that, I don't think a requirements.txt file is necessary for reproducibility, specifically.

I briefly skimmed the first link in your message and saw that the introduction of a requirements.txt file may be to enable us to benefit more from cacheing in order to speed up container image builds. I am in favor of speeding up container image builds from their current durations.

@eecavanna
Copy link
Collaborator Author

I'll take a crack at this tonight. I don't think I'll finish tonight, but I think I'll at least come away with more knowledge about what will be involved with making this switch.

@eecavanna eecavanna added cleanup 🧹 Related to cleaning up code, documentation, etc. and removed backlog Not assigned to a sprint or not completed during a planned sprint. Needs to be reprioritized. labels Nov 23, 2024
@eecavanna
Copy link
Collaborator Author

eecavanna commented Nov 23, 2024

I've run into my first hiccup: incompatible ("contradictory") requirements (you can ignore the yellow line about the broken pre-existing environment).

image

I don't know whether this also happens with the current build system and I just haven't noticed it, or it is unique to either Poetry or the version specifiers I ended up with in the pyproject.toml file, which are:

mkdocs-mermaid2-plugin = "^1.2.1"
# ...
nmdc-schema = "==11.1.0"

Anyway, I'll try to resolve it.

I see that nmdc-schema (in its own project.toml file) has declared mkdocs-mermaid2-plugin as one of its production dependencies. I think that imposes that constraint on every project that uses nmdc-schema.

@eecavanna
Copy link
Collaborator Author

eecavanna commented Nov 23, 2024

I resolved that one by updating the version constraint to: ">=0.6.0, <0.7.0"

# Note: The specification for the latest version of this package is (currently) `^1.2.1`. The reason we
#       are specifying an older version is that we also depend on `nmdc-schema==11.1.0`, which specifies
#       that older version as one of its production dependencies.
#
# TODO: Inquire with `nmdc-schema` maintainers about addressing their dependence upon the
#       old version of this package.
#
mkdocs-mermaid2-plugin = ">=0.6.0, <0.7.0"

For reference, the requirements/main.txt file (from the old build system) says:

mkdocs-mermaid2-plugin==0.6.0
    # via
    #   -r requirements/main.in
    #   nmdc-schema

However, after making that change, when I re-ran $ poetry install, I got a new, unrelated conflict:

image

There may be additional conflicts beyond this second one. I do not plan to post about each additional on in a comment here. I am noting them in comments in the pyproject.toml file (alongside the many TODO comments there).

@eecavanna
Copy link
Collaborator Author

Yahoo! There were only 1-2 additional conflicts and they are all resolved now. Poetry has generated its poetry.lock file.

@eecavanna eecavanna linked a pull request Nov 23, 2024 that will close this issue
16 tasks
@eecavanna eecavanna linked a pull request Nov 23, 2024 that will close this issue
16 tasks
@eecavanna
Copy link
Collaborator Author

I created a draft PR (linked above by GitHub) containing the pyproject.toml and poetry.lock files. I think there's still a ways to go here (e.g. updating Dockerfiles, updating the Makefile, updating documentation).

@eecavanna
Copy link
Collaborator Author

The PR is continuing to take shape. All the GitHub Actions workflows are passing again (with Poetry in the mix and the requirements directory gone). I'll look at this some more next week.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cleanup 🧹 Related to cleaning up code, documentation, etc. enhancement New feature or request
Projects
Status: Front of house
Development

Successfully merging a pull request may close this issue.

2 participants