Skip to content

Commit

Permalink
OEP-18: Python Dependency Management
Browse files Browse the repository at this point in the history
  • Loading branch information
jmbowman committed Apr 2, 2018
1 parent 088b6f4 commit 6fa6176
Showing 1 changed file with 344 additions and 0 deletions.
344 changes: 344 additions & 0 deletions oeps/oep-0018-bp-python-dependencies.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,344 @@
======================================
OEP-0018: Python Dependency Management
======================================

+-----------------+--------------------------------------------------------+
| OEP | :doc:`OEP-0018 <oep-0018-bp-python-dependencies>` |
+-----------------+--------------------------------------------------------+
| Title | Python Dependencies Management |
+-----------------+--------------------------------------------------------+
| Last Modified | 2018-03-27 |
+-----------------+--------------------------------------------------------+
| Authors | Jeremy Bowman <[email protected]> |
+-----------------+--------------------------------------------------------+
| Arbiter | Nimisha Asthagiri <[email protected]> |
+-----------------+--------------------------------------------------------+
| Status | Draft |
+-----------------+--------------------------------------------------------+
| Type | Best Practice |
+-----------------+--------------------------------------------------------+
| Created | 2018-03-27 |
+-----------------+--------------------------------------------------------+
| `Review Period` | 2018-03-27 - 2018-04-20 |
+-----------------+--------------------------------------------------------+
| `Resolution` | |
+-----------------+--------------------------------------------------------+

Abstract
========

Proposes best practices for declaring and maintaining dependencies on other
Python packages in Open edX software repositories.

Motivation
==========

The Open edX project includes dozens of Python software repositories, most of
which depend on certain other Python packages being installed in order to
function correctly. The simple methods we originally used to do this have
assorted drawbacks that have repeatedly caused problems over the past few
years: accidental upgrades to incompatible versions, strict installation
requirements that restrict the ability of downstream packages to manage their
own dependency versions, lack of clarity regarding the full set of packages
actually depended upon, etc.

Outlined here is a recommended standard for declaring dependencies on other
Python packages which resolves most of these issues and will let us make all
the Open edX Python packages consistent with each other (and many other open
source Python projects) for ease of understanding and maintenance.

Specification
=============

The key to successful Python dependency management is to break it down into
four parts:

1. Identify the different contexts in which dependencies will need to be
installed.
2. For each of these contexts, declare the direct dependencies that will be
needed. Use the least restrictive constraints which should allow pip to
install a working set of dependencies.
3. Auto-generate from the high-level dependencies declaration a complete
set of exact package versions that are known to work for a Python
`virtualenv`_ created for that context.
4. Automate updates of the detailed dependencies listing for each context.

.. _virtualenv: https://virtualenv.pypa.io/

Identify Usage Contexts
-----------------------

The dependencies of Python software are typically installed or run in a
variety of different contexts over the course of developing and using it.
The set of dependencies needed to perform a task can easily vary between these
contexts. For example:

* Just the standard set of core dependencies for execution on a production
server to perform its primary purpose (``base.in``)
* Additional dependencies which are only needed when optional extra features
of a package are desired
* Assorted testing libraries to run automated test suites (``test.in``)
* Static code analysis tools to perform code quality checks (``quality.in``)
* The utilities called directly by a CI server to create and use one or more
virtualenvs and report code coverage statistics to a 3rd-party service
(``jenkins.in``, ``travis.in``)
* `Sphinx`_ and other utilities used to generate developer documentation
(``doc.in``)
* Additional utilities needed to perform common development tasks (``dev.in``)
* Utilities that a particular developer likes to use with a repository, but
aren't strictly needed for any of the regular contexts (``private.in``).

.. _Sphinx: http://www.sphinx-doc.org/

Declare Direct Dependencies
---------------------------

As indicated above, some of the usage contexts have a standard filename used in
the ``requirements`` directory of an Open edX repository to list dependencies.
Others will have an appropriate filename custom to that repository's unique
context. Each of these is a ``pip``-compatible `requirements file`_ listing
the direct dependencies needed for that context. Beyond complying with the
file format, there are a few guidelines each of these files should follow:

* The file should start with a brief comment explaining the context in which
these dependencies are needed. Examples can be found in the
`cookiecutter-django-app`_ repository.
* Each listed dependency should have a brief end-of-line comment explaining
its primary purpose(s) in this context. These comments typically start at
the 27th character, but this is just a convention for consistency with files
generated by ``pip-compile``.
* Version constraints should only be used to exclude dependency versions which
are known (or strongly suspected) to not work in this context.
* Indirect dependencies (used by dependencies but not directly by the code in
the repository itself) should not be listed unless a constraint is needed to
enforce a compatible version; these are automatically detected and captured
elsewhere as described below.
* `Environment markers`_ should be used as necessary to indicate dependencies
which should only be installed on specific operating systems, Python
versions, etc.
* If the dependencies in one context are a superset of those in another one,
do not repeat the dependencies. Instead, explicitly include the file
listing the common dependencies in the superset context file (e.g.
``-r base.txt``) and explain in an end-of-line comment why that set of
dependencies is also needed in this context. Note that the ``pip-compile``
output file should be included rather than the looser requirements file it
was generated from, as this ensures that the same versions of packages are
installed in the different contexts. We don't want to run tests with
different versions of dependencies than we use in production, for example.
* Avoid direct links to packages local directories, GitHub, or other version
control systems if at all possible; all dependencies should be installed
from `PyPI`_. If you think you're in one of the rare circumstances where
installing a package from a URL is appropriate, see the notes below on
`Installing Dependencies from URLs`_

If the repository contains a Python package, base dependencies also need to
be specified in ``setup.py``. These can often be derived from
``requirements/base.in`` with a simple Python function declared in
``setup.py`` itself, but for packages with few base dependencies it may be
better to just list them in both places. Just add comment reminders in both
places to also update the other dependency listing when making any changes.

.. _requirements file: https://pip.readthedocs.io/en/1.1/requirements.html
.. _cookiecutter-django-app: https://github.com/edx/cookiecutter-django-app/tree/master/%7B%7Bcookiecutter.repo_name%7D%7D/requirements
.. _Environment markers: https://www.python.org/dev/peps/pep-0508/#environment-markers
.. _PyPI: https://pypi.org/

Generate Exact Dependency Specifications
----------------------------------------

Although we want to keep our manually edited requirements files very simple,
we need a separate set of requirements files which list every single package
needed for each usage context, with exact versions of each for reproducible
test runs and consistent development and production environments. We can
generate these automatically using `pip-tools`_, which consists of two related
utilities:

* ``pip-compile`` generates a requirements file from one or more high-level
input requirements files, listing exact versions of every listed and
indirect dependency needed to satisfy the given constraints.
* ``pip-sync`` ensures that the current virtualenv contains exactly (and only)
the packages listed in the given requirements files, installing, upgrading,
and uninstalling packages as needed.

Open edX packages use an ``upgrade`` make target to use ``pip-compile`` to
automatically update the detailed requirements files (``requirements/*.txt``)
to use the newest available packages which satisfy the constraints in the
direct dependencies files. These generated files are then used anywhere that
runs a command to install dependencies: ``tox.ini``, ``.travis.yml``, the
``requirements`` make target (for updating a local development environment),
etc.

Sometimes ``pip-compile`` will be unable to find a suitable version of a
dependency for the output file because there are incompatible version
constraints in the input files and/or the stated installation requirements
of the other dependencies. In cases like this, installing and running
`pipdeptree`_ can help identify the conflicting constraints so at least one
of them can be sufficiently relaxed such that a version of the dependency
exists which satisfies them all.

.. _pip-tools: https://github.com/jazzband/pip-tools
.. _pipdeptree: https://github.com/naiquevin/pipdeptree

Automate Updates of Exact Dependency Specifications
---------------------------------------------------

While we want all dependencies explicitly pinned in order to benefit from
consistent testing and development environments, it isn't acceptable to leave
these versions untouched for long stretches of time. Packages we depend on
routinely release new versions to address security issues, fix bugs, and add
new features. While we don't necessarily need to update our repositories
every time a new dependency version is released, we do want to keep them
current enough that upgrading a single package to fix a known issue doesn't
require suddenly adapting to a few years' worth of API changes that we didn't
pay attention to.

Each Open edX repository should have the following:

* An ``upgrade`` make target as described above, to update the pinned versions
of all dependencies (and account for any new or removed indirect
dependencies).
* An automated test suite with reasonably good code coverage, configured to
be run on new GitHub pull requests.
* A service configured to periodically auto-generate a GitHub pull request
that tests the output of running ``make upgrade`` (if it results in any
changes). This can either be a service such as `requires.io`_ which tracks
new releases of Python package dependencies, or a recurring scheduled job.
* At least one designated maintainer who receives notifications of the
generated pull requests and will merge or fix them as needed. This
maintainer should scan the changelog for each upgraded package to look for
changes that merit closer inspection; services like `requires.io`_ and
`AllMyChanges.com`_ can make this easier.

.. _requires.io: https://requires.io/
.. _AllMyChanges.com: https://allmychanges.com/

Installing Dependencies from URLs
---------------------------------

As noted above, you should generally avoid installing requirements from a URL
or local directory instead of PyPI. But there are a few circumstances where
it can be appropriate:

* You need to test a release candidate of the dependency to make sure it will
work with your code.
* You critically need a fix for a package which has not yet been included in
a release, and you cannot arrange for a release to be made in a timely
manner.

In most other circumstances, the package should be added to PyPI instead.
There are several good reasons for this:

* Specified VCS branches, commits, and tags can all be deleted from a
repository at any time, suddenly making it impossible to find and install
the dependency.
* Editable requirements (starting with "-e ") are downloaded and/or inspected
with each installation of the requirements file, even if the correct version
is already installed. This can significantly slow down updates of installed
requirements.
* Packages installed from local directories don't reflect any changes to
package metadata (like required package versions) until the version number
is incremented or the package is uninstalled; just installing again doesn't
help.
* Package URLs tend to be long and difficult to read, with the actual name of
the package hidden in the middle or not even present at all.
* As of this writing, ``pip-tools`` still has some bugs in handling packages
installed from local directories or URLs that require special care to work
around. `Non-editable URL installations`_ are not supported, and
`relative local paths are expanded to absolute paths`_. These can be
partially worked around via a post-processing script for the generated
requirements files; an example can be found in `edx-platform`_ at
``scripts/post-pip-compile.sh``.

If you do need to include a package at a URL, it should be editable (start with
"-e ") and have both the package name and version specified (end with
"#egg=NAME==VERSION").

.. _Non-editable URL installations: https://github.com/jazzband/pip-tools/issues/355
.. _relative local paths are expanded to absolute paths: https://github.com/jazzband/pip-tools/issues/204
.. _edx-platform: https://github.com/edx/edx-platform

Rationale
=========

The practices outlined here help prevent the following problems that we have
encountered in the past:

* A new deployment of an Open edX release fails because an unpinned indirect
dependency recently released a backwards-incompatible version.
* Tests unrelated to a new code change fail, because an unpinned dependency
was upgraded to a backwards-incompatible version. This can be difficult
to diagnose because the upgrade doesn't appear in the diff of pending
changes.
* Tests have been running against a particular set of pinned versions for
years, but we now need to upgrade one (like Django) which requires also
upgrading several of the other dependencies. This can force dealing with
a few years' worth of backwards-incompatible changes in multiple packages
all at once, whereas dealing with them one at a time every few months in
smaller pull requests would have been more manageable.
* We have a different version of a dependency installed than we expect,
because the constraints imposed on pip for choosing a version vary between
different requirements files and we install them one file at a time.
* We keep using years-old package versions despite the availability of newer
versions with accumulated bug fixes and performance improvements.
* We install in production environments packages which are only needed for
testing, because we didn't make a clean distinction between the dependencies
for different usage contexts. This slows down deployments.
* We try to exhaustively pin all indirect dependencies manually, but miss some
(especially when a seemingly innocuous upgrade adds some new dependencies).
* We keep installing a package long after we stopped using it, because nobody
remembers why it was added to the requirements file (especially true for
indirect dependencies that were later dropped as requirements of the package
we use directly).
* We install an exhaustive set of testing dependencies in Travis, even though
we really only need it to run tox and codecov; the rest of the testing
dependencies are installed in a separate virtualenv created by tox, which
should have a separate requirements file.
* An attempt to pin dependencies in setup.py (or parse its dependencies
automatically from a requirements file) forces us to change that package
before we can upgrade one of those dependencies in another repository
using that package.
* We add a dependency without realizing that it requires multiple additional
indirect dependencies; we may have chosen an alternative if that had been
apparent.

Reference Implementation
========================

Many of the Open edX repositories have already begun to comply with the
recommendations outlined here. In particular, repositories generated using
`cookiecutter-django-app`_ should be configured correctly from the outset.
These may also be useful for reference:

* `django-user-tasks <https://github.com/edx/django-user-tasks>`_
* `edx-completion <https://github.com/edx/completion>`_
* `XQueue <https://github.com/edx/xqueue/>`_

Rejected Alternatives
=====================

`pipenv`_ is a relatively new utility for managing Python dependencies,
written by Kenneth Reitz (author of the `requests`_ package). Although it
recently became the default dependency management tool recommendation of the
`Python Packaging User Guide`_, it lacks some features that we strongly want
for Open edX:

* The ability to specify more than 2 sets of dependencies (core and
development)
* The ability to add comments to the dependencies listing explaining why each
one is needed
* Indication of which other dependencies caused the inclusion of indirect
dependencies in the full set of requirements
* Easy interoperability with `tox`_, especially for testing multiple versions
of a major dependency

As a younger package than ``pip-tools``, it also seems to have more
significant still-unresolved problems, although those are gradually being
fixed.

.. _pipenv: https://docs.pipenv.org/
.. _requests: http://python-requests.org/
.. _Python Packaging User Guide: https://packaging.python.org/tutorials/managing-dependencies/#managing-dependencies
.. _tox: https://tox.readthedocs.io/

Change History
==============

0 comments on commit 6fa6176

Please sign in to comment.