OEP-0018: Python Dependency Management #56
Conversation
* An attempt to pin dependencies in setup.py (or parse its dependencies
  automatically from a requirements file) forced us to change that package
  before we could upgrade one of those dependencies in a downstream
  repository.
Regarding this bullet, perhaps you can say "downstream repository, and in another case an upstream repository" given that openedx/edx-drf-extensions#26 was done for edx-platform.
contexts. For example:

* Just the standard set of core dependencies for execution on a production
  server to perform its primary purpose (``base.in``)
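To make the filename references that follow concrete, here is a sketch of the kind of ``requirements/`` directory this scheme produces (the file names beyond ``base.in`` are illustrative of common conventions, not mandated):

```text
requirements/
    base.in    # direct production dependencies, loosely constrained
    base.txt   # pinned output of pip-compile run on base.in
    test.in    # additional direct dependencies for running the tests
    test.txt   # pinned output for the test context
    dev.in     # additional tools for local development
    dev.txt    # pinned output for the development context
```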
You have not explained what "base.in" is, or why you are mentioning it here. Maybe a paragraph at the top giving the overview ("We'll use pip-tools to manage multiple contexts."), then you can tie these filenames into that.
Yeah, I was hoping to avoid front-loading too much description of the overall process, but there probably does need to be some advance mention of what these names are for.
`cookiecutter-django-app`_ repository.
* Each listed dependency should have a brief end-of-line comment explaining
  its primary purpose(s) in this context. These comments typically start at
  the 27th character, but this is just a convention for consistency with files
27th!? We live in a 4-space-indent world, and pip-compile goes to the 27th character? :)
About the purpose: wouldn't this often be obvious or trivial? It sounds kind of like requirement comments on every line of a program in an intro programming course: they don't end up being useful.
Most of those leading 26 characters are often the package name and optional version constraint. pip-tools lines up the start of the comments for ease of reading, I figured it just makes sense to match that for consistency.
Unlike imports in a Python file where you can see later in the file what they're used for, seeing something like "singledispatch" in a requirements file tells you very little about why it's there. The main reason for adding the comments is to make it easier to identify requirements that are no longer needed, which is tough if you're not even sure what half of them do. I'm thinking things like "celery task broker" for redis and "Used in shoppingcart for extracting/parsing pdf text" for pdfminer; the reason for including as a dependency, not the overview from the package metadata. I've typically found it very tough to prune out obsolete dependencies without having this kind of comments in the file, but I'm open to arguments against it.
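Pulling those examples together, a commented ``base.in`` in this style might look something like the following (package choices and alignment are illustrative, reusing the examples above):

```text
celery                    # Asynchronous task execution library
pdfminer                  # Used in shoppingcart for extracting/parsing pdf text
redis                     # celery task broker
singledispatch            # Backport of functools.singledispatch from Python 3
```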
It seems like something that would easily get out of date. What would the comment for "singledispatch" be? "# So that I can do single dispatch"? If I add it, and put "# used in the foobar manager" in the requirements file, how will that comment get updated if it's also used elsewhere, or if it's no longer used in its original spot? Wouldn't grepping the source for imports be a better way to understand whether and how packages are being used?
The way I've done this before is that I put the main thing the package is currently used for in there. If I later notice when scanning the file that the listed use is no longer relevant, I look to see if it's obsolete or still used somewhere else. If the former, I delete it; if the latter, I update the comment with the current usage info. If I have to hunt down all uses of every single package I don't recognize to see if any of them are obsolete, I'm basically never going to bother...it just takes too long.
About the indentation: looking at the .in files in the edx-platform pull request, on a few lines, 26 isn't enough, and the alignment is bumped out. I'd choose a larger number that is a multiple of four, and let pip-tools do what it wants. In some ways, it's better to have a clear distinction between the tool-generated and human-generated comments anyway.
About the comments themselves:

    lxml==3.8.0               # XML parser

I don't find this useful. I don't know how to get the useful comments without requiring people to add useless ones like this.
I struggled a bit with the edx-sandbox/shared.in file comments because it seems like a grab bag of "make this stuff available to instructors in case they want to use it". So yeah, those ended up being pretty generic, but I think that's more the exception than the rule.
I guess I am skeptical that these comments will be accurate and specific enough to be useful in the long run. But we can give it a try. At least requiring a comment will slow people down from adding too many dependencies! :)
``-r base.txt``) and explain in an end-of-line comment why that set of
dependencies is also needed in this context. Note that the ``pip-compile``
output file should be included rather than the looser requirements file it
was generated from, as this ensures that the same versions of packages are
I'm confused by the "include the output file rather than the requirements file." An example would help.
Yeah, this is getting kind of long and hard to follow for a bullet point. I'll see if I can clarify it a bit.
was generated from, as this ensures that the same versions of packages are
installed in the different contexts. We don't want to run tests with
different versions of dependencies than we use in production, for example.
* Avoid direct links to packages local directories, GitHub, or other version
s/packages/packages'/ ?
I meant to write "...packages in local directories, ..."; fixing.
``requirements/base.in`` with a simple Python function declared in
``setup.py`` itself, but for packages with few base dependencies it may be
better to just list them in both places. Just add comment reminders in both
places to also update the other dependency listing when making any changes.
As big a fan as I am of not overengineering, it seems like better advice to always use the requirements files if they exist. "Few base dependencies" doesn't seem like a good reason to duplicate the information, especially since adding the reminder comments will lose the terseness benefits of the few dependencies.
Yeah, just not thrilled about copying another function into all setup.py files. Better than the alternatives, I suppose.
installing a package from a URL is appropriate, see the notes below on
`Installing Dependencies from URLs`_

If the repository contains a Python package, base dependencies also need to
What else could a repository contain? IDAs don't have setup.py I guess? Maybe better is: "if the repository contains a setup.py"
Sure, how about: "If the repository contains a setup.py file defining a Python package, the base dependencies also need to be specified there."
the packages listed in the given requirements files, installing, upgrading,
and uninstalling packages as needed.

Open edX packages use an ``upgrade`` make target to use ``pip-compile`` to
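A minimal sketch of the kind of ``upgrade`` target being discussed; the exact file list and ``pip-compile`` options vary by repository:

```make
# Illustrative Makefile fragment; real repositories typically list more
# requirements files and may pass additional pip-compile options.
upgrade:  ## re-resolve all pinned requirements to the latest allowed versions
	pip-compile --upgrade -o requirements/base.txt requirements/base.in
	pip-compile --upgrade -o requirements/test.txt requirements/test.in
```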
Is this, "do use," or "should use"?
To be technical, "many already do use, and the rest probably should use".
`AllMyChanges.com`_ can make this easier.

.. _requires.io: https://requires.io/
.. _AllMyChanges.com: https://allmychanges.com/
Not sure why, but AllMyChanges doesn't know about the last five releases of coverage.py: https://allmychanges.com/p/python/coverage/
It looks like a number of packages haven't been getting updated for a year or so, the issue was reported 6 days ago: https://github.com/AllMyChanges/allmychanges.com/issues/65 . Do you think it's still worth mentioning here, or pull until that gets resolved? It's missing the last several Django releases also. requires.io is current on these, but missing changelog links for several packages that are included on AllMyChanges.com.
Maybe just a sentence like, "At the time of writing, these services were still in development. Carefully evaluate which is currently reliable."
Interesting. Does requires.io automatically pull in GitHub's release history? From what I can tell it looks like it needs a CHANGES.txt file or similar.
These services parse CHANGES.txt (or some other file). GitHub histories are not good sources of changes, they are too noisy.
Releases are not noisy at all in general: https://github.com/jgm/pandoc/releases, https://github.com/tqdm/tqdm/releases, etc.
Oh, sorry, I misread your comment as "commit history." I don't know if they use GitHub releases. Frankly, I hope they don't. A text file in the repo is a low-tech way to make the information available to everyone, regardless of how the source is stored or transmitted. Unless I've misunderstood something, GitHub releases are a proprietary feature that separates information from the source tree.
I agree CHANGES.txt should take precedence, but where none is available it would be nice to fall back to things like GitHub release notes. It's convenient to use GitHub's release system to link to issues, upload binaries, etc. Anyway, this is probably a discussion not meant for here. I was only asking if anyone knows of good crawlers of https://api.github.com/repos/{REPO}/releases, not if this is a good idea :P
The requires.io changelog parsing, at least as of when it was first introduced, is summarized in this blog post. At least at the time, it only looked in PyPI and for appropriately named files in the repo, not at the GitHub releases list or commit history.
There's also the changelog parser for Gemnasium at https://github.com/tech-angels/vandamme , it seems somewhat more comprehensive but I haven't looked specifically to see what it scrapes from GitHub.
Cool. I just implemented https://raw.githubusercontent.com/wiki/tqdm/tqdm/releases.py (result at https://github.com/tqdm/tqdm/wiki/Releases) until https://github.com/AllMyChanges/allmychanges.com/issues/65 gets fixed
``scripts/post-pip-compile.sh``.

If you do need to include a package at a URL, it should be editable (start with
"-e ") and have both the package name and version specified (end with
The "should be editable" advice is contrary to our current instructions. You might want to include here, "(because pip-tools doesn't otherwise support URL installations)" or something.
Sure, I'll reference the bugs noted above.
Something I still don't understand after reading this (maybe because I was too focused on paragraphs, and not enough on the big picture): who is supposed to run "make upgrade" to update the version numbers in the .txt files, and when are they supposed to do it?
installed.
2. For each of these contexts, declare the direct dependencies that will be
   needed. Use the least restrictive constraints which should allow pip to
   install a working set of dependencies.
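For example, a loose constraint in the human-edited ``.in`` file still yields an exact pin in the compiled output (the version numbers here are illustrative):

```text
# requirements/base.in (human-edited)
Django<2.0                # Web application framework; 2.0 support pending

# requirements/base.txt (generated by pip-compile)
django==1.11.15           # via -r requirements/base.in
```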
I think I missed a key point here: our philosophy now is to not explicitly pin versions, and instead to let pip-tools pin them for us. This is a big shift from how we used to do things, and needs to be highlighted much more strongly.
Sure, rewording to try to make things like that more clear.
@nedbat The section on automating updates recommends using a service to auto-generate PRs which run make upgrade. I'll add some notes to this effect to the doc.
IIUC, this means that I as a developer might need to add a dependency, so I edit a .in file. Then I run make upgrade. This can update the pins of an arbitrary number of other packages, some of which might break existing code. What do I do then?
The automation is important here to make sure that we always get package upgrades in small batches. We should really never go more than a week or so without updating all the dependencies which we're basically current on (unless miraculously no new versions of packages we use were released in that time; more likely to happen for small repos). And I'd argue that someone who doesn't feel qualified to determine if it's probably safe to upgrade a few dependencies probably shouldn't be adding new dependencies without assistance or review either.
That needs to go into the document. :)
dependencies).
* An automated test suite with reasonably good code coverage, configured to
  be run on new GitHub pull requests.
* A service configured to periodically auto-generate a GitHub pull request
How often would such a service be expected to run? Daily, weekly, monthly?
dependencies).
* An automated test suite with reasonably good code coverage, configured to
  be run on new GitHub pull requests.
* A service configured to periodically auto-generate a GitHub pull request
Am I understanding this correctly: until the next time this service runs, the automatically generated requirements .txt file may not match the output of the .in file, right?
Anytime a developer updates one of the .in files, they should run make upgrade to update all of the .txt files. The periodic auto-generated pull requests (probably weekly or so) are just to notify us of updates to packages that we aren't deliberately holding back, so we don't fall too far behind. Those auto-generated PRs would only update the .txt files, since the input constraints won't have changed (and the suggested changes still need to be reviewed and deliberately merged).
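In other words, the day-to-day developer flow being described is roughly the following (the command names assume the make-target conventions mentioned above):

```text
# 1. Declare the new direct dependency, loosely constrained, in the
#    appropriate .in file, e.g. requirements/base.in
# 2. Regenerate all of the pinned .txt files:
make upgrade
# 3. Sync your environment and run the tests against the new pin set
#    before opening a pull request:
pip install -r requirements/test.txt
make test
```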
* The need for an old dependency was removed.
* A version constraint needs to be added to prevent upgrading to a
  backwards-incompatible release of a required package until appropriate code
  changes can be made.
What is the best practice when a development team needs to make a backward incompatible change to an API that is implemented in 1 repo but used in another? If the team didn't pin down the version number prior to this, what are the ramifications (open edX community, failure on master branch, etc.) if they pin down the version only afterward?
Nothing ever gets automatically upgraded; changes to the generated .txt files should always be reviewed before merging to master. I think in this scenario, best practice is to leave packages unpinned by default, but clearly document backwards-incompatible changes in the changelog and release notes; when an update of the .txt files indicates that the package would be upgraded, that should be noticed so the developer can either constrain the version or make the changes needed to use the new version, as appropriate.
Many of the Open edX repositories have already begun to comply with the
recommendations outlined here. In particular, repositories generated using
`cookiecutter-django-app`_ should be configured correctly from the outset.
very minor nit. Might be clearer to say "should already be configured correctly..."
What level of pip knowledge do you expect from the readers of this OEP? I ask because it covers a lot of ground and reads like two documents in one. A) This is how we plan to structure our Python package dependencies for our projects. B) Here are some tips and guidelines for working with the more advanced features of pip and the pip ecosystem which we use for A, above. I don't want to suggest unneeded work, but perhaps the OEP could be narrowed to A, with a bonus of linking to additional reading resources for engineers to get up to speed on the tools and techniques to better work with Python package dependencies?
BTW, I've learned a bit of new stuff about pip just by reading it :)
@johnbaldwin The intended level of detail is to be sufficient to help a developer create a new code repository that follows the guidelines, and keep it that way as changes are made over time. A lot of the seemingly esoteric points covered here come up depressingly often in routine package maintenance, but I'm open to suggestions on specific content that could be harmlessly extracted. I could move about half of the "Installing Dependencies from URLs" section into the Rationale section (which seems more appropriate for background info), and localize the references to
@jmbowman Thanks for your reply and additional context. It sounds like you're intentionally using this OEP as both a spec and a guide, and I can see why now. Given your comments, and that you need to keep within the structure of the OEP template, I'm not sure restructuring would be worth the effort. I think the thing to do is for the community to start using this as a reference; then perhaps folks will create derivative guides, focusing on aspects of package maintenance and dependency management for different audience experience levels.
* Just the standard set of core dependencies for execution on a production
  server to perform its primary purpose (``base.in``)
* Additional dependencies which are only needed when optional extra features
  of a package are desired
This is the only bullet point that doesn't have a filename attached. Is that an error?
The filename would typically be based on the name of the extra in setup.py; I'll add a note to that effect.
base dependencies also need to be specified there. These can be derived from
``requirements/base.in`` with a simple Python function declared in
``setup.py`` itself. An example can be found in the
`setup.py file for edx-completion`_.
Would it be better to inline the function definition in this document? That would give a more canonical best-version, it seems like.
Fair enough. The first draft of the OEP didn't particularly recommend this approach, but earlier review discussions concluded that it was best not to explicitly list the core dependencies in two places, so having a canonical implementation of loading them makes sense. The existing code is about 30 lines; a little long, but probably short enough to include.
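For reference, a condensed sketch of such a helper (the function names and filtering rules here are illustrative; the canonical, somewhat longer version is the one in the edx-completion setup.py mentioned above):

```python
def is_requirement(line):
    """
    Return True if the given line from a requirements file looks like an
    actual requirement specifier rather than a comment, blank line,
    nested include, or editable/URL install.
    """
    line = line.strip()
    return bool(line) and not line.startswith(("#", "-r", "-c", "-e", "git+"))


def load_requirements(*requirements_paths):
    """
    Collect requirement specifiers from one or more pip requirements files,
    dropping end-of-line comments, so the result can be passed to
    setuptools.setup(install_requires=...).
    """
    requirements = set()
    for path in requirements_paths:
        with open(path) as reqs_file:
            for line in reqs_file:
                if is_requirement(line):
                    # Strip any end-of-line comment before recording it.
                    requirements.add(line.split("#")[0].strip())
    return sorted(requirements)
```

A setup.py would then use something like ``install_requires=load_requirements("requirements/base.in")``.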
In most other circumstances, the package should be added to PyPI instead. If
you do need to include a package at a URL, it should be editable (start with
"-e ") and have both the package name and version specified (end with
"#egg=NAME==VERSION").
This seems like a worthwhile place to have an example of a fully-baked url include line.
Sure, I'll add an example.
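A fully-specified editable URL line of the kind described might look like this (the repository, commit hash, and version here are made up for illustration):

```text
-e git+https://github.com/edx/example-package.git@0123456789abcdef0123456789abcdef01234567#egg=example-package==1.2.3
```

Pinning to a commit hash and including the ``#egg=NAME==VERSION`` suffix keeps the install reproducible and lets pip and pip-tools identify the package and version without fetching it first.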
The discussion on this seems to have reached a conclusion with no hanging issues, so as the arbiter, I'm happy to call this Accepted.
Recommendations for managing Python package dependencies which should help minimize problems over time. The key technology choice involved is pip-tools, which we already use in many of our repositories. The current draft specifies that we should use a service to auto-generate PRs with requirement updates, but stops short of recommending which one to use (largely because we're still in the process of finalizing that decision, and the choice may change over time anyway).