Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

requirement: Implement --fix for RequirementSource #225

Merged
merged 16 commits into from
Jan 24, 2022

Conversation

tetsuo-cpp
Copy link
Contributor

Closes #214

@tetsuo-cpp tetsuo-cpp marked this pull request as draft January 19, 2022 12:08
@tetsuo-cpp
Copy link
Contributor Author

tetsuo-cpp commented Jan 19, 2022

@woodruffw @di
I wanted to run this by you both before going too far with this. At the moment, I use pip_api.parse_requirements and then rewrite the file line by line, replacing requirements that correspond to the fixed dependency.

This could be optimised to do all the fixes in-memory and then write the file out at the end but it doesn't fit the fix interface and I don't anticipate this being a serious performance issue, so I'm thinking we can do it like this initially.

  • One problem with this approach is that we lose comments in the requirement file since they don't get parsed by parse_requirements. I'm thinking that the solution might be to add a keep_comments optional argument to pip_api.parse_requirements that returns strings for comments also.

  • The other issue is that the fact that we support multiple requirements files makes this harder to get right (apply fixes transactionally). With a single file, this is easy as we just need to construct the new file with the fixes and then we can move it into place to replace the old one. Multiple files throw a spanner in the works since, if one of our moves fails, we'll have some requirement files with fixes applied and some without. Therefore, we'll have to keep the old ones around and then have some recovery procedure to undo the moves in the case of a failure. If we want to keep things simple and avoid dealing with this, we could also just say that --fix only supports a single -r argument.

What are your thoughts on these two points?

@woodruffw
Copy link
Member

This could be optimised to do all the fixes in-memory and then write the file out at the end but it doesn't fit the fix interface and I don't anticipate this being a serious performance issue, so I'm thinking we can do it like this initially.

Yeah, this sounds good to me. Our execution is probably overwhelmingly dominated by network round-trips and pip subprocesses, so this isn't something we need to immediately optimize.

  • One problem with this approach is that we lose comments in the requirement file since they don't get parsed by parse_requirements. I'm thinking that the solution might be to add a keep_comments optional argument to pip_api.parse_requirements that returns strings for comments also.

@di should probably have the final say on this, but IMO we should skip on comment handling for the MVP. I do think we should ultimately preserve comments, but we might need to rethink the parse_requirements API to do so, since we'll need to support both freestanding and inline comments:

# foo
blah==1.2.3 # bar
  • The other issue is that the fact that we support multiple requirements files makes this harder to get right (apply fixes transactionally).

Hmm, this is a good point. Maybe this suggests that we redesign our initial pip_api._fix APIs to be more transactional to begin with, since we have a similar "partial fix" problem even in non--r mode: we can imagine that upgrading foo 1.2.3 -> 1.2.4 is a correct fix, but that pip install -U foo==1.2.4 fails for whatever reason. If that happens amidst other successful fixes, we perhaps should fail the entire --fix (and revert any upgrades already performed).

If we go the above route, we'll probably want to refactor DependencySource.fix into a non-abstract method like this:

def fix(self, fix_versions: Iterator[ResolvedFixVersion]):
    with self.fix_transaction():
        for fix in fix_versions:
            self.fix_one(fix)

...where fix_one is the previous body of DependencySource.fix for each implementation, and fix_transaction is another abstract method that provides a transactional context manager.

OTOH, that's a lot of complexity, and maybe --fix should really just be "buyer beware": there's no guarantee that the --fix will succeed or leave multiple requirements inputs in a consistent state, only that failures are accompanied by error handling on our side (a nonzero exit). With that approach, it's the user's responsibility to rollback any changes made.

Base automatically changed from alex/pip-fix-logging to main January 19, 2022 14:06
@tetsuo-cpp
Copy link
Contributor Author

Hmm, this is a good point. Maybe this suggests that we redesign our initial pip_api._fix APIs to be more transactional to begin with, since we have a similar "partial fix" problem even in non--r mode: we can imagine that upgrading foo 1.2.3 -> 1.2.4 is a correct fix, but that pip install -U foo==1.2.4 fails for whatever reason. If that happens amidst other successful fixes, we perhaps should fail the entire --fix (and revert any upgrades already performed).

Personally, I think this is ok. Our summary specifically states that we fixed X vulnerabilities out of Y packages (compared with the Z vulnerabilities found in the audit), which indicates that we will fix everything we can but we don't guarantee we can fix everything. At the moment, if the package update fails, we yield a SkippedFixVersion and pip-audit behaves similarly to if we can't resolve a fix version.

But I don't think it's ok to partially apply a single fix to some requirements files but not others.

@tetsuo-cpp
Copy link
Contributor Author

How does this look?

I've attempted to deal with the case where to fail to write to one of the requirements files (or some other error gets thrown). I start by copying all requirements files to temporary files, then attempt to rewrite the requirements files. If we're successful, the temporary files will automatically get cleaned up. If not, I restore the requirements files by copying the temporary files back to the original location.

@woodruffw
Copy link
Member

How does this look?

This overall structure looks good to me! This approach also looks good, and I think it's what we'll want to encapsulate with the "fix transaction" interfaces above. But that can be a task for a follow-up PR 🙂

@tetsuo-cpp
Copy link
Contributor Author

I'm still working on getting this tested. Mocking out things so the tests don't mess around with the filesystem has been a bit of a pain.

This overall structure looks good to me! This approach also looks good, and I think it's what we'll want to encapsulate with the "fix transaction" interfaces above. But that can be a task for a follow-up PR 🙂

Hmm, I'm not following. The way I see it, the "fix transaction" interface is only necessary if we're intending to roll back fixes for other vulnerabilities in the case that one of them failed. Is that what we want? The recovery scheme I'm using here is for when a single vulnerability exists in multiple requirements files and we didn't manage to fix it in all files (as opposed to rolling back successful fixes for vulnerabilities across all files because a vulnerability further down the list couldn't be fixed).

@woodruffw
Copy link
Member

The way I see it, the "fix transaction" interface is only necessary if we're intending to roll back fixes for other vulnerabilities in the case that one of them failed. Is that what we want?

Whoops, this was a misunderstanding on my part. Yeah, we don't need the "transaction" interface in this case.

@tetsuo-cpp tetsuo-cpp marked this pull request as ready for review January 24, 2022 04:25
test/conftest.py Outdated Show resolved Hide resolved
Copy link
Member

@woodruffw woodruffw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall! I left some small nits that I can take care of.

@di
Copy link
Member

di commented Jan 24, 2022

I'm still working on getting this tested. Mocking out things so the tests don't mess around with the filesystem has been a bit of a pain.

There might be some useful fixtures for test isolation in https://github.com/di/pip-api/blob/dec3a5e30c911b794763483ed985960a6732a40e/tests/conftest.py you could use to improve this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement --fix for RequirementSource
3 participants