Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add CI OIDC access to new bors branches #355

Merged
merged 4 commits into from
Mar 9, 2024

Conversation

Mark-Simulacrum
Copy link
Member

Closes rust-lang/infra-team#85

I'd like to touch base with Foundation security folks on the policies here, though as currently proposed they should be safe (different s3 bucket prefix than production).

terraform/rustc-ci/impl/artifacts.tf Show resolved Hide resolved
Comment on lines +183 to +186
"${aws_s3_bucket.artifacts.arn}/rustc-builds-try",
"${aws_s3_bucket.artifacts.arn}/rustc-builds-try/*",
"${aws_s3_bucket.artifacts.arn}/rustc-builds-try-alt",
"${aws_s3_bucket.artifacts.arn}/rustc-builds-try-alt/*",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an improvement over the current situation, but it'd be nice to scope these by the commit hash (${aws_s3_bucket.artifacts.arn}/rustc-builds-try-alt/${sha}/*). As far as I'm aware it might not be possible to do that in IAM directly (the commit hash is in the GitHub Actions OIDC claim, but it might not be possible to access it from IAM), but at that point I'd prefer having a lambda function validating the JWT and issuing a properly scoped policy.

Also, the *-less variant shouldn't be needed.

Copy link
Member

@pietroalbini pietroalbini Oct 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For more context on why this would be helpful: right now a job can override artifacts from other jobs, so in theory a try build (triggered by a trusted reviewer) could override the artifacts of a previously merged commit / try build, while if we scope things by commit hash a build would only be able to write in their scratch space.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I looked into this a bit and it seems like the only way to do this is custom code, we can expose the SHA into the sub claim with a custom claim but I don't think we can parse it out from there.

The Lambda function sounds plausible, we do somehow similar for sync-team kickoff... it would mean custom code in this relatively sensitive area though. I guess the risk would mostly be that we just don't get it's benefits though - should be easy to avoid further escalation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My inclination I think is to not block on the further scope reduction at this time, I'm happy to work on it but I want to get new bors treating unblocked quickly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I looked into this a bit and it seems like the only way to do this is custom code, we can expose the SHA into the sub claim with a custom claim but I don't think we can parse it out from there.

Yeah, I remember a while back reading somewhere that it was possible to pass arbitrary custom claims into IAM policies if they had specific keys in the JWT token, but I can't seem to find that piece of docs anymore. It'd need work on the GitHub side to change those claims anyway, so it wouldn't be something we could rely on in a timely manner.

The Lambda function sounds plausible, we do somehow similar for sync-team kickoff... it would mean custom code in this relatively sensitive area though. I guess the risk would mostly be that we just don't get it's benefits though - should be easy to avoid further escalation.

Indeed. A way to limit the damage would be to have the function call be authenticated with OIDC too, so the most damage would be a job overriding another job's artifacts (i.e. the status quo) compared to third parties obtaining tokens, but still, it'd be sensitive code.

My inclination I think is to not block on the further scope reduction at this time, I'm happy to work on it but I want to get new bors treating unblocked quickly.

I think it's fine to land this now to unblock the new bors (it uploads to a different directory than current bors anyway, so no risk of overriding), but it'd be nice to have it in place in the future.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I remember a while back reading somewhere that it was possible to pass arbitrary custom claims into IAM policies if they had specific keys in the JWT token, but I can't seem to find that piece of docs anymore. It'd need work on the GitHub side to change those claims anyway, so it wouldn't be something we could rely on in a timely manner.

Yeah, I'm not quite sure if the GitHub side has exactly the claim we'd need. https://token.actions.githubusercontent.com/.well-known/openid-configuration has a listing but I don't know if the head_ref for example is useful or just the branch name.

https://docs.github.com/en/enterprise-cloud@latest/actions/deployment/security-hardening-your-deployments/about-security-hardening-with-openid-connect#understanding-the-oidc-token lists a "sha": "example-sha" but it's not documented and not clear whether example-sha would be a full commit hash or what.

Copy link
Member

@pietroalbini pietroalbini Oct 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think workflow_sha would be the claim we'd be interested in, as it'd be the hash of the commit containing the workflow we're executing.

Also, I found the AWS feature I was thinking of: Session Tags. If the proper tag was passed in the JWT, then we could have ${iam:ResourceTag/workflow_sha} in the IAM policy I think. But that's not applicable for us because GHA doesn't put the claims under the https://aws.amazon.com/tags key.

Edit: the sha in the docs might be legacy, before they split workflow_sha and job_workflow_sha due to the introduction of reusable workflows.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting! I did see some references to Cognito, we should try to figure out how complicated/pricey it'll be.

@Mark-Simulacrum
Copy link
Member Author

Mark-Simulacrum commented Nov 19, 2023

@pietroalbini I've updated to use Cognito as suggested. Some concerns from my side:

  • I haven't yet / didn't see a way to control which repositories can get Cognito tokens. I think as-is, anyone on GitHub could call the cognito APIs and get a token within our account. That feels iffy to me, even if that token itself is not so valued; it gives a possible entry point to pivot off of.
  • The filtering logic on AssumeRoleWithWebIdentity feels less strong to me, I have weaker confidence in it compared to the :sub claim filtering for the direct OIDC usage. I think it's mostly because we're somewhat less on the beaten path, but it also seems like at least to some extent it is fair to say that surface area here is wider. It's easier to miss a filter on the repository in question, for example.
  • We may want to land a version of this that grants access to a separate bucket or at least figure out how to test this (positive and negative cases). I have somewhat low confidence that it'll work from the start.

I think the benefits of filtering to just the one SHA location probably win out overall but it's a close call to me. It's possible that we could achieve similar confidence with e.g. a custom sub claim (possible with GHA, AFAICT) that includes the workflow sha, put that in a separate repository, and enforce there that the real repository checkout SHA is passed as an session name (for example). It seems like that ought to work, and have less chance of the failure mode being "anyone on github has write access"...

@pietroalbini
Copy link
Member

I haven't yet / didn't see a way to control which repositories can get Cognito tokens. I think as-is, anyone on GitHub could call the cognito APIs and get a token within our account. That feels iffy to me, even if that token itself is not so valued; it gives a possible entry point to pivot off of.

Yeah it's not awesome, but from a cursory glance I don't see a solution to this.

The filtering logic on AssumeRoleWithWebIdentity feels less strong to me, I have weaker confidence in it compared to the :sub claim filtering for the direct OIDC usage. I think it's mostly because we're somewhat less on the beaten path, but it also seems like at least to some extent it is fair to say that surface area here is wider. It's easier to miss a filter on the repository in question, for example.

I don't think it's that different from directly interacting with GitHub's OIDC: you still need the condition filtering on the repository either way, whether it's sub or RequestTag/repository. If the worry is missing the condition, I'd wrap that behind a module and use the module across the Terraform configuration.

We may want to land a version of this that grants access to a separate bucket or at least figure out how to test this (positive and negative cases). I have somewhat low confidence that it'll work from the start.

What about the previous proposal of ${aws_s3_bucket.artifacts.arn}/rustc-builds-try/?

It's possible that we could achieve similar confidence with e.g. a custom sub claim (possible with GHA, AFAICT) that includes the workflow sha, put that in a separate repository, and enforce there that the real repository checkout SHA is passed as an session name (for example).

I'm not sure I understand, could you elaborate a bit more?

@Mark-Simulacrum
Copy link
Member Author

What about the previous proposal of ${aws_s3_bucket.artifacts.arn}/rustc-builds-try/?

I can make that the path, yeah, as our way of testing.

I'm not sure I understand, could you elaborate a bit more?

We'd do the following:

  • Move role assumption to an externally hosted workflow (e.g., rust-lang/assume-role-with-sha@some-tag)
  • Adjust the sub claim GitHub sends to include the external repo sha. AWS requires then that the sub contains the external repo sha, i.e., we can be confident that the assume-role call comes from code we authored, and it's not changed in anyway
  • The external workflow will set a tag on the session with the sha of the main repo (rust-lang/rust) - I believe this information should be accessible and the workflow can set tags when calling assume role. If not we'd need to make the workflow know the policy filter we want and filter down there.

Basically the idea is similar to the previously proposed lambda function transforming claims in the oidc token but puts that on the GitHub side, which may be easier to maintain and is more reusable if we want similar things elsewhere (e.g., other repos pushing artifacts), it may even work well for PR CI having this access if we wanted that.

@Mark-Simulacrum
Copy link
Member Author

I don't think it's that different from directly interacting with GitHub's OIDC: you still need the condition filtering on the repository either way, whether it's sub or RequestTag/repository.

I think there is a difference - we are relying more on the tag being only settable by a trusted party, whereas the direct oidc use isn't expanding trust scope to Cognito. In particular the issuance to anyone on GitHub of Cognito IDs makes me somewhat nervous. Basically the worry in my mind is that GitHub "knows" the :sub key is critical but may be less cautious about other metadata (intentionally or not), and we aren't currently checking the sub key anywhere. Maybe we should do that and that covers most of our bases.

Regardless, the PR as-is very much expands our threat model to include Cognito, which didn't used to be the case.

@pietroalbini
Copy link
Member

Basically the idea is similar to the previously proposed lambda function transforming claims in the oidc token but puts that on the GitHub side, which may be easier to maintain and is more reusable if we want similar things elsewhere (e.g., other repos pushing artifacts), it may even work well for PR CI having this access if we wanted that.

That's a good point, and I quite like the solution if we can make it work. That way, compared to the lambda we shouldn't need to validate GitHub's JWTs with our own code. Maybe let's bring this up to the infra meeting for a decision on the approach.

Basically the worry in my mind is that GitHub "knows" the :sub key is critical but may be less cautious about other metadata (intentionally or not), and we aren't currently checking the sub key anywhere. Maybe we should do that and that covers most of our bases.

Sure. If we go with this approach we could check the sub key (mapping it to gh_sub or something).

Regardless, the PR as-is very much expands our threat model to include Cognito, which didn't used to be the case.

Definitely.

@walterhpearce
Copy link

My main thought around this is whether we can nail a solution that would prevent builds from overwriting other artifacts. The move away from bearer secrets is still obviously great, but I'd like to make sure moving to this does give us the ability to lock modifications down to a specific build. Are builds modified at any point post-builder assuming and using this role? Is there a complexity tradeoff on doing a per-build bucket to side-step this issue entirely?

@pietroalbini
Copy link
Member

pietroalbini commented Nov 23, 2023

Are builds modified at any point post-builder assuming and using this role?

Builds are supposed to be immutable after being uploaded.

Is there a complexity tradeoff on doing a per-build bucket to side-step this issue entirely?

Regarding the complexity of one bucket per build, we upload roughly 10 "merge" builds a day plus a variable number of "try" builds a day, so we would quickly create a large amount of buckets and go over the bucket limit. We could ask to increase it, but I don't think it'd be worth it, because I don't think separate buckets would solve the problem.

The problem we're trying to solve is getting the unique identifier of the build (the commit hash) out of the JWT GitHub provides and into the AWS IAM policy. Until we address this we cannot prevent a build from overriding other builds, and once we address it there isn't much difference than scoping on s3://artifacts-${sha}/* and s3://artifacts/${sha}/*.

The three options we identified are:

  • Create a custom Lambda that validates the JWT and creates a scoped IAM session. This would mean we'd be responsible for validating the JWT though, which could lead to bad bugs. Idea
  • Use Cognito as a middleman between GitHub and AWS IAM, to turn the JWT claims into a format IAM understands. Implementation and Mark's concerns.
  • Use a separate, locked down reusable GitHub workflow to generate IAM sessions with the appropriate scoping. Mark's unimplemented thoughts

This grants access under a new directory prefix (rustc-builds-try) as a
temporary measure to avoid mistakes overwriting any actual artifacts. It
might be a good idea in any case to scope try builds into a different
bucket or place than real builds.
This uses Cognito as a dispatch authority to convert OIDC claims to IAM
condition values, and then fitlers the resulting role to only writing
into the passed sha.

See https://awsteele.com/blog/2023/10/25/aws-role-session-tags-for-github-actions.html for some related context.
See write up here:
rust-lang#355 (comment).
We decided not to pursue this for the time being and revisit at a later
point.

This reverts commit 2f7aefc.
@Mark-Simulacrum
Copy link
Member Author

As discussed with infra in a recent meeting, I've reverted the Cognito stuff here (per the concerns listed above). Rather than implementing one of the other options we're going to move ahead with the simpler approach of just limiting access to rustc-builds-try/* for now (perhaps relaxing that once we're happy with bors working as expected, perhaps using the reusable workflow approach discussed above, perhaps something else). I've gone ahead and deployed the changes in this PR so merging it.

@Mark-Simulacrum Mark-Simulacrum merged commit 10c525e into rust-lang:master Mar 9, 2024
3 checks passed
@Mark-Simulacrum Mark-Simulacrum deleted the add-ci-access branch March 9, 2024 20:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Setup S3 permissions for try builds on rust-lang/rust using OIDC
3 participants