Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DEVPROD-12040: Add conditional writes to s3.put #8464

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

drichmdb
Copy link
Contributor

@drichmdb drichmdb commented Nov 7, 2024

DEVPROD-12040

Description

This commit updates s3.put to support S3's new conditional writes
feature. The conditional writes feature guarantees that a write will not
succeed if the remote key already exists. This check respects s3's
strong read-after-write data model.

In order to enable this feature, I updated to the latest version of pail.
This required bumping the minimum go version of this project to 1.21.

While adding this feature, I realized that the existing skip_existing
implementation is buggy and has not been working correctly because of the
final check we do for updatedFiles equaling fileList. This commit also
fixes that bug.

Testing

I have also added a fully automated test for s3.put that uses this flag. We don't really have any automated tests against networked resources for agent commands right now, but I feel that this would be a positive change for the reliability of agent command changes. With an automated test, we don't need to rely on exhaustive testing for every scenario in staging. Writing this test was what caught the bug this PR has now fixed with the current skip_existing implementation.

My idea here is that for testing S3 commands specifically, we can use a bucket that has an aggressive lifecycle policy. Files we write can be removed within hours or days to prevent permanent growth. All files can be written with random IDs to avoid state being clobbered. I've made the minimum changes to somewhat set this up, but as of me marking this ready for review, my test will fail because I have not provisioned a bucket for these purposes and also have not added any credentials to access a bucket to Evergreen's Evergreen variables. If the team is agreed that this is a good approach, I can use whichever AWS account/credentials are deemed appropriate for these resources.

If the consensus is against adding this type of testing, I can remove the test and only test manually in staging.

@drichmdb drichmdb force-pushed the DEVPROD-12040 branch 5 times, most recently from 6c442a8 to 753079e Compare November 8, 2024 21:21
@drichmdb drichmdb requested review from a team and ablack12 November 8, 2024 21:31
@drichmdb drichmdb marked this pull request as ready for review November 8, 2024 21:31
// SkipExisting, when set to 'true', will not upload files if they already
// exist in s3. This will not cause the s3.put command to fail. When set to
// 'error' instead, the command will fail if a remote file already exists.
// This beavior respects s3's strong read-after-write consistency model.
SkipExisting string `mapstructure:"skip_existing" plugin:"expand"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think skip_existing: error doesn't actually make a ton of logical sense -- we aren't skipping an existing file (i.e. succeeding), we're choosing to fail. Parameters are free so I think a new parameter like error_on_existing: true is more clear. This also maintains our behavior of parsing booleans.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After thinking this over I agree with you here and also think this doesn't make sense as a feature at all. I reverted back to the old behavior where this is just toggled as a plain boolean and there is no option to error out.

// SkipExisting, when set to 'true', will not upload files if they already
// exist in s3. This will not cause the s3.put command to fail. When set to
// 'error' instead, the command will fail if a remote file already exists.
// This beavior respects s3's strong read-after-write consistency model.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: behavior

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed!

@ablack12
Copy link
Contributor

Adding a test that adds files seems fine to me if we clean up quickly, and we aren't concerned about failures on task retries. @Kimchelly could you weigh in on this proposed testing change?

@ablack12 ablack12 requested a review from Kimchelly November 15, 2024 22:21
This commit updates s3.put to support S3's new conditional writes
feature. The conditional writes feature guarantees that a write will not
succeed if the remote key already exists. This check respects s3's
strong read-after-write data model.

In order to enable this feature, I updated to the latest version of pail.
This required bumping the minimum go version of this project to 1.21.

While adding this feature, I realized that the existing skip_existing
implementation is buggy and has not been working correctly because of the
final check we do for updatedFiles equaling fileList. This commit also
fixes that bug.
Copy link
Contributor

@Kimchelly Kimchelly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If pail is using go1.21, it likely has to be reverted or reworked to use go1.20 because Evergreen relies on pail and we can't upgrade Evergreen to go1.21 until DEVPROD-3611.

As for the temporary testing bucket, the idea seems reasonable to me.

@@ -797,7 +799,7 @@ buildvariants:
goos: linux
goarch: amd64
nodebin: /opt/node/bin
GOROOT: /opt/golang/go1.20
GOROOT: /opt/golang/go1.21
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't use go1.21 in our tests because Evergreen is not allowed to compile for go 1.21 yet (DEVPROD-3611).

var ae smithy.APIError

if errors.As(err, &ae) {
if ae.ErrorCode() == "PreconditionFailed" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a comment clarifying what "PreconditionFailed" is here/the significance in S3?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure! This is the error that setting IfNoneExists will trigger when set, and it means that the file already exists.

Copy link
Contributor

@Kimchelly Kimchelly Nov 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a comment to the code as well saying this so that future readers can understand what the significance of this error is?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, absolutely

return errors.Wrapf(err, "checking if file '%s' exists", remoteName)
}
if exists {
logger.Task().Infof("Not uploading file '%s' because remote file '%s' already exists. Continuing to upload other files.", fpath, remoteName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we keep this log in the new implementation? It's useful for users debugging their tasks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, agreed!

@@ -287,6 +287,8 @@ functions:
JIRA_TOKEN_SECRET: ${jira_token_secret}
JIRA_CONSUMER_KEY: ${jira_consumer_key}
PAPERTRAIL_KEY_ID: ${papertrail_key_id}
AWS_KEY: ${self_tests_aws_key}
AWS_SECRET: ${self_tests_aws_secret}
Copy link
Contributor

@Kimchelly Kimchelly Nov 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of re-introducing static AWS credentials, can we instead add permissions to the assumed role in the self-tests? The infra team removed our AWS static creds and swapped them out for assuming a role because it's better practice and support for static credentials in S3 operations will be removed in DEVPROD-5553.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely! If there's already a role that exists during the self-tests I am happy to remove this and use the default identity for the tests

@drichmdb
Copy link
Contributor Author

@Kimchelly Unfortunately the AWS SDK requires go 1.21 for the new conditional put support (and so Pail also requires that). Given that we are still using MacOS 10.14, it seems like this PR may need to stay blocked until we can get rid of that OS, unfortunately.

@ablack12
Copy link
Contributor

@drichmdb sorry we're just looking at this -- for this team, we rely on re-requesting review to know when we should look again.

@drichmdb
Copy link
Contributor Author

@ablack12 No worries! I have actually picked up DEVPROD-6063 in an attempt to unblock DEVPROD-3611, so this PR can sit for a little while.

Copy link
Contributor

@Kimchelly Kimchelly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It sounds like this is blocked for now, but let us know if/when it becomes unblocked.

var ae smithy.APIError

if errors.As(err, &ae) {
if ae.ErrorCode() == "PreconditionFailed" {
Copy link
Contributor

@Kimchelly Kimchelly Nov 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a comment to the code as well saying this so that future readers can understand what the significance of this error is?

Copy link
Contributor

@ablack12 ablack12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly commenting to wait on review

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants