
"Granule not found" errors occur when "duplicateHandling" set to "skip" #65

Open
chuckwondo opened this issue Jul 8, 2022 · 1 comment
Labels
bug Something isn't working

chuckwondo commented Jul 8, 2022

For the PSScene3Band collection, setting "duplicateHandling" to "skip" (rather than "replace") to avoid unnecessary ingestion (and related costs) causes the DiscoverGranules step of the DiscoverAndQueueGranules workflow to fail with "granule not found" errors. This happens for the same reason as #32. We must somehow prefix the granule IDs with PSScene3Band- before discovery checks for duplicates. That is harder than the fix for #32, because Cumulus provides no means to insert custom logic between the "list granules" step and the "check for duplicates" step, so we cannot tweak the granule IDs after they are listed but before they are checked for duplicates.
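To make the mismatch concrete, here is a minimal sketch. It assumes already-ingested granules are stored under prefixed IDs while discovery lists them unprefixed. The `prefixGranuleId` and `isDuplicate` helpers, the in-memory `ingested` set, and the sample granule ID are all hypothetical. The sketch only models the ID mismatch, not the actual Cumulus lookup or the error it raises.

```typescript
// Hypothetical helper (not part of Cumulus): adds "<collectionName>-" to a
// granule ID unless the ID already carries that prefix.
function prefixGranuleId(collectionName: string, granuleId: string): string {
  const prefix = `${collectionName}-`;
  return granuleId.startsWith(prefix) ? granuleId : `${prefix}${granuleId}`;
}

// Stand-in for the record of already-ingested granules.
// Assumption: ingested granules are stored under prefixed IDs.
const ingested = new Set<string>(["PSScene3Band-20220708_000000_0001"]);

// Stand-in for the duplicate check that runs during discovery.
function isDuplicate(granuleId: string): boolean {
  return ingested.has(granuleId);
}

// Made-up granule ID as it might be listed from the provider (no prefix).
const listedId = "20220708_000000_0001";

// Without the prefix, the lookup misses the ingested granule.
const foundWithoutPrefix = isDuplicate(listedId); // false

// With the prefix applied before the check, the granule is recognized.
const foundWithPrefix = isDuplicate(prefixGranuleId("PSScene3Band", listedId)); // true
```

The helper leaves an already-prefixed ID unchanged, so applying it both during and after discovery would not double the prefix.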

Acceptance criteria: Configuring "duplicateHandling" as "skip" on the PSScene3Band collection does not produce "granule not found" errors during discovery, and properly skips granules that have already been ingested. The logic should also work for other collections, but given that we currently have only the PSScene3Band collection available, testing against other collections is not required at this point.

chuckwondo added the bug label Jul 8, 2022

chuckwondo commented

One approach to consider is to use proxyquire to "inject" custom logic for the list method of the "s3" protocol provider in Cumulus. We could possibly reuse our existing logic that adds the collection name as a prefix to the granule IDs, but apply it inside a custom list method instead of after discovery is complete. That could mean either "injecting" a replacement list implementation, or subclassing the Cumulus S3ProviderClient class and overriding its list method. Subclassing would also require proxyquire to override the providerClientUtils.buildProviderClient function, so that use of the "s3" protocol is "intercepted" and our subclass is used instead.
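A rough sketch of the subclass-and-override idea follows. It uses stand-in classes rather than the real Cumulus code: `StubS3ProviderClient`, the `ListedFile` shape, the signature of `list`, and the local `buildProviderClient` are all assumptions made for illustration. The actual proxyquire wiring is left out. Whether prefixing the listed file name is enough to yield a prefixed granule ID, and whether later steps can still locate the file, would need checking against Cumulus itself.

```typescript
// Shape of a listed file in this sketch; the real Cumulus shape may differ.
interface ListedFile {
  name: string;
  path: string;
}

// Stand-in for the Cumulus S3ProviderClient (NOT the real class).
class StubS3ProviderClient {
  async list(path: string): Promise<ListedFile[]> {
    // Made-up listing result, for illustration only.
    return [{ name: "20220708_000000_0001.tif", path }];
  }
}

// Subclass that prefixes every listed name with "<collectionName>-".
class PrefixingS3ProviderClient extends StubS3ProviderClient {
  collectionName: string;

  constructor(collectionName: string) {
    super();
    this.collectionName = collectionName;
  }

  async list(path: string): Promise<ListedFile[]> {
    const files = await super.list(path);
    // Copy each entry and change only its name.
    return files.map((file) => ({
      ...file,
      name: `${this.collectionName}-${file.name}`,
    }));
  }
}

// Stand-in for the override of providerClientUtils.buildProviderClient:
// "s3" is routed to the subclass; other protocols are out of scope here.
function buildProviderClient(
  protocol: string,
  collectionName: string
): StubS3ProviderClient {
  if (protocol === "s3") {
    return new PrefixingS3ProviderClient(collectionName);
  }
  throw new Error(`Protocol not covered by this sketch: ${protocol}`);
}
```

In a real implementation the override would presumably delegate to the original buildProviderClient for every protocol other than "s3" rather than throwing.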
