Enhance DiscoverAndQueueGranules workflow to allow unlimited scalability #274

Closed
chuckwondo opened this issue Oct 17, 2023 · 2 comments
Labels
enhancement New feature or request infrastructure Create, update, or remove infrastructure

Comments

@chuckwondo
Collaborator

Currently, the DiscoverAndQueueGranules workflow is far more scalable than the out-of-box workflow provided by the core Cumulus examples. Out of the box, S3 discovery collapses at around 500K files (regardless of the number of files per granule), with the exact threshold depending on the Lambda configuration or on whether an ECS task is used in place of a Lambda function.

With the current "auto-chunking" looping logic in the workflow, the number of files that can be discovered would be unlimited, if it weren't for the AWS limit of 25,000 events in the execution history of a step function. By very rough calculations, this allows us to ingest a span of about 2.5 years of granules. However, because constructing Cumulus rules that span 2.5 years is cumbersome and unintuitive, we currently construct 1 rule per year for each collection.
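Splitting a collection's temporal range into one window per calendar year (the current 1-rule-per-year approach) can be sketched as below. This is a minimal illustration, not code from this project: the helper name `yearly_ranges` is hypothetical, and how each `(start, end)` pair is mapped onto the fields of a Cumulus rule is assumed to be handled elsewhere.

```python
from datetime import datetime, timezone


def yearly_ranges(start: datetime, end: datetime) -> list[tuple[datetime, datetime]]:
    """Split the half-open interval [start, end) into consecutive windows,
    each covering at most one calendar year (hypothetical helper)."""
    ranges = []
    current = start
    while current < end:
        # Midnight on January 1 of the year after `current` (tzinfo is preserved).
        next_year = current.replace(
            year=current.year + 1, month=1, day=1,
            hour=0, minute=0, second=0, microsecond=0,
        )
        window_end = min(next_year, end)
        ranges.append((current, window_end))
        current = window_end
    return ranges


if __name__ == "__main__":
    # Illustrative collection temporal range; each window would back one rule.
    for lo, hi in yearly_ranges(
        datetime(2020, 6, 1, tzinfo=timezone.utc),
        datetime(2022, 3, 1, tzinfo=timezone.utc),
    ):
        print(lo.isoformat(), "->", hi.isoformat())
```

The number of rules grows linearly with the length of the collection's temporal range, which is exactly the bookkeeping that a single rule per collection would eliminate.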

The ideal situation (while still leveraging existing S3 discovery capabilities) would be to create 1 rule per collection, spanning the entirety of the temporal range of the collection, regardless of how many files that includes. This was the original goal of the "auto-chunking" looping workflow, until the 25K event limit on step function executions was reached.
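The reason a looping workflow eventually hits a ceiling is simple arithmetic: every pass through the loop appends a roughly fixed number of events to the same execution history. The sketch below assumes a hypothetical `events_per_iteration` and `fixed_overhead_events`; the real values depend on how many states each loop pass executes in this workflow and are not measured here.

```python
# AWS Step Functions quota on execution history size for a Standard workflow.
EXECUTION_HISTORY_LIMIT = 25_000


def max_loop_iterations(events_per_iteration: int, fixed_overhead_events: int = 0) -> int:
    """Largest number of loop passes that fit under the event-history limit.

    `events_per_iteration` and `fixed_overhead_events` are illustrative
    assumptions, not values taken from the DiscoverAndQueueGranules workflow.
    """
    if events_per_iteration <= 0:
        raise ValueError("events_per_iteration must be positive")
    budget = EXECUTION_HISTORY_LIMIT - fixed_overhead_events
    return max(budget, 0) // events_per_iteration
```

Whatever the per-iteration cost, the bound is finite, so the files discoverable by one execution are capped at roughly `max_loop_iterations(...)` times the chunk size. Only moving iterations out of the parent's history removes the cap.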

More recently, I discovered that "Map" tasks within step functions support a "distributed" mode, in which each "iteration" of a Map task runs as a separate child execution and therefore does not contribute to the event count of the main workflow. This means we can replace the looping logic with a distributed Map task and avoid getting anywhere close to the 25K event limit in any individual workflow or Map task.
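A distributed Map state in Amazon States Language is an ordinary `Map` state whose `ItemProcessor.ProcessorConfig.Mode` is `DISTRIBUTED`. The sketch below builds such a definition as a Python dict; it is not the definition from the fix for this issue. The input field `$.chunks`, the state names, and the `Pass` placeholder (standing in for the actual discovery Lambda or ECS task) are all hypothetical.

```python
import json


def distributed_map_state(child_start_at: str, child_states: dict,
                          next_state: str, max_concurrency: int = 10) -> dict:
    """Build an ASL Map state that runs each item as a separate child execution."""
    return {
        "Type": "Map",
        # Hypothetical input field holding the list of discovery chunks.
        "ItemsPath": "$.chunks",
        "MaxConcurrency": max_concurrency,
        "ItemProcessor": {
            # DISTRIBUTED mode: child executions keep their own event histories,
            # so they do not count toward the parent's 25,000-event limit.
            "ProcessorConfig": {"Mode": "DISTRIBUTED", "ExecutionType": "STANDARD"},
            "StartAt": child_start_at,
            "States": child_states,
        },
        "Next": next_state,
    }


definition = {
    "StartAt": "DiscoverAllChunks",
    "States": {
        "DiscoverAllChunks": distributed_map_state(
            child_start_at="DiscoverChunk",
            # Placeholder for the real per-chunk discovery task.
            child_states={"DiscoverChunk": {"Type": "Pass", "End": True}},
            next_state="Done",
        ),
        "Done": {"Type": "Succeed"},
    },
}

if __name__ == "__main__":
    print(json.dumps(definition, indent=2))
```

With this shape the parent execution records only a handful of events for the Map state itself, regardless of how many chunks `$.chunks` contains.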

@chuckwondo chuckwondo added enhancement New feature or request infrastructure Create, update, or remove infrastructure labels Oct 17, 2023
@chuckwondo chuckwondo self-assigned this Oct 17, 2023
@chuckwondo
Collaborator Author

Related PR: #278

@chuckwondo
Collaborator Author

Fixed by #278
