Target store returned from StoreToZarr without waiting on completion of StoreDatasetFragments #564

Closed
cisaacstern opened this issue Aug 18, 2023 · 1 comment · Fixed by #574
Labels: bug (Something isn't working)

This is my mistake. When I merged #562, I did not understand that the order of Python statements does not determine the sequencing of the Beam DAG. I now see that by not chaining the returned target_store onto the output PCollection of StoreDatasetFragments here

rechunked_datasets | StoreDatasetFragments(target_store=target_store)
return target_store

Beam has no data dependency forcing StoreDatasetFragments to finish first, so target_store can be returned before StoreDatasetFragments is complete. AFAICT the solution discussed in #556 (comment) makes the most sense. To summarize:

  1. return target_store from StoreDatasetFragments, so that it emits a PCollection of n target_stores (one per stored fragment)
  2. take a fixed-size sample (beam.combiners.Sample) from the target stores PCollection, so we get a singleton PCollection containing a single target_store
  3. return that singleton target_store

xref leap-stc/cmip6-leap-feedstock#9 (comment)

cisaacstern added the bug label Aug 18, 2023

cisaacstern commented Aug 18, 2023

I can work on this Monday
