Thread safety? #18

Closed
ofuhrer opened this issue Nov 12, 2019 · 4 comments · Fixed by #21

Comments

@ofuhrer
Contributor

ofuhrer commented Nov 12, 2019

If I run refresh_downloaded_data() while another job is running write_run_directory(), fv3config fails. What kind of guarantees does fv3config provide when it is executed in multiple concurrent streams for the same user?

@mcgibbon
Collaborator

I'd say it's worth having thread safety. The way to achieve this wouldn't be through unit testing but rather through API design.

The easiest way to achieve "thread safety" for this feature would probably be to remove refresh_downloaded_data() from the public API and have it accessible only as a stand-alone endpoint. In practice there should be no reason for a script to call refresh_downloaded_data(). I only added it because a user may want to run it manually if the downloaded data somehow gets corrupted.

For any other operations, we can maintain that thread safety by having only non-destructive operations on the data cache.
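
To illustrate what a "non-destructive" cache operation could look like, here is a minimal sketch that stages a file next to its final location and then renames it into place, so a concurrent reader sees either the old file or the new one, never a partial write. The function name `write_cache_file_atomically` and its arguments are hypothetical and not part of fv3config; it also assumes `name` is a plain file name with no subdirectories.

```python
import os
import tempfile


def write_cache_file_atomically(cache_dir, name, data):
    """Write `data` (bytes) to cache_dir/name without exposing a partial file.

    Hypothetical helper for illustration; not an fv3config function.
    """
    os.makedirs(cache_dir, exist_ok=True)
    target = os.path.join(cache_dir, name)
    # Stage in the same directory so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=cache_dir, prefix=name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        # os.replace swaps the file in one step on POSIX filesystems.
        os.replace(tmp_path, target)
    except BaseException:
        # Clean up the staging file if anything went wrong before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return target
```

With this pattern nothing in the cache is deleted before its replacement is complete, which is the property that would let readers such as write_run_directory() run alongside a refresh.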

@mcgibbon
Collaborator

@ofuhrer do you agree with my proposed fix for this issue?

@ofuhrer
Contributor Author

ofuhrer commented Nov 12, 2019

I think that's a great idea for a v1.0 version of thread safety. You could still get a race if the first download takes a long time and another thread tries to use fv3config in the meantime, but I think that's a more academic example. I just happened to run into the above problem while playing with fv3config and did not understand what had happened at first.
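
For reference, one common way to close that first-download race is an exclusive lock file plus a "complete" marker that is written last. The sketch below is only an illustration under assumptions: `ensure_downloaded`, the `.lock` and `.complete` file names, and the `download` callback are all hypothetical and not part of fv3config.

```python
import os
import time


def ensure_downloaded(cache_dir, download, poll_seconds=0.1, timeout=60.0):
    """Run `download(cache_dir)` at most once; concurrent callers wait for it.

    Hypothetical helper for illustration; not an fv3config function.
    """
    done_marker = os.path.join(cache_dir, ".complete")
    lock_path = os.path.join(cache_dir, ".lock")
    os.makedirs(cache_dir, exist_ok=True)
    deadline = time.monotonic() + timeout
    while not os.path.exists(done_marker):
        try:
            # O_EXCL makes creation fail if another process holds the lock.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            if time.monotonic() > deadline:
                raise TimeoutError("timed out waiting for another download")
            time.sleep(poll_seconds)
            continue
        try:
            os.close(fd)
            # Re-check: another process may have finished while we waited.
            if not os.path.exists(done_marker):
                download(cache_dir)
                # Write the marker last, so its presence implies a full download.
                open(done_marker, "w").close()
        finally:
            os.remove(lock_path)
    return cache_dir
```

A process that crashes while holding the lock would leave a stale `.lock` file behind, so a real implementation would need some recovery path for that case.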

Generally: I think it would be interesting to think about the domain model of fv3config. What happens if somebody updates the data or namelists on GCP? Is that "illegal"? Is the data on GCP "versioned", so that a release of fv3config always gets the same version of the data (until the user updates fv3config)? How is data provenance handled when using fv3config for a series of simulations?

@mcgibbon
Collaborator

mcgibbon commented Nov 12, 2019

I think what we may want to do is refactor the internal representation of remote data options to use a user-specific configuration file. That way we as a group can use our own configuration file and handle versioning/updating in any way we want to internally without exposing these possibly rapid changes to the public. The way the code is set up right now is a lot like this (a mapping from option names to remote URLs), so it would not take much work to move this logic into a configuration file. Then you could also version your files and add/remove options fairly easily.
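
As a rough sketch of the configuration-file idea, a mapping from option names to remote URLs could be stored as JSON and validated on load. Everything here is an assumption made for illustration: the function names, the option names, and the `gs://example-bucket/...` URLs are placeholders, not the actual fv3config options or file format.

```python
import json


def load_data_options(config_text):
    """Parse a JSON mapping of option name -> remote URL and validate it.

    Hypothetical helper for illustration; not an fv3config function.
    """
    options = json.loads(config_text)
    if not isinstance(options, dict):
        raise ValueError("config must be a JSON object")
    for name, url in options.items():
        if not isinstance(url, str) or "://" not in url:
            raise ValueError(f"option {name!r} does not map to a URL: {url!r}")
    return options


def resolve_option(options, name):
    """Return the remote URL for `name`, listing valid options on failure."""
    try:
        return options[name]
    except KeyError:
        valid = sorted(options)
        raise ValueError(f"unknown option {name!r}; valid options: {valid}") from None


# Placeholder config: option names and URLs are made up for this example.
EXAMPLE_CONFIG = """
{
  "initial_conditions_v1": "gs://example-bucket/v1/initial_conditions.tar.gz",
  "forcing_v1": "gs://example-bucket/v1/forcing.tar.gz"
}
"""
```

Putting a version into the option name or the URL path, as in the placeholder config, is one way a group could pin data versions in its own file without changing the package.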

Opened #20 to deal with some of these issues.
