-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Active interface for cf python: add active.do_active
attribute and uri_analyzer()
function
#69
base: main
Are you sure you want to change the base?
Conversation
@@ -10,6 +10,42 @@ | |||
from activestorage import netcdf_to_zarr as nz | |||
|
|||
|
|||
def uri_analyzer(uri): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we want to assess if active storage reductions are possible, not just that the file exists?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes! But how do we do that? We need a set of standards here to get conditions in
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See below ....
uri_analyzer_result = uri_analyzer(self.uri) | ||
self.do_active = uri_analyzer_result | ||
# otherwise just get out and return active.do_active = False | ||
if not self.do_active: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi V - I thought that if do_active
is False then we do not want to return, rather we want to set _version
to 2
(or is it 1
) ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeps indeed, I followed up on your change request, thanks, bud!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Edit - this was a defunct version of this comment - I shall rewrite it below ....
Hi V, I don't get what uri_analyzer
is doing. If the file doesn't exist, then we can do anything, so setting self._version = 2
won't help us.
From a cf-python perspective, I know that the file exists - the question is "is it in location that supports active storage requests?" If "yes" then self._version = 1
, and if "no" then self._version = 2
Hi V - couldn't do this as a suggestions, so here it is: def __init__(self, uri, ncvar, missing_value=None, _FillValue=None, valid_min=None, valid_max=None):
"""
Instantiate with a NetCDF4 dataset and the variable of interest within that file.
(We need the variable, because we need variable specific metadata from within that
file, however, if that information is available at instantiation, it can be provided
using keywords and avoid a metadata read.)
"""
# Assume NetCDF4 for now
self.uri = uri
if self.uri is None:
raise ValueError(f"Must use a valid file for uri. Got {self.uri}")
if not os.path.isfile(self.uri):
raise ValueError(f"Must use existing file for uri. {self.uri} not found")
self.ncvar = ncvar
if self.ncvar is None:
raise ValueError("Must set a netCDF variable name to slice")
self.zds = None
self._components = False
self._method = None
# New code. `remote_active` is a function that returns True if an active
# reduction is available on the physical storage, otherwise
# returns False.
if remote_active(uri):
# Do reduction on physical storage
self._version = 2
else:
# Do reduction locally
self._version = 1
# obtain metadata, using netcdf4_python for now |
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## main #69 +/- ##
==========================================
+ Coverage 87.22% 87.61% +0.38%
==========================================
Files 6 6
Lines 415 428 +13
==========================================
+ Hits 362 375 +13
Misses 53 53
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report in Codecov by Sentry. |
cheers much @davidhassell - I have just implemented the approach suggested by you - indeed - it do make sense 😁 Just waiting on the coverage test, I might have to improve the coverage |
no need to to tweak coverage - codecov is happy 😁 |
@davidhassell you wanna have a last look and merge, bud? 🍺 |
@davidhassell frenly ping 🤏 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi V, Can't apologise enough for not getting on to this sooner, but better late than never, I hope :)
Parameters | ||
---------- | ||
uri: str | ||
input ile URI. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
input ile URI. | |
Input file URI. |
uri_analyzer_result = uri_analyzer(self.uri) | ||
self.do_active = uri_analyzer_result | ||
# otherwise just get out and return active.do_active = False | ||
if not self.do_active: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Edit - this was a defunct version of this comment - I shall rewrite it below ....
Hi V, I don't get what uri_analyzer
is doing. If the file doesn't exist, then we can do anything, so setting self._version = 2
won't help us.
From a cf-python perspective, I know that the file exists - the question is "is it in location that supports active storage requests?" If "yes" then self._version = 1
, and if "no" then self._version = 2
@@ -10,6 +10,42 @@ | |||
from activestorage import netcdf_to_zarr as nz | |||
|
|||
|
|||
def uri_analyzer(uri): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See below ....
(to replace the struck through comment above) Hi V, I don't get what From a cf-python perspective, I know that the file exists - the questions are "Is it in location that supports active storage requests?" and "Is the reduction method supported?" If the answer to either of these is "no", then I don't ant to modify the Dask graph. I think we need an Hope that makes sense! |
Hey folks, I have added two things:
Active.__init__()
on the file/uri, and moved them up in a dedicated file/uri examiner function calleduri_analyzer()
- currently it sits in theactive
module, which is sub-optimal, but I'll move it to a dedicatedtools
module at a later stageActive
it either raises, or returnsbool
active = Active()
now has an attributedo_active
that is either True or False, if False no computation is happening, if True then all the active machinery is happeningactive.do_active
is the result ofuri_analyzer
(False if that returns False etc), but we can extend this