Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Active interface for cf python: add active.do_active attribute and uri_analyzer() function #69

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

valeriupredoi
Copy link
Collaborator

@valeriupredoi valeriupredoi commented Mar 13, 2023

Hey folks, I have added two things:

  • I lifted the proto-checks we were doing in Active.__init__() on the file/uri, and moved them up in a dedicated file/uri examiner function called uri_analyzer() - currently it sits in the active module, which is sub-optimal, but I'll move it to a dedicated tools module at a later stage
  • once this is called from within Active it either raises, or returns bool
  • active = Active() now has an attribute do_active that is either True or False, if False no computation is happening, if True then all the active machinery is happening
  • currently the only setter for this active.do_active is the result of uri_analyzer (False if that returns False etc), but we can extend this

@valeriupredoi valeriupredoi added the enhancement New feature or request label Mar 13, 2023
@@ -10,6 +10,42 @@
from activestorage import netcdf_to_zarr as nz


def uri_analyzer(uri):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we want to assess if active storage reductions are possible, not just that the file exists?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes! But how do we do that? We need a set of standards here to get conditions in

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See below ....

uri_analyzer_result = uri_analyzer(self.uri)
self.do_active = uri_analyzer_result
# otherwise just get out and return active.do_active = False
if not self.do_active:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi V - I thought that if do_active is False then we do not want to return, rather we want to set _version to 2 (or is it 1) ...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeps indeed, I followed up on your change request, thanks, bud!

Copy link
Collaborator

@davidhassell davidhassell Jun 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Edit - this was a defunct version of this comment - I shall rewrite it below ....

Hi V, I don't get what uri_analyzer is doing. If the file doesn't exist, then we can do anything, so setting self._version = 2 won't help us.

From a cf-python perspective, I know that the file exists - the question is "is it in location that supports active storage requests?" If "yes" then self._version = 1, and if "no" then self._version = 2

@davidhassell
Copy link
Collaborator

Hi V - couldn't do this as a suggestions, so here it is:

    def __init__(self, uri, ncvar, missing_value=None, _FillValue=None, valid_min=None, valid_max=None):
        """
        Instantiate with a NetCDF4 dataset and the variable of interest within that file.
        (We need the variable, because we need variable specific metadata from within that
        file, however, if that information is available at instantiation, it can be provided
        using keywords and avoid a metadata read.)
        """
        # Assume NetCDF4 for now
        self.uri = uri
        if self.uri is None:
            raise ValueError(f"Must use a valid file for uri. Got {self.uri}")
        if not os.path.isfile(self.uri):
            raise ValueError(f"Must use existing file for uri. {self.uri} not found")
        self.ncvar = ncvar
        if self.ncvar is None:
            raise ValueError("Must set a netCDF variable name to slice")
        self.zds = None

        self._components = False
        self._method = None

        # New code. `remote_active` is a function that returns True if an active
        #           reduction is available on the physical storage, otherwise
        #           returns False.
        if remote_active(uri):
            # Do reduction on physical storage
            self._version = 2
        else:
            # Do reduction locally
            self._version = 1

        # obtain metadata, using netcdf4_python for now

@codecov
Copy link

codecov bot commented Mar 21, 2023

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.38 🎉

Comparison is base (526bff1) 87.22% compared to head (4bae515) 87.61%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #69      +/-   ##
==========================================
+ Coverage   87.22%   87.61%   +0.38%     
==========================================
  Files           6        6              
  Lines         415      428      +13     
==========================================
+ Hits          362      375      +13     
  Misses         53       53              
Impacted Files Coverage Δ
activestorage/active.py 96.20% <100.00%> (+0.34%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@valeriupredoi
Copy link
Collaborator Author

cheers much @davidhassell - I have just implemented the approach suggested by you - indeed - it do make sense 😁 Just waiting on the coverage test, I might have to improve the coverage

@valeriupredoi
Copy link
Collaborator Author

no need to to tweak coverage - codecov is happy 😁

@valeriupredoi
Copy link
Collaborator Author

@davidhassell you wanna have a last look and merge, bud? 🍺

@valeriupredoi
Copy link
Collaborator Author

@davidhassell frenly ping 🤏

Copy link
Collaborator

@davidhassell davidhassell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi V, Can't apologise enough for not getting on to this sooner, but better late than never, I hope :)

Parameters
----------
uri: str
input ile URI.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
input ile URI.
Input file URI.

uri_analyzer_result = uri_analyzer(self.uri)
self.do_active = uri_analyzer_result
# otherwise just get out and return active.do_active = False
if not self.do_active:
Copy link
Collaborator

@davidhassell davidhassell Jun 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Edit - this was a defunct version of this comment - I shall rewrite it below ....

Hi V, I don't get what uri_analyzer is doing. If the file doesn't exist, then we can do anything, so setting self._version = 2 won't help us.

From a cf-python perspective, I know that the file exists - the question is "is it in location that supports active storage requests?" If "yes" then self._version = 1, and if "no" then self._version = 2

@@ -10,6 +10,42 @@
from activestorage import netcdf_to_zarr as nz


def uri_analyzer(uri):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See below ....

@davidhassell
Copy link
Collaborator

davidhassell commented Jun 21, 2023

(to replace the struck through comment above)

Hi V, I don't get what uri_analyzer is doing. If the file doesn't exist, then we can do anything, so setting self._version = 2 won't help us.

From a cf-python perspective, I know that the file exists - the questions are "Is it in location that supports active storage requests?" and "Is the reduction method supported?" If the answer to either of these is "no", then I don't ant to modify the Dask graph.

I think we need an Active classmethod (called uri_analyzer perhaps :)) that can tell me if both the answers are "yes". The Active instance itself doesn't need to worry about this - it just tries to do version 1, falling back to version 2 if that doesn't work. There's no problem with it doing a file-exist check if it wants to, but to raise an Exception if it doesn't, rather than using the result to set the version (although if it doesn't exist you'll end up with a decent exception and message even if you don't).

Hope that makes sense!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants