
Feature/regions stats utils #64

Open
wants to merge 34 commits into base: develop
Conversation

sherrieF
Collaborator

This pull request is for the branch feature/regions_stats_utils. It adds the new region_utils.py script at score-hv/src/score_hv/region_utils.py. The region_utils.py code provides an instantiation method, an add_user_region method, and a get_user_region method.
The main change to stats_utils.py in score-hv/src/score_hv/ was to move the calculation of the temporal_means array out of stats_utils and into the script that calls the stats_utils methods; in this case, the daily_bfg.py harvester. The daily_bfg.py harvester now calculates the temporal mean, which is then passed into the stats_utils methods to calculate the desired statistics. Code was added to daily_bfg.py to work with the region or regions requested by the user. If no region is requested, a global default is used.
The file test_harvester_daily_bfg_prateb.py in the test directory was modified to test all of the statistics returned by the daily_bfg.py harvester. Five regions are requested in this file, and the mean, variance, minimum, and maximum values are returned from the daily_bfg.py harvester. The hard-coded reference values for these statistics were calculated offline.
The file test_harvester_daily_toa_radative_flux.py was modified to work with the new daily_bfg.py harvester.
The only change made to test_harvester_daily_bfg_tmp2m.py was the removal of white space.
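As a rough illustration of the region_utils workflow described above, here is a minimal sketch. The class and method signatures are assumptions for illustration only; the actual score-hv API may differ.

```python
# Minimal sketch of the region_utils workflow; names and signatures are
# illustrative assumptions, not the actual score-hv implementation.
class GeoRegionsCatalog:
    """Stores user-requested regions and supplies a global default."""

    GLOBAL_DEFAULT = {'min_lat': -90, 'max_lat': 90,
                      'west_lon': 0, 'east_lon': 360}

    def __init__(self):
        self.regions = {}

    def add_user_region(self, name, min_lat, max_lat, west_lon, east_lon):
        # Reject out-of-range latitudes up front, as the PR's bounds
        # checking is described as doing.
        if not -90 <= min_lat <= max_lat <= 90:
            raise ValueError(f'invalid latitude range: {min_lat}, {max_lat}')
        self.regions[name] = {'min_lat': min_lat, 'max_lat': max_lat,
                              'west_lon': west_lon, 'east_lon': east_lon}

    def get_user_region(self, name):
        # Fall back to the global default when no region was requested.
        return self.regions.get(name, self.GLOBAL_DEFAULT)
```

A caller such as the daily_bfg.py harvester could then look up each requested region before computing statistics.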

sherrieF added 5 commits April 4, 2024 15:46
The calculation was taken out of the stats_utils python code.
script are within bounds was added to the add_user_region function in
the region_utils script.
daily_bfg.py.  Changed test_harvester_daily_toa_radative_flux.py to work
with the new version of daily_bfg.py.
Collaborator

@jrknezha jrknezha left a comment


This is a great functionality to get added! Really going to expand our harvest capabilities. I have a few comments that need to be addressed. The most important being that we can't require the east_lon > west_lon so we can have regions that include the prime meridian.

'netrf_avetoa', # top of atmosphere net radiation flux (SW and LW) (W m^-2)
#'prate_ave', # surface precip rate (mm weq. s^-1)
'prateb_ave', # bucket surface precip rate (mm weq. s^-1)
'netrf_avetoa',#top of atmoshere net radiative flux (SW and LW) (W/m**2)
Collaborator


tiny thing: atmosphere is missing the p

msg=f"No latitude values found within the specified range of {min_lat} and {max_lat}."
raise KeyError(msg)

desired_longitude_indices=[index for index, lon in enumerate(longitudes) if east_lon <= lon <= west_lon]
Collaborator


I think there may be issues with this code in the future because we don't guarantee that east_lon is less than west_lon, and we don't want to, so that we can have regions surrounding the prime meridian. Can we update this to handle these cases?
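One way to handle the wrap-around case is sketched below with NumPy. This is illustrative only, not the PR's code; `longitude_indices` is a hypothetical helper name.

```python
import numpy as np

def longitude_indices(longitudes, west_lon, east_lon):
    # When a region crosses the prime meridian (e.g. west_lon=340,
    # east_lon=20), select cells on either side of the wrap instead of
    # requiring west_lon <= east_lon.
    longitudes = np.asarray(longitudes)
    if west_lon <= east_lon:
        mask = (longitudes >= west_lon) & (longitudes <= east_lon)
    else:
        # Region wraps past 360/0: keep lons >= west OR <= east.
        mask = (longitudes >= west_lon) | (longitudes <= east_lon)
    return np.nonzero(mask)[0]
```

For example, with longitudes [0, 10, 180, 350], a region with west 340 and east 20 selects the cells at 0, 10, and 350.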

"""
if statistic == 'mean':
themeans=var_stats_instance.weighted_averages[iregion]
value=themeans
Collaborator


why not just set value directly: value=var_stats_instance.weighted_averages[iregion] ?

Collaborator


same comment for the variables below. I think it's cleaner to set it directly

f'max_lat: {new_max_lat}'
raise ValueError(msg)

if new_east_lon > new_west_lon:
Collaborator


We can't require this here because we need to be able to have regions that go around the prime meridian; for example, west: 340 and east: 20 should be a viable option.

This is partly because we require that the values be between 0 and 360.

weighted_average=self.calculate_weighted_average(var_data,gridcell_area_weights,temporal_mean)
if stat=='variance':
self.calculate_var_variance(gridcell_area_weights,temporal_mean,weighted_average)
Collaborator


How is calculate_var_variance supposed to get the weighted_average value if it's in a different if statement? What happens if someone only asks for variance; wouldn't it break?

If the variance is dependent on the weighted_average, it needs to be calculated within the if statement for variance as well.
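The fix this comment asks for can be sketched as follows, with hypothetical helper names (the real stats_utils signatures differ): the weighted average is computed inside the variance path, so a variance-only request is self-sufficient.

```python
import numpy as np

def weighted_mean(values, weights):
    # Area-weighted mean over grid cells.
    return np.sum(weights * values) / np.sum(weights)

def weighted_variance(values, weights):
    # Recompute the weighted mean locally rather than relying on a value
    # produced only when 'mean' was also requested.
    mean = weighted_mean(values, weights)
    return np.sum(weights * (values - mean) ** 2) / np.sum(weights)
```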

gridcell_area_data.close()

assert global_means[iregion] <= (1 + tolerance) * harvested_tuple.value
assert global_means[iregion] <= (1 + tolerance) * harvested_tuple.value
Collaborator


this assert statement is the exact same as the one above it. please remove the duplicate or change it to check something else

"""
if harvested_tuple.statistic == 'variance':
assert variances[iregion] <= (1 + tolerance) * harvested_tuple.value
assert variances[iregion] <= (1 + tolerance) * harvested_tuple.value
Collaborator


same thing here with a duplicated assert statement

gridcell_area_data.close()
if harvested_tuple.statistic == 'minimum':
assert min_values[iregion] <= (1 + tolerance) * harvested_tuple.value
assert min_values[iregion] <= (1 + tolerance) * harvested_tuple.value
Collaborator


also a duplicated assert statement

for i, harvested_tuple in enumerate(data1):
if harvested_tuple.statistic == 'maximum':
assert max_values[iregion] <= (1 + tolerance) * harvested_tuple.value
assert max_values[iregion] <= (1 + tolerance) * harvested_tuple.value
Collaborator


also a duplicate assert statement

value = summation/(file_count + 1)
ivar_weighted_mean = np.ma.sum(norm_weights * value)
assert ivar_weighted_mean <= (1 + tolerance) * weighted_means[ivar]
assert ivar_weighted_mean >= (1 - tolerance) * weighted_means[ivar]
Collaborator


is this the assert tolerance pattern that should be used in those duplicates above?
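The two-sided pattern quoted above can be captured in one helper (illustrative; assumes a positive reference value):

```python
def within_tolerance(reference, harvested, tolerance):
    # Brackets harvested within +/- tolerance of reference; this is the
    # two-sided check the duplicated one-sided asserts should become.
    # Assumes reference > 0 (the inequalities flip for negative references).
    return ((harvested <= (1 + tolerance) * reference) and
            (harvested >= (1 - tolerance) * reference))
```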

@sherrieF
Collaborator Author

sherrieF commented Apr 19, 2024 via email

west_longitude value so a region can cross the prime meridian.
Changed the get_region function to get_region_coordinates.  Put the
region coordinate values in the regions class.
put it in the region class.  Added the calculation of the weighted
average to the variance calculation function.
and put it in the region_utils class.
took out the iregion loop.  Now the stats_utils returns
all calculated statistics at once.
Contributor

@amschne amschne left a comment


Great to see the underlying capability to harvest regional metrics take form! I think most of the difficulty is behind us.

That said, I have a few major comments.

  • Most of region_utils.py is functionally sound. However, I am having a difficult time understanding get_region_coordinates(). Hopefully, simply talking it through can help me understand the routine, and then I can provide more helpful feedback.

  • I suggest changing the structure of the regions part of the request dictionary. I provided some suggestions on how to do so, which would remove some complexity toward handling region requests in daily_bfg.py.

  • Some additional exception handling is needed to help catch (1) bad requests and (2) to correctly assert that the grid cell weighting, applied to regions, is accurate. I will work on this and can push a commit.

  • There are several variable definitions that seem unnecessary. I suggest reducing definitions where not needed unless they improve execution speed (e.g., when indexing large arrays, especially within nested for loops).

Please consider the following suggestions that could address the above comments. Overall, I think support for regions will be ready for production after addressing the above.


HARVESTER_NAME = 'daily_bfg'
VALID_STATISTICS = ('mean', 'variance', 'minimum', 'maximum')


Contributor


should probably hook up support to verify that requested bound keys are valid

Suggested change
VALID_REGION_BOUND_KEYS = ('min_lat', 'max_lat', 'east_lon', 'west_lon')
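A sketch of the key validation this suggestion points toward (the constant name is from the suggested change; the helper is hypothetical):

```python
VALID_REGION_BOUND_KEYS = ('min_lat', 'max_lat', 'east_lon', 'west_lon')

def validate_region_bounds(region_bounds):
    # Reject any bound key not in the accepted set so a typo such as
    # 'min_latitude' fails loudly instead of silently falling back to a
    # default.
    unknown = [key for key in region_bounds
               if key not in VALID_REGION_BOUND_KEYS]
    if unknown:
        raise KeyError(f'unrecognized region bound keys: {unknown}')
```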

#'prate_ave', # surface precip rate (mm weq. s^-1)
'prateb_ave', # bucket surface precip rate (mm weq. s^-1)
'netrf_avetoa',#top of atmosphere net radiative flux (SW and LW) (W/m**2)
'netR',#surface energy balance (W/m**2)
Contributor


trying to adhere to convention:

Suggested change
'netR',#surface energy balance (W/m**2)
'netef_ave', # surface energy budget (W/m**2)

'longname'])

'longname',
'region'])
def get_gridcell_area_data_path():
return os.path.join(Path(__file__).parent.parent.parent.parent.resolve(),
Contributor


@KevinCounts - we'll have to change this when we move the location of the data dir

src/score_hv/harvesters/daily_bfg.py (resolved)
meets a certain threshold.
"""
sumweights = area_weights.sum()
ratio = total_regional_elements / total_original_elements
Contributor

@amschne amschne Apr 30, 2024


this approach to calculate the total solid angle of the region is valid only for uniform area grids and will have to be revised to work with most other grids
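For reference, a grid-independent version can sum per-cell area weights directly. This sketch is consistent with the `region_solid_angle` expression that appears in a later commit in this PR; it is not the code under review.

```python
import numpy as np

def region_solid_angle(region_weights, global_weights):
    # Ratio of summed per-cell area weights times the full-sphere solid
    # angle (4*pi steradians). Valid for non-uniform grids because each
    # cell contributes its own area rather than an equal share.
    return (np.sum(region_weights) / np.sum(global_weights)) * 4.0 * np.pi
```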

for i, variable in enumerate(self.config.get_variables()):
""" The first nested loop iterates through each requested variable.
"""

namelist=self.config.get_variables()
Contributor


Suggested change
namelist=self.config.get_variables()

"""

namelist=self.config.get_variables()
var_name=namelist[i]
Contributor


Suggested change
var_name=namelist[i]


namelist=self.config.get_variables()
var_name=namelist[i]
if var_name == "netR":
Contributor


var_name can just be replaced with "variable," which is already unpacked

var_name=namelist[i]
if var_name == "netR":
variable_data=calculate_surface_energy_balance(xr_dataset)
longname="surface energy balance"
Contributor


Suggested change
longname="surface energy balance"
longname="surface energy budget"


Parameters:
- new_name: Name of the new region.
- new_min_lat: Minimum latitude of the new region.
Contributor


Suggested change
- new_min_lat: Minimum latitude of the new region.
- min_lat: Minimum latitude of the new region.

@jrknezha jrknezha assigned jrknezha and sherrieF and unassigned jrknezha May 13, 2024
sherrieF added 4 commits May 20, 2024 11:26
functions a nested dictionary. So the region dictionary now has
region name, latitude_range, and longitude_range as key words, where
region name is a name given to the region by the user.
DEFAULT_LATITUDE_RANGE  = (-90, 90)
DEFAULT_LONGITUDE_RANGE = (360, 0)
The add_user_region method takes a nested dictionary as an argument.
The dictionary is parsed in this routine. An outline of this method is:
Test to see if the dictionary is empty. Exit if it is.
Test to see if the user has included a region name. Exit if the
region name is missing.
Test to see if the key words latitude_range and/or longitude_range
are present. If one of them is missing, supply the defaults from above.
Methods test_user_latitudes and test_user_longitudes were updated.
The method get_region_coordinates will now calculate the region
coordinates for a region that crosses the prime meridian.
The name of the class was changed to  GeoRegionsCatalog from
GeoRegions.
import var_stats to import var_statsCatalog
Changed the line GeoRegions to import GeoRegionsCatalog
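The nested region dictionary described in the commits above might look like the following sketch. Key names follow the commit messages; the exact schema in score-hv may differ, and `fill_defaults` is an illustrative helper.

```python
# Defaults as stated in the commit message above; the (360, 0) longitude
# ordering follows its wrap-around (west, east) convention.
DEFAULT_LATITUDE_RANGE = (-90, 90)
DEFAULT_LONGITUDE_RANGE = (360, 0)

# Each region name maps to its latitude and longitude ranges; a missing
# range is filled from the defaults.
regions_request = {
    'tropics': {'latitude_range': (-20, 20), 'longitude_range': (0, 360)},
    'north_polar': {'latitude_range': (66, 90)},
}

def fill_defaults(regions):
    return {name: {'latitude_range': bounds.get('latitude_range',
                                                DEFAULT_LATITUDE_RANGE),
                   'longitude_range': bounds.get('longitude_range',
                                                 DEFAULT_LONGITUDE_RANGE)}
            for name, bounds in regions.items()}
```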
@sherrieF sherrieF requested review from amschne and jrknezha May 29, 2024 20:43
def area_weighted_mean(xarray_variable, gridcell_area_weights):
"""returns the gridcell area weighted mean of xarray_variable and checks
that gridcell_area_weights are valid
class var_stats:
Contributor


I think the class needs to be named "var_statsCatalog," which is how it is referenced elsewhere

Collaborator


Agreeing that it needs to match. I would suggest VarStatsCatalog as a similar naming structure to other classes. Either way, this needs to match what gets imported in daily_bfg.py

@jrknezha
Collaborator

This branch is behind in commits from develop. It would be good to merge develop into this branch before we complete the PR to handle any potential merge conflicts easier.

Collaborator

@jrknezha jrknezha left a comment


Overall it looks good. Mostly some minor questions in the tests.

Biggest issue is that the import of the class from stats_utils into daily_bfg.py doesn't match the name of the actual class. Exact location is noted in comments. This is the only thing I see that is holding up approval.
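The mismatch can be illustrated schematically. VarStatsCatalog is the reviewer's suggested naming convention, not necessarily the final name.

```python
# In stats_utils.py: the class definition must carry the same name that
# daily_bfg.py imports; VarStatsCatalog follows the suggested convention.
class VarStatsCatalog:
    def __init__(self):
        self.weighted_averages = []

# daily_bfg.py would then import the matching name:
# from score_hv.stats_utils import VarStatsCatalog
```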

return self.regions

def set_regions(self):
self.regions = self.config_data.get('region')
Collaborator


I'm wondering if it would make sense for the input value to be called regions plural since it can be multiple to make that more obvious to the user. Either way, we want to make sure that the readme reflects the expected value here 'region' (as of now) as I could see that being confusing so let's just catch it early.

@@ -111,7 +209,7 @@ class DailyBFGHv(object):
Parameters:
-----------
config: DailyBFGConfig object containing information used to determine
which variable to parse the log file
which variable to parse the log file
Collaborator


random extra space

from dataclasses import dataclass
from dataclasses import field
from score_hv.config_base import ConfigInterface
from score_hv.stats_utils import var_statsCatalog
Collaborator


this import doesn't match the name of the class

def area_weighted_mean(xarray_variable, gridcell_area_weights):
"""returns the gridcell area weighted mean of xarray_variable and checks
that gridcell_area_weights are valid
class var_stats:
Collaborator


Agreeing that it needs to match. I would suggest VarStatsCatalog as a similar naming structure to other classes. Either way, this needs to match what gets imported in daily_bfg.py

num_values = len(harvested_tuple.value)
for inum in range(num_values):
assert global_means[inum] <= (1 + tolerance) * harvested_tuple.value[inum]
assert global_means[inum] <= (1 + tolerance) * harvested_tuple.value[inum]
Collaborator


this is the same as the line before it, should it be >= and - ?

num_values = len(harvested_tuple.value)
for inum in range(num_values):
assert variances[inum] <= (1 + tolerance) * harvested_tuple.value[inum]
assert variances[inum] <= (1 + tolerance) * harvested_tuple.value[inum]
Collaborator


this assert is the same as the line above it, should it be >= and - ?

num_values = len(harvested_tuple.value)
for inum in range(num_values):
assert min_values[inum] <= (1 + tolerance) * harvested_tuple.value[inum]
assert min_values[inum] <= (1 + tolerance) * harvested_tuple.value[inum]
Collaborator


assert is the same as the line above it, should it be >= and -?

num_values = len(harvested_tuple.value)
for inum in range(num_values):
assert max_values[inum] <= (1 + tolerance) * harvested_tuple.value[inum]
assert max_values[inum] <= (1 + tolerance) * harvested_tuple.value[inum]
Collaborator


assert is the same as the line above it, should it be >= and -?

sherrieF added 19 commits June 17, 2024 10:10
parameter from the
var_stats_instance.calculate_requested_statistics(weights,value) call.
It is not used in the calculate_requested_statistics class.
…ested_tuple.value[inum]

lines to
assert global_means[inum] >= (1 - tolerance) * harvested_tuple.value[inum]
region_solid_angle = (sum_region_weights / sum_global_weights) * 4 * np.pi
to the region_utils.  This method returns the data in the specified
user region from the original data set.
data have the same shape in the variance calculations.  The solid
angles.
weights that are passed in to the calculate_requested_statistics are the
normalized weights based on the
were made.  These statistics were calculated with the solid angle
weights.
Added a method "calculate_and_normalize_solid_angle" for the
calculation of the solid angle and normalization of the weights.
variable data to NaN's.  Added code for handling masks.
If they come in as xarray DataArrays they are converted to NumPy
arrays before any calculations are done.
This was removed from the find_minimum_value and
find_maximum_value methods.  The temporal_means array
is converted to a NumPy array before any calculations
are done.
3 participants