Feature/regions stats utils #64
base: develop
Conversation
The calculation was taken out of the stats_utils python code.
script are within bounds was added to the add_user_region function in the region_utils script.
daily_bfg.py. Changed test_harvester_daily_toa_radative_flux.py to work with the new version of daily_bfg.py.
This is a great functionality to get added! Really going to expand our harvest capabilities. I have a few comments that need to be addressed, the most important being that we can't require east_lon > west_lon, so that we can have regions that include the prime meridian.
src/score_hv/harvesters/daily_bfg.py
Outdated
'netrf_avetoa', # top of atmosphere net radiation flux (SW and LW) (W m^-2)
#'prate_ave', # surface precip rate (mm weq. s^-1)
'prateb_ave', # bucket surface precip rate (mm weq. s^-1)
'netrf_avetoa',#top of atmoshere net radiative flux (SW and LW) (W/m**2)
tiny thing: atmosphere is missing the p
src/score_hv/harvesters/daily_bfg.py
Outdated
msg=f"No latitude values found within the specified range of {min_lat} and {max_lat}."
raise KeyError(msg)

desired_longitude_indices=[index for index, lon in enumerate(longitudes) if east_lon <= lon <= west_lon]
I think this code may cause issues in the future: we don't guarantee that east_lon is less than west_lon, and we don't want to, so that we can have regions surrounding the prime meridian. Can we update this to handle these cases?
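One way the selection could handle both cases is sketched below; this is illustrative, not the PR's actual implementation, and the helper name `select_longitude_indices` is hypothetical. When the west bound is numerically larger than the east bound (on a 0-360 grid), the region wraps the prime meridian and the membership test becomes an `or`:

```python
def select_longitude_indices(longitudes, west_lon, east_lon):
    """Return indices of grid longitudes (0-360 convention) inside a region.

    Handles regions that wrap the prime meridian, e.g. west_lon=340,
    east_lon=20, as discussed in this review.
    """
    if west_lon <= east_lon:
        # ordinary region: one contiguous longitude band
        return [i for i, lon in enumerate(longitudes)
                if west_lon <= lon <= east_lon]
    # wraparound region: two bands joined across 0/360
    return [i for i, lon in enumerate(longitudes)
            if lon >= west_lon or lon <= east_lon]
```

For example, with grid longitudes [0, 90, 180, 270, 350], a west=340/east=20 region selects the cells at 0 and 350.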
src/score_hv/harvesters/daily_bfg.py
Outdated
"""
if statistic == 'mean':
themeans=var_stats_instance.weighted_averages[iregion]
value=themeans
why not just set value directly: value=var_stats_instance.weighted_averages[iregion] ?
same comment for the variables below. I think it's cleaner to set it directly
src/score_hv/region_utils.py
Outdated
f'max_lat: {new_max_lat}'
raise ValueError(msg)

if new_east_lon > new_west_lon:
We can't require this here because we need to be able to have regions that go around the prime meridian; for example, west: 340 and east: 20 should be a viable option. This is partly because we require that the values be between 0 and 360.
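A hedged sketch of validation that satisfies this comment: enforce latitude ordering and the 0-360 longitude bounds, but deliberately omit any west-versus-east ordering check. The function name and message wording are illustrative, not the PR's actual code:

```python
def validate_region_bounds(min_lat, max_lat, west_lon, east_lon):
    """Validate region bounds without requiring west_lon < east_lon,
    so regions like west=340, east=20 (crossing the prime meridian)
    remain valid."""
    if min_lat > max_lat:
        raise ValueError(f'min_lat {min_lat} must not exceed max_lat {max_lat}')
    for name, lat in (('min_lat', min_lat), ('max_lat', max_lat)):
        if not -90.0 <= lat <= 90.0:
            raise ValueError(f'{name} {lat} must be within [-90, 90]')
    for name, lon in (('west_lon', west_lon), ('east_lon', east_lon)):
        if not 0.0 <= lon <= 360.0:
            raise ValueError(f'{name} {lon} must be within [0, 360]')
    # note: intentionally no west_lon < east_lon check
```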
src/score_hv/stats_utils.py
Outdated
weighted_average=self.calculate_weighted_average(var_data,gridcell_area_weights,temporal_mean)
if stat=='variance':
self.calculate_var_variance(gridcell_area_weights,temporal_mean,weighted_average)
How is calculate_var_variance supposed to get the weighted_average value if it's set in a different if statement? What happens if someone only asks for variance; wouldn't it break?
If the variance depends on the weighted_average, the weighted_average needs to be calculated within the if statement for variance as well.
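One way to resolve the dependency this comment raises is to compute the weighted mean whenever either statistic needs it. This is a minimal sketch; the function name `requested_statistics` and the dict return shape are assumptions, not the PR's actual API:

```python
import numpy as np

def requested_statistics(stats, data, weights):
    """Compute requested statistics. The weighted mean is calculated
    whenever 'mean' OR 'variance' is requested, so asking for
    variance alone still works."""
    results = {}
    weighted_mean = None
    if 'mean' in stats or 'variance' in stats:
        weighted_mean = np.average(data, weights=weights)
    if 'mean' in stats:
        results['mean'] = weighted_mean
    if 'variance' in stats:
        # weighted variance about the weighted mean
        results['variance'] = np.average((data - weighted_mean) ** 2,
                                         weights=weights)
    return results
```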
gridcell_area_data.close()

assert global_means[iregion] <= (1 + tolerance) * harvested_tuple.value
assert global_means[iregion] <= (1 + tolerance) * harvested_tuple.value
this assert statement is the exact same as the one above it. please remove the duplicate or change it to check something else
"""
if harvested_tuple.statistic == 'variance':
assert variances[iregion] <= (1 + tolerance) * harvested_tuple.value
assert variances[iregion] <= (1 + tolerance) * harvested_tuple.value
same thing here with a duplicated assert statement
gridcell_area_data.close()
if harvested_tuple.statistic == 'minimum':
assert min_values[iregion] <= (1 + tolerance) * harvested_tuple.value
assert min_values[iregion] <= (1 + tolerance) * harvested_tuple.value
also a duplicated assert statement
for i, harvested_tuple in enumerate(data1):
if harvested_tuple.statistic == 'maximum':
assert max_values[iregion] <= (1 + tolerance) * harvested_tuple.value
assert max_values[iregion] <= (1 + tolerance) * harvested_tuple.value
also a duplicate assert statement
value = summation/(file_count + 1)
ivar_weighted_mean = np.ma.sum(norm_weights * value)
assert ivar_weighted_mean <= (1 + tolerance) * weighted_means[ivar]
assert ivar_weighted_mean >= (1 - tolerance) * weighted_means[ivar]
is this the assert tolerance pattern that should be used in those duplicates above?
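Yes, the quoted lines show the intended two-sided pattern (an upper bound with `<=`/`1 + tolerance` and a lower bound with `>=`/`1 - tolerance`). One way to keep the duplicated-assert bug from recurring is to factor it into a helper; the name `assert_close` is illustrative, not part of the test suite:

```python
def assert_close(reference, value, tolerance=0.001):
    """Two-sided relative-tolerance check, matching the pattern in the
    test quoted above: both an upper AND a lower bound, so a duplicated
    one-sided assert can't slip in."""
    assert reference <= (1 + tolerance) * value
    assert reference >= (1 - tolerance) * value
```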
Hi Jessica,
Thanks very much for your comments. I will address these.
Have a great weekend.
Sherrie
that crosses the prime meridian.
west_longitude value so a region can cross the prime meridian. Changed the get_region function to get_region_coordinates. Put the region coordinate values in the regions class.
put it in the region class. Added the calculation of the weighted average to the variance calculation function.
and put it in the region_utils class. Took out the iregion loop. Now the stats_utils returns all calculated statistics at once.
Great to see the underlying capability to harvest regional metrics take form! I think most of the difficulty is behind us.
That said, I have a few major comments.
- Most of region_utils.py is functionally sound. However, I am having a difficult time understanding get_region_coordinates(). Hopefully, simply talking it through can help me understand the routine, and then I can provide more helpful feedback.
- I suggest changing the structure of the regions part of the request dictionary. I provided some suggestions on how to do so, which would remove some complexity toward handling region requests in daily_bfg.py.
- Some additional exception handling is needed to help catch (1) bad requests and (2) to correctly assert that the grid cell weighting, applied to regions, is accurate. I will work on this and can push a commit.
- There are several variable definitions that seem unnecessary. I suggest reducing definitions where not needed unless they improve execution speed (e.g., when indexing large arrays, especially within nested for loops).
Please consider the following suggestions that could address the above comments. Overall, I think support for regions will be ready for production after addressing the above.
HARVESTER_NAME = 'daily_bfg'
VALID_STATISTICS = ('mean', 'variance', 'minimum', 'maximum')
should probably hook up support to verify that requested bound keys are valid
VALID_REGION_BOUND_KEYS = ('min_lat', 'max_lat', 'east_lon', 'west_lon')
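The key-verification hookup suggested here could look like the following sketch; the constant comes from the suggestion above, but the helper name `check_region_bound_keys` and the error wording are illustrative:

```python
VALID_REGION_BOUND_KEYS = ('min_lat', 'max_lat', 'east_lon', 'west_lon')

def check_region_bound_keys(bounds):
    """Reject a region request dict containing unrecognized bound keys,
    so typos like 'max_latt' fail early instead of being silently ignored."""
    unknown = [key for key in bounds if key not in VALID_REGION_BOUND_KEYS]
    if unknown:
        raise KeyError(f'invalid region bound key(s): {unknown}')
```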
src/score_hv/harvesters/daily_bfg.py
Outdated
#'prate_ave', # surface precip rate (mm weq. s^-1)
'prateb_ave', # bucket surface precip rate (mm weq. s^-1)
'netrf_avetoa',#top of atmosphere net radiative flux (SW and LW) (W/m**2)
'netR',#surface energy balance (W/m**2)
trying to adhere to convention:
Suggested change: 'netR',#surface energy balance (W/m**2) → 'netef_ave', # surface energy budget (W/m**2)
'longname'])

'longname',
'region'])
def get_gridcell_area_data_path():
return os.path.join(Path(__file__).parent.parent.parent.parent.resolve(),
@KevinCounts - we'll have to change this when we move the location of the data dir
src/score_hv/harvesters/daily_bfg.py
Outdated
meets a certain threshold.
"""
sumweights = area_weights.sum()
ratio = total_regional_elements / total_original_elements
this approach to calculate the total solid angle of the region is valid only for uniform area grids and will have to be revised to work with most other grids
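A weight-based formulation, like the one mentioned later in this thread (region_solid_angle = (sum_region_weights / sum_global_weights) * 4π), avoids the uniform-grid assumption because each cell contributes its own area weight rather than a uniform count. A minimal sketch, with an illustrative function name:

```python
import numpy as np

def region_solid_angle(region_weights, global_weights):
    """Solid angle (steradians) subtended by a region, computed from
    gridcell area weights. Valid for non-uniform grids, since each
    cell contributes its own weight instead of an element count."""
    return 4.0 * np.pi * np.sum(region_weights) / np.sum(global_weights)
```

On a uniform grid this reduces to the element-count ratio in the quoted code; on any other grid only the weight-based version is correct.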
for i, variable in enumerate(self.config.get_variables()):
""" The first nested loop iterates through each requested variable.
"""

namelist=self.config.get_variables()
namelist=self.config.get_variables()
"""

namelist=self.config.get_variables()
var_name=namelist[i]
var_name=namelist[i]
src/score_hv/harvesters/daily_bfg.py
Outdated
namelist=self.config.get_variables()
var_name=namelist[i]
if var_name == "netR":
var_name can just be replaced with "variable," which is already unpacked
var_name=namelist[i]
if var_name == "netR":
variable_data=calculate_surface_energy_balance(xr_dataset)
longname="surface energy balance"
Suggested change: longname="surface energy balance" → longname="surface energy budget"
src/score_hv/region_utils.py
Outdated
Parameters:
- new_name: Name of the new region.
- new_min_lat: Minimum latitude of the new region.
Suggested change: "- new_min_lat: Minimum latitude of the new region." → "- min_lat: Minimum latitude of the new region."
functions a nested dictionary. So the region dictionary now has region name, latitude_range, and longitude_range as keywords, where region name is a name given to the region by the user.
DEFAULT_LATITUDE_RANGE = (-90, 90) and DEFAULT_LONGITUDE_RANGE = (360, 0). The add_user_region method takes a nested dictionary as an argument, and the dictionary is parsed in this routine. An outline of this method: test whether the dictionary is empty and exit if it is; test whether the user has included a region name and exit if it is missing; test whether the keywords latitude_range and/or longitude_range are present, and if one of them is missing, supply the defaults above. Methods test_user_latitudes and test_user_longitudes were updated. The method get_region_coordinates will now calculate the region coordinates for a region that crosses the prime meridian. The name of the class was changed to GeoRegionsCatalog from GeoRegions.
Changed the import of var_stats to var_statsCatalog. Changed the line importing GeoRegions to import GeoRegionsCatalog.
a nested dictionary.
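The nested request dictionary and default-filling described in these commit messages could be sketched as follows. The region name 'tropics' and the helper name `fill_region_defaults` are illustrative; the default tuples are taken from the commit message above:

```python
DEFAULT_LATITUDE_RANGE = (-90, 90)
DEFAULT_LONGITUDE_RANGE = (360, 0)

def fill_region_defaults(regions):
    """Supply default latitude/longitude ranges for any region that
    omits them (a sketch of the add_user_region parsing outlined above).
    Input shape: {region_name: {'latitude_range': ..., 'longitude_range': ...}}"""
    filled = {}
    for name, bounds in regions.items():
        filled[name] = {
            'latitude_range': bounds.get('latitude_range',
                                         DEFAULT_LATITUDE_RANGE),
            'longitude_range': bounds.get('longitude_range',
                                          DEFAULT_LONGITUDE_RANGE),
        }
    return filled
```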
src/score_hv/stats_utils.py
Outdated
def area_weighted_mean(xarray_variable, gridcell_area_weights):
"""returns the gridcell area weighted mean of xarray_variable and checks
that gridcell_area_weights are valid
class var_stats:
I think the class needs to be named "var_statsCatalog," which is how it is referenced elsewhere
Agreeing that it needs to match. I would suggest VarStatsCatalog as a similar naming structure to other classes. Either way, this needs to match what gets imported in daily_bfg.py
This branch is behind develop in commits. It would be good to merge develop into this branch before we complete the PR, to make any potential merge conflicts easier to handle.
Overall it looks good. Mostly some minor questions in the tests.
Biggest issue is that the import of the class from stats_utils into daily_bfg.py doesn't match the name of the actual class. The exact location is noted in the comments. This is the only thing I see that is holding up approval.
src/score_hv/harvesters/daily_bfg.py
Outdated
return self.regions

def set_regions(self):
self.regions = self.config_data.get('region')
I'm wondering if it would make sense for the input value to be called 'regions', plural, since it can hold multiple regions; that would make it more obvious to the user. Either way, we want to make sure the readme reflects the expected key here ('region' as of now). I could see that being confusing, so let's catch it early.
@@ -111,7 +209,7 @@ class DailyBFGHv(object):
Parameters:
-----------
config: DailyBFGConfig object containing information used to determine
which variable to parse the log file
which variable to parse the log file
random extra space
src/score_hv/harvesters/daily_bfg.py
Outdated
from dataclasses import dataclass
from dataclasses import field
from score_hv.config_base import ConfigInterface
from score_hv.stats_utils import var_statsCatalog
this import doesn't match the name of the class
num_values = len(harvested_tuple.value)
for inum in range(num_values):
assert global_means[inum] <= (1 + tolerance) * harvested_tuple.value[inum]
assert global_means[inum] <= (1 + tolerance) * harvested_tuple.value[inum]
this is the same as the line before it, should it be >= and - ?
num_values = len(harvested_tuple.value)
for inum in range(num_values):
assert variances[inum] <= (1 + tolerance) * harvested_tuple.value[inum]
assert variances[inum] <= (1 + tolerance) * harvested_tuple.value[inum]
this assert is the same as the line above it, should it be >= and - ?
num_values = len(harvested_tuple.value)
for inum in range(num_values):
assert min_values[inum] <= (1 + tolerance) * harvested_tuple.value[inum]
assert min_values[inum] <= (1 + tolerance) * harvested_tuple.value[inum]
assert is the same as the line above it, should it be >= and -?
num_values = len(harvested_tuple.value)
for inum in range(num_values):
assert max_values[inum] <= (1 + tolerance) * harvested_tuple.value[inum]
assert max_values[inum] <= (1 + tolerance) * harvested_tuple.value[inum]
assert is the same as the line above it, should it be >= and -?
parameter from the var_stats_instance.calculate_requested_statistics(weights,value) call. It is not used in the calculate_requested_statistics class.
…ested_tuple.value[inum] lines to assert global_means[inum] >= (1 - tolerance) * harvested_tuple.value[inum]
region_solid_angle = (sum_region_weights / sum_global_weights) * 4 * np.pi
to the region_utils. This method returns the data in the specified user region from the original data set.
data have the same shape in the variance calculations. The weights that are passed in to calculate_requested_statistics are the normalized weights based on the solid angles.
were made. These statistics were calculated with the solid angle weights.
Added a method "calculate_and_normalize_solid_angle" for the calculation of the solid angle and normalization of the weights.
variable data to NaNs. Added code for handling masks.
If they come in as xarray DataArrays they are converted to NumPy arrays before any calculations are done.
This was removed from the find_mininum_value and find_maximum_value methods. The temporal_means array is converted to a NumPy array before any calculations are done.
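The DataArray-to-NumPy conversion described in these commit messages could be sketched with a duck-typed helper (the name `to_masked_numpy` is illustrative, not the PR's actual code), so plain arrays also pass through and invalid values are masked before any statistics are computed:

```python
import numpy as np

def to_masked_numpy(arr):
    """Convert an xarray DataArray (via its .values attribute) or any
    array-like to a NumPy masked array, with NaNs/infs masked so they
    are excluded from subsequent statistics."""
    values = getattr(arr, 'values', arr)
    return np.ma.masked_invalid(np.asarray(values, dtype=float))
```

The `getattr` fallback avoids importing xarray in this sketch while still handling DataArrays, which expose their data through `.values`.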
This pull request is for the branch feature/regions_stats_utils. It includes the new region_utils.py script, added at score-hv/src/score_hv/region_utils.py. The region_utils.py code has an instantiation method, an add_user_region method, and a get_user_region method.
The main change to stats_utils.py in score-hv/src/score_hv/ was to take the calculation of the temporal_means array out of stats_utils and put it into the python script that calls the stats_utils methods; in this case, the daily_bfg.py harvester.
The daily_bfg.py harvester was changed to calculate the temporal mean, which is then passed into the stats_utils methods to calculate the desired statistics. Code was added to the daily_bfg file to work with the region or regions requested by the user. If no region is requested, a global default is used.
The file test_harvester_daily_bfg_prateb.py in the test directory was modified to test all of the statistics returned by the daily_bfg.py harvester. In this file, five regions were requested, and the values of mean, variance, minimum, and maximum were returned from the daily_bfg.py harvester. The hard-coded statistic values were calculated offline.
The file test_harvester_daily_toa_radative_flux.py was modified to work with the new daily_bfg.py harvester.
White space was removed from test_harvester_daily_bfg_tmp2m.py; this is the only change made to that file.