
Feature/soil harvesters #68

Open
wants to merge 68 commits into develop
Conversation

sherrieF
Collaborator

This is a new harvester. The tests for this harvester are
test_harvester_global_soil_moisture.py
test_harvester_global_soil_temperature.py
These two tests check the return values from daily_bfg.py for the global values of
soilt4, tg3, soill4, and soilm. No region was requested. It should be noted that these soil fields
are automatically masked: the ocean and ice values are set to NaN.
The tests below:
test_harvester_regional_soil_moisture.py
test_harvester_regional_soil_temperature.py
test the regional values of soilt4, tg3, soill4, and soilm.
The regions tested were:
'regions': {
'north_hemi': {'north_lat': 90.0, 'south_lat': 0.0, 'west_long': 0.0, 'east_long': 360.0},
'south_meni': {'north_lat': 0.0, 'south_lat': -90.0, 'west_long': 0.0, 'east_long': 360.0},
'eastern_hemis': {'north_lat': 90.0, 'south_lat': -90.0, 'west_long': 0.0, 'east_long': 180.0},
'western_hemis': {'north_lat': 90.0, 'south_lat': -90.0, 'west_long': 180.0, 'east_long': 360.0},
}
All of the tests pass under pytest.
The following classes are also part of this branch:
mask_utils.py - The class for masking methods.
region_utils.py - The class for subsetting the global variable and weight data into subregions.
stats_utils.py - The class for calculating the user-requested statistics.

daily_bfg.py
This is the main python script for populating the harvested_data, which is returned to the tests and to any other methods that call daily_bfg.py.
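For orientation, below is a hypothetical sketch of how daily_bfg.py might wire these helper classes together. The constructor arguments are inferred from the class summaries later in this description, and the file path is illustrative only; the actual code may differ.

    import xarray as xr
    from score_hv.stats_utils import VarStatsCatalog
    from score_hv.region_utils import GeoRegionsCatalog
    from score_hv.mask_utils import MaskCatalog

    # Hypothetical wiring; constructor arguments are inferred from the class
    # summaries later in this description and may not match the actual code.
    dataset = xr.open_mfdataset('/path/to/bfg_*.nc')       # illustrative path only
    stats = VarStatsCatalog(['mean', 'variance', 'minimum', 'maximum'])
    regions = GeoRegionsCatalog(dataset)                    # dataset opened with xarray
    masks = MaskCatalog('land', dataset['sotyp'].values)    # user mask value + soil type field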

Added the lhtfl control files to the directory.
The lhtfl control files are the files that the test and
the harvester daily_bfg.py use to test the surface latent heat flux values
returned from the harvester.
test_harvester_daily_bfg_prateb.py
Changed the name of test_harvester_daily_bfg_lhtfl_ave.py to
test_harvester_surface_latent_heat_flux.py.
sherrieF added 23 commits August 6, 2024 12:11
longitude region to west_long and east_long
…ast_lon')

DEFAULT_REGION = {'global': {'north_lat': 90.0, 'south_lat': -90.0, 'west_long': 0.0, 'east_long': 360.0}}
It tests the mean, variance, min, and max values for two variables,
soill4 and soilm.  The bfg files found in the data directory were
used for the test.  This test was for the global region; no subregion
was requested.
It tests the mean, variance, min, and max values for two variables, soilt4 and tg3.
The bfg files found in the data directory were used for the test.
This test was for the global region; no subregion was requested.
It tests the mean, variance, min, and max values for two variables, soill4 and soilm.
The bfg files found in the data directory were used for the test.
This test was for the four regions as follows:
'regions': {
           'north_hemi': {'north_lat': 90.0, 'south_lat': 0.0, 'west_long': 0.0, 'east_long': 360.0},
           'south_meni': {'north_lat': 0.0, 'south_lat': -90.0, 'west_long': 0.0, 'east_long': 360.0},
           'eastern_hemis': {'north_lat': 90.0, 'south_lat': -90.0, 'west_long': 0.0, 'east_long': 180.0},
           'western_hemis': {'north_lat': 90.0, 'south_lat': -90.0, 'west_long': 180.0, 'east_long': 360.0},
          }
Values for the statistics mentioned above were compared, for all four regions, against those
returned from the daily_bfg harvester.
It tests the mean, variance, min, and max values for two variables, soilt4 and tg3.
The bfg files found in the data directory were used for the test.
This test was for the four regions as follows:
'regions': {
           'north_hemi': {'north_lat': 90.0, 'south_lat': 0.0, 'west_long': 0.0, 'east_long': 360.0},
           'south_meni': {'north_lat': 0.0, 'south_lat': -90.0, 'west_long': 0.0, 'east_long': 360.0},
           'eastern_hemis': {'north_lat': 90.0, 'south_lat': -90.0, 'west_long': 0.0, 'east_long': 180.0},
           'western_hemis': {'north_lat': 90.0, 'south_lat': -90.0, 'west_long': 180.0, 'east_long': 360.0},
          }
Values for the statistics mentioned above were compared, for all four regions, against those
returned from the daily_bfg harvester.
    The class lists that are initialized for this python script are:
        self.weighted_averages=[]
        self.variances=[]
        self.maximum=[]
        self.stats=stats_list

    def clear_requested_statistics(self):
        This method clears out the class lists so the statistics for
        multiple variables can be calculated and returned.

    def calculate_requested_statistics(self,weights,temporal_mean):
        This method takes the weights and the temporal mean for
        the variable passed in from the calling routine and
        calculates the user-requested statistics for that
        variable. The following methods are called from this
        method.  The statistics that are calculated by the
        methods below are put into the class lists (a sketch of the
        weighted average and variance calculations follows this list).
        def calculate_weighted_average(self,weights,temporal_mean):
            This method takes the weights and temporal mean of a
            variable and calculates a weighted sum.
        def calculate_var_variance(self,weights,temporal_mean):
            This method takes the weights and temporal mean of a
            variable  and calculates the variance.
            variance = sum_R{ w_i * (x_i - xbar)^2 }
        def find_minimum_value(self,temporal_mean):
            This method finds the minimum value of the temporal
            mean of a variable.
        def find_maximum_value(self,temporal_mean):
            This method finds the maximum value of the temporal
            mean of a variable.
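For orientation, here is a minimal sketch of how the weighted average and weighted variance described above could be computed with numpy. Everything other than the quantities named above (weights, temporal_mean, xbar) is illustrative and is not the actual implementation in stats_utils.py.

    import numpy as np

    def weighted_average_sketch(weights, temporal_mean):
        # xbar = sum_i( w_i * x_i ) / sum_i( w_i ), ignoring NaN-masked grid cells
        w = np.where(np.isnan(temporal_mean), np.nan, weights)
        return np.nansum(w * temporal_mean) / np.nansum(w)

    def weighted_variance_sketch(weights, temporal_mean):
        # variance = sum_R{ w_i * (x_i - xbar)^2 } with the w_i normalized to sum to 1
        w = np.where(np.isnan(temporal_mean), np.nan, weights)
        w = w / np.nansum(w)
        xbar = np.nansum(w * temporal_mean)
        return np.nansum(w * (temporal_mean - xbar)**2)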
The class initialization:
    def __init__(self,dataset):
        """
          Here we initialize the region class as a dictionary.
          Parameter: dataset - This is a dataset that has been
                               opened with xarray.
          """
        self.name = []        The list to store the name of the user region.  This is a keyword and is needed.
        self.north_lat = []   The list to store the user-requested northern latitude of the region.
        self.south_lat = []   The list to store the user-requested southern latitude of the region.
        self.west_long = []   The list to store the western longitude of the user region.  In degrees East.
        self.east_long = []   The list to store the eastern longitude of the user region.  In degrees East.
        self.region_indices = []  The list to store the region indices.  These are passed back to the calling
                                  routine.
        self.latitude_values = dataset['grid_yt'].values   This is the array of latitude values on the original
                                                            dataset.
        self.longitude_values = dataset['grid_xt'].values  This is the array of longitude values on the
                                                            original dataset.
        The methods called from within this class:
            def test_user_latitudes(self,north_lat,south_lat):
                 This method tests the user input latitudes to make sure they are reasonable.
                 It exits with an error if the latitudes are out of bounds.
                 If the values pass the tests they are added to the region dictionary defined
                 in the __init__ method above.
            def test_user_longitudes(self,west_long,east_long):
                This method tests the user input longitudes to make sure they are reasonable.
                It exits with an error if the longitudes are out of bounds.
                If the values pass the tests they are added to the region dictionary defined
                in the __init__ method above.
            def get_region_indices(self,region_index):
                This method is called from the method get_region_data, which is a member of
                this class.
                It calculates the start and end indices in the latitude and longitude
                arrays for the region index passed in from get_region_data.

       Methods called from an external python script:
       def add_user_region(self,dictionary):
            This method is called from the main calling python script.  It tests the
            region dictionary passed in from the calling script for validity.  It calls
            test_user_latitudes and test_user_longitudes to make sure the user-defined region
            is valid.  If the user-defined region is valid it populates the region dictionary
            as defined above.
       def get_region_indices(self,region_index):
           This method is called from the main calling python script.
           It calculates the start and end indices in the latitude and longitude
           arrays for the region index passed in.  It returns the indices to the calling python script.
       def get_region_data(self,region_index,data):
           This method subsets the full grid variable data into the requested region.
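To illustrate the index calculation and subsetting described above, here is a minimal sketch using numpy boolean masks. It assumes latitudes come from grid_yt and longitudes from grid_xt (degrees East, 0 to 360) and does not reflect the actual GeoRegionsCatalog internals, which may differ (for example in how regions crossing the 0/360 boundary are handled).

    import numpy as np

    def region_indices_sketch(latitudes, longitudes, north_lat, south_lat, west_long, east_long):
        # Boolean masks selecting the latitude rows and longitude columns inside the region.
        lat_idx = np.where((latitudes >= south_lat) & (latitudes <= north_lat))[0]
        lon_idx = np.where((longitudes >= west_long) & (longitudes <= east_long))[0]
        # Start and end indices into the latitude and longitude arrays.
        return lat_idx[0], lat_idx[-1], lon_idx[0], lon_idx[-1]

    def region_data_sketch(data, lat_start, lat_end, lon_start, lon_end):
        # Subset a (lat, lon) variable array to the requested region.
        return data[lat_start:lat_end + 1, lon_start:lon_end + 1]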
 The valid masks are land, ocean, sea, and ice.

    def __init__(self,user_mask_value,soil_type_values):
        """
          Here we initialize the MaskCatalog class.
          """
        self.name = None
        self.user_mask = user_mask_value
        self.data_mask = soil_type_values

   def initial_mask_of_variable(self,var_name,variable_data,dataset):
       This method sets the ocean and ice grid cells to missing for the soil variables.
       This is done automatically for the soil variables soill4, soilm, soilt4, and tg3.
       This method is called from the main python script.

   def replace_bad_values_with_nan(self,variable_data):
       This method replaces missing or fill values with NaN.
       This is done so any statistics the user has requested will
       be calculated correctly.
       This method is called from the main python script.

   def user_mask_array(self,region_mask):
       This method takes the sotyp variable data from the dataset.
       It returns an array of boolean values.  The grid points that
       the user wants are set to True and the grid points the user
       does not want are set to False.
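To make the automatic masking concrete, here is a minimal sketch of setting ocean and ice grid cells to NaN using the sotyp (soil type) field. The specific soil-type codes used here for water (0) and land ice (16) are assumptions about the bfg files, not values taken from mask_utils.py.

    import numpy as np

    # Assumed soil-type codes in the bfg files (not taken from mask_utils.py).
    WATER_SOTYP = 0.0   # ocean grid cells
    ICE_SOTYP = 16.0    # land-ice grid cells

    def mask_ocean_and_ice_sketch(variable_data, soil_type_values):
        # Set ocean and ice grid cells to NaN so they drop out of the statistics.
        return np.where((soil_type_values == WATER_SOTYP) |
                        (soil_type_values == ICE_SOTYP),
                        np.nan, variable_data)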
This python script is the main script for the harvesters in score-hv.
This script uses the following classes:
from score_hv.config_base  import ConfigInterface
from score_hv.stats_utils  import VarStatsCatalog
from score_hv.region_utils import GeoRegionsCatalog
from score_hv.mask_utils import MaskCatalog

This script reads the VALID_CONFIG_DICT that is set up in the
harvester tests. At present the VALID_CONFIG_DICT has the following values:
VALID_CONFIG_DICT = {'harvester_name': hv_registry.DAILY_BFG,
                     'filenames' : BFG_PATH,
                     'statistic': ['mean','variance', 'minimum', 'maximum'],
                     'variable': ['var1', ..., 'varn'],
                     'regions': {name, latitude values, longitude values}
                                There can be more than one region.
                     'surface_mask': land, ocean, or ice
                     }
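For illustration, a filled-in configuration of this form might look like the sketch below. The file path, the variable list, and the exact value types (e.g. list vs. single string) are hypothetical, and the harvest entry point shown is an assumption; the actual tests may construct and call the harvester differently.

    from score_hv import hv_registry
    from score_hv.harvester_base import harvest   # assumed entry point

    # Hypothetical example configuration; the path and variables are illustrative only.
    config = {'harvester_name': hv_registry.DAILY_BFG,
              'filenames': ['/path/to/bfg_example.nc'],
              'statistic': ['mean', 'variance', 'minimum', 'maximum'],
              'variable': ['soill4', 'soilm'],
              'regions': {'north_hemi': {'north_lat': 90.0, 'south_lat': 0.0,
                                         'west_long': 0.0, 'east_long': 360.0}},
              'surface_mask': 'land'}

    harvested_data = harvest(config)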

The daily_bfg.py then opens and reads the dataset requested
by the user.  The path to the data files is in the VALID_CONFIG_DICT 'filenames' entry.
The python package xarray is used to open and read in the dataset.
The script then reads the rest of the VALID_CONFIG_DICT.
A general rundown of the processing in daily_bfg.py is as follows:
    The gridcell area weight file is read in.
    Each variable that has been requested is processed one at a time.
    If a soil variable has been requested it is masked.
    If a surface mask has been requested the variable grid points
    and weights grid points are masked.
    If a region or regions have been requested they are applied
    to the variable data and weights.
    The requested user statistics are then calculated.
    The following information is sent back to the harvester caller that invoked daily_bfg.py:
    harvested_data.append(HarvestedData(
                          self.config.harvest_filenames,
                          statistic,
                          variable,
                          np.float32(value),
                          units,
                          dt.fromisoformat(median_cftime.isoformat()),
                          longname,
                          self.config.surface_mask,
                          self.config.regions))
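The records appended above are returned to the caller as a list, so a test can iterate over them and compare the harvested values against known answers. A minimal sketch, assuming HarvestedData exposes fields named after the constructor arguments shown above (statistic, variable, value); the expected value here is a placeholder, not an actual test answer.

    expected_soilm_mean = 123.45   # placeholder reference value
    for record in harvested_data:
        if record.statistic == 'mean' and record.variable == 'soilm':
            assert abs(record.value - expected_soilm_mean) < 1.0e-5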

The following are methods that are called from this main python script:
    def get_gridcell_area_data_path():
        returns the path to the gridcell area data file.

    def get_median_cftime(xr_dataset):
        returns the median cftime from the xr_dataset.

    def check_variable_exists(var_name,dataset_variable_names):
        Makes sure the requested variable is in the user's dataset.

    def calculate_surface_energy_balance(xr_dataset,dataset_variable_names):
        This method calculates the surface energy balance.  The surface energy balance
        is a derived field.

    def calculate_toa_radative_flux(xr_dataset,dataset_variable_names):
        This method calculates the top-of-the-atmosphere radiative energy flux (netrf_avetoa).
        This is a derived field.

    def check_array_dimensions(region_variable,region_weights):
        This method makes sure that the region variable and the region weights
        have the same dimensions.  If their dimensions are different we exit
        the main script.  The dimensions must be the same to calculate the
        statistics requested by the user.

    def calculate_and_normalize_solid_angle(sum_global_weights,region_weights):
        This method calculates the solid angle for the regional weights
        and normalizes them.  The normalized regional weights are returned.
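A minimal sketch of one way the regional weight normalization described above could work: the regional gridcell area weights are converted to solid angles by assuming the global weights sum to the full sphere (4*pi steradians), and are then renormalized so the regional weights sum to one before the weighted statistics are computed. The actual arithmetic in daily_bfg.py may differ.

    import numpy as np

    def normalize_region_weights_sketch(sum_global_weights, region_weights):
        # Convert regional area weights to solid angles (full sphere = 4*pi sr),
        # then renormalize so the regional weights sum to one.
        solid_angle = region_weights * (4.0 * np.pi / sum_global_weights)
        return solid_angle / np.nansum(solid_angle)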