Add MODIS-COSP recipe #68
jbusecke
commented
Oct 18, 2023
- Closing New Dataset [MODIS cloud data for climate model proxies] #51
- Figure out a 'schema' for the URLs
- Try to move this to a feedstock subfolder
pre-commit.ci autofix |
OK, got the recipe deployed and cached the files, so the auth part works 🎉 But I am running into issues on Dataflow now:
I am fairly sure this is due to the fact that the files do not actually have a
|
Yes, this is what I recommend, and it is how I have solved this type of problem myself, for example: |
Oh crap, this is an even bigger issue. They use netcdf groups 😩. Paging @TomNicholas in hopes there is some datatree/xarray wizardry that might help us here! |
Hey, happy to have a look, but I'm missing a lot of context! Obviously you guys know you can use datatree to open a netCDF file with groups, then look at the groups / extract each as a dataset. What is the problem? |
All groups share the lon/lat coordinates, but they are weirdly stored at the root node, and then each node could have additional dimensions:
💡 hold on let me just use datatree to get a better repr:
```python
DataTree('None', parent=None)
│ Dimensions: (latitude: 180, longitude: 360)
│ Coordinates:
│ * latitude (latitude) float64 -89.5 -88.5 -87.5 -86.5 ... 87.5 88.5 89.5
│ * longitude (longitude) float64 -179.5 -178.5 -177.5 ... 177.5 178.5 179.5
│ Data variables:
│ *empty*
│ Attributes: (12/51)
│ YAML_config: grid_settings:\n gridsize: 1\n proje...
│ Yori_version: 1.5.0
│ input_files: MCD06COSP_D3_MODIS.A2008336.062.202212...
│ daily_defn_of_day_adjustment: False
│ history:
│ source: idl 8.4, mcd06cosp_preyori 20220218-1,...
│ ... ...
│ longitude_resolution: 1.0
│ license: http://science.nasa.gov/earth-science/...
│ stdname_vocabulary: NetCDF Climate and Forecast (CF) Metad...
│ keywords_vocabulary: NASA Global Change Master Directory (G...
│ keywords: EARTH SCIENCE > ATMOSPHERE > CLOUDS > ...
│ naming_authority: gov.nasa.gsfc.sci.atmos
├── DataTree('Solar_Zenith')
│ Dimensions: (longitude: 360, latitude: 180)
│ Dimensions without coordinates: longitude, latitude
│ Data variables:
│ Mean (longitude, latitude) float64 ...
│ Standard_Deviation (longitude, latitude) float64 ...
│ Sum (longitude, latitude) float64 ...
│ Pixel_Counts (longitude, latitude) float64 ...
│ Sum_Squares (longitude, latitude) float64 ...
│ Attributes:
│ long_name: Solar Zenith Angle (Cell to Sun) for Daytime Scenes
│ units: degrees
│ _FillValue: -999.0
│ valid_min: 0.0
│ valid_max: 180.0
│ scale_factor: 1.0
│ add_offset: 0.0
├── DataTree('Solar_Azimuth')
│ Dimensions: (longitude: 360, latitude: 180)
│ Dimensions without coordinates: longitude, latitude
│ Data variables:
│ Mean (longitude, latitude) float64 ...
│ Standard_Deviation (longitude, latitude) float64 ...
│ Sum (longitude, latitude) float64 ...
│ Pixel_Counts (longitude, latitude) float64 ...
│ Sum_Squares (longitude, latitude) float64 ...
│ Attributes:
│ long_name: Solar Azimuth Angle (Cell to Sun) for Daytime Scenes
│ units: degrees
│ _FillValue: -999.0
│ valid_min: -180.0
│ valid_max: 180.0
│ scale_factor: 1.0
│ add_offset: 0.0
├── DataTree('Sensor_Zenith')
│ Dimensions: (longitude: 360, latitude: 180)
│ Dimensions without coordinates: longitude, latitude
│ Data variables:
│ Mean (longitude, latitude) float64 ...
│ Standard_Deviation (longitude, latitude) float64 ...
│ Sum (longitude, latitude) float64 ...
│ Pixel_Counts (longitude, latitude) float64 ...
│ Sum_Squares (longitude, latitude) float64 ...
│ Attributes:
│ long_name: Sensor Zenith Angle (Cell to Sensor) for Daytime Scenes
│ units: degrees
│ _FillValue: -999.0
│ valid_min: 0.0
│ valid_max: 180.0
│ scale_factor: 1.0
│ add_offset: 0.0
├── DataTree('Sensor_Azimuth')
│ Dimensions: (longitude: 360, latitude: 180)
│ Dimensions without coordinates: longitude, latitude
│ Data variables:
│ Mean (longitude, latitude) float64 ...
│ Standard_Deviation (longitude, latitude) float64 ...
│ Sum (longitude, latitude) float64 ...
│ Pixel_Counts (longitude, latitude) float64 ...
│ Sum_Squares (longitude, latitude) float64 ...
│ Attributes:
│ long_name: Sensor Azimuth Angle (Cell to Sensor) for Daytime Scenes
│ units: degrees
│ _FillValue: -999.0
│ valid_min: -180.0
│ valid_max: 180.0
│ scale_factor: 1.0
│ add_offset: 0.0
├── DataTree('Cloud_Top_Pressure')
│ Dimensions: (longitude: 360, latitude: 180)
│ Dimensions without coordinates: longitude, latitude
│ Data variables:
│ Mean (longitude, latitude) float64 ...
│ Standard_Deviation (longitude, latitude) float64 ...
│ Sum (longitude, latitude) float64 ...
│ Pixel_Counts (longitude, latitude) float64 ...
│ Sum_Squares (longitude, latitude) float64 ...
│ Attributes:
│ long_name: Cloud Top Pressure for Daytime Scenes
│ units: mb
│ _FillValue: -999.0
│ valid_min: 1.0
│ valid_max: 1100.0
│ scale_factor: 1.0
│ add_offset: 0.0
├── DataTree('Cloud_Mask_Fraction')
│ Dimensions: (longitude: 360, latitude: 180)
│ Dimensions without coordinates: longitude, latitude
│ Data variables:
│ Mean (longitude, latitude) float64 ...
│ Standard_Deviation (longitude, latitude) float64 ...
│ Sum (longitude, latitude) float64 ...
│ Pixel_Counts (longitude, latitude) float64 ...
│ Sum_Squares (longitude, latitude) float64 ...
│ Attributes:
│ long_name: Cloud Fraction from Cloud Mask for Daytime Scenes
│ units: none
│ _FillValue: -999.0
│ valid_min: 0.0
│ valid_max: 1.0
│ scale_factor: 1.0
│ add_offset: 0.0
├── DataTree('Cloud_Mask_Fraction_Low')
│ Dimensions: (longitude: 360, latitude: 180)
│ Dimensions without coordinates: longitude, latitude
│ Data variables:
│ Mean (longitude, latitude) float64 ...
│ Standard_Deviation (longitude, latitude) float64 ...
│ Sum (longitude, latitude) float64 ...
│ Pixel_Counts (longitude, latitude) float64 ...
│ Sum_Squares (longitude, latitude) float64 ...
│ Attributes:
│ long_name: Cloud Fraction from Cloud Mask (Low Clouds, CTP GE 680 hPa...
│ units: none
│ _FillValue: -999.0
│ valid_min: 0.0
│ valid_max: 1.0
│ scale_factor: 1.0
│ add_offset: 0.0
├── DataTree('Cloud_Mask_Fraction_Mid')
│ Dimensions: (longitude: 360, latitude: 180)
│ Dimensions without coordinates: longitude, latitude
│ Data variables:
│ Mean (longitude, latitude) float64 ...
│ Standard_Deviation (longitude, latitude) float64 ...
│ Sum (longitude, latitude) float64 ...
│ Pixel_Counts (longitude, latitude) float64 ...
│ Sum_Squares (longitude, latitude) float64 ...
│ Attributes:
│ long_name: Cloud Fraction from Cloud Mask (Mid Clouds, CTP GE 440 hPa...
│ units: none
│ _FillValue: -999.0
│ valid_min: 0.0
│ valid_max: 1.0
│ scale_factor: 1.0
│ add_offset: 0.0
├── DataTree('Cloud_Mask_Fraction_High')
│ Dimensions: (longitude: 360, latitude: 180)
│ Dimensions without coordinates: longitude, latitude
│ Data variables:
│ Mean (longitude, latitude) float64 ...
│ Standard_Deviation (longitude, latitude) float64 ...
│ Sum (longitude, latitude) float64 ...
│ Pixel_Counts (longitude, latitude) float64 ...
│ Sum_Squares (longitude, latitude) float64 ...
│ Attributes:
│ long_name: Cloud Fraction from Cloud Mask (High Clouds, CTP LT 440 hP...
│ units: none
│ _FillValue: -999.0
│ valid_min: 0.0
│ valid_max: 1.0
│ scale_factor: 1.0
│ add_offset: 0.0
├── DataTree('Cloud_Optical_Thickness_Liquid')
│ Dimensions: (longitude: 360, latitude: 180,
│ jhisto_cloud_optical_thickness_liquid_7: 7,
│ jhisto_cloud_particle_size_liquid_6: 6,
│ jhisto_cloud_top_pressure_7: 7)
│ Dimensions without coordinates: longitude, latitude,
│ jhisto_cloud_optical_thickness_liquid_7,
│ jhisto_cloud_particle_size_liquid_6,
│ jhisto_cloud_top_pressure_7
│ Data variables:
│ Mean (longitude, latitude) float64 ...
│ Standard_Deviation (longitude, latitude) float64 ...
│ Sum (longitude, latitude) float64 ...
│ Pixel_Counts (longitude, latitude) float64 ...
│ Sum_Squares (longitude, latitude) float64 ...
│ JHisto_vs_Cloud_Particle_Size_Liquid (longitude, latitude, jhisto_cloud_optical_thickness_liquid_7, jhisto_cloud_particle_size_liquid_6) float64 ...
│ JHisto_vs_Cloud_Top_Pressure (longitude, latitude, jhisto_cloud_optical_thickness_liquid_7, jhisto_cloud_top_pressure_7) float64 ...
│ Attributes:
│ long_name: Cloud Optical Thickness for Liquid Water Clouds (3.7 micro...
│ units: none
│ _FillValue: -999.0
│ valid_min: 0.0
│ valid_max: 150.0
│ scale_factor: 1.0
│ add_offset: 0.0
├── DataTree('Cloud_Optical_Thickness_Ice')
│ Dimensions: (longitude: 360, latitude: 180,
│ jhisto_cloud_optical_thickness_ice_7: 7,
│ jhisto_cloud_particle_size_ice_6: 6,
│ jhisto_cloud_top_pressure_7: 7)
│ Dimensions without coordinates: longitude, latitude,
│ jhisto_cloud_optical_thickness_ice_7,
│ jhisto_cloud_particle_size_ice_6,
│ jhisto_cloud_top_pressure_7
│ Data variables:
│ Mean (longitude, latitude) float64 ...
│ Standard_Deviation (longitude, latitude) float64 ...
│ Sum (longitude, latitude) float64 ...
│ Pixel_Counts (longitude, latitude) float64 ...
│ Sum_Squares (longitude, latitude) float64 ...
│ JHisto_vs_Cloud_Particle_Size_Ice (longitude, latitude, jhisto_cloud_optical_thickness_ice_7, jhisto_cloud_particle_size_ice_6) float64 ...
│ JHisto_vs_Cloud_Top_Pressure (longitude, latitude, jhisto_cloud_optical_thickness_ice_7, jhisto_cloud_top_pressure_7) float64 ...
│ Attributes:
│ long_name: Cloud Optical Thickness for Ice Clouds (3.7 micron Retriev...
│ units: none
│ _FillValue: -999.0
│ valid_min: 0.0
│ valid_max: 150.0
│ scale_factor: 1.0
│ add_offset: 0.0
├── DataTree('Cloud_Optical_Thickness_Total')
│ Dimensions: (longitude: 360, latitude: 180,
│ jhisto_cloud_optical_thickness_total_7: 7,
│ jhisto_cloud_top_pressure_7: 7)
│ Dimensions without coordinates: longitude, latitude,
│ jhisto_cloud_optical_thickness_total_7,
│ jhisto_cloud_top_pressure_7
│ Data variables:
│ Mean (longitude, latitude) float64 ...
│ Standard_Deviation (longitude, latitude) float64 ...
│ Sum (longitude, latitude) float64 ...
│ Pixel_Counts (longitude, latitude) float64 ...
│ Sum_Squares (longitude, latitude) float64 ...
│ JHisto_vs_Cloud_Top_Pressure (longitude, latitude, jhisto_cloud_optical_thickness_total_7, jhisto_cloud_top_pressure_7) float64 ...
│ Attributes:
│ long_name: Cloud Optical Thickness for Combined (LiquidWater+Ice+Unde...
│ units: none
│ _FillValue: -999.0
│ valid_min: 0.0
│ valid_max: 150.0
│ scale_factor: 1.0
│ add_offset: 0.0
├── DataTree('Cloud_Optical_Thickness_PCL_Liquid')
│ Dimensions: (longitude: 360, latitude: 180,
│ jhisto_cloud_optical_thickness_pcl_liquid_7: 7,
│ jhisto_cloud_particle_size_pcl_liquid_6: 6,
│ jhisto_cloud_top_pressure_7: 7)
│ Dimensions without coordinates: longitude, latitude,
│ jhisto_cloud_optical_thickness_pcl_liquid_7,
│ jhisto_cloud_particle_size_pcl_liquid_6,
│ jhisto_cloud_top_pressure_7
│ Data variables:
│ Mean (longitude, latitude) float64 ...
│ Standard_Deviation (longitude, latitude) float64 ...
│ Sum (longitude, latitude) float64 ...
│ Pixel_Counts (longitude, latitude) float64 ...
│ Sum_Squares (longitude, latitude) float64 ...
│ JHisto_vs_Cloud_Particle_Size_PCL_Liquid (longitude, latitude, jhisto_cloud_optical_thickness_pcl_liquid_7, jhisto_cloud_particle_size_pcl_liquid_6) float64 ...
│ JHisto_vs_Cloud_Top_Pressure (longitude, latitude, jhisto_cloud_optical_thickness_pcl_liquid_7, jhisto_cloud_top_pressure_7) float64 ...
│ Attributes:
│ long_name: Cloud Optical Thickness for Liquid Water Phase Clouds (3.7...
│ units: none
│ _FillValue: -999.0
│ valid_min: 0.0
│ valid_max: 150.0
│ scale_factor: 1.0
│ add_offset: 0.0
├── DataTree('Cloud_Optical_Thickness_PCL_Ice')
│ Dimensions: (longitude: 360, latitude: 180,
│ jhisto_cloud_optical_thickness_pcl_ice_7: 7,
│ jhisto_cloud_particle_size_pcl_ice_6: 6,
│ jhisto_cloud_top_pressure_7: 7)
│ Dimensions without coordinates: longitude, latitude,
│ jhisto_cloud_optical_thickness_pcl_ice_7,
│ jhisto_cloud_particle_size_pcl_ice_6,
│ jhisto_cloud_top_pressure_7
│ Data variables:
│ Mean (longitude, latitude) float64 ...
│ Standard_Deviation (longitude, latitude) float64 ...
│ Sum (longitude, latitude) float64 ...
│ Pixel_Counts (longitude, latitude) float64 ...
│ Sum_Squares (longitude, latitude) float64 ...
│ JHisto_vs_Cloud_Particle_Size_PCL_Ice (longitude, latitude, jhisto_cloud_optical_thickness_pcl_ice_7, jhisto_cloud_particle_size_pcl_ice_6) float64 ...
│ JHisto_vs_Cloud_Top_Pressure (longitude, latitude, jhisto_cloud_optical_thickness_pcl_ice_7, jhisto_cloud_top_pressure_7) float64 ...
│ Attributes:
│ long_name: Cloud Optical Thickness for Ice Phase Clouds (3.7 micron R...
│ units: none
│ _FillValue: -999.0
│ valid_min: 0.0
│ valid_max: 150.0
│ scale_factor: 1.0
│ add_offset: 0.0
├── DataTree('Cloud_Optical_Thickness_PCL_Total')
│ Dimensions: (longitude: 360, latitude: 180,
│ jhisto_cloud_optical_thickness_pcl_total_7: 7,
│ jhisto_cloud_top_pressure_7: 7)
│ Dimensions without coordinates: longitude, latitude,
│ jhisto_cloud_optical_thickness_pcl_total_7,
│ jhisto_cloud_top_pressure_7
│ Data variables:
│ Mean (longitude, latitude) float64 ...
│ Standard_Deviation (longitude, latitude) float64 ...
│ Sum (longitude, latitude) float64 ...
│ Pixel_Counts (longitude, latitude) float64 ...
│ Sum_Squares (longitude, latitude) float64 ...
│ JHisto_vs_Cloud_Top_Pressure (longitude, latitude, jhisto_cloud_optical_thickness_pcl_total_7, jhisto_cloud_top_pressure_7) float64 ...
│ Attributes:
│ long_name: Cloud Optical Thickness for Combined (LiquidWater+Ice+Unde...
│ units: none
│ _FillValue: -999.0
│ valid_min: 0.0
│ valid_max: 150.0
│ scale_factor: 1.0
│ add_offset: 0.0
├── DataTree('Cloud_Optical_Thickness_Log10_Liquid')
│ Dimensions: (longitude: 360, latitude: 180)
│ Dimensions without coordinates: longitude, latitude
│ Data variables:
│ Mean (longitude, latitude) float64 ...
│ Standard_Deviation (longitude, latitude) float64 ...
│ Sum (longitude, latitude) float64 ...
│ Pixel_Counts (longitude, latitude) float64 ...
│ Sum_Squares (longitude, latitude) float64 ...
│ Attributes:
│ long_name: Cloud Optical Thickness Log10 for Liquid Water Clouds (3.7...
│ units: none
│ _FillValue: -999.0
│ valid_min: -2.0
│ valid_max: 2.176
│ scale_factor: 1.0
│ add_offset: 0.0
├── DataTree('Cloud_Optical_Thickness_Log10_Ice')
│ Dimensions: (longitude: 360, latitude: 180)
│ Dimensions without coordinates: longitude, latitude
│ Data variables:
│ Mean (longitude, latitude) float64 ...
│ Standard_Deviation (longitude, latitude) float64 ...
│ Sum (longitude, latitude) float64 ...
│ Pixel_Counts (longitude, latitude) float64 ...
│ Sum_Squares (longitude, latitude) float64 ...
│ Attributes:
│ long_name: Cloud Optical Thickness Log10 for Ice Clouds (3.7 micron R...
│ units: none
│ _FillValue: -999.0
│ valid_min: -2.0
│ valid_max: 2.176
│ scale_factor: 1.0
│ add_offset: 0.0
├── DataTree('Cloud_Optical_Thickness_Log10_Total')
│ Dimensions: (longitude: 360, latitude: 180)
│ Dimensions without coordinates: longitude, latitude
│ Data variables:
│ Mean (longitude, latitude) float64 ...
│ Standard_Deviation (longitude, latitude) float64 ...
│ Sum (longitude, latitude) float64 ...
│ Pixel_Counts (longitude, latitude) float64 ...
│ Sum_Squares (longitude, latitude) float64 ...
│ Attributes:
│ long_name: Cloud Optical Thickness Log10 for Combined (LiquidWater+Ic...
│ units: none
│ _FillValue: -999.0
│ valid_min: -2.0
│ valid_max: 2.176
│ scale_factor: 1.0
│ add_offset: 0.0
├── DataTree('Cloud_Particle_Size_Liquid')
│ Dimensions: (longitude: 360, latitude: 180)
│ Dimensions without coordinates: longitude, latitude
│ Data variables:
│ Mean (longitude, latitude) float64 ...
│ Standard_Deviation (longitude, latitude) float64 ...
│ Sum (longitude, latitude) float64 ...
│ Pixel_Counts (longitude, latitude) float64 ...
│ Sum_Squares (longitude, latitude) float64 ...
│ Attributes:
│ long_name: Cloud Effective Radius for Liquid Water Clouds (3.7 micron...
│ units: microns
│ _FillValue: -999.0
│ valid_min: 4.0
│ valid_max: 30.0
│ scale_factor: 1.0
│ add_offset: 0.0
├── DataTree('Cloud_Particle_Size_Ice')
│ Dimensions: (longitude: 360, latitude: 180)
│ Dimensions without coordinates: longitude, latitude
│ Data variables:
│ Mean (longitude, latitude) float64 ...
│ Standard_Deviation (longitude, latitude) float64 ...
│ Sum (longitude, latitude) float64 ...
│ Pixel_Counts (longitude, latitude) float64 ...
│ Sum_Squares (longitude, latitude) float64 ...
│ Attributes:
│ long_name: Cloud Effective Radius for Ice Clouds (3.7 micron Retrieva...
│ units: microns
│ _FillValue: -999.0
│ valid_min: 5.0
│ valid_max: 60.0
│ scale_factor: 1.0
│ add_offset: 0.0
├── DataTree('Cloud_Particle_Size_PCL_Liquid')
│ Dimensions: (longitude: 360, latitude: 180)
│ Dimensions without coordinates: longitude, latitude
│ Data variables:
│ Mean (longitude, latitude) float64 ...
│ Standard_Deviation (longitude, latitude) float64 ...
│ Sum (longitude, latitude) float64 ...
│ Pixel_Counts (longitude, latitude) float64 ...
│ Sum_Squares (longitude, latitude) float64 ...
│ Attributes:
│ long_name: Cloud Effective Radius for Liquid Water Clouds (3.7 micron...
│ units: microns
│ _FillValue: -999.0
│ valid_min: 4.0
│ valid_max: 30.0
│ scale_factor: 1.0
│ add_offset: 0.0
├── DataTree('Cloud_Particle_Size_PCL_Ice')
│ Dimensions: (longitude: 360, latitude: 180)
│ Dimensions without coordinates: longitude, latitude
│ Data variables:
│ Mean (longitude, latitude) float64 ...
│ Standard_Deviation (longitude, latitude) float64 ...
│ Sum (longitude, latitude) float64 ...
│ Pixel_Counts (longitude, latitude) float64 ...
│ Sum_Squares (longitude, latitude) float64 ...
│ Attributes:
│ long_name: Cloud Effective Radius for Ice Clouds (3.7 micron Retrieva...
│ units: microns
│ _FillValue: -999.0
│ valid_min: 5.0
│ valid_max: 60.0
│ scale_factor: 1.0
│ add_offset: 0.0
├── DataTree('Cloud_Water_Path_Liquid')
│ Dimensions: (longitude: 360, latitude: 180,
│ jhisto_cloud_water_path_liquid_7: 7,
│ jhisto_cloud_particle_size_liquid_6: 6)
│ Dimensions without coordinates: longitude, latitude,
│ jhisto_cloud_water_path_liquid_7,
│ jhisto_cloud_particle_size_liquid_6
│ Data variables:
│ Mean (longitude, latitude) float64 ...
│ Standard_Deviation (longitude, latitude) float64 ...
│ Sum (longitude, latitude) float64 ...
│ Pixel_Counts (longitude, latitude) float64 ...
│ Sum_Squares (longitude, latitude) float64 ...
│ JHisto_vs_Cloud_Particle_Size_Liquid (longitude, latitude, jhisto_cloud_water_path_liquid_7, jhisto_cloud_particle_size_liquid_6) float64 ...
│ Attributes:
│ long_name: Cloud Water Path for Liquid Water Clouds (3.7 micron Retri...
│ units: g/m^2
│ _FillValue: -999.0
│ valid_min: 0.0
│ valid_max: 3000.0
│ scale_factor: 1.0
│ add_offset: 0.0
├── DataTree('Cloud_Water_Path_Ice')
│ Dimensions: (longitude: 360, latitude: 180,
│ jhisto_cloud_water_path_ice_7: 7,
│ jhisto_cloud_particle_size_ice_6: 6)
│ Dimensions without coordinates: longitude, latitude,
│ jhisto_cloud_water_path_ice_7,
│ jhisto_cloud_particle_size_ice_6
│ Data variables:
│ Mean (longitude, latitude) float64 ...
│ Standard_Deviation (longitude, latitude) float64 ...
│ Sum (longitude, latitude) float64 ...
│ Pixel_Counts (longitude, latitude) float64 ...
│ Sum_Squares (longitude, latitude) float64 ...
│ JHisto_vs_Cloud_Particle_Size_Ice (longitude, latitude, jhisto_cloud_water_path_ice_7, jhisto_cloud_particle_size_ice_6) float64 ...
│ Attributes:
│ long_name: Cloud Water Path for Ice Clouds (3.7 micron Retrieval for ...
│ units: g/m^2
│ _FillValue: -999.0
│ valid_min: 0.0
│ valid_max: 6000.0
│ scale_factor: 1.0
│ add_offset: 0.0
├── DataTree('Cloud_Water_Path_PCL_Liquid')
│ Dimensions: (longitude: 360, latitude: 180,
│ jhisto_cloud_water_path_pcl_liquid_7: 7,
│ jhisto_cloud_particle_size_pcl_liquid_6: 6)
│ Dimensions without coordinates: longitude, latitude,
│ jhisto_cloud_water_path_pcl_liquid_7,
│ jhisto_cloud_particle_size_pcl_liquid_6
│ Data variables:
│ Mean (longitude, latitude) float64 ...
│ Standard_Deviation (longitude, latitude) float64 ...
│ Sum (longitude, latitude) float64 ...
│ Pixel_Counts (longitude, latitude) float64 ...
│ Sum_Squares (longitude, latitude) float64 ...
│ JHisto_vs_Cloud_Particle_Size_PCL_Liquid (longitude, latitude, jhisto_cloud_water_path_pcl_liquid_7, jhisto_cloud_particle_size_pcl_liquid_6) float64 ...
│ Attributes:
│ long_name: Cloud Water Path for Liquid Water Clouds (3.7 micron Retri...
│ units: g/m^2
│ _FillValue: -999.0
│ valid_min: 0.0
│ valid_max: 3000.0
│ scale_factor: 1.0
│ add_offset: 0.0
├── DataTree('Cloud_Water_Path_PCL_Ice')
│ Dimensions: (longitude: 360, latitude: 180,
│ jhisto_cloud_water_path_pcl_ice_7: 7,
│ jhisto_cloud_particle_size_pcl_ice_6: 6)
│ Dimensions without coordinates: longitude, latitude,
│ jhisto_cloud_water_path_pcl_ice_7,
│ jhisto_cloud_particle_size_pcl_ice_6
│ Data variables:
│ Mean (longitude, latitude) float64 ...
│ Standard_Deviation (longitude, latitude) float64 ...
│ Sum (longitude, latitude) float64 ...
│ Pixel_Counts (longitude, latitude) float64 ...
│ Sum_Squares (longitude, latitude) float64 ...
│ JHisto_vs_Cloud_Particle_Size_PCL_Ice (longitude, latitude, jhisto_cloud_water_path_pcl_ice_7, jhisto_cloud_particle_size_pcl_ice_6) float64 ...
│ Attributes:
│ long_name: Cloud Water Path for Ice Clouds (3.7 micron Retrieval for ...
│ units: g/m^2
│ _FillValue: -999.0
│ valid_min: 0.0
│ valid_max: 6000.0
│ scale_factor: 1.0
│ add_offset: 0.0
├── DataTree('Cloud_Retrieval_Fraction_Liquid')
│ Dimensions: (longitude: 360, latitude: 180)
│ Dimensions without coordinates: longitude, latitude
│ Data variables:
│ Mean (longitude, latitude) float64 ...
│ Standard_Deviation (longitude, latitude) float64 ...
│ Sum (longitude, latitude) float64 ...
│ Pixel_Counts (longitude, latitude) float64 ...
│ Sum_Squares (longitude, latitude) float64 ...
│ Attributes:
│ long_name: Cloud Optical Properties Retrieval Fraction (Liquid Water ...
│ units: none
│ _FillValue: -999.0
│ valid_min: 0.0
│ valid_max: 1.0
│ scale_factor: 1.0
│ add_offset: 0.0
├── DataTree('Cloud_Retrieval_Fraction_Ice')
│ Dimensions: (longitude: 360, latitude: 180)
│ Dimensions without coordinates: longitude, latitude
│ Data variables:
│ Mean (longitude, latitude) float64 ...
│ Standard_Deviation (longitude, latitude) float64 ...
│ Sum (longitude, latitude) float64 ...
│ Pixel_Counts (longitude, latitude) float64 ...
│ Sum_Squares (longitude, latitude) float64 ...
│ Attributes:
│ long_name: Cloud Optical Properties Retrieval Fraction (Ice Clouds)
│ units: none
│ _FillValue: -999.0
│ valid_min: 0.0
│ valid_max: 1.0
│ scale_factor: 1.0
│ add_offset: 0.0
├── DataTree('Cloud_Retrieval_Fraction_Total')
│ Dimensions: (longitude: 360, latitude: 180)
│ Dimensions without coordinates: longitude, latitude
│ Data variables:
│ Mean (longitude, latitude) float64 ...
│ Standard_Deviation (longitude, latitude) float64 ...
│ Sum (longitude, latitude) float64 ...
│ Pixel_Counts (longitude, latitude) float64 ...
│ Sum_Squares (longitude, latitude) float64 ...
│ Attributes:
│ long_name: Cloud Optical Properties Retrieval Fraction (Combined (Liq...
│ units: none
│ _FillValue: -999.0
│ valid_min: 0.0
│ valid_max: 1.0
│ scale_factor: 1.0
│ add_offset: 0.0
├── DataTree('Cloud_Retrieval_Fraction_PCL_Liquid')
│ Dimensions: (longitude: 360, latitude: 180)
│ Dimensions without coordinates: longitude, latitude
│ Data variables:
│ Mean (longitude, latitude) float64 ...
│ Standard_Deviation (longitude, latitude) float64 ...
│ Sum (longitude, latitude) float64 ...
│ Pixel_Counts (longitude, latitude) float64 ...
│ Sum_Squares (longitude, latitude) float64 ...
│ Attributes:
│ long_name: Cloud Optical Properties Retrieval Fraction (Liquid Water ...
│ units: none
│ _FillValue: -999.0
│ valid_min: 0.0
│ valid_max: 1.0
│ scale_factor: 1.0
│ add_offset: 0.0
├── DataTree('Cloud_Retrieval_Fraction_PCL_Ice')
│ Dimensions: (longitude: 360, latitude: 180)
│ Dimensions without coordinates: longitude, latitude
│ Data variables:
│ Mean (longitude, latitude) float64 ...
│ Standard_Deviation (longitude, latitude) float64 ...
│ Sum (longitude, latitude) float64 ...
│ Pixel_Counts (longitude, latitude) float64 ...
│ Sum_Squares (longitude, latitude) float64 ...
│ Attributes:
│ long_name: Cloud Optical Properties Retrieval Fraction (Ice Clouds) f...
│ units: none
│ _FillValue: -999.0
│ valid_min: 0.0
│ valid_max: 1.0
│ scale_factor: 1.0
│ add_offset: 0.0
└── DataTree('Cloud_Retrieval_Fraction_PCL_Total')
Dimensions: (longitude: 360, latitude: 180)
Dimensions without coordinates: longitude, latitude
Data variables:
Mean (longitude, latitude) float64 ...
Standard_Deviation (longitude, latitude) float64 ...
Sum (longitude, latitude) float64 ...
Pixel_Counts (longitude, latitude) float64 ...
Sum_Squares (longitude, latitude) float64 ...
Attributes:
long_name: Cloud Optical Properties Retrieval Fraction (Combined Clou...
units: none
_FillValue: -999.0
valid_min: 0.0
valid_max: 1.0
scale_factor: 1.0
add_offset: 0.0
```
|
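To make the layout problem concrete, here is a minimal sketch in plain Python. Nested dicts stand in for the tree (they are not datatree or xarray objects), and `attach_root_coords` is a hypothetical helper name. It copies the root-level `latitude`/`longitude` coordinates onto every group whose dimensions reference them, which is roughly what would have to happen before a group can stand alone as a dataset.

```python
# Plain-dict stand-in for the file layout shown above (not real datatree objects).
root = {
    "coords": {
        "latitude": [-89.5 + i for i in range(180)],
        "longitude": [-179.5 + i for i in range(360)],
    },
    "groups": {
        "Solar_Zenith": {"dims": ("longitude", "latitude"), "coords": {}},
        "Cloud_Optical_Thickness_Total": {
            "dims": (
                "longitude",
                "latitude",
                "jhisto_cloud_optical_thickness_total_7",
                "jhisto_cloud_top_pressure_7",
            ),
            "coords": {},
        },
    },
}


def attach_root_coords(tree):
    """Return {group_name: group}, where each group carries the root
    coordinates for the dimensions it actually uses (hypothetical helper)."""
    out = {}
    for name, group in tree["groups"].items():
        inherited = {
            dim: tree["coords"][dim] for dim in group["dims"] if dim in tree["coords"]
        }
        # Build a new group dict so the input tree is left untouched.
        out[name] = {**group, "coords": {**inherited, **group["coords"]}}
    return out


standalone = attach_root_coords(root)
```

The histogram dimensions get no coordinates here because the root node does not define any for them; whether bin edges are available elsewhere in the files is not something this sketch assumes.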
@jbusecke I previously did a deep dive on this dataset using the pre-beam code, and ultimately came up with the code at the bottom of pangeo-forge/staged-recipes#125 (comment) as a semi-workable solution, which as you'll see creates a zarr store for each group. In Beam, there should be a better way of doing this, which may or may not benefit from Datatree. |
Update for @TomNicholas: I initially thought all groups have the same dimensions and was wondering if we can brute force them into a single dataset. The broader question here is how we deal with datatrees/groups in pangeo-forge I guess, and I thought this would intersect your interest at the moment? But back to the discussion of this dataset: I am wondering if this could/should be compressed to a dataset instead of a tree? |
If so, we could do something like:
```python
class OpenWithDatatree(beam.PTransform):
    ...

class DatatreeToDataset(beam.PTransform):
    def expand(self, pcoll: PCollection[Datatree]):
        # combine the data tree nodes into a single dataset here
        ds = ...
        return ds

recipe = (
    ...
    | OpenURLWithFSSpec()
    | OpenWithDatatree()
    | DatatreeToDataset()
    | StoreToZarr()
)
```
If not, I'd say just make one zarr store per group. |
Ughhh, I now realize that there is more history to this. Should have looked before diving in. Sorry about that.
But that is not strictly necessary, right? Is it worth thinking about a datatree->nested zarr pipeline? Maybe that is in the end overkill. It might however raise some interesting edge cases for datatree (can we do a |
If you wanted to add support for datatree to pangeo-forge, I would start with simple unambiguous io functions and only do combining of datatree objects dataset-by-dataset.
This kind of operation is not yet implemented in datatree because it's too ambiguous as written. |
Probably the most viable method for now. Thinking about how to achieve that. I guess we have two choices here:

- Repeated read and filter by group name:
```python
class OpenWithDatatree(beam.PTransform):
    ...

@dataclass
class DatatreeGroupToDataset(beam.PTransform):
    """:param var: ...."""

    var: str

    def expand(self, pcoll: PCollection[Datatree]):
        # select a single group from the data tree as a dataset here
        ds = select_single_group_from_datatree(var=self.var)
        return ds

recipe_a = (
    ...
    | OpenURLWithFSSpec()
    | OpenWithDatatree()
    | DatatreeGroupToDataset(var=a)
    | StoreToZarr()
)
recipe_b = (
    ...
    | OpenURLWithFSSpec()
    | OpenWithDatatree()
    | DatatreeGroupToDataset(var=b)
    | StoreToZarr()
)
```
Seems like a pain in the butt to maintain, but it might be more straightforward to implement (we could use the dictobj work from the CMIP6 feedstock to loop over group names and generate a recipe dict?).

- Emit multiple datasets from the datatree and group the results before storing:
```python
class OpenWithDatatree(beam.PTransform):
    ...

class SplitGroupsToDatasets(beam.PTransform):
    def expand(self, pcoll: PCollection[Datatree]):
        groups = split_dt()  # this would be a list of ('var', ds_var) tuples maybe?
        return groups  # Not sure how to properly emit multiple outputs per input here

recipe = (
    ...
    | OpenURLWithFSSpec()
    | OpenWithDatatree()
    | SplitGroupsToDatasets()
    | GroupByVar()  # this has to group all datasets that belong to each group/store (there will be multiple time steps).
    | StoreToZarr(target_store='somehow generated from the grouped keys?')
)
```
|
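A rough sketch of the second choice, written in plain Python rather than Beam so the data flow is easy to follow. The dicts and strings are placeholders for datatrees and datasets, `split_groups` and `group_by_var` are hypothetical helper names, and the `<group>.zarr` target naming is an assumption. In a real pipeline, a generator like `split_groups` would typically be wrapped in `beam.FlatMap` (one input, many outputs) and the grouping done with `beam.GroupByKey`.

```python
from collections import defaultdict

# Each "file" is a plain dict {group_name: dataset}; the string datasets are
# placeholders for the xarray datasets a real pipeline would carry.
files = [
    {"Solar_Zenith": "sz_t0", "Cloud_Top_Pressure": "ctp_t0"},
    {"Solar_Zenith": "sz_t1", "Cloud_Top_Pressure": "ctp_t1"},
]


def split_groups(tree):
    """Yield one (group_name, dataset) pair per group (hypothetical helper)."""
    for name, ds in tree.items():
        yield name, ds


def group_by_var(pairs):
    """Collect all datasets that share a group name, preserving input order."""
    grouped = defaultdict(list)
    for name, ds in pairs:
        grouped[name].append(ds)
    return dict(grouped)


# Flatten every file into keyed pairs, then regroup by key.
pairs = [pair for tree in files for pair in split_groups(tree)]
per_store = group_by_var(pairs)

# One target per group, derived from the key (the naming scheme is an assumption).
targets = {name: f"{name}.zarr" for name in per_store}
```

Each entry of `per_store` then holds the time steps that belong in one store, which is the piece the `GroupByVar()` placeholder above would have to provide.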
Yup that makes sense in general. I guess we can add 'time slices of identical tree structures' to the list of subclasses where these operations would actually be non-ambiguous because they have certain properties (similarly to the 'hollow' CMIP6 trees)...but that is a tangent. I think that at the moment we do not really need any new features from datatree to achieve a workable solution. |
Adapting your first option a bit more concisely, I would suggest:
```python
@dataclass
class ModisCospRecipe(beam.PTransform):
    var: str

    def expand(self, pattern: PCollection):
        return (
            pattern
            | OpenURLWithFSSpec()
            | OpenWithDatatree()
            | DatatreeGroupToDataset(var=self.var)
            | StoreToZarr()
        )

pattern = ...  # same pattern for all recipes
recipe_a = beam.Create(pattern.items()) | ModisCospRecipe(var="var_a")
recipe_b = beam.Create(pattern.items()) | ModisCospRecipe(var="var_b")
```
|
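With one recipe per group, writing them out by hand gets tedious for the roughly thirty groups in these files. A minimal sketch of generating them in a loop, assuming a hypothetical `RecipeSpec` stand-in (not a pangeo-forge class) and made-up recipe ids and target names:

```python
from dataclasses import dataclass


@dataclass
class RecipeSpec:
    """Stand-in for one per-group recipe; not a pangeo-forge class."""

    var: str
    target: str


# A few group names taken from the tree printed earlier in the thread.
group_names = ["Solar_Zenith", "Cloud_Top_Pressure", "Cloud_Mask_Fraction"]

# One entry per group; the recipe id and target naming are assumptions.
recipes = {
    f"modis-cosp-{name.lower()}": RecipeSpec(var=name, target=f"{name}.zarr")
    for name in group_names
}
```

In the real feedstock, the dict values would presumably be the composed Beam transforms rather than a small dataclass.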