Amazon Sustainability Data Initiative ARCO Project #208

Open
sharkinsspatial opened this issue Oct 17, 2022 · 6 comments

@sharkinsspatial
Contributor

The Amazon Sustainability Data Initiative (ASDI) is funding work to expand the usability of datasets in the ASDI catalog. This work will involve several phases, one of which includes generating Analysis Ready Cloud Optimized (ARCO) formats of datasets currently available in archival formats.

To provide the best experience for end users of these ARCO formats, we hope to leverage the domain knowledge of researchers and engineers through open communication on staged-recipes about dataset-specific considerations and format structure. Many of the datasets available through the ASDI are regularly gridded and distributed in archival formats compatible with existing recipe classes or classes that are under development. For these datasets, we plan to:

  1. At a minimum, generate kerchunk reference indices as the canonical entrypoint for dataset usage.
  2. Generate a new Zarr archive if sufficient community need exists for a different chunking strategy, optimized for specific analysis tasks.
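
For readers unfamiliar with the first option, here is a minimal sketch of the idea behind a reference index. It is a toy illustration using only the Python standard library, not kerchunk's API. The layout is loosely modeled on kerchunk's version 1 reference format, and the file name `archive.bin`, the key names, and the helper `read_key` are all hypothetical.

```python
import json
import os
import tempfile

# Stand-in for an archival file: a header followed by two "chunks" of bytes.
# In practice this would be a NetCDF/HDF5/GRIB file in object storage.
header = b"HEADER--"
chunk_a = b"A" * 12
chunk_b = b"B" * 8

tmpdir = tempfile.mkdtemp()
archive_path = os.path.join(tmpdir, "archive.bin")  # hypothetical file name
with open(archive_path, "wb") as f:
    f.write(header + chunk_a + chunk_b)

# A reference index maps Zarr-style keys either to inline content (a string)
# or to [location, byte offset, byte length] inside the original file.
references = {
    "version": 1,
    "refs": {
        ".zgroup": json.dumps({"zarr_format": 2}),
        "temperature/0.0": [archive_path, len(header), len(chunk_a)],
        "temperature/0.1": [archive_path, len(header) + len(chunk_a), len(chunk_b)],
    },
}


def read_key(refs, key):
    """Resolve one key: return inline content or the referenced byte range."""
    entry = refs["refs"][key]
    if isinstance(entry, str):
        return entry.encode()
    path, offset, length = entry
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


# The index is plain JSON, so it can be stored separately from the data.
restored = json.loads(json.dumps(references))
first = read_key(restored, "temperature/0.0")
second = read_key(restored, "temperature/0.1")
```

The archival file is never rewritten. Only the small JSON index is generated, which is why this is the cheaper of the two options. A new Zarr archive, by contrast, copies the data into a different chunk layout.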

Below is an initial listing of datasets in the ASDI program that are under consideration for processing in pangeo-forge. We are soliciting community feedback on the prioritization of these datasets and recommendations on format structure. If an ARCO format for one of these datasets would be valuable in your work or you have previous experience with a dataset, please open a new proposed recipe issue referencing this issue in staged-recipes (if one does not already exist).

| Dataset | Manager | Issue/Feedstock |
| --- | --- | --- |
| CAFE60 reanalysis | CSIRO | |
| Coupled Model Intercomparison Project 6 | ESGF and Pangeo | feedstock |
| Multi-Scale Ultra High Resolution (MUR) Sea Surface Temperature (SST) | Farallon Institute | staged-recipes |
| HIRLAM Weather Model | Finnish Meteorological Institute | |
| SILAM Air Quality | Finnish Meteorological Institute | |
| ECMWF ERA5 Reanalysis | Intertrust | staged-recipes |
| NASA NEX | NASA | |
| CAM6 Data Assimilation Research Testbed (DART) Reanalysis: Cloud-Optimized Dataset | NCAR | |
| Community Earth System Model Large Ensemble (CESM LENS) | NCAR | |
| Community Earth System Model v2 Large Ensemble (CESM2 LENS) | NCAR | staged-recipes |
| NA-CORDEX - North American component of the Coordinated Regional Downscaling Experiment | NCAR | |
| Coupled Model Intercomparison Project Phase 5 (CMIP5) University of Wisconsin-Madison Probabilistic Downscaling Dataset | NOAA | |
| JMA Himawari-8 | NOAA | staged-recipes |
| NOAA Atmospheric Climate Data Records | NOAA | |
| NOAA Climate Forecast System (CFS) | NOAA | |
| NOAA Fundamental Climate Data Records (FCDR) | NOAA | |
| NOAA Geostationary Operational Environmental Satellites (GOES) 16 & 17 | NOAA | staged-recipes |
| NOAA Global Ensemble Forecast System (GEFS) | NOAA | |
| NOAA Global Ensemble Forecast System (GEFS) Re-forecast | NOAA | staged-recipes |
| NOAA Global Extratropical Surge and Tide Operational Forecast System (Global ESTOFS) | NOAA | |
| NOAA Global Forecast System (GFS) | NOAA | staged-recipes |
| NOAA Global Hydro Estimator (GHE) | NOAA | |
| NOAA Global Mosaic of Geostationary Satellite Imagery (GMGSI) | NOAA | |
| NOAA High-Resolution Rapid Refresh (HRRR) Model | NOAA | staged-recipes |
| NOAA National Digital Forecast Database (NDFD) | NOAA | |
| NOAA National Water Model Short-Range Forecast | NOAA | |
| NOAA North American Mesoscale Forecast System (NAM) | NOAA | |
| NOAA Oceanic Climate Data Records | NOAA | staged-recipes |
| NOAA Rapid Refresh (RAP) | NOAA | |
| NOAA Rapid Refresh Forecast System (RRFS) Ensemble [Prototype] | NOAA | |
| NOAA Terrestrial Climate Data Records | NOAA | |
| NOAA U.S. Climate Gridded Dataset (NClimGrid) | NOAA | |
| NOAA Unified Forecast System Subseasonal to Seasonal Prototypes | NOAA | |
| ARPA-E PERFORM Forecast data | NREL | |
| NREL National Solar Radiation Database | NREL | |
| NREL Wind Integration National Dataset | NREL | |
| Atmospheric Models from Météo-France | OpenMeteoData | |
| SILO climate data on AWS | Queensland Government | |
| CMIP6 GCMs downscaled using WRF | UCLA Center for Climate Science | |
| UK Met Office Atmospheric Deterministic and Probabilistic Forecasts | UK Met Office | |
| Downscaled Climate Data for Alaska | University of Alaska | |
| High Resolution Downscaled Climate Data for Southeast Alaska | University of Alaska | |
| Sea Surface Temperature Daily Analysis: European Space Agency Climate Change Initiative product version 2.1 | University of Reading | |

@rsignell-usgs
Contributor

@sharkinsspatial I do not see the National Water Model retrospective 1km gridded data in the list.
I linked the notebooks I used to process this here: #224 (comment)

@glizee-tech

Hello, I am looking for ECMWF ERA5 Reanalysis on a cloud solution for academic work. I need some specific variables that are not available on AWS S3 or on GCP. I see that you are considering processing it fully in pangeo-forge. Are you able to give a deadline when it will be available? Should I create my cloud solution myself which will be redundant with your future solution and totally contrary to what this project is all about but useful in the short term? I hope I'm writing in the right spot.

@rabernat
Contributor

> Are you able to give a deadline when it will be available?

No. ERA5 is extremely large and complex. Given the limited resources in this project, we cannot make any commitments to a timeline.

> Should I create my cloud solution myself which will be redundant with your future solution and totally contrary to what this project is all about but useful in the short term?

Yes, this is what we would recommend.

@glizee-tech

@rabernat Thank you for your quick reply. OK, so I will create my own cloud solution, but I will follow the progress of this amazing project.

@rsignell-usgs
Contributor

@glizee-tech I also needed to access some ERA5 data that was not on cloud, and gave a Pangeo Showcase talk on my approach last fall. Just in case it's useful!

@glizee-tech

@rsignell-usgs Yes indeed it's going to be very useful! Thank you very much
