Proposed Recipes for the Last Millennium Reanalysis, v2.x #142
Comments
@cisaacstern Ok! I think I've got another one in the works! One question that came up however is whether it would be best to move these data from their current residence on the NOAA FTP server to THREDDS, and whether that will introduce any new subtleties I should be aware of. When I ran this locally (with the FTP urls), it took about two hours.
@cisaacstern True to form, I think I might have a way to tackle the un-gridded variables, but that will have to wait for tomorrow :)

Up to you! FWIW, I don't think 2 hrs to cache data is necessarily that long. My intuition is that waiting 2 hrs to cache the data (which only has to happen once) is a smaller price to pay than moving things around on the NOAA side, but I don't know how easy it may (or may not) be to move to THREDDS.
Source Dataset
The Last Millennium Reanalysis (LMR) utilizes an ensemble methodology to assimilate paleoclimate data for the production of annually resolved climate field reconstructions of the Common Era. The data are available at NOAA but not (as far as we know) enabled for OPeNDAP access, much less cloud access. The PaleoCube project would like to make them available to paleoclimatologists to support several workflows in the cloud.
Gridded fields (sea-level pressure, surface air temperature, sea-surface temperature, precipitation, Palmer Drought Severity Index) have the format (time, MCrun, lat, lon), where time is the year, lat is the latitude index, lon is the longitude index, and MCrun is the Monte Carlo iteration index. There are in fact 20 LMR reconstructions contained in these arrays. They differ in the climate-model ensemble prior used for assimilation (random draws from the CCSM4 Last Millennium simulation) and in the proxies drawn randomly for the reconstruction (75% of all available proxies). All fields are anomalies from the 1951–1980 time mean.
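As a rough illustration, a single gridded file can be inspected with xarray. The file and variable names below are assumptions patterned on the 20CR-style naming conventions noted next, not confirmed paths:

```python
import xarray as xr

# Hypothetical file/variable names, patterned on the NOAA 20CR conventions.
ds = xr.open_dataset("air_MCruns_ensemble_mean_LMRv2.1.nc")

# Expected dimensions: (time, MCrun, lat, lon), where MCrun indexes
# the 20 Monte Carlo reconstruction iterations.
print(ds.dims)

# Fields are anomalies from the 1951-1980 time mean, so the average
# over that window should be close to zero.
print(ds["air"].sel(time=slice("1951", "1980")).mean("time"))
```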
File and variable naming conventions follow as closely as possible those for the NOAA 20th Century Reanalysis.
In addition, there are files with full (5000-member) ensembles for global mean surface temperature, northern and southern hemisphere temperature, and various climate indices (e.g. AMO, PDO, AO, NAO, NINO3, SOI).
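For instance, the full-ensemble index files could be summarized along these lines; the file name, variable name, and ensemble dimension name are all hypothetical, shown only to illustrate the shape of the data:

```python
import xarray as xr

# Hypothetical names for the GMST full-ensemble file, its variable,
# and its ensemble dimension.
gmst = xr.open_dataset("gmt_MCruns_ensemble_full_LMRv2.1.nc")["gmt"]

# Collapse the 5000-member ensemble to a median and a 95% credible interval.
median = gmst.median(dim="members")
bounds = gmst.quantile([0.025, 0.975], dim="members")
```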
Data from two versions (2.0 and 2.1) are provided, both described in Tardif et al. (2019). The two versions share a common methodology; the differences are related to the set of assimilated proxies.
Link to the website / online documentation for the data: https://www.ncei.noaa.gov/access/paleo-search/study/27850
The file format is netCDF
The source files are organized as follows: for each gridded field, 4 files (v2.0 and v2.1, each with the ensemble mean and the ensemble spread); for the indices, 8 files (4 files per LMR "flavor": GMST, NHMT, SHMT, and posterior indices).
How are the source files accessed: access protocol unknown (the discussion above used the NOAA FTP URLs), but netCDF files are available here: v2.0 files, v2.1 files. A sketch of a possible recipe follows this list.
Data are public, fully open.
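A minimal sketch of the recipe using pangeo-forge-recipes might look as follows. The URL template and variable keys are placeholders, not the actual NOAA paths; since each source file carries the full time series for one field, the files would be merged along a variable dimension rather than concatenated in time:

```python
from pangeo_forge_recipes.patterns import FilePattern, MergeDim
from pangeo_forge_recipes.recipes import XarrayZarrRecipe

# Hypothetical URL template -- substitute the real NOAA file locations.
def make_url(variable):
    return (
        "ftp://ftp.ncei.noaa.gov/path/to/"
        f"{variable}_MCruns_ensemble_mean_LMRv2.1.nc"
    )

# Placeholder variable keys; one source file per gridded field.
pattern = FilePattern(make_url, MergeDim("variable", ["air", "prate", "prmsl"]))

recipe = XarrayZarrRecipe(pattern, target_chunks={"time": 100})
```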
Transformation / Alignment / Merging
No transformation beyond loading into zarr. The .nc files can easily be loaded by xarray, so this step should not pose particular problems.
Output Dataset
zarr format, preferably parked in GCP US-central so it is easily accessible from 2i2c's LinkedEarth research hub.
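Once the store is built, it could be opened directly from the hub; the bucket path below is a placeholder:

```python
import xarray as xr

# Placeholder bucket path; reading "gs://" URLs requires gcsfs.
ds = xr.open_zarr("gs://some-bucket/lmr/v2.1.zarr")
```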