Attributes incorrectly labeled with extract_dataset within a Dask client in a Jupyter Notebook #176

mccrayc · 2023-03-24T20:21:41Z

Setup Information

xscen version: 0.5.0
Python version: 3.10.8
JupyterLab version: 3.6.1

Description

A fairly specific issue: when data is extracted with xs.extract_dataset inside a Dask client in JupyterLab, the attributes of the resulting dataset are labeled "intake_esm_attrs:..." rather than "cat:...". Running the same code from a script produces the correct attribute names. Outside of a Dask client, extraction works as expected, both in a script and in a notebook.

Steps To Reproduce

Simple example. Attributes are correct for this version (e.g.: 'cat:id':'CMIP6_ScenarioMIP_CCCma_CanESM5_ssp370_r1i1p1f1_global'):

import xscen as xs

cat = '/tank/scenario/catalogues/simulation.json'
variables_and_freqs = {'tas': 'MS'}
other_search_criteria = {'source': 'CanESM5*', 'experiment': 'ssp370', 'member': 'r1i1p1f1'}
xr_combine_kwargs = {'coords': 'minimal',
                     'data_vars': 'minimal',
                     'compat': 'override'}

periods = [2028, 2029]

cat_ref = xs.search_data_catalogs(data_catalogs=[cat],
                                  variables_and_freqs=variables_and_freqs,
                                  other_search_criteria=other_search_criteria,
                                  periods=periods,
                                  allow_resampling=True,
                                  allow_conversion=True)

ds_dict = xs.extract_dataset(cat_ref['CMIP6_ScenarioMIP_CCCma_CanESM5_ssp370_r1i1p1f1_global'],
                             variables_and_freqs=variables_and_freqs,
                             xr_combine_kwargs=xr_combine_kwargs)

However, if xs.extract_dataset is called inside a Dask client, the attribute keys are incorrect (e.g., 'intake_esm_attrs:id': 'CMIP6_ScenarioMIP_CCCma_CanESM5_ssp370_r1i1p1f1_global'):

from dask.distributed import Client

with Client(n_workers=1, threads_per_worker=4, memory_limit='12GB') as client:
    ds_dict = xs.extract_dataset(cat_ref['CMIP6_ScenarioMIP_CCCma_CanESM5_ssp370_r1i1p1f1_global'],
                                 variables_and_freqs=variables_and_freqs,
                                 xr_combine_kwargs=xr_combine_kwargs)
    print(ds_dict['MS'].attrs)

Additional context

Workaround suggested by @aulemahal:

Insert client.run(lambda: xs.__version__) right after the "with Client(...) as client:" line, before the call to xs.extract_dataset.
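A minimal sketch of where the workaround call goes, reusing the Client settings from the reproduction steps. The helper name extract_with_warm_workers and the inner function _import_xscen are hypothetical, and the named function stands in for the suggested lambda on the assumption that what matters is getting xscen imported on each worker before extraction:

```python
def extract_with_warm_workers(cat_entry, variables_and_freqs, xr_combine_kwargs):
    """Run xs.extract_dataset inside a Dask client after touching xscen on every worker.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # Imported here so the helper can be defined without the packages installed.
    from dask.distributed import Client
    import xscen as xs

    def _import_xscen():
        # Executed on each worker; assumed to have the same effect as
        # the suggested client.run(lambda: xs.__version__).
        import xscen
        return xscen.__version__

    with Client(n_workers=1, threads_per_worker=4, memory_limit='12GB') as client:
        # Client.run executes the function on all workers, outside the task scheduler.
        client.run(_import_xscen)
        return xs.extract_dataset(cat_entry,
                                  variables_and_freqs=variables_and_freqs,
                                  xr_combine_kwargs=xr_combine_kwargs)
```

With the variables from the reproduction steps, it would be called as extract_with_warm_workers(cat_ref['CMIP6_ScenarioMIP_CCCma_CanESM5_ssp370_r1i1p1f1_global'], variables_and_freqs, xr_combine_kwargs).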

Contribution

I would be willing/able to open a Pull Request to address this bug.

aulemahal · 2023-03-24T20:30:37Z

https://stackoverflow.com/questions/75837897/dask-worker-has-different-imports-than-main-thread

aulemahal · 2023-03-24T20:33:14Z

This seems related to how the client workers are created and how the Dask functions are pickled and sent to them.

In a script, the same problem occurs if xscen is imported after the if __name__ == '__main__': guard. We usually don't do that, of course. In a notebook, however, all code effectively runs after the equivalent of that guard.

aulemahal · 2023-03-27T14:13:40Z

From the answer I got on StackOverflow, I don't see any easy xscen-level solution. The issue is that intake-esm relies on a global state variable (its options) inside a Dask Delayed function, whereas "Dask is generally aiming to be functional/stateless, so that each function call produces results based only on the arguments it is supplied".

Passing the options as an argument to the Delayed function would fix that. On our side, the Client.run hack seems to be good enough.
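The difference between reading global options and passing them as an argument can be illustrated with a small single-process simulation. This is only a sketch: OPTIONS, label_with_global and label_with_argument are hypothetical stand-ins, not intake-esm's actual code, and the "worker" is simulated by resetting the options to their defaults, on the assumption that a freshly started worker process has not run the code that changed them.

```python
# Hypothetical stand-in for a library-level options registry.
OPTIONS = {"attrs_prefix": "intake_esm_attrs"}  # assumed library default


def label_with_global(attrs):
    """Read the prefix from module-level state at call time."""
    prefix = OPTIONS["attrs_prefix"]
    return {f"{prefix}:{key}": value for key, value in attrs.items()}


def label_with_argument(attrs, prefix):
    """Receive the prefix explicitly: the result depends only on the arguments."""
    return {f"{prefix}:{key}": value for key, value in attrs.items()}


attrs = {"id": "CMIP6_ScenarioMIP_CCCma_CanESM5_ssp370_r1i1p1f1_global"}

# "Main process": the option has been changed (assumed to happen on xscen import).
OPTIONS["attrs_prefix"] = "cat"
main_result = label_with_global(attrs)

# "Worker process": simulated by restoring the default, as in a fresh interpreter.
OPTIONS["attrs_prefix"] = "intake_esm_attrs"
worker_global = label_with_global(attrs)                    # picks up the default prefix
worker_argument = label_with_argument(attrs, prefix="cat")  # prefix travels with the call

print(list(main_result), list(worker_global), list(worker_argument))
```

The function that takes the prefix as an argument gives the same keys regardless of the state of the process it runs in, which is the behaviour the quoted Dask guidance asks for.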

mccrayc added the bug label Mar 24, 2023

juliettelavoie mentioned this issue Jan 18, 2024 in "attrs_prefix not working in jupyter notebook with dask" #316 (closed)

juliettelavoie closed this as completed Jan 18, 2024

juliettelavoie reopened this Jan 18, 2024
