Drift vs climate / interannual trends #9

Open
adele-morrison opened this issue Dec 8, 2022 · 5 comments

@adele-morrison
Collaborator

Can we learn from other model runs or the 1deg simulation how large the BGC drift might be compared to the climate change-driven trends over this cycle?

@hakaseh
Collaborator

hakaseh commented Dec 13, 2022

I can't attend the hackathon today, so I'll just share my progress here. I've been trying to compute annual-mean global averages over different vertical slices (i.e. the surface, the upper 100 m, and the entire water column), but so far the kernel keeps dying/restarting.
I just pushed my notebook: https://github.com/hrsdawson/ACCESS-WOMBAT_01deg_BGC_validation/blob/hakaseh/notebooks/drift.ipynb
Could someone have a look and tell me how it could be improved (e.g. a more efficient way of doing the computation)? @adele157 @hrsdawson

```python
vol1y = vol.groupby('time.year').mean().load()
```

The computation becomes very slow at the step above and takes up a lot of memory (>36 GB currently), so I guess the kernel dies because it exceeds my memory request (46 GB). Should I request more memory?

@hrsdawson
Owner

I'd try increasing the memory; 46 GB is quite small for analysing the 0.1 deg output. You can request up to 196 or 240 GB, depending on which queue you're using. Also, I found reading in the files manually (rather than using the cookbook) to be a bit faster. You can define a preprocessing function when doing this to limit the data that you read in from each file.

E.g. maybe something like this?

```python
from glob import glob
import xarray as xr

datadir = '/g/data/cj50/access-om2/raw-output/access-om2-01/01deg_jra55v140_iaf_cycle4/'
# Note: this doesn't grab the last 3 years of data (2019-2021)
files = sorted(glob(datadir + '/output*/ocean/oceanbgc-3d-fe-1-monthly-mean-3-sigfig-ym_*.nc'))

# Subset to the upper 100 m and take the horizontal mean as each file is read
def preproc(ds):
    return ds.sel(st_ocean=slice(0, 100)).mean(dim='xt_ocean').mean(dim='yt_ocean')

fe_mod = xr.open_mfdataset(files, preprocess=preproc).fe
```

Then you can read in the vertical cell grid size (dzt) to average vertically, and then use .groupby('time.year').mean(). Not sure how much more efficient this will be, but it might speed things up?
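A minimal sketch of that last step, assuming dzt has been read in and reduced to a matching upper-100 m profile over st_ocean (the names fe_mod and dzt follow the snippet above, and treating dzt as a single mean profile is an approximation):

```python
# Thickness-weighted vertical mean over the upper 100 m,
# then annual means from the monthly data.
fe_davg = (fe_mod * dzt).sum(dim='st_ocean') / dzt.sum(dim='st_ocean')
fe_annual = fe_davg.groupby('time.year').mean().load()
```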

@hakaseh
Collaborator

hakaseh commented Dec 13, 2022

Thanks @hrsdawson.
I'll try requesting more memory. I've been using the default OOD setup so far; maybe it's time for me to learn ARE.
I've never used a preprocess function, but it seems like a nice approach. I'll try it.

@hakaseh
Collaborator

hakaseh commented Dec 14, 2022

@hrsdawson do you have any tips on how to deal with an error like the one below?

KilledWorker: Attempted to run task ('open_dataset-05003a87aef12742998616e5de94f5c8dzt-a1fda4fdbac823dabecf2d807c69d5d6', 0, 0, 0, 0) on 3 different workers, but all those workers died while running it. The last worker that attempt to run the task was tcp://127.0.0.1:38303. Inspecting worker logs is often a good next step to diagnose what went wrong. For more information see https://distributed.dask.org/en/stable/killed.html.

My updated notebook is here:
https://github.com/hrsdawson/ACCESS-WOMBAT_01deg_BGC_validation/blob/hakaseh/notebooks/drift.ipynb

@hrsdawson
Owner

hrsdawson commented Dec 14, 2022

> @hrsdawson do you have any tips on how to deal with an error like the one below?
>
> KilledWorker: Attempted to run task ('open_dataset-05003a87aef12742998616e5de94f5c8dzt-a1fda4fdbac823dabecf2d807c69d5d6', 0, 0, 0, 0) on 3 different workers, but all those workers died while running it. The last worker that attempt to run the task was tcp://127.0.0.1:38303. Inspecting worker logs is often a good next step to diagnose what went wrong. For more information see https://distributed.dask.org/en/stable/killed.html.

I'm not sure about that error, sorry @hakaseh. Are you using ARE? I've often had trouble when taking a time mean in the preprocessing step, as xarray sometimes struggles to read and concatenate the resulting time coordinate, especially because each of the individual 3D datasets you're preprocessing won't have a whole year of data to average over. However, I see that you could read in dzt using that preprocessing step, so maybe that's not the problem. You could try without it and see if that helps? Or you may just need more memory for each of the workers.
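On the worker-memory side, a minimal sketch of how you might set that explicitly (the numbers are illustrative and would need to fit within the total memory of your ARE/OOD session):

```python
from dask.distributed import Client, LocalCluster

# Fewer workers, each with a larger memory limit, gives individual
# tasks more headroom before a worker gets killed.
cluster = LocalCluster(n_workers=4, threads_per_worker=1, memory_limit='24GB')
client = Client(cluster)
```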
