Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

create weights as dask array in test_deterministic.py #265

Closed
raybellwaves opened this issue Feb 10, 2021 · 9 comments · Fixed by #306
Closed

create weights as dask array in test_deterministic.py #265

raybellwaves opened this issue Feb 10, 2021 · 9 comments · Fixed by #306

Comments

@raybellwaves
Copy link
Member

I think the bottleneck for #221 Is that in a testing script we have something that adjusts the weights to fit the dimensions: https://github.com/xarray-contrib/xskillscore/blob/master/xskillscore/tests/test_deterministic.py#L72

This is called on a dask array https://github.com/xarray-contrib/xskillscore/blob/master/xskillscore/tests/test_deterministic.py#L161 which I believe causes computation.

We could either move to np before the call and cast to da after or think about rewriting it.

@aaronspring
Copy link
Collaborator

@raybellwaves
Copy link
Member Author

raybellwaves commented Feb 16, 2021

Xarray have some testing for if dask objects compute: pydata/xarray#4898 (comment)

from xarray.tests import raise_if_dask_computes
with raise_if_dask_computes():
    CODE

@raybellwaves
Copy link
Member Author

See #305

@raybellwaves
Copy link
Member Author

pytest -v xskillscore/tests/test_deterministic.py::test_distance_metrics_daskda_same_npda

@raybellwaves
Copy link
Member Author

raybellwaves commented May 6, 2021

To slim the test down

distance_metrics = [
    (me, _me),
]
distance_metrics_names = [
    "me",
]
AXES = (["time"])

@pytest.mark.parametrize("metrics", distance_metrics, ids=distance_metrics_names)
@pytest.mark.parametrize("dim", AXES)
@pytest.mark.parametrize("weight_bool", [True])
@pytest.mark.parametrize("skipna", [True])
@pytest.mark.parametrize("has_nan", [True])
def test_distance_metrics_daskda_same_npda(

@raybellwaves
Copy link
Member Author

raybellwaves commented May 6, 2021

import numpy as np
import xarray as xr

np.random.seed(42)

times = xr.cftime_range(start="2000", periods=12, freq="D")
lats = np.arange(4)
lons = np.arange(5)
members = np.arange(3)

# Setup variables
a = xr.DataArray(
    np.random.rand(len(times), len(lats), len(lons)),
    coords=[times, lats, lons],
    dims=["time", "lat", "lon"],
)
a_dask = a.chunk()
b = xr.DataArray(
    np.random.rand(len(members), len(times), len(lats), len(lons)),
    coords=[members, times, lats, lons],
    dims=["member", "time", "lat", "lon"],
).isel(member=0, drop=True)
b_dask = b.chunk()
weights_cos_lat = xr.ones_like(a) * np.abs(np.cos(a.lat))
weights_cos_lat_dask = weights_cos_lat.chunk()

a = a_dask.copy()
b = b_dask.copy()
weights = weights_cos_lat_dask.copy()

# Add a nan value
a = a.load()
a[0] = np.nan
a = a.chunk()

@dougiesquire
Copy link
Collaborator

Is this the same issue we talk about here? If so, I don't think the issue is that computation is triggered.

The issue is that you can't boolean index a numpy array using a dask array. This happens in xs.core.np_deterministic._match_nans() when a and b are dask arrays and weights is a numpy array (which is enforced in test_distance_metrics_xr_dask). So I think we actually need to add a fix into our code somewhere. I suggested two options:

  1. Add a catch into _match_nans that computes the indices if weights is in memory
  2. Enforce earlier on that a, b and weights are all the same array type. (I think Andrew preferred this option)

(My apologies if this is not what you're talking about - I'm on my phone so can't check very thoroughly right now)

@raybellwaves
Copy link
Member Author

raybellwaves commented May 6, 2021

Thanks @dougiesquire i think you are right. I haven't looked at the this for a few months so going through the steps again. As fair as I can tell the code is working but the values are not expected https://github.com/xarray-contrib/xskillscore/pull/305/checks?check_run_id=2517482958#step:7:5496

@dougiesquire
Copy link
Collaborator

Oh strange, that doesn't sound like the old issue then... It's likely I'm wrong and confusing things further. Bedtime here, but I can try take a closer look tomorrow (if you guys haven't solved it already 🙂)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants