Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Result of halfwidth_ci_test may not correct. #413

Open
Tianjie-Wu opened this issue Jan 4, 2024 · 4 comments
Open

Result of halfwidth_ci_test may not correct. #413

Tianjie-Wu opened this issue Jan 4, 2024 · 4 comments

Comments

@Tianjie-Wu
Copy link

In document of xskillscore, result of halfwidth_ci_test is described as follow: difference in scores (score(f2) - score(f1))

However, I discovered that this is only partially right.
For example, when the metric is RMSE, this method returns the difference score(f2) - score(f1), but when the metric is ME, this function returns the difference score(f1) - score(f2).

Code Sample, a copy-pastable example if possible

length = 100
obs_1d = xarray.DataArray(
       numpy.random.rand(length),
       coords=[
           numpy.arange(length),
       ],
       dims=["time"],
       name='var'
   )
fct_1d = obs_1d.copy()
fct_1d.values = numpy.random.rand(length)

# create a worse forecast with high but different to perfect correlation and choose RMSE as the distance metric
fct_1d_worse = fct_1d.copy()
step = 3
fct_1d_worse[::step] = fct_1d[::step].values + 0.3

# half-with of the confidence interval at level alpha is larger than the RMSE differences,
# therefore not significant
metric = "rmse"
alpha = 0.05
significantly_different, diff, hwci = xskillscore.halfwidth_ci_test(
    fct_1d, fct_1d_worse, obs_1d, metric, time_dim="time", dim=[], alpha=alpha
)
print("RMSE DIFF:",diff.values)
print("RMSE HWCI:",hwci.values)
print(
    f"RMSEs significantly different at level {alpha} : {bool(significantly_different)}"
)

metric = "me"
alpha = 0.05
significantly_different, diff, hwci = xskillscore.halfwidth_ci_test(
    fct_1d, fct_1d_worse, obs_1d, metric, time_dim="time", dim=[], alpha=alpha
)
print("ME DIFF:",diff.values)
print("ME HWCI:",hwci.values)
print(
    f"MEs significantly different at level {alpha} : {bool(significantly_different)}"
)

print ( "RMSE 2,1: ",xskillscore.rmse(fct_1d_worse,obs_1d, dim=[]).values.mean(),xskillscore.rmse(fct_1d,obs_1d, dim=[]).values.mean() )
print ( "RMSE DIFF 2-1:", (xskillscore.rmse(fct_1d_worse,obs_1d, dim=[]).values-xskillscore.rmse(fct_1d,obs_1d, dim=[]).values).mean() )
print ( "ME 2,1: ",(fct_1d_worse-obs_1d).mean().values,(fct_1d-obs_1d).mean().values )
print ( "ME DIFF 2-1: ",((fct_1d_worse-obs_1d).values-(fct_1d-obs_1d).values).mean() )

print ("xskillscore version:",xskillscore.__version__)

the result is listed as follow:

RMSE DIFF: 0.002145380792967111
RMSE HWCI: 0.03084942672029623
RMSEs significantly different at level 0.05 : False
ME DIFF: -0.10199999999999998
ME HWCI: 0.027772820244038265
MEs significantly different at level 0.05 : True
RMSE 2,1:  0.35575854107274524 0.3536131602797781
RMSE DIFF 2-1: 0.0021453807929670793
ME 2,1:  0.02998835594288012 -0.07201164405711986
ME DIFF 2-1:  0.102
xskillscore version: 0.0.24
@aaronspring
Copy link
Collaborator

Hi @Tianjie-Wu, thanks for reporting this. Could you specify a bit more what you think should change? Just the documentation or do you think the computation is wrong somewhere?

@Tianjie-Wu
Copy link
Author

Hi @Tianjie-Wu, thanks for reporting this. Could you specify a bit more what you think should change? Just the documentation or do you think the computation is wrong somewhere?

Hello @aaronspring , thank you for your response.
I believe that just mentioning it in the manual should be sufficient, considering the problem is simple to solve (just multiply by -1).

@aaronspring
Copy link
Collaborator

Feel free to open a PR

@Tianjie-Wu
Copy link
Author

Feel free to open a PR

thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants