
wrong calculation of covariance matrix in function StatsAPI.vcov(fit::LsqFitResult) #255

Open
donovaly opened this issue Nov 12, 2024 · 18 comments · May be fixed by #256

Comments

@donovaly

donovaly commented Nov 12, 2024

I am a beginner with Julia and tried it out by fitting data.
I got unexpected results for vcov(fit) and had a look at the source code.

There:

function StatsAPI.vcov(fit::LsqFitResult)

I noticed 2 issues:

  1. if one has no weights (errors of the values to be fitted), the covariance matrix is calculated using a QR decomposition of the Jacobian, but when one has weights, no QR decomposition is used.
    This is strange, because both methods deliver the same result and the QR decomposition might be more numerically stable.
  2. the MSE is ignored when one has weights. I searched the literature and cannot find why one should omit the MSE. MATLAB also uses the MSE:
    https://www.mathworks.com/help/stats/nlinfit.html#btk7ign-CovB

The second point is my main problem, because the errors of the fitted model parameters were larger than expected.
I think it is a bug to omit the MSE. When there are weights, the MSE already includes them, meaning the MSE does not become 1 just because weights are present. It therefore cannot simply be omitted.
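For illustration, here is a rough sketch of the two code paths as I read them (my paraphrase, not the actual LsqFit source; J is the Jacobian at the optimum, with the weights already folded into it in the weighted case):

using LinearAlgebra

function covar_unweighted(J, mse)
    # QR-based path: inv(J'J) * MSE, computed via the R factor
    Q, R = qr(J)
    Rinv = inv(R)
    return Rinv * Rinv' * mse
end

# weighted path: plain normal equations, MSE omitted
covar_weighted_current(J) = inv(J' * J)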

In the tutorial I see it stated correctly:
https://julianlsolvers.github.io/LsqFit.jl/latest/tutorial/#Weighted-Least-Squares-1
Cov(γ*) = σ² [J′ W J]⁻¹
where σ² is the MSE.
There you also write:

the covariance of the estimated parameter is calculated as covar = inv(J'*fit.wt*J)

But this is incorrect: the dimensions won't fit, and I get a DimensionMismatch error. Looking at the source code, the calculation is actually
covar = inv(J' * J)
so the weights are ignored.
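A minimal sketch of the mismatch (J and wt are hypothetical stand-ins):

using LinearAlgebra

J  = rand(5, 2)                     # 5 data points, 2 parameters
wt = rand(5)                        # weight vector
# inv(J' * wt * J)                  # throws DimensionMismatch: J' * wt is a 2-vector
covar = inv(J' * Diagonal(wt) * J)  # presumably what the tutorial formula means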

I use Julia 1.11.1 and LsqFit v0.15.0

p.s. As a newbie I might be reporting the error in the wrong place; if so, please point me to the correct one.

p.s. In the LsqFit docs:
https://julianlsolvers.github.io/LsqFit.jl/latest/tutorial
I see the command estimate_var, but it is marked as deprecated.

@donovaly
Author

Side note: I noticed that this line

r = fit.resid

can most probably be removed, since it is not used.

@donovaly
Author

I just tested with the KDE program LabPlot:
https://labplot.kde.org/
and their covariance matrix is the one I get when I use the MSE.
So they compute

Q, R = qr(J)                      # QR decomposition of the Jacobian
Rinv = inv(R)
covar = Rinv * Rinv' * mse(fit)   # = inv(J'J) * MSE

also for weighted fits.

@pkofod
Member

pkofod commented Nov 14, 2024

Thank you, this is certainly the right place to report the issue, and thank you for taking the time.

I am the maintainer of this package and I also developed some of it. Most of it is something I moved out of Optim because I didn't think it was good to have all the statistics as part of Optim. I now think the LM method should maybe have stayed, but I am digressing.

I think you may be right. I also think we have had this discussion many times before, in what are now closed issues. I will have to go back and consult them. I think there are some nuances here that depend on assumptions, such as where the weights are put: are you weighting the residuals, or are you weighting the squared residuals? That matters in the end. So I am open to this change, but I want to make sure that we have the references and the math in order, because as I said, I think this discussion has been had many times before with diverging opinions, and I honestly never really use NLS myself :)
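To make the distinction concrete, toy numbers (purely illustrative):

r = [0.1, -0.3, 0.2]   # residuals
w = [4.0, 1.0, 9.0]    # weights
sum(w .* r .^ 2)       # weights applied to the squared residuals
sum((w .* r) .^ 2)     # weights applied to the residuals themselves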

@donovaly
Author

donovaly commented Nov 14, 2024

Thanks for the reply. However, I am a bit confused. I don't know of a package named Optim. I just googled how to perform an NLSF with Julia, found LsqFit, installed it, and gave it a try. As a physicist I have to perform NLSFs all the time.

I hope I made clear where I see the bug: when one has weights (meaning data errors), LsqFit does not take the MSE into account. If it did, it would deliver the same results I get with other programs.

To reproduce, here is my test data

xVals = [1, 2, 4, 5, 8]
yVals = [3, 4, 6, 11, 20]
yError = (0.05 .* yVals)

this is the fit model:
f(x) = p1 * exp(p2*x)

Here is a Julia script to reproduce:
Goodness-of-fits.zip
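For convenience, a minimal sketch of the setup with LsqFit (the starting values [1.0, 0.5] are my guess here; the attached script may differ):

using LsqFit

xVals  = [1.0, 2.0, 4.0, 5.0, 8.0]
yVals  = [3.0, 4.0, 6.0, 11.0, 20.0]
yError = 0.05 .* yVals
wt     = 1 ./ yError .^ 2            # the weighting used in my script

model(x, p) = p[1] .* exp.(p[2] .* x)
fit = curve_fit(model, xVals, yVals, wt, [1.0, 0.5])
vcov(fit)                            # the covariance matrix in question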

@pkofod
Member

pkofod commented Nov 14, 2024

Yes, I understand. My imperfect memory just told me that it was set up that way on purpose; I just can't remember the details :) So I have to read up on it. I remember it as follows: the weights are included in the residuals, so they carry over to the Jacobian; they are thus not ignored, just incorporated into J. But I have to check.

@pkofod
Member

pkofod commented Nov 14, 2024

Okay, I looked at your code example. One thing that does strike me is that you input the weights as inverse variances, but I believe they should be inverse standard deviations. If you input a matrix instead, it is interpreted as a variance-covariance matrix. So: vector weights -> standard-deviation interpretation, and matrix weights -> variance-covariance interpretation.
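In code, the two call forms would look like this (a sketch under the interpretation stated above, reusing the model and data from the previous comment; the docs are the authoritative reference):

using LsqFit, LinearAlgebra

model(x, p) = p[1] .* exp.(p[2] .* x)
xVals  = [1.0, 2.0, 4.0, 5.0, 8.0]
yVals  = [3.0, 4.0, 6.0, 11.0, 20.0]
yError = 0.05 .* yVals

wt_vec = 1 ./ yError                        # vector form of the weights
wt_mat = Matrix(Diagonal(1 ./ yError.^2))   # matrix form of the weights
fit_v = curve_fit(model, xVals, yVals, wt_vec, [1.0, 0.5])
fit_m = curve_fit(model, xVals, yVals, wt_mat, [1.0, 0.5])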

@pkofod
Member

pkofod commented Nov 14, 2024

Some past discussion: #69 #70 #85 #103 #158

@donovaly
Author

Okay, I looked at your code example. One thing that does strike me is that you input the weights as inverse variances, but I believe they should be inverse standard deviations

but I set
wt = @. 1/yError^2
So the inverse std deviation. This is the default weighting when the data is normally distributed, thus also the default in Origin and LabPlot.

If you input a matrix instead it is interpreted as a variance-covariance matrix.

But I want the variance-covariance matrix to be computed. I don't know it yet.

I just have data and know the error of the data and perform a fit. As result I need the errors of the found fit parameters. I think this is the most standard use case.

@donovaly
Author

Some past discussion: #69 #70 #85 #103 #158

from #85

Since the weights are already included in the function which we want to minimize, the correct covariance should be: inv(J' * J)

This statement is misleading. Sure, the Jacobian must contain the weighting, and LsqFit does this correctly. The point is why the MSE is not used.
Think about it:

  • you have MSE * inv(J' * J) for the case of no weights. Why should the MSE now, in the presence of weights, become 1, so that you can write 1 * inv(J' * J)?

You can give this a try:

  • change line 84 in my script from
    jacobianMatrix = Zygote.jacobian((params, x, error) -> getyValsWithParams(x, params)./error, fitParams, xVals, yError)[1]
    to
    jacobianMatrix = Zygote.jacobian((params, x) -> getyValsWithParams(x, params), fitParams, xVals)[1]
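A numeric sketch of this point (all quantities hypothetical):

using LinearAlgebra

J    = rand(5, 2)                 # Jacobian of the *weighted* residuals
wres = rand(5)                    # weighted residuals at the optimum
dof  = length(wres) - size(J, 2)  # degrees of freedom
mse  = sum(abs2, wres) / dof      # reduced chi-square; generally not 1
covar = inv(J' * J) * mse         # so the MSE factor cannot simply be dropped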

@donovaly
Author

In #103 (comment)
he says the same.

I think I made it clear:

  1. it is a bug to omit the MSE when there are weights; you can verify this by calculation
  2. there is no reason to compute the covariance via the QR decomposition in one case but not in the other
  3. the line r = fit.resid can be removed

I will make a PR with the fix as I think it should be; then you can give it to e.g. a mathematician.

@donovaly donovaly linked a pull request Nov 14, 2024 that will close this issue
@donovaly
Author

@gerlachs is one of the main LabPlot (https://github.com/KDE/labplot) developers; maybe he can give feedback, since my PR would change LsqFit so that it delivers the same output as LabPlot.

@donovaly
Author

OK, the proof that there is a bug and that my PR fixes it is simple: a non-weighted fit must give the same result as a weighted fit with all weights set to 1.0. (This was already mentioned here: #103 (comment))

Here is a small Julia script with the proof:
LsqTest.zip
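A sketch of the check (model and starting values are hypothetical stand-ins for the attached script):

using LsqFit, Test

model(x, p) = p[1] .* exp.(p[2] .* x)
x = [1.0, 2.0, 4.0, 5.0, 8.0]
y = [3.0, 4.0, 6.0, 11.0, 20.0]

fit_u = curve_fit(model, x, y, [1.0, 0.5])                   # no weights
fit_w = curve_fit(model, x, y, ones(length(y)), [1.0, 0.5])  # weights of 1.0
@test vcov(fit_u) ≈ vcov(fit_w)   # fails on LsqFit v0.15.0, as reported here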

@pkofod
Member

pkofod commented Nov 15, 2024

I do understand the issue (and thank you for the PR). What I was trying to say above is that, as far as I remember, the conclusion in the linked issues was that the behavior is intentional (you can argue about what you want the software to do, of course): if you tell me what the uncertainty is at each collected sample, then there is no variance component left to estimate; implicitly the variance is set to 1, and your weights change the variance (sigma_i) accordingly in the calculations. I suppose it has to do with the way you wish to use the weights.

Happy to hear from @gerlachs on this.

I am not opposed to your PR or your comments; I am just trying to explain what I seem to remember was the guiding principle way back when... and again, I am not a user of NLS, I just happened to become the owner of the code in 2017 :)

@pkofod
Member

pkofod commented Nov 15, 2024

but I set
wt = @. 1/yError^2
So the inverse std deviation. This is the default weighting when the data is normally distributed, thus also the default in Origin and LabPlot.

How is that the inverse standard deviation? What is yError? Is that not the standard deviation? Then this is the inverse variance. I do not suppose yError is the square root of the standard deviation?

Edit: and to clarify, my point is that if you have a series of observations and a model function and you estimate the parameters unweighted, then the MSE is indeed an estimate of the variance of the residual distribution. However, if you come to me and say "I know this variance" and we weight the residuals by the inverse of the standard deviation derived from it, then your transformed residuals should have variance 1, because what you have done is standardize your normal residuals by weighting.
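A quick simulation of what I mean (toy numbers):

using Statistics, Random

Random.seed!(0)
sigma = 0.5 .+ 2.0 .* rand(10_000)  # known per-point standard deviations
resid = sigma .* randn(10_000)      # residuals drawn with exactly that noise
var(resid ./ sigma)                 # ≈ 1.0: standardized residuals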

@donovaly
Author

donovaly commented Nov 16, 2024

We should focus on the bug. Weighting with a vector containing 1.0 as entries must deliver the same result as no weighting, because it is mathematically the same. In my last post I provided a script to check that this is not the case. So there is a clear bug.

My PR fixes the bug. Where do you see that my PR is wrong?

I also think there should be a test to ensure that, in the future, weighting with 1.0s delivers the same result as no weighting.

@pkofod
Member

pkofod commented Nov 17, 2024

We should focus on the bug. Weighting with a vector containing 1.0 as entries must deliver the same result as no weighting, because it is mathematically the same. In my last post I provided a script to check that this is not the case. So there is a clear bug.

My PR fixes the bug. Where do you see that my PR is wrong?

I appreciate the interest, and I appreciate the effort to fix what you argue is a bug, but I also have to tell you that to me you are coming off as slightly aggressive and not exactly cooperative. Do you have a lot of experience with open-source contributions?

If we can discuss this in a somewhat less imperative form ("We should focus on the bug."), then can you maybe reply to my comment above? I stated that if you input the variance/standard deviation as the true noise, do you then not accept that the estimation is different? If sigma_i is the noise term of the i'th observation, then the normalized observations have standard deviation/variance 1. Do you agree so far? If so, then there are only coefficients/parameters to estimate, because the error variance is fixed. Inputting "weights" of 1 is then not the same as no weights under this interpretation, because when we do not input weights at all, we do not assume that the variance/standard deviation is 1.

@donovaly
Author

donovaly commented Nov 17, 2024

Inputting "weights" of 1 is then not the same as no weights under this interpretation, because when we do not input weights at all, we do not assume that the variance/standard deviation is 1.

I cannot follow you, to be honest. When I calculate e.g. the reduced χ² (in LsqFit it is the mse() function) with errors, I weight. Mathematically it is:

χ²_red = Σ_i (observed_i − expected_i)² · weight_i / (degrees of freedom)

If you input 1's, I actually do not weight, because everything is multiplied by 1. That's all I am saying.

From theory, you are only allowed to use the weighting w = 1/σ² in χ²-statistics for normally distributed observation errors. However, the weighting is up to the use case. For the issue here it does not matter. What matters is that when one multiplies with 1's, there is effectively no weighting, according to the formula.

So can you please tell me why weighting with 1's should not give the same result as no weighting, given the formulas above for the MSE and the residuals?
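A toy check of this claim (numbers made up):

obs      = [3.0, 4.0, 6.0]
expected = [2.8, 4.3, 5.9]
dof      = 1
w        = ones(3)
sum((obs .- expected) .^ 2 .* w) / dof == sum((obs .- expected) .^ 2) / dof  # true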

Do you have a lot of experience with open-source contributions?

Yes I have, for 15 years, also as a maintainer, and also in EU-funded open-source projects. But neither my background nor yours should matter for an issue about math. I don't understand why you feel offended; I only wanted to steer the discussion toward the math.

@donovaly
Author

donovaly commented Dec 5, 2024

What is the status? The bug is real and severe, since people rely on these results for their publications.
Please take my example and check it with other programs like Origin, LabPlot, or whatever.

Please compare the results you get from other programs with the ones you get with my PR.
