VonMisesFisherMixture.log_likelihood() doesn't return a log likelihood value. What is it? #40

Open
mattroos opened this issue Jul 30, 2024 · 0 comments


mattroos commented Jul 30, 2024

I'd like to compute the AIC for a fitted model, which requires the value of the likelihood function at the estimated von Mises-Fisher parameters. But what does VonMisesFisherMixture.log_likelihood() actually return? It is an array of shape (n_clusters, n_samples), and its values appear to be probabilities (in [0, 1]) that a given sample belongs to a given cluster. They are not log-probabilities (those would all be ≤ 0). From this array, what is the correct way to compute the likelihood value needed for AIC? I think it is something like the code below, but I could be wrong, since I'm not yet certain what log_likelihood() returns.

import numpy as np
likelihood = vmf_soft.log_likelihood(x)  # shape (n_clusters, n_samples)
# Choose the cluster of highest probability for each sample, take its log, and sum across all samples.
log_likelihood = np.sum(np.log(np.max(likelihood, axis=0)))

This is based on equation 3.2 of the 2005 paper. The cluster weights may also need to be included; I'm not sure whether they're already incorporated into the values returned by log_likelihood().
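
For comparison, here is a minimal sketch of the standard mixture log-likelihood and the resulting AIC. It assumes you already have the per-component log densities log f(x_i | mu_h, kappa_h) in an array of shape (n_clusters, n_samples), plus the mixture weights. Whether log_likelihood() returns that array is exactly the open question here, so treat both inputs as assumptions. The function names are hypothetical and are not part of the library. Note that this sums over components inside the log (soft assignment) instead of taking the max over clusters as in the snippet above.

```python
import numpy as np


def mixture_log_likelihood(log_dens, weights):
    """Total log-likelihood sum_i log sum_h w_h * f_h(x_i).

    log_dens : (n_clusters, n_samples) per-component log densities (assumed input)
    weights  : (n_clusters,) mixture weights that sum to 1 (assumed input)
    """
    log_dens = np.asarray(log_dens, dtype=float)
    log_w = np.log(np.asarray(weights, dtype=float))[:, np.newaxis]
    joint = log_dens + log_w                      # log(w_h * f_h(x_i))
    m = joint.max(axis=0)                         # per-sample max for stability
    per_sample = m + np.log(np.exp(joint - m).sum(axis=0))  # log-sum-exp over clusters
    return per_sample.sum()


def vmf_mixture_aic(total_log_lik, n_clusters, n_features):
    """AIC = 2k - 2 log L for a mixture of von Mises-Fisher components.

    Free parameters per component: n_features - 1 for the unit-norm mean
    direction and 1 for the concentration, plus n_clusters - 1 free weights.
    """
    n_params = n_clusters * n_features + (n_clusters - 1)
    return 2.0 * n_params - 2.0 * total_log_lik
```

If the array returned by the model turns out to hold posterior membership probabilities instead of log densities, it cannot be plugged in here directly; the component densities would have to be evaluated separately from the fitted parameters.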

@jasonlaska, maybe you can help?
