
Quadratic approximation (Laplace) #162

Open · zenon opened this issue Aug 12, 2020 · 11 comments

zenon commented Aug 12, 2020

Hi Chad,

while trying to understand "Statistical Rethinking" by Richard McElreath, I see that he uses quadratic approximation for the first half of the book.

As far as I understand, it approximates the posterior by finding its mode and then the quadratic curvature of the log density at that point, so the result is a Gaussian.
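For a one-dimensional picture of what I mean, here is my own sketch using Distributions, Optim, and ForwardDiff (nothing Soss-specific; Beta(7, 3) just stands in for a posterior):

using Distributions, Optim, ForwardDiff

post = Beta(7, 3)
ℓ(θ) = logpdf(post, θ)

res = optimize(θ -> -ℓ(θ), 0.0, 1.0)  # Brent's method on the interval
θ̂ = Optim.minimizer(res)              # mode ≈ 0.75

h = ForwardDiff.derivative(θ -> ForwardDiff.derivative(ℓ, θ), θ̂)  # ℓ''(θ̂), negative at the mode
approx = Normal(θ̂, sqrt(-1 / h))      # the quadratic (Gaussian) approximation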

He mentions in the lecture that this is in many cases a surprisingly good approximation, and takes much less computational effort than MCMC.

His R implementation of quap is here
https://github.com/rmcelreath/rethinking/blob/master/R/map-quap.r

Do you think something like this can be made for Soss too?

Sadly, I still don't feel qualified to do so. So, yes, this is a feature request.

Kind greetings, z.

cscherrer (Owner) commented

Hi @zenon, sorry for the delayed response.

From your description, this is what's often called a Laplace approximation. I'll change the title; hope that's ok.

I think we'd want the domain of the quadratic to be ℝⁿ, so we'd need to transform (xform in Soss). That's pretty easy.

Then we'll need to (rough sketch below):

  1. Find the posterior mode (MAP estimate)
  2. Find the Hessian, probably using Zygote.
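In transformed coordinates, that might look something like this. A sketch only: here ℓ stands for the log-posterior as a function on ℝⁿ (obtained via the xform composition) and n for its dimension; neither is real Soss API.

using Optim, Zygote, Distributions, LinearAlgebra

res = optimize(v -> -ℓ(v), zeros(n), BFGS())  # 1. posterior mode (MAP) in ℝⁿ
μ = Optim.minimizer(res)

H = Zygote.hessian(ℓ, μ)                      # 2. Hessian of the log density at the mode
Σ = Symmetric(inv(-H))                        # covariance = inverse of the negative Hessian
approx = MvNormal(μ, Σ)                       # the Laplace approximation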

cscherrer changed the title from "quap (quadratic approximation)" to "Quadratic approximation (Laplace)" on Sep 5, 2020
zenon (Author) commented Sep 5, 2020

Hi Chad,

well, I'm impressed by the traffic in the Soss project these days, so I didn't get the impression you were idling :-)

Cool, yes, that sounds right.
It may be that we aren't always on ℝⁿ; some dimensions may be restricted to intervals. That might make it more difficult.
Or is this what xform is for?
And I expect that we need to give some initial values (if these can be inferred instead, that would be better, of course).

Thank you for the response!

Kind greetings, z.

cscherrer (Owner) commented

> well, I'm impressed by the traffic in the Soss project these days, so I didn't get the impression you were idling :-)

Yeah, it's great the way things have picked up. We've been fortunate to have some really great contributors, and lately Dilum and Joey have really been pushing things forward.

> It may be that we aren't always on ℝⁿ; some dimensions may be restricted to intervals. That might make it more difficult.
> Or is this what xform is for?

Yep! Here's a silly example. Say I have

m = @model begin
    a ~ Exponential()
    b ~ Normal()
    p ~ Beta(a, b^2)
end

Then

julia> tr = xform(m())
TransformVariables.TransformTuple{NamedTuple{(:b, :a, :p),Tuple{TransformVariables.Identity,TransformVariables.ShiftedExp{true,Float64},TransformVariables.ScaledShiftedLogistic{Float64}}}}((b = asℝ, a = asℝ₊, p = as𝕀), 3)

julia> tr(10.0 .+ randn(3))
(b = 11.451659890547628, a = 49022.08588576738, p = 0.9999880279321084)

julia> tr(-10.0 .+ randn(3))
(b = -11.0897193463806, a = 1.382748478446301e-5, p = 2.24356167877106e-5)

This turns any point in ℝⁿ into something the model can use.

zenon (Author) commented Sep 5, 2020

Wow. Cool!

zenon (Author) commented Sep 12, 2020

Only now am I trying to find out what xform does here.

What is its outcome? tr doesn't seem to be a model, as

 rand(tr())

doesn't work. And what is the role of the arguments you pass to tr?

cscherrer (Owner) commented

xform returns one of these:
https://tamaspapp.eu/TransformVariables.jl/stable/

The idea is that sampling a model returns a NamedTuple, with some constraints due to differing supports across distributions.

We can evaluate the logpdf on any of these, but the structure of "constrained named tuple space" makes it inconvenient to explore.

So the result of xform is a function that takes a Vector of the right length and returns a NamedTuple we can evaluate. Composing these lets us treat the model as if it were just a function on ℝⁿ.
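Concretely, the composition is just this (a sketch, with m the model from my earlier comment):

tr = xform(m())              # maps a Vector in ℝ³ to a NamedTuple
ℓ(v) = logpdf(m(), tr(v))    # the model, viewed as a function on ℝ³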

zenon (Author) commented Dec 21, 2020

Hi,
is there any documentation for xform? Or examples?

 m1 = @model N begin
     a ~ Beta(1,1)
     X ~ Bernoulli(a) |> iid(N)
 end;

Then xform(m1(N = 5)) gives me

MethodError: no method matching asTransform(::UnitRange{Int64})

Closest candidates are:

asTransform(!Matched::Distributions.RealInterval) at /home/mesh/.julia/packages/Soss/LvmPD/src/primitives/xform.jl:78

xform(::Distributions.Bernoulli{Float64}, ::NamedTuple{(),Tuple{}})@xform.jl:70
xform(::Soss.iid{Int64,Distributions.Bernoulli{Float64}}, ::NamedTuple{(),Tuple{}})@xform.jl:121
macro expansion@closure_conv.jl:121[inlined]
_xform(::Type{TypeEncoding(Main.workspace6)}, ::Soss.Model{NamedTuple{(:N,),T} where T<:Tuple,TypeEncoding(begin a ~ Beta(1, 1) X ~ Bernoulli(a) |> iid(N) end),TypeEncoding(Main.workspace6)}, ::NamedTuple{(:N,),Tuple{Int64}}, ::NamedTuple{(),Tuple{}})@closure_conv.jl:121
xform(::Soss.JointDistribution{NamedTuple{(:N,),Tuple{Int64}},NamedTuple{(:N,),T} where T<:Tuple,TypeEncoding(begin a ~ Beta(1, 1) X ~ Bernoulli(a) |> iid(N) end),TypeEncoding(Main.workspace6)}, ::NamedTuple{(),Tuple{}})@xform.jl:15

And I'm stuck again.

cscherrer (Owner) commented

Right, that's because X is discrete, so its support doesn't biject to any ℝⁿ. You can see what it's trying to do like this:

julia> sourceXform(m1)
quote
    _result = NamedTuple()
    a = rand(Beta(1, 1))
    _t = xform(Beta(1, 1), _data)
    _result = merge(_result, (a = _t,))
    X = rand(iid(N, Bernoulli(a)))
    _t = xform(iid(N, Bernoulli(a)), _data)
    _result = merge(_result, (X = _t,))
    (TransformVariables.as)(_result)
end

xform is mostly an internal thing, though we should definitely add some developer docs. What are you trying to do that you need to use it directly?

zenon (Author) commented Dec 21, 2020

Ah, of course. *laughs*

Hello Chad! Thank you for the answer.

> What are you trying to do

Well, I'm trying to understand Soss and probabilistic programming. That's the main task, and I'm not there yet.
Within that, this sub-task is to understand how to do the Laplace approximation. First part of the sub-task: implement MAP.

I assume I can use something like Optim.jl (at least, as I learn from your answer, when everything is continuous). When I use xf = xform(model), I can use logpdf(model, xf(any vector of suitable dimension)) to compute a log density, and Optim can find the MAP.
So, slowly, I seem to be understanding the first step.
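In code, I imagine something like this (just my sketch, with model standing for a JointDistribution whose parameters are all continuous):

using Optim, TransformVariables

xf = xform(model)
ℓ(v) = logpdf(model, xf(v))           # log density as a function on ℝⁿ
n = TransformVariables.dimension(xf)  # length of the flat parameter vector

res = optimize(v -> -ℓ(v), zeros(n), BFGS())
map_estimate = xf(Optim.minimizer(res))  # the MAP, back in named-tuple space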

That's where I am at the moment.

Kind greetings from Hamburg, z.

cscherrer (Owner) commented

That sounds great! From there, you'll need the Hessian (the matrix of second derivatives). Then things differ depending on whether you do the approximation in the original support or in the transformed space. And very often the log-determinant of the Jacobian comes into play, because the transformation stretches the space in funny ways. Probably best to get MAP going before worrying too much about that.
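For when you get there: TransformVariables can return the log-Jacobian together with the transformed value, so the corrected log density in the flat space looks roughly like this (a sketch, reusing tr and m from above):

using TransformVariables: transform_and_logjac

function ℓ_corrected(v)
    θ, logjac = transform_and_logjac(tr, v)  # NamedTuple and log |det J|
    logpdf(m(), θ) + logjac                  # density of the pushforward onto ℝⁿ
end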

Warm regards from Seattle :)


jkbest2 commented Mar 1, 2021

Just stumbled on this issue. I started working on a Laplace approximation package a while ago, but it has a long way to go before it's generally usable. Ideally I'd love for it to be usable for (marginal) maximum likelihood and composable with the various MCMC samplers/posterior approximations, similar to this paper that uses the Laplace approximation for some parameters and NUTS for the rest. I found Section 3.4 of GPML helpful. The Stan developers recently published this paper, which presents a more efficient way to do the approximation and get derivatives with respect to the hyperparameters it depends on. GPML and the Stan developers' paper only give results for a few observation likelihoods because you need third derivatives. TMB (paper), on the other hand, is very generic, so it's clearly possible to generalize.
