`evaluate!!` shenanigans #720

penelopeysm · 2024-11-25T16:45:44Z

`evaluate!!` as it stands

Our current implementation of evaluate!! is as follows.

Lines 890 to 935 in ba490bf

    
           function AbstractPPL.evaluate!!( 
        
               model::Model, varinfo::AbstractVarInfo, context::AbstractContext 
        
           ) 
        
               return if use_threadsafe_eval(context, varinfo) 
        
                   evaluate_threadsafe!!(model, varinfo, context) 
        
               else 
        
                   evaluate_threadunsafe!!(model, varinfo, context) 
        
               end 
        
           end 
        
           function AbstractPPL.evaluate!!( 
        
               model::Model, 
        
               rng::Random.AbstractRNG, 
        
               varinfo::AbstractVarInfo=VarInfo(), 
        
               sampler::AbstractSampler=SampleFromPrior(), 
        
               context::AbstractContext=DefaultContext(), 
        
           ) 
        
               return evaluate!!(model, varinfo, SamplingContext(rng, sampler, context)) 
        
           end 
        
           function AbstractPPL.evaluate!!(model::Model, context::AbstractContext) 
        
               return evaluate!!(model, VarInfo(), context) 
        
           end 
        
           function AbstractPPL.evaluate!!( 
        
               model::Model, args::Union{AbstractVarInfo,AbstractSampler,AbstractContext}... 
        
           ) 
        
               return evaluate!!(model, Random.default_rng(), args...) 
        
           end 
        
           # without VarInfo 
        
           function AbstractPPL.evaluate!!( 
        
               model::Model, 
        
               rng::Random.AbstractRNG, 
        
               sampler::AbstractSampler, 
        
               args::AbstractContext..., 
        
           ) 
        
               return evaluate!!(model, rng, VarInfo(), sampler, args...) 
        
           end 
        
           # without VarInfo and without AbstractSampler 
        
           function AbstractPPL.evaluate!!( 
        
               model::Model, rng::Random.AbstractRNG, context::AbstractContext 
        
           ) 
        
               return evaluate!!(model, rng, VarInfo(), SampleFromPrior(), context) 
        
           end

Here's a little exercise for anyone reading this code. There are 4 arguments which are seemingly optional, rng, varinfo, sampler, and context, hence there are 2^4 = 16 possible invocations of evaluate!!.

Figure out: (1) which (if any) of them error, and (2) which of them wrap the context in a SamplingContext (line 907).

Answers:

r v s c
✓ ✓ ✓ ✓ Success - wrapped
✓ ✓ ✓ ✗ Success - wrapped
✓ ✓ ✗ ✓ Error
✓ ✓ ✗ ✗ Success - wrapped
✓ ✗ ✓ ✓ Success - wrapped
✓ ✗ ✓ ✗ Success - wrapped
✓ ✗ ✗ ✓ Success - wrapped
✓ ✗ ✗ ✗ Success - wrapped
✗ ✓ ✓ ✓ Success - wrapped
✗ ✓ ✓ ✗ Success - wrapped
✗ ✓ ✗ ✓ Success
✗ ✓ ✗ ✗ Success - wrapped
✗ ✗ ✓ ✓ Success - wrapped
✗ ✗ ✓ ✗ Success - wrapped
✗ ✗ ✗ ✓ Success
✗ ✗ ✗ ✗ Success - wrapped

If the table above seems to suggest that "most of the time it's wrapped", this is not really true in practice: the vast majority of calls to evaluate!! in the DynamicPPL repository are of the form evaluate!!(model, varinfo, context), which is one of the non-wrapped cases.

As you may guess, I'm not a fan of over-using multiple dispatch for this purpose!

It is very difficult to reason about (as the exercise above should have shown), it can easily lead to bugs (#629) or method ambiguities (#636), and the structure of the code does not clearly communicate what we're really trying to say (which is that 'the arguments are optional').

Keyword arguments

In many other programming languages, optional arguments are handled using keyword arguments that have default values. The benefits of this are:

only one method needs to be explicitly defined in the code, making it much easier to reason about;
the role of each argument is more visible at the point of call (compare f(1, 2) with f(; add=1, multiply=2));
the function signature makes it obvious to anybody reading the code that the arguments are optional;
the caller does not need to remember the order of specifying the arguments (which helps to address another easy source of bugs; compare f(2, 1) with f(; multiply=2, add=1)).

Unfortunately, I dabbled with turning evaluate!! into a function that used keyword arguments (see https://github.com/TuringLang/DynamicPPL.jl/blob/py/evaluate/src/model.jl#L904-L927), and this led to a noticeable performance hit:

Benchmarks

Setup:

using Chairmarks
using Distributions
using DynamicPPL
using Random

@model f() = x ~ Normal()
model = f()
rng = Xoshiro(128)
vi = VarInfo(model)
spl = SampleFromPrior()
ctx = DefaultContext()

master:

julia> @be DynamicPPL.evaluate!!(model) evals=1
Benchmark: 21487 samples with 1 evaluation
 min    791.000 ns (56 allocs: 3.719 KiB)
 median 917.000 ns (56 allocs: 3.719 KiB)
 mean   3.967 μs (56 allocs: 3.719 KiB, 0.01% gc time)
 max    52.700 ms (56 allocs: 3.719 KiB, 99.94% gc time)

julia> @be DynamicPPL.evaluate!!(model, rng) evals=1
Benchmark: 54278 samples with 1 evaluation
 min    917.000 ns (57 allocs: 3.734 KiB)
 median 1.083 μs (57 allocs: 3.734 KiB)
 mean   1.183 μs (57 allocs: 3.734 KiB, <0.01% gc time)
 max    1.028 ms (57 allocs: 3.734 KiB, 97.67% gc time)

julia> @be DynamicPPL.evaluate!!(model, vi, ctx) evals=1
Benchmark: 118298 samples with 1 evaluation
 min    208.000 ns (11 allocs: 480 bytes)
 median 292.000 ns (11 allocs: 480 bytes)
 mean   288.398 ns (11 allocs: 480 bytes)
 max    18.958 μs (11 allocs: 480 bytes)

julia> @be DynamicPPL.evaluate!!(model, rng, vi, spl, ctx) evals=1
Benchmark: 115930 samples with 1 evaluation
 min    208.000 ns (11 allocs: 480 bytes)
 median 333.000 ns (11 allocs: 480 bytes)
 mean   318.844 ns (11 allocs: 480 bytes)
 max    20.917 μs (11 allocs: 480 bytes)

This PR:

julia> @be DynamicPPL.new_evaluate!!(model; wrap=true) evals=1
Benchmark: 21543 samples with 1 evaluation
 min    791.000 ns (56 allocs: 3.719 KiB)
 median 917.000 ns (56 allocs: 3.719 KiB)
 mean   3.888 μs (56 allocs: 3.719 KiB, 0.01% gc time)
 max    52.684 ms (56 allocs: 3.719 KiB, 99.94% gc time)

julia> @be DynamicPPL.new_evaluate!!(model; rng=rng, wrap=true) evals=1
Benchmark: 60174 samples with 1 evaluation
 min    875.000 ns (59 allocs: 3.797 KiB)
 median 1.041 μs (59 allocs: 3.797 KiB)
 mean   1.269 μs (59 allocs: 3.797 KiB, 1.08% gc time)
 max    4.065 ms (59 allocs: 3.797 KiB, 64356.64% gc time)

julia> @be DynamicPPL.new_evaluate!!(model; varinfo=vi, context=ctx) evals=1
Benchmark: 113010 samples with 1 evaluation
 min    250.000 ns (13 allocs: 672 bytes)
 median 333.000 ns (13 allocs: 672 bytes)
 mean   331.330 ns (13 allocs: 672 bytes, 8.32% gc time)
 max    16.792 μs (13 allocs: 672 bytes, 939777.60% gc time)

julia> @be DynamicPPL.new_evaluate!!(model; rng=rng, varinfo=vi, sampler=spl, context=ctx, wrap=true) evals=1
Benchmark: 113324 samples with 1 evaluation
 min    334.000 ns (14 allocs: 720 bytes)
 median 458.000 ns (14 allocs: 720 bytes)
 mean   495.560 ns (14 allocs: 720 bytes, <0.01% gc time)
 max    3.929 ms (14 allocs: 720 bytes, 99.05% gc time)

I don't fully know the reasons for this, but possible explanations are overheads arising from the handling of keyword arguments (https://docs.julialang.org/en/v1/devdocs/functions/#Keyword-arguments) and also potential type instability, as kwcall generates a local variable that is a Union of all the keyword arguments. You can see the type instability e.g. with

g(x::Int; y::Int=2, z::Float64=3.0, w::String="Hello, world", t::AbstractArray{<:Number}=Float64[]) = x + y + z + length(w) + sum(t)
@code_warntype g(1; y=3, z=4.0)

Where to go from here?

Firstly, at the very least, I reckon the wrapping in SamplingContext should really be done at the call site, not inside evaluate!! itself. This shouldn't be too hard to fix, it's a matter of identifying invocations of evaluate!! according to the above table. This would be a nice quick win.

As for the method itself: one possibility is to make no changes. As someone developing and maintaining this codebase, dealing with dispatch rabbit holes like this one is a pretty frustrating time sink, so I'm really not keen to keep it this way. (Imagine how it's like for someone completely new, too. If we want to encourage open-source contributions, things like these could be quite discouraging.)

The other possibility that I can see is to just restrict the number of possible invocations. For example, I think keeping the three following methods would preserve almost all of the required behaviour. Some invocations that do not follow the patterns below will have to be modified, e.g. by specifying additional arguments which used to be defaults.

evaluate!!(model)
evaluate!!(model, varinfo, context)
evaluate!!(model, rng, varinfo, sampler, context)

But maybe there's some other pathway that I don't know of, which both preserves flexibility and performance while improving readability?

AbstractPPL and interfaces

evaluate!! is really defined in AbstractPPL, but AbstractPPL doesn't specify an actual interface for evaluate!! (e.g. a function signature).

Personally, I don't fully understand the purpose of an empty interface. It tells you that you can call some function evaluate!! on a model, but it doesn't tell you how to call it (i.e. what arguments to pass it). It's also impossible to test for correctness of the interface because there is no specification.

The result of having such an under-specified interface is that packages that depend on the interface are free to implement evaluate!! however they wish, and the interface becomes meaningless. This is partly proven by the fact that I was able to swap out evaluate!! for one with keyword arguments without causing any upstream breakage. The general problem of having such unenforcable interfaces is described in e.g. https://invenia.github.io/blog/2020/11/06/interfacetesting/.

This is probably a broader question but I think we need to evaluate (haha) why we are putting these zero-specification interfaces in AbstractPPL. If they're meant to have more specification, then they should be documented with signatures, and an interface testing function added. If not, then my current opinion is that it makes more sense to just keep the definition inside DynamicPPL.

Even if there's some other modelling package out there that uses AbstractPPL.evaluate!!, the fact is that they can implement it with whatever signature they wish, which means that there is nothing in common between our and their implementations of it (apart from the name, which as explained above carries no meaning in practice).

The text was updated successfully, but these errors were encountered:

yebai · 2024-11-25T20:45:05Z

AbstractPPL was created relatively late by @phipsgabler and me as an attempt to standardise the interface for working with PPL models.

In reality, I didn't manage to direct enough team-wide attention towards AbstractPPL. As a result, DynamicPPL evolved on a per-demand basis without close sync with AbstractPPL. In retrospect, there are pros and cons to this approach towards DynamicPPL, but it is not intentional but mainly due to a lack of enough resources for a long time.

I appreciate that this answer doesn't provide a solution, but I want to explain the historical context of the above situation that @penelopeysm correctly identifies.

torfjelde · 2024-11-26T08:54:19Z

As @yebai answered the AbstractPPL.jl question, I'll just address potential simplifiications for evaluate!!.

In the past we always passed the rng and the sampler directly into the model (this was before contexts where introduced). Once contexts where introduced we wanted to have a clear separation between when we're sampling and when we're not. As a result, we put rng and sampler into the SamplingContext but we also wanted to maintain the current dispatches, hence why we have so many.

I'd be happy to remove the default arguments for evaluate!!. It's an internal method that we only really use when we have a varinfo and context anyways.

The only user-facing thing we have to worry about here is that we want stuff like

model(rng)

to work as intended, and so we'd need to put this "conversion" into the arguments going into the leaf-call of evaluate!! 😕 A bit uncertain what the best way of doing this would be though. For example, the suggestion of 3 dispatches you make won't cover this use-case (it would require one specifically for rng too), but maybe if we add that one, it'll be okay?

Overall I very much agree with you that this is all very messy 😕

yebai · 2024-11-26T14:25:38Z

I think we could also push __varInfo__ into __context__, so we have:

evaluate!!(__model__, __context__)

as an evaluation interface. I'm not sure how much work this might incur, but it seems sensible to me.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`evaluate!!` shenanigans #720

`evaluate!!` shenanigans #720

penelopeysm commented Nov 25, 2024 •

edited

Loading

yebai commented Nov 25, 2024 •

edited

Loading

torfjelde commented Nov 26, 2024 •

edited

Loading

yebai commented Nov 26, 2024 •

edited

Loading

evaluate!! shenanigans #720

evaluate!! shenanigans #720

Comments

penelopeysm commented Nov 25, 2024 • edited Loading

evaluate!! as it stands

Keyword arguments

Where to go from here?

AbstractPPL and interfaces

yebai commented Nov 25, 2024 • edited Loading

torfjelde commented Nov 26, 2024 • edited Loading

yebai commented Nov 26, 2024 • edited Loading

`evaluate!!` shenanigans #720

`evaluate!!` shenanigans #720

penelopeysm commented Nov 25, 2024 •

edited

Loading

`evaluate!!` as it stands

yebai commented Nov 25, 2024 •

edited

Loading

torfjelde commented Nov 26, 2024 •

edited

Loading

yebai commented Nov 26, 2024 •

edited

Loading