Issue 246: Standardise outputs #358
Conversation
Branch updated from 4b0398a to a0c5ad7.
Codecov Report: All modified and coverable lines are covered by tests ✅

@@            Coverage Diff             @@
##             main     #358      +/-   ##
==========================================
- Coverage   93.09%   93.07%   -0.02%
==========================================
  Files          50       50
  Lines         521      520       -1
==========================================
- Hits          485      484       -1
  Misses         36       36
Love it.
It's not quite zero, e.g. Rt-without-renewal/EpiAware/test/EpiLatentModels/manipulators/CombineLatentModels.jl lines 91 to 106 in 929deb4, where we test the ability to recover samples from a known (as in analytically derivable) posterior distribution using data generated randomly from the test model, which captures correct inference of parameter uncertainty. Are you proposing a test more aimed at checking that the true parameters are not in the tails of the posterior distribution (i.e. a Bayesian posterior p-value)? I think that would be good. In a sense that's a less rigorous test, i.e. checking that inference has not generated unreasonable tails rather than checking the whole distribution, but it's widely applicable outside of analytically solvable situations.
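For illustration, a minimal sketch of that kind of posterior p-value check, assuming we already have posterior draws and the true value used to simulate the data (names and the threshold are placeholders, not the package's test code):

```julia
# Hedged sketch: check that the true parameter value is not in the extreme
# tails of the posterior sample (a posterior p-value style test).
using Statistics, Test

posterior_draws = randn(2000) .+ 0.1   # stand-in for draws from a fitted model
true_value = 0.0                       # value used to simulate the test data

tail_prob = mean(posterior_draws .<= true_value)
posterior_pvalue = 2 * min(tail_prob, 1 - tail_prob)

@test posterior_pvalue > 0.05          # true value not in the extreme tails
```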
Wide format conversion from
What I mean here is that we don't use anything that exercises the gradients, which is the most important thing for actually using the methods in the real world. All the prior methods just sample directly, which is good to check, but it leaves a gap where bugs can happen.
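As a rough illustration of the kind of gradient check meant here, a sketch using a stand-in log density rather than the package's models:

```julia
# Hedged sketch: smoke-test that the log density a gradient-based sampler
# (e.g. NUTS) would differentiate gives finite gradients.
using ForwardDiff, Test

logdensity(θ) = -0.5 * sum(abs2, θ)   # stand-in for the model's log density

θ0 = randn(3)
grad = ForwardDiff.gradient(logdensity, θ0)

@test all(isfinite, grad)
```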
But it is sampling with
Oh right, yes, sorry, but that is the only example of that I think.
@SamuelBrand1 I think this is good to review. If you see anything that would be a good target for further issues, that would be great. Note #363 exists for better handling of posterior outputs.
I think the issue is that it can't detect the branch name when it's been updated via a rebase - a bit annoying but not the end of the world.
The evidence is just the known better compiler performance under type stability... but this is a flag for a future issue rather than something about this PR.
Pull request was closed
I'm testing benchmarks on #367. I've played around a bunch and I can't really see what in this PR would be causing this new benchmark failure (the change since it worked was the hotfixes to main, which included all of the EpiObsModel benchmarks that had been missed out before). Any thoughts? If not, and if #367 doesn't shed any light, perhaps we merge as is and make an issue to explore? The error message seems to be about loading previous benchmark results. It could be because
Yes, there is an error:

ERROR: MethodError: no method matching loadparams!(::BenchmarkGroup, ::BenchmarkTools.Parameters, ::Symbol, ::Symbol)

Has the pattern changed? Because the method signature

loadparams!(::BenchmarkGroup, !Matched::BenchmarkGroup, ::Any...)

is quite close.
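For reference, a minimal sketch of the parameter-caching pattern that matches the existing loadparams! signature (the file name and benchmark are assumptions, not what BenchmarkCI runs):

```julia
# Hedged sketch: cache tuned parameters as a BenchmarkGroup and reload them,
# passing a BenchmarkGroup (not a single Parameters object) to loadparams!.
using BenchmarkTools

suite = BenchmarkGroup()
suite["sum"] = @benchmarkable sum($(rand(100)))

tune!(suite)
BenchmarkTools.save("tune.json", params(suite))   # save the tuned parameters

cached = BenchmarkTools.load("tune.json")[1]      # a BenchmarkGroup of Parameters
loadparams!(suite, cached, :evals, :samples)
```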
Yes, I am seeing that, and no, not that I can see. #367 works, so this is either an issue with a change in this PR or it's an issue with the configuration of the PR (which would be weird). Going to cherry-pick some bits into their own PRs.
This is the line that is throwing the error, I believe, and I believe this indicates that the structure of the benchmarking job is changing between main and this PR. I struggle to understand why that might be given they share the same dependencies and the benchmarks haven't been changed, but I am testing this by forcing a retune to see if that resolves the issue. If it does, I guess we keep it off, make an issue, and investigate at our leisure. It could be some weird interaction from
This didn't address the problem, but reading the code it seems like
LGTM.
As context, during f2f discussion we isolated a problem with BenchmarkCI; it was making its own call to pkgbenchmark which was not receiving a retune command to create a new branch -> main tuning file, which seems to be required for running the Turing benchmarks.
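For context, a hedged sketch of forcing a retune via PkgBenchmark directly (the package name here is an assumption, and this is not how BenchmarkCI drives it internally):

```julia
# Hedged sketch: run the package benchmarks with a forced retune instead of
# reusing a cached tuning file.
using PkgBenchmark

results = benchmarkpkg("EpiAware"; retune = true)
```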
There is still an issue here but it's now further in than the original. It still looks like the structure of the suite is different between the two branches. I'll step through again and look locally using the debugger.
The problem is now in PkgBenchmark:

benchmark results written to .benchmarkci/result-baseline.json
┌ Info: Finish running benchmarks.
│  * Target: 47 minutes 58 seconds
└  * Baseline: 41 minutes 34 seconds
ERROR: MethodError: no method matching judge(::BenchmarkTools.TrialEstimate, ::BenchmarkTools.BenchmarkGroup)
Closest candidates are:
  judge(::BenchmarkTools.TrialEstimate, !Matched::BenchmarkTools.TrialEstimate; kwargs...)
   @ BenchmarkTools ~/.julia/packages/BenchmarkTools/QNsku/src/trials.jl:221
  judge(!Matched::BenchmarkTools.BenchmarkGroup...; kwargs...)

Same problem in the sense that one of the outputs isn't in the expected format for judge.
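For reference, a minimal sketch of how judge is normally called with matching types on both sides (the benchmarked expression is a placeholder):

```julia
# Hedged sketch: judge compares two TrialEstimates (or two BenchmarkGroups),
# not a TrialEstimate against a BenchmarkGroup as in the error above.
using BenchmarkTools

target   = median(run(@benchmarkable sum($(rand(1000)))))
baseline = median(run(@benchmarkable sum($(rand(1000)))))

judge(target, baseline)   # TrialEstimate vs TrialEstimate
```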
I think our conclusion is we should merge now and deal with benchmarks as an issue.
This PR closes #246
It standardises all submodels currently in use to have a single return value unless otherwise required (currently only the observation error prior model).
It looks like this gives a nice little performance boost and definitely cleans up the interface.
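As an illustration of the single-return convention, a sketch with an invented toy submodel (not one of the package's actual latent models):

```julia
# Hedged sketch: a Turing submodel that returns a single value (the latent
# process) rather than a tuple of outputs.
using Turing

@model function toy_random_walk(n)
    σ ~ truncated(Normal(0, 1); lower = 0)
    ϵ ~ filldist(Normal(), n)
    return cumsum(σ .* ϵ)   # single return value
end
```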
Note that in testing I found that an ExpectObs observation model that adds deterministic recording would be useful; perhaps we want this, or alternatively we need a better pattern for reconstructing those parts of the model. Something else that I realised when testing is that we have zero parameter recovery tests of any kind. Having these for a picked AD backend would probably catch a lot of issues when developing models. Not sure where these sit in a world where you can have gradient tests (and those would overlap heavily with our benchmarks).
In general I also still find chain objects really clunky and really just want them in a long-format data frame with variables by index and name (i.e. expected obs in one column and index in another) - I think I've just been spoiled by tidybayes here. This PR also cheekily closes #345 and updates observation stacking to avoid mutating a tuple.
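For illustration, a rough sketch of that long-format conversion for an MCMCChains object (the chain below is built by hand and the variable names are made up):

```julia
# Hedged sketch: flatten a Chains object into a long-format DataFrame with one
# row per draw per variable.
using MCMCChains, DataFrames

chn = Chains(randn(100, 2, 1), [:alpha, Symbol("expected_obs[1]")])

wide = DataFrame(chn)                          # iteration, chain, one column per variable
long = stack(wide, Not([:iteration, :chain]))  # adds variable/value columns

first(long, 5)
```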