
scoring tool for pipeline #282

Merged · merged 8 commits into main from eval-methods on Jun 14, 2024
Conversation

SamuelBrand1 (Collaborator)

This PR adds a scoring utility to the pipeline and closes #247.

The main contribution here is the score_parameters function, which collects inference/forecast samples into a DataFrame, sends them to an R runtime to be scored with scoringutils, and returns a dataframe of summary scores.

Example usage is in pipeline/test/end-to-end/test_scoring.jl. My main worry after doing a full run is that the function is a bit low level, given the boilerplate needed to look at all the time points.
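
For illustration, a minimal sketch of the sample-based workflow described above, assuming RCall as the Julia/R bridge; the function name mirrors score_parameters, but the exact signature, the column names, and the scoringutils calls are assumptions rather than the merged implementation:

```julia
# Hypothetical sketch (not the merged implementation): score one parameter's
# posterior samples against a known truth value via R's scoringutils,
# using the sample-based forecast format.
using DataFrames, MCMCChains, RCall

function score_parameter_sketch(chn::Chains, parameter::String, truth::Real)
    # Flatten iterations × chains into a single vector of samples
    samples = vec(Array(chn[Symbol(parameter)]))
    df = DataFrame(parameter = parameter,
                   sample = 1:length(samples),
                   prediction = samples,
                   true_value = truth)
    @rput df                      # copy the DataFrame into the R session
    R"""
    library(scoringutils)
    scores <- summarise_scores(score(df), by = "parameter")
    """
    return @rget scores           # bring the summary scores back as a DataFrame
end
```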

@SamuelBrand1 changed the title from "scoring tool" to "scoring tool for pipeline" on Jun 13, 2024
@SamuelBrand1 requested a review from seabbs on June 13, 2024 at 16:19
@seabbs (Collaborator) left a comment


Overall this looks good, though there is a collection of unrelated changes mixed in.

My main question is whether we want to summarise in R; if so, we need a follow-up to generalise this so that it can handle all the evaluations we need to do.

@codecov-commenter commented Jun 13, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 92.85%. Comparing base (9bd7043) to head (5f1a89a).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #282   +/-   ##
=======================================
  Coverage   92.85%   92.85%           
=======================================
  Files          47       47           
  Lines         490      490           
=======================================
  Hits          455      455           
  Misses         35       35           


@seabbs (Collaborator) left a comment


Looking again, I wonder about the decision to nest creating a dataframe inside the scoring step.

Review thread on pipeline/src/scoring/score_parameters.jl (resolved)
@SamuelBrand1 (Collaborator, Author)

I think the key comments from @seabbs here are:

> My main question is whether we want to summarise in R; if so, we need a follow-up to generalise this so that it can handle all the evaluations we need to do.

> I don't really get why you are constructing the required dataframe here. Surely our lives will be easier if we have a dataframe of posterior predictions more generally and then just pass it to score with conversion of the appropriate column names?

This PR gives a function that can target any parameter in an MCMCChains.Chains object and score it against a truth data point. It might be better to take all the model results, combine them into a single dataframe, and do one pass to scoringutils?

@seabbs (Collaborator) commented Jun 14, 2024

I think what my questions are trying to get at is: does this function give us the flexibility we need for the analysis plan? I think the answer to that is no, as it only supports summarising by parameter.

This is really locked in by the DataFrame setup in the linking function and by hard-coding parameter in the by argument.

@SamuelBrand1 (Collaborator, Author)

> I think what my questions are trying to get at is: does this function give us the flexibility we need for the analysis plan? I think the answer to that is no, as it only supports summarising by parameter.

It does handle processes, via the process_strings = ["myprocess[$(i)]" for i in 1:n] pattern, which then gets scored for each time point (sketched below). I demo that in the extra end-to-end run. But having to write that might mean this approach is too low level/inconvenient, which is what I was worried about in the opening comment.
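
Concretely, the pattern referred to is roughly the following; the names myprocess, true_process and inference_chn, and the argument order of score_parameters, are illustrative guesses rather than the exact API in the PR:

```julia
# Hypothetical sketch: target each indexed element of a latent process stored
# in the MCMCChains object, scoring every time point against the matching truth.
n = length(true_process)                               # number of time points (assumed)
process_strings = ["myprocess[$(i)]" for i in 1:n]

scores_by_time = map(1:n) do i
    score_parameters(inference_chn, process_strings[i], true_process[i])
end
```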

@SamuelBrand1 (Collaborator, Author)

> This is really locked in by the DataFrame setup in the linking function and by hard-coding parameter in the by argument.

Yes, I can see that. I guess that's an argument for a post-processing step which builds a big dataframe out of all the models we trial, and then we send selected columns to R/scoringutils?
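
A very rough sketch of that idea, assuming each model's samples can already be flattened into a long DataFrame; the helper samples_to_df, the model_chains collection, and the column names are all hypothetical:

```julia
# Hypothetical sketch: stack per-model sample DataFrames into one long table,
# then keep only the columns the R/scoringutils side needs, in a single pass.
using DataFrames

big_df = reduce(vcat,
    [insertcols!(samples_to_df(chn), :model => name) for (name, chn) in model_chains])

score_input = select(big_df, [:model, :target_time, :sample, :prediction, :true_value])
```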

@seabbs (Collaborator) commented Jun 14, 2024

Yes, or julia -> score -> julia, followed by julia -> summarise -> julia.

@SamuelBrand1 (Collaborator, Author)

> Yes, or julia -> score -> julia, followed by julia -> summarise -> julia.

I'm undecided, but I think the julia -> score -> julia function is handy functionality even if we end up going for the

> builds a big dataframe out of all the models we trial, and then we send selected columns to R/scoringutils

approach.

@SamuelBrand1 requested a review from seabbs on June 14, 2024 at 14:29
@seabbs enabled auto-merge on June 14, 2024 at 15:05
@seabbs (Collaborator) left a comment


LGTM. I still have some reservations, but I think we should merge and see how this works out when sticking it together with other parts of the pipeline.

@seabbs added this pull request to the merge queue on Jun 14, 2024
Merged via the queue into main with commit e506f7d Jun 14, 2024
10 checks passed
@seabbs deleted the eval-methods branch on June 14, 2024 at 15:19
Development

Successfully merging this pull request may close these issues.

Evaluation method on inference output. I'm assuming we outsource this to scoringutils.
3 participants