
scoring tool for pipeline #282

Merged · merged 8 commits into main from eval-methods on Jun 14, 2024
Conversation

SamuelBrand1 (Collaborator)

This PR adds a scoring utility to the pipeline and closes #247.

The main contribution here is the score_parameters function, which collects inference/forecast samples into a DataFrame, sends them to an R runtime to be scored with scoringutils, and returns a dataframe of summary scores.

Example usage is in pipeline/test/end-to-end/test_scoring.jl. My main worry after doing a full run is that the function is a bit low level, given the boilerplate needed to look at all the time points.
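
For illustration, a minimal sketch of the sample-based workflow described above, assuming RCall as the Julia/R bridge; the function name mirrors score_parameters, but the exact signature, the column names, and the scoringutils calls are assumptions rather than the merged implementation:

```julia
# Hypothetical sketch (not the merged implementation): score one parameter's
# posterior samples against a known truth value via R's scoringutils,
# using the sample-based forecast format.
using DataFrames, MCMCChains, RCall

function score_parameter_sketch(chn::Chains, parameter::String, truth::Real)
    # Flatten iterations × chains into a single vector of samples
    samples = vec(Array(chn[Symbol(parameter)]))
    df = DataFrame(parameter = parameter,
                   sample = 1:length(samples),
                   prediction = samples,
                   true_value = truth)
    @rput df                      # copy the DataFrame into the R session
    R"""
    library(scoringutils)
    scores <- summarise_scores(score(df), by = "parameter")
    """
    return @rget scores           # bring the summary scores back as a DataFrame
end
```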

@SamuelBrand1 changed the title from "scoring tool" to "scoring tool for pipeline" on Jun 13, 2024
@SamuelBrand1 requested a review from seabbs on June 13, 2024 at 16:19
@seabbs (Collaborator) left a comment


Overall this looks good, though there is a collection of unrelated changes mixed in.

My main question is whether we want to summarise in R; if so, we need a follow-up to generalise this so that it can handle all the evaluations we need to do.

@codecov-commenter commented Jun 13, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 92.85%. Comparing base (9bd7043) to head (5f1a89a).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #282   +/-   ##
=======================================
  Coverage   92.85%   92.85%           
=======================================
  Files          47       47           
  Lines         490      490           
=======================================
  Hits          455      455           
  Misses         35       35           


@seabbs (Collaborator) left a comment


Looking again, I wonder about the decision to nest creating a dataframe inside the scoring step.

Review thread on pipeline/src/scoring/score_parameters.jl (resolved)
@SamuelBrand1 (Collaborator, Author)

I think the key comments from @seabbs here are:

> My main question is whether we want to summarise in R; if so, we need a follow-up to generalise this so that it can handle all the evaluations we need to do.

> I don't really get why you are constructing the required dataframe here. Surely our lives will be easier if we have a dataframe of posterior predictions more generally and then just pass it to score with conversion of the appropriate column names?

This PR gives a function that can target any parameter in an MCMCChains.Chains object and score it against a truth data point. It might be better to take all the model results, combine them into a single dataframe, and do one pass to scoringutils?

@seabbs (Collaborator) commented Jun 14, 2024

I think what my questions are trying to get at is: does this function give us the flexibility we need for the analysis plan? I think the answer to that is no, as it only supports summarising by parameter.

This is really locked in by the DataFrame setup in the linking function and by hard-coding parameter in the by argument.

@SamuelBrand1 (Collaborator, Author)

> I think what my questions are trying to get at is: does this function give us the flexibility we need for the analysis plan? I think the answer to that is no, as it only supports summarising by parameter.

It does handle processes, via the process_strings = ["myprocess[$(i)]" for i in 1:n] pattern, which then gets scored for each time point (sketched below). I demo that in the extra end-to-end run. But having to write that might mean this approach is too low level/inconvenient, which is what I was worried about in the opening comment.
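
Concretely, the pattern referred to is roughly the following; the names myprocess, true_process and inference_chn, and the argument order of score_parameters, are illustrative guesses rather than the exact API in the PR:

```julia
# Hypothetical sketch: target each indexed element of a latent process stored
# in the MCMCChains object, scoring every time point against the matching truth.
n = length(true_process)                               # number of time points (assumed)
process_strings = ["myprocess[$(i)]" for i in 1:n]

scores_by_time = map(1:n) do i
    score_parameters(inference_chn, process_strings[i], true_process[i])
end
```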

@SamuelBrand1 (Collaborator, Author)

> This is really locked in by the DataFrame setup in the linking function and by hard-coding parameter in the by argument.

Yes, I can see that. I guess that's an argument for a post-processing step which builds a big dataframe out of all the models we trial, and then we send selected columns to R/scoringutils?
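
A very rough sketch of that idea, assuming each model's samples can already be flattened into a long DataFrame; the helper samples_to_df, the model_chains collection, and the column names are all hypothetical:

```julia
# Hypothetical sketch: stack per-model sample DataFrames into one long table,
# then keep only the columns the R/scoringutils side needs, in a single pass.
using DataFrames

big_df = reduce(vcat,
    [insertcols!(samples_to_df(chn), :model => name) for (name, chn) in model_chains])

score_input = select(big_df, [:model, :target_time, :sample, :prediction, :true_value])
```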

@seabbs (Collaborator) commented Jun 14, 2024

Yes, or julia -> score -> julia, followed by julia -> summarise -> julia.

@SamuelBrand1 (Collaborator, Author)

> Yes, or julia -> score -> julia, followed by julia -> summarise -> julia.

I'm undecided, but I think the julia -> score -> julia function is handy functionality even if we end up going for the

> builds a big dataframe out of all the models we trial, and then we send selected columns to R/scoringutils

approach.

@SamuelBrand1 requested a review from seabbs on June 14, 2024 at 14:29
@seabbs enabled auto-merge on June 14, 2024 at 15:05
@seabbs (Collaborator) left a comment


LGTM. I still have some reservations, but I think we should merge and see how this works out when sticking it together with other parts of the pipeline.

@seabbs added this pull request to the merge queue on Jun 14, 2024
Merged via the queue into main with commit e506f7d Jun 14, 2024
10 checks passed
@seabbs deleted the eval-methods branch on June 14, 2024 at 15:19
Development

Successfully merging this pull request may close these issues.

Evaluation method on inference output. I'm assuming we outsource this to scoringutils.
3 participants