this was easier than expected #40
Conversation
cc @ablaom I've had a go at implementing evaluation metrics for conformal predictions. This was fairly straightforward thanks to MLJ's existing infrastructure: essentially, I only had to add custom performance measures, and this seems to be working. I have two questions, though, that you might be able to help me with 🙏🏽

Q1: Firstly, should I extend `MMI.evaluate` to guard against measures that don't apply to conformal models, along these lines?

```julia
function MMI.evaluate(model, data...; cache=true, measure, kw_options...)
    @assert measure in available_measures "Performance measure not applicable to `ConformalModel`."
    MMI.evaluate(model, data...; cache=true, measure=measure, kw_options...)
end
```

Q2: Secondly, while evaluation runs smoothly, the output it prints for my custom measures looks odd. Below is lifted from the example in the README:

```julia
julia> _eval = evaluate!(mach; measure=[emp_coverage, ssc], verbosity=0)
PerformanceEvaluation object with these fields:
measure, operation, measurement, per_fold,
per_observation, fitted_params_per_fold,
report_per_fold, train_test_rows
Extract:
┌────────────────────────────────────────────────────────────────────────────────────
│ measure                                                                           ⋯
├────────────────────────────────────────────────────────────────────────────────────
│ \e[38;2;155;179;224m╭──── \e[38;2;227;172;141mFunction: \e[1m\e[38;5;12memp_coverage\e[22m\e[39m\e[39m\e[38;2;155;179;224m ──────╮\e ⋯
│ Computes the empirical coverage for conformal predictions `ŷ`.                    ⋯
│ \e[38;2;155;179;224m╭──── \e[38;2;227;172;141mFunction: \e[1m\e[38;5;12msize_stratified_coverage\e[22m\e[39m\e[39m ──────╮\e ⋯
│ [further rows of raw ANSI escape sequences omitted]                               ⋱
└────────────────────────────────────────────────────────────────────────────────────
```

When I access the fields of `_eval` directly, the underlying values look fine.
Codecov Report

```
@@            Coverage Diff             @@
##             main      #40      +/-   ##
==========================================
+ Coverage   97.59%   97.86%   +0.27%
==========================================
  Files           8        9       +1
  Lines         374      422      +48
==========================================
+ Hits          365      413      +48
  Misses          9        9
```
Keeping the branch open until the issue below is sorted out.
@pat-alt Great to hear about your progress!

Generally, the kind of target proxy a measure is used for is articulated with a trait declaration of the form `StatisticalTraits.prediction_type(::Type{<:YourMeasureType}) = :probabilistic_set`. Edited: the model version of this trait is already suitably overloaded here.
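For concreteness, such a declaration might look like the sketch below; `EmpCoverage` is a hypothetical measure type standing in for however the package ends up wrapping `emp_coverage`, not a name from this repository:

```julia
import StatisticalTraits

# Hypothetical callable measure type wrapping an empirical-coverage computation:
struct EmpCoverage end

# Declare the kind of target proxy (here: set-valued predictions) this measure consumes:
StatisticalTraits.prediction_type(::Type{<:EmpCoverage}) = :probabilistic_set
```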
Do you always see this rubbish, or just for your custom measures? Where are you viewing this: an ordinary terminal, VSCode, a notebook, or something else? Could you please try …
Thanks! I'll implement the trait, with the goal of contributing it upstream once this is sorted. As for how this is displayed: I'm working in the VSCode REPL (with Term.jl loaded) and only get this issue for my custom measures.
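Since only the custom measures are affected, one way to narrow this down (a diagnostic sketch, not from the thread) is to compare a measure's plain string representation with its `MIME("text/plain")` rendering, which is what styled-display packages typically overload:

```julia
# If Term.jl (or another package) has installed a styled `show` method for
# functions, the second string will contain raw ANSI escape sequences while
# the first stays plain.
repr(emp_coverage)                              # plain, e.g. "emp_coverage"
sprint(show, MIME("text/plain"), emp_coverage)  # styled rendering, if any
```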
Mmm. Not sure about the display issue. I doubt it's anything you are doing wrong. I don't have the problem in an emacs term REPL:

```julia
julia> evaluate!(mach; measure=[emp_coverage, ssc], verbosity=0)
PerformanceEvaluation object with these fields:
  measure, operation, measurement, per_fold,
  per_observation, fitted_params_per_fold,
  report_per_fold, train_test_rows
Extract:
┌───────────────────────────────────────────────────────────┬───────────┬───────
│ measure                                                   │ operation │ meas ⋯
├───────────────────────────────────────────────────────────┼───────────┼───────
│ emp_coverage (generic function with 1 method)             │ predict   │ 0.95 ⋯
│ size_stratified_coverage (generic function with 1 method) │ predict   │ 0.75 ⋯
└───────────────────────────────────────────────────────────┴───────────┴───────
```
Just had to define performance measures and then tap into MLJ's `evaluate`.
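As a rough illustration of what "just defining a performance measure" can amount to, here is a minimal sketch of an empirical-coverage-style measure. It is not the package's actual implementation: it assumes `ŷ` is a vector of set-valued predictions (collections of candidate labels) and `y` the vector of true labels, and uses a hypothetical name to avoid confusion with the real `emp_coverage`:

```julia
# Sketch only: fraction of observations whose true label falls inside the
# predicted set. Assumes each ŷ[i] supports `in`.
function my_emp_coverage(ŷ, y)
    return sum(y[i] in ŷ[i] for i in eachindex(y)) / length(y)
end
```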