
[Tracking] Improvements to measures #17

Open
ablaom opened this issue May 11, 2020 · 32 comments

Comments

@ablaom
Member

ablaom commented May 11, 2020

edit See this important issue

The measures part of MLJBase could do with some TLC. It is not the shiniest part of the MLJ code base, written in a bit of a hurry because nothing much could go forward without something in place, and the existing packages came up short.

I think the API is more-or-less fine, but the way things are implemented is less than ideal, leading to:

(i) code redundancy
(ii) less functionality: measures that could support weights or implement reports_each_observation don't

Recall that reports_each_observation=true for a measure means that m(v1, v2) returns a vector of measurements; otherwise a single scalar is returned. So it doesn't really make sense for auc, for example, to report each observation (which it doesn't). However, mae should (but doesn't).

I propose we make the following assumption that will allow us to resolve these issues for the majority of measures:

If a measure m(v1, v2) has reports_each_observation=true, then it is understood that it is the sum or mean value of some scalar version m(s1, s2).

For such measures, then, we need only implement the scalar method m(s1, s2) and we can generate the other methods m(v1, v2), m(v1, v2, w) automatically.

For other measures, such as auc and the rms family, m(v1, v2) (and optionally m(v1, v2, w)) must be explicitly implemented, as at present.

In addition to the docs, there is a lot about the measure design in this discussion.

Details

To "automatically generate" the extra methods, we could do something like this:

# fallbacks for measures: vector and weighted methods generated from the scalar method
(m::Measure)(yhat::AbstractVector, y::AbstractVector) = _eval(Val(reports_each_observation(m)), m, yhat, y)
(m::Measure)(yhat::AbstractVector, y::AbstractVector, w) = _eval(Val(reports_each_observation(m)), m, yhat, y, w)

# measures not reporting each observation must implement these methods explicitly, as at present:
_eval(::Val{false}, m, args...) = error("$(typeof(m)) must implement its own vector method(s)")

# otherwise, broadcast the scalar method over the data and aggregate:
_eval(::Val{true}, m, yhat, y) = broadcast(m, yhat, y) |> aggregation(m)
_eval(::Val{true}, m, yhat, y, w) = (w .* broadcast(m, yhat, y)) |> aggregation(m)

# measures reporting each observation automatically support weights, via the fallback above:
supports_weights(m::Measure) = _sw(Val(reports_each_observation(m)), m)
_sw(::Val{false}, m) = false
_sw(::Val{true}, m) = true
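
For illustration, here is what a scalar-only measure would then look like (the MeanAbsoluteError type here is made up; Measure, reports_each_observation and aggregation are assumed to be as in the snippet above):

using Statistics: mean

struct MeanAbsoluteError <: Measure end
reports_each_observation(::MeanAbsoluteError) = true
aggregation(::MeanAbsoluteError) = mean

# only the scalar method needs implementing:
(::MeanAbsoluteError)(s1::Real, s2::Real) = abs(s1 - s2)

mae = MeanAbsoluteError()
mae(0.5, 1.0)                        # scalar call: 0.5
mae([1.0, 2.0], [1.5, 4.0])          # vector method supplied by the fallback
mae([1.0, 2.0], [1.5, 4.0], [2, 1])  # weighted method supplied by the fallback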

@tlienart
@azev77

@ablaom
Member Author

ablaom commented May 11, 2020

Decision:

What do we do about measures like mape where we want to drop some terms where the computation is unstable? That is, what do we do if we are now reporting a value for every observation?

One option is to return missing there and make sure the aggregators apply skipmissing. There is no skipnan, which might be more natural.
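
For concreteness, a rough sketch of the missing option for a per-observation MAPE (the mape_terms helper and the tolerance are made up for illustration only):

using Statistics: mean

# per-observation MAPE terms, returning `missing` where the computation is unstable:
mape_terms(yhat, y; tol=eps()) =
    map(yhat, y) do ŷᵢ, yᵢ
        abs(yᵢ) < tol ? missing : abs((ŷᵢ - yᵢ)/yᵢ)
    end

# aggregation then skips the missings:
mean(skipmissing(mape_terms([1.0, 2.0, 3.0], [1.1, 0.0, 2.5])))  # zero-denominator term dropped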

A related question is whether instead of returning a single value, in the case of reports_each_observation=false (eg, auc) we instead report a constant vector. This would eliminate some bothersome case distinctions, but might also be confusing.

@ablaom
Member Author

ablaom commented May 11, 2020

Some other improvements on my wish list:

  1. Export all measure types (such as RMS, CrossEntropy, and so forth) and always use the explicit instantiations (such as RMS(), CrossEntropy(eps=1e-7), BrierScore(distribution=Normal), etc) in documentation, rather than rms, cross_entropy. I think keeping the aliases is fine, but their use in docs has hidden the fact that some measures depend on parameters, and that these parameters must be decided on instantiation of the measure, not when it is called on data. This is in line with the LossFunctions.jl package. I think this point has confused several people. edit This is essentially done.

  2. Following a suggestion of @juliohm, we replace the orientation trait, which takes values :score, :loss, or :none, with objective, taking values :max, :min or :none. We could then introduce the following new traits that might have some value:

    • is_loss: true only for measures that you minimise and which vanish in the case of "perfect" predictions

    • is_score: true only for measures that you maximise, which take values in [0, 1] and have unit value in the case of "perfect" predictions.

  3. Add functionality to take "products" of measures, for computing multi-target losses (a rough sketch follows this list).
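
Regarding item 3, a very rough sketch of what a "product" of measures could look like (the ProductMeasure type, the component measures and the tuple-of-columns representation are all illustrative, not an existing API):

struct ProductMeasure{T<:Tuple}
    components::T
end

# apply the i-th component measure to the i-th target column and collect the results:
(p::ProductMeasure)(yhat_columns, y_columns) =
    map((m, ŷ, y) -> m(ŷ, y), p.components, yhat_columns, y_columns)

# illustrative component measures:
rmse(ŷ, y) = sqrt(sum(abs2, ŷ .- y)/length(y))
mae(ŷ, y) = sum(abs, ŷ .- y)/length(y)

multi = ProductMeasure((rmse, mae))
multi(([1.0, 2.0], [0.0, 1.0]), ([1.5, 2.5], [0.0, 0.5]))  # one value per target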

@azev77

azev77 commented May 11, 2020

A few features that would be nice:

  1. similar to models(matching(X,y)) I would like measures() to give me all measures available for regression (continuous y).
  2. Nice decomposition of performance measures:
    deterministic (predictions) vs probabilistic
    a "subtype" of deterministic
    Scale-dependent measures: e := ŷ - y (mse, rmse, mae ...)
    Measures based on percentage errors: p := 100*e/y (rmsp, mape ...)
    {note: this way you only deal w/ zero denominators once & it applies to all percent based measures}

So a user can easily find:
all measures for continuous y
all scale-dependent measures for continuous y
etc

@ablaom
Member Author

ablaom commented May 11, 2020

A few features that would be nice:

  1. similar to models(matching(X,y)) I would like measures() to give me all measures available for regression (continuous y).

Good idea: JuliaAI/MLJBase.jl#301

  2. Nice decomposition of performance measures:
    deterministic (predictions) vs probabilistic
    a "subtype" of deterministic

Not sure I understand. We have a prediction_type trait. Could you elaborate?

Scale-dependent measures: e := ŷ - y (mse, rmse, mae ...)

I don't see why not. Not sure "scale-dependent" is the best description. Is this terminology common? How about a trait called difference_based and, if that is true, the API expects m(difference) (difference a scalar) to be implemented, and that's all? I'm supposing that these measures would all be of the reports_each_observation=true kind.

Measures based on percentage errors: p := 100*e/y (rmsp, mape ...)

Sure. Similar to above. percentage_based could be the trait name. (Is it common to use a percentage and not just proportion?)
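
For concreteness, a minimal sketch of how such traits might plug into the scalar fallback idea above (all names illustrative, not an existing API):

abstract type Measure end

# hypothetical traits with conservative defaults:
difference_based(::Measure) = false
percentage_based(::Measure) = false

# a difference-based measure need only implement the one-argument method m(difference):
struct SquaredError <: Measure end
difference_based(::SquaredError) = true
(::SquaredError)(difference::Real) = difference^2

# fallback generating the two-argument scalar method from the one-argument one:
(m::Measure)(s1::Real, s2::Real) =
    difference_based(m) ? m(s1 - s2) : error("scalar method not implemented for $(typeof(m))")

SquaredError()(3.0, 1.0)  # 4.0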

@azev77

azev77 commented May 14, 2020

Not sure I understand. We have a prediction_type trait. Could you elaborate?

Just like there are many models/measures in MLJ.jl, there are many distributions in Distributions.jl.
When the Arcsine(a,b) distribution was added it was given the ContinuousUnivariateDistribution "type"

struct Arcsine{T<:Real} <: ContinuousUnivariateDistribution
    a::T
    b::T
    Arcsine{T}(a::T, b::T) where {T<:Real} = new{T}(a, b)
end

Then automatically that is a subtype of ContinuousDistribution & UnivariateDistribution & Distribution.
Thus Arcsine(a,b) is an element in ALL of the following:

using Distributions
subtypes(Distribution)
subtypes(UnivariateDistribution)
subtypes(ContinuousDistribution)
subtypes(ContinuousUnivariateDistribution)

I'm throwing ideas out here, but could it help if measures was similarly organized?
Measure the umbrella type (like Distribution)
RegressionMeasure a subtype (like ContinuousDistribution)
ScaleDependentRegressionMeasure a subtype of RegressionMeasure
PercentBasedRegressionMeasure a subtype of RegressionMeasure
etc
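
In Julia that hierarchy would look something like this (a sketch only; all names illustrative):

abstract type Measure end
abstract type RegressionMeasure <: Measure end
abstract type ScaleDependentRegressionMeasure <: RegressionMeasure end
abstract type PercentBasedRegressionMeasure <: RegressionMeasure end

struct MAE <: ScaleDependentRegressionMeasure end
struct MAPE <: PercentBasedRegressionMeasure end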

Then if a user wants to find all measures etc:

subtypes(Measure)
subtypes(RegressionMeasure)
subtypes(ScaleDependentRegressionMeasure)
subtypes(PercentBasedRegressionMeasure)

@azev77

azev77 commented May 14, 2020

I don't see why not. Not sure "scale-dependent" is the best description. Is this terminology common?

You're right, it is not a flattering description (though informative).
I'm using terminology from the most cited paper on forecast accuracy.
Perhaps DifferenceRegressionMeasure PercentRegressionMeasure or whatever you think is right (and most familiar to users)...

@azev77

azev77 commented May 14, 2020

There is also an issue w/ asymmetry for percent errors:
mape(ŷ, y) != mape(y,ŷ)
a possible solution could be to require keyword args for those types of measures?
I have mixed feelings about this.
I like parsimony.
Keyword args might be clunky, maybe have a convention that predictions go before observations in MLJ...

@ablaom
Member Author

ablaom commented May 18, 2020

There is also an issue w/ asymmetry for percent errors:
mape(ŷ, y) != mape(y,ŷ)

Right. The API specifies that yhat goes first. We could give MAPE a field compare_with_prediction, which defaults to false in the keyword constructor (just like eps exists and defaults to eps()), but I'm not sure there would be much call for this option. Do you have a use case in mind?
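
A rough sketch of what that keyword constructor could look like (illustrative only, not the actual MLJBase implementation):

using Statistics: mean

struct MAPE
    eps::Float64
    compare_with_prediction::Bool
end

# keyword constructor with defaults, analogous to the existing eps default:
MAPE(; eps=eps(Float64), compare_with_prediction=false) = MAPE(eps, compare_with_prediction)

function (m::MAPE)(yhat::AbstractVector, y::AbstractVector)
    ref = m.compare_with_prediction ? yhat : y   # which vector supplies the denominators
    mean(abs((ŷᵢ - yᵢ)/rᵢ) for (ŷᵢ, yᵢ, rᵢ) in zip(yhat, y, ref) if abs(rᵢ) > m.eps)
end

MAPE()([1.1, 2.0], [1.0, 2.5])
MAPE(compare_with_prediction=true)([1.1, 2.0], [1.0, 2.5])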

@ablaom
Member Author

ablaom commented May 18, 2020

Regarding having a hierarchy of types: we don't really need this. You can do queries based on the traits. I think there is a tendency towards traits, because other packages can extend without your package being a dependency, and so forth. For example, Distributions is quite large and MLJ is kind-of forced to include it as a dependency because it doesn't use traits. I think if they were to start over, Distributions would probably use traits. And if I had my time over, I would possibly have used them more in MLJ (for model types) - but I think that ship has sailed.

Also, adding traits is much easier than changing type hierarchies you didn't quite get right.

@oxinabox May want to comment here.

@ablaom
Member Author

ablaom commented May 18, 2020

You can see all the traits in the current API with this example:

julia> info(rms)
root mean squared; aliases: `rms`.
(name = "rms",
 target_scitype = Union{AbstractArray{Continuous,1}, AbstractArray{Count,1}},
 supports_weights = true,
 prediction_type = :deterministic,
 orientation = :loss,
 reports_each_observation = false,
 aggregation = MLJBase.RootMeanSquare(),
 is_feature_dependent = false,
 docstring = "root mean squared; aliases: `rms`.",
 distribution_type = missing,)

@azev77

azev77 commented May 18, 2020

Can the current API tell me which measures work w/ regression (continuous y) versus classification?

@ablaom
Member Author

ablaom commented May 18, 2020

Sure.

Measures for a Finite univariate target (a.k.a. "classification"):

julia> measures(m -> AbstractVector{Finite} <: m.target_scitype)
19-element Array{NamedTuple{(:name, :target_scitype, :supports_weights, :prediction_type, :orientation, :reports_each_observation, :aggregation, :is_feature_dependent, :docstring, :distribution_type),T} where T<:Tuple,1}:
 (name = area_under_curve, ...)            
 (name = accuracy, ...)                    
 (name = balanced_accuracy, ...)           
 (name = cross_entropy, ...)               
 (name = FScore, ...)                      
 (name = false_discovery_rate, ...)        
 (name = false_negative, ...)              
 (name = false_negative_rate, ...)         
 (name = false_positive, ...)              
 (name = false_positive_rate, ...)         
 (name = misclassification_rate, ...)      
 (name = negative_predictive_value, ...)   
 (name = positive_predictive_value, ...)   
 (name = true_negative, ...)               
 (name = true_negative_rate, ...)          
 (name = true_positive, ...)               
 (name = true_positive_rate, ...)          
 (name = BrierScore{UnivariateFinite}, ...)
 (name = confusion_matrix, ...)      

Measures for a Continuous univariate target (aka "Regression"):

julia> measures(m -> AbstractVector{Continuous} <: m.target_scitype)
15-element Array{NamedTuple{(:name, :target_scitype, :supports_weights, :prediction_type, :orientation, :reports_each_observation, :aggregation, :is_feature_dependent, :docstring, :distribution_type),T} where T<:Tuple,1}:
 (name = l1, ...)                
 (name = l2, ...)                
 (name = mae, ...)               
 (name = mape, ...)              
 (name = rms, ...)               
 (name = rmsl, ...)              
 (name = rmslp1, ...)            
 (name = rmsp, ...)              
 (name = HuberLoss(), ...)       
 (name = L1EpsilonInsLoss(), ...)
 (name = L2EpsilonInsLoss(), ...)
 (name = LPDistLoss(), ...)      
 (name = LogitDistLoss(), ...)   
 (name = PeriodicLoss(), ...)    
 (name = QuantileLoss(), ...)    

@azev77

azev77 commented May 18, 2020

Ahhh! that's what I was looking for. Thanks!
I guess this will be easier to realize once measures() is refactored...

@ablaom
Member Author

ablaom commented May 25, 2020

There is also EvalMetrics.jl to look at; see JuliaAI/MLJBase.jl#316

@OkonSamuel
Member

A related question is whether instead of returning a single value, in the case of reports_each_observation=false (eg, auc) we instead report a constant vector. This would eliminate some bothersome case distinctions, but might also be confusing.

This would eliminate some type instabilities in the evaluate method

@tlienart

tlienart commented Jun 16, 2020

There is also EvalMetrics.jl to look at; see JuliaAI/MLJBase.jl#316

I don't think it's a serious contender after looking at their code in some detail (too narrow a focus when we would like something as generic as possible); some of their core methods could possibly be adapted (they explicitly said they were happy with that).

@OkonSamuel
Member

What's the status of the integration with EvalMetrics.jl?

@tlienart

I don't think we should; possibly we can use some of their code for a few specific metrics, but last I checked it's not really interesting for us (e.g. not generic enough).

@ablaom
Member Author

ablaom commented Aug 19, 2020

JuliaAI/MLJBase.jl#395

@ablaom
Member Author

ablaom commented Sep 3, 2020

Comment from @ven-k on slack:


While defining the struct for losses, including the y slightly improved the time taken. For ex,

struct MSE{T<:Float32}
    y::Vector{T}
end

and

struct MSE end

gave similar benchmarks, but the mean time of the former was 0.01 to 0.1 μs less than the latter.
And wouldn't this be more intuitive as we can define an object for a target

mse = MSE(y)

and pass only yhat in each epoch as mse(yhat).

Also, adding to the above, we could have a wrapper function mse(yhat, y) = MSE(y)(yhat) to support mse(yhat, y).
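
For reference, a self-contained sketch of the suggestion (names illustrative; this is not the MLJ measure API):

struct MSE{T<:Real}
    y::Vector{T}
end

# calling the object on predictions reuses the stored target:
(m::MSE)(yhat) = sum(abs2, yhat .- m.y)/length(m.y)

# wrapper supporting the usual two-argument form:
mse(yhat, y) = MSE(y)(yhat)

y = [1.0, 2.0, 3.0]
m = MSE(y)
m([1.1, 1.9, 3.2])        # per-epoch call passing only yhat
mse([1.1, 1.9, 3.2], y)   # equivalent two-argument call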

@ablaom
Member Author

ablaom commented Nov 2, 2020

LossFunctions fix: We can make measures from LossFunctions behave exactly like all the others when called, by importing their names into scope (instead of using them) and exporting versions that satisfy our API.

@azev77

azev77 commented Nov 3, 2020

I just tried measures(matching(y)) and it works like a charm!!!

There is a growing literature on Probabilistic predictions for regression models (ie predicting a conditional distribution).
ngboost.py is one of the contenders.
As explained in these slides, a common way to score probabilistic predictions is w/ Negative-LogLikelihood.
Is this measure (NLL or LL) worth adding to MLJ?
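
For what it's worth, a minimal sketch of such a measure for distribution-valued predictions of a Continuous target (illustrative only, not an existing MLJ measure):

using Distributions, Statistics

# mean negative log-likelihood of probabilistic predictions:
nll(yhat::AbstractVector{<:Distribution}, y::AbstractVector{<:Real}) = -mean(logpdf.(yhat, y))

yhat = [Normal(1.0, 0.5), Normal(2.0, 0.5)]   # predicted conditional distributions
y = [1.2, 1.7]                                # observed targets
nll(yhat, y)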

PS: here are all 47 measures I currently get

using MLJ;
a=measures()
[println(a[i]) for i in 1:length(measures())]
(name = area_under_curve, ...)
(name = accuracy, ...)
(name = balanced_accuracy, ...)
(name = cross_entropy, ...)
(name = FScore, ...)
(name = false_discovery_rate, ...)
(name = false_negative, ...)
(name = false_negative_rate, ...)
(name = false_positive, ...)
(name = false_positive_rate, ...)
(name = l1, ...)
(name = l2, ...)
(name = log_cosh, ...)
(name = mae, ...)
(name = mape, ...)
(name = matthews_correlation, ...)
(name = misclassification_rate, ...)
(name = negative_predictive_value, ...)
(name = positive_predictive_value, ...)
(name = rms, ...)
(name = rmsl, ...)
(name = rmslp1, ...)
(name = rmsp, ...)
(name = true_negative, ...)
(name = true_negative_rate, ...)
(name = true_positive, ...)
(name = true_positive_rate, ...)
(name = BrierScore{UnivariateFinite}, ...)
(name = DWDMarginLoss(), ...)
(name = ExpLoss(), ...)
(name = L1HingeLoss(), ...)
(name = L2HingeLoss(), ...)
(name = L2MarginLoss(), ...)
(name = LogitMarginLoss(), ...)
(name = ModifiedHuberLoss(), ...)
(name = PerceptronLoss(), ...)
(name = SigmoidLoss(), ...)
(name = SmoothedL1HingeLoss(), ...)
(name = ZeroOneLoss(), ...)
(name = HuberLoss(), ...)
(name = L1EpsilonInsLoss(), ...)
(name = L2EpsilonInsLoss(), ...)
(name = LPDistLoss(), ...)
(name = LogitDistLoss(), ...)
(name = PeriodicLoss(), ...)
(name = QuantileLoss(), ...)
(name = confusion_matrix, ...)

@ablaom
Member Author

ablaom commented Nov 4, 2020

I believe we already have negative log-likelihood, aka log-loss. It is called cross_entropy and, yes, it is a proper scoring loss.

search: cross_entropy

  cross_entropy

  Cross entropy loss with probabilities clamped between eps() and 1-eps(); aliases: cross_entropy.

  ce = CrossEntropy(; eps=eps())
  ce(ŷ, y)

  Given an abstract vector of distributions ŷ and an abstract vector of true observations y, return the corresponding cross-entropy
  loss (aka log loss) scores.

  Since the score is undefined in the case that the true observation has predicted probability zero, probabilities are clamped between
  eps and 1-eps, where eps can be specified.

  If sᵢ is the predicted probability for the true class yᵢ then the score for that example is given by

  -log(clamp(sᵢ, eps, 1-eps))

  For more information, run info(cross_entropy).
julia> yhat = UnivariateFinite(["yes", "no"], rand(5), pool=missing, augment=true)
5-element MLJBase.UnivariateFiniteArray{Multiclass{2},String,UInt8,Float64,1}:
 UnivariateFinite{Multiclass{2}}(yes=>0.374, no=>0.626)
 UnivariateFinite{Multiclass{2}}(yes=>0.532, no=>0.468)
 UnivariateFinite{Multiclass{2}}(yes=>0.428, no=>0.572)
 UnivariateFinite{Multiclass{2}}(yes=>0.691, no=>0.309)
 UnivariateFinite{Multiclass{2}}(yes=>0.539, no=>0.461)

julia> y = rand(classes(yhat), 5)
5-element Array{CategoricalArrays.CategoricalValue{String,UInt8},1}:
 "no"
 "no"
 "yes"
 "no"
 "yes"

julia> cross_entropy(yhat, y)
5-element Array{Float64,1}:
 0.4691627141887623
 0.7594675442682963
 0.8484769383284205
 1.1752213731506886
 0.6185977143266518

@azev77

azev77 commented Nov 4, 2020

@ablaom cross_entropy doesn't give the log-likelihood for the following:

using MLJ
X,y=@load_boston
train, test = partition(eachindex(y), .7, rng=333);

@load LinearRegressor pkg = GLM
mdl = LinearRegressor()
mach = machine(mdl, X, y)
fit!(mach, rows=train, verbosity=0)
ŷ = predict(mach, rows=test)

cross_entropy(ŷ, y[test])

ERROR: MethodError: no method matching (::MLJBase.CrossEntropy{Float64})(::Array{Distributions.Normal{Float64},1}, ::Array{Float64,1})
Closest candidates are:
  Any(::MLJBase.UnivariateFiniteArray{S,V,R,P,1}, ::AbstractArray{T,1} where T) where {S, V, R, P} at /Users/AZevelev/.julia/packages/MLJBase/Ov46j/src/measures/finite.jl:64
  Any(::AbstractArray{var"#s577",1} where var"#s577"<:UnivariateFinite, ::AbstractArray{T,1} where T) at /Users/AZevelev/.julia/packages/MLJBase/Ov46j/src/measures/finite.jl:57
Stacktrace:
 [1] top-level scope at none:1

@ablaom
Member Author

ablaom commented Nov 4, 2020

Ah yes. cross_entropy is for Finite targets only. And yes, it has an extension to the continuous case (https://www-jstor-org.ezproxy.auckland.ac.nz/stable/2629907?seq=3&socuuid=dcad1753-575a-42c3-a7b8-fdd39a9f7589&socplat=email#metadata_info_tab_contents ) but it is not yet implemented. Indeed no measure for probabilistic predictors of Continuous targets is yet implemented 😢 See also JuliaAI/MLJBase.jl#395

@azev77

azev77 commented Nov 4, 2020

For the continuous case, I doubt it would be called cross_entropy, just log-likelihood is prob fine

@ablaom ablaom pinned this issue Nov 15, 2020
@ablaom
Member Author

ablaom commented Nov 16, 2020

JuliaAI/MLJBase.jl#450

@ablaom
Member Author

ablaom commented Jun 28, 2021

Lighthouse has some measures we may want to include: JuliaAI/MLJBase.jl#586

@ablaom ablaom changed the title Improvements to measures [Tracking] Improvements to measures Aug 25, 2021
@ablaom
Member Author

ablaom commented Aug 25, 2021

Community discussion on mitigating metric code fragmentation

FluxML/FluxML-Community-Call-Minutes#38

@tlienart

tlienart commented Aug 25, 2021

(sorry super old message but...

One option is to return missing there and make sure the aggregators apply skipmissing. There is no skipnan, which might be more natural.

You can use

skipnan(x) = Iterators.filter(!isnan, x)

(see also JuliaLang/julia#35162)
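
For example (assuming mean is brought in with using Statistics):

mean(skipnan([1.0, NaN, 3.0]))  # == 2.0; the NaN is dropped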

Going through the thread, the option to get auc to return a vector seems pretty weird to me. If it returns one internally to eliminate type instability, fine, but not to the user; maybe a way to do this is to implement a show for the measure.

@ablaom
Member Author

ablaom commented Aug 25, 2021

A related question is whether instead of returning a single value

No there doesn't seem to be much stomach for this suggestion.

make sure the aggregators do skipmissing

Done. I had forgotten about NaN's though.

@pat-alt

pat-alt commented Dec 6, 2022

Would be nice to have various metrics from CalibrationErrors.jl by @devmotion added. I may chip in myself once I've sorted out this one and by then will have hopefully understood how it works (but will be some time before I get to it).

@ablaom ablaom transferred this issue from JuliaAI/MLJBase.jl Jan 17, 2024
@github-project-automation github-project-automation bot moved this to priority high / involved in General Aug 30, 2024
@ablaom ablaom moved this from priority high / involved to tracking/discussion/metaissues/misc in General Dec 23, 2024