
MAPE #296

Closed
wants to merge 13 commits into from

Conversation

azev77
Contributor

@azev77 azev77 commented May 9, 2020

No description provided.

@azev77
Contributor Author

azev77 commented May 9, 2020

@OkonSamuel I think this one has the stuff

@codecov-io

codecov-io commented May 9, 2020

Codecov Report

Merging #296 into dev will increase coverage by 0.09%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##              dev     JuliaAI/MLJBase.jl#296      +/-   ##
==========================================
+ Coverage   82.50%   82.59%   +0.09%     
==========================================
  Files          30       30              
  Lines        2023     2034      +11     
==========================================
+ Hits         1669     1680      +11     
  Misses        354      354              
Impacted Files Coverage Δ
src/MLJBase.jl 100.00% <ø> (ø)
src/measures/finite.jl 98.75% <ø> (ø)
src/data/datasets_synthetic.jl 95.95% <100.00%> (ø)
src/measures/continuous.jl 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c64b7e8...1d76a1d. Read the comment docs.

@tlienart
Collaborator

tlienart commented May 10, 2020

ah crap, I commented on the wrong PR; please see the comments in #293. There are a few numerical issues to sort out, and there should be tests for these corner cases.

@azev77
Contributor Author

azev77 commented May 10, 2020

@tlienart

1. Added `@inbounds`. This could be added in other places in the same program, but I don't want to touch other people's code.

2. Regarding a tolerance: this is an issue with any measure based on percentage errors, see #95 (currently RMSP does the same thing). New code:

function (::MAPE)(ŷ::Vec{<:Real}, y::Vec{<:Real}; tol = eps())
    check_dimensions(ŷ, y)
    ret = zero(eltype(y))
    count = 0
    @inbounds for i in eachindex(y)
        ayi = abs(y[i])
        if ayi > tol
        #if y[i] != zero(eltype(y))
            dev = abs((y[i] - ŷ[i]) / ayi)
            #dev = abs((y[i] - ŷ[i]) / y[i])
            ret += dev
            count += 1
        end
    end
    return ret / count
end
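As a quick illustration of the tolerance behaviour, here is a self-contained sketch of the same loop; the MLJBase `Vec` annotation and `check_dimensions` are replaced with plain equivalents so it runs standalone, and `mape_sketch` is a hypothetical name:

```julia
# Standalone sketch of the MAPE loop above (not the MLJBase version).
function mape_sketch(ŷ, y; tol = eps())
    length(ŷ) == length(y) || throw(DimensionMismatch("ŷ and y must match"))
    ret = zero(float(eltype(y)))
    count = 0
    @inbounds for i in eachindex(y)
        ayi = abs(y[i])
        if ayi > tol                  # observations with |y| ≤ tol are skipped
            ret += abs((y[i] - ŷ[i]) / ayi)
            count += 1
        end
    end
    return ret / count
end

y = [-1, 0, 1, 2]                     # the zero target is skipped entirely
ŷ = [-2, 5, 2, 2]
mape_sketch(ŷ, y)                     # (1 + 1 + 0) / 3 ≈ 0.667
```

Note that with `tol = 0` and a target vector that is all zeros, `count` stays at `0` and the function returns `NaN`, which is the corner case discussed below.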

3. Regarding the corner case where `count == 0`: that is an inevitable, known problem with any percentage-based score measure (including RMSP).

@tlienart
Collaborator

Great, thanks. @ablaom will probably complain about how you pass the tolerance parameter; it should be part of the MAPE struct.

Re inevitable errors etc, I would suggest showing a warning?

Re improving RMSP to also have a threshold + use inbounds, we should do it 👍🏼

@azev77
Contributor Author

azev77 commented May 10, 2020

None of the current measures has a tolerance parameter as part of its struct.
How can this be done?
I also don't know how to throw a warning...

@azev77
Contributor Author

azev77 commented May 10, 2020

Perhaps the current code could be written more parsimoniously using mean(), median(), etc.?
(unless loops are faster?)

Define the error vector: e := ŷ - y

Define the percentage-error vector: p := e ./ y, restricted to indices where |y[i]| > tol

etcetera?

@tlienart
Collaborator

tlienart commented May 10, 2020

I think @ablaom can better comment on this PR; he's the one who thought carefully about measures.

I think the PR is great.

There's a wider discussion of performance, warnings, etc., but it can be done separately. And yes, afaik mean is two to three times faster than the loop, possibly because it uses SIMD(?); not sure.

A warning can be shown with `@warn`, but it's better to do this for everything rather than have only one metric do it, so it could be a separate issue.

PS: performance is not really a problem with metrics...
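The relative-speed claim is easy to check directly. A quick sketch using BenchmarkTools (assuming the package is installed; `mae_loop` and `mae_vec` are hypothetical names for illustration, and the actual ratio is machine-dependent):

```julia
using BenchmarkTools, Statistics

y = randn(10_000)
ŷ = randn(10_000)

# explicit loop, as in the PR's MAPE implementation
function mae_loop(ŷ, y)
    ret = zero(eltype(y))
    @inbounds for i in eachindex(y)
        ret += abs(y[i] - ŷ[i])
    end
    return ret / length(y)
end

# vectorized one-liner, as proposed above
mae_vec(ŷ, y) = mean(abs.(ŷ .- y))

@btime mae_loop($ŷ, $y)   # timings vary by machine; the vectorized
@btime mae_vec($ŷ, $y)    # version also allocates a temporary array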

@azev77
Contributor Author

azev77 commented May 10, 2020

> PS: performance is not really a problem with metrics...

I fully agree, that's why I wasn't sure about the need for @inbounds, but why not...

@azev77
Contributor Author

azev77 commented May 10, 2020

@ablaom @tlienart see how much easier this could be:
(the only things missing are weights & tol)

function err(ŷ, y)
    return ŷ - y
end

function perr(ŷ, y; tol = eps())
    e = err(ŷ, y)
    idx = abs.(y) .> tol            # drop observations with |y| below the tolerance
    return 100 .* e[idx] ./ y[idx]
end

using Statistics
mse(ŷ, y)  = err(ŷ, y) |> (x)->x.^2 |> mean
rmse(ŷ, y) = err(ŷ, y) |> x -> x.^2 |> mean |> sqrt
mae(ŷ, y)  = err(ŷ, y) |> x -> abs.(x) |> mean
mdae(ŷ, y) = err(ŷ, y) |> x -> abs.(x) |> median
rmdse(ŷ, y) = err(ŷ, y) |> x -> x.^2 |> median |> sqrt
maxae(ŷ, y) = err(ŷ, y) |> x -> abs.(x) |> maximum

mape(ŷ, y)    = perr(ŷ, y) |> x -> abs.(x) |> mean
mdape(ŷ, y)   = perr(ŷ, y) |> x -> abs.(x) |> median
rmspe(ŷ, y)   = perr(ŷ, y) |> x -> x.^2 |> mean |> sqrt
rmdspe(ŷ, y)  = perr(ŷ, y) |> x -> x.^2 |> median |> sqrt

"https://en.wikipedia.org/wiki/Symmetric_mean_absolute_percentage_error"
smape(ŷ, y)   = 200 .* abs.(err(ŷ, y)) ./ (ŷ .+ y) |> mean
smdape(ŷ, y)  = 200 .* abs.(err(ŷ, y)) ./ (ŷ .+ y) |> median

# TEST
y    = [-1, 0, 1, 2, 3, 4]
ŷ    = [-3, 1, 2, 2, 4, 5]

e = err(ŷ, y)
p = perr(ŷ, y)

mse(ŷ, y)
rmse(ŷ, y)
mae(ŷ, y) 
mdae(ŷ, y) 
rmdse(ŷ, y)
maxae(ŷ, y)

mape(ŷ, y)
mdape(ŷ, y)
rmspe(ŷ, y)
rmdspe(ŷ, y)
smape(ŷ, y)
smdape(ŷ, y) 

@ablaom ablaom left a comment
Member
@azev77 Thanks very much indeed.

Yes, you need to make the tolerance part of the struct. Measures can't have kwargs. There is an example here:

https://github.com/alan-turing-institute/MLJBase.jl/blob/7f023250dd497c3ead048c06f894bcbeee0113f7/src/measures/finite.jl#L9

You may also want to add weight support but I won't insist.

It's not necessary in this PR, but ideally mape should be implemented as a reports_per_observation measure, as should all measures that are simply sum or mean aggregates of some function of a single observation (so, not rms and not auc). However, this improvement could be part of a larger refactoring project to eliminate a lot of redundant code for such loss functions. I'll open an issue.

(Yes, would be great if you can add this tolerance to Root mean squared percentage loss as well.)

@azev77
Contributor Author

azev77 commented May 10, 2020

I'm not sure about the struct, and I took out the kwarg but put in a default tol.
Can a measure have arguments w/ default values?

struct MAPE <: Measure 
    tol::Real
end

"""
     mape(ŷ, y)
Mean Absolute Percentage Error:
``\\text{MAPE} =  m^{-1}∑ᵢ|{(yᵢ-ŷᵢ) \\over yᵢ}|`` 
where the sum is over indices such that `yᵢ≂̸0` and `m` is the number
of such indices.
For more information, run `info(mape)`.
"""
const mape = MAPE()

metadata_measure(MAPE;
    name                     = "mape",
    target_scitype           = Union{Vec{Continuous},Vec{Count}},
    prediction_type          = :deterministic,
    orientation              = :loss,
    reports_each_observation = false,
    is_feature_dependent     = false,
    supports_weights         = false,
    docstring                = "Mean Absolute Percentage Error; aliases: `mape`.")

function (::MAPE)(ŷ::Vec{<:Real}, y::Vec{<:Real}, tol = eps())
    check_dimensions(ŷ, y)
    ret = zero(eltype(y))
    count = 0
    @inbounds for i in eachindex(y)
        ayi = abs(y[i])
        if ayi > tol
        #if y[i] != zero(eltype(y))
            dev = abs((y[i] - ŷ[i]) / ayi)
            #dev = abs((y[i] - ŷ[i]) / y[i])
            ret += dev
            count += 1
        end
    end
    return ret / count
end

@ablaom
Member

ablaom commented May 11, 2020

No, the idea is that you choose the tolerance at time of instantiation of the measure. As in

mape_dangerous = MAPE(tol=0)
mape_dangerous(yhat, y) 

You're missing a keyword constructor for the callable measure instances, as in `CrossEntropy(; eps=eps()) = CrossEntropy(eps)`. Please see the cross_entropy example.

The signature of the callable object `m` can only be `m(yhat, y)` or `m(yhat, y, w)`.

Clear?

@azev77
Contributor Author

azev77 commented May 11, 2020

@ablaom sorry, not clear.

  1. Can you show me w/ the mape() example?
  2. Include it w/ the new upgraded measures.jl (if we're gonna throw out most of the code anyway)?

@ablaom
Member

ablaom commented May 11, 2020

> 1. Can you show me w/ the mape() example?

See the new mape fork. Here's the diff: 9aabed7

> 2. Include it w/ the new upgraded measures.jl (if we're gonna throw out most of the code anyway)?

Are you volunteering to carry out the refactor at JuliaAI/StatisticalMeasures.jl#17, then? I'm not going to get around to that for some time.
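For reference, the constructor pattern described above (tolerance stored in the struct, set at instantiation via a keyword constructor, read by the callable) can be sketched as follows. This is assembled from the discussion in this thread, not copied from the actual diff, and assumes the MLJBase context (`Measure`, `Vec`, `check_dimensions`):

```julia
struct MAPE <: Measure
    tol::Real
end

# keyword constructor, so users can write MAPE(tol=0)
MAPE(; tol = eps()) = MAPE(tol)

# the callable reads the tolerance from the instance, not from a kwarg
function (m::MAPE)(ŷ::Vec{<:Real}, y::Vec{<:Real})
    check_dimensions(ŷ, y)
    ret = zero(eltype(y))
    count = 0
    @inbounds for i in eachindex(y)
        ayi = abs(y[i])
        if ayi > m.tol
            ret += abs((y[i] - ŷ[i]) / ayi)
            count += 1
        end
    end
    return ret / count
end

mape_dangerous = MAPE(tol = 0)
mape_dangerous(ŷ, y)
```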

@azev77
Contributor Author

azev77 commented May 11, 2020

@ablaom

  1. Thanks for the code (I think I incorporated it).
  2. While I'd love to help w/ the refactor, I don't know enough about the details of MLJ.
    I spilled my guts here.

@ablaom ablaom mentioned this pull request May 11, 2020
@ablaom
Member

ablaom commented May 11, 2020

Closed in favour of #302 - essentially identical

@ablaom ablaom closed this May 11, 2020
@ablaom
Member

ablaom commented May 11, 2020

@azev77 Many thanks for the PR.
