support for multi-dimensional "label" for regressions? #38
Comments
See https://github.com/dmlc/xgboost/blob/master/doc/parameter.md and
the multi:softprob objective for how a vector output would be handled (as a
flattened matrix).
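To make the "flattened matrix" remark concrete: the parameter documentation describes the multi:softprob output as a vector of ndata * nclass values that can be reshaped into an ndata-by-nclass matrix. Below is a minimal sketch in plain Python, assuming row-major order (each row's class scores sit next to each other). The helper name `unflatten` and the sample numbers are made up for illustration and are not part of XGBoost.

```python
def unflatten(flat, num_class):
    """Reshape a row-major flat list into rows of length num_class.

    `unflatten` is a hypothetical helper, not an XGBoost function.
    """
    if num_class <= 0:
        raise ValueError("num_class must be positive")
    if len(flat) % num_class != 0:
        raise ValueError("length of flat is not a multiple of num_class")
    # Slice consecutive chunks of num_class values, one chunk per data row.
    return [flat[i:i + num_class] for i in range(0, len(flat), num_class)]


# Illustrative values: 2 data rows, 3 classes, stored as one flat vector.
flat = [0.7, 0.2, 0.1, 0.1, 0.3, 0.6]
rows = unflatten(flat, 3)
print(rows)
```

With NumPy available, `flat.reshape(-1, num_class)` should do the same job under the same row-major assumption.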
However, a deeper question is what you expect a gradient boosting
regression model with a vector output to do differently from
running a separate model for each dimension. If you can clarify what you
want to be different (other than just easier coding), then it will be
easier to see if XGBoost supports that.
- Scott
On Fri, Feb 3, 2017 at 8:16 AM ExpandingMan wrote:
Hello all. I haven't dug too far into the source code yet, but I'm
wondering if it's possible to do regressions where the "label" (target
value) consists of multi-dimensional data points. (i.e. the label
argument of the xgboost function would be an Array{T<:Number,2}.) This
seems like a pretty important feature, but I can't find any literature
about it in the xgboost documentation for any language.
It seems to me that even if it's not explicitly supported this should be
possible by setting a custom loss function, however I get the following
error any time I try to pass a matrix-valued "label":
ERROR: LoadError: MethodError: no method matching (::XGBoost.#_setinfo#8)(::Ptr{Void}, ::String, ::Array{Float64,2})
Closest candidates are:
_setinfo{T<:Number}(::Ptr{Void}, ::String, ::Array{T<:Number,1}) at /home/user/.julia/v0.5/XGBoost/src/xgboost_lib.jl:10
in (::XGBoost.##call#7#11)(::Array{Any,1}, ::Type{T}, ::Array{Float64,2}, ::Bool, ::Float32) at /home/user/.julia/v0.5/XGBoost/src/xgboost_lib.jl:59
in (::Core.#kw#Type)(::Array{Any,1}, ::Type{XGBoost.DMatrix}, ::Array{Float64,2}, ::Bool, ::Float32) at ./<missing>:0
in makeDMatrix(::Array{Float64,2}, ::Array{Float64,2}) at /home/user/.julia/v0.5/XGBoost/src/xgboost_lib.jl:137
in #xgboost#20(::Array{Float64,2}, ::Array{Any,1}, ::Array{Any,1}, ::Array{Any,1}, ::Type{T}, ::Type{T}, ::Array{Any,1}, ::Array{Any,1}, ::XGBoost.#xgboost, ::Array{Float64,2}, ::Int64) at /home/user/.julia/v0.5/XGBoost/src/xgboost_lib.jl:147
in (::XGBoost.#kw##xgboost)(::Array{Any,1}, ::XGBoost.#xgboost, ::Array{Float64,2}, ::Int64) at ./<missing>:0
in include_from_node1(::String) at ./loading.jl:488
while loading /home/user/RatingsPrediction/xgboost0.jl, in expression starting on line 43
Taking a look at the source code I get the impression it is not designed
to pass labels that aren't Vectors into the C code. Certainly the above
error seems to indicate that it is impossible to set a "label" that cannot
be converted to Vector.
Is there any way around this? Does the Python API support this? Thanks.
|
Thanks for your prompt response. I don't see any significant problem with using multiple models (as far as I can tell, in the case of gradient boosted trees this should be exactly equivalent to "one" multi-dimensional model). Of course, one usually doesn't have to resort to this (from an API standpoint), hence the issue. Apart from convenience, I'd be a bit concerned about performance if I were fitting in a high-dimensional space, but perhaps that's unwarranted. |
Deep learning APIs often allow vector output because they share parameters
during such multitask learning. My guess is that since GBMs don't typically do
this, running separate models is the most explicit way of doing this
without implying that any parameter sharing is happening. I think Tianqi
wrote a paper with Carlos a while back on accounting for certain types of
dependence among the output features, so you might also check that out if
you want.
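As a rough illustration of the "separate models" approach, the sketch below trains one single-output model per column of a matrix-valued label and stacks the per-column predictions back into rows. Everything here is a stand-in: `fit_mean` is a hypothetical trainer that only predicts the column mean, and `fit_per_output` / `predict_per_output` are made-up helper names. In practice any single-output trainer (for example an XGBoost regressor) could be passed in place of `fit_mean`.

```python
from statistics import fmean


def fit_mean(X, y):
    """Hypothetical single-output trainer: always predicts the mean of y."""
    mu = fmean(y)
    return lambda rows: [mu for _ in rows]


def fit_per_output(fit, X, Y):
    """Train one independent model per column of the label matrix Y."""
    columns = list(zip(*Y))  # transpose: one tuple per output dimension
    return [fit(X, list(col)) for col in columns]


def predict_per_output(models, X):
    """Predict each output dimension separately, then stack into rows."""
    per_column = [model(X) for model in models]
    return [list(row) for row in zip(*per_column)]


# Illustrative data: 3 samples, 1 feature, 2 output dimensions.
X = [[0.0], [1.0], [2.0]]
Y = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]

models = fit_per_output(fit_mean, X, Y)
preds = predict_per_output(models, X)
print(preds)
```

No parameters are shared between the per-column models, which matches the point above that this form makes the absence of parameter sharing explicit.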
- Scott
|
I have a related point of confusion. According to some out-of-date documentation, e.g.:
output:
so my question is,
Thanks in advance for the explanation. |
Matrix input for labels is a recent addition (1.6) for multi-output and multi-label tasks; the getter cannot return the matrix yet. |