compliance with ScikitLearn API #39
Comments
As far as I know, there are no concrete plans to provide this functionality for the Julia package, because the general approach is to provide a consistent interface for XGBoost across languages. It's this consistent version of the interface that is implemented for Julia, and I'm currently refactoring the package to be even more consistent (for some progress, see https://github.com/Allardvm/XGBoost.jl/tree/package-refactor). For most languages, however, there is some alternative functionality specific to that language, such as the scikit-learn interface for Python, and the Julia package doesn't have this. In that sense, there is a case to be made for providing an alternative interface, but at this point there isn't a single dominant framework like scikit-learn for Julia that provides clear guidelines on what (and how) to implement. For another gradient boosting library, I built a Julia interface that is somewhat similar to scikit-learn in Python (see https://github.com/Allardvm/LightGBM.jl); it would satisfy your use case and would be reasonably easy to implement for XGBoost as well.
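For readers skimming the thread, a minimal, self-contained sketch of the estimator-style pattern being described might look like the following. Every name in it is a hypothetical placeholder rather than the actual LightGBM.jl or XGBoost.jl API, and the "training" step is a stand-in for a real call into a boosting library.

```julia
# Sketch of the scikit-learn-style pattern: construct a model object with its
# hyperparameters, then call fit! and predict on it. All names are hypothetical
# placeholders, not a real package API; "training" here just memorises a mean.
mutable struct ToyRegressor
    num_rounds::Int
    max_depth::Int
    mean_target::Float64   # stands in for the trained booster
end

# Keyword constructor: hyperparameters are set before any data is seen.
ToyRegressor(; num_rounds = 100, max_depth = 6) =
    ToyRegressor(num_rounds, max_depth, NaN)

# Training mutates the model object in place and returns it.
function fit!(model::ToyRegressor, X::AbstractMatrix, y::AbstractVector)
    # A real implementation would hand X, y, and the hyperparameters to the
    # boosting library here; this placeholder just stores the mean label.
    model.mean_target = sum(y) / length(y)
    return model
end

# Prediction only needs the (already trained) model object.
predict(model::ToyRegressor, X::AbstractMatrix) =
    fill(model.mean_target, size(X, 1))

# Usage: the model object exists before any data does.
model = ToyRegressor(num_rounds = 50)
fit!(model, randn(20, 3), randn(20))
predict(model, randn(5, 3))
```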
Well, other conformity issues aside, at least across the Julia and Python packages that I'm aware of, all machine learning packages let one declare a model object independently of training it, then fit it on a 2-D feature matrix with a label vector and predict from the fitted object. The major exception that I'm aware of is methods that accept higher-rank tensors. So, even though Julia lacks a package as universally used as scikit-learn is in Python, most of the machine learning code available for it at the moment still works this way. At the moment I can't think of a single other example in either Julia or Python where one can't declare the model object independently of training it.

To give you an example of my use case, I frequently use several machine learning packages side by side, and in all of these cases except XGBoost.jl one can create a model object before training it. Other details of the interface aside, not having this breaks my process and necessitates specialized code for XGBoost. The other details don't matter to me nearly as much. I could be wrong, but I strongly suspect many other users have exactly this same issue. I'm not trying to make an argument for any other aspect of the interface but this one, which, to me at least, causes headaches. Anyway, that's the best case I think I can make, so I won't try to persuade you guys any further. Thanks for maintaining this package; it seems quite robust and (other than this one issue) easy to use.

Edit: And, of course, adding this functionality would not mean that you'd have to get rid of the existing `xgboost` interface.
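To make the generic-code use case above concrete, here is a small sketch of the kind of loop that separate construction enables, reusing the hypothetical ToyRegressor from the earlier sketch; the helper function and its name are likewise invented for illustration and are not part of any real package.

```julia
# Generic evaluation that works for any model supporting fit!(model, X, y)
# and predict(model, X). The point is that candidate models can be listed
# as plain objects before any training has happened.
function holdout_score(model, X, y; split = 0.8)
    n = size(X, 1)
    ntrain = floor(Int, split * n)
    fit!(model, X[1:ntrain, :], y[1:ntrain])
    yhat = predict(model, X[ntrain+1:end, :])
    return sum(abs2, yhat .- y[ntrain+1:end]) / length(yhat)
end

# Untrained models, declared up front (ToyRegressor from the sketch above).
candidates = [ToyRegressor(num_rounds = 50), ToyRegressor(num_rounds = 200)]
X, y = randn(100, 4), randn(100)
scores = [holdout_score(m, X, y) for m in candidates]
```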
I can see why you would want consistency, though in this case it's tricky, since consistency with the other XGBoost bindings is also valuable. Since @Allardvm is working on the refactor, I would leave it to him to see what is best. One comment, though, is that you could call
Thanks for the tip. It doesn't completely "fix" all the data input/output steps, but it's quite useful!
It wouldn't be hard to implement the ScikitLearn.jl interface on top of the existing code by creating a new type. It's what I did for DecisionTree.jl (see this file) and LowRankModels.jl. I understand the reluctance to add more code. FWIW, the ScikitLearnBase.jl interface has been essentially unchanged since it started, and it sticks very close to scikit-learn. I could make a PR if there is interest. If there isn't, well, I'm kinda curious to see what a pure-Julia XGBoost would look like, and might start work on that. Has there been any effort on that side?
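As a rough illustration of the wrapper-type approach suggested above, a ScikitLearnBase.jl-style adapter around the package's existing entry point might look roughly like this. It assumes the ScikitLearnBase.jl conventions (@declare_hyperparameters, fit!, predict) and the older xgboost(X, num_round; label = y, kwargs...) signature that this thread refers to; the struct name, fields, and defaults are invented for the example, and this is a sketch rather than a working PR.

```julia
# Sketch only: wraps the one-call xgboost entry point behind a
# ScikitLearnBase.jl-style estimator type. Field names and defaults are
# illustrative; check signatures against the current packages before use.
import ScikitLearnBase
using XGBoost

mutable struct XGBoostRegressor
    num_rounds::Int
    max_depth::Int
    eta::Float64
    booster::Union{Booster, Nothing}   # filled in by fit!
end

XGBoostRegressor(; num_rounds = 100, max_depth = 6, eta = 0.3) =
    XGBoostRegressor(num_rounds, max_depth, eta, nothing)

# Lets ScikitLearn.jl's clone/get_params/set_params! machinery see the fields.
ScikitLearnBase.@declare_hyperparameters(XGBoostRegressor,
                                         [:num_rounds, :max_depth, :eta])

function ScikitLearnBase.fit!(model::XGBoostRegressor,
                              X::AbstractMatrix, y::AbstractVector)
    # Delegates to the existing one-call API and keeps the returned Booster.
    model.booster = xgboost(X, model.num_rounds; label = y,
                            max_depth = model.max_depth, eta = model.eta)
    return model
end

ScikitLearnBase.predict(model::XGBoostRegressor, X::AbstractMatrix) =
    XGBoost.predict(model.booster, X)
```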
It would be kind of nice to make a common machine learning interface for Julia, sort of the way JuMP works for optimization. There is MLBase, but for me it would be nicer to have an interface that I could plug any of ScikitLearn, XGBoost, TensorFlow, or MXNet into. I've toyed with the idea of doing this, but frankly about 95% of my time and effort is spent getting the data into the proper form to be ingested by machine learning, so the machine learning interfaces themselves have seemed like a relatively minor issue.
That's the objective behind
I'll look into the ScikitLearnBase.jl interface. It closely matches what I planned for the XGBoost interface anyway, so we might as well make it compatible. Since I'm quite busy at the moment, progress is a bit slow, but you can check the current state here: https://github.com/Allardvm/XGBoost.jl/tree/package-refactor. This version is fully functional and you're welcome to contribute/test it.
Just checking in after a long time. What's the current plan for APIs and such here? There's the ScikitLearn.jl approach, and the MLJ.jl folks as well.
Funny how little patience I have to go back and read my own incredibly verbose issue posts from a long time ago. As of 2.0, the layout of this package largely matches libxgboost itself. I'm therefore closing this issue.
Like many, I frequently use the "ScikitLearn" paradigm where I create a model object and then call functions like `fit!` and `predict`. In Julia, with multiple dispatch, it's trivially easy to get arbitrarily complicated machine learning methods to follow this paradigm. Except... with this package. This is because the model creation and training are done with one function call. This means that in most cases one has to write custom code to plug in XGBoost, as this is the only package I'm aware of where these are done in one step.

Has there been any thought to adding user-friendly code to allow separate creation of model objects? It looks like this is possible using the `Booster` class, but as it is one would have to rewrite most of the `xgboost` function to set the parameters and so forth.
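For context, the one-call style the issue describes corresponds roughly to the following usage, assuming the older xgboost(data, num_round; label = ..., kwargs...) entry point that was current when the issue was opened (the 2.x API mentioned in the closing comment is organized differently); the data and parameter values are illustrative only.

```julia
using XGBoost

# Illustrative data only.
train_X, train_y = randn(100, 5), randn(100)
test_X = randn(10, 5)

# Configuration and training happen in a single call, so there is no
# untrained model object to construct, store, or pass around beforehand.
bst = xgboost(train_X, 10; label = train_y, eta = 0.3, max_depth = 4)

# Prediction then works on the returned Booster.
pred = predict(bst, test_X)
```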