Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compatibility with sklearn estimators #98

Open
edwardwliu opened this issue Apr 18, 2023 · 6 comments · May be fixed by #146
Open

Compatibility with sklearn estimators #98

edwardwliu opened this issue Apr 18, 2023 · 6 comments · May be fixed by #146
Assignees
Labels
enhancement New feature or request Python Python changes

Comments

@edwardwliu
Copy link
Collaborator

We're interested in compatibility with sklearn estimators (see instructions) for integration with other libraries like Ray. The goal would be to pass the check_estimator() check. A few specific items to respect requirements include: class __init__ parameters should be explicit and changing the newdata arg to x , adhered by subsequent methods.

@petrovicboban
Copy link
Contributor

I have no Data Science skills and experience, so I need more details here. It's very likely I would need serious DS training to get into it, and I can't be productive in this domain yet.

@edwardwliu
Copy link
Collaborator Author

I see your concerns. I shared some additional context via Slack, but IIUC you don't have access or to email for the next two weeks, is that correct? This request stems from an exploration of sklearn compatibility by others that I've uploaded here. I think running check_estimator() in the example then addressing pure syntax changes could be an appropriate starting point. Let me know if you run into specific issues or continue to have concerns.

@petrovicboban
Copy link
Contributor

I still have no idea of what's going on in that code and what the goal is. No idea what sklearn/ray is too, I have no exposure to ds frameworks at all.
DS is not something I can jump in just like that, without any theoretical knowledge. I'm sorry but you'll have to reassign this task to someone else.

@edwardwliu
Copy link
Collaborator Author

Thanks for sharing the context. I shared some details earlier over Slack, but realize you may not have access. Let's chat tomorrow about the best path forward!

@edwardwliu
Copy link
Collaborator Author

edwardwliu commented Apr 19, 2023

Chatted about this, appreciate you taking the time to sync. Documenting some of the additional context discussed here.

The Python package was originally developed with the intention of natively following an interface called "scikit-learn", but deltas have been introduced over time. The interface defines some requirements for class initialization, function naming, and parameter. Adhering to this interface allows easy integration with many other Python libraries that can parallelize model training or improve performance. The fixes should be syntactical and be limited to appropriately extending class requirements. However, if any errors touch model logic, requires data science context, or if any questions arise at all, please tag me and I'll resolve those myself as soon as I can.

The starting step is to run and eventually pass the following check:

from random_forestry import RandomForest
from sklearn.utils.estimator_checks import check_estimator

check_estimator(RandomForest())

The checks can be found in estimator_checks.py() with details outlined in the documentation starting from this section and below. The bulk of the change should be the following:

  • Renaming getter and setter methods to get_params and set_params
  • Explicitly define __init__() that aligns with params
  • Functions with the data frame argument newdata should be renamed X

Other changes may be class specific so as mentioned above, let's work closely on this if questions arise!

@petrovicboban petrovicboban added enhancement New feature or request Python Python changes labels May 2, 2023
@petrovicboban petrovicboban linked a pull request May 11, 2023 that will close this issue
@Ilia-Shutov Ilia-Shutov reopened this Aug 22, 2023
@Ilia-Shutov
Copy link
Contributor

Created PR #146

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request Python Python changes
Projects
None yet
3 participants