-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initial skbase
design proposal
#76
Conversation
enhancement_proposals/sbep_1.md
Outdated
|
||
#### HeterogenousMetaObject | ||
|
||
**DESIGN DECISION: Do we want to provide specific base classes for useful |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think, yes, that would be useful!
Wonder whether the polymorphic base class would also be useful here: sktime/sktime#3108
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, I like the idea of making it easy for people to build pipelines. Since the goal of this design document is to provide the outline of the interface, we can defer the implementation details to a follow-up design specific to the class implementations.
@fkiraly, @miraep8 and @benHeid my initial thoughts are providing 3 meta classes: HeterogenousMetaObject (downstream users can make their own metaobjects), BasePipeline (makes it easier to make standard linear pipelines), BaseNetworkPipeline (makes it easier to build networked pipelines). But I'm up for a potential polymorphic approach to reduce the number of classes. For now we just need to state the use cases we intend to support. Do those 3 use cases make sense to everyone?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, agreed. Sorry, did not see the reply earlier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(busy with pydata :-) )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent, reads well and makes a lot of sense design-wise!
Aware that there are a few open sections, but I don't mind what content goes in there (assuming it is a sensible extrapolate of the rest).
One possible set of features revolves around manipulating (base-)objects on the class level, e.g., composite class contraction: #31
I think this is pretty advanced though and may require working very close to python
language features (dynamic generation of class definitions etc), so it may be the last of the things of the list to work on.
(no change requests, since this is consolidating existing designs and ideas discussed on hackmd earlier, with some nice new ideas) |
Sounds good, I'll give you a heads up when I finalize it though, so you have a chance to see what I write for those sections. |
skbase
design proposal
@fkiraly I've updated this PR to fill out the rest of the design proposal based on what we've talked about. Can you take another look to make sure I've represented everything correctly. |
Codecov Report
@@ Coverage Diff @@
## main #76 +/- ##
==========================================
- Coverage 77.94% 77.93% -0.02%
==========================================
Files 23 23
Lines 1877 1867 -10
==========================================
- Hits 1463 1455 -8
+ Misses 414 412 -2
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
enhancement_proposals/sbep_1.md
Outdated
class level interface. Other classes are subclasses of `BaseObject`. | ||
- [BaseEstimator](#BaseEstimator): A subclass of `BaseObject` that adds a | ||
high-level interface for *fittable* estimators | ||
- [BaseMetaObject](#BaseMetaObject): A subclass of `BaseObject` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm, you're subclassing this. Is this fine?
What would an estimator be that is also a metaobject? Do we get a diamond inheritance diagram then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fkiraly aren't forecasting pipelines really estimators that are also metaobjects? That being said, I went the Mixin route in the final version.
- `BasePipeline` expands on the `BaseMetaObject` framework by providing | ||
common functionality for (linear) stepped based pipelines like those found | ||
in `scikit-learn` and `sktime` | ||
- `BaseDAGPipeline` expands on the `BaseMetaObject` framework by providing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well, this is probably going to be hairy, because it may require base class polymorphism.
Have you been involved with the pywatts discussion stream?
Although, in skbase
, it might be easier since there is only one scitype (or two).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question - do we need a BasePipeline class at all? Or do we simply need to make a sufficiently flexible pipeline class? I think having a BasePipeline class opens one up to making many types of pipeline classes (which is kind of what we do in sktime now tbh). But a better way to do this is perhaps to just build a pipeline that can handle sufficient complexity of different use cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@miraep8 I think that is a valid point for a single package. But the end-stage goal is for skbase
to be used external to sktime
(I for instance have other use-cases and the hope is it would make creating a new package easy for someone else too). In that case, a pipeline in a different package would be able to inherit some base functionality, but the implementation of the "action" or "state-changing" methods that the pipeline performs would be up to the external developer.
common functionality for (linear) stepped based pipelines like those found | ||
in `scikit-learn` and `sktime` | ||
- `BaseDAGPipeline` expands on the `BaseMetaObject` framework by providing | ||
functionality for pipelines composed of directed-acyclic graphs (DAGs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+100.
Left some minor comments.
looking up) any `BaseObject`'s from a project. | ||
- `skbase.validate` will include tools for validating and comparing `BaseObject`'s | ||
and/or collections of `BaseObject`'s. | ||
- `skbase.testing` will include tools for testing `BaseObject`s for interface |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 I just wanted to say that I love the idea of skbase.validate and skbase.testing. Especially think there is a space to also specify the tools as classes themselves. (eg a LengthChecker that can validate that certain specified inputs have the same type).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed! Did not notice it, but I agree with @miraep8 that tests as objects would be nice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is an interesting idea, though beyond the intended scope of this proposal. I do think we should revisit this topic when we dive further into the skbase.testing
interface (hopefully in the not too distant future).
Going this route, testing is really executing a pipeline of "checker" or "validator" objects (which makes intuitive sense). This could also be useful in parameter validation.
BaseObjects are base classes with: | ||
|
||
- `scikit-learn` style interface to get and set parameters | ||
- `sktime` style interface for working with *tags* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another question - has come up recently within sktime
- I think one could in theory replace a lot of the functionality currently captured in tags
via input validation. Of course I think tags
could still play an important role in estimator search.. but think this could still be done in a more robust way using the input tests - ie do we implement this type of input for test for this object etc. (Not necessarily against tags
across the board, but think in the current implementation they are perhaps a bit under-documented..)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here's an old, related, STEP that never got implemented: https://github.com/sktime/enhancement-proposals/blob/main/steps/05_scitype_based_IO_checks/step.md
I would do it differently today, but just FYI
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@miraep8 from the skbase
perspective we want to make it possible to use "tags". We leave the usage to the individual packages (but I agree there can be non-tag ways to accopmlish the same outcome. But I do believe there are cases where tags can make the most sense).
@RNKuhns, fine with me - feel free to merge when you are happy with this being in its final form. (I was expecting to see this merged already) |
Reference Issues/PRs
Fixes #2, #12, #32
What does this implement/fix? Explain your changes.
This outlines
skbase
's initial design expectations.What should a reviewer concentrate their feedback on?
Provide initial feedback on the broad proposal (does the stated functionality make sense broadly, and if so, in the proposed sub-module layout). If that makes sense, we'll iterate on the more detailed design points until agreement is reached.