Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial skbase design proposal #76

Merged
merged 15 commits into from
Jan 8, 2023
Merged

Initial skbase design proposal #76

merged 15 commits into from
Jan 8, 2023

Conversation

RNKuhns
Copy link
Contributor

@RNKuhns RNKuhns commented Nov 22, 2022

Reference Issues/PRs

Fixes #2, #12, #32

What does this implement/fix? Explain your changes.

This outlines skbase's initial design expectations.

What should a reviewer concentrate their feedback on?

Provide initial feedback on the broad proposal (does the stated functionality make sense broadly, and if so, in the proposed sub-module layout). If that makes sense, we'll iterate on the more detailed design points until agreement is reached.


#### HeterogenousMetaObject

**DESIGN DECISION: Do we want to provide specific base classes for useful
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, yes, that would be useful!

Wonder whether the polymorphic base class would also be useful here: sktime/sktime#3108

FYI @miraep8, @benHeid re pipelines

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, I like the idea of making it easy for people to build pipelines. Since the goal of this design document is to provide the outline of the interface, we can defer the implementation details to a follow-up design specific to the class implementations.

@fkiraly, @miraep8 and @benHeid my initial thoughts are providing 3 meta classes: HeterogenousMetaObject (downstream users can make their own metaobjects), BasePipeline (makes it easier to make standard linear pipelines), BaseNetworkPipeline (makes it easier to build networked pipelines). But I'm up for a potential polymorphic approach to reduce the number of classes. For now we just need to state the use cases we intend to support. Do those 3 use cases make sense to everyone?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, agreed. Sorry, did not see the reply earlier.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(busy with pydata :-) )

Copy link
Contributor

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent, reads well and makes a lot of sense design-wise!

Aware that there are a few open sections, but I don't mind what content goes in there (assuming it is a sensible extrapolate of the rest).

One possible set of features revolves around manipulating (base-)objects on the class level, e.g., composite class contraction: #31

I think this is pretty advanced though and may require working very close to python language features (dynamic generation of class definitions etc), so it may be the last of the things of the list to work on.

@fkiraly
Copy link
Contributor

fkiraly commented Nov 22, 2022

(no change requests, since this is consolidating existing designs and ideas discussed on hackmd earlier, with some nice new ideas)

@fkiraly fkiraly added the API design API design & software architecture label Nov 22, 2022
@RNKuhns
Copy link
Contributor Author

RNKuhns commented Nov 22, 2022

(no change requests, since this is consolidating existing designs and ideas discussed on hackmd earlier, with some nice new ideas)

Sounds good, I'll give you a heads up when I finalize it though, so you have a chance to see what I write for those sections.

@RNKuhns RNKuhns mentioned this pull request Dec 5, 2022
@RNKuhns RNKuhns changed the title WIP draft of initial design proposal Initial skbase design proposal Dec 5, 2022
@RNKuhns
Copy link
Contributor Author

RNKuhns commented Dec 5, 2022

@fkiraly I've updated this PR to fill out the rest of the design proposal based on what we've talked about. Can you take another look to make sure I've represented everything correctly.

@codecov-commenter
Copy link

codecov-commenter commented Dec 5, 2022

Codecov Report

Merging #76 (3dd3948) into main (01f6b01) will decrease coverage by 0.01%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main      #76      +/-   ##
==========================================
- Coverage   77.94%   77.93%   -0.02%     
==========================================
  Files          23       23              
  Lines        1877     1867      -10     
==========================================
- Hits         1463     1455       -8     
+ Misses        414      412       -2     
Impacted Files Coverage Δ
skbase/base/_meta.py 15.27% <ø> (-1.08%) ⬇️
skbase/__init__.py 100.00% <100.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

class level interface. Other classes are subclasses of `BaseObject`.
- [BaseEstimator](#BaseEstimator): A subclass of `BaseObject` that adds a
high-level interface for *fittable* estimators
- [BaseMetaObject](#BaseMetaObject): A subclass of `BaseObject`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, you're subclassing this. Is this fine?
What would an estimator be that is also a metaobject? Do we get a diamond inheritance diagram then?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fkiraly aren't forecasting pipelines really estimators that are also metaobjects? That being said, I went the Mixin route in the final version.

- `BasePipeline` expands on the `BaseMetaObject` framework by providing
common functionality for (linear) stepped based pipelines like those found
in `scikit-learn` and `sktime`
- `BaseDAGPipeline` expands on the `BaseMetaObject` framework by providing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, this is probably going to be hairy, because it may require base class polymorphism.
Have you been involved with the pywatts discussion stream?

Although, in skbase, it might be easier since there is only one scitype (or two).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question - do we need a BasePipeline class at all? Or do we simply need to make a sufficiently flexible pipeline class? I think having a BasePipeline class opens one up to making many types of pipeline classes (which is kind of what we do in sktime now tbh). But a better way to do this is perhaps to just build a pipeline that can handle sufficient complexity of different use cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@miraep8 I think that is a valid point for a single package. But the end-stage goal is for skbase to be used external to sktime (I for instance have other use-cases and the hope is it would make creating a new package easy for someone else too). In that case, a pipeline in a different package would be able to inherit some base functionality, but the implementation of the "action" or "state-changing" methods that the pipeline performs would be up to the external developer.

common functionality for (linear) stepped based pipelines like those found
in `scikit-learn` and `sktime`
- `BaseDAGPipeline` expands on the `BaseMetaObject` framework by providing
functionality for pipelines composed of directed-acyclic graphs (DAGs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo

Copy link
Contributor

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+100.

Left some minor comments.

looking up) any `BaseObject`'s from a project.
- `skbase.validate` will include tools for validating and comparing `BaseObject`'s
and/or collections of `BaseObject`'s.
- `skbase.testing` will include tools for testing `BaseObject`s for interface
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I just wanted to say that I love the idea of skbase.validate and skbase.testing. Especially think there is a space to also specify the tools as classes themselves. (eg a LengthChecker that can validate that certain specified inputs have the same type).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed! Did not notice it, but I agree with @miraep8 that tests as objects would be nice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is an interesting idea, though beyond the intended scope of this proposal. I do think we should revisit this topic when we dive further into the skbase.testing interface (hopefully in the not too distant future).

Going this route, testing is really executing a pipeline of "checker" or "validator" objects (which makes intuitive sense). This could also be useful in parameter validation.

BaseObjects are base classes with:

- `scikit-learn` style interface to get and set parameters
- `sktime` style interface for working with *tags*
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another question - has come up recently within sktime - I think one could in theory replace a lot of the functionality currently captured in tags via input validation. Of course I think tags could still play an important role in estimator search.. but think this could still be done in a more robust way using the input tests - ie do we implement this type of input for test for this object etc. (Not necessarily against tags across the board, but think in the current implementation they are perhaps a bit under-documented..)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here's an old, related, STEP that never got implemented: https://github.com/sktime/enhancement-proposals/blob/main/steps/05_scitype_based_IO_checks/step.md

I would do it differently today, but just FYI

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@miraep8 from the skbase perspective we want to make it possible to use "tags". We leave the usage to the individual packages (but I agree there can be non-tag ways to accopmlish the same outcome. But I do believe there are cases where tags can make the most sense).

@fkiraly
Copy link
Contributor

fkiraly commented Dec 25, 2022

@RNKuhns, fine with me - feel free to merge when you are happy with this being in its final form. (I was expecting to see this merged already)

@fkiraly
Copy link
Contributor

fkiraly commented Dec 31, 2022

What do we do with this PR, @RNKuhns, @miraep8?

@RNKuhns
Copy link
Contributor Author

RNKuhns commented Jan 7, 2023

What do we do with this PR, @RNKuhns, @miraep8?

@fkiraly I plan to finish this by early next week.

Want to look at things 1 more time.

@RNKuhns
Copy link
Contributor Author

RNKuhns commented Jan 8, 2023

What do we do with this PR, @RNKuhns, @miraep8?

@fkiraly I plan to finish this by early next week.

Want to look at things 1 more time.

This is now ready to merge. Will do so an include in the v0.3.0 release.

@RNKuhns RNKuhns merged commit 5d3ab75 into sktime:main Jan 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API design API design & software architecture
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BaseObject design reconciliation points
4 participants