Separate series ops from imperative ops in Pluggable Analysis Framework #171

Closed
1 task done
paddymul opened this issue Nov 19, 2023 · 1 comment

Comments

@paddymul
Owner

Checks

  • I have checked that this enhancement has not already been requested

How would you categorize this request? You can select multiple if not sure.

Summary stats, Performance, Developer Experience/CI (feature to make it easier to develop on Buckaroo)

Enhancement Description

For the Pluggable Analysis Framework (PAF), most of the runtime is taken up by series operations (nlargest, mean, median).

Most of the business logic, however, is regular Python code. Assembling histograms is one example.

Pseudo Code Implementation

Refactor PAF so that it works as follows, for polars in particular.

Each analysis class has a series_selector function that returns a list of expressions like:

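import polars as pl
from polars import functions as F  # F is assumed to be this alias

class NewAnalysis:
    provides_series = ["mean", "bottom", "top"]

    @staticmethod
    def series_selector():
        # Each expression is prefixed so its result can be mapped back to a stat name.
        return [
            F.all().mean().name.prefix("mean:"),
            F.col(pl.Int64, pl.UInt32).bottom_k(5).name.prefix("bottom:"),
            F.col(pl.Int64, pl.UInt32).top_k(5).name.prefix("top:"),
        ]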

The Pluggable Analysis Framework then runs the rest of the logic as regular Python code, ordered through its existing DAG of dependencies.

This way the series operations can be implemented separately for pandas and for polars, while the rest of the Python code works the same for both.
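
To make the proposed split concrete, here is a minimal, dependency-free sketch. It is not Buckaroo's implementation: SERIES_OPS, RangeAnalysis, MeanPositionAnalysis, order_analyses, and run are all hypothetical names, and plain Python lists stand in for pandas or polars series. The point is only that the imperative analyses read precomputed stats and never touch the series backend.

```python
from statistics import mean

# Phase 1 stand-ins (hypothetical): one function per series stat.
# A real backend would swap these for pandas or polars series methods.
SERIES_OPS = {
    "mean": lambda values: mean(values),
    "bottom": lambda values: sorted(values)[:5],
    "top": lambda values: sorted(values, reverse=True)[:5],
}


class RangeAnalysis:
    """Hypothetical imperative analysis; reads only precomputed stats."""
    requires = ["bottom", "top"]
    provides = ["value_range"]

    @staticmethod
    def computed_summary(stats):
        return {"value_range": stats["top"][0] - stats["bottom"][0]}


class MeanPositionAnalysis:
    """Hypothetical analysis that depends on another analysis's output."""
    requires = ["mean", "bottom", "value_range"]
    provides = ["mean_position"]

    @staticmethod
    def computed_summary(stats):
        if stats["value_range"] == 0:
            return {"mean_position": 0.5}
        offset = stats["mean"] - stats["bottom"][0]
        return {"mean_position": offset / stats["value_range"]}


def order_analyses(analyses, available):
    """Order analyses so each runs after the keys it requires exist."""
    ordered, known, pending = [], set(available), list(analyses)
    while pending:
        ready = [a for a in pending if set(a.requires) <= known]
        if not ready:
            raise ValueError("unsatisfied or circular requirements")
        for analysis in ready:
            ordered.append(analysis)
            known.update(analysis.provides)
            pending.remove(analysis)
    return ordered


def run(columns, analyses):
    """Phase 1: series ops per column. Phase 2: plain-Python analyses."""
    summary = {}
    for name, values in columns.items():
        stats = {key: op(values) for key, op in SERIES_OPS.items()}
        for analysis in order_analyses(analyses, stats):
            stats.update(analysis.computed_summary(stats))
        summary[name] = stats
    return summary


# The analyses are passed out of order on purpose; the ordering step fixes it.
result = run({"a": [1, 2, 3, 4, 10]}, [MeanPositionAnalysis, RangeAnalysis])
```

Under this layout, supporting a second dataframe library would only mean providing another set of series ops; the two analysis classes would stay unchanged.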

Prior Art

N/A

@paddymul
Owner Author

Closed by #182.
