Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support mask= argument in LocBody #566

Draft
wants to merge 11 commits into
base: main
Choose a base branch
from
Draft

Conversation

jrycw
Copy link
Collaborator

@jrycw jrycw commented Dec 16, 2024

Related Issue: #389.

Hello team,

This is a rough idea for supporting the passing of Polars expressions to the mask= argument of LobBody.

The procedures are outlined as follows:

  1. Exclude the stub and group columns if necessary.
  2. Use df.select() to process the Polars expression and obtain a boolean-like matrix.
  3. Use two for-loops to iterate through the cells and select the desired cells based on whether their values are True.

This idea is preliminary but should provide a foundation for further discussions.

import polars as pl
from great_tables import GT, loc, style
from great_tables.data import gtcars
from polars import selectors as cs

df_mini = pl.from_pandas(gtcars).head().pivot(on="year", index=["trim", "mfr"], values="hp")
(
    GT(df_mini,rowname_col="trim", groupname_col="mfr")
    .tab_style(
        style=style.text(color="red"),
        locations=loc.body(mask=cs.numeric().gt(600))
    )
)

image

print(df_mini.select(cs.numeric().gt(600)))
shape: (3, 4)
┌────────┬────────┬────────┬────────┐
│ 2017.0 ┆ 2015.0 ┆ 2014.0 ┆ 2016.0 │
│ ---    ┆ ---    ┆ ---    ┆ ---    │
│ bool   ┆ bool   ┆ bool   ┆ bool   │
╞════════╪════════╪════════╪════════╡
│ true   ┆ null   ┆ null   ┆ null   │
│ null   ┆ false  ┆ false  ┆ true   │
│ null   ┆ false  ┆ null   ┆ null   │
└────────┴────────┴────────┴────────┘

Copy link

codecov bot commented Dec 16, 2024

Codecov Report

Attention: Patch coverage is 96.66667% with 1 line in your changes missing coverage. Please review.

Project coverage is 90.70%. Comparing base (4043f14) to head (34d7452).
Report is 5 commits behind head on main.

Files with missing lines Patch % Lines
great_tables/_locations.py 96.66% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #566      +/-   ##
==========================================
+ Coverage   90.67%   90.70%   +0.02%     
==========================================
  Files          46       46              
  Lines        5381     5409      +28     
==========================================
+ Hits         4879     4906      +27     
- Misses        502      503       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@jrycw jrycw marked this pull request as draft December 16, 2024 08:36
@jrycw jrycw changed the title Support mask= argument in GT.tab_style() Support mask= argument in LocBody() Dec 16, 2024
@jrycw jrycw changed the title Support mask= argument in LocBody() Support mask= argument in LocBody Dec 16, 2024
cols_excl = [*(stub_var if excl_stub else []), *(group_var if excl_group else [])]

frame: PlDataFrame = data._tbl_data
df = frame.select(expr)
Copy link
Collaborator

@machow machow Dec 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the rules for the mask were as follows?:

  • the mask expressions should--when run by frame.select(mask_expr)--return a DataFrame that:
    • columns: has the same or fewer columns (names must match original frame, but could be in a different order).
    • rows: has the same number of rows
  • after running the expression, we validate this right away
  • the mask is assumed to be in the original row order
  • in this case we can capture the column name and row number of each cell (e.g. with enumerate(mask_result.iter_rows())).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for summarizing the rules.

It seems that for the last part, we only need a single loop to gather all the information required by CellPos.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about this version?

For prototyping, use assert for validation, which can later be replaced with raise ValueError().

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've replaced assert with raise ValueError() and added more test cases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants