Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add df.columns property to improve visibility of generated API for typed column access #957

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

koperagen
Copy link
Collaborator

@koperagen koperagen commented Nov 19, 2024

It's not always easy to tell if code generation in notebooks or compiler plugin worked as expected or find columns you're interested in without printing schema. With this change, people will have an option to overview generated column properties

In notebooks:

val df = DataFrame.read("file.csv")
==== next code cell
df. // column properties are mixed together with methods, not easy to find unless you already know names
df.columns. // easy to overview available columns

image

In compiler plugin:

val df = @Import DataFrame.read("file.csv")
df.columns.

image

@koperagen koperagen added the enhancement New feature or request label Nov 19, 2024
@koperagen koperagen added this to the 0.15.0 milestone Nov 19, 2024
@koperagen koperagen self-assigned this Nov 19, 2024
@Jolanrensen
Copy link
Collaborator

It will clash if you have df.group { a and b }.into("columns"), right?
I suspect this is the reason we have df.size df.size(), df.nrow df.rowsCount(), etc in the public api.
Only functions, no properties. Now, df.columns() already exists, but maybe df.columnScope() could work?

@koperagen
Copy link
Collaborator Author

Generated properties override "columns", so for users it means that they loose ability to overview columns if they have column with this name. It won't prevent them from accessing their "columns" column though. That's why i think it's a good trade off, no?
image

image

@Jolanrensen
Copy link
Collaborator

Hmm that means the API would have certain parts be available depending on the data.
I think this is not a good idea. It may create unexpected behavior and copy/pasting of examples failing, etc. Plus, it's a slippery slope to adding more properties to DataFrame and more potential clashes... So no, I would say we maintain our convention of not having any public properties to DataFrame, just the generated ones.

Functions are fine of course :)

@koperagen
Copy link
Collaborator Author

koperagen commented Nov 19, 2024

There's an important part: it's not available in selection dsl, so potential conflicts are limited to only df. column access. Generally, it's not even useful api - It's an utility to mitigate completion list clutter and the fact there's no good way to see generated code
We can simply agree to not add properties other than this one

df.columnsScope().columnGroup.columnsScope().
df.columns.columnGroup.columns.

@Jolanrensen
Copy link
Collaborator

Jolanrensen commented Nov 19, 2024

Yes, but we'd still have conflicts that don't need to be there. What's wrong with making a function instead?
I think df.availableColumns(), or df.columnsScope(), for instance, would be a nice alternative that can never clash :)

@koperagen
Copy link
Collaborator Author

Because df.columns is a better name. We'd have conflicts - if someone wants to overview columns (which wasn't even possible before) and miraculously have "columns" in their dataframe

@Jolanrensen
Copy link
Collaborator

it's also an option to deprecate the old df.columns() -> df.getColumns() and migrate the functionality to this. That would be fine by me, but a property, I can't agree with

@zaleslaw
Copy link
Collaborator

I need a pause for a week to play with it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants