Epic: Data Explorer Summary Panel statistics #2161

Open
10 of 14 tasks
jthomasmock opened this issue Jan 30, 2024 · 13 comments
Labels: area: data explorer, epic

@jthomasmock
Contributor

jthomasmock commented Jan 30, 2024

When a column in the Summary Panel is expanded, the panel dynamically calculates and then reveals additional summary statistics for that column. The backend does this lazily, because computing the statistics up front would be costly for long or wide datasets.

Summary stats will be right-aligned on the decimal point:

NA:           15
Median:       14
Mean:         15.7
SD:            2.1
Min:           1.2
Max:          20.3
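
A minimal sketch of how the decimal alignment shown above could be produced, assuming the values have already been formatted as strings. The `Stat` type and the `alignOnDecimal` helper are hypothetical names used for illustration; they are not part of Positron.

```typescript
// Hypothetical shape for one already-formatted statistic.
type Stat = { label: string; value: string };

// Align labelled values so that their decimal points share a column.
function alignOnDecimal(stats: Stat[]): string[] {
	// Split each value into its integer part and its fractional part.
	const parts = stats.map(({ label, value }) => {
		const dot = value.indexOf('.');
		return {
			label: `${label}:`,
			int: dot === -1 ? value : value.slice(0, dot),
			frac: dot === -1 ? '' : value.slice(dot), // keeps the leading '.'
		};
	});
	const labelWidth = Math.max(...parts.map(p => p.label.length));
	const intWidth = Math.max(...parts.map(p => p.int.length));
	// Pad labels on the right and integer parts on the left.
	return parts.map(p =>
		`${p.label.padEnd(labelWidth)} ${p.int.padStart(intWidth)}${p.frac}`
	);
}
```

Padding the integer part rather than the whole value is what keeps whole numbers such as `15` lined up with `15.7` and `2.1`.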

Completed:

  • Numeric in R
  • Numeric in Python
  • String in R
  • String in Python
  • Date in R
  • Date in Python
  • Datetime in R
  • Datetime in Python
  • Boolean in R
  • Boolean in Python
  • Unknown in R
  • Unknown in Python

Parent Categorical: #3417

  • Factor/Categorical in R
  • Factor/Categorical in Python

Number

  • Median
  • Mean
  • Standard Deviation (SD)
  • Min
  • Max
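
As an illustration of the Number statistics listed above, here is a rough sketch rather than the backend implementation. The names `NumberSummary` and `summarizeNumbers` are hypothetical. It assumes missing values arrive as `null` or `NaN`, and that SD means the sample standard deviation (n - 1 denominator), which is the default of R's `sd()` and pandas' `Series.std()`; the issue itself does not specify either point.

```typescript
// Hypothetical result shape for a numeric column.
interface NumberSummary {
	naCount: number;
	median: number;
	mean: number;
	sd: number;
	min: number;
	max: number;
}

function summarizeNumbers(values: Array<number | null>): NumberSummary {
	// Assumption: null and NaN both count as missing.
	const xs = values.filter((v): v is number => v !== null && !Number.isNaN(v));
	const naCount = values.length - xs.length;
	const n = xs.length;
	if (n === 0) {
		return { naCount, median: NaN, mean: NaN, sd: NaN, min: NaN, max: NaN };
	}
	const sorted = [...xs].sort((a, b) => a - b);
	const mid = Math.floor(n / 2);
	// Median: middle value, or the average of the two middle values.
	const median = n % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
	const mean = xs.reduce((sum, x) => sum + x, 0) / n;
	// Assumption: sample SD with an n - 1 denominator; undefined for n = 1.
	const sumSq = xs.reduce((sum, x) => sum + (x - mean) ** 2, 0);
	const sd = n > 1 ? Math.sqrt(sumSq / (n - 1)) : NaN;
	return { naCount, median, mean, sd, min: sorted[0], max: sorted[n - 1] };
}
```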

Boolean

  • TRUE N (%)
  • FALSE N (%)
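
A small sketch of the `N (%)` display for Boolean columns. The helper name `summarizeBooleans` is hypothetical, and the choice to compute percentages over all rows, missing values included, is an assumption; the issue does not say which denominator is intended.

```typescript
function summarizeBooleans(values: Array<boolean | null>): { trueLabel: string; falseLabel: string } {
	const total = values.length;
	const trueCount = values.filter(v => v === true).length;
	const falseCount = values.filter(v => v === false).length;
	// Assumption: the denominator is the total row count, including missing values.
	const pct = (n: number): string =>
		total === 0 ? '0.0' : ((100 * n) / total).toFixed(1);
	return {
		trueLabel: `${trueCount} (${pct(trueCount)}%)`,
		falseLabel: `${falseCount} (${pct(falseCount)}%)`,
	};
}
```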

String

  • Empty: N (equivalent to a "" string, i.e. implicitly missing)
  • Unique (Number of unique strings)
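
The distinction between empty and missing strings can be sketched as follows. `summarizeStrings` is a hypothetical helper; it assumes explicit missing values arrive as `null`, and that the unique count excludes `null` but counts `""` as one distinct string. The issue does not settle either point.

```typescript
function summarizeStrings(values: Array<string | null>): { naCount: number; emptyCount: number; uniqueCount: number } {
	// Explicit missing values (null) are counted separately from empty strings.
	const present = values.filter((v): v is string => v !== null);
	return {
		naCount: values.length - present.length,
		// "" is an implicit missing value: present in the data but empty.
		emptyCount: present.filter(v => v === '').length,
		// Assumption: uniqueness is computed over non-null strings only.
		uniqueCount: new Set(present).size,
	};
}
```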

String sub-category: Categorical/Factor

  • Levels: ordered or not ordered, plus the number of levels

Date, Datetime, or Time

  • Number of unique
  • Mean
  • Median
  • Min
  • Max
  • If time, timezone

Array -- holding off for now

  • Number of unique

Struct -- holding off for now

  • Number of unique

Unknown -- holding off for now

  • Number of unique
@jthomasmock jthomasmock added the epic label Jan 30, 2024
@jthomasmock jthomasmock added this to the Private Alpha 2024 Q2 milestone Jan 30, 2024
@petetronic petetronic modified the milestones: Private Alpha 2024 Q2, Public Beta Feb 8, 2024
@petetronic petetronic changed the title Epic: Data Viewer Summary Panel statistics Epic: Data Explorer Summary Panel statistics Feb 14, 2024
@softwarenerd
Contributor

/**
 * Possible values for TypeDisplay in ColumnSchema
 */
export enum ColumnSchemaTypeDisplay {
	Number = 'number',
	Boolean = 'boolean',
	String = 'string',
	Date = 'date',
	Datetime = 'datetime',
	Time = 'time',
	Array = 'array',
	Struct = 'struct',
	Unknown = 'unknown'
}
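
One way the enum values could drive which statistics the Summary Panel requests, sketched under assumptions. The `TypeDisplay` union mirrors the enum's string values, but the statistic names and the `summaryStatsByType` table are hypothetical. Placing `timezone` under `datetime` and `time` is one reading of "If time, timezone" in the issue description, and the empty lists reflect the types marked as "holding off for now".

```typescript
// String values taken from ColumnSchemaTypeDisplay; the union type itself is illustrative.
type TypeDisplay =
	| 'number' | 'boolean' | 'string'
	| 'date' | 'datetime' | 'time'
	| 'array' | 'struct' | 'unknown';

// Hypothetical statistic names, following the lists in the issue description.
const temporalStats = ['unique_count', 'mean', 'median', 'min', 'max'];

const summaryStatsByType: Record<TypeDisplay, string[]> = {
	number: ['median', 'mean', 'sd', 'min', 'max'],
	boolean: ['true_count', 'false_count'],
	string: ['empty_count', 'unique_count'],
	date: [...temporalStats],
	// Assumption: timezone applies to types that carry a time component.
	datetime: [...temporalStats, 'timezone'],
	time: [...temporalStats, 'timezone'],
	// Deferred in the issue ("holding off for now").
	array: [],
	struct: [],
	unknown: [],
};

// Look up the statistics to request when a column of this type is expanded.
function statsFor(type: TypeDisplay): string[] {
	return summaryStatsByType[type];
}
```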

@jthomasmock
Contributor Author

@softwarenerd -- I've converted the headers above to the type_display enum values.

@wesm wesm added the area: data explorer label and removed the epic label Feb 29, 2024
@petetronic petetronic added the epic label Feb 29, 2024
@wesm wesm self-assigned this Apr 2, 2024
@wesm
Contributor

wesm commented Apr 2, 2024

I'm working on improvements in the backend protocol to better support these statistics right now.

I'm not sure it makes sense to compute the number of unique values for arrays and structs yet. How easy this is to compute varies across backends, so I'll punt on it for now; we can revisit it once we've investigated how to compute it consistently.

@jthomasmock
Contributor Author

jthomasmock commented Apr 2, 2024

Sounds good! I also think it'd be interesting to hear from users about which metrics they'd like. I've indicated above that we're holding off on arrays, structs, and unknowns for now.

@jthomasmock
Contributor Author

We can close this once #3021 is merged and validated.

@petetronic
Collaborator

petetronic commented May 16, 2024

@jthomasmock do we have a good test dataset that exercises all of the types, and thus the column summary statistics (including precision, null, empty, various types, etc.)?

We'd want QA to exhaustively cover these statistics to check their validity for the dataset.

@jthomasmock
Contributor Author

> @jthomasmock do we have a good test dataset that exercises all of the types, and thus the column summary statistics (including precision, null, empty, various types, etc.)?
>
> We'd want QA to exhaustively cover these statistics to check their validity for the dataset.

I can work on this.

There are some example tests at: https://github.com/r-lib/pillar/blob/main/tests/testthat/test-format_decimal.R

@jmcphers
Collaborator

@jthomasmock is there still work to do for Beta on this now that #3021 is merged and validated? (we do need tests but we can close this without them)

@jthomasmock
Contributor Author

@jmcphers I think we are still missing date/datetime stats in: Positron Version: 2024.05.0 (Universal) build 1307

image

@dfalbel
Contributor

dfalbel commented May 29, 2024

I could pick up the backend side for those. I'm assuming @wesm is not working on it yet, right?

@wesm
Contributor

wesm commented May 29, 2024

I'm working on the float formatting as we speak, so feel free to pick this up

@jthomasmock
Contributor Author

The checkboxes above are the missing stats as of 2024-05-29: Boolean, date, datetime, factor/categorical, and unknown.
