Order data by partitions if available #70
I'd also like to sort partitions by the max value of the partition column, but couldn't find an easy way to get statistics out of the metadata.
The min/max values are available through the deltalake.DeltaTable.get_add_actions API mentioned in this issue: delta-io/delta-rs#2233 (comment)
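Once the statistics are available as per-file records, ordering files by a column's max value is a plain sort. The sketch below assumes each record looks like one row of the flattened add-actions table, i.e. a `"path"` key plus a `"max.<column>"` key. With deltalake releases where `get_add_actions(flatten=True)` returns a pyarrow `RecordBatch`, `.to_pylist()` should produce records of roughly this shape, but the exact column naming is an assumption to verify against your version. `order_files_by_max` is a hypothetical helper, not part of deltalake.

```python
from typing import Any, Dict, List


def order_files_by_max(add_actions: List[Dict[str, Any]], column: str) -> List[str]:
    """Return data-file paths sorted by the max statistic of `column`.

    Assumes (hypothetically) that each record resembles one row of a flattened
    add-actions table, e.g. {"path": "...", "max.<column>": value}.
    """
    key = f"max.{column}"
    # Files whose max statistic is missing (stats not collected) are kept
    # at the end, in their original relative order.
    with_stats = [a for a in add_actions if a.get(key) is not None]
    without_stats = [a for a in add_actions if a.get(key) is None]
    # list.sort is stable, so ties keep their original order.
    with_stats.sort(key=lambda a: a[key])
    return [a["path"] for a in with_stats + without_stats]


actions = [
    {"path": "part-2.parquet", "max.value": 30},
    {"path": "part-0.parquet", "max.value": 10},
    {"path": "part-1.parquet", "max.value": None},
    {"path": "part-3.parquet", "max.value": 20},
]
print(order_files_by_max(actions, "value"))
# ['part-0.parquet', 'part-3.parquet', 'part-2.parquet', 'part-1.parquet']
```

Putting files without statistics last, rather than raising, keeps the function usable on tables where stats collection was skipped for some files.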
I'm going to reopen this for now. I think that we could probably do better by looking at the data coming out of the
I've stored a bunch of data partitioned by date, and written it to delta using the deltalake package like so:
(Although this was actually done in parallel, so things may have been written out of order.)
When I read it back, I find that the data isn't sorted by partition.
I think we should order things when we can. I propose the following:
"stats" attribute of the deltalake metadata

Both the effort and the uncertainty probably increase as we go down that list. The first item seems pretty straightforward to me, though.
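Ordering files by partition value, as the issue title suggests, can be sketched in plain Python. This assumes Hive-style `column=value` directories in paths relative to the table root (the kind of list `DeltaTable.files()` should return), with URL-encoded values. Values are compared as strings, which matches chronological order for ISO-8601 dates like `2023-01-02`; other partition types would need casting. Both helper names are hypothetical, not part of deltalake or this project.

```python
from typing import Dict, List
from urllib.parse import unquote


def partition_values_from_path(path: str) -> Dict[str, str]:
    """Parse Hive-style 'key=value' directory segments from a relative file path."""
    values: Dict[str, str] = {}
    for segment in path.split("/")[:-1]:  # the last segment is the file name
        if "=" in segment:
            key, _, value = segment.partition("=")
            values[unquote(key)] = unquote(value)
    return values


def order_files_by_partition(paths: List[str], column: str) -> List[str]:
    """Sort data-file paths by the string value of one partition column.

    Files that lack the column sort last; sorted() is stable, so files within
    the same partition keep their original relative order.
    """
    def sort_key(path: str):
        value = partition_values_from_path(path).get(column)
        return (value is None, value or "")

    return sorted(paths, key=sort_key)


files = [
    "date=2023-01-03/part-0.parquet",
    "date=2023-01-01/part-0.parquet",
    "date=2023-01-02/part-1.parquet",
    "date=2023-01-01/part-1.parquet",
]
print(order_files_by_partition(files, "date"))
# ['date=2023-01-01/part-0.parquet', 'date=2023-01-01/part-1.parquet',
#  'date=2023-01-02/part-1.parquet', 'date=2023-01-03/part-0.parquet']
```

Reading partition values from the add actions' partition metadata instead of parsing paths would avoid depending on the directory layout; path parsing is used here only to keep the sketch self-contained.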