-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Kernel] Add partition pruning related utility methods #2098
Conversation
fbe875f
to
5b46cd0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks awesome. left some comments.
Set<String> partitionColNames) { | ||
for (Expression child : children) { | ||
if (child instanceof Column) { | ||
String[] names = ((Column) child).getNames(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm. i'm just noticing this getNames
API. IMO getNames
implies that the Column has multiple names, no? that seems odd. nestedName
would be better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spark has filedNames
. nameParts
is another option. nestedName
could imply it is just for nested columns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nameParts
is great
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't need to be done in this PR. We can create an issue and assign it to the 3.1 milestone.
kernel/kernel-api/src/main/java/io/delta/kernel/internal/util/PartitionUtils.java
Show resolved
Hide resolved
kernel/kernel-api/src/main/java/io/delta/kernel/internal/util/PartitionUtils.java
Outdated
Show resolved
Hide resolved
kernel/kernel-api/src/main/java/io/delta/kernel/internal/util/PartitionUtils.java
Show resolved
Hide resolved
kernel/kernel-api/src/test/scala/io/delta/kernel/internal/util/PartitionUtilsSuite.scala
Outdated
Show resolved
Hide resolved
val partitionTestCases = Map[Predicate, (String, String)]( | ||
// single predicate on a data column | ||
predicate("=", col("data1"), ofInt(12)) -> | ||
("ALWAYS_TRUE()", "(column(`data1`) = 12)"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: probably too late, but it should ideally just be
`column(data1)`
instead of
column(`data1`)
right?
i.e. we should only use back ticks for columns with .
s in them?
column(a.b.`col.with.dot`.e)
Isn't this what spark does?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is just a toString
method. If I recall correctly Delta-Spark does the same (add `` for every column). I can make it just adding backtick for names containing .
, not sure if it is worth it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool. I just want to do what Spark does.
kernel/kernel-api/src/test/scala/io/delta/kernel/internal/util/PartitionUtilsSuite.scala
Outdated
Show resolved
Hide resolved
5b46cd0
to
6440261
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, with 1 comment about the ALWAYS_FALSE case. Will let you decide to accept / ignore it.
Which Delta project/connector is this regarding?
Description
Part of #2071 (Partition Pruning in Kernel). This PR adds the following utility methods:
Predicate
given to theScanBuilder.withFilter
into data column and partition column predicatesPredicate
to refer to the columns in the scan file columnar batch with the appropriate partition value deserialization expressions applied.How was this patch tested?
Added UTs