
[Kernel] Add partition pruning related utility methods #2098

Merged
merged 5 commits into from
Sep 28, 2023

Conversation

vkorukanti
Collaborator

@vkorukanti vkorukanti commented Sep 24, 2023

Which Delta project/connector is this regarding?

  • [ ] Spark
  • [ ] Standalone
  • [ ] Flink
  • [x] Kernel
  • [ ] Other (fill in here)

Description

Part of #2071 (Partition Pruning in Kernel). This PR adds the following utility methods:

  • Splitting the Predicate given to ScanBuilder.withFilter into a data-column predicate and a partition-column predicate.
  • Rewriting the partition-column Predicate to refer to the columns in the scan file columnar batch, with the appropriate partition-value deserialization expressions applied.
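The split described in the first bullet can be sketched roughly as follows. This is an illustrative model only: `Leaf` and `split` are hypothetical names, not the actual Kernel API, and it assumes the filter has already been flattened into a list of AND-ed conjuncts.

```java
import java.util.*;

// Illustrative sketch only: splits a list of AND-ed leaf predicates into
// a partition-column part and a data-column part. `Leaf` and `split` are
// hypothetical names, not the actual Kernel API.
class PartitionPredicateSplit {
    // A tiny predicate model: a comparison over a single column name.
    record Leaf(String column, String op, Object literal) {}

    // Returns a map: true -> conjuncts touching only partition columns,
    // false -> conjuncts touching data columns.
    static Map<Boolean, List<Leaf>> split(List<Leaf> conjuncts,
                                          Set<String> partitionCols) {
        Map<Boolean, List<Leaf>> out = new HashMap<>();
        out.put(true, new ArrayList<>());
        out.put(false, new ArrayList<>());
        for (Leaf leaf : conjuncts) {
            out.get(partitionCols.contains(leaf.column())).add(leaf);
        }
        return out;
    }
}
```

With a split like this, the partition part can be evaluated against each scan file's partition values to prune files, while the data part is pushed down to the data readers.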

How was this patch tested?

Added unit tests.

Collaborator

@scottsand-db scottsand-db left a comment


Looks awesome. Left some comments.

Set<String> partitionColNames) {
    for (Expression child : children) {
        if (child instanceof Column) {
            String[] names = ((Column) child).getNames();
Collaborator


Hm, I'm just noticing this getNames API. IMO getNames implies that the Column has multiple names, no? That seems odd. Would nestedName be better?

Collaborator Author


Spark has fieldNames. nameParts is another option; nestedName could imply it is just for nested columns.

Collaborator


nameParts is great

Collaborator


Doesn't need to be done in this PR. We can create an issue and assign it to the 3.1 milestone.

val partitionTestCases = Map[Predicate, (String, String)](
  // single predicate on a data column
  predicate("=", col("data1"), ofInt(12)) ->
    ("ALWAYS_TRUE()", "(column(`data1`) = 12)"),
Collaborator


nit: probably too late, but it should ideally just be

`column(data1)`

instead of

column(`data1`)

right? i.e. we should only use backticks for columns with `.`s in them?

column(a.b.`col.with.dot`.e)

Isn't this what Spark does?

Collaborator Author


It is just a toString method. If I recall correctly, Delta-Spark does the same (adds backticks around every column name). I can change it to add backticks only for names containing `.`, but I'm not sure it is worth it.

Collaborator


Cool. I just want to do what Spark does.
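The quoting rule discussed in this thread could look roughly like this. `render` and `quote` are hypothetical helper names for illustration, not the actual Kernel code:

```java
// Illustrative sketch: backtick-quote a name part only when it
// contains a dot, so that only ambiguous parts are quoted.
class ColumnToString {
    // Wrap a single name part in backticks only if it contains a '.'.
    static String quote(String part) {
        return part.contains(".") ? "`" + part + "`" : part;
    }

    // Join the name parts with '.' inside a column(...) wrapper.
    static String render(String[] nameParts) {
        StringBuilder sb = new StringBuilder("column(");
        for (int i = 0; i < nameParts.length; i++) {
            if (i > 0) sb.append('.');
            sb.append(quote(nameParts[i]));
        }
        return sb.append(')').toString();
    }
}
```

Under this rule, a simple column renders as column(data1) and a dotted part as column(a.b.`col.with.dot`.e), matching the example above.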

Collaborator

@scottsand-db scottsand-db left a comment


LGTM, with one comment about the ALWAYS_FALSE case. Will let you decide to accept or ignore it.

@vkorukanti vkorukanti merged commit ca69895 into delta-io:master Sep 28, 2023
7 checks passed
@vkorukanti vkorukanti deleted the partitionUtils branch September 28, 2023 19:44