-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Kernel] Add partition_value
and element_at
expressions
#2096
Conversation
503cde6
to
3d841b8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some comments
* deserialized as according to the Delta Protocol. | ||
*/ | ||
public PartitionValueExpression( | ||
Expression serializedPartitionValue, DataType partitionDataType) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: 8 space indent, no? (are we on the same page about this? perhaps I missed a decision where we don't want this?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code style issue can be addressed at once.
Usually the {
on a new line, simplifies this, but we choose to keep the starting bracket on the same line. Also I don't see an easy way to enforce this Intellij without affecting every multi-line statement.
kernel/kernel-api/src/main/java/io/delta/kernel/expressions/ScalarExpression.java
Show resolved
Hide resolved
.../src/main/java/io/delta/kernel/defaults/internal/expressions/DefaultExpressionEvaluator.java
Show resolved
Hide resolved
} | ||
|
||
/** | ||
* Utility method to evaluate the {@code element_at} on given map and key vectors. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but there's no element_at
as input into this method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There isn't a need for it. Both the map and lookup key values are passed which are sufficient for evaluating the element_at
expression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so why split up the input? instead of just passing in the actual element_at instance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
element_at
expression is just the definition, it doesn't have any input data.
...defaults/src/main/java/io/delta/kernel/defaults/internal/expressions/ElementAtEvaluator.java
Outdated
Show resolved
Hide resolved
static ColumnVector eval(ColumnVector map, ColumnVector lookupKey) { | ||
return new ColumnVector() { | ||
// Store the last lookup value to avoid multiple looks up for same row id. | ||
private int lastLookupRowId = -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the intuition behind assuming this will be a common lookup case?
this is essentially just a cache of size 1. any reason not to have a cache of size, say, 10?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically the pattern is: isNullAt(rowId)
based on the value, make a call to get the value (getValue
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added a comment :)
...defaults/src/main/java/io/delta/kernel/defaults/internal/expressions/ElementAtEvaluator.java
Outdated
Show resolved
Hide resolved
/** | ||
* Utility method to return the left child of the binary input expression | ||
*/ | ||
static Expression getLeft(Expression expression) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there not a BinaryExpression class that could have a getLeft
API?
This seems like a very c / struct type design, instead of a OOP design?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like another public class BinaryExpression
just for getLeft
and getRight
. Not sure if it is worth bloating the interface for utility methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
im worried that these APIs won't be discoverable. with an IDE, i can see all of the APIs available for a given class. with this util organization, I won't be able to do that.
also, what's wrong with another class for just two methods? seems fine with me
not a blocker. we can decide and refactor later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are not APIs, just utility methods in the expression evaluator which is a table client implementation.
Anyways, if this becomes a problem for custom TableClient
implementations, we can add the BinaryExpression
.
* a column vector with decoded values according to the given partition type. | ||
*/ | ||
static ColumnVector eval(ColumnVector input, DataType partitionType) { | ||
return new ColumnVector() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
explicitly document which data types are not yet supported? (or, will likely never be supported?) arrays, maps, structs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is already documented on the PartitionValueExpression
.
...st/scala/io/delta/kernel/defaults/internal/expressions/DefaultExpressionEvaluatorSuite.scala
Outdated
Show resolved
Hide resolved
3d841b8
to
f6d2351
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM but left 1 comment
Which Delta project/connector is this regarding?
Description
Part of #2071 (Partition Pruning in Kernel). We need the following two expressions in order to evaluate predicate on scan file columnar batch data.
element_at(map_column, key_value)
: Take input amap
type column andkey
value, return thevalue
for the givenkey
. This is similar to Apache Spark UDF for similar purposes. This expression will be used to retrieve the specific partition value from the(partition column name -> string serialized partition)
mappartition_value(string_type_value, datatype)
: Decode the partition value given as a string into the given datatype format. The interpretation of the string value is according to the Delta protocol.How was this patch tested?
Added UTs