[Feature Request][Kernel] Investigate whether we need Cast as an expression in Kernel #2043

Open
2 of 8 tasks
vkorukanti opened this issue Sep 12, 2023 · 0 comments
Labels
enhancement New feature or request kernel
Milestone

Comments

@vkorukanti
Collaborator

Feature request

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Overview

Currently, Kernel expressions do not support `Cast` as a first-class expression (see #1997). The default expression handler can apply an implicit cast when a type is upcast (e.g., `int` to `long`).
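
As a rough illustration of what an implicit upcast involves, here is a minimal Java sketch. The `SimpleType` enum, the `ImplicitCasts` helper, and the specific widening rules (integral types widen to wider integral types, `FLOAT` widens to `DOUBLE`) are all assumptions for this example; they are not the default expression handler's actual types or rules.

```java
import java.util.List;

// Hypothetical type ladder for illustration only; these are NOT Kernel's DataType classes.
enum SimpleType { BYTE, SHORT, INTEGER, LONG, FLOAT, DOUBLE }

final class ImplicitCasts {
    // Assumed widening paths: integral types widen to wider integral types,
    // and FLOAT widens to DOUBLE. Integral-to-floating casts are excluded here.
    private static final List<SimpleType> INTEGRAL =
        List.of(SimpleType.BYTE, SimpleType.SHORT, SimpleType.INTEGER, SimpleType.LONG);
    private static final List<SimpleType> FLOATING =
        List.of(SimpleType.FLOAT, SimpleType.DOUBLE);

    private ImplicitCasts() {}

    /** Returns true if a value of type {@code from} can be implicitly upcast to {@code to}. */
    static boolean canUpcast(SimpleType from, SimpleType to) {
        if (from == to) {
            return true;
        }
        for (List<SimpleType> ladder : List.of(INTEGRAL, FLOATING)) {
            if (ladder.contains(from) && ladder.contains(to)) {
                // Only moves "up" the ladder are widening; moves down would lose information.
                return ladder.indexOf(from) < ladder.indexOf(to);
            }
        }
        return false;
    }

    /** Picks the type both operands can be upcast to, or fails if none exists. */
    static SimpleType commonType(SimpleType a, SimpleType b) {
        if (canUpcast(a, b)) {
            return b;
        }
        if (canUpcast(b, a)) {
            return a;
        }
        throw new IllegalArgumentException("no implicit cast between " + a + " and " + b);
    }
}

class ImplicitCastDemo {
    public static void main(String[] args) {
        // An INTEGER column compared with a LONG literal: the INTEGER side is upcast.
        System.out.println(ImplicitCasts.commonType(SimpleType.INTEGER, SimpleType.LONG)); // LONG
    }
}
```

An evaluator following this approach would wrap the narrower operand in an implicit cast to the common type before evaluating, so no explicit cast node is needed in the expression tree.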

This task is to investigate whether we need to expose `Cast` as a Kernel expression, similar to Spark DSv2.
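
For comparison, a first-class cast node might look like the hypothetical sketch below: one child expression plus a target type, with no evaluation logic of its own. The `Expression`, `Column`, and `Cast` classes here are simplified stand-ins, and representing the target type as a plain string is an assumption made to keep the example short.

```java
import java.util.Collections;
import java.util.List;

// Simplified, hypothetical stand-ins for Kernel expression types.
interface Expression {
    List<Expression> children();
}

final class Column implements Expression {
    private final String name;

    Column(String name) { this.name = name; }

    @Override public List<Expression> children() { return Collections.emptyList(); }

    @Override public String toString() { return "column(" + name + ")"; }
}

/** Hypothetical explicit cast node: a single input expression and a target type name. */
final class Cast implements Expression {
    private final Expression input;
    private final String targetType;  // assumption: a real API would use a data type object

    Cast(Expression input, String targetType) {
        this.input = input;
        this.targetType = targetType;
    }

    Expression getInput() { return input; }

    String getTargetType() { return targetType; }

    // The cast has exactly one child: the expression being converted.
    @Override public List<Expression> children() { return Collections.singletonList(input); }

    @Override public String toString() { return "CAST(" + input + " AS " + targetType + ")"; }
}

class CastDemo {
    public static void main(String[] args) {
        Expression expr = new Cast(new Column("id"), "long");
        System.out.println(expr);  // CAST(column(id) AS long)
    }
}
```

Exposing such a node would let connectors state conversions explicitly, at the cost of every expression handler having to support it; without it, casts stay internal to the evaluator.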

Willingness to contribute

The Delta Lake Community encourages new feature contributions. Would you or another member of your organization be willing to contribute an implementation of this feature?

  • Yes. I can contribute this feature independently.
  • Yes. I would be willing to contribute this feature with guidance from the Delta Lake community.
  • No. I cannot contribute this feature at this time.
@vkorukanti vkorukanti added the enhancement New feature or request label Sep 12, 2023
@vkorukanti vkorukanti added this to the 3.1.0 milestone Sep 12, 2023
vkorukanti added a commit that referenced this issue Sep 12, 2023
## Description
* Refactor the Kernel expression interfaces. Currently, the `Expression` base interface contains a few unnecessary methods, such as `dataType` and `eval`. Keep `Expression` minimal, so that Kernel `Expression` classes serve only to represent an expression, much like its `SQL` string form.
```java
import java.util.List;

interface Expression {
  /**
   * @return a {@link List} of the immediate children of this node
   */
  List<Expression> children();
}
```
* Introduce a subtype of `Expression` called `ScalarExpression`, which is the base class for all scalar expressions.
* Introduce a subtype of `ScalarExpression` called `Predicate` as the base class for scalar expressions that evaluate to the `boolean` type. A `Predicate` takes a generic expression name and any number of input expressions; it is up to the evaluator to make sure a given `Predicate` can be evaluated. Currently, `Predicate` allows only a subset of expressions (`=`, `<`, `<=`, `>`, `>=`, `AND`, `OR`, `ALWAYS_TRUE`, `ALWAYS_FALSE`). In the future, this can be extended to support more predicate expressions with minimal code changes.
* Update scan-related APIs to take a `Predicate` instead of an `Expression`.
* Replace uses of `Literal.TRUE` and `Literal.FALSE` with `AlwaysTrue.ALWAYS_TRUE` and `AlwaysFalse.ALWAYS_FALSE`, because a `Literal` is not a predicate.
* Move expression evaluation out of `kernel-api` into `kernel-defaults`.
   * `DefaultExpressionEvaluator` validates the expression and adds the necessary implicit casts to allow evaluation.
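
The hierarchy described above can be sketched as follows. This is a simplified, hypothetical mirror rather than the actual `kernel-api` source: the constructors, the `toString` rendering, and the `PredicateDemo.isSupported` check are assumptions. It shows a `Predicate` built from a generic name plus children, the `ALWAYS_TRUE`/`ALWAYS_FALSE` singletons, and the evaluator (not `Predicate` itself) deciding which names it can evaluate.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical, simplified mirror of the hierarchy; not the actual kernel-api code.
interface Expression {
    /** @return a List of the immediate children of this node */
    List<Expression> children();
}

/** Leaf referencing a top-level column by name. */
final class Column implements Expression {
    private final String name;
    Column(String name) { this.name = name; }
    @Override public List<Expression> children() { return Collections.emptyList(); }
    @Override public String toString() { return "column(" + name + ")"; }
}

/** Leaf holding a constant; a Literal is not a Predicate, even for boolean values. */
final class Literal implements Expression {
    private final Object value;
    Literal(Object value) { this.value = value; }
    @Override public List<Expression> children() { return Collections.emptyList(); }
    @Override public String toString() { return String.valueOf(value); }
}

/** Generic scalar expression: a name plus any number of input expressions. */
class ScalarExpression implements Expression {
    private final String name;
    private final List<Expression> children;

    ScalarExpression(String name, List<Expression> children) {
        this.name = name;
        this.children = Collections.unmodifiableList(children);
    }

    String getName() { return name; }

    @Override public List<Expression> children() { return children; }

    @Override public String toString() {
        if (children.size() == 2) {  // render binary operators infix, e.g. (a = b)
            return "(" + children.get(0) + " " + name + " " + children.get(1) + ")";
        }
        return name + children.stream().map(Object::toString)
            .collect(Collectors.joining(", ", "(", ")"));
    }
}

/** A scalar expression that evaluates to boolean. */
class Predicate extends ScalarExpression {
    Predicate(String name, Expression... children) {
        super(name, Arrays.asList(children));
    }
}

final class AlwaysTrue extends Predicate {
    static final AlwaysTrue ALWAYS_TRUE = new AlwaysTrue();
    private AlwaysTrue() { super("ALWAYS_TRUE"); }
}

final class AlwaysFalse extends Predicate {
    static final AlwaysFalse ALWAYS_FALSE = new AlwaysFalse();
    private AlwaysFalse() { super("ALWAYS_FALSE"); }
}

class PredicateDemo {
    // Names this sketch's evaluator accepts; the evaluator, not Predicate, enforces the list.
    static final Set<String> SUPPORTED =
        Set.of("=", "<", "<=", ">", ">=", "AND", "OR", "ALWAYS_TRUE", "ALWAYS_FALSE");

    /** Recursively checks that every Predicate in the tree uses a supported name. */
    static boolean isSupported(Expression expr) {
        if (expr instanceof Predicate && !SUPPORTED.contains(((Predicate) expr).getName())) {
            return false;
        }
        return expr.children().stream().allMatch(PredicateDemo::isSupported);
    }

    public static void main(String[] args) {
        Predicate filter = new Predicate("AND",
            new Predicate("=", new Column("a"), new Literal(1)),
            AlwaysTrue.ALWAYS_TRUE);
        System.out.println(filter);               // ((column(a) = 1) AND ALWAYS_TRUE())
        System.out.println(isSupported(filter));  // true
    }
}
```

Keeping the name as a plain string means a new predicate such as `IS_NULL` needs no new class, only evaluator support, which matches the stated goal of extending `Predicate` with minimal code changes.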

TODO (to be addressed after this PR lands):
* It is not yet clear whether we need `CAST` as a first-class `Expression` in the `kernel-api` module. If needed in the future, we can add one (#2043).
* Implicit casting in `kernel-defaults` may need to support more type conversions, especially around the Decimal type (#2044).
* Add support for nested column references in the `Column` expression (#2040).
* Implicitly cast the `DefaultExpressionEvaluator` output type to the expected type (#2047).
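
One way to approach nested column references (#2040) is to store the full path of field names instead of a single name. The sketch below assumes a `Column` that takes a varargs path; the constructor shape, `getNames`, and the backtick quoting of names that contain dots are illustrative assumptions, not the API settled on in #2040.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

interface Expression {
    List<Expression> children();
}

/** Hypothetical column reference that stores the full path of field names. */
final class Column implements Expression {
    private final String[] names;

    Column(String... names) {
        if (names.length == 0) {
            throw new IllegalArgumentException("a column reference needs at least one name");
        }
        this.names = names.clone();  // defensive copy so callers cannot mutate the path
    }

    /** Returns a copy of the path, outermost field first. */
    String[] getNames() { return names.clone(); }

    @Override public List<Expression> children() { return Collections.emptyList(); }

    /** Renders the path, quoting parts that contain a dot (backticks inside names are not escaped). */
    @Override public String toString() {
        return Arrays.stream(names)
            .map(n -> n.contains(".") ? "`" + n + "`" : n)
            .collect(Collectors.joining("."));
    }
}

class NestedColumnDemo {
    public static void main(String[] args) {
        System.out.println(new Column("address", "city"));  // address.city
        System.out.println(new Column("a.b", "c"));         // `a.b`.c
    }
}
```

Storing the path as separate parts avoids ambiguity between a nested field `a` → `b` and a top-level column literally named `a.b`.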

## How was this patch tested?
Moved the existing Java-based tests to Scala and added new tests (some copied from the standalone `ExpressionSuite` and updated).
@vkorukanti vkorukanti modified the milestones: 3.1.0, 3.2.0 Jan 16, 2024
@vkorukanti vkorukanti modified the milestones: 3.2.0, 3.3.0 Apr 24, 2024
Projects
None yet
Development

No branches or pull requests

1 participant