You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
We are using the built-in support of Trino to access existing Hive views. Under the hood, Trino uses Coral to perform this conversion. Syntactically, most queries work out perfectly fine. But now we encountered a case, where the behaviour of the same SQL expression differs between Trino and Hive/Impala/Spark.
Problem
Hive/Impala support a wider range of timestamp formats in a CAST(STRING AS TIMESTAMP) expression. Specifically, the following two CAST operations work in Hive/Impala/Spark, but not in Trino:
SELECT CAST('2020-08-27T13:58:02.483Z' AS TIMESTAMP);
SELECT CAST('2020-08-27T13:58:02.483' AS TIMESTAMP);
The problem is the "T" in between the date and the time. The following two queries work as expected:
SELECT CAST('2020-08-27 13:58:02.483Z' AS TIMESTAMP);
SELECT CAST('2020-08-27 13:58:02.483' AS TIMESTAMP);
Unfortunately, we have both datetime formats in our data and therefore cannot simply pick one of them.
Moreover, a failing conversion results in a query failure in Trino, while Hive/Impala/Spark will simply produce NULL values (quite debatable which behaviour is preferred in the context of a Hive to Trino conversion).
Suggested Solution
I was experimenting with different expressions in Trino for implementing a conversion, which supports both value types, and came up with the following expression:
SELECT COALESCE(TRY(CAST('2020-08-27T13:58:02.483Z' AS TIMESTAMP)), CAST(TRY(from_iso8601_timestamp('2020-08-27T13:58:02.483Z')) AS TIMESTAMP WITHOUT TIME ZONE));
Of course, the real SQL wouldn't use fixed strings, but a column from the source table instead. This expression provides a compatible behaviour in the sense that:
It supports two different time formats (with and without a "T")
It doesn't fail the query on failed conversions
Suggested Implementation
I know that this is a very special corner case, and I was wondering where this expression replacement should be implemented. I would not suggest to implement it in the Hive to Coral translation, and I also would not suggest to implement it in the Coral to Trino translation. Instead, I propose to implement that very special rule in the wrapper Hive to Trino translation class by implementing an additional translation pass.
If you agree on this approach, I am optimistic that I can provide a corresponding pull-request. But I first want to discuss the issue with you to avoid going into a wrong direction.
The text was updated successfully, but these errors were encountered:
Background
We are using the built-in support of Trino to access existing Hive views. Under the hood, Trino uses Coral to perform this conversion. Syntactically, most queries work out perfectly fine. But now we encountered a case, where the behaviour of the same SQL expression differs between Trino and Hive/Impala/Spark.
Problem
Hive/Impala support a wider range of timestamp formats in a CAST(STRING AS TIMESTAMP) expression. Specifically, the following two CAST operations work in Hive/Impala/Spark, but not in Trino:
The problem is the "T" in between the date and the time. The following two queries work as expected:
Unfortunately, we have both datetime formats in our data and therefore cannot simply pick one of them.
Moreover, a failing conversion results in a query failure in Trino, while Hive/Impala/Spark will simply produce NULL values (quite debatable which behaviour is preferred in the context of a Hive to Trino conversion).
Suggested Solution
I was experimenting with different expressions in Trino for implementing a conversion, which supports both value types, and came up with the following expression:
Of course, the real SQL wouldn't use fixed strings, but a column from the source table instead. This expression provides a compatible behaviour in the sense that:
Suggested Implementation
I know that this is a very special corner case, and I was wondering where this expression replacement should be implemented. I would not suggest to implement it in the Hive to Coral translation, and I also would not suggest to implement it in the Coral to Trino translation. Instead, I propose to implement that very special rule in the wrapper Hive to Trino translation class by implementing an additional translation pass.
If you agree on this approach, I am optimistic that I can provide a corresponding pull-request. But I first want to discuss the issue with you to avoid going into a wrong direction.
The text was updated successfully, but these errors were encountered: