[Spark] Support external DSV2 catalog in RESTORE & Vacuum command #2036
Conversation
```scala
        tt.timestamp,
        tt.version,
        tt.creationSource))

case ct @ CloneTableStatement(
```
We can move the resolution of `CloneTableStatement` to `ResolveDeltaIdentifier` as well.
```diff
@@ -498,8 +498,8 @@ class DeltaVacuumSuite
     val e = intercept[AnalysisException] {
       vacuumSQLTest(tablePath, viewName)
     }
-    assert(e.getMessage.contains("not found") ||
-      e.getMessage.contains("TABLE_OR_VIEW_NOT_FOUND"))
+    assert(e.getMessage.contains("VACUUM is only supported for Delta tables.") ||
```
The error message is clearer now when VACUUM is run on a view.
spark/src/main/scala/org/apache/spark/sql/delta/ResolveDeltaIdentifier.scala
```scala
val optionalDeltaTableV2 = resolveAsTable(unresolvedId)

// If the identifier is not a Delta table, try to resolve it as a Delta file table.
val deltaTableV2 = optionalDeltaTableV2.getOrElse {
```
Under what circumstances would `resolveAsTable` return `None` for a valid (path-based?) Delta table? Or, put another way, under what circumstances would `ResolveRelations` fail to handle an `UnresolvedTable` that references a path-based Delta table?

If such circumstances exist, can/should we fix table resolution directly, instead of compensating here?
Yes, it is mentioned in the comment of `resolveAsTable`. The short answer is that the table identifier doesn't exist in the Spark catalogs.

```scala
// Resolve the identifier as a table in the Spark catalogs.
// Return None if the table doesn't exist.
// If the table exists but is not a Delta table, throw an exception.
// If the identifier is a view, throw an exception.
```
> If such circumstances exist, can/should we fix table resolution directly, instead of compensating here?

What do you mean by "fix table resolution directly"?
I believe there is a corner case here. In Spark, you can create a table named `/tmp/foo` under a database named `delta`:

```sql
use delta;
create table `/tmp/foo` using delta;
```

So we always have to look up the identifier as a catalog table first, and try loading it as a path only if the table doesn't exist.
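The lookup order described above can be sketched as follows. This is only an illustration: `resolveAsTable` is the PR's method, but the hypothetical `resolveDeltaTable` wrapper and the simplified path-based fallback are not from the PR.

```scala
import org.apache.hadoop.fs.Path

// Sketch only: try the Spark catalogs first; fall back to a path-based
// Delta table only when no catalog table with that identifier exists.
// This ordering matters because `delta`.`/tmp/foo` is a legal catalog table
// that must not be shadowed by the file path /tmp/foo.
def resolveDeltaTable(unresolvedId: UnresolvedDeltaIdentifier): DeltaTableV2 = {
  resolveAsTable(unresolvedId).getOrElse {
    // Hypothetical fallback: interpret the last name part as a file path.
    DeltaTableV2(spark, new Path(unresolvedId.nameParts.last))
  }
}
```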
```scala
// If the identifier is a view, throw an exception.
private def resolveAsTable(unresolvedId: UnresolvedDeltaIdentifier): Option[DeltaTableV2] = {
  val unresolvedTable =
    UnresolvedTable(unresolvedId.nameParts, unresolvedId.commandName, None)
```
Wasn't this an `UnresolvedRelation` before? Are we changing it to `UnresolvedTable` intentionally? If so, why is it needed, and what are the side effects of that change?
In this PR, I introduce a new logical plan, `UnresolvedDeltaIdentifier`. I didn't make it an `UnresolvedRelation` because the resolution order matters for the corner case I mentioned above in #2036 (comment). The code here converts an `UnresolvedDeltaIdentifier` to an `UnresolvedTable` and uses the Spark analyzer rule to resolve the relation.
```diff
@@ -32,28 +32,14 @@ import org.apache.spark.sql.types.StringType
  * }}}
  */
 case class VacuumTableCommand(
-    path: Option[String],
-    table: Option[TableIdentifier],
+    pathToVacuum: Path,
```
Can't this just be a RunnableCommand with `UnaryLike`, whose child starts out as an `UnresolvedDeltaPathOrIdentifier` that the SQL parser creates? (See e.g. how the OPTIMIZE command was recently upgraded to handle this situation -- the SQL parser just has to call that helper method.)

If we go that route, we won't need the `VacuumTableStatement` any more. If it doesn't work for some reason, then I suspect `OptimizeCommand` will also need whatever fix we come up with in this PR?
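A sketch of the shape the reviewer is suggesting. The constructor fields other than `child` are hypothetical here, and the body is elided; `UnresolvedDeltaPathOrIdentifier` refers to the helper mentioned above from the OPTIMIZE precedent.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.trees.UnaryLike
import org.apache.spark.sql.execution.command.RunnableCommand

// Sketch only: the parser would create `child` via the
// UnresolvedDeltaPathOrIdentifier helper, and the analyzer would resolve
// it to a Delta table before run() is invoked.
case class VacuumTableCommand(
    override val child: LogicalPlan,
    horizonHours: Option[Double], // hypothetical retention parameter
    dryRun: Boolean)
  extends RunnableCommand with UnaryLike[LogicalPlan] {

  override protected def withNewChildInternal(newChild: LogicalPlan): VacuumTableCommand =
    copy(child = newChild)

  override def run(sparkSession: SparkSession): Seq[Row] = {
    // By this point `child` is resolved, so the command no longer needs
    // separate path/table fields. Actual vacuum logic elided.
    Seq.empty
  }
}
```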
Looking at the implementation of `VacuumTableCommand`, it only cares about the `pathToVacuum`. I didn't use `UnresolvedDeltaPathOrIdentifier` since I introduced `UnresolvedDeltaIdentifier` in this PR.
…entifier.scala Co-authored-by: Ryan Johnson <[email protected]>
@ryan-johnson-databricks Thanks for the review. We need to decide whether we need to take care of the corner case I mentioned in the PR comment.
Which Delta project/connector is this regarding?
Description
Support external DSV2 catalogs in the RESTORE and VACUUM commands. After the changes, the RESTORE command supports tables from an external DSV2 catalog.
For example, with an external DSV2 catalog configured, we can run RESTORE and VACUUM against its tables.
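A sketch of this usage; the catalog, database, and table names below are hypothetical, and the exact example code from the PR description was not captured on this page.

```sql
-- Assuming an external DSV2 catalog registered as `my_catalog`
-- (e.g. via spark.sql.catalog.my_catalog=...):
RESTORE TABLE my_catalog.my_db.my_table TO VERSION AS OF 0;
VACUUM my_catalog.my_db.my_table RETAIN 168 HOURS;
```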
How was this patch tested?
Does this PR introduce any user-facing changes?
Yes, users can use the RESTORE and VACUUM commands on tables in their external DSV2 catalogs.