
[Spark] Support external DSV2 catalog in RESTORE command #2033

Closed

Conversation

@gengliangwang (Contributor) commented Sep 8, 2023

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Description

How was this patch tested?

  1. A new end-to-end test
  2. A parser test case (see the sketch below)
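
Neither test appears in this excerpt; purely as an illustration of the parser test's likely shape (the suite, the parser entry point, and the assertion are all assumptions, not code from this PR):

```scala
// Hypothetical sketch of a parser test; nothing here is taken from the PR.
test("RESTORE accepts a catalog-qualified, multi-part identifier") {
  val plan = spark.sessionState.sqlParser.parsePlan(
    "RESTORE TABLE my_catalog.default.tbl TO VERSION AS OF 1")
  // A real test would match on the expected RESTORE logical plan.
  assert(plan != null)
}
```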

Does this PR introduce any user-facing changes?

Yes, users can run the RESTORE command on tables in their external DSV2 catalogs.
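
For illustration, a session after this change might look like the sketch below; the catalog name `my_catalog`, the plugin class, and the table name are assumptions, not part of this PR — only the RESTORE syntax is standard Delta SQL:

```scala
// Hypothetical usage sketch: RESTORE against a table in an external DSV2 catalog.
// The catalog name and plugin class are made up for illustration.
spark.conf.set("spark.sql.catalog.my_catalog", "com.example.MyDeltaCatalog")

// With this change, RESTORE resolves fully qualified DSV2 identifiers
// instead of only tables in the session catalog.
spark.sql("RESTORE TABLE my_catalog.default.tbl TO VERSION AS OF 0")
```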

@gengliangwang gengliangwang changed the title [SPARK] Support external DSV2 catalog in RESTORE command [Spark] Support external DSV2 catalog in RESTORE command Sep 8, 2023
override def createTable(
    ident: Identifier,
    schema: StructType,
    partitions: Array[Transform],
    properties: java.util.Map[String, String]): Table = {
  val tablePath = getTablePath(ident.name())
  // Create an empty Delta table on the tablePath
  spark.range(0).write.format("delta").save(tablePath.toString)
Collaborator

Shouldn't this use the passed-in schema? Otherwise it takes a DDL later to update it with info we already had...
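
A minimal sketch of what the reviewer is suggesting, assuming `schema` is the `StructType` passed into `createTable` and `tablePath` is in scope as above:

```scala
// Sketch: create the empty Delta table with the passed-in schema rather than
// the fixed "id: long" schema that spark.range(0) produces.
import org.apache.spark.sql.Row

spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schema)
  .write.format("delta").save(tablePath.toString)
```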

Collaborator

Alternatively, we can just create the DTV2 and call it good -- Delta knows how to handle the new/empty/missing directory case, tho it won't let you read such tables. Which comes back to the first comment -- if the table needs to be readable after this, it needs the correct schema, no?

Contributor Author

> Shouldn't this use the passed-in schema?

Since it is a dummy catalog, I intentionally fixed the schema as `id: long` (the schema that spark.range(0) produces).

> Alternatively, we can just create the DTV2

Could you show me some details of how to use it?

Collaborator

DeltaTableV2(spark, tablePath.toString) should suffice? See DeltaTableV2.scala
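
Concretely, the alternative might look like the sketch below; it assumes `tablePath` can be converted to a Hadoop `Path` and relies on the `DeltaTableV2` case class defaults as found in DeltaTableV2.scala:

```scala
import org.apache.hadoop.fs.Path
import org.apache.spark.sql.delta.catalog.DeltaTableV2

// Sketch: skip the empty write and return a Delta table handle directly.
// Delta tolerates a new/empty directory here, although the table is not
// readable until a schema has been committed.
DeltaTableV2(spark, new Path(tablePath.toString))
```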

@gengliangwang (Contributor Author)

FYI, I am closing this and continuing my work in #2036.
