[Kernel] Add test utilities like checkAnswer and checkTable #2034

Merged
merged 4 commits into from
Sep 15, 2023

allisonport-db
Collaborator

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Description

Improves the testing infrastructure for Scala tests in Delta Kernel.

For now this adds the utilities to kernel-defaults, but if we end up with tests that use ColumnarBatches in kernel-api we can move them there.

How was this patch tested?

Refactors existing tests to use the new infra.

@allisonport-db changed the title from "[Kernel]" to "[Kernel] Add test utilities like checkAnswer and checkTable" on Sep 8, 2023
Comment on lines +44 to +45
// TODO: we could make this extend Row and create a way to generate Seq(Any) from Rows but it
// would complicate a lot of the code for not much benefit
Collaborator

What's the benefit of generating Seq[Any]?
What's the harm of not being able to do so now?

Collaborator Author

What's the benefit of generating Seq[Any]?

To compare Rows we need a way to turn them into iterables. So either

  1. Convert all read Rows --> TestRows (which can be converted to iterable) and then compare with TestRows created in tests
  2. Make TestRow extend Row. Convert both read Rows and TestRows to iterable and then compare
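
A minimal sketch of option 1, for illustration only. `SimpleRow` is a hypothetical stand-in for Kernel's `Row` (the real interface has typed getters), and `rowOf`, `fromRow`, and `sameRows` are assumed helper names, not the code in this PR. Read rows are converted to `TestRow`, which is backed by a plain sequence, and both sides are sorted by their string form before comparing.

```scala
object RowComparisonSketch {
  // Hypothetical stand-in for a Kernel row: values are read by ordinal.
  trait SimpleRow {
    def numFields: Int
    def get(ordinal: Int): Any
  }

  // Test-side row backed by a plain sequence, so equality is structural.
  final class TestRow private (val values: Seq[Any]) {
    override def equals(other: Any): Boolean = other match {
      case that: TestRow => values == that.values
      case _ => false
    }
    override def hashCode: Int = values.hashCode
    override def toString: String = values.mkString("TestRow(", ", ", ")")
  }

  object TestRow {
    // Build an expected row inline in a test, e.g. TestRow(0, "a").
    def apply(values: Any*): TestRow = new TestRow(values.toVector)

    // Convert a read row into a TestRow by pulling out every field.
    def fromRow(row: SimpleRow): TestRow =
      new TestRow((0 until row.numFields).map(row.get).toVector)
  }

  // Helper to fabricate a "read" row for this sketch (assumption).
  def rowOf(vs: Any*): SimpleRow = new SimpleRow {
    override def numFields: Int = vs.length
    override def get(ordinal: Int): Any = vs(ordinal)
  }

  // Order-insensitive comparison: sort both sides by string form first.
  def sameRows(actual: Seq[SimpleRow], expected: Seq[TestRow]): Boolean =
    actual.map(TestRow.fromRow).sortBy(_.toString) == expected.sortBy(_.toString)
}
```

With this shape, tests only ever construct `TestRow` values, and nothing needs to implement the full `Row` interface.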

What's the harm of not being able to do so now?

Differences if we did this

  • Create rows in tests with Row(0, 1, ...) instead of TestRow(0, 1, ...) (minor difference)
  • TestRow needs to implement all methods in Row (annoying)
  • Instead of converting Row --> TestRow and then comparing, we'd have some other method to go from Row --> iterable (pretty equivalent)
    • Note we'd still use TestRow throughout since we need to create rows in prepareRow etc

Collaborator Author

Slightly lost my train of thought on this PR since it's been a while; going to double check this tomorrow


val scanState = scan.getScanState(tableClient);
val fileIter = scan.getScanFiles(tableClient)
// TODO serialize scan state and scan rows
Collaborator

is this working now? what benefit does implementing this TODO give?

Collaborator

do we need to serialize the scan data?

Collaborator Author

This was copied over from before, I think to make this a more realistic end-to-end test. It has no practical purpose, so I can remove it.

Collaborator

@scottsand-db left a comment

Left a few comments but looks great!


fileIter.forEach { fileColumnarBatch =>
// TODO deserialize scan state and scan rows
val dataBatches = Scan.readData(
Collaborator

close dataBatches
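
A minimal sketch of the pattern being asked for, under stated assumptions: `BatchIterator` is a hypothetical stand-in for the closeable iterator that `Scan.readData` returns, and `drain` is an illustrative helper name. The point is only that the iterator is closed in a `finally` block, so resources are released even if reading throws.

```scala
import java.io.Closeable

object CloseSketch {
  // Hypothetical stand-in for an iterator of data batches that holds resources.
  final class BatchIterator(batches: Seq[String]) extends Iterator[String] with Closeable {
    private val underlying = batches.iterator
    var closed: Boolean = false
    override def hasNext: Boolean = underlying.hasNext
    override def next(): String = underlying.next()
    override def close(): Unit = { closed = true }
  }

  // Consume every batch, then close the iterator whether or not reading failed.
  def drain(iter: BatchIterator): Seq[String] = {
    try iter.toVector
    finally iter.close()
  }
}
```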

// For binary arrays, we convert it to Seq to avoid of calling java.util.Arrays.equals for
// equality test.
val converted = answer.map(prepareRow)
converted.sortBy(_.toString())
Collaborator

Is there a way to differentiate between the long and int values that failed the comparison (expected a long value, but got an int from Kernel)? In the printed error message they look the same, so you can't tell what is wrong.

Collaborator Author

Yeah, that's what the TODO here is for. I was trying to time-box this, since working out the best way to do it got more complex.
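
One possible direction for that TODO, sketched under assumptions rather than taken from the PR: render each value together with its runtime class when building the failure message, so an `Integer` and a `Long` that print identically can be told apart. `describe` and `describeRow` are hypothetical helper names.

```scala
object TypedDiffSketch {
  // Render a value with its runtime class so 1 (Integer) and 1 (Long) differ in output.
  def describe(value: Any): String = value match {
    case null => "null"
    case v => s"$v: ${v.getClass.getSimpleName}"
  }

  // Render a whole row for use in an error message.
  def describeRow(values: Seq[Any]): String =
    values.map(describe).mkString("[", ", ", "]")
}

// Example: TypedDiffSketch.describeRow(Seq[Any](1, 1L)) returns "[1: Integer, 1: Long]"
```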

@allisonport-db allisonport-db merged commit 19b6c9e into delta-io:master Sep 15, 2023
6 checks passed