-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Kernel] Add test utilities like checkAnswer and checkTable #2034
[Kernel] Add test utilities like checkAnswer and checkTable #2034
Conversation
// TODO: we could make this extend Row and create a way to generate Seq(Any) from Rows but it | ||
// would complicate a lot of the code for not much benefit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the benefit of generating Seq[Any]
?
What's the harm of not being able to do so now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the benefit of generating Seq[Any]?
To compare Rows we need a way to turn them into iterables. So either
- Convert all read Rows --> TestRows (which can be converted to iterable) and then compare with TestRows created in tests
- Make TestRow extend Row. Convert both read Rows and TestRows to iterable and then compare
What's the harm of not being able to do so now?
Differences if we did this
- Create rows in tests with
Row(0, 1, ...)
instead ofTestRow(0, 1, ...)
(minor difference) TestRow
needs to implement all methods inRow
(annoying)- Instead of converting
Row
-->TestRow
and then comparing, we'd have some other method to go fromRow
--> iterable (pretty equivalent)- Note we'd still use
TestRow
throughout since we need to create rows inprepareRow
etc
- Note we'd still use
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Slightly lost my train of thought on this PR since it's been a while; going to double check this tomorrow
|
||
val scanState = scan.getScanState(tableClient); | ||
val fileIter = scan.getScanFiles(tableClient) | ||
// TODO serialize scan state and scan rows |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this working now? what benefit does implementing this TODO give?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to serialize the scan data?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was copied over from before, I think to make this a more realistic end-to-end test. No practical purpose I can remove it.
kernel/kernel-defaults/src/test/scala/io/delta/kernel/defaults/utils/TestUtils.scala
Show resolved
Hide resolved
kernel/kernel-defaults/src/test/scala/io/delta/kernel/defaults/utils/TestUtils.scala
Outdated
Show resolved
Hide resolved
kernel/kernel-defaults/src/test/scala/io/delta/kernel/defaults/utils/TestUtils.scala
Show resolved
Hide resolved
kernel/kernel-defaults/src/test/scala/io/delta/kernel/defaults/utils/TestUtils.scala
Show resolved
Hide resolved
kernel/kernel-defaults/src/test/scala/io/delta/kernel/defaults/utils/TestUtils.scala
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few comments but looks great!
kernel/kernel-defaults/src/test/scala/io/delta/kernel/defaults/DeletionVectorSuite.scala
Outdated
Show resolved
Hide resolved
kernel/kernel-defaults/src/test/scala/io/delta/kernel/defaults/utils/TestRow.scala
Show resolved
Hide resolved
kernel/kernel-defaults/src/test/scala/io/delta/kernel/defaults/utils/TestRow.scala
Show resolved
Hide resolved
kernel/kernel-defaults/src/test/scala/io/delta/kernel/defaults/utils/TestUtils.scala
Outdated
Show resolved
Hide resolved
|
||
val scanState = scan.getScanState(tableClient); | ||
val fileIter = scan.getScanFiles(tableClient) | ||
// TODO serialize scan state and scan rows |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to serialize the scan data?
|
||
fileIter.forEach { fileColumnarBatch => | ||
// TODO deserialize scan state and scan rows | ||
val dataBatches = Scan.readData( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
close dataBatches
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's closed as part of the .forEach
here https://github.com/delta-io/delta/pull/2034/files/7905c4d8e2d3c66d1a78e37c05cb6b02211cf9aa#diff-4b532e689ace6208898c69a5aa728bb7e76f97d08a39cc1632946c59f73f56a3R39
kernel/kernel-defaults/src/test/scala/io/delta/kernel/defaults/utils/TestUtils.scala
Show resolved
Hide resolved
kernel/kernel-defaults/src/test/scala/io/delta/kernel/defaults/utils/TestUtils.scala
Show resolved
Hide resolved
// For binary arrays, we convert it to Seq to avoid of calling java.util.Arrays.equals for | ||
// equality test. | ||
val converted = answer.map(prepareRow) | ||
converted.sortBy(_.toString()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is a way to differentiate between the long
and the int
values which failed the comparison (expected a long value, but got int from kernel). In the printed error message it will look the same and can't tell what is wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that's what the TODO here is for; was trying to time-box this as it got more complex on what the best way to do that was
Which Delta project/connector is this regarding?
Description
Improves the testing infrastructure for Scala tests in Delta Kernel.
For now adds it to
kernel-defaults
but if we have tests withColumnarBatch
s inkernel-api
we can move it there.How was this patch tested?
Refactors existing tests to use the new infra.