Add benchmarks for magnolify-parquet vs parquet-avro R/W #1040
 * An in-memory Parquet page store modeled after parquet-java's MemPageStore, used to benchmark
 * ParquetType conversion between Parquet Groups and Scala case classes
 */
class ParquetInMemoryPageStore(rowCount: Long) extends PageReadStore with PageWriteStore {
These classes are heavily based on this parquet-java package, which sadly is not part of any published artifact: https://github.com/apache/parquet-java/tree/master/parquet-column/src/test/java/org/apache/parquet/column/page/mem
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #1040   +/-   ##
=======================================
  Coverage   95.50%   95.50%
=======================================
  Files          56       56
  Lines        1980     1980
  Branches      186      186
=======================================
  Hits         1891     1891
  Misses         89       89
Adds benchmarks for Parquet read/write performance, for both magnolify-parquet and parquet-avro (although we don't own parquet-avro, it's helpful to compare against IMO).
Parquet is a little tricky in that it doesn't have a granular "write/read a single record to/from a file" operation, due to its complex file structure and encodings. This benchmark sets up an in-memory page store that can read or write Parquet "groups", which are Parquet's internal record structure. Read/write is invoked with a record type T and a matching RecordConverter[T], which converts either case classes (magnolify-parquet) or Avro records (parquet-avro) to and from Parquet groups. Thus, what we're benchmarking here is Group-to-record and record-to-Group conversion, which is the core functionality of magnolify-parquet 👍
Results (run locally on a 64GB M1 Mac with OpenJDK 17.0.5):