read footer using 1 call readFully(byte[8]) instead of 5 calls ( 4 x read() for footer length + 1 x read(byte[4]) for magic marker ) #3074

Open
Arnaud-Nauwynck opened this issue Nov 23, 2024 · 0 comments

Describe the enhancement requested

This is a minor performance improvement, but it is worthwhile when reading many files.
Read the footer tail with 1 call to readFully(byte[8]) instead of 5 calls (4 x read() for the footer length + 1 x read(byte[4]) for the magic marker).

In summary, the patch is for the file ParquetFileReader.java, method "readFooter()":

--- a/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java
+++ b/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java
@@ -585,14 +585,18 @@ public class ParquetFileReader implements Closeable {
     }

     // Read footer length and magic string - with a single seek
-    byte[] magic = new byte[MAGIC.length];
-    long fileMetadataLengthIndex = fileLen - magic.length - FOOTER_LENGTH_SIZE;
+    long fileMetadataLengthIndex = fileLen - MAGIC.length - FOOTER_LENGTH_SIZE;
     LOG.debug("reading footer index at {}", fileMetadataLengthIndex);
     f.seek(fileMetadataLengthIndex);
-    int fileMetadataLength = readIntLittleEndian(f);
-    f.readFully(magic);
+    byte[] magicAndLengthBytes = new byte[FOOTER_LENGTH_SIZE + MAGIC.length];
+    f.readFully(magicAndLengthBytes);
+    int fileMetadataLength = readIntLittleEndian(magicAndLengthBytes, 0);

     boolean encryptedFooterMode;
+    // using JDK >= 9: if (Arrays.equals(MAGIC, 0, MAGIC.length, magicAndLengthBytes, FOOTER_LENGTH_SIZE, FOOTER_LENGTH_SIZE + MAGIC.length)) {
+    // using JDK <= 8: need extract sub array then compare
+    byte[] magic = new byte[MAGIC.length];
+    System.arraycopy(magicAndLengthBytes, FOOTER_LENGTH_SIZE, magic, 0, MAGIC.length);
     if (Arrays.equals(MAGIC, magic)) {
       encryptedFooterMode = false;
     } else if (Arrays.equals(EFMAGIC, magic)) {
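
For illustration only, here is a minimal standalone sketch of the same idea outside of ParquetFileReader. It assumes the usual Parquet tail layout (a 4-byte little-endian footer length followed by a 4-byte magic, "PAR1" or "PARE"). The class name `FooterTailSketch` and the method `readFooterLength` are hypothetical, and an in-memory byte array stands in for the real seekable input stream. It is not the actual patch.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch class; not part of parquet-hadoop.
final class FooterTailSketch {
  static final byte[] MAGIC = "PAR1".getBytes(StandardCharsets.US_ASCII);
  static final byte[] EFMAGIC = "PARE".getBytes(StandardCharsets.US_ASCII);
  static final int FOOTER_LENGTH_SIZE = 4;

  /** Reads the 8-byte tail (length + magic) with one readFully call and returns the footer length. */
  static int readFooterLength(byte[] file) {
    int tailSize = FOOTER_LENGTH_SIZE + MAGIC.length;
    if (file.length < tailSize) {
      throw new IllegalArgumentException("file too short to contain a footer tail");
    }
    int tailOffset = file.length - tailSize;
    byte[] tail = new byte[tailSize];
    // Assumption: the byte array stands in for a seekable stream positioned at tailOffset.
    try (DataInputStream in =
        new DataInputStream(new ByteArrayInputStream(file, tailOffset, tailSize))) {
      in.readFully(tail); // single call instead of 4 x read() + 1 x read(byte[4])
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    // Decode the little-endian footer length from the first 4 bytes of the tail.
    int footerLength =
        ByteBuffer.wrap(tail, 0, FOOTER_LENGTH_SIZE).order(ByteOrder.LITTLE_ENDIAN).getInt();
    // JDK 8 compatible: copy the magic sub-array, then compare.
    byte[] magic = Arrays.copyOfRange(tail, FOOTER_LENGTH_SIZE, tailSize);
    if (!Arrays.equals(MAGIC, magic) && !Arrays.equals(EFMAGIC, magic)) {
      throw new IllegalArgumentException("unexpected magic: " + Arrays.toString(magic));
    }
    return footerLength;
  }
}
```

On JDK 9 and later, the range overload of `Arrays.equals` mentioned in the patch comment would avoid the extra sub-array copy.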

Component(s)

Core
