A sweet library to generate parquet files as per the required schema

Meghajit/Parquet-Generator

Parquet Generator

This is a simple library that generates Parquet files containing the required data, written according to a given schema.

Design

The library consists of a single concrete class named ParquetGenerator. It exposes a constructor and two methods, with the following contract (signatures only, bodies omitted):

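public class ParquetGenerator {
     // Opens a writer for the given output path and schema.
     public ParquetGenerator(String filePath, MessageType schema) throws IOException;
     // Writes the data held by the given SimpleGroup to the file.
     public void writeToFile(SimpleGroup simpleGroup) throws IOException;
     // Closes the underlying stream and ParquetWriter.
     public void closeWriter() throws IOException;
}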
  • The constructor takes 2 arguments:
    • filePath: The path of the output Parquet file to which the data is written. The file is created automatically and need not exist beforehand.
    • schema: This is the schema of the parquet file denoted as a MessageType object.
  • The writeToFile method takes a single SimpleGroup object. This object holds the data to be written to the file and must conform to the schema passed to the constructor.
  • The closeWriter method should be invoked when the write job is done. This will close the open stream to the file and close the ParquetWriter.
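
The contract above can be sketched end to end as follows. This is a minimal illustration, not code taken from the library: the class name ParquetGeneratorExample, the two-field schema, the helper methods and the output path users.parquet are all hypothetical. It assumes that ParquetGenerator and the Apache Parquet classes (parquet-column and parquet-hadoop) are on the classpath, and that ParquetGenerator is in the same package as the example, since the README does not state its package.

```java
import java.io.IOException;

import org.apache.parquet.example.data.simple.SimpleGroup;
import org.apache.parquet.schema.MessageType;
import org.apache.parquet.schema.MessageTypeParser;

// Hypothetical example class; ParquetGenerator is assumed to be in the same
// package (add an import otherwise, its package is not stated in the README).
public class ParquetGeneratorExample {

    // Builds an illustrative two-column schema from Parquet's textual schema format.
    static MessageType buildSchema() {
        return MessageTypeParser.parseMessageType(
                "message user {\n"
              + "  required int32 id;\n"
              + "  required binary name (UTF8);\n"
              + "}");
    }

    // Builds a SimpleGroup holding one value per field of the schema.
    static SimpleGroup buildRow(MessageType schema, int id, String name) {
        SimpleGroup group = new SimpleGroup(schema);
        group.add("id", id);
        group.add("name", name);
        return group;
    }

    public static void main(String[] args) throws IOException {
        MessageType schema = buildSchema();

        // The output file is created by the generator; the path is illustrative.
        ParquetGenerator generator = new ParquetGenerator("users.parquet", schema);
        try {
            generator.writeToFile(buildRow(schema, 1, "alice"));
        } finally {
            // Always release the underlying stream and ParquetWriter.
            generator.closeWriter();
        }
    }
}
```

The try/finally block is used because the contract does not say whether ParquetGenerator implements AutoCloseable; if it does, try-with-resources would be the more idiomatic choice.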

Usage

The library can be built into a jar and then imported as a dependency into the application. Publishing as a Maven package hasn't been implemented yet. Alternatively, the library also has a single test class, ParquetGeneratorTest, which can be used to generate the Parquet file.

The test class has ample examples of how to define different kinds of Parquet schema, how to build a SimpleGroup based on that schema, and how to use the library to build a Parquet file.
