- Read Parquet files via Hadoop FileSystem.
- Outputs a single record named `record` (type is `json`).
- Plugin type: input
- Resume supported: no
- Cleanup supported: no
- Guess supported: no
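
As an illustration of the output shape, here is a minimal sketch of how one Parquet row would surface downstream; the field names `id` and `name` are hypothetical, not part of the plugin:

```yaml
# Hypothetical Parquet row:   id = 1, name = "foo"
# It is emitted as one Embulk column named "record" whose json
# value keeps the original fields:
record: {"id": 1, "name": "foo"}
```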
Configuration:

- **config_files**: list of paths to Hadoop's configuration files (array of strings, default: `[]`)
- **config**: overwrites Hadoop configuration parameters (hash, default: `{}`)
- **path**: file path on HDFS; glob patterns may be used (string, required)
- **parquet_log_level**: log level of the Parquet reader module (string, default: `"INFO"`)
  - The value is one of the `java.util.logging.Level` names: ALL, SEVERE, WARNING, INFO, CONFIG, FINE, FINER, FINEST, OFF.

Parquet-reader-specific settings can be passed through `config`:

- **parquet.read.bad.record.threshold**: tolerated fraction of bad records per file (float, 0.0 to 1.0, default: `0`)
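
For instance, a sketch of supplying Hadoop parameters inline through `config` instead of `config_files`; the namenode address is a placeholder, and `fs.defaultFS` is a standard Hadoop property used here only for illustration:

```yaml
in:
  type: parquet_hadoop
  # Hadoop parameters can be set inline instead of via config_files;
  # the namenode address below is a placeholder.
  config:
    fs.defaultFS: hdfs://namenode.example.com:8020
    parquet.read.bad.record.threshold: 0.05
  path: /user/hadoop/example/data/*.parquet
```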
Example:

```yaml
in:
  type: parquet_hadoop
  config_files:
    - /etc/hadoop/conf/core-site.xml
    - /etc/hadoop/conf/hdfs-site.xml
  config:
    parquet.read.bad.record.threshold: 0.01
  path: /user/hadoop/example/data/*.parquet
  parquet_log_level: WARNING
```
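
Since every row arrives as a single json column, flattening it downstream is typical. A sketch using the third-party embulk-filter-expand_json plugin (its option names come from that plugin, and the `id`/`name` fields are hypothetical):

```yaml
filters:
  - type: expand_json          # third-party embulk-filter-expand_json, assumed installed
    json_column_name: record   # the single json column this input emits
    expanded_columns:          # hypothetical fields inside the json value
      - {name: id, type: long}
      - {name: name, type: string}
```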
Build:

```
$ ./gradlew gem  # -t to watch change of files and rebuild continuously
```
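
A sketch of installing and running the built gem; the `pkg/` output directory and the gem file name are assumptions about this Gradle build, not verified here:

```
$ embulk gem install --local pkg/embulk-input-parquet_hadoop-*.gem  # path and name assumed
$ embulk run config.yml
```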
This is implemented as an input plugin rather than a parser plugin because parsing the Parquet format requires a seekable file stream (the file's metadata footer sits at the end and must be read first), while a parser plugin only gets a sequential read stream.
TBD