Write excel in single file with v2 #549
@quanghgx @nightscape Would you be interested in having this functionality in spark-excel v2? The client code would look like:
For the rename and cleanup, I can extend org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter, override commitJob, and register the output committer in com.crealytics.spark.v2.excel.ExcelWriteBuilder (similar to what org.apache.spark.sql.execution.datasources.v2.parquet.ParquetWrite does with org.apache.parquet.hadoop.ParquetOutputCommitter).
Some other improvements would be nice, but I'm not sure how or where to implement them (any suggestions?):
I'm not sure whether this would be useful to others. Any feedback is welcome. Thanks!
I see that, in v2, writing Excel produces multiple files (one per partition). I know that's consistent with the behavior for JSON, Parquet, etc., but is there any chance you'll provide an option to write to a single file?
I'm aware that you can call .coalesce(1) before the write to get a single file, but the actual file still gets a random name. If I want to save the file under a predefined name, the only way I see is to take some extra steps: locate the generated file, rename and move it, delete the generated folder, etc.
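For reference, the extra steps described above (locate the generated file, move it to the predefined name, delete the generated folder) can be sketched roughly as follows. This is only a minimal illustration, not part of spark-excel: the class name `SingleFileOutput` and the method `promote` are hypothetical, it assumes the output was written to a local filesystem after `.coalesce(1)`, and it assumes the generated file follows Spark's usual `part-*` naming. For HDFS or object stores, the same steps would go through the Hadoop `FileSystem` API instead.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of spark-excel or Spark.
final class SingleFileOutput {
    private SingleFileOutput() {}

    /**
     * Moves the single part file found in outputDir to target,
     * then deletes outputDir and everything left in it (e.g. _SUCCESS, .crc files).
     * Assumption: the writer produced exactly one file named "part-*".
     */
    static Path promote(Path outputDir, Path target) throws IOException {
        Path dir = outputDir.toAbsolutePath().normalize();
        Path dest = target.toAbsolutePath().normalize();
        if (dest.startsWith(dir)) {
            // The directory is deleted at the end, so the target must live outside it.
            throw new IllegalArgumentException("Target must be outside " + dir);
        }

        // Step 1: determine the generated file.
        List<Path> parts = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "part-*")) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    parts.add(p);
                }
            }
        }
        if (parts.size() != 1) {
            throw new IllegalStateException(
                "Expected exactly one part file in " + dir + " but found " + parts.size());
        }

        // Step 2: rename and move it to the predefined name.
        Files.move(parts.get(0), dest, StandardCopyOption.REPLACE_EXISTING);

        // Step 3: delete the generated folder, children before parents.
        try (Stream<Path> walk = Files.walk(dir)) {
            List<Path> toDelete = walk.sorted(Comparator.reverseOrder())
                                      .collect(Collectors.toList());
            for (Path p : toDelete) {
                Files.delete(p);
            }
        }
        return dest;
    }
}
```

A call such as `SingleFileOutput.promote(Paths.get("/tmp/report_out"), Paths.get("/tmp/report.xlsx"))` (both paths are made up for illustration) would leave only `/tmp/report.xlsx` behind. The sketch fails loudly when it finds zero or several part files rather than guessing which one to keep.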
Any thoughts?
Thanks!