Snappy for compression codec using velox as backend #6697

himanshu-zetta · 2024-08-02T12:57:11Z


I need to run Gluten on Spark with snappy as the shuffle compression codec.
The Gluten Velox backend API supports only lz4 and zstd, as shown in the figure below:

So I added snappy to the list of supported codecs by changing it to Set("lz4", "zstd", "snappy").
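For readers unfamiliar with this kind of check, here is a minimal sketch of what a codec allow-list does. It is written in Python for brevity and is not Gluten's code: `SUPPORTED_SHUFFLE_CODECS` and `validate_codec` are hypothetical names. The sketch only shows that such a set decides which codec names pass validation. It says nothing about whether the native shuffle writer implements the codec.

```python
# Hypothetical allow-list check; these names are illustrative, not Gluten's API.
SUPPORTED_SHUFFLE_CODECS = frozenset({"lz4", "zstd"})


def validate_codec(codec, supported=SUPPORTED_SHUFFLE_CODECS):
    """Return the normalized codec name, or raise if it is not in the allow-list."""
    name = codec.lower()
    if name not in supported:
        raise ValueError(
            f"Unsupported shuffle codec {codec!r}; expected one of {sorted(supported)}"
        )
    return name


# Extending the set, as described above, only changes what passes validation.
patched = SUPPORTED_SHUFFLE_CODECS | {"snappy"}
print(validate_codec("snappy", patched))
```

With the unpatched set, `validate_codec("snappy")` raises `ValueError`. With the patched set it returns the name, and that is all the check itself can guarantee.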

This compiled, and I was able to run Spark queries with spark.io.compression.codec=snappy.
However, it looks like no compression is happening with snappy under Gluten: when I compared shuffle write against the vanilla Spark baseline, I observed ~2x the write volume with Gluten.

| type | input size | shuffle read | shuffle write |
| --- | --- | --- | --- |
| snappy (vanilla-spark) | 94.9 GiB | 234.9 GiB | 118.9 GiB |
| snappy (gluten) | 93.3 GiB | 379.9 GiB | 191.8 GiB |
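To make the gap concrete, here is a small sketch that computes the Gluten-to-vanilla ratios from the table. The variable names are my own; the numbers are copied from the two rows. For these figures the ratios come out nearer 1.6x than 2x.

```python
# Figures in GiB, copied from the table above.
vanilla = {"input": 94.9, "shuffle_read": 234.9, "shuffle_write": 118.9}
gluten = {"input": 93.3, "shuffle_read": 379.9, "shuffle_write": 191.8}

# Ratio of Gluten to vanilla Spark for each metric.
ratios = {metric: gluten[metric] / vanilla[metric] for metric in vanilla}

for metric, ratio in ratios.items():
    print(f"{metric}: {ratio:.2f}x")
```

Input size is almost identical in both runs, so the difference in shuffle volume does not come from the amount of data read.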

@PHILO-HE, @FelixYBW, @weiting-chen, could you help me understand what else I need to do to use the snappy codec for shuffle compression?
