Delta log getting too big, resulting in spark job failures while writing. #779
Comments
Hi @nnani - there are a number of questions here and it may be worth pinging us in the Delta Users Slack.
@dennyglee Is the delta.logRetentionDuration property supported in version 0.6.1?
We started supporting
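For reference, this is roughly how that retention property is set on a table in versions where Delta supports SQL DDL (0.7+ on Spark 3.x); the table path and the 7-day intervals below are placeholders, not values from this thread.

```python
from pyspark.sql import SparkSession

# Minimal sketch, assuming Delta Lake 0.7+ on Spark 3.x where ALTER TABLE works
# against Delta tables; /mnt/data/events is a placeholder path.
spark = (SparkSession.builder
         .appName("set-delta-log-retention")
         .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
         .config("spark.sql.catalog.spark_catalog",
                 "org.apache.spark.sql.delta.catalog.DeltaCatalog")
         .getOrCreate())

spark.sql("""
    ALTER TABLE delta.`/mnt/data/events`
    SET TBLPROPERTIES (
        'delta.logRetentionDuration' = 'interval 7 days',
        'delta.deletedFileRetentionDuration' = 'interval 7 days'
    )
""")
```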
The checkpoint.parquet file for one of my Delta tables had reached 118 MB, which caused my Spark program to process each batch slowly. The job that merges the transaction logs took about 1 minute to execute each time.
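One quick way to see how large the checkpoint files have grown is to list the table's _delta_log directory. A small sketch using Spark's underlying Hadoop FileSystem API; the path is a placeholder.

```python
from pyspark.sql import SparkSession

# Sketch: print the 20 largest files under a table's _delta_log directory so the
# checkpoint growth is visible. The path is a placeholder assumption.
spark = SparkSession.builder.getOrCreate()
log_dir = "/mnt/data/events/_delta_log"

jvm = spark._jvm
jpath = jvm.org.apache.hadoop.fs.Path(log_dir)
fs = jpath.getFileSystem(spark._jsc.hadoopConfiguration())

files = [(s.getPath().getName(), s.getLen()) for s in fs.listStatus(jpath)]
for name, size in sorted(files, key=lambda f: f[1], reverse=True)[:20]:
    print(f"{size / (1024 * 1024):8.1f} MB  {name}")
```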
bump
I have a Delta Lake table with hundreds of checkpoint files created per minute. The _delta_log folder has grown to over 8 terabytes and 3 million files, while the table itself is only about 1 terabyte. The table is now beyond vacuuming because the driver crashes, probably due to the vast number of checkpoint files.
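For context, this is roughly how a vacuum is issued through the Python API; the path and the 168-hour retention window are placeholders, and this sketch does not by itself address the driver crash described above. It is also worth noting that vacuum only removes unreferenced data files; expired _delta_log entries are cleaned up by Delta itself, based on delta.logRetentionDuration, when new checkpoints are written.

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

# Minimal vacuum sketch; assumes the Delta Lake package is already on the
# classpath, and uses placeholder values for the path and retention window.
spark = SparkSession.builder.appName("delta-vacuum").getOrCreate()

table = DeltaTable.forPath(spark, "/mnt/data/events")
table.vacuum(168)  # remove unreferenced data files older than 7 days
```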
Hello,
We have been using the Delta library for more than 2 years now on an HDI cluster. Recently we came across a few cases where the Spark job starts failing when trying to append data to an existing partitioned table. It fails with a java.lang.OutOfMemoryError: Java heap space error at ......
Delta library version used: 0.5
Table partitioned on 3 columns.
We tried querying this Delta table through Jupyter; with no filters applied, it fails with the same error.
After looking into this issue, it appears the Delta library tries to read and store the list of Parquet files that need to be scanned into an array, but it fails to do so.
When we try with a huge (10 GB) driver memory, the Spark job goes through. However, we cannot afford to allocate that much driver memory due to the number of jobs and infrastructure limitations.
Based on this, we have the questions below. It would be nice if you could help answer them.
Note - We have already vacuumed all the data for these tables.
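Regarding the 10 GB driver workaround mentioned above: the snapshot (file list) reconstruction happens on the driver, so the usual mitigation is to raise driver memory when the job is launched. A rough sketch with placeholder values; note that spark.driver.memory only takes effect if it is set before the driver JVM starts (for example via spark-submit --driver-memory), not from inside an already-running session.

```python
from pyspark.sql import SparkSession

# Sketch: driver-side settings for a job that appends to a large Delta table.
# All values and paths are placeholder assumptions, not recommendations from
# this thread. spark.driver.memory must be set before the driver JVM starts,
# e.g. when this script is the entry point of a spark-submit job.
spark = (SparkSession.builder
         .appName("delta-append-job")
         .config("spark.driver.memory", "10g")
         .config("spark.driver.maxResultSize", "4g")
         .getOrCreate())

(spark.read.parquet("/mnt/staging/new_events")   # placeholder input path
      .write.format("delta")
      .mode("append")
      .save("/mnt/data/events"))                 # placeholder Delta table path
```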