append corrupts _metadata and _common_metadata files #807
Comments
Quick question: how do you know that the _common_metadata file is also corrupted? There isn't really any reason for an append operation to touch this file at all (but I haven't checked whether it does or not).
I should have mentioned an obvious workaround, if you have no time to debug further: just write separate data files in the same directory without bothering to use append on the whole dataset. Without (_common)_metadata, a dataset is just the total of the .parquet files in a directory. However, was there no error/warning during append before the data became unreadable?
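A minimal sketch of that workaround, assuming the directory holds no _metadata or _common_metadata file (a reader might otherwise trust the stale global metadata over the files actually present). The `part.<N>.parquet` naming and the helper names `next_part_path` and `write_chunk` are made up for illustration and are not part of fastparquet; the actual write goes through pandas' `DataFrame.to_parquet`, which needs pandas and fastparquet installed.

```python
import os
import re

# Hypothetical naming scheme for this sketch: part.<N>.parquet
_PART_RE = re.compile(r"^part\.(\d+)\.parquet$")


def next_part_path(directory):
    """Return the path of the next unused part.<N>.parquet in `directory`."""
    os.makedirs(directory, exist_ok=True)
    used = []
    for name in os.listdir(directory):
        match = _PART_RE.match(name)
        if match:  # ignore anything that is not a part file
            used.append(int(match.group(1)))
    next_index = max(used, default=-1) + 1
    return os.path.join(directory, f"part.{next_index}.parquet")


def write_chunk(df, directory):
    """Write one DataFrame chunk as its own standalone parquet file.

    No append is involved, so no global metadata file is rewritten.
    """
    path = next_part_path(directory)
    # Assumes pandas with the fastparquet engine available.
    df.to_parquet(path, engine="fastparquet")
    return path
```

Each chunk then lands in its own file, and the dataset can be read back by handing the list of part files to the reader (for example, `fastparquet.ParquetFile` accepts a list of paths).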
@yohplala, it's probably time we followed the lead of dask, spark, arrow... and explicitly allowed write and append without creating the global _metadata. It could even be the default, although fastparquet is more likely to be able to fit the whole dataset
no worries, I appreciate your taking the time to report.
By the time of the error, the metadata file is apparently already corrupt. I was wondering if there was any warning on the previous iteration. The next step would be to try to make a reproducer on fake/public data, since, as @yohplala points out in the partner issue, many-iteration appends do normally succeed.
Hi @martindurant, My 1st reaction is when PART_ID = re.compile(r'.*part.(?P<i>[\d]+).parquet$') Other than that, I am guessing changes are mostly in At write time, given a "classical"
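For readers unfamiliar with the pattern quoted in this comment, here is a small sketch of what it extracts from a file name. Only the regex itself comes from the thread; the helper `part_index` and the sample file names are made up for illustration. Note that the dots in the pattern are unescaped, so each one matches any single character rather than only a literal ".".

```python
import re

# Pattern exactly as quoted in the comment.
PART_ID = re.compile(r'.*part.(?P<i>[\d]+).parquet$')


def part_index(filename):
    """Return the integer part number embedded in `filename`, or None."""
    match = PART_ID.match(filename)
    return int(match.group("i")) if match else None


# Hypothetical file names, just to exercise the pattern.
print(part_index("part.214.parquet"))          # 214
print(part_index("year=2022/part.3.parquet"))  # 3
print(part_index("_metadata"))                 # None
print(part_index("partX12Yparquet"))           # 12, because "." matches any character
```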
What happened:
saving dataframe chunks as rowgroups with the code below:
after 215 appends, the append write corrupts the _metadata and _common_metadata files and cannot append any more.
Environment: