Incorrect roundtrip of index names on filtered dataframe #732
Comments
Hello, is your index otherwise correct, i.e. is it what you expect? (This is to check there is no bug at that level.) You would get the behaviour you are expecting by using `filtered_df.to_parquet('test.parq', engine='fastparquet', write_index=False)`. Bests
Thanks @yohplala, I can see that reasoning and your technical explanation makes sense. However, I still disagree that this is the expected behaviour: I would expect a DataFrame to round-trip exactly as is, i.e. it should pass an exact equality check.
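The exact check referenced in this comment was not captured in this copy of the thread; one way the expectation could be phrased, as a sketch using pandas' testing helpers (data and file name are made up), is:

```python
import pandas as pd

# Any frame whose boolean filtering leaves a non-default, unnamed index.
df = pd.DataFrame({"a": [1, 2, 3]})
filtered_df = df[df["a"] > 1]

filtered_df.to_parquet("test.parq", engine="fastparquet")
result = pd.read_parquet("test.parq", engine="fastparquet")

# check_names=True (the default) compares index names as well, so an unnamed
# index coming back named "index" makes this assertion fail.
pd.testing.assert_frame_equal(filtered_df, result)
```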
Indeed, I think we can call this a bug. Parquet requires that any column being saved has a real str name, but we also save pandas metadata, in which we can record the actual final name of the index. Either we are not writing that metadata, or we are not applying it correctly on read - one can check by doing the roundtrip pyarrow/fastparquet and fastparquet/pyarrow, as sketched below. This behaviour has been around a long time, I think, and there are tests in dask which use both engines and explicitly ignore the name of the index if it was None. Fixing this might break those tests! Personally, I think "index" is a fine name for an index :)
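A sketch of that cross-engine check, with illustrative data and file names (not taken from the issue):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})
filtered = df[df["a"] > 1]  # non-default, unnamed index

# Write with one engine and read with the other; if the pandas metadata is
# both written and applied correctly, the index name should come back as None.
for writer, reader in [("pyarrow", "fastparquet"), ("fastparquet", "pyarrow")]:
    path = f"roundtrip_{writer}.parq"
    filtered.to_parquet(path, engine=writer)
    back = pd.read_parquet(path, engine=reader)
    print(f"{writer} -> {reader}: index name = {back.index.name!r}")
```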
@yohplala, you are probably in a good place to ensure
Hi @martindurant, to be honest, I have no need for this, and am only able to code in my spare time, a few hours per week. So this will be a very low priority for me.
No rush! I might do it myself also, but I have a similar problem with finding time :)
When saving a filtered dataframe to parquet using pandas and fastparquet, the index names are round-tripped incorrectly:
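The reporter's reproducer snippet was not preserved in this copy of the issue; a minimal sketch of the behaviour being described (data and file name are made up) might look like:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})
filtered_df = df[df["a"] > 1]   # filtering leaves a non-default, unnamed index

filtered_df.to_parquet("test.parq", engine="fastparquet")
result = pd.read_parquet("test.parq", engine="fastparquet")

print(filtered_df.index.name)   # None
print(result.index.name)        # "index" per this report, None expected
```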
Versions
fastparquet 0.7.2
pandas 1.3.2