Support for pandas DataFrame subclasses #52
Comments
The Lines 194 to 203 in 9c9ba0a
Currently only the actual underlying data is stored, so when deserializing, I don't think partd knows anything about the original class? If we want to change that, it would need to start storing some additional information. An alternative might be to tackle this on the dask side, and ensure the retrieved part is of the same type as
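One way to picture the dask-side alternative is a small helper that re-wraps whatever comes back from the roundtrip in the class of a reference object. This is only a sketch: `MyFrame`, `restore_type`, and the `meta` reference frame are hypothetical names used for illustration, not existing dask or partd API.

```python
import pandas as pd


class MyFrame(pd.DataFrame):
    """Hypothetical DataFrame subclass, standing in for e.g. a GeoDataFrame."""

    @property
    def _constructor(self):
        # Keep the subclass when pandas operations build new frames.
        return MyFrame


def restore_type(part, meta):
    """Return `part` wrapped in the class of `meta` (hypothetical helper)."""
    if type(part) is type(meta):
        return part
    # Assumes the subclass constructor accepts a plain DataFrame.
    return type(meta)(part)


# `meta` plays the role of an empty reference frame of the expected type.
meta = MyFrame({"a": pd.Series([], dtype="int64")})

# What the roundtrip hands back: the data survived, the subclass did not.
plain = pd.DataFrame({"a": [1, 2, 3]})

restored = restore_type(plain, meta)
print(type(restored).__name__)  # MyFrame
```

Re-wrapping like this only recovers the class. Any subclass-specific state that lives outside the underlying data would still be lost unless it is stored separately, which is where the "additional information" option comes in.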
When dask uses partd for e.g. shuffle operations, the dataframes always come back as a `pandas.DataFrame`, even if a subclass was stored (xref geopandas/dask-geopandas#59 (comment)). For example:
To be able to use dask's shuffle operations with `dask_geopandas`, which uses a pandas subclass as the partition type, the subclass should be preserved in the partd roundtrip (or are there other ways to override / dispatch this operation in dask?).
I was wondering how other dask.dataframe subclasses handle this, but e.g. `dask_cudf` doesn't seem to support "disk"-based shuffling.