Pickle error when trying to append to existing deltatable #87

Open
Thodorissio opened this issue Dec 16, 2024 · 0 comments
First of all, I would like to thank you for your awesome contributions. While developing, I came across the following issue.

Description

When trying to append to an existing DeltaTable, the following error occurs:

TypeError: ('Could not serialize object of type HighLevelGraph', '<ToPickle: HighLevelGraph with 3 layers.\n<dask.highlevelgraph.HighLevelGraph object at 0x1384dcc0ad0>\n 0. 1341334790528\n 1. finalize-02082eb4-e53c-4b1a-83dc-fb753d3f60dc\n 2. _commit-94f97f6c-675a-47c6-88a7-82b1a5234034\n>')

Reproducible Example

import pandas as pd
import dask.dataframe as dd
import dask_deltatable as ddt

from distributed import Client
from deltalake import DeltaTable

output_table = "./animals"


if __name__ == "__main__":
    client = Client()
    print(f"Dask Client: {client}")

    animals_df = pd.DataFrame(
        {
            "name": ["dog", "cat", "whale", "elephant"],
            "life_span": [13, 15, 90, 70],
        },
    )

    animals_ddf = dd.from_pandas(animals_df)
    animals_ddf["high_longevity"] = animals_ddf["life_span"] > 40
    ddt.to_deltalake(
        table_or_uri=output_table,
        df=animals_ddf,
        compute=True,
        mode="append",
    )

    delta_table = DeltaTable(output_table)
    delta_table_df = delta_table.to_pandas()
    print("Created DeltaTable:")
    print(delta_table_df)

    more_animals_df = pd.DataFrame(
        {
            "name": ["shark", "parrot"],
            "life_span": [20, 50],
        },
    )

    more_animals_ddf = dd.from_pandas(more_animals_df)
    more_animals_ddf["high_longevity"] = more_animals_ddf["life_span"] > 40
    ddt.to_deltalake(
        table_or_uri=output_table,
        df=more_animals_ddf,
        compute=True,
        mode="append",
    )

Stacktrace

Dask Client: <Client: 'tcp://127.0.0.1:60355' processes=4 threads=12, memory=31.90 GiB>
Created DeltaTable:
       name  life_span  high_longevity
0       dog         13           False
1       cat         15           False
2     whale         90            True
3  elephant         70            True
2024-12-16 13:58:42,783 - distributed.protocol.pickle - ERROR - Failed to serialize <ToPickle: HighLevelGraph with 3 layers.
<dask.highlevelgraph.HighLevelGraph object at 0x1e7f9b8b530>
 0. 2095838501424
 1. finalize-54be334c-3207-4a95-8908-1aac80f5edb6
 2. _commit-2c7fae99-a722-4a11-8b99-ca2120ebbb4d
>.
Traceback (most recent call last):
  File "C:\Users\thodo\miniconda3\envs\myenv\Lib\site-packages\distributed\protocol\pickle.py", line 60, in dumps
    result = pickle.dumps(x, **dump_kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: cannot pickle 'deltalake._internal.RawDeltaTable' object

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\thodo\miniconda3\envs\myenv\Lib\site-packages\distributed\protocol\pickle.py", line 65, in dumps
    pickler.dump(x)
TypeError: cannot pickle 'deltalake._internal.RawDeltaTable' object

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\thodo\miniconda3\envs\myenv\Lib\site-packages\distributed\protocol\pickle.py", line 77, in dumps
    result = cloudpickle.dumps(x, **dump_kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\thodo\miniconda3\envs\myenv\Lib\site-packages\cloudpickle\cloudpickle.py", line 1529, in dumps
    cp.dump(obj)
  File "C:\Users\thodo\miniconda3\envs\myenv\Lib\site-packages\cloudpickle\cloudpickle.py", line 1295, in dump
    return super().dump(obj)
           ^^^^^^^^^^^^^^^^^
TypeError: cannot pickle 'deltalake._internal.RawDeltaTable' object
Traceback (most recent call last):
  File "C:\Users\thodo\miniconda3\envs\myenv\Lib\site-packages\distributed\protocol\pickle.py", line 60, in dumps
    result = pickle.dumps(x, **dump_kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: cannot pickle 'deltalake._internal.RawDeltaTable' object

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\thodo\miniconda3\envs\myenv\Lib\site-packages\distributed\protocol\pickle.py", line 65, in dumps
    pickler.dump(x)
TypeError: cannot pickle 'deltalake._internal.RawDeltaTable' object

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\thodo\miniconda3\envs\myenv\Lib\site-packages\distributed\protocol\serialize.py", line 366, in serialize
    header, frames = dumps(x, context=context) if wants_context else dumps(x)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\thodo\miniconda3\envs\myenv\Lib\site-packages\distributed\protocol\serialize.py", line 78, in pickle_dumps
    frames[0] = pickle.dumps(
                ^^^^^^^^^^^^^
  File "C:\Users\thodo\miniconda3\envs\myenv\Lib\site-packages\distributed\protocol\pickle.py", line 77, in dumps
    result = cloudpickle.dumps(x, **dump_kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\thodo\miniconda3\envs\myenv\Lib\site-packages\cloudpickle\cloudpickle.py", line 1529, in dumps
    cp.dump(obj)
  File "C:\Users\thodo\miniconda3\envs\myenv\Lib\site-packages\cloudpickle\cloudpickle.py", line 1295, in dump
    return super().dump(obj)
           ^^^^^^^^^^^^^^^^^
TypeError: cannot pickle 'deltalake._internal.RawDeltaTable' object

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\thodo\Documents\libra\myenv\append_issue.py", line 45, in <module>
    ddt.to_deltalake(
  File "C:\Users\thodo\miniconda3\envs\myenv\Lib\site-packages\dask_deltatable\write.py", line 239, in to_deltalake
    result = result.compute()
             ^^^^^^^^^^^^^^^^
  File "C:\Users\thodo\miniconda3\envs\myenv\Lib\site-packages\dask\base.py", line 372, in compute
    (result,) = compute(self, traverse=False, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\thodo\miniconda3\envs\myenv\Lib\site-packages\dask\base.py", line 660, in compute
    results = schedule(dsk, keys, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\thodo\miniconda3\envs\myenv\Lib\site-packages\distributed\protocol\serialize.py", line 392, in serialize
    raise TypeError(msg, str_x) from exc
TypeError: ('Could not serialize object of type HighLevelGraph', '<ToPickle: HighLevelGraph with 3 layers.\n<dask.highlevelgraph.HighLevelGraph object at 0x1e7f9b8b530>\n 0. 2095838501424\n 1. finalize-54be334c-3207-4a95-8908-1aac80f5edb6\n 2. _commit-2c7fae99-a722-4a11-8b99-ca2120ebbb4d\n>')
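
As far as I can tell, the direct cause is that the task graph built by dask_deltatable's to_deltalake (the _commit layer above) carries a DeltaTable, and its internal deltalake._internal.RawDeltaTable has no pickle support, so the distributed scheduler cannot serialize the graph. A minimal check isolating this (my own sketch, assuming the ./animals table from the example above has already been created):

import pickle

from deltalake import DeltaTable

# The DeltaTable wrapper holds a deltalake._internal.RawDeltaTable;
# dumping it raises the same TypeError the distributed scheduler
# hits when it tries to serialize the HighLevelGraph.
pickle.dumps(DeltaTable("./animals"))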

Library Versions

dask==2024.11.2
dask-deltatable==0.3.3
deltalake==0.22.3
distributed==2024.11.2
pandas==2.2.3
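
Workaround

Until this is fixed, appending via deltalake directly after computing the Dask DataFrame avoids shipping the table through the distributed scheduler. This is only a sketch, not dask_deltatable's intended API, and it materializes all partitions on the client:

from deltalake import write_deltalake

# Compute the Dask DataFrame to pandas on the client, then append
# with deltalake itself so no DeltaTable ever enters a task graph.
write_deltalake(output_table, more_animals_ddf.compute(), mode="append")

Running the original script without creating a distributed Client also seems to avoid the error, since Dask's local threaded scheduler never pickles the graph.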