
[bitnami/spark] Python version updated from 3.9 to 3.11, which is not supported in Spark prior to 3.4 #52170

Closed
pawel-big-lebowski opened this issue Oct 26, 2023 · 7 comments
Assignees
Labels
solved spark stale 15 days without activity tech-issues The user has a technical issue about an application

Comments

@pawel-big-lebowski

Name and Version

bitnami/spark:3.3.3, bitnami/spark:3.2.4, probably bitnami/spark < 3.4

What architecture are you using?

amd64

What steps will reproduce the bug?

The image's Python version was updated from 3.9 to 3.11: https://github.com/bitnami/containers/pull/52152/files#diff-d95248e117d08ad609675905dc5c243f87eca478a7529f91190c2edd974d8f07R31

Spark versions prior to 3.4 do not support it: apache/spark#38987
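The incompatibility appears to come from the cloudpickle copy bundled with older Spark: CPython 3.11 changed the bytecode format (adaptive/specialized instructions and new oparg encodings), so the old `_extract_code_globals` indexes `co_names` with opargs that no longer line up and raises `IndexError`. A fail-fast guard at the top of a driver script makes the mismatch obvious; `check_python_for_spark` below is a hypothetical helper sketch, not part of PySpark:

```python
import sys

def check_python_for_spark(spark_version: str, py_version=None) -> None:
    """Raise early if this interpreter is newer than the given Spark supports.

    Hypothetical helper: Spark gained Python 3.11 support in 3.4
    (apache/spark#38987), so anything older needs Python <= 3.10.
    """
    py = tuple(py_version or sys.version_info)
    major, minor = (int(p) for p in spark_version.split(".")[:2])
    if (major, minor) < (3, 4) and py[:2] >= (3, 11):
        raise RuntimeError(
            f"Spark {spark_version} does not support Python {py[0]}.{py[1]}; "
            "use Python 3.10 or older"
        )

check_python_for_spark("3.4.0")               # never raises: 3.4+ is fine
check_python_for_spark("3.3.3", (3, 10, 12))  # fine: 3.10 is supported
```

Calling `check_python_for_spark("3.3.3")` on a Python 3.11 image raises immediately, instead of failing later inside `cloudpickle` with the opaque traceback below.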

Simple Spark code:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
people = spark.createDataFrame([
    {"name": "Bilbo Baggins", "age": 50},
    {"name": "Gandalf", "age": 1000},
])

leads to

Traceback (most recent call last):
  File "/opt/bitnami/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 458, in dumps
    return cloudpickle.dumps(obj, pickle_protocol)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/bitnami/spark/python/lib/pyspark.zip/pyspark/cloudpickle/cloudpickle_fast.py", line 73, in dumps
    cp.dump(obj)
  File "/opt/bitnami/spark/python/lib/pyspark.zip/pyspark/cloudpickle/cloudpickle_fast.py", line 602, in dump
    return Pickler.dump(self, obj)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/bitnami/spark/python/lib/pyspark.zip/pyspark/cloudpickle/cloudpickle_fast.py", line 692, in reducer_override
    return self._function_reduce(obj)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/bitnami/spark/python/lib/pyspark.zip/pyspark/cloudpickle/cloudpickle_fast.py", line 565, in _function_reduce
    return self._dynamic_function_reduce(obj)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/bitnami/spark/python/lib/pyspark.zip/pyspark/cloudpickle/cloudpickle_fast.py", line 546, in _dynamic_function_reduce
    state = _function_getstate(func)
            ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/bitnami/spark/python/lib/pyspark.zip/pyspark/cloudpickle/cloudpickle_fast.py", line 157, in _function_getstate
    f_globals_ref = _extract_code_globals(func.__code__)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/bitnami/spark/python/lib/pyspark.zip/pyspark/cloudpickle/cloudpickle.py", line 334, in _extract_code_globals
    out_names = {names[oparg]: None for _, oparg in _walk_global_ops(co)}
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/bitnami/spark/python/lib/pyspark.zip/pyspark/cloudpickle/cloudpickle.py", line 334, in <dictcomp>
    out_names = {names[oparg]: None for _, oparg in _walk_global_ops(co)}
                 ~~~~~^^^^^^^
IndexError: tuple index out of range
Traceback (most recent call last):
  File "/opt/bitnami/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 458, in dumps
  File "/opt/bitnami/spark/python/lib/pyspark.zip/pyspark/cloudpickle/cloudpickle_fast.py", line 73, in dumps
  File "/opt/bitnami/spark/python/lib/pyspark.zip/pyspark/cloudpickle/cloudpickle_fast.py", line 602, in dump
  File "/opt/bitnami/spark/python/lib/pyspark.zip/pyspark/cloudpickle/cloudpickle_fast.py", line 692, in reducer_override
  File "/opt/bitnami/spark/python/lib/pyspark.zip/pyspark/cloudpickle/cloudpickle_fast.py", line 565, in _function_reduce
  File "/opt/bitnami/spark/python/lib/pyspark.zip/pyspark/cloudpickle/cloudpickle_fast.py", line 546, in _dynamic_function_reduce
  File "/opt/bitnami/spark/python/lib/pyspark.zip/pyspark/cloudpickle/cloudpickle_fast.py", line 157, in _function_getstate
  File "/opt/bitnami/spark/python/lib/pyspark.zip/pyspark/cloudpickle/cloudpickle.py", line 334, in _extract_code_globals
  File "/opt/bitnami/spark/python/lib/pyspark.zip/pyspark/cloudpickle/cloudpickle.py", line 334, in <dictcomp>
IndexError: tuple index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/spark_scripts/spark_kafka.py", line 18, in <module>
    people = spark.createDataFrame([
             ^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/bitnami/spark/python/lib/pyspark.zip/pyspark/sql/session.py", line 894, in createDataFrame
  File "/opt/bitnami/spark/python/lib/pyspark.zip/pyspark/sql/session.py", line 938, in _create_dataframe
  File "/opt/bitnami/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 3113, in _to_java_object_rdd
  File "/opt/bitnami/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 3505, in _jrdd
  File "/opt/bitnami/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 3362, in _wrap_function
  File "/opt/bitnami/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 3345, in _prepare_for_python_RDD
  File "/opt/bitnami/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 468, in dumps
_pickle.PicklingError: Could not serialize object: IndexError: tuple index out of range

What is the expected behavior?

Restore the previous Python version, which is supported by the Spark version in that image.
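Until images with a compatible interpreter are published, one possible stopgap (a sketch, not an official Bitnami recommendation) is to point PySpark at an older interpreter via the standard `PYSPARK_PYTHON` / `PYSPARK_DRIVER_PYTHON` environment variables. The `/usr/bin/python3.10` path is an assumption about the container; also note that `PYSPARK_PYTHON` only controls the executor-side workers, so the driver script itself must still be launched with the older interpreter (e.g. `python3.10 my_job.py`):

```python
import os

# Hypothetical path; point it at whatever Python <= 3.10
# interpreter your image actually ships.
os.environ["PYSPARK_PYTHON"] = "/usr/bin/python3.10"         # executors
os.environ["PYSPARK_DRIVER_PYTHON"] = "/usr/bin/python3.10"  # used by spark-submit

# The SparkSession must be created only after these variables are set:
# from pyspark.sql import SparkSession
# spark = SparkSession.builder.appName("py310-workaround").getOrCreate()
```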

What do you see instead?

Additional information

@pawel-big-lebowski pawel-big-lebowski added the tech-issues The user has a technical issue about an application label Oct 26, 2023
@github-actions github-actions bot added the triage Triage is needed label Oct 26, 2023
@pawel-big-lebowski
Author

@javsalgar It would be very helpful if you could provide the hash (digest) of the previous image version.

@mdhont
Contributor

mdhont commented Nov 5, 2023

We've changed the Python version to 3.10 for Spark versions < 3.4, see

Thanks for reporting this issue


This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.

@github-actions github-actions bot added the stale 15 days without activity label Nov 21, 2023

Due to the lack of activity in the last 5 days since it was marked as "stale", we proceed to close this Issue. Do not hesitate to reopen it later if necessary.

@bitnami-bot bitnami-bot closed this as not planned Won't fix, can't repro, duplicate, stale Nov 26, 2023
@DangHoang2109

We've changed the Python version to 3.10 for Spark versions < 3.4, see

Thanks for reporting this issue

Hello @mdhont, will spark-3.3.2 be upgraded to Python 3.10 too?

@DangHoang2109

Can you upgrade spark-3.3.2 or spark-3.3.0 to Python 3.10.x?

@carrodher
Member

Thank you for reaching out and for your interest in our project. We'd be happy to assist you, but to do so effectively, we'll need some more details. Could you please create a separate issue, filling in the Issue template?

Providing this information will help us better understand your situation and assist you in a more targeted manner.


7 participants