How to run it? #19

anki-code · 2023-11-13T16:05:45Z

hi @ThaminduR! Thank you for your work here!

I'm trying to reproduce the examples using the jupyter/pyspark-notebook:spark-2 Docker container with PySpark 2.4.5 and Python 3.7.6 (as required in the readme), but without success. I tried many ways to run it and kept getting errors.

Is there a step-by-step guide or a Docker container for testing the code?

What I did:

# Run container
docker run --rm -it --entrypoint /bin/bash jupyter/pyspark-notebook:spark-2

apt update && apt install -y git vim
pip install -U pip

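# Install dependencies manually
# (version specifiers containing ">=" are quoted so the shell does not treat ">" as output redirection)
pip install -U "pandas>=1.1" pyarrow diffprivlib==0.2.1 tabulate==0.8.7 "mypy>=0.770" kmodes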

# Install `spark-privacy-preserver`
git clone https://github.com/ThaminduR/spark-privacy-preserver
cd spark-privacy-preserver
pip install --no-deps .
pyspark
# Run the code from mondrian_preserver demo.ipynb
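
Since the error further down does not name the library at fault, it may help to record which versions actually ended up installed after the steps above. The following is a minimal sketch for that; `installed_version` and `version_report` are hypothetical helper names introduced here for illustration, not part of spark-privacy-preserver.

```python
from typing import Dict, Iterable, Optional


def installed_version(name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    Hypothetical helper for illustration only.
    """
    try:
        from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
    except ImportError:
        # Python 3.7 (as in this container) has no importlib.metadata,
        # so fall back to pkg_resources from setuptools.
        import pkg_resources

        try:
            return pkg_resources.get_distribution(name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def version_report(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version (or None)."""
    return {name: installed_version(name) for name in names}


if __name__ == "__main__":
    # Package names taken from the pip command in this report.
    packages = ["pyspark", "pandas", "pyarrow", "diffprivlib", "tabulate", "mypy", "kmodes"]
    for name, ver in version_report(packages).items():
        print(f"{name}: {ver or 'not installed'}")
```

Running this inside the container and comparing the output with the versions listed in the readme would show whether any dependency drifted from what the project expects.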

The line:

dfn = Preserver.k_anonymize(df, k, feature_columns, sensitive_column, categorical, schema)
dfn.show()

Output:

ERROR Executor: Exception in task 0.0 in stage 5.0 (TID 13) java.lang.IllegalArgumentException
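
One possible cause, which this report does not confirm: the Spark documentation notes that Spark 2.3.x and 2.4.x combined with PyArrow 0.15.0 or newer can fail in pandas UDFs with java.lang.IllegalArgumentException, because the Arrow IPC format changed in 0.15.0. The documented workaround is to set the environment variable ARROW_PRE_0_15_IPC_FORMAT=1 where the Python workers run (for example in conf/spark-env.sh), or to install an older PyArrow. Whether k_anonymize goes through a pandas UDF is an assumption here. A minimal sketch of the version check, with `needs_arrow_legacy_ipc` as a hypothetical helper name:

```python
import re
from typing import Tuple

# Environment variable named in the Spark 2.4 documentation for this workaround.
ARROW_COMPAT_ENV_VAR = "ARROW_PRE_0_15_IPC_FORMAT"


def _major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '2.4.5'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def needs_arrow_legacy_ipc(spark_version: str, pyarrow_version: str) -> bool:
    """Return True when the documented Spark 2.3.x/2.4.x + PyArrow >= 0.15 case applies.

    Hypothetical helper for illustration; it only compares version numbers
    and does not inspect the running Spark session.
    """
    spark_affected = _major_minor(spark_version) in {(2, 3), (2, 4)}
    arrow_affected = _major_minor(pyarrow_version) >= (0, 15)
    return spark_affected and arrow_affected


if __name__ == "__main__":
    # PySpark 2.4.5 is the version stated in this report; the PyArrow
    # versions below are illustrative inputs, not measured values.
    print(needs_arrow_legacy_ipc("2.4.5", "0.15.1"))  # True
    print(needs_arrow_legacy_ipc("2.4.5", "0.14.1"))  # False
```

If the check returns True for the versions in the container, exporting ARROW_PRE_0_15_IPC_FORMAT=1 before starting pyspark would be a cheap thing to try.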

Could you please help with setting up the environment and running the code? Thanks!
