How to run it? #19

Open
anki-code opened this issue Nov 13, 2023 · 0 comments

anki-code commented Nov 13, 2023

Hi @ThaminduR! Thank you for your work here!

I'm trying to reproduce the examples using the jupyter/pyspark-notebook:spark-2 Docker container with PySpark 2.4.5 and Python 3.7.6 (as required in the readme), but without success. I tried many ways to run it, but I keep getting errors.

Is there a step-by-step guide or a Docker container for testing the code?

What I did:

# Run container
docker run --rm -it --entrypoint /bin/bash jupyter/pyspark-notebook:spark-2

apt update && apt install -y git vim
pip install -U pip

# Install dependencies manually (specifiers quoted so the shell does not treat > as a redirection)
pip install -U 'pandas>=1.1' pyarrow diffprivlib==0.2.1 tabulate==0.8.7 'mypy>=0.770' kmodes

# Install `spark-privacy-preserver`
git clone https://github.com/ThaminduR/spark-privacy-preserver
cd spark-privacy-preserver
pip install --no-deps .
pyspark
# Run the code from mondrian_preserver demo.ipynb

The line:

dfn = Preserver.k_anonymize(df, k, feature_columns, sensitive_column, categorical, schema)
dfn.show()

Output:

ERROR Executor: Exception in task 0.0 in stage 5.0 (TID 13) java.lang.IllegalArgumentException
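
One thing that may be worth ruling out, stated as an assumption because the stack trace above is truncated: the Spark documentation notes that Spark 2.3.x and 2.4.x combined with PyArrow 0.15.0 or newer can fail with a `java.lang.IllegalArgumentException` unless the environment variable `ARROW_PRE_0_15_IPC_FORMAT=1` is set, and the `pip install` above does not pin `pyarrow`. The sketch below only checks whether the installed versions fall into that combination. The helper names `parse_version` and `needs_legacy_arrow_ipc` are hypothetical and not part of `spark-privacy-preserver`.

```python
def parse_version(text):
    """Turn '2.4.5' into (2, 4, 5), dropping suffixes such as 'rc1'."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def needs_legacy_arrow_ipc(spark_version, pyarrow_version):
    """True for Spark 2.3.x/2.4.x together with PyArrow 0.15.0 or newer."""
    spark = parse_version(spark_version)
    arrow = parse_version(pyarrow_version)
    return spark[:2] in {(2, 3), (2, 4)} and arrow >= (0, 15)


# Report on the current environment, if both packages are importable.
try:
    import pyarrow
    import pyspark
except ImportError:
    print("pyspark or pyarrow is not installed in this environment")
else:
    if needs_legacy_arrow_ipc(pyspark.__version__, pyarrow.__version__):
        # Assumption: this is the cause; the truncated trace cannot confirm it.
        print("Try: export ARROW_PRE_0_15_IPC_FORMAT=1 before starting pyspark")
    else:
        print("This Spark/PyArrow combination does not need the legacy format")
```

If the check suggests the variable, it would have to be exported in the shell before launching `pyspark` (or added to `conf/spark-env.sh`), so that the Python worker processes inherit it as well. Pinning `pyarrow` to a version below 0.15 might be an alternative, though that has not been tested here.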

Could you please help with setting up the environment and running the code? Thanks!
