Disparate Vulnerability to Membership Inference Attacks

This is the accompanying code to the paper "Disparate Vulnerability to Membership Inference Attacks" which appears in PETS 2022.

The code makes it possible to reproduce all experiments in the paper, along with the corresponding plots and tables.

Setup

Manual

System Requirements. You need Python 3.8 and poetry 1.1.8. You can install poetry, e.g., as follows:

pip install --user poetry==1.1.8

To reproduce the plots exactly, you also need a LaTeX distribution with certain packages. On a Debian-based system, these can be installed as follows:

sudo apt install texlive-latex-extra cm-super texlive-science dvipng
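Before continuing, it can help to confirm that the prerequisites above are in place. The following is a minimal sketch, not part of the repository: the function name `missing_prerequisites` is hypothetical, and it assumes that the LaTeX packages expose the executables `latex` and `dvipng` on your PATH. It checks only the Python major and minor version, not the poetry version.

```python
import shutil
import sys

REQUIRED_PYTHON = (3, 8)                        # from the system requirements above
REQUIRED_TOOLS = ("poetry", "latex", "dvipng")  # assumed executable names on PATH


def missing_prerequisites(version_info=sys.version_info, which=shutil.which):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if tuple(version_info[:2]) != REQUIRED_PYTHON:
        problems.append(
            "Python %d.%d found, %d.%d expected"
            % (version_info[0], version_info[1], *REQUIRED_PYTHON)
        )
    for tool in REQUIRED_TOOLS:
        # shutil.which returns None when the executable is not on PATH
        if which(tool) is None:
            problems.append("%s not found on PATH" % tool)
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print("WARNING:", problem)
```

Running the script prints one warning per missing prerequisite and nothing if everything was found.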

Environment. Use poetry to set up the Python environment:

poetry install

Data. We use the ADULT and Texas Hospital Discharge datasets. The ADULT dataset is checked into the repository; to install the Texas Hospital Discharge data, use the following command:

make data

Notebooks sync. We use jupytext to automatically convert Jupyter notebooks to Python scripts and keep them in sync. To generate the notebooks initially:

make sync

Docker

Alternatively, we provide a Docker image that can be built and run as follows:

docker build -t dv .
docker run -it --rm dv <command such as 'make tests'>

The Docker image already includes the result of all the manual setup steps.

Testing the setup

To test that the setup works as expected, you can use:

make tests

This runs the same scripts that are used for the paper experiments but with fewer models (3 instead of the 200 needed for the full reproduction). The tests can take several minutes to run. They have failed only if the command terminates unsuccessfully; warnings are OK.

Modules and scripts

The repo contains the following relevant modules and directories:

  • mia.py - Implementations of Membership Inference Attacks.
  • model_zoo.py - Definitions of target ML models used in experiments.
  • plot_params.py - Setup of the plot style parameters.
  • plotting.py - Plotting utilities.
  • utils.py - Misc. utilities.
  • data/ - The directory in which make data stores the data files.
  • loaders/ - The directory that contains modules for loading the datasets.
  • results/ - The directory where the experiment data are saved.
  • images/ - The directory where the plots are saved.

Jupyter notebooks (committed as Python scripts, see Notebooks Sync above):

Reproducing the paper results

Launching the Jupyter server

To reproduce the paper results, you need to launch the Jupyter server:

poetry run jupyter-notebook .

The command will output instructions on accessing the server.

If using the Docker container, you can launch the Jupyter server like so:

docker run -it --rm -p <port>:<port> dv jupyter-notebook --ip=0.0.0.0 --port=<port>

This will run the container and make the Jupyter server available on the given port (e.g., 8888) of your host machine.
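If you launch the container from a script, the command can be assembled programmatically. The sketch below is illustrative only: the helper `jupyter_docker_command` is a hypothetical name, and it assumes the image was tagged `dv` as in the Docker section. Note that `docker run` options such as `-p` must come before the image name; everything after the image name is passed to the container as its command.

```python
import shlex


def jupyter_docker_command(port=8888, image="dv"):
    """Build the `docker run` argument list for serving Jupyter on `port`."""
    if not 1 <= port <= 65535:
        raise ValueError("port must be in 1..65535, got %r" % (port,))
    return [
        "docker", "run", "-it", "--rm",
        "-p", "%d:%d" % (port, port),   # docker options precede the image name
        image,
        "jupyter-notebook", "--ip=0.0.0.0", "--port=%d" % port,
    ]


if __name__ == "__main__":
    # Print the command so it can be inspected or pasted into a shell.
    print(shlex.join(jupyter_docker_command(8888)))
```

The returned list can also be passed directly to `subprocess.run`, which avoids shell quoting issues.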

Full reproduction

To reproduce all the experimental data, tables, and plots, you need to execute each of the notebooks in the root folder using the launched Jupyter server. If you have not used Jupyter before, you can check this tutorial to learn how to do this.

Executing the notebooks will re-run the experiments and might take from 20 minutes to about 6 hours depending on the experiment and the hardware.

Using experimental data from the paper

The reproduction will not be exact, as scikit-learn is not deterministic. For this reason, the results/ folder includes the experimental data used in the paper. Using this data, you can reproduce the tables and plots from the paper without actually re-running the experiments.

To do so, run the relevant notebook (see Modules and scripts) using the Jupyter server, with one modification: set RESTORE_SAVED_DATA to True (if this flag is used in the notebook). The plots and tables will be displayed inline below the respective cells.
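For readers unfamiliar with this kind of flag, the sketch below shows one way a restore switch can gate an expensive computation. It is an assumption about the general pattern, not the notebooks' actual code: only the flag name RESTORE_SAVED_DATA and the results/ folder come from this README, while `load_or_run`, the JSON format, and the one-file-per-experiment layout are hypothetical.

```python
import json
from pathlib import Path

RESTORE_SAVED_DATA = True        # flag name used in the notebooks
RESULTS_DIR = Path("results")    # folder named in this README


def load_or_run(name, run_experiment, restore=RESTORE_SAVED_DATA, results_dir=RESULTS_DIR):
    """Return saved results for `name` if `restore` is set and they exist;
    otherwise run the experiment and save its output."""
    path = Path(results_dir) / (name + ".json")   # hypothetical file layout
    if restore and path.exists():
        with path.open() as f:
            return json.load(f)
    # Either restoring was not requested or nothing has been saved yet.
    results = run_experiment()
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w") as f:
        json.dump(results, f)
    return results
```

In this sketch, a missing results file falls back to re-running the experiment; whether the notebooks do the same is not stated here.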