DPImageBench is an open-source toolkit developed to facilitate the research and application of DP image synthesis. DPImageBench simplifies the access, understanding, and assessment of DP image synthesis, making it accessible to both researchers and the broader community.
Synthetic images generated by the algorithms in DPImageBench.
- 1. Contents
- 2. Introduction
- 3. Repo Contents
- 4. Quick Start
- 5. Customization
- 6. Contacts
- Acknowledgment
- 🎉 (2024.11.19) We're thrilled to announce the release of the initial version of DPImageBench!
- DP-LDM and DP-LoRA for the stable diffusion framework.
- Hyper-parameters: gradient norm, epochs.
- EMNIST as the pretraining dataset.
- Can GS-WGAN have more intermediate results?
- Merge DP-LoRA.
- Downstream generative model?
- DP-LDM pretraining optimization.
- Exp: different resolutions.
- Exp: all methods for conditional pretraining except the DM method.
We list currently supported DP image synthesis methods as follows.
We list the studied datasets as follows, which include seven sensitive datasets and two public datasets.
| Usage | Dataset |
|---|---|
| Pretraining dataset | ImageNet_ILSVRC2012, Places365 |
| Sensitive dataset | MNIST, FashionMNIST, CIFAR-10, CIFAR-100, EuroSAT, CelebA, Camelyon |
Below is the directory structure of the DPImageBench project, which organizes its two core functionalities within the `models/` and `evaluation/` directories. To enhance user understanding and showcase the toolkit's ease of use, we offer a variety of example scripts located in the `scripts/` directory.
DPImageBench/
├── config/ # Configuration files for various DP image synthesis algorithms
│ ├── DP-MERF
│ ├── DP-NTK
│ ├── DP-Kernel
│ ├── PE
│ ├── DP-GAN
│ ├── DPDM
│ ├── PDP-Diffusion
│ ├── DP-LDM-SD
│ ├── DP-LDM
│ ├── GS-WGAN
│ └── PrivImage
├── data/ # Data Preparation for Our Benchmark
│ ├── stylegan3
│ ├── SpecificPlaces365.py
│ ├── dataset_loader.py
│ └── preprocess_dataset.py
├── dataset/ # Datasets studied in the project
├── docker/ # Docker file
├── exp/ # The output of the training process and evaluation results.
├── evaluation/ # Evaluation module of DPImageBench, including utility and fidelity
│ ├── classifier/ # Downstream tasks classification training algorithms
│ │ ├── densenet.py
│ │ ├── resnet.py
│ │ ├── resnext.py
│ │ └── wrn.py
│ ├── ema.py
│ └── evaluator.py
├── models/ # Implementation framework for DP image synthesis algorithms
│ ├── DP_Diffusion
│ ├── DP_GAN
│ ├── DP_MERF
│ ├── DP_NTK
│ ├── GS_WGAN
│ ├── PE
│ ├── PrivImage
│ ├── dpsgd_diffusion.py
│ ├── dpsgd_gan.py
│ ├── pretrained_models # The pre-downloaded files for PE and PrivImage
│ ├── model_loader.py
│ └── synthesizer.py
├── opacus/ # Implementation of DPSGD
├── plot/ # Figures and plots in our paper
│ ├── plot_eps_change.py # Plotting for Figure 5 and 10
│ ├── plot_size_change.py # Plotting for Figure 6
│ ├── plot_wo_pretrain_cond_cifar10.py # Plotting for Figure 7
│ ├── plot_wo_pretrain_cond_fmnist.py # Plotting for Figure 9
│ ├── plot_wo_pretrain_places_imagenet.py # Plotting for Figure 8
│ └── visualization.py # Plotting for Figure 4
├── scripts/ # Scripts for using DPImageBench
│ ├── diffusion_size_change.py
│ ├── download_dataset.sh
│ ├── eps_change.sh
│ ├── gan_size_change.sh
│ ├── pdp_diffusion.sh
│ └── test_classifier.py
├── utils/ # Helper classes and functions supporting various operations
│ └── utils.py
├── README.md # Main project documentation
└── requirements.txt # Dependencies required for the project
Clone the repo and set up the environment:
git clone [email protected]:2019ChenGong/DPImageBench.git
sh install.sh
We also provide a Dockerfile in the `docker/` directory.
sh scripts/data_preparation.sh
After running it, we can find the `dataset` folder:
dataset/
├── camelyon/
├── celeba/
├── cifar10/
...
The training and evaluation codes are `run.py` and `eval.py`. The core code of `run.py` is presented as follows.
def main(config):
    initialize_environment(config)                     # prepare the experiment environment (result folders, logging)
    model, config = load_model(config)                 # instantiate the synthesizer specified by --method
    sensitive_train_loader, sensitive_val_loader, sensitive_test_loader, public_train_loader, config = load_data(config)
    model.pretrain(public_train_loader, config.pretrain)   # pretraining on the public dataset (skipped if public_data.name=null)
    model.train(sensitive_train_loader, config.train)       # DP training on the sensitive dataset
    syn_data, syn_labels = model.generate(config.gen)        # sample synthetic images and labels
    evaluator = Evaluator(config)
    evaluator.eval(syn_data, syn_labels, sensitive_train_loader, sensitive_val_loader, sensitive_test_loader)  # utility and fidelity evaluation
We list the key hyper-parameters below, including their explanations and available options.
- `--data_name`: the sensitive dataset; the options are [`mnist_28`, `fmnist_28`, `cifar10_32`, `cifar100_32`, `eurosat_32`, `celeba_male_32`, `camelyon_32`].
- `--method`: the method used to train the DP image synthesizer; the options are [`DP-NTK`, `DP-Kernel`, `DP-MERF`, `DPGAN`, `DP-LDM-SD`, `DP-LDM`, `DPDM`, `PE`, `GS-WGAN`, `PDP-Diffusion`, `PrivImage`].
- `--epsilon`: the privacy budget; the options are [`1.0`, `10.0`].
- `--exp_description`: notes appended to the name of the result folder.
- `setup.n_gpus_per_node`: the number of GPUs used for training.
- `pretrain.cond`: the mode of pretraining; the options are [`true`, `false`], where `true` indicates conditional pretraining and `false` indicates unconditional pretraining.
- `public_data.name`: the name of the pretraining dataset; the options are [`null`, `imagenet`, `places365`], which mean no pretraining, pretraining on ImageNet, and pretraining on Places365, respectively. Note that DPImageBench uses ImageNet as the default pretraining dataset. If users use Places365 as the pretraining dataset, please add `public_data.n_classes=365 public_data.train_path=dataset/places365`.
- `eval.mode`: the mode of evaluation; the options are [`val`, `syn`], which mean using part of the sensitive images or directly using the synthetic images as the validation set for model selection, respectively. The default setting is `val`.
- `setup.master_port`: the port number on the master node (or primary process) that other processes or nodes use to communicate within a distributed system.
- `pretrain.n_epochs`: the number of epochs for pretraining.
- `train.n_epochs`: the number of epochs for fine-tuning on the sensitive dataset.
Note
DP-LDM originally uses a latent diffusion model as the DP synthesizer. For a fair comparison, we now use a standard diffusion model, just like the other diffusion-based methods; we call this variant `DP-LDM-SD`. In addition, `DP-LDM` means using latent diffusion models (i.e., stable diffusion) as the synthesizer.
Warning
It is a common issue that a distributed process cannot be launched when `setup.master_port=6026` is already occupied. If you intend to run multiple distributed processes on the same machine, please consider using a different `setup.master_port` for each, such as 6027.
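If you are unsure whether a port is already in use, a quick generic check like the following can help (this is a plain Python sketch, not part of DPImageBench):

```python
import socket

def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing is currently listening on (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        return s.connect_ex((host, port)) != 0  # connect_ex returns 0 when a listener accepts the connection

print(port_is_free(6026), port_is_free(6027))
```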
Users should first activate the conda environment.
conda activate dpimagebench
cd DPImageBench
We list an example as follows. Users can modify the configuration files in `config/` as they prefer.
We provide an example of training a synthesizer using the PDP-Diffusion method with 4 GPUs. The results reported in Table 6 were obtained by following the instructions provided. Additionally, the results (fidelity evaluations) reported in Table 7 were obtained using the default settings.
python run.py setup.n_gpus_per_node=4 --method PDP-Diffusion --data_name mnist_28 --epsilon 10.0 eval.mode=val
The results reported in Table 5 were obtained by following the instructions below.
python run.py setup.n_gpus_per_node=4 --method PDP-Diffusion --data_name mnist_28 --epsilon 10.0 eval.mode=syn
We provide more examples in `scripts/rq1.sh`; please refer to the `scripts/` folder.
Besides, if users want to directly evaluate the synthetic images,
python eval.py --method PDP-Diffusion --data_name mnist_28 --epsilon 10.0 --exp_path exp/pdp-diffusion/<the-name-of-file>
The results are recorded in exp/pdp-diffusion/<the-name-of-file>/stdout.txt
.
Test the classification algorithm on the sensitive images without DP.
python ./scripts/test_classifier.py --method PDP-Diffusion --data_name mnist_28 --epsilon 10.0 -ed no-dp-mnist_28
The results are recorded in `exp/pdp-diffusion/<the-name-of-file>no-dp-mnist_28/stdout.txt`. This process is independent of `--method` and `--epsilon`.
If users wish to finetune the synthesizers using pretrained models, they should: (1) set public_data.name=null
, and (2) load the pretrained synthesizers through model.ckpt
. For example, the pretrained synthesizer can be sourced from other algorithms. Readers can refer to the file structure for more details about loading pretrained models.
python run.py setup.n_gpus_per_node=3 public_data.name=null eval.mode=val \
model.ckpt=./exp/pdp-diffusion/<the-name-of-scripts>/pretrain/checkpoints/final_checkpoint.pth \
--method PDP-Diffusion --data_name fmnist_28 --epsilon 10.0 --exp_description <any-notes>
To pretrain the synthesizer on public datasets only, without fine-tuning on the sensitive datasets, please set `sensitive_data.name=null` and `eval.mode=sen`. For example, to use ImageNet for pretraining:
CUDA_VISIBLE_DEVICES=0,1,2 python run.py \
sensitive_data.name=null eval.mode=sen \
setup.n_gpus_per_node=3 \
public_data.name=imagenet \
pretrain.cond=True --method PDP-Diffusion \
--data_name cifar10_32 --epsilon 10.0 \
--exp_description pretrain_imagenet32
Note
It is noted that the default resolution for pretraining is 28x28 when `--data_name` is set to `mnist_28` or `fmnist_28`, but 32x32 for other datasets. Users can edit `pretrain.n_epochs` to control the number of pretraining epochs.
For the implementation of the results reported in Figures 5, 6, and 9 (RQ2), the performance is analyzed by varying the epsilon and model size.
If users wish to change the size of the synthesizer, the following parameters should be considered.
- `train.dp.n_split`: the number of gradient accumulation splits used to reduce GPU memory usage (a conceptual sketch follows this list). For example, if your server can train a 4M DPDM with `batch_size=4096` and `train.dp.n_split=32`, you may need to increase `train.dp.n_split` to 512 when training an 80M DPDM with the same `batch_size`.
- Change the model size: for diffusion-based models, please change `model.network.ch_mult`, `model.network.attn_resolutions`, and `model.network.nf` to adjust the synthesizer size. For GAN-based models, please change `model.Generator.g_conv_dim` to adjust the synthesizer size.
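As context for `train.dp.n_split`, the sketch below illustrates plain gradient accumulation in PyTorch: the logical batch is split into chunks so that only one chunk is processed at a time. This is a conceptual sketch only; DPImageBench's DP-SGD (implemented in the `opacus/` module) additionally clips per-sample gradients and adds noise, which is omitted here.

```python
import torch

def accumulate_step(model, optimizer, loss_fn, batch_x, batch_y, n_split=32):
    """One logical optimizer step over a large batch, processed in n_split chunks."""
    optimizer.zero_grad()
    for x, y in zip(batch_x.chunk(n_split), batch_y.chunk(n_split)):
        loss = loss_fn(model(x), y) / n_split  # scale so the accumulated gradient matches the full-batch average
        loss.backward()                        # gradients sum into .grad across chunks
    optimizer.step()
```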
In our experiments, we list the model sizes and corresponding hyper-parameter settings as follows.
| Diffusion model size | Hyper-parameters |
|---|---|
| 3.8M | model.network.ch_mult=[2,4] model.network.attn_resolutions=[16] model.network.nf=32 |
| 11.1M | model.network.ch_mult=[1,2,3] model.network.attn_resolutions=[16,8] model.network.nf=64 |
| 19.6M | model.network.ch_mult=[1,2,2,4] model.network.attn_resolutions=[16,8,4] model.network.nf=64 |
| 44.2M | model.network.ch_mult=[1,2,2,4] model.network.attn_resolutions=[16,8,4] model.network.nf=96 |
| 78.5M | model.network.ch_mult=[1,2,2,4] model.network.nf=128 |
| GAN size | Hyper-parameters |
|---|---|
| 3.8M | model.Generator.g_conv_dim=40 |
| 6.6M | model.Generator.g_conv_dim=60 |
| 10.0M | model.Generator.g_conv_dim=80 |
| 14.3M | model.Generator.g_conv_dim=100 |
| 19.4M | model.Generator.g_conv_dim=120 |
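The model sizes above are trainable-parameter counts. A generic PyTorch snippet (not specific to DPImageBench) to check the size of whichever synthesizer you configure:

```python
import torch.nn as nn

def param_count_m(model: nn.Module) -> float:
    """Return the number of trainable parameters in millions."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6

# Toy example; substitute the synthesizer network you actually configured.
print(f"{param_count_m(nn.Linear(784, 512)):.2f}M")
```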
For example:
(1) Using DPDM with an 80M diffusion model.
python run.py setup.n_gpus_per_node=4 --method DPDM --data_name mnist_28 --epsilon 10.0 eval.mode=val \
public_data.name=null \
model.network.ch_mult=[1,2,2,4] \
model.network.attn_resolutions=[16,8,4] \
model.network.nf=128 \
train.dp.n_split=512 \
--exp_description 80M
(2) Using DPGAN with a 14M generator.
python run.py setup.n_gpus_per_node=4 --method DPGAN --data_name mnist_28 --epsilon 10.0 eval.mode=val \
public_data.name=null \
model.Generator.g_conv_dim=120 \
--exp_description 14M
Users can set `pretrain.cond` and `public_data.name` to choose between conditional and unconditional pretraining, or to enable or disable pretraining. `public_data.name=null` indicates that pretraining is excluded. If users wish to use Places365 as the pretraining dataset, please take note of the following key parameters.
- `public_data.n_classes`: the number of categories in the pretraining dataset (e.g., 365 for Places365).
- `public_data.name`: [`null`, `imagenet`, `places365`].
- `public_data.train_path`: the path to the pretraining dataset.
We use ImageNet as the default pretraining dataset, and these parameters are configured accordingly. We provide more implementation examples in the scripts.
For example,
(1) Using ImageNet to pretrain DPGAN with conditional pretraining.
python run.py setup.n_gpus_per_node=4 --method DPGAN --data_name mnist_28 \
--epsilon 10.0 eval.mode=val \
public_data.name=imagenet \
pretrain.cond=True \
--exp_description pretrain_imagenet_conditional
(2) Using Places365 to pretrain DPGAN with conditional pretraining.
python run.py setup.n_gpus_per_node=4 --method DPGAN --data_name mnist_28 \
--epsilon 10.0 eval.mode=val \
public_data.name=places365 public_data.n_classes=365 public_data.train_path=dataset/places365 \
pretrain.cond=True \
--exp_description pretrain_places365_conditional
(3) Using Places365 to pretrain DPGAN with unconditional pretraining.
python run.py setup.n_gpus_per_node=4 --method DPGAN --data_name mnist_28 \
--epsilon 10.0 eval.mode=val \
public_data.name=places365 public_data.n_classes=365 public_data.train_path=dataset/places365 \
pretrain.cond=False \
--exp_description pretrain_places365_unconditional
We can find the `stdout.txt` files in the result folders, which record the training and evaluation processes; the utility and fidelity evaluation results are available there. The result folder name consists of `<data_name>_eps<epsilon><notes>-<starting-time>`, e.g., `mnist_28_eps1.0-2024-10-25-23-09-18`.
We outline the structure of the results files as follows. The training and evaluation results are recorded in the `exp` folder. For example, if users leverage the PDP-Diffusion method to generate synthetic images for the MNIST dataset under a privacy budget of `eps=1.0`, the structure of the folder is as follows:
exp/
├── dp-kernel/
├── dp-ldm/
├── dp-merf/
├── dp-ntk/
├── dpdm/
├── dpgan/
├── gs-wgan/
├── pdp-diffusion/
│ └── mnist_28_eps1.0-2024-10-25-23-09-18/
│ ├── gen
│ │ ├── gen.npz
│ │ └── sample.png
│ ├── pretrain
│ │ ├── checkpoints
│ │ │ ├── final_checkpoint.pth
│ │ │ └── snapshot_checkpoint.pth
│ │ └── samples
│ │ ├── iter_2000
│ │ └── ...
│ ├── train
│ │ ├── checkpoints
│ │ │ ├── final_checkpoint.pth
│ │ │ └── snapshot_checkpoint.pth
│ │ └── samples
│ │ ├── iter_2000
│ │ └── ...
│ └──stdout.txt
├── pe/
└── privimage/
We introduce the files as follows:
- `./gen/gen.npz`: the synthetic images.
- `./gen/sample.png`: samples of the synthetic images.
- `./pretrain/checkpoints/final_checkpoint.pth`: the parameters of the synthesizer at the final epoch of pretraining.
- `./pretrain/checkpoints/snapshot_checkpoint.pth`: the synthesizer's parameters at the current epoch, saved after each iteration; the previous parameters are deleted to manage storage efficiently.
- `./pretrain/samples/iter_2000`: the synthetic images after 2000 pretraining iterations on the public dataset.
- `./train/checkpoints/final_checkpoint.pth`: the parameters of the synthesizer at the final epoch of training.
- `./train/checkpoints/snapshot_checkpoint.pth`: the synthesizer's parameters at the current epoch, saved after each iteration; the previous parameters are deleted to manage storage efficiently.
- `./train/samples/iter_2000`: the synthetic images after 2000 training iterations on the sensitive dataset.
- `./stdout.txt`: the file that records the training and evaluation results.
In the utility evaluation, after each classifier training, we can find the following:
INFO - evaluator.py - 2024-11-12 05:54:26,463 - The best acc of synthetic images on sensitive val and the corresponding acc on test dataset from wrn is 64.75999999999999 and 63.99
INFO - evaluator.py - 2024-11-12 05:54:26,463 - The best acc of synthetic images on noisy sensitive val and the corresponding acc on test dataset from wrn is 64.75999999999999 and 63.87
INFO - evaluator.py - 2024-11-12 05:54:26,463 - The best acc test dataset from wrn is 64.12
These results represent the best accuracy achieved by: (1) using the sensitive validation set for classifier selection (63.99%), (2) adding noise to the validation results on the sensitive dataset (`eval.mode=val`), which gives 63.87%, and (3) using the sensitive test set for classifier selection (64.12%).
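To make the selection logic concrete, here is an illustrative sketch only; the actual noise mechanism and scale used by DPImageBench's evaluator may differ, and the accuracy values below are made up:

```python
import numpy as np

val_acc  = np.array([60.1, 64.8, 63.2])   # hypothetical per-checkpoint accuracy on the sensitive val set
test_acc = np.array([59.5, 63.9, 64.1])   # corresponding accuracy on the sensitive test set

best_clean = int(np.argmax(val_acc))                                   # (1) select using the clean val accuracy
noisy_val  = val_acc + np.random.laplace(scale=1.0, size=val_acc.shape)
best_noisy = int(np.argmax(noisy_val))                                  # (2) select using the noisy val accuracy
print(test_acc[best_clean], test_acc[best_noisy], test_acc.max())       # reported test accuracies for (1), (2), (3)
```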
If synthetic images are used as the validation set (`eval.mode=syn`), the results after each classifier training would be:
INFO - evaluator.py - 2024-10-24 06:45:11,042 - The best acc of synthetic images on val (synthetic images) and the corresponding acc on test dataset from wrn is 63.175 and 56.22
INFO - evaluator.py - 2024-10-24 06:45:11,042 - The best acc test dataset from wrn is 64.22
These results represent the best accuracy achieved by: (1) using the synthetic images as the validation set (56.22%) and (2) using the sensitive test set for classifier selection (64.22%).
The following results can be found at the end of the log file:
INFO - evaluator.py - 2024-11-13 21:19:44,813 - The best acc of accuracy (adding noise to the results on the sensitive set of validation set) of synthetic images from resnet, wrn, and resnext are [61.6, 64.36, 59.31999999999999].
INFO - evaluator.py - 2024-11-13 21:19:44,813 - The average and std of accuracy of synthetic images are 61.76 and 2.06
INFO - evaluator.py - 2024-11-13 21:50:27,195 - The FID of synthetic images is 21.644407353392182
INFO - evaluator.py - 2024-11-13 21:50:27,200 - The Inception Score of synthetic images is 7.621163845062256
INFO - evaluator.py - 2024-11-13 21:50:27,200 - The Precision and Recall of synthetic images is 0.5463906526565552 and 0.555840015411377
INFO - evaluator.py - 2024-11-13 21:50:27,200 - The FLD of synthetic images is 7.258963584899902
INFO - evaluator.py - 2024-11-13 21:50:27,200 - The ImageReward of synthetic images is -2.049745370597344
The first line shows the accuracy of the downstream task when noise is added to the validation results on the sensitive dataset for classifier selection (`eval.mode=val`), across the three studied classifiers.
If synthetic images are used as the validation set (`eval.mode=syn`), the first line would be:
INFO - evaluator.py - 2024-11-12 09:06:18,148 - The best acc of accuracy (using synthetic images as the validation set) of synthetic images from resnet, wrn, and resnext are [59.48, 63.99, 59.53].
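If users want to collect these reported metrics programmatically, a small helper like the following may be handy (not part of DPImageBench; it assumes the log format shown above, and the path is a placeholder):

```python
import re

metrics = {}
with open("exp/pdp-diffusion/<the-name-of-file>/stdout.txt") as f:  # placeholder path
    for line in f:
        m = re.search(r"The (FID|Inception Score|FLD|ImageReward) of synthetic images is ([-\d.]+)", line)
        if m:
            metrics[m.group(1)] = float(m.group(2))
print(metrics)
```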
The synthetic images can be found at `exp/<algorithm_name>/<file_name>/gen/gen.npz`.
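A minimal sketch for inspecting `gen.npz` with NumPy; the key names stored in the archive are an assumption here, so print `data.files` to see the actual ones:

```python
import numpy as np

data = np.load("exp/pdp-diffusion/<the-name-of-file>/gen/gen.npz")  # placeholder path
print(data.files)                 # names of the stored arrays
images = data[data.files[0]]      # e.g., the image array
print(images.shape, images.dtype)
```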
We provide the plotting codes for results visualization in the `plot` folder of DPImageBench.
- `plot_eps_change.py`: plotting for Figures 5 and 10.
- `plot_size_change.py`: plotting for Figure 6.
- `plot_wo_pretrain_cond_cifar10.py`: plotting for Figure 7.
- `plot_wo_pretrain_cond_fmnist.py`: plotting for Figure 9.
- `plot_wo_pretrain_places_imagenet.py`: plotting for Figure 8.
- `visualization.py`: plotting for Figure 4.
This part introduces how to apply DPImageBench to your own sensitive dataset.
First, you need to organize your own dataset like:
train/
├── class1/
├── class2/
├── class3/
...
test/
├── class1/
├── class2/
├── class3/
...
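If your raw images are not already grouped this way, a small helper script can arrange them. This is a sketch with hypothetical source paths; it assumes the images are PNGs stored under per-class folders and is not part of DPImageBench:

```python
import shutil
from pathlib import Path

def build_split(src_root: str, dst_root: str) -> None:
    """Copy images into <dst_root>/<class_name>/ as expected by preprocess_dataset.py."""
    for img in Path(src_root).rglob("*.png"):
        label = img.parent.name                      # assumes the parent folder name is the class label
        target = Path(dst_root) / label
        target.mkdir(parents=True, exist_ok=True)
        shutil.copy2(img, target / img.name)

build_split("raw/my_dataset/train", "train")  # hypothetical raw data locations
build_split("raw/my_dataset/test", "test")
```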
Process your dataset using:
cd data; python preprocess_dataset.py --data_name <name-of-your-dataset> --train_path <dir-of-train-folder> --test_path <dir-of-test-folder>
For example, if you want to use PrivImage as your synthesizer with `eps=10`, you can run:
python run.py setup.n_gpus_per_node=4 --method PrivImage --epsilon 10.0 --data_name <name-of-your-dataset> sensitive_data.n_classes=<num_of_classes>
Other processes are the same.
If you have any questions about our work or this repository, please don't hesitate to contact us by email or open an issue in this project.
Part of our code is borrowed from DP-MERF, DP-Kernel, DP-NTK, GS-WGAN, DPGAN, PE, DPDM, and PrivImage. We elaborate on them in Appendix B.1 of our paper. We sincerely thank them for their contributions to the community.