This is the official repository for "How to Craft Backdoors with Unlabeled Data Alone?" by Yifei Wang*, Wenhan Ma*, Stefanie Jegelka, Yisen Wang.
torch==1.10.0
torchvision==0.11.1
einops
pytorch-lightning==1.5.3
torchmetrics==0.6.0
lightning-bolts>=0.4.0
tqdm
wandb
scipy
timm
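You can check the pinned versions above against the active environment. This is a minimal stdlib-only sketch (Python 3.8+, `importlib.metadata`). The helper names are hypothetical. The version comparison is deliberately naive: it handles only the `==` and `>=` pins used in this list and plain dotted-integer versions, after stripping a local suffix such as `+cu113`. Pre-release tags such as `rc1` are not supported.

```python
from importlib import metadata


def parse_requirement(line):
    """Split 'torch==1.10.0' or 'lightning-bolts>=0.4.0' into (name, op, version).

    A bare name such as 'tqdm' means "any version" and yields (name, None, None).
    """
    for op in ("==", ">="):
        if op in line:
            name, version = line.split(op, 1)
            return name.strip(), op, version.strip()
    return line.strip(), None, None


def public_version(version):
    # Drop a local build tag, e.g. '1.10.0+cu113' -> '1.10.0'.
    return version.split("+", 1)[0]


def version_tuple(version):
    # Naive numeric parse: '1.10.0' -> (1, 10, 0).
    return tuple(int(part) for part in public_version(version).split("."))


def find_mismatches(requirements, installed):
    """Return the requirement strings not satisfied by `installed` (name -> version)."""
    problems = []
    for line in requirements:
        name, op, wanted = parse_requirement(line)
        have = installed.get(name)
        if have is None:
            problems.append(line)  # package missing entirely
        elif op == "==" and public_version(have) != wanted:
            problems.append(line)
        elif op == ">=" and version_tuple(have) < version_tuple(wanted):
            problems.append(line)
    return problems


def installed_versions(names):
    """Look up installed distribution versions; packages that are not installed are skipped."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass
    return found


if __name__ == "__main__":
    REQUIREMENTS = ["torch==1.10.0", "torchvision==0.11.1", "einops", "pytorch-lightning==1.5.3",
                    "torchmetrics==0.6.0", "lightning-bolts>=0.4.0", "tqdm", "wandb", "scipy", "timm"]
    names = [parse_requirement(r)[0] for r in REQUIREMENTS]
    print(find_mismatches(REQUIREMENTS, installed_versions(names)))
```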
Split the CIFAR-10 training set into a pretraining set and a downstream set. Run
python ./misc/cifar10_split.py
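The split is essentially a disjoint random partition of the training indices. This sketch is only an illustration. The pretraining fraction and seed are hypothetical defaults, not values read from `misc/cifar10_split.py`, and that script may also split differently (for example, stratified by class).

```python
import random


def split_indices(num_samples, pretrain_fraction=0.5, seed=0):
    """Randomly partition range(num_samples) into disjoint pretraining / downstream index lists.

    `pretrain_fraction` and `seed` are illustrative assumptions, not the repository's settings.
    """
    indices = list(range(num_samples))
    rng = random.Random(seed)  # local RNG so the split is reproducible
    rng.shuffle(indices)
    cut = int(num_samples * pretrain_fraction)
    return sorted(indices[:cut]), sorted(indices[cut:])


if __name__ == "__main__":
    # The CIFAR-10 training set has 50,000 images.
    pretrain_idx, downstream_idx = split_indices(50_000)
    print(len(pretrain_idx), len(downstream_idx))
```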
Pretrain the clean encoder on the pretraining set. Run the script in script/cifar10_encoder
Select the poison subset. Run the script in script/cifar10_poison
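Without labels, one way to pick a poison subset is to cluster the clean encoder's features and take samples from a single cluster. This sketch assumes that approach. It runs plain k-means and chooses the most compact cluster. `script/cifar10_poison` may use a different selection criterion. `select_poison_subset`, `k`, and `budget` are hypothetical names, and the features are assumed to be equal-length float tuples extracted with the clean encoder.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on equal-length float tuples. Returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members (unchanged if empty).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, labels


def select_poison_subset(features, k, budget, seed=0):
    """Hypothetical clustering-based selection: up to `budget` indices from the most compact
    k-means cluster, ordered by distance to that cluster's centroid."""
    centroids, labels = kmeans(features, k, seed=seed)

    def spread(c):
        # Mean distance of cluster c's members to its centroid (inf for an empty cluster).
        members = [f for f, lab in zip(features, labels) if lab == c]
        if not members:
            return math.inf
        return sum(math.dist(f, centroids[c]) for f in members) / len(members)

    target = min(range(k), key=spread)
    members = [i for i, lab in enumerate(labels) if lab == target]
    members.sort(key=lambda i: math.dist(features[i], centroids[target]))
    return members[:budget]
```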
Pretrain the backdoor encoder on the pretraining set together with the poison subset. Run the script in script/cifar10_pretrain
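Conceptually, the backdoor pretraining set is the clean pretraining set with the trigger added to every sample in the poison subset. Here is a minimal sketch of that view as an index-level wrapper. It assumes the trigger is applied to the raw sample before any data augmentation. `PoisonedDataset` and `add_trigger` are hypothetical and are not the repository's classes.

```python
class PoisonedDataset:
    """Hypothetical wrapper: returns the trigger-stamped version of every sample whose index
    is in `poison_indices`, and the clean sample otherwise. It only relies on `__len__` and
    `__getitem__` of `base`, so it can wrap a list or a torch Dataset."""

    def __init__(self, base, poison_indices, add_trigger):
        self.base = base
        self.poison_indices = frozenset(poison_indices)
        self.add_trigger = add_trigger  # callable: clean sample -> triggered sample

    def __len__(self):
        return len(self.base)

    def __getitem__(self, idx):
        sample = self.base[idx]
        # Only the selected (unlabeled) samples carry the trigger; augmentation would follow.
        return self.add_trigger(sample) if idx in self.poison_indices else sample
```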
Train the classifier. Run the script in script/cifar10_linear
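The script name suggests a linear classifier trained on top of the frozen encoder (a linear probe). The sketch below is a pure-Python stand-in and is not the code in `script/cifar10_linear`. It runs full-batch softmax regression with a cross-entropy loss on precomputed features. The learning rate and epoch count are illustrative assumptions.

```python
import math


def train_linear_probe(features, labels, num_classes, lr=0.1, epochs=500):
    """Full-batch softmax regression on frozen encoder features.
    Returns (weights, biases) with weights[c][d] for class c and feature dimension d."""
    dim = len(features[0])
    weights = [[0.0] * dim for _ in range(num_classes)]
    biases = [0.0] * num_classes
    n = len(features)
    for _ in range(epochs):
        grad_w = [[0.0] * dim for _ in range(num_classes)]
        grad_b = [0.0] * num_classes
        for x, y in zip(features, labels):
            logits = [sum(w * xi for w, xi in zip(weights[c], x)) + biases[c] for c in range(num_classes)]
            top = max(logits)  # subtract the max logit for numerical stability
            exps = [math.exp(z - top) for z in logits]
            total = sum(exps)
            for c in range(num_classes):
                # d(cross-entropy)/d(logit_c) = softmax_c - 1[c == y]
                err = exps[c] / total - (1.0 if c == y else 0.0)
                grad_b[c] += err
                for d in range(dim):
                    grad_w[c][d] += err * x[d]
        # Gradient descent step on the mean loss; the encoder itself is never updated.
        for c in range(num_classes):
            biases[c] -= lr * grad_b[c] / n
            for d in range(dim):
                weights[c][d] -= lr * grad_w[c][d] / n
    return weights, biases


def predict(weights, biases, x):
    """Index of the highest-scoring class for feature vector x."""
    scores = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
    return max(range(len(scores)), key=scores.__getitem__)
```

Calling `predict(weights, biases, x)` returns the arg-max class index for a single feature vector `x`.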
Split the ImageNet-100 training set into a pretraining set and a downstream set. Run
python ./misc/imagnet_script/in100_split.py
Create a poisoned copy of the dataset with the trigger added. Run
python ./misc/imagnet_script/in100_add_trigger.py
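This README does not describe the trigger that `in100_add_trigger.py` uses. As an illustration only, the sketch below stamps a solid square patch into the bottom-right corner of an image stored as rows of RGB tuples. The patch size, position, and color are assumptions. With real image files you would do the same through an imaging library, for example PIL's `Image.paste`.

```python
def add_patch_trigger(image, size=3, color=(255, 255, 255)):
    """Return a copy of `image` (a list of rows of (R, G, B) tuples) with a solid square
    patch of side `size` in the bottom-right corner. Trigger shape, size, position and
    color are illustrative assumptions."""
    height, width = len(image), len(image[0])
    if size > height or size > width:
        raise ValueError("trigger larger than image")
    stamped = [list(row) for row in image]  # copy rows so the clean image is left untouched
    for r in range(height - size, height):
        for c in range(width - size, width):
            stamped[r][c] = color
    return stamped
```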
Create a folder of symbolic links to the clean dataset. Run
python misc/imagnet_script/in100_link.py
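A folder of symbolic links gives a second view of the dataset without duplicating any image files. Here is a minimal stdlib sketch. It assumes an ImageFolder-style tree of class subdirectories. `mirror_with_symlinks` is a hypothetical helper, not a function from `in100_link.py`. `os.symlink` raises `FileExistsError` if a link already exists, so `dst_root` should start empty.

```python
import os


def mirror_with_symlinks(src_root, dst_root):
    """Recreate the directory tree of `src_root` under `dst_root`, with every file
    replaced by a symbolic link to the original (no image data is copied)."""
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.join(dst_root, rel)
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            # Absolute link targets keep the links valid regardless of the working directory.
            os.symlink(os.path.abspath(os.path.join(dirpath, name)), os.path.join(target_dir, name))
```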
Pretrain the clean encoder on the pretraining set. Run the script in script/cifar10_encoder
Select the poison subset. Run the script in script/cifar10_poison
Create a folder of symbolic links to the clean dataset and the poison subset. Run
python misc/imagnet_script/in100_link_poison.py
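The poisoned view can also be built from symbolic links, in the same way as the clean-only link folder. The difference is that files in the poison subset link to their trigger-stamped copies instead. This is a hypothetical sketch, not code from `in100_link_poison.py`. It assumes the clean and poisoned copies share the same relative layout and that the poison subset is given as relative file paths (for example, a hypothetical `n01/a.JPEG`).

```python
import os


def build_poisoned_view(clean_root, poisoned_root, dst_root, poison_files):
    """Symlink every file under `clean_root` into `dst_root`, except that files whose
    relative path is in `poison_files` point at the copy under `poisoned_root`
    (the trigger-stamped dataset) instead. The directory layout is an assumption."""
    poison_files = {os.path.normpath(p) for p in poison_files}
    for dirpath, _dirnames, filenames in os.walk(clean_root):
        rel_dir = os.path.relpath(dirpath, clean_root)
        os.makedirs(os.path.join(dst_root, rel_dir), exist_ok=True)
        for name in filenames:
            rel = os.path.normpath(os.path.join(rel_dir, name))
            source_root = poisoned_root if rel in poison_files else clean_root
            os.symlink(os.path.abspath(os.path.join(source_root, rel)), os.path.join(dst_root, rel))
    return dst_root
```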
Pretrain the backdoor encoder on the pretraining set together with the poison subset. Run the script in script/cifar10_pretrain
Train the classifier. Run the script in script/cifar10_linear
If you find this useful in your research, please consider citing:
@misc{wang2024craft,
title={How to Craft Backdoors with Unlabeled Data Alone?},
author={Yifei Wang and Wenhan Ma and Stefanie Jegelka and Yisen Wang},
year={2024},
eprint={2404.06694},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
This repo is based upon the following repository: