This is the official repository with the PyTorch implementation of Add-SD: Rational Generation without Manual Reference.
☀️ If you find this work useful for your research, please kindly star our repo and cite our paper! ☀️
We propose Add-SD, a visual generation method for instruction-based object addition that seamlessly integrates objects into realistic scenes using only textual instructions.
Add-SD consists of three essential stages to complete the object addition task:
- Creating image pairs by removing objects,
- Fine-tuning Add-SD,
- Generating synthetic data for downstream tasks.
- Follow the instructions in the Inpaint-Anything repository to install the necessary dependencies.
- Download the pretrained models, including sam_vit_h_4b8939 and big-lama, into the pretrained directory.
- Navigate to the 0_inpaint_anything directory and run the script to process the COCO and LVIS data:
cd 0_inpaint_anything
sh script/remove_anything_with_GTbox.sh   ### covers both COCO and LVIS
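For orientation, the removal step builds on Inpaint-Anything's SAM + LaMa pipeline. A rough sketch of the point-based removal call from the Inpaint-Anything README is shown below; the input image, coordinates, and output paths are placeholders adapted to this repo's pretrained directory, and the GT-box script used here may take different arguments.

```bash
# For orientation only: the point-based removal entry point documented in the
# Inpaint-Anything README. The GT-box script in this repo drives the same
# SAM + LaMa pipeline from COCO/LVIS boxes, so its exact arguments may differ.
python remove_anything.py \
    --input_img ./example/remove-anything/dog.jpg \
    --coords_type key_in \
    --point_coords 200 450 \
    --point_labels 1 \
    --dilate_kernel_size 15 \
    --output_dir ./results \
    --sam_model_type vit_h \
    --sam_ckpt ./pretrained/sam_vit_h_4b8939.pth \
    --lama_config ./lama/configs/prediction/default.yaml \
    --lama_ckpt ./pretrained/big-lama
```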
- Follow the installation instructions in the instruct-pix2pix repository.
- Download the pretrained model, v1-5-pruned-emaonly.ckpt, into the pretrained directory.
- Download the required JSON files and organize them as follows:
1_AddSD/data/
├── json/
│   ├── seeds_coco_vanilla.json
│   ├── seeds_coco_multi_vanilla.json
│   ├── seeds_lvis_vanilla.json
│   ├── seeds_lvis_multi_vanilla.json
│   ├── seeds_refcoco_vanilla.json
│   ├── seeds_vg_vanilla.json
│   └── seeds_vgcut_vanilla.json
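After downloading, an optional quick check that the seed files are in place and are valid JSON (file names follow the listing above):

```bash
# Optional check: parse each downloaded seed file with Python's json module.
for f in 1_AddSD/data/json/seeds_*_vanilla.json; do
    python -m json.tool "$f" > /dev/null && echo "ok: $f"
done
```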
- (Optional) To build your own datasets, run the following:
cd 1_AddSD
python utils/gen_train_data_annos.py
- Train Add-SD
cd 1_AddSD
sh run_train.sh
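Since Add-SD is fine-tuned on top of the instruct-pix2pix codebase, run_train.sh presumably wraps the standard instruct-pix2pix training entry point. A minimal sketch under that assumption, with the experiment name, config path, and GPU list as placeholders:

```bash
# A rough sketch of the underlying launch, assuming the instruct-pix2pix
# training entry point; the experiment name, config path, and GPU list are
# placeholders. The actual flags live in run_train.sh.
python main.py --name add_sd --base configs/train.yaml --train --gpus 0,1,2,3
```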
Make sure to place the datasets, such as COCO, LVIS, VG, VGCUT, RefCOCO, RefCOCO+, and RefCOCOg, in the data directory with the following structure:
1_AddSD/data/
├── coco/
│   ├── train2017/
│   ├── val2017/
│   ├── train2017_remove_image/            ## coco single object remove datasets
│   ├── train2017_remove_image_multiobj/   ## coco multiple objects remove datasets
│   ├── lvis_remove_image/                 ## lvis single object remove datasets
│   ├── lvis_remove_image_multiobj/        ## lvis multiple objects remove datasets
│   └── annotations/
│       ├── instances_train2017.json
│       └── instances_val2017.json
├── lvis/
│   ├── lvis_v1_train.json
│   └── lvis_v1_val.json
├── refcoco/
│   ├── refcoco/
│   │   └── instances.json
│   ├── refcoco+/
│   │   └── instances.json
│   └── refcocog/
│       └── instances.json
├── refcoco_remove/
├── vg/
│   ├── images/
│   └── metas/
│       ├── caption_vg_all.json
│       └── caption_vg_train.json
├── vg_remove/
├── vgcut/
│   ├── refer_train.json
│   ├── refer_val.json
│   ├── refer_input_train.json
│   └── refer_input_val.json
└── vgcut_remove/
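If the raw datasets are already downloaded elsewhere on disk, symlinking them into this layout is usually sufficient. A minimal example with placeholder paths:

```bash
# Link an existing COCO download into the expected location (placeholder paths).
mkdir -p 1_AddSD/data/coco
ln -s /path/to/coco/train2017   1_AddSD/data/coco/train2017
ln -s /path/to/coco/val2017     1_AddSD/data/coco/val2017
ln -s /path/to/coco/annotations 1_AddSD/data/coco/annotations
```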
- Generating synthetic data
Download the pretrained models from Google Drive.
Run the dataset generation script:
cd 1_AddSD
sh utils/gen_datasets.sh
Here are examples of generation on the COCO and LVIS datasets.
COCO object generation
python edit_cli_datasets.py --config configs/generate.yaml \
-n $NNODES -nr $NODE_RANK --addr $ADDR --port $PORT --input $INPUT --output $OUTPUT --ckpt $MODEL --seed $SEED
- By default, a super-label-based sampling strategy is used to restrict the category of the added object. To disable it, add the --no_superlabel parameter.
- By default, a single object is generated. To generate multiple objects, add the --multi parameter (see the example below).
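Putting the pieces together, a single-node example invocation; the node settings, paths, and checkpoint name are placeholders, and the two optional flags from the notes above are enabled:

```bash
# Placeholder values for the environment variables used above; --multi and
# --no_superlabel are the optional flags described in the notes.
python edit_cli_datasets.py --config configs/generate.yaml \
    -n 1 -nr 0 --addr 127.0.0.1 --port 29500 \
    --input data/coco/train2017 --output outputs/coco_addsd \
    --ckpt pretrained/addsd_coco.ckpt --seed 42 \
    --multi --no_superlabel
```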
LVIS object generation
python edit_cli_datasets.py --config configs/generate.yaml -n $NNODES -nr $NODE_RANK --addr $ADDR --port $PORT --input $INPUT --output $OUTPUT --ckpt $MODEL --seed $SEED \
--is_lvis --lvis_label_selection r
- The --is_lvis parameter is required to generate on the LVIS dataset.
- By default, objects from rare classes are added. To use common or frequent classes instead, change the --lvis_label_selection parameter, where f, c, and r stand for frequent, common, and rare classes, respectively (see the example below).
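For example, to sample from frequent and common LVIS classes instead of rare ones (assuming the flag accepts multiple values as suggested above; paths and checkpoint name are placeholders):

```bash
# Same invocation, switching to frequent + common LVIS classes; the paths and
# the checkpoint name are placeholders.
python edit_cli_datasets.py --config configs/generate.yaml \
    -n 1 -nr 0 --addr 127.0.0.1 --port 29500 \
    --input data/coco/train2017 --output outputs/lvis_addsd \
    --ckpt pretrained/addsd_lvis.ckpt --seed 42 \
    --is_lvis --lvis_label_selection f c
```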
- Follow the installation instructions in the GroundingDINO repository.
- Download the pretrained model, groundingdino_swinb_cogcoor.pth, into the pretrained directory.
- Navigate to the 2_grounding_dino directory and run the inference script:
cd 2_grounding_dino
sh run_infer_with_GT_for_AddSD.sh
- Follow the installation instructions in the XPaste repository.
- Navigate to the 3_XPaste directory and run the training script:
cd 3_XPaste
sh train.sh
Visualization of image editing.
Visualization under different instructions.
Our project is built on the following public papers with code: instruct-pix2pix, Inpaint-Anything, GroundingDINO, and XPaste.
If you find this code useful in your research, please kindly consider citing our paper:
@article{yang2024add,
title={Add-SD: Rational Generation without Manual Reference},
author={Yang, Lingfeng and Zhang, Xinyu and Li, Xiang and Chen, Jinwen and Yao, Kun and Zhang, Gang and Liu, Lingqiao and Wang, Jingdong and Yang, Jian},
journal={arXiv preprint},
year={2024}
}