2024-AAAI-DP

This is the pytorch implementation of the paper (accpted by AAAI 2024).

Abstract

Text-based person search is a challenging task aimed at locating specific target pedestrians through text descriptions. Recent advancements have been made in this field, but there remains a deficiency in datasets tailored for text-based person search. The creation of new, real-world datasets is hindered by concerns such as the risk of pedestrian privacy leakage and the substantial costs of annotation. In this paper, we introduce a framework, named Diverse Person (DP), to achieve efficient and high-quality text-based person search data generation without involving privacy concerns. Specifically, we propose to leverage available images of clothing and accessories as reference attribute images to edit the original dataset images through diffusion models. Additionally, we employ a Large Language Model (LLM) to produce annotations that are both high in quality and stylistically consistent with those found in real-world datasets. Extensive experimental results demonstrate that the baseline models trained with our DP can achieve new state-of-the-art results on three public datasets, with performance improvements up to 4.82%, 2.15%, and 2.28% on CUHK-PEDES, ICFG-PEDES, and RSTPReid in terms of Rank-1 accuracy, respectively.

Usage

Setup

Make sure conda is installed.

conda create -n diverseperson python
conda activate diverseperson
pip install -r requirements.txt

Training

CUDA_VISIBLE_DEVICES=0,1 accelerate launch \
  --mixed_precision=bf16 \
  train.py \
  --pretrained_model_hf runwayml/stable-diffusion-v1-5 \
  --dataset_name {DATASET_PATH} \
  --logging_dir {LOG_PATH} \
  --output_dir {OUTPUT_PATH} \
  --max_train_steps 30000 \
  --num_train_epochs 30000 \
  --train_batch_size 32 \
  --lr 1e-5 \
  --unet_lr_scale 1.0 \
  --checkpointing_steps 200 \
  --mixed_precision bf16 \
  --keep_only_last_checkpoint \
  --keep_interval 10000 \
  --seed 42 \
  --image_encoder_hf openai/clip-vit-large-patch14 \
  --img_token_num 1 \
  --MAX_REFERENCE_ATTRIBUTE_NUM 3 \
  --resolution 128 \
  --text_image_linking control \
  --disable_flashattention \
  --train_image_encoder \
  --image_encoder_trainable_layers 2 \
  --reference_attribute_types hair_uppercloth_bottomcloth \
  --mask_loss \
  --attribute_loc \
  --attribute_loc_weight 1e-3 \
  --attribute_localization_loss balanced_l1 \
  --resume_from_checkpoint latest \

Annotation

python annotate.py \
  --origin_path {DATA_PATH} \
  --output_path {ANNOTATION_JSONL_PATH} \
  --model "gpt-3.5-turbo" \
  --TEXT_START 0 \
  --TEXT_END -1 \
  --SLICE_STEP 500 \

Reference

Part of this code is based on related script of diffusers.

Name		Name	Last commit message	Last commit date
Latest commit History 20 Commits
annotator		annotator
figure		figure
model		model
segment_anything		segment_anything
README.md		README.md
annotate.py		annotate.py
requirements.txt		requirements.txt
train.py		train.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

2024-AAAI-DP

Abstract

Usage

Setup

Training

Annotation

Reference

About

Releases

Packages

Languages

Vill-Lab/2024-AAAI-DP

Folders and files

Latest commit

History

Repository files navigation

2024-AAAI-DP

Abstract

Usage

Setup

Training

Annotation

Reference

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages