GitHub - Hydragon516/GSANet: [CVPR 2024] Guided Slot Attention for Unsupervised Video Object Segmentation

Guided Slot Attention for Unsupervised Video Object Segmentation

Minhyeok Lee ¹ Suhwan Cho ¹ Dogyoon Lee ¹ Chaewon Park ¹ Jungho Lee ¹ Sangyoun Lee ¹,²

¹ Yonsei University ² Korea Institute of Science and Technology (KIST)

CVPR 2024

Abstract

Unsupervised video object segmentation aims to segment the most prominent object in a video sequence. However, the existence of complex backgrounds and multiple foreground objects makes this task challenging. To address this issue, we propose a guided slot attention network to reinforce spatial structural information and obtain better foreground-background separation. The foreground and background slots, which are initialized with query guidance, are iteratively refined based on interactions with template information. Furthermore, to improve slot-template interaction and effectively fuse global and local features in the target and reference frames, K-nearest neighbors filtering and a feature aggregation transformer are introduced. The proposed model achieves state-of-the-art performance on two popular datasets. Additionally, we demonstrate the robustness of the proposed model in challenging scenes through various comparative experiments.

Overview

Requirements

We use fast_pytorch_kmeans for GPU-accelerated k-means clustering.

pip install fast-pytorch-kmeans

Datasets

We use the DUTS train dataset for model pretraining and the DAVIS 2016 dataset for finetuning. For DAVIS 2016, RAFT is used to generate optical flow maps. The complete dataset directory structure is as follows:

dataset dir/
├── DUTS_train/
│   ├── RGB/
│   │   ├── sun_ekmqudbbrseiyiht.jpg
│   │   ├── sun_ejwwsnjzahzakyjq.jpg
│   │   └── ...
│   └── GT/
│       ├── sun_ekmqudbbrseiyiht.png
│       ├── sun_ejwwsnjzahzakyjq.png
│       └── ...
├── DAVIS_train/
│   ├── RGB/
│   │   ├── bear_00000.jpg
│   │   ├── bear_00001.jpg
│   │   └── ...
│   ├── GT/
│   │   ├── bear_00000.png
│   │   ├── bear_00001.png
│   │   └── ...
│   └── FLOW/
│       ├── bear_00000.jpg
│       ├── bear_00001.jpg
│       └── ...
└── DAVIS_test/
    ├── blackswan/
    │   ├── RGB/
    │   │   ├── blackswan_00000.jpg
    │   │   ├── blackswan_00001.jpg
    │   │   └── ...
    │   ├── GT/
    │   │   ├── blackswan_00000.png
    │   │   ├── blackswan_00001.png
    │   │   └── ...
    │   └── FLOW/
    │       ├── blackswan_00000.jpg
    │       ├── blackswan_00001.jpg
    │       └── ...
    ├── bmx-trees
    └── ...
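Before training, it can help to check that the directory tree above is complete. The sketch below is not part of the repository. It assumes that every RGB frame should have a same-named file in GT and, for the DAVIS splits, in FLOW, as the tree suggests. The function names `unpaired_frames` and `check_dataset` are hypothetical.

```python
from pathlib import Path


def unpaired_frames(split_dir, companions):
    """Return, per companion folder, the RGB frame names with no same-named file."""
    split_dir = Path(split_dir)
    rgb = {p.stem for p in (split_dir / "RGB").glob("*.jpg")}
    report = {}
    for name in companions:
        folder = split_dir / name
        # A missing folder counts as containing no files.
        found = {p.stem for p in folder.iterdir()} if folder.is_dir() else set()
        report[name] = sorted(rgb - found)
    return report


def check_dataset(root):
    """Check the three splits shown in the directory tree under `root`."""
    root = Path(root)
    report = {
        "DUTS_train": unpaired_frames(root / "DUTS_train", ["GT"]),
        "DAVIS_train": unpaired_frames(root / "DAVIS_train", ["GT", "FLOW"]),
    }
    # DAVIS_test holds one sub-folder per sequence (blackswan, bmx-trees, ...).
    test_root = root / "DAVIS_test"
    sequences = sorted(p for p in test_root.iterdir() if p.is_dir()) if test_root.is_dir() else []
    for seq in sequences:
        report[f"DAVIS_test/{seq.name}"] = unpaired_frames(seq, ["GT", "FLOW"])
    return report
```

Calling `check_dataset("dataset dir")` returns a dictionary keyed by split. An empty list for a folder means every RGB frame in that split has a counterpart there.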

Training Model

We use a two-stage learning strategy: pretraining and finetuning.

Pretraining

Edit config.py: set the data root path option and the GPU index.
Then run pretraining:

python pretrain.py

Finetuning

Edit config.py: set the path to the best model produced during pretraining.
Then run finetuning:

python train.py

Evaluation

TBD

Result

An example of the resulting image is shown below.

A : RGB image
B : Optical flow map
C : Predicted map
D : Ground truth (GT)
E : Predicted foreground RGB slot attention map
F : Predicted background RGB slot attention map
G : Predicted foreground flow slot attention map
H : Predicted background flow slot attention map

Citation

@InProceedings{Lee_2024_CVPR,
    author    = {Lee, Minhyeok and Cho, Suhwan and Lee, Dogyoon and Park, Chaewon and Lee, Jungho and Lee, Sangyoun},
    title     = {Guided Slot Attention for Unsupervised Video Object Segmentation},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {3807-3816}
}
