2023-MindSpore-1/ms-code-179

TokenFusion is a multimodal token fusion method tailored for transformer-based vision tasks. To fuse multiple modalities effectively, TokenFusion dynamically detects uninformative tokens and substitutes them with projected and aggregated inter-modal features. Residual positional alignment is also adopted to explicitly exploit the inter-modal alignments after fusion. This design lets the transformer learn correlations among multimodal features while leaving the single-modal transformer architecture largely intact. Extensive experiments on a variety of homogeneous and heterogeneous modalities show that TokenFusion surpasses state-of-the-art methods in three typical vision tasks: multimodal image-to-image translation, RGB-depth semantic segmentation, and 3D object detection with point clouds and images.
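To make the substitution step concrete, here is a minimal pure-Python sketch for two spatially aligned modalities (e.g. RGB and depth). It assumes each token already has an "informativeness" score (simply passed in as numbers here), that tokens scoring below a threshold are replaced by a linear projection of the other modality's token at the same position, and that positional embeddings are re-added afterwards in the spirit of residual positional alignment. The function names, the threshold value and the identity projection are illustrative assumptions, not the code in models/modules.py; with more than two modalities the substitute would aggregate over all other modalities, which this sketch does not cover.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]


def linear(weight: Matrix, x: Vector) -> Vector:
    """Compute weight @ x; a stand-in for the learned inter-modal projection."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def fuse_tokens(
    tokens_self: List[Vector],
    tokens_other: List[Vector],
    scores_self: List[float],
    proj_other_to_self: Matrix,
    threshold: float = 0.02,  # hypothetical value, chosen only for illustration
) -> List[Vector]:
    """Replace low-score tokens with projected tokens from the other modality."""
    fused = []
    for tok, other, score in zip(tokens_self, tokens_other, scores_self):
        if score < threshold:
            # Uninformative token: substitute the aligned token of the other modality.
            fused.append(linear(proj_other_to_self, other))
        else:
            # Informative token: keep it unchanged.
            fused.append(list(tok))
    return fused


def add_positional(tokens: List[Vector], pos_embed: List[Vector]) -> List[Vector]:
    """Re-add positional embeddings after fusion (sketch of residual positional alignment)."""
    return [[t + p for t, p in zip(tok, pe)] for tok, pe in zip(tokens, pos_embed)]


rgb = [[1.0, 2.0], [3.0, 4.0]]
depth = [[5.0, 6.0], [7.0, 8.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
print(fuse_tokens(rgb, depth, [0.9, 0.01], identity))  # [[1.0, 2.0], [7.0, 8.0]]
```

Only the second RGB token falls below the threshold, so it is swapped for the (identity-projected) depth token at the same position; in a real model the projection and the scores would be learned.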

Paper: Yikai Wang, Xinghao Chen, Lele Cao, Wenbing Huang, Fuchun Sun, Yunhe Wang. Multimodal Token Fusion for Vision Transformers. In CVPR 2022.

The overall architecture of TokenFusion is shown below:

Dataset used: NYUDv2

  • Dataset size: RGB images and depth images, with labels for 40 segmentation classes
    • Train: 795 samples
    • Test: 654 samples
  • Data format: image files
    • Note: data are processed in utils/datasets.py
.TokenFusion
├── README.md               # descriptions about TokenFusion
├── models
│   ├── mix_transformer.py  # definition of backbone model
│   ├── segformer.py        # definition of segmentation model
│   └── modules.py          # TokenFusion operations
├── utils
│   ├── datasets.py         # data loader
│   ├── helpers.py          # utility functions
│   ├── transforms.py       # data preprocessing functions
│   └── meter.py            # utility functions
├── eval.py                 # evaluation interface
├── cfg.py                  # configuration file
└── config.py               # configuration file

To Be Done

Launch

# inference example

python eval.py --checkpoint_path [CHECKPOINT_PATH]

The checkpoint can be downloaded here or from MindSpore Hub.
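The launch command passes a single `--checkpoint_path` flag. As a rough illustration of how such a flag could be read, here is a standard-library `argparse` sketch. Only the flag name comes from this README; the `parse_args` helper, its help text and the `required=True` choice are assumptions, and the real eval.py may define additional options.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the evaluation CLI (hypothetical; only --checkpoint_path is from the README)."""
    parser = argparse.ArgumentParser(description="TokenFusion evaluation (sketch)")
    parser.add_argument(
        "--checkpoint_path",
        type=str,
        required=True,  # assumption: evaluation makes no sense without a checkpoint
        help="path to a trained .ckpt file, e.g. the downloaded TokenFusion checkpoint",
    )
    # With argv=None, argparse falls back to sys.argv[1:].
    return parser.parse_args(argv)
```

In a MindSpore script, the parsed path would then typically be handed to `mindspore.load_checkpoint` and `mindspore.load_param_into_net` to restore the network weights before evaluation.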

Result

result: IoU=54.8, ckpt= ./tokenfusion_ascend_v180_nyudv2_research_cv_acc54.8.ckpt
| Parameters | Ascend |
| --- | --- |
| Model | TokenFusion |
| Model Version | tokenfusion_seg_mitb3_nyudv2 |
| Resource | Ascend 910 |
| Uploaded Date | 2022-08-10 |
| MindSpore Version | 1.8.0 |
| Dataset | NYUDv2 |
| Outputs | probability |
| Accuracy | 1pc: 54.8% |
| Speed | 1pc: 1 s/step |
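Assuming the reported IoU is the standard mean intersection-over-union (mIoU) across the segmentation classes, it can be computed from a confusion matrix as IoU_c = TP / (TP + FP + FN) per class, averaged over the classes that actually occur. The sketch below is plain Python. The `ignore_index=255` convention for unlabeled pixels is a common choice and is assumed here; how eval.py actually treats unlabeled pixels is not documented in this README.

```python
from typing import List, Sequence


def confusion_matrix(
    preds: Sequence[int], labels: Sequence[int], num_classes: int, ignore_index: int = 255
) -> List[List[int]]:
    """Rows are ground-truth classes, columns are predicted classes."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(preds, labels):
        if t == ignore_index:  # skip unlabeled pixels (assumed convention)
            continue
        cm[t][p] += 1
    return cm


def mean_iou(cm: List[List[int]]) -> float:
    """Average per-class IoU over classes that appear in predictions or labels."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                          # class c pixels predicted as something else
        fp = sum(cm[r][c] for r in range(n)) - tp     # other pixels predicted as class c
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


cm = confusion_matrix(preds=[0, 0, 1, 1], labels=[0, 1, 1, 1], num_classes=2)
print(round(mean_iou(cm), 4))  # 0.5833
```

For NYUDv2, `num_classes` would be 40 and the predictions would come from an argmax over the model's per-class probability output.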

The random seed is set inside utils/datasets.py.
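Fixing the seed matters mainly because it makes data-dependent randomness (shuffling, random crops and flips) repeatable across runs. The standard-library sketch below shows the idea for the sample order of the 795-image training split. The helper name and the seed value 42 are hypothetical, and the sketch does not reproduce what utils/datasets.py does. In a MindSpore script, `mindspore.set_seed` can additionally fix the framework-level global seed.

```python
import random
from typing import List

NUM_TRAIN_SAMPLES = 795  # NYUDv2 training split size stated in this README


def shuffled_indices(num_samples: int, seed: int) -> List[int]:
    """Return a reproducible permutation of sample indices (hypothetical helper)."""
    rng = random.Random(seed)  # private generator, so global random state is untouched
    indices = list(range(num_samples))
    rng.shuffle(indices)
    return indices


order_a = shuffled_indices(NUM_TRAIN_SAMPLES, seed=42)
order_b = shuffled_indices(NUM_TRAIN_SAMPLES, seed=42)
print(order_a == order_b)  # same seed, same sample order
```

Using a dedicated `random.Random` instance rather than `random.seed` keeps the reproducible ordering independent of any other code that draws from the global generator.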

Please check the official homepage.
