Official implementation of SBNet as described in "Single-branch Network for Multimodal Training".
Paper Link: SBNet
Presentation: https://youtu.be/bXeiy8kQQtY
(a) Two independent modality-specific embedding networks extract features (left), and a conventional two-branch network (right) with two independent modality-specific branches learns discriminative joint representations for the multimodal task. (b) The proposed network with a single modality-invariant branch.
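To make the contrast concrete, here is a minimal NumPy sketch of the single-branch idea: each modality keeps its own feature extractor (stand-ins for FaceNet and the utterance-level aggregator), but both feed one shared, modality-invariant branch. All layer sizes and the toy linear/ReLU layers are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

# Modality-specific input projections (stand-ins for pre-extracted
# face/voice features); dimensions here are illustrative only.
W_face = rng.standard_normal((512, 256)) * 0.05
W_voice = rng.standard_normal((512, 256)) * 0.05

# The single modality-invariant branch: one weight matrix shared by
# both modalities, instead of two separate branches.
W_shared = rng.standard_normal((256, 128)) * 0.05

def embed(features, modality):
    """Map modality-specific features into the shared joint space."""
    W_in = W_face if modality == "face" else W_voice
    h = relu(features @ W_in)
    z = h @ W_shared
    # Unit-normalize so face and voice embeddings are directly comparable.
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

face_z = embed(rng.standard_normal((1, 512)), "face")
voice_z = embed(rng.standard_normal((1, 512)), "voice")
print(face_z.shape, voice_z.shape)  # both land in the same 128-d space
```

Because `W_shared` is common to both modalities, gradients from face and voice samples update the same branch, which is the core difference from a two-branch design.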
We have used the following setup for our experiments:
- python==3.6.5
- CUDA Toolkit 10.1
- cudnn v7.6.5.32 for CUDA 10.1
- CUDA Toolkit 10.2
- cudnn v8.2.1.32 for CUDA 10.2
To install PyTorch and TensorFlow with GPU support:
pip install tensorflow-gpu==1.13.1
pip install torch==1.8.1+cu102 torchvision==0.9.1+cu102 torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html
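After installing, it can be worth sanity-checking that both frameworks actually see the GPU. This is a generic check (not part of the SBNet codebase); it reports `None` for a framework that is not installed:

```python
def check_gpu_support():
    """Return {framework: True/False for GPU visibility, or None if not installed}."""
    report = {}
    try:
        import torch
        report["torch"] = torch.cuda.is_available()
    except ImportError:
        report["torch"] = None
    try:
        import tensorflow as tf
        try:  # TF 2.x API
            report["tensorflow"] = len(tf.config.list_physical_devices("GPU")) > 0
        except AttributeError:  # TF 1.x (e.g. 1.13) fallback
            report["tensorflow"] = bool(tf.test.is_gpu_available())
    except ImportError:
        report["tensorflow"] = None
    return report

if __name__ == "__main__":
    print(check_gpu_support())
```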
We perform experiments on cross-modal verification and cross-modal matching tasks on the large-scale VoxCeleb1 dataset.
For face feature extraction we use FaceNet. The official implementation from the authors is available here.
For voice embeddings we use the method described in Utterance Level Aggregator. The code we used is released by the authors and is publicly available here.
The face and voice features used in our work can be accessed here. Once downloaded, place the files like this:
|-- data
|   |-- voice
|   |   |-- .csv files
|   |-- face
|   |   |-- .csv files
|-- imgs
|-- ssnet_cent_git
|-- ssnet_fop
|-- twobranch_cent_git
|-- twobranch_fop
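Under that layout, a feature split can be loaded with plain NumPy. This helper is a sketch of ours, not part of the released code, and the exact CSV file names (and whether rows carry a trailing label column) depend on the downloaded feature archive:

```python
import os
import numpy as np

def load_split(root, modality, split_csv):
    """Load one pre-extracted feature CSV (rows = samples, cols = feature dims).

    `root` is the data directory above, `modality` is "face" or "voice",
    and `split_csv` is a hypothetical file name for one split.
    """
    path = os.path.join(root, modality, split_csv)
    return np.loadtxt(path, delimiter=",")
```

For example, `load_split("data", "voice", "train.csv")` would return an array with one row per utterance, assuming a file of that name exists.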
# Training
python main.py --save_dir ./model --batch_size 128 --max_num_epoch 100 --dim_embed 128 --split_type <face_only|voice_only|hefhev|hevhef|random|fvfv|vfvf>
# Testing
python test.py --split_type vfvf --sh unseenunheard --test random
# Training
python main.py --save_dir ./model --batch_size 128 --max_num_epoch 100 --split_type <face_only|voice_only|hefhev|hevhef|random|fvfv|vfvf> --loss <git|cent>
# Testing
python test.py --split_type fvfv --sh unseenunheard --test random
For baseline results, we leverage the work from FOP.
@inproceedings{saeed2023sbnet,
title={Single-branch Network for Multimodal Training},
author={Saeed, Muhammad Saad and Nawaz, Shah and Khan, Muhammad Haris and Zaheer, Muhammad Zaigham and Nandakumar, Karthik and Yousaf, Muhammad Haroon and Mahmood, Arif},
booktitle={ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
year={2023},
organization={IEEE}
}
@inproceedings{saeed2022fusion,
title={Fusion and Orthogonal Projection for Improved Face-Voice Association},
author={Saeed, Muhammad Saad and Khan, Muhammad Haris and Nawaz, Shah and Yousaf, Muhammad Haroon and Del Bue, Alessio},
booktitle={ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
pages={7057--7061},
year={2022},
organization={IEEE}
}