This repo aims to collect recent popular Transformer papers, code, and learning resources in the domains of Vision Transformer, NLP, multi-modal learning, etc.
Review papers in multi-modal learning
- Cross-View and Cross-Modal Visual Geo-Localization: IEEE CVPR 2021 Tutorial
- From VQA to VLN: Recent Advances in Vision-and-Language Research: IEEE CVPR 2021 Tutorial
- Tutorial on MultiModal Machine Learning: IEEE CVPR 2022 Tutorial
- PyTorchVideo: a deep learning library for video understanding research
- horovod: a tool for multi-GPU parallel processing
- accelerate: an easy API for mixed precision and any kind of distributed computing