Paper list on Video Moment Retrieval (VMR), also known as Natural Language Video Localization (NLVL) or Temporal Sentence Grounding in Videos (TSGV)

ZhenZHAO/awesome-video-moment-retrieval


Awesome-video-moment-retrieval

A personal paper list on Video Moment Retrieval (VMR), also known as Natural Language Video Localization (NLVL), Temporal Sentence Grounding in Videos (TSGV), or Natural Language Query (NLQ).

  • Keywords: moment retrieval, temporal grounding, video/language/moment grounding/localization, sentence grounding, etc.

1 Papers List

Summarized by,

2 Quick references

Survey

Datasets

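| Dataset              | Video Source | Domain          |
|----------------------|--------------|-----------------|
| TACoS                | Kitchen      | Cooking         |
| Charades-STA         | Homes        | Indoor Activity |
| ActivityNet Captions | YouTube      | Open            |
| DiDeMo               | Flickr       | Open            |
| MAD (CVPR22)         | Movie        | Open            |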

More details, taken from this paper:

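| Dataset              | # Videos | # VL pairs (train) | # VL pairs (val) | # VL pairs (test) | Vocab Size |
|----------------------|----------|--------------------|------------------|-------------------|------------|
| ActivityNet Captions | 14926    | 37421              | 17505            | 17031             | 15406      |
| TACoS                | 127      | 10146              | 4589             | 4083              | 2255       |
| DiDeMo               | 10642    | 33005              | 4180             | 4021              | 7523       |
| Charades-STA         | 6670     | 12404              | -                | 3720              | 1289       |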
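To make the statistics above easier to compare, here is a minimal Python sketch that sums the video-language (VL) pairs per dataset and derives the average number of pairs per video. The numbers are copied from the table. The names `STATS`, `total_pairs`, and `pairs_per_video` are hypothetical helpers introduced for illustration, and the sketch assumes the video count covers all splits.

```python
# Statistics copied from the dataset table above.
# None marks the split that the table leaves empty (Charades-STA has no val split listed).
STATS = {
    "ActivityNet Captions": {"videos": 14926, "train": 37421, "val": 17505, "test": 17031, "vocab": 15406},
    "TACoS": {"videos": 127, "train": 10146, "val": 4589, "test": 4083, "vocab": 2255},
    "DiDeMo": {"videos": 10642, "train": 33005, "val": 4180, "test": 4021, "vocab": 7523},
    "Charades-STA": {"videos": 6670, "train": 12404, "val": None, "test": 3720, "vocab": 1289},
}


def total_pairs(row):
    """Sum VL pairs over train/val/test, treating a missing split as 0."""
    return sum(row[split] or 0 for split in ("train", "val", "test"))


def pairs_per_video(row):
    """Average number of VL pairs per video (assumes 'videos' spans all splits)."""
    return total_pairs(row) / row["videos"]


if __name__ == "__main__":
    for name, row in STATS.items():
        print(f"{name}: {total_pairs(row)} pairs, {pairs_per_video(row):.1f} per video")
```

Under these assumptions, TACoS stands out with roughly 148 pairs per video, against about 5 for ActivityNet Captions, which reflects its small number of densely annotated videos.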

Normally, the top three are the most widely used. The commonly used pre-extracted features are:

Visual: 1) 3D ConvNets, e.g. C3D, I3D; 2) 2D ConvNets, e.g. VGG

Text: 1) pre-trained word embeddings, e.g. GloVe; 2) pre-trained language models, e.g. BERT

New: for MAD, both visual and text features are extracted with CLIP.

Extracted features can be downloaded from

Process

[Figure: vmr-pipeline]

Performance Comparisons

[Figure: vmr-pipeline]

3 Resources
