TokenBench

TokenBench is a comprehensive benchmark that standardizes evaluation for Cosmos-Tokenizer and is designed to evaluate the performance of video tokenizers. It consists of high-resolution, long-duration videos covering a wide variety of domains, including robotic manipulation, driving, egocentric, and web videos. The videos are drawn from existing datasets that are commonly used for various tasks: BDD100K, EgoExo-4D, BridgeData V2, and Panda-70M. This repo provides instructions for downloading and preprocessing the videos for TokenBench.

Instructions to build TokenBench

  1. Download the datasets from their official websites: BDD100K, EgoExo-4D, BridgeData V2, and Panda-70M.
  2. Pick the videos specified in the video/list.txt file.
  3. Preprocess the videos using the script video/preprocessing_script.py.
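
Before preprocessing, it can help to check that every listed video was actually downloaded. The following is a minimal sketch, not part of this repo: it assumes that video/list.txt holds one relative video path per line, which is not confirmed here, and `select_listed_videos` and `dataset_root` are hypothetical names introduced for illustration.

```python
from pathlib import Path


def select_listed_videos(list_file, dataset_root):
    """Split list entries into videos found and videos missing under dataset_root.

    ASSUMPTION: the list file has one relative video path per line.
    Blank lines and lines starting with '#' are skipped.
    """
    root = Path(dataset_root)
    found, missing = [], []
    for raw in Path(list_file).read_text(encoding="utf-8").splitlines():
        entry = raw.strip()
        if not entry or entry.startswith("#"):
            continue
        # Keep the entry as written so it can be matched back to the list.
        if (root / entry).is_file():
            found.append(entry)
        else:
            missing.append(entry)
    return found, missing
```

Calling `select_listed_videos("video/list.txt", "/path/to/downloads")` returns two lists. A non-empty second list means some downloads are incomplete or stored under a different layout than the sketch assumes.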

Continuous video tokenizer leaderboard

| Tokenizer | Compression Ratio (T × H × W) | Formulation | PSNR | SSIM | rFVD |
| --- | --- | --- | --- | --- | --- |
| CogVideoX | 4 × 8 × 8 | VAE | 33.149 | 0.908 | 6.970 |
| OmniTokenizer | 4 × 8 × 8 | VAE | 29.705 | 0.830 | 35.867 |
| Cosmos-CV | 4 × 8 × 8 | AE | 37.270 | 0.928 | 6.849 |
| Cosmos-CV | 8 × 8 × 8 | AE | 36.856 | 0.917 | 11.624 |
| Cosmos-CV | 8 × 16 × 16 | AE | 35.158 | 0.875 | 43.085 |
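
For readers unfamiliar with the PSNR column, the sketch below shows the standard peak signal-to-noise ratio formula on flat sequences of pixel values. It is an illustration only: the 8-bit peak of 255 and the flat-list input are assumptions, and the exact evaluation protocol used for the leaderboard (frame averaging, color space, resolution) is not described here.

```python
import math


def psnr(reference, reconstruction, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences.

    ASSUMPTION: pixel values share the range [0, peak]; 255 matches 8-bit video.
    """
    if len(reference) != len(reconstruction):
        raise ValueError("inputs must have the same number of pixels")
    if not reference:
        raise ValueError("inputs must not be empty")
    # Mean squared error over all pixels.
    mse = sum((r - x) ** 2 for r, x in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs have no reconstruction error
    return 10.0 * math.log10(peak ** 2 / mse)
```

Higher PSNR and SSIM indicate a closer reconstruction, while lower rFVD is better.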

Discrete video tokenizer leaderboard

| Tokenizer | Compression Ratio (T × H × W) | Quantization | PSNR | SSIM | rFVD |
| --- | --- | --- | --- | --- | --- |
| VideoGPT | 4 × 4 × 4 | VQ | 35.119 | 0.914 | 13.855 |
| OmniTokenizer | 4 × 8 × 8 | VQ | 30.152 | 0.827 | 53.553 |
| Cosmos-DV | 4 × 8 × 8 | FSQ | 35.137 | 0.887 | 19.672 |
| Cosmos-DV | 8 × 8 × 8 | FSQ | 34.746 | 0.872 | 43.865 |
| Cosmos-DV | 8 × 16 × 16 | FSQ | 33.718 | 0.828 | 113.481 |
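
When comparing rows, note that the compression ratios differ. A rough way to compare them is to multiply the temporal and spatial factors into one overall factor, as sketched below. The helper `total_compression` is a hypothetical name, and the product ignores details such as latent channel count or any special handling of the first frame, which are not covered here.

```python
import math


def total_compression(ratio):
    """Multiply the per-axis factors of a ratio string such as '4 × 8 × 8'."""
    # Accept either the '×' sign or a plain 'x' as the separator.
    factors = [int(part) for part in ratio.replace("x", "×").split("×")]
    if len(factors) != 3:
        raise ValueError("expected three factors: T × H × W")
    return math.prod(factors)
```

With this measure, VideoGPT at 4 × 4 × 4 compresses by a factor of 64, whereas the 4 × 8 × 8 entries compress by 256 and the 8 × 16 × 16 entries by 2048, so its scores are not directly comparable to the rows with higher compression.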

Core contributors

Fitsum Reda, Jinwei Gu, Xian Liu, Songwei Ge, Ting-Chun Wang, Haoxiang Wang, Ming-Yu Liu