2023-04-24-zhang23a.md

File metadata and controls

54 lines (54 loc) · 2.03 KB
---
title: "Pre-trained Language Models Do Not Help Auto-regressive Text-to-Image Generation"
abstract: "Recent advances in image tokenizers, such as VQ-VAE, have enabled text-to-image generation using auto-regressive methods, similar to language modeling. However, these methods have yet to leverage pre-trained language models, despite their adaptability to various downstream tasks. In this work, we explore this gap by adapting a pre-trained language model for auto-regressive text-to-image generation, and find that pre-trained language models offer limited help. We provide a two-fold explanation by analyzing tokens from each modality. First, we demonstrate that image tokens possess significantly different semantics compared to text tokens, rendering pre-trained language models no more effective in modeling them than randomly initialized ones. Second, the text tokens in the image-text datasets are too simple compared to normal language model pre-training data, which causes the catastrophic degradation of language models’ capability."
layout: inproceedings
series: "Proceedings of Machine Learning Research"
publisher: "PMLR"
issn: 2640-3498
id: zhang23a
month: 0
tex_title: "Pre-trained Language Models Do Not Help Auto-regressive Text-to-Image Generation"
firstpage: 127
lastpage: 133
page: 127-133
order: 127
cycles: false
bibtex_author: Zhang, Yuhui and McKinzie, Brandon and Gan, Zhe and Shankar, Vaishaal and Toshev, Alexander
author:
- given: Yuhui
  family: Zhang
- given: Brandon
  family: McKinzie
- given: Zhe
  family: Gan
- given: Vaishaal
  family: Shankar
- given: Alexander
  family: Toshev
date: 2023-04-24
address:
container-title: 'Proceedings on "I Can''t Believe It''s Not Better: Failure Modes in the Age of Foundation Models" at NeurIPS 2023 Workshops'
volume: 239
genre: inproceedings
issued:
  date-parts:
  - 2023
  - 4
  - 24
pdf:
extras:
---