2023-04-24-hsu23a.md

File metadata and controls

49 lines (49 loc) · 1.81 KB
---
title: "Can Visual Scratchpads With Diagrammatic Abstractions Augment LLM Reasoning?"
abstract: "When humans reason about complex text-based questions, we leverage diagrammatic abstractions drawn on a visual scratchpad. In this paper, we introduce and explore the capabilities of Visual-Scratchpad, a method that augments a large language foundation model (LLM) with diagrammatic execution and readout. We enable the LLM to generate drawing commands and to read out abstractions from the resulting picture. The visual readout operation uses a visual foundation model, optionally finetuned with expert iteration. Here, we show that although Visual-Scratchpad outperforms an inference-only LLM, it surprisingly yields worse performance compared to a single finetuned LLM. Through experiments, we propose that this gap is due to the failure mode of vision foundation models in understanding abstractions in diagrams."
layout: inproceedings
series: Proceedings of Machine Learning Research
publisher: PMLR
issn: 2640-3498
id: hsu23a
month: 0
tex_title: "Can Visual Scratchpads With Diagrammatic Abstractions Augment LLM Reasoning?"
firstpage: 21
lastpage: 28
page: 21-28
order: 21
cycles: false
bibtex_author: Hsu, Joy and Poesia, Gabriel and Wu, Jiajun and Goodman, Noah
author:
- given: Joy
  family: Hsu
- given: Gabriel
  family: Poesia
- given: Jiajun
  family: Wu
- given: Noah
  family: Goodman
date: 2023-04-24
address:
container-title: "Proceedings on \"I Can't Believe It's Not Better: Failure Modes in the Age of Foundation Models\" at NeurIPS 2023 Workshops"
volume: '239'
genre: inproceedings
issued:
  date-parts:
  - 2023
  - 4
  - 24
pdf:
extras:
---