Related work #10

LengZhuo0831 · 2023-04-21T07:24:03Z

Hello! Thank you so much for the contribution of this repo.
I'm very interested in this work, and I'm surveying papers with keywords like "captioning anything", "instance-level captioning", or "per-pixel captioning". Could you recommend some related work?

ttengwang · 2023-04-21T16:25:42Z

@LengZhuo0831 As far as I know, dense captioning is the most closely related topic: it generates captions at the region/object level. Scene graph generation is another way to describe an image at the instance level, treating instances as graph nodes and the relationships between them as edges.

Here are several early seminal works:

image-based: DenseCap: Fully Convolutional Localization Networks for Dense Captioning
video-based: Dense-Captioning Events in Videos
3D data-based: Scan2Cap: Context-aware Dense Captioning in RGB-D Scans

Some recent works that combine LLMs with fine-grained visual experts for dense caption generation:

Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language
Visual Clues: Bridging Vision and Language Foundations for Image Paragraph Captioning
ChatGPT Asks, BLIP-2 Answers: Automatic Questioning Towards Enriched Visual Descriptions

DavidMChan · 2023-05-05T15:59:43Z

One more to add for the LLMs + Image Captioners:

IC3: Image Captioning by Committee Consensus
https://arxiv.org/abs/2302.01328
