Related work #10

Open
LengZhuo0831 opened this issue Apr 21, 2023 · 2 comments

@LengZhuo0831

Hello! Thank you so much for the contribution of this repo.
I'm very interested in this work, and I'm surveying papers with keywords like "captioning anything", "instance-level captioning", or "per-pixel captioning". Could you recommend some related work?

@ttengwang
Owner

ttengwang commented Apr 21, 2023

@LengZhuo0831 As far as I know, dense captioning is the most closely related topic: it generates captions at the region/object level. Scene graph generation is another way to describe an image at the instance level, treating instances as graph nodes and the relationships between them as edges.
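To make the difference between the two output formats concrete, here is a minimal sketch in Python. It is only an illustration and is not taken from any of the papers listed in this thread; the class names (`RegionCaption`, `SceneGraph`), the box format, and the example labels are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RegionCaption:
    """Dense captioning output: one free-form caption per image region."""
    box: tuple[int, int, int, int]  # (x, y, width, height); assumed format
    caption: str


@dataclass
class SceneGraph:
    """Scene graph output: instances are nodes, relationships are edges."""
    nodes: list[str] = field(default_factory=list)
    # Each edge is (subject node index, predicate, object node index).
    edges: list[tuple[int, str, int]] = field(default_factory=list)

    def add_node(self, label: str) -> int:
        """Add an instance and return its node index."""
        self.nodes.append(label)
        return len(self.nodes) - 1

    def add_edge(self, subject: int, predicate: str, obj: int) -> None:
        """Add a directed relationship between two existing nodes."""
        self.edges.append((subject, predicate, obj))

    def triples(self) -> list[tuple[str, str, str]]:
        """Return the edges as readable (subject, predicate, object) triples."""
        return [(self.nodes[s], p, self.nodes[o]) for s, p, o in self.edges]


# Hypothetical example: the same image described in both formats.
dense_captions = [
    RegionCaption(box=(40, 60, 120, 200), caption="a man riding a brown horse"),
    RegionCaption(box=(0, 220, 640, 100), caption="a sandy beach near the water"),
]

graph = SceneGraph()
man = graph.add_node("man")
horse = graph.add_node("horse")
beach = graph.add_node("beach")
graph.add_edge(man, "riding", horse)
graph.add_edge(horse, "on", beach)
```

In this sketch, dense captioning attaches free-form text to each region, while the scene graph restricts the description to labeled instances and pairwise predicates, which `graph.triples()` returns as (subject, predicate, object) tuples.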

Here are several early seminal works:

  • image-based: DenseCap: Fully Convolutional Localization Networks for Dense Captioning
  • video-based: Dense-Captioning Events in Videos
  • 3D data-based: Scan2Cap: Context-aware Dense Captioning in RGB-D Scans

Some recent works that combine LLMs with fine-grained visual experts to generate dense captions:

  • Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language
  • Visual Clues: Bridging Vision and Language Foundations for Image Paragraph Captioning
  • ChatGPT Asks, BLIP-2 Answers: Automatic Questioning Towards Enriched Visual Descriptions

@DavidMChan

One more to add for the LLMs + Image Captioners:

IC3: Image Captioning by Committee Consensus
https://arxiv.org/abs/2302.01328
