A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.
[CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness
The official code of the paper "Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate".
[CVPR 2024] Situational Awareness Matters in 3D Vision Language Reasoning
Code for ACL 2023 Oral Paper: ManagerTower: Aggregating the Insights of Uni-Modal Experts for Vision-Language Representation Learning
The official implementation for the ICCV 2023 paper "Grounded Image Text Matching with Mismatched Relation Reasoning".
Cross-aware Early Fusion with Stage-divided Vision and Language Transformer Encoders for Referring Image Segmentation (Published in IEEE TMM 2023)
Code for ECIR 2023 paper "Dialogue-to-Video Retrieval"
Explore the rich flavors of Indian desserts with TunedLlavaDelights. Utilizing LLaVA fine-tuning, our project unveils detailed nutritional profiles, taste notes, and optimal consumption times for beloved sweets. Dive into a fusion of AI innovation and culinary tradition.
Socratic models for multimodal reasoning & image captioning