Reading list for Multimodal Large Language Models
Updated Aug 17, 2023
Research Trends in LLM-guided Multimodal Learning.
🦩 Visual Instruction Tuning with Polite Flamingo - training multi-modal LLMs to be both clever and polite! (AAAI-24 Oral)
Official implementation of "Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models"
Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks
✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models. The first work to correct hallucinations in MLLMs.
Deploying Chinese CLIP with OpenCV + onnxruntime for text-to-image search: describe the image you want in a sentence, and matching images are retrieved from a gallery. Includes both C++ and Python versions.
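The entry above describes CLIP-based text-to-image retrieval. The retrieval step itself is just cosine similarity between a text embedding and precomputed image embeddings; a minimal sketch of that step, using synthetic random vectors in place of embeddings produced by a real CLIP model (in the repo above, an onnxruntime session would produce these):

```python
import numpy as np

def cosine_similarity(query: np.ndarray, gallery: np.ndarray) -> np.ndarray:
    # Normalize the query vector and every gallery embedding, then take
    # dot products, yielding one cosine similarity per gallery image.
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    return g @ q

def search_images(text_embedding: np.ndarray,
                  image_embeddings: np.ndarray,
                  top_k: int = 3) -> list:
    # Rank gallery images by similarity to the text embedding and
    # return the indices of the top_k matches, best first.
    sims = cosine_similarity(text_embedding, image_embeddings)
    return np.argsort(-sims)[:top_k].tolist()

# Synthetic stand-ins for CLIP embeddings: 100 gallery images, 512-d each.
rng = np.random.default_rng(0)
gallery = rng.normal(size=(100, 512))
# A query that is a slightly perturbed copy of image 42's embedding,
# so image 42 should come back as the top match.
query = gallery[42] + 0.01 * rng.normal(size=512)
print(search_images(query, gallery)[0])  # prints 42
```

A real pipeline would replace the synthetic vectors with the outputs of the CLIP text and image encoders, but the ranking logic is unchanged.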
mPLUG-HalOwl: Multimodal Hallucination Evaluation and Mitigating
LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills
[Paper][Preprint 2023] Making Large Language Models Perform Better in Knowledge Graph Completion
A Gradio demo of MGIE
A Video Chat Agent with Temporal Prior
[WACV 2024 Survey Paper] Multimodal Large Language Models for Autonomous Driving
A PyTorch-based system for highly accurate drug-target interaction predictions utilizing multi-modal large language models to discern structural affinities in drug-target pairs.
A curated list of awesome image captioning studies, aimed at annotating and reporting CT / MRI scans
An Easy-to-use Hallucination Detection Framework for LLMs.
This repository contains code to evaluate various multimodal large language models using different instructions across multiple multimodal content comprehension tasks.
Codes for ICML 2024 paper: "Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition"
From scratch implementation of a vision language model in pure PyTorch
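A from-scratch vision language model typically projects image patch features into the language model's token-embedding space and prepends the resulting "visual tokens" to the text tokens. A minimal numpy sketch of that fusion step (all shapes and names here are illustrative assumptions, not taken from the repo above):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: 16 image patches with 256-d vision features,
# a 64-d language embedding space, and 8 text tokens.
num_patches, vision_dim, lm_dim, num_text = 16, 256, 64, 8

vision_features = rng.normal(size=(num_patches, vision_dim))  # from a vision encoder
text_embeddings = rng.normal(size=(num_text, lm_dim))         # from the LLM's embedding table

# A learned linear projection maps vision features into the LM space
# (initialized randomly here; in training it would be optimized).
W_proj = rng.normal(size=(vision_dim, lm_dim)) / np.sqrt(vision_dim)
visual_tokens = vision_features @ W_proj  # shape: (num_patches, lm_dim)

# Prepend visual tokens to the text tokens; this combined sequence is
# what the language model's transformer layers would then consume.
sequence = np.concatenate([visual_tokens, text_embeddings], axis=0)
print(sequence.shape)  # prints (24, 64)
```

The same pattern (encoder, projection, concatenation) underlies LLaVA-style architectures; the details of the projection and training objective vary by implementation.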
[Arxiv 2024] Official Implementation of the paper: "InstrAug: Automatic Instruction Augmentation for Multimodal Instruction Fine-tuning"