Start building LLM-empowered multi-agent applications in an easier way.
VisualRWKV is the visual-enhanced version of the RWKV language model, enabling RWKV to handle various visual tasks.
ModelScope: bring the notion of Model-as-a-Service to life.
GPT4V-level open-source multi-modal model based on Llama3-8B
A one-stop data processing system to make data higher-quality, juicier, and more digestible for LLMs! 🍎 🍋 🌽 ➡️ ➡️ 🍸 🍹 🍷
Unified embedding generation and search engine. Also available on cloud - cloud.marqo.ai
State-of-the-art, multi-modal virtual assistant framework powered by LLaMA. Ame is under active development.
Code for the ICML 2024 paper: "EMC^2: Efficient MCMC Negative Sampling for Contrastive Learning with Global Convergence"
Efficient Retrieval Augmentation and Generation Framework
Here we track the latest multimodal AI models, including multimodal foundation models, LLMs, audio, image, video, music, and 3D content. 🔥
Open Source Routing Engine for OpenStreetMap
Open-source evaluation toolkit for large vision-language models (LVLMs), supporting GPT-4V, Gemini, QwenVLPlus, 40+ Hugging Face models, and 20+ benchmarks
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
Build high-performance AI models with modular building blocks
A state-of-the-art open visual language model | multi-modal pre-trained model
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4V. A commercially usable, open-source multi-modal dialogue model approaching GPT-4V performance.
[MIR-2023-Survey] A continuously updated paper list for multi-modal pre-trained big models
Offline multi-modal RAG. Execution scripts optimized for Intel and CUDA.
Multi-modal Tree of Thoughts for DALLE-3-like automatic self-improvement