LAVIS - A One-stop Library for Language-Vision Intelligence (Jupyter Notebook; updated Jun 3, 2024)
A curated list of awesome vision and language resources for earth observation.
"Worse than Random? An Embarrassingly Simple Probing Evaluation of Large Multimodal Models in Medical VQA"
A one-stop repository for generative AI research updates, interview resources, notebooks, and much more!
Recent advancements propelled by large language models (LLMs), spanning domains including vision, audio, agents, robotics, and fundamental sciences such as mathematics.
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
日本語LLMまとめ - Overview of Japanese LLMs
Exploring Transfer Learning in Medical Image Segmentation using Vision-Language Models
Mammo-CLIP: A Vision Language Foundation Model to Enhance Data Efficiency and Robustness in Mammography
[ICML2024] Official PyTorch implementation of DoRA: Weight-Decomposed Low-Rank Adaptation
Fast and accurate open-vocabulary end-to-end object detection
[CVPR 2024] Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding
Code and pre-trained models for our paper "CLIPping the Deception: Adapting Vision-Language Models for Universal Deepfake Detection".
CALVIN - A benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks
This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.org/abs/2404.12390
This is the official repository for the paper "OpenFashionCLIP: Vision-and-Language Contrastive Learning with Open-Source Fashion Data". ICIAP 2023
A curated list of various topics related to development for the open source community⚡️
The FACTUAL benchmark dataset and a pre-trained textual scene graph parser trained on FACTUAL.
A Comprehensive Evaluation Benchmark for Open-Vocabulary Detection (AAAI 2024)
HPT - Open Multimodal LLMs from HyperGAI