LAVIS - A One-stop Library for Language-Vision Intelligence (Jupyter Notebook; updated Jun 3, 2024)
A curated list of awesome vision and language resources for earth observation.
"Worse than Random? An Embarrassingly Simple Probing Evaluation of Large Multimodal Models in Medical VQA"
A one-stop repository for generative AI research updates, interview resources, notebooks, and much more!
Recent advancements propelled by large language models (LLMs), spanning domains including vision, audio, agents, robotics, and fundamental sciences such as mathematics.
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
日本語LLMまとめ - Overview of Japanese LLMs
Exploring Transfer Learning in Medical Image Segmentation using Vision-Language Models
Mammo-CLIP: A Vision Language Foundation Model to Enhance Data Efficiency and Robustness in Mammography
[ICML2024] Official PyTorch implementation of DoRA: Weight-Decomposed Low-Rank Adaptation
Fast and accurate open-vocabulary end-to-end object detection
[CVPR 2024] Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding
Code and pre-trained models for our paper "CLIPping the Deception: Adapting Vision-Language Models for Universal Deepfake Detection".
CALVIN - A benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks
This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.org/abs/2404.12390
This is the official repository for the paper "OpenFashionCLIP: Vision-and-Language Contrastive Learning with Open-Source Fashion Data". ICIAP 2023
A curated list of various topics related to development for the open source community⚡️
The FACTUAL benchmark dataset and a pre-trained textual scene graph parser trained on FACTUAL.
A Comprehensive Evaluation Benchmark for Open-Vocabulary Detection (AAAI 2024)
HPT - Open Multimodal LLMs from HyperGAI