Visual Question Answering Plugin


Updates

  • 2024-05-07: Major updates:

    • Added support for Moondream2 model.
    • Added support for reading the question from a field on the sample.
    • Added support for storing the answer in a field on the sample.
    • Added support for applying to all samples in the current view (one at a time).
    • Added support for delegated execution.
    • Added support for Python operator execution.
  • 2024-05-03: @harpreetsahota204 added support for Idefics-8b model from Replicate.

  • 2023-10-24: Added support for Llava-13b and Fuyu-8b models from Replicate.

Plugin Overview

This FiftyOne Python plugin lets you ask questions about the images in your dataset and get answers from a visual question answering (VQA) model.

Supported Models

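This version of the plugin supports multiple models, including those named in the Updates above:

  • Moondream2
  • Idefics-8b (from Replicate)
  • Llava-13b (from Replicate)
  • Fuyu-8b (from Replicate)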

Feel free to fork this plugin and add support for other models!

Installation

Pre-requisites

  1. If you plan to use a model that runs through Hugging Face, install the transformers library:
pip install transformers
  2. If you plan to use a model hosted on Replicate, install the Replicate library:
pip install replicate

And add your Replicate API key to your environment:

export REPLICATE_API_TOKEN=<your-api-token>
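
Before running a Replicate-hosted model, it can help to confirm that the token is actually visible to your Python process. The following is a minimal sketch using only the standard library; the helper name `replicate_token_is_set` is hypothetical and is not part of the plugin.

```python
import os


def replicate_token_is_set(env=None):
    """Return True if REPLICATE_API_TOKEN is present and non-empty.

    Hypothetical helper for illustration only; not part of the plugin.
    """
    env = os.environ if env is None else env
    return bool(env.get("REPLICATE_API_TOKEN", "").strip())


if __name__ == "__main__":
    if not replicate_token_is_set():
        print("REPLICATE_API_TOKEN is not set; Replicate models may fail to authenticate.")
```

If the check fails in a notebook or in the FiftyOne App, the variable was probably exported in a different shell session than the one that launched Python.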

Install the plugin

fiftyone plugins download https://github.com/jacobmarks/vqa-plugin

Operators

answer_visual_question

  • Applies the chosen visual question answering model to the selected sample (or, one at a time, to every sample in the current view) and outputs the answer, which can also be stored in a field on the sample.

Usage

The recommended interactive way to use this plugin is in the FiftyOne App with exactly one sample selected.

Python Operator Execution

To loop over the samples in a dataset or view, use Python operator execution:

import fiftyone as fo
import fiftyone.operators as foo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart", max_samples=5)

# Access the operator via its URI (plugin name + operator name)
vqa = foo.get_operator("@jacobmarks/vqa/answer_visual_question")

# Apply the operator to the dataset
vqa(
    dataset,
    model_name="llava",
    question="Describe the image",
    answer_field="llava_answer",
)

# Print the answers
print(dataset.values("llava_answer"))
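
`dataset.values()` returns the answers as a plain list in sample order, so pairing each answer with its file path makes the output easier to read. The sketch below uses stand-in lists so that it runs without FiftyOne; with a real dataset, the two lists would come from `dataset.values("filepath")` and `dataset.values("llava_answer")`. The helper name `pair_answers` and the sample data are hypothetical.

```python
def pair_answers(filepaths, answers):
    """Map each file path to its answer, skipping samples with no answer (None)."""
    return {path: ans for path, ans in zip(filepaths, answers) if ans is not None}


# Stand-in data for illustration only; with FiftyOne these lists would come from
# dataset.values("filepath") and dataset.values("llava_answer").
filepaths = ["/data/a.jpg", "/data/b.jpg", "/data/c.jpg"]
answers = ["A dog on a beach", None, "Two people cycling"]

for path, answer in pair_answers(filepaths, answers).items():
    print(f"{path}: {answer}")
```

Samples whose answer field was never populated come back as `None`, which is why the helper filters them out.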
