VuBot

Description

VuBot is an application that combines speech and gesture recognition to interact with objects in a real-time video feed. Using a webcam, users can point at objects and issue voice commands to perform actions such as detecting individual objects, recognizing all objects in the scene, or querying the color of a specific object. VuBot leverages libraries and models such as MediaPipe for gesture detection, OpenCV for video processing, and OpenAI Whisper for capturing and processing voice commands.
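At its core this is a standard OpenCV webcam loop. A minimal sketch of that loop (camera index, window name, and quit key are illustrative choices, not taken from the repository):

import cv2

cap = cv2.VideoCapture(0)                                # default webcam
while cap.isOpened():
    ok, frame = cap.read()                               # frame is a BGR image (numpy array)
    if not ok:
        break
    frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)   # MediaPipe expects RGB
    # ... run gesture recognition / object detection on frame_rgb here ...
    cv2.imshow("VuBot", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):                # press q to quit
        break
cap.release()
cv2.destroyAllWindows()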

Key Features

  • Gesture Recognition: Detects gestures such as pointing, closed fist, and victory using MediaPipe (a minimal sketch follows this list).
  • Speech Recognition: Processes voice commands to trigger actions like object detection and color recognition.
  • Object Detection: Identifies objects in the video feed and draws bounding boxes around them.
  • Color Recognition: Determines the color of objects by averaging the colors within the bounding boxes (a color-averaging sketch appears after the paragraph below).
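A minimal sketch of the gesture recognition step above, using the MediaPipe Tasks gesture recognizer. The model path is an assumption (the repository keeps its model under /utils/models), and frame_rgb is a webcam frame already converted to RGB as in the capture loop sketched earlier:

import mediapipe as mp
from mediapipe.tasks import python as mp_python
from mediapipe.tasks.python import vision

options = vision.GestureRecognizerOptions(
    base_options=mp_python.BaseOptions(
        model_asset_path="utils/models/gesture_recognizer.task"  # assumed location
    )
)
recognizer = vision.GestureRecognizer.create_from_options(options)

# Wrap the RGB frame and run recognition on a single image
mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=frame_rgb)
result = recognizer.recognize(mp_image)
if result.gestures:
    top_gesture = result.gestures[0][0].category_name  # e.g. "Pointing_Up", "Closed_Fist", "Victory"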

VuBot is designed to be intuitive and user-friendly, making it a versatile tool for various applications.
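As referenced in the Color Recognition feature above, an object's color can be approximated by averaging the pixels inside its bounding box. A short sketch, assuming the detector returns boxes in pixel coordinates:

import numpy as np

def average_color(frame_bgr, box):
    # box = (x1, y1, x2, y2) in pixel coordinates, as returned by the object detector
    x1, y1, x2, y2 = box
    roi = frame_bgr[y1:y2, x1:x2]                  # crop the object region
    b, g, r = roi.reshape(-1, 3).mean(axis=0)      # per-channel mean over all pixels
    return int(r), int(g), int(b)                  # average color as an RGB tuple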

Models used

Installation

1 - First, clone the project from the repository and navigate to the project root:

git clone https://github.com/darmangerd/vubot.git

cd vubot

2 - Next, install the project dependencies, preferably in a virtual environment. To do this, execute the following commands from the project root:

python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
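On Windows, the activation step differs: use venv\Scripts\activate (Command Prompt) or venv\Scripts\Activate.ps1 (PowerShell) instead of the source command above.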

3 - Finally, run the project:

python app.py
  • Be sure to have a functional microphone and webcam connected to your computer.

Guide

Gesture     | Trigger Word | Output
Pointing    | 'object'     | Return the object's name
Pointing    | 'color'      | Return the object's color
Closed Fist | 'every item' | Highlight all detected objects
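As an illustration only, the mapping in the table above could be dispatched on the recognized gesture label and the transcribed trigger word roughly as follows. The gesture labels follow MediaPipe's canned gesture names, and the Whisper model size, audio file name, and action names are hypothetical, not the repository's actual interface:

import whisper

# Transcribe the spoken command (model size and file name are assumptions)
model = whisper.load_model("base")
transcript = model.transcribe("command.wav")["text"].lower()

# Map (gesture label, trigger word) to a hypothetical action name
ACTIONS = {
    ("Pointing_Up", "object"):     "name_pointed_object",
    ("Pointing_Up", "color"):      "color_of_pointed_object",
    ("Closed_Fist", "every item"): "highlight_all_objects",
}

def dispatch(gesture_label, command_text):
    # Return the first action whose trigger word appears in the spoken command
    for (gesture, keyword), action in ACTIONS.items():
        if gesture_label == gesture and keyword in command_text:
            return action
    return None

print(dispatch("Pointing_Up", transcript))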

Project Structure

  • app.py: Main file to run the project.
  • /evaluation: Folder containing the material used for the evaluation phase of the project.
    • /evaluation/evaluation_keys.py: File containing the alternative keys method, used to evaluate the project (object and color names are manipulated).
    • /evaluation/evaluation_speech.py: File containing the alternative speech method, used to evaluate the project (object and color names are manipulated).
    • /evaluation/main_evaluation.csv: File containing the evaluation data obtained during the evaluation trials.
    • /evaluation/accuracy_evaluation.py: File containing the accuracy evaluation.
    • /evaluation/runtime_evaluation.py: File containing the runtime evaluation.
  • requirements.txt: File containing the project dependencies.
  • /images: Folder containing the images saved for debugging purposes.
  • /utils: Folder containing the utility functions used in the project.
    • /utils/models: Folder containing the gesture recognition model used in the project (MediaPipe).
  • /docs: Folder containing the project documentation. Includes the report, presentation and demo video.

Future Work

Future enhancements include developing a mobile version, improving audio speech handling, adding more interaction methods, integrating a large language model (LLM) for richer interactions, and implementing features to remember and locate specific objects.
