INGRESS

Interactive Visual Grounding of Referring Expressions for Human Robot Interaction
Mohit Shridhar, David Hsu
RSS 2018

This is a docker image (~9.2GB) of my demo setup for grounding referring expressions. You can treat it as a black box; input: image & expression, output: bounding boxes and question captions. See Architecture for more details.

If you find the code useful, please cite:

@inproceedings{Shridhar-RSS-18, 
    author    = {Mohit Shridhar AND David Hsu}, 
    title     = {Interactive Visual Grounding of Referring Expressions for Human-Robot Interaction}, 
    booktitle = {Proceedings of Robotics: Science and Systems}, 
    year      = {2018}
}

Please also cite the works listed in the Acknowledgements.

Requirements

Software

Hardware

Tested on NVIDIA GTX 1080 (needs about 2.5 GB RAM)

Installation

The docker image contains: ROS (Indigo), Torch, Caffe, and Ingress (source code). To run and test Ingress inside the docker image, you don't need to install any dependencies other than nvidia-docker itself.

Nvidia Docker

Follow the instructions to install NVIDIA docker. You should be able to run this, if everything is installed properly:

$ docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi

Quickstart

A quick guide to testing the whole system inside the docker image.

Start Docker

Clone the repo OR unzip the folder:

$ git clone https://github.com/MohitShridhar/ingress.git

Run the script. The first time you run this command, Docker downloads a 9.2GB image (could take a while!):

$ cd <ingress_dir>
$ sh start_ingress.sh

Roscore

Start roscore in a tmux shell:

root@pc:/# tmux new -s roscore
root@pc:/# roscore

Press Ctrl+b, then d, to detach from the tmux shell.

Ingress

Start the INGRESS server in a tmux shell by running the ingress command:

root@pc:/# tmux new -s ingress
root@pc:/# ingress

Wait until you see METEOR initialized. That means the grounding server is ready. Now you can send images and expressions to the server, and receive grounded bounding boxes and question captions as output.

Test

Run the example in another tmux shell:

root@pc:/# tmux new -s test
root@pc:/# cd ~/ros_devel_ws/src/ingress/examples/
root@pc:/# python interactive_grounding_example.py

Type "the red cup" into the query. This writes grounding_result.png and prints self-referential and relational question captions:

[INFO] [WallTime: 1532576914.160205] Self Referential Captions:
['a red cup on the table', 'red cup on the table', 'red cup on the table']

[INFO] [WallTime: 1532576914.160599] Relational Captions:
['the red cup in the middle.', 'the red cup on the left.', 'the red cup on the right.']

To open grounding_result.png, on a separate shell:

$ docker cp <container_id>:/root/ros_devel_ws/src/ingress/examples/grounding_result.png ./
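The <container_id> placeholder above is the ID of the running Ingress container, which `docker ps` lists on the host. As a rough sketch, the lookup could be scripted as below. The helper names `parse_ps_lines` and `running_containers` are hypothetical and not part of Ingress; the image name printed next to each ID depends on what start_ingress.sh pulled, so pick the matching row by eye.

```python
import subprocess


def parse_ps_lines(text):
    """Turn 'ID IMAGE' lines into a list of (container_id, image) tuples."""
    pairs = []
    for line in text.splitlines():
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) == 2:
            pairs.append((parts[0], parts[1].strip()))
    return pairs


def running_containers():
    """Ask the docker CLI for running containers (run on the host, not in the image)."""
    out = subprocess.check_output(
        ["docker", "ps", "--format", "{{.ID}} {{.Image}}"]
    )
    return parse_ps_lines(out.decode("utf-8"))


# Usage on the host:
#   for container_id, image in running_containers():
#       print(container_id, image)
```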

Exit

To shut down the ingress server, use Ctrl + c or Ctrl + \.

Robot Setup

To integrate Ingress with real robots, use the docker image as a grounding server. First, compile the ROS actionlib interface on your robot or client-pc so that it can communicate with the Ingress server running inside the docker image.

Compile Interface

On your robot/client-pc, clone the interface repo:

$ cd <your_ros_workspace>/src
$ git clone --recursive https://github.com/MohitShridhar/ingress.git

Install actionlib messages:

$ cd <your_ros_workspace>
$ catkin_make --pkg action_controller

Network

Edit the start_ingress.sh script with your network settings:

...
MASTER_URI=http://<roscore_ip_addr>:11311
IP=<ingress_system_ip_addr>
...
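Before starting the server, it can help to confirm that the roscore port is reachable from the machine that will run Ingress. The following is a minimal sketch, not part of Ingress: `parse_master_uri` and `master_reachable` are hypothetical helper names, and the default port is taken from the MASTER_URI example above. A successful TCP connection only shows that something is listening on that port; it does not prove that roscore is healthy.

```python
import socket

try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2 (e.g. ROS Indigo)

DEFAULT_MASTER_PORT = 11311  # port used in the MASTER_URI example


def parse_master_uri(uri):
    """Split a MASTER_URI such as http://host:11311 into (host, port)."""
    parsed = urlparse(uri)
    if not parsed.hostname:
        raise ValueError("no host found in %r" % uri)
    return parsed.hostname, parsed.port or DEFAULT_MASTER_PORT


def master_reachable(uri, timeout=2.0):
    """Return True if a TCP connection to the roscore host and port succeeds."""
    host, port = parse_master_uri(uri)
    try:
        conn = socket.create_connection((host, port), timeout)
    except socket.error:  # refused, unreachable, unresolved, or timed out
        return False
    conn.close()
    return True


# Usage: substitute the address you put in start_ingress.sh.
#   print(master_reachable("http://<roscore_ip_addr>:11311"))
```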

Usage

Start roscore on your robot or client-pc. Then start ingress inside the docker image:

$ sh start_ingress.sh
root@pc:/# ingress

You should now be able to run the Quickstart example outside the docker image on all clients connected to roscore.

Options

Disambiguation

Disambiguation is enabled by default. It can be disabled by setting DISAMBIGUATE=false in ~/ingress_server.sh for fast grounding without disambiguation:

root@pc:/# sed -i 's/DISAMBIGUATE=true/DISAMBIGUATE=false/g' ~/ingress_server.sh
root@pc:/# ingress
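To switch disambiguation back on, the same substitution can be reversed. The sketch below does the edit in either direction from Python. It assumes the script contains the flag in the exact form DISAMBIGUATE=true or DISAMBIGUATE=false, which is the form the sed command relies on; `set_disambiguate` and `update_script` are hypothetical helper names.

```python
import os
import re


def set_disambiguate(script_text, enabled):
    """Return script_text with every DISAMBIGUATE=true/false flag set to `enabled`."""
    value = "true" if enabled else "false"
    return re.sub(
        r"\bDISAMBIGUATE=(true|false)\b",
        "DISAMBIGUATE=" + value,
        script_text,
    )


def update_script(path, enabled):
    """Rewrite the flag in place in the script at `path`."""
    path = os.path.expanduser(path)
    with open(path) as handle:
        text = handle.read()
    with open(path, "w") as handle:
        handle.write(set_disambiguate(text, enabled))


# Usage inside the docker image, followed by restarting `ingress`:
#   update_script("~/ingress_server.sh", enabled=True)
```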

Tips

Make sure the input image is well-lit, and the scene is uncluttered
Crop the image to exclude irrelevant parts of the scene (e.g: backdrop of the table) to reduce mis-detections
roscore should be up and running before you start the ingress server
Use tmux to multiplex roscore, ingress and python interactive_grounding_example.py
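For the cropping tip, one possible approach is to trim a fixed fraction from each edge before sending the image. The sketch below only computes the crop box; `crop_box` is a hypothetical helper, not part of Ingress. The returned tuple uses the (left, upper, right, lower) order that Pillow's `Image.crop` expects, assuming Pillow is the image library you use on the client.

```python
def crop_box(width, height, left=0.0, top=0.0, right=0.0, bottom=0.0):
    """Compute a (left, upper, right, lower) pixel box after trimming fractional margins."""
    margins = (("left", left), ("top", top), ("right", right), ("bottom", bottom))
    for name, frac in margins:
        if not 0.0 <= frac < 1.0:
            raise ValueError("%s margin must be in [0, 1)" % name)
    if left + right >= 1.0 or top + bottom >= 1.0:
        raise ValueError("margins remove the whole image")
    x0 = int(round(width * left))
    y0 = int(round(height * top))
    x1 = width - int(round(width * right))
    y1 = height - int(round(height * bottom))
    return (x0, y0, x1, y1)


# Usage with Pillow (assumed to be installed on the client):
#   from PIL import Image
#   img = Image.open("scene.png")
#   img.crop(crop_box(img.width, img.height, top=0.25)).save("scene_cropped.png")
```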

Caveats

This demo code doesn't contain the interactive (robot-pointing) question asking interface.
For grounding perspectives (e.g: 'my left', 'your right') see perspective correction guide.

Issues

If Lua complains that certain CUDA functions were not found during execution: remove the clean-up option --rm from the docker command in start_ingress.sh. Run the script and reinstall the rocks:

$ luarocks install cutorch
$ luarocks install cunn
$ luarocks install cudnn

Exit and docker commit the changes to the image.

Acknowledgements

Johnson et al., Densecap

@inproceedings{densecap,
  title={DenseCap: Fully Convolutional Localization Networks for Dense Captioning},
  author={Johnson, Justin and Karpathy, Andrej and Fei-Fei, Li},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and 
             Pattern Recognition},
  year={2016}
}

Nagaraja et al., Referring Expressions

@inproceedings{nagaraja16refexp,
  title={Modeling Context Between Objects for Referring Expression Understanding},
  author={Varun K. Nagaraja and Vlad I. Morariu and Larry S. Davis},
  booktitle={ECCV},
  year={2016}
}
