scan2cap-2d

Setup

Step 1 - Download & Unzip the Required Files

Download the HDF5 database for one of the following viewpoint types:

Annotated Viewpoint Database (Oracle Boxes):
- db_annotated.h5 (~3GB)
Annotated Viewpoint Database (MRCNN Detected Boxes):
- db_annotated_mrcnn.h5 (~0.5GB)
Estimated Viewpoint Database (3D-2D Backprojected):
- db_estimated.h5 (~2.5GB)
Bird's Eye Viewpoint Database (Top-Down):
- db_td.h5 (~11GB)

Download the ScanRefer train and validation splits:
- ScanRefer Download
Download the vocabulary, GloVe embeddings and word weights:
Pre-trained models: https://mega.nz/folder/PJARTYJC#JhlUw3zagS9ck402_5TyRw

Then unzip the downloaded files to your desired location. Each database contains the global features, object features, object bounding boxes, semantic labels and object IDs for each sample in the chosen ScanRefer split.
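To check what a downloaded database actually contains, it can help to list its datasets. The following is a minimal sketch, assuming the third-party h5py package is installed. The internal key layout of the .h5 files is not documented here, so the code makes no assumption about key names and simply walks whatever groups and datasets it finds. The helper names `describe` and `describe_file` are hypothetical and not part of this repository.

```python
def describe(node, prefix=""):
    """Recursively list datasets under an h5py File/Group (or any nested mapping)."""
    lines = []
    for name, item in node.items():
        path = "{}/{}".format(prefix, name)
        if hasattr(item, "items"):
            # Groups behave like mappings, so recurse into them.
            lines.extend(describe(item, path))
        else:
            # Datasets expose shape and dtype; report both.
            shape = getattr(item, "shape", None)
            dtype = getattr(item, "dtype", None)
            lines.append("{}  shape={}  dtype={}".format(path, shape, dtype))
    return lines


def describe_file(h5_path):
    """Open an HDF5 file read-only and describe its contents."""
    import h5py  # third-party dependency, imported lazily

    with h5py.File(h5_path, "r") as f:
        return describe(f)
```

Calling `print("\n".join(describe_file("db_annotated_mrcnn.h5")))` would then print one line per dataset, which is a quick way to confirm that the download is intact before training.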

Optional - Prepare Databases from Scratch

Alternatively, you can render the color images and instance masks yourself and use the code provided in preprocessing to build these databases. Make sure to set the IMAGE and INSTANCE_MASK paths in the conf.py file. The preprocessing module is invoked as follows:

python main.py --prep --exp_type $ET --dataset $DS --viewpoint $VP --box $BX

where the variables can take the following combinations:

$DS	$VP	$BX	Comments
scanrefer	annotated	oracle	Extracts oracle bounding boxes, bounding box features and global features from annotated viewpoints.
scanrefer	annotated	mrcnn	Extracts MaskRCNN detected bounding boxes, bounding box features and global features from annotated viewpoints.
scanrefer	estimated	votenet	Extracts votenet estimated bounding boxes, bounding box features and global features from estimated viewpoints.
scanrefer	topdown	oracle	Extracts bird's eye view bounding boxes, bounding box features and global features from bird's eye viewpoints.
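Since only four combinations are listed, a small wrapper can reject unsupported ones before a long preprocessing run starts. This is a sketch under stated assumptions: `prep_command` and `VALID_COMBINATIONS` are hypothetical names, the combinations are copied from the table above, and `--exp_type` defaults to `nret` as in the argument defaults of main.py. Note that the table spells the bird's eye viewpoint `topdown` while the evaluation commands use `bev`; check which spelling your copy of main.py accepts.

```python
import shlex

# (dataset, viewpoint, box) triples copied from the table above.
VALID_COMBINATIONS = {
    ("scanrefer", "annotated", "oracle"),
    ("scanrefer", "annotated", "mrcnn"),
    ("scanrefer", "estimated", "votenet"),
    ("scanrefer", "topdown", "oracle"),
}


def prep_command(dataset, viewpoint, box, exp_type="nret"):
    """Build the preprocessing command line, rejecting unlisted combinations."""
    if (dataset, viewpoint, box) not in VALID_COMBINATIONS:
        raise ValueError(
            "unsupported combination: {}/{}/{}".format(dataset, viewpoint, box)
        )
    argv = [
        "python", "main.py", "--prep",
        "--exp_type", exp_type,
        "--dataset", dataset,
        "--viewpoint", viewpoint,
        "--box", box,
    ]
    # Quote each argument so the string can be pasted into a shell safely.
    return " ".join(shlex.quote(a) for a in argv)


print(prep_command("scanrefer", "annotated", "mrcnn"))
```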

Step 2 - Install Required Packages

Code was tested with the following settings:

python==3.6.10
cudatoolkit==10.0.130
pytorch==1.2.0
torchvision==0.4.0

To set up the environment, run the following:

conda create -n scan2cap2d --file reqs_conda.txt && 
conda activate scan2cap2d && pip install -r reqs_pip.txt
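After activating the environment, you may want to confirm that it matches the tested settings. The sketch below is an illustration only: `installed_versions` and `mismatches` are hypothetical helpers, and the check covers just the Python, torch and torchvision versions listed above (the conda package `pytorch` is imported in Python as `torch`).

```python
import sys

# Versions reported as tested above.
TESTED = {"python": "3.6.10", "torch": "1.2.0", "torchvision": "0.4.0"}


def installed_versions():
    """Collect the running Python version and, if importable, torch/torchvision."""
    found = {"python": "{}.{}.{}".format(*sys.version_info[:3])}
    for name in ("torch", "torchvision"):
        try:
            module = __import__(name)
        except ImportError:
            found[name] = None  # package is not installed in this environment
        else:
            # Drop any local build suffix (the part after "+") before comparing.
            found[name] = str(getattr(module, "__version__", "")).split("+")[0]
    return found


def mismatches(found, tested=TESTED):
    """Return {name: (found, tested)} for every entry that differs."""
    return {k: (found.get(k), v) for k, v in tested.items() if found.get(k) != v}
```

Running `print(mismatches(installed_versions()))` inside the activated environment prints an empty dictionary when everything matches. A mismatch does not necessarily mean the code will fail, only that the setup differs from the tested one.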

Training and Evaluation

Set the following paths in scan2cap-2d/lib/conf.py based on your needs:

CONF.PATH.DATA_ROOT = '/home/user/data'
CONF.PATH.CODE_ROOT = '/home/user/code'
CONF.PATH.SCANNET_DIR = "/scannet/public/v2"
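A wrong path usually only surfaces once data loading starts, so a quick check before launching can save time. This is a minimal sketch, assuming the three entries are plain directory paths. A `SimpleNamespace` stands in for the repository's CONF object, whose actual type is not shown here, and `missing_dirs` is a hypothetical helper.

```python
import os
from types import SimpleNamespace

# Stand-in for CONF.PATH, filled with the example values from above.
PATHS = SimpleNamespace(
    DATA_ROOT="/home/user/data",
    CODE_ROOT="/home/user/code",
    SCANNET_DIR="/scannet/public/v2",
)


def missing_dirs(paths):
    """Return the names of entries that do not point to an existing directory."""
    return [
        name
        for name, value in sorted(vars(paths).items())
        if not os.path.isdir(value)
    ]


if __name__ == "__main__":
    for name in missing_dirs(PATHS):
        print("CONF.PATH.{} does not exist: {}".format(name, getattr(PATHS, name)))
```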

The following command-line arguments select the training and/or evaluation setup; the valid combinations are the same as in the preprocessing step.

ap.add_argument("--exp_type", default="nret", help="retrieval or nonretrieval")
ap.add_argument("--dataset", default="scanrefer", help="scanrefer or referit")
ap.add_argument("--viewpoint", default="annotated", help="annotated, estimated or bev")
ap.add_argument("--box", default="oracle", help="oracle, mrcnn or votenet")
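To see how these defaults fill in when only some flags are given, the four arguments can be placed in a standalone parser. This is a sketch for illustration: `build_parser` is a hypothetical wrapper, and the real main.py defines further arguments (such as `--train`, `--eval` and `--model`) that are omitted here.

```python
import argparse


def build_parser():
    """Parser holding only the four arguments quoted above."""
    ap = argparse.ArgumentParser(description="scan2cap-2d options (sketch)")
    ap.add_argument("--exp_type", default="nret", help="retrieval or nonretrieval")
    ap.add_argument("--dataset", default="scanrefer", help="scanrefer or referit")
    ap.add_argument("--viewpoint", default="annotated", help="annotated, estimated or bev")
    ap.add_argument("--box", default="oracle", help="oracle, mrcnn or votenet")
    return ap


# Passing an explicit list keeps the example independent of sys.argv.
args = build_parser().parse_args(["--viewpoint", "estimated", "--box", "votenet"])
print(args.exp_type, args.dataset, args.viewpoint, args.box)
# prints: nret scanrefer estimated votenet
```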

Training and Evaluation

python main.py --train --exp_type $ET --dataset $DS --viewpoint $VP --box $BX --model $MD --visual_feat $VF...

where $MD='snt' selects the Show and Tell model and $MD='satnt' selects the Top-down and Bottom-up Attention model. $VF can be any combination of the letters in 'GTC', which stand for GLOBAL, TARGET and CONTEXT features respectively. Note that $MD='snt' only allows for 'GT'. By default, $ET is set to 'nret', which stands for non-retrieval. To run a retrieval experiment, use $ET='ret'.

Other options include:

 --batch_size 128 
 --num_workers 16 
 --val_step 1000    
 --lr 1e-3
 --wd 1e-5
 --seed 42
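The rules for $MD and $VF described above can be checked before a run is launched. The sketch below rests on one assumption that should be stated loudly: it reads "only allows for 'GT'" as "snt cannot use CONTEXT features", because the reproduction commands also pass 'G' alone to snt. The helper `normalize_visual_feat` is hypothetical, and returning the letters in G, T, C order is a choice made for this example, not documented behavior of main.py.

```python
# Feature letters each model accepts. Treating snt as "G and/or T, no C"
# is an assumption about the README's wording, not confirmed behavior.
ALLOWED = {"snt": "GT", "satnt": "GTC"}


def normalize_visual_feat(model, visual_feat):
    """Validate a --visual_feat string and return it in G, T, C order."""
    if model not in ALLOWED:
        raise ValueError("unknown model: {!r}".format(model))
    feats = set(visual_feat)
    if not feats or len(feats) != len(visual_feat):
        raise ValueError("visual_feat must be a non-empty set of distinct letters")
    extra = feats - set(ALLOWED[model])
    if extra:
        raise ValueError(
            "model {!r} does not accept {}".format(model, "".join(sorted(extra)))
        )
    return "".join(c for c in "GTC" if c in feats)


print(normalize_visual_feat("satnt", "CTG"))
# prints: GTC
```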

Evaluation Only

python main.py --eval --exp_type $ET --dataset $DS --viewpoint $VP --box $BX --folder $EN

where $EN is the experiment directory name.

Reproducing the results

Here is a set of experiments reported in the Scan2Cap paper, with the commands to reproduce them. Please refer to Tables 6 and 8 in our paper for the experiment names: https://arxiv.org/pdf/2012.02206.pdf.

For the M2 and M2-RL results, please refer to the official Meshed-Memory Transformer.

Experiment	Command	CIDEr	BLEU-4	METEOR	ROUGE-L
{G, A, -, S&T}	`python main.py --eval --exp_type nret --dataset scanrefer --viewpoint annotated --box oracle --visual_feat 'G' --model 'snt' --folder exp2 --ckpt_path pretrained/ant_snt_g/model.pth`	51.65	13.53	20.36	46.92
{T, A, O, Retr}	`python main.py --eval --exp_type ret --dataset scanrefer --viewpoint annotated --box oracle --visual_feat 'T' --folder exp3`	30.64	9.74	18.93	41.26
{T+C, A, O, TD}	`python main.py --eval --exp_type nret --dataset scanrefer --viewpoint annotated --box oracle --visual_feat 'TC' --model 'satnt' --folder exp4 --ckpt_path pretrained/ant_td_tc/model.pth`	48.50	15	20.52	49.31
{G+T, A, O, S&T}	`python main.py --eval --exp_type nret --dataset scanrefer --viewpoint annotated --box oracle --visual_feat 'GT' --model 'snt' --folder exp5 --ckpt_path pretrained/ant_snt_gt/model.pth`	60.95	14.79	21.24	47.91
{G+T+C, A, O, TD}	`python main.py --eval --exp_type nret --dataset scanrefer --viewpoint annotated --box oracle --visual_feat 'GTC' --model 'satnt' --folder exp6 --ckpt_path pretrained/ant_td_gtc/model.pth`	20.34	7.47	16.54	40.25
{T+C, A, 2DM, TD}	`python main.py --eval --exp_type nret --dataset scanrefer --viewpoint annotated --box mrcnn --visual_feat 'TC' --model 'satnt' --folder exp7 --ckpt_path pretrained/ant_td_tc/model.pth`	27.00	12.32	18.61	46.49
{G+T, A, 2DM, S&T}	`python main.py --eval --exp_type nret --dataset scanrefer --viewpoint annotated --box mrcnn --visual_feat 'GT' --model 'snt' --folder exp8 --ckpt_path pretrained/ant_snt_gt/model.pth`	32.88	12.32	19.38	45.04
{G+T+C, A, 2DM, TD}	`python main.py --eval --exp_type nret --dataset scanrefer --viewpoint annotated --box mrcnn --visual_feat 'GTC' --model 'satnt' --folder exp9 --ckpt_path pretrained/ant_td_gtc/model.pth`	11.04	5.67	15.55	37.29
{T+C, E, 3DV, TD}	`python main.py --eval --exp_type nret --dataset scanrefer --viewpoint estimated --box votenet --visual_feat 'TC' --model 'satnt' --folder exp10 --ckpt_path pretrained/ant_td_tc/model.pth`	37.81	13.70	19.84	48.23
{G+T, E, 3DV, S&T}	`python main.py --eval --exp_type nret --dataset scanrefer --viewpoint estimated --box votenet --visual_feat 'GT' --model 'snt' --folder exp11 --ckpt_path pretrained/ant_snt_gt/model.pth`	39.46	12.22	19.65	44.62
{G+T+C, E, 3DV, TD}	`python main.py --eval --exp_type nret --dataset scanrefer --viewpoint estimated --box votenet --visual_feat 'GTC' --model 'satnt' --folder exp12 --ckpt_path pretrained/ant_td_gtc/model.pth`	15.26	6.60	16.00	38.87
{G, BEV, O, S&T}	`python main.py --eval --exp_type nret --dataset scanrefer --viewpoint bev --box oracle --visual_feat 'G' --model 'snt' --folder exp13 --ckpt_path pretrained/bev_g/model.pth`	27.31	12.02	18.70	46.82
{G+T, BEV, O, S&T}	`python main.py --eval --exp_type nret --dataset scanrefer --viewpoint bev --box oracle --visual_feat 'GT' --model 'snt' --folder exp14 --ckpt_path pretrained/bev_gt/model.pth`	30.41	13.89	19.37	48.18
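To run several of these evaluations in one go, the commands can be generated from a list instead of being pasted one by one. This is a sketch, not part of the repository: `eval_argv` and `run_all` are hypothetical helpers, only three non-retrieval rows from the table are included, and `dry_run=True` (the default here) only returns the command lines without executing anything.

```python
import shlex
import subprocess

# (viewpoint, box, visual_feat, model, folder, ckpt_path), copied from three rows above.
EXPERIMENTS = [
    ("annotated", "oracle", "G", "snt", "exp2", "pretrained/ant_snt_g/model.pth"),
    ("annotated", "oracle", "GT", "snt", "exp5", "pretrained/ant_snt_gt/model.pth"),
    ("bev", "oracle", "GT", "snt", "exp14", "pretrained/bev_gt/model.pth"),
]


def eval_argv(viewpoint, box, visual_feat, model, folder, ckpt_path):
    """Argument vector for one non-retrieval evaluation on ScanRefer."""
    return [
        "python", "main.py", "--eval",
        "--exp_type", "nret", "--dataset", "scanrefer",
        "--viewpoint", viewpoint, "--box", box,
        "--visual_feat", visual_feat, "--model", model,
        "--folder", folder, "--ckpt_path", ckpt_path,
    ]


def run_all(experiments, dry_run=True):
    """Return the command lines; execute them too when dry_run is False."""
    lines = []
    for exp in experiments:
        argv = eval_argv(*exp)
        lines.append(" ".join(shlex.quote(a) for a in argv))
        if not dry_run:
            # check=True stops the loop at the first failing evaluation.
            subprocess.run(argv, check=True)
    return lines


for line in run_all(EXPERIMENTS):
    print(line)
```

Passing the argument list directly to `subprocess.run` avoids shell quoting issues; the quoted strings are only produced for display.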
