Models

A selection of the best models is available for download from my Google Drive. After downloading, simply store the pre-trained model directories in either the vision/experiments or the captioning/experiments directory.
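As a minimal sketch of that step, the snippet below moves a downloaded model directory into place. The helper name `install_model` is hypothetical and not part of the repository; only the two `experiments` target directories come from the text above.

```python
import shutil
from pathlib import Path

# Target directories named in the text above, relative to the repository root.
EXPERIMENT_DIRS = {
    "vision": Path("vision") / "experiments",
    "captioning": Path("captioning") / "experiments",
}


def install_model(downloaded_dir, task, repo_root="."):
    """Move a downloaded model directory into <repo_root>/<task>/experiments.

    `install_model` is a hypothetical helper, not part of the repository.
    Returns the new location of the model directory.
    """
    if task not in EXPERIMENT_DIRS:
        raise ValueError(f"task must be one of {sorted(EXPERIMENT_DIRS)}")
    source = Path(downloaded_dir)
    if not source.is_dir():
        raise FileNotFoundError(f"{source} is not a directory")
    target_parent = Path(repo_root) / EXPERIMENT_DIRS[task]
    target_parent.mkdir(parents=True, exist_ok=True)
    target = target_parent / source.name
    if target.exists():
        # Refuse to overwrite an existing experiment directory.
        raise FileExistsError(f"{target} already exists")
    shutil.move(str(source), str(target))
    return target
```

For example, `install_model("Downloads/0006", "vision")` would place the directory at `vision/experiments/0006`. This assumes the downloaded directory is named after the model ID, which the text above does not confirm.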

A summary of the models and their results is given below.

Vision

Framewise CNN

The first model (with ID 0006), and the basis for many other experiments, is a framewise DenseNet-121 architecture. It can be evaluated with

python evaluate.py --model_id 0006 --backbone DenseNet121

.......

Two Stream

The two-stream model (with ID 0010) utilises two DenseNet-121 CNNs, one for flow and one for RGB. The model can be evaluated with

python evaluate.py --model_id 0010 --backbone DenseNet121 --flow twos

.......

R(2+1)D

The 3D CNN (with ID 0031) utilises an R(2+1)D architecture and can be evaluated with

python evaluate.py --model_id 0031 --backbone rdnet --window 8 --data_shape 224

The CNN is fine-tuned from weights pre-trained on Kinetics and only uses input images of 224 by 224 due to memory constraints.

.......

Temporal Pooling

The temporal pooling model (with ID 0028) utilises the pretrained framewise DenseNet-121 architecture (0006), and uses temporal max pooling. It can be evaluated with

python evaluate.py --model_id 0028 --backbone DenseNet121 --temp_pool mean --window 15 --backbone_from_id 0006 --feats_model 0006

By specifying --feats_model 0006, the model expects to read pre-extracted features from \data\features\$model_id$\. These features can be extracted by running something like the following

python evaluate.py --model_id 0006 --backbone DenseNet121 --save_feats
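To illustrate what temporal pooling over pre-extracted framewise features involves, here is a small pure-Python sketch. It is not the repository's implementation: the function name `temporal_pool`, the centred window, and the clipping at the sequence edges are all assumptions made for the example.

```python
def temporal_pool(frame_features, window, mode="mean"):
    """Pool each frame's features with its neighbours in a centred window.

    frame_features: list of equal-length feature vectors, one per frame.
    window: number of frames per window (clipped at the sequence edges).
    mode: "mean" or "max", applied independently to each feature dimension.

    Illustrative only; edge handling in the actual models may differ.
    """
    if mode not in ("mean", "max"):
        raise ValueError("mode must be 'mean' or 'max'")
    num_frames = len(frame_features)
    pooled = []
    for t in range(num_frames):
        start = t - window // 2
        stop = start + window
        window_feats = frame_features[max(0, start):min(num_frames, stop)]
        # zip(*...) groups values by feature dimension across the window.
        columns = list(zip(*window_feats))
        if mode == "mean":
            pooled.append([sum(col) / len(col) for col in columns])
        else:
            pooled.append([max(col) for col in columns])
    return pooled
```

With three frames of two-dimensional features, `temporal_pool([[1.0, 4.0], [3.0, 2.0], [5.0, 0.0]], window=3)` averages each frame with its immediate neighbours, so the middle frame becomes `[3.0, 2.0]`. Passing `mode="max"` takes the per-dimension maximum instead, giving `[5.0, 4.0]` for the same frame.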

.......

CNN - RNN

The CNN-RNN model (with ID 0042) utilises the pretrained framewise DenseNet-121 architecture (0006). It can be evaluated with

python evaluate.py --model_id 0042 --backbone DenseNet121 --temp_pool gru --window 30 --backbone_from_id 0006 --feats_model 0006 --freeze_backbone
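When evaluating several vision models in one go, the commands above can be collected in a small driver script. The sketch below is one possible way to do this. The flags are copied from the commands listed above, while the names `VISION_MODELS`, `build_command` and `evaluate_all` are hypothetical, and the script is assumed to be run from the directory that contains `evaluate.py`.

```python
import subprocess

# Extra flags per model ID, copied from the evaluation commands listed above.
VISION_MODELS = {
    "0006": ["--backbone", "DenseNet121"],
    "0010": ["--backbone", "DenseNet121", "--flow", "twos"],
    "0031": ["--backbone", "rdnet", "--window", "8", "--data_shape", "224"],
    "0028": ["--backbone", "DenseNet121", "--temp_pool", "mean",
             "--window", "15", "--backbone_from_id", "0006",
             "--feats_model", "0006"],
    "0042": ["--backbone", "DenseNet121", "--temp_pool", "gru",
             "--window", "30", "--backbone_from_id", "0006",
             "--feats_model", "0006", "--freeze_backbone"],
}


def build_command(model_id):
    """Return the argument list for evaluating one model (hypothetical helper)."""
    return ["python", "evaluate.py", "--model_id", model_id] + VISION_MODELS[model_id]


def evaluate_all(model_ids=None, dry_run=True):
    """Print, and optionally run, the evaluation command for each model ID."""
    commands = [build_command(m) for m in (model_ids or VISION_MODELS)]
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            # check=True stops at the first evaluation that fails.
            subprocess.run(command, check=True)
    return commands
```

Calling `evaluate_all()` only prints the commands. Pass `dry_run=False` to execute them. Models 0028 and 0042 read pre-extracted features, so the features for 0006 need to be saved first, as described under Temporal Pooling.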

Captioning

The CNN-RNN captioning model (with ID 0102) utilises the pretrained framewise DenseNet-121 architecture (0006), and expects the features to be pre-extracted (see Temporal Pooling above). This can be evaluated with

python evaluate_gnmt.py --model_id 0102 --num_hidden 256 --backbone_from_id 0006 --feats_model 0006

NOTE: The captioning scripts require the nlg-eval package. Please install it beforehand, as recommended by their README.
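A quick way to confirm the dependency is present before running the captioning scripts is sketched below. It assumes the nlg-eval package is importable under the module name `nlgeval`. The helper `is_importable` is hypothetical and not part of the repository.

```python
import importlib.util


def is_importable(module_name):
    """Return True if a top-level module can be found on the Python path."""
    return importlib.util.find_spec(module_name) is not None


# Assumption: nlg-eval installs a top-level module called `nlgeval`.
if not is_importable("nlgeval"):
    print("nlg-eval does not appear to be installed; "
          "see its README before running evaluate_gnmt.py")
```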