Models

A selection of the best models is available for download from my Google Drive. After downloading, store the pre-trained model directories in either the vision/experiments or the captioning/experiments directory.

A summary of the models and their results is below.

Vision

Framewise CNN

The first model (with ID 0006), which was the basis for many other experiments, is a framewise DenseNet-121 architecture. It can be evaluated with

python evaluate.py --model_id 0006 --backbone DenseNet121

.......

Two Stream

The two-stream model (with ID 0010) utilises two DenseNet-121 CNNs, one for flow and one for RGB. The model can be evaluated with

python evaluate.py --model_id 0010 --backbone DenseNet121 --flow twos
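For readers unfamiliar with two-stream models, the sketch below shows one common way to combine an RGB stream and a flow stream: weighted late fusion of the per-class probabilities. This README does not state how model 0010 merges its two streams, so the fusion rule, the function names `softmax` and `late_fusion`, and the `rgb_weight` parameter are all assumptions made for illustration, not the repository's implementation.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(rgb_logits, flow_logits, rgb_weight=0.5):
    """Weighted average of the two streams' class probabilities.

    ASSUMPTION: late fusion is used here only as an illustration; the
    actual two-stream model may combine its streams differently.
    """
    if len(rgb_logits) != len(flow_logits):
        raise ValueError("both streams must score the same number of classes")
    rgb_p = softmax(rgb_logits)
    flow_p = softmax(flow_logits)
    return [rgb_weight * r + (1.0 - rgb_weight) * f
            for r, f in zip(rgb_p, flow_p)]


if __name__ == "__main__":
    # Hypothetical scores for a two-class problem.
    print(late_fusion([2.0, 0.0], [0.0, 2.0]))
```

With equal weights, two streams that disagree symmetrically cancel out, which is why a fused model can be more conservative than either stream alone.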

.......

R(2+1)D

The 3D CNN (with ID 0031) utilises an R(2+1)D architecture and can be evaluated with

python evaluate.py --model_id 0031 --backbone rdnet --window 8 --data_shape 224

The CNN is fine-tuned from Kinetics pre-training and only uses 224 by 224 input images due to memory constraints.

.......

Temporal Pooling

The temporal pooling model (with ID 0028) utilises the pretrained framewise DenseNet-121 architecture (0006), and uses temporal max pooling. It can be evaluated with

python evaluate.py --model_id 0028 --backbone DenseNet121 --temp_pool mean --window 15 --backbone_from_id 0006 --feats_model 0006

By specifying --feats_model 0006, the model expects to read pre-extracted features from \data\features\$model_id$\. These features can be extracted by running something like the following

python evaluate.py --model_id 0006 --backbone DenseNet121 --save_feats
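To make the idea of temporal pooling over pre-extracted features concrete, here is a minimal sketch in plain Python. It pools each feature dimension over a window of neighbouring frames, with either `max` or `mean`. The function name `temporal_pool`, the centred window, and the truncation at sequence edges are assumptions for illustration; the repository's own pooling layer may handle windows and padding differently.

```python
def temporal_pool(frame_feats, window, mode="max"):
    """Pool per-frame feature vectors over a window centred on each frame.

    frame_feats: list of equal-length feature vectors, one per frame.
    window:      number of frames in the window (e.g. 15).
    mode:        "max" or "mean".

    ASSUMPTION: the window is centred and truncated at the sequence edges.
    """
    if mode not in ("max", "mean"):
        raise ValueError("mode must be 'max' or 'mean'")
    half = window // 2
    n = len(frame_feats)
    pooled = []
    for t in range(n):
        lo = max(0, t - half)
        hi = min(n, t + half + 1)
        chunk = frame_feats[lo:hi]
        # zip(*chunk) regroups values by feature dimension.
        columns = list(zip(*chunk))
        if mode == "max":
            pooled.append([max(col) for col in columns])
        else:
            pooled.append([sum(col) / len(col) for col in columns])
    return pooled


if __name__ == "__main__":
    # Three hypothetical frames with two features each.
    feats = [[1.0, 4.0], [3.0, 2.0], [5.0, 0.0]]
    print(temporal_pool(feats, window=3, mode="max"))
    print(temporal_pool(feats, window=3, mode="mean"))
```

Because the pooling only reads the saved features, the backbone does not need to be run again for each pooling experiment.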

.......

CNN - RNN

The CNN-RNN model (with ID 0042) utilises the pretrained framewise DenseNet-121 architecture (0006). It can be evaluated with

python evaluate.py --model_id 0042 --backbone DenseNet121 --temp_pool gru --window 30 --backbone_from_id 0006 --feats_model 0006 --freeze_backbone

Captioning

The CNN-RNN captioning model (with ID 0102) utilises the pretrained framewise DenseNet-121 architecture (0006), and expects the features to be pre-extracted (see Temporal Pooling above). This can be evaluated with

python evaluate_gnmt.py --model_id 0102 --num_hidden 256 --backbone_from_id 0006 --feats_model 0006
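Since the captioning model depends on features from model 0006, the two steps must run in order: extract features first, then evaluate. The sketch below wraps the two commands from this README in a small Python helper. The helper names (`extract_features_cmd`, `captioning_eval_cmd`, `run_pipeline`) and the `dry_run` flag are hypothetical, and the working directory each script must be launched from is not stated here, so treat that as something to adjust locally.

```python
import subprocess
import sys


def extract_features_cmd(model_id="0006", backbone="DenseNet121"):
    """Command that saves framewise features (flags copied from this README)."""
    return [sys.executable, "evaluate.py",
            "--model_id", model_id,
            "--backbone", backbone,
            "--save_feats"]


def captioning_eval_cmd(model_id="0102", feats_model="0006", num_hidden=256):
    """Command that evaluates the captioning model on saved features."""
    return [sys.executable, "evaluate_gnmt.py",
            "--model_id", model_id,
            "--num_hidden", str(num_hidden),
            "--backbone_from_id", feats_model,
            "--feats_model", feats_model]


def run_pipeline(commands, dry_run=True):
    """Print each command and, unless dry_run is set, execute it in order.

    ASSUMPTION: each script resolves from the current working directory.
    check=True stops the pipeline if a step fails.
    """
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_pipeline([extract_features_cmd(), captioning_eval_cmd()], dry_run=True)
```

Running with `dry_run=True` only prints the commands, which is a cheap way to check the flags before starting a long evaluation.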

NOTE: The captioning scripts require the nlg-eval package. Please install it beforehand, as recommended by its README.
