miniCLIP

Implementation of CLIP model with a reduced capacity. For self-educational purposes only.

[Figure: clip_summary]

This repo currently contains only a CLIP-ResNet implementation, while the original paper describes five ResNet and three ViT models. There was no intention to beat SotA or train a superior version of CLIP; this is just an attempt to understand the logic behind CLIP.
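For intuition about that logic: CLIP trains an image encoder and a text encoder jointly with a symmetric contrastive loss over a batch of image-text pairs. The PyTorch sketch below is illustrative only and is not taken from this repo's code; the encoder outputs are assumed to already be batched feature tensors.

import torch
import torch.nn.functional as F

def clip_loss(image_features, text_features, temperature=0.07):
    # L2-normalize both embedding sets so dot products become cosine similarities
    image_features = F.normalize(image_features, dim=-1)
    text_features = F.normalize(text_features, dim=-1)

    # Pairwise similarity matrix: logits[i, j] = sim(image_i, text_j) / temperature
    logits = image_features @ text_features.t() / temperature

    # The matching caption for each image (and vice versa) sits on the diagonal
    targets = torch.arange(logits.size(0), device=logits.device)

    # Symmetric cross-entropy: pick the right text for each image and the right image for each text
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2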

Preliminary results

After training CLIP-ResNet50 for 10 epochs, the following results were obtained.

As can be seen, the results are not great, but the model is definitely learning to assign higher similarity to the correct image-text pairs.

Example usage

Train

To run the training, first download the COCO dataset and provide the paths to the annotations and images for both the train and val splits in a config (check the example here); a minimal config sketch is shown after the command below. After that, run:

python tools/train.py --path_to_config=configs/clip_base.yaml --path_to_log=logs/
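For orientation, such a config might look roughly like the sketch below. The field names here are illustrative assumptions only; the actual schema is defined by configs/clip_base.yaml in the repository.

# Illustrative sketch only: key names are assumptions, not the repo's real schema.
# See configs/clip_base.yaml for the actual config structure.
dataset:
  train:
    images: /path/to/coco/train2017
    annotations: /path/to/coco/annotations/captions_train2017.json
  val:
    images: /path/to/coco/val2017
    annotations: /path/to/coco/annotations/captions_val2017.json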

Each training run creates its own directory structure (an experiment directory) under logs/:

logs/
  |--{experiment_name}/
      |--artifacts/
      |--checkpoints/
      |--train.log
      |--{experiment_name}.yaml               

A training_progress.log containing the train and validation losses is saved under logs/{experiment_name}/artifacts/. Each training run also saves the resulting (overridden) config under the logs/{experiment_name}/ directory.

Plot similarity matrices

To plot similarity matrices on the validation dataset, run:

python tools/plot_similarities.py --path_to_config=logs/{experiment_name}/{experiment_name}.yaml \
                                  --path_to_ckpt=logs/{experiment_name}/checkpoints/some_ckpt.pth \
                                  --n_pairs=8 \
                                  --n_matricies=5

Here, n_matricies denotes the number of similarity matrices to create, and n_pairs denotes the number of image-text pairs to include in each similarity matrix. All similarity matrices will be saved under logs/{experiment_name}/artifacts/.
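Conceptually, each such matrix is just the pairwise cosine similarity between the embeddings of n_pairs images and their n_pairs captions. A rough sketch of that computation (illustrative only, not this repo's actual code):

import torch
import torch.nn.functional as F

@torch.no_grad()
def similarity_matrix(image_features, text_features):
    # Normalize so the dot product equals cosine similarity
    image_features = F.normalize(image_features, dim=-1)
    text_features = F.normalize(text_features, dim=-1)
    # Rows are images, columns are texts; for a well-trained model the diagonal dominates
    return image_features @ text_features.t()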
