Skip to content

Data generation, model training and inference for Visual Font Recognition using PyTorch

License

Notifications You must be signed in to change notification settings

Dexterp37/fontina

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

55 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

fontina

fontina is a PyTorch library that helps with training models for the task of Visual Font Recognition and doing inference with them.

Feature highlights

Using fontina

Installing the dependencies

Starting from a cloned repository directory:

# Create a virtual environment: this uses venv but any system
# would work!
python -m venv .venv

# Activate the virtual environment: this depends on the OS. See
# the two options below.
# .venv/Scripts/activate # Windows
source .venv/bin/activate # Linux

# Install the dependencies needed to use .
pip install .

Note Windows users must manually install the CUDA-based version of PyTorch, as pip will only install the CPU version on this platform. See PyTorch Get Started for the specific command to ru, which should be something along the lines of pip install torch torchvision --index-url https://download.pytorch.org/whl/cu117.

(Optional) - Installing development dependencies

The following dependencies are only needed to develop fontina.

# Install the developer dependencies.
pip install .[linting]

# Run linting and tests!
make lint
make test

Generating a synthetic dataset

If needed, the model can be trained on synthetic data. fontina provides a synthetic dataset generator that follows part of the recommendations from the DeepFont paper to make the synthetic data look closer to the real data. To use the generator:

  1. Make a copy of configs/sample.yaml, e.g. configs/mymodel.yaml
  2. Open configs/mymodel.yaml and tweak the fonts section:
fonts:

  # ...

  # Force an uniform white background for the generated images.
  # backgrounds_path: "assets/backgrounds"

  samples_per_font: 1000

  # Fill in the paths of the fonts that need to be used to generate
  # the data.
  classes:
    - name: Test Font
      path: "assets/fonts/test/Test.ttf"
    - name: Other Test Font
      path: "assets/fonts/test2/Test2.ttf"
  1. Run the generation:
fontina-generate -c configs/mymodel.yaml -o outputs/font-images/mymodel

After this completes, there should be one directory per configured font in outputs/font-images/mymodel.

Training

fontina currently only supports training a DeepFont-like architecture. The training process has two major steps: unsupervised training of the stacked autoencoders and supervised training of the full network.

Before starting, make a copy of configs/sample.yaml, e.g. configs/mymodel.yaml (or use the existing one that was created for the dataset generation step).

Part 1 - Unsupervised training

  1. Open configs/mymodel.yaml and tweak the training section:
training:
  # Set this to True to train the stacked autoencoders.
  only_autoencoder: True

  # Don't use an existing checkpoint for the unsupervised training.
  # scae_checkpoint_file: "outputs/models/autoenc/good-checkpoint.ckpt"

  data_root: "outputs/font-images/mymodel"

  # The directory that will contain the model checkpoints.
  output_dir: "outputs/models/mymodel-scae"

  # The size of the batch to use for training.
  batch_size: 128

  # The initial learning rate to use for training.
  learning_rate: 0.01

  epochs: 20

  # Whether or not to use a fraction of the data to run a
  # test cycle on the trained model.
  run_test_cycle: True
  1. Then run the training with:
python src/fontina/train.py -c configs/mymodel.yaml

Part 2 - Supervised training

  1. Open configs/mymodel.yaml (or create a new one!) and tweak the training section:
training:
  # This will freeze the stacked autoencoders.
  only_autoencoder: False

  # Pick the best checkpoint from the unsupervised training.
  scae_checkpoint_file: "outputs/models/mymodel-scae/good-checkpoint.ckpt"

  data_root: "outputs/font-images/mymodel"

  # The directory that will contain the model checkpoints.
  output_dir: "outputs/models/mymodel-full"

  # The size of the batch to use for training.
  batch_size: 128

  # The initial learning rate to use for training.
  learning_rate: 0.01

  epochs: 20

  # Whether or not to use a fraction of the data to run a
  # test cycle on the trained model.
  run_test_cycle: True
  1. Then run the training with:
fontina-train -c configs/mymodel.yaml

(Optional) - Monitor performance using TensorBoard

fontina automatically captures the performances of the training runs in a TensorBoard-compatible way. It should be possible to visualize the recorded data by pointing TensorBoard to the logs directory as follows:

tensorboard --logdir=lightning_logs

Inference

Once training is complete, the resulting model can be used to run inference.

fontina-predict -n 6 -w "outputs/models/mymodel-full/best_checkpoint.ckpt" -i "assets/images/test.png"

AdobeVFR Pre-trained model

The AdobeVFR dataset is currently available for download at Dropbox, here. The license for using and distributing the dataset is available here, which cites:

This dataset ('Licensed Material') is made available to the scientific community for non-commercial research purposes such as academic research, teaching, scientific publications or personal experimentation.

The model, being trained on that dataset, retains the same spirit and the same license applies: the release model can only be used for non-commercial purposes.

How to train

  1. Download the dataset to assets/AdobeVFR
  2. Unpack assets/AdobeVFR/Raw Image/VFR_real_u/scrape-wtf-new.zip in that directory so that the assets/AdobeVFR/Raw Image/VFR_real_u/scrape-wtf-new/ path exists
  3. Run fontina-train -c configs/adobe-vfr-autoencoder.yaml. This will take a long while but progress can be checked with Tensorboard (see the previous sections) during training
  4. Change configs/adobe-vfr.yaml so that scae_checkpoint_file points to the best checkpoint from step (3).
  5. Run fontina-train -c configs/adobe-vfr.yaml. This will take a long while (but less than the unsupervised training round)

Downloading the models

While only the full model is needed, the stand-alone autonencoder model is being released as well.

Note The pre-trained model achieves a validation loss of 0.3523, with an accuracy of 0.8855 after 14 epochs. Unfortunately the test performance on VFR_real_test is much worse, with a top-1 accuracy of 0.05. I'm releasing the model in the hope that somebody could help me fixing this 😊😅

About

Data generation, model training and inference for Visual Font Recognition using PyTorch

Topics

Resources

License

Stars

Watchers

Forks

Packages

No packages published