
Lossy Image Compression using Hierarchical VAEs

This repository contains the authors' implementation of several deep-learning-based methods for lossy image compression.
This project is under active development.

Models

Implemented Methods (Pre-Trained Models Available)

  • Lossy Image Compression with Quantized Hierarchical VAEs [arxiv] [cvf] [ieee]
  • QARV: Quantization-Aware ResNet VAE for Lossy Image Compression [arxiv] [ieee]
    • Published at TPAMI 2023
    • Abstract: an improved version of the previous model, with variable-rate coding, faster decoding, and better rate-distortion performance.
    • [Code & pre-trained models]: lossy-vae/lvae/models/qarv
  • An Improved Upper Bound on the Rate-Distortion Function of Images [arxiv] [ieee]
    • Published at ICIP 2023
    • Abstract: a 15-layer VAE model used to estimate the information rate-distortion function R(D). The estimate shows that a BD-rate of -30% w.r.t. VTM is theoretically achievable.
    • [Code & pre-trained models]: lossy-vae/lvae/models/rd

Features

Progressive coding: our models learn a deep hierarchy of latent variables and compress/decompress images in a coarse-to-fine fashion. This feature comes from the hierarchical nature of ResNet VAEs.
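
As a conceptual sketch (the names here are illustrative only, not this repository's API), coarse-to-fine progressive decoding amounts to reconstructing from ever-larger prefixes of the latent hierarchy:

```python
# Conceptual illustration of progressive (coarse-to-fine) decoding in a
# hierarchical VAE. `latents` and `decode` are hypothetical names.

def progressive_reconstructions(latents, decode):
    """Yield one reconstruction per latent level, from coarse to fine.

    `latents` is an ordered list of latent variables (coarsest first);
    `decode` maps any prefix of that list to a reconstruction.
    """
    for k in range(1, len(latents) + 1):
        yield decode(latents[:k])  # use only the first k latent levels
```

Each yielded reconstruction refines the previous one, which is what allows a partially received bitstream to be decoded into a coarse preview.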

Compression performance: our models are competitive in both rate-distortion performance and decoding speed. Please see the results section below.

Results

Bpp-PSNR results in JSON format

Notes on metric computation:

  • Bpp and PSNR are first computed for each image and then averaged over all images in a dataset.
  • Bpp is the compressed file size (in bits) divided by the number of image pixels.
  • PSNR is computed in RGB space (not YUV).
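
The metric definitions above can be written out directly. The sketch below (not taken from the repository) shows the per-image Bpp and RGB-space PSNR computations that are then averaged over the dataset:

```python
import numpy as np

def bpp(file_size_bytes: int, height: int, width: int) -> float:
    """Bits per pixel: compressed file size in bits divided by pixel count."""
    return file_size_bytes * 8 / (height * width)

def psnr_rgb(im1: np.ndarray, im2: np.ndarray) -> float:
    """PSNR computed directly on RGB uint8 arrays with values in [0, 255]."""
    mse = np.mean((im1.astype(np.float64) - im2.astype(np.float64)) ** 2)
    return 10 * np.log10(255.0 ** 2 / mse)

# Dataset-level numbers are means of per-image values (not pooled):
# avg_bpp = mean(bpp_i over images), avg_psnr = mean(psnr_i over images)
```

Note that averaging per-image PSNR is not the same as computing PSNR from the pooled MSE; the convention above matches the first note.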

Encoding/decoding latency on CPU/GPU, and BD-rate

| Model name | CPU* enc. | CPU* dec. | 3080 Ti enc. | 3080 Ti dec. | BD-rate* (lower is better) |
|------------|-----------|-----------|--------------|--------------|----------------------------|
| qres34m    | 0.899s    | 0.441s    | 0.116s       | 0.083s       | -3.95%                     |
| qarv_base  | 0.757s    | 0.295s    | 0.096s       | 0.063s       | -7.26%                     |

*Time is the latency to encode/decode a 512x768 image, averaged over the 24 Kodak images. Tested in plain PyTorch (v1.13 + CUDA 11.7) code, i.e., no mixed precision, TorchScript, ONNX/TensorRT, etc.
*CPU is an Intel 10700K.
*BD-rate is w.r.t. VTM 18.0, averaged on three common test sets (Kodak, Tecnick TESTIMAGES, and CLIC 2022 test set).
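
BD-rate is the standard Bjøntegaard-delta metric. A common way to compute it (a sketch, not the repository's evaluation code) fits a cubic polynomial to each codec's (PSNR, log-rate) curve and compares the curves' average log-rates over the overlapping PSNR range:

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjontegaard-delta rate: average % bitrate change of `test` relative
    to `anchor` over the overlapping PSNR range (negative = test is better)."""
    # Fit cubic polynomials to log-rate as a function of PSNR.
    p_a = np.polyfit(psnr_anchor, np.log(rate_anchor), 3)
    p_t = np.polyfit(psnr_test, np.log(rate_test), 3)
    # Overlapping PSNR interval of the two curves.
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    # Average log-rate of each curve over [lo, hi], via the antiderivative.
    int_a, int_t = np.polyint(p_a), np.polyint(p_t)
    avg_a = (np.polyval(int_a, hi) - np.polyval(int_a, lo)) / (hi - lo)
    avg_t = (np.polyval(int_t, hi) - np.polyval(int_t, lo)) / (hi - lo)
    # Convert the average log-rate gap back to a percentage rate change.
    return (np.exp(avg_t - avg_a) - 1) * 100
```

For example, a codec whose rate is uniformly 10% lower than the anchor at every PSNR has a BD-rate of -10%.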

Install

Requirements:

Download and Install:

  1. Download the repository.
  2. Modify the dataset paths in lossy-vae/lvae/paths.py.
  3. [Optional] pip install the repository in development mode:
cd /path/to/lossy-vae
python -m pip install -e .

Usage

Get pre-trained weights

from lvae import get_model
model = get_model('qarv_base', pretrained=True) # weights are downloaded automatically
model.eval()
model.compress_mode(True) # initialize entropy coding

Compress images

Encode an image:

model.compress_file('/path/to/image.png', '/path/to/compressed.bits')

Decode an image:

im = model.decompress_file('/path/to/compressed.bits')
# im is a torch.Tensor of shape (1, 3, H, W). RGB. pixel values in [0, 1].
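
For illustration, a tensor with that layout can be converted to a standard (H, W, 3) uint8 RGB array for saving or inspection. The snippet below simulates the decoded output with random data rather than calling the model:

```python
import numpy as np

# Simulate the decoded output: shape (1, 3, H, W), RGB, values in [0, 1].
im = np.random.rand(1, 3, 64, 96)

# Drop the batch dim, move channels last, and scale to uint8 for saving.
arr = np.clip(np.rint(im[0].transpose(1, 2, 0) * 255), 0, 255).astype(np.uint8)
# arr has shape (H, W, 3) and can be passed to e.g. PIL.Image.fromarray(arr).
```

With the real model, replace `im` by `model.decompress_file(...)` converted to NumPy (e.g. `im.numpy()`); the shape/range conventions are the same.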

Datasets

COCO

  1. Download the COCO dataset "2017 Train images [118K/18GB]" from https://cocodataset.org/#download
  2. Unzip the images anywhere, e.g., at /path/to/datasets/coco/train2017
  3. Edit lossy-vae/lvae/paths.py such that
known_datasets['coco-train2017'] = '/path/to/datasets/coco/train2017'

Kodak (link), Tecnick TESTIMAGES (link), and CLIC (link)

python scripts/download-dataset.py --name kodak         --datasets_root /path/to/datasets
python scripts/download-dataset.py --name clic2022-test --datasets_root /path/to/datasets
python scripts/download-dataset.py --name tecnick       --datasets_root /path/to/datasets

Then, edit lossy-vae/lvae/paths.py such that known_datasets['kodak'] = '/path/to/datasets/kodak', and similarly for other datasets.

Custom Dataset

  1. Prepare a folder of images. It should contain only image files (subfolders are allowed).
  2. Edit lossy-vae/lvae/paths.py such that known_datasets['custom-name'] = '/path/to/my-dataset', where custom-name is the name of your dataset, and /path/to/my-dataset is the path to the folder containing images.
  3. Then, you can use custom-name as the dataset name in the training/evaluation scripts.
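
A quick way to verify step 1 is to scan the folder for non-image files. The helper below is a hypothetical convenience (not part of the repository), with an assumed extension list:

```python
from pathlib import Path

# Assumed set of image extensions; extend as needed for your data.
IMAGE_EXTS = {'.png', '.jpg', '.jpeg', '.bmp', '.webp', '.tiff'}

def non_image_files(root):
    """Return all files under `root` (recursively) that are not images."""
    return [p for p in Path(root).rglob('*')
            if p.is_file() and p.suffix.lower() not in IMAGE_EXTS]
```

An empty result means the folder satisfies the "only images" requirement.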

Training and evaluation scripts

Training and evaluation scripts vary from model to model. For example, qres34m uses a fixed-rate training/evaluation scheme, while qarv_base uses a variable-rate one.
Detailed training/evaluation instructions are provided in each model's subfolder (see the section Models).

License

Code in this repository is freely available for non-commercial use.
