ml-zip: File compression using Machine Learning

A package for data compression using Machine Learning.

Waring: 🏗️ Under construction 0.0.1 pre-alpha

Usecase

You have a large amount of computational resources but a terrible internet bandwidth to send files.
You want to get started on the Hutter Prize.

Changelog

0.0.1:
- Add Arithmetic Coding (unlimited precisions) method in plein Python using int64 (or int32).
- Add the following estimators:
  - Fixed Frequency, Adaptive Frequency (implemented using Fenwick tree).
  - LSTM (used PyTorch).

Usage

import sys
from mlzip.arithmetic_coding import ArithmeticCoding
from mlzip.estimators import PPMEstimator, LSTM

tiny_model = {
  'context_width': 32,
  'embed_size': 16,
  'hidden_size': 32,
  'num_layers': 2,
  'epochs': 30,
  'lr': 1e-2
}

if __name__ == '__main__':
  with open('example.txt', 'r') as f:
    text = f.read()
  model_type = LSTM
  estimator = PPMEstimator(stream_type='char',
                           context_width=tiny_model.get('context_width'),
                           model_type=model_type,
                           model_configs=tiny_model)
  org_size = len(text)*8
  print('Original size {} bits.'.format(org_size))
  estimator.fit(text)
  ac = ArithmeticCoding(estimator)
  encoded, length = ac.encode(text)
  decoded = ac.decode(encoded, length)
  assert ''.join(decoded) == text
  encoded_size = len(encoded)
  model_size = sys.getsizeof(estimator.model.state_dict())*8
  print('The file was encoded to {} bits with the parameter size of {} bits'
        '.'.format(encoded_size, model_size))
  print('Compression ratio: {:.2f}.'.format(org_size/(model_size+encoded_size)))

Author

honghaipvu at gmail dot com

References

The implementation of infinite precision Arithmetic Coding using integers follows Mark Nelson's guide.

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 41 Commits
.github/workflows		.github/workflows
examples		examples
mlzip		mlzip
tests		tests
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

.github/workflows

.github/workflows

examples

examples

mlzip

mlzip

tests

tests

.gitignore

.gitignore

LICENSE

LICENSE

README.md

README.md

requirements.txt

requirements.txt

Repository files navigation

ml-zip: File compression using Machine Learning

Waring: 🏗️ Under construction 0.0.1 pre-alpha

Usecase

Changelog

Usage

Author

References

License

About

Releases

Packages

Languages

License

HongHaiPV/ml-zip

Folders and files

Latest commit

History

Repository files navigation

ml-zip: File compression using Machine Learning

Waring: 🏗️ Under construction 0.0.1 pre-alpha

Usecase

Changelog

Usage

Author

References

License

About

Topics

Resources

License

Stars

Watchers

Forks

Languages