Releases: joeynmt/joeynmt
v2.3
- introduced DistributedDataParallel (DDP) training
- implemented language tags, see notebooks/torchhub.ipynb
- released an IWSLT14 de-en-fr multilingual model (trained using DDP)
- special symbols definition refactoring
- configuration refactoring
- autocast refactoring
- bugfixes
- upgrade to python 3.11, torch 2.1.2
- documentation refactoring
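Multilingual models such as the IWSLT14 de-en-fr model are commonly conditioned on a target-language tag token. The sketch below shows one common scheme, prepending the tag to the source tokens and stripping it again before detokenization. The "<xx>" tag format and the helper names `add_lang_tag` and `strip_lang_tag` are hypothetical illustrations, not Joey NMT's API; see notebooks/torchhub.ipynb for the actual usage.

```python
from typing import Iterable, List


def add_lang_tag(src_tokens: List[str], trg_lang: str) -> List[str]:
    """Prepend a target-language tag (hypothetical "<xx>" format) to the source."""
    return [f"<{trg_lang}>"] + src_tokens


def strip_lang_tag(tokens: List[str], langs: Iterable[str]) -> List[str]:
    """Drop a leading tag only if it is one of the known language tags.

    Checking against the known set avoids stripping other special
    symbols such as "<unk>".
    """
    tags = {f"<{lang}>" for lang in langs}
    if tokens and tokens[0] in tags:
        return tokens[1:]
    return tokens


src = "Guten Morgen !".split()
tagged = add_lang_tag(src, "fr")
print(tagged)                                    # ['<fr>', 'Guten', 'Morgen', '!']
print(strip_lang_tag(tagged, ["de", "en", "fr"]))  # ['Guten', 'Morgen', '!']
```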
v2.2
v2.1
- upgrade to python 3.10, torch 1.12
- replace NVIDIA's amp with PyTorch's amp package for Automatic Mixed Precision
- replace discord.py with pycord in the Discord Bot demo
- data iterator refactoring (#189, #190, #191)
- migrate to PyTorch's torch.testing.assert_close to check tensors in unit tests
- add WMT14 en-de / de-en benchmark models trained from scratch with Joey NMT v2
- bugfixes
Joey NMT 2.0
Breaking changes:
- upgrade to python 3.9, torch 1.11
- torchtext.legacy dependencies are completely replaced by torch.utils.data
- joeynmt/tokenizers.py: handles tokenization internally (also supports bpe-dropout!)
- joeynmt/datasets.py: loads data from plaintext, tsv, and Hugging Face's datasets
- scripts/build_vocab.py: trains subwords, creates joint vocab
- enhancement in decoding
  - scoring with hypotheses or references
  - repetition penalty, n-gram blocker
  - attention plots for Transformers
- yapf, isort, flake8 introduced
- bugfixes, minor refactoring
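The n-gram blocker listed under the decoding enhancements prevents a hypothesis from repeating an n-gram it has already produced. Below is a minimal, framework-free sketch of the idea, assuming a hypothesis is a list of token ids and scores are a plain list of log-probabilities indexed by token id. The names `blocked_tokens` and `apply_ngram_block` are hypothetical; a real decoder would apply the same check to a score tensor during beam search.

```python
from typing import Dict, List, Set, Tuple


def blocked_tokens(hyp: List[int], n: int) -> Set[int]:
    """Return token ids that would complete an n-gram already present in hyp."""
    if n < 1 or len(hyp) < n - 1:
        return set()
    # Map each (n-1)-token prefix seen so far to the tokens that followed it.
    seen: Dict[Tuple[int, ...], Set[int]] = {}
    for i in range(len(hyp) - n + 1):
        prefix = tuple(hyp[i:i + n - 1])
        seen.setdefault(prefix, set()).add(hyp[i + n - 1])
    # The last n-1 tokens are the prefix that the next token would extend.
    current = tuple(hyp[len(hyp) - n + 1:]) if n > 1 else ()
    return seen.get(current, set())


def apply_ngram_block(log_probs: List[float], hyp: List[int], n: int) -> List[float]:
    """Set the score of every blocked token to -inf so it can never be chosen."""
    banned = blocked_tokens(hyp, n)
    return [float("-inf") if tok in banned else lp for tok, lp in enumerate(log_probs)]


hyp = [1, 2, 3, 1, 2]
print(blocked_tokens(hyp, 3))                   # {3}
print(apply_ngram_block([-1.0] * 5, hyp, 3))    # [-1.0, -1.0, -1.0, -inf, -1.0]
```

Emitting token 3 next would repeat the trigram (1, 2, 3), so only that token is masked; all other candidates keep their scores.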
Requirements update
- six >= 1.12
Beam search & checkpointing improvements, dependency update
- upgrade to sacrebleu 2.0, python 3.7, torch 1.8
- bug fixes:
  - heaps in checkpoint maintenance #153
  - beam search stopping criterion #149
  - removing final BPE merge markers in hypotheses (dsfsi/masakhane-web#33)
  - keeping best and last checkpoints #136
  - using utf encoding when opening files #150
- f-string formatting
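One of the fixes above concerns BPE merge markers left in hypotheses. Assuming subword-nmt style markers ("@@" at the end of every non-final subword), a naive replacement of "@@ " misses a marker dangling on the very last token, for example when decoding stops mid-word. The sketch below handles both cases; the function name `remove_bpe` is an illustrative choice, not necessarily Joey NMT's.

```python
def remove_bpe(sentence: str, marker: str = "@@") -> str:
    """Merge subword-nmt style BPE tokens back into words.

    "hel@@ lo" -> "hello"; a trailing marker with no following
    subword is dropped as well.
    """
    # Join each marked subword with the token that follows it.
    merged = sentence.replace(marker + " ", "")
    # Drop a marker left dangling at the end of the hypothesis.
    if merged.endswith(marker):
        merged = merged[: -len(marker)]
    return merged


print(remove_bpe("the hel@@ lo wor@@ ld"))  # the hello world
print(remove_bpe("the hel@@ lo wor@@"))     # the hello wor
```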
n-best decoding, checkpointing, dependency updates
You can now retrieve the n-best outputs during inference (rather than only the single best translation) and track the latest checkpoint (for continuing training). We also added a Colab notebook for training a small translation model on the Tatoeba task. Joey NMT now runs on PyTorch v1.8.0 and uses the deprecated torchtext dataset implementations (torchtext.legacy) from torchtext v0.9.
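The checkpoint changes in these releases (heap-based checkpoint maintenance, keeping the best and last checkpoints, tracking the latest checkpoint) can be illustrated with a small bookkeeping class. This is a sketch under stated assumptions: higher validation scores are better (e.g. BLEU), checkpoints are identified by training step, and the class and method names are hypothetical. It only tracks which steps to keep and does not read or delete files.

```python
import heapq
from typing import List, Optional, Set, Tuple


class CheckpointKeeper:
    """Track the k best checkpoints by validation score plus the latest one.

    A min-heap holds the k best (score, step) pairs, so the worst of
    them sits at index 0 and can be replaced in O(log k).
    """

    def __init__(self, keep_best: int) -> None:
        if keep_best < 1:
            raise ValueError("keep_best must be >= 1")
        self.keep_best = keep_best
        self._heap: List[Tuple[float, int]] = []  # (score, step)
        self.latest: Optional[int] = None

    def add(self, step: int, score: float) -> None:
        """Register a new checkpoint; it always becomes the latest."""
        self.latest = step
        if len(self._heap) < self.keep_best:
            heapq.heappush(self._heap, (score, step))
        elif score > self._heap[0][0]:
            # Beats the worst of the current best-k: swap it in.
            heapq.heapreplace(self._heap, (score, step))

    def best(self) -> List[int]:
        """Steps of the kept best checkpoints, best first."""
        return [step for _, step in sorted(self._heap, reverse=True)]

    def to_keep(self) -> Set[int]:
        """All steps that must stay on disk: the best-k plus the latest."""
        keep = {step for _, step in self._heap}
        if self.latest is not None:
            keep.add(self.latest)
        return keep


keeper = CheckpointKeeper(keep_best=2)
for step, bleu in [(100, 20.1), (200, 22.5), (300, 21.0), (400, 19.8)]:
    keeper.add(step, bleu)
print(keeper.best())              # [200, 300]
print(sorted(keeper.to_keep()))   # [200, 300, 400]
```

Keeping the latest checkpoint separately from the best-k means training can resume from the most recent state even when that state scored poorly on validation.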
1.0
Pre-release v0.9
Stable recurrent and Transformer models. Minor changes and refactoring might happen before v1.0.