Releases · joeynmt/joeynmt

upgrade to python 3.9, torch 1.11
torchtext.legacy dependencies are completely replaced by torch.utils.data
joeynmt/tokenizers.py: handles tokenization internally (also supports bpe-dropout!)
joeynmt/datasets.py: loads data from plaintext, tsv, and huggingface's datasets
scripts/build_vocab.py: trains subwords, creates joint vocab
enhancements in decoding:
- scoring with hypotheses or references
- repetition penalty, n-gram blocker
- attention plots for transformers
yapf, isort, flake8 introduced
bugfixes, minor refactoring
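
For readers unfamiliar with the repetition penalty and n-gram blocker mentioned in the decoding enhancements, here is a minimal pure-Python sketch of the two ideas. It is not JoeyNMT's implementation: the function names, the list-of-log-probabilities input, and the greedy `pick_next` helper are all assumptions made for illustration.

```python
import math


def blocked_next_tokens(prefix, n):
    """Tokens that would complete an n-gram already present in `prefix`."""
    if n < 1 or len(prefix) < n - 1:
        return set()
    # The last n-1 tokens form the context of the n-gram about to be completed.
    context = tuple(prefix[len(prefix) - (n - 1):]) if n > 1 else ()
    blocked = set()
    for i in range(len(prefix) - n + 1):
        if tuple(prefix[i:i + n - 1]) == context:
            blocked.add(prefix[i + n - 1])
    return blocked


def apply_repetition_penalty(log_probs, prefix, penalty):
    """Make already generated tokens less likely (hypothetical helper).

    Negative scores are multiplied by `penalty`, positive ones divided,
    so a penalty above 1.0 always lowers the score.
    """
    scores = list(log_probs)
    for tok in set(prefix):
        if scores[tok] < 0:
            scores[tok] *= penalty
        else:
            scores[tok] /= penalty
    return scores


def pick_next(log_probs, prefix, n=3, penalty=1.2):
    """Greedy choice of the next token id after applying both constraints."""
    scores = apply_repetition_penalty(log_probs, prefix, penalty)
    for tok in blocked_next_tokens(prefix, n):
        scores[tok] = -math.inf  # never repeat an existing n-gram
    return max(range(len(scores)), key=scores.__getitem__)
```

In a beam search the same masking would be applied per hypothesis before the top-k selection; the greedy version above only keeps the example short.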


18 Jan 02:25

joeynmt

1.5

092c504

Requirements update

six >= 1.12


18 Jan 02:17

joeynmt

1.4

2fb88cd

Beam search & checkpointing improvements, dependency update

upgrade to sacrebleu 2.0, python 3.7, torch 1.8
bug fixes:
- heaps in checkpoint maintenance #153
- beam search stopping criterion #149
- removing final BPE merge markers in hypotheses (dsfsi/masakhane-web#33)
- keeping best and last ckpts #136
- using utf encoding when opening files #150
f-string formatting
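
The checkpoint fixes above (heap-based maintenance, keeping the best and the last checkpoints) can be pictured with a small sketch. This is an illustration under assumptions, not the code from the release: the `CheckpointKeeper` class, its method names, and the convention that a higher score is better are all hypothetical.

```python
import heapq


class CheckpointKeeper:
    """Keep the `keep_best` highest-scoring checkpoints plus the latest one."""

    def __init__(self, keep_best=2):
        self.keep_best = keep_best
        self._heap = []     # min-heap of (score, step, path); worst kept entry on top
        self.latest = None  # path of the most recent checkpoint

    def kept(self):
        """Paths that must currently stay on disk."""
        paths = {path for _, _, path in self._heap}
        if self.latest is not None:
            paths.add(self.latest)
        return paths

    def add(self, score, step, path):
        """Register a new checkpoint and return the paths that may be deleted."""
        before = self.kept()
        heapq.heappush(self._heap, (score, step, path))
        if len(self._heap) > self.keep_best:
            heapq.heappop(self._heap)  # drop the lowest-scoring entry
        self.latest = path
        return sorted(before - self.kept())
```

Usage: call `add` after each validation run and delete whatever it returns. A checkpoint that is neither among the best nor the latest is reported exactly once.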


14 Apr 03:40

juliakreutzer

1.3

42ad588

n-best decoding, checkpointing, dependency updates

You can now retrieve the n-best outputs during inference (rather than only the single best translation) and track the latest checkpoint (to continue training). We also added a Colab notebook for training a small translation model on the Tatoeba task. JoeyNMT now runs on Torch v1.8.0 and uses the deprecated Torchtext dataset implementations from v0.9.
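
As a rough picture of what n-best retrieval means, the sketch below ranks finished beam hypotheses and returns the top n instead of only the first. It does not use JoeyNMT's API: the `n_best` function, the `(tokens, sum_log_prob)` tuple layout, and the simple length normalisation controlled by `alpha` are assumptions for illustration.

```python
import heapq


def n_best(hypotheses, n, alpha=0.0):
    """Return the n highest-scoring hypotheses, best first.

    hypotheses: list of (tokens, sum_log_prob) pairs from a finished beam.
    alpha: strength of an assumed length normalisation; 0.0 disables it.
    """
    def score(hyp):
        tokens, log_prob = hyp
        if alpha > 0:
            return log_prob / (max(len(tokens), 1) ** alpha)
        return log_prob

    return heapq.nlargest(n, hypotheses, key=score)
```

With `n=1` this reduces to ordinary one-best decoding. Raising `alpha` favours longer hypotheses, because their summed log-probabilities are divided by a larger factor.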
