Transformer-XL

Model architecture

This repo is associated with the blog post "Transformer-XL: A Memory-Augmented Transformer" over at sigmoid prime. It contains a lightweight implementation of the Transformer-XL architecture, proposed by Dai et al. in the paper "Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context" (2019).

Inspired by Andrej Karpathy's minGPT, the implementation focuses on simplicity by distilling the architecture down to its bare minimum, so you won't find any code for training the model on large text corpora or anything like that. Like minGPT, the implementation only comes with a script that trains the model to sort sequences of unordered numbers (see the sketch below for what such a toy task looks like). I think this makes the implementation friendlier to people who simply wish to understand the architecture.
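For reference, a toy sorting task of this kind usually boils down to generating random sequences and asking the model to predict their sorted order autoregressively. The snippet below is a minimal sketch of that idea, not the repo's actual train.py; the function name, tensor shapes, and hyperparameters are illustrative assumptions.

import torch

def make_sorting_batch(batch_size, seq_len, num_values):
    # Random unsorted sequences, e.g. [3, 1, 2] with target [1, 2, 3]
    x = torch.randint(num_values, (batch_size, seq_len))
    y, _ = torch.sort(x, dim=1)
    # The model sees the unsorted sequence followed by the sorted prefix
    # and is trained to predict the sorted half one token at a time.
    inp = torch.cat([x, y[:, :-1]], dim=1)
    target = y
    return inp, target

inp, target = make_sorting_batch(batch_size=32, seq_len=8, num_values=10)
print(inp.shape, target.shape)  # torch.Size([32, 15]) torch.Size([32, 8])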

To train the model, simply run:

python3 transformer-xl/train.py

followed by the command below to evaluate it:

python3 transformer-xl/eval.py

The eval.py script evaluates the model both with and without the "memory augmentation" proposed by Dai et al. Depending on how long you set the sequence length and memory length, you should see the memory-augmented version run significantly faster than the memory-free model.
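Conceptually, the memory augmentation amounts to caching each layer's hidden states from the previous segment and prepending them, with gradients stopped, to the keys and values of the current segment, so the model attends over a context longer than the segment itself. The sketch below illustrates that idea only; it is not the repo's code, it uses standard (not relative) attention, and all names and shapes are assumptions.

import torch
from torch import nn

attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)

def attend_with_memory(h, mem):
    # mem holds hidden states cached from the previous segment (no gradient),
    # extending the context the current segment's queries can attend to.
    context = h if mem is None else torch.cat([mem.detach(), h], dim=1)
    out, _ = attn(h, context, context)  # queries come from the current segment only
    new_mem = h.detach()                # cache this segment for the next step
    return out, new_mem

mem = None
for segment in torch.randn(3, 8, 16, 64):  # 3 segments: batch 8, length 16, dim 64
    out, mem = attend_with_memory(segment, mem)

The real Transformer-XL additionally uses relative positional encodings and keeps a separate memory per layer, but the caching-and-concatenation step above is the core of the mechanism.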
