Jax implementation of "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models"

dtunai/Griffin-Jax

Griffin Jax

Jax + Flax implementation of the paper "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models".

A hybrid model that mixes gated linear recurrences with local attention.
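
To make the two ingredients concrete, here is a minimal JAX sketch, not the code from this repository. It assumes a simplified gate of the form h_t = a_t * h_{t-1} + (1 - a_t) * x_t as a stand-in for the paper's recurrent layer, which uses its own gate parameterization. The function names `gated_linear_recurrence` and `local_causal_mask` and the `window` parameter are hypothetical and chosen only for illustration.

```python
import jax
import jax.numpy as jnp


def gated_linear_recurrence(x, a):
    """Simplified gated linear recurrence (illustrative assumption, not Griffin's exact layer).

    Computes h_t = a_t * h_{t-1} + (1 - a_t) * x_t along the time axis.
    x: (seq_len, dim) inputs; a: (seq_len, dim) gates with values in (0, 1).
    """
    def step(h_prev, inputs):
        x_t, a_t = inputs
        h_t = a_t * h_prev + (1.0 - a_t) * x_t
        return h_t, h_t  # carry the new state and also emit it as output

    h0 = jnp.zeros(x.shape[-1], dtype=x.dtype)
    _, hs = jax.lax.scan(step, h0, (x, a))
    return hs


def local_causal_mask(seq_len, window):
    """Boolean (seq_len, seq_len) mask for causal sliding-window attention.

    Position i may attend to position j only if i - window < j <= i.
    """
    idx = jnp.arange(seq_len)
    offset = idx[:, None] - idx[None, :]  # offset[i, j] = i - j
    return (offset >= 0) & (offset < window)


if __name__ == "__main__":
    x = jnp.ones((4, 2))
    a = jnp.full((4, 2), 0.5)
    print(gated_linear_recurrence(x, a))
    print(local_causal_mask(4, 2))
```

The recurrence keeps a fixed-size state per channel, while the mask limits each attention query to the most recent `window` positions. Neither cost grows with the full sequence length at each step, which is the general intuition behind mixing the two.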

Key Information About Architecture

  • Griffin-3B outperforms Mamba-3B, and Griffin-7B and Griffin-14B achieve performance competitive with Llama-2, despite being trained on nearly 7 times fewer tokens.
  • Griffin can extrapolate on sequences significantly longer than those seen during training.

TODOs:

  • [ ] Usage and training code will be added to the repository.
