🤘 TT-NN operator library and TT-Metalium low-level kernel programming model.

ttnn logo

TT-NN is a Python & C++ neural network op library.


## Grayskull (GS) Models

| Model                  | Batch | End-to-end throughput [1] | Device throughput [2] | Target |
|------------------------|-------|---------------------------|-----------------------|--------|
| ResNet-50 (fps)        | 20    | 2,850                     | 7,200                 | 10,000 |
| BERT-Large (sen/s)     | 12    | 362                       | 406                   | 410    |
| Falcon7B-decode (t/s)  | 32    | 135                       | 135                   | 140    |
| ViT (fps)              | 8     | 480                       | 1,570                 | 2,000  |
| T5 small (sen/s)       |       | 140                       |                       |        |
| Bloom (sen/s)          |       | 70                        |                       |        |
| U-Net                  |       | coming soon               |                       |        |

[1] - Observed from the host. Includes dispatch overhead and kernel execution time.

[2] - Ignoring host overhead. Kernel execution time only.
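
To make the distinction concrete, the sketch below shows how an end-to-end number of this kind can be measured: time the full host-side loop (so dispatch overhead is counted, per [1]) and divide the samples processed by the elapsed time. `run_model` is a hypothetical callable standing in for one batched inference; it is not part of the TT-NN API.

```python
import time

def end_to_end_throughput(run_model, batch_size, iterations=100):
    """Samples per second as observed from the host, including dispatch overhead."""
    start = time.perf_counter()
    for _ in range(iterations):
        run_model()  # one batched inference call (hypothetical callable)
    elapsed = time.perf_counter() - start
    return batch_size * iterations / elapsed
```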

## Wormhole (WH) Models

| Model                        | Gen. Token [3] | Batch | End-to-end throughput [1] | Device throughput [2] | Target   |
|------------------------------|----------------|-------|---------------------------|-----------------------|----------|
| Falcon7B-decode              | 129th          | 32    | 11.6 t/s/u - 371 t/s      | 15.4 t/s/u - 493 t/s  | 21 t/s/u |
| Mistral-7B-decode            | 33rd           | 32    | 10.9 t/s/u - 349 t/s      | 13.3 t/s/u - 426 t/s  | 21 t/s/u |
| Mamba-2.8B-decode            | any            | 32    | 9.2 t/s/u - 295 t/s       | 13.1 t/s/u - 419 t/s  | 22 t/s/u |
| BERT-Large (sen/s)           | any            | 8     | 270                       | 340                   | 400      |
| Stable Diffusion 1.4 512x512 |                | 1     | coming soon               |                       |          |

[3] - Generating the i'th token in a sequence while the kv_cache is filled with i-1 rows.
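
The decode rows report throughput both per user (t/s/u) and in aggregate (t/s); the two are related by the batch size. A quick check against the Falcon7B-decode row above:

```python
# tokens per second per user (t/s/u) times batch size = aggregate tokens per second (t/s)
t_s_u = 11.6
batch = 32
print(t_s_u * batch)  # 371.2, matching the 371 t/s end-to-end figure for Falcon7B-decode
```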

## T3000 (2x4 mesh of WHs) Models

| Model              | Technique       | Gen. Token [3] | Batch | End-to-end throughput [1] | Device throughput [2] | Target   |
|--------------------|-----------------|----------------|-------|---------------------------|-----------------------|----------|
| Falcon7B-decode    | Data Parallel   | 129th          | 256   | 4.4 t/s/u - 1114 t/s      | coming soon           | 21 t/s/u |
| LLaMA-2-70B-decode | Tensor Parallel | 129th          | 32    | 8.4 t/s/u - 269 t/s       | 13.8 t/s/u - 441 t/s  | 20 t/s/u |
| LLaMA-3-70B-decode | Tensor Parallel | 129th          | 32    | 2.4 t/s/u - 75.4 t/s      | 7.7 t/s/u - 246.4 t/s | 20 t/s/u |
| Falcon40B-decode   | Tensor Parallel | 129th          | 32    | 1.5 t/s/u - 48 t/s        | 14.0 t/s/u - 448 t/s  | 30 t/s/u |
| Mixtral7Bx8-decode | Tensor Parallel | 129th          | 32    | 3.6 t/s/u - 114 t/s       | 23.5 t/s/u - 752 t/s  | 28 t/s/u |
| ResNet50           | Data Parallel   |                |       | coming soon               |                       |          |

## Using TT-NN ops and tensors

```python
import ttnn
import torch

with ttnn.manage_device(device_id=0) as device:
    a = torch.ones((5, 7))
    b = torch.ones((1, 7))

    # Move the torch tensors onto the device as bfloat16 tensors in tile layout
    a = ttnn.from_torch(a, device=device, dtype=ttnn.bfloat16, layout=ttnn.TILE_LAYOUT)
    b = ttnn.from_torch(b, device=device, dtype=ttnn.bfloat16, layout=ttnn.TILE_LAYOUT)

    # Broadcasted element-wise add runs on the device
    output = a + b

    # Copy the result back to a host-side torch tensor
    output = ttnn.to_torch(output)

print(output)
```
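
The same flow works with named TT-NN ops rather than operator overloading. The sketch below is illustrative only; it assumes `ttnn.matmul` accepts two device tensors as shown, so check the TT-NN documentation for the exact signature in your release.

```python
# Illustrative sketch: calling a named TT-NN op on device tensors.
# Assumes ttnn.matmul(x, w) takes two device tensors; verify against your TT-NN version.
import ttnn
import torch

with ttnn.manage_device(device_id=0) as device:
    x = ttnn.from_torch(torch.randn(32, 64), device=device,
                        dtype=ttnn.bfloat16, layout=ttnn.TILE_LAYOUT)
    w = ttnn.from_torch(torch.randn(64, 128), device=device,
                        dtype=ttnn.bfloat16, layout=ttnn.TILE_LAYOUT)

    y = ttnn.matmul(x, w)   # matrix multiply executes on the device
    y = ttnn.to_torch(y)    # copy the result back to the host

print(y.shape)  # torch.Size([32, 128])
```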

TT-Metalium logo

TT-Metalium is our low-level programming model, enabling kernel development for Tenstorrent hardware.

## Getting started

Get started with simple kernels.