PyTorch 2.0 TTNN Compiler

This project lets you run PyTorch code on Tenstorrent hardware.

Supported Models

The table below summarizes the results of running various ML models through our TTNN compiler. For each model, we track whether the run was successful, the number of operations before and after conversion, the number of to_device and from_device operations, performance metrics, and accuracy.

Model	Status	Torch Ops Before (Unique Ops)	Torch Ops Remain (Unique Ops)	To/From Device Ops	Original Run Time (ms)	Compiled Run Time for 5th Iteration (ms)	Accuracy (%)
Autoencoder (linear)	✅	22 (3)	0 (0)	0	1575.35	16.59	100.0
BERT	✅	1393 (21)	0 (0)	0	78247.2	4606.08	99.69
Bloom	✅	1403 (26)	0 (0)	0	66246.2	7669.73	39.93
DPR	✅	720 (22)	0 (0)	3	4998.62	1038.84	99.29
Llama	✅	40 (11)	0 (0)	2	315874	129085.12	100.0
Mnist	✅	14 (8)	0 (0)	1	3619.95	34.24	99.65
MobileNetV2	✅	154 (9)	0 (0)	0	858.73	2077.15	4.39
OpenPose V2	✅	155 (7)	0 (0)	6	3033.13	1511.88	91.47
Perceiver IO	✅	1531 (20)	0 (0)	1	47016.1	4402.13	99.95
ResNet18	✅	70 (9)	0 (0)	1	2016.68	531.94	30.71
ResNet50	✅	176 (9)	0 (0)	1	4427.25	2142.46	4.56
RoBERTa	✅	719 (21)	0 (0)	3	17674.1	3541.67	98.64
SqueezeBERT	✅	16 (9)	0 (0)	3	2957	214.69	100.0
U-Net	✅	68 (6)	0 (0)	12	40975.2	886.76	100.0
Unet-brain	✅	68 (6)	0 (0)	12	41298.9	1036.0	N/A
Unet-carvana	✅	67 (5)	0 (0)	12	87709.4	2015.57	99.69
YOLOv5	✅	3 (3)	0 (0)	0	24098.3	16862.82	100.0
albert/albert-base-v2	✅	791 (21)	0 (0)	3	2453.77	478.27	68.8
albert/albert-base-v2-classification	✅	779 (21)	0 (0)	3	2198.49	414.17	99.96
albert/albert-large-v2	✅	1547 (21)	0 (0)	3	4352.92	961.66	24.89
albert/albert-xlarge-v2	✅	1547 (21)	0 (0)	3	11059.1	1512.75	52.29
distilbert-base-uncased	✅	361 (16)	0 (0)	2	3253.31	569.18	99.7
dla34.in1k	✅	135 (9)	0 (0)	23	3859.25	1067.17	8.94
ghostnet_100.in1k	✅	515 (14)	0 (0)	64	1066.86	2092.37	23.36
mobilenet_v2	✅	154 (9)	0 (0)	0	806.08	2093.1	4.39
mobilenet_v3_large	✅	188 (11)	0 (0)	0	729.55	2382.69	12.02
mobilenet_v3_small	✅	158 (11)	0 (0)	0	496.91	1173.0	27.11
mobilenetv1_100.ra4_e3600_r224_in1k	✅	85 (7)	0 (0)	0	1530.42	1410.52	12.9
regnet_x_16gf	✅	235 (8)	0 (0)	0	13663	5987.96	19.64
regnet_x_1_6gf	✅	195 (8)	0 (0)	0	1613.7	2386.98	14.69
regnet_x_32gf	✅	245 (8)	0 (0)	0	27664.5	10535.78	12.48
regnet_x_3_2gf	✅	265 (8)	0 (0)	0	2930.71	3255.91	5.15
regnet_x_400mf	✅	235 (8)	0 (0)	0	982.05	2021.08	9.37
regnet_x_800mf	✅	175 (8)	0 (0)	0	1124.19	1620.73	2.32
regnet_x_8gf	✅	245 (8)	0 (0)	0	7339.88	5816.71	-0.21
regnet_y_1_6gf	✅	447 (10)	0 (0)	0	2453.78	3135.43	-4.82
regnet_y_32gf	✅	335 (10)	0 (0)	0	28765.1	22205.53	-0.94
regnet_y_3_2gf	✅	351 (10)	0 (0)	0	4485.61	3654.79	14.92
regnet_y_400mf	✅	271 (10)	0 (0)	0	786.47	1750.87	0.25
regnet_y_800mf	✅	239 (10)	0 (0)	0	1042.28	1764.32	-1.17
regnet_y_8gf	✅	287 (10)	0 (0)	0	8009.46	5233.2	-0.58
resnet101	✅	346 (9)	0 (0)	1	7096.04	3322.34	5.82
resnet152	✅	516 (9)	0 (0)	1	10619	4952.24	-0.28
resnet18	✅	70 (9)	0 (0)	1	2115.1	515.46	14.68
resnet34	✅	126 (9)	0 (0)	1	3831.47	935.81	21.92
resnet50	✅	176 (9)	0 (0)	1	4231.32	1805.75	4.56
resnext101_32x8d	✅	346 (9)	0 (0)	1	14751.1	9919.81	0.38
resnext101_64x4d	✅	346 (9)	0 (0)	1	14121.1	9670.26	1.42
resnext50_32x4d	✅	176 (9)	0 (0)	1	5031.2	2738.28	7.63
textattack/albert-base-v2-imdb	✅	782 (22)	0 (0)	3	2215.94	413.81	100.0
tf_efficientnet_lite0.in1k	✅	149 (9)	0 (0)	5	1122.02	4527.09	-1.81
tf_efficientnet_lite1.in1k	✅	194 (9)	0 (0)	5	1630.98	5943.64	0.78
tf_efficientnet_lite2.in1k	✅	194 (9)	0 (0)	5	2102.1	6717.82	1.34
twmkn9/albert-base-v2-squad2	✅	783 (23)	0 (0)	3	2694.76	418.46	98.39
vgg11	✅	33 (8)	0 (0)	5	11938.8	1381.73	99.8
vgg11_bn	✅	41 (9)	0 (0)	5	9073.45	1529.05	99.3
vgg13	✅	37 (8)	0 (0)	5	18361.8	1528.98	99.88
vgg13_bn	✅	47 (9)	0 (0)	5	18914.6	1636.33	99.05
vgg16	✅	43 (8)	0 (0)	5	21301.2	1554.33	99.7
vgg16_bn	✅	56 (9)	0 (0)	5	22906.2	1703.47	98.21
vgg19	✅	49 (8)	0 (0)	5	24318	1679.34	99.52
vgg19_bn	✅	65 (9)	0 (0)	5	26595.2	1874.33	97.44
wide_resnet101_2	✅	346 (9)	0 (0)	1	24316.4	6052.75	3.58
wide_resnet50_2	✅	176 (9)	0 (0)	1	12397.3	3173.83	5.52
xception71.tf_in1k	✅	393 (9)	0 (0)	0	17342.8	14550.41	4.21
Autoencoder (conv)	🚧	9 (3)	1 (1)	1	1333.74	28.74	100.0
Autoencoder (conv)-train	🚧	24 (7)	17 (5)	0	2045.57	28.6	100.0
Autoencoder (linear)-train	🚧	104 (8)	26 (3)	0	2330.18	53.04	100.0
CLIP	🚧	1395 (29)	7 (6)	5	4186.07	1938.31	94.18
DETR	🚧	1655 (39)	32 (9)	2	90907.1	20170.71	46.57
Falcon	🚧	71 (6)	3 (2)	1	112150	30695.47	100.0
GLPN-KITTI	🚧	2959 (26)	22 (2)	6	92654.9	68326.28	99.74
GPT-2	🚧	745 (29)	27 (5)	2	6059.89	1053.73	100.0
Hand Landmark	🚧	N/A	N/A	N/A	6410.15	96.79	N/A
HardNet	🚧	245 (10)	2 (1)	122	4720.1	1876.07	5.37
HardNet-train	🚧	867 (21)	480 (11)	120	11826	9294.22	100.0
MLPMixer	🚧	253 (11)	25 (2)	0	5629.32	5558.04	99.97
MLPMixer-train	🚧	616 (19)	127 (8)	0	15743.2	13175.24	100.0
Mnist-train	🚧	46 (15)	18 (7)	0	3688.31	75.92	100.0
OpenPose V2-train	🚧	523 (14)	385 (9)	6	11167.6	8976.47	100.0
ResNet18-train	🚧	241 (19)	176 (12)	0	5761.27	4686.67	100.0
ResNet50-train	🚧	616 (19)	470 (12)	0	12892.5	11209.34	100.0
SegFormer	🚧	676 (22)	16 (1)	4	49314.9	3844.17	99.49
SegFormer-train	🚧	1780 (35)	175 (14)	4	79430	38565.07	100.0
U-Net-train	🚧	236 (15)	178 (10)	8	79460.4	41081.79	100.0
Unet-brain-train	🚧	236 (15)	178 (10)	8	82861	41420.09	100.0
Unet-carvana-train	🚧	232 (13)	175 (9)	8	185243	99886.68	100.0
ViLT	🚧	42 (16)	8 (6)	3	22717	18379.27	87.8
XGLM	🚧	1432 (28)	29 (5)	1	18435	6313.93	95.48
YOLOS	🚧	952 (27)	17 (2)	6	14815.9	7645.16	97.52
YOLOv3	🚧	250 (7)	2 (1)	4	227878	3695.41	98.74
albert/albert-xxlarge-v2	🚧	791 (21)	24 (1)	3	22409.7	2233.31	22.25
densenet121	🚧	432 (10)	3 (1)	594	3811.8	2845.11	18.16
densenet161	🚧	572 (10)	3 (1)	1144	8958.98	6461.31	18.66
densenet169	🚧	600 (10)	3 (1)	1238	4748.89	5261.89	69.9
densenet201	🚧	712 (10)	3 (1)	1902	4801.65	6433.33	23.71
dla34.in1k-train	🚧	469 (18)	334 (11)	17	11036	7742.59	100.0
ese_vovnet19b_dw.ra_in1k	🚧	111 (12)	3 (1)	16	2299.37	876.07	34.46
ese_vovnet19b_dw.ra_in1k-train	🚧	360 (25)	227 (12)	16	4708.11	4186.09	100.0
facebook/deit-base-patch16-224	🚧	685 (17)	1 (1)	2	14383.2	2216.11	98.19
facebook/deit-base-patch16-224-train	🚧	1854 (27)	151 (9)	2	78630.7	6074.55	100.0
ghostnet_100.in1k-train	🚧	1468 (33)	845 (18)	64	1510.52	3272.11	100.0
ghostnetv2_100.in1k	🚧	683 (18)	28 (2)	64	1633.89	3952.93	17.55
ghostnetv2_100.in1k-train	🚧	2000 (39)	1209 (23)	64	2448.5	5790.16	100.0
googlenet	🚧	214 (15)	13 (1)	39	1775	1227.75	-5.78
hrnet_w18.ms_aug_in1k	🚧	1209 (11)	31 (1)	0	5190.14	4643.09	8.39
hrnet_w18.ms_aug_in1k-train	🚧	3998 (21)	2867 (12)	0	14700.9	14361.39	100.0
inception_v4.tf_in1k	🚧	495 (11)	15 (2)	83	12678.2	4871.8	55.07
inception_v4.tf_in1k-train	🚧	1702 (24)	1231 (13)	80	45261.9	36378.71	100.0
microsoft/beit-base-patch16-224	🚧	793 (21)	25 (3)	2	12057.2	2713.77	98.95
microsoft/beit-base-patch16-224-train	🚧	2229 (34)	201 (13)	2	80421.2	6693.61	100.0
microsoft/beit-large-patch16-224	🚧	1573 (21)	49 (3)	2	39668.4	6678.31	99.24
microsoft/beit-large-patch16-224-train	🚧	4437 (34)	393 (13)	2	360877	16971.78	100.0
mixer_b16_224.goog_in21k	🚧	356 (11)	1 (1)	0	12972.8	1571.83	55.91
mixer_b16_224.goog_in21k-train	🚧	959 (18)	102 (7)	0	57226	4117.56	100.0
mobilenetv1_100.ra4_e3600_r224_in1k-train	🚧	231 (15)	165 (8)	0	3087.86	2605.46	100.0
regnet_y_128gf	🚧	447 (10)	3 (1)	0	510801	165778.64	3.94
regnet_y_16gf	🚧	303 (10)	1 (1)	0	15248.1	14625.21	7.17
speecht5-tts	🚧	860 (20)	1 (1)	2	40002.8	29899.25	N/A
swin_b	🚧	1898 (30)	126 (5)	39	13978.9	4344.89	5.02
swin_s	🚧	1898 (30)	126 (5)	39	7501.39	3154.26	11.29
swin_t	🚧	968 (30)	72 (5)	27	3763.09	1867.69	15.44
swin_v2_b	🚧	2474 (37)	294 (9)	33	17685.3	4554.69	9.77
swin_v2_s	🚧	2474 (37)	294 (9)	33	10291.7	3562.54	1.1
swin_v2_t	🚧	1256 (37)	156 (9)	21	5529.05	1974.19	9.93
tf_efficientnet_lite0.in1k-train	🚧	403 (17)	286 (9)	5	3199.32	5413.2	100.0
tf_efficientnet_lite1.in1k-train	🚧	523 (17)	371 (9)	5	3614	7418.72	100.0
tf_efficientnet_lite2.in1k-train	🚧	523 (17)	371 (9)	5	4579.16	9930.73	100.0
tf_efficientnet_lite3.in1k	🚧	221 (9)	5 (1)	5	2786.34	9914.41	13.05
tf_efficientnet_lite3.in1k-train	🚧	595 (17)	427 (10)	5	7316.23	15671.53	100.0
vit_b_16	🚧	552 (17)	13 (2)	2	19566.3	6668.23	98.97
vit_b_32	🚧	552 (17)	13 (2)	2	6118.63	2447.85	98.45
vit_h_14	🚧	1452 (17)	33 (2)	2	765778	181953.18	98.96
vit_l_16	🚧	1092 (17)	25 (2)	2	65154.8	17771.28	99.69
vit_l_32	🚧	1092 (17)	25 (2)	2	20005.8	6501.15	98.87
xception71.tf_in1k-train	🚧	1370 (18)	1093 (10)	0	55453.5	58485.04	100.0
CLIP-train	❌	3942 (43)	N/A	N/A	31137.2	N/A	N/A
FLAN-T5	❌	20020 (38)	N/A	N/A	4774.09	N/A	N/A
GPTNeo	❌	2725 (35)	N/A	N/A	13954.4	N/A	N/A
MobileNetSSD	❌	519 (29)	N/A	N/A	554.22	N/A	N/A
OPT	❌	4001 (31)	N/A	N/A	32412.9	N/A	N/A
Stable Diffusion V2	❌	1870 (29)	N/A	N/A	1.64424e+06	N/A	N/A
Whisper	❌	4286 (17)	N/A	N/A	317246	N/A	N/A
codegen	❌	9177 (36)	N/A	N/A	17699.9	N/A	N/A
retinanet_resnet50_fpn	❌	1048 (27)	N/A	N/A	1874.96	N/A	N/A
retinanet_resnet50_fpn_v2	❌	558 (28)	N/A	N/A	1948.71	N/A	N/A
ssd300_vgg16	❌	329 (28)	N/A	N/A	2790.43	N/A	N/A
ssdlite320_mobilenet_v3_large	❌	519 (29)	N/A	N/A	500.89	N/A	N/A
t5-base	❌	14681 (38)	N/A	N/A	21768.2	N/A	N/A
t5-large	❌	22696 (38)	N/A	N/A	61878.7	N/A	N/A
t5-small	❌	6118 (38)	N/A	N/A	7553.98	N/A	N/A
tf_efficientnet_lite4.in1k	❌	275 (9)	N/A	N/A	5232.27	N/A	N/A
tf_efficientnet_lite4.in1k-train	❌	739 (17)	N/A	N/A	18446.4	N/A	N/A

Explanation of Metrics

Model: Name of the model.
Status: Indicates whether the model is ❌ traced / 🚧 compiled / ✅ E2E on device.
Torch Ops Before (Unique Ops): The total number of operations used by the model in the original Torch implementation. The number in parentheses is the number of unique ops.
Torch Ops Remain (Unique Ops): The total number of Torch operations that remain after conversion to TTNN. The number in parentheses is the number of unique ops.
To/From Device Ops: The number of to_device/from_device operations (data transfers to and from the device).
Original Run Time (ms): Execution time (in milliseconds) of the model before conversion.
Compiled Run Time for 5th Iteration (ms): Execution time (in milliseconds) of the model after conversion, measured on the 5th iteration.
Accuracy (%): Model accuracy on a predefined test dataset after conversion.
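
The run-time and op-count columns are easier to compare once they are reduced to ratios. The sketch below parses one tab-separated table row and derives a speedup factor (original time divided by compiled time) and the fraction of Torch ops that were converted. The helper names (parse_row, speedup, converted_fraction) are illustrative only and are not part of torch_ttnn; the sample rows are copied from the table above.

```python
import re


def _num(text):
    """Return a float, or None when the cell is 'N/A'."""
    text = text.strip()
    return None if text == "N/A" else float(text)


def _ops(text):
    """Parse a cell such as '22 (3)' into (total, unique); 'N/A' gives (None, None)."""
    match = re.fullmatch(r"(\d+) \((\d+)\)", text.strip())
    return (int(match.group(1)), int(match.group(2))) if match else (None, None)


def parse_row(line):
    """Split one tab-separated table row into a dictionary of typed fields."""
    model, status, before, remain, transfers, orig_ms, compiled_ms, acc = line.split("\t")
    return {
        "model": model,
        "status": status,
        "ops_before": _ops(before),
        "ops_remain": _ops(remain),
        "transfers": _num(transfers),
        "original_ms": _num(orig_ms),
        "compiled_ms": _num(compiled_ms),
        "accuracy": _num(acc),
    }


def speedup(row):
    """Original time divided by compiled time; None if either value is missing."""
    if row["original_ms"] is None or row["compiled_ms"] is None:
        return None
    return row["original_ms"] / row["compiled_ms"]


def converted_fraction(row):
    """Fraction of the original Torch ops that no longer remain after conversion."""
    before, remain = row["ops_before"][0], row["ops_remain"][0]
    if before is None or remain is None:
        return None
    return (before - remain) / before


# Rows copied from the table above.
rows = [
    parse_row("Autoencoder (linear)\t✅\t22 (3)\t0 (0)\t0\t1575.35\t16.59\t100.0"),
    parse_row("MobileNetV2\t✅\t154 (9)\t0 (0)\t0\t858.73\t2077.15\t4.39"),
    parse_row("CLIP-train\t❌\t3942 (43)\tN/A\tN/A\t31137.2\tN/A\tN/A"),
]
for row in rows:
    print(row["model"], speedup(row), converted_fraction(row))
```

A speedup below 1 means the compiled run was slower than the original for that entry, which is the case for the MobileNetV2 row. Rows with N/A cells yield None rather than a misleading number.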

Quickstart

The torch_ttnn module provides a backend function that can be passed to torch.compile().

import torch
import ttnn
import torch_ttnn

# A torch Module
class FooModule(torch.nn.Module):
    ...

# Create a module
module = FooModule()

# Open the device and compile the module with the ttnn backend
device = ttnn.open_device(device_id=0)
option = torch_ttnn.TorchTtnnOption(device=device)
ttnn_module = torch.compile(module, backend=torch_ttnn.backend, options=option)

# Running inference / training
# (input_data is whatever input your module expects)
ttnn_module(input_data)

# Release the device when finished
ttnn.close_device(device)
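
For readers new to torch.compile backends: a backend is a callable that receives the captured FX GraphModule plus example inputs and returns a callable to run in its place. The sketch below is a toy backend that only counts the ops it sees and then runs the graph unchanged on the host. It is not how torch_ttnn.backend is implemented; counting_backend and TinyModule are hypothetical names used for illustration, and the counts are for the Torch-level graph that torch.compile captures, which may differ from how the table above counts ops.

```python
import collections

import torch

# Filled in by the backend when torch.compile hands it a graph.
op_counter = collections.Counter()


def counting_backend(gm, example_inputs):
    """Toy backend: record each op in the captured FX graph, then run it as-is."""
    for node in gm.graph.nodes:
        if node.op in ("call_function", "call_method", "call_module"):
            op_counter[str(node.target)] += 1
    # Returning gm.forward means "execute the captured graph without changes".
    return gm.forward


class TinyModule(torch.nn.Module):
    def forward(self, x):
        return torch.relu(x) + 1.0


module = TinyModule()
compiled = torch.compile(module, backend=counting_backend)

x = torch.randn(2, 3)
result = compiled(x)  # the first call triggers graph capture and the backend

total_ops = sum(op_counter.values())
unique_ops = len(op_counter)
print(f"ops seen by the backend: {total_ops} ({unique_ops} unique)")
```

A real backend such as torch_ttnn.backend would instead rewrite the graph so that it runs on the target device before returning the callable.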

Tracer

The tracer dumps information about the FX graph, such as each node's op_name and shape.

For example, you can run this script to collect the information:
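
To give a feel for the kind of per-node information involved, the sketch below walks an FX graph and lists each node's op kind, target, and output shape. It uses torch.fx.symbolic_trace and ShapeProp from stock PyTorch rather than this project's tracer, so the node names it reports are Torch-level and may not match the tracer's output; TinyModule and summarize_graph are hypothetical names.

```python
import collections

import torch
import torch.fx
from torch.fx.passes.shape_prop import ShapeProp


class TinyModule(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4, 2)

    def forward(self, x):
        return torch.relu(self.linear(x)) + 1.0


def summarize_graph(module, example_input):
    """Return (op kind, target, output shape) for every node in the FX graph."""
    gm = torch.fx.symbolic_trace(module)
    # ShapeProp runs the graph once and stores tensor metadata on each node.
    ShapeProp(gm).propagate(example_input)
    rows = []
    for node in gm.graph.nodes:
        shape = getattr(node.meta.get("tensor_meta"), "shape", None)
        rows.append((node.op, str(node.target), tuple(shape) if shape is not None else None))
    return rows


rows = summarize_graph(TinyModule(), torch.randn(3, 4))
for op, target, shape in rows:
    print(f"{op:15s} {target:40s} {shape}")

# Frequency of each op kind, similar in spirit to a node-count report.
op_counts = collections.Counter(op for op, _, _ in rows)
print(dict(op_counts))
```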

PYTHONPATH=$(pwd) python3 tools/stat_models.py --trace_orig --backward --profile
ls stat/raw

By default, the raw results are stored in stat/raw. Run this script to generate the report:

python3 tools/generate_report.py
ls stat/

The stat/ folder now contains these reports:

fw_node_count.csv
bw_node_count.csv
fw_total_input_size_dist/
bw_total_input_size_dist/
fw_total_output_size_dist/
bw_total_output_size_dist/
profile/
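
If you script the report step, a quick check that every expected entry exists can catch a failed run early. The sketch below only tests for the names listed above under a given directory; missing_reports is a hypothetical helper, not part of the repository's tools.

```python
from pathlib import Path

# Names taken from the listing above (files and folders alike).
EXPECTED_REPORTS = [
    "fw_node_count.csv",
    "bw_node_count.csv",
    "fw_total_input_size_dist",
    "bw_total_input_size_dist",
    "fw_total_output_size_dist",
    "bw_total_output_size_dist",
    "profile",
]


def missing_reports(stat_dir="stat"):
    """Return the expected report names that do not exist under stat_dir."""
    root = Path(stat_dir)
    return [name for name in EXPECTED_REPORTS if not (root / name).exists()]


missing = missing_reports()
print("all reports present" if not missing else f"missing: {missing}")
```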

The *_node_count.csv files list how often each op_type appears in the FX graph. These reports help analyze the frequency of each op type in the graph.

The *_total_*_size_dist/ folders record the input/output size distribution of each op_type across all FX graphs recorded in stat/raw. These reports help analyze the memory footprint during the computation of each op_type.

Notice: the default input_shapes in tools/stat_torchvision.py is [1,3,224,224]; the *_total_*_size_dist/ reports depend on this value.
Notice: the ATen IR interface is in there

The profile/ folder contains output from the profiling tools provided by PyTorch. You can open it in Chrome at the URL chrome://tracing

For developers

Install torch-ttnn with editable mode

During development, you may want to use the torch-ttnn package for testing. In order to do that, you can install the torch-ttnn package in "editable" mode with

pip install -e .

Now, you can utilize torch_ttnn in your Python code. Any modifications you make to the torch_ttnn package will take effect immediately, eliminating the need for constant reinstallation via pip.
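
One way to confirm that Python is picking up your working copy is to ask the import system where it would load the package from. The sketch below uses only the standard library; module_origin is a hypothetical helper name. After an editable install, the printed path would normally point into your checkout rather than into site-packages, and None means the package was not found.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be loaded from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


print(module_origin("torch_ttnn"))
```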

Build wheel file

Developers who want to deploy the package can build the wheel file with

python -m build

Then you can upload the .whl file to PyPI (the Python Package Index).

Run transformer models

To run a transformer model with the ttnn backend, run:

PYTHONPATH="$TT_METAL_HOME:$(pwd)" python3 tools/run_transformers.py --model "phiyodr/bert-large-finetuned-squad2" --backend torch_ttnn

You can also substitute the backend with torch_stat to run a reference comparison.
