Releases: google-ai-edge/ai-edge-torch

v0.2.0

Installation and Dependencies

pip install ai-edge-torch==0.2.0

Python versions: 3.9, 3.10, 3.11
Operating system: Linux
PyTorch: 2.4.0
TensorFlow: tf-nightly>=2.18.0.dev20240722
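A quick way to confirm that an environment matches the requirements listed above is a small pre-flight check. This is a minimal sketch, not part of ai-edge-torch: the function name check_environment is hypothetical, and the tf-nightly requirement is deliberately not checked here because nightly version strings vary.

```python
import platform
import sys
from importlib.metadata import PackageNotFoundError, version

# Requirements as listed for v0.2.0 (TensorFlow nightly is not checked here).
SUPPORTED_PYTHON = {(3, 9), (3, 10), (3, 11)}
REQUIRED_TORCH = "2.4.0"
REQUIRED_OS = "Linux"


def check_environment(python_version, os_name, torch_version):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if tuple(python_version[:2]) not in SUPPORTED_PYTHON:
        problems.append(
            f"Python {python_version[0]}.{python_version[1]} is not in the supported set"
        )
    if os_name != REQUIRED_OS:
        problems.append(f"OS {os_name!r} is not {REQUIRED_OS}")
    # Compare only the release part, so local builds such as "2.4.0+cpu" still match.
    if torch_version.split("+")[0] != REQUIRED_TORCH:
        problems.append(f"torch {torch_version} is not {REQUIRED_TORCH}")
    return problems


if __name__ == "__main__":
    try:
        torch_version = version("torch")
    except PackageNotFoundError:
        torch_version = "not installed"
    print(check_environment(sys.version_info, platform.system(), torch_version) or "OK")
```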

See this section of the README

PyTorch Converter

Compatible with the torch 2.4.0 stable release. pip install ai-edge-torch (or pip install ai-edge-torch-nightly) is now the only command needed to install ai-edge-torch and all of its dependencies.

Features

Added ai_edge_torch.to_channel_last_io API (doc)
Added ai_edge_torch.debug._search_model API
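As we understand it, to_channel_last_io lets selected model inputs and outputs use channel-last (NHWC) layout instead of PyTorch's default NCHW. The sketch below only illustrates what that layout change means for the data, in pure Python on a flat row-major buffer; it is not the library's implementation, and the helper names are hypothetical.

```python
def nchw_to_nhwc(data, shape):
    """Reorder a flat row-major NCHW buffer into NHWC order. shape is (N, C, H, W)."""
    n, c, h, w = shape
    assert len(data) == n * c * h * w, "buffer size does not match shape"
    out = []
    for ni in range(n):
        for hi in range(h):
            for wi in range(w):
                for ci in range(c):
                    # Row-major offset of element (ni, ci, hi, wi) in the NCHW buffer.
                    out.append(data[((ni * c + ci) * h + hi) * w + wi])
    return out


def nhwc_to_nchw(data, shape):
    """Inverse of nchw_to_nhwc; shape is still given as (N, C, H, W)."""
    n, c, h, w = shape
    assert len(data) == n * c * h * w, "buffer size does not match shape"
    out = [None] * len(data)
    for ni in range(n):
        for hi in range(h):
            for wi in range(w):
                for ci in range(c):
                    # Row-major offset of element (ni, hi, wi, ci) in the NHWC buffer.
                    src = ((ni * h + hi) * w + wi) * c + ci
                    out[((ni * c + ci) * h + hi) * w + wi] = data[src]
    return out
```

In PyTorch terms, nchw_to_nhwc corresponds to x.permute(0, 2, 3, 1). A model whose first input is exposed as channel-last would therefore be fed NHWC tensors (for example, camera frames that are already HWC) without a permute on the caller's side.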

Performance Improvements

Improved layout optimization algorithm and general model performance
Improved performance for torch.nn.functional.interpolate with nearest mode
Improved performance for aten.gelu
Improved performance for aten.avg_pool2d with ceil_mode=True
Reduced conversion memory usage in the torch_xla and MLIR converters
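When checking a converted model against PyTorch, it can help to have the reference semantics of the ops mentioned above. This is a minimal pure-Python sketch, assuming PyTorch's legacy "nearest" rule (source index = floor(dst * in / out), not "nearest-exact") and PyTorch's pooling output-size rule with dilation 1; the function names are hypothetical, and this is not how the converter lowers these ops.

```python
import math


def nearest_indices(in_size, out_size):
    """Source index for each output position under legacy 'nearest' interpolation."""
    scale = in_size / out_size
    return [min(math.floor(i * scale), in_size - 1) for i in range(out_size)]


def upsample_nearest_1d(values, out_size):
    """Resize a 1-D list by copying the nearest source element."""
    return [values[j] for j in nearest_indices(len(values), out_size)]


def avg_pool_out_len(length, kernel, stride, padding=0, ceil_mode=False):
    """Output length along one spatial dim of avg_pool2d (dilation is always 1)."""
    span = length + 2 * padding - kernel
    if ceil_mode:
        out = -(-span // stride) + 1  # ceil division
        # The last window must start inside the input or the left padding;
        # a window that would start in the right padding is dropped.
        if (out - 1) * stride >= length + padding:
            out -= 1
    else:
        out = span // stride + 1
    return out
```

For example, avg_pool_out_len(5, 2, 2) is 2, while ceil_mode=True yields 3 because the final, partial window over the last element is kept.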

Bug Fix

Fixed numerical/precision issue with aten.native_group_norm (nn.GroupNorm)
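The notes do not describe the underlying bug, but a double-precision reference is a convenient way to verify nn.GroupNorm numerics in a converted model. This is a minimal sketch for a single sample, assuming the standard GroupNorm definition (mean and biased variance over the channels in each group plus all spatial positions, then a per-channel affine); the function name is hypothetical.

```python
import math


def group_norm(x, num_groups, gamma=None, beta=None, eps=1e-5):
    """Reference GroupNorm for one sample.

    x is a list of C channels, each a flat list of spatial values.
    """
    c = len(x)
    if c % num_groups != 0:
        raise ValueError("number of channels must be divisible by num_groups")
    if gamma is None:
        gamma = [1.0] * c
    if beta is None:
        beta = [0.0] * c
    per_group = c // num_groups
    out = []
    for g in range(num_groups):
        channels = x[g * per_group:(g + 1) * per_group]
        values = [v for ch in channels for v in ch]
        # math.fsum keeps the reductions accurate, which is the point of a reference.
        mean = math.fsum(values) / len(values)
        var = math.fsum((v - mean) ** 2 for v in values) / len(values)  # biased
        inv_std = 1.0 / math.sqrt(var + eps)
        for k, ch in enumerate(channels):
            ci = g * per_group + k
            out.append([(v - mean) * inv_std * gamma[ci] + beta[ci] for v in ch])
    return out
```

A converted model's output can then be compared element-wise against this reference within a tolerance suited to float32.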

Generative API

Authoring API

Implemented new layer components for diffusion-based models

Support for new models

Stable Diffusion 1.5 support on CPU

Quantization

Enabled selective quantization of different generative layers in LLMs
Enabled weight-only quantization with computation in floating point for increased accuracy
Added quantization support for embedding tables
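The three quantization items above can be illustrated with one scheme: symmetric int8 weights with one scale per output row, applied only to selected layers, with the arithmetic itself done in floating point. This is a pure-Python sketch under those assumptions; all names (including the layer names in the usage) are hypothetical, and the actual ai-edge-torch quantization recipes may differ in granularity and rounding.

```python
def quantize_per_channel(weight):
    """Symmetric int8 quantization with one scale per output row (channel)."""
    q_rows, scales = [], []
    for row in weight:
        max_abs = max(abs(v) for v in row)
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        q_rows.append([max(-127, min(127, round(v / scale))) for v in row])
        scales.append(scale)
    return q_rows, scales


def dequantize(q_rows, scales):
    return [[q * s for q in row] for row, s in zip(q_rows, scales)]


def matvec_weight_only(q_rows, scales, x):
    """Weight-only quantization: int8 weights are stored, but the math runs in float."""
    return [s * sum(q * xi for q, xi in zip(row, x)) for row, s in zip(q_rows, scales)]


def embedding_lookup(q_table, scales, token_id):
    """An embedding table quantizes the same way; a lookup dequantizes one row."""
    return [q * scales[token_id] for q in q_table[token_id]]


def quantize_selected(layers, should_quantize):
    """Selective quantization: only layers for which should_quantize(name) is True."""
    return {
        name: quantize_per_channel(w) if should_quantize(name) else w
        for name, w in layers.items()
    }
```

For example, quantize_selected(layers, lambda name: name != "lm_head") would keep a hypothetical lm_head layer in float while quantizing the rest. Storing int8 cuts weight memory by 4x relative to float32, while keeping activations and accumulation in float avoids the accuracy loss of integer-only kernels.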

Documentation

Added system architecture overview for Torch generative API

v0.1.1

PyTorch Converter (Beta)

Functionality

First release of a direct path from PyTorch to the TFLite runtime (blog post).
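The direct conversion path can be sketched as follows, based on the usage shown in the project README (ai_edge_torch.convert, calling the converted model, and export). Treat this as a sketch to check against the current documentation; the wrapper name convert_to_tflite is hypothetical, and the import is kept inside the function so the block can be loaded without ai-edge-torch installed.

```python
def convert_to_tflite(model, sample_inputs, output_path):
    """Convert a PyTorch module to a .tflite file and return the converted model's output.

    sample_inputs is a tuple of example tensors matching the model's forward().
    """
    import ai_edge_torch  # pip install ai-edge-torch

    # Convert in eval mode so layers such as dropout and batch norm are traced for inference.
    edge_model = ai_edge_torch.convert(model.eval(), sample_inputs)
    # The converted model is callable with the same positional inputs,
    # which allows a quick numerical comparison against the PyTorch output.
    edge_output = edge_model(*sample_inputs)
    edge_model.export(output_path)
    return edge_output


# Example usage (requires torch, torchvision and ai-edge-torch):
#   import torch, torchvision
#   model = torchvision.models.resnet18()
#   convert_to_tflite(model, (torch.randn(1, 3, 224, 224),), "resnet18.tflite")
```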

Coverage

Verified successful conversion from PyTorch to TFLite on a Beta test set of 72 PyTorch models readily available from torchvision, torchaudio, timm, HuggingFace transformers, and open source GitHub repositories (such as Yolox, U2Net, IS-Net), spanning computer vision, text, audio, and speech applications.

Performance

Excellent CPU performance for the converted models, leveraging the TFLite XNNPACK delegate.

A subset of the Beta test set can be fully delegated to the GPU; the remaining models are either partially delegated or unsupported.
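"Partially delegated" means the accelerator runs only the ops it supports, and the rest fall back to the CPU. The sketch below is a deliberately simplified model of that idea for a linear chain of ops (real TFLite delegates partition a graph, and every CPU/GPU boundary adds data-transfer cost); the op names and helper functions are hypothetical.

```python
def partition(ops, supported):
    """Split a linear op sequence into (target, [ops]) segments.

    Consecutive ops that the delegate supports are grouped into one "GPU"
    segment; everything else falls back to "CPU".
    """
    segments = []
    for op in ops:
        target = "GPU" if op in supported else "CPU"
        if segments and segments[-1][0] == target:
            segments[-1][1].append(op)
        else:
            segments.append((target, [op]))
    return segments


def delegation_status(ops, supported):
    targets = {target for target, _ in partition(ops, supported)}
    if targets == {"GPU"}:
        return "fully delegated"
    if "GPU" in targets:
        return "partially delegated"
    return "not delegated"
```

A single unsupported op in the middle of a network splits it into three segments, which is why partially delegated models can see much smaller speedups than fully delegated ones.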

The QNN delegate (available here) supports most models in the Beta test set, with significant average acceleration relative to CPU (20x) and GPU (5x) using Qualcomm’s DSP and neural processing units.

Quantization

Support for dynamic quantization with PT2E.

Support for post-training quantization via the TFLite converter.

The AI Edge Torch Converter APIs for using these two quantization frameworks are documented here.
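The defining step of dynamic quantization is that activation ranges are measured at inference time, per input, rather than calibrated ahead of time. This pure-Python sketch shows that step with asymmetric int8 parameters derived from the runtime min/max; it illustrates the general technique only, and the PT2E and TFLite kernels differ in details such as granularity and rounding. The function names are hypothetical.

```python
def dynamic_quant_params(values, qmin=-128, qmax=127):
    """Compute (scale, zero_point) from the runtime min/max of an activation tensor."""
    lo = min(min(values), 0.0)  # the range must include 0 so that 0.0 is exactly representable
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # fall back to 1.0 for an all-zero tensor
    zero_point = round(qmin - lo / scale)
    return scale, max(qmin, min(qmax, zero_point))


def quantize_activation(values, scale, zero_point, qmin=-128, qmax=127):
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]


def dequantize_activation(q, scale, zero_point):
    return [(qi - zero_point) * scale for qi in q]
```

Because the parameters are recomputed for every input, no calibration dataset is needed, at the cost of a min/max pass over each activation at runtime.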

Generative API (Alpha)

Functionality

Provides PyTorch-native building blocks to compose LLMs using mobile-friendly abstractions for performant execution on the TFLite runtime.
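One abstraction that typically matters on mobile runtimes is a key/value cache with a fixed maximum length, so tensor shapes stay static across decode steps. The sketch below assumes that pattern and shows it for a single attention head in pure Python; the class and function names are hypothetical and are not the Edge Generative API's own components.

```python
import math


class StaticKVCache:
    """Fixed-capacity key/value cache for one attention head.

    Buffers are allocated once at max_seq_len; each decode step writes into
    the slot for its position instead of growing a list, so every step sees
    the same shapes.
    """

    def __init__(self, max_seq_len, head_dim):
        self.max_seq_len = max_seq_len
        self.keys = [[0.0] * head_dim for _ in range(max_seq_len)]
        self.values = [[0.0] * head_dim for _ in range(max_seq_len)]

    def update(self, pos, k, v):
        if not 0 <= pos < self.max_seq_len:
            raise IndexError("position exceeds cache capacity")
        self.keys[pos] = list(k)
        self.values[pos] = list(v)

    def mask(self, pos):
        """Causal mask over the full buffer: only slots 0..pos are visible."""
        return [i <= pos for i in range(self.max_seq_len)]


def attend(cache, q, pos):
    """Scaled dot-product attention of query q over the visible cache slots."""
    scores = [
        sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q)) if visible else -math.inf
        for k, visible in zip(cache.keys, cache.mask(pos))
    ]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]  # masked slots get weight 0.0
    total = sum(weights)
    return [sum(w * v[d] for w, v in zip(weights, cache.values)) / total for d in range(len(q))]
```

The design choice is to always compute over the full buffer and mask out unused slots, trading some wasted arithmetic for shapes that never change, which suits ahead-of-time compiled graphs.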

Examples of authoring Gemma, TinyLlama, and Phi-2 with the Edge Generative API and converting them to TFLite. (Examples)

Supports 8-bit dynamic range quantization. (here)

Integration with the MediaPipe LLM Inference API, providing easy integration into mobile apps and a prompt interface.

Known Issues

The conversion and serialization process is not yet optimized for LLMs: it keeps multiple copies of the weights in memory during transformations and during serialization/deserialization.

Runtime execution of the LLM in TFLite lacks some memory optimizations and is inefficient when unpacking memory on XNNPACK.
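Simple arithmetic shows why holding multiple weight copies during conversion is costly for LLMs. The notes do not say how many copies are kept, so the copy count below is a hypothetical parameter, and the helper name is illustrative.

```python
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_gib(num_params, dtype="float32", copies=1):
    """Approximate host memory (GiB) held by `copies` copies of the model weights."""
    return num_params * BYTES_PER_PARAM[dtype] * copies / 2**30
```

For example, a model with 1 billion float32 parameters needs about 3.7 GiB per copy, so three simultaneous copies already require roughly 11 GiB of RAM before any other conversion overhead.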
