Releases: google-ai-edge/ai-edge-torch
v0.2.0
Installation and Dependencies
pip install ai-edge-torch==0.2.0
- Python versions: 3.9, 3.10, 3.11
- Operating system: Linux
- PyTorch: 2.4.0
- TensorFlow: tf-nightly>=2.18.0.dev20240722
See this section of the README
PyTorch Converter
Compatible with torch 2.4.0 stable release. pip install ai-edge-torch(-nightly)
is now the only command needed to install ai-edge-torch and all dependencies.
Features
- Added ai_edge_torch.to_channel_last_io API (doc)
- Added ai_edge_torch.debug._search_model API
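For context, to_channel_last_io concerns the layout of model inputs and outputs: PyTorch vision models conventionally take channel-first (NCHW) tensors, while on-device pipelines often supply channel-last (NHWC) data. The sketch below is not the library's implementation. It uses plain nested lists and two hypothetical helpers, chw_to_hwc and hwc_to_chw, to show what the layout change means for a single image.

```python
def chw_to_hwc(image):
    """Convert a channel-first image [C][H][W] to channel-last [H][W][C]."""
    channels = len(image)
    height = len(image[0])
    width = len(image[0][0])
    return [
        [[image[c][h][w] for c in range(channels)] for w in range(width)]
        for h in range(height)
    ]


def hwc_to_chw(image):
    """Convert a channel-last image [H][W][C] back to channel-first [C][H][W]."""
    height = len(image)
    width = len(image[0])
    channels = len(image[0][0])
    return [
        [[image[h][w][c] for w in range(width)] for h in range(height)]
        for c in range(channels)
    ]


# A 2-channel, 2x3 image in channel-first layout.
chw = [
    [[1, 2, 3], [4, 5, 6]],        # channel 0
    [[10, 20, 30], [40, 50, 60]],  # channel 1
]

# In channel-last layout, each pixel holds its channel values together.
hwc = chw_to_hwc(chw)
```

Here each pixel of hwc groups the values from both channels, and applying hwc_to_chw restores the original layout.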
Performance Improvements
- Improved layout optimization algorithm and general model performance
- Improved performance for torch.nn.functional.interpolate with nearest mode
- Improved performance for aten.gelu
- Improved performance for aten.avg_pool2d with ceil_mode=True
- Reduced conversion memory usage in torch_xla and MLIR converter
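As background on the ceil_mode=True case mentioned above, the flag changes how the pooled output size is rounded. The sketch below follows the output-size rule documented for PyTorch pooling layers, applied to one spatial dimension. The helper name pool_output_size is hypothetical, and the code illustrates the shape arithmetic only, not how the converter lowers the op.

```python
def pool_output_size(in_size, kernel, stride, padding=0, ceil_mode=False):
    """Output length of a pooling window sweep along one dimension."""
    span = in_size + 2 * padding - kernel
    if not ceil_mode:
        # Default: round down, so partial windows at the end are dropped.
        return span // stride + 1

    # ceil_mode: round up, so a trailing partial window is kept.
    out = -(-span // stride) + 1  # integer ceiling division
    # A window must start inside the input or the left padding;
    # drop the last window if it would start in the right padding.
    if (out - 1) * stride >= in_size + padding:
        out -= 1
    return out


floor_size = pool_output_size(7, kernel=2, stride=2)
ceil_size = pool_output_size(7, kernel=2, stride=2, ceil_mode=True)
clamped = pool_output_size(4, kernel=2, stride=3, padding=1, ceil_mode=True)
```

With an input of length 7, a kernel of 2, and a stride of 2, the floor rule gives 3 outputs and the ceil rule gives 4. The third call shows the case where the extra window would start in the right padding and is therefore dropped.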
Bug Fix
- Fixed numerical/precision issue with aten.native_group_norm (nn.GroupNorm)
Generative API
Authoring API
- Implemented new layer components for diffusion-based models
Support for new models
- Stable Diffusion 1.5 support on CPU
Quantization
- Enabled selective quantization for different Generative layers for LLMs
- Enabled weight-only quantization with computation in floating point for increased accuracy
- Added quantization support for embedding tables
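To illustrate the idea behind weight-only quantization with floating-point computation, here is a minimal sketch in plain Python. It assumes symmetric int8 quantization with one scale per output channel, which is a common scheme but not necessarily the one this release uses. The helper names quantize_per_channel and weight_only_linear are hypothetical.

```python
def quantize_per_channel(weights):
    """Symmetric int8 quantization with one scale per output channel (row)."""
    quantized, scales = [], []
    for row in weights:
        max_abs = max(abs(v) for v in row)
        scale = max_abs / 127 if max_abs > 0 else 1.0
        # Round to the nearest integer and clamp to the int8 range used here.
        quantized.append([max(-127, min(127, round(v / scale))) for v in row])
        scales.append(scale)
    return quantized, scales


def weight_only_linear(x, quantized, scales):
    """Dequantize the stored int8 weights, then multiply in floating point."""
    return [
        sum(xi * (q * scale) for xi, q in zip(x, q_row))
        for q_row, scale in zip(quantized, scales)
    ]


weights = [[0.5, -1.0, 0.25], [2.0, 0.0, -0.5]]
x = [1.0, 2.0, 3.0]

quantized, scales = quantize_per_channel(weights)
approx = weight_only_linear(x, quantized, scales)

# Full-precision result, for comparison with the quantized path.
exact = [sum(xi * w for xi, w in zip(x, row)) for row in weights]
```

Only the weights are stored as integers; activations stay in floating point, so the result differs from the full-precision output only by the weight rounding error.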
Documentation
- Added system architecture overview for Torch generative API
v0.1.1
Installation and Dependencies
pip install -r https://github.com/google-ai-edge/ai-edge-torch/releases/download/v0.1.1/requirements.txt
pip install ai-edge-torch==0.1.1
- Python versions: 3.9, 3.10, 3.11
- Operating system: Linux
- PyTorch: 2.4.0.dev20240429
- TensorFlow: 2.17.0.dev20240509
See this section of the README
PyTorch Converter (Beta)
Functionality
First release of a direct path from PyTorch to the TFLite runtime (blog post).
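A minimal sketch of that path, following the usage pattern in the project README: ai_edge_torch.convert takes a module in eval mode plus a tuple of sample inputs, and the result can be exported to a .tflite file. The wrapper name convert_to_tflite is hypothetical, and the import is deferred so the snippet loads even where the package is not installed.

```python
def convert_to_tflite(model, sample_inputs, output_path):
    """Convert a PyTorch module to a TFLite flatbuffer via ai-edge-torch.

    model: a torch.nn.Module
    sample_inputs: tuple of example tensors matching the model's inputs
    output_path: where to write the .tflite file
    """
    # Imported lazily so this sketch can be loaded without the package.
    import ai_edge_torch

    # Conversion uses the sample inputs to trace the model in eval mode.
    edge_model = ai_edge_torch.convert(model.eval(), sample_inputs)

    # Serialize the converted model to disk.
    edge_model.export(output_path)
    return edge_model
```

For example, assuming torch and torchvision are available, convert_to_tflite(torchvision.models.resnet18(), (torch.randn(1, 3, 224, 224),), "resnet18.tflite") would write a TFLite model. The returned object can also be called on the same sample inputs to compare its outputs with the original PyTorch model.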
Coverage
- Verified successful conversion from PyTorch to TFLite on a Beta test set of 72 PyTorch models readily available from torchvision, torchaudio, timm, HuggingFace transformers, and open source GitHub repositories (such as Yolox, U2Net, IS-Net), spanning computer vision, text, audio, and speech applications.
Performance
- Excellent CPU performance for the converted models, leveraging the TFLite XNNPACK delegate.
- A subset of the Beta test set can be fully delegated to GPU; the other models are partially delegated or unsupported.
- QNN delegate (available here) supports most models in the Beta test set with significant average acceleration relative to CPU (20X) and GPU (5X) using Qualcomm’s DSP and neural processing units.
Quantization
- Support for dynamic quantization with PT2E.
- Support for post-training quantization via the TFLite converter.
- AI Edge Torch Converter APIs to use these two quantization frameworks are here.
Known Issues
- Inference latency with quantized models is higher than with unquantized models in some cases.
Generative API (Alpha)
Functionality
- Provides PyTorch native building blocks to compose LLMs using mobile friendly abstractions for performant execution on TFLite runtime.
- Includes examples of authoring Gemma, TinyLlama, and Phi-2 with the Edge Generative API for conversion to TFLite. (Examples)
- Supports 8-bit dynamic range quantization. (here)
- Integration with MediaPipe LLM Inference API for easy integration into Mobile Apps, and a prompt interface.
Known Issues
- The conversion and serialization process is unoptimized for LLMs. It requires keeping multiple copies of the weights in memory for transformations and for serialization/deserialization.
- Runtime execution of the LLM in TFLite is missing some memory optimizations and is inefficient during memory unpacking on XNNPACK.