[TorchFX] Experimental quantization using torch.ao quantizer #32

Open · wants to merge 15 commits into base: develop
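For reviewers unfamiliar with torch.ao's PT2E quantization, the sketch below shows the generic flow that a torch.ao quantizer plugs into. It is an illustration only, not code added by this PR: `X86InductorQuantizer` stands in for whatever quantizer NNCF's experimental TorchFX backend supplies, and the export entry point differs between recent torch releases.

```python
import torch
import torchvision.models as models
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e
from torch.ao.quantization.quantizer.x86_inductor_quantizer import (
    X86InductorQuantizer,
    get_default_x86_inductor_quantization_config,
)

model = models.mobilenet_v2(weights=None).eval()
example_inputs = (torch.randn(1, 3, 224, 224),)

# Capture the model as a torch.fx graph; on older torch releases the equivalent
# entry point is torch._export.capture_pre_autograd_graph.
exported = torch.export.export_for_training(model, example_inputs).module()

# Any torch.ao quantizer can be plugged in here; X86InductorQuantizer is only a stand-in.
quantizer = X86InductorQuantizer()
quantizer.set_global(get_default_x86_inductor_quantization_config())

prepared = prepare_pt2e(exported, quantizer)  # insert observers / fake-quant nodes
prepared(*example_inputs)                     # calibrate on representative data
quantized = convert_pt2e(prepared)            # fold observers into quantize/dequantize ops
```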
3 changes: 1 addition & 2 deletions .github/workflows/examples.yml
@@ -44,9 +44,8 @@ jobs:
cache: pip
- name: cpuinfo
run: cat /proc/cpuinfo
- name: Install NNCF and test requirements
- name: Install test requirements
run: |
pip install -e .
pip install -r tests/cross_fw/examples/requirements.txt
- name: Print installed modules
run: pip list
23 changes: 10 additions & 13 deletions .gitignore
@@ -119,23 +119,20 @@ nncf_debug/

# NNCF examples
examples/torch/object_detection/eval/
examples/post_training_quantization/onnx/mobilenet_v2/mobilenet_v2_*
examples/post_training_quantization/openvino/mobilenet_v2/mobilenet_v2_*
examples/post_training_quantization/tensorflow/mobilenet_v2/mobilenet_v2_*
examples/post_training_quantization/torch/mobilenet_v2/mobilenet_v2_*
examples/post_training_quantization/torch/ssd300_vgg16/ssd300_vgg16_*
examples/post_training_quantization/openvino/anomaly_stfpm_quantize_with_accuracy_control/stfpm_*
examples/post_training_quantization/openvino/yolov8/yolov8n*
examples/post_training_quantization/openvino/yolov8_quantize_with_accuracy_control/yolov8n*
examples/**/runs/**
examples/**/results/**
examples/llm_compression/openvino/tiny_llama_find_hyperparams/statistics
compressed_graph.dot
original_graph.dot
examples/**/*.xml
examples/**/*.bin
examples/**/*.pt
examples/**/*.onnx
examples/**/statistics
examples/**/runs
examples/**/results
examples/**/metrics.json
datasets/**

# Tests
tests/**/runs/**
tests/**/tmp*/**
open_model_zoo/
nncf-tests.xml
compressed_graph.dot
original_graph.dot
17 changes: 6 additions & 11 deletions README.md
@@ -15,6 +15,10 @@
[![Apache License Version 2.0](https://img.shields.io/badge/license-Apache_2.0-green.svg)](LICENSE)
[![PyPI Downloads](https://static.pepy.tech/badge/nncf)](https://pypi.org/project/nncf/)

![Python](https://img.shields.io/badge/python-3.9+-blue)
![Backends](https://img.shields.io/badge/backends-openvino_|_pytorch_|_onnx_|_tensorflow-orange)
![OS](https://img.shields.io/badge/OS-Linux_|_Windows_|_MacOS-blue)

</div>

Neural Network Compression Framework (NNCF) provides a suite of post-training and training-time algorithms for optimizing inference of neural networks in [OpenVINO&trade;](https://docs.openvino.ai) with a minimal accuracy drop.
@@ -467,17 +471,8 @@ NNCF is also available via [conda](https://anaconda.org/conda-forge/nncf):
conda install -c conda-forge nncf
```

### System requirements

- Ubuntu\* 18.04 or later (64-bit)
- Python\* 3.9 or later
- Supported frameworks:
- PyTorch\* >=2.4, <2.6
- TensorFlow\* >=2.8.4, <=2.15.1
- ONNX\* ==1.17.0
- OpenVINO\* >=2022.3.0

This repository is tested on Python* 3.10.14, PyTorch* 2.5.0 (NVidia CUDA\* Toolkit 12.4) and TensorFlow* 2.12.1 (NVidia CUDA\* Toolkit 11.8).
System requirements for NNCF depend on the backend in use. Requirements for each backend and
the matrix of corresponding versions can be found in [Installation.md](./docs/Installation.md).

## NNCF Compressed Model Zoo

10 changes: 9 additions & 1 deletion docs/Installation.md
@@ -2,7 +2,13 @@

We suggest installing and using the package in a [Python virtual environment](https://docs.python.org/3/tutorial/venv.html).

If you want to optimize a model from PyTorch, install PyTorch by following [PyTorch installation guide](https://pytorch.org/get-started/locally/#start-locally). For other backend follow: [TensorFlow installation guide](https://www.tensorflow.org/install/), [ONNX installation guide](https://onnxruntime.ai/docs/install/), [OpenVINO installation guide](https://docs.openvino.ai/latest/openvino_docs_install_guides_overview.html).
NNCF supports multiple backends. Follow the corresponding installation guides and ensure your system meets
the required specifications for your chosen backend:

- OpenVINO&trade;: [Install Guide](https://docs.openvino.ai/2024/get-started/install-openvino.html), [System Requirements](https://docs.openvino.ai/2024/about-openvino/release-notes-openvino/system-requirements.html)
- ONNX: [Install Guide](https://onnxruntime.ai/docs/install/)
- PyTorch: [Install Guide](https://pytorch.org/get-started/locally/#start-locally)
- TensorFlow: [Install Guide](https://www.tensorflow.org/install/)

## As a PyPI package

@@ -58,3 +64,5 @@ as well as the supported versions of Python:
| `2.4.0` | `2022.1.0` | `1.12.1` | `1.12.0` | `2.8.2` | `3.8` |

> (*) Python 3.9 or higher is required for TensorFlow 2.15.1

This repository is tested on Python* 3.10.14, PyTorch* 2.5.0 (NVidia CUDA\* Toolkit 12.4) and TensorFlow* 2.12.1 (NVidia CUDA\* Toolkit 11.8).
4 changes: 2 additions & 2 deletions examples/llm_compression/openvino/smollm2_360m_fp8/main.py
@@ -10,9 +10,9 @@
# limitations under the License.
from functools import partial

import datasets
import numpy as np
import openvino as ov
from datasets import load_dataset
from optimum.intel.openvino import OVModelForCausalLM
from transformers import AutoTokenizer

@@ -75,7 +75,7 @@ def main():
MODEL_ID = "HuggingFaceTB/SmolLM2-360M-Instruct"
OUTPUT_DIR = "smollm2_360m_compressed"

dataset = datasets.load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
# Filtering to remove empty samples from the dataset
dataset = dataset.filter(lambda example: len(example["text"]) > 1)

@@ -2,4 +2,4 @@ datasets
openvino==2024.5
optimum-intel[openvino]
transformers
onnx<1.16.2
onnx==1.17.0
37 changes: 21 additions & 16 deletions examples/post_training_quantization/onnx/mobilenet_v2/main.py
@@ -12,36 +12,37 @@
import re
import subprocess
from pathlib import Path
from typing import List, Optional
from typing import List

import numpy as np
import onnx
import openvino as ov
import torch
from fastdownload import FastDownload
from fastdownload import download_url
from rich.progress import track
from sklearn.metrics import accuracy_score
from torchvision import datasets
from torchvision import transforms
from tqdm import tqdm

import nncf

ROOT = Path(__file__).parent.resolve()
MODEL_URL = "https://huggingface.co/alexsu52/mobilenet_v2_imagenette/resolve/main/mobilenet_v2_imagenette.onnx"
DATASET_URL = "https://s3.amazonaws.com/fast-ai-imageclas/imagenette2-320.tgz"
DATASET_PATH = "~/.cache/nncf/datasets"
MODEL_PATH = "~/.cache/nncf/models"
DATASET_PATH = Path().home() / ".cache" / "nncf" / "datasets"
MODEL_PATH = Path().home() / ".cache" / "nncf" / "models"
DATASET_CLASSES = 10


def download_dataset() -> Path:
downloader = FastDownload(base=DATASET_PATH, archive="downloaded", data="extracted")
downloader = FastDownload(base=DATASET_PATH.as_posix(), archive="downloaded", data="extracted")
return downloader.get(DATASET_URL)


def download_model() -> Path:
return download_url(MODEL_URL, Path(MODEL_PATH).resolve())
MODEL_PATH.mkdir(exist_ok=True, parents=True)
return download_url(MODEL_URL, MODEL_PATH.resolve())


def validate(path_to_model: Path, validation_loader: torch.utils.data.DataLoader) -> float:
@@ -51,7 +52,7 @@ def validate(path_to_model: Path, validation_loader: torch.utils.data.DataLoader
compiled_model = ov.compile_model(path_to_model, device_name="CPU")
output = compiled_model.outputs[0]

for images, target in tqdm(validation_loader):
for images, target in track(validation_loader, description="Validating"):
pred = compiled_model(images)[output]
predictions.append(np.argmax(pred, axis=1))
references.append(target)
@@ -61,13 +62,17 @@ def validate(path_to_model: Path, validation_loader: torch.utils.data.DataLoader
return accuracy_score(predictions, references)


def run_benchmark(path_to_model: Path, shape: Optional[List[int]] = None, verbose: bool = True) -> float:
command = f"benchmark_app -m {path_to_model} -d CPU -api async -t 15"
if shape is not None:
command += f' -shape [{",".join(str(x) for x in shape)}]'
cmd_output = subprocess.check_output(command, shell=True) # nosec
if verbose:
print(*str(cmd_output).split("\\n")[-9:-1], sep="\n")
def run_benchmark(path_to_model: Path, shape: List[int]) -> float:
command = [
"benchmark_app",
"-m", path_to_model.as_posix(),
"-d", "CPU",
"-api", "async",
"-t", "15",
"-shape", str(shape),
] # fmt: skip
cmd_output = subprocess.check_output(command, text=True)
print(*cmd_output.splitlines()[-8:], sep="\n")
match = re.search(r"Throughput\: (.+?) FPS", str(cmd_output))
return float(match.group(1))

@@ -136,9 +141,9 @@ def transform_fn(data_item):
print(f"[2/7] Save INT8 model: {int8_model_path}")

print("[3/7] Benchmark FP32 model:")
fp32_fps = run_benchmark(fp32_model_path, shape=[1, 3, 224, 224], verbose=True)
fp32_fps = run_benchmark(fp32_model_path, shape=[1, 3, 224, 224])
print("[4/7] Benchmark INT8 model:")
int8_fps = run_benchmark(int8_model_path, shape=[1, 3, 224, 224], verbose=True)
int8_fps = run_benchmark(int8_model_path, shape=[1, 3, 224, 224])

print("[5/7] Validate ONNX FP32 model in OpenVINO:")
fp32_top1 = validate(fp32_model_path, val_loader)
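The hunks above show only part of this example. As a reading aid, here is a minimal, hedged sketch of the NNCF post-training quantization call that such ONNX examples are built around; the model path, the input tensor name `"input"`, and the synthetic calibration loader are placeholders, not values taken from this diff.

```python
import onnx
import torch

import nncf

# Placeholder calibration data: the real example uses an ImageNette DataLoader.
val_loader = torch.utils.data.DataLoader(
    [(torch.randn(3, 224, 224), 0) for _ in range(8)], batch_size=1
)


def transform_fn(data_item):
    images, _ = data_item             # drop the labels, keep the input batch
    return {"input": images.numpy()}  # "input" is a placeholder tensor name


onnx_model = onnx.load("mobilenet_v2_imagenette.onnx")  # placeholder path
calibration_dataset = nncf.Dataset(val_loader, transform_fn)
quantized_model = nncf.quantize(onnx_model, calibration_dataset)
onnx.save(quantized_model, "mobilenet_v2_int8.onnx")
```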
@@ -16,10 +16,10 @@

import openvino as ov
import torch
from tqdm import tqdm
from rich.progress import track
from ultralytics.cfg import get_cfg
from ultralytics.engine.validator import BaseValidator as Validator
from ultralytics.models.yolo import YOLO
from ultralytics.models.yolo.segment.val import SegmentationValidator
from ultralytics.utils import DEFAULT_CFG
from ultralytics.utils.metrics import ConfusionMatrix

@@ -37,7 +37,7 @@
def validate_ov_model(
ov_model: ov.Model,
data_loader: torch.utils.data.DataLoader,
validator: Validator,
validator: SegmentationValidator,
num_samples: Optional[int] = None,
) -> Tuple[Dict, int, int]:
validator.seen = 0
@@ -47,7 +47,7 @@ def validate_ov_model(
validator.confusion_matrix = ConfusionMatrix(nc=validator.nc)
compiled_model = ov.compile_model(ov_model, device_name="CPU")
num_outputs = len(compiled_model.outputs)
for batch_i, batch in enumerate(data_loader):
for batch_i, batch in enumerate(track(data_loader, description="Validating")):
if num_samples is not None and batch_i == num_samples:
break
batch = validator.preprocess(batch)
@@ -65,12 +65,17 @@
return stats, validator.seen, validator.nt_per_class.sum()


def run_benchmark(model_path: str, config) -> float:
command = f"benchmark_app -m {model_path} -d CPU -api async -t 30"
command += f' -shape "[1,3,{config.imgsz},{config.imgsz}]"'
cmd_output = subprocess.check_output(command, shell=True) # nosec

match = re.search(r"Throughput\: (.+?) FPS", str(cmd_output))
def run_benchmark(model_path: Path, config) -> float:
command = [
"benchmark_app",
"-m", model_path.as_posix(),
"-d", "CPU",
"-api", "async",
"-t", "30",
"-shape", str([1, 3, config.imgsz, config.imgsz]),
] # fmt: skip
cmd_output = subprocess.check_output(command, text=True)
match = re.search(r"Throughput\: (.+?) FPS", cmd_output)
return float(match.group(1))


@@ -96,11 +101,11 @@ def run_benchmark(model_path: str, config) -> float:
validator, data_loader = prepare_validation(YOLO(ROOT / f"{MODEL_NAME}.pt"), args)

print("[5/7] Validate OpenVINO FP32 model:")
fp32_stats, total_images, total_objects = validate_ov_model(fp32_ov_model, tqdm(data_loader), validator)
fp32_stats, total_images, total_objects = validate_ov_model(fp32_ov_model, data_loader, validator)
print_statistics(fp32_stats, total_images, total_objects)

print("[6/7] Validate OpenVINO INT8 model:")
int8_stats, total_images, total_objects = validate_ov_model(int8_ov_model, tqdm(data_loader), validator)
int8_stats, total_images, total_objects = validate_ov_model(int8_ov_model, data_loader, validator)
print_statistics(int8_stats, total_images, total_objects)

print("[7/7] Report:")
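Both examples in this PR swap `tqdm` for `rich.progress.track` when iterating over validation data. As a minimal illustration of the substituted API (not code from this PR), `track()` wraps any iterable and renders a progress bar with an optional description:

```python
from rich.progress import track

# track() is a drop-in wrapper around an iterable, similar to tqdm(iterable).
for batch in track(range(100), description="Validating"):
    pass  # per-item work goes here
```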