Jetson Nano TensorRT Autoseg-EdgeTPU and DeepLab v3+ MobilenetEdgeTPUV2 latency

Environment

  • HW
    • Jetson Nano
  • OS
    • JetPack 4.6 Linux raspberrypi 5.10.36-v8+ #1418 SMP PREEMPT Thu May 13 18:19:53 BST 2021 aarch64 GNU/Linux
  • SW
    • TensorRT 8

How to benchmark

Models

Convert the models to ONNX.

# Latency
$ /usr/src/tensorrt/bin/trtexec --onnx=_PATH_TO_/*.onnx [--fp16]
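To collect the mean latency for several models in one pass, the command above can be wrapped in a small script. The following is a minimal sketch, not the procedure used for the results here: `extract_mean_latency` and `bench_models` are hypothetical helper names, and the `sed` pattern assumes that trtexec prints a summary line of the form `[I] Latency: min = ... ms, max = ... ms, mean = ... ms, ...`, which may differ between TensorRT versions.

```shell
#!/usr/bin/env bash
# Sketch only. Assumes the trtexec summary contains a line such as
#   [I] Latency: min = 58.10 ms, max = 61.90 ms, mean = 60.41 ms, median = ...
# Verify this against the output of your TensorRT version.

TRTEXEC=/usr/src/tensorrt/bin/trtexec

# Read a trtexec log on stdin and print the mean latency in ms.
# Matching "[I] Latency:" skips lines like "[I] H2D Latency:".
extract_mean_latency() {
  sed -n 's/.*\[I] Latency: .*mean = \([0-9.]*\) ms.*/\1/p' | head -n 1
}

# Hypothetical helper: benchmark every ONNX file in a directory
# in both FP32 (default) and FP16 (--fp16) and print one line per model.
bench_models() {
  local dir=$1 model fp32 fp16
  for model in "$dir"/*.onnx; do
    [ -e "$model" ] || continue
    fp32=$("$TRTEXEC" --onnx="$model" 2>&1 | extract_mean_latency)
    fp16=$("$TRTEXEC" --onnx="$model" --fp16 2>&1 | extract_mean_latency)
    printf '%s\tFP16=%s ms\tFP32=%s ms\n' "$(basename "$model")" "$fp16" "$fp32"
  done
}
```

Running `bench_models _PATH_TO_` after sourcing the script would print one tab-separated line per model. Building a TensorRT engine on the Jetson Nano takes a while, so expect each model to run for several minutes.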

Results

Latency mean (ms)

| Model | FP16 | FP16 with optimized | FP32 | FP32 with optimized |
| :-- | --: | --: | --: | --: |
| Autoseg-EdgeTPU-XS with default argmax | 84.31 | 85.04 | 93.99 | 93.58 |
| Autoseg-EdgeTPU-S with default argmax | 94.53 | 93.60 | 109.18 | 108.71 |
| Autoseg-EdgeTPU-XS with optimized fused argmax | 60.41 | 59.53 | 67.1 | 66.48 |
| Autoseg-EdgeTPU-S with optimized fused argmax | 69.79 | 69.05 | 82.06 | 81.68 |
| DeepLab v3+ MobilenetEdgeTPUV2-XS with default argmax | 98.63 | 92.73 | 113.62 | 108.28 |
| DeepLab v3+ MobilenetEdgeTPUV2-S with default argmax | 122.27 | 114.61 | 146.90 | 141.30 |
| DeepLab v3+ MobilenetEdgeTPUV2-M with default argmax | 150.33 | 141.30 | 189.69 | 181.41 |
| DeepLab v3+ MobilenetEdgeTPUV2-XS with optimized fused argmax | 73.59 | 67.92 | 86.34 | 81.00 |
| DeepLab v3+ MobilenetEdgeTPUV2-S with optimized fused argmax | 97.41 | 89.60 | 119.77 | 112.89 |
| DeepLab v3+ MobilenetEdgeTPUV2-M with optimized fused argmax | 125.74 | 116.54 | 162.46 | 154.03 |

^ "with optimized": the model was processed with openvino2tensorflow and tflite2tensorflow.
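The gap between configurations is easier to compare as a relative reduction in mean latency. As a rough sketch (the helper name `speedup_pct` is an assumption introduced here, not part of any tool), the percentage can be computed from any two values in the results above:

```shell
# speedup_pct BASELINE_MS NEW_MS
# Prints how much lower NEW_MS is than BASELINE_MS, as a percentage.
speedup_pct() {
  awk -v base="$1" -v new="$2" \
    'BEGIN { printf "%.1f\n", (base - new) / base * 100 }'
}

# Default argmax vs. optimized fused argmax, FP16 column:
speedup_pct 84.31 60.41     # Autoseg-EdgeTPU-XS               -> 28.3
speedup_pct 150.33 125.74   # DeepLab v3+ MobilenetEdgeTPUV2-M -> 16.4

# FP32 vs. FP16, Autoseg-EdgeTPU-XS with default argmax:
speedup_pct 93.99 84.31     # -> 10.3
```

For these rows, switching to the fused argmax reduces latency more than switching from FP32 to FP16 does.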

Latency

YouTube videos

Autoseg-EdgeTPU-XS with default argmax

Autoseg-EdgeTPU-S with default argmax

Autoseg-EdgeTPU-XS with optimized fused argmax

Autoseg-EdgeTPU-S with optimized fused argmax

DeepLab v3+ MobilenetEdgeTPUV2-XS with default argmax

DeepLab v3+ MobilenetEdgeTPUV2-S with default argmax

DeepLab v3+ MobilenetEdgeTPUV2-M with default argmax

DeepLab v3+ MobilenetEdgeTPUV2-XS with optimized fused argmax

DeepLab v3+ MobilenetEdgeTPUV2-S with optimized fused argmax

DeepLab v3+ MobilenetEdgeTPUV2-M with optimized fused argmax