GPU accelerated TensorFlow Lite / TensorRT applications.

$ sudo apt install libgles2-mesa-dev 
$ mkdir ~/work
$ mkdir ~/lib
$
$ wget https://github.com/bazelbuild/bazel/releases/download/3.1.0/bazel-3.1.0-installer-linux-x86_64.sh
$ chmod 755 bazel-3.1.0-installer-linux-x86_64.sh
$ sudo ./bazel-3.1.0-installer-linux-x86_64.sh
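
The build scripts in this README are used with Bazel 3.1.0, so it can help to confirm the installed release before starting a long build. The following is a minimal sketch, assuming `bazel --version` prints a single line of the form `bazel 3.1.0`; the function name `is_expected_bazel` is hypothetical and not part of this repository.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository): succeed only if the
# given "bazel --version" line reports the expected release.
# Assumption: the line has the form "bazel <version>".
is_expected_bazel() {
    version_line=$1
    expected=${2:-3.1.0}   # default: the release installed above
    [ "$version_line" = "bazel $expected" ]
}

# Usage on a machine where Bazel is installed:
#   is_expected_bazel "$(bazel --version)" || echo "unexpected Bazel version" >&2
```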

2.1.2. build TensorFlow Lite library.

$ cd ~/work 
$ git clone https://github.com/terryky/tflite_gles_app.git
$ ./tflite_gles_app/tools/scripts/tf2.4/build_libtflite_r2.4.sh

(The TensorFlow configure script starts after a while. Answer its prompts according to your environment.)

$
$ ln -s tensorflow_r2.4 ./tensorflow
$
$ cp ./tensorflow/bazel-bin/tensorflow/lite/libtensorflowlite.so ~/lib
$ cp ./tensorflow/bazel-bin/tensorflow/lite/delegates/gpu/libtensorflowlite_gpu_delegate.so ~/lib
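
Before building an application, it may be worth checking that both shared libraries actually landed in `~/lib`. This is only a sketch; the function name `check_tflite_libs` is hypothetical, and the two file names are the ones copied above.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository): verify that both
# shared libraries copied above are present in the given directory.
check_tflite_libs() {
    lib_dir=$1
    missing=0
    for lib in libtensorflowlite.so libtensorflowlite_gpu_delegate.so; do
        if [ ! -f "$lib_dir/$lib" ]; then
            echo "missing: $lib_dir/$lib" >&2
            missing=1
        fi
    done
    # Exit status 0 if both files exist, 1 otherwise.
    return "$missing"
}

# Usage:
#   check_tflite_libs "$HOME/lib" && echo "libraries found"
```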

2.1.3. build an application.

$ cd ~/work/tflite_gles_app/gl2handpose
$ make -j4

2.1.4. run an application.

$ export LD_LIBRARY_PATH=~/lib:$LD_LIBRARY_PATH
$ cd ~/work/tflite_gles_app/gl2handpose
$ ./gl2handpose
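
Running the `export` line repeatedly in the same shell keeps prepending `~/lib`, and with an empty `LD_LIBRARY_PATH` it leaves a trailing `:`. One possible way to avoid both is sketched below; `prepend_path` is a hypothetical helper name, not something provided by this repository.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository): print a search path
# with directory $1 prepended to $2. If $2 already contains $1, it is
# printed unchanged; if $2 is empty, no trailing ':' is produced.
prepend_path() {
    dir=$1
    current=$2
    case ":$current:" in
        *":$dir:"*) printf '%s\n' "$current" ;;
        *)          printf '%s\n' "${dir}${current:+:$current}" ;;
    esac
}

# Usage:
#   LD_LIBRARY_PATH=$(prepend_path "$HOME/lib" "$LD_LIBRARY_PATH")
#   export LD_LIBRARY_PATH
```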

2.2. Build for aarch64 Linux (Jetson Nano, Raspberry Pi)

2.2.1. build TensorFlow Lite library on Host PC.

(HostPC)$ wget https://github.com/bazelbuild/bazel/releases/download/3.1.0/bazel-3.1.0-installer-linux-x86_64.sh
(HostPC)$ chmod 755 bazel-3.1.0-installer-linux-x86_64.sh
(HostPC)$ sudo ./bazel-3.1.0-installer-linux-x86_64.sh
(HostPC)$
(HostPC)$ mkdir ~/work
(HostPC)$ cd ~/work 
(HostPC)$ git clone https://github.com/terryky/tflite_gles_app.git
(HostPC)$ ./tflite_gles_app/tools/scripts/tf2.4/build_libtflite_r2.4_aarch64.sh

# If you want to build XNNPACK-enabled TensorFlow Lite, use the following script.
(HostPC)$ ./tflite_gles_app/tools/scripts/tf2.4/build_libtflite_r2.4_with_xnnpack_aarch64.sh

(The TensorFlow configure script starts after a while. Answer its prompts according to your environment.)

2.2.2. copy TensorFlow Lite libraries to target Jetson / Raspi.

# replace <target-ip> with the address of your board
(HostPC)$ scp ~/work/tensorflow_r2.4/bazel-bin/tensorflow/lite/libtensorflowlite.so jetson@<target-ip>:/home/jetson/lib
(HostPC)$ scp ~/work/tensorflow_r2.4/bazel-bin/tensorflow/lite/delegates/gpu/libtensorflowlite_gpu_delegate.so jetson@<target-ip>:/home/jetson/lib
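
The two copy commands above differ only in the library path, so they can be generated for any board. The sketch below only prints the commands instead of running them; `print_scp_commands` and its three parameters are hypothetical names, and the target `user@host` is a placeholder you would fill in yourself.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository): print, but do not
# run, the two scp commands for a given build tree and target.
print_scp_commands() {
    build_dir=$1   # TensorFlow build tree, e.g. "$HOME/work/tensorflow_r2.4"
    target=$2      # placeholder of the form user@host
    dest=$3        # library directory on the target, e.g. /home/jetson/lib
    for rel in tensorflow/lite/libtensorflowlite.so \
               tensorflow/lite/delegates/gpu/libtensorflowlite_gpu_delegate.so; do
        printf 'scp %s/bazel-bin/%s %s:%s\n' "$build_dir" "$rel" "$target" "$dest"
    done
}

# Usage (review the printed commands, then run them by hand):
#   print_scp_commands "$HOME/work/tensorflow_r2.4" user@host /home/jetson/lib
```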

2.2.3. clone TensorFlow repository on target Jetson / Raspi.

(Jetson/Raspi)$ cd ~/work
(Jetson/Raspi)$ git clone -b r2.4 https://github.com/tensorflow/tensorflow.git
(Jetson/Raspi)$ cd tensorflow
(Jetson/Raspi)$ ./tensorflow/lite/tools/make/download_dependencies.sh

2.2.4. build an application.

(Jetson/Raspi)$ sudo apt install libgles2-mesa-dev libdrm-dev
(Jetson/Raspi)$ cd ~/work 
(Jetson/Raspi)$ git clone https://github.com/terryky/tflite_gles_app.git
(Jetson/Raspi)$ cd ~/work/tflite_gles_app/gl2handpose

# on Jetson
(Jetson)$ make -j4 TARGET_ENV=jetson_nano TFLITE_DELEGATE=GPU_DELEGATEV2

# on Raspberry Pi without GPUDelegate (recommended)
(Raspi )$ make -j4 TARGET_ENV=raspi4

# on Raspberry Pi with GPUDelegate (low performance)
(Raspi )$ make -j4 TARGET_ENV=raspi4 TFLITE_DELEGATE=GPU_DELEGATEV2

# on Raspberry Pi with XNNPACK
(Raspi )$ make -j4 TARGET_ENV=raspi4 TFLITE_DELEGATE=XNNPACK
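
The four `make` invocations above vary only in `TARGET_ENV` and the optional `TFLITE_DELEGATE`. As a sketch of how a build script might pick them, the hypothetical function `make_vars` below prints the variables for a board and delegate; it only knows the two `TARGET_ENV` values shown in this section.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository): print the make
# variables for a board and an optional delegate.
make_vars() {
    board=$1          # jetson_nano | raspi4 (the values used above)
    delegate=${2:-}   # empty | GPU_DELEGATEV2 | XNNPACK
    case "$board" in
        jetson_nano|raspi4) ;;
        *) echo "unknown board: $board" >&2; return 1 ;;
    esac
    if [ -n "$delegate" ]; then
        printf 'TARGET_ENV=%s TFLITE_DELEGATE=%s\n' "$board" "$delegate"
    else
        printf 'TARGET_ENV=%s\n' "$board"
    fi
}

# Usage (the unquoted substitution is intentional, so that the output is
# split into separate make arguments):
#   make -j4 $(make_vars raspi4 XNNPACK)
```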

2.2.5. run an application.

(Jetson/Raspi)$ export LD_LIBRARY_PATH=~/lib:$LD_LIBRARY_PATH
(Jetson/Raspi)$ cd ~/work/tflite_gles_app/gl2handpose
(Jetson/Raspi)$ ./gl2handpose

About VSYNC

On Jetson Nano, display sync to vblank (VSYNC) is enabled by default to avoid tearing. To enable or disable VSYNC explicitly, run the app with one of the following commands.

# enable VSYNC (default).
(Jetson)$ export __GL_SYNC_TO_VBLANK=1; ./gl2handpose

# disable VSYNC. The frame rate improves, but tearing occurs.
(Jetson)$ export __GL_SYNC_TO_VBLANK=0; ./gl2handpose
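
Note that `export` changes the setting for the rest of the shell session. If you only want to change it for a single run, one option is to set the variable for that command alone, as in the sketch below; `run_with_vsync` is a hypothetical wrapper name.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of this repository): run a command with
# __GL_SYNC_TO_VBLANK set for that command only, leaving the calling
# shell's environment untouched.
run_with_vsync() {
    vsync=$1   # 1 = VSYNC enabled, 0 = VSYNC disabled
    shift
    env __GL_SYNC_TO_VBLANK="$vsync" "$@"
}

# Usage:
#   run_with_vsync 0 ./gl2handpose
```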

2.3. Build for armv7l Linux (Raspberry Pi)

2.3.1. build TensorFlow Lite library on Host PC.

(HostPC)$ wget https://github.com/bazelbuild/bazel/releases/download/3.1.0/bazel-3.1.0-installer-linux-x86_64.sh
(HostPC)$ chmod 755 bazel-3.1.0-installer-linux-x86_64.sh
(HostPC)$ sudo ./bazel-3.1.0-installer-linux-x86_64.sh
(HostPC)$
(HostPC)$ mkdir ~/work
(HostPC)$ cd ~/work 
(HostPC)$ git clone https://github.com/terryky/tflite_gles_app.git
(HostPC)$ ./tflite_gles_app/tools/scripts/tf2.3/build_libtflite_r2.3_armv7l.sh

(The TensorFlow configure script starts after a while. Answer its prompts according to your environment.)

2.3.2. copy TensorFlow Lite libraries to target Raspberry Pi.

# replace <target-ip> with the address of your Raspberry Pi
(HostPC)$ scp ~/work/tensorflow_r2.3/bazel-bin/tensorflow/lite/libtensorflowlite.so pi@<target-ip>:/home/pi/lib
(HostPC)$ scp ~/work/tensorflow_r2.3/bazel-bin/tensorflow/lite/delegates/gpu/libtensorflowlite_gpu_delegate.so pi@<target-ip>:/home/pi/lib

2.3.3. setup environment on Raspberry Pi.

(Raspi)$ sudo apt update
(Raspi)$ sudo apt upgrade
(Raspi)$ sudo apt install libgles2-mesa-dev libegl1-mesa-dev xorg-dev

2.3.4. clone TensorFlow repository on target Raspi.

(Raspi)$ cd ~/work
(Raspi)$ git clone -b r2.3 https://github.com/tensorflow/tensorflow.git
(Raspi)$ cd tensorflow
(Raspi)$ ./tensorflow/lite/tools/make/download_dependencies.sh

2.3.5. build an application on target Raspi.

(Raspi)$ cd ~/work 
(Raspi)$ git clone https://github.com/terryky/tflite_gles_app.git
(Raspi)$ cd ~/work/tflite_gles_app/gl2handpose
(Raspi)$ make -j4 TARGET_ENV=raspi4  # GPUDelegate disabled (recommended)

# GPUDelegate enabled, but this causes low performance on Raspi4.
(Raspi)$ make -j4 TARGET_ENV=raspi4 TFLITE_DELEGATE=GPU_DELEGATEV2

2.3.6. run an application on target Raspi.

(Raspi)$ export LD_LIBRARY_PATH=~/lib:$LD_LIBRARY_PATH
(Raspi)$ cd ~/work/tflite_gles_app/gl2handpose
(Raspi)$ ./gl2handpose

For more detailed information, please refer to this article.

3. About the input video stream

Both a live camera and a recorded video file are supported as input sources.

Live UVC Camera
Recorded Video file

3.1. Live UVC Camera (default)

UVC (USB Video Class) camera capture is supported.

Use the v4l2-ctl command to configure the capture resolution.
- The lower the resolution, the higher the frame rate.

(Target)$ sudo apt-get install v4l-utils

# confirm current resolution settings
(Target)$ v4l2-ctl --all

# query available resolutions
(Target)$ v4l2-ctl --list-formats-ext

# set capture resolution (160x120)
(Target)$ v4l2-ctl --set-fmt-video=width=160,height=120

# set capture resolution (640x480)
(Target)$ v4l2-ctl --set-fmt-video=width=640,height=480

Currently, only the YUYV pixel format is supported.
- If you see error messages like the ones below:

-------------------------------
 capture_devie  : /dev/video0
 capture_devtype: V4L2_CAP_VIDEO_CAPTURE
 capture_buftype: V4L2_BUF_TYPE_VIDEO_CAPTURE
 capture_memtype: V4L2_MEMORY_MMAP
 WH(640, 480), 4CC(MJPG), bpl(0), size(341333)
-------------------------------
ERR: camera_capture.c(87): pixformat(MJPG) is not supported.
ERR: camera_capture.c(87): pixformat(MJPG) is not supported.
...

try changing your camera settings to use the YUYV pixel format with the following commands:

$ sudo apt-get install v4l-utils
$ v4l2-ctl --set-fmt-video=width=640,height=480,pixelformat=YUYV --set-parm=30

To disable the camera
- If your camera doesn't support YUYV, run the app in camera_disabled_mode with the -x argument.

$ ./gl2handpose -x
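
The choice between switching the camera to YUYV and falling back to `-x` can be scripted. The sketch below checks the output of `v4l2-ctl --list-formats-ext` for a quoted `'YUYV'` FourCC; that matching rule is an assumption, since the exact layout of the listing varies between v4l-utils versions, and `supports_yuyv` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository): read the output of
# "v4l2-ctl --list-formats-ext" on stdin and succeed if a YUYV format is
# listed. Assumption: the FourCC appears in single quotes, e.g. 'YUYV'.
supports_yuyv() {
    grep -q "'YUYV'"
}

# Usage on the target:
#   if v4l2-ctl --list-formats-ext | supports_yuyv; then
#       v4l2-ctl --set-fmt-video=width=640,height=480,pixelformat=YUYV
#       ./gl2handpose
#   else
#       ./gl2handpose -x
#   fi
```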

3.2. Recorded Video file

FFmpeg (libav) video decode is supported.
If you want to use a recorded video file instead of a live camera, follow these steps:

# install the dependent libraries.
(Target)$ sudo apt install libavcodec-dev libavdevice-dev libavfilter-dev libavformat-dev libavresample-dev libavutil-dev

# build an app with the ENABLE_VDEC option
(Target)$ cd ~/work/tflite_gles_app/gl2facemesh
(Target)$ make -j4 ENABLE_VDEC=true

# run an app with a video file name as an argument.
(Target)$ ./gl2facemesh -v assets/sample_video.mp4

4. Tested platforms

You can select the platform by editing Makefile.env.

Linux PC (X11)
NVIDIA Jetson Nano (X11)
NVIDIA Jetson TX2 (X11)
RaspberryPi4 (X11)
RaspberryPi3 (Dispmanx)
Coral EdgeTPU Devboard (Wayland)

5. Performance of inference [ms]

Blazeface

Framework	Precision	Raspberry Pi 4 [ms]	Jetson nano [ms]
TensorFlow Lite	CPU fp32	10	10
TensorFlow Lite	CPU int8	7	7
TensorFlow Lite GPU Delegate	GPU fp16	70	10
TensorRT	GPU fp16	--	?

Classification (mobilenet_v1_1.0_224)

Framework	Precision	Raspberry Pi 4 [ms]	Jetson nano [ms]
TensorFlow Lite	CPU fp32	69	50
TensorFlow Lite	CPU int8	28	29
TensorFlow Lite GPU Delegate	GPU fp16	360	37
TensorRT	GPU fp16	--	19

Object Detection (ssd_mobilenet_v1_coco)

Framework	Precision	Raspberry Pi 4 [ms]	Jetson nano [ms]
TensorFlow Lite	CPU fp32	150	113
TensorFlow Lite	CPU int8	62	64
TensorFlow Lite GPU Delegate	GPU fp16	980	90
TensorRT	GPU fp16	--	32

Facemesh

Framework	Precision	Raspberry Pi 4 [ms]	Jetson nano [ms]
TensorFlow Lite	CPU fp32	29	30
TensorFlow Lite	CPU int8	24	27
TensorFlow Lite GPU Delegate	GPU fp16	150	20
TensorRT	GPU fp16	--	?

Hair Segmentation

Framework	Precision	Raspberry Pi 4 [ms]	Jetson nano [ms]
TensorFlow Lite	CPU fp32	410	400
TensorFlow Lite	CPU int8	?	?
TensorFlow Lite GPU Delegate	GPU fp16	270	30
TensorRT	GPU fp16	--	?

3D Handpose

Framework	Precision	Raspberry Pi 4 [ms]	Jetson nano [ms]
TensorFlow Lite	CPU fp32	116	85
TensorFlow Lite	CPU int8	80	87
TensorFlow Lite GPU Delegate	GPU fp16	880	90
TensorRT	GPU fp16	--	?

3D Object Detection

Framework	Precision	Raspberry Pi 4 [ms]	Jetson nano [ms]
TensorFlow Lite	CPU fp32	470	302
TensorFlow Lite	CPU int8	248	249
TensorFlow Lite GPU Delegate	GPU fp16	1990	235
TensorRT	GPU fp16	--	108

Posenet (posenet_mobilenet_v1_100_257x257)

Framework	Precision	Raspberry Pi 4 [ms]	Jetson nano [ms]
TensorFlow Lite	CPU fp32	92	70
TensorFlow Lite	CPU int8	53	55
TensorFlow Lite GPU Delegate	GPU fp16	803	80
TensorRT	GPU fp16	--	18

Semantic Segmentation (deeplabv3_257)

Framework	Precision	Raspberry Pi 4 [ms]	Jetson nano [ms]
TensorFlow Lite	CPU fp32	108	80
TensorFlow Lite	CPU int8	?	?
TensorFlow Lite GPU Delegate	GPU fp16	790	90
TensorRT	GPU fp16	--	?

Selfie to Anime

Framework	Precision	Raspberry Pi 4 [ms]	Jetson nano [ms]
TensorFlow Lite	CPU fp32	?	7700
TensorFlow Lite	CPU int8	?	?
TensorFlow Lite GPU Delegate	GPU fp16	?	?
TensorRT	GPU fp16	--	?

Artistic Style Transfer

Framework	Precision	Raspberry Pi 4 [ms]	Jetson nano [ms]
TensorFlow Lite	CPU fp32	1830	950
TensorFlow Lite	CPU int8	?	?
TensorFlow Lite GPU Delegate	GPU fp16	2440	215
TensorRT	GPU fp16	--	?

Text Detection (east_text_detection_320x320)

Framework	Precision	Raspberry Pi 4 [ms]	Jetson nano [ms]
TensorFlow Lite	CPU fp32	1020	680
TensorFlow Lite	CPU int8	378	368
TensorFlow Lite GPU Delegate	GPU fp16	4665	388
TensorRT	GPU fp16	--	?
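
The latencies in the tables above can be read as an upper bound on frame rate: if inference were the only per-frame cost, the rate would be 1000 divided by the latency in milliseconds. The following sketch does that arithmetic with `awk`; `ms_to_fps` is a hypothetical helper name, and the sample values are the Jetson Nano entries from the Classification table.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository): convert an inference
# latency in milliseconds to frames per second (1000 / ms), rounded to
# one decimal place. LC_ALL=C keeps '.' as the decimal separator.
ms_to_fps() {
    LC_ALL=C awk -v ms="$1" 'BEGIN { printf "%.1f\n", 1000 / ms }'
}

# Classification on Jetson Nano:
#   ms_to_fps 37   # TensorFlow Lite GPU Delegate, prints 27.0
#   ms_to_fps 19   # TensorRT, prints 52.6
```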
