RealCUGAN-TensorRT is a project dedicated to optimizing the inference speed of RealCUGAN. This project is implemented based on TensorRT and includes further code optimizations, enhancing the inference performance of RealCUGAN.
What we provide:
- Refactored code with more logical class definitions and calls (for both TensorRT and traditional PyTorch), improving overall code structure and maintainability.
- Significantly optimized speed using TensorRT implementation.
- Provided ready-to-use scripts for both image and video super-resolution tasks.
- For videos: we output videos as '.mp4' files, and the subtitles are carried over to the new file.
- For images: we use OpenCV to load and store images.
TODO:
Models:
- 2x model
- 3x model
- 4x model
Function:
- Video super resolution
- Image super resolution
- Faster sr_tool.py
According to our benchmarks, we fully utilize the GPU for our purpose.
Testing on an NVIDIA RTX 2080Ti shows the following performance improvements:
- Original RealCUGAN: 3.5 FPS
- After code optimization: 4.0 FPS
- With TensorRT implementation and optimizations: 9.8 FPS
Through our optimizations, the inference speed of RealCUGAN has improved to approximately 280% of the original (from 3.5 FPS to 9.8 FPS).
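The relative improvement can be checked directly from the three FPS figures listed above. The snippet below is only a small arithmetic sketch; the dictionary keys and the `speedup` helper are illustrative names, not part of this repository.

```python
# FPS figures quoted in the benchmark list above (NVIDIA RTX 2080Ti).
# The key names are illustrative labels, not identifiers from the project.
fps = {
    "original": 3.5,
    "code_optimized": 4.0,
    "tensorrt": 9.8,
}


def speedup(new_fps: float, base_fps: float) -> float:
    """Return the throughput of new_fps relative to base_fps (1.0 = unchanged)."""
    return new_fps / base_fps


for name, value in fps.items():
    ratio = speedup(value, fps["original"])
    print(f"{name}: {value} FPS, {ratio:.2f}x the original throughput")
```

The TensorRT figure works out to 2.8 times the original throughput, which is where the "approximately 280%" comes from.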
Our CLI tool, sr_tool.py, averages 9.2 FPS when generating super resolution for 1080P bangumi videos.
NVIDIA RTX 20 series or newer NVIDIA GPUs
Important: Sufficient GPU memory is required. Your GPU should be able to run the original RealCUGAN with cache_mode=0 without memory issues.
- Install CUDA and cuDNN
- Clone this repo:
git clone https://github.com/JeremyGuo/RealCUGAN-TensorRT.git && cd RealCUGAN-TensorRT
- Install requirements
pip install -r requirements.txt
- Install PyTorch according to the official website.
- Download the weights XXX.pth from the original RealCUGAN repo.
Important: TensorRT changes its API frequently, so this project only supports TensorRT versions up to 10.5.0.
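A quick way to catch a mismatched TensorRT install before building an engine is to compare the installed version against 10.5.0. The sketch below assumes that the note above means "versions up to and including 10.5.0", and that TensorRT was installed as the `tensorrt` pip distribution (other install methods may not register under that name). The helper names are hypothetical and not part of this repository.

```python
from importlib import metadata

MAX_SUPPORTED = "10.5.0"  # version named in the note above


def parse_version(text: str) -> tuple:
    """Turn '10.5.0' into (10, 5, 0); stop at the first non-numeric piece."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if digits == "":
            break  # e.g. a 'post1' or 'dev' suffix
        parts.append(int(digits))
    return tuple(parts)


def is_supported(installed: str, maximum: str = MAX_SUPPORTED) -> bool:
    """Assumption: every version up to and including `maximum` is supported."""
    return parse_version(installed) <= parse_version(maximum)


def installed_tensorrt_version():
    """Return the version of the 'tensorrt' distribution, or None if absent."""
    try:
        return metadata.version("tensorrt")
    except metadata.PackageNotFoundError:
        return None


version = installed_tensorrt_version()
if version is None:
    print("tensorrt distribution not found")
elif is_supported(version):
    print(f"TensorRT {version} is within the supported range")
else:
    print(f"TensorRT {version} is newer than {MAX_SUPPORTED}; expect API errors")
```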
For images: python3 ./sr_tool.py -i A.jpg -o B.jpg
For videos: python3 ./sr_tool.py -i A.mp4 -o B.mp4
For more details on the CLI tool, run python3 ./sr_tool.py -h
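To process several files, the CLI can be driven from a small wrapper script. The following is a minimal sketch that uses only the `-i` and `-o` flags shown above. The extension sets and the function names are assumptions made for illustration, and which formats sr_tool.py actually accepts should be checked with `-h`. Video outputs are named `.mp4`, matching the output format described earlier.

```python
import subprocess
import sys
from pathlib import Path

# Assumed extension sets; adjust to what sr_tool.py actually accepts.
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}
VIDEO_EXTS = {".mp4", ".mkv"}


def build_command(src: Path, dst: Path, tool: str = "./sr_tool.py") -> list:
    """Build one CLI invocation using only the documented -i / -o flags."""
    return [sys.executable, tool, "-i", str(src), "-o", str(dst)]


def plan_batch(sources: list, out_dir: Path) -> list:
    """Return one command per recognised input file; skip everything else."""
    commands = []
    for src in sorted(sources):
        ext = src.suffix.lower()
        if ext in IMAGE_EXTS:
            dst = out_dir / src.name
        elif ext in VIDEO_EXTS:
            dst = out_dir / (src.stem + ".mp4")  # videos are written as .mp4
        else:
            continue
        commands.append(build_command(src, dst))
    return commands


def run_batch(in_dir: Path, out_dir: Path) -> None:
    """Run the planned commands one after another, stopping on the first error."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for cmd in plan_batch(list(in_dir.iterdir()), out_dir):
        subprocess.run(cmd, check=True)
```

Each invocation starts a new process, so for large batches the library interface is likely the better fit.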
from sr_tool import sr_video, sr_image, sr_create_engine
from config import config
engine = sr_create_engine(config, start=True)
sr_image(engine, 'A.jpg', 'B.jpg')
sr_video(engine, 'A.mkv', 'B.mkv')
engine.stop()
Or use our library as a streaming tool:
from sr_engine import SREngine
from upcunet2x import RealCUGANUpScaler2x
model1 = RealCUGANUpScaler2x(...)
model2 = RealCUGANUpScaler2x(...)
engine = SREngine([model1, model2])
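# Pre-fill the engine so that it always has work queued.
for i in range(64):
    engine.put((ANYTHING, image))
while not stop_flag:
    # Collect a batch of results, then queue the same number of new inputs.
    for i in range(32):
        ANYTHING, result = engine.get()
    for i in range(32):
        engine.put((ANYTHING, image))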
engine.stop()
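The put/get pattern above keeps a fixed number of frames in flight: pre-fill the queue, then alternate between draining a batch of results and queueing the same number of new inputs. The sketch below illustrates that pattern with a stand-in `ToyEngine` built on the standard library. It is not the real SREngine, and it only mirrors the `put`, `get`, and `stop` calls shown above. The `stream` helper and its parameters are hypothetical.

```python
import queue
import threading


class ToyEngine:
    """Stand-in exposing put/get/stop; NOT the real SREngine."""

    def __init__(self, work):
        self._work = work  # function applied to each frame
        self._inbox = queue.Queue()
        self._outbox = queue.Queue()
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def _loop(self):
        while True:
            item = self._inbox.get()
            if item is None:  # sentinel queued by stop()
                break
            tag, frame = item
            self._outbox.put((tag, self._work(frame)))

    def put(self, item):
        self._inbox.put(item)

    def get(self):
        return self._outbox.get()

    def stop(self):
        self._inbox.put(None)
        self._thread.join()


def stream(engine, frames, prefill=64, batch=32):
    """Yield (tag, result) pairs while keeping up to `prefill` frames in flight."""
    source = iter(enumerate(frames))  # the frame index serves as the tag
    in_flight = 0

    def feed(count):
        nonlocal in_flight
        for _ in range(count):
            try:
                tag, frame = next(source)
            except StopIteration:
                return
            engine.put((tag, frame))
            in_flight += 1

    feed(prefill)
    while in_flight:
        for _ in range(min(batch, in_flight)):
            yield engine.get()
            in_flight -= 1
        feed(batch)


engine = ToyEngine(lambda frame: frame * 2)
results = list(stream(engine, range(100), prefill=8, batch=4))
engine.stop()
print(len(results), results[:3])
```

The tag travels with each frame so the caller can match results to inputs. The toy engine has a single worker and returns results in order, but the source does not state whether the real engine does the same.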
You can use python3 benchmark.py for basic evaluation.
The profiling result of the above is in profile.nsys-rep.
This project is also used in my project Mikan Tracker, which automatically tracks torrents and RSS feeds.