This repository contains the code for the Embedded Machine Learning Lab Challenge. The focus of the lab was to speed up inference of TinyYOLOv2 by applying several steps: fine-tuning, layer fusion, pruning, and other optimizations.
TODO:
- refactor training
- tune hyperparameters (starting with the learning rate)
- add early stopping to avoid overfitting
- save a model checkpoint after each epoch (see the training-loop sketch after this list)
- add a quantized version of YOLO
- add fusion and quantization with the PyTorch quantization API (model 1: post-training quantization, PTQ; model 2: quantization-aware training, QAT); see the PTQ sketch after this list
- test model 1
- train and test model 2
- add a self-implemented fused conv_bn layer and a quantized TinyYOLOv2 built on it (model 3); see the BN-folding sketch after this list
- train model 3
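A minimal sketch of the refactored training loop with early stopping and per-epoch checkpoints. Everything model-specific (the network, dataloaders, and loss) is passed in as a placeholder; the `checkpoints/` path and the patience value are assumptions, not values taken from this repo.

```python
import copy
import os

import torch

def train(model, train_loader, val_loader, loss_fn, epochs=50, lr=1e-3, patience=5):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    os.makedirs("checkpoints", exist_ok=True)
    best_val, best_state, stale = float("inf"), None, 0

    for epoch in range(epochs):
        model.train()
        for images, targets in train_loader:
            optimizer.zero_grad()
            loss = loss_fn(model(images), targets)
            loss.backward()
            optimizer.step()

        # validation pass used for early stopping
        model.eval()
        val_loss = 0.0
        with torch.no_grad():
            for images, targets in val_loader:
                val_loss += loss_fn(model(images), targets).item()
        val_loss /= len(val_loader)

        # save a checkpoint after every epoch
        torch.save(model.state_dict(), f"checkpoints/epoch_{epoch:03d}.pt")

        # early stopping: stop once validation loss has not improved
        # for `patience` consecutive epochs
        if val_loss < best_val:
            best_val, best_state, stale = val_loss, copy.deepcopy(model.state_dict()), 0
        else:
            stale += 1
            if stale >= patience:
                break

    model.load_state_dict(best_state)
    return model
```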
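A hedged sketch of model 1 (PTQ) with PyTorch's eager-mode quantization API. The module names passed to `fuse_modules` are hypothetical and must match TinyYOLOv2's actual attribute names, and eager-mode quantization additionally assumes the model wraps its forward pass in `QuantStub`/`DeQuantStub`.

```python
import torch
import torch.ao.quantization as tq

def quantize_ptq(model, calib_loader):
    model.eval()
    torch.backends.quantized.engine = "qnnpack"  # ARM/Jetson backend; use "fbgemm" on x86
    # 1) fuse conv + batchnorm pairs so each is quantized as a single op
    #    (module names here are placeholders)
    model = tq.fuse_modules(model, [["conv1", "bn1"], ["conv2", "bn2"]])
    # 2) attach a quantization config matching the chosen backend
    model.qconfig = tq.get_default_qconfig("qnnpack")
    # 3) insert observers that record activation ranges
    tq.prepare(model, inplace=True)
    # 4) calibrate on a few representative batches
    with torch.no_grad():
        for images, _ in calib_loader:
            model(images)
    # 5) swap observed modules for real int8 kernels
    return tq.convert(model)
```

Model 2 (QAT) follows the same outline but uses `tq.get_default_qat_qconfig` and `tq.prepare_qat` on a model in train mode, fine-tunes for a few epochs, and only then calls `tq.convert`.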
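For model 3, the self-implemented fused conv_bn layer comes down to folding the BatchNorm statistics into the convolution weights and bias. A sketch of the standard folding algebra, not necessarily the exact implementation used here:

```python
import torch
import torch.nn as nn

@torch.no_grad()
def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      stride=conv.stride, padding=conv.padding,
                      dilation=conv.dilation, groups=conv.groups, bias=True)
    # per-output-channel scale: gamma / sqrt(running_var + eps)
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)
    # fused weight: W' = W * scale (broadcast over each output channel)
    fused.weight.copy_(conv.weight * scale.reshape(-1, 1, 1, 1))
    conv_bias = conv.bias if conv.bias is not None else torch.zeros_like(bn.running_mean)
    # fused bias: b' = (b - running_mean) * scale + beta
    fused.bias.copy_((conv_bias - bn.running_mean) * scale + bn.bias)
    return fused
```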
- add pruning (see the pruning sketch below)
- export to ONNX
- test inference with ONNX Runtime (see the export sketch below)
- add the detection pipeline to the camera loop
- add frame-rate measurements (see the camera-loop sketch below)
- test each model for the demo
- add TensorBoard logging for every step
- add visuals for the different models
- see the graphs from this paper
- compare ROC curves of the different models (one plot with all curves)
- compare model sizes (state_dicts) after each step (histogram)
- implement a test for the inference-time improvement (see the last cell in the quantization notebook and the benchmark sketch below)
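For the pruning item, a sketch of unstructured magnitude pruning with `torch.nn.utils.prune`; the 30% sparsity level is an arbitrary example, and the Taylor-pruning repo linked under Resources is a structured alternative.

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

def prune_convs(model, amount=0.3):
    for module in model.modules():
        if isinstance(module, nn.Conv2d):
            # zero out the `amount` fraction of smallest-magnitude weights
            prune.l1_unstructured(module, name="weight", amount=amount)
            # fold the mask back into the weight tensor so the model
            # saves and exports like a normal module
            prune.remove(module, "weight")
    return model
```

Note that unstructured zeros do not by themselves speed up dense kernels; a short fine-tuning pass after pruning usually recovers most of the lost accuracy.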
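For the ONNX items, a sketch of exporting the model and checking that ONNX Runtime reproduces the PyTorch output. The 416x416 input is the usual TinyYOLOv2 resolution and is an assumption here.

```python
import numpy as np
import torch
import onnxruntime as ort

def export_and_check(model, path="tinyyolo.onnx"):
    model.eval()
    dummy = torch.randn(1, 3, 416, 416)  # assumed input resolution
    torch.onnx.export(model, dummy, path,
                      input_names=["input"], output_names=["output"],
                      opset_version=13)
    session = ort.InferenceSession(path)
    onnx_out = session.run(None, {"input": dummy.numpy()})[0]
    torch_out = model(dummy).detach().numpy()
    # the two runtimes should agree up to small numerical error
    print("max abs diff:", np.abs(onnx_out - torch_out).max())
```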
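For the camera-loop items, a sketch of the demo loop with a frame-rate overlay. `detect` and `draw_boxes` are hypothetical stand-ins for this repo's detection pipeline, and camera index 0 is an assumption.

```python
import time
import cv2

cap = cv2.VideoCapture(0)
frames, t0 = 0, time.perf_counter()
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # boxes = detect(model, frame)       # hypothetical detection call
    # frame = draw_boxes(frame, boxes)   # hypothetical overlay helper
    frames += 1
    fps = frames / (time.perf_counter() - t0)  # average FPS since loop start
    cv2.putText(frame, f"{fps:.1f} FPS", (10, 30),
                cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 255, 0), 2)
    cv2.imshow("demo", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```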
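For the inference-time test and the size comparison, a sketch of two small helpers; the run counts, input shape, and temporary file name are arbitrary choices.

```python
import os
import time

import torch

@torch.no_grad()
def mean_latency_ms(model, runs=100, shape=(1, 3, 416, 416)):
    model.eval()
    x = torch.randn(shape)
    for _ in range(10):    # warm-up iterations, not timed
        model(x)
    t0 = time.perf_counter()
    for _ in range(runs):  # on GPU, call torch.cuda.synchronize() before/after timing
        model(x)
    return (time.perf_counter() - t0) / runs * 1e3

def state_dict_size_mb(model, path="_tmp.pt"):
    torch.save(model.state_dict(), path)
    size_mb = os.path.getsize(path) / 1e6
    os.remove(path)
    return size_mb
```

Calling these on the float baseline and on each optimized variant gives the numbers for the latency comparison and the size histogram.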
Resources:
- Alternative pruning method: https://github.com/NVlabs/Taylor_pruning
- ONNX Runtime graph optimizations: https://onnxruntime.ai/docs/performance/model-optimizations/graph-optimizations.html
- Up-to-date Jetson Docker containers: https://github.com/dusty-nv/jetson-containers