cuda10.0+cudnn7.6.3+TensorRT-6.0.1.5 #28

Open
DaChaoXc opened this issue Jun 18, 2020 · 9 comments

@DaChaoXc

It compiles successfully. The test results are as follows:
input-engine: model/yolov4.engine
image: dog.jpg
video: nuscenes_mini.mp4
cost: 382 ms
cost: 206.5 ms
cost: 148 ms
cost: 118.75 ms
cost: 101.4 ms
cost: 89.3333 ms
cost: 81 ms
cost: 74.5 ms
cost: 69.6667 ms
cost: 67.2 ms
cost: 63.7273 ms
cost: 60.8333 ms
cost: 58.3846 ms
cost: 56.2857 ms
cost: 54.4667 ms
cost: 52.875 ms
cost: 51.4706 ms
cost: 50.2222 ms
cost: 49.1053 ms
cost: 48.1 ms
cost: 47.1905 ms
cost: 46.3636 ms
cost: 45.6087 ms
cost: 44.9167 ms
cost: 44.28 ms
cost: 43.6923 ms
cost: 43.1481 ms
cost: 42.6429 ms
cost: 42.1724 ms
cost: 41.7333 ms
cost: 41.3226 ms
cost: 41.1875 ms
cost: 40.8182 ms
cost: 40.5588 ms
cost: 40.2857 ms
cost: 40 ms
cost: 39.7297 ms
cost: 39.5 ms
cost: 39.2308 ms
cost: 39 ms
cost: 38.7561 ms

With CUDA 9.0, my test results were all around 29 ms. What could be causing the difference? Is the source code somehow tied to CUDA 9.0? Thanks.
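The printed `cost` values look like a cumulative mean over all frames processed so far rather than the time of each individual frame: for example, (382 + 31) / 2 = 206.5 and (382 + 31 + 31) / 3 = 148. The timing code itself is not shown in this thread, so this is an assumption, but if it holds, the k-th frame time can be recovered as k·a_k − (k−1)·a_(k−1). A minimal sketch (the helper name `per_frame_latencies` is hypothetical):

```python
# Hypothetical helper: recover per-frame latencies from a log of running means.
# Assumes the k-th printed "cost" is the mean of the first k frame times.

def per_frame_latencies(running_means):
    """Return the individual frame times implied by a sequence of cumulative means."""
    latencies = []
    previous_total = 0.0
    for k, mean in enumerate(running_means, start=1):
        total = mean * k                  # sum of the first k frame times
        latencies.append(total - previous_total)
        previous_total = total
    return latencies


# First twelve "cost" values from the log above (ms).
log_costs = [382, 206.5, 148, 118.75, 101.4, 89.3333, 81, 74.5,
             69.6667, 67.2, 63.7273, 60.8333]

frames = per_frame_latencies(log_costs)
steady = frames[1:]                       # drop the first (warm-up) frame
print([round(t) for t in frames])
print(round(sum(steady) / len(steady), 1))
```

Under that assumption, the first frame carries most of the cost (382 ms, likely warm-up), and later frames come out at roughly 29 to 32 ms with one 45 ms outlier. Applying the same formula to the last two log lines (41 × 38.7561 − 40 × 39) gives about 29 ms, so the running mean would keep drifting down as more frames are processed.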

@CaoWGG
Owner

CaoWGG commented Jun 18, 2020

@DaChaoXc
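The gap is probably in the mishkernel kernel function.
You can use nvprof to compare how much of the total runtime each kernel takes under CUDA 9 versus CUDA 10.
mishkernel can be optimized.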
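Mish is defined as mish(x) = x · tanh(softplus(x)) = x · tanh(ln(1 + eˣ)). One commonly used way to make a Mish kernel cheaper (not necessarily what this repository does) relies on the identity tanh(ln u) = (u² − 1) / (u² + 1) with u = 1 + eˣ, which replaces the log and tanh with a single exp: mish(x) = x · n / (n + 2), where n = eˣ(eˣ + 2). The Python sketch below checks that identity against the direct formula; the function names and the overflow cutoff of 20 are assumptions for illustration.

```python
import math


def mish_reference(x: float) -> float:
    """Direct definition x * tanh(softplus(x)), using a numerically stable softplus."""
    softplus = math.log1p(math.exp(-abs(x))) + max(x, 0.0)
    return x * math.tanh(softplus)


def mish_fast(x: float, threshold: float = 20.0) -> float:
    """Single-exp form: x * n / (n + 2) with n = e^x * (e^x + 2).

    Above `threshold` (an assumed cutoff), n / (n + 2) rounds to 1 in double
    precision, so x is returned directly and exp() cannot overflow.
    """
    if x > threshold:
        return x
    e = math.exp(x)
    n = e * (e + 2.0)
    return x * n / (n + 2.0)


if __name__ == "__main__":
    for x in (-10.0, -1.0, 0.0, 0.5, 3.0, 25.0):
        print(x, mish_reference(x), mish_fast(x))
```

In a CUDA kernel the same arithmetic would need one exponential and one division per element instead of separate exp, log and tanh calls; whether this accounts for the CUDA 9 / CUDA 10 difference would still need to be confirmed with nvprof.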

@zhangzhixiangah

@DaChaoXc Are these results with quantization or without?

@DaChaoXc
Author

DaChaoXc commented Jan 4, 2021

@DaChaoXc Are these results with quantization or without?

No quantization; it's float32.

@zhangzhixiangah

Thanks. Why do I only get results like yours when using FP16? What could be the reason?

@zhangzhixiangah

@DaChaoXc

@DaChaoXc
Author

DaChaoXc commented Jan 6, 2021

@DaChaoXc

I don't quite understand what you mean by "results".

@zhangzhixiangah

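@DaChaoXc Hi, I'm running the same program as you on an NVIDIA NX, but it runs much slower than yours. Another problem: after INT8 quantization, the program is actually twice as slow as with FP16. Do you know why? Thanks.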

@DaChaoXc
Author

DaChaoXc commented Jan 7, 2021

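@DaChaoXc Hi, I'm running the same program as you on an NVIDIA NX, but it runs much slower than yours. Another problem: after INT8 quantization, the program is actually twice as slow as with FP16. Do you know why? Thanks.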

What is your environment configuration?

@zhangzhixiangah

Hi @DaChaoXc, mine is JetPack 4.4, Ubuntu 18.04, CUDA 10.2, cuDNN 8.0, and the board is an NVIDIA Jetson Xavier NX.


3 participants