When using the CPU provider, it is possible to run multiple inferences in parallel within the same session by calling the `Run` method concurrently from different threads. However, when attempting the same with the CUDA provider, performance degrades severely, and it appears that ONNX Runtime serializes the simultaneous calls to `Run`. Is this the case? If so, what can I do to run multiple concurrent queries efficiently in a single session with the CUDA backend? My workload involves running many inferences in parallel with tiny batch sizes and small models.
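
For reference, here is a minimal sketch of the pattern I am describing: one shared `Ort::Session` with several threads calling `Run` concurrently. The model path `model.onnx` and the tensor names `input`/`output` are placeholders for my actual model; the shapes are illustrative only.

```cpp
// Sketch: concurrent Run() calls on a single shared session.
// With the CPU provider this scales across threads; with the CUDA
// provider (uncomment below) the same code appears to serialize.
#include <onnxruntime_cxx_api.h>
#include <array>
#include <thread>
#include <vector>

int main() {
    Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "concurrent-run");
    Ort::SessionOptions opts;
    // To reproduce the CUDA case:
    // OrtCUDAProviderOptions cuda_opts{};
    // opts.AppendExecutionProvider_CUDA(cuda_opts);
    Ort::Session session(env, "model.onnx", opts);  // placeholder model path

    auto worker = [&session]() {
        Ort::MemoryInfo mem =
            Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
        std::array<float, 4> input_data{};            // tiny batch, small model
        std::array<int64_t, 2> shape{1, 4};           // illustrative shape
        Ort::Value input = Ort::Value::CreateTensor<float>(
            mem, input_data.data(), input_data.size(),
            shape.data(), shape.size());
        const char* input_names[]  = {"input"};       // placeholder names
        const char* output_names[] = {"output"};
        for (int i = 0; i < 100; ++i) {
            // Run() is documented as thread-safe on a shared session.
            auto outputs = session.Run(Ort::RunOptions{nullptr},
                                       input_names, &input, 1,
                                       output_names, 1);
        }
    };

    std::vector<std::thread> threads;
    for (int t = 0; t < 4; ++t) threads.emplace_back(worker);
    for (auto& t : threads) t.join();
}
```

With the CPU provider, the four workers above run their inferences in parallel; with the CUDA provider enabled, throughput is roughly what a single thread achieves, which is what suggests serialization to me.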