How to deploy multiple instances of the same T5-3b model on a single node with 4 GPUs #133

James-Bao · 2023-05-24T20:02:11Z


After checking the examples at https://github.com/triton-inference-server/fastertransformer_backend/tree/main/all_models, I'm a little lost on how to enable data parallelism using FasterTransformer for T5. For example, I have a T5-3b model that fits on a single A10 GPU. If my node/host has 4 GPUs, how can I load 4 model instances, one per GPU? Thanks.

Replies: 0 comments
