
Startup fails when using engine optimum with device tensorrt #372

Open
2 of 4 tasks
weibingo opened this issue Sep 23, 2024 · 5 comments

Comments


weibingo commented Sep 23, 2024

System Info

infinity_emb v2 --model_id /home/xxxx/peg_onnx --served-model-name embedding --engine optimum --device tensorrt --batch-size 32
OS: Linux
Base model: PEG
nvidia-smi: CUDA 11.8, TensorRT 8.6.1

Information

  • Docker
  • The CLI directly via pip

Tasks

  • An officially supported command
  • My own modifications

Reproduction

1. Just start the server with the command above.

Expected behavior

File "python3.10/dist-packages/optimum/onnxruntime/modeling_ort.py", line 1444, in forward
    model_outputs = self.__prepare_onnx_outputs(use_torch, **onnx_outputs)
File "python3.10/dist-packages/optimum/onnxruntime/modeling_ort.py", line 939, in __prepare_onnx_outputs
    model_outputs[output_name] = onnx_outputs[idx]
IndexError: tuple index out of range
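For context, this kind of IndexError arises when the session returns fewer output tensors than the model has declared output names. The following is a generic, self-contained sketch of that failure shape (not optimum's actual code; the output names and placeholder values are illustrative):

```python
def prepare_outputs(output_names, onnx_outputs):
    """Map declared output names onto the tuple of returned tensors.

    Mirrors the failing loop shape: if the runtime returns fewer
    outputs than there are declared names, indexing past the end of
    the tuple raises IndexError.
    """
    model_outputs = {}
    for idx, output_name in enumerate(output_names):
        model_outputs[output_name] = onnx_outputs[idx]
    return model_outputs


# Two declared names, but only one tensor came back from the runtime:
try:
    prepare_outputs(["last_hidden_state", "pooler_output"], ("tensor0",))
except IndexError as exc:
    print(exc)  # tuple index out of range
```

This matches the reported behavior where the second inference fails: if the execution provider's CUDA graph replay returns a different number of outputs than the first run, the name-to-output mapping above breaks.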

Then I logged the model's inputs and outputs and found that during model warmup the first inference succeeds but the second one raises the error.
If I start with --no-model-warmup, the server starts, but the second inference still fails.

@michaelfeil
Owner

@weibingo Any chance you have a similar model on Hugging Face? Are you using the optimum-gpu or tensorrt backend? Are you sure TensorRT is correctly installed?

@weibingo
Author

weibingo commented Sep 24, 2024

> @weibingo Any chance you have a similar model on Hugging Face? Are you using the optimum-gpu or tensorrt backend? Are you sure TensorRT is correctly installed?

Yes. The model works fine when I use optimum with CUDA. The TensorRT environment also had errors, but I resolved them.
I tested embedder/optimum.py by directly initializing OptimumEmbedder, and the error still occurs. Then I looked at the source code in utils_optimum.py and found that the TensorrtExecutionProvider works when its options omit trt_cuda_graph_enable.
But I don't understand why it works without trt_cuda_graph_enable and fails with it.
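The workaround described above can be sketched as follows. This is a minimal, hedged example, not infinity's actual utils_optimum.py code: the option names are real ONNX Runtime TensorRT execution-provider options, but the cache path and model file are placeholders, and it assumes onnxruntime-gpu built with TensorRT support.

```python
def trt_provider_options(cache_path="/tmp/trt_cache", enable_cuda_graph=False):
    """Build TensorrtExecutionProvider options.

    CUDA graph capture is off by default, which is the configuration
    that avoided the IndexError reported in this issue.
    """
    opts = {
        "trt_engine_cache_enable": True,
        "trt_engine_cache_path": cache_path,
    }
    if enable_cuda_graph:
        # Enabling this option is what triggered the failure here.
        opts["trt_cuda_graph_enable"] = True
    return opts


# Usage sketch (requires onnxruntime-gpu with TensorRT; path is a placeholder):
# import onnxruntime as ort
# session = ort.InferenceSession(
#     "model.onnx",
#     providers=[("TensorrtExecutionProvider", trt_provider_options())],
# )
```

One plausible explanation, stated as speculation only: CUDA graph capture replays a recorded graph on the second call, and if the captured graph's output bindings do not line up with the output names optimum expects, the name-to-tensor mapping fails exactly on the second inference, as observed.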

@michaelfeil
Owner

@weibingo No idea why CUDA graph capture does not work. I have not used TRT much; it only had marginal performance gains over onnx-gpu.

@weibingo
Author

@michaelfeil So you don't test with engine optimum and device tensorrt?

@michaelfeil
Owner

@weibingo It's not possible to test in CI (which is CPU-only), and I have not used it locally in the last 3 months. Before that, it was extensively tested with 8.6.1.
