
Startup fails when using engine optimum with device tensorrt #372

Open
2 of 4 tasks
weibingo opened this issue Sep 23, 2024 · 5 comments

Comments


weibingo commented Sep 23, 2024

System Info

infinity_emb v2 --model_id /home/xxxx/peg_onnx --served-model-name embedding --engine optimum --device tensorrt --batch-size 32
OS: Linux
Base model: PEG
nvidia-smi: CUDA 11.8, TensorRT 8.6.1

Information

  • Docker
  • The CLI directly via pip

Tasks

  • An officially supported command
  • My own modifications

Reproduction

1. Just start the server with the command above.

Expected behavior

File "python3.10/dist-packages/optimum/onnxruntime/modeling_ort.py", line 1444, in forward
    model_outputs = self.__prepare_onnx_outputs(use_torch, **onnx_outputs)
File "python3.10/dist-packages/optimum/onnxruntime/modeling_ort.py", line 939, in __prepare_onnx_outputs
    model_outputs[output_name] = onnx_outputs[idx]
IndexError: tuple index out of range
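For context, this kind of IndexError arises when the session returns fewer output tensors than the model has declared output names. The following is a generic, self-contained sketch of that failure shape (not optimum's actual code; the output names and placeholder values are illustrative):

```python
def prepare_outputs(output_names, onnx_outputs):
    """Map declared output names onto the tuple of returned tensors.

    Mirrors the failing loop shape: if the runtime returns fewer
    outputs than there are declared names, indexing past the end of
    the tuple raises IndexError.
    """
    model_outputs = {}
    for idx, output_name in enumerate(output_names):
        model_outputs[output_name] = onnx_outputs[idx]
    return model_outputs


# Two declared names, but only one tensor came back from the runtime:
try:
    prepare_outputs(["last_hidden_state", "pooler_output"], ("tensor0",))
except IndexError as exc:
    print(exc)  # tuple index out of range
```

This matches the reported behavior where the second inference fails: if the execution provider's CUDA graph replay returns a different number of outputs than the first run, the name-to-output mapping above breaks.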

Then I logged the model's inputs and outputs and found that during model warmup the first inference succeeds but the second one raises the error.
If I start with --no-model-warmup, the server starts, but the second inference still fails.

@michaelfeil
Owner

@weibingo Any chance you have a similar model on Hugging Face? Are you using the optimum-gpu or tensorrt backend? Are you sure TensorRT is correctly installed?

@weibingo
Author

weibingo commented Sep 24, 2024

> @weibingo Any chance you have a similar model on Hugging Face? Are you using the optimum-gpu or tensorrt backend? Are you sure TensorRT is correctly installed?

Yes. The model works fine when I use optimum with CUDA. The TensorRT environment also had errors, but I resolved them.
I tested embedder/optimum.py by directly initializing OptimumEmbedder, and the error still occurs. Then I looked at the source code in utils_optimum.py and found that the TensorrtExecutionProvider works when its options omit trt_cuda_graph_enable.
But I don't understand why it works without trt_cuda_graph_enable and fails with it.
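The workaround described above can be sketched as follows. This is a minimal, hedged example, not infinity's actual utils_optimum.py code: the option names are real ONNX Runtime TensorRT execution-provider options, but the cache path and model file are placeholders, and it assumes onnxruntime-gpu built with TensorRT support.

```python
def trt_provider_options(cache_path="/tmp/trt_cache", enable_cuda_graph=False):
    """Build TensorrtExecutionProvider options.

    CUDA graph capture is off by default, which is the configuration
    that avoided the IndexError reported in this issue.
    """
    opts = {
        "trt_engine_cache_enable": True,
        "trt_engine_cache_path": cache_path,
    }
    if enable_cuda_graph:
        # Enabling this option is what triggered the failure here.
        opts["trt_cuda_graph_enable"] = True
    return opts


# Usage sketch (requires onnxruntime-gpu with TensorRT; path is a placeholder):
# import onnxruntime as ort
# session = ort.InferenceSession(
#     "model.onnx",
#     providers=[("TensorrtExecutionProvider", trt_provider_options())],
# )
```

One plausible explanation, stated as speculation only: CUDA graph capture replays a recorded graph on the second call, and if the captured graph's output bindings do not line up with the output names optimum expects, the name-to-tensor mapping fails exactly on the second inference, as observed.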

@michaelfeil
Owner

@weibingo No idea why CUDA graph capture does not work. I have not used TRT much; it only had marginal performance gains over onnx-gpu.

@weibingo
Author

@michaelfeil So you don't test with engine optimum and device tensorrt?

@michaelfeil
Owner

@weibingo It's not possible to test in CI (which is CPU-only), and I have not used it locally in the last 3 months. Before that, it was extensively tested with 8.6.1.
