Startup fails when using engine optimum with device tensorrt #372
Comments
@weibingo Any chance you have a similar model on huggingface? Are you using optimum-gpu or the tensorrt backend? Are you sure tensorrt is correctly installed?

Yes. The same model works fine with engine optimum on cuda. The tensorrt environment also had an error, but I resolved it.

@weibingo No idea why CUDA graph capture does not work. I have not used trt much; it only had marginal performance gains over onnx-gpu.

@michaelfeil So you don't test with engine optimum and device tensorrt?

@weibingo It's not possible to test in CI (which is CPU-only), and I have not used it locally in the last 3 months. Before that, it was extensively tested with TensorRT 8.6.1.
System Info

```shell
infinity_emb v2 --model_id /home/xxxx/peg_onnx --served-model-name embedding --engine optimum --device tensorrt --batch-size 32
```

OS: Linux
Model base: PEG
nvidia-smi: CUDA version 11.8, TensorRT: 8.6.1
Reproduction
1. Just start the server with the command above.
Expected behavior

```
File "python3.10/dist-packages/optimum/onnxruntime/modeling_ort.py", line 1444, in forward
    model_outputs = self.__prepare_onnx_outputs(use_torch, **onnx_outputs)
File "python3.10/dist-packages/optimum/onnxruntime/modeling_ort.py", line 939, in __prepare_onnx_outputs
    model_outputs[output_name] = onnx_outputs[idx]
IndexError: tuple index out of range
```
I then logged the model's run inputs and outputs and found that during model warmup the first inference succeeds, but the second one raises this error.

If I start up with --no-model-warmup, the server can start, but the second inference still fails in the same way.
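To make the failure mode concrete, here is a minimal sketch (with hypothetical names, not optimum's actual implementation) of how the flat tuple of ONNX Runtime outputs gets mapped back to named model outputs. If the second inference returns fewer tensors than the model declares output names, the positional lookup raises exactly the `IndexError: tuple index out of range` shown in the traceback:

```python
def prepare_onnx_outputs(output_names, onnx_outputs):
    """Map a flat tuple of session outputs to a dict keyed by output name.

    Assumes one tuple slot per declared output name; a short tuple
    (e.g. after a failed CUDA graph capture on the second run) makes
    the index lookup fail.
    """
    model_outputs = {}
    for idx, output_name in enumerate(output_names):
        model_outputs[output_name] = onnx_outputs[idx]  # fails if tuple is short
    return model_outputs

# Normal case: two declared outputs, two returned tensors.
ok = prepare_onnx_outputs(["last_hidden_state", "pooler_output"], ("t0", "t1"))
print(ok)

# Failure case matching the traceback: second run returns a short tuple.
try:
    prepare_onnx_outputs(["last_hidden_state", "pooler_output"], ("t0",))
except IndexError as e:
    print(e)  # tuple index out of range
```

This suggests the bug is not in the mapping itself but upstream: the TensorRT execution provider is returning an incomplete output tuple on the second call.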