How to remove signal and wait layers in the engine? #4232
Comments
It seems that the signal and wait layers come from Myelin. Can you upload your two scripts?
@lix19937 The second script is the official example script (https://github.com/NVIDIA/TensorRT-LLM/blob/v0.9.0/examples/llama/convert_checkpoint.py). The outputs differ somewhat because the model is a LLaMA SequenceClassification model; the following is the diff against the standard LLaMA model.
The following are the build commands:

python3 convert_checkpoint.py \
    --model_dir pytorch_model/ \
    --output_dir checkpoints/ \
    --dtype float16

trtllm-build \
    --checkpoint_dir checkpoints/ \
    --output_dir engines/ \
    --gpt_attention_plugin disable \
    --gemm_plugin disable \
    --remove_input_padding disable \
    --paged_kv_cache disable \
    --max_batch_size 2 \
    --max_input_len 1300 \
    --max_output_len 1 \
    --gpus_per_node 1 \
    --profiling_verbosity
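For reference, a minimal sketch of how the engine layers can be dumped with TensorRT's Python EngineInspector API; the engine path is a hypothetical placeholder, and detailed per-layer output assumes the engine was built with detailed profiling verbosity:

```python
# Minimal sketch: list the layers of a built engine, where the
# signal/wait layers should appear by name. Assumes TensorRT >= 8.4
# Python bindings; "engines/rank0.engine" is a hypothetical path.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(logger)

with open("engines/rank0.engine", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())

inspector = engine.create_engine_inspector()
# JSON format gives one entry per layer; full layer details require the
# engine to have been built with detailed profiling verbosity.
print(inspector.get_engine_information(trt.LayerInformationFormat.JSON))
```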
@zerollzeng Can you take a look?
@lijinghaooo A SequenceClassification model needs the last results as its current input. Can you modify the second script to use the same model and check?
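For context, a minimal sketch of how a SequenceClassification head typically differs from the standard LLaMA LM head, following the Hugging Face LlamaForSequenceClassification convention; all names here are illustrative, not the reporter's actual diff:

```python
# Minimal sketch of a sequence-classification head on a decoder model.
# Illustrative only; follows the Hugging Face convention of replacing
# the lm_head projection with a small "score" classifier.
import torch
import torch.nn as nn

class ClassificationHead(nn.Module):
    def __init__(self, hidden_size: int, num_labels: int):
        super().__init__()
        # Replaces lm_head (hidden_size -> vocab_size) with
        # hidden_size -> num_labels.
        self.score = nn.Linear(hidden_size, num_labels, bias=False)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: [batch, seq_len, hidden_size].
        # Classify from the last token's hidden state instead of
        # predicting a token at every position.
        last_token = hidden_states[:, -1, :]
        return self.score(last_token)  # [batch, num_labels]
```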
@lix19937 Thank you for your reply! Are there any other insights on how to work around this layer?
Description
I am using TensorRT-LLM to build an engine for a LLaMA classification model. I have two similar scripts to generate the engine: the first is a raw script, and the second is based on the examples/llama/build.sh script.
However, the second engine is slower than the first, so I dumped the engine layers; there are many signal and wait layers in the second engine (as the images below show). They seem to occur at type casts.
Any idea why the signal and wait layers are generated and how to work around them?
Environment
TensorRT Version: 9.3.0
NVIDIA GPU: L20
NVIDIA Driver Version: 535.161.08
CUDA Version: 12.2
CUDNN Version: 8.9.6
Operating System: Ubuntu 22.04.3 LTS
Python Version (if applicable): 3.10.12
Tensorflow Version (if applicable): N/A
PyTorch Version (if applicable): 2.2.2
Baremetal or Container (if so, version): container nvidia/cuda:12.1.0-devel-ubuntu22.04
Relevant Files
Model link:
Steps To Reproduce
Commands or scripts:
Have you tried the latest release?: No; upgrading the TensorRT version would be costly.
Can this model run on other frameworks? For example, run the ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt):
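If an ONNX export of the model exists, a minimal sketch of the ONNX Runtime check suggested above, equivalent to the polygraphy command; the model path and input shape are hypothetical placeholders:

```python
# Minimal sketch: run the ONNX model with ONNX Runtime, mirroring
# `polygraphy run <model.onnx> --onnxrt`. "model.onnx" and the dummy
# input shape (max_batch_size=2, max_input_len=1300) are assumptions.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name
dummy_ids = np.zeros((2, 1300), dtype=np.int64)  # token IDs placeholder
outputs = session.run(None, {input_name: dummy_ids})
print([o.shape for o in outputs])
```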