File "/workspace/vllm/entrypoints/openai/api_server.py", line 172, in <module>
engine = AsyncLLMEngine.from_engine_args(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/vllm/engine/async_llm_engine.py", line 332, in from_engine_args
engine_config = engine_args.create_engine_config()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/vllm/engine/arg_utils.py", line 495, in create_engine_config
return EngineConfig(model_config=model_config,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<string>", line 11, in __init__
File "/workspace/vllm/config.py", line 1054, in __post_init__
self.model_config.verify_with_parallel_config(self.parallel_config)
File "/workspace/vllm/config.py", line 260, in verify_with_parallel_config
raise ValueError(
ValueError: Total number of attention heads (32) must be divisible by tensor parallel size (3).
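For context on the error: tensor parallelism shards each attention layer across GPUs, so the model's head count has to divide evenly. With 32 heads, the valid `--tensor-parallel-size` values are the divisors of 32 (1, 2, 4, 8, 16, 32), which is why 3 is rejected. A minimal sketch of launching the server directly with a valid size (the model name is just an example of a 32-head model):

```sh
# Start the OpenAI-compatible server with a tensor parallel size that
# divides the model's 32 attention heads evenly (2 in this sketch).
python -m vllm.entrypoints.openai.api_server \
    --model facebook/opt-6.7b \
    --tensor-parallel-size 2
```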
Seems like we need some way to make `--tensor-parallel-size` configurable in the vllm image, either set at build time or overridable at run time. As long as it's configurable, users should be able to pick a value that works for their hardware.
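One possible approach, sketched below as a custom Docker entrypoint (the script name and the `TENSOR_PARALLEL_SIZE` variable are assumptions for illustration, not anything vllm ships): read the value from an environment variable at container start so users can override it with `docker run -e`.

```sh
#!/bin/sh
# entrypoint.sh -- hypothetical sketch, not vllm's actual entrypoint.
# TENSOR_PARALLEL_SIZE is an assumed env var name; defaults to 1 if unset.
exec python -m vllm.entrypoints.openai.api_server \
    --tensor-parallel-size "${TENSOR_PARALLEL_SIZE:-1}" \
    "$@"
```

With an entrypoint like that, something like `docker run --gpus all -e TENSOR_PARALLEL_SIZE=2 <image> --model <model>` would let each user match the flag to their GPU count.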
This machine has:
and I got:
xref vllm-project/vllm#596