
ilab serve with vllm - error on 3x GPU system #514

Open
markmc opened this issue May 23, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@markmc
Contributor

markmc commented May 23, 2024

This machine has:

$ nvidia-ctk --quiet cdi list | grep -P nvidia.com/gpu='\d+'
nvidia.com/gpu=0
nvidia.com/gpu=1
nvidia.com/gpu=2

and I got:

  File "/workspace/vllm/entrypoints/openai/api_server.py", line 172, in <module>
    engine = AsyncLLMEngine.from_engine_args(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/vllm/engine/async_llm_engine.py", line 332, in from_engine_args
    engine_config = engine_args.create_engine_config()
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/vllm/engine/arg_utils.py", line 495, in create_engine_config
    return EngineConfig(model_config=model_config,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<string>", line 11, in __init__
  File "/workspace/vllm/config.py", line 1054, in __post_init__
    self.model_config.verify_with_parallel_config(self.parallel_config)
  File "/workspace/vllm/config.py", line 260, in verify_with_parallel_config
    raise ValueError(
ValueError: Total number of attention heads (32) must be divisible by tensor parallel size (3).

xref vllm-project/vllm#596
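
For context: vLLM requires the model's number of attention heads to be evenly divisible by the tensor parallel size, so a 32-head model can only be sharded across 1, 2, 4, 8, 16, or 32 GPUs, and defaulting the tensor parallel size to the GPU count (3 here) trips the ValueError above. As a rough sketch (not the ilab code path; the model path is a placeholder), launching vLLM's OpenAI-compatible server directly with an explicit, valid tensor parallel size avoids the check:

$ python -m vllm.entrypoints.openai.api_server \
    --model /path/to/model \
    --tensor-parallel-size 2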

@Gregory-Pereira
Collaborator

It seems we need to find some way to set --tensor-parallel-size within the vllm image at build time. As long as it's configurable, users should be able to set something that works for them.
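
One way that could look (a sketch only; TENSOR_PARALLEL_SIZE and MODEL_PATH are hypothetical names, not existing settings in the image) is an entrypoint that forwards an environment variable to vLLM, with a safe default of 1:

#!/bin/sh
# Hypothetical entrypoint sketch: TENSOR_PARALLEL_SIZE and MODEL_PATH are
# assumed environment variables, not existing image configuration.
exec python -m vllm.entrypoints.openai.api_server \
    --model "${MODEL_PATH}" \
    --tensor-parallel-size "${TENSOR_PARALLEL_SIZE:-1}"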

@cooktheryan cooktheryan added the bug Something isn't working label Jun 13, 2024
@DotNetDevlll

I have the same problem with a 3 GPU setup.

@hemajv
Collaborator

hemajv commented Sep 10, 2024

I get this error as well when I try to serve the Mixtral model across 3 GPUs.
Have there been any workarounds for this?
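
One possible workaround (untested on this setup; the model path is a placeholder) is to expose only two of the three GPUs and use a tensor parallel size of 2, which divides the 32 attention heads reported in the error above:

$ CUDA_VISIBLE_DEVICES=0,1 python -m vllm.entrypoints.openai.api_server \
    --model /path/to/mixtral \
    --tensor-parallel-size 2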
