File "/workspace/vllm/entrypoints/openai/api_server.py", line 172, in <module>
engine = AsyncLLMEngine.from_engine_args(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/vllm/engine/async_llm_engine.py", line 332, in from_engine_args
engine_config = engine_args.create_engine_config()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/vllm/engine/arg_utils.py", line 495, in create_engine_config
return EngineConfig(model_config=model_config,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<string>", line 11, in __init__
File "/workspace/vllm/config.py", line 1054, in __post_init__
self.model_config.verify_with_parallel_config(self.parallel_config)
File "/workspace/vllm/config.py", line 260, in verify_with_parallel_config
raise ValueError(
ValueError: Total number of attention heads (32) must be divisible by tensor parallel size (3).
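For context on the error: tensor parallelism shards each attention layer across GPUs, so the model's head count has to divide evenly. With 32 heads, the valid `--tensor-parallel-size` values are the divisors of 32 (1, 2, 4, 8, 16, 32), which is why 3 is rejected. A minimal sketch of launching the server directly with a valid size (the model name is just an example of a 32-head model):

```sh
# Start the OpenAI-compatible server with a tensor parallel size that
# divides the model's 32 attention heads evenly (2 in this sketch).
python -m vllm.entrypoints.openai.api_server \
    --model facebook/opt-6.7b \
    --tensor-parallel-size 2
```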
Seems like we need some way to make `--tensor-parallel-size` configurable in the vllm image, either set at build time or overridable at run time. As long as it's configurable, users should be able to pick a value that works for their hardware.
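One possible approach, sketched below as a custom Docker entrypoint (the script name and the `TENSOR_PARALLEL_SIZE` variable are assumptions for illustration, not anything vllm ships): read the value from an environment variable at container start so users can override it with `docker run -e`.

```sh
#!/bin/sh
# entrypoint.sh -- hypothetical sketch, not vllm's actual entrypoint.
# TENSOR_PARALLEL_SIZE is an assumed env var name; defaults to 1 if unset.
exec python -m vllm.entrypoints.openai.api_server \
    --tensor-parallel-size "${TENSOR_PARALLEL_SIZE:-1}" \
    "$@"
```

With an entrypoint like that, something like `docker run --gpus all -e TENSOR_PARALLEL_SIZE=2 <image> --model <model>` would let each user match the flag to their GPU count.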
This machine has:
and I got:
xref vllm-project/vllm#596