This repository has been archived by the owner on Oct 9, 2024. It is now read-only.
Traceback (most recent call last):
  File "bloom-inference-scripts/bloom-ds-inference.py", line 182, in <module>
    model = deepspeed.init_inference(
  File "/opt/conda/lib/python3.8/site-packages/deepspeed/__init__.py", line 324, in init_inference
    engine = InferenceEngine(model, config=ds_inference_config)
  File "/opt/conda/lib/python3.8/site-packages/deepspeed/inference/engine.py", line 194, in __init__
    self._apply_injection_policy(config)
  File "/opt/conda/lib/python3.8/site-packages/deepspeed/inference/engine.py", line 396, in _apply_injection_policy
    replace_transformer_layer(client_module, self.module, checkpoint, config, self.config)
  File "/opt/conda/lib/python3.8/site-packages/deepspeed/module_inject/replace_module.py", line 519, in replace_transformer_layer
    load_model_with_checkpoint(replaced_module,
  File "/opt/conda/lib/python3.8/site-packages/deepspeed/module_inject/load_checkpoint.py", line 243, in load_model_with_checkpoint
    load_module_recursive(r_module)
  File "/opt/conda/lib/python3.8/site-packages/deepspeed/module_inject/load_checkpoint.py", line 237, in load_module_recursive
    load_module_recursive(
  File "/opt/conda/lib/python3.8/site-packages/deepspeed/module_inject/load_checkpoint.py", line 237, in load_module_recursive
    load_module_recursive(
  File "/opt/conda/lib/python3.8/site-packages/deepspeed/module_inject/load_checkpoint.py", line 235, in load_module_recursive
    layer_policies[child.__class__](child, prefix + name + '.')
  File "/opt/conda/lib/python3.8/site-packages/deepspeed/module_inject/load_checkpoint.py", line 173, in load_transformer_layer
    container.load_params(module, sd[0], weight_quantizer, mp_replace, prefix)
  File "/opt/conda/lib/python3.8/site-packages/deepspeed/module_inject/containers/bloom.py", line 51, in load_params
    maybe_copy(module.attention,
  File "/opt/conda/lib/python3.8/site-packages/deepspeed/module_inject/policy.py", line 181, in maybe_copy
    dst = mp_replace.copy(dst, weight_quantizer.quantize(tmp if weight_quantizer.q_int8 else \
  File "/opt/conda/lib/python3.8/site-packages/deepspeed/module_inject/replace_module.py", line 111, in copy
    dst.data.copy_(src[:, self.gpu_index * dst_shape[self.out_dim]: (self.gpu_index + 1) * dst_shape[self.out_dim]] if outer_dim == 1 else \
RuntimeError: The size of tensor a (6144) must match the size of tensor b (4096) at non-singleton dimension 1
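The failing line in replace_module.py slices the checkpoint tensor into per-GPU shards along the output dimension and copies each slice into a preallocated destination, and torch's `copy_` rejects the shard because its width does not match. A minimal pure-Python sketch of that arithmetic (sizes are taken from the error message; the function names are illustrative, not DeepSpeed's actual API):

```python
# Sketch of the sharded weight copy that fails (illustrative, not the
# actual DeepSpeed code): each rank copies the slice
#   src[:, gpu_index * dst_cols : (gpu_index + 1) * dst_cols]
# into a dst that is dst_cols wide, so the copy only succeeds when the
# slice really yields dst_cols columns.

def shard_width(src_cols, dst_cols, gpu_index):
    """Number of columns the slice actually yields for this rank."""
    start = gpu_index * dst_cols
    stop = min((gpu_index + 1) * dst_cols, src_cols)
    return max(0, stop - start)

def copy_shard(src_cols, dst_cols, gpu_index):
    got = shard_width(src_cols, dst_cols, gpu_index)
    if got != dst_cols:  # mirrors the shape check torch's copy_ enforces
        raise RuntimeError(
            f"The size of tensor a ({dst_cols}) must match the size of "
            f"tensor b ({got}) at non-singleton dimension 1")

# With the sizes from the traceback: the int8 destination expects a
# 6144-wide shard, but the quantized checkpoint tensor only provides
# 4096 columns in that dimension, reproducing the reported error.
try:
    copy_shard(src_cols=4096, dst_cols=6144, gpu_index=0)
except RuntimeError as e:
    print(e)
# -> The size of tensor a (6144) must match the size of tensor b (4096)
#    at non-singleton dimension 1
```

One way to read the mismatch, under these assumptions, is that the int8 injection path builds destination shards with a different per-shard width than the quantized checkpoint tensors actually have, a check the FP16 path evidently does not trip.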
No code was changed, so I wonder why multi-GPU int8 fails while the multi-GPU FP16 setting works fine.
I am using multi-GPU to quantize the model and run inference with deepspeed==0.9.0, but it fails.
Device: RTX-3090 x 8 server
Docker: nvidia-pytorch-container, tag 22.07-py3. Then git clone this codebase inside the container.
Command:
ErrorLog: see the traceback above.