This repository has been archived by the owner on Oct 9, 2024. It is now read-only.

[Bug] Int8 quantize inference failed using bloom-inference-scripts/bloom-ds-inference.py with deepspeed==0.9.0 on multi-gpus #77

Open
hanrui1sensetime opened this issue Apr 17, 2023 · 1 comment

Comments

@hanrui1sensetime

I am trying to quantize the model to int8 and run inference on multiple GPUs with deepspeed==0.9.0, but it fails.

Device: RTX-3090 x 8 Server
Docker: NVIDIA PyTorch container, tag 22.07-py3. I ran git clone on this codebase inside the container.
Command:

deepspeed --include localhost:1,6 bloom-inference-scripts/bloom-ds-inference.py --local_rank=0 --name bigscience/bloomz-7b1-mt --dtype int8

ErrorLog:

Traceback (most recent call last):
  File "bloom-inference-scripts/bloom-ds-inference.py", line 182, in <module>
    model = deepspeed.init_inference(
  File "/opt/conda/lib/python3.8/site-packages/deepspeed/__init__.py", line 324, in init_inference
    engine = InferenceEngine(model, config=ds_inference_config)
  File "/opt/conda/lib/python3.8/site-packages/deepspeed/inference/engine.py", line 194, in __init__
    self._apply_injection_policy(config)
  File "/opt/conda/lib/python3.8/site-packages/deepspeed/inference/engine.py", line 396, in _apply_injection_policy
    replace_transformer_layer(client_module, self.module, checkpoint, config, self.config)
  File "/opt/conda/lib/python3.8/site-packages/deepspeed/module_inject/replace_module.py", line 519, in replace_transformer_layer
    load_model_with_checkpoint(replaced_module,
  File "/opt/conda/lib/python3.8/site-packages/deepspeed/module_inject/load_checkpoint.py", line 243, in load_model_with_checkpoint
    load_module_recursive(r_module)
  File "/opt/conda/lib/python3.8/site-packages/deepspeed/module_inject/load_checkpoint.py", line 237, in load_module_recursive
    load_module_recursive(
  File "/opt/conda/lib/python3.8/site-packages/deepspeed/module_inject/load_checkpoint.py", line 237, in load_module_recursive
    load_module_recursive(
  File "/opt/conda/lib/python3.8/site-packages/deepspeed/module_inject/load_checkpoint.py", line 235, in load_module_recursive
    layer_policies[child.__class__](child, prefix + name + '.')
  File "/opt/conda/lib/python3.8/site-packages/deepspeed/module_inject/load_checkpoint.py", line 173, in load_transformer_layer
    container.load_params(module, sd[0], weight_quantizer, mp_replace, prefix)
  File "/opt/conda/lib/python3.8/site-packages/deepspeed/module_inject/containers/bloom.py", line 51, in load_params
    maybe_copy(module.attention,
  File "/opt/conda/lib/python3.8/site-packages/deepspeed/module_inject/policy.py", line 181, in maybe_copy
    dst = mp_replace.copy(dst, weight_quantizer.quantize(tmp if weight_quantizer.q_int8 else \
  File "/opt/conda/lib/python3.8/site-packages/deepspeed/module_inject/replace_module.py", line 111, in copy
    dst.data.copy_(src[:, self.gpu_index * dst_shape[self.out_dim]: (self.gpu_index + 1) * dst_shape[self.out_dim]] if outer_dim == 1 else \
RuntimeError: The size of tensor a (6144) must match the size of tensor b (4096) at non-singleton dimension 1

I did not change any code, so I wonder why multi-GPU int8 inference fails while the same multi-GPU setup with FP16 works fine.
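
For reference, the two sizes in the error can be reproduced with plain shape arithmetic. This is only a sketch under assumptions, not a confirmed diagnosis: it assumes a hidden size of 4096 for bigscience/bloomz-7b1-mt, a fused query/key/value weight of shape (3 * hidden, hidden), an out_dim of 1, and a destination buffer stored in transposed layout. The helper `sliced_width` is hypothetical and only mimics the column slice shown in the last traceback frame; it makes no DeepSpeed or PyTorch calls.

```python
# Hypothetical shape arithmetic only; nothing here calls DeepSpeed or PyTorch.
HIDDEN_SIZE = 4096   # assumed hidden size of bigscience/bloomz-7b1-mt
NUM_GPUS = 2         # two devices selected by --include localhost:1,6

# Assumed layout of the fused query/key/value weight: (3 * hidden, hidden).
qkv_shape = (3 * HIDDEN_SIZE, HIDDEN_SIZE)
qkv_rows_per_gpu = qkv_shape[0] // NUM_GPUS


def sliced_width(src_shape, dst_shape, gpu_index, out_dim=1):
    """Width of src[:, gpu_index * w:(gpu_index + 1) * w] with w = dst_shape[out_dim].

    Slice bounds are clamped to the source width, as Python slicing does.
    """
    w = dst_shape[out_dim]
    start = min(gpu_index * w, src_shape[1])
    stop = min((gpu_index + 1) * w, src_shape[1])
    return stop - start


# Assumed destination buffer in transposed layout: (hidden, 3 * hidden / num_gpus).
dst_shape = (HIDDEN_SIZE, qkv_rows_per_gpu)
src_width = sliced_width(qkv_shape, dst_shape, gpu_index=0)

print(qkv_rows_per_gpu, src_width)
```

Under these assumptions the destination expects 6144 columns while the source slice can supply at most 4096, which matches the two numbers in the RuntimeError. That would point at a layout mismatch between the quantized source weight and the sharded destination, but the exact cause inside DeepSpeed is not established here.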

@LiuShixing

I get the same error. How can it be solved?
