
Error when running multimodal_understanding.py; the only change was downloading the model from ModelScope #36

Open
zhrli opened this issue Oct 24, 2024 · 11 comments


@zhrli

zhrli commented Oct 24, 2024

Exception has occurred: ValueError
Unable to create tensor, you should probably activate padding with 'padding=True' to have batched tensors with the same length.
RuntimeError: Could not infer dtype of numpy.float32

During handling of the above exception, another exception occurred:

File "/home/lizhaorui/.cache/huggingface/modules/transformers_modules/Emu3-VisionTokenizer/image_processing_emu3visionvq.py", line 349, in preprocess
return BatchFeature(data=data, tensor_type=return_tensors)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/lizhaorui/DL/Emu3/emu3/mllm/processing_emu3.py", line 274, in tokenize_image
image_inputs = self.image_processor(image, return_tensors="pt")["pixel_values"]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/lizhaorui/DL/Emu3/emu3/mllm/processing_emu3.py", line 159, in call
image_tokens = self.tokenize_image(image, padding_image=padding_image)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/lizhaorui/DL/Emu3/multimodal_understanding.py", line 35, in
inputs = processor(
^^^^^^^^^^
ValueError: Unable to create tensor, you should probably activate padding with 'padding=True' to have batched tensors with the same length.

The image is the example image included in the repository.

@ryanzhangfan
Collaborator

Could you pull the latest code and model? I checked the latest code on both GitHub and ModelScope, and in none of it is line 159 of processing_emu3.py an effective line of code.

@zhrli
Author

zhrli commented Oct 24, 2024

Could you pull the latest code and model? I checked the latest code on both GitHub and ModelScope, and in none of it is line 159 of processing_emu3.py an effective line of code.

It does run now, but it runs out of memory. Could the project consider splitting the model across multiple GPUs? Running it on a single GPU requires too many resources. Emu3 really is the only model our industry can rely on; thanks to BAAI (Beijing Academy of Artificial Intelligence).

@ryanzhangfan
Collaborator

The model is fully compatible with the various optimization methods in transformers. You can directly use the automatic multi-GPU sharding supported by transformers or accelerate (for the multimodal understanding model only; see the Emu2 demo code for reference), or use the int4 quantization built into transformers. If only the KV cache overflows, you can also try the KV-cache offloading supported by the transformers library.
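
For reference, a minimal sketch of what auto-sharding plus int4 loading can look like with transformers/accelerate; the checkpoint ID, dtype choices, and the availability of an offloaded KV cache in your transformers version are assumptions here, not code from this repo:

# Sketch only: automatic multi-GPU sharding + optional int4 quantization.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # int4 weights via bitsandbytes
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "BAAI/Emu3-Chat",                       # assumed checkpoint ID
    device_map="auto",                      # let accelerate place layers on all visible GPUs
    quantization_config=quant_config,       # drop this to keep full-precision weights
    trust_remote_code=True,
).eval()

# If only the KV cache overflows, newer transformers versions can offload it, e.g.:
# outputs = model.generate(..., cache_implementation="offloaded")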

@zhrli
Author

zhrli commented Oct 24, 2024

Could you pull the latest code and model? I checked the latest code on both GitHub and ModelScope, and in none of it is line 159 of processing_emu3.py an effective line of code.

Traceback (most recent call last):
File "/home/lizhaorui/anaconda3/envs/agent/lib/python3.12/site-packages/transformers/feature_extraction_utils.py", line 186, in convert_to_tensors
tensor = as_tensor(value)
^^^^^^^^^^^^^^^^
File "/home/lizhaorui/anaconda3/envs/agent/lib/python3.12/site-packages/transformers/feature_extraction_utils.py", line 142, in as_tensor
return torch.tensor(value)
^^^^^^^^^^^^^^^^^^^
RuntimeError: Could not infer dtype of numpy.float32

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/lizhaorui/DL/Emu3/multimodal_understanding.py", line 34, in
inputs = processor(
^^^^^^^^^^
File "/home/lizhaorui/anaconda3/envs/agent/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/lizhaorui/DL/Emu3/emu3/mllm/processing_emu3.py", line 156, in call
image_tokens = self.tokenize_image(image, padding_image=padding_image)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/lizhaorui/DL/Emu3/emu3/mllm/processing_emu3.py", line 271, in tokenize_image
image_inputs = self.image_processor(image, return_tensors="pt")["pixel_values"]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/lizhaorui/anaconda3/envs/agent/lib/python3.12/site-packages/transformers/image_processing_utils.py", line 41, in call
return self.preprocess(images, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/lizhaorui/.cache/huggingface/modules/transformers_modules/Emu3-VisionTokenizer/image_processing_emu3visionvq.py", line 349, in preprocess
return BatchFeature(data=data, tensor_type=return_tensors)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/lizhaorui/anaconda3/envs/agent/lib/python3.12/site-packages/transformers/feature_extraction_utils.py", line 79, in init
self.convert_to_tensors(tensor_type=tensor_type)
File "/home/lizhaorui/anaconda3/envs/agent/lib/python3.12/site-packages/transformers/feature_extraction_utils.py", line 192, in convert_to_tensors
raise ValueError(
ValueError: Unable to create tensor, you should probably activate padding with 'padding=True' to have batched tensors with the same length.

The error still occurs when running multimodal_understanding.py, even after pulling the latest code.

@ryanzhangfan
Collaborator

Could you check your numpy and torch versions? At a glance, this error is raised when converting np.array to torch.tensor.
(screenshot)
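
A quick way to check from the same interpreter (a generic snippet, not from this thread):

# Print the versions the script actually imports.
import numpy, torch, transformers
print("numpy:", numpy.__version__)
print("torch:", torch.__version__)
print("transformers:", transformers.__version__)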

@zhrli
Author

zhrli commented Oct 24, 2024

Could you check your numpy and torch versions? At a glance, this error is raised when converting np.array to torch.tensor. (screenshot)

Name: torch
Version: 2.2.1
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: [email protected]
License: BSD-3
Location: /home/lizhaorui/anaconda3/envs/agent/lib/python3.12/site-packages
Requires: filelock, fsspec, jinja2, networkx, nvidia-cublas-cu12, nvidia-cuda-cupti-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-runtime-cu12, nvidia-cudnn-cu12, nvidia-cufft-cu12, nvidia-curand-cu12, nvidia-cusolver-cu12, nvidia-cusparse-cu12, nvidia-nccl-cu12, nvidia-nvtx-cu12, sympy, typing-extensions
Required-by: accelerate, bitsandbytes, flash_attn, torchaudio, torchvision

Name: numpy
Version: 1.26.4

@ryanzhangfan
Collaborator

Could you try a different numpy version? The error looks like it comes from the numpy-to-tensor conversion, but the reported numpy dtype also looks fine... Our environment with the same versions (torch 2.2.1, numpy 1.26.4, transformers 4.44.0) runs correctly. If changing the version still doesn't help, add a print before line 349 of /home/lizhaorui/.cache/huggingface/modules/transformers_modules/Emu3-VisionTokenizer/image_processing_emu3visionvq.py to confirm the dtype and shape of pixel_values.
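
A sketch of that debug print, placed just before the return of BatchFeature in image_processing_emu3visionvq.py; the exact contents of `data` are assumed from the traceback:

# Hypothetical debug lines before "return BatchFeature(data=data, tensor_type=return_tensors)":
for key, value in data.items():
    first = value[0] if isinstance(value, list) else value
    print(key, getattr(first, "dtype", type(first)), getattr(first, "shape", None))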

@zhrli
Author

zhrli commented Oct 24, 2024

the int4 quantization built into transformers

print(pixel_values.shape)
(2, 3, 512, 512)

print(pixel_values.dtype)
float32

@ryanzhangfan
Collaborator

Please double-check your environment. Judging only from the information provided so far, this does not look like a problem in our code; the error is raised when converting numpy.array to torch.tensor.

@zhrli
Author

zhrli commented Oct 25, 2024

Please double-check your environment. Judging only from the information provided so far, this does not look like a problem in our code; the error is raised when converting numpy.array to torch.tensor.

Solved it:
pytorch 2.2.1 works with numpy 1.25.2
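
A direct way to confirm the environment is fixed is to reproduce the conversion that was failing, using the same shape and dtype reported above (a generic check, not code from this thread):

# torch.tensor(<numpy float32 array>) is what raised "Could not infer dtype of numpy.float32".
import numpy as np
import torch

arr = np.zeros((2, 3, 512, 512), dtype=np.float32)
t = torch.tensor(arr)
print(t.shape, t.dtype)   # expected: torch.Size([2, 3, 512, 512]) torch.float32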

@zhrli
Author

zhrli commented Oct 25, 2024

The model is fully compatible with the various optimization methods in transformers. You can directly use the automatic multi-GPU sharding supported by transformers or accelerate (for the multimodal understanding model only; see the Emu2 demo code for reference), or use the int4 quantization built into transformers. If only the KV cache overflows, you can also try the KV-cache offloading supported by the transformers library.

It runs successfully on two 4090 GPUs.

# -*- coding: utf-8 -*-

from PIL import Image
from transformers import AutoTokenizer, AutoModel, AutoImageProcessor, AutoModelForCausalLM
from transformers import BitsAndBytesConfig  # quantization config
from transformers.generation.configuration_utils import GenerationConfig
import torch

from emu3.mllm.processing_emu3 import Emu3Processor

from modelscope import snapshot_download

# model path
EMU_HUB = snapshot_download("BAAI/Emu3-Chat")
VQ_HUB = snapshot_download("BAAI/Emu3-VisionTokenizer")

# quantization configuration
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,                       # int4 quantization
    bnb_4bit_quant_type='nf4',               # quantization type
    bnb_4bit_compute_dtype=torch.bfloat16,   # compute dtype
)

# prepare model and processor
model = AutoModelForCausalLM.from_pretrained(
    EMU_HUB,
    quantization_config=quantization_config,  # apply the quantization config
    device_map="auto",                        # shard automatically across all available GPUs
    trust_remote_code=True,
)
model.eval()

tokenizer = AutoTokenizer.from_pretrained(EMU_HUB, trust_remote_code=True, padding_side="left")
image_processor = AutoImageProcessor.from_pretrained(VQ_HUB, trust_remote_code=True)
image_tokenizer = AutoModel.from_pretrained(VQ_HUB, device_map="auto", trust_remote_code=True).eval()  # place the vision tokenizer automatically as well
processor = Emu3Processor(image_processor, image_tokenizer, tokenizer)

# prepare input
text = ["Please describe the image", "Please describe the image"]
image = Image.open("assets/demo.png")
image = [image, image]

inputs = processor(
    text=text,
    image=image,
    mode='U',
    padding_image=True,
    padding="longest",
    return_tensors="pt",
)

# prepare hyper parameters
GENERATION_CONFIG = GenerationConfig(pad_token_id=tokenizer.pad_token_id, bos_token_id=tokenizer.bos_token_id, eos_token_id=tokenizer.eos_token_id)

# generate
outputs = model.generate(
    inputs.input_ids.to("cuda:0"),  # note: pass inputs.input_ids here
    generation_config=GENERATION_CONFIG,
    max_new_tokens=1024,
    attention_mask=inputs.attention_mask.to("cuda:0"),
)

outputs = outputs[:, inputs.input_ids.shape[-1]:]
answers = processor.batch_decode(outputs, skip_special_tokens=True)
for ans in answers:
    print(ans)
