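🦜 VideoChat [Paper/Demo]

We present VideoChat, a chat-centric video understanding system, as an exploratory study of video understanding. It connects a pretrained video model and a large language model through a learnable interface, and it performs well at spatiotemporal reasoning, event localization, and causal relationship inference. To instruction-tune the model, we propose a video-centric instruction dataset made up of thousands of videos paired with detailed descriptions and conversations. The dataset emphasizes spatiotemporal reasoning and causal relationships, and it supplies training data for chat-centric video understanding systems. Preliminary experiments show the potential of our system across a wide range of video applications.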

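🔥 Updates

2023/05/12: Released the 7B version:
- 🎊 Model-7B: the 7B version needs about 20GB of GPU memory, while the 13B version needs about 32GB.
2023/05/11: Released 🦜VideoChat V1, which handles both image and video understanding!
- 🎊 Model-13B and data.
- 🤗 Online demo
- 🧑‍🔧 The training scripts are still being cleaned up and will be open-sourced later.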
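
The memory figures in the 7B release note can be turned into a quick pre-download check. The sketch below is only illustrative: the thresholds are the approximate numbers quoted above, and the mapping of `config.json` to the 13B model (with `config_7b.json` for 7B) is an assumption based on the file names used later in this README.

```python
# Approximate GPU memory requirements quoted in the release note (in GB).
REQUIRED_GB = {"13B": 32, "7B": 20}

# Assumption: config.json is the 13B config and config_7b.json is the 7B config.
CONFIG_FOR = {"13B": "config.json", "7B": "config_7b.json"}


def pick_variant(gpu_memory_gb):
    """Return the largest variant whose approximate requirement fits, or None."""
    for variant in ("13B", "7B"):
        if gpu_memory_gb >= REQUIRED_GB[variant]:
            return variant
    return None


if __name__ == "__main__":
    for memory in (16, 24, 40):
        variant = pick_variant(memory)
        config = CONFIG_FOR.get(variant, "no suitable config")
        print(f"{memory} GB -> {variant} ({config})")
```

Because the requirements are stated as "about", treat a result close to a threshold as a hint rather than a guarantee.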

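⏳ Plan

Small-scale video instruction data and training
Training on BLIP+UniFormerV2+Vicuna
Large-scale and complex video instruction data
Instruction tuning on stronger video foundation models
User-friendly interaction with longer videos
…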

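💬 Examples (try the online demo 🦜)

Comparison with ChatGPT, MiniGPT-4, LLaVA, and mPLUG-Owl.
Our VideoChat handles both image and video understanding well!

[Video] Why is this video funny?

[Video] Spatial perception

[Video] Temporal perception

[Video] Multi-turn conversation

Image understanding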

🏃 Usage

Linux

Prepare the environment.

Installing inside a conda environment is recommended (optional):

conda create -n videochat python=3.8
conda activate videochat

Install the Python dependencies:

pip install -r requirements.txt

Download the pretrained models

Download the BLIP2 models:

mkdir model
wget -P ./model/ https://storage.googleapis.com/sfr-vision-language-research/LAVIS/models/BLIP2/eva_vit_g.pth
wget -P ./model/ https://storage.googleapis.com/sfr-vision-language-research/LAVIS/models/BLIP2/blip2_pretrained_flant5xxl.pth

If you save the files somewhere else, update vit_model_path and q_former_model_path in config.json or config_7b.json to match.
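
Editing the paths by hand works fine; for scripted setups, a small helper along the following lines could do it. This is a minimal sketch: the layout of config.json is not shown in this README, so the helper searches nested objects for the key instead of assuming where it lives. `set_key` and `update_config` are hypothetical names, not part of the repository.

```python
import json
from pathlib import Path


def set_key(node, key, value):
    """Set every occurrence of `key` inside nested dicts/lists; return how many were set."""
    changed = 0
    if isinstance(node, dict):
        for name in node:
            if name == key:
                node[name] = value
                changed += 1
            else:
                changed += set_key(node[name], key, value)
    elif isinstance(node, list):
        for item in node:
            changed += set_key(item, key, value)
    return changed


def update_config(config_file, new_values):
    """Rewrite `config_file` in place with the given key -> value replacements."""
    path = Path(config_file)
    config = json.loads(path.read_text(encoding="utf-8"))
    for key, value in new_values.items():
        # Refuse to write anything if a key is absent, so typos are caught early.
        if set_key(config, key, value) == 0:
            raise KeyError(f"{key} not found in {config_file}")
    path.write_text(json.dumps(config, indent=4) + "\n", encoding="utf-8")
    return config
```

For example, `update_config("config_7b.json", {"vit_model_path": "/data/models/eva_vit_g.pth"})`, where `/data/models` stands in for wherever the checkpoint was actually saved.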

Download the StableVicuna model:

You need to download the pretrained LLaMA weights from the llama GitHub repository or from Hugging Face.

If you downloaded LLaMA from the llama GitHub repository, convert the weights first:

# convert_llama_weights_to_hf is copied from transformers
python src/transformers/models/llama/convert_llama_weights_to_hf.py \
    --input_dir /path/to/downloaded/llama/weights \
    --model_size 13B --output_dir /output/path

Download the 13B stable-vicuna-13b-delta and apply it: ** Note that this may need more than 30GB of GPU memory. If your GPU has 24GB, download the 7B model below instead. **

# fastchat v0.1.10
python3 apply_delta.py \
    --base /path/to/llama-13b \
    --target stable-vicuna-13b \
    --delta CarperAI/stable-vicuna-13b-delta

Download the 7B vicuna-7b-delta-v0 and apply it:

# fastchat v0.1.10
python3 apply_delta.py \
    --base /path/to/llama-7b \
    --target vicuna-7b-v0 \
    --delta lmsys/vicuna-7b-delta-v0

Update llama_model_path in config.json or config_7b.json.

Download VideoChat-13B or VideoChat-7B:

Update videochat_model_path in config.json or config_7b.json.
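
With all four paths set, it can save a failed model load to confirm they point at real files before starting the demo. The following is a rough sketch, not part of the repository: `find_values` and `missing_paths` are hypothetical helpers, and the config is again searched recursively because its exact layout is not documented here.

```python
import os

# The four path settings named in this README.
PATH_KEYS = (
    "vit_model_path",
    "q_former_model_path",
    "llama_model_path",
    "videochat_model_path",
)


def find_values(node, key):
    """Yield every value stored under `key` in nested dicts/lists."""
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                yield value
            else:
                yield from find_values(value, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_values(item, key)


def missing_paths(config, keys=PATH_KEYS):
    """Return (key, path) pairs that are absent from the config or not found on disk."""
    problems = []
    for key in keys:
        values = list(find_values(config, key))
        if not values:
            problems.append((key, None))  # the key itself is not in the config
        for value in values:
            if not os.path.exists(str(value)):
                problems.append((key, value))
    return problems
```

A typical call would load the file with `json.load` and print whatever `missing_paths(config)` returns; an empty list means every configured path exists.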

Run the demo

python demo.py

Open 127.0.0.1:7860 in a browser to try it out.
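
Loading the models can take a while, so the page may not respond right away. One way to wait for it from a script is to poll the port, as in this minimal sketch. The default address and port come from the line above; the function names, retry count, and delay are arbitrary choices made for illustration.

```python
import socket
import time


def is_listening(host="127.0.0.1", port=7860, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_until_listening(host="127.0.0.1", port=7860, attempts=30, delay=2.0):
    """Poll until the port accepts connections; return False if it never does."""
    for _ in range(attempts):
        if is_listening(host, port):
            return True
        time.sleep(delay)
    return False
```

This only confirms that something accepts TCP connections on the port; it does not verify that the demo page itself loaded correctly.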

[Optional] We also provide a Jupyter Notebook demo.

📄 Citation

If you find this project helpful, please consider citing us:

@article{2023videochat,
  title={VideoChat: Chat-Centric Video Understanding},
  author={Li, KunChang and He, Yinan and Wang, Yi and Li, Yizhuo and Wang, Wenhai and Luo, Ping and Wang, Yali and Wang, Limin and Qiao, Yu},
  journal={arXiv preprint arXiv:2305.06355},
  year={2023}
}

👍 Acknowledgements

Thanks to the following open-source projects:

InternVideo, UniFormerV2, MiniGPT-4, LLaVA, BLIP2, StableLM.
