An AI with vision that generates voice and actions, based on the RWKV model architecture
- This project can be flexibly applied to local deployments of AI virtual anchors or physical robots, and aims to save computing power and reduce power consumption. It provides visual emotion expression and action generation, and already includes vision support. Other functions are still being improved.
- Commands in this documentation are to be executed in the project's root directory unless otherwise specified
- `python` and `python3` are the same
- Install Python
- Install CUDA/ROCm and the corresponding version of PyTorch
- Install the required libraries
pip install -r requirements.txt
- If you are using an AMD GPU, add the following commands to `~/.bashrc` (using gfx1100 as an example; you can find your specific model by running `rocminfo`)
export ROCM_PATH=/opt/rocm
export HSA_OVERRIDE_GFX_VERSION=11.0.0
- Run the following commands
sudo usermod -aG render $USERNAME
sudo usermod -aG video $USERNAME
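After the steps above, a quick check can confirm that the environment variables are visible and that PyTorch can see a GPU. The following is a minimal sketch and not part of the project: the helper name `describe_gpu_setup` is hypothetical, and it assumes that a ROCm build of PyTorch reports AMD devices through `torch.cuda`, as such builds normally do.

```python
import os


def describe_gpu_setup():
    """Return a dict summarising the GPU-related environment (hypothetical helper)."""
    info = {
        # Variables suggested above for AMD GPUs; None means they are not set.
        "ROCM_PATH": os.environ.get("ROCM_PATH"),
        "HSA_OVERRIDE_GFX_VERSION": os.environ.get("HSA_OVERRIDE_GFX_VERSION"),
        "torch_installed": False,
        "gpu_available": False,
    }
    try:
        import torch  # imported lazily so the check also runs without PyTorch
    except ImportError:
        return info
    info["torch_installed"] = True
    # ROCm builds of PyTorch expose AMD GPUs through the torch.cuda namespace.
    info["gpu_available"] = bool(torch.cuda.is_available())
    return info


if __name__ == "__main__":
    for key, value in describe_gpu_setup().items():
        print(f"{key}: {value}")
```

If `gpu_available` is false after adding your user to the `render` and `video` groups, logging out and back in is usually needed before the group change takes effect.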
If you are an AMD user and want to enable CUDA operator parallelization, the process is somewhat troublesome: you need to modify the rwkv standard library, and it may not work.
cd ~/.local/lib/python3.10/site-packages/rwkv
vim ./model.py
- Change lines 37, 46, 472, 505 from `extra_cuda_cflags=["--use_fast_math", "-O3", "--extra-device-vectorization"]` to `extra_cuda_cflags=["-O3", "--hipstdpar", "-xhip"]`
- Globally search for `os.environ["RWKV_CUDA_ON"] = '0'` and change it to `os.environ["RWKV_CUDA_ON"] = '1'`
python webui.py
You will find a hip directory under `~/.local/lib/python3.10/site-packages/rwkv`, which contains the converted CUDA parallel operators.
If this fails, globally search for `os.environ["RWKV_CUDA_ON"] = '1'` and change it back to `os.environ["RWKV_CUDA_ON"] = '0'`.
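The global search for the `RWKV_CUDA_ON` setting can also be done with a short script. This is a minimal sketch, not a project tool: the function name `find_rwkv_cuda_flags` is hypothetical, and it only reports matches so that you can edit them by hand.

```python
import re
from pathlib import Path

# Matches lines such as: os.environ["RWKV_CUDA_ON"] = '0'
FLAG_PATTERN = re.compile(
    r"""os\.environ\[["']RWKV_CUDA_ON["']\]\s*=\s*["']([01])["']"""
)


def find_rwkv_cuda_flags(root="."):
    """Return (path, line_number, value) for every RWKV_CUDA_ON assignment under root."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        try:
            text = path.read_text(encoding="utf-8")
        except (OSError, UnicodeDecodeError):
            continue  # skip unreadable files rather than abort the scan
        for number, line in enumerate(text.splitlines(), start=1):
            match = FLAG_PATTERN.search(line)
            if match:
                hits.append((str(path), number, match.group(1)))
    return hits


if __name__ == "__main__":
    for path, number, value in find_rwkv_cuda_flags("."):
        print(f"{path}:{number}: RWKV_CUDA_ON = {value}")
```

Run from the project's root directory, this lists each file and line where the flag is set, along with its current value.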
Pre-trained weights are stored in ./weights/
- RWKV-LM: RWKV-x060-World-1B6-v2-20240208-ctx4096.pth
- Visual-RWKV-LM: rwkv1b5-vitl336p14-577token_mix665k_rwkv.pth
- Visual-RWKV: rwkv1b5-vitl336p14-577token_mix665k_visual.pth
- Bert: s1bert.ckpt
- HuBert: hubert_base.pt
- RMVPE: rmvpe.pt
- [RWKV-LM](https://huggingface.co/BlinkDL/rwkv-6-world/tree/main/)
- [Visual-RWKV](https://huggingface.co/howard-hou/visualrwkv-5/tree/main/)
- [RWKV-music](https://huggingface.co/BlinkDL/rwkv-5-music/tree/main)
- Line 19 in `./models/rwkv6/dialogue.py`
- Line 19 in `./models/rwkv6/continuation.py`
- Line 17 in `./models/music/run.py`
- Line 11 in `./models/language_test.py`
- Lines 19 and 20 in `./models/visualRWKV/app/app_gpu.py`
- Execute `python models/language_test.py`
- If it interacts normally, the preparation work is correct
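If the test does not start, one thing worth ruling out is a missing weight file. The sketch below checks that the files named in the weights list are present. It is an illustration only: the helper name `missing_weights` is hypothetical, and it assumes the files sit directly in `./weights/` without subdirectories, which the source does not state explicitly.

```python
from pathlib import Path

# File names copied from the pre-trained weights list in this document.
EXPECTED_WEIGHTS = [
    "RWKV-x060-World-1B6-v2-20240208-ctx4096.pth",
    "rwkv1b5-vitl336p14-577token_mix665k_rwkv.pth",
    "rwkv1b5-vitl336p14-577token_mix665k_visual.pth",
    "s1bert.ckpt",
    "hubert_base.pt",
    "rmvpe.pt",
]


def missing_weights(weights_dir="./weights"):
    """Return the expected weight files that are not present in weights_dir."""
    directory = Path(weights_dir)
    return [name for name in EXPECTED_WEIGHTS if not (directory / name).is_file()]


if __name__ == "__main__":
    missing = missing_weights()
    if missing:
        print("Missing weight files:")
        for name in missing:
            print(f"  {name}")
    else:
        print("All expected weight files were found.")
```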
python webui.py
Adjust the Visual-RWKV model running strategy on line 24 of `models/visualRWKV/app/app_gpu.py` (default: "cuda fp16").
Alic is a noob at deep learning, but it runs.
- Training requires OpenSeeFace to extract facial features. After installation, configure the path in `config/openseeface.json`
- For some datasets, automatic speech annotation with DeepSpeech may be required
You can prepare the data yourself or refer to the following datasets
- Speech, Text: Mozilla Common Voice
- Speech, Face: CelebV-Text
- Video or audio slicing (default 25FPS * 40s per slice, corresponding to non-language model 25FPS * 1024CTX)
python
- Extract hubert and f0
python
- Extract facial features from video
python
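The default slicing arithmetic can be worked through in a few lines: 25 FPS times 40 s gives 1000 frames per slice, which appears to be chosen to fit within the 1024-step context mentioned above. This is a sketch of the arithmetic only, not the project's slicing script. The function name `slice_boundaries` is hypothetical, and whether the project keeps or drops a short final slice is not stated in the source.

```python
def slice_boundaries(total_frames, fps=25, slice_seconds=40):
    """Split a clip into consecutive (start_frame, end_frame) slices; end is exclusive."""
    frames_per_slice = fps * slice_seconds  # 25 * 40 = 1000 frames by default
    return [
        (start, min(start + frames_per_slice, total_frames))
        for start in range(0, total_frames, frames_per_slice)
    ]


if __name__ == "__main__":
    # A 100-second clip at 25 FPS has 2500 frames.
    print(slice_boundaries(2500))  # [(0, 1000), (1000, 2000), (2000, 2500)]
```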
- Wait for YuChuXi, she is a lazy little fox
- Prepare the model (choose MIDI-model) https://huggingface.co/BlinkDL/rwkv-5-music/tree/main
cd ./models/music
python ./run.py
- The model path is on line 17 of `run.py`. If it does not run properly, change line 22 from `strategy='cuda fp32'` to `strategy='cpu fp32'`
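The manual fallback from a CUDA strategy to a CPU strategy could also be chosen at runtime. The following is a minimal sketch under stated assumptions: `pick_strategy` is a hypothetical helper, the project's scripts hard-code the strategy string rather than calling anything like it, and the only strategy strings used are the ones that appear in this document.

```python
def pick_strategy(gpu_precision="fp32"):
    """Return a strategy string, falling back to CPU when no GPU is usable."""
    try:
        import torch  # imported lazily so the helper works without PyTorch

        has_gpu = torch.cuda.is_available()
    except ImportError:
        has_gpu = False
    # "cuda fp16", "cuda fp32" and "cpu fp32" are the strings used in this document.
    return f"cuda {gpu_precision}" if has_gpu else "cpu fp32"


if __name__ == "__main__":
    print(pick_strategy())        # strategy for run.py (cuda fp32 by default)
    print(pick_strategy("fp16"))  # strategy matching the Visual-RWKV default
```

To try it, you would replace the hard-coded value with `strategy=pick_strategy()` in your own copy of the script.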
Refer to https://github.com/JL-er/RWKV-PEFT
- Go to `./models/rwkv/`
- Run `python language_test.py`
- If you can't run `webui.py`
In most cases, the command line terminal may not be able to connect to the Huggingface website. Please try using a proxy and set the proxy in the command line terminal.
export https_proxy=http://127.0.0.1:[port]
export http_proxy=http://127.0.0.1:[port]
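If you would rather not export the proxy for the whole shell session, the same variables can be set for a single command from Python. This is a sketch only: the helper names `proxy_env` and `run_with_proxy` are hypothetical, and the port is a placeholder you must replace with your proxy's actual port.

```python
import os
import subprocess
import sys


def proxy_env(port, host="127.0.0.1"):
    """Return a copy of the current environment with HTTP(S) proxy variables set."""
    env = dict(os.environ)
    proxy = f"http://{host}:{port}"
    env["http_proxy"] = proxy
    env["https_proxy"] = proxy
    return env


def run_with_proxy(command, port):
    """Run a command with the proxy variables set and return its exit code."""
    return subprocess.run(command, env=proxy_env(port), check=False).returncode
```

For example, `run_with_proxy([sys.executable, "webui.py"], port)` starts the web UI with the proxy applied to that process only.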
- parselmouth installation failed: temporarily downgrade `setuptools` to below 58.0