You can now use an image as a driving signal to drive the source image or video! Additionally, we have refined the driving options to support expressions, pose, lips, eyes, or all (all is consistent with the previous default method), which we name it regional control. The control is becoming more and more precise! 🎯
Please note that image-based driving or regional control may not perform well in certain cases. Feel free to try different options, and be patient. 😊
Note
We recognize that the project now offers more options, which have become increasingly complex, but due to our limited team capacity and resources, we haven’t fully documented them yet. We ask for your understanding and will work to improve the documentation over time. Contributions via PRs are welcome! If anyone is considering donating or sponsoring, feel free to leave a message in the GitHub Issues or Discussions. We will set up a payment account to reward the team members or support additional efforts in maintaining the project. 💖
It's very simple to use an image as a driving reference. Just set the -d
argument to the driving image:
python inference.py -s assets/examples/source/s5.jpg -d assets/examples/driving/d30.jpg
To change the animation_region
option, you can use the --animation_region
argument to exp
, pose
, lip
, eyes
, or all
. For example, to only drive the lip region, you can run by:
# only driving the lip region
python inference.py -s assets/examples/source/s5.jpg -d assets/examples/driving/d0.mp4 --animation_region lip
Image-driven Portrait Animation and Regional Control
flag_relative_motion:
When using an image as the driving input, setting --flag_relative_motion
to true will apply the motion deformation between the driving image and its canonical form. If set to false, the absolute motion of the driving image is used, which may amplify expression driving strength but could also cause identity leakage. This option corresponds to the relative motion
toggle in the Gradio interface. Additionally, if both source and driving inputs are images, the output will be an image. If the source is a video and the driving input is an image, the output will be a video, with each frame driven by the image's motion. The Gradio interface automatically saves and displays the output in the appropriate format.
animation_region: This argument offers five options:
exp
: Only the expression of the driving input influences the source.pose
: Only the head pose drives the source.lip
: Only lip movement drives the source.eyes
: Only eye movement drives the source.all
: All motions from the driving input are applied.
You can also select these options directly in the Gradio interface.
Editing the Lip Region of the Source Video to a Neutral Expression:
In response to requests for a more neutral lip region in the Retargeting Video
of the Gradio interface, we've added a keeping the lip silent
option. When selected, the animated video's lip region will adopt a neutral expression. However, this may cause inter-frame jitter or identity leakage, as it uses a mode similar to absolute driving. Note that the neutral expression may sometimes feature a slightly open mouth.
Others:
When both source and driving inputs are videos, the output motion may be a blend of both, due to the default setting of --flag_relative_motion
. This option uses relative driving, where the motion offset of the current driving frame relative to the first driving frame is added to the source frame's motion. In contrast, --no_flag_relative_motion
applies the driving frame's motion directly as the final driving motion.
For CLI usage, to retain only the driving video's motion in the output, use:
python inference.py --no_flag_relative_motion
In the Gradio interface, simply uncheck the relative motion option. Note that absolute driving may cause jitter or identity leakage in the animated video.