A novel adaptive speech-driven facial animation approach that learns a personalized talking style from a reference
video of about 10 seconds and generates vivid facial expressions and head poses.
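To make the idea concrete, below is a minimal sketch of one way such style adaptation is commonly structured: a short reference clip is pooled into a style embedding that conditions an audio-to-motion decoder. All module names, feature dimensions, and the 25 fps / 250-frame reference length are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class StyleEncoder(nn.Module):
    """Pools per-frame expression/pose features of a short reference video
    into a single style embedding."""
    def __init__(self, feat_dim=64, style_dim=128):
        super().__init__()
        self.gru = nn.GRU(feat_dim, style_dim, batch_first=True)

    def forward(self, ref_motion):              # (B, T_ref, feat_dim)
        _, h = self.gru(ref_motion)
        return h[-1]                             # (B, style_dim)

class AudioToMotion(nn.Module):
    """Maps audio features to expression + head-pose coefficients,
    conditioned on the reference style embedding."""
    def __init__(self, audio_dim=80, style_dim=128, out_dim=64 + 6):
        super().__init__()
        self.backbone = nn.GRU(audio_dim + style_dim, 256, batch_first=True)
        self.head = nn.Linear(256, out_dim)

    def forward(self, audio, style):             # audio: (B, T, audio_dim)
        style_seq = style.unsqueeze(1).expand(-1, audio.size(1), -1)
        h, _ = self.backbone(torch.cat([audio, style_seq], dim=-1))
        return self.head(h)                      # (B, T, out_dim)

# Usage with a ~10 s reference clip at an assumed 25 fps (250 frames).
ref = torch.randn(1, 250, 64)
audio = torch.randn(1, 500, 80)
style = StyleEncoder()(ref)
motion = AudioToMotion()(audio, style)
```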
An unsupervised variational style transfer model (VAST) that vivifies neutral photo-realistic avatars.
It flexibly captures expressive facial style from arbitrary video prompts and transfers
it onto a personalized image renderer in a zero-shot manner.
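The following is a minimal sketch of the variational style-capture idea in general (not the VAST implementation): facial features from a prompt video are encoded into a Gaussian posterior, and the sampled latent serves as the zero-shot style code passed to the renderer. Dimensions and module names are assumptions.

```python
import torch
import torch.nn as nn

class VariationalStyleEncoder(nn.Module):
    def __init__(self, feat_dim=64, latent_dim=16):
        super().__init__()
        self.gru = nn.GRU(feat_dim, 128, batch_first=True)
        self.to_mu = nn.Linear(128, latent_dim)
        self.to_logvar = nn.Linear(128, latent_dim)

    def forward(self, prompt_feats):             # (B, T, feat_dim)
        _, h = self.gru(prompt_feats)
        h = h[-1]
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        return z, kl                              # z conditions the image renderer

enc = VariationalStyleEncoder()
z, kl = enc(torch.randn(2, 200, 64))
print(z.shape, kl.item())
```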
A novel, robust, and efficient Speech-to-Animation (S2A) approach for synchronized facial animation generation in human-computer interaction.
Experiments demonstrate the effectiveness of the approach in both objective and subjective evaluations, with a 17x inference speedup
over the state-of-the-art approach.
We conduct systematic analyses of the motion jittering problem in a pipeline that
uses 3D face representations to bridge the input audio and the output video, and improve motion stability with a series of effective designs.
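For illustration only (not the paper's specific stability designs), one common way to quantify and reduce jitter in predicted 3D face coefficient sequences is a second-order-difference metric combined with light temporal smoothing; the window size and the moving-average choice below are assumptions.

```python
import numpy as np

def jitter_metric(coeffs):
    """Mean magnitude of frame-to-frame acceleration; lower means smoother.
    coeffs: (T, D) sequence of 3D face coefficients (e.g., expression params)."""
    accel = coeffs[2:] - 2 * coeffs[1:-1] + coeffs[:-2]
    return float(np.abs(accel).mean())

def temporal_smooth(coeffs, window=5):
    """Centered moving average along time, padding the ends by replication."""
    pad = window // 2
    padded = np.concatenate([coeffs[:1].repeat(pad, 0), coeffs,
                             coeffs[-1:].repeat(pad, 0)], axis=0)
    kernel = np.ones(window) / window
    return np.stack([np.convolve(padded[:, d], kernel, mode="valid")
                     for d in range(coeffs.shape[1])], axis=1)

noisy = np.cumsum(np.random.randn(300, 64) * 0.05, axis=0)
print(jitter_metric(noisy), jitter_metric(temporal_smooth(noisy)))
```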
A novel dance generation method that produces expressive dances while jointly accounting for genre matching, beat alignment, and dance dynamics.
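As a sketch of what beat alignment typically measures in dance generation (not necessarily the metric used here), motion "kinematic beats" can be taken as local minima of joint velocity and matched against music beats with a Gaussian tolerance; the tolerance sigma and the beat extraction below are assumptions.

```python
import numpy as np

def kinematic_beats(joints, fps=30):
    """joints: (T, J, 3) positions; returns times (s) of joint-velocity minima."""
    vel = np.linalg.norm(np.diff(joints, axis=0), axis=-1).sum(axis=-1)  # (T-1,)
    beats = [t for t in range(1, len(vel) - 1)
             if vel[t] < vel[t - 1] and vel[t] < vel[t + 1]]
    return np.array(beats) / fps

def beat_align_score(music_beats, motion_beats, sigma=0.1):
    """For each music beat, reward a nearby motion beat (Gaussian tolerance)."""
    if len(motion_beats) == 0:
        return 0.0
    dists = np.abs(music_beats[:, None] - motion_beats[None, :]).min(axis=1)
    return float(np.exp(-(dists ** 2) / (2 * sigma ** 2)).mean())

music_beats = np.arange(0.5, 10.0, 0.5)                    # toy 120 BPM beat grid
joints = np.random.randn(300, 24, 3).cumsum(axis=0) * 0.01  # toy motion sequence
print(beat_align_score(music_beats, kinematic_beats(joints)))
```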
Previous works on expressive speech synthesis mainly focus on the current sentence.
The context in adjacent sentences is neglected, resulting in an inflexible speaking style for the same text and a lack of speech variation.
We propose a style modeling method for expressive speech synthesis that captures and predicts styles at different levels from a wider range of context rather than a single sentence.
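Below is a minimal sketch of the multi-level, context-aware style idea (module names, dimensions, and the two style levels are assumptions, not the paper's code): embeddings of adjacent sentences are aggregated into a global style, while the current sentence additionally predicts a local style; both condition the TTS decoder.

```python
import torch
import torch.nn as nn

class ContextStylePredictor(nn.Module):
    def __init__(self, text_dim=256, style_dim=64):
        super().__init__()
        self.context_gru = nn.GRU(text_dim, style_dim, batch_first=True)
        self.local_proj = nn.Linear(text_dim, style_dim)

    def forward(self, context_sents, current_sent):
        # context_sents: (B, N, text_dim) embeddings of adjacent sentences
        # current_sent:  (B, text_dim)    embedding of the sentence to synthesize
        _, h = self.context_gru(context_sents)
        global_style = h[-1]                          # wider-context (e.g., paragraph) style
        local_style = self.local_proj(current_sent)   # sentence-level style
        return global_style, local_style

model = ContextStylePredictor()
g, l = model(torch.randn(2, 5, 256), torch.randn(2, 256))
print(g.shape, l.shape)
```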