[1] LI Wei, SONG Yupu, LIU Yazhi, et al. A Method for Personalized Speech-Driven 3D Facial Animation Generation[J]. Journal of Zhengzhou University (Engineering Science), 2027, 48(XX): 1-8. [doi:10.13705/j.issn.1671-6833.2026.04.023]
Journal of Zhengzhou University (Engineering Science) [ISSN 1671-6833 / CN 41-1339/T]
Volume: 48
Issue: 2027 XX
Pages: 1-8
Column:
Publication date: 2027-12-10
- Title: A Method for Personalized Speech-Driven 3D Facial Animation Generation
- Author(s): LI Wei1,2, SONG Yupu1,2, LIU Yazhi1,2, AN Yi1,2
  (1. College of Artificial Intelligence, North China University of Science and Technology, Tangshan 063210, China; 2. Hebei Key Laboratory of Industrial Intelligent Perception, North China University of Science and Technology, Tangshan 063210, China)
- Keywords: speech-driven animation; 3D facial animation; deep learning; diffusion model; personalization
- CLC: TP391.41 TN912.3
- DOI: 10.13705/j.issn.1671-6833.2026.04.023
- Abstract:
To address the challenges of speech-driven 3D facial animation, including difficult alignment between speech and motion, loss of identity features, and limited personalized dynamic expression, a conditional diffusion-based generation framework was proposed. The framework used a dual-path style encoding structure to extract hierarchical identity features and dynamic motion features, and then applied a bidirectional attention mechanism to deeply fuse speech features with noisy motion features. Based on this design, an improved Transformer decoder guided by style conditions was introduced to generate high-quality motion sequences. Experiments on the BIWI, VOCASET, and 3DMEAD datasets showed that the proposed method achieved the best results in mean vertex error (MVE), lip vertex error (LVE), and facial dynamic deviation (FDD). Compared with the best baseline method on each metric, MVE, LVE, and FDD were reduced by 4.8%, 15.4%, and 13.4%, respectively, on BIWI; LVE was reduced by 14.9% on VOCASET; and MVE and FDD were reduced by 10.2% and 13.7%, respectively, on 3DMEAD. Subjective evaluation results further confirmed its advantages in visual naturalness and realism. The proposed method provided a new technical approach for high-fidelity generation, identity preservation, and personalized modeling of 3D facial animation.
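
The abstract reports vertex-based error metrics and relative reductions without defining them. The following is a minimal sketch of how two of these metrics are commonly computed in the speech-driven facial animation literature. It is an assumption, not the authors' evaluation code: the function names, the toy data, and the lip index set `lip_idx` are hypothetical, and some works use squared L2 distance rather than plain L2 for the lip error. FDD is not sketched here.

```python
import math

# Hypothetical helpers: the abstract does not give formulas, so these follow
# conventions assumed from related work, not the authors' definitions.

def l2(a, b):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean_vertex_error(pred, target):
    """Average L2 distance over all frames and all vertices."""
    dists = [l2(p, t)
             for pred_frame, target_frame in zip(pred, target)
             for p, t in zip(pred_frame, target_frame)]
    return sum(dists) / len(dists)

def lip_vertex_error(pred, target, lip_idx):
    """Per frame, take the maximum L2 error over the lip vertices,
    then average that maximum over frames (assumed convention)."""
    per_frame = [max(l2(pf[i], tf[i]) for i in lip_idx)
                 for pf, tf in zip(pred, target)]
    return sum(per_frame) / len(per_frame)

def relative_reduction(baseline, ours):
    """Percentage by which `ours` is lower than `baseline`."""
    return 100.0 * (baseline - ours) / baseline

# Toy sequence: 2 frames, 3 vertices each; vertices 0 and 2 stand in for the lips.
target = [[(0, 0, 0), (0, 0, 0), (0, 0, 0)],
          [(0, 0, 0), (0, 0, 0), (0, 0, 0)]]
pred = [[(3, 4, 0), (0, 0, 0), (0, 0, 1)],
        [(0, 0, 0), (0, 3, 4), (0, 0, 0)]]
lip_idx = [0, 2]

mve = mean_vertex_error(pred, target)
lve = lip_vertex_error(pred, target, lip_idx)
print(mve, lve, relative_reduction(2.0, 1.7))
```

Under this reading, a statement such as "LVE was reduced by 15.4%" corresponds to `relative_reduction(best_baseline_lve, proposed_lve)`, where both arguments are placeholders for values the abstract does not list.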