[1]CHEN Enqing,LI Jiahui,GUO Xin.Diffusion Method and Cross-attention Mechanisms for Skeleton-based Action Recognition Method[J].Journal of Zhengzhou University (Engineering Science),2027,48(XX):1-8.[doi:10.13705/j.issn.1671-6833.2026.04.011]
Journal of Zhengzhou University (Engineering Science) [ISSN 1671-6833 / CN 41-1339/T]
Volume: 48
Issue: 2027 XX
Pages: 1-8
Column:
Publication date: 2027-12-10
- Title:
-
Diffusion Method and Cross-attention Mechanisms for Skeleton-based Action Recognition Method
- Author(s):
-
CHEN Enqing, LI Jiahui, GUO Xin
-
School of Electrical and Information Engineering, Zhengzhou University, Zhengzhou 450001, China
-
- Keywords:
-
skeleton-based action recognition; self-supervised learning; masked reconstruction; diffusion model; cross-attention mechanism
- CLC:
-
TP391; TP181
- DOI:
-
10.13705/j.issn.1671-6833.2026.04.011
- Abstract:
-
To address incomplete motion information caused by occlusion or missing joints in skeleton-based action recognition, as well as limited model generalization under few-label conditions, a skeleton-based action recognition method, DCMAE, was proposed that integrates a diffusion model with a cross-attention mechanism. Within a self-supervised learning framework, a spatio-temporal masking strategy was adopted, and the diffusion model learned the global distribution characteristics of motion sequences during the denoising process, improving classification accuracy under data-missing conditions. In the decoding stage, the cross-attention mechanism introduced encoder features to enable spatio-temporal information interaction and guidance, thereby enhancing the model’s generalization ability under few-label conditions. Experiments on the NTU RGB+D 60 and NTU RGB+D 120 datasets showed that, compared with SkeletonMAE, the proposed method improved accuracy by up to 14.9 percentage points under data-missing conditions and 3 percentage points under few-label conditions. The results demonstrated that the proposed method effectively enhanced the robustness of skeleton-based action recognition models to missing data and scarce labels, providing a new perspective for self-supervised action recognition research.
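As a rough illustration of two ideas named in the abstract, the sketch below masks (frame, joint) positions of a toy skeleton sequence and then lets decoder queries for the masked positions attend over encoder features of the visible ones. It is not the DCMAE implementation: the uniform random masking, the single-head attention without learned projections, the random stand-in features, and all function names and sizes are assumptions made for this example, and the diffusion component is omitted entirely.

```python
import math
import random


def spatio_temporal_mask(num_frames, num_joints, mask_ratio, rng):
    """Pick (frame, joint) positions to hide, uniformly at random (an assumption)."""
    positions = [(t, j) for t in range(num_frames) for j in range(num_joints)]
    num_masked = int(round(mask_ratio * len(positions)))
    return set(rng.sample(positions, num_masked))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention: each query mixes the values."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    outputs, weights = [], []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        w = softmax(scores)
        weights.append(w)
        outputs.append(
            [sum(wi * v[d] for wi, v in zip(w, values)) for d in range(len(values[0]))]
        )
    return outputs, weights


rng = random.Random(0)
T, J, D = 4, 5, 8  # toy sizes: frames, joints, feature width (all hypothetical)

masked = sorted(spatio_temporal_mask(T, J, mask_ratio=0.6, rng=rng))
visible = [(t, j) for t in range(T) for j in range(J) if (t, j) not in set(masked)]

# Stand-in encoder features for visible tokens; a real encoder would compute these.
encoder_features = [[rng.gauss(0, 1) for _ in range(D)] for _ in visible]
# Stand-in decoder queries, one per masked token.
decoder_queries = [[rng.gauss(0, 1) for _ in range(D)] for _ in masked]

# Decoder queries attend to encoder features (used here as both keys and values).
decoded, attn = cross_attention(decoder_queries, encoder_features, encoder_features)
print(len(masked), len(visible), len(decoded), len(decoded[0]))
```

Each row of `attn` shows how strongly one masked position draws on each visible position, which is the sense in which encoder features can guide reconstruction in the decoder.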