[1] PENG Chunyan, WANG Xuan, CHEN Yangbo, et al. 3D Hand Pose Estimation Based on Graph Convolution Network[J]. Journal of Zhengzhou University (Engineering Science), 2027, 48(XX): 1-8. [doi:10.13705/j.issn.1671-6833.2026.02.013]

3D Hand Pose Estimation Based on Graph Convolution Network

Journal of Zhengzhou University (Engineering Science) [ISSN:1671-6833/CN:41-1339/T]

Volume:
48
Issue:
2027, No. XX
Pages:
1-8
Publication date:
2027-12-10

Article Info

Title:
3D Hand Pose Estimation Based on Graph Convolution Network
Author(s):
PENG Chunyan1,2, WANG Xuan1,2, CHEN Yangbo1,2, HE Gangbo1,2
1. College of Computer, Qinghai Normal University, Xining 810016, China; 2. State Key Laboratory of Tibetan Intelligence, Qinghai Normal University, Xining 810016, China
Keywords:
3D hand pose estimation; graph convolution networks; feature extraction; graph kernel learning optimization; dynamic adjustment of evaluation metrics
CLC number:
TP391; TP751
DOI:
10.13705/j.issn.1671-6833.2026.02.013
Abstract:
3D hand pose estimation from a single color image suffers from large prediction errors and unnatural hand structures, caused by self-occlusion and the high self-similarity of hand parts. To address these issues, a graph convolution-based 3D hand pose estimation method is proposed. First, Keypoint R-CNN is used to extract visual features and 2D keypoint positions from the input image, and these features are fed into an improved adaptive-kernel graph convolution module (AK_GraFormer). Second, an AKNN graph kernel with residual connections is introduced to adaptively process graph-structured data, enhancing the model's feature learning and representation. Finally, a dynamic training strategy monitored by a proposed evaluation metric is employed to obtain better estimates. Experiments on the HO3D v3 and FreiHAND datasets show that the proposed method clearly outperforms comparable methods in monocular 3D hand pose estimation: the mean per-joint position error after rigid alignment (PA-MPJPE) is reduced by up to 14.28 percentage points, and the area under the percentage-of-correct-keypoints curve (AUC) is improved by up to 3.33 percentage points.
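The AK_GraFormer module and AKNN graph kernel are only described at a high level in the abstract, but their core building block, a graph convolution with a residual connection over the 21-keypoint hand skeleton, can be sketched generically. Everything below (the MANO-style joint ordering, function names, and layer form y = ReLU(ÂXW) + X) is an illustrative assumption, not the authors' implementation:

```python
import numpy as np

# 21-joint hand skeleton (MANO-style ordering assumed): (parent, child) bones.
HAND_EDGES = [
    (0, 1), (1, 2), (2, 3), (3, 4),        # thumb
    (0, 5), (5, 6), (6, 7), (7, 8),        # index
    (0, 9), (9, 10), (10, 11), (11, 12),   # middle
    (0, 13), (13, 14), (14, 15), (15, 16), # ring
    (0, 17), (17, 18), (18, 19), (19, 20), # little
]

def normalized_adjacency(num_joints=21):
    """Symmetrically normalized adjacency with self-loops:
    A_hat = D^{-1/2} (A + I) D^{-1/2}."""
    a = np.eye(num_joints)
    for i, j in HAND_EDGES:
        a[i, j] = a[j, i] = 1.0
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a.sum(axis=1)))
    return d_inv_sqrt @ a @ d_inv_sqrt

def residual_graph_conv(x, w, a_hat):
    """One graph-convolution layer with a residual connection:
    y = ReLU(A_hat @ x @ w) + x  (needs in_dim == out_dim)."""
    return np.maximum(a_hat @ x @ w, 0.0) + x

# Usage: per-joint features (e.g. from the 2D keypoint detector) in, same shape out.
rng = np.random.default_rng(0)
a_hat = normalized_adjacency()
x = rng.standard_normal((21, 64))
w = rng.standard_normal((64, 64)) * 0.1
y = residual_graph_conv(x, w, a_hat)
print(y.shape)  # (21, 64)
```

The residual path lets deeper stacks of such layers refine joint positions without washing out the detector's initial estimate, which is the usual motivation for residual connections in pose-lifting GCNs.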
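The two reported metrics can be computed directly from predicted and ground-truth joint coordinates. Below is a minimal NumPy sketch of PA-MPJPE (MPJPE after a similarity-transform Procrustes alignment) and a PCK-style AUC; the threshold range and sampling are assumed choices for illustration, not the paper's exact evaluation protocol:

```python
import numpy as np

def procrustes_align(pred, gt):
    """Align pred (J, 3) onto gt (J, 3) with the optimal scale, rotation,
    and translation (Kabsch/Umeyama) -- the 'PA' step of PA-MPJPE."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    x, y = pred - mu_p, gt - mu_g
    u, s, vt = np.linalg.svd(x.T @ y)           # 3x3 cross-covariance
    d = np.sign(np.linalg.det(vt.T @ u.T))      # guard against reflections
    corr = np.array([1.0, 1.0, d])
    r = vt.T @ np.diag(corr) @ u.T              # optimal rotation
    scale = (s * corr).sum() / (x ** 2).sum()   # optimal isotropic scale
    return scale * x @ r.T + mu_g

def pa_mpjpe(pred, gt):
    """Mean per-joint position error after rigid (Procrustes) alignment."""
    return np.linalg.norm(procrustes_align(pred, gt) - gt, axis=1).mean()

def pck_auc(per_joint_errors, max_thresh=50.0, steps=100):
    """Area under the percentage-of-correct-keypoints curve: the fraction of
    joints whose error falls below each threshold, swept from 0 to max_thresh
    (same unit as the errors, e.g. mm), normalized to [0, 1]."""
    thresholds = np.linspace(0.0, max_thresh, steps)
    pck = (per_joint_errors[None, :] <= thresholds[:, None]).mean(axis=1)
    return pck.mean()  # rectangle-rule average over uniform thresholds
```

Because the alignment removes global scale, rotation, and translation, PA-MPJPE isolates errors in the hand's internal articulation, which is why it is the headline metric for monocular estimators with ambiguous absolute depth.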

参考文献/References:

[1] Sridhar S, Feit A M, Theobalt C, et al. Investigating the dexterity of multi-finger input for mid-air text entry[C]// Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems. New York: ACM, 2015: 3643-3652.
[2] Oikonomidis I, Kyriazis N, Argyros A A. Tracking the articulated motion of two strongly interacting hands[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Providence: IEEE, 2012: 1862-1869.
[3] Tkach A, Pauly M, Tagliasacchi A. Sphere-meshes for real-time hand modeling and tracking[J]. ACM Transactions on Graphics, 2016, 35(6): 1-11.
[4] Romero J, Tzionas D, Black M J. Embodied hands: modeling and capturing hands and bodies together[J]. ACM Transactions on Graphics, 2017, 36(6): 1-17.
[5] Pavlakos G, Choutas V, Ghorbani N, et al. Expressive body capture: 3d hands, face, and body from a single image[C]// Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. Long Beach: IEEE, 2019: 10975-10985.
[6] Keskin C, Kiraç F, Kara Y E, et al. Hand pose estimation and hand shape classification using multilayered randomized decision forests[C]// Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2012: 852-863.
[7] Tompson J, Stein M, Lecun Y, et al. Real-time continuous pose recovery of human hands using convolutional networks[J]. ACM Transactions on Graphics, 2014, 33(5): 1-10.
[8] Pan X, Li S, Wang H, et al. LGCAnet: lightweight hand pose estimation network based on HRnet[J]. The Journal of Supercomputing, 2024(80): 1-23.
[9] Hoang D C, Tan P X, Pham D L, et al. Efficient Multimodal Fusion For Hand Pose Estimation With Hourglass Network[J]. IEEE Access, 2024(12): 113810-113825.
[10] Zhan Z, Luo G. Multiscale feature fusion network for monocular complex hand pose estimation [J]. Electronics Letters, 2023, 59(24): 1-4.
[11] Panteleris P, Oikonomidis I, Argyros A. Using a single RGB frame for real-time 3D hand pose estimation in the wild[C]// Proceedings of the 2018 IEEE winter conference on applications of computer vision. Lake Tahoe: IEEE, 2018: 436-445.
[12] Doosti B, Naha S, Mirbagheri M, et al. HOPE-Net: a graph-based model for hand-object pose estimation[C]// Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. Seattle: IEEE, 2020: 6608-6617.
[13] Zhao W, Wang W, Tian Y. GraFormer: graph-oriented transformer for 3D pose estimation[C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 20438-20447.
[14] 李志新, 商樊洪, 郇战, et al. Human action recognition method based on hybrid-feature graph convolutional neural network[J]. Journal of Zhengzhou University (Engineering Science), 2024, 45(4): 46-52. (in Chinese)
[15] Cai Y, Ge L, Liu J, et al. Exploiting spatial-temporal relationships for 3d pose estimation via graph convolutional networks[C]// Proceedings of the IEEE/CVF international conference on computer vision. Seoul: IEEE, 2019: 2272-2281.
[16] Aboukhashab A T, Robertini N, Malik J, et al. Shape-GraFormer: GraFormer-Based Network for Hand-Object Reconstruction from a Single Depth Map[J]. IEEE Access, 2024.
[17] Zhuang N, Mu Y. Joint hand-object pose estimation with differentially-learned physical contact point analysis[C]// Proceedings of the 2021 international conference on multimedia retrieval. New York: ACM, 2021: 420-427.
[18] Zhang M, Li A, Liu H, et al. Coarse-to-fine hand-object pose estimation with interaction-aware graph convolutional network[J]. Sensors, 2021, 21(23): 8092.
[19] 马胜营, 李敬华, 孔德慧, et al. 3D hand pose estimation based on dual-branch multi-scale attention[J]. Chinese Journal of Computers, 2023, 46(7): 1383-1395. (in Chinese)
[20] Yang W, Xie L, Qian W, et al. Coarse-to-fine cascaded 3D hand reconstruction based on SSGC and MHSA[J]. The Visual Computer, 2025, 41(1): 11-24.
[21] He K, Gkioxari G, Dollár P, et al. Mask R-CNN[C]// Proceedings of the IEEE international conference on computer vision. Venice: IEEE, 2017: 2961-2969.
[22] Ju M, Hou S, Fan Y, et al. Adaptive kernel graph neural network[C]// Proceedings of the AAAI Conference on Artificial Intelligence. Online: AAAI, 2022, 36(6): 7051-7058.
[23] Vasconcelos C, Birodkar V, Dumoulin V. Proper reuse of image classification features improves object detection [C]// Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. New Orleans: IEEE, 2022: 13628-13637.
[24] Hampali S, Sarkar S D, Lepetit V. HO-3D_v3: improving the accuracy of hand-object annotations of the HO-3D dataset[J]. arXiv preprint arXiv:2107.00887, 2021.
[25] Zimmermann C, Ceylan D, Yang J, et al. FreiHAND: a dataset for markerless capture of hand pose and shape from single RGB images[C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. Seoul: IEEE, 2019: 813-822.
[26] 杨冰, 徐楚阳, 姚金良, et al. 3D hand pose estimation method based on monocular RGB images[J]. Journal of Zhejiang University (Engineering Science), 2025, 59(1): 18-26. (in Chinese)
[27] Chen Y, Tu Z, Kang D, et al. Model-based 3d hand reconstruction via self-supervised learning[C]// Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. Online: IEEE, 2021: 10451-10460.
[28] Yang L, Li K, Zhan X, et al. Artiboost: Boosting articulated 3d hand-object pose estimation via online exploration and synthesis [C]// Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. New Orleans: IEEE, 2022: 2750-2760.
[29] Zhang H, Tian Y, Zhang Y, et al. PyMAF-X: towards well-aligned full-body model regression from monocular images[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(10): 12287-12303.
[30] Duran E, Kocabas M, Choutas V, et al. HMP: Hand motion priors for pose and shape estimation from video [C]// Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. Waikoloa: IEEE, 2024: 6353-6363.
[31] Chen P, Chen Y, Yang D, et al. I2UV-HandNet: Image-to-uv prediction network for accurate and high-fidelity 3d hand mesh modeling [C]// Proceedings of the IEEE/CVF international conference on computer vision. Montreal: IEEE, 2021: 12929-12938.
[32] Lin K, Wang L, Liu Z. Mesh graphormer[C]// Proceedings of the IEEE/CVF international conference on computer vision. Montreal: IEEE, 2021: 12939-12948.
[33] Liu Z, Lin G, Wang C, et al. HandMIM: pose-aware self-supervised learning for 3D hand mesh estimation[J]. arXiv preprint arXiv:2307.16061, 2023.
[34] Pavlakos G, Shan D, Radosavovic I, et al. Reconstructing hands in 3d with transformers [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2024: 9826-9836.

Memo
Received: 2025-12-10; revised: 2026-01-02
Funding: National Natural Science Foundation of China (62441609, 62563033); Qinghai Provincial Key R&D and Transformation Program (2025-2J-J08)
Author biography: PENG Chunyan (1980– ), female, from Heze, Shandong; professor and Ph.D. at Qinghai Normal University; her research focuses on cultural computing and machine learning. E-mail: pcy@qhnu.edu.cn.
Last Update: 2026-03-31