WANG Hairong, XU Xi, WANG Tong, et al. Research Progress of Multimodal Named Entity Recognition[J]. Journal of Zhengzhou University (Engineering Science), 2024, 45(02): 60-71. [doi:10.13705/j.issn.1671-6833.2024.02.001]

Research Progress of Multimodal Named Entity Recognition
Journal of Zhengzhou University (Engineering Science) [ISSN: 1671-6833 / CN: 41-1339/T]

Volume:
45
Issue:
No. 02, 2024
Pages:
60-71
Section:
Publication date:
2024-03-06

Article Info

Title:
Research Progress of Multimodal Named Entity Recognition
Author(s):
WANG Hairong, XU Xi, WANG Tong, JING Boxiang
1. College of Computer Science and Engineering, North Minzu University, Yinchuan 750021, China; 2. The Key Laboratory of Images & Graphics Intelligent Processing of State Ethnic Affairs Commission, North Minzu University, Yinchuan 750021, China
Keywords:
multimodal named entity recognition; Transformer; BiLSTM; multimodal fusion; multi-task learning
DOI:
10.13705/j.issn.1671-6833.2024.02.001
Document code:
A
Abstract:
To address problems in multimodal named entity recognition (MNER) research, such as insufficient semantics in text features, missing semantics in visual features, and the difficulty of fusing image and text features, a series of MNER methods have been proposed. First, the overall framework of MNER methods and the techniques commonly used in each part are summarized. The methods are then sorted into BiLSTM-based MNER methods and Transformer-based MNER methods and, according to model structure, further divided into four types: pre-fusion models, post-fusion models, Transformer single-task models, and Transformer multi-task models. Next, experiments on both classes of methods are carried out on two datasets, Twitter-2015 and Twitter-2017. The results show that multi-feature cooperative representation can enhance the semantics of each modality's features, and that multi-task learning can promote modal feature fusion or result fusion, thereby improving the accuracy of MNER. Finally, it is suggested that future MNER research focus on enhancing modal semantics through multi-feature cooperative representation and on promoting modal feature fusion or result fusion through multi-task learning.

Last Update: 2024-03-08