Research Progress of Multimodal Named Entity Recognition

[1] WANG Hairong, XU Xi, WANG Tong, et al. Research Progress of Multimodal Named Entity Recognition[J]. Journal of Zhengzhou University (Engineering Science), 2024, 45(02): 60-71. [doi:10.13705/j.issn.1671-6833.2024.02.001]

Journal of Zhengzhou University (Engineering Science) [ISSN 1671-6833/CN 41-1339/T] Volume: 45 Issue: 02 (2024) Pages: 60-71 Publication date: 2024-03-06

Title:: Research Progress of Multimodal Named Entity Recognition

Author(s):: WANG Hairong; XU Xi; WANG Tong; JING Boxiang; 1. College of Computer Science and Engineering, North Minzu University, Yinchuan 750021, China; 2. The Key Laboratory of Images & Graphics Intelligent Processing of State Ethnic Affairs Commission, North Minzu University, Yinchuan 750021, China

Keywords:: multimodal named entity recognition; Transformer; BiLSTM; multimodal fusion; multi-task learning

CLC:: -

DOI:: 10.13705/j.issn.1671-6833.2024.02.001

Abstract:: A series of multimodal named entity recognition (MNER) methods have been proposed to address problems in the field such as insufficient semantics in text features, insufficient semantics in visual features, and the difficulty of fusing image and text features. First, the overall framework of MNER methods and the technologies commonly used in each of its parts were reviewed, and the methods were classified into BiLSTM-based MNER methods and Transformer-based MNER methods. According to model structure, they were further divided into four types: pre-fusion models, post-fusion models, Transformer single-task models, and Transformer multi-task models. Experiments were then carried out for both types of methods on two datasets, Twitter-2015 and Twitter-2017. The experimental results showed that multi-feature cooperative representation could enhance the semantics of each modality's features, and that multi-task learning could promote modal feature fusion or result fusion, thereby improving the accuracy of MNER. Finally, it was suggested that future MNER research focus on enhancing modal semantics through multi-feature cooperative representation and on promoting modal feature fusion or result fusion through multi-task learning.
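
The abstract names pre-fusion and post-fusion models without defining them. The sketch below is one possible reading, not the authors' implementation: it assumes that pre-fusion joins the image features to the token features before the sequence encoder, and that post-fusion joins them after it. The names `toy_encoder`, `pre_fusion`, and `post_fusion` are hypothetical, and the encoder is a simple smoothing function standing in for a BiLSTM.

```python
import math
from typing import List

Vector = List[float]


def toy_encoder(seq: List[Vector]) -> List[Vector]:
    """Hypothetical stand-in for a BiLSTM (assumption, not a real BiLSTM).

    Each output position is tanh of the element-wise mean over the token
    and its immediate left and right neighbours.
    """
    encoded = []
    for i in range(len(seq)):
        window = seq[max(0, i - 1): i + 2]
        dim = len(seq[i])
        encoded.append(
            [math.tanh(sum(v[d] for v in window) / len(window)) for d in range(dim)]
        )
    return encoded


def pre_fusion(tokens: List[Vector], image: Vector) -> List[Vector]:
    """Assumed pre-fusion: concatenate image features to every token, then encode."""
    fused = [tok + image for tok in tokens]
    return toy_encoder(fused)


def post_fusion(tokens: List[Vector], image: Vector) -> List[Vector]:
    """Assumed post-fusion: encode the text alone, then concatenate image features."""
    encoded = toy_encoder(tokens)
    return [hidden + image for hidden in encoded]


tokens = [[1.0, 0.0], [0.0, 1.0]]  # two toy token embeddings
image = [0.5]                      # one toy image feature

pre = pre_fusion(tokens, image)
post = post_fusion(tokens, image)
```

In this toy setting both variants return one three-dimensional vector per token. The difference is that under pre-fusion the image feature passes through the encoder, while under post-fusion it is appended unchanged. A real BiLSTM would also mix the dimensions, so the text representation itself would depend on the image under pre-fusion.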

Last Update: 2024-03-08