[1] WANG Hairong, XU Xi, WANG Tong, et al. Research Progress of Multimodal Named Entity Recognition[J]. Journal of Zhengzhou University (Engineering Science), 2024, 45(02): 60-71. [doi:10.13705/j.issn.1671-6833.2024.02.001]

Research Progress of Multimodal Named Entity Recognition

Journal of Zhengzhou University (Engineering Science) [ISSN: 1671-6833 / CN: 41-1339/T]

Volume:
45
Issue:
2024, No. 02
Pages:
60-71
Column:
Publication date:
2024-03-06

Article Info

Title:
Research Progress of Multimodal Named Entity Recognition
Author(s):
WANG Hairong, XU Xi, WANG Tong, JING Boxiang
1. College of Computer Science and Engineering, North Minzu University, Yinchuan 750021, China; 2. The Key Laboratory of Images & Graphics Intelligent Processing of State Ethnic Affairs Commission, North Minzu University, Yinchuan 750021, China
Keywords:
multimodal named entity recognition; Transformer; BiLSTM; multimodal fusion; multi-task learning
DOI:
10.13705/j.issn.1671-6833.2024.02.001
Document code:
A
Abstract:
A series of multimodal named entity recognition (MNER) methods have been proposed to address problems in MNER research such as insufficient text feature semantics, missing visual feature semantics, and the difficulty of fusing image and text features. This paper first summarizes the overall framework of MNER methods and the techniques commonly used in each part. It then groups the methods into BiLSTM-based MNER methods and Transformer-based MNER methods and, according to model structure, divides them into four categories: pre-fusion models, post-fusion models, Transformer single-task models, and Transformer multi-task models. Next, experiments with both types of methods are carried out on two datasets, Twitter-2015 and Twitter-2017. The results show that multi-feature cooperative representation enhances the semantics of each modality's features, and that multi-task learning promotes modal feature fusion or result fusion, thereby improving the accuracy of MNER. Finally, the paper suggests that future MNER research focus on enhancing modal semantics through multi-feature cooperative representation and on promoting modal feature fusion or result fusion through multi-task learning.


Last Update: 2024-03-08