[1] LIU Na, WU Kedong, LIU Lei, et al. A Boundary Smoothing-based Method for Chinese Medical Nested Named Entity Recognition[J]. Journal of Zhengzhou University (Engineering Science), 2027, 48(XX): 1-8. [doi:10.13705/j.issn.1671-6833.2026.02.014]

A Boundary Smoothing-based Method for Chinese Medical Nested Named Entity Recognition

Journal of Zhengzhou University (Engineering Science) [ISSN: 1671-6833 / CN: 41-1339/T]

Volume:
48
Issue:
2027, XX
Pages:
1-8
Section:
Publication date:
2027-12-10

Article Info

Title:
A Boundary Smoothing-based Method for Chinese Medical Nested Named Entity Recognition
Author(s):
LIU Na 1,2, WU Kedong 1,2, LIU Lei 1,2, JI Zhe 1,2, ZHOU Xueyu 1,2
1. College of Computer Science and Engineering, North Minzu University, Yinchuan 750021, China; 2. The Key Laboratory of Images and Graphics Intelligent Processing of State Ethnic Affairs Commission, North Minzu University, Yinchuan 750021, China
Keywords:
nested NER; Chinese medical text; boundary prediction; boundary smoothing; pre-trained language model
CLC number:
TP391.1
DOI:
10.13705/j.issn.1671-6833.2026.02.014
Document code:
A
Abstract:
Chinese medical corpora commonly contain multi-level, multi-granularity semantics and overlapping entities. Existing methods tend to make overconfident boundary predictions and model boundary uncertainty inadequately, so they struggle to capture nesting relations among entities. Improving boundary prediction is therefore key to the problem. This paper proposes a Chinese medical nested named entity recognition model based on boundary smoothing, combined with an improved span-encoding strategy. The model uses RoBERTa-wwm-ext-large to obtain token-level semantic representations and a BiLSTM to capture long-range dependencies. In the recognition layer, GlobalPointer locates entity start and end boundaries in a unified way, Rotary Position Embedding (RoPE) explicitly encodes relative position information, and a biaffine decoder strengthens head-tail interaction for span-level classification. During training, boundary-smoothing regularization assigns soft labels to each annotated span and its neighboring spans according to distance, which suppresses hard-boundary noise and overconfidence and improves boundary calibration and recall. Experiments show significant F1 improvements on the CMeEE, CMeEE-V2, and CLUENER2020 datasets, indicating that the method mitigates boundary uncertainty and nesting interference in Chinese medical text, with strong accuracy and generalization.
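
The boundary-smoothing step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `smooth_span_labels`, the parameters `epsilon` and `max_dist`, the use of Manhattan distance between span boundaries, and the even split of `epsilon / max_dist` across each distance ring are all assumptions, since the abstract only states that soft labels are assigned to annotated spans and their neighbors according to distance.

```python
from typing import Dict, Tuple

Span = Tuple[int, int]  # (start, end), inclusive token indices


def smooth_span_labels(gold: Span, seq_len: int, epsilon: float = 0.2,
                       max_dist: int = 1) -> Dict[Span, float]:
    """Soft targets for one gold span of one entity type (hypothetical helper).

    Assumption: a total mass of `epsilon` is moved off the gold span and split
    evenly over `max_dist` rings; ring d holds the valid spans whose start and
    end offsets from the gold span sum to Manhattan distance d.
    """
    start, end = gold
    if not (0 <= start <= end < seq_len):
        raise ValueError("gold span is outside the sequence")

    targets: Dict[Span, float] = {gold: 1.0}
    if max_dist < 1 or epsilon <= 0.0:
        return targets  # smoothing disabled: plain hard label

    per_ring = epsilon / max_dist  # mass budget for each distance ring
    for dist in range(1, max_dist + 1):
        ring = []
        for ds in range(-dist, dist + 1):
            rem = dist - abs(ds)
            for de in {rem, -rem}:  # set removes the duplicate when rem == 0
                i, j = start + ds, end + de
                if 0 <= i <= j < seq_len:  # keep only well-formed spans
                    ring.append((i, j))
        if not ring:
            continue  # assumption: with no valid neighbor, mass stays on gold
        share = per_ring / len(ring)
        for span in ring:
            targets[span] = targets.get(span, 0.0) + share
        targets[gold] -= per_ring
    return targets


if __name__ == "__main__":
    # Gold entity covering tokens 2..3 in a 6-token sentence.
    for span, prob in sorted(smooth_span_labels((2, 3), seq_len=6).items()):
        print(span, round(prob, 3))
```

The resulting dictionary could be written into a span-score target matrix and trained against with a soft-label cross-entropy; spans absent from the dictionary keep a target of zero. Keeping the mass on the gold span when a ring has no valid neighbor is one possible design choice that keeps the targets summing to one near sentence edges.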


Memo:
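Received: 2025-12-09; Revised: 2026-02-07
Funding: National Natural Science Foundation of China (62162001); Ningxia Key Research and Development Program, Talent Introduction Special Project (2024BEH04020); North Minzu University University-level Research Project (2024XYZJK01); North Minzu University Graduate Innovation Project (YCX24361)
Author biography: LIU Na (born 1986), female, from Yinchuan, Ningxia; lecturer at North Minzu University, Ph.D.; research interests include data mining and natural language processing. E-mail: liuna@nun.edu.cn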
Last Update: 2026-04-03