[1] LIU Na, WU Kedong, LIU Lei, et al. A Boundary Smoothing-based Method for Chinese Medical Nested Named Entity Recognition[J]. Journal of Zhengzhou University (Engineering Science), 2027, 48(XX): 1-8. [doi:10.13705/j.issn.1671-6833.2026.02.014]
Journal of Zhengzhou University (Engineering Science) [ISSN 1671-6833 / CN 41-1339/T]
Volume: 48
Issue: 2027 XX
Pages: 1-8
Publication date: 2027-12-10
- Title: A Boundary Smoothing-based Method for Chinese Medical Nested Named Entity Recognition
- Author(s): LIU Na 1,2, WU Kedong 1,2, LIU Lei 1,2, JI Zhe 1,2, ZHOU Xueyu 1,2
  1. College of Computer Science and Engineering, North Minzu University, Yinchuan 750021, China
  2. The Key Laboratory of Images and Graphics Intelligent Processing of State Ethnic Affairs Commission, North Minzu University, Yinchuan 750021, China
- Keywords: nested NER; Chinese medical text; boundary prediction; boundary smoothing; pre-trained language model
- CLC: TP391.1
- DOI: 10.13705/j.issn.1671-6833.2026.02.014
- Abstract:
Medical corpora commonly exhibit multi-level, multi-granularity semantics in which entities overlap and nest. Existing approaches tend to make overconfident boundary predictions and model boundary uncertainty inadequately, which hinders the representation of nested relations among entities; strengthening boundary prediction is therefore essential. This paper proposes a Chinese medical nested named entity recognition model based on boundary smoothing, together with an improved span-encoding strategy. The model uses RoBERTa-wwm-ext-large to obtain token-level representations and a BiLSTM to capture long-range dependencies. In the recognition layer, GlobalPointer locates start and end boundaries in a unified manner, Rotary Position Embedding (RoPE) explicitly encodes relative positional information, and a biaffine decoder strengthens head-tail interactions for span-level discrimination. During training, boundary-smoothing regularization assigns soft labels to each annotated span and its neighboring spans according to their distance, which suppresses hard-boundary noise and overconfidence and improves boundary calibration and recall. Experiments on CMeEE, CMeEE-V2, and CLUENER2020 show significant improvements in F1, indicating that the method mitigates boundary uncertainty and nested interference in Chinese medical text and generalizes well.
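
The boundary-smoothing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function name `smooth_span_labels`, the parameters `epsilon` and `max_dist`, the use of Manhattan distance between span boundaries, and the equal split of probability mass within each distance ring are all assumptions made for illustration, since the abstract only states that soft labels are assigned to annotated spans and their neighbors according to distance.

```python
def smooth_span_labels(seq_len, gold_spans, epsilon=0.2, max_dist=1):
    """Build soft span labels for one entity type (illustrative sketch).

    seq_len:    number of tokens in the sentence.
    gold_spans: iterable of (start, end) pairs, inclusive token indices.
    epsilon:    assumed total probability mass moved off each gold span.
    max_dist:   assumed largest Manhattan distance that receives mass.
    Returns a dict mapping (start, end) -> soft label in [0, 1].
    """
    soft = {}
    for gs, ge in gold_spans:
        # Group the valid neighbouring spans by their Manhattan distance
        # |start shift| + |end shift| from the gold span.
        rings = {d: [] for d in range(1, max_dist + 1)}
        for ds in range(-max_dist, max_dist + 1):
            for de in range(-max_dist, max_dist + 1):
                d = abs(ds) + abs(de)
                s, e = gs + ds, ge + de
                if 1 <= d <= max_dist and 0 <= s <= e < seq_len:
                    rings[d].append((s, e))

        moved = 0.0
        for spans in rings.values():
            if not spans:
                continue
            # Assumption: each ring gets epsilon / max_dist, split equally.
            share = epsilon / max_dist / len(spans)
            for span in spans:
                soft[span] = soft.get(span, 0.0) + share
            moved += epsilon / max_dist

        # The gold span keeps whatever mass was not redistributed.
        soft[(gs, ge)] = soft.get((gs, ge), 0.0) + 1.0 - moved

    # Nested gold spans can be neighbours of each other, so clip at 1.
    return {span: min(value, 1.0) for span, value in soft.items()}


labels = smooth_span_labels(seq_len=10, gold_spans=[(2, 4)])
```

With these assumed defaults, the gold span (2, 4) keeps a label of 0.8 and each of its four neighbors, (1, 4), (3, 4), (2, 3), and (2, 5), receives 0.05. Such soft targets could then replace the hard 0/1 span labels in the training loss, which is one way to discourage overconfident boundary scores.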