GENG Xuelian, SONG Mingyang, FENG Yi, et al. Dynamic Contrastive Representation Enhancement Approach for Keyphrase Prediction[J]. Journal of Zhengzhou University (Engineering Science), 2025, 46(3): 128-135. [doi:10.13705/j.issn.1671-6833.2025.03.004]

Dynamic Contrastive Representation Enhancement Approach for Keyphrase Prediction

Journal of Zhengzhou University (Engineering Science) [ISSN: 1671-6833 / CN: 41-1339/T]

Volume:
46
Issue:
2025, No. 03
Pages:
128-135
Publication date:
2025-05-13

Article Info

Title:
Dynamic Contrastive Representation Enhancement Approach for Keyphrase Prediction
Article number:
1671-6833(2025)03-0128-08
Author(s):
GENG Xuelian (1, 2); SONG Mingyang (1, 2); FENG Yi (1, 2); JING Liping (1, 2); YU Jian (1, 2)
1. School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China
2. Beijing Key Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing 100044, China
Keywords:
natural language processing; keyphrase prediction; multi-objective optimization; contrastive learning; embedding representation
CLC number:
TP391
DOI:
10.13705/j.issn.1671-6833.2025.03.004
Document code:
A
Abstract:
Keyphrase prediction methods often fail to fully exploit the complex hierarchical and semantic information in text structure. To address this issue, a keyphrase prediction method with enhanced semantic representation, called adaptive contrastive learning for keyphrase prediction (ACL-KP), was proposed, which uses dynamic contrastive learning to improve keyphrase prediction. The method first introduced an adaptive weighting mechanism that dynamically adjusted sample weights, addressing the difficulty of distinguishing true samples from noise samples during contrastive learning; this reduced the impact of misidentified noise samples and optimized the representation space. In addition, to increase the diversity of the training data, Gaussian white noise was added to automatically generate challenging virtual samples, thereby enhancing the semantic representations of documents and keyphrases. Experimental results on multiple public keyphrase prediction datasets showed that the model improved F1@5 and F1@M by 2% to 17% over current state-of-the-art models, and that it showed a more pronounced performance advantage over sequence-to-sequence models and unified models.
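The abstract names two ingredients: adaptive per-sample weights inside a contrastive loss, and Gaussian white noise used to create virtual samples. The pure-Python sketch below shows one way these ideas can fit together in a weighted InfoNCE loss. It is an illustration under stated assumptions, not the ACL-KP implementation. Several choices are hypothetical: the weighting rule (negatives whose cosine similarity to the anchor exceeds a margin are treated as likely noise and down-weighted), the decision to perturb the negatives to obtain virtual samples, and every function name, parameter name and default value (sigma, margin, alpha, temperature, seed).

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-12)


def gaussian_virtual_samples(samples: Sequence[Sequence[float]], sigma: float,
                             rng: random.Random) -> List[Vector]:
    """Hypothetical: one Gaussian-perturbed copy of each embedding."""
    return [[x + rng.gauss(0.0, sigma) for x in s] for s in samples]


def adaptive_weights(anchor: Sequence[float], negatives: Sequence[Sequence[float]],
                     margin: float, alpha: float) -> List[float]:
    """Hypothetical rule: negatives more similar to the anchor than `margin`
    are treated as possible false negatives (noise) and get weight < 1;
    all other negatives keep weight 1."""
    return [math.exp(-alpha * max(0.0, cosine(anchor, n) - margin))
            for n in negatives]


def weighted_info_nce(anchor, positive, negatives, weights,
                      temperature: float = 0.1) -> float:
    """InfoNCE loss in which each negative term is scaled by its weight."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(w * math.exp(cosine(anchor, n) / temperature)
              for w, n in zip(weights, negatives))
    return -math.log(pos / (pos + neg))


def acl_style_loss(anchor, positive, negatives, sigma: float = 0.05,
                   margin: float = 0.8, alpha: float = 5.0,
                   temperature: float = 0.1, seed: int = 0) -> float:
    """Sketch: add noisy virtual negatives, weight all negatives, apply InfoNCE."""
    rng = random.Random(seed)
    all_negs = list(negatives) + gaussian_virtual_samples(negatives, sigma, rng)
    weights = adaptive_weights(anchor, all_negs, margin, alpha)
    return weighted_info_nce(anchor, positive, all_negs, weights, temperature)


if __name__ == "__main__":
    doc = [0.9, 0.1, 0.0]   # toy document embedding (anchor)
    kp = [0.8, 0.2, 0.1]    # toy embedding of a gold keyphrase (positive)
    negs = [[0.0, 1.0, 0.0], [0.85, 0.15, 0.05], [0.0, 0.0, 1.0]]
    print(acl_style_loss(doc, kp, negs))
```

In this toy setup the second negative lies almost on top of the document embedding, so the weighting rule shrinks its contribution to the denominator. That lowers the loss compared with plain InfoNCE and limits the penalty from a negative that may actually be a mislabeled positive. Because the noise is drawn from a seeded `random.Random`, the loss is reproducible for a fixed seed.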

Last Update: 2025-05-22