[1] SUTSKEVER I, VINYALS O, LE Q V. Sequence to sequence learning with neural networks[C]∥Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems. Cambridge, MA: MIT Press, 2014: 3104-3112.
[2] MIKOLOV T, KARAFIÁT M, BURGET L, et al. Recurrent neural network based language model[C]∥Interspeech 2010. Makuhari: ISCA, 2010: 1045-1048.
[3] XU H F, LIU Q H, VAN GENABITH J, et al. Multi-head highly parallelized LSTM decoder for neural machine translation[C]∥Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing. Stroudsburg: ACL, 2021: 273-282.
[4] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]∥Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems. Cambridge, MA: MIT Press, 2017: 5998-6008.
[5] LAMPLE G, CONNEAU A, DENOYER L, et al. Unsupervised machine translation using monolingual corpora only[EB/OL]. (2018-04-13)[2024-10-15]. https://arxiv.org/abs/1711.00043v2.
[6] LI J N, DU Y, DU L. Siamese network representation for active learning[C]∥2023 IEEE International Conference on Image Processing (ICIP). Piscataway: IEEE, 2023: 131-135.
[7] CONNEAU A, LAMPLE G. Cross-lingual language model pretraining[C]∥Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems. Cambridge, MA: MIT Press, 2019: 7057-7067.
[8] KOEHN P, KNOWLES R. Six challenges for neural machine translation[C]∥Proceedings of the First Workshop on Neural Machine Translation. Stroudsburg: ACL, 2017: 28-39.
[9] CONNEAU A, LAMPLE G, RANZATO M, et al. Word translation without parallel data[EB/OL]. (2017-10-11)[2024-10-15]. https://arxiv.org/abs/1710.04087.
[10] VINCENT P, LAROCHELLE H, BENGIO Y, et al. Extracting and composing robust features with denoising autoencoders[C]∥Proceedings of the 25th International Conference on Machine Learning. New York: ACM, 2008: 1096-1103.
[11] 孙海鹏, 赵铁军. 无监督神经机器翻译综述[J]. 智能计算机与应用, 2021, 11(2): 1-6.
SUN H P, ZHAO T J. A survey on unsupervised neural machine translation[J]. Intelligent Computer and Applications, 2021, 11(2): 1-6.
[12] PENNINGTON J, SOCHER R, MANNING C. GloVe: global vectors for word representation[C]∥Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Stroudsburg: ACL, 2014: 1532-1543.
[13] MELAMUD O, GOLDBERGER J, DAGAN I. Context2vec: learning generic context embedding with bidirectional LSTM[C]∥Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning. Stroudsburg: ACL, 2016: 51-61.
[14] DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[EB/OL]. (2019-05-24)[2024-10-15]. https://arxiv.org/abs/1810.04805v2.
[15] SONG K T, TAN X, QIN T, et al. MASS: masked sequence to sequence pre-training for language generation[EB/OL]. (2019-06-21)[2024-10-15]. https://arxiv.org/abs/1905.02450v5.
[16] CONNEAU A, KHANDELWAL K, GOYAL N, et al. Unsupervised cross-lingual representation learning at scale[C]∥Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2020: 8440-8451.
[17] LIU Y H, GU J T, GOYAL N, et al. Multilingual denoising pre-training for neural machine translation[J]. Transactions of the Association for Computational Linguistics, 2020, 8: 726-742.
[18] MA S M, DONG L, HUANG S H, et al. DeltaLM: encoder-decoder pre-training for language generation and translation by augmenting pretrained multilingual encoders[EB/OL]. (2021-08-18)[2024-10-15]. https://arxiv.org/abs/2106.13736v2.
[19] BROMLEY J, GUYON I, LECUN Y, et al. Signature verification using a siamese time delay neural network[C]∥Advances in Neural Information Processing Systems 6: Annual Conference on Neural Information Processing Systems. Cambridge, MA: MIT Press, 1993: 737-744.
[20] SENNRICH R, HADDOW B, BIRCH A. Neural machine translation of rare words with subword units[C]∥Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2016: 1715-1725.
[21] GEHRING J, AULI M, GRANGIER D, et al. Convolutional sequence to sequence learning[C]∥Proceedings of the 34th International Conference on Machine Learning - Volume 70. New York: ACM, 2017: 1243-1252.
[22] RADFORD A, NARASIMHAN K, SALIMANS T, et al. Improving language understanding by generative pre-training[R]. San Francisco: OpenAI, 2018.
[23] PAPINENI K, ROUKOS S, WARD T, et al. BLEU: a method for automatic evaluation of machine translation[C]∥Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2002: 311-318.
[24] NGUYEN X P, JOTY S R, WU K, et al. Cross-model back-translated distillation for unsupervised machine translation[EB/OL]. (2021-05-24)[2024-10-15]. https://arxiv.org/abs/2006.02163.
[25] HE Z W, WANG X, WANG R, et al. Bridging the data gap between training and inference for unsupervised neural machine translation[C]∥Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg: ACL, 2022: 6611-6623.
[26] BRIMACOMBE B, ZHOU J W. Quick back-translation for unsupervised machine translation[C]∥Findings of the Association for Computational Linguistics: EMNLP 2023. Stroudsburg: ACL, 2023: 8521-8534.