[1] SUTSKEVER I, VINYALS O, LE Q V. Sequence to sequence learning with neural networks[C]∥Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems. Cambridge, MA: MIT Press, 2014: 3104-3112.
[2] MIKOLOV T, KARAFIÁT M, BURGET L, et al. Recurrent neural network based language model[C]∥Interspeech 2010. Makuhari: ISCA, 2010: 1045-1048.
[3] XU H F, LIU Q H, VAN GENABITH J, et al. Multi-head highly parallelized LSTM decoder for neural machine translation[C]∥Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing. Stroudsburg: ACL, 2021: 273-282.
[4] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]∥Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems. Cambridge, MA: MIT Press, 2017: 5998-6008.
[5] LAMPLE G, CONNEAU A, DENOYER L, et al. Unsupervised machine translation using monolingual corpora only[EB/OL]. (2018-04-13)[2024-10-15]. https://arxiv.org/abs/1711.00043v2.
[6] LI J N, DU Y, DU L. Siamese network representation for active learning[C]∥2023 IEEE International Conference on Image Processing (ICIP). Piscataway: IEEE, 2023: 131-135.
[7] CONNEAU A, LAMPLE G. Cross-lingual language model pretraining[C]∥Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems. Cambridge, MA: MIT Press, 2019: 7057-7067.
[8] KOEHN P, KNOWLES R. Six challenges for neural machine translation[C]∥Proceedings of the First Workshop on Neural Machine Translation. Stroudsburg: ACL, 2017: 28-39.
[9] CONNEAU A, LAMPLE G, RANZATO M, et al. Word translation without parallel data[EB/OL]. (2017-10-11)[2024-10-15]. https://arxiv.org/abs/1710.04087.
[10] VINCENT P, LAROCHELLE H, BENGIO Y, et al. Extracting and composing robust features with denoising autoencoders[C]∥Proceedings of the 25th International Conference on Machine Learning. New York: ACM, 2008: 1096-1103.
[11] 孙海鹏, 赵铁军. 无监督神经机器翻译综述[J]. 智能计算机与应用, 2021, 11(2): 1-6.
SUN H P, ZHAO T J. A survey on unsupervised neural machine translation[J]. Intelligent Computer and Applications, 2021, 11(2): 1-6.
[12] PENNINGTON J, SOCHER R, MANNING C. GloVe: global vectors for word representation[C]∥Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Stroudsburg: ACL, 2014: 1532-1543.
[13] MELAMUD O, GOLDBERGER J, DAGAN I. Context2vec: learning generic context embedding with bidirectional LSTM[C]∥Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning. Stroudsburg: ACL, 2016: 51-61.
[14] DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[EB/OL]. (2019-05-24)[2024-10-15]. https://arxiv.org/abs/1810.04805v2.
[15] SONG K T, TAN X, QIN T, et al. MASS: masked sequence to sequence pre-training for language generation[EB/OL]. (2019-06-21)[2024-10-15]. https://arxiv.org/abs/1905.02450v5.
[16] CONNEAU A, KHANDELWAL K, GOYAL N, et al. Unsupervised cross-lingual representation learning at scale[C]∥Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2020: 8440-8451.
[17] LIU Y H, GU J T, GOYAL N, et al. Multilingual denoising pre-training for neural machine translation[J]. Transactions of the Association for Computational Linguistics, 2020, 8: 726-742.
[18] MA S M, DONG L, HUANG S H, et al. DeltaLM: encoder-decoder pre-training for language generation and translation by augmenting pretrained multilingual encoders[EB/OL]. (2021-08-18)[2024-10-15]. https://arxiv.org/abs/2106.13736v2.
[19] BROMLEY J, GUYON I, LECUN Y, et al. Signature verification using a siamese time delay neural network[C]∥Advances in Neural Information Processing Systems 6: Annual Conference on Neural Information Processing Systems. Cambridge, MA: MIT Press, 1993: 737-744.
[20] SENNRICH R, HADDOW B, BIRCH A. Neural machine translation of rare words with subword units[C]∥Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2016: 1715-1725.
[21] GEHRING J, AULI M, GRANGIER D, et al. Convolutional sequence to sequence learning[C]∥Proceedings of the 34th International Conference on Machine Learning - Volume 70. New York: ACM, 2017: 1243-1252.
[22] RADFORD A, NARASIMHAN K, SALIMANS T, et al. Improving language understanding by generative pre-training[R]. San Francisco: OpenAI, 2018.
[23] PAPINENI K, ROUKOS S, WARD T, et al. BLEU: a method for automatic evaluation of machine translation[C]∥Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2002: 311-318.
[24] NGUYEN X P, JOTY S R, WU K, et al. Cross-model back-translated distillation for unsupervised machine translation[EB/OL]. (2021-05-24)[2024-10-15]. https://arxiv.org/abs/2006.02163.
[25] HE Z W, WANG X, WANG R, et al. Bridging the data gap between training and inference for unsupervised neural machine translation[C]∥Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg: ACL, 2022: 6611-6623.
[26] BRIMACOMBE B, ZHOU J W. Quick back-translation for unsupervised machine translation[C]∥Findings of the Association for Computational Linguistics: EMNLP 2023. Stroudsburg: ACL, 2023: 8521-8534.