A Survey of Spoken Language Understanding Based on Few-shot Learning

Journal of Zhengzhou University (Engineering Science) [ISSN: 1671-6833 / CN: 41-1339/T]

Volume: 45
Issue: 2024, No. 01

Pages: 78-89

Publication date: 2024-01-19

Article Info

Title: A Survey of Spoken Language Understanding Based on Few-shot Learning

Author(s): LIU Na (刘纳)¹,²; ZHENG Guofeng (郑国风)¹,²; XU Zhenshun (徐贞顺)¹,²; LIN Lingde (林令德)¹,²; LI Chen (李晨)¹,²; YANG Jie (杨杰)¹,²
1. School of Computer Science and Engineering, North Minzu University, Yinchuan, Ningxia 750021, China
2. The Key Laboratory of Images and Graphics Intelligent Processing of State Ethnic Affairs Commission, North Minzu University, Yinchuan, Ningxia 750021, China

Keywords: spoken language understanding; few-shot learning; model fine-tuning; data augmentation; metric learning

DOI: 10.13705/j.issn.1671-6833.2024.01.012

Document code: A

Abstract: Few-shot spoken language understanding (SLU) is one of the pressing open problems in conversational artificial intelligence. This survey systematically reviews the literature on SLU in light of recent research in China and abroad. It first briefly introduces the classic approaches to SLU modeling in non-few-shot settings: independent (non-joint) modeling, implicit joint modeling, explicit joint modeling, and modeling based on the pre-training paradigm. It then focuses on three classes of methods proposed to cope with limited training samples in few-shot SLU, namely those based on model fine-tuning, data augmentation, and metric learning, and presents representative models such as ULMFiT, prototypical networks, and induction networks. On this basis, the methods are analyzed and compared in terms of semantic understanding ability, interpretability, generalization ability, and other properties. Finally, the challenges and future directions of SLU are discussed; zero-shot SLU, Chinese SLU, open-domain SLU, and cross-lingual SLU are identified as the main research difficulties in the field.

Last Update: 2024-01-24
