References:
[1] DAYAN P, DAW N D. Decision theory, reinforcement learning, and the brain[J]. Cognitive, Affective, & Behavioral Neuroscience, 2008, 8(4): 429-453.
[2] GUPTA N, AHIRWAL M K, ATULKAR M. Development of human decision making model with consideration of human factors through reinforcement learning and prospect utility theory[J]. Journal of Experimental & Theoretical Artificial Intelligence, 2024, 36(7): 1003-1019.
[3] SUTTON R S, BARTO A G. Reinforcement learning: an introduction[M]. 2nd ed. Cambridge, MA: The MIT Press, 2018.
[4] MILLER K J, VENDITTO S J C. Multi-step planning in the brain[J]. Current Opinion in Behavioral Sciences, 2021, 38: 29-39.
[5] DEHAENE S, SIGMAN M. From a single decision to a multi-step algorithm[J]. Current Opinion in Neurobiology, 2012, 22(6): 937-945.
[6] ZHANG Q Q. Research on hybrid intelligent method for man-machine sequential decision-making[D]. Hefei: University of Science and Technology of China, 2021. (in Chinese)
[7] MATTAR M G, THOMPSON-SCHILL S L, BASSETT D S. The network architecture of value learning[J]. Network Neuroscience, 2018, 2(2): 128-149.
[8] WANG D S, YANG K. Behavior decision-making cognitive model of mobile robot based on state transfer learning[J]. Journal of Zhengzhou University (Engineering Science), 2021, 42(6): 7-13. (in Chinese)
[9] PU M M. Initiate interdisciplinary brainstorming, promote cross-disciplinary integration in neuroscience[J]. Chinese Science Bulletin, 2023, 68(35): 4749-4750. (in Chinese)
[10] HUANG J, ZHANG Z, RUAN X. An improved Dyna-Q algorithm inspired by the forward prediction mechanism in the rat brain for mobile robot path planning[J]. Biomimetics, 2024, 9(6): 315.
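[11] RESCORLA R A, WAGNER A R. A theory of Pavlovian conditioning: variations in the effectiveness of reinforcement and nonreinforcement[M]//BLACK A H, PROKASY W F. Classical conditioning II: current research and theory. New York: Appleton-Century-Crofts, 1972: 64-99.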
[12] SUTTON R S. Learning to predict by the methods of temporal differences[J]. Machine Learning, 1988, 3(1): 9-44.
[13] LI L, LI Y Z, ZHANG Y J, et al. Deep deterministic policy gradient algorithm based on mean of multiple estimators[J]. Journal of Zhengzhou University (Engineering Science), 2022, 43(2): 15-21. (in Chinese)
[14] SHI L, TAO M Y, LI Z H. Dynamic modeling of internal cognitive status of pigeon in the process of reinforcement learning[J]. Science Technology and Engineering, 2017, 17(13): 120-125. (in Chinese)
[15] DAW N D, NIV Y, DAYAN P. Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control[J]. Nature Neuroscience, 2005, 8(12): 1704-1711.
[16] DOLL B B, DUNCAN K D, SIMON D A, et al. Model-based choices involve prospective neural activity[J]. Nature Neuroscience, 2015, 18(5): 767-772.
[17] MOMENNEJAD I. Learning structures: predictive representations, replay, and generalization[J]. Current Opinion in Behavioral Sciences, 2020, 32: 155-166.
[18] ESBER G R, SCHOENBAUM G, IORDANOVA M D. The Rescorla-Wagner model: it is not what you think it is[J]. Neurobiology of Learning and Memory, 2025, 217: 108021.
[19] YANG L F, JIN F L, YANG L, et al. The hippocampus in pigeons contributes to the model-based valuation and the relationship between temporal context states[J]. Animals, 2024, 14(3): 431.
[20] VENDITTO S J C, MILLER K J, BRODY C D, et al. Dynamic reinforcement learning reveals time-dependent shifts in strategy during reward learning[J]. bioRxiv, 2024: 2024.02.28.582617.