Deep Deterministic Policy Gradient Algorithm Based on Mean of Multiple Estimators
[1] LI Lin, LI Yuze, ZHANG Yujia, et al. Deep Deterministic Policy Gradient Algorithm Based on Mean of Multiple Estimators[J]. Journal of Zhengzhou University (Engineering Science), 2022, 43(02): 15-21. [doi:10.13705/j.issn.1671-6833.2022.02.013]
References:
[1] CHEN Xingguo, YU Yang. Reinforcement learning and its application in computer Go[J]. Acta Automatica Sinica, 2016, 42(5): 685-695. (in Chinese)
[2] ZHANG Kaifeng, YU Yang. An overview of learning from demonstration methods based on inverse reinforcement learning[J]. Journal of Computer Research and Development, 2019, 56(2): 254-261. (in Chinese)
[3] WANG Bingchen, SI Huaiwei, TAN Guozhen. Research on control algorithms for autonomous driving vehicles based on deep reinforcement learning[J]. Journal of Zhengzhou University (Engineering Science), 2020, 41(4): 41-45, 80. (in Chinese)
[4] BERTSEKAS D P. Dynamic programming and optimal control[M]. Nashua, NH: Athena Scientific, 1995.
[5] ANSCHEL O, BARAM N, SHIMKIN N, et al. Averaged-DQN: variance reduction and stabilization for deep reinforcement learning[C]//Proceedings of the 34th International Conference on Machine Learning. New York: ACM, 2017: 176-185.
[6] ALLEN C, ASADI K, RODERICK M, et al. Mean actor critic[EB/OL]. (2017-06-11)[2021-08-04]. https://arxiv.org/abs/1709.00503.
[7] NACHUM O, NOROUZI M, TUCKER G, et al. Smoothed action value functions for learning Gaussian policies[EB/OL]. (2018-10-11)[2021-08-04]. https://arxiv.org/abs/1803.02348.
[8] VAN HASSELT H. Double Q-learning[C]//Advances in Neural Information Processing Systems 23. Boston: MIT, 2010: 2613-2621.
[9] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529-533.
[10] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Playing Atari with deep reinforcement learning[EB/OL]. (2013-06-11)[2021-08-04]. http://export.arxiv.org/pdf/1312.5602.
[11] LI A, LU Z Q, MIAO C L. Revisiting prioritized experience replay: a value perspective[EB/OL]. (2021-03-11)[2021-08-04]. https://arxiv.org/abs/2102.03261.
[12] WANG Z, SCHAUL T, HESSEL M, et al. Dueling network architectures for deep reinforcement learning[C]//Proceedings of the 33rd International Conference on Machine Learning. New York: ACM, 2016: 1995-2003.
[13] WU Jinjin, LIU Quan, CHEN Song, et al. A weighted average deep double Q-network method[J]. Journal of Computer Research and Development, 2020, 57(3): 576-589. (in Chinese)
[14] VAN HASSELT H, GUEZ A, SILVER D. Deep reinforcement learning with double Q-learning[C]//Proceedings of the 30th AAAI Conference on Artificial Intelligence. Phoenix: AAAI, 2016: 2094-2100.
[15] PETERS J, SCHAAL S. Natural actor-critic[J]. Neurocomputing, 2008, 71(7/8/9): 1180-1190.
[16] LILLICRAP T P, HUNT J J, PRITZEL A, et al. Continuous control with deep reinforcement learning[EB/OL]. (2019-06-05)[2021-09-09]. https://arxiv.org/abs/1509.02971.
[17] SILVER D, LEVER G, HEESS N, et al. Deterministic policy gradient algorithms[C]//Proceedings of the 31st International Conference on Machine Learning. New York: ACM, 2014: 387-395.
[18] FUJIMOTO S, VAN HOOF H, MEGER D. Addressing function approximation error in actor-critic methods[EB/OL]. (2018-03-11)[2021-08-04]. https://arxiv.org/abs/1802.09477.
[19] LIU Quan, ZHAI Jianwei, ZHANG Zongchang, et al. A survey of deep reinforcement learning[J]. Chinese Journal of Computers, 2018, 41(1): 1-27. (in Chinese)
[20] SUTTON R S, MCALLESTER D, SINGH S, et al. Policy gradient methods for reinforcement learning with function approximation[C]//Advances in Neural Information Processing Systems 12. Boston: MIT, 2000: 1057-1063.