[1] ZHOU T Y, ZANG Y C, ZHU J H, et al. NIG-AP: a new method for automated penetration testing[J]. Frontiers of Information Technology & Electronic Engineering, 2019, 20(9): 1277-1288.
[2] HUANG W W, ZHENG X Y, ZHANG C Q, et al. Research on intelligent routing technology based on deep reinforcement learning[J]. Journal of Zhengzhou University (Engineering Science), 2023, 44(1): 44-51. (in Chinese)
[3] KRÖSE B J A. Learning from delayed rewards[J]. Robotics and Autonomous Systems, 1995, 15(4): 233-235.
[4] SRINIVASAN S, LANCTOT M, ZAMBALDI V, et al. Actor-critic policy optimization in partially observable multiagent environments[C]∥Proceedings of the 32nd International Conference on Neural Information Processing Systems. New York: ACM, 2018: 3426-3439.
[5] KANG Q M, ZHOU H Z, KANG Y F. An asynchronous advantage actor-critic reinforcement learning method for stock selection and portfolio management[C]∥Proceedings of the 2nd International Conference on Big Data Research. New York: ACM, 2018: 141-145.
[6] GHANEM M C, CHEN T M. Reinforcement learning for intelligent penetration testing[C]∥2018 Second World Conference on Smart Trends in Systems, Security and Sustainability (WorldS4). Piscataway: IEEE, 2018: 185-192.
[7] ZHOU S C, LIU J J, ZHONG X F, et al. Intelligent penetration testing path discovery based on deep reinforcement learning[J]. Computer Science, 2021, 48(7): 40-46. (in Chinese)
[8] KANG Q M, ZHOU H Z, KANG Y F. An asynchronous advantage actor-critic reinforcement learning method for stock selection and portfolio management[C]∥Proceedings of the 2nd International Conference on Big Data Research. New York: ACM, 2018: 141-145.
[9] HU Z G, BEURAN R, TAN Y S. Automated penetration testing using deep reinforcement learning[C]∥2020 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW). Piscataway: IEEE, 2020: 2-10.
[10] CHEN J Y, HU S L, ZHENG H B, et al. GAIL-PT: a generic intelligent penetration testing framework with generative adversarial imitation learning[EB/OL]. (2022-04-05)[2023-06-11]. https://arxiv.org/abs/2204.01975.pdf.
[11] SCHWARTZ J, KURNIAWATI H, EL-MAHASSNI E. POMDP + information-decay: incorporating defender's behaviour in autonomous penetration testing[J]. Proceedings of the International Conference on Automated Planning and Scheduling, 2020, 30: 235-243.
[12] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529-533.
[13] ZHOU S C, LIU J J, HOU D D, et al. Autonomous penetration testing based on improved deep Q-network[J]. Applied Sciences, 2021, 11(19): 8823.
[14] NGUYEN H, TEERAKANOK S, INOMATA A, et al. The proposal of double agent architecture using actor-critic algorithm for penetration testing[C]∥Proceedings of the 7th International Conference on Information Systems Security and Privacy. San Francisco: Science and Technology Publications, 2021: 440-449.
[15] JIANG Z Y, ZHANG T J, KIRK R, et al. Graph backup: data efficient backup exploiting Markovian transitions[EB/OL]. (2022-05-31)[2023-06-11]. https://arxiv.org/abs/2205.15824.pdf.
[16] WANG B C, SI H W, TAN G Z. Research on autopilot control algorithm based on deep reinforcement learning[J]. Journal of Zhengzhou University (Engineering Science), 2020, 41(4): 41-45, 80. (in Chinese)
[17] SUTTON R S, MCALLESTER D, SINGH S, et al. Policy gradient methods for reinforcement learning with function approximation[C]∥Proceedings of the 12th International Conference on Neural Information Processing Systems. New York: ACM, 1999: 1057-1063.
[18] YE D H, CHEN G B, ZHANG W, et al. Towards playing full MOBA games with deep reinforcement learning[C]∥Proceedings of the 34th International Conference on Neural Information Processing Systems. New York: ACM, 2020: 621-632.
[19] SILVER D, HUANG A, MADDISON C J, et al. Mastering the game of Go with deep neural networks and tree search[J]. Nature, 2016, 529(7587): 484-489.