[1] DONG Weiyu, LIU Pengkun, LIU Chunling, et al. Automated penetration testing method based on deep reinforcement learning Noisy Net-A3C algorithm[J]. Journal of Zhengzhou University (Engineering Science), 2024, 45(pre): 2. [doi:10.13705/j.issn.1671-6833.2024.02.011]

Automated penetration testing method based on deep reinforcement learning Noisy Net-A3C algorithm
DONG Weiyu¹, LIU Pengkun², LIU Chunling¹, TANG Yonghe¹, MA Yupu²
(1. School of Network and Cybersecurity, Information Engineering University, Zhengzhou 450001, China; 2. School of Network and Cybersecurity, Zhengzhou University, Zhengzhou 450001, China)
Keywords: penetration testing; attack path decision-making; A3C algorithm; deep reinforcement learning; Metasploit
In automated penetration testing, most existing attack path decision algorithms are based on partially observable Markov decision processes (POMDP) and suffer from high algorithmic complexity, slow convergence, and a tendency to get stuck in local optima. This paper proposes NoisyNet-A3C, a reinforcement learning algorithm based on the Markov decision process (MDP), and applies it to automated penetration testing. The algorithm trains actor-critic networks in multiple threads: each thread feeds its results back to the main neural network and obtains the latest parameter updates from it, which makes full use of the available computing resources, reduces data correlation, and improves training efficiency. In addition, noise parameters and weight-network training update parameters are added to the training network, which increases the randomness of the behavior policy, helps the agent explore effective paths more quickly, reduces the impact of data disturbances, and improves the robustness of the algorithm. Experimental results show that NoisyNet-A3C converges more than 30% faster than the A3C, Q-learning, DQN, and NDSPI-DQN algorithms.
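The noise-injection idea in the abstract can be illustrated with a minimal sketch. It assumes a linear layer in which every weight and bias has a mean and a noise scale, with fresh Gaussian noise drawn on each forward pass. The class name `NoisyLinear`, the initial values, and the pure-Python layout are hypothetical choices made for illustration; they are not taken from the paper and do not reproduce the authors' implementation.

```python
import math
import random


class NoisyLinear:
    """Illustrative linear layer with per-weight Gaussian noise.

    Output: y_j = sum_i (w_mu[j][i] + w_sigma[j][i] * eps) * x_i
                  + (b_mu[j] + b_sigma[j] * eps)
    where each eps is an independent standard normal sample.
    """

    def __init__(self, n_in, n_out, sigma_init=0.017, rng=None):
        # sigma_init and the uniform bound are assumed values for this sketch.
        self.rng = rng if rng is not None else random.Random()
        bound = math.sqrt(3.0 / n_in)
        self.n_in, self.n_out = n_in, n_out
        self.w_mu = [[self.rng.uniform(-bound, bound) for _ in range(n_in)]
                     for _ in range(n_out)]
        self.w_sigma = [[sigma_init] * n_in for _ in range(n_out)]
        self.b_mu = [self.rng.uniform(-bound, bound) for _ in range(n_out)]
        self.b_sigma = [sigma_init] * n_out

    def forward(self, x, noisy=True):
        """Return the layer output; with noisy=False only the means are used."""
        out = []
        for j in range(self.n_out):
            eps_b = self.rng.gauss(0.0, 1.0) if noisy else 0.0
            total = self.b_mu[j] + self.b_sigma[j] * eps_b
            for i in range(self.n_in):
                eps_w = self.rng.gauss(0.0, 1.0) if noisy else 0.0
                total += (self.w_mu[j][i] + self.w_sigma[j][i] * eps_w) * x[i]
            out.append(total)
        return out


if __name__ == "__main__":
    layer = NoisyLinear(n_in=3, n_out=2, rng=random.Random(0))
    state = [1.0, 0.5, -0.5]
    print(layer.forward(state, noisy=False))  # deterministic mean output
    print(layer.forward(state, noisy=True))   # perturbed output, varies per call
```

In a full agent, the mean and noise-scale values would both be trained by gradient descent, so the amount of exploration is learned rather than fixed by a hand-tuned schedule. That training loop is omitted here.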




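Received: 2023-03-15; Revised: [not given]
Funding: National Key Research and Development Program of China project "Network Identity Association Analysis and Profiling Technology" (2018YFB0804503); Henan Province Key Research and Development Project "Security Risk Awareness Technology and Platform for Networked Xinchuang (IT Application Innovation) Systems" (221111210300)
Corresponding author: LIU Chunling (born 1981), female, from Hua County, Henan; lecturer at Information Engineering University; research interests: vulnerability discovery and application. E-mail: lcl_506@163.com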
Last Update: 2024-10-24