Automated Penetration Testing Method Based on Deep Reinforcement Learning NoisyNet-A3C Algorithm

[1] DONG Weiyu, LIU Pengkun, LIU Chunling, et al. Automated Penetration Testing Method Based on Deep Reinforcement Learning NoisyNet-A3C Algorithm[J]. Journal of Zhengzhou University (Engineering Science), 2025, 46(05): 60-68. [doi:10.13705/j.issn.1671-6833.2024.02.011]

Journal of Zhengzhou University (Engineering Science) [ISSN 1671-6833/CN 41-1339/T] Volume: 46 Issue: 2025(05) Pages: 60-68 Publication date: 2025-08-10

Title: Automated Penetration Testing Method Based on Deep Reinforcement Learning NoisyNet-A3C Algorithm

Author(s): DONG Weiyu¹; LIU Pengkun²; LIU Chunling¹; TANG Yonghe¹; MA Yupu²
1. School of Network and Cybersecurity, Information Engineering University, Zhengzhou 450001, China; 2. School of Cyber Science and Engineering, Zhengzhou University, Zhengzhou 450001, China

Keywords: penetration testing; attack path decision; A3C algorithm; deep reinforcement learning; Metasploit

CLC: TP181

DOI: 10.13705/j.issn.1671-6833.2024.02.011

Abstract: In automated penetration testing, most existing attack path decision algorithms are based on partially observable Markov decision processes (POMDP) and suffer from high algorithmic complexity, slow convergence, and a tendency to get stuck in local optima. This study proposes NoisyNet-A3C, a reinforcement learning algorithm based on the Markov decision process (MDP), and applies it to automated penetration testing. The algorithm trains actor-critic agents in multiple threads. Each thread feeds its results back to the main neural network and fetches the latest parameters from it, which makes full use of the available computing resources, reduces data correlation, and improves training efficiency. In addition, noise parameters are added to the training network and updated during training together with the network weights. This increases the randomness of the behavior policy, helps the agent find effective paths faster, reduces the impact of data disturbances, and makes the algorithm more robust. In the experiments, NoisyNet-A3C converged more than 30% faster than the A3C, Q-learning, DQN, and NDSPI-DQN algorithms.
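The noise mechanism mentioned in the abstract can be illustrated with a minimal sketch of a noisy linear layer. The abstract does not give the layer's exact form, so this follows the general NoisyNet idea of perturbing each weight as mu + sigma * eps with independent Gaussian noise; the class name `NoisyLinear`, the `sigma_init` value, and the initialization bounds are assumptions made for illustration, not the authors' implementation.

```python
import random


class NoisyLinear:
    """Linear layer whose weights are perturbed by Gaussian noise.

    Effective weight: w = mu + sigma * eps, with eps ~ N(0, 1).
    In a real network mu and sigma would both be trained by gradient
    descent; this sketch only shows the forward pass. All names and
    initial values are illustrative assumptions.
    """

    def __init__(self, n_in, n_out, sigma_init=0.5, seed=0):
        self.n_in, self.n_out = n_in, n_out
        self.rng = random.Random(seed)
        bound = 1.0 / n_in ** 0.5
        # Mean (mu) and noise scale (sigma) for every weight and bias.
        self.w_mu = [[self.rng.uniform(-bound, bound) for _ in range(n_in)]
                     for _ in range(n_out)]
        self.w_sigma = [[sigma_init * bound] * n_in for _ in range(n_out)]
        self.b_mu = [0.0] * n_out
        self.b_sigma = [sigma_init * bound] * n_out
        self.reset_noise()

    def reset_noise(self):
        """Draw a fresh noise sample, e.g. once per rollout or update."""
        self.w_eps = [[self.rng.gauss(0.0, 1.0) for _ in range(self.n_in)]
                      for _ in range(self.n_out)]
        self.b_eps = [self.rng.gauss(0.0, 1.0) for _ in range(self.n_out)]

    def forward(self, x, noisy=True):
        """Return the layer output; noisy=False uses the mean weights only."""
        k = 1.0 if noisy else 0.0
        out = []
        for j in range(self.n_out):
            acc = self.b_mu[j] + k * self.b_sigma[j] * self.b_eps[j]
            for i in range(self.n_in):
                w = self.w_mu[j][i] + k * self.w_sigma[j][i] * self.w_eps[j][i]
                acc += w * x[i]
            out.append(acc)
        return out


layer = NoisyLinear(n_in=3, n_out=2)
state = [1.0, 2.0, 3.0]
first = layer.forward(state)
layer.reset_noise()
second = layer.forward(state)  # differs from `first`: new noise sample
mean_only = layer.forward(state, noisy=False)
```

Because the perturbation sits in the weights rather than in the action selection, exploration comes from the network itself, and the noise scale can shrink during training as the sigma values are learned. In an A3C-style setup, each worker thread would presumably resample the noise for its own copy of the network before a rollout, though the abstract does not spell this out.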


Last Update: 2025-09-19