DONG Weiyu, LIU Pengkun, LIU Chunling, et al. Automated Penetration Testing Method Based on Deep Reinforcement Learning NoisyNet-A3C Algorithm[J]. Journal of Zhengzhou University (Engineering Science), 2025, 46(05): 60-68. [doi:10.13705/j.issn.1671-6833.2024.02.011]

Automated Penetration Testing Method Based on Deep Reinforcement Learning NoisyNet-A3C Algorithm

Journal of Zhengzhou University (Engineering Science) [ISSN: 1671-6833 / CN: 41-1339/T]

Volume:
46
Issue:
2025, No. 05
Pages:
60-68
Publication Date:
2025-08-10

Article Information

Title:
Automated Penetration Testing Method Based on Deep Reinforcement Learning NoisyNet-A3C Algorithm
Article ID:
1671-6833(2025)05-0060-09
Author(s):
DONG Weiyu (董卫宇)1, LIU Pengkun (刘鹏坤)2, LIU Chunling (刘春玲)1, TANG Yonghe (唐永鹤)1, MA Yupu (马钰普)2
1. School of Network and Cybersecurity, Information Engineering University, Zhengzhou 450001, China; 2. School of Cyber Science and Engineering, Zhengzhou University, Zhengzhou 450001, China
Keywords:
penetration testing; attack path decision; A3C algorithm; deep reinforcement learning; Metasploit
CLC Number:
TP181
DOI:
10.13705/j.issn.1671-6833.2024.02.011
Document Code:
A
Abstract:
In the field of automated penetration testing, most existing attack path decision algorithms are based on the partially observable Markov decision process (POMDP) and suffer from high algorithmic complexity, slow convergence, and a tendency to fall into local optima. To address these problems, a reinforcement learning algorithm, NoisyNet-A3C, based on the Markov decision process (MDP) was proposed and applied to automated penetration testing. The algorithm trained the actor-critic networks in multiple threads: each thread fed its computation results back to a main neural network and retrieved the latest parameters from it, making full use of computing resources, reducing data correlation, and improving training efficiency. In addition, noise parameters were added to the training network and the parameters were updated through a weight network, which increased the randomness of the behavior policy, enabled faster exploration of effective paths, reduced the impact of data perturbations, and thus strengthened the robustness of the algorithm. Experimental results showed that, compared with the A3C, Q-learning, DQN, and NDSPI-DQN algorithms, NoisyNet-A3C converged more than 30% faster, verifying the faster convergence of the proposed algorithm.
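To make the two mechanisms in the abstract concrete, the following is a minimal, hypothetical PyTorch sketch (not the authors' implementation): a noisy linear layer that adds learnable factorized Gaussian noise to its weights, and an A3C-style worker update in which a local network copy pushes its gradients to a shared global actor-critic network and then pulls back the latest parameters. All names (NoisyLinear, ActorCritic, worker_update) and hyperparameters (sigma0=0.5, hidden=128, the 0.5 value-loss weight) are illustrative assumptions, not taken from the paper.

```python
# Hypothetical sketch of the two ingredients described in the abstract:
# a NoisyNet-style linear layer (learnable factorized Gaussian weight noise)
# and an A3C-style worker update against a shared global actor-critic network.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


class NoisyLinear(nn.Module):
    """Linear layer whose weights and biases carry learnable Gaussian noise."""

    def __init__(self, in_features, out_features, sigma0=0.5):
        super().__init__()
        self.in_features, self.out_features = in_features, out_features
        self.weight_mu = nn.Parameter(torch.empty(out_features, in_features))
        self.weight_sigma = nn.Parameter(torch.empty(out_features, in_features))
        self.bias_mu = nn.Parameter(torch.empty(out_features))
        self.bias_sigma = nn.Parameter(torch.empty(out_features))
        self.register_buffer("eps_in", torch.zeros(in_features))
        self.register_buffer("eps_out", torch.zeros(out_features))
        bound = 1.0 / math.sqrt(in_features)
        nn.init.uniform_(self.weight_mu, -bound, bound)
        nn.init.uniform_(self.bias_mu, -bound, bound)
        nn.init.constant_(self.weight_sigma, sigma0 / math.sqrt(in_features))
        nn.init.constant_(self.bias_sigma, sigma0 / math.sqrt(in_features))
        self.reset_noise()

    @staticmethod
    def _scale(x):
        # f(x) = sign(x) * sqrt(|x|), the factorized-noise transform
        return x.sign() * x.abs().sqrt()

    def reset_noise(self):
        # Resample the factorized noise; called once per rollout/update
        self.eps_in.copy_(self._scale(torch.randn(self.in_features)))
        self.eps_out.copy_(self._scale(torch.randn(self.out_features)))

    def forward(self, x):
        weight = self.weight_mu + self.weight_sigma * torch.outer(self.eps_out, self.eps_in)
        bias = self.bias_mu + self.bias_sigma * self.eps_out
        return F.linear(x, weight, bias)


class ActorCritic(nn.Module):
    """Shared trunk with a noisy policy head (actor) and a value head (critic)."""

    def __init__(self, n_states, n_actions, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(n_states, hidden), nn.ReLU())
        self.policy = NoisyLinear(hidden, n_actions)  # action logits
        self.value = nn.Linear(hidden, 1)             # state-value estimate

    def forward(self, state):
        h = self.trunk(state)
        return self.policy(h), self.value(h)


def worker_update(global_model, local_model, optimizer, states, actions, returns):
    """One A3C-style step: compute gradients locally, push them to the global
    network, apply the shared optimizer, then pull the fresh parameters back."""
    logits, values = local_model(states)
    values = values.squeeze(-1)
    advantages = returns - values.detach()
    log_probs = F.log_softmax(logits, dim=-1)
    chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    loss = -(chosen * advantages).mean() + 0.5 * F.mse_loss(values, returns)

    local_model.zero_grad()
    loss.backward()
    for lp, gp in zip(local_model.parameters(), global_model.parameters()):
        gp.grad = lp.grad                  # feed local gradients to the main network
    optimizer.step()                       # shared optimizer updates global parameters
    local_model.load_state_dict(global_model.state_dict())  # fetch latest weights
    local_model.policy.reset_noise()       # new exploration noise for the next rollout
```

Usage, under the same assumptions, would look roughly like: build one global_model = ActorCritic(n_states, n_actions) with a shared optimizer such as torch.optim.Adam(global_model.parameters(), lr=1e-4); each worker thread keeps its own local ActorCritic, collects a short rollout, computes discounted returns, and calls worker_update. The parameter-space noise from NoisyLinear takes over the exploration role that an epsilon-greedy schedule would otherwise play, which is the function the abstract attributes to the added noise parameters.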

References:

[1] ZHOU T Y, ZANG Y C, ZHU J H, et al. NIG-AP: a new method for automated penetration testing[J]. Frontiers of Information Technology & Electronic Engineering, 2019, 20(9): 1277-1288.

[2] HUANG W W, ZHENG X Y, ZHANG C Q, et al. Research on intelligent routing technology based on deep reinforcement learning[J]. Journal of Zhengzhou University (Engineering Science), 2023, 44(1): 44-51.
[3] KRÖSE B J A. Learning from delayed rewards[J]. Robotics and Autonomous Systems, 1995, 15(4): 233-235.
[4] SRINIVASAN S, LANCTOT M, ZAMBALDI V, et al. Actor-critic policy optimization in partially observable multiagent environments[C]//Proceedings of the 32nd International Conference on Neural Information Processing Systems. New York: ACM, 2018: 3426-3439.
[5] KANG Q M, ZHOU H Z, KANG Y F. An asynchronous advantage actor-critic reinforcement learning method for stock selection and portfolio management[C]//Proceedings of the 2nd International Conference on Big Data Research. New York: ACM, 2018: 141-145.
[6] GHANEM M C, CHEN T M. Reinforcement learning for intelligent penetration testing[C]//2018 Second World Conference on Smart Trends in Systems, Security and Sustainability. Piscataway: IEEE, 2018: 185-192.
[7] ZHOU S C, LIU J J, ZHONG X F, et al. Intelligent penetration testing path discovery based on deep reinforcement learning[J]. Computer Science, 2021, 48(7): 40-46.
[8] HU Z G, BEURAN R, TAN Y S. Automated penetration testing using deep reinforcement learning[C]//2020 IEEE European Symposium on Security and Privacy Workshops. Piscataway: IEEE, 2020: 2-10.
[9] CHEN J Y, HU S L, ZHENG H B, et al. GAIL-PT: a generic intelligent penetration testing framework with generative adversarial imitation learning[EB/OL]. (2022-04-05)[2023-10-11]. https://doi.org/10.48550/arXiv.2204.01975.
[10] SCHWARTZ J, KURNIAWATI H, EL-MAHASSNI E. POMDP + information-decay: incorporating defender's behaviour in autonomous penetration testing[J]. Proceedings of the International Conference on Automated Planning and Scheduling, 2020, 30: 235-243.
[11] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529-533.
[12] ZHOU S C, LIU J J, HOU D D, et al. Autonomous penetration testing based on improved deep Q-network[J]. Applied Sciences, 2021, 11(19): 8823. 
[13] NGUYEN H, TEERAKANOK S, INOMATA A, et al. The proposal of double agent architecture using actor-critic algorithm for penetration testing[C]//Proceedings of the 7th International Conference on Information Systems Security and Privacy. San Francisco: Science and Technology Publications, 2021: 440-449.
[14] JIANG Z Y, ZHANG T J, KIRK R, et al. Graph backup: data efficient backup exploiting Markovian transitions[EB/OL]. (2022-05-31)[2023-10-11]. https://doi.org/10.48550/arXiv.2205.15824.
[15] WANG B C, SI H W, TAN G Z. Research on autopilot control algorithm based on deep reinforcement learning[J]. Journal of Zhengzhou University (Engineering Science), 2020, 41(4): 41-45, 80.
[16] SUTTON R S, MCALLESTER D, SINGH S, et al. Policy gradient methods for reinforcement learning with function approximation[C]//Proceedings of the 12th International Conference on Neural Information Processing Systems. New York: ACM, 1999: 1057-1063.
[17] YE D H, CHEN G B, ZHANG W, et al. Towards playing full MOBA games with deep reinforcement learning[C]//Proceedings of the 34th International Conference on Neural Information Processing Systems. New York: ACM, 2020: 621-632.
[18] SILVER D, HUANG A, MADDISON C J, et al. Mastering the game of Go with deep neural networks and tree search[J]. Nature, 2016, 529(7587): 484-489.

Last Update: 2025-09-19