DONG Weiyu, LIU Pengkun, LIU Chunling, et al. Automated penetration testing method based on deep reinforcement learning Noisy Net-A3C algorithm[J]. Journal of Zhengzhou University (Engineering Science), 2024, 45(pre): 2. [doi:10.13705/j.issn.1671-6833.2024.02.011]

Automated penetration testing method based on deep reinforcement learning Noisy Net-A3C algorithm

Journal of Zhengzhou University (Engineering Science) [ISSN: 1671-6833 / CN: 41-1339/T]

Volume: 45
Issue: 2024 pre
Pages: 2
Publication date: 2024-11-30

Article Info

Title:
Automated penetration testing method based on deep reinforcement learning Noisy Net-A3C algorithm
Author(s):
DONG Weiyu1, LIU Pengkun2, LIU Chunling1, TANG Yonghe1, MA Yupu2
(1. School of Network and Cybersecurity, Information Engineering University, Zhengzhou 450001, China; 2. School of Network and Cybersecurity, Zhengzhou University, Zhengzhou 450001, China)
Keywords:
penetration testing; attack path decision-making; A3C algorithm; deep reinforcement learning; Metasploit
CLC number:
TP181
DOI:
10.13705/j.issn.1671-6833.2024.02.011
Document code:
A
Abstract:
In automated penetration testing, most existing attack-path decision algorithms are based on the partially observable Markov decision process (POMDP) and suffer from high algorithmic complexity, slow convergence, and a tendency to get trapped in local optima. To address these problems, this paper proposes NoisyNet-A3C, a reinforcement learning algorithm based on the Markov decision process (MDP), and applies it to automated penetration testing. The algorithm trains Actor-Critic networks across multiple threads: each thread feeds its results back to a global network and pulls the latest parameters from it in return, which makes full use of the available compute, reduces correlation in the training data, and improves training efficiency. In addition, noise parameters are added to the training network and updated together with the weights, which increases the randomness of the behavior policy, speeds up the exploration of effective attack paths, reduces the impact of data perturbations, and strengthens the algorithm's robustness. Experimental results show that, compared with the A3C, Q-learning, DQN, and NDSPI-DQN algorithms, NoisyNet-A3C reaches convergence more than 30% earlier, confirming the faster convergence of the proposed algorithm.
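The abstract names two mechanisms: asynchronous Actor-Critic training against a shared global network, and learnable noise injected into the network weights. The paper's own code is not reproduced here; the PyTorch sketch below is only a minimal illustration of those two ideas, and the names NoisyLinear and worker_update, as well as the hyperparameter sigma0, are illustrative choices rather than the authors' implementation.

import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyLinear(nn.Module):
    """Linear layer with factorized Gaussian noise on weights and biases,
    in the style of NoisyNet. The learnable sigma terms let the network
    adapt how much exploration noise it injects into the policy."""
    def __init__(self, in_features, out_features, sigma0=0.5):
        super().__init__()
        self.in_features, self.out_features = in_features, out_features
        self.weight_mu = nn.Parameter(torch.empty(out_features, in_features))
        self.weight_sigma = nn.Parameter(torch.empty(out_features, in_features))
        self.bias_mu = nn.Parameter(torch.empty(out_features))
        self.bias_sigma = nn.Parameter(torch.empty(out_features))
        self.register_buffer("eps_in", torch.zeros(in_features))
        self.register_buffer("eps_out", torch.zeros(out_features))
        bound = 1.0 / math.sqrt(in_features)
        nn.init.uniform_(self.weight_mu, -bound, bound)
        nn.init.uniform_(self.bias_mu, -bound, bound)
        nn.init.constant_(self.weight_sigma, sigma0 / math.sqrt(in_features))
        nn.init.constant_(self.bias_sigma, sigma0 / math.sqrt(in_features))
        self.reset_noise()

    @staticmethod
    def _scale(x):
        # f(x) = sign(x) * sqrt(|x|): the factorized-noise scaling function
        return x.sign() * x.abs().sqrt()

    def reset_noise(self):
        # Resample the factorized noise, e.g. once per rollout
        self.eps_in.copy_(self._scale(torch.randn(self.in_features)))
        self.eps_out.copy_(self._scale(torch.randn(self.out_features)))

    def forward(self, x):
        # Effective weight = mu + sigma * (outer product of the noise vectors)
        weight = self.weight_mu + self.weight_sigma * torch.outer(self.eps_out, self.eps_in)
        bias = self.bias_mu + self.bias_sigma * self.eps_out
        return F.linear(x, weight, bias)

def worker_update(global_net, local_net, optimizer, loss):
    """A3C-style push/pull: compute gradients on the worker's local copy,
    apply them to the shared global network, then pull the latest weights.
    The optimizer must have been constructed over global_net.parameters()."""
    optimizer.zero_grad()
    loss.backward()
    for lp, gp in zip(local_net.parameters(), global_net.parameters()):
        gp.grad = lp.grad                        # push local gradients to the global net
    optimizer.step()                             # update the shared parameters
    local_net.load_state_dict(global_net.state_dict())  # pull the updated weights

In a full A3C setup, each worker thread would run its own copy of the environment (here, the penetration-testing simulation), accumulate an n-step actor-critic loss, and call worker_update; replacing the ordinary linear layers of the policy and value heads with NoisyLinear yields the stochastic behavior policy the abstract describes.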


Memo:
Received: 2023-03-15. Funding: National Key R&D Program of China, "Network Identity Association Analysis and Profiling Technology" (2018YFB0804503); Henan Province Key R&D Program, "Security Risk Perception Technology and Platform for Networked Xinchuang Systems" (221111210300). Corresponding author: LIU Chunling (1981—), female, from Huaxian, Henan, lecturer at Information Engineering University, mainly engaged in vulnerability discovery and application. E-mail: lcl_506@163.com.
Last Update: 2024-10-24