Automated penetration testing method based on deep reinforcement learning Noisy Net-A3C algorithm


[1] DONG Weiyu, LIU Pengkun, LIU Chunling, et al. Automated penetration testing method based on deep reinforcement learning Noisy Net-A3C algorithm[J]. Journal of Zhengzhou University (Engineering Science), 2024, 45(pre): 2-. [doi:10.13705/j.issn.1671-6833.2024.02.011]


Journal of Zhengzhou University (Engineering Science) [ISSN 1671-6833/CN 41-1339/T] Volume: 45 Issue: 2024 pre Page number: 2- Publication date: 2026-01-10

Title:: Automated penetration testing method based on deep reinforcement learning Noisy Net-A3C algorithm

Author(s):: DONG Weiyu1; LIU Pengkun2; LIU Chunling1; TANG Yonghe1; MA Yupu2 (1. School of Network and Cybersecurity, Information Engineering University, Zhengzhou 450001, China; 2. School of Network and Cybersecurity, Zhengzhou University, Zhengzhou 450001, China)

Keywords:: penetration testing; attack path decision-making; A3C algorithm; deep reinforcement learning; Metasploit

CLC:: TP181

DOI:: 10.13705/j.issn.1671-6833.2024.02.011

Abstract:: In automated penetration testing, most existing attack path decision algorithms are based on the partially observable Markov decision process (POMDP) and suffer from high algorithmic complexity, slow convergence, and a tendency to get stuck in local optima. This article proposes NoisyNet-A3C, a reinforcement learning algorithm based on the Markov decision process (MDP), and applies it to automated penetration testing. The algorithm trains actor-critic agents in multiple threads: each thread feeds its results back to the main neural network and, in turn, pulls the latest parameters from it, which makes full use of the available computing resources, reduces data correlation, and improves training efficiency. In addition, noise parameters are added to the training network and updated together with the network weights during training, which increases the randomness of the behavior policy, helps the agent find effective paths faster, reduces the impact of data disturbances, and improves the robustness of the algorithm. Experimental results show that NoisyNet-A3C converges more than 30% faster than the A3C, Q-learning, DQN, and NDSPI-DQN algorithms.
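The noise mechanism mentioned in the abstract can be illustrated with a minimal sketch of a noisy linear layer. This is not the authors' implementation: the class name `NoisyLinear`, the use of independent Gaussian noise per weight, and the default `sigma_init` value are assumptions made for illustration, since the abstract does not specify them. Each weight is modelled as a learnable mean plus a learnable scale multiplied by freshly sampled noise, so the layer output (and hence the policy) varies between forward passes.

```python
import math
import random


class NoisyLinear:
    """Hypothetical noisy linear layer: w = mu + sigma * eps, eps ~ N(0, 1)."""

    def __init__(self, n_in, n_out, sigma_init=0.017, seed=0):
        # sigma_init and the uniform bound are illustrative assumptions.
        self.rng = random.Random(seed)
        self.n_in, self.n_out = n_in, n_out
        bound = math.sqrt(3.0 / n_in)
        # Learnable means (mu) and noise scales (sigma) for weights and biases.
        self.w_mu = [[self.rng.uniform(-bound, bound) for _ in range(n_in)]
                     for _ in range(n_out)]
        self.w_sigma = [[sigma_init] * n_in for _ in range(n_out)]
        self.b_mu = [self.rng.uniform(-bound, bound) for _ in range(n_out)]
        self.b_sigma = [sigma_init] * n_out
        self.resample_noise()

    def resample_noise(self):
        # Draw a fresh noise sample for every weight and bias.
        self.w_eps = [[self.rng.gauss(0.0, 1.0) for _ in range(self.n_in)]
                      for _ in range(self.n_out)]
        self.b_eps = [self.rng.gauss(0.0, 1.0) for _ in range(self.n_out)]

    def forward(self, x, noisy=True):
        # With noisy=False only the mean parameters are used.
        k = 1.0 if noisy else 0.0
        out = []
        for i in range(self.n_out):
            acc = self.b_mu[i] + k * self.b_sigma[i] * self.b_eps[i]
            for j in range(self.n_in):
                w = self.w_mu[i][j] + k * self.w_sigma[i][j] * self.w_eps[i][j]
                acc += w * x[j]
            out.append(acc)
        return out
```

In a full agent, both the means and the scales would be updated by gradient descent together with the other network weights, and `resample_noise()` would be called between rollouts so that exploration comes from the perturbed weights rather than from a separate exploration schedule.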



Last Update: 2024-10-24