[1] HUANG W W, ZHENG X Y, ZHANG C Q, et al. Research on Intelligent Routing Technology Based on Deep Reinforcement Learning in SDN[J]. Journal of Zhengzhou University (Engineering Science), 2023, 44(01): 44-51. [doi: 10.13705/j.issn.1671-6833.2022.04.018]

Research on Intelligent Routing Technology Based on Deep Reinforcement Learning

Journal of Zhengzhou University (Engineering Science) [ISSN: 1671-6833 / CN: 41-1339/T]

Volume:
44
Issue:
2023, No. 01
Pages:
44-51
Publication date:
2022-12-06

Article Information

Title:
Research on Intelligent Routing Technology Based on Deep Reinforcement Learning in SDN
Authors:
黄万伟1, 郑向雨1, 张超钦2, 王苏南3, 张校辉4
1. School of Software, Zhengzhou University of Light Industry, Zhengzhou 450001, Henan; 2. School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou 450001, Henan; 3. School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen 518055, Guangdong; 4. Henan Xinan Communication Technology Co., Ltd., Zhengzhou 450001, Henan

Author(s):
HUANG W W1, ZHENG X Y1, ZHANG C Q2, et al.

Keywords:
CLC number:
TP393
DOI:
10.13705/j.issn.1671-6833.2022.04.018
Document code:
A
Abstract:
To solve the problems of slow convergence, high average delay, and low bandwidth utilization in existing intelligent routing algorithms, this study proposed RDPG-Route, a multi-path intelligent routing algorithm based on deep reinforcement learning (DRL). The algorithm used the recurrent deterministic policy gradient (RDPG) as its training framework and introduced a long short-term memory (LSTM) network as the neural network. Exploiting the advantage of RDPG in handling high-dimensional problems and the storage capacity of the memory cells in the LSTM recurrent core, the dynamically changing network state was fed into the neural network for training. After training converged, the action values output by the neural network were used as network link weights, and traffic was split according to a multi-path routing strategy to achieve intelligent, dynamic adjustment of network routing. Finally, the RDPG-Route algorithm was compared with the ECMP, DRL-TE, and DRL-R-DDPG routing algorithms. The results indicated that RDPG-Route had good convergence and effectiveness: compared with the other intelligent routing algorithms, it reduced the average end-to-end delay by at least 7.2%, improved throughput by 6.5%, reduced the packet loss rate by 8.9%, and lowered the maximum link utilization by 6.3%.
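As a concrete illustration of the mechanism the abstract describes, the sketch below shows, in PyTorch, how a recurrent LSTM-based actor could map a sequence of observed network states to per-link weights, and how a single flow could then be split across candidate paths according to those weights. This is a minimal sketch under assumed names and dimensions (LSTMActor, split_traffic, NUM_LINKS, the inverse-cost split rule), not the paper's implementation; in particular, the RDPG training loop that would actually optimize the actor is omitted.

# Minimal illustrative sketch (not the authors' code) of: (1) an LSTM policy
# network mapping a sequence of network states to per-link weights, and
# (2) splitting a flow over candidate paths according to those weights.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_LINKS = 10    # assumed number of links in the topology
STATE_DIM = 10    # assumed per-step state features (e.g., per-link utilization)
HIDDEN_DIM = 64

class LSTMActor(nn.Module):
    """Recurrent actor: a sequence of network states -> one weight per link."""

    def __init__(self, state_dim: int, hidden_dim: int, num_links: int):
        super().__init__()
        self.lstm = nn.LSTM(state_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_links)

    def forward(self, state_seq: torch.Tensor) -> torch.Tensor:
        # state_seq: (batch, seq_len, state_dim)
        out, _ = self.lstm(state_seq)
        # Use the last time step's hidden state; softplus keeps weights positive.
        return F.softplus(self.head(out[:, -1, :]))

def split_traffic(paths: list[list[int]], link_weights: torch.Tensor) -> list[float]:
    """Split one flow over candidate paths, favoring paths with low total weight."""
    costs = torch.tensor([sum(link_weights[l].item() for l in p) for p in paths])
    inv = 1.0 / (costs + 1e-6)     # lower path cost -> larger traffic share
    shares = inv / inv.sum()
    return shares.tolist()

if __name__ == "__main__":
    actor = LSTMActor(STATE_DIM, HIDDEN_DIM, NUM_LINKS)
    # A toy sequence of 5 observed network states for one routing decision.
    state_seq = torch.rand(1, 5, STATE_DIM)
    weights = actor(state_seq).squeeze(0)          # one weight per link
    candidate_paths = [[0, 2, 5], [1, 3, 5]]       # hypothetical paths as link indices
    print(split_traffic(candidate_paths, weights))

In the paper's setting the link weights would come from an actor trained with RDPG against reward signals such as delay, throughput, and link utilization; here the randomly initialized network merely shows the data flow from state sequence to link weights to traffic split.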

References:

[1] LIU Z P, WANG X P, LI M, et al. Multi-controller deployment strategy based on delay and load balancing[J]. Journal of Zhengzhou university (engineering science), 2021, 42(3): 19-25, 32.
[2] SCHWARZMANN S, MARQUEZAN C C, TRIVISONNO R, et al. Accuracy vs. cost trade-off for machine learning based QoE estimation in 5G networks[C]//IEEE International Conference on Communications (ICC). Piscataway: IEEE, 2020: 1-6.
[3] LIU Y F, ZHAO B, ZHAO P Y, et al. A survey: typical security issues of software-defined networking[J]. China communications, 2019, 16(7): 13-31.
[4] REZA M, JAVAD M, RAOUF S, et al. Network traffic classification using machine learning techniques over software defined networks[J]. International journal of advanced computer science and applications, 2017, 8(7): 220-225.
[5] TANG F X, MAO B M, FADLULLAH Z M, et al. On removing routing protocol from future wireless networks: a real-time deep learning approach for intelligent traffic control[J]. IEEE wireless communications, 2018, 25(1): 154-160.
[6] RAO Z H, XU Y Y, PAN S M. A deep learning-based constrained intelligent routing method[J]. Peer-to-peer networking and applications, 2021, 14(4): 2224-2235.
[7] LIU W X, CAI J, CHEN Q C, et al. DRL-R: deep reinforcement learning approach for intelligent routing in software-defined data-center networks[J]. Journal of network and computer applications, 2021, 177: 102865.
[8] CHEN B, SUN P H, ZHANG P, et al. Traffic engineering based on deep reinforcement learning in hybrid IP/SR network[J]. China communications, 2021, 18(10): 204-213.
[9] WANG B C, SI H W, TAN G Z. Research on autopilot control algorithm based on deep reinforcement learning[J]. Journal of Zhengzhou university (engineering science), 2020, 41(4): 41-45, 80.
[10] HEESS N, HUNT J J, LILLICRAP T P, et al. Memory-based control with recurrent neural networks[EB/OL]. (2015-12-14)[2021-10-20]. https://arxiv.org/abs/1512.04455v1.
[11] XI L, WU J N, XU Y C, et al. Automatic generation control based on multiple neural networks with actor-critic strategy[J]. IEEE transactions on neural networks and learning systems, 2021, 32(6): 2483-2493.
[12] FANG L L, LI X Y, WU Y R, et al. Deep recurrent Q-learning method for single intersection signal control[C]//13th Asia Pacific Transportation Development Conference. Reston, USA: ASCE, 2020: 148-156.
[13] YAO Z, WANG Y, MENG L M, et al. DDPG-based energy-efficient flow scheduling algorithm in software-defined data centers[J]. Wireless communications and mobile computing, 2021, 2021: 6629852.
[14] LI L, LI Y Z, ZHANG Y J, et al. Deep deterministic policy gradient algorithm based on mean of multiple estimators[J]. Journal of Zhengzhou university (engineering science), 2022, 43(2): 15-21.
[15] LI S, LI W Q, COOK C, et al. Independently recurrent neural network (IndRNN): building a longer and deeper RNN[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 5457-5466.
[16] SHERSTINSKY A. Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network[J]. Physica D: nonlinear phenomena, 2020, 404: 132306.
[17] WEHRLE K, GÜNEŞ M, GROSS J. Modeling and tools for network simulation[M]. Berlin: Springer-Verlag Berlin Heidelberg, 2010.
[18] PATHAK S, MANI A, SHARMA M, et al. A novel salp swarm algorithm for controller placement problem[J]. Trends in computational intelligence, security and Internet of Things, 2020, 1358: 24-36.
[19] BULL P, MURPHY S, BRUNO JUNIOR N, et al. A flow analysis and preemption fra

Last Update: 2022-12-07