SHEN Xiaoning, MAO Mingjian, SHEN Ruyi, et al. Large-scale Agile Software Project Scheduling Based on Deep Reinforcement Learning[J]. Journal of Zhengzhou University (Engineering Science), 2023, 44(05): 17-23. [doi:10.13705/j.issn.1671-6833.2023.05.003]

Large-scale Agile Software Project Scheduling Based on Deep Reinforcement Learning

Journal of Zhengzhou University (Engineering Science) [ISSN:1671-6833/CN:41-1339/T]

Volume: 44
Issue: 2023(05)
Pages: 17-23
Publication Date: 2023-08-20

Article Info

Title:
Large-scale Agile Software Project Scheduling Based on Deep Reinforcement Learning
Author(s):
SHEN Xiaoning 1,2,3,4; MAO Mingjian 1; SHEN Ruyi 1; SONG Liyan 5
1. School of Automation, Nanjing University of Information Science and Technology, Nanjing 210044, China; 2. Jiangsu Collaborative Innovation Center of Atmospheric Environment and Equipment Technology, Nanjing University of Information Science and Technology, Nanjing 210044, China; 3. Jiangsu Key Laboratory of Big Data Analysis Technology, Nanjing University of Information Science and Technology, Nanjing 210044, China; 4. Jiangsu Engineering Research Center on Meteorological Energy Using and Control (C-MEIC), Nanjing 210044, China; 5. Guangdong Provincial Key Laboratory of Brain-inspired Intelligent Computation, Southern University of Science and Technology, Shenzhen 518055, China
Keywords:
reinforcement learning; large-scale; agile software project scheduling; deep Q network; composite scheduling rules; prioritized experience replay; strong coupling
CLC Number:
TP311.5; TP301.6
DOI:
10.13705/j.issn.1671-6833.2023.05.003
Document Code:
A
Abstract:
To solve the large-scale agile software project scheduling problem, it was first decomposed into three strongly coupled subproblems: story selection, story allocation, and task allocation. Dynamic events, such as the addition and deletion of user stories and changes in employees' working hours in each sprint, were introduced, and constraints including team development velocity, task duration, and skills were considered. A mathematical model of large-scale agile software project scheduling was established with the objective of maximizing the total value of the user stories completed by the project. Next, a Markov decision process was designed according to the characteristics of the problem: ten state features were used to describe the agile scheduling environment at the beginning of each sprint, twelve composite scheduling rules were designed as the candidate actions of the agent, and the reward was defined according to the objective function of the scheduling model. Finally, a prioritized experience replay double deep Q network algorithm based on composite scheduling rules was proposed to solve the model; the double deep Q network (DDQN) strategy avoids the overestimation problem of the deep Q network, and the prioritized experience replay strategy improves the utilization efficiency of trajectory information in the experience replay pool. To verify the effectiveness of the proposed algorithm, experiments were carried out on six large-scale agile software project scheduling instances, and the convergence of the proposed algorithm was analyzed. On the adopted performance measure, the proposed algorithm was compared with the representative existing methods DQN and double deep Q network, and with each of the twelve composite scheduling rules used alone. The results showed that the proposed algorithm achieved the highest average cumulative reward on all six instances.
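The action design lends itself to a compact sketch. The Python fragment below illustrates, under stated assumptions, how composite scheduling rules can serve as the agent's discrete actions: each rule pairs a story-ranking heuristic with a task-assignment heuristic, and applying a rule greedily fills one sprint and returns the value of the completed stories as the per-sprint reward. The field names, the specific heuristics, and the greedy fill are hypothetical illustrations, not the authors' exact twelve rules or constraint handling.

```python
import itertools

# Hypothetical story-ranking heuristics (lower key = scheduled earlier).
def by_value(story):         return -story["value"]                    # highest value first
def by_value_density(story): return -story["value"] / story["points"]  # value per story point
def by_points(story):        return story["points"]                    # smallest story first

# Hypothetical task-assignment heuristics (lower key = preferred employee).
def least_loaded(emp, task): return emp["load"]                        # balance workload
def most_skilled(emp, task): return -len(emp["skills"] & task["skills"])

# A composite rule pairs one heuristic of each kind; the agent's discrete
# action is an index into this list (the paper uses 12 such rules).
COMPOSITE_RULES = list(itertools.product(
    [by_value, by_value_density, by_points],
    [least_loaded, most_skilled],
))

def apply_rule(rule, backlog, employees, capacity):
    """Greedily fill one sprint with the chosen rule; the total value of
    the stories completed plays the role of the per-sprint reward."""
    story_key, emp_key = rule
    reward = 0.0
    for story in sorted(backlog, key=story_key):
        loads = {id(e): e["load"] for e in employees}   # tentative loads
        plan, feasible = [], True
        for task in story["tasks"]:
            # only employees with the required skills and spare hours qualify
            candidates = [e for e in employees
                          if e["skills"] >= task["skills"]
                          and loads[id(e)] + task["hours"] <= capacity]
            if not candidates:
                feasible = False
                break
            best = min(candidates, key=lambda e: emp_key(e, task))
            loads[id(best)] += task["hours"]
            plan.append((best, task["hours"]))
        if feasible:                                    # commit whole stories only
            for emp, hours in plan:
                emp["load"] += hours
            reward += story["value"]
    return reward
```

Committing a story only when all of its tasks can be staffed mirrors the objective of maximizing the total value of completed user stories.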
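The learning side can be sketched just as briefly. The PyTorch snippet below shows the two ingredients the abstract highlights: the DDQN target, where the online network selects the next action and the target network evaluates it, and proportional prioritized replay with importance-sampling weights. Only the 10-dimensional state and 12 actions follow the abstract; the network sizes, hyperparameters, and the flat priority list (a sum tree would be used at scale) are illustrative assumptions.

```python
import numpy as np
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS = 10, 12          # from the abstract
GAMMA, ALPHA, BETA = 0.95, 0.6, 0.4    # assumed hyperparameters

def make_qnet():
    return nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                         nn.Linear(64, 64), nn.ReLU(),
                         nn.Linear(64, N_ACTIONS))

online, target = make_qnet(), make_qnet()
target.load_state_dict(online.state_dict())
opt = torch.optim.Adam(online.parameters(), lr=1e-3)

buffer, priorities = [], []            # replay pool with per-sample priorities

def push(transition):
    buffer.append(transition)                        # (s, a, r, s2, done)
    priorities.append(max(priorities, default=1.0))  # new samples get max priority

def sample(batch_size):
    p = np.array(priorities) ** ALPHA
    p /= p.sum()
    idx = np.random.choice(len(buffer), batch_size, p=p)
    w = (len(buffer) * p[idx]) ** (-BETA)            # importance-sampling weights
    return idx, torch.tensor(w / w.max(), dtype=torch.float32)

def train_step(batch_size=32):
    idx, w = sample(batch_size)
    batch = [buffer[i] for i in idx]
    s    = torch.tensor([b[0] for b in batch], dtype=torch.float32)
    a    = torch.tensor([b[1] for b in batch])
    r    = torch.tensor([b[2] for b in batch], dtype=torch.float32)
    s2   = torch.tensor([b[3] for b in batch], dtype=torch.float32)
    done = torch.tensor([b[4] for b in batch], dtype=torch.float32)

    q = online(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # DDQN: the online net picks the action, the target net evaluates it,
        # which counteracts the overestimation of plain DQN targets
        a2 = online(s2).argmax(1)
        q2 = target(s2).gather(1, a2.unsqueeze(1)).squeeze(1)
        y = r + GAMMA * (1.0 - done) * q2

    td = y - q
    loss = (w * td.pow(2)).mean()                    # importance-weighted MSE
    opt.zero_grad()
    loss.backward()
    opt.step()

    for i, err in zip(idx, td.abs().detach().numpy()):
        priorities[i] = float(err) + 1e-6            # refresh with |TD error|
```

In a training loop, the agent would pick an action epsilon-greedily over online(s) at each sprint, run the chosen composite rule, push the resulting transition, and periodically copy the online weights to the target network.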

References:

[1] WANG Y H. Exploration and practice of large-scale agile transformation of enterprises[J]. Financial Technology Time, 2017, 25(11): 84-85.
[2] BIESIALSKA K, FRANCH X, MUNTÉS-MULERO V. Mining dependencies in large-scale agile software development projects: a quantitative industry study[C]∥Evaluation and Assessment in Software Engineering. New York: ACM, 2021: 20-29.
[3] COHN M. Agile estimating and planning[M]. Upper Saddle River: Prentice Hall, 2005.
[4] ZAPOTECAS-MARTÍNEZ S, GARCÍA-NÁJERA A, CERVANTES H. Multi-objective optimization in the agile software project scheduling using decomposition[C]∥Proceedings of the 2020 Genetic and Evolutionary Computation Conference Companion. New York: ACM, 2020: 1495-1502.
[5] ROQUE L, ARAÚJO A A, DANTAS A, et al. Human resource allocation in agile software projects based on task similarities[C]∥International Symposium on Search Based Software Engineering. Cham: Springer, 2016: 291-297.
[6] XIAO J, AO X T, TANG Y. Solving software project scheduling problems with ant colony optimization[J]. Computers & Operations Research, 2013, 40(1): 33-46.
[7] SHEN X N, GUO Y N, LI A M. Cooperative coevolution with an improved resource allocation for large-scale multi-objective software project scheduling[J]. Applied Soft Computing, 2020, 88: 106059.
[8] PADBERG F, WEISS D. Optimal scheduling of software projects using reinforcement learning[C]∥2011 18th Asia-Pacific Software Engineering Conference. Piscataway: IEEE, 2012: 9-16.
[9] LIU W T, SU S, TANG T, et al. A DQN-based intelligent control method for heavy haul trains on long steep downhill section[J]. Transportation Research Part C: Emerging Technologies, 2021, 129: 103249.
[10] HUYNH T N, DO D T T, LEE J. Q-Learning-based parameter control in differential evolution for structural optimization[J]. Applied Soft Computing, 2021, 107: 107464.
[11] LUO S. Dynamic scheduling for flexible job shop with new job insertions by deep reinforcement learning[J]. Applied Soft Computing, 2020, 91: 106208.
[12] LI Y X, GU W B, YUAN M H, et al. Real-time data-driven dynamic scheduling for flexible job shop with insufficient transportation resources using hybrid deep Q network[J]. Robotics and Computer-Integrated Manufacturing, 2022, 74: 102283.
[13] VAN HASSELT H, GUEZ A, SILVER D. Deep reinforcement learning with double Q-learning[C]∥Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence. New York: ACM, 2016: 2094-2100.
[14] SCHAUL T, QUAN J, ANTONOGLOU I, et al. Prioritized experience replay[EB/OL]. (2016-02-25)[2023-02-09]. https://arxiv.org/abs/1511.05952.
[15] WANG P C, YIN X J, LI L R. An improved seagull optimization algorithm with learning[J]. Journal of Zhengzhou University (Engineering Science), 2022, 43(6): 8-14.

Similar Articles:

[1] WANG Bingchen, SI Huaiwei, TAN Guozhen. Research on Autopilot Control Algorithms Based on Deep Reinforcement Learning[J]. Journal of Zhengzhou University (Engineering Science), 2020, 41(04): 41. [doi:10.13705/j.issn.1671-6833.2020.04.002]

Last Update: 2023-09-03