[1]张富强,张焱锐,丁 凯,等.基于多智能体强化学习的AMR协作任务分配方法[J].郑州大学学报(工学版),2025,46(03):26-33.[doi:10.13705/j.issn.1671-6833.2025.03.001]
 ZHANG Fuqiang,ZHANG Yanrui,DING Kai,et al.AMRs Autonomous Collaboration Task Assignment Method Based on Multi-agent Reinforcement Learning[J].Journal of Zhengzhou University (Engineering Science),2025,46(03):26-33.[doi:10.13705/j.issn.1671-6833.2025.03.001]

基于多智能体强化学习的AMR协作任务分配方法

《郑州大学学报(工学版)》[ISSN:1671-6833/CN:41-1339/T]

卷/Volume:
46
期数/Issue:
2025年03期
页码/Pages:
26-33
出版日期/Publication Date:
2025-05-13

文章信息/Info

Title:
AMRs Autonomous Collaboration Task Assignment Method Based on Multi-agent Reinforcement Learning
文章编号/Article No.:
1671-6833(2025)03-0026-08
作者:
张富强1,2, 张焱锐1,2, 丁凯1,2, 常丰田1,2
1.长安大学 道路施工技术与装备教育部重点实验室,陕西 西安 710064;2.长安大学 智能制造系统研究所,陕西 西安 710064
Author(s):
ZHANG Fuqiang1,2, ZHANG Yanrui1,2, DING Kai1,2, CHANG Fengtian1,2
1.Key Laboratory of Road Construction Technology and Equipment of MOE, Chang’an University, Xi’an 710064, China; 2. Institute of Smart Manufacturing Systems, Chang’an University, Xi’an 710064, China
关键词:
自主移动机器人; 多智能体; 强化学习; 协作; 任务分配
Keywords:
autonomous mobile robot; multi-agent; reinforcement learning; collaboration; task assignment
分类号/CLC Number:
TP13
DOI:
10.13705/j.issn.1671-6833.2025.03.001
文献标志码/Document Code:
A
摘要:
为了解决AMR在柔性生产中运输任务的自主分配难题,采用一种基于注意力机制改进的多智能体深度确定性策略梯度算法(MADDPG)。首先,引入注意力机制对算法进行改进,采用中心化训练、分散式执行的框架,并对AMR的动作及状态进行设置;其次,根据奖励值的大小确定任务节点的覆盖程度以及任务的完成效果;最后,在PyCharm上进行仿真,结果表明:MADDPG算法的平均奖励值较其他算法提高了3,训练次数减少了300次,在保证求解任务分配完成度的基础上,具有更快的学习速度和更稳定的收敛过程。
Abstract:
To solve the autonomous assignment problem of AMR transport tasks in flexible production, a multi-agent deep deterministic policy gradient (MADDPG) algorithm improved with an attention mechanism was adopted. Firstly, the attention mechanism was introduced to improve the algorithm, the centralized-training decentralized-execution framework was adopted, and the actions and states of the AMRs were defined. Secondly, the coverage of task nodes and the completion quality of tasks were determined according to the reward value. Finally, simulations were carried out in PyCharm. The results showed that the average reward of the MADDPG algorithm increased by 3 compared with the other algorithms, and the number of training episodes was reduced by 300. While ensuring the completeness of the task assignment solution, the algorithm achieved a faster learning speed and a more stable convergence process.
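The abstract describes a centralized-training decentralized-execution (CTDE) scheme in which an attention mechanism lets each AMR's centralized critic weight the other agents' information rather than simply concatenating it. The paper's actual network definitions are not given here, so the following is only a minimal NumPy sketch of that idea under assumed shapes; the function names (`attention_weights`, `centralized_critic_input`) and the use of raw observations instead of learned embeddings are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def attention_weights(query, keys):
    """Scaled dot-product attention scores: how strongly one agent's
    centralized critic attends to each other agent's observation.
    query: (d,) own observation; keys: (n_others, d) others' observations."""
    scores = keys @ query / np.sqrt(keys.shape[-1])
    e = np.exp(scores - scores.max())          # numerically stable softmax
    return e / e.sum()                         # (n_others,), sums to 1

def centralized_critic_input(own_obs, other_obs):
    """CTDE training step input: instead of concatenating all agents'
    observations, aggregate the others into an attention-weighted context
    vector and append it to the agent's own observation."""
    w = attention_weights(own_obs, other_obs)
    context = w @ other_obs                    # (d,) weighted summary
    return np.concatenate([own_obs, context]), w
```

At execution time each agent's policy would see only `own_obs` (decentralized execution); the attention-aggregated input above is used only by the critic during training, which is what keeps the critic's input size fixed as the number of AMRs grows.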

参考文献/References:

[1]SHOJAEINASAB A, CHARTER T, JALAYER M, et al. Intelligent manufacturing execution systems: a systematic review[J]. Journal of Manufacturing Systems, 2022, 62: 503-522. 

[2]李腾, 冯珊. 面向 “货到人” 拣选系统的一种随机调度策略[J]. 工业工程, 2020, 23(2): 59-66. 
LI T, FENG S. A research on a random scheduling strategy of “rack to picker” picking system[J]. Industrial Engineering Journal, 2020, 23(2): 59-66. 
[3]FRAGAPANE G, IVANOV D, PERON M, et al. Increasing flexibility and productivity in Industry 4.0 production networks with autonomous mobile robots and smart intralogistics[J]. Annals of Operations Research, 2022, 308(1): 125-143. 
[4]HERCIK R, BYRTUS R, JAROS R, et al. Implementation of autonomous mobile robot in SmartFactory[J]. Applied Sciences, 2022, 12(17): 8912. 
[5]WANG X, WANG L, WANG S Y, et al. Recommending-and-grabbing: a crowdsourcing-based order allocation pattern for on-demand food delivery[J]. IEEE Transactions on Intelligent Transportation Systems, 2023, 24(1): 838-853. 
[6]刘广瑞, 王庆海, 姚冬艳. 基于改进人工蜂群算法的多无人机协同任务规划[J]. 郑州大学学报(工学版), 2018, 39(3): 51-55. 
LIU G R, WANG Q H, YAO D Y. Multi-UAV cooperative mission planning based on improved artificial bee colony algorithm[J]. Journal of Zhengzhou University (Engineering Science), 2018, 39(3): 51-55. 
[7]WU X W, XIAO B, CAO L, et al. Optimal transport and model predictive control-based simultaneous task assignment and trajectory planning for unmanned system swarm [J]. Journal of Intelligent & Robotic Systems, 2024, 110(1): 28. 
[8]王俊英, 颜芬芬, 陈鹏, 等. 基于概率自适应蚁群算法的云任务调度方法[J]. 郑州大学学报(工学版), 2017, 38(4): 51-56. 
WANG J Y, YAN F F, CHEN P, et al. Task scheduling method based on probability adaptive ant colony optimization in cloud computing[J]. Journal of Zhengzhou University (Engineering Science), 2017, 38(4): 51-56. 
[9]吴蔚楠, 关英姿, 郭继峰, 等. 基于SEAD任务特性约束的协同任务分配方法[J]. 控制与决策, 2017, 32(9): 1574-1582. 
WU W N, GUAN Y Z, GUO J F, et al. Research on cooperative task assignment method used to the mission SEAD with real constraints[J]. Control and Decision, 2017, 32(9): 1574-1582.
[10]鞠锴, 冒泽慧, 姜斌, 等. 基于势博弈的异构多智能体系统任务分配和重分配[J]. 自动化学报, 2022, 48(10): 2416-2428. 
JU K, MAO Z H, JIANG B, et al. Task allocation and reallocation for heterogeneous multiagent systems based on potential game[J]. Acta Automatica Sinica, 2022, 48(10): 2416-2428.
[11]施伟, 冯旸赫, 程光权, 等. 基于深度强化学习的多机协同空战方法研究[J]. 自动化学报, 2021, 47(7): 1610-1623. 
SHI W, FENG Y H, CHENG G Q, et al. Research on multi-aircraft cooperative air combat method based on deep reinforcement learning[J]. Acta Automatica Sinica, 2021, 47(7): 1610-1623.
[12] MOTES J, SANDSTRÖM R, LEE H, et al. Multi-robot task and motion planning with subtask dependencies[J]. IEEE Robotics and Automation Letters, 2020, 5(2): 3338-3345. 
[13] YIN Z Z, LIU J H, WANG D P. Multi-AGV task allocation with attention based on deep reinforcement learning [J]. International Journal of Pattern Recognition and Artificial Intelligence, 2022, 36(9): 1-20. 
[14] LI M G, MA M, WANG L, et al. Multitask-oriented collaborative crowdsensing based on reinforcement learning and blockchain for intelligent transportation system[J]. IEEE Transactions on Industrial Informatics, 2023, 19 (9): 9503-9514. 
[15] OROOJLOOY A, HAJINEZHAD D. A review of cooperative multi-agent deep reinforcement learning[J]. Applied Intelligence, 2023, 53(11): 13677-13722. 
[16]WANG H P, LI S Q, JI H C. Fitness-based hierarchical reinforcement learning for multi-human-robot task allocation in complex terrain conditions[J]. Arabian Journal for Science and Engineering, 2023, 48(5): 7031-7041. 
[17] XIAO X J, PAN Y H, LV L L, et al. Scheduling multimode resource-constrained tasks of automated guided vehicles with an improved particle swarm optimization algorithm[J]. IET Collaborative Intelligent Manufacturing, 2021, 3(2): 93-104. 
[18]王乐, 齐尧, 何滨兵, 等. 机器人自主探索算法综述[J]. 计算机应用, 2023, 43(A1): 314-322. 
WANG L, QI Y, HE B B, et al. Survey of autonomous exploration algorithms for robots[J]. Journal of Computer Applications, 2023, 43(A1): 314-322. 
[19] VU Q T, DUONG V T, NGUYEN H H, et al. Optimization of swimming mode for elongated undulating fin using multi-agent deep deterministic policy gradient[J]. Engineering Science and Technology, an International Journal, 2024, 56: 101783. 
[20] SUMIEA E H, ABDULKADIR S J, ALHUSSIAN H S, et al. Deep deterministic policy gradient algorithm: a systematic review[J]. Heliyon, 2024, 10(9): e30697. 
[21] CHAI J J, LI W F, ZHU Y H, et al. UNMAS: multiagent reinforcement learning for unshaped cooperative scenarios[J]. IEEE Transactions on Neural Networks and Learning Systems, 2023, 34(4): 2093-2104. 
[22] MENG X Q, JIANG J H, WANG H. AGWO: advanced GWO in multi-layer perception optimization[J]. Expert Systems with Applications, 2021, 173: 114676. 
[23]敬超, 全育涛, 陈艳. 基于多层感知机-注意力模型的功耗预测算法[J/OL]. 计算机应用, 2024: 1-10 (2024-11-13) [2024-11-15]. http://kns.cnki.net/kcms/detail/51.1307.TP.20241112.1237.004.html. 
JING C, QUAN Y T, CHEN Y. Improved multi-layer perceptron and attention model-based power consumption prediction algorithm[J/OL]. Journal of Computer Applications, 2024: 1-10 (2024-11-13) [2024-11-15]. http://kns.cnki.net/kcms/detail/51.1307.TP.20241112.1237.004.html.

相似文献/Similar Articles:

[1]许珉, 李宏晓, 白春涛. 基于网络节点编号的多智能体电网操作票专家系统的研究[J]. 郑州大学学报(工学版), 2007, 28(01): 30. [doi:10.3969/j.issn.1671-6833.2007.01.008]
 XU Min, LI Hongxiao, BAI Chuntao. Research on a multi-agent power grid operation ticket expert system based on network node numbering[J]. Journal of Zhengzhou University (Engineering Science), 2007, 28(01): 30. [doi:10.3969/j.issn.1671-6833.2007.01.008]

更新日期/Last Update: 2025-05-22