[1] ZHAO Xin, FEI Xiaohu, WANG Dongyu, et al. Real-time Detection Algorithm for Infrared Dynamic Targets Based on YOLO-IDOD[J]. Journal of Zhengzhou University (Engineering Science), 2027, 48(XX): 1-9. [doi:10.13705/j.issn.1671-6833.2026.05.001]
Journal of Zhengzhou University (Engineering Science) [ISSN 1671-6833 / CN 41-1339/T]
Volume: 48
Issue: 2027 XX
Page number: 1-9
Column:
Publication date: 2027-12-10
- Title: Real-time Detection Algorithm for Infrared Dynamic Targets Based on YOLO-IDOD
- Author(s): ZHAO Xin 1,2, FEI Xiaohu 1, WANG Dongyu 1, HAN Shoufei 1
- Affiliation(s): 1. School of Artificial Intelligence, Anhui University of Science and Technology, Huainan 232001, Anhui, China; 2. The development of intelligent technology for the mechanised extraction of coal is being conducted at the National Key Laboratory for Numerical Simulation of Geomechanics, Chinese Academy of Sciences, Huainan 232001, Anhui, China
- Keywords: Infrared Dynamic Target Detection; YOLOv12; DAM; CACONV; Multi-dimensional channel attention mechanism
- CLC: TP391.41; TN219
- DOI: 10.13705/j.issn.1671-6833.2026.05.001
- Abstract:
Existing infrared object detection algorithms make inadequate use of temporal information and inter-frame dependencies when detecting dynamic targets, which leads to suboptimal detection accuracy. To address this limitation, a real-time infrared dynamic object detection framework, YOLO-IDOD, is proposed, incorporating a Dynamic Attention Module (DAM) and a Channel Attention Convolution (CACONV) module. YOLOv12s is employed as the baseline network. A dynamic attention mechanism is integrated at the input stage to extract short-term optical flow features via an optical flow network, which suppresses background motion interference and enhances the network's sensitivity to target motion characteristics. A channel attention convolution module is further embedded in the network, with channel-wise attention applied at both the input and output stages to support more discriminative feature representation and selection for the DAM-enhanced features. Both modules are designed as plug-and-play components that enable spatiotemporal feature aggregation and adaptive feature selection, thereby improving the generalization capability of the network for infrared dynamic target detection. Experiments show that YOLO-IDOD achieves a precision of 79.9%, a recall of 62.5%, an mAP@50 of 77.7%, and an mAP@95 of 57.3% on a mixed dataset composed of a self-constructed dataset (IRDA) and the public FLIR_ADAS_v2 dataset. Compared with the baseline YOLOv12s model, precision, mAP@50, and mAP@95 improve by 5.2, 4.6, and 2.4 percentage points, respectively, while recall remains comparable, which indicates improved detection accuracy and generalization performance for infrared dynamic targets.
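The abstract reports the YOLO-IDOD scores and the gains over YOLOv12s, but not the baseline scores themselves. The short sketch below backs out the baseline values implied by those figures. The names `reported`, `gains`, and `implied_baseline` are illustrative only, and the derived numbers are arithmetic consequences of the abstract rather than values quoted from the paper. Recall is omitted because the abstract describes it only as comparable.

```python
# Scores reported in the abstract for YOLO-IDOD, in percent.
reported = {"precision": 79.9, "mAP@50": 77.7, "mAP@95": 57.3}

# Gains over the YOLOv12s baseline stated in the abstract, in percentage points.
gains = {"precision": 5.2, "mAP@50": 4.6, "mAP@95": 2.4}


def implied_baseline(reported, gains):
    """Subtract each stated gain from the reported score (hypothetical helper).

    Results are rounded to one decimal place to match the precision
    of the figures given in the abstract.
    """
    return {name: round(reported[name] - gains[name], 1) for name in gains}


baseline = implied_baseline(reported, gains)
for name, value in baseline.items():
    print(f"{name}: baseline {value:.1f}% -> YOLO-IDOD {reported[name]:.1f}%")
```

Under these figures, the implied YOLOv12s baseline is roughly 74.7% precision, 73.1% mAP@50, and 54.9% mAP@95 on the mixed dataset.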