Real-time Detection Algorithm for Infrared Dynamic Targets Based on YOLO-IDOD


[1]ZHAO Xin,FEI Xiaohu,WANG Dongyu,et al.Real-time Detection Algorithm for Infrared Dynamic Targets Based on YOLO-IDOD[J].Journal of Zhengzhou University (Engineering Science),2027,48(XX):1-9.[doi:10.13705/j.issn.1671-6833.2026.05.001]


Journal of Zhengzhou University (Engineering Science) [ISSN 1671-6833/CN 41-1339/T] Volume: 48; Issue: 2027 XX; Pages: 1-9; Publication date: 2027-12-10

Title:: Real-time Detection Algorithm for Infrared Dynamic Targets Based on YOLO-IDOD

Author(s):: ZHAO Xin¹,², FEI Xiaohu¹, WANG Dongyu¹, HAN Shoufei¹; 1. School of Artificial Intelligence, Anhui University of Science and Technology, Anhui, Huainan 232001, China; 2. The development of intelligent technology for the mechanised extraction of coal is being conducted at the National Key Laboratory for Numerical Simulation of Geomechanics, Chinese Academy of Sciences, Anhui, Huainan 232001, China

Keywords:: Infrared Dynamic Target Detection; YOLOv12; DAM; CACONV; Multi-dimensional channel attention mechanism

CLC:: TP391.41；TN219

DOI:: 10.13705/j.issn.1671-6833.2026.05.001

Abstract:: Existing infrared object detection algorithms make inadequate use of temporal information and inter-frame dependencies in dynamic target detection, which results in suboptimal detection accuracy. To address this limitation, a real-time infrared dynamic object detection framework, YOLO-IDOD, is proposed, incorporating a Dynamic Attention Module (DAM) and a Channel Attention Convolution (CACONV) module. YOLOv12s is used as the baseline network. The DAM is integrated at the input stage, where an optical flow network extracts short-term optical flow features, suppressing background motion interference and enhancing the network's sensitivity to target motion characteristics. The CACONV module is embedded within the network architecture, with channel-wise attention introduced at both the input and output stages to support more discriminative representation and selection of the DAM-enhanced features. Both modules are designed as plug-and-play components that enable spatiotemporal feature aggregation and adaptive feature selection, improving the generalization capability of the network for infrared dynamic target detection. In experiments on a mixed dataset composed of a self-constructed dataset (IRDA) and the public FLIR_ADAS_v2 dataset, the improved YOLO-IDOD model achieved a precision of 79.9%, a recall of 62.5%, an mAP@50 of 77.7%, and an mAP@95 of 57.3%. Compared with the baseline YOLOv12s model, precision, mAP@50, and mAP@95 improved by 5.2, 4.6, and 2.4 percentage points, respectively, while recall remained comparable, indicating improved detection accuracy and generalization performance for infrared dynamic targets.
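The reported gains can be turned back into the baseline figures they imply. The short Python sketch below does that arithmetic and also computes an F1 score from the reported precision and recall. The implied YOLOv12s values and the F1 score are derived here for illustration and are not figures stated in the abstract. The function names are hypothetical, and the metric label mAP@95 is kept exactly as the abstract writes it. No baseline recall is derived, since the abstract only describes it as comparable.

```python
# Figures reported in the abstract for YOLO-IDOD (percent).
reported = {"precision": 79.9, "recall": 62.5, "mAP@50": 77.7, "mAP@95": 57.3}

# Gains over the YOLOv12s baseline stated in the abstract (percentage points).
gains = {"precision": 5.2, "mAP@50": 4.6, "mAP@95": 2.4}


def implied_baseline(reported, gains):
    """Subtract each stated gain from the reported value, to one decimal place."""
    return {name: round(reported[name] - gain, 1) for name, gain in gains.items()}


def f1_score(precision_pct, recall_pct):
    """Harmonic mean of precision and recall, both given in percent."""
    return 2 * precision_pct * recall_pct / (precision_pct + recall_pct)


# Derived values (assumptions of this sketch, not numbers from the paper).
baseline = implied_baseline(reported, gains)
f1 = f1_score(reported["precision"], reported["recall"])

print(baseline)      # implied YOLOv12s precision, mAP@50, mAP@95
print(round(f1, 1))  # F1 implied by the reported precision and recall
```

Under these assumptions, the implied baseline is roughly 74.7% precision, 73.1% mAP@50, and 54.9% mAP@95, and the reported precision and recall correspond to an F1 score of about 70.1%.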



Last Update: 2026-04-03