[1]赵鑫,费晓虎,王东宇,等.基于YOLO-IDOD的红外动态目标实时检测算法[J].郑州大学学报(工学版),2027,48(XX):1-9.[doi:10.13705/j.issn.1671-6833.2026.05.001]
 ZHAO Xin,FEI Xiaohu,WANG Dongyu,et al.Real-time Detection Algorithm for Infrared Dynamic Targets Based on YOLO-IDOD[J].Journal of Zhengzhou University (Engineering Science),2027,48(XX):1-9.[doi:10.13705/j.issn.1671-6833.2026.05.001]

Real-time Detection Algorithm for Infrared Dynamic Targets Based on YOLO-IDOD

Journal of Zhengzhou University (Engineering Science) [ISSN:1671-6833/CN:41-1339/T]

Volume:
48
Issue:
2027, XX
Pages:
1-9
Section:
Publication date:
2027-12-10

Article Info

Title:
Real-time Detection Algorithm for Infrared Dynamic Targets Based on YOLO-IDOD
Authors:
赵鑫1,2, 费晓虎1, 王东宇1, 韩守飞1
1. School of Artificial Intelligence, Anhui University of Science and Technology, Huainan 232001, Anhui, China; 2. National Key Laboratory of Digital and Intelligent Technology for Unmanned Coal Mining, Anhui University of Science and Technology, Huainan 232001, Anhui, China
Author(s):
ZHAO Xin1,2, FEI Xiaohu1, WANG Dongyu1, HAN Shoufei1
1. School of Artificial Intelligence, Anhui University of Science and Technology, Huainan 232001, Anhui, China; 2. National Key Laboratory of Digital and Intelligent Technology for Unmanned Coal Mining, Anhui University of Science and Technology, Huainan 232001, Anhui, China
Keywords:
Infrared dynamic target detection; YOLOv12; DAM; CACONV; multi-dimensional channel attention mechanism
CLC number:
TP391.41;TN219
DOI:
10.13705/j.issn.1671-6833.2026.05.001
Document code:
A
Abstract:
To address the problem that existing infrared object detection algorithms make insufficient use of temporal information and of the correlations between consecutive frames when detecting dynamic targets, which limits detection accuracy, a real-time infrared dynamic target detection algorithm based on YOLO-IDOD is proposed, built around a Dynamic Attention Module (DAM) and a Channel Attention Convolution (CACONV) module. YOLOv12s is adopted as the base network architecture. First, a dynamic attention module is introduced at the input stage: an optical flow network computes short-term optical flow features, suppressing background motion noise so that the network focuses on the motion characteristics of actual targets, improving detection accuracy. Second, a channel attention convolution module is embedded in the network architecture; it applies a channel attention mechanism to both its input and output channels, enabling the network to better represent and attend to the features passed in from the DAM. Finally, both modules are designed as plug-and-play components for optimizing the dynamic target detection model, giving the network spatiotemporal aggregation and feature selection capabilities and improving its generalization for infrared dynamic target detection. Experimental results show that the improved YOLO-IDOD model achieves a precision of 79.9%, a recall of 62.5%, an mAP@50 of 77.7%, and an mAP@95 of 57.3% on a mixed dataset combining the self-built IRDA dataset and the public FLIR_ADAS_v2 dataset. Compared with the YOLOv12s baseline, precision improves by 5.2 percentage points, mAP@50 by 4.6 percentage points, and mAP@95 by 2.4 percentage points while recall is maintained, effectively enhancing detection accuracy and generalization for dynamic targets.
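The abstract describes the CACONV module only at a high level: a channel attention mechanism is applied to both the input and output channels of a convolution. As a rough, hypothetical sketch (not the authors' implementation), the per-channel gating step can be illustrated with an SE-style squeeze-and-excitation gate in plain NumPy; the reduction ratio `r`, the weight shapes, and the function name are assumptions for illustration only.

```python
import numpy as np

def channel_attention(x, w1, w2):
    """SE-style channel gate for a feature map x of shape (C, H, W)."""
    s = x.mean(axis=(1, 2))                  # squeeze: global average pool over H, W -> (C,)
    h = np.maximum(w1 @ s, 0.0)              # excitation FC1 + ReLU -> (C // r,)
    a = 1.0 / (1.0 + np.exp(-(w2 @ h)))      # excitation FC2 + sigmoid -> per-channel weights in (0, 1)
    return x * a[:, None, None]              # reweight each channel of x

rng = np.random.default_rng(0)
C, r = 8, 2                                  # channel count and reduction ratio (assumed)
x = rng.standard_normal((C, 4, 4))           # one feature map
w1 = 0.1 * rng.standard_normal((C // r, C))  # FC1 weights (assumed random for the demo)
w2 = 0.1 * rng.standard_normal((C, C // r))  # FC2 weights
y = channel_attention(x, w1, w2)
print(y.shape)  # (8, 4, 4): same shape, channels rescaled by learned attention weights
```

In a CACONV-like layer, such a gate would be applied once to the convolution's input channels and once to its output channels, so uninformative channels from the DAM features are attenuated before and after the convolution.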

References:

[1] Xu Huilin, Zhao Xin, Yu Bo, et al. Multi-resolution feature extraction algorithm for semantic segmentation of infrared images[J]. Infrared Technology, 2024, 46(5): 556-564.
[2] Li Yuanbo, Zhou Ping, Zhou Gongbo, et al. A comprehensive survey of visible and infrared imaging in complex environments: principle, degradation and enhancement[J]. Information Fusion, 2025, 119: 103036.
[3] Chen Tianxiang, Ye Zi, Tan Zhentao, et al. MiM-ISTD: mamba-in-mamba for efficient infrared small-target detection[J]. IEEE Transactions on Geoscience and Remote Sensing, 2024, 62: 500713.
[4] Bahdanau D, Cho K, Bengio Y. Neural machine translation by jointly learning to align and translate[EB/OL]. (2014-09-01)[2025-11-10]. https://doi.org/10.48550/arXiv.1409.0473.
[5] Ye Baicheng, Zhu Youpan, Zhou Yongkang, et al. Review of lightweight target detection algorithms[J]. Infrared Technology, 2025, 47(3): 289-298.
[6] Guo Haofan, Jiao Ting, Sun Fangliang, et al. Real-time infrared imaging gas-leak detection method based on improved YOLOv5-seg[J]. Infrared Technology, 2025, 47(7): 918-927.
[7] Dai Yimian, Wu Yiquan, Zhou Fei, et al. Attentional local contrast networks for infrared small target detection[J]. IEEE Transactions on Geoscience and Remote Sensing, 2021, 59(11): 9813-9824.
[8] Yue Taoran, Li Xiaojin, Cai Jiaxi, et al. YOLO-MST: multiscale deep learning method for infrared small target detection based on super-resolution and YOLO[J]. Optics & Laser Technology, 2025, 187: 112835.
[9] Wang Quan, Liu Fengyuan, Cao Yi, et al. LFIR-YOLO: lightweight model for infrared vehicle and pedestrian detection[J]. Sensors, 2024, 24(20): 7197.
[10] Sun Mingyuan, Zhang Haochun, Huang Ziliang, et al. Road infrared target detection with I-YOLO[J]. IET Image Processing, 2022, 16(1): 92-101.
[11] Ling Song, Hong Xianggong, Liu Yongchao. YOLO-APDM: improved YOLOv8 for road target detection in infrared images[J]. Sensors, 2024, 24(22): 7197.
[12] Sohan M, Sai Ram T, Rami Reddy C V. A review on YOLOv8 and its advancements[C]//Data intelligence and cognitive informatics. Singapore: Springer Nature Singapore, 2024: 529-545.
[13] Wang Yong, Wang Bairong, Huo Lile, et al. GT-YOLO: nearshore infrared ship detection based on infrared images[J]. Journal of Marine Science and Engineering, 2024, 12(2): 213.
[14] Zhao Xiaofeng, Zhang Wenwen, Zhang Hui, et al. ITD-YOLOv8: an infrared target detection model based on YOLOv8 for unmanned aerial vehicles[J]. Drones, 2024, 8(4): 161.
[15] Hao Xinyue, Luo Shaojuan, Chen Meiyun, et al. Infrared small target detection with super-resolution and YOLO[J]. Optics & Laser Technology, 2024, 177: 11221.
[16] Tian Yunjie, Ye Qixiang, Doermann D, et al. YOLOv12: attention-centric real-time object detectors[PP/OL]. V1. arXiv (2025-02-18)[2025-11-10]. https://doi.org/10.48550/arXiv.2502.12524.
[17] Redmon J, Divvala S, Girshick R, et al. You only look once: unified, real-time object detection[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2016: 779-787.
[18] Liu Wei, Anguelov D, Erhan D, et al. SSD: single shot MultiBox detector[C]//Computer vision - ECCV 2016. Cham: Springer International Publishing, 2016: 21-37.
[19] Chen Yuming, Yuan Xinbin, Wang Jiabao, et al. YOLO-MS: rethinking multi-scale representation learning for real-time object detection[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025, 47(6): 4240-4252.
[20] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks[J]. Communications of the ACM, 2017, 60(6): 84-90.
[21] Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need[PP/OL]. V7. arXiv (2023-08-02)[2025-11-10]. https://doi.org/10.48550/arXiv.1706.03762.
[22] Zhao Yan, Lv Wenyu, Xu Shangliang, et al. DETRs beat YOLOs on real-time object detection[C]//Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2024: 16965-16974.
[23] Lv Wenyu, Zhao Yan, Chang Qinyao, et al. RT-DETRv2: improved baseline with bag-of-freebies for real-time detection transformer[PP/OL]. (2024-07-24)[2025-11-10]. https://doi.org/10.48550/arXiv.2407.17140.
[24] Hui T W, Tang Xiaoou, Loy C C. A lightweight optical flow CNN—revisiting data fidelity and regularization[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(8): 2555-2569.
[25] Lin Min, Chen Qiang, Yan Shuicheng. Network in network[PP/OL]. (2014-03-04)[2025-11-10]. https://doi.org/10.48550/arXiv.1312.4400.
[26] Hu Jie, Shen Li, Sun Gang. Squeeze-and-excitation networks[C]//Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 7132-7141.
[27] Uzun E, Dursun A A, Akagündüz E. Augmentation of atmospheric turbulence effects on thermal adapted object detection models[C]//Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Piscataway: IEEE, 2022: 240-247.
[28] Gower R M, Loizou N, Qian Xun, et al. SGD: general analysis and improved rates[PP/OL]. V4. arXiv (2019-05-01)[2025-11-10]. https://doi.org/10.48550/arXiv.1901.09401.
[29] Khanam R, Hussain M. YOLOv11: an overview of the key architectural enhancements[PP/OL]. (2024-10-23)[2025-11-10]. https://doi.org/10.48550/arXiv.2410.17725.
[30] Chen Yuming, Yuan Xinbin, Wang Jiabao, et al. YOLO-MS: rethinking multi-scale representation learning for real-time object detection[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025, 47(6): 4240-4252.
[31] Li Shuang, Han Bingfeng, Yu Zhenjie, et al. I2V-GAN: unpaired infrared-to-visible video translation[C]//Proceedings of the 29th ACM International Conference on Multimedia. New York: ACM, 2021: 3061-3069.

Memo

Received: 2025-11-12; Revised: 2026-01-28
Funding: supported by the National Natural Science Foundation of China (62306279) and the Natural Science Foundation of Anhui Province (2208085ME128)
About the author: ZHAO Xin (1991-), male, from Yuncheng, Shanxi; lecturer and Ph.D. at Anhui University of Science and Technology; main research interests: infrared image processing and segmentation, and semantic fusion. E-mail: zhaoxin@aust.edu.cn
Last Update: 2026-04-03