Small Object Detection Based on Feature Fusion and Mixed Attention

Journal of Zhengzhou University (Engineering Science) [ISSN: 1671-6833 / CN: 41-1339/T]

Volume:: 45
Issue:: 2024, No. 03

Pages:: 72-79

Publication date:: 2024-04-20

Article Info

Title:: Small Object Detection Based on Feature Fusion and Mixed Attention

Article ID:: 1671-6833(2024)03-0072-08

Author(s):: WEI Mingjun (魏明军)¹,²; WANG Mohan (王镆涵)¹; LIU Yazhi (刘亚志)¹,²; LI Hui (李辉)¹
1. College of Artificial Intelligence, North China University of Science and Technology, Tangshan 063210, China
2. Hebei Provincial Key Laboratory of Industrial Intelligent Perception, North China University of Science and Technology, Tangshan 063210, China

Keywords:: small target detection; attention mechanism; feature fusion; deep learning; real-time detection

CLC number:: TP391.4; TP18

DOI:: 10.13705/j.issn.1671-6833.2024.03.001

Document code:: A

Abstract:: To address the insufficient feature information of small targets, the low detection rate, and the high false-detection and missed-detection rates in object detection tasks, a Tr-SSD algorithm based on multiscale feature fusion and a hybrid attention mechanism is proposed. First, the ResNet50 residual network is used as the backbone of the SSD algorithm to strengthen its feature extraction. Second, a hybrid attention mechanism is designed and applied to the mid-scale feature maps of the network to enhance the effective information in those maps and to establish long-range dependencies. Finally, network layers built around a Transformer are combined with the backbone-replaced SSD algorithm to form an FPN (feature pyramid network) structure that fuses feature information at different scales and locates small targets more accurately. Experimental results show that Tr-SSD achieves mAP values of 81.9%, 87.5%, and 88.4% on the PASCAL VOC dataset, the HRSID dataset, and the RSOD remote sensing dataset, respectively, which are 4.7, 6.8, and 9.2 percentage points higher than the SSD algorithm, and that the detection speed meets the requirements of real-time detection on all three datasets.
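
As a quick check on the reported figures, the SSD baseline mAP implied by the abstract can be recovered by subtracting each improvement from the corresponding Tr-SSD value. The sketch below is only arithmetic on the numbers quoted in the abstract; the names `REPORTED` and `implied_ssd_baseline` are illustrative, and the baselines are derived here rather than stated in the abstract.

```python
from decimal import Decimal

# (Tr-SSD mAP in %, gain over SSD in percentage points), as quoted in the abstract.
REPORTED = {
    "PASCAL VOC": (Decimal("81.9"), Decimal("4.7")),
    "HRSID": (Decimal("87.5"), Decimal("6.8")),
    "RSOD": (Decimal("88.4"), Decimal("9.2")),
}


def implied_ssd_baseline(reported):
    """Return the SSD mAP implied by each (Tr-SSD mAP, gain) pair."""
    return {name: tr_ssd - gain for name, (tr_ssd, gain) in reported.items()}


baselines = implied_ssd_baseline(REPORTED)
for name, value in baselines.items():
    print(f"{name}: implied SSD mAP = {value}%")
```

Decimal arithmetic is used so that the one-decimal figures subtract exactly, without binary floating-point rounding noise.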

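Memo

Memo:: Received: 2023-09-20; Revised: 2023-10-19
Funding: Key R&D Project of the Ministry of Science and Technology (2017YFE0135700); Science and Technology Research Project of Higher Education Institutions of Hebei Province (ZD2022102)
About the author: WEI Mingjun (born 1969), male, from Tangshan, Hebei; professor at North China University of Science and Technology; research interests include computer vision, intrusion detection, machine learning, and data mining. E-mail: weimj@ncst.edu.cn.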

Last Update: 2024-04-29
