[1] WEI Mingjun, WANG Mohan, LIU Yazhi, et al. Small Object Detection Based on Feature Fusion and Mixed Attention[J]. Journal of Zhengzhou University (Engineering Science), 2024, 45(03): 72-79. [doi:10.13705/j.issn.1671-6833.2024.03.001]

Small Object Detection Based on Feature Fusion and Mixed Attention

Journal of Zhengzhou University (Engineering Science) [ISSN: 1671-6833 / CN: 41-1339/T]

Volume:
45
Issue:
No. 03, 2024
Pages:
72-79
Column:
Publication date:
2024-04-20

Article Info

Title:
Small Object Detection Based on Feature Fusion and Mixed Attention
Article ID:
1671-6833(2024)03-0072-08
Author(s):
WEI Mingjun 1,2; WANG Mohan 1; LIU Yazhi 1,2; LI Hui 1
1. College of Artificial Intelligence, North China University of Science and Technology, Tangshan 063210, China; 2. Hebei Provincial Key Laboratory of Industrial Intelligent Perception, North China University of Science and Technology, Tangshan 063210, China
Keywords:
small target detection; attention mechanism; feature fusion; deep learning; real-time detection
CLC number:
TP391.4; TP18
DOI:
10.13705/j.issn.1671-6833.2024.03.001
Document code:
A
Abstract:
To address the insufficient feature information, low detection rate, and high false-detection and missed-detection rates of small objects in object detection tasks, a Tr-SSD algorithm based on multi-scale feature fusion and a hybrid attention mechanism is proposed. First, the ResNet50 residual network is used as the backbone of the SSD algorithm to strengthen its feature extraction capability. Second, a hybrid attention mechanism is designed and applied to the mid-scale feature maps of the network to enhance the effective information in those feature maps and to establish long-range dependencies between pieces of information. Finally, network layers built around the Transformer are combined with the backbone-replaced SSD algorithm to form an FPN (feature pyramid network) structure, which fuses feature information at different scales to locate small objects more accurately. Experimental results show that the Tr-SSD algorithm achieves mAP values of 81.9%, 87.5%, and 88.4% on the PASCAL VOC dataset, the HRSID dataset, and the RSOD remote sensing dataset, respectively, which are 4.7, 6.8, and 9.2 percentage points higher than those of the SSD algorithm, and that the detection speed meets the requirements of real-time detection on all three datasets.
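The gains in the abstract are given in percentage points, so the SSD baseline on each dataset can be recovered by simple subtraction. The sketch below does that arithmetic and also shows the corresponding relative gain, which is a different quantity. The baseline and relative figures are derived here from the reported numbers and are not stated in the source; the function names are illustrative only.

```python
from decimal import Decimal

# Reported Tr-SSD mAP (%) and gain over SSD (percentage points), from the abstract.
REPORTED = {
    "PASCAL VOC": (Decimal("81.9"), Decimal("4.7")),
    "HRSID": (Decimal("87.5"), Decimal("6.8")),
    "RSOD": (Decimal("88.4"), Decimal("9.2")),
}


def implied_baseline(tr_ssd_map, gain_points):
    """SSD mAP implied by a Tr-SSD mAP and a gain given in percentage points."""
    return tr_ssd_map - gain_points


def relative_gain_percent(tr_ssd_map, gain_points):
    """Gain relative to the implied SSD baseline, in percent (not percentage points)."""
    baseline = implied_baseline(tr_ssd_map, gain_points)
    return gain_points / baseline * 100


if __name__ == "__main__":
    for dataset, (tr_ssd_map, gain) in REPORTED.items():
        baseline = implied_baseline(tr_ssd_map, gain)
        relative = relative_gain_percent(tr_ssd_map, gain)
        print(f"{dataset}: SSD {baseline}% -> Tr-SSD {tr_ssd_map}% "
              f"(+{gain} points, about {relative:.1f}% relative)")
```

Under these figures the implied SSD baselines are 77.2%, 80.7%, and 79.2% mAP. `Decimal` is used instead of `float` so that the one-decimal subtractions are exact.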

Memo:
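Received: 2023-09-20; Revised: 2023-10-19
Funding: Key R&D Program of the Ministry of Science and Technology (2017YFE0135700); Science and Technology Research Project of Higher Education Institutions of Hebei Province (ZD2022102)
About the author: WEI Mingjun (b. 1969), male, from Tangshan, Hebei; professor at North China University of Science and Technology. His research interests include computer vision, intrusion detection, machine learning, and data mining. E-mail: weimj@ncst.edu.cn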
Last Update: 2024-04-29