HAN Huijian, XING Huaiyu, ZHANG Yunfeng, et al. Visual Detection of Steel Surface Defects Based on Transformer and Multi-attention[J]. Journal of Zhengzhou University (Engineering Science), 2025, 46(05): 69-76. [doi:10.13705/j.issn.1671-6833.2025.05.009]

Visual Detection of Steel Surface Defects Based on Transformer and Multi-attention

Journal of Zhengzhou University (Engineering Science) [ISSN:1671-6833/CN:41-1339/T]

Volume:
46
Issue:
2025, No. 05
Pages:
69-76
Section:
Publication date:
2025-08-10

Article Info

Title:
Visual Detection of Steel Surface Defects Based on Transformer and Multi-attention
Article number:
1671-6833(2025)05-0069-08
Author(s):
HAN Huijian (韩慧健), XING Huaiyu (邢怀宇), ZHANG Yunfeng (张云峰), ZHANG Rui (张锐)
School of Computer Science and Technology, Shandong University of Finance and Economics, Jinan 250014, China
Keywords:
defect detection; attention mechanism; Transformer; hybrid sampling; DETR
CLC number:
TP391; TP18
DOI:
10.13705/j.issn.1671-6833.2025.05.009
Document code:
A
Abstract:
To address the widely varying scales of steel surface defects and the limited multi-scale feature processing capability and accuracy of existing detection algorithms, this study proposed a steel surface defect detection method that combines hybrid sampling with multi-attention collaboration. First, an efficient channel feature extraction backbone was constructed to emphasize defect features against the complex background of steel surfaces. Second, a dual-attention collaborative feature pyramid was introduced to expand the network's receptive field, better capturing multi-scale defect features and improving detection performance on small targets. Finally, a Transformer-based hybrid sampling strategy was designed to dynamically perceive defect regions and improve the overall detection performance of the model. Experiments on the NEU-DET dataset showed that, compared with the baseline DETR algorithm, the improved algorithm raised the mean average precision (mAP) by 6.1 percentage points to 81.4%, improving the accuracy of steel surface defect detection. In addition, at a detection speed of 44.2 frames/s, the proposed algorithm strikes a good balance between detection speed and detection performance.
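The abstract reports its accuracy gain in percentage points, which is easy to confuse with a relative gain. The short Python sketch below works through the reported figures: the DETR baseline mAP implied by simple subtraction, the same gain expressed relative to that baseline, and the average per-image time implied by the frame rate. These derived values are arithmetic on the abstract's numbers, not results reported by the authors, and reading latency as 1000 / FPS assumes sequential single-image inference.

```python
# Figures reported in the abstract (NEU-DET dataset).
improved_map = 81.4  # mAP of the proposed method, in %
gain_pp = 6.1        # improvement over the DETR baseline, in percentage points
fps = 44.2           # detection speed, in frames per second


def implied_baseline_map(final_map: float, gain_pp: float) -> float:
    """Baseline mAP implied by a percentage-point gain (plain subtraction)."""
    return round(final_map - gain_pp, 1)


def relative_gain_percent(final_map: float, gain_pp: float) -> float:
    """The same gain expressed relative to the implied baseline, in percent."""
    baseline = final_map - gain_pp
    return round(100.0 * gain_pp / baseline, 1)


def ms_per_frame(frames_per_second: float) -> float:
    """Average per-image time implied by a frame rate (assumes sequential inference)."""
    return round(1000.0 / frames_per_second, 1)


if __name__ == "__main__":
    print(f"implied DETR baseline mAP: {implied_baseline_map(improved_map, gain_pp)}%")
    print(f"relative improvement: {relative_gain_percent(improved_map, gain_pp)}%")
    print(f"average time per frame: {ms_per_frame(fps)} ms")
```

Run as-is, it prints an implied baseline of 75.3%, a relative improvement of about 8.1%, and roughly 22.6 ms per frame. A 6.1-point gain on a 75.3% baseline is therefore an 8.1% relative improvement, a larger-sounding number for the same result.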
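The first step of the method builds an efficient channel feature extraction backbone, and ECA-Net [14] appears in the reference list. The abstract does not say how the authors' module is built, so what follows is only a minimal pure-Python sketch of the published ECA-Net channel attention for illustration, not the paper's backbone. Assumptions: a single C x H x W feature map stored as nested lists (no batch axis), a 1D kernel passed in by hand rather than learned, and the ECA-Net kernel-size rule with gamma = 2 and b = 1. The names `eca` and `eca_kernel_size` are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]  # one channel: H rows of W values


def eca_kernel_size(channels: int, gamma: int = 2, b: int = 1) -> int:
    """ECA-Net adaptive kernel size: an odd value near (log2(C) + b) / gamma."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 == 1 else t + 1


def eca(feature: List[Matrix], weights: List[float]) -> List[Matrix]:
    """Efficient channel attention on a C x H x W feature map.

    `weights` stands in for the learned 1D convolution kernel (odd length k).
    """
    k = len(weights)
    if k % 2 == 0:
        raise ValueError("kernel length must be odd")
    pad = (k - 1) // 2
    # 1) Global average pooling: one descriptor per channel.
    pooled = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature]
    c = len(pooled)
    # 2) 1D convolution (cross-correlation) across neighbouring channels with
    #    zero padding: local cross-channel interaction, no dimensionality reduction.
    conv = []
    for i in range(c):
        acc = 0.0
        for j, w in enumerate(weights):
            idx = i + j - pad
            if 0 <= idx < c:
                acc += w * pooled[idx]
        conv.append(acc)
    # 3) Sigmoid turns each response into a per-channel gate in (0, 1).
    gates = [1.0 / (1.0 + math.exp(-v)) for v in conv]
    # 4) Rescale every element of each channel by that channel's gate.
    return [[[g * x for x in row] for row in ch] for g, ch in zip(gates, feature)]


if __name__ == "__main__":
    for channels in (64, 128, 256, 512):
        print(channels, eca_kernel_size(channels))  # 3, 5, 5, 5
```

ECA replaces the fully connected bottleneck of squeeze-and-excitation attention [17] with a single 1D convolution over neighbouring channels, so each attention layer adds only k weights and avoids reducing the channel dimension.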

References:

[1]REDMON J, FARHADI A. YOLOv3: an incremental improvement[EB/OL]. (2018-04-08)[2025-02-08]. https://doi.org/10.48550/arXiv.1804.02767.
[2]BOCHKOVSKIY A, WANG C Y, LIAO H M. YOLOv4: optimal speed and accuracy of object detection[EB/OL]. (2020-04-23)[2025-02-08]. https://doi.org/10.48550/arXiv.2004.10934.
[3]REN S Q, HE K M, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149.
[4]LIU Z, HU H, LIN Y T, et al. Swin Transformer V2: scaling up capacity and resolution[C]//2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 11999-12009.
[5]FERGUSON M K, RONAY A, LEE Y T, et al. Detection and segmentation of manufacturing defects with convolutional neural networks and transfer learning[J]. Smart and Sustainable Manufacturing Systems, 2018, 2(1): 137-164.
[6]FU G Z, ZHANG Z G, LE W W, et al. A multi-scale pooling convolutional neural network for accurate steel surface defects classification[J]. Frontiers in Neurorobotics, 2023, 17: 1096083.
[7]HE Y, SONG K C, MENG Q G, et al. An end-to-end steel surface defect detection approach via fusing multiple hierarchical features[J]. IEEE Transactions on Instrumentation and Measurement, 2020, 69(4): 1493-1504.
[8]LIU R Q, HUANG M, GAO Z M, et al. MSC-DNet: an efficient detector with multi-scale context for defect detection on strip steel surface[J]. Measurement, 2023, 209: 112467.
[9]CARION N, MASSA F, SYNNAEVE G, et al. End-to-end object detection with Transformers[C]//Computer Vision - ECCV 2020. Cham: Springer, 2020: 213-229.
[10]ZHU X Z, SU W J, LU L W, et al. Deformable DETR: deformable transformers for end-to-end object detection[EB/OL]. (2020-10-08)[2025-02-08]. https://doi.org/10.48550/arXiv.2010.04159.
[11]LIU S L, LI F, ZHANG H, et al. DAB-DETR: dynamic anchor boxes are better queries for DETR[EB/OL]. (2022-06-28)[2025-02-08]. https://doi.org/10.48550/arXiv.2201.12329.
[12]LI F, ZHANG H, LIU S, et al. DN-DETR: accelerate DETR training by introducing query denoising[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024, 46(4): 2239-2251.
[13]ZHANG H, LI F, LIU S L, et al. DINO: DETR with improved DeNoising anchor boxes for end-to-end object detection[EB/OL]. (2022-03-07)[2025-02-08]. https://doi.org/10.48550/arXiv.2203.03605.
[14]WANG Q L, WU B G, ZHU P F, et al. ECA-Net: efficient channel attention for deep convolutional neural networks[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 11531-11539.
[15]LIN T Y, MAIRE M, BELONGIE S, et al. Microsoft COCO: common objects in context[EB/OL]. (2014-05-01)[2025-02-08]. https://doi.org/10.48550/arXiv.1405.0312.
[16]LIN T Y, DOLLÁR P, GIRSHICK R, et al. Feature pyramid networks for object detection[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 936-944.
[17]HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 7132-7141.
[18]XIAO J S, ZHAO T, ZHOU J, et al. Small target detection network based on context augmentation and feature refinement[J]. Journal of Computer Research and Development, 2023, 60(2): 465-474.
[19]WEI M J, WANG M H, LIU Y Z, et al. Small object detection based on feature fusion and mixed attention[J]. Journal of Zhengzhou University (Engineering Science), 2024, 45(3): 72-79.
[20]XUE J X, WU X C, WANG S H, et al. A method on mask wearing detection of natural population based on improved YOLOv4[J]. Journal of Zhengzhou University (Engineering Science), 2022, 43(4): 16-22.
[21]WANG C Y, BOCHKOVSKIY A, LIAO H M. YOLOv7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors[C]//2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2023: 7464-7475.
[22]REIS D, KUPEC J, HONG J, et al. Real-time flying object detection with YOLOv8[EB/OL]. (2023-05-17)[2025-02-08]. https://doi.org/10.48550/arXiv.2305.09972.
[23]WANG A, CHEN H, LIU L H, et al. YOLOv10: real-time end-to-end object detection[EB/OL]. (2024-05-23)[2025-02-08]. https://doi.org/10.48550/arXiv.2405.14458.
[24]ROH B, SHIN J, SHIN W, et al. Sparse DETR: efficient end-to-end object detection with learnable sparsity[EB/OL]. (2021-11-29)[2025-02-08]. https://doi.org/10.48550/arXiv.2111.14330.
[25]CHENG X, YU J B. RetinaNet with difference channel attention and adaptively spatial feature fusion for steel surface defect detection[J]. IEEE Transactions on Instrumentation and Measurement, 2020, 70: 2503911.

Similar Articles:

[1]ZHANG Zhen, CHEN Kexin, CHEN Yunfei. YOLOv5 with Optimized Clustering and CBAM for Controlled Knife Detection[J]. Journal of Zhengzhou University (Engineering Science), 2023, 44(05): 40. [doi:10.13705/j.issn.1671-6833.2022.05.015]
[2]CUI Jianming, LIN Fanrong, ZHANG Di, et al. Reinforcement Learning Autonomous Driving Trajectory Prediction Based on Directed Graph[J]. Journal of Zhengzhou University (Engineering Science), 2023, 44(05): 53. [doi:10.13705/j.issn.1671-6833.2023.05.002]
[3]LI Weijun, ZHANG Xinyong, GAO Yuxiao, et al. Video Frame Prediction Model Based on Gated Spatio-Temporal Attention[J]. Journal of Zhengzhou University (Engineering Science), 2024, 45(01): 70. [doi:10.13705/j.issn.1671-6833.2024.01.017]
[4]WANG Yu, BI Yu, SHI Jiantong, et al. YOLOv5 Algorithm Based on Attention and Multi-level Feature Fusion[J]. Journal of Zhengzhou University (Engineering Science), 2024, 45(03): 38. [doi:10.13705/j.issn.1671-6833.2023.06.009]
[5]WEI Mingjun, WANG Mohan, LIU Yazhi, et al. Small Object Detection Based on Feature Fusion and Mixed Attention[J]. Journal of Zhengzhou University (Engineering Science), 2024, 45(03): 72. [doi:10.13705/j.issn.1671-6833.2024.03.001]
[6]LIAO Xiaohui, XIE Zichen, XIN Zhongliang, et al. Electrical Equipment External Defect Detection Based on Lightweight YOLOv5[J]. Journal of Zhengzhou University (Engineering Science), 2024, 45(04): 117. [doi:10.13705/j.issn.1671-6833.2024.04.010]
[7]LIN Nan, TANG Kaipeng, NIU Yongpeng, et al. An ECG Denoising and Classification Algorithm Based on Two-stage Feature Extraction Network[J]. Journal of Zhengzhou University (Engineering Science), 2024, 45(05): 61. [doi:10.13705/j.issn.1671-6833.2024.05.005]
[8]LIN Yusong, LI Mengya, LI Yinghao, et al. Multimodal Medical Image Fusion Based on GAN and Multiscale Spatial Attention[J]. Journal of Zhengzhou University (Engineering Science), 2025, 46(01): 1. [doi:10.13705/j.issn.1671-6833.2025.01.001]
[9]ZHAO Dong, LI Yarui, WANG Wenxiang, et al. Power Load Missing Data Imputation Model Based on Dynamic Fusion Attention Mechanism[J]. Journal of Zhengzhou University (Engineering Science), 2025, 46(02): 111. [doi:10.13705/j.issn.1671-6833.2024.05.004]
[10]YAN Yu, JING Yuchao, SHI Mengxiang, et al. Steel Surface Defect Detection Based on Improved YOLOv5 Algorithm[J]. Journal of Zhengzhou University (Engineering Science), 2025, 46(04): 93. [doi:10.13705/j.issn.1671-6833.2025.01.007]

Last Update: 2025-09-19