Visual Detection of Steel Surface Defects Based on Transformer and Multi-attention

[1] HAN Huijian, XING Huaiyu, ZHANG Yunfeng, et al. Visual Detection of Steel Surface Defects Based on Transformer and Multi-attention[J]. Journal of Zhengzhou University (Engineering Science), 2025, 46(05): 69-76. [doi:10.13705/j.issn.1671-6833.2025.05.009]

Journal of Zhengzhou University (Engineering Science) [ISSN 1671-6833/CN 41-1339/T] Volume: 46 Issue: 2025, No. 05 Pages: 69-76 Publication date: 2025-08-10

Title:: Visual Detection of Steel Surface Defects Based on Transformer and Multi-attention

Author(s):: HAN Huijian; XING Huaiyu; ZHANG Yunfeng; ZHANG Rui; School of Computer Science and Technology, Shandong University of Finance and Economics, Jinan 250014, China

Keywords:: defect detection; attention mechanism; Transformer; hybrid sampling; DETR

CLC:: TP391; TP18

DOI:: 10.13705/j.issn.1671-6833.2025.05.009

Abstract:: Steel surface defects vary widely in scale, and existing detection algorithms have limited multi-scale feature processing capability and accuracy. To address these challenges, this study proposes a steel surface defect detection method that integrates hybrid sampling and multi-attention collaboration. First, an efficient channel feature extraction backbone is constructed to emphasize defect features against the complex background of steel surfaces. Second, a dual-attention collaborative feature pyramid is introduced to expand the network's receptive field, enhancing the capture of multi-scale defect features and improving detection performance for small targets. Finally, a Transformer-based hybrid sampling strategy is designed to dynamically perceive defect regions, boosting the overall detection performance of the model. In experiments on the NEU-DET dataset, the improved algorithm reaches a mean average precision of 81.4%, a gain of 6.1 percentage points over the baseline DETR algorithm. With a detection speed of 44.2 frames/s, the proposed algorithm balances detection speed and accuracy.
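
The figures reported in the abstract can be unpacked with a little arithmetic. The following is a minimal sketch, not part of the paper: the function names are hypothetical, and the baseline mean average precision is not stated in the abstract but is only implied by subtracting the reported 6.1 percentage point gain from the reported 81.4%.

```python
# Figures quoted in the abstract (NEU-DET dataset).
PROPOSED_MAP = 81.4      # mean average precision of the improved model, in %
GAIN_OVER_DETR = 6.1     # reported gain over baseline DETR, in percentage points
FPS = 44.2               # reported detection speed, in frames per second


def implied_baseline_map(proposed: float, gain_pp: float) -> float:
    """Baseline mAP implied by the reported gain (an inference, not a quoted value)."""
    return round(proposed - gain_pp, 1)


def relative_gain_percent(proposed: float, baseline: float) -> float:
    """Relative improvement over the baseline, in percent of the baseline mAP."""
    return round(100.0 * (proposed - baseline) / baseline, 1)


def latency_ms(fps: float) -> float:
    """Average per-frame processing time in milliseconds at a given frame rate."""
    return round(1000.0 / fps, 1)


if __name__ == "__main__":
    baseline = implied_baseline_map(PROPOSED_MAP, GAIN_OVER_DETR)
    print(f"implied DETR baseline mAP: {baseline}%")
    print(f"relative gain: {relative_gain_percent(PROPOSED_MAP, baseline)}%")
    print(f"per-frame latency: {latency_ms(FPS)} ms")
```

Under these assumptions, the implied DETR baseline is 75.3% mean average precision, the relative gain is about 8.1%, and 44.2 frames/s corresponds to roughly 22.6 ms per frame. The distinction matters when comparing papers: a "6.1 percentage point" gain is an absolute difference, not a 6.1% relative improvement.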

Last Update: 2025-09-19