Steel Surface Defect Detection Algorithm Based on MHTD-YOLO11n

doi:10.16088/j.issn.1001-6600.2025073101

Abstract

Abstract: Steel surface defect detection is complicated by diverse defect morphologies, complex structures, a high proportion of small targets, and interference from complex environments, while existing defect detection models tend to be structurally complex and parameter-heavy, with poor detection accuracy and real-time performance. To address these problems, this study proposes MHTD-YOLO11n, a lightweight and efficient steel defect detection algorithm based on YOLO11n. First, a multi-scale grouped dilated convolution (MSGDC) module is introduced, which integrates grouped convolutions with different dilation rates to fuse multi-scale features and improve detection across defect types. Second, a hierarchical reciprocal attention mixer (H-RAMi) module is incorporated to compensate for the pixel-level information lost through feature downsampling. Third, a C2PSA_TPA module is designed that uses tensor product attention (TPA) to substantially reduce the KV cache size during inference. Finally, the feature interaction module is rebuilt as C3K2_DFF so that the network can combine multi-scale information effectively under a larger receptive field, improving both detection accuracy and speed. Experimental results show that, compared with YOLO11n, MHTD-YOLO11n raises mAP and recall by 4.3 and 9.1 percentage points, respectively, reaches a detection speed of 258.3 frame/s, and reduces the parameter count and computational cost by 1.42×10⁶ and 3.4×10⁹, respectively, meeting the dual requirements of high accuracy and real-time performance in industrial quality inspection.

Key words: computer image processing, steel surface defects, defect detection, object detection, YOLO11n, attention mechanism
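
The MSGDC idea described in the abstract (grouped convolutions with different dilation rates, fused into one multi-scale feature) can be illustrated with a toy 1-D sketch in plain Python. This is not the authors' module: the function names, the all-ones kernel, the dilation rates (1, 2, 3), and fusion by simple concatenation are assumptions chosen only to show how each dilation rate widens the span a 3-tap kernel covers.

```python
def effective_kernel_size(kernel_size, dilation):
    """Span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal, kernel, dilation):
    """Zero-padded, same-length 1-D dilated cross-correlation.

    Assumes an odd kernel length so the padding is symmetric.
    """
    k = len(kernel)
    pad = (effective_kernel_size(k, dilation) - 1) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [
        sum(kernel[j] * padded[i + j * dilation] for j in range(k))
        for i in range(len(signal))
    ]


def msgdc_toy(channels, kernel, dilations):
    """Hypothetical stand-in for a multi-scale grouped dilated convolution.

    Channels are split into len(dilations) equal groups, each group is
    filtered with its own dilation rate, and the results are concatenated
    along the channel axis.
    """
    n_groups = len(dilations)
    if len(channels) % n_groups != 0:
        raise ValueError("channel count must be divisible by the number of groups")
    group_size = len(channels) // n_groups
    fused = []
    for g, dilation in enumerate(dilations):
        for channel in channels[g * group_size:(g + 1) * group_size]:
            fused.append(dilated_conv1d(channel, kernel, dilation))
    return fused


# One impulse per channel makes the footprint of each branch visible.
impulse = [0.0] * 4 + [1.0] + [0.0] * 4
outputs = msgdc_toy([impulse, impulse, impulse], [1.0, 1.0, 1.0], [1, 2, 3])
for dilation, out in zip([1, 2, 3], outputs):
    touched = [i for i, v in enumerate(out) if v != 0.0]
    print(dilation, effective_kernel_size(3, dilation), touched)
# prints:
# 1 3 [3, 4, 5]
# 2 5 [2, 4, 6]
# 3 7 [1, 4, 7]
```

Each branch still uses only three weights, so the wider span at higher dilation rates comes at no extra parameter cost; that property is what makes the approach attractive for a lightweight detector.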

CLC number: TP391.41

QIAN Junlei, WANG Xizhi, ZENG Kai, DU Xueqiang, LIU He, ZHU Liguang. Steel Surface Defect Detection Algorithm Based on MHTD-YOLO11n[J]. Journal of Guangxi Normal University (Natural Science Edition), 2026, 44(3): 60-74.
