LI Xuexiang, GAO Yafei, XIA Huili, et al. Backdoor Removal Method for Deep Neural Networks Based on Pruning and Backdoor Unlearning[J]. Journal of Zhengzhou University (Engineering Science), 2026, 47(02): 27-34. [doi:10.13705/j.issn.1671-6833.2025.05.018]

Backdoor Removal Method for Deep Neural Networks Based on Pruning and Backdoor Unlearning

Journal of Zhengzhou University (Engineering Science) [ISSN:1671-6833/CN:41-1339/T]

Volume:
47
Issue:
2026, No. 02
Pages:
27-34
Publication Date:
2026-02-13

Article Info

Title:
Backdoor Removal Method for Deep Neural Networks Based on Pruning and Backdoor Unlearning
Article ID:
1671-6833(2026)02-0027-08
Author(s):
LI Xuexiang1 GAO Yafei1 XIA Huili2 WANG Chao1 LIU Minglin1
1.School of Cyber Science and Engineering,Zhengzhou University, Zhengzhou 450002,China; 2.Henan Multimodal Perception and Intelligent Interaction Technology Engineering Research Center, Zhengzhou University of Economics and Business, Zhengzhou 451191,China
Keywords:
deep neural network; backdoor attack; backdoor defense; pre-activation distribution; adversarial backdoor unlearning
CLC Number:
TP309; TP181
DOI:
10.13705/j.issn.1671-6833.2025.05.018
Document Code:
A
Abstract:
Backdoor attacks pose a serious threat to the security of deep neural networks. Most existing backdoor defense methods rely on part of the original training data to remove backdoors from a model; however, in real-world scenarios where data access is limited, these methods perform poorly at eliminating backdoors and often significantly degrade the model′s original accuracy. To address these issues, this study proposes a data-free backdoor removal method based on pruning and backdoor unlearning (DBR-PU). The proposed method first analyzes the differences in the pre-activation distributions of model neurons on a synthetic dataset to identify suspicious neurons. It then reduces the backdoor′s impact on the model by pruning these suspicious neurons. Finally, an adversarial backdoor unlearning strategy is employed to further eliminate the model′s internal response to any residual backdoor information. Experiments on the CIFAR10 and GTSRB datasets against six mainstream backdoor attack methods demonstrate that, under data-access constraints, the proposed method achieves accuracy within a small gap of the best baseline defenses and performs best in reducing attack success rates.
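The first two stages described in the abstract (locating suspicious neurons via pre-activation distribution differences, then pruning them) can be sketched as follows. This is a minimal illustration, not the paper′s exact method: the skewness-based outlier score, the z-score threshold, and the array shapes are assumptions made for demonstration, and the third stage (adversarial backdoor unlearning) is omitted.

```python
import numpy as np

def find_suspicious_neurons(pre_acts, thresh=2.0):
    """Flag neurons whose pre-activation distribution deviates from the layer.

    pre_acts: array of shape (n_samples, n_neurons), pre-activation values
    collected on a synthetic dataset. Backdoor-related neurons tend to show
    skewed, heavy-tailed pre-activations; here each neuron is scored by the
    sample skewness of its pre-activations, and z-score outliers within the
    layer are flagged.
    """
    mean = pre_acts.mean(axis=0)
    std = pre_acts.std(axis=0) + 1e-8
    centered = (pre_acts - mean) / std
    skew = (centered ** 3).mean(axis=0)             # per-neuron sample skewness
    z = (skew - skew.mean()) / (skew.std() + 1e-8)  # compare against the layer
    return np.flatnonzero(np.abs(z) > thresh)

def prune_neurons(weight, bias, idx):
    """Disable flagged neurons by zeroing their incoming weights and biases."""
    w, b = weight.copy(), bias.copy()
    w[idx, :] = 0.0
    b[idx] = 0.0
    return w, b

# Demo: 31 benign neurons with roughly symmetric pre-activations, plus one
# simulated "backdoor" neuron (index 7) with a heavily right-skewed one.
rng = np.random.default_rng(0)
acts = rng.normal(size=(2000, 32))
acts[:, 7] = rng.exponential(size=2000)

flagged = find_suspicious_neurons(acts)
w, b = prune_neurons(rng.normal(size=(32, 16)), rng.normal(size=32), flagged)
```

In a real defense the pre-activations would be recorded with forward hooks on the backdoored network, and pruning would be followed by the unlearning stage to suppress residual backdoor responses.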

References:

[1]LUO R H, YUAN H, ZHONG F H, et al. Traffic jam detection based on convolutional neural network[J]. Journal of Zhengzhou University (Engineering Science), 2019, 40(2): 21-25.
[2]LI Y M, JIANG Y, LI Z F, et al. Backdoor learning: a survey[J]. IEEE Transactions on Neural Networks and Learning Systems, 2024, 35(1): 5-22.
[3]GU T Y, LIU K, DOLAN-GAVITT B, et al. BadNets: evaluating backdooring attacks on deep neural networks[J]. IEEE Access, 2019, 7: 47230-47244.
[4]NGUYEN A, TRAN A. WaNet: imperceptible warping-based backdoor attack[EB/OL]. (2021-02-20)[2025-08-16]. https://doi.org/10.48550/arXiv.2102.10369.
[5]BARNI M, KALLAS K, TONDI B. A new backdoor attack in CNNs by training set corruption without label poisoning[C]∥2019 IEEE International Conference on Image Processing (ICIP). Piscataway: IEEE, 2019: 101-105.
[6]TRAN B, LI J, MADRY A. Spectral signatures in backdoor attacks[EB/OL]. (2018-11-01)[2025-08-16]. https://doi.org/10.48550/arXiv.1811.00636.
[7]WU D X, WANG Y S. Adversarial neuron pruning purifies backdoored deep models[EB/OL]. (2021-10-27)[2025-08-16]. https://doi.org/10.48550/arXiv.2110.14430.
[8]ZENG Y, CHEN S, PARK W, et al. Adversarial unlearning of backdoors via implicit hypergradient[EB/OL]. (2021-10-07)[2025-08-16]. https://doi.org/10.48550/arXiv.2110.03735.
[9]ZHENG R K, TANG R J, LI J Z, et al. Pre-activation distributions expose backdoor neurons[J]. Advances in Neural Information Processing Systems, 2022, 35: 18667-18680.
[10] ZHANG X Y, ZHOU X Y, LIN M X, et al. ShuffleNet: an extremely efficient convolutional neural network for mobile devices[C]∥2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 6848-6856.
[11] CAI R, ZHANG Z Y, CHEN T L, et al. Randomized channel shuffling: minimal-overhead backdoor attack detection without clean datasets[J]. Advances in Neural Information Processing Systems, 2022, 35: 33876-33889.
[12] CHEN H T, WANG Y H, XU C, et al. Data-free learning of student networks[C]∥2019 IEEE/CVF International Conference on Computer Vision (ICCV). Piscataway:IEEE, 2019: 3514-3522.
[13] FANG G F, SONG J, SHEN C C, et al. Data-free adversarial distillation[EB/OL]. (2019-12-23)[2025-08-16]. https://arxiv.org/abs/1912.11006.
[14] SHI L C, JIAO Y Y, LU B L. Differential entropy feature for EEG-based vigilance estimation[C]∥2013 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). Piscataway:IEEE, 2013: 6627-6630.
[15] CHEN X Y, LIU C, LI B, et al. Targeted backdoor attacks on deep learning systems using data poisoning[EB/OL]. (2017-12-15)[2025-08-16]. https://arxiv.org/abs/1712.05526.
[16] NGUYEN T A, TRAN A. Input-aware dynamic backdoor attack[J]. Advances in Neural Information Processing Systems, 2020, 33: 3454-3464.
[17]WANG Z T, ZHAI J, MA S Q. BppAttack: stealthy and efficient trojan attacks against deep neural networks via image quantization and contrastive adversarial learning[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE, 2022: 15074-15084.
[18] LIU K, DOLAN-GAVITT B, GARG S. Fine-pruning: defending against backdooring attacks on deep neural networks[C]∥Research in Attacks, Intrusions, and Defenses. Cham: Springer, 2018: 273-294.
[19]WU B Y, CHEN H R, ZHANG M D, et al. Backdoorbench: a comprehensive benchmark of backdoor learning[J]. Advances in Neural Information Processing Systems, 2022, 35: 10546-10559.
[20] HEUSEL M, RAMSAUER H, UNTERTHINER T, et al. GANs trained by a two time-scale update rule converge to a local Nash equilibrium[EB/OL]. (2017-06-26)[2025-08-16]. https://arxiv.org/abs/1706.08500.
[21] VAN DER MAATEN L, HINTON G. Visualizing data using t-SNE[J]. Journal of Machine Learning Research, 2008, 9(11):2579-2605.

Similar Articles:

[1]ZHAO Shufang, DONG Xiaoyu. Research on speech recognition based on improved LSTM deep neural network[J]. Journal of Zhengzhou University (Engineering Science), 2018, 39(05): 63. [doi:10.13705/j.issn.1671-6833.2018.02.004]

Last Update: 2026-03-04