LI Xuexiang, GAO Yafei, XIA Huili, et al. Backdoor Removal Method for Deep Neural Networks Based on Pruning and Backdoor Unlearning[J]. Journal of Zhengzhou University (Engineering Science), 2026, 47(02): 27-34. [doi:10.13705/j.issn.1671-6833.2025.05.018]

Backdoor Removal Method for Deep Neural Networks Based on Pruning and Backdoor Unlearning

Journal of Zhengzhou University (Engineering Science) [ISSN:1671-6833/CN:41-1339/T]

Volume:
47
Issue:
2026, No. 02
Pages:
27-34
Publication Date:
2026-02-13

Article Info

Title:
Backdoor Removal Method for Deep Neural Networks Based on Pruning and Backdoor Unlearning
Article ID:
1671-6833(2026)02-0027-08
Author(s):
LI Xuexiang1 GAO Yafei1 XIA Huili2 WANG Chao1 LIU Minglin1
1.School of Cyber Science and Engineering,Zhengzhou University, Zhengzhou 450002,China; 2.Henan Multimodal Perception and Intelligent Interaction Technology Engineering Research Center, Zhengzhou University of Economics and Business, Zhengzhou 451191,China
Keywords:
deep neural network; backdoor attack; backdoor defense; pre-activation distribution; adversarial backdoor unlearning
CLC Number:
TP309; TP181
DOI:
10.13705/j.issn.1671-6833.2025.05.018
Document Code:
A
Abstract:
Backdoor attacks pose a serious threat to the security of deep neural networks. Most existing backdoor defense methods rely on part of the original training data to remove backdoors from a model; however, in real-world scenarios where data access is limited, these methods perform poorly at eliminating backdoors and often significantly degrade the model′s original accuracy. To address these issues, this study proposes a data-free backdoor removal method based on pruning and backdoor unlearning (DBR-PU). The proposed method first analyzes the differences in the pre-activation distributions of model neurons on a synthetic dataset to identify suspicious neurons. It then reduces the backdoor′s impact on the model by pruning these suspicious neurons. Finally, an adversarial backdoor unlearning strategy is employed to further eliminate the model′s internal response to any residual backdoor information. Experiments on the CIFAR10 and GTSRB datasets against six mainstream backdoor attack methods demonstrate that, under data-access constraints, the proposed method achieves accuracy within a small gap of the best baseline defenses and performs best in reducing attack success rates.
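The first two stages described in the abstract (locating suspicious neurons via pre-activation distribution differences, then pruning them) can be sketched as follows. This is a minimal illustration, not the paper′s exact method: the skewness-based outlier score, the z-score threshold, and the array shapes are assumptions made for demonstration, and the third stage (adversarial backdoor unlearning) is omitted.

```python
import numpy as np

def find_suspicious_neurons(pre_acts, thresh=2.0):
    """Flag neurons whose pre-activation distribution deviates from the layer.

    pre_acts: array of shape (n_samples, n_neurons), pre-activation values
    collected on a synthetic dataset. Backdoor-related neurons tend to show
    skewed, heavy-tailed pre-activations; here each neuron is scored by the
    sample skewness of its pre-activations, and z-score outliers within the
    layer are flagged.
    """
    mean = pre_acts.mean(axis=0)
    std = pre_acts.std(axis=0) + 1e-8
    centered = (pre_acts - mean) / std
    skew = (centered ** 3).mean(axis=0)             # per-neuron sample skewness
    z = (skew - skew.mean()) / (skew.std() + 1e-8)  # compare against the layer
    return np.flatnonzero(np.abs(z) > thresh)

def prune_neurons(weight, bias, idx):
    """Disable flagged neurons by zeroing their incoming weights and biases."""
    w, b = weight.copy(), bias.copy()
    w[idx, :] = 0.0
    b[idx] = 0.0
    return w, b

# Demo: 31 benign neurons with roughly symmetric pre-activations, plus one
# simulated "backdoor" neuron (index 7) with a heavily right-skewed one.
rng = np.random.default_rng(0)
acts = rng.normal(size=(2000, 32))
acts[:, 7] = rng.exponential(size=2000)

flagged = find_suspicious_neurons(acts)
w, b = prune_neurons(rng.normal(size=(32, 16)), rng.normal(size=32), flagged)
```

In a real defense the pre-activations would be recorded with forward hooks on the backdoored network, and pruning would be followed by the unlearning stage to suppress residual backdoor responses.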

References:

[1]LUO R H, YUAN H, ZHONG F H, et al. Traffic jam detection based on convolutional neural network[J]. Journal of Zhengzhou University (Engineering Science), 2019, 40(2): 21-25.
[2]LI Y M, JIANG Y, LI Z F, et al. Backdoor learning: a survey[J]. IEEE Transactions on Neural Networks and Learning Systems, 2024, 35(1): 5-22.
[3]GU T Y, LIU K, DOLAN-GAVITT B, et al. BadNets: evaluating backdooring attacks on deep neural networks[J]. IEEE Access, 2019, 7: 47230-47244.
[4]NGUYEN A, TRAN A. WaNet: imperceptible warping-based backdoor attack[EB/OL]. (2021-02-20)[2025-08-16]. https://doi.org/10.48550/arXiv.2102.10369.
[5]BARNI M, KALLAS K, TONDI B. A new backdoor attack in CNNs by training set corruption without label poisoning[C]∥2019 IEEE International Conference on Image Processing (ICIP). Piscataway: IEEE, 2019: 101-105.
[6]TRAN B, LI J, MADRY A. Spectral signatures in backdoor attacks[EB/OL]. (2018-11-01)[2025-08-16]. https://doi.org/10.48550/arXiv.1811.00636.
[7]WU D X, WANG Y S. Adversarial neuron pruning purifies backdoored deep models[EB/OL]. (2021-10-27)[2025-08-16]. https://doi.org/10.48550/arXiv.2110.14430.
[8]ZENG Y, CHEN S, PARK W, et al. Adversarial unlearning of backdoors via implicit hypergradient[EB/OL]. (2021-10-07)[2025-08-16]. https://doi.org/10.48550/arXiv.2110.03735.
[9]ZHENG R K, TANG R J, LI J Z, et al. Pre-activation distributions expose backdoor neurons[J]. Advances in Neural Information Processing Systems, 2022, 35: 18667-18680.
[10] ZHANG X Y, ZHOU X Y, LIN M X, et al. ShuffleNet: an extremely efficient convolutional neural network for mobile devices[C]∥2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 6848-6856.
[11] CAI R, ZHANG Z Y, CHEN T L, et al. Randomized channel shuffling: minimal-overhead backdoor attack detection without clean datasets[J]. Advances in Neural Information Processing Systems, 2022, 35: 33876-33889.
[12] CHEN H T, WANG Y H, XU C, et al. Data-free learning of student networks[C]∥2019 IEEE/CVF International Conference on Computer Vision (ICCV). Piscataway:IEEE, 2019: 3514-3522.
[13] FANG G F, SONG J, SHEN C C, et al. Data-free adversarial distillation[EB/OL]. (2019-12-23)[2025-08-16]. https://arxiv.org/abs/1912.11006.
[14] SHI L C, JIAO Y Y, LU B L. Differential entropy feature for EEG-based vigilance estimation[C]∥2013 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). Piscataway:IEEE, 2013: 6627-6630.
[15] CHEN X Y, LIU C, LI B, et al. Targeted backdoor attacks on deep learning systems using data poisoning[EB/OL]. (2017-12-15)[2025-08-16]. https://arxiv.org/abs/1712.05526.
[16] NGUYEN T A, TRAN A. Input-aware dynamic backdoor attack[J]. Advances in Neural Information Processing Systems, 2020, 33: 3454-3464.
[17]WANG Z T, ZHAI J, MA S Q. BppAttack: stealthy and efficient trojan attacks against deep neural networks via image quantization and contrastive adversarial learning[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE, 2022: 15074-15084.
[18] LIU K, DOLAN-GAVITT B, GARG S. Fine-pruning: defending against backdooring attacks on deep neural networks[C]∥Research in Attacks, Intrusions, and Defenses. Cham: Springer, 2018: 273-294.
[19]WU B Y, CHEN H R, ZHANG M D, et al. Backdoorbench: a comprehensive benchmark of backdoor learning[J]. Advances in Neural Information Processing Systems, 2022, 35: 10546-10559.
[20] HEUSEL M, RAMSAUER H, UNTERTHINER T, et al. GANs trained by a two time-scale update rule converge to a local Nash equilibrium[EB/OL]. (2017-06-26)[2025-08-16]. https://arxiv.org/abs/1706.08500.
[21] VAN DER MAATEN L, HINTON G. Visualizing data using t-SNE[J]. Journal of Machine Learning Research, 2008, 9(11):2579-2605.

Similar Articles:

[1]ZHAO Shufang, DONG Xiaoyu. Research on speech recognition based on improved LSTM deep neural network[J]. Journal of Zhengzhou University (Engineering Science), 2018, 39(05): 63. [doi:10.13705/j.issn.1671-6833.2018.02.004]

Last Update: 2026-03-04