LI Xuexiang, GAO Yafei, XIA Huili, et al. Backdoor Removal Method for Deep Neural Networks Based on Pruning and Backdoor Unlearning[J]. Journal of Zhengzhou University (Engineering Science), 2026, 47(XX): 1-8. [doi:10.13705/j.issn.1671-6833.2025.05.018]

Backdoor Removal Method for Deep Neural Networks Based on Pruning and Backdoor Unlearning

Journal of Zhengzhou University (Engineering Science) [ISSN:1671-6833/CN:41-1339/T]

Volume: 47
Issue: 2026, No. XX
Pages: 1-8
Publication date: 2026-09-10

Article Info

Title:
Backdoor Removal Method for Deep Neural Networks Based on Pruning and Backdoor Unlearning
Author(s):
LI Xuexiang1, GAO Yafei1, XIA Huili2, WANG Chao1, LIU Minglin1
1. School of Cyber Science and Engineering, Zhengzhou University, Zhengzhou 450002, China; 2. Henan Multimodal Perception and Intelligent Interaction Technology Engineering Research Center, Zhengzhou 451191, China
Keywords:
deep neural networks; backdoor attack; backdoor defense; pre-activation distribution; adversarial backdoor unlearning
CLC number:
TP309; TP181
DOI:
10.13705/j.issn.1671-6833.2025.05.018
Document code:
A
Abstract:
Backdoor attacks pose a serious threat to the security of deep neural networks. Most existing backdoor defense methods rely on a portion of the original training data to remove backdoors from a model; in the realistic scenario where data access is restricted, these methods remove backdoors poorly and substantially degrade the model's original accuracy. To address these issues, this paper proposes a data-free backdoor removal method based on pruning and backdoor unlearning (DBR-PU). The method first analyzes differences in the pre-activation distributions of model neurons on a synthetic dataset to locate suspicious neurons; it then prunes these suspicious neurons to weaken the backdoor's influence on the model; finally, an adversarial backdoor unlearning strategy further eliminates the model's internal response to any residual backdoor information. Extensive experiments on the CIFAR10 and GTSRB datasets against six mainstream backdoor attack methods demonstrate that, under restricted data access, the proposed method keeps the accuracy gap to the best baseline defenses small and performs best at reducing the attack success rate, outperforming the best baseline defense by 2.37% and 1.13% on the two datasets, respectively.
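
The abstract outlines a three-stage pipeline: locate suspicious neurons through their pre-activation distributions on synthetic data, prune them, and then unlearn any residual backdoor behavior. As a rough, non-authoritative sketch of the first two stages (this is not the authors' implementation: the z-score outlier rule, the random-noise stand-in for the synthetic dataset, and every name and threshold below are illustrative assumptions), the idea could look like this in PyTorch:

    import torch
    import torch.nn as nn

    def channel_preactivation_means(model, synthetic):
        # Record the per-channel mean pre-activation of every Conv2d layer
        # over one synthetic batch (hypothetical stand-in for generated data).
        stats, hooks = {}, []
        def make_hook(name):
            def hook(_module, _inp, out):
                stats[name] = out.detach().mean(dim=(0, 2, 3))  # shape (C,)
            return hook
        for name, mod in model.named_modules():
            if isinstance(mod, nn.Conv2d):
                hooks.append(mod.register_forward_hook(make_hook(name)))
        with torch.no_grad():
            model(synthetic)  # model assumed to be in eval() mode
        for h in hooks:
            h.remove()
        return stats

    def prune_suspicious_channels(model, stats, z_thresh=3.0):
        # Assumed pruning rule: zero filters whose mean pre-activation is a
        # layer-level outlier; the paper's actual distribution test may differ.
        for name, mod in model.named_modules():
            if isinstance(mod, nn.Conv2d) and name in stats:
                m = stats[name]
                z = (m - m.mean()) / (m.std() + 1e-8)
                for c in torch.nonzero(z.abs() > z_thresh).flatten():
                    mod.weight.data[c].zero_()
                    if mod.bias is not None:
                        mod.bias.data[c] = 0.0

The final stage could then be approximated in the spirit of adversarial unlearning (cf. reference [8]): an inner loop synthesizes a bounded universal perturbation that acts like a trigger, and an outer loop fine-tunes the pruned model so that perturbed inputs keep the model's own clean predictions. Again, every hyperparameter here is a placeholder, not the paper's setting:

    def adversarial_unlearn(model, synthetic, steps=50, eps=0.1, lr=1e-4):
        delta = torch.zeros(1, *synthetic.shape[1:], requires_grad=True)
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        for _ in range(steps):
            with torch.no_grad():
                clean_pred = model(synthetic).argmax(dim=1)
            # Inner step: push delta toward trigger-like behavior by maximizing
            # disagreement with the model's clean predictions.
            loss_trig = -nn.functional.cross_entropy(model(synthetic + delta), clean_pred)
            grad, = torch.autograd.grad(loss_trig, delta)
            delta = (delta.detach() - 0.01 * grad.sign()).clamp(-eps, eps).requires_grad_()
            # Outer step: unlearn - map triggered inputs back to clean predictions.
            opt.zero_grad()
            nn.functional.cross_entropy(model(synthetic + delta), clean_pred).backward()
            opt.step()

For instance, one might call prune_suspicious_channels(net, channel_preactivation_means(net, torch.randn(64, 3, 32, 32))) on a CIFAR10-sized model and then run adversarial_unlearn on the pruned network; both the data shape and the call sequence are assumptions for illustration only.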

References:

[1] LUO R H, YUAN H, ZHONG F H, et al. Traffic jam detection based on convolutional neural network[J]. Journal of Zhengzhou University (Engineering Science), 2019, 40(2): 21-25.
[2] LI Y M, JIANG Y, LI Z F, et al. Backdoor learning: a survey[J]. IEEE Transactions on Neural Networks and Learning Systems, 2024, 35(1): 5-22.
[3] GU T Y, LIU K, DOLAN-GAVITT B, et al. BadNets: evaluating backdooring attacks on deep neural networks[J]. IEEE Access, 2019, 7: 47230-47244.
[4] NGUYEN A, TRAN A. WaNet: imperceptible warping-based backdoor attack[EB/OL]. (2021-02-20)[2024-11-16]. https://doi.org/10.48550/arXiv.2102.10369.
[5] BARNI M, KALLAS K, TONDI B. A new backdoor attack in CNNs by training set corruption without label poisoning[C]//2019 IEEE International Conference on Image Processing (ICIP). Piscataway: IEEE, 2019: 101-105.
[6] TRAN B, LI J, MADRY A. Spectral signatures in backdoor attacks[EB/OL]. (2018-11-01)[2024-11-16]. https://doi.org/10.48550/arXiv.1811.00636.
[7] WU D X, WANG Y S. Adversarial neuron pruning purifies backdoored deep models[EB/OL]. (2021-10-27)[2024-11-16]. https://doi.org/10.48550/arXiv.2110.14430.
[8] ZENG Y, CHEN S, PARK W, et al. Adversarial unlearning of backdoors via implicit hypergradient[EB/OL]. (2021-10-07)[2024-11-16]. https://doi.org/10.48550/arXiv.2110.03735.
[9] ZHENG R K, TANG R J, LI J Z, et al. Pre-activation distributions expose backdoor neurons[J]. Advances in Neural Information Processing Systems, 2022, 35: 18667-18680.
[10] ZHANG X Y, ZHOU X Y, LIN M X, et al. ShuffleNet: an extremely efficient convolutional neural network for mobile devices[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 6848-6856.
[11] CAI R, ZHANG Z Y, CHEN T L, et al. Randomized channel shuffling: minimal-overhead backdoor attack detection without clean datasets[J]. Advances in Neural Information Processing Systems, 2022, 35: 33876-33889.
[12] CHEN H T, WANG Y H, XU C, et al. Data-free learning of student networks[C]//2019 IEEE/CVF International Conference on Computer Vision (ICCV). Piscataway: IEEE, 2019: 3514-3522.
[13] FANG G F, SONG J, SHEN C C, et al. Data-free adversarial distillation[EB/OL]. (2019-12-23)[2024-11-16]. https://arxiv.org/abs/1912.11006.
[14] SHI L C, JIAO Y Y, LU B L. Differential entropy feature for EEG-based vigilance estimation[C]//2013 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). Piscataway: IEEE, 2013: 6627-6630.
[15] CHEN X Y, LIU C, LI B, et al. Targeted backdoor attacks on deep learning systems using data poisoning[EB/OL]. (2017-12-15)[2024-11-16]. https://arxiv.org/abs/1712.05526.
[16] NGUYEN T A, TRAN A. Input-aware dynamic backdoor attack[J]. Advances in Neural Information Processing Systems, 2020, 33: 3454-3464.
[17] WANG Z T, ZHAI J, MA S Q. BppAttack: stealthy and efficient trojan attacks against deep neural networks via image quantization and contrastive adversarial learning[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 15074-15084.
[18] LIU K, DOLAN-GAVITT B, GARG S. Fine-pruning: defending against backdooring attacks on deep neural networks[C]//Research in Attacks, Intrusions, and Defenses. Cham: Springer International Publishing, 2018: 273-294.
[19] WU B Y, CHEN H R, ZHANG M D, et al. BackdoorBench: a comprehensive benchmark of backdoor learning[J]. Advances in Neural Information Processing Systems, 2022, 35: 10546-10559.
[20] HEUSEL M, RAMSAUER H, UNTERTHINER T, et al. GANs trained by a two time-scale update rule converge to a local Nash equilibrium[EB/OL]. (2017-06-26)[2024-11-16]. https://arxiv.org/abs/1706.08500.
[21] VAN DER MAATEN L, HINTON G. Visualizing data using t-SNE[J]. Journal of Machine Learning Research, 2008, 9(11): 2579-2605.

Similar Literature:

[1] ZHAO Shufang, DONG Xiaoyu. Research on Speech Recognition Based on Improved LSTM Deep Neural Network[J]. Journal of Zhengzhou University (Engineering Science), 2018, 39(05): 63. [doi:10.13705/j.issn.1671-6833.2018.02.004]

Memo:
Received: 2025-04-16; revised: 2025-06-13
Funding: National Natural Science Foundation of China (62302458); Natural Science Foundation of Henan Province (222300420295)
First author: LI Xuexiang (1965-), male, born in Zhengzhou, Henan; professor at Zhengzhou University; research interests include high-performance computing, cloud computing, and artificial intelligence. E-mail: lxx@zzu.edu.cn
Corresponding author: LIU Minglin (1991-), male, born in Zhengzhou, Henan; lecturer and Ph.D. at Zhengzhou University; research interests include image steganography and steganalysis, digital media forensics, and AI security. E-mail: liuminglin@zzu.edu.cn
Last Update: 2026-01-14