[1] WANG Zumin, WANG Donghao, LIANG Xia, et al. Network Intrusion Detection Method Based on DBSCAN_GAN_XGBoost[J]. Journal of Zhengzhou University (Engineering Science), 2022, 43(03): 44-51. [doi:10.13705/j.issn.1671-6833.2022.03.006]

Network Intrusion Detection Method Based on DBSCAN_GAN_XGBoost

Journal of Zhengzhou University (Engineering Science) [ISSN: 1671-6833 / CN: 41-1339/T]

Volume:
43
Issue:
No. 03, 2022
Pages:
44-51
Column:
Publication date:
2022-04-10

Article Info

Title:
Network Intrusion Detection Method Based on DBSCAN_GAN_XGBoost
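Author(s) (Chinese names):
汪祖民1 王冬昊1 梁霞3 邹启杰1 秦静2 高兵1
1. College of Information Engineering, Dalian University; 2. Department of Information Engineering, Liaoning Vocational College of Light Industry; 3. College of Software Engineering, Dalian University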

Author(s):
WANG Zumin1, WANG Donghao1, LIANG Xia3, ZOU Qijie1, QIN Jing2, GAO Bing1
1. College of Information Engineering, Dalian University, Dalian 116622, China;
2. College of Software Engineering, Dalian University, Dalian 116622, China;
3. Department of Information Engineering, Liaoning Vocational College of Light Industry, Dalian 116100, China
Keywords:
network anomaly detection; density-based spatial clustering of applications with noise; generative adversarial network; extreme gradient boosting; integrated algorithm
Classification number:
TN915.08
DOI:
10.13705/j.issn.1671-6833.2022.03.006
Document code:
A
Abstract:
In network abnormal traffic detection, abnormal traffic makes up an unbalanced share of the data, so the model could not fully learn rare attack categories, which degraded model training and detection accuracy. To solve this problem, a network intrusion detection model based on DBSCAN_GAN_XGBoost was proposed. When expanding rare attack samples, the model focused on expanding the noise samples that were more likely to confuse machine learning. First, the DBSCAN algorithm was used to cluster the extracted rare attack category data into one or more sub-clusters, and the samples inside the clusters and the noise samples lying outside the clusters were extracted. Then, a generative adversarial network (GAN) was used to expand the extracted in-cluster samples and noise samples separately, changing the original sample proportions in the data set. Finally, the reconstructed data set was used to train the XGBoost algorithm, which uses decision trees as base classifiers, and to detect abnormal network traffic data. Comparative experiments were carried out on the UNSW-NB15 data set. The results showed that the accuracy and precision of the DBSCAN_GAN_XGBoost model were 98.76% and 96.5%, respectively, which were 15.63 and 19.60 percentage points higher than before sample expansion, effectively improving the detection accuracy for rare attack categories.
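The first stage described in the abstract, clustering the rare attack samples with DBSCAN and separating in-cluster samples from noise samples, can be sketched as follows. This is a minimal illustration in pure Python, not the authors' implementation: the function names `dbscan` and `split_rare_class`, the Euclidean distance, the toy 2-D points, and the `eps` and `min_samples` values are all assumptions chosen for the example, and the label -1 for noise is simply a common convention. The GAN expansion and XGBoost training stages are not shown.

```python
import math
from collections import deque

NOISE = -1  # assumed convention: label for samples that belong to no cluster


def dbscan(points, eps, min_samples):
    """Minimal DBSCAN sketch: one label per point (cluster id >= 0, or NOISE)."""
    n = len(points)
    # Precompute eps-neighbourhoods; each point counts as its own neighbour.
    neighbours = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n  # None means "not visited yet"
    cluster_id = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_samples:
            labels[i] = NOISE  # may be relabelled later as a border point
            continue
        # i is a core point: grow a new cluster from it.
        labels[i] = cluster_id
        queue = deque(neighbours[i])
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue  # already assigned; do not expand again
            labels[j] = cluster_id
            if len(neighbours[j]) >= min_samples:
                queue.extend(neighbours[j])  # j is also core: keep expanding
        cluster_id += 1
    return labels


def split_rare_class(points, eps, min_samples):
    """Split rare-class samples into in-cluster samples and noise samples."""
    labels = dbscan(points, eps, min_samples)
    in_cluster = [p for p, lab in zip(points, labels) if lab != NOISE]
    noise = [p for p, lab in zip(points, labels) if lab == NOISE]
    return in_cluster, noise


# Toy "rare attack" samples (hypothetical 2-D features, not UNSW-NB15 data):
# two dense groups plus one isolated point.
rare = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
    (9.0, 0.0),
]
in_cluster, noise = split_rare_class(rare, eps=0.3, min_samples=3)
print(len(in_cluster), noise)  # 7 [(9.0, 0.0)]
```

In the pipeline the abstract describes, the two returned groups would each be passed to a generative adversarial network for expansion, with emphasis on the noise group, before the rebalanced data set is used to train XGBoost.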


Last Update: 2022-05-02