
A Two-stage Outlier Detection Method Based on Neighbor Density Using Voting


Journal of Zhengzhou University (Engineering Science) [ISSN: 1671-6833 / CN: 41-1339/T]

Volume:: 44
Issue:: 2023, No. 06

Pages:: 33-39


Publication date:: 2023-12-25

Article Information

Title:: A Two-stage Outlier Detection Method Based on Neighbor Density Using Voting

Author(s):: ZHENG Zhonglong¹; ZENG Xin¹; LIU Huawen²; 1. Institute for Mathematics and Computer Science, Zhejiang Normal University, Jinhua 321004, China; 2. Department of Computer Science, Shaoxing University, Shaoxing 312000, China

Keywords:: neighbor relationship; density estimation; voting; similarity; outlier detection

CLC number:: TP301.6

DOI:: 10.13705/j.issn.1671-6833.2023.03.022

Document code:: A

Abstract:: Nearest-neighbor-based outlier detection algorithms are sensitive to the choice of neighborhood. If the neighborhood is too small, model complexity increases and the model overfits. If it is too large, the model becomes overly simple and discards much useful information.

To reduce the influence of neighborhood size on outlier identification and achieve higher accuracy, this paper proposes a voting-based decision algorithm built on neighbor relationships. The algorithm has two steps: density estimation and simulated voting.

Density estimation accelerates the convergence of each data point's density to a steady-state density. Simulated voting is then carried out with different strategies according to these steady-state densities.

Simulated voting is the core of the outlier detector and is adapted from community discovery algorithms. It considers both the importance of each data point and its similarity to its neighbors. A point's importance is positively correlated with its steady-state density. More important points vote actively first, passing their own information to the most similar point in their neighborhood, and each point's received votes are accumulated into its voting ranking.

Once every data point has cast its vote, the algorithm stops iterating and outputs the voting ranking of all points. The lower a point is ranked, the more likely it is to be an outlier.

Experiments on 11 real-world data sets show that the proposed neighbor-based voting simulation algorithm achieves an average accuracy of 79%, demonstrating its effectiveness.
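The abstract describes the two stages only in words, so the Python sketch below illustrates the idea under explicit assumptions. It is not the authors' algorithm, and their formulas may differ.

The sketch makes four assumptions:

* **Initial density.** A point's initial density is the inverse of its mean distance to its k nearest neighbors.
* **Steady-state density.** This is the fixed point of a hypothetical smoothing update, rho = (1 - alpha) * rho0 + alpha * (mean rho of the k neighbors). The update converges for 0 <= alpha < 1.
* **Similarity.** Similarity between a point and a neighbor is the number of shared k-nearest neighbors, with ties broken by distance.
* **Vote weight.** Each vote is weighted by the voter's steady-state density.

The names `knn`, `steady_density`, `vote` and `rank_outliers`, and the parameters `k` and `alpha`, are illustrative only.

```python
import math

# Hypothetical two-stage sketch: (1) steady-state kNN density, (2) density-ordered voting.
# The density update, similarity measure and vote weights are ASSUMPTIONS, not the paper's.


def knn(points, k):
    """Indices of the k nearest neighbors of every point (brute force, Euclidean)."""
    neighbors = []
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: math.dist(p, points[j]))
        neighbors.append(others[:k])
    return neighbors


def steady_density(points, neighbors, alpha=0.5, tol=1e-9, max_iter=1000):
    """Stage 1 (assumed form): iterate a smoothed kNN density until it stops changing."""
    n = len(points)
    # Initial density: inverse mean distance to the k nearest neighbors.
    base = [1.0 / (sum(math.dist(points[i], points[j]) for j in neighbors[i])
                   / len(neighbors[i]) + 1e-12) for i in range(n)]
    rho = base[:]
    for _ in range(max_iter):
        # Mix each point's own density with the average density of its neighbors;
        # for alpha < 1 this is a contraction, so it converges to a unique fixed point.
        new = [(1 - alpha) * base[i]
               + alpha * sum(rho[j] for j in neighbors[i]) / len(neighbors[i])
               for i in range(n)]
        delta = max(abs(a - b) for a, b in zip(new, rho))
        rho = new
        if delta < tol:
            break
    return rho


def vote(points, neighbors, rho):
    """Stage 2 (assumed form): each point casts one density-weighted vote."""
    sets = [set(nb) for nb in neighbors]
    votes = [0.0] * len(points)
    # Denser ("more important") points vote first, mirroring the abstract.
    for i in sorted(range(len(points)), key=lambda idx: -rho[idx]):
        # Most similar neighbor: most shared neighbors first, then the closest.
        target = max(neighbors[i],
                     key=lambda j: (len(sets[i] & sets[j]),
                                    -math.dist(points[i], points[j])))
        votes[target] += rho[i]
    return votes


def rank_outliers(points, k=3):
    """Indices ordered from most to least outlying (fewest weighted votes first)."""
    if not 0 < k < len(points):
        raise ValueError("k must be between 1 and len(points) - 1")
    neighbors = knn(points, k)
    rho = steady_density(points, neighbors)
    votes = vote(points, neighbors, rho)
    # Ties in vote totals (e.g. many zeros) are broken by lower steady-state density.
    return sorted(range(len(points)), key=lambda i: (votes[i], rho[i]))


if __name__ == "__main__":
    # A 3 x 3 grid cluster plus one isolated point at index 9.
    pts = [(float(x), float(y)) for x in range(3) for y in range(3)] + [(10.0, 10.0)]
    print(rank_outliers(pts, k=3)[0])  # → 9
```

**Why the isolated point ranks first.** No grid point counts it among its k nearest neighbors, so it receives no votes. Its smoothed density is also the lowest.

**Voting order.** Votes here are simply summed, so the density-ordered visiting sequence does not change the final tallies. It is kept only to mirror the rule that "more important points vote first". A faithful version would need the paper's exact information-passing rule.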


Last update: 2023-10-22
