[1] ZHENG Zhonglong, ZENG Xin, LIU Huawen. A Two-stage Outlier Detection Method Based on Neighbor Density Using Voting[J]. Journal of Zhengzhou University (Engineering Science), 2023, 44(06): 33-39. [doi:10.13705/j.issn.1671-6833.2023.03.022]
A Two-stage Outlier Detection Method Based on Neighbor Density Using Voting
Journal of Zhengzhou University (Engineering Science) [ISSN: 1671-6833 / CN: 41-1339/T]
- Volume: 44
- Issue: No. 06, 2023
- Pages: 33-39
- Publication date: 2023-09-25
Article Info
- Title: A Two-stage Outlier Detection Method Based on Neighbor Density Using Voting
- Author(s): ZHENG Zhonglong (郑忠龙); ZENG Xin (曾心); LIU Huawen (刘华文)
- Affiliations: 1. Institute for Mathematics and Computer Science, Zhejiang Normal University, Jinhua 321004, China; 2. Department of Computer Science, Shaoxing University, Shaoxing 312000, China
- Keywords: neighbor relationship; density estimation; vote; similarity; outlier detection
- DOI: 10.13705/j.issn.1671-6833.2023.03.022
- Document code: A
- Abstract: Neighbor-based outlier detection algorithms are sensitive to the choice of neighborhood. A neighborhood that is too small increases model complexity and leads to overfitting; one that is too large makes the model too simple and ignores much of the available information. To reduce the influence of neighborhood size on outlier identification and to achieve higher accuracy, a voting-based decision algorithm is designed on top of the neighbor relationship. The algorithm consists of two steps: density estimation and simulated voting. Density estimation accelerates the convergence of each data point's density to a steady-state density, which then determines the voting strategy applied in the simulated voting step. The simulated voting strategy, the core of the outlier detection algorithm, is adapted from a community detection algorithm and takes into account both the importance of each data point and its similarity to its neighbors. A data point's importance is positively correlated with its steady-state density. More important data points vote first: each passes its own information to the most similar data point in its neighborhood, and the voting rank of the data point that receives the vote is accumulated. Once every data point has actively voted, the algorithm stops iterating and outputs the voting rank of each data point; the lower a data point's voting rank, the more likely it is to be treated as an outlier. Experiments on 11 real-world data sets show that the proposed neighbor-based voting simulation detection algorithm achieves an average accuracy of 79%, which demonstrates its effectiveness.
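The two steps described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract gives no formulas, so every concrete choice below is an assumption made for illustration. In particular, the initial density (inverse mean k-nearest-neighbor distance), the damped neighbor-averaging update used to reach a steady state, the use of the nearest neighbor as the "most similar" point, the `pass_on` factor, and the density tie-break are all hypothetical, as are the function names `knn`, `steady_state_density`, `simulated_voting`, and `rank_outliers`.

```python
import numpy as np

def knn(X, k):
    """Indices and distances of the k nearest neighbors of each row (self excluded)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)              # a point is not its own neighbor
    idx = np.argsort(d, axis=1)[:, :k]
    return idx, np.take_along_axis(d, idx, axis=1)

def steady_state_density(idx, dist, alpha=0.5, tol=1e-9, max_iter=100):
    """Stage 1 (assumed form): iterate a damped neighbor average to a fixed point."""
    rho0 = 1.0 / (dist.mean(axis=1) + 1e-12)  # assumed initial density
    rho = rho0.copy()
    for _ in range(max_iter):
        # Assumed update: blend each point's initial density with the
        # mean density of its neighbors; a contraction because alpha < 1.
        new = (1.0 - alpha) * rho0 + alpha * rho[idx].mean(axis=1)
        converged = np.max(np.abs(new - rho)) < tol
        rho = new
        if converged:
            break
    return rho

def simulated_voting(idx, rho, pass_on=0.5):
    """Stage 2 (assumed form): denser points vote first for their nearest neighbor."""
    votes = np.zeros(len(rho))
    for i in np.argsort(-rho):                # higher importance votes earlier
        j = idx[i, 0]                         # assumption: most similar = nearest
        # The voter passes on its own density plus part of what it has received.
        votes[j] += rho[i] + pass_on * votes[i]
    return votes

def rank_outliers(X, k=4):
    """Return point indices ordered from most to least outlier-like."""
    idx, dist = knn(X, k)
    rho = steady_state_density(idx, dist)
    votes = simulated_voting(idx, rho)
    # Sort ascending by votes; ties are broken by density (assumption).
    return np.lexsort((rho, votes))

# Usage: a 5x5 unit grid plus one distant point appended as the last row.
grid = np.array([[x, y] for x in range(5) for y in range(5)], dtype=float)
X = np.vstack([grid, [[20.0, 20.0]]])
print(rank_outliers(X, k=4)[:3])
```

In this simplified form, voting order matters only because a voter forwards part of the votes it has already received; the paper's actual strategies, which depend on the steady-state density, may differ substantially.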
Last Update: 2023-10-22