«上一篇/Previous Article|本期目录/Table of Contents|下一篇/Next Article»

j. issn. 1671-6833. 2024. 03. 010]
点击复制

融合异常检测与区域分割的高效K-means 聚类算法()

分享到：

《郑州大学学报(工学版)》[ISSN:1671-6833/CN:41-1339/T]

卷:: 45
期数:: 2024年03期

页码:: 80-88

栏目:

出版日期:: 2024-04-20

文章信息/Info

Title:: Efficient K-means with Region Segment and Outlier Detection

文章编号:: 1671-6833( 2024) 03-0080-09

作者:: 尹宏伟1; 2; 杭雨晴1; 2; 胡文军1; 2; 1. 湖州师范学院信息工程学院,浙江湖州 313000;2. 湖州师范学院浙江省现代农业资源智慧管理与应用研究重点实验室,浙江湖州 313000

Author(s):: YIN Hongwei¹; 2; HANG Yuqing¹; 2; HU Wenjun¹; 2; 1. College of Information Engineering, Huzhou University, Huzhou 313000, China; 2. Zhejiang Key Laboratory of Smart Management & Application of Modern Agricultural Resources, Huzhou University, Huzhou 313000, China

关键词:: 聚类; K-means; 异常检测; 区域分割; 近邻簇搜索; 自适应

Keywords:: clustering; K-means; outlier detection; region segment; near neighbor clusters search; adaption

分类号:: TP391

DOI:: 10. 13705/ j. issn. 1671-6833. 2024. 03. 010

文献标志码:: A

摘要:: 传统K-means 及其众多改进算法缺乏显式处理异常样本的能力,导致其聚类性能容易受到异常样本的影响。针对此问题,提出一种融合异常检测与区域分割的高效K-means 聚类算法。首先,通过构建统一聚类模型,形成异常检测与聚类之间的交互协同,以提高聚类性能。其次,利用近邻簇搜索技术对各类簇进行自适应的区域分割,以减少冗余计算,提高算法执行效率。最后,为验证所提方法的有效性,在多个合成数据集和真实数据集上分别进行测试。实验结果表明:所提算法聚类性能和执行效率优于其他算法;在添加10%异常样本的Wine 数据集上准确度可达0. 911。

Abstract:: In the traditional K-means and many improved algorithms, the inability to explicitly handle outliers, resulted in their poor clustering performance. To solve this problem, in this paper, an efficient K-means with region segment and outlier detection was proposed. Firstly, to obtain better clustering results, an unified clustering model to form an interactive collaboration between outlier detection and clustering was constructed. Secondly, to improve algorithm efficiency, clusters were adaptively segmented through near neighbor clusters search to reduce redundant calculations. Finally, on synthetic datasets and real datasets were tested to verify the effectiveness of the proposed method. The experimental results showed that EK-means algorithm outperformed other algorithms in terms of clustering performance and execution efficiency. The ACC could reach 0. 911 in the Wine dataset.

参考文献/References:

[1] NIE F P, LI Z H, WANG R, et al. An effective and efficient algorithm for K-means clustering with new formulation[J]. IEEE Transactions on Knowledge and Data Engineering, 2023, 35(4): 3433-3443.

[2] BAI L, LIANG J Y, ZHAO Y X. Self-constrained spectral clustering[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(4): 5126-5138.

[3] ZHANG Q H, DAI Y Y, WANG G Y. Density peaks clustering based on balance density and connectivity[J]. Pattern Recognition, 2023, 134: 109052.

[4] HIRECHE C, DRIAS H, MOULAI H. Grid based clustering for satisfiability solving [ J]. Applied Soft Computing,2020, 88: 106069.

[5] MONATH N, KOBREN A, KRISHNAMURTHY A, et al. Scalable hierarchical clustering with tree grafting[C]∥Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. New York:ACM, 2019: 1438-1448.

[6] WAN Y C, LIU X B, WU Y, et al. ICGT: a novel incremental clustering approach based on GMM tree[ J].Data & Knowledge Engineering, 2018, 117: 71-86.

[7] 鲁斌, 范晓明. 基于改进自适应k 均值聚类的三维点云骨架提取的研究[J]. 自动化学报, 2022, 48(8):1994-2006.LU B, FAN X M. Research on 3D point cloud skeleton extraction based on improved adaptive k-means clustering[ J]. Acta Automatica Sinica, 2022, 48 ( 8): 1994-2006.

[8] LI Z, TANG C, ZHENG X, et al. Unified K-means coupledself-representation and neighborhood kernel learning for clustering single-cell RNA-sequencing data[J]. Neurocomputing, 2022, 501: 715-726.

[9] IM S, QAEM M M, MOSELEY B, et al. Fast noise removal for K-means clustering[C]∥ the 23rd International Conference on Artificial Intelligence and Statistics(AISTATS) 2020. Palermo: AISTATS, 2020: 456-466.[10] LI Y M, ZHANG Y, TANG Q T, et al. T-k-means: a robust and stable k-means variant[C]∥International Conferenceon Acoustics, Speech and Signal Processing (ICASSP). Piscataway: IEEE, 2021: 3120-3124.

[11] GRUNAU C, ROZHON V. Adapting K-means algorithms for outliers[C] ∥the 39th International Conference on Machine Learning. Baltimore: ICML, 2022: 7845-7886.

[12] ZHANG Z, FENG Q L, HUANG J Y, et al. A local search algorithm for k-means with outliers[J]. Neurocomputing,2021, 450: 230-241.

[13] HUANG S D, REN Y Z, XU Z L. Robust multi-view data clustering with multi-view capped-norm K-means[ J]. Neurocomputing, 2018, 311: 197-208.

[14] HUANG S D, KANG Z, XU Z L, et al. Robust deep kmeans: an effective and simple method for data clustering[J]. Pattern Recognition, 2021, 117: 107996.

[15] HAUTAMÄKI V, CHEREDNICHENKO S, KÄRKKÄINENI, et al. Improving k-means by outlier removal[C]∥Proceedings of the 14th Scandinavian conference on Image Analysis.New York: ACM, 2005: 978-987.

[16] PENG D W, CHEN Z Z, FU J C, et al. Fast k-means clustering based on the neighbor information[C]∥ISEEIE 2021: 2021 International Symposium on Electrical, Electronics and Information Engineering. New York: ACM,2021: 551-555.

[17] GIFFON L, EMIYA V, KADRI H, et al. Quick-means: accelerating inference for K-means by learning fast transforms[J]. Machine Learning, 2021, 110(5): 881-905.

[18] HAMERLY G. Making k-means even faster [ J]. Proceedings of the 10th SIAM International Conference on Data Mining, 2010: 130-140.

[19] DRAKE J. Faster K-means clustering[EB/ OL]. (2013-09 - 24) [ 2023 - 06 - 13]. http: ∥hdl. handle. net/2104/ 8826.

[20] NEWLING J, FLEURET F. Fast K-means with accurate bounds[C]∥Proceedings of the 33rd International Conferenceon International Conference on Machine Learning-Volume 48. New York:ACM, 2016: 936-944.

[21] XIA S Y, PENG D W, MENG D Y, et al. Ball $ k $ kmeans: fast adaptive clustering with No bounds [ J].IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(1): 87-99.

[22] ARTHUR D, VASSILVITSKII S. K-means + +: the advantages of careful seeding[C]∥Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms.New York: ACM, 2007: 1027-1035.

[23] KAUFMAN L, ROUSSEEUW P. Clustering by means of medoids[EB/ OL]. (1987-01-01) [2023-06-13]. https:∥ www. researchgate. net/ publication/ 243777819 _Clustering_by_Means_of_Medoids.

相似文献/References:

[1]胡燕,朱晓瑛,马刚.基于K-Means和时间匹配的位置预测模型[J].郑州大学学报(工学版),2017,38(02):17.[doi:10.13705/j.issn.1671-6833.2017.02.005]
　Hu Yan,Zhu Xiaoying,Ma Gang.Location Prediction Model Based on K-Means Algorithm and Time Matching[J].Journal of Zhengzhou University (Engineering Science),2017,38(03):17.[doi:10.13705/j.issn.1671-6833.2017.02.005]
[2]张忠林,曹志宇,李元韬.基于加权欧式距离的k-means算法研究[J].郑州大学学报(工学版),2010,31(01):89.
　ZHANG Zhonglin,CAO Zhiyu,LI Yuantao.Research Based on Euclid Distance with Weights of K——means Algorithm[J].Journal of Zhengzhou University (Engineering Science),2010,31(03):89.

备注/Memo

备注/Memo:: 收稿日期:2023-10-20;修订日期:2023-11-24
基金项目:国家自然科学基金资助项目(62206094);湖州市公益性应用研究项目(2021GZ05);江苏省网络空间安全工程实验室开放课题(SDGC2237)
作者简介:尹宏伟(1990— ),男,安徽安庆人,湖州师范学院副教授,博士,主要从事机器学习、模式识别及数据挖掘等方面的研究,E-mail: 02713@ zjhu. edu. com。
通信作者:胡文军(1977— ),男,安徽宣城人,湖州师范学院教授,博士,主要从事机器学习、模式识别及智能系统等方面的研究, E-mail: hoowenjun@ foxmail. com。

更新日期/Last Update: 2024-04-29

《郑州大学学报(工学版)》[ISSN:1671-6833/CN:41-1339/T]

文章信息/Info

参考文献/References:

相似文献/References:

备注/Memo

常用功能

导航/Navigate

工具/Tools

统计/Statistics