CHENG Mingchang, AO Lan, LIU Liu. Border-peeling Inspired Globally Central Clustering Algorithm[J]. Journal of Zhengzhou University (Engineering Science), 2024, 45(05): 86-94. [doi:10.13705/j.issn.1671-6833.2024.02.002]

Border-peeling Inspired Globally Central Clustering Algorithm

Journal of Zhengzhou University (Engineering Science) [ISSN: 1671-6833 / CN: 41-1339/T]

Volume:
45
Issue:
2024, No. 05
Pages:
86-94
Publication date:
2024-08-08

Article Information

Title:
Border-peeling Inspired Globally Central Clustering Algorithm
Article ID:
1671-6833(2024)05-0086-09
Author(s):
CHENG Mingchang1, AO Lan1,2, LIU Liu1,3,4
1. V. C. & V. R. Key Lab of Sichuan Province, Sichuan Normal University, Chengdu 610066, China; 2. School of Mathematical Sciences, Sichuan Normal University, Chengdu 610066, China; 3. Geomathematics Key Laboratory of Sichuan Province, Chengdu University of Technology, Chengdu 610059, China; 4. College of Mathematics and Physics, Chengdu University of Technology, Chengdu 610059, China
Keywords:
globally central clustering algorithm; border peeling; cluster overlapping; reverse k-nearest neighbors; empirical distribution
CLC number:
TP311.13; TP391
DOI:
10.13705/j.issn.1671-6833.2024.02.002
Document code:
A
Abstract:
Globally central clustering algorithms, such as k-means and spectral clustering, often suffer from local optima and from difficulty in parameter setting when clusters in the data distribution overlap and adhere, which greatly limits their effectiveness in practical applications. To address this issue, a border-peeling inspired globally central clustering algorithm was proposed. Firstly, a one-step border peeling method was designed, which defines a locally distance-weighted density according to the reverse k-nearest neighbor relationships between sample points. The density value at the maximal point of the first-order difference of the density empirical distribution function was used as the threshold to divide the dataset into a boundary set and a core set. Then, a traditional globally central clustering algorithm was embedded to cluster the core set; since the overlap between clusters in the core set is significantly reduced, the embedded algorithm converges to the true cluster centers more easily. Finally, a boundary attraction algorithm was proposed, which starts from the already categorized core-set sample points and progressively absorbs sample points from the boundary set through the existing reverse k-nearest neighbor relationships, until the whole dataset is clustered. Compared with current iterative border-peeling algorithms, the proposed algorithm has a clear advantage in computational efficiency: it performs boundary partitioning directly with a threshold instead of requiring additional complex termination conditions, and the global approach is more robust when local data densities differ. In the experimental phase, three synthetic datasets and six real-world datasets were used to evaluate the algorithm's performance, parameter sensitivity, and time consumption, further validating the efficacy and practicality of the algorithm.
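The three stages described in the abstract, one-step border peeling via a reverse-k-NN distance-weighted density, global clustering of the core set, and boundary attraction, can be sketched as follows. This is an illustrative reading of the abstract, not the authors' implementation: the Gaussian distance weight, the histogram-based reading of the ECDF first-order difference, the choice of k-means as the embedded algorithm, and the `border_peel_and_cluster` name with its nearest-labelled-point fallback are all assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestNeighbors


def border_peel_and_cluster(X, n_clusters, k=8, bins=32):
    """Sketch: one-step border peeling, core clustering, boundary attraction."""
    n = len(X)
    dist, idx = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)

    # Locally distance-weighted density over reverse k-NN: point i gains a
    # weight from every point j whose k-NN list contains i (the Gaussian
    # weight is an assumption; the abstract only says "distance-weighted").
    density = np.zeros(n)
    for j in range(n):
        for pos in range(1, k + 1):  # skip position 0: j itself
            density[idx[j, pos]] += np.exp(-dist[j, pos] ** 2)

    # Threshold at the largest first-order difference of the density ECDF.
    # On an even grid the ECDF's first difference is proportional to a
    # histogram, so we take the modal bin's left edge (one plausible reading).
    hist, edges = np.histogram(density, bins=bins)
    threshold = edges[np.argmax(hist)]
    core = density >= threshold  # core set; the rest is the boundary set

    # Embed a traditional globally central algorithm (here k-means) on the core.
    labels = np.full(n, -1)
    labels[core] = KMeans(n_clusters=n_clusters, n_init=10).fit(X[core]).labels_

    # Boundary attraction: iteratively absorb boundary points that are a
    # reverse k-NN of an already-labelled point, copying the nearest label.
    changed = True
    while changed:
        changed = False
        for i in np.where(labels < 0)[0]:
            owners = [j for j in range(n) if labels[j] >= 0 and i in idx[j, 1:]]
            if owners:
                nearest = min(owners, key=lambda j: np.linalg.norm(X[i] - X[j]))
                labels[i] = labels[nearest]
                changed = True

    # Fallback (assumption): a point with no labelled reverse neighbour
    # simply takes the label of the closest labelled point.
    for i in np.where(labels < 0)[0]:
        done = np.where(labels >= 0)[0]
        j = done[np.argmin(np.linalg.norm(X[done] - X[i], axis=1))]
        labels[i] = labels[j]
    return labels, core
```

Swapping `KMeans` for spectral clustering (or any other global method) only changes the core-clustering line, which is what the abstract means by "embedding" the traditional algorithm.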


Last Update: 2024-09-02