K-Means Clustering Algorithm Based on a Novel Approach for Improved Initial Seeds

Abstract

K-Means is a classical heuristic clustering algorithm that is sensitive to its initial state: the choice of initial seeds largely determines the final clustering, and the iteration often converges to a local optimum. Supplying an initial seed set that reflects the distribution of the data is therefore valuable for clustering research. Ideally, the seeds should lie in dense regions and be as widely dispersed as possible. To this end, a Hybrid Distance Density Based Seeking (HYDD) strategy is proposed. First, the method sorts the original data in decreasing order of density, and points that have high density and lie at a distance greater than the diameter are inserted into a candidate set. Second, k seeds are selected from the candidate set according to the principle of decreasing sum of distances. Finally, K-Means is run with this initial seed set. Experimental results on 5 synthetic and 3 real datasets show that HYDD K-Means can obtain clusters with maximal intra-cluster homogeneity and maximal inter-cluster separation.
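The abstract describes HYDD only at a high level, so the following Python sketch illustrates one plausible reading of its three steps. Several details are assumptions, not the authors' definitions: density is taken as the number of neighbours within a hypothetical `radius`, the "diameter" is stood in for by a hypothetical separation threshold `min_sep`, and "decreasing sum of distances" is read as greedily adding the candidate with the largest summed distance to the seeds already chosen. The final step is plain Lloyd-style K-Means.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two points."""
    return math.dist(a, b)


def local_density(data: List[Point], radius: float) -> List[int]:
    """ASSUMED density: count of other points within `radius` of each point."""
    return [
        sum(1 for j, q in enumerate(data) if j != i and dist(p, q) <= radius)
        for i, p in enumerate(data)
    ]


def candidate_set(data: List[Point], radius: float, min_sep: float) -> List[int]:
    """Step 1: visit points in decreasing density; keep a point only if it is
    farther than `min_sep` (hypothetical stand-in for the 'diameter') from
    every candidate kept so far. Returns indices into `data`."""
    dens = local_density(data, radius)
    order = sorted(range(len(data)), key=lambda i: dens[i], reverse=True)
    cands: List[int] = []
    for i in order:
        if all(dist(data[i], data[j]) > min_sep for j in cands):
            cands.append(i)
    return cands


def select_seeds(data: List[Point], cands: List[int], k: int) -> List[Point]:
    """Step 2 (ASSUMED rule): start from the densest candidate, then repeatedly
    add the candidate whose summed distance to the chosen seeds is largest."""
    if len(cands) < k:
        raise ValueError("fewer candidates than k; relax radius/min_sep")
    chosen = [cands[0]]
    while len(chosen) < k:
        best = max(
            (c for c in cands if c not in chosen),
            key=lambda c: sum(dist(data[c], data[s]) for s in chosen),
        )
        chosen.append(best)
    return [data[i] for i in chosen]


def kmeans(data: List[Point], seeds: List[Point], max_iter: int = 100):
    """Step 3: standard Lloyd iterations started from the given seeds."""
    centers = [tuple(s) for s in seeds]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assign each point to its nearest current center.
        labels = [min(range(len(centers)), key=lambda c: dist(p, centers[c]))
                  for p in data]
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(col) / len(members)
                                         for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep center of an empty cluster
        if new_centers == centers:  # converged: centers stopped moving
            break
        centers = new_centers
    return labels, centers


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    seeds = select_seeds(pts, candidate_set(pts, radius=1.5, min_sep=5.0), k=2)
    print(seeds, kmeans(pts, seeds))
```

On the two well-separated squares in the usage example, the candidate filter keeps one point per square, so the seeds already fall in different clusters and K-Means converges in two passes. The `min_sep` check is what enforces the "dispersed seeds" goal from the abstract, while the density ordering ensures the first point kept from each region is a dense one rather than an outlier.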

Key words: clustering, initial seeds, heuristic searching, K-Means algorithm

CLC Number: TP301.6

SHI Ya-bing, HUANG Yu, QIN Xiao, YUAN Chang-an. K-Means Clustering Algorithm Based on a Novel Approach for Improved Initial Seeds[J]. Journal of Guangxi Normal University (Natural Science Edition), 2013, 31(4): 33-40.
