Journal of Guangxi Normal University (Natural Science Edition), 2011, Vol. 29, Issue (1): 92-97.

Semi-supervised Clustering with Feature Weighting

LI Jia, WANG Ming-wen, HE Shi-zhu, KE Li   

  1. College of Computer Information Engineering, Jiangxi Normal University, Nanchang, Jiangxi 330022, China
  • Received: 2010-12-14 Published: 2018-11-16

Abstract: Semi-supervised clustering is a recent research direction in machine learning and an important branch of data mining, and it has gradually become a useful tool in many areas. However, in current semi-supervised clustering research, clustering accuracy is poor, especially when the labeled information covers fewer classes than the entire data set. Building on existing semi-supervised clustering techniques, this paper uses feature weighting to increase the similarity among documents in the same cluster, which yields better clustering results. To verify the validity of this idea, experiments are carried out not only on single-language data sets but also on a Chinese-English data set in which the labeled documents contain only Chinese or only English. The experimental results show that the method performs well.

Key words: partially labeled information, feature weighting, multi-language, semi-supervised clustering

CLC Number: TP181