Journal of Guangxi Normal University(Natural Science Edition) ›› 2011, Vol. 29 ›› Issue (2): 156-160.


Text Clustering with Noise and Its Application in Anti-spam Systems

ZHOU Xin, HAO Zhi-feng, CAI Rui-chu, WEN Wen   

  1. Faculty of Computer, Guangdong University of Technology, Guangzhou, Guangdong 510006, China
  • Received: 2011-04-22 Published: 2018-11-19

Abstract: A method based on the Needleman-Wunsch algorithm is proposed to measure the similarity among spam mails, whose texts usually contain a lot of noise. Based on the proposed similarity measure, an efficient clustering algorithm is devised for anti-spam systems. Experimental results demonstrate the effectiveness and efficiency of the proposed algorithm.

Key words: text similarity, text clustering, Needleman-Wunsch algorithm, spam

CLC Number: TP391.1
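
The abstract names the Needleman-Wunsch algorithm as the basis of the similarity measure but gives neither the scoring scheme nor the clustering procedure. The sketch below is therefore only an illustration under stated assumptions: the match, mismatch and gap scores (+1, -1, -1), the normalization by the longer text's length, the threshold of 0.6, and the greedy one-pass "leader" clustering are all hypothetical choices, not the authors' method. It shows how a global alignment score can tolerate character-level noise such as substituted or inserted symbols.

```python
def nw_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Needleman-Wunsch global alignment score, computed with two DP rows.

    The scoring values are assumptions; the abstract does not specify them.
    """
    prev = [j * gap for j in range(len(b) + 1)]  # row for the empty prefix of a
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against the empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Map the alignment score to [0, 1]; identical texts score 1.0.

    Normalizing by the longer length is an assumed convention.
    """
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return max(0, nw_score(a, b)) / longest


def leader_cluster(texts, threshold=0.6):
    """Greedy one-pass clustering (a stand-in, not the paper's algorithm).

    Each text joins the first cluster whose leader (first member) is at
    least `threshold` similar; otherwise it starts a new cluster.
    """
    clusters = []
    for text in texts:
        for cluster in clusters:
            if similarity(text, cluster[0]) >= threshold:
                cluster.append(text)
                break
        else:
            clusters.append([text])
    return clusters


if __name__ == "__main__":
    mails = [
        "buy cheap pills now",
        "buy ch3ap p1lls now!",      # same spam with injected noise
        "meeting at noon tomorrow",  # unrelated text
    ]
    for group in leader_cluster(mails):
        print(group)
```

With these assumed scores, the two noisy variants of the same message align with two mismatches and one gap, so they fall into one cluster, while the unrelated text starts its own. The full dynamic-programming table costs time proportional to the product of the two text lengths, so a practical system would likely need a cheaper pre-filter before aligning every pair.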