一种基于相容粒计算模型的文章相似度计算方法

Journal of Guangxi Normal University (Natural Science Edition), 2010, Vol. 28, Issue (3): 135-139.

An Approach to Computing Similarity Degree Between Chinese Articles Based on Tolerance Granular Computing Model

LIU Tao¹, LI Xiang-jun¹,², QIU Tao-rong¹, GONG Ke-hua¹, GUO Chuan-jun¹

1. Department of Computer, Nanchang University, Nanchang Jiangxi 330031, China;
2. College of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China

Received: 2010-04-20 Online: 2010-09-20 Published: 2023-02-06

Abstract

The comparison of Chinese articles has important application value and practical significance. This paper applies the principles of granular computing to the computation of the similarity degree between Chinese articles. A tolerance granular computing model for this task is built by introducing concepts such as the article tolerance granule, the paragraph tolerance granule, and the granule space information table. Based on this model, an algorithm for computing the similarity degree between Chinese articles is proposed, and its effectiveness is examined through examples and test results.

Key words: granular computing, tolerance granule, article comparing
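
The abstract names the building blocks (paragraph tolerance granules grouped into an article tolerance granule) but does not give the algorithm itself. The following is only an illustrative sketch of how such a scheme might look, not the authors' method. Everything in it is an assumption: representing a paragraph granule as a set of character bigrams, using Jaccard overlap with a threshold as the tolerance relation (reflexive and symmetric, not necessarily transitive), and scoring two articles by the averaged share of paragraphs that have a tolerant counterpart. The names `paragraph_granule`, `article_granule`, `tolerant`, and `article_similarity` are hypothetical.

```python
from typing import FrozenSet, List

Granule = FrozenSet[str]


def paragraph_granule(paragraph: str, n: int = 2) -> Granule:
    """Assumed representation: a paragraph granule is its set of character n-grams."""
    text = "".join(paragraph.split())  # drop all whitespace
    if len(text) < n:
        return frozenset([text]) if text else frozenset()
    return frozenset(text[i:i + n] for i in range(len(text) - n + 1))


def article_granule(article: str) -> List[Granule]:
    """Assumed representation: an article granule is the list of its
    non-empty paragraph granules, one paragraph per line."""
    granules = (paragraph_granule(p) for p in article.splitlines())
    return [g for g in granules if g]


def jaccard(a: Granule, b: Granule) -> float:
    """Jaccard overlap of two granules, 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def tolerant(a: Granule, b: Granule, threshold: float = 0.5) -> bool:
    """Assumed tolerance relation: overlap at or above a threshold.
    It is reflexive and symmetric, but not transitive in general."""
    return jaccard(a, b) >= threshold


def coverage(src: List[Granule], dst: List[Granule], threshold: float) -> float:
    """Share of paragraph granules in src with a tolerant partner in dst."""
    if not src:
        return 0.0
    hits = sum(1 for g in src if any(tolerant(g, h, threshold) for h in dst))
    return hits / len(src)


def article_similarity(a: str, b: str, threshold: float = 0.5) -> float:
    """Symmetric similarity degree in [0, 1]: the mean of both coverages."""
    ga, gb = article_granule(a), article_granule(b)
    return (coverage(ga, gb, threshold) + coverage(gb, ga, threshold)) / 2
```

Character bigrams are used here only to keep the sketch free of a word-segmentation dependency; a word-level or semantic similarity measure could be substituted inside `tolerant` without changing the rest of the structure.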

CLC Number: TP301.6

LIU Tao, LI Xiang-jun, QIU Tao-rong, GONG Ke-hua, GUO Chuan-jun. An Approach to Computing Similarity Degree Between Chinese Articles Based on Tolerance Granular Computing Model[J]. Journal of Guangxi Normal University (Natural Science Edition), 2010, 28(3): 135-139.
