Journal of Guangxi Normal University(Natural Science Edition) ›› 2010, Vol. 28 ›› Issue (1): 122-126.


An Algorithm of Removing Duplicate URL

SU Guo-rong1, YANG Yue-xiang1, DENG Jing-sheng2   

  1. School of Computer Science, National University of Defense and Technology, Changsha Hunan 410073, China;
  2. Information Center, National University of Defense and Technology, Changsha Hunan 410073, China
  • Received:2009-12-28 Online:2010-03-20 Published:2023-02-07

Abstract: This paper analyzes the duplicate-removal strategies used in Web information collection by the Bloom Filter algorithm and its improved versions. Drawing on the Dynamic Bloom Filter algorithm, it uses a dynamic array to represent the elements of a set and proposes a duplicate-removal strategy that supports frequent query and deletion operations on repeated URLs. Finally, an experiment compares the proposed strategy with other strategies and shows that it removes duplicates more effectively at lower error rates.
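The abstract does not give the details of the proposed strategy, so the following is only a minimal illustrative sketch of the general idea it names: a Bloom filter variant that grows dynamically and supports both querying and deleting URLs. It assumes a counting Bloom filter (counters instead of bits) as the deletable unit and a Python list as the dynamic array of filters. The class names `CountingBloomFilter` and `DynamicUrlFilter`, the parameters `num_counters`, `num_hashes`, and `capacity_per_filter`, and the SHA-256 double-hashing scheme are all assumptions made for illustration, not the authors' design.

```python
import hashlib


class CountingBloomFilter:
    """Bloom filter that stores counters instead of bits, so items can be removed."""

    def __init__(self, num_counters=1 << 16, num_hashes=4):
        self.m = num_counters          # number of counters (assumed size)
        self.k = num_hashes            # number of hash positions per URL
        self.counters = [0] * num_counters

    def _positions(self, url):
        # Double hashing: derive k indexes from two halves of one SHA-256 digest.
        digest = hashlib.sha256(url.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def __contains__(self, url):
        # May report a false positive, never a false negative.
        return all(self.counters[p] > 0 for p in self._positions(url))

    def add(self, url):
        for p in self._positions(url):
            self.counters[p] += 1

    def remove(self, url):
        # Only decrement when the URL appears to be present.
        if url not in self:
            return False
        for p in self._positions(url):
            self.counters[p] -= 1
        return True


class DynamicUrlFilter:
    """Dynamic array of counting filters; a new filter is appended when the last is full."""

    def __init__(self, capacity_per_filter=1000, **filter_kwargs):
        self.capacity = capacity_per_filter
        self.filter_kwargs = filter_kwargs
        self.filters = [CountingBloomFilter(**filter_kwargs)]
        self.counts = [0]              # URLs stored in each filter

    def __contains__(self, url):
        return any(url in f for f in self.filters)

    def add(self, url):
        """Record url and return True if it looks new; return False for a duplicate."""
        if url in self:
            return False
        if self.counts[-1] >= self.capacity:
            self.filters.append(CountingBloomFilter(**self.filter_kwargs))
            self.counts.append(0)
        self.filters[-1].add(url)
        self.counts[-1] += 1
        return True

    def remove(self, url):
        """Delete url from the first filter that reports it; return False if absent."""
        for i, f in enumerate(self.filters):
            if f.remove(url):
                self.counts[i] -= 1
                return True
        return False
```

In a crawler, `add` would be called on each discovered URL and the URL fetched only when it returns `True`. Because any Bloom filter can report false positives, a small fraction of new URLs may be skipped, and removing a URL that was only a false positive can disturb the counters of other entries. The sketch makes no claim about the error rates reported in the paper.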

Key words: Bloom filter, Hash function, URL, URL filter

CLC Number: TP391