Journal of Guangxi Normal University (Natural Science Edition), 2013, Vol. 31, Issue 3: 87-93.


Parallel Classification Compression Algorithm for Stream-Data Based on Granular Analysis and Storage of GEP

YANG Wen¹, LI Wen-jing¹, LI Shuang¹, LI Shu-ju², LIN Zhong-ming¹

  1. School of Computer and Information Engineering, Guangxi Teachers Education University, Nanning, Guangxi 530023, China;
  2. Changyuan County Air Defense Office of Henan, Changyuan, Henan 453400, China
  • Received: 2013-05-30  Online: 2013-09-20  Published: 2018-11-26

Abstract: Existing stream-data classification methods achieve neither high accuracy nor a high compression rate for data mining. To classify and compress streaming data faster, a parallel classification compression algorithm for stream data is proposed, based on granular analysis and storage using gene expression programming (GEP). First, the granular analysis method is used to obtain the minimal set of the stream data, and the approximate granular space is derived according to division rules. Second, a corresponding GEP classification model is established for each kind of stream data. Finally, the data are sent to the GEP compression model and compressed in the form of a dynamic storage record set; the serial algorithm is then extended to a parallel algorithm under the MPI+OpenMP hybrid programming model, and its performance is verified on UCI data and communication bills. The experimental results show that both the running time and the compression ratio of the classification compression are satisfactory: processing the students' communication bills takes about 96 s, and the compression ratio reaches 1/3.
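The three-stage flow described in the abstract (granulate, classify, compress into a record set) can be illustrated with a toy sketch. This is not the authors' algorithm: the fixed-width division rule, the parity-based `classify` function standing in for a trained GEP classifier, and run-length encoding standing in for the GEP compression model are all assumptions made for illustration, and the MPI+OpenMP parallel extension is not shown.

```python
from itertools import groupby


def granulate(stream, width):
    # Hypothetical division rule: values falling into the same
    # fixed-width bucket belong to the same granule.
    return [value // width for value in stream]


def classify(granule_id):
    # Stand-in for a trained GEP classification model (assumption):
    # label granules by the parity of their id.
    return "A" if granule_id % 2 == 0 else "B"


def compress(granule_ids):
    # Build a record set with one (granule id, class label, run length)
    # entry per run of consecutive equal granule ids.
    # Run-length encoding is swapped in for the GEP compression model.
    records = []
    for gid, run in groupby(granule_ids):
        records.append((gid, classify(gid), len(list(run))))
    return records


def compression_ratio(records, original_count):
    # Number of stored records per original stream item; smaller is better.
    return len(records) / original_count


stream = [1, 2, 3, 11, 12, 13, 14, 21]
granule_ids = granulate(stream, width=10)
records = compress(granule_ids)
ratio = compression_ratio(records, len(stream))
```

In this sketch the compression is lossy (only the granule id and class label survive), and the ratio depends entirely on how long the runs in the toy stream are; it says nothing about the 1/3 figure reported for the real data.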

Key words: classification compression, granular analysis, GEP, parallel algorithm

CLC Number: 

  • TP393