Journal of Guangxi Normal University (Natural Science Edition), 2020, Vol. 38, Issue (2): 72-80. DOI: 10.16088/j.issn.1001-6600.2020.02.008

An Improved Multi-decision Tree Algorithm for Imbalanced Classification

DUAN Huajuan1,2, WEI Yongqing2,3*, LIU Peiyu1,2, ZHOU Peng1,2   

  1. School of Information Science and Engineering, Shandong Normal University, Jinan, Shandong 250358, China;
  2. Shandong Provincial Key Laboratory for Distributed Computer Software Novel Technology, Jinan, Shandong 250358, China;
  3. Basic Education Department, Shandong Police College, Jinan, Shandong 250014, China
  • Received: 2019-10-10; Published: 2020-04-02

Abstract: To reduce the impact of class overlap on classification performance on imbalanced datasets, while avoiding both the over-fitting caused by over-sampling and the information loss caused by under-sampling, this paper proposes UAMDT, a multi-decision tree algorithm based on under-sampling and attribute selection. First, Tomek link under-sampling and ensemble under-sampling are applied to the data to obtain multiple balanced subsets. Next, a single decision tree is built on each subset: a hybrid attribute measure combining information gain and the Gini index serves as the attribute selection criterion, and the optimal attribute under this measure is chosen as the split attribute of the tree's root node. Finally, all the single decision trees are integrated into a multi-decision tree. Experiments on 10 imbalanced datasets, assessed with multiple evaluation criteria, are conducted to verify the effectiveness and feasibility of the proposed algorithm.
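
The data-processing and attribute-scoring steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not specify which member of a Tomek link is removed, how the balanced subsets are drawn, or how information gain and the Gini index are combined, so the following are assumptions made here for illustration: only the majority-class member of each link is dropped, each subset pairs all minority samples with an equally sized random draw of majority samples, and the hybrid score is a weighted sum controlled by a hypothetical parameter `alpha`. All function names are likewise hypothetical, and class labels are assumed to be non-negative integers.

```python
import numpy as np

def tomek_link_mask(X, y):
    """Boolean mask of samples that take part in a Tomek link."""
    # Pairwise squared Euclidean distances (O(n^2) memory, small data only).
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2).astype(float)
    np.fill_diagonal(d, np.inf)  # a sample is never its own neighbour
    nn = d.argmin(axis=1)
    idx = np.arange(len(y))
    # Tomek link: mutual nearest neighbours that carry different labels.
    return (nn[nn] == idx) & (y[nn] != y)

def remove_majority_tomek(X, y, majority):
    """Assumption: drop only the majority-class member of each link."""
    keep = ~(tomek_link_mask(X, y) & (y == majority))
    return X[keep], y[keep]

def balanced_subsets(X, y, majority, n_subsets, rng):
    """Assumption: all minority samples plus an equal-size majority draw."""
    maj = np.flatnonzero(y == majority)
    mino = np.flatnonzero(y != majority)
    size = min(len(maj), len(mino))
    subsets = []
    for _ in range(n_subsets):
        pick = rng.choice(maj, size=size, replace=False)
        rows = np.concatenate([mino, pick])
        subsets.append((X[rows], y[rows]))
    return subsets

def entropy(y):
    p = np.bincount(y) / len(y)
    p = p[p > 0]  # skip empty classes to avoid log2(0)
    return -(p * np.log2(p)).sum()

def gini(y):
    p = np.bincount(y) / len(y)
    return 1.0 - (p ** 2).sum()

def hybrid_score(x, y, threshold, alpha=0.5):
    """Assumed hybrid measure: alpha * information gain
    + (1 - alpha) * Gini impurity decrease for a binary split of x."""
    left, right = y[x <= threshold], y[x > threshold]
    if len(left) == 0 or len(right) == 0:
        return 0.0  # a split that separates nothing scores zero
    w = len(left) / len(y)
    gain = entropy(y) - (w * entropy(left) + (1 - w) * entropy(right))
    drop = gini(y) - (w * gini(left) + (1 - w) * gini(right))
    return alpha * gain + (1 - alpha) * drop
```

Under these assumptions, a root attribute for each subset would be the attribute and threshold with the highest `hybrid_score`, and one tree would be grown per subset before the trees are combined. How the trees are integrated is not described in the abstract and is left out of the sketch.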

Key words: imbalanced data, multi-decision tree, Tomek link under-sampling, ensemble under-sampling, attribute selection

CLC Number: TP391