Journal of Guangxi Normal University (Natural Science Edition), 2010, Vol. 28, Issue 3: 104-108.


Binary Classification with Misclassification Cost and Reject Cost

ZOU Chao1, ZHENG En-hui1, REN Yu-ling2, ZHANG Ying3, FAN Yu-gang4

  1. College of Mechatronics Engineering, China Jiliang University, Hangzhou, Zhejiang 310018, China;
    2. Zhejiang Tianda Environmental Protection Co., Ltd., Hangzhou, Zhejiang 310006, China;
    3. IBM Global Services (China) Co., Ltd., Shanghai 200032, China;
    4. College of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan 650000, China
  • Received: 2010-05-08 Online: 2010-09-20 Published: 2023-02-06
  • Corresponding author: ZHENG En-hui (born 1975), male, native of Xinmin, Liaoning; associate professor at China Jiliang University. E-mail: ehzheng@cjlu.edu.cn
  • Supported by:
    National Natural Science Foundation of China (60905034); Natural Science Foundation of Zhejiang Province (Y1080950); National Public Welfare Industry Special Fund (2007GYJ016); Scientific Research Fund of the Yunnan Provincial Department of Education (08C0019)

Abstract: Conventional classification algorithms, which minimize the "0-1" loss, implicitly assume that every misclassification carries the same cost and that the classification result of every sample is accepted. These assumptions do not hold in fields such as medical diagnosis, fault diagnosis and fraud detection. Building on a definition of reject cost, this paper proposes a binary classification algorithm that embeds asymmetric (class-dependent) misclassification costs and asymmetric (class-dependent) reject costs: Cost-sensitive SVM with Class-dependent Misclassification Cost and Class-dependent Reject Cost (CSVM-CMC2RC). The algorithm has four steps. First, a cost-sensitive SVM is trained to obtain preliminary classification results. Second, the posterior probability of each sample is estimated. Third, the classification reliability of each sample is computed. Finally, the optimal reject threshold for each class, and with it the final reject decision, is determined by minimizing the average cost. Experiments on standard datasets show that CSVM-CMC2RC reduces the misclassification rate and the average cost and improves the reliability of the classification results.

Key words: class-dependent misclassification cost, class-dependent reject cost, cost-sensitive SVM
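The last two steps of the algorithm (reliability and per-class reject thresholds) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the formulas, so the sketch assumes that posterior estimates P(y=1|x) are already available from the first two steps, that the reliability of a prediction is the posterior of the predicted class, that a rejected sample is charged the reject cost of its true class, and that thresholds are found by a simple grid search on labelled data. All function names, cost values and data below are hypothetical.

```python
from itertools import product


def decide(p_pos, thresholds):
    """Return (predicted_class, rejected) for one sample.

    p_pos is an estimated posterior P(y=1 | x). Reliability is assumed
    to be the posterior of the predicted class; the prediction is
    rejected when reliability falls below that class's threshold.
    """
    pred = 1 if p_pos >= 0.5 else 0
    reliability = p_pos if pred == 1 else 1.0 - p_pos
    return pred, reliability < thresholds[pred]


def average_cost(probs, labels, thresholds, mis_cost, rej_cost):
    """Mean cost over the samples.

    mis_cost[y] and rej_cost[y] are the (assumed) costs of misclassifying
    or rejecting a sample whose true class is y.
    """
    total = 0.0
    for p_pos, y in zip(probs, labels):
        pred, rejected = decide(p_pos, thresholds)
        if rejected:
            total += rej_cost[y]
        elif pred != y:
            total += mis_cost[y]
    return total / len(labels)


def best_thresholds(probs, labels, mis_cost, rej_cost):
    """Grid-search one reject threshold per predicted class."""
    grid = [i / 20 for i in range(10, 21)]  # 0.50, 0.55, ..., 1.00
    return min(
        product(grid, repeat=2),
        key=lambda t: average_cost(probs, labels, t, mis_cost, rej_cost),
    )


# Hypothetical posteriors and labels standing in for SVM outputs.
probs = [0.95, 0.9, 0.6, 0.55, 0.4, 0.1, 0.05]
labels = [1, 1, 0, 1, 1, 0, 0]
mis_cost = [1.0, 5.0]  # missing a class-1 sample is assumed costlier
rej_cost = [0.2, 0.5]  # rejecting is cheaper than misclassifying

no_reject = average_cost(probs, labels, (0.5, 0.5), mis_cost, rej_cost)
thresholds = best_thresholds(probs, labels, mis_cost, rej_cost)
with_reject = average_cost(probs, labels, thresholds, mis_cost, rej_cost)
print(thresholds, no_reject, with_reject)
```

On this toy data the reject option lowers the average cost because the low-reliability predictions are exactly the ones that would otherwise incur the large misclassification costs. In practice the thresholds would be tuned on held-out data rather than on the samples being evaluated.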

CLC number: TP18