Journal of Guangxi Normal University (Natural Science Edition) ›› 2020, Vol. 38 ›› Issue (2): 19-28. doi: 10.16088/j.issn.1001-6600.2020.02.003

• CTCIS2019 •

A Confidence-guided Hybrid Android Malware Detection System with Multiple Heterogeneous Algorithms

ZHANG Yongsheng1, ZHU Wenjun2, SHI Ruoqi2, DU Zhenhua3, ZHANG Rui3, WANG Zhi2*

  1. East China Regional Air Traffic Management Bureau, Civil Aviation Administration of China, Shanghai 200335, China;
  2. College of Cyber Science, Nankai University, Tianjin 300350, China;
  3. National Computer Virus Emergency Response Center, Tianjin 300457, China
  • Received: 2019-10-08 Published: 2020-04-02
  • Corresponding author: WANG Zhi (born 1981), male, from Lucheng, Shanxi; lecturer at Nankai University. E-mail: zwang@nankai.edu.cn
  • Funding:
    National Natural Science Foundation of China (61872202); Civil Aviation Safety Capacity Building Project (PESA2018079, PESA2018082, PESA2019073, PESA2019074); CERNET Next Generation Internet Technology Innovation Project (NGII20180401); Open Project Fund of the Information Security Evaluation Center of Civil Aviation University of China (CAAC-ISECCA-201701); project of the National Engineering Laboratory for Computer Virus Prevention and Control Technology

Abstract: Machine learning based Android malware detection systems suffer from model aging. Malware mutates and evolves rapidly, which produces concept drift: the underlying data distribution of malware changes over time, violating the machine learning assumption that the data distribution is stable. To alleviate model aging, this paper proposes a confidence-guided method that supports collaborative detection with multiple models. By analyzing the credibility and confidence of the predictions made by several heterogeneous models, the method overcomes the obstacle that heterogeneous models cannot learn from one another or detect collaboratively, and an open multi-model hybrid detection platform is established to mitigate concept drift. Experiments show that multi-model collaboration improves detection. In predictions on more than 66 000 Android samples, the SVM model and the random forest model each have strengths and weaknesses, and the hybrid detection system improves the prediction results while performing no worse than either single model.

Key words: malware detection, machine learning, confidence calculation, hybrid detection
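
The credibility and confidence analysis mentioned in the abstract can be illustrated with a small conformal-prediction-style sketch. This is not the paper's implementation. It assumes that each model supplies a nonconformity score per candidate label, that credibility is the largest p-value and confidence is one minus the second-largest p-value (the usual conformal prediction definitions), and that the hybrid system simply trusts the model whose prediction is most credible. The function names, the selection rule, and all numbers below are hypothetical.

```python
from typing import Dict, List, Tuple


def p_value(calibration_scores: List[float], test_score: float) -> float:
    """Conformal p-value: share of scores (calibration plus the test sample
    itself) that are at least as nonconforming as the test sample."""
    n_ge = sum(1 for s in calibration_scores if s >= test_score)
    return (n_ge + 1) / (len(calibration_scores) + 1)


def credibility_confidence(p_values: Dict[str, float]) -> Tuple[str, float, float]:
    """Return (predicted label, credibility, confidence) from per-label p-values."""
    ranked = sorted(p_values.items(), key=lambda kv: kv[1], reverse=True)
    label, top = ranked[0]
    second = ranked[1][1] if len(ranked) > 1 else 0.0
    return label, top, 1.0 - second


def hybrid_predict(
    per_model_p_values: Dict[str, Dict[str, float]]
) -> Tuple[str, str, float, float]:
    """Assumed selection rule: keep the prediction with the highest credibility,
    breaking ties by confidence. Returns (model, label, credibility, confidence)."""
    best = None
    for model, pvals in per_model_p_values.items():
        label, cred, conf = credibility_confidence(pvals)
        if best is None or (cred, conf) > (best[2], best[3]):
            best = (model, label, cred, conf)
    return best


# Hypothetical calibration nonconformity scores per model and label.
calibration = {
    "svm": {"malware": [0.1, 0.2, 0.4, 0.8], "benign": [0.1, 0.3, 0.5, 0.9]},
    "random_forest": {"malware": [0.2, 0.3, 0.6, 0.7], "benign": [0.1, 0.2, 0.6, 0.8]},
}
# Hypothetical nonconformity of one new app under each candidate label.
test_scores = {
    "svm": {"malware": 0.5, "benign": 0.4},
    "random_forest": {"malware": 0.25, "benign": 0.9},
}

per_model = {
    model: {
        label: p_value(calibration[model][label], score)
        for label, score in scores.items()
    }
    for model, scores in test_scores.items()
}
print(hybrid_predict(per_model))
```

Because p-values are computed against each model's own calibration scores, they are comparable across models even when the raw outputs (an SVM margin versus a random forest vote share) are not, which is what lets heterogeneous models be combined in this sketch.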

CLC number: TP309