Journal of Guangxi Normal University (Natural Science Edition) ›› 2023, Vol. 41 ›› Issue (3): 91-104. doi: 10.16088/j.issn.1001-6600.2022100805

• Research Article •

Ransomware Classification Based on Entropy Image Static Analysis Technology

DENG Xizhen, JIANG Ming, CEN Mingcan*, LUO Yuling

  1. School of Electronic and Information Engineering, Guangxi Normal University, Guilin, Guangxi 541004, China
  • Received: 2022-10-08 Revised: 2022-10-29 Online: 2023-05-25 Published: 2023-06-01
  • Corresponding author: CEN Mingcan (born 1987), male, from Beiliu, Guangxi; lecturer at Guangxi Normal University. E-mail: cenmingcan@gxnu.edu.cn
  • Funding:
    National Natural Science Foundation of China (61762018); Guangxi Key Research and Development Fund (2019AB35004, GuiKeAB20238030)

Abstract: With the rapid development of artificial intelligence, 5G, the Internet of Things, and related technologies, China faces increasingly serious cyberattacks from abroad. Ransomware attacks have increased significantly, causing huge data and economic losses to countries, enterprises, and individuals. To classify ransomware families effectively, this paper proposes a classification method based on entropy image static analysis, which classifies samples directly from entropy features extracted from ransomware binary files. In addition, a data augmentation method named Ran-GAN is proposed to address the data imbalance among ransomware families. The method introduces an attention mechanism into the VGG16 neural network architecture to improve the network's feature extraction ability. Experimental results show that the proposed method achieves 97.16% accuracy and a 97.12% weighted average F1-score on 14 ransomware families. It clearly outperforms traditional visualization methods on all four evaluation metrics, and its ransomware detection performance is also significantly better than that of other neural network methods.

Key words: ransomware, ransomware visualization, entropy features, static analysis, attention mechanism

CLC Number: TP309
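
The abstract describes classifying on entropy features extracted directly from ransomware binary files. As a rough illustration of that idea only, the sketch below computes block-wise Shannon entropy over a byte string and arranges the values as grayscale pixel rows. The function names, the block size of 256 bytes, the image width, the zero padding, and the linear scaling to 0-255 are all assumptions made for this example; the abstract does not specify how the paper builds its entropy images.

```python
import math
from collections import Counter


def shannon_entropy(block: bytes) -> float:
    """Shannon entropy of a byte block, in bits per byte (0.0 to 8.0)."""
    if not block:
        return 0.0
    total = len(block)
    entropy = 0.0
    for count in Counter(block).values():
        p = count / total
        entropy += p * math.log2(1 / p)
    return entropy


def entropy_image(data: bytes, block_size: int = 256, width: int = 16) -> list[list[int]]:
    """Map block-wise entropy to rows of grayscale pixels (0-255).

    block_size and width are illustrative assumptions, not values from the paper.
    """
    pixels = [
        round(shannon_entropy(data[i:i + block_size]) / 8.0 * 255)
        for i in range(0, len(data), block_size)
    ]
    # Zero-pad the last row so that every row has the same width.
    pixels.extend([0] * ((-len(pixels)) % width))
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


if __name__ == "__main__":
    # A constant block has zero entropy; a block holding every byte value once has 8 bits.
    sample = bytes(256) + bytes(range(256))
    print(entropy_image(sample, block_size=256, width=2))
```

Low-entropy regions such as padding map to dark pixels, while compressed or encrypted regions map to bright ones, which is the kind of structure an image classifier could learn from. A real pipeline would read the bytes from a sample file and convert the rows to an image tensor.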
