Journal of Guangxi Normal University (Natural Science Edition), 2022, Vol. 40, Issue (3): 132-140. doi: 10.16088/j.issn.1001-6600.2021071201
胡强, 刘倩, 周杭霞*
HU Qiang, LIU Qian, ZHOU Hangxia*
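Abstract: Most current phishing website detection techniques suffer from low accuracy, high computational cost, and delayed detection. To address these problems, this paper proposes a phishing website detection method based on an improved Stacking strategy. The method integrates several base learners with strong classification performance into a single high-performance model via Stacking, and feeds both the first-level input features and the first-level predictions into the second level as its input features. This exploits the high accuracy and speed of the individual models and further improves overall performance. Experimental results show that, compared with traditional machine-learning phishing detection techniques, the ensemble algorithm performs better on multiple metrics on a dataset on the order of 100,000 samples, reaching a precision of 97.82% and an F1 score of 97.54%, and can therefore detect phishing websites effectively.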
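The second-level input described in the abstract (original features concatenated with first-level predictions) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not name the base learners, so `fit_threshold_stump` is a hypothetical toy learner, the data are made up, and the out-of-fold scheme is the common way to build stacking inputs rather than a procedure confirmed by the paper.

```python
from typing import Callable, List, Sequence

Row = List[float]
Predictor = Callable[[Row], float]
# A fitter takes training rows and labels and returns a trained predictor.
Fitter = Callable[[Sequence[Row], Sequence[int]], Predictor]


def fit_threshold_stump(feature: int) -> Fitter:
    """Hypothetical toy base learner: threshold one feature at the
    midpoint between the two class means."""
    def fit(X: Sequence[Row], y: Sequence[int]) -> Predictor:
        pos = [row[feature] for row, label in zip(X, y) if label == 1]
        neg = [row[feature] for row, label in zip(X, y) if label == 0]
        mean_pos = sum(pos) / len(pos)
        mean_neg = sum(neg) / len(neg)
        threshold = (mean_pos + mean_neg) / 2
        sign = 1.0 if mean_pos >= mean_neg else -1.0
        return lambda row: 1.0 if sign * (row[feature] - threshold) > 0 else 0.0
    return fit


def level_two_features(X: Sequence[Row], y: Sequence[int],
                       fitters: Sequence[Fitter], n_folds: int = 2) -> List[Row]:
    """Build second-level inputs: original features plus out-of-fold
    predictions from every first-level learner."""
    n = len(X)
    meta = [[0.0] * len(fitters) for _ in range(n)]
    for fold in range(n_folds):
        # Assumed fold scheme: sample i belongs to fold i % n_folds.
        held_out = [i for i in range(n) if i % n_folds == fold]
        train = [i for i in range(n) if i % n_folds != fold]
        X_train = [X[i] for i in train]
        y_train = [y[i] for i in train]
        for j, fitter in enumerate(fitters):
            predict = fitter(X_train, y_train)
            for i in held_out:
                meta[i][j] = predict(X[i])
    # The modification described in the abstract: keep the original
    # features alongside the first-level predictions.
    return [list(X[i]) + meta[i] for i in range(n)]


# Made-up toy data: two numeric features per URL, label 1 = phishing.
X = [[0.1, 5.0], [0.2, 4.0], [0.9, 1.0], [0.8, 2.0], [0.3, 4.5], [0.7, 1.5]]
y = [0, 0, 1, 1, 0, 1]

base_fitters = [fit_threshold_stump(0), fit_threshold_stump(1)]
Z = level_two_features(X, y, base_fitters)

# Second-level learner trained on the widened matrix (here, again a stump,
# applied to the first prediction column at index 2).
second_level = fit_threshold_stump(2)(Z, y)
predictions = [second_level(row) for row in Z]
```

Each row of `Z` has four columns: the two original features followed by one prediction per base learner. In plain Stacking the second level would see only the last two columns; passing the original features through as well is the variation the abstract describes.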