Journal of Guangxi Normal University (Natural Science Edition), 2022, Vol. 40, Issue 3: 132-140. DOI: 10.16088/j.issn.1001-6600.2021071201


Study on Phishing Website Detection Based on Improved Stacking Strategy

HU Qiang, LIU Qian, ZHOU Hangxia*   

1. College of Information Engineering, China Jiliang University, Hangzhou, Zhejiang 310018, China
Received: 2021-07-12; Revised: 2021-08-03; Online: 2022-05-25; Published: 2022-05-27

Abstract: Most existing phishing website detection techniques suffer from low accuracy, high computational cost, and delayed detection. To address these problems, a phishing website detection method based on an improved Stacking strategy is proposed. The method uses the Stacking strategy to combine several base learners with strong classification performance into a single high-performance model. In the improved strategy, the second level takes as input both the original input features of the first level and the first level's predictions, so that the high precision and speed of each model are fully exploited and overall performance improves further. Experimental results on a dataset on the order of 100,000 samples show that, compared with traditional machine-learning phishing detection techniques, this ensemble learning algorithm performs better on multiple metrics, reaching an accuracy of 97.82% and an F1 score of 97.54%, and can effectively detect phishing websites.

Key words: phishing website, base learner, Stacking algorithm, feature extraction, ensemble learning

CLC Number: TP393.08
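The improvement described in the abstract is that the second level receives the original input features together with the first-level predictions. The following is a minimal sketch of that idea, assuming scikit-learn is available. The synthetic data from `make_classification` stands in for the paper's extracted website features, and the three base learners (random forest, extra trees, gradient boosting) and the logistic-regression meta-learner are illustrative assumptions, not the authors' reported configuration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import cross_val_predict, train_test_split


def make_base_learners(seed=0):
    # Illustrative level-1 learners (an assumption, not the paper's exact set).
    return [
        RandomForestClassifier(n_estimators=100, random_state=seed),
        ExtraTreesClassifier(n_estimators=100, random_state=seed),
        GradientBoostingClassifier(random_state=seed),
    ]


def fit_passthrough_stacking(X_train, y_train, X_test, cv=5, seed=0):
    """Two-level stacking whose level-2 input is [original features | level-1 outputs]."""
    train_cols, test_cols = [], []
    for model in make_base_learners(seed):
        # Out-of-fold P(phishing) for training rows, so the meta-learner never
        # sees a base learner's predictions on rows that learner was fit on.
        oof = cross_val_predict(model, X_train, y_train, cv=cv, method="predict_proba")
        train_cols.append(oof[:, 1])
        # Refit on the full training split to produce test-time meta-features.
        model.fit(X_train, y_train)
        test_cols.append(model.predict_proba(X_test)[:, 1])
    # The "improved" step: keep the original features alongside the predictions.
    Z_train = np.hstack([X_train, np.column_stack(train_cols)])
    Z_test = np.hstack([X_test, np.column_stack(test_cols)])
    meta = LogisticRegression(max_iter=1000).fit(Z_train, y_train)
    return meta, Z_train, Z_test


# Synthetic stand-in for extracted website features (label 1 = phishing).
X, y = make_classification(n_samples=1000, n_features=20, n_informative=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
meta, Z_train, Z_test = fit_passthrough_stacking(X_train, y_train, X_test)
y_pred = meta.predict(Z_test)
acc = accuracy_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)  # harmonic mean of precision and recall
print(f"accuracy={acc:.4f}  F1={f1:.4f}")
```

Out-of-fold predictions are used so that the meta-learner is not trained on overly optimistic in-sample outputs. scikit-learn's `StackingClassifier` with `passthrough=True` provides the same feature-passthrough behaviour; the explicit loop here only makes the two levels visible.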