广西师范大学学报(自然科学版) ›› 2022, Vol. 40 ›› Issue (3): 13-30.doi: 10.16088/j.issn.1001-6600.2021071101

• 综述 • 上一篇    下一篇

面向复杂高效用模式的挖掘算法综述

李慕航, 韩萌*, 陈志强, 武红鑫, 张喜龙   

  1. 北方民族大学 计算机科学与工程学院, 宁夏 银川 750021
  • 收稿日期:2021-07-11 修回日期:2021-09-09 出版日期:2022-05-25 发布日期:2022-05-27
  • 通讯作者: 韩萌(1982—), 女(回族), 河南商丘人, 北方民族大学教授, 博士。E-mail: 2003051@nmu.edu.cn
  • 基金资助:
    国家自然科学基金(62062004); 宁夏自然科学基金(2022AAC03279, 2020AAC03216)

Survey of Algorithms Oriented to Complex High Utility Pattern Mining

LI Muhang, HAN Meng*, CHEN Zhiqiang, WU Hongxin, ZHANG Xilong   

  1. School of Computer Science and Engineering, North Minzu University, Yinchuan Ningxia 750021, China
  • Received:2021-07-11 Revised:2021-09-09 Online:2022-05-25 Published:2022-05-27

摘要: 复杂高效用模式挖掘是当前研究的一个新兴主题。本文首次从高效用融合模式和衍生模式2个角度进行讨论。首先,对于融合模式,根据数据结构的不同对高效用序列模式进行分类论述;按照时间顺序对高效用片段模式、周期高效用模式进行概述。针对衍生模式,从数据结构角度对高平均效用模式、带有负项的高效用模式、on-shelf高效用模式进行总结;从精简类型角度概述精简高效用模式,并对现有融合模式和衍生模式挖掘算法的优缺点、上边界等进行对比分析。最后,针对现阶段研究缺陷与不足,给出了下一步研究方向,包括不确定数据中的高效用模式挖掘方法、数据流上的高效用on-shelf模式挖掘方法和大数据环境下的并行高效用模式挖掘方法。

关键词: 综述, 模式挖掘, 复杂高效用模式, 高效用融合模式, 高效用衍生模式

Abstract: High utility complex pattern is a new topic in data mining. For the first time, the complex high utility pattern mining algorithms are reviewed from two perspectives: high utility integrated pattern and high utility derivative pattern. Firstly, high utility sequential pattern(HUSP) is classified and discussed according to different data structures; high utility episode pattern(HUEP) and periodic high utility pattern(PHUP) are summarized in chronological order. Secondlyly, for the derivative model, the high average utility pattern, the high-utility pattern with negative unit profits and high on-shelf utility pattern are summarized from the perspective of data structure; the concise high utility pattern is summarized from the perspective of concise types. Thirdly, the advantages and disadvantages and upper boundaries of the existing integrated pattern and derivative pattern mining algorithms are compared and analyzed. Finally, based on the current research deficiencies, further research direction is given, including high utility pattern mining in uncertain data, high utility on-shelf pattern mining over data stream, and parallel high utility pattern mining in a big data environment.

Key words: survey, pattern mining, complex high utility pattern, high utility integrated pattern, high utility derivative pattern

中图分类号: 

  • TP311.13
[1]FOURNIER-VIGER P, WU C W, ZIDA S, et al. FHM: faster high-utility itemset mining using estimated utility co-occurrence pruning[C]// Foundations of Intelligent Systems:LNAI volume 8502. Berlin: Springer, 2014: 83-92.
[2]YIN J F, ZHENG Z G, CAOL B. USpan: an efficient algorithm for mining high utility sequential patterns[C]// Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: Association for Computing Machinery, 2012: 660-668. DOI: 10.1145/2339530.2339636.
[3]FOURNIER-VIGER P, YANG P, LIN J C W, et al.HUE-Span: fast high utility episode mining[C]// Advanced Data Mining and Applications:LNAI volume 11888. Berlin: Springer, 2019: 169-184.
[4]FOURNIER-VIGER P, LIN J C W, DUONG Q H, et al. PHM: mining periodic high-utility itemsets[C]// Advances in Data Mining: Applications and Theoretical Aspects:LNAI volume 9728. Berlin: Springer, 2016: 64-79.
[5]WU J M T, LIN J C W, PIROUZ M, et al. TUB-HAUPM: tighter upper bound for mining high average-utility patterns[J]. IEEE Access, 2018, 6: 18655-18669. DOI: 10.1109/ACCESS.2018.2820740.
[6]FOURNIER-VIGER P, ZIDA S, LIN J C W, et al. EFIM-closed: Fast and memory efficient discovery of closed high-utilityitemsets[C]// Machine Learning and Data Mining in Pattern Recognition:LNAI volume 9729. Berlin: Springer, 2016: 199-213.
[7]FOURNIER-VIGER P. FHN: efficient mining of high-utilityitemsets with negative unit profits[C]// Advanced Data Mining and Applications:LNAI volume 8933. Berlin: Springer, 2014: 16-29.
[8]FOURNIER-VIGER P, ZIDA S. FOSHU: faster on-shelf high utility itemset mining-with or without negative unit profit[C]// Proceedings of the 30th annual ACM symposium on applied computing. New York: Association for Computing Machinery, 2015: 857-864. DOI: 10.1145/2695664.2695823.
[9]MABROUKEH N R, EZEIFE C I. A taxonomy of sequential pattern mining algorithms[J]. ACM Computing Surveys (CSUR), 2010, 43(1): 1-41. DOI: 10.1145/1824795.1824798.
[10]FOURNIER-VIGER P, LIN J C W, NKAMBOU R, et al. High-utility pattern mining[M].Berlin:Springer, 2019: 97-129.
[11]RAHMATI B, SOHRABI M K. A systematic survey on high utility itemset mining[J]. International Journal of Information Technology & Decision Making, 2019, 18(4):1113-1185. DOI: 10.1142/S0219622019300027.
[12]SINGH K, SINGH S S, KUMAR A, et al. High utility itemsets mining with negative utility value: a survey[J]. Journal of Intelligent and Fuzzy Systems, 2018,35(6):6551-6562. DOI: 10.3233/JIFS-18965.
[13]王少峰,韩萌,贾涛,等.数据流高效用模式挖掘综述[J].计算机应用研究,2020,37(9):2571-2578. DOI: 10.19734/j.issn.1001-3695.2019.03.0105.
[14]张春砚,韩萌,孙蕊,等.高效用模式挖掘关键技术综述[J].计算机应用研究,2021,38(2):330-340. DOI: 10.19734/j.issn.1001-3695.2020.02.0027.
[15]孙蕊,韩萌,张春砚,等.精简高效用模式挖掘综述[J].计算机应用研究,2021,38(4):975-981. DOI: 10.19734/j.issn.1001-3695.2020.02.0092.
[16]高曼,韩萌,雷冰冰.高效用模式产生策略综述[J].计算机工程与应用,2020,56(16):1-12.
[17]GAN W S, LIN J C W, ZHANG J X, et al. Fast utility mining on complex sequences[EB/OL].(2019-04-28)[2021-09-09].https://arxiv.org/abs/1904.12248.
[18]SHIE B E, HSIAO H F, TSENG V S, et al. Mining high utility mobile sequential patterns in mobile commerce environments[C]// Proceedings of the International Conference on Database Systems for Advanced Applications, DASFAA 2011. Berlin: Springer, 2011: 224-238.
[19]AHMED C F, TANBEER S K, JEONG B S. Mining high utility web access sequences in dynamic web logdata[C]// Proceedings of the 2010 11th ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing. Piscataway, NJ: IEEE, 2010: 76-81. DOI: 10.1109/SNPD.2010.21.
[20]ZIHAYAT M, WU C W, AN A J, et al. Efficiently mining high utility sequential patterns in static and streaming data[J]. Intelligent Data Analysis, 2017, 21(S1): S103-S135.
[21]唐辉军,王乐,樊成立.基于模式增长的高效用序列模式挖掘算法[J].自动化学报,2021,47(4):943-954. DOI: 10.16383/j.aas.c180660.
[22]WANG J Z, HUANG J L. Incremental mining of high utility sequential patterns in incremental databases[C]// Proceedings of the 25th ACM International on Conference on Information and Knowledge Management. New York: Association for Computing Machinery, 2016: 2341-2346. DOI: 10.1145/2983323.2983691.
[23]WANG J Z, HUANG J L. On incremental high utility sequential pattern mining[J]. ACM Transactions on Intelligent Systems and Technology (TIST), 2018, 9(5): 1-26. DOI: 10.1145/3178114.
[24]ZIHAYAT M, CHEN Y, AN A J. Memory-adaptive high utility sequential pattern mining over data streams[J]. Machine Learning, 2017, 106(6): 799-836. DOI: 10.1007/s10994-016-5617-1.
[25]TANG H J, LIU Y G, WANG L. A new algorithm of mining high utility sequential pattern in streaming data[J]. International Journal of Computational Intelligence Systems, 2018, 12(1):342-350. DOI: 10.2991/ijcis.2019.125905650.
[26]ALKAN O K, KARAGOZ P.CRoM and HuspExt: improving efficiency of high utility sequential pattern extraction[C]// 2016 IEEE 32nd International Conference on Data Engineering (ICDE). Piscataway, NJ: IEEE,2016: 1472-1473. DOI: 10.1109/ICDE.2016.7498380.
[27]WANG J Z, HUANG J L, CHEN Y C. On efficiently mining high utility sequential patterns[J]. Knowledge and Information Systems, 2016, 49(2): 597-627. DOI: 10.1007/s10115-015-0914-8.
[28]XU T T, DONG X J, XU J L, et al. Mining high utility sequential patterns with negative item values[J]. International Journal of Pattern Recognition and Artificial Intelligence, 2017, 31(10): 1750035. DOI: 10.1142/S0218001417500355.
[29]XU T T, XU J L, DONG X J. Mining high utility sequential patterns using multiple minimum utility[J]. International Journal of Pattern Recognition and Artificial Intelligence, 2018, 32(10): 1859017. DOI: 10.1142/S0218001418590176.
[30]SONG W, RONG K K. Mining high utility sequential patterns using maximal remaining utility[C]// Data Mining and Big Data:LNISA volume 10943. Berlin: Springer, 2018: 466-477.
[31]ZHANG C K, ZU Y W, NIE J L, et al. Two efficient algorithms for mining high utility sequential patterns[C]// 2019 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SustainCom/SocialCom 2019s), Piscataway, NJ: IEEE, 2019: 905-911. DOI: 10.1109/ISPA-BDCloud-SustainCom-SocialCom48970.2019.00132.
[32]YIN J F, ZHENG Z G, CAO L B, et al. Efficiently mining top-k high utility sequential patterns[C]// 2013 IEEE 13th International Conference on Data Mining. Piscataway, NJ: IEEE, 2013: 1259-1264. DOI: 10.1109/ICDM.2013.148.
[33]LI T X, XU T T, DONG X J. HUNSPM: an efficient algorithm for mining high utility negative sequential patterns[C]// 2017 13th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD). Piscataway, NJ: IEEE, 2017: 1833-1837. DOI: 10.1109/FSKD.2017.8393045.
[34]LIN J C W, ZHANG J X, FOURNIER-VIGER P. High-utility sequential pattern mining with multiple minimum utility thresholds[C]// Proceedings of the Asia-Pacific Web (APWeb) and Web-Age Information Management (WAIM) Joint Conference on Web and Big Data, APWeb-WAIM 2017. Berlin: Springer, 2017: 215-229.
[35]BUFFETT S. Candidate list maintenance in high utility sequential pattern mining[C]// 2018 IEEE International Conference on Big Data (Big Data). Piscataway, NJ: IEEE, 2018: 644-652. DOI: 10.1109/BigData.2018.8622138.
[36]BUFFETT S. Dramatically reducing search for high utility sequential patterns by maintaining candidate lists[J]. Information, 2020, 11(1):44. DOI: 10.3390/info11010044.
[37]AHMED C F, TANBEER S K, JEONG B S. A novel approach for mining high utility sequential patterns in sequence databases[J]. ETRI journal, 2010, 32(5): 676-686.
[38]PEI J, HAN J W, MORTAZAVI-ASL B, et al. Mining sequential patterns by pattern-growth:the prefixspan approach[J]. IEEE Transactions on Knowledge and Data Engineering, 2004, 16(11): 1424-1440. DOI: 10.1109/TKDE.2004.77.
[39]SHIE B E, CHENG J H, CHUANG K T, et al. A one-phase method for mining high utility mobile sequential patterns in mobile commerce environments[C]// Advanced Research in Applied Artificial Intelligence:LNAI volume 7345. Berlin: Springer, 2012: 616-626.
[40]LE B, HUYNH U, DINH D T. A pure array structure and parallel strategy for high-utility sequential patternmining[J]. Expert Systems with Applications, 2018, 104: 107-120. DOI: 10.1016/j.eswa.2018.03.019.
[41]GAN W S, LIN J C W, ZHANG J X, et al.ProUM: high utility sequential pattern mining[C]// Proceedings of the 2019 IEEE International Conference on Systems, Man and Cybernetics (SMC). Piscataway, NJ: IEEE, 2019: 767-773. DOI: 10.1109/SMC.2019.8914402.
[42]LAN G C, HONG T P, TSENG V S, et al. Applying the maximum utility measure in high utility sequential pattern mining[J]. Expert Systems with Applications, 2014, 41(11): 5071-5081. DOI: 10.1016/j.eswa.2014.02.022.
[43]LIN J C W, LI Y F, FOURNIER-VIGER P, et al. An efficient chain structure to mine high-utility sequential patterns[C]// 2019 International Conference on Data Mining Workshops (ICDMW) . Piscataway, NJ: IEEE, 2019:1013-1019. DOI: 10.1109/ICDMW.2019.00146.
[44]WU C W, LIN Y F, YU P S, et al. Mining high utility episodes in complex event sequences[C]// Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: Association for Computing Machinery, 2013: 536-544. DOI: 10.1145/2487575.2487654.
[45]GUO G M, ZHANG L, LIU Q, et al. High utility episode mining made practical and fast[C]// Advanced Data Mining and Applications:LNAI volume 8933. Berlin: Springer, 2014: 71-84.
[46]RATHORE S, DAWAR S, GOYAL V, et al. Top-K high utility episode mining from a complex event sequence[C]// Proceedings of the 21st International Conference on Mmanagement of Data. Telengana:Computer Society of India, 2016:1-8.
[47]LIN Y F, HUANG C F, TSENG V S. A novel methodology for stock investment using high utility episode mining and genetic algorithm[J]. Applied Soft Computing, 2017, 59: 303-315. DOI: 10.1016/j.asoc.2017.05.032.
[48]GAN W S, LIN J C W, CHAO H C, et al. Discovering high utility episodes in sequences[EB/OL].(2019-12-25)[2021-09-09].https://arxiv.org/abs/1912.11670.
[49]HONG T P, HSU J H, YANG K J, et al. Mining high utility partial periodic pattern by GPA[C]// 2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC). Piscataway, NJ: IEEE, 2017: 820-824. DOI: 10.1109/SMC.2017.8122710.
[50]LIN J C W, ZHANG J X, FOURNIER-VIGER P, et al. Efficient mining of short periodic high-utility itemsets[C]// 2016 IEEE International Conference on Systems, Man, and Cybernetics (SMC). Piscataway, NJ: IEEE, 2016: 003083-003088. DOI: 10.1109/SMC.2016.7844710.
[51]DINH T, HUYNH V N, LE B. Mining periodic high utility sequential patterns[C]// Intelligent Information and Database Systems:LNAI volume 10191. Berlin: Springer, 2017: 545-555.
[52]ISMAIL W, HASSAN M M, FORTINO G. Productive-associated periodic high-utility itemsets mining[C]// 2017 IEEE 14th International Conference on Networking, Sensing and Control (ICNSC). Piscataway, NJ: IEEE, 2017: 637-642. DOI: 10.1109/ICNSC.2017.8000165.
[53]DINH D T, LE B, FOURNIER-VIGER P, et al. An efficient algorithm for mining periodic high-utility sequential patterns[J]. Applied Intelligence, 2018, 48(12): 4694-4714. DOI: 10.1007/s10489-018-1227-x.
[54]REDDY T Y, KIRAN R U, TOYODA M, et al. Discovering partial periodic high utility itemsets in temporal databases[C]// Database and Expert Systems Applications:LNISA volume 11707. Berlin: Springer, 2019: 351-361.
[55]HONG T P, LEE C H, WANG S L. Mining high average-utility itemsets[C]// 2009 IEEE International Conference on Systems, Man and Cybernetics. Piscataway,NJ: IEEE, 2009: 2526-2530. DOI: 10.1109/ICSMC.2009.5346333.
[56]YUN U, KIM D. Mining of high average-utility itemsets using novel list structure and pruning strategy[J]. Future Generation Computer Systems, 2017, 68: 346-360. DOI: 10.1016/j.future.2016.10.027.
[57]LAN G C, HONG T P, TSENG V S. A projection-based approach for discovering high average-utility itemsets[J]. Journal of Information Science and Engineering, 2012, 28(1): 193-209.
[58]LU T, VO B, NGUYEN H T, et al. A new method for mining high average utility itemsets[C]// Computer Information Systems and Industrial Management:LNISA volume 8838. Berlin: Springer, 2015: 33-42.
[59]LIN J C W, REN S, FOURNIER-VIGER P, et al. EHAUPM: efficient high average-utility pattern mining with tighter upper bounds[J]. IEEE Access, 2017, 5: 12927-12940. DOI: 10.1109/ACCESS.2017.2717438.
[60]DAM T L, RAMAMPIARO H, NØRVÅG K, et al. Towards efficiently mining closed high utilityitemsets from incremental databases[J]. Knowledge-Based Systems, 2019, 165: 13-29. DOI: 10.1016/j.knosys.2018.11.019.
[61]DAM T L, LI K L, FOURNIER-VIGER P, et al. CLS-Miner: efficient and effective closed high-utility itemset mining[J]. Frontiers of Computer Science, 2019, 13(2): 357-381. DOI: 10.1007/s11704-016-6245-4.
[62]SHIE B E, YU P S, TSENG V S. Efficient algorithms for mining maximal high utility itemsets from data streams with different models[J]. Expert Systems with Applications, 2012, 39(17): 12947-12960. DOI: 10.1016/j.eswa.2012.05.035.
[63]WU C W, FOURNIER-VIGER P, GU J Y, et al. Mining compact high utility itemsets without candidate generation[M]// FOURNIER-VIGER P, LIN J C W, NKAMBOU R, et al. High-Utility Pattern Mining. Berlin:Springer, 2019: 279-302.
[64]CHU C J, TSENG V S, LIANG T. An efficient algorithm for mining high utility itemsets with negative item values in large databases[J]. Applied Mathematics and Computation, 2009, 215: 767-778.
[65]SUBRAMANIAN K, KANDHASAMY P. UP-GNIV: an expeditious high utility pattern mining algorithm for itemsets with negative utility values[J]. International Journal of Information Technology and Management, 2015, 14(1): 26-42. DOI: 10.1504/IJITM.2015.066056.
[66]SINGH K, SHAKYA H K, SINGH A, et al. Mining of high utility itemsets with negative utility[J]. Expert Systems, 2018, 35(6): e12296. DOI: 10.1111/exsy.12296.
[67]KRISHNAMOORTHY S. Efficiently mining high utility itemsets with negative unit profits[J]. Knowledge-Based Systems, 2018, 145: 1-14. DOI: 10.1016/j.knosys.2017.12.035.
[68]GAN W S, LIN J C W, FOURNIER-VIGER P, et al. Mining high-utility itemsets with both positive and negative unit profits from uncertain databases[C]// Advances in Knowledge Discovery and Data Mining:LNAI volume 10234. Berlin: Springer, 2017: 434-446.
[69]LAN G C, HONG T P, HUANG J P, et al. On-shelf utility mining with negative item values[J]. Expert Systems with Applications, 2014, 41(7): 3450-3459. DOI: 10.1016/j.eswa.2013.10.049.
[70]DAM T L, LI K L, FOURNIER-VIGER P, et al. An efficient algorithm for mining top-k on-shelf high utility itemsets[J]. Knowledge and Information Systems, 2017, 52(3): 621-655. DOI: 10.1007/s10115-016-1020-2.
[71]LAN G C, HONG T P, TSENG V S. Discovery of high utility itemsets from on-shelf time periods of products[J]. Expert Systems with Applications, 2011, 38(5): 5851-5857. DOI: 10.1016/j.eswa.2010.11.040.
[72]RADKAR A N, PAWAR S S. Mining high on-shelf utility itemsets with negative values from dynamic updated database[EB/OL].(2015-07-07)[2021-09-09].https://arxiv.org/abs/1507.01759.
[73]LIN J C W, GAN W S, FOURNIER-VIGER P, et al. Efficient algorithms for mining high-utility itemsets in uncertain databases[J]. Knowledge-Based Systems, 2016, 96: 171-187. DOI: 10.1016/j.knosys.2015.12.019.
[74]BUI N, VO B, HUYNH V N, et al. Mining closed high utility itemsets in uncertain databases[C]// Proceedings of the Seventh Symposium on Information and Communication Technology. New York: Association for Computing Machinery, 2016: 7-14. DOI: 10.1145/3011077.3011124.
[75]LIU J Q, ZHAO R, YANG X C, et al. Efficient parallel algorithm for mining high utility patterns based on spark[C]// 2019 IEEE Fourth International Conference on Data Science in Cyberspace (DSC). Piscataway,NJ:IEEE, 2019: 367-372. DOI: 10.1109/DSC.2019.00062.
[1] 杨州, 范意兴, 朱小飞, 郭嘉丰, 王越. 神经信息检索模型建模因素综述[J]. 广西师范大学学报(自然科学版), 2021, 39(2): 1-12.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
[1] 艾艳, 贾楠, 王媛, 郭静, 潘东东. 多性状多位点遗传关联分析的统计方法研究及其应用进展[J]. 广西师范大学学报(自然科学版), 2022, 40(1): 1 -14 .
[2] 白德发, 徐欣, 王国长. 函数型数据广义线性模型和分类问题综述[J]. 广西师范大学学报(自然科学版), 2022, 40(1): 15 -29 .
[3] 曾庆樊, 秦永松, 黎玉芳. 一类空间面板数据模型的经验似然推断[J]. 广西师范大学学报(自然科学版), 2022, 40(1): 30 -42 .
[4] 张治飞, 段谦, 刘乃嘉, 黄磊. 基于Jackknife互信息的高维非线性回归模型研究[J]. 广西师范大学学报(自然科学版), 2022, 40(1): 43 -56 .
[5] 杨迪, 方扬鑫, 周彦. 基于MEB和SVM方法的新类别分类研究[J]. 广西师范大学学报(自然科学版), 2022, 40(1): 57 -67 .
[6] 陈钟秀, 张兴发, 熊强, 宋泽芳. 非对称DAR模型的估计与检验[J]. 广西师范大学学报(自然科学版), 2022, 40(1): 68 -81 .
[7] 杜锦丰, 王海荣, 梁焕, 王栋. 基于表示学习的跨模态检索方法研究进展[J]. 广西师范大学学报(自然科学版), 2022, 40(3): 1 -12 .
[8] 晁睿, 张坤丽, 王佳佳, 胡斌, 张维聪, 韩英杰, 昝红英. 中文多模态知识库构建[J]. 广西师范大学学报(自然科学版), 2022, 40(3): 31 -39 .
[9] 李正光, 陈恒, 林鸿飞. 基于双向语言模型的社交媒体药物不良反应识别[J]. 广西师范大学学报(自然科学版), 2022, 40(3): 40 -48 .
[10] 周圣凯, 富丽贞, 宋文爱. 基于深度学习的短文本语义相似度计算模型[J]. 广西师范大学学报(自然科学版), 2022, 40(3): 49 -56 .
版权所有 © 广西师范大学学报(自然科学版)编辑部
地址:广西桂林市三里店育才路15号 邮编:541004
电话:0773-5857325 E-mail: gxsdzkb@mailbox.gxnu.edu.cn
本系统由北京玛格泰克科技发展有限公司设计开发