Journal of Guangxi Normal University (Natural Science Edition) ›› 2022, Vol. 40 ›› Issue (3): 185-193. doi: 10.16088/j.issn.1001-6600.2021071801
陈高建, 王菁*, 栗倩文, 袁云静, 曹嘉琛
CHEN Gaojian, WANG Jing*, LI Qianwen, YUAN Yunjing, CAO Jiachen
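Abstract: Automated machine learning (AutoML) is an important problem at the frontier of machine learning. AutoML tools compose machine learning operators into pipelines according to the dataset and the task requirements, so that domain users without specialized machine learning knowledge can still carry out the corresponding data analysis. Current AutoML tools, however, generally suffer from long running times and low accuracy. Based on dataset similarity and the principles of reinforcement learning, this paper proposes a data-driven method for generating automated machine learning pipelines: it draws on historical knowledge from similar datasets and combines a neural network with Monte Carlo tree search (MCTS) to guide pipeline generation. Experimental results show that the method reduces the time cost to the order of minutes and also improves pipeline performance.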
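The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the similarity measure, the network, or the search details, so everything below is an assumption made for illustration. Datasets are assumed to be described by small meta-feature vectors compared with plain Euclidean distance, the neural network is replaced by a simple operator-frequency prior taken from the best pipelines of the nearest historical datasets, and the tree search is reduced to a single PUCT-style selection step of the kind used in AlphaZero-like MCTS. All names (`most_similar`, `operator_prior`, `select_operator`, the `history` layout) are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length meta-feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def most_similar(target, history, k=2):
    """Return the names of the k historical datasets closest to `target`.

    `history` maps a dataset name to {"meta": [...], "pipeline": [...]}
    (a hypothetical layout chosen for this sketch).
    """
    ranked = sorted(history, key=lambda name: euclidean(target, history[name]["meta"]))
    return ranked[:k]


def operator_prior(neighbours, history, operators):
    """Prior over operators from the neighbours' best pipelines.

    Stand-in for a learned policy network: add-one smoothed frequencies,
    normalized so the values sum to 1.
    """
    counts = {op: 1.0 for op in operators}
    for name in neighbours:
        for op in history[name]["pipeline"]:
            counts[op] += 1.0
    total = sum(counts.values())
    return {op: c / total for op, c in counts.items()}


def puct(q, prior, parent_visits, child_visits, c=1.0):
    """PUCT score: mean value plus a prior-weighted exploration bonus."""
    return q + c * prior * math.sqrt(parent_visits) / (1 + child_visits)


def select_operator(stats, prior, c=1.0):
    """Pick the next operator to try at one tree node.

    `stats` maps an operator to {"n": visit count, "q": mean reward}.
    """
    # Use at least 1 so the prior still matters before any visit is recorded.
    parent_visits = max(1, sum(s["n"] for s in stats.values()))
    return max(
        stats,
        key=lambda op: puct(stats[op]["q"], prior[op], parent_visits, stats[op]["n"], c),
    )
```

In this sketch the historical knowledge enters the search only through the prior, so operators that worked well on similar datasets are tried first, which is one plausible way such a method could shorten the search.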