Journal of Guangxi Normal University (Natural Science Edition) ›› 2019, Vol. 37 ›› Issue (1): 142-148. doi: 10.16088/j.issn.1001-6600.2019.01.016

Quantity Optimization of Virtual Sample Generation with Two Kinds of Upper Bound Conditions

LIN Yue1,2, LIU Tingzhang2*, WANG Zhehe1

  1. College of Science, Hainan Tropical Ocean University, Sanya Hainan 572022, China;
  2. College of Automation, Shanghai University, Shanghai 200072, China
  • Received:2018-06-18 Online:2019-01-20 Published:2019-01-08

Abstract: For small-sample data sets, virtual sample generation has been shown to improve the performance of machine learning algorithms effectively. However, there is no definite conclusion on the optimal number of virtual samples to generate. First, given an upper bound on the standard deviation of the training sample, information entropy theory is used to study the optimal number of virtual samples. Second, the noise introduced by virtual sample generation is taken into account, and a general probability model and a method for analyzing the optimal number of virtual samples are established at a given confidence level (0.95). A small-sample data set is built from the 2016 historical fault-monitoring data of a substation in Huzhou, Zhejiang, and four virtual sample generation experiments are designed. The results show that both rules for the optimal number of virtual samples are effective and that the accuracy of the corresponding machine learning predictions improves markedly.

Key words: small sample, machine learning, virtual sample, information entropy, confidence level

CLC Number: TP181