Journal of Guangxi Normal University(Natural Science Edition) ›› 2022, Vol. 40 ›› Issue (1): 91-99. doi: 10.16088/j.issn.1001-6600.2021060903

Sampling Method Based on Slice Inverse Regression in Big Data

HE Jianfeng1*, SHI Li1,2   

    1. School of Economics and Finance, South China University of Technology, Guangzhou Guangdong 510006, China;
    2. School of Economics and Trade, Guangzhou Huashang College, Guangzhou Guangdong 511300, China
  • Received:2021-06-09 Revised:2021-07-27 Online:2022-01-25 Published:2022-01-24

Abstract: Sample surveys remain an indispensable means of data acquisition and statistical inference in the era of big data, but their value depends on how well the sampling method is adapted to the actual characteristics of big data. The central problem is how to draw samples that are representative of the study variables. To address it, a comprehensive score sampling method based on slice inverse regression is proposed. Slice inverse regression incorporates information about the dependent variable into the independent variables. First, slice inverse regression is applied to the big data to improve the dimension reduction process. Then, the comprehensive score computed from the principal components is taken as the sampling probability. Simulation results show that, in the big data setting, the proposed method gives better sampling estimates than both sampling without implementation and simple random sampling, and that the improvement is most pronounced when individual differences are large. Finally, the feasibility and effectiveness of the method are verified on real data.
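
The two steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the slice count, the eigenvalue weighting of the components, the use of absolute component values with a small floor to obtain positive probabilities, and the Hansen-Hurwitz estimator are all assumptions made here for illustration. Only the sliced inverse regression step follows the standard procedure of Li (1991).

```python
import numpy as np


def sir_directions(X, y, n_slices=10, n_components=2):
    """Estimate sliced-inverse-regression directions (standard procedure, Li 1991)."""
    n, p = X.shape
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    # Whiten the predictors: Z = (X - mu) @ cov^(-1/2).
    evals, evecs = np.linalg.eigh(cov)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = (X - mu) @ inv_sqrt
    # Slice observations by the order of y and average Z within each slice.
    M = np.zeros((p, p))
    for idx in np.array_split(np.argsort(y), n_slices):
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)
    # Leading eigenvectors of the weighted covariance of the slice means.
    lam, eta = np.linalg.eigh(M)
    top = np.argsort(lam)[::-1][:n_components]
    lam = np.clip(lam[top], 0.0, None)
    beta = inv_sqrt @ eta[:, top]  # map directions back to the original X scale
    return beta, lam, mu


def score_sampling_probs(X, y, n_slices=10, n_components=2, floor=1e-3):
    """Turn a comprehensive score into sampling probabilities (hypothetical rule)."""
    beta, lam, mu = sir_directions(X, y, n_slices, n_components)
    comps = (X - mu) @ beta          # component values, one row per unit
    weights = lam / lam.sum()        # ASSUMPTION: eigenvalue-share weights
    # ASSUMPTION: absolute values plus a floor keep every probability positive
    # and make the score independent of the arbitrary sign of each direction.
    score = np.abs(comps) @ weights + floor
    return score / score.sum()


def hansen_hurwitz_mean(y, idx, probs):
    """Estimate the population mean from a with-replacement unequal-probability sample."""
    return np.mean(y[idx] / (len(y) * probs[idx]))


# Usage on synthetic data (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 5))
y = X @ np.array([1.0, -1.0, 0.0, 0.0, 0.0]) + 0.5 * rng.normal(size=5000)
probs = score_sampling_probs(X, y)
idx = rng.choice(len(y), size=500, replace=True, p=probs)
estimate = hansen_hurwitz_mean(y, idx, probs)
```

The floor is a design choice of this sketch: units whose component values are near zero would otherwise receive probabilities close to zero, which inflates the variance of any inverse-probability estimator.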

Key words: big data, slice inverse regression, principal component analysis, comprehensive score, sampling estimation

CLC Number: O212.2