Journal of Guangxi Normal University (Natural Science Edition), 2022, Vol. 40, Issue 1: 175-186. doi: 10.16088/j.issn.1001-6600.2021060911

Robust Estimation of Multivariate Linear Regression Model Based on MRCD Estimation

YAN Haibo1, DENG Gang2, JIANG Yunlu2*   

  1. School of Public Administration, Jinan University, Guangzhou, Guangdong 510632, China;
  2. School of Economics, Jinan University, Guangzhou, Guangdong 510632, China
  • Received: 2021-06-09  Revised: 2021-07-18  Online: 2022-01-25  Published: 2022-01-24

Abstract: Outlier-contaminated and high-dimensional data are increasingly common, which challenges both existing robust estimation methods and existing estimation methods for the multivariate linear model. Classical estimation of the multivariate linear model is very sensitive to outliers, whereas estimation based on the minimum covariance determinant (MCD) estimator offers some resistance to them. As the data dimension grows, however, both the accuracy and the robustness of the MCD estimator deteriorate, and the MCD estimator fails altogether when the dimension exceeds the sample size. To address this, a high-dimensional robust estimator of the multivariate linear model is proposed that uses the mean vector and covariance matrix estimated by the minimum regularized covariance determinant (MRCD) method. Numerical simulations show that the MRCD-based estimator resists outliers well and that it is more effective when the data dimension is larger than the sample size. An empirical analysis shows that the MRCD-based multivariate linear regression estimator resists outliers better and yields better predictions.

Key words: outliers, high-dimensional data, MCD estimation, MRCD estimation, multivariate linear model

CLC Number: O212.1
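
The abstract says the regression estimator is built from an estimated mean vector and covariance matrix but does not give the formulas. The sketch below illustrates one standard plug-in construction, assumed here and not confirmed by this page: partition a joint location and scatter estimate of (x, y) and solve for the slopes, intercepts, and error scatter. The function name `plugin_regression` and the simulated data are hypothetical, and the classical sample mean and covariance stand in for the MRCD estimates, so this demo is not robust; with a robust location and scatter estimate passed in instead, the same formulas would apply.

```python
import numpy as np

def plugin_regression(mu, sigma, p):
    """Regress the last q variables on the first p using a joint
    location estimate mu (length p+q) and scatter estimate sigma
    ((p+q) x (p+q)). Hypothetical helper for illustration only."""
    mu_x, mu_y = mu[:p], mu[p:]
    s_xx = sigma[:p, :p]                  # scatter of the predictors
    s_xy = sigma[:p, p:]                  # cross-scatter predictors/responses
    s_yy = sigma[p:, p:]                  # scatter of the responses
    B = np.linalg.solve(s_xx, s_xy)       # p x q slope matrix
    alpha = mu_y - B.T @ mu_x             # q intercepts
    sigma_eps = s_yy - B.T @ s_xx @ B     # q x q error scatter
    return B, alpha, sigma_eps

# Simulated data (assumption): 3 predictors, 2 responses, no outliers.
rng = np.random.default_rng(0)
n, p, q = 200, 3, 2
X = rng.normal(size=(n, p))
B_true = np.array([[1.0, -2.0], [0.5, 0.0], [0.0, 3.0]])
Y = 1.0 + X @ B_true + 0.1 * rng.normal(size=(n, q))

# Classical estimates used as placeholders for a robust (e.g. MRCD) fit.
Z = np.hstack([X, Y])
mu_hat = Z.mean(axis=0)
sigma_hat = np.cov(Z, rowvar=False)

B_hat, alpha_hat, sigma_eps_hat = plugin_regression(mu_hat, sigma_hat, p)
```

With the classical mean and covariance as inputs, the plug-in formulas reproduce ordinary least squares with an intercept, which gives a convenient sanity check before swapping in a robust estimate.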