Journal of Guangxi Normal University (Natural Science Edition), 2022, Vol. 40, Issue (1): 108-124. doi: 10.16088/j.issn.1001-6600.2021060909

• Research Article •

Estimation of the Mixed Generalized Partially Linear Additive Model

REN Shuai, CHENG Wenhui, ZHOU Jie*

  1. School of Mathematical Sciences, Capital Normal University, Beijing 100048, China
  • Received: 2021-06-09 Revised: 2021-07-07 Online: 2022-01-25 Published: 2022-01-24
  • Corresponding author: ZHOU Jie (born 1986), male, from Nanchong, Sichuan; associate professor at Capital Normal University, Ph.D. E-mail: zhoujie@amss.ac.cn
  • Funding:
    National Natural Science Foundation of China (11671275); Science and Technology Program of the Beijing Municipal Education Commission (KM202010028017); Project of the Academy for Multidisciplinary Studies, Capital Normal University

Abstract: The generalized partially linear additive model has a parametric part and a nonparametric part, and different choices of link function yield different additive models, which makes it a very flexible statistical model. The finite mixture model is an effective and highly extensible tool for studying heterogeneous populations, and it has become increasingly widely used as computing power has grown. This paper combines the two and proposes the mixture of generalized additive partial linear models (MGAPLM). First, the model is defined and its identifiability is proved under some mild conditions. Then the spline-backfitted-kernel (SBK) method, which combines spline and kernel methods, is used to estimate the parameters and nonparametric functions of the model, and the asymptotic properties of the estimators are established. In addition, a model checking method is given to test the validity of the proposed model, and numerical simulations under the normal and binomial distributions show the finite-sample performance of the estimators. Finally, the proposed method is applied to a set of economic data to obtain the specific form of the model for these data, and the fitted model is interpreted in light of the application.

Key words: generalized additive partial linear models, spline, mixture models, EM algorithm, SBK method
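The key words mention the EM algorithm for mixture models. As a rough illustration only, the sketch below runs EM on a much simpler relative of the model studied here: a finite mixture of Gaussian linear regressions with an identity link and no nonparametric part. It is not the SBK estimator of the paper; the function name `em_mixture_linear`, its parameters, and the random initialization are all assumptions made for this example.

```python
import numpy as np

def em_mixture_linear(X, y, n_components=2, n_iter=200, seed=0):
    """EM for a finite mixture of Gaussian linear regressions.

    Illustrative only: a simplified stand-in, not the paper's SBK method.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Start from random soft assignments (each row sums to 1).
    resp = rng.dirichlet(np.ones(n_components), size=n)
    pi = np.full(n_components, 1.0 / n_components)
    beta = np.zeros((n_components, p))
    sigma2 = np.ones(n_components)
    for _ in range(n_iter):
        # M-step: weighted least squares, variance, and mixing proportion
        # for each component, using the current responsibilities as weights.
        for c in range(n_components):
            w = resp[:, c]
            Xw = X * w[:, None]
            # Tiny ridge term keeps the linear system solvable.
            beta[c] = np.linalg.solve(Xw.T @ X + 1e-8 * np.eye(p), Xw.T @ y)
            resid = y - X @ beta[c]
            sigma2[c] = max((w * resid**2).sum() / (w.sum() + 1e-12), 1e-8)
            pi[c] = w.mean()
        # E-step: posterior component probabilities, computed on the log scale.
        log_dens = np.empty((n, n_components))
        for c in range(n_components):
            resid = y - X @ beta[c]
            log_dens[:, c] = (np.log(pi[c] + 1e-300)
                              - 0.5 * np.log(2.0 * np.pi * sigma2[c])
                              - 0.5 * resid**2 / sigma2[c])
        log_dens -= log_dens.max(axis=1, keepdims=True)
        resp = np.exp(log_dens)
        resp /= resp.sum(axis=1, keepdims=True)
    return pi, beta, sigma2, resp
```

Calling `em_mixture_linear(X, y)` with an `(n, p)` design matrix and a length-`n` response returns the mixing proportions, per-component coefficients, per-component variances, and the `(n, n_components)` responsibility matrix. Like any EM fit, the result depends on the starting values and may be a local optimum, so several seeds are usually worth trying.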

CLC number: O212.7