Journal of Guangxi Normal University(Natural Science Edition)

Review of Statistical Methods and Applications of Genetic Association Analysis for Multiple Traits and Multiple Locus

AI Yan, JIA Nan, WANG Yuan, GUO Jing, PAN Dongdong

Journal of Guangxi Normal University(Natural Science Edition). 2022, 40 (1): 1-14. DOI: 10.16088/j.issn.1001-6600.2021060904

This review first analyzes the statistical issues in association analysis of genetic variants. Second, the principles and basic assumptions of recent multiple-trait tests, single-locus tests, and multiple-loci analysis methods are systematically summarized, and the existing problems and challenges of these methods are analyzed. Finally, possible future developments of genetic association analysis methods for multiple traits and multiple loci are presented.

Review of Generalized Linear Models and Classification for Functional Data

BAI Defa, XU Xin, WANG Guochang

Journal of Guangxi Normal University(Natural Science Edition). 2022, 40 (1): 15-29. DOI: 10.16088/j.issn.1001-6600.2021060908

Nonparametric methods for functional data all assume that the data come from smooth curves. Each whole curve is treated as a single sample, which avoids the problems of high dimension and high correlation. Research on functional data began in the 1950s. After decades of development, many classical statistical methods, such as principal component analysis, canonical correlation, linear models, and clustering, have been extended to functional data and documented in reviews and books by Chinese and foreign scholars for other researchers to use. However, there are few books and reviews about generalized linear models and classification for functional data. This article gives a detailed review of the development and future directions of functional data analysis and function approximation, including basis expansion and principal components, generalized linear models, and classification of functional data. Furthermore, to help apply functional data methods in economics, finance, medicine, meteorology, and environmental science, some specific computation programs for B-splines are provided.

Empirical Likelihood Inference for a Class of Spatial Panel Data Models

ZENG Qingfan, QIN Yongsong, LI Yufang

Journal of Guangxi Normal University(Natural Science Edition). 2022, 40 (1): 30-42. DOI: 10.16088/j.issn.1001-6600.2021060918

This paper investigates empirical likelihood inference for a time-varying coefficient spatial panel data model with spatial autocorrelation and spatial error autocorrelation. By transforming the quadratic form in the estimating equation into a linear form of a martingale difference sequence, the empirical likelihood ratio statistic for the model parameters is constructed. It is shown that, under certain conditions, the limiting distribution of the statistic is a chi-square distribution.

High-dimensional Nonlinear Regression Model Based on JMI

ZHANG Zhifei, DUAN Qian, LIU Naijia, HUANG Lei

Journal of Guangxi Normal University(Natural Science Edition). 2022, 40 (1): 43-56. DOI: 10.16088/j.issn.1001-6600.2021060910

Sure Independence Screening (SIS) has been widely used for variable selection in linear regression models in ultra-high dimensional space, and has been extended to generalized linear regression models. However, SIS does not handle variable selection in nonlinear regression models well, and there are few existing studies on this problem. How to select variables effectively in nonlinear regression models in ultra-high dimensional space is therefore a problem worth studying. Building on the classic SIS method and the jackknife-based estimation of mutual information (JMI), a method combining SIS with JMI is proposed, and a specific algorithm is provided to carry out variable selection for nonlinear regression models in ultra-high dimensional space. Representative simulation experiments verify the consistency of the proposed method. In addition, analyses of two gene data examples illustrate the feasibility and practicality of the proposed method.
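
For readers unfamiliar with the screening step that the abstract builds on, the following is a minimal sketch of classic SIS only: rank predictors by the absolute marginal Pearson correlation with the response and keep the top d. It does not implement the JMI-based method of the paper. The function names `pearson` and `sis_screen` and the toy data are assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no marginal signal
    return sxy / math.sqrt(sxx * syy)


def sis_screen(columns, y, d):
    """Indices of the d predictors with the largest |marginal correlation| with y."""
    scores = [abs(pearson(col, y)) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:d])


# Hypothetical toy data: y depends linearly on predictors 0 and 2 only.
x0 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x1 = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
x3 = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
y = [2 * a + b for a, b in zip(x0, x2)]

selected = sis_screen([x0, x1, x2, x3], y, d=2)
print(selected)
```

Because the score here is a linear correlation, a predictor that affects the response only through a nonlinear relationship can receive a score near zero, which is the limitation that motivates replacing the correlation with a mutual information estimate.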

New Category Classification Research Based on MEB and SVM Methods

YANG Di, FANG Yangxin, ZHOU Yan

Journal of Guangxi Normal University(Natural Science Edition). 2022, 40 (1): 57-67. DOI: 10.16088/j.issn.1001-6600.2021060913

This paper studies the following problem: given a training set containing only classes A and B and a test set containing more than these two classes, how should the samples in the test set be classified? To address this problem, three new category classification methods based on the support vector machine (SVM) and the minimum enclosing ball (MEB) method are proposed. These methods not only overcome the weakness that SVM cannot correctly identify new categories, but also perform well in real data analysis. The data set used in this paper is a breast cancer molecular subtype data set. The final sample classification accuracy can exceed 90%, and the classification accuracy for new-category samples can exceed 99%.

Estimation and Test for Asymmetric DAR Model

CHEN Zhongxiu, ZHANG Xingfa, XIONG Qiang, SONG Zefang

Journal of Guangxi Normal University(Natural Science Edition). 2022, 40 (1): 68-81. DOI: 10.16088/j.issn.1001-6600.2021060905

This paper studies estimation and testing for the asymmetric double autoregression (DAR) model. First, the quasi-maximum likelihood estimation (QMLE) method is proposed for the parametric component. Under some regularity conditions, the resulting estimators are consistent and asymptotically normal. Then, a quasi-likelihood ratio (QLR) statistic is proposed to detect the asymmetric effect, and the asymptotic properties of the test statistic are established under the null and alternative hypotheses. Finally, simulation studies and an empirical application demonstrate the finite-sample performance of the proposed estimation and testing procedures.

Bayesian Estimation of Current Status Data with Generalized Extreme Value Regression Model

SUN Ye, JIANG Jingjing, WANG Chunjie

Journal of Guangxi Normal University(Natural Science Edition). 2022, 40 (1): 82-90. DOI: 10.16088/j.issn.1001-6600.2021060907

The generalized extreme value distribution has attracted the attention of many scholars since it was proposed. It can be used to fit lifetime data and is widely used in medical sciences, engineering, and meteorology. This paper studies Bayesian regression analysis of current status data under the three-parameter generalized extreme value model. Covariates are introduced through the location parameter of the generalized extreme value distribution, a Bayesian regression model relating the location parameter to survival time is established, and an MCMC method combining Gibbs sampling with the Metropolis-Hastings (MH) algorithm is used to draw samples from the posterior distribution of each parameter. The posterior sample means are used as the parameter estimates. Numerical simulations in R compare the performance of maximum likelihood estimation and Bayesian estimation and illustrate that the parametric survival regression model fits the data well. The simulation results indicate that Bayesian estimation outperforms maximum likelihood estimation. Finally, the method is applied to lung tumor data from 144 male RFM mice.

Sampling Method Based on Slice Inverse Regression in Big Data

HE Jianfeng, SHI Li

Journal of Guangxi Normal University(Natural Science Edition). 2022, 40 (1): 91-99. DOI: 10.16088/j.issn.1001-6600.2021060903

Sample surveys remain an indispensable method of data acquisition and statistical inference in the era of big data, but their value depends on adapting the sampling method to the real situation of big data. The central concern is how to draw samples that are representative of the study variables. To address this problem, a comprehensive score sampling method based on sliced inverse regression is proposed. Sliced inverse regression can integrate information about the dependent variable into the independent variables. First, sliced inverse regression is applied to the big data to improve the dimension reduction process. Then, the comprehensive score of each principal component is taken as the sampling probability. Simulation results show that, in the big data setting, the proposed method gives better sampling estimates than sampling that does not apply the method and than simple random sampling, and that the improvement is most pronounced when individual differences are large. Finally, the feasibility and effectiveness of the method are verified on real data.

Conditional Independence Screening in Sparse Ultra-high Dimensional Nonparametric Additive Models

XU Ping, ZHONG Simin, LI Binbin, XIONG Wenjun

Journal of Guangxi Normal University(Natural Science Edition). 2022, 40 (1): 100-107. DOI: 10.16088/j.issn.1001-6600.2021060919

Variable screening is an effective method for processing ultra-high-dimensional data. Barut et al. considered the case in which some variables are known in advance to be significantly related to the response, and proposed the CSIS method under the assumption of a linear model. This method can effectively reduce the probability of false variable selection. However, the linear model assumption is restrictive, and in practice the structure of the model cannot be determined in advance. This paper therefore proposes a conditional nonparametric independence screening (CNIS) method based on a nonparametric additive model, which requires no assumptions about the model structure and thus broadens the scope of application. Under appropriate conditions, it is proved that the first-stage screening has the consistent screening property and retains the important variables with probability 1. The second-stage variable selection also has good consistency. Monte Carlo simulation results show that this method performs better than the NIS method.

Estimation of the Mixed Generalized Partially Linear Additive Model

REN Shuai, CHENG Wenhui, ZHOU Jie

Journal of Guangxi Normal University(Natural Science Edition). 2022, 40 (1): 108-124. DOI: 10.16088/j.issn.1001-6600.2021060909

The generalized partially linear additive model has two parts: a parametric part and a nonparametric part. Different link functions can be used in different situations, which makes it a very flexible statistical model. The finite mixture model is an effective and highly extensible tool for studying heterogeneous populations, and it has been widely used as computing power has improved. In this paper, a mixture of generalized additive partially linear models (MGAPLM) is proposed by combining these two models. First, the definition of the model and identifiability results under some regularity conditions are presented. Then the spline-backfitted kernel (SBK) method, which combines the spline and kernel methods, is used to estimate the parameters and nonparametric functions in the model, and the asymptotic properties of the estimator are given. To test whether the proposed model is adequate, a model checking method is proposed under the normal and binomial distributions. Numerical simulations show the performance of the estimator with finite sample sizes. Finally, the proposed method is applied to economic data, and the specific form of the model is obtained.

Bland-Altman Method for Assessing Agreement of Quantitative Data in Clinical Measurement

FU Meizi, LIN Bingqing

Journal of Guangxi Normal University(Natural Science Edition). 2022, 40 (1): 125-138. DOI: 10.16088/j.issn.1001-6600.2021060916

With the advancement of medical technology, new technologies and new methods continue to emerge. It is particularly important to compare and evaluate the agreement of the measurement results of the old and new methods. Reliable evaluation results of agreement are of great importance to improve the quality of medical services and reduce the waste of medical resources. Currently, there are few studies on the Bland-Altman method of agreement evaluation in clinical measurement in China, and there are many problems in the clinical application of the method. This paper discusses the Bland-Altman evaluation process of agreement in the case of single measurement and repeated measurement, and provides processing methods for different data types and the overall guideline in the hope that this can help medical researchers to choose and use statistical analysis methods properly in clinical data analysis.
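
As a concrete illustration of the single-measurement case mentioned above, here is a minimal sketch of the standard Bland-Altman quantities: the mean difference (bias) and the 95% limits of agreement, bias ± 1.96 × SD of the paired differences. It assumes the differences are approximately normal and show no trend with the magnitude of the measurement; the repeated-measurement case needs a different variance estimate and is not covered. The function name `bland_altman` and the readings are hypothetical.

```python
import statistics


def bland_altman(method_a, method_b, z=1.96):
    """Return (bias, lower limit, upper limit) for paired single measurements."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the differences (n - 1 denominator)
    return bias, bias - z * sd, bias + z * sd


# Hypothetical paired readings from an old and a new measurement method.
old = [10.0, 12.0, 14.0, 16.0, 18.0]
new = [9.0, 12.0, 15.0, 15.0, 19.0]

bias, lower, upper = bland_altman(old, new)
print(bias, lower, upper)
```

Whether the resulting limits are acceptable is a clinical judgment: they are compared with a maximum allowed difference that should be fixed before the data are analyzed.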

Semiparametric Rate Models for Recurrent Event Data with Cure Rate via Empirical Likelihood

LIU Yu, ZHOU Wen, LI Ni

Journal of Guangxi Normal University(Natural Science Edition). 2022, 40 (1): 139-149. DOI: 10.16088/j.issn.1001-6600.2021060906

With the continuous development of medical science, some diseases that were previously considered incurable have recently been found to be curable, with no recurrence after a certain period. This paper proposes an empirical likelihood method based on a semiparametric rate model for recurrent event data with a cure rate. An empirical likelihood ratio statistic is introduced for the regression parameters, and Wilks' theorem is established. Simulation studies compare the proposed empirical likelihood method with the normal approximation method when the sample size is small. Finally, the method is applied to a bladder cancer dataset.

Double Penalty Quantile Regression for Panel Data Models Based on Bayesian Method

SHU Ting, LUO Youxi, LI Hanfang

Journal of Guangxi Normal University(Natural Science Edition). 2022, 40 (1): 150-165. DOI: 10.16088/j.issn.1001-6600.2021060917

In mixed-effects models for panel data, the model parameters are difficult to estimate because of the large number of unknown random effects. Moreover, because the distribution of the random errors is unknown, handling errors from different distributions increases the computational complexity of the model and makes variable selection and coefficient estimation for the fixed and random effects difficult. To solve this problem, this paper establishes a double Bayesian adaptive Lasso quantile regression model, which applies the adaptive Lasso penalty to the fixed and random effects of the panel data model simultaneously. A Gibbs sampling algorithm for parameter estimation is also constructed. Monte Carlo simulation results show that the method not only estimates the coefficients of different panel data models accurately, but also selects the important variables.

Research on Wind Power Short-term Prediction Based on EEMD-GA-BP Model

ZHU Enwen, ZHU Anqi, WANG Jiedan, LIU Yujiao

Journal of Guangxi Normal University(Natural Science Edition). 2022, 40 (1): 166-174. DOI: 10.16088/j.issn.1001-6600.2021060912

With the rapid development of China’s wind power industry, the scale of wind power grid integration is constantly expanding. Accurate prediction of wind farm output power is an effective way to reduce the impact of wind power fluctuations on the power grid, improve power quality, and ensure stable operation of the grid. In this paper, box-plot analysis and hot-deck imputation are used to clean and reconstruct the abnormal data in the data set. The BP algorithm is improved by combining it with a genetic algorithm (GA) and the EEMD decomposition algorithm. A comparison of prediction results at different time scales shows that the EEMD-GA-BP model has higher prediction accuracy and more stable performance than the traditional prediction model.
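
The data cleaning step described above can be sketched as follows, under this sketch's own reading of it: values outside the box-plot fences (1.5 × IQR beyond the quartiles) are flagged as abnormal, and each flagged value is replaced by the most recent unflagged observation, which is one simple sequential variant of hot-deck filling. The abstract does not state the paper's exact rules, so the fence multiplier, the donor rule, the function names, and the data below are all assumptions.

```python
import statistics


def iqr_outlier_flags(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (the box-plot fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]


def sequential_hot_deck(values, flags):
    """Replace each flagged value with the most recent unflagged value (the donor).

    Flagged values that occur before any donor is available are left unchanged.
    """
    filled, donor = [], None
    for v, bad in zip(values, flags):
        if bad and donor is not None:
            filled.append(donor)
        else:
            filled.append(v)
            if not bad:
                donor = v
    return filled


# Hypothetical power readings with one abnormal spike.
power = [10.0, 11.0, 12.0, 100.0, 13.0, 14.0, 15.0]
flags = iqr_outlier_flags(power)
cleaned = sequential_hot_deck(power, flags)
print(cleaned)
```

Quartile conventions differ between software packages, so the fences, and therefore the flagged points, can shift slightly depending on the quantile method used.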

Robust Estimation of Multivariate Linear Regression Model Based on MRCD Estimation

YAN Haibo, DENG Gang, JIANG Yunlu

Journal of Guangxi Normal University(Natural Science Edition). 2022, 40 (1): 175-186. DOI: 10.16088/j.issn.1001-6600.2021060911

Data with outliers and high-dimensional data appear more and more frequently, challenging existing robust estimation methods and multivariate linear model estimation methods. Traditional multivariate linear model estimation is very sensitive to outliers, whereas estimation based on the minimum covariance determinant (MCD) method has some resistance to outliers. However, as the data dimension increases, the accuracy and the robustness of MCD estimation both decrease, and the MCD method fails when the data dimension is greater than the sample size. Therefore, using the mean vector and covariance matrix estimates of the minimum regularized covariance determinant (MRCD) method, a high-dimensional robust multivariate linear model estimation based on MRCD is proposed. Numerical simulation results show that the MRCD-based multivariate linear model estimation resists outliers well and is more effective when the data dimension is larger than the sample size. The results of empirical analysis show that the multiple linear regression estimation based on the MRCD method resists outliers better and gives better prediction results.

Sample Size Determination for the Additive Hazards Model with Current Status Data

KONG Lingtao, SONG Xiangjun, WANG Xiaomin

Journal of Guangxi Normal University(Natural Science Edition). 2022, 40 (1): 187-194. DOI: 10.16088/j.issn.1001-6600.2021060915

Power and sample size calculations are important and necessary parts of the design stage of a scientific study. In failure time data analysis, the additive hazards model, which specifies that the covariates have an additive effect on the baseline risk, is one of the most popular semiparametric models. Compared with the proportional hazards model, the additive hazards model is more plausible in many applications, especially in the two-sample situation where the covariate takes only the values 0 and 1. In this paper, a novel method based on the Wald test is proposed for calculating power and sample size for the additive hazards model with current status data. Simulation studies demonstrate that the proposed sample size formula is adequate. Moreover, a real example is presented to illustrate the application of the proposed formula.

A Class of Autoregressive Moving Average Model with GARCH Type Errors

LIANG Xin, CHEN Xiaoling, ZHANG Xingfa, LI Yuan

Journal of Guangxi Normal University(Natural Science Edition). 2022, 40 (1): 195-205. DOI: 10.16088/j.issn.1001-6600.2021083003

In this paper, an autoregressive moving average model with a new GARCH-type error term is proposed by combining the DAR model with the traditional ARMA-GARCH model. The model uses more data information than the DAR model and defines a new conditional heteroscedasticity structure driven by the observable sequence, which makes it easier to estimate than the traditional ARMA-GARCH model. The quasi-maximum likelihood estimation of the model parameters is studied, and the asymptotic normality of the estimator is proved under weaker moment conditions. Numerical simulation results confirm the good performance of the model in finite samples. Empirical research shows that the model can improve the fit to data and has practical value.

Research on China’s Grain Output Based on Interval Data Measurement

LI Chengen, PAN Xiaoying, WANG Meihan, SHI Jianhua

Journal of Guangxi Normal University(Natural Science Edition). 2022, 40 (1): 206-215. DOI: 10.16088/j.issn.1001-6600.2021060914

Four modeling methods for interval data are used to explore the combined impact of climate change and agricultural production input factors on China’s grain yield from 1993 to 2018. In addition, five evaluation indexes are used to measure the prediction accuracy of the methods, and the regression results are given and compared. The optimal regression method is applied to predict the change of grain yield in China. The results show that there are regional differences in grain yield in China, and that the grain yield per unit area in some provinces changed greatly before 2009. However, the grain yield per unit area of eight provinces in China has tended to stabilize over the past ten years. Furthermore, climate change and agricultural production input factors have statistically significant effects on China’s grain yield. Finally, some suggestions are given to improve grain yield.
