Journal of Guangxi Normal University (Natural Science Edition), 2022, Vol. 40, Issue (1): 1-14. doi: 10.16088/j.issn.1001-6600.2021060904

• Review •

Review of Statistical Methods and Applications of Genetic Association Analysis for Multiple Traits and Multiple Loci

AI Yan, JIA Nan, WANG Yuan, GUO Jing, PAN Dongdong*

  1. School of Mathematics and Statistics, Yunnan University, Kunming, Yunnan 650500, China
  • Received: 2021-06-09 Revised: 2021-08-04 Online: 2022-01-25 Published: 2022-01-24
  • Corresponding author: PAN Dongdong (born 1983), male, from Lengshuijiang, Hunan; Associate Professor at Yunnan University, Ph.D. E-mail: ddpan@ynu.edu.cn
  • Funding:
    National Natural Science Foundation of China (11661080); The 12th Graduate Research and Innovation Project of Yunnan University (2020Z68)

Abstract: This review first surveys the statistical issues in genetic association analysis of rare variants, together with the current research frontiers and hot topics in the field. It then systematically summarizes the principles and basic assumptions of commonly used methods for multiple-trait tests, single-locus tests, and multiple-loci analysis, and discusses the existing problems and challenges of these methods. Finally, it presents an outlook on the future development of association analysis methods for multiple traits and multiple loci.

Key words: genome-wide association study (GWAS), multiple traits, multiple loci, single nucleotide polymorphism (SNP)

Chinese Library Classification (CLC) number:

  • R195.1