Journal of Guangxi Normal University (Natural Science Edition) ›› 2026, Vol. 44 ›› Issue (1): 110-118. doi: 10.16088/j.issn.1001-6600.2025030703

• Mathematics and Statistics •

A Model Selection Criterion Based on False Discovery Rate

RONG Jingjing, YE Jimin*

  1. School of Mathematics and Statistics, Xidian University, Xi'an, Shaanxi 710126, China
  • Received: 2025-03-07 Revised: 2025-05-18 Online: 2026-01-05 Published: 2026-01-26
  • Corresponding author: YE Jimin (b. 1967), male, from Baoji, Shaanxi; professor at Xidian University, Ph.D. E-mail: jmye@mail.xidian.edu.cn
  • Funding:
    National Natural Science Foundation of China (12272283)

Abstract: For high-dimensional sparse linear regression models, this paper proposes a model selection rule based on the false discovery rate (FDR), called the FDR rule, derived from the perspective of posterior estimation. Building on it, a dynamic signal-to-noise ratio (SNR) variation factor is introduced to obtain the FDRR rule, which is more robust to changes in SNR and invariant to the scale of the data. Using the orthogonal matching pursuit (OMP) algorithm, simulation experiments compare the FDR rule, the FDRR rule, and existing rules in terms of the probability of selecting all true variables and the FDR value. The results show that, compared with the other rules, the FDRR rule is more robust at high SNR or with large sample sizes, is less sensitive to data scaling, and achieves the lowest FDR. Finally, the proposed method is applied to real data from patients with mantle cell lymphoma to identify the genes that affect cell proliferation.

Key words: high-dimensional model selection, false discovery rate (FDR), FDRR, OMP algorithm, signal-to-noise ratio (SNR)

CLC Number: O212.8
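The evaluation described in the abstract (greedy selection with OMP, then scoring a selected model by whether it contains all true variables and by its share of false discoveries) can be illustrated with a minimal sketch. This is not the authors' implementation: the FDR and FDRR stopping rules are not specified on this page, so the sketch stops after a fixed number of steps as a stand-in. All function names (`omp`, `false_discovery_proportion`, `all_true_selected`) and the toy data are assumptions made for illustration only.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def omp(columns, y, steps):
    """Orthogonal matching pursuit with a fixed step count.

    `columns` is a list of design-matrix columns. The fixed `steps` budget is
    a placeholder for a data-driven stopping rule such as the paper's
    FDR/FDRR rules, which are NOT implemented here.
    Returns the selected column indices in the order they were chosen.
    """
    residual, support = y[:], []
    for _ in range(steps):
        candidates = [j for j in range(len(columns)) if j not in support]
        # Choose the column with the largest normalised correlation with the residual.
        best = max(candidates,
                   key=lambda j: abs(dot(columns[j], residual))
                   / dot(columns[j], columns[j]) ** 0.5)
        support.append(best)
        # Refit by least squares on the current support (normal equations).
        gram = [[dot(columns[a], columns[b]) for b in support] for a in support]
        coef = solve(gram, [dot(columns[a], y) for a in support])
        residual = [yi - sum(c * columns[a][i] for c, a in zip(coef, support))
                    for i, yi in enumerate(y)]
    return support


def false_discovery_proportion(selected, true_support):
    """Share of selected variables that are not true variables (0 if none selected)."""
    selected = set(selected)
    if not selected:
        return 0.0
    return len(selected - set(true_support)) / len(selected)


def all_true_selected(selected, true_support):
    """True when every true variable is contained in the selected set."""
    return set(true_support) <= set(selected)


# Toy data (assumed): orthonormal design, true variables 0 and 3, small noise.
columns = [[1.0 if i == j else 0.0 for i in range(6)] for j in range(6)]
y = [3.0, 0.1, 0.0, -2.0, 0.05, 0.0]
true_support = [0, 3]

for steps in (2, 3):
    chosen = omp(columns, y, steps)
    print(steps, chosen,
          all_true_selected(chosen, true_support),
          false_discovery_proportion(chosen, true_support))
```

In a simulation study, the FDR would be estimated by averaging the false discovery proportion over many random trials, and the success probability by the fraction of trials in which `all_true_selected` returns `True`. In the toy run, stopping one step too late admits a noise variable, which is the kind of error a stopping rule is meant to control.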
