Submitted: 15 June 2023
Posted: 16 June 2023
Abstract
Keywords:
1. Introduction
2. Estimation Methods for Linear Models
2.1. Nonconvex Lasso
2.2. Convex Conditioned Lasso
2.3. Balanced Estimation
2.4. Calibrated Zero-norm Regularized Least Squares Estimation
2.5. Linear and Conic Programming Estimation
3. Estimation Methods for Generalized Linear Models
3.1. Estimation Method for Poisson Models
3.2. Generalized Matrix Uncertainty Selector
4. Hypothesis Testing Methods
4.1. Corrected Decorrelated Score Test
4.2. Wald and Score Tests for Poisson Models
5. Screening Methods
6. Conclusions
- Existing estimation methods for high-dimensional measurement error regression models mainly target linear or generalized linear models. Estimation methods are therefore needed for nonlinear models with high-dimensional measurement error data, such as nonparametric and semiparametric models.
- Existing work focuses mainly on independent and identically distributed data. It is worthwhile to extend the estimation and hypothesis testing methods to measurement error models with complex data, such as panel data and functional data.
- Most studies of high-dimensional measurement error models assume either that the covariance matrix of the measurement errors has a specific structure or that it is known. Developing estimation and hypothesis testing methods when this covariance matrix is completely unknown therefore remains a challenging problem.
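To make the role of the known-covariance assumption in the last point concrete, the following is a minimal low-dimensional sketch, not any of the reviewed estimators. It assumes the classical additive error model W = X + U with a known error covariance `sigma_u`. The function names, the simulated design, and all parameter values are hypothetical choices made for illustration. Under this model the naive Gram matrix W'W/n overestimates the covariance of X by `sigma_u`, so subtracting `sigma_u` removes the attenuation bias of the naive least squares fit.

```python
import numpy as np


def naive_and_corrected_ls(W, y, sigma_u):
    """Least squares from error-prone covariates W, with and without
    subtracting the (assumed known) measurement error covariance."""
    n = W.shape[0]
    gram_naive = W.T @ W / n              # estimates Sigma_x + Sigma_u
    rho = W.T @ y / n                     # estimates Sigma_x @ beta
    gram_corrected = gram_naive - sigma_u  # estimates Sigma_x
    beta_naive = np.linalg.solve(gram_naive, rho)
    beta_corrected = np.linalg.solve(gram_corrected, rho)
    return beta_naive, beta_corrected


def simulate(n=5000, seed=0):
    """Hypothetical toy data: three covariates, additive Gaussian error."""
    rng = np.random.default_rng(seed)
    beta = np.array([2.0, -1.0, 0.0])
    X = rng.normal(size=(n, 3))                       # true covariates
    sigma_u = 0.5 * np.eye(3)                         # known error covariance
    U = rng.normal(scale=np.sqrt(0.5), size=(n, 3))   # measurement errors
    W = X + U                                         # observed covariates
    y = X @ beta + rng.normal(scale=0.1, size=n)
    return W, y, sigma_u, beta


if __name__ == "__main__":
    W, y, sigma_u, beta = simulate()
    beta_naive, beta_corrected = naive_and_corrected_ls(W, y, sigma_u)
    print("true     :", beta)
    print("naive    :", np.round(beta_naive, 3))
    print("corrected:", np.round(beta_corrected, 3))
```

When the number of covariates exceeds the sample size, the corrected Gram matrix is generally no longer positive semidefinite, so this direct solve does not carry over to the high-dimensional setting. If `sigma_u` is unknown, the subtraction step itself is unavailable, which is the difficulty the last point describes.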
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| SIMEX | Simulation-extrapolation |
| SCAD | Smoothly clipped absolute deviation |
| SICA | Smooth integration of counting and absolute deviation |
| MCP | Minimax concave penalty |
| SIS | Sure independence screening |
| CoCoLasso | Convex conditioned Lasso |
| CaZnRLS | Calibrated zero-norm regularized least squares |
| MU | Matrix uncertainty |
| MEBoost | Measurement error boosting |
| SIMSELEX | Simulation-selection-extrapolation |
| IRO | Imputation-regularized optimization |
| FDR | False discovery rate |
| PMSc | Corrected penalized marginal screening |
| SISc | Corrected sure independence screening |
| ADMM | Alternating direction method of multipliers |
| BDCoCoLasso | Block coordinate descent convex conditioned Lasso |
| MPEC | Mathematical program with equilibrium constraints |
| GEP–MSCRA | Multi-stage convex relaxation approach |
| GMU | Generalized matrix uncertainty |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
