Volume 8, Issue 2 (Spring 2020) | Iran J Health Sci 2020, 8(2): 9-22


Roozbeh M, Maanavi M, Babaie-Kafaki S. Robust high-dimensional semiparametric regression using optimized differencing method applied to the vitamin B2 production data. Iran J Health Sci 2020; 8(2): 9-22
URL: http://jhs.mazums.ac.ir/article-1-703-en.html
Faculty of Mathematics, Statistics & Computer Science, Semnan University, Semnan, Iran; mahdi.roozbeh@semnan.ac.ir
Abstract:
Background and purpose: With the advance of science, knowledge, and technology, we increasingly deal with high-dimensional data, in which the number of predictors may considerably exceed the sample size. The main difficulties with high-dimensional data are the estimation and interpretation of the coefficients. Because of the large number of predictor variables, classical methods are not reliable for high-dimensional problems; they are also affected by the presence of outliers and collinearity.
Methods: Many real-world data sets now exhibit a high-dimensional structure. To handle this problem, we used the least absolute shrinkage and selection operator (LASSO). Moreover, owing to the flexibility and applicability of the semiparametric model to medical data, it can be used to model genomic data. Motivated by these considerations, an improved robust approach for high-dimensional data sets was developed for gene expression analysis and prediction in the presence of outliers. A small illustrative sketch of the LASSO step is given below.
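As a hedged illustration only (not the authors' code), the sketch below fits a cross-validated LASSO to simulated data with far more predictors than observations; the simulated design, sparsity pattern, and tuning choices are assumptions for exposition.

```python
import numpy as np
from sklearn.linear_model import LassoCV

# Simulated p >> n design in the spirit of gene-expression data (illustrative only).
rng = np.random.default_rng(0)
n, p = 70, 2000
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = [2.0, -1.5, 1.0, 0.8, -0.5]        # only a few truly active predictors
y = X @ beta + 0.5 * rng.standard_normal(n)

# Cross-validation chooses the penalty; the L1 term shrinks most coefficients to zero.
lasso = LassoCV(cv=5, max_iter=20000).fit(X, y)
print("non-zero coefficients:", int(np.sum(lasso.coef_ != 0)))
```

Because the L1 penalty sets most coefficients exactly to zero, only a handful of predictors survive, which is what makes the estimator usable when the number of predictors greatly exceeds the sample size.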
Results: Outliers are among the common problems in regression analysis. In the regression context, an outlier is a point that fails to follow the main linear pattern of the data. The ordinary least-squares estimator is potentially sensitive to outliers, which motivates the investigation of robust estimators; robust regression is, in general, among the most active topics in the statistics community. In the present study, the least trimmed squares (LTS) estimator was applied to overcome the outlier problem, as sketched below.
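To make the LTS idea concrete, here is a minimal sketch (not the authors' implementation): the estimator minimizes the sum of the h smallest squared residuals, which the code approximates with random starts followed by concentration steps. The function name lts_fit and all tuning values are hypothetical.

```python
import numpy as np

def lts_fit(X, y, h, n_starts=50, n_csteps=20, seed=None):
    """Approximate LTS fit: minimize the sum of the h smallest squared residuals."""
    rng = np.random.default_rng(seed)
    n = len(y)
    X1 = np.column_stack([np.ones(n), np.asarray(X)])   # add an intercept column
    best_beta, best_obj = None, np.inf
    for _ in range(n_starts):
        subset = rng.choice(n, size=h, replace=False)
        for _ in range(n_csteps):
            beta, *_ = np.linalg.lstsq(X1[subset], y[subset], rcond=None)
            resid2 = (y - X1 @ beta) ** 2
            subset = np.argsort(resid2)[:h]              # concentration step: keep h smallest residuals
        obj = resid2[subset].sum()
        if obj < best_obj:
            best_obj, best_beta = obj, beta
    return best_beta

# Example: a simple line with 20% gross outliers in the response.
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 100)
y = 3 + 2 * x + rng.normal(0, 1, 100)
y[:20] += 30                                             # contaminate 20 observations
print(lts_fit(x.reshape(-1, 1), y, h=75, seed=0))        # roughly (3, 2) despite the outliers
```

With 20% of the responses grossly contaminated, the trimmed fit stays close to the true intercept and slope, whereas an ordinary least-squares fit would be pulled toward the outliers.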
Conclusions: We have proposed an optimization approach for semiparametric models to combat outliers in the data set. In particular, based on a LASSO penalization scheme, we formulated the semiparametric model as a nonlinear integer programming problem that can be effectively solved by any evolutionary algorithm. We have also studied a real-world application related to riboflavin production. The results showed that the proposed method was reasonably efficient compared with the LTS method.
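For context on the differencing idea named in the title, the following hedged sketch shows how differencing adjacent observations, after sorting by the nonparametric covariate, removes most of the smooth term in a partially linear model, so that a penalized linear fit can then estimate the parametric coefficients. The simulated data, simple first-order differences, and fixed LASSO penalty are assumptions for illustration, not the authors' optimized differencing weights.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Partially linear model y = X @ beta + f(t) + eps, with a smooth nuisance term f.
rng = np.random.default_rng(2)
n, p = 200, 10
t = np.sort(rng.uniform(0, 1, n))
X = rng.standard_normal((n, p))
beta = np.array([1.5, -2.0, 0.0, 0.0, 1.0, 0, 0, 0, 0, 0], dtype=float)
y = X @ beta + np.sin(2 * np.pi * t) + 0.3 * rng.standard_normal(n)

# First-order differencing of t-ordered observations: f(t_{i+1}) - f(t_i) is nearly zero,
# so the smooth term largely cancels and only the parametric part remains.
dX = np.diff(X, axis=0)
dy = np.diff(y)

# A penalized (LASSO) fit on the differenced data estimates beta without modeling f.
fit = Lasso(alpha=0.05).fit(dX, dy)
print(np.round(fit.coef_, 2))
```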
 
Full-Text [PDF 796 kb]
Type of Study: Original Article | Subject: Biostatistics


Rights and permissions
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

© 2024 CC BY-NC 4.0 | Iranian Journal of Health Sciences
