Volume 10, Issue 2 (Spring 2022)                   Iran J Health Sci 2022, 10(2): 14-28


Semnan University, mahdi.roozbeh@semnan.ac.ir
Abstract:
Background and purpose: Machine learning is a class of modern, powerful tools that can solve many of the important problems people face today. Support vector regression (SVR), a prominent member of the machine learning family, is a way to build a regression model and has proven to be an effective tool for real-valued function estimation. As a supervised learning approach, SVR trains using a symmetric loss function, which penalizes high and low misestimates equally. High-dimensional datasets are among the most challenging problems currently encountered; the main difficulties with such data are estimating the coefficients and interpreting them. In high-dimensional problems, classical methods are not applicable because of the large number of predictor variables. SVR is an excellent alternative for analyzing such datasets: its computational complexity does not depend on the dimensionality of the input space, and it has excellent generalization capability with high prediction accuracy.
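The symmetric loss mentioned above is SVR's epsilon-insensitive loss, which ignores residuals inside a tube of half-width epsilon and penalizes larger over- and under-estimates identically. A minimal sketch with illustrative values (not taken from the paper):

```python
import numpy as np

def eps_insensitive_loss(residuals, eps=0.1):
    # Zero inside the tube |r| <= eps, linear outside; over- and
    # under-estimates of the same magnitude are penalized identically.
    return np.maximum(np.abs(residuals) - eps, 0.0)

r = np.array([-0.3, -0.05, 0.0, 0.05, 0.3])
loss = eps_insensitive_loss(r, eps=0.1)  # symmetric: loss[0] == loss[-1]
```

The linear (rather than quadratic) growth of this loss outside the tube is what bounds the influence of large residuals on the fit.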
Methods: SVR is one of the best methods for analyzing high-dimensional datasets; it is a reliable and robust approach that yields a good fit with high accuracy. SVR uses the same principles as the support vector machine for classification, with only a few minor differences.
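As an illustration of fitting an epsilon-SVR to a design with many more predictors than observations, here is a sketch using scikit-learn on synthetic data; the sample sizes, kernel, and tuning constants are illustrative assumptions, not the settings used in the paper:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for a high-dimensional design: n = 60 samples,
# p = 200 predictors (p >> n), with only 5 informative coefficients,
# loosely mimicking a gene-expression layout.
rng = np.random.default_rng(0)
n, p = 60, 200
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = [2.0, -1.5, 1.0, 0.5, -0.5]
y = X @ beta + rng.normal(scale=0.3, size=n)

# Epsilon-SVR with a linear kernel; standardizing the predictors first
# is conventional because SVR is not scale-invariant.
model = make_pipeline(StandardScaler(), SVR(kernel="linear", C=1.0, epsilon=0.1))
model.fit(X, y)
r2 = model.score(X, y)  # in-sample R^2 of the fitted model
```

Note that the fit goes through even though p greatly exceeds n, consistent with the claim that SVR's complexity does not depend on the input dimensionality.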
Results: Techniques for analyzing high-dimensional datasets are important because such datasets arise frequently in medical science and gene-expression studies. They are not easy to analyze, since classical methods cannot be used for estimation and interpretation, so alternative methods are required; SVR is one of the best that can be applied. In this research, SVR was applied to a real high-dimensional gene-expression dataset on eye disease and then compared with two well-known methods: LASSO and sparse least trimmed squares (sparse LTS). Based on the numerical results, SVR and sparse LTS performed better than LASSO, since the real dataset contained outliers (bad observations with large residuals).
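A comparison in the same spirit can be sketched with scikit-learn on synthetic contaminated data. Sparse LTS has no scikit-learn implementation (in R it is available via the robustHD package), so only SVR and LASSO are compared here, and all sample sizes and tuning values are illustrative assumptions rather than the paper's settings:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error
from sklearn.svm import SVR

rng = np.random.default_rng(1)
n, p = 80, 150
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = [3.0, -2.0, 1.5]
y = X @ beta + rng.normal(scale=0.2, size=n)
y[:8] += 25.0  # contaminate the training response with gross outliers

X_test = rng.normal(size=(200, p))
y_test = X_test @ beta  # clean test responses for out-of-sample MSE

svr = SVR(kernel="linear", C=1.0, epsilon=0.1).fit(X, y)
lasso = Lasso(alpha=0.1, max_iter=10000).fit(X, y)

mse_svr = mean_squared_error(y_test, svr.predict(X_test))
mse_lasso = mean_squared_error(y_test, lasso.predict(X_test))
```

Because the epsilon-insensitive loss grows linearly rather than quadratically in the residual, gross outliers have bounded influence on the SVR fit, whereas LASSO's squared loss lets them pull the solution; this is the mechanism behind the comparison reported above.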
Conclusions: SVR was the best method for modeling and predicting the high-dimensional mammalian-eye dataset: it was not affected by the corruptive impact of the outliers, and it attained the smallest MSE (mean squared error), MAE (mean absolute error), and RMSE (root mean squared error) fitting criteria in comparison with the classical methods, namely LASSO and sparse LTS. Sparse LTS, in turn, was found to perform better than LASSO. Moreover, SVR stabilizes the data and does not require obtaining the regularization parameter by running a complicated algorithmic program, which decreases the computational cost dramatically; these are invaluable advantages of this technique over the classical methods.
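The three fitting criteria named above can be computed directly; a small worked example with made-up fitted and observed values:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Made-up observed vs. fitted values, purely to show the three criteria.
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

mse = mean_squared_error(y_true, y_pred)    # mean of squared residuals -> 0.375
mae = mean_absolute_error(y_true, y_pred)   # mean of absolute residuals -> 0.5
rmse = np.sqrt(mse)                         # RMSE = sqrt(MSE)
```

Smaller values of all three indicate a better fit; RMSE is simply MSE returned to the original units of the response.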
Type of Study: Original Article | Subject: Biostatistics


Rights and permissions
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.