TY - JOUR
T1 - Effect of outliers on the coefficient of determination in multiple regression analysis with the application on the GPA for student
AU - AL Rezami, Afrah Yahya
N1 - Publisher Copyright:
© 2020 The Authors. Published by IASE.
PY - 2020/10
Y1 - 2020/10
N2 - This study aims to solve the problem of contradiction between the statistical significance and real significance of regression parameters when using multiple linear regression analysis. In this regard, an algorithm was presented based on the simple and multiple of determination coefficient, and the sum of averages to estimate multiple outliers when outliers are real. Regression analysis was applied to a phenomenon, whose results are known in advance (The relationship between Semester average and Cumulative average). The results were misleading, and we cannot firmly stand on analysis results. Also, the regression model did not improve much when an increased sample size more than doubled, so the study presents an algorithm for finding a solution to this contradiction. After checking Ordinary Least Squares (OLS) assumptions, outliers were identified, based on Cook's distance because it was the best. The proposed algorithm was compared with some robust regression methods, [Weighted Least Squares, Fully Modified Least Squares, and Least Median of Squares]. The results proved that the proposed method is a robust solution for outliers’ estimation. Therefore, it is recommended to use the proposed algorithm to estimate multiple outliers on other similar phenomena (e.g., The algorithm can be applied to a credit card transaction control system in a bank), and also software Packages statistical for the proposed algorithm. Also, the novelty of this study can be observed by investigating testing the significance of outliers as most of the previous researchers were interested in diagnosing the outliers without checking its significance.
AB - This study aims to solve the problem of contradiction between the statistical significance and real significance of regression parameters when using multiple linear regression analysis. In this regard, an algorithm was presented based on the simple and multiple of determination coefficient, and the sum of averages to estimate multiple outliers when outliers are real. Regression analysis was applied to a phenomenon, whose results are known in advance (The relationship between Semester average and Cumulative average). The results were misleading, and we cannot firmly stand on analysis results. Also, the regression model did not improve much when an increased sample size more than doubled, so the study presents an algorithm for finding a solution to this contradiction. After checking Ordinary Least Squares (OLS) assumptions, outliers were identified, based on Cook's distance because it was the best. The proposed algorithm was compared with some robust regression methods, [Weighted Least Squares, Fully Modified Least Squares, and Least Median of Squares]. The results proved that the proposed method is a robust solution for outliers’ estimation. Therefore, it is recommended to use the proposed algorithm to estimate multiple outliers on other similar phenomena (e.g., The algorithm can be applied to a credit card transaction control system in a bank), and also software Packages statistical for the proposed algorithm. Also, the novelty of this study can be observed by investigating testing the significance of outliers as most of the previous researchers were interested in diagnosing the outliers without checking its significance.
KW - Cumulative average
KW - Determination coefficient
KW - Outliers
KW - Regression
UR - https://www.scopus.com/pages/publications/85151120817
U2 - 10.21833/ijaas.2020.10.004
DO - 10.21833/ijaas.2020.10.004
M3 - Article
AN - SCOPUS:85151120817
SN - 2313-626X
VL - 7
SP - 30
EP - 37
JO - International Journal of Advanced and Applied Sciences
JF - International Journal of Advanced and Applied Sciences
IS - 10
ER -