Stability of correlation coefficient to “outliers” used in regression analysis

Authors

DOI:

https://doi.org/10.20535/mmtu-2019.1-015

Keywords:

Correlation analysis, Regression analysis, Correlation coefficient, Median correlation

Abstract

This paper considers the stability of the correlation coefficient in the presence of outliers, which in regression analysis are often a consequence of an error distribution that differs from the normal, for example lognormal or normal with “severe cases”. Such outliers cannot be rejected or corrected, so they remain in the training sample. As a result, the regression model is biased toward the deviating observations. In addition, because outliers change the correlation coefficients of the affected factors, the structure of the model may change. The purpose of the work is to determine how large the shift in the correlation coefficient can be, depending on the magnitude of the coefficient itself, the method of its calculation, the magnitude of the outlier, and the sample size, for various correlation coefficients.
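The kind of comparison described above can be illustrated with a minimal sketch. It is not the authors' procedure: the choice of Pearson and Spearman coefficients, the simulated normal data, the single additive outlier in the response, and all function names (`pearson`, `ranks`, `spearman`, `with_outlier`, `coefficient_shift`) are assumptions made for illustration only.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def with_outlier(y, index, size):
    """Copy of y with `size` added to the observation at `index`."""
    contaminated = list(y)
    contaminated[index] += size
    return contaminated


def coefficient_shift(n, rho, size, seed=0):
    """Shift of each coefficient after one outlier is added to the response.

    Hypothetical setup: x is standard normal, y has population
    correlation rho with x, and the first observation of y is shifted.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [rho * a + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for a in x]
    y_bad = with_outlier(y, 0, size)
    return {
        "pearson": pearson(x, y_bad) - pearson(x, y),
        "spearman": spearman(x, y_bad) - spearman(x, y),
    }


if __name__ == "__main__":
    # Vary the sample size and the outlier magnitude, as the abstract suggests.
    for n in (10, 30, 100):
        for size in (3, 10):
            shift = coefficient_shift(n, rho=0.8, size=size)
            print(n, size, round(shift["pearson"], 3), round(shift["spearman"], 3))
```

Because the rank-based coefficient depends only on the ordering of the observations, its shift does not grow further once the outlier has moved past every other value, whereas the Pearson coefficient keeps changing with the outlier's magnitude.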

References

Aivazyan, S., Buchshtaber, V. M., & Yenyukov, I. S. (1985). Applied statistics: Study of relationships [in Russian]. Moscow: Finansy i statistika.

Ezekiel, M., & Fox, K. (1959). Methods of correlation and regression analysis: Linear and curvilinear (3rd ed.). John Wiley & Sons, Inc.

Lapach, S. M. (2017). Correlation analysis in application to the definition of the structure of the regression equation [in Ukrainian]. In Proceedings of the Eighteenth International Scientific Conference Mykhailo Kravchuk conference, Kyiv–Lutsk, October 7–10 (pp. 119–123). Kyiv: Igor Sikorsky Kyiv Polytechnic Institute. http://matan.kpi.ua/public/files/2017/kravchuk-conf2017/Kravchuk2017-vol2.pdf#page=119

Lapach, S. M. (2018). Risks of using the correlation coefficient for a specific regression model specification [in Ukrainian]. Mathematical Machines and Systems, 2018(3), 142–148. http://www.immsp.kiev.ua/publications/articles/2018/2018_3/03_2018_Lapach.pdf

Lapach, S. N., Pasechnik, M. F., & Chubenko, A. V. (1999). Statistical methods in pharmacology and marketing of the pharmaceutical market [in Russian]. Kyiv: CJSC “Ukrspetsmontazh”.

Mosteller, F., & Tukey, J. W. (1977). Data analysis and regression: A second course in statistics. Reading, Mass.: Addison-Wesley.

Orlov, A. I. (2018). Errors in the use of correlation and determination coefficients [in Russian]. Industrial laboratory. Diagnostics of materials, 84(3), 68–72. https://doi.org/10.26896/1028-6861-2018-84-3-68-72

Pardoux, C. (1982). Sur la sélection de variables en régression multiple: une mise au point. Cahiers du Bureau universitaire de recherche opérationnelle Série Recherche, 39, 101–133. http://www.numdam.org/item/BURO_1982__39-40__101_0

Shishlyannikova, L. M. (2009). The use of correlation analysis in psychology [in Russian]. Psychological Science and Education, 2009(1), 98–107. http://psyjournals.ru/psyedu/2009/n1/Shishlyannikova.shtml

Issue

Section

Application of mathematics in related sciences