Preprint Communication Version 1 Preserved in Portico This version is not peer-reviewed

PCR, PLS, or OPLS Evaluation of different regression techniques for hypothesis generation

Version 1 : Received: 24 November 2021 / Approved: 29 November 2021 / Online: 29 November 2021 (15:42:03 CET)

How to cite: Ahuja, A. PCR, PLS, or OPLS Evaluation of different regression techniques for hypothesis generation. Preprints 2021, 2021110549 (doi: 10.20944/preprints202111.0549.v1).

Abstract

In the current era of ‘big data’, scientists are able to quickly amass enormous amounts of data from a limited number of experiments. Investigators then try to hypothesize about the root cause based on the trends observed in the predictors and the response variable. This involves identifying the discriminatory predictors, that is, the predictors most responsible for explaining variation in the response variable. In the current work, we investigated three related multivariate techniques: Principal Component Regression (PCR), Partial Least Squares or Projections to Latent Structures (PLS), and Orthogonal Partial Least Squares (OPLS). To perform a comparative analysis, we used a publicly available dataset for Parkinson’s disease patients. We first performed the analysis with the number of principal components selected by cross-validation for each of these techniques. Our results demonstrated that PLS and OPLS were better suited than PCR for identifying the discriminatory predictors. Since the predictors (X data) were not strongly correlated, so that multicollinearity was unlikely to destabilize the coefficient estimates, we also performed Multiple Linear Regression (MLR) on the dataset. A comparison of the top five discriminatory predictors identified by the four techniques showed a substantial overlap among the results obtained by PLS, OPLS, and MLR, whereas all three diverged significantly from the variables identified by PCR. Further investigation of the data revealed that PCR could identify the discriminatory variables successfully if the number of principal components in the regression model was increased. In summary, we recommend using PLS or OPLS for hypothesis generation, and systematizing the selection of principal components when using PCR.
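The dependence of PCR on the number of retained components can be illustrated with a minimal sketch. It uses synthetic toy data, not the Parkinson's disease dataset, and the helper names (make_toy_data, pcr_coefficients, top_predictors) are hypothetical and introduced only for illustration. The toy response is driven by a predictor that contributes little to the first principal component, so a one-component PCR model ranks the wrong predictors highest, while a PCR model that keeps all components reproduces the MLR (ordinary least squares) coefficients.

```python
import numpy as np


def make_toy_data(n=200, seed=0):
    """Synthetic data (an assumption for illustration, not the study data).

    Predictors 0-3 share one latent factor and dominate the first principal
    component; predictors 4-5 are independent. Only predictor 5 drives y.
    """
    rng = np.random.default_rng(seed)
    latent = rng.normal(size=n)
    block = latent[:, None] + 0.3 * rng.normal(size=(n, 4))
    extra = rng.normal(size=(n, 2))
    X = np.hstack([block, extra])
    y = 2.0 * X[:, 5] + 0.1 * rng.normal(size=n)
    return X, y


def pcr_coefficients(X, y, n_components):
    """PCR via SVD; returns coefficients on the standardized predictors."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = y - y.mean()
    _, s, Vt = np.linalg.svd(Xs, full_matrices=False)
    V = Vt[:n_components].T              # loadings, shape (p, k)
    T = Xs @ V                           # scores, shape (n, k)
    # Scores are orthogonal, so each regression weight is a simple ratio.
    gamma = (T.T @ yc) / (s[:n_components] ** 2)
    return V @ gamma                     # map back to predictor space


def top_predictors(coefs, k):
    """Indices of the k predictors with the largest absolute coefficients."""
    return [int(i) for i in np.argsort(-np.abs(coefs))[:k]]


if __name__ == "__main__":
    X, y = make_toy_data()
    for k in (1, 3, X.shape[1]):
        coefs = pcr_coefficients(X, y, k)
        print(f"{k} component(s): top predictor index = {top_predictors(coefs, 1)}")
```

Ranking by absolute coefficient on standardized predictors is one simple choice made for this sketch; it is not necessarily the ranking criterion used in the study.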

Keywords

Principal Component Regression, Partial Least Squares, Orthogonal Partial Least Squares, multivariate regression, hypothesis generation, Parkinson’s disease
