Preprint Article, Version 1. Preserved in Portico. This version is not peer-reviewed.

An Exploration of Pathologies of Multilevel Principal Components Analysis in Statistical Models of Shape

Version 1 : Received: 11 January 2022 / Approved: 12 January 2022 / Online: 12 January 2022 (10:44:23 CET)

How to cite: Farnell, D.J. An Exploration of Pathologies of Multilevel Principal Components Analysis in Statistical Models of Shape. Preprints 2022, 2022010160 (doi: 10.20944/preprints202201.0160.v1).

Abstract

3D facial surface imaging is a useful tool in dentistry for diagnostics and treatment planning. Between-groups PCA (bgPCA) is a method that has been used to analyse shapes in biological morphometrics, although various “pathologies” of bgPCA have recently been proposed. Monte Carlo (MC) simulated datasets were created here to explore “pathologies” of multilevel PCA (mPCA), where mPCA with two levels is equivalent to bgPCA. The first set of MC experiments involved 300 uncorrelated normally distributed variables, whereas the second set used correlated multivariate MC data describing 3D facial shape. We confirmed previous results of other researchers indicating that bgPCA (and so also mPCA) can give a false impression of strong differences in component scores between groups when there are none in reality. These spurious differences in component scores via mPCA reduced strongly as the sample sizes per group were increased. Eigenvalues via mPCA were also found to be strongly affected by imbalances in sample sizes per group, although this problem was removed by using weighted forms of covariance matrices suggested by the maximum likelihood solution of the two-level model. However, this did not solve the problem of spurious differences between groups in these simulations, which was driven by very small sample sizes in one group. As a “rule of thumb” only, all of our experiments indicate that reasonable results are obtained when the sample size in every group is at least equal to the number of variables. Interestingly, the sum of all eigenvalues over both levels via mPCA scaled approximately linearly with the inverse of the sample size per group in all experiments. Finally, between-group variation was added explicitly to the MC data generation model in two experiments considered here. In this case, the sum of all eigenvalues via mPCA correctly predicted the asymptotic value of the total variance, whereas standard “single-level” PCA underestimated this quantity.
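
The first experiment described in the abstract can be illustrated with a minimal sketch. This is not the author's code: the number of groups (3), the sample sizes, the random seed, the unweighted covariance estimates, and the "separation" measure (spread of group-mean scores on the leading between-group axis divided by the pooled within-group spread on the same axis) are all assumptions made for illustration. Groups are balanced, so the weighting issue raised in the abstract does not arise here. The data are pure noise with no true group differences, so any apparent separation is spurious.

```python
import numpy as np

N_VARS = 300    # uncorrelated normal variables, as in the first MC experiment
N_GROUPS = 3    # assumed number of groups (not stated in the abstract)


def two_level_summary(data):
    """Summarise a two-level mPCA (bgPCA-like) fit.

    data has shape (groups, samples_per_group, variables).
    Returns (separation, eigenvalue_sum).
    """
    n_groups, n_per_group, _ = data.shape
    group_means = data.mean(axis=1)
    centred_means = group_means - group_means.mean(axis=0)
    residuals = data - group_means[:, None, :]

    # Level 1: covariance of the group means about the grand mean.
    cov_between = centred_means.T @ centred_means / (n_groups - 1)
    eigvals, eigvecs = np.linalg.eigh(cov_between)  # eigenvalues ascending
    pc1 = eigvecs[:, -1]                            # leading between-group axis

    # Level 2: pooled within-group covariance. Only its trace is needed,
    # because the trace equals the sum of its eigenvalues.
    dof_within = n_groups * (n_per_group - 1)
    trace_within = np.sum(residuals ** 2) / dof_within

    # Illustrative separation measure on the leading between-group axis.
    between_sd = np.std(centred_means @ pc1, ddof=1)
    within_sd = np.sqrt(np.sum((residuals @ pc1) ** 2) / dof_within)

    return between_sd / within_sd, eigvals.sum() + trace_within


def run_experiment(sample_sizes=(10, 30, 100, 300), seed=0):
    """Return (n_per_group, separation, eigenvalue_sum) for each sample size."""
    rng = np.random.default_rng(seed)
    rows = []
    for n in sample_sizes:
        data = rng.standard_normal((N_GROUPS, n, N_VARS))  # no group effect
        rows.append((n, *two_level_summary(data)))
    return rows


if __name__ == "__main__":
    rows = run_experiment()
    for n, separation, total in rows:
        print(f"n per group = {n:4d}  separation = {separation:5.2f}  "
              f"sum of eigenvalues = {total:7.2f}")
    # Straight-line fit of the eigenvalue sum against 1 / (sample size).
    inv_n = [1.0 / n for n, _, _ in rows]
    slope, intercept = np.polyfit(inv_n, [t for _, _, t in rows], 1)
    print(f"fit: sum of eigenvalues ~ {intercept:.1f} + {slope:.1f} / n")
```

Under these assumptions the separation should shrink as the sample size per group grows towards the number of variables, and the eigenvalue sum should fall roughly linearly in 1/n towards the true total variance of 300, in line with the behaviour reported in the abstract.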

Keywords

Multilevel Principal Components Analysis (mPCA); 3D shape analysis; Monte Carlo simulations

Subject

MATHEMATICS & COMPUTER SCIENCE, Probability and Statistics
