Preprint Article Version 1 Preserved in Portico This version is not peer-reviewed

Detecting Potential Outliers in Longitudinal Data with Time-Dependent Covariates

Version 1: Received: 5 May 2023 / Approved: 6 May 2023 / Online: 6 May 2023 (08:32:28 CEST)

A peer-reviewed article of this Preprint also exists.

Mramba, L.K.; Liu, X.; Lynch, K.F.; Yang, J.; Aronsson, C.A.; Hummel, S.; Norris, J.M.; Virtanen, S.M.; Hakola, L.; Uusitalo, U.M.; et al. Detecting Potential Outliers in Longitudinal Data with Time-Dependent Covariates. European Journal of Clinical Nutrition 2024, doi:10.1038/s41430-023-01393-6.

Abstract

Outliers can influence regression model parameters and change the direction of the estimated effect, over-estimating or under-estimating the strength of the association between a response variable and an exposure of interest. Identifying visit-level outliers in longitudinal data with continuous time-dependent covariates is important, especially when the distribution of such a variable is highly skewed at follow-up visits. The primary objective was to identify potential outliers at follow-up visits using the interquartile range (IQR) statistic. The work was motivated by a large TEDDY longitudinal dietary and time-to-event dataset, with continuous time-varying vitamin B12 intake as the exposure of interest and time to developing islet autoimmunity (IA) as the response variable. The IQR method was also applied to simulated data. To assess the impact of the outliers detected by the IQR method, the data were analyzed using a Cox proportional hazards model with a robust sandwich estimator. Partial residual diagnostic plots were used to detect highly influential outliers. The results showed that some of the detected outliers had a large influence on the Cox regression model and changed both the direction of the hazard ratios and the strength of the association with the risk of developing IA. In conclusion, the IQR method is useful for identifying potential visit-level outliers, which can then be investigated further.
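
As a rough illustration of visit-level IQR screening, the sketch below groups observations by visit and flags values that fall outside fences computed from that visit's own quartiles. It is not the authors' implementation. The abstract does not state the fence multiplier, so the conventional Tukey value of 1.5 is assumed here, and the function name `flag_visit_outliers`, the record layout `(subject_id, visit, value)`, and the quartile interpolation method are all hypothetical choices made for the example.

```python
from collections import defaultdict
from statistics import quantiles


def flag_visit_outliers(records, k=1.5):
    """Flag potential outliers separately at each visit.

    records: iterable of (subject_id, visit, value) tuples (hypothetical layout).
    k: fence multiplier; 1.5 is the conventional Tukey choice (an assumption,
       not a value taken from the abstract).
    Returns a set of (subject_id, visit) pairs whose value lies outside
    [Q1 - k * IQR, Q3 + k * IQR] for that visit.
    """
    # Group the observations by visit so each visit gets its own fences.
    by_visit = defaultdict(list)
    for subject_id, visit, value in records:
        by_visit[visit].append((subject_id, value))

    flagged = set()
    for visit, rows in by_visit.items():
        values = [value for _, value in rows]
        if len(values) < 4:
            # Too few observations for meaningful quartiles; skip this visit.
            continue
        # Quartiles of the values observed at this visit.
        q1, _, q3 = quantiles(values, n=4, method="inclusive")
        iqr = q3 - q1
        lower, upper = q1 - k * iqr, q3 + k * iqr
        for subject_id, value in rows:
            if value < lower or value > upper:
                flagged.add((subject_id, visit))
    return flagged
```

Computing the fences per visit, rather than once over the pooled data, lets the thresholds follow the distribution at each follow-up visit. Flagged observations are candidates for further investigation, not automatic exclusions.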

Keywords

exploratory data analysis; non-parametric statistics; skewed data; survival analysis; repeated measures.

Subject

Public Health and Healthcare, Other
