Preprint, Article, Version 1. Preserved in Portico. This version is not peer-reviewed.

Robustness of Imputation Methods with Backpropagation Algorithm in Nonlinear Multiple Regression

Version 1: Received: 17 May 2021 / Approved: 17 May 2021 / Online: 17 May 2021 (14:35:18 CEST)

How to cite: Hounmenou, C.G.; Behingan, B.M.; Chrysostome, C.; Gneyou, K.E.; Glele Kakaï, R.L. Robustness of Imputation Methods with Backpropagation Algorithm in Nonlinear Multiple Regression. Preprints 2021, 2021050390 (doi: 10.20944/preprints202105.0390.v1).

Abstract

Missing observations are one of the most important issues in data analysis in applied research. Their magnitude and structure affect parameter estimation in modeling, with important consequences for decision-making. This study evaluates the efficiency of imputation methods combined with the backpropagation algorithm in a nonlinear regression context. The evaluation is conducted through a simulation study covering five sample sizes (50, 100, 200, 300 and 400), five missing data rates (10, 20, 30, 40 and 50%) and three missingness mechanisms (MCAR, MAR and MNAR). Four imputation methods (Last Observation Carried Forward, Random Forest, Amelia and MICE) were used to impute the datasets before making predictions with backpropagation. A 3-MLP model was used, varying the activation functions (Logistic-Linear, Logistic-Exponential, TanH-Linear and TanH-Exponential), the number of nodes in the hidden layer (3 to 15) and the learning rate (20 to 70%). Analysis of the network's performance criteria (R2, r and RMSE) showed good performance when it is trained with TanH-Linear functions, 11 nodes in the hidden layer and a learning rate of 50%. MICE and Random Forest were the most appropriate methods for data imputation. These methods can tolerate missing rates of up to 50%, with an optimal sample size of 200.
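
As a rough illustration of the simulation design described above, the following minimal sketch shows one way to delete values under the three missingness mechanisms and to compute the three performance criteria. It is not the authors' code: the abstract does not state the software used or how missingness was generated. The function names `make_missing` and `criteria`, the logistic weighting, the choice of column 0 as the variable that loses values, and the coefficient-of-determination formula used for R2 are all assumptions made for illustration.

```python
import numpy as np


def make_missing(X, rate, mechanism, rng):
    """Return a copy of X with a fraction `rate` of column 0 set to NaN.

    Assumed (hypothetical) setup: only column 0 loses values;
    column 1 is a fully observed covariate that drives MAR.
    """
    X = np.asarray(X, dtype=float).copy()
    n = X.shape[0]
    k = int(round(rate * n))  # number of values to delete

    if mechanism == "MCAR":
        p = None  # every row is equally likely to be deleted
    elif mechanism in ("MAR", "MNAR"):
        # MAR: deletion depends on an observed covariate (column 1).
        # MNAR: deletion depends on the value being deleted (column 0).
        driver = X[:, 1] if mechanism == "MAR" else X[:, 0]
        z = (driver - driver.mean()) / driver.std()
        w = 1.0 / (1.0 + np.exp(-2.0 * z))  # larger values are deleted more often
        p = w / w.sum()
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")

    rows = rng.choice(n, size=k, replace=False, p=p)
    X[rows, 0] = np.nan
    return X


def criteria(y_true, y_pred):
    """Compute R2, Pearson r and RMSE for a set of predictions.

    R2 is taken here as 1 - SS_res / SS_tot, which is an assumption;
    the abstract does not give the formula.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    rmse = float(np.sqrt(np.mean(resid ** 2)))
    r = float(np.corrcoef(y_true, y_pred)[0, 1])
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = float(1.0 - np.sum(resid ** 2) / ss_tot)
    return {"R2": r2, "r": r, "RMSE": rmse}


# Small demonstration on synthetic data (sample size 200, 30% missing).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
for mech in ("MCAR", "MAR", "MNAR"):
    Xm = make_missing(X, 0.30, mech, rng)
    print(mech, "missing values in column 0:", int(np.isnan(Xm[:, 0]).sum()))
```

In a study of this kind, each incomplete dataset would then be imputed by one of the four methods and passed to the network, with `criteria` applied to the resulting predictions.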

Subject Areas

Multilayer perceptron neural network; regression model; backpropagation; missing data; imputation method
