Preprint
Article

This version is not peer-reviewed.

Stage-Wise SOH Prediction Using an Improved Random Forest Regression Algorithm

Submitted: 01 December 2025

Posted: 04 December 2025

Abstract
In complex energy storage operating scenarios, batteries seldom undergo the complete charge–discharge cycles required for periodic capacity calibration. Methods based on accelerated aging experiments can indicate possible aging paths; however, because of uncertainties such as changing operating conditions, environmental variations, and manufacturing inconsistencies, the degradation information obtained from such experiments may not apply to the entire lifecycle. To address this, we develop a stage-wise state-of-health (SOH) prediction approach that combines offline training with online updating. In the offline training phase, multiple single-cell experiments were conducted under various combinations of depth of discharge (DOD) and C-rate. Multi-dimensional health features (HFs) were extracted, and an accelerated aging probability pAA was defined. Based on the correlation statistics among HF, kHF, SOH, and pAA, the life of every cell in the dataset was divided into general early, middle, and late aging stages. Within each stage, cells were further classified by longevity (long, medium, short), and a separate model was trained offline for each category. The results show that models trained on cells following similar aging paths perform significantly better than a single model trained on all data combined. HF optimization was performed in three steps: an initial screening based on expert knowledge, a second screening using Spearman correlation coefficients, and an automatic feature importance ranking using a random forest regression (RFR) model. The proposed method offers the following innovations: (1) The stage-wise multi-model strategy significantly improves SOH prediction accuracy across the entire lifecycle, keeping the mean absolute percentage error (MAPE) within 1%. (2) The improved model provides uncertainty quantification and issues a warning signal at least 50 cycles before the onset of accelerated aging, enabling early detection of accelerating degradation. (3) Analysis of the feature importance reported by the model allows indirect identification of the primary aging mechanisms at different stages. (4) The model is robust against missing or low-quality HFs: if certain features cannot be obtained or are of poor quality, the prediction process does not fail.
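The second screening step and the MAPE metric mentioned in the abstract can be illustrated with a short sketch. This is not the authors' implementation: the function names, the 0.8 correlation threshold, the feature names, and all numbers below are assumptions chosen only for illustration. The sketch ranks each candidate HF against SOH, keeps the features whose absolute Spearman coefficient reaches the threshold, and evaluates a set of hypothetical predictions with MAPE.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share one rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no rank information
    return cov / sqrt(var_x * var_y)


def screen_features(features, soh, threshold=0.8):
    """Keep features whose |rho| with SOH reaches the (assumed) threshold."""
    return [name for name, vals in features.items()
            if abs(spearman(vals, soh)) >= threshold]


def mape(actual, predicted):
    """Mean absolute percentage error, expressed in percent."""
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100 * total / len(actual)


# Made-up illustrative values, not data from the paper.
soh = [1.00, 0.98, 0.95, 0.93, 0.90, 0.86]
features = {
    "hf_monotonic": [3.9, 3.7, 3.6, 3.2, 3.0, 2.5],  # follows SOH closely
    "hf_noisy": [0.2, 0.9, 0.1, 0.7, 0.3, 0.8],      # weak relation to SOH
    "hf_constant": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],   # no information
}
selected = screen_features(features, soh)
error = mape(soh, [0.99, 0.98, 0.96, 0.93, 0.89, 0.86])
```

With these hypothetical inputs, only `hf_monotonic` passes the screening, and the MAPE of the example predictions stays below 1%. In a full pipeline the retained features would then be passed to the RFR model for importance ranking, which this sketch does not attempt to reproduce.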
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.

© 2025 MDPI (Basel, Switzerland) unless otherwise stated