Preprint Article, Version 1. Preserved in Portico. This version is not peer-reviewed.

Integrated Learning Activity Prediction Model of BHO-AdaBoosting Anti-Breast Cancer ERα Inhibitor Based on Improved Random Forest

Version 1: Received: 16 August 2023 / Approved: 16 August 2023 / Online: 17 August 2023 (03:57:54 CEST)

How to cite: Du, Y.; Xu, Z.; Huang, J.; Lyu, C.; Lu, C.; Chen, J. Integrated Learning Activity Prediction Model of BHO-AdaBoosting Anti-Breast Cancer ERα Inhibitor Based on Improved Random Forest. Preprints 2023, 2023081209. https://doi.org/10.20944/preprints202308.1209.v1

Abstract

Breast cancer is the most common malignancy in women worldwide, and its pathogenesis is closely related to the estrogen receptor alpha subtype (ERα). Developing effective inhibitors of ERα activity is therefore of great importance for breast cancer treatment. In this paper, we propose a novel ensemble machine learning model for the quantitative structure-activity relationship (QSAR) of anti-breast cancer drugs that can effectively predict drug activity from small samples described by many feature variables. To avoid over-fitting caused by weakly correlated independent variables, we improved the scoring mechanism of the random forest by incorporating three relevance indicators, namely the maximal information coefficient, the Pearson correlation coefficient and the distance correlation coefficient, and used it to select 20 optimal molecular descriptors. Bayesian hyperparameter optimization was used to tune the parameters of multiple linear regression (MLR), support vector regression (SVR) and extreme gradient boosting (XGBoost), respectively. These weak learners were then combined by weighted linear addition to construct an AdaBoost strong learner. The results show that the proposed ensemble learning model achieves the best prediction performance compared with the three base learner models and a CNN-LSTM combined prediction model: the root mean square error was reduced by 7.60%-26.51%, the mean relative error was reduced by 6.46%-30.92%, and the goodness of fit increased by 9.57%-36.94%. Finally, the biological activities of 50 candidate ERα inhibitor compounds were predicted, and 4-[2-benzyl-1-[4-(2-pyrrolidin-1-ylethoxy)phenyl]but-1-enyl]phenol was found to have an excellent predicted biological activity value (pIC50), indicating its potential as an ERα inhibitor. The proposed model achieves good prediction accuracy and can provide an effective reference for the discovery and development of anti-breast cancer drugs.
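The abstract states that the tuned MLR, SVR and XGBoost learners are fused by weighted linear addition, and that models are compared by root mean square error, mean relative error and goodness of fit. The following is a minimal plain-Python sketch of that combination step and of the three metrics, under stated assumptions. The abstract does not say how the AdaBoost weights are derived, so this sketch swaps in simple inverse-RMSE weights (a hypothetical stand-in, not the authors' AdaBoost weighting). Goodness of fit is assumed to be R² = 1 - SS_res/SS_tot, and mean relative error is assumed to be the mean of |y - ŷ|/|y|. All function names and the toy numbers are hypothetical and are not data from the paper.

```python
import math
from typing import Dict, List, Sequence


def rmse(y: Sequence[float], p: Sequence[float]) -> float:
    """Root mean square error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, p)) / len(y))


def mre(y: Sequence[float], p: Sequence[float]) -> float:
    """Mean relative error |y - p| / |y|; assumes no true value is zero."""
    return sum(abs(a - b) / abs(a) for a, b in zip(y, p)) / len(y)


def r2(y: Sequence[float], p: Sequence[float]) -> float:
    """Coefficient of determination, assumed here to be the goodness of fit."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, p))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def inverse_rmse_weights(y: Sequence[float],
                         preds: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Hypothetical weighting: weight proportional to 1 / RMSE on validation
    data, normalised to sum to 1. Stands in for the unspecified AdaBoost weights."""
    raw = {name: 1.0 / (rmse(y, p) + 1e-12) for name, p in preds.items()}
    total = sum(raw.values())
    return {name: w / total for name, w in raw.items()}


def combine(preds: Dict[str, Sequence[float]],
            weights: Dict[str, float]) -> List[float]:
    """Weighted linear addition: y_hat(x) = sum_t w_t * f_t(x)."""
    n = len(next(iter(preds.values())))
    return [sum(weights[name] * p[i] for name, p in preds.items())
            for i in range(n)]


def percent_change(baseline: float, proposed: float) -> float:
    """Relative change in percent; negative values are reductions."""
    return (proposed - baseline) / baseline * 100.0


if __name__ == "__main__":
    # Toy pIC50 values and base-learner predictions (hypothetical, not from the paper).
    # For brevity the weights are fitted and evaluated on the same data here;
    # in practice they would come from a separate validation set.
    y_val = [5.0, 6.0, 7.0, 8.0]
    preds = {
        "MLR": [5.5, 5.5, 7.5, 7.5],
        "SVR": [5.1, 6.2, 6.9, 8.1],
        "XGBoost": [4.8, 6.1, 7.3, 7.9],
    }
    w = inverse_rmse_weights(y_val, preds)
    ens = combine(preds, w)
    for name, p in preds.items():
        print(f"{name}: weight={w[name]:.3f} RMSE={rmse(y_val, p):.3f}")
    print(f"Ensemble: RMSE={rmse(y_val, ens):.3f} "
          f"MRE={mre(y_val, ens):.4f} R2={r2(y_val, ens):.4f}")
    svr_rmse = rmse(y_val, preds["SVR"])
    print(f"RMSE change vs SVR: {percent_change(svr_rmse, rmse(y_val, ens)):.2f}%")
```

Inverse-error weighting was chosen only because it is the simplest weighted linear addition that favours the more accurate learners. Replacing `inverse_rmse_weights` with the paper's actual AdaBoost weights would leave `combine` unchanged. If the abstract's percentages are relative to each comparison model's own value, they correspond to `percent_change(baseline, proposed)`; this interpretation is also an assumption.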

Keywords

Breast cancer; Activity prediction; Random forest; Feature selection; Bayesian hyperparameter optimization; AdaBoosting

Subject

Medicine and Pharmacology
