Version 1: Received: 16 August 2023 / Approved: 16 August 2023 / Online: 17 August 2023 (03:57:54 CEST)
How to cite:
Du, Y.; Xu, Z.; Huang, J.; Lyu, C.; Lu, C.; Chen, J. Integrated Learning Activity Prediction Model of BHO-AdaBoosting Anti-Breast Cancer ERα Inhibitor Based on Improved Random Forest. Preprints 2023, 2023081209. https://doi.org/10.20944/preprints202308.1209.v1
APA Style
Du, Y., Xu, Z., Huang, J., Lyu, C., Lu, C., & Chen, J. (2023). Integrated Learning Activity Prediction Model of BHO-AdaBoosting Anti-Breast Cancer ERα Inhibitor Based on Improved Random Forest. Preprints. https://doi.org/10.20944/preprints202308.1209.v1
Chicago/Turabian Style
Du, Y., Cunhao Lu and Jian Chen. 2023 "Integrated Learning Activity Prediction Model of BHO-AdaBoosting Anti-Breast Cancer ERα Inhibitor Based on Improved Random Forest" Preprints. https://doi.org/10.20944/preprints202308.1209.v1
Abstract
Breast cancer is the most common malignancy in women worldwide, and its pathogenesis is closely related to the estrogen receptor alpha subtype (ERα). Developing effective inhibitors of ERα activity is therefore of great importance for the treatment of breast cancer. In this paper, we propose a novel ensemble machine learning model for the quantitative structure-activity relationship (QSAR) of anti-breast cancer drugs, which can effectively predict drug activity from small samples with many feature variables. To avoid the over-fitting caused by weakly correlated independent variables, the scoring mechanism of random forest was improved by incorporating three relevance indicators (the maximal information coefficient, the Pearson correlation coefficient, and the distance correlation coefficient), and the 20 optimal molecular descriptors were selected. Bayesian hyperparameter optimization was used to tune the parameters of multiple linear regression (MLR), support vector regression (SVR), and extreme gradient boosting (XGBoost), respectively. An AdaBoost strong learner was then constructed by combining these weak learners through weighted linear addition. The results show that the proposed ensemble learning model achieves the best prediction performance compared with the three base learner models and a CNN-LSTM combined prediction model: the root mean square error was reduced by 7.60%-26.51%, the mean relative error was reduced by 6.46%-30.92%, and the goodness of fit increased by 9.57%-36.94%. Finally, the biological activities of 50 candidate ERα inhibitor compounds were predicted, and 4-[2-benzyl-1-[4-(2-pyrrolidin-1-ylethoxy)phenyl]but-1-enyl]phenol was found to have an excellent predicted biological activity value (pIC50), indicating its potential as an ERα inhibitor. The proposed model has good prediction accuracy and can provide an effective reference for the discovery and development of anti-breast cancer drugs.
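As an illustration of two of the relevance indicators named in the abstract, the following minimal sketch computes the Pearson correlation coefficient and the sample distance correlation coefficient between one descriptor and an activity value. It is not the authors' implementation: the function names and the toy data are assumptions made for this example, the third indicator is omitted because it requires a grid search, and the abstract does not state how the indicators are merged into the random forest score, so no combination rule is shown.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant variable carries no linear relevance
    return sxy / math.sqrt(sxx * syy)


def _centered_distances(v):
    """Double-centered matrix of pairwise absolute differences."""
    n = len(v)
    d = [[abs(a - b) for b in v] for a in v]
    row_mean = [sum(r) / n for r in d]  # equals the column means (d is symmetric)
    grand_mean = sum(row_mean) / n
    return [[d[i][j] - row_mean[i] - row_mean[j] + grand_mean
             for j in range(n)] for i in range(n)]


def distance_correlation(x, y):
    """Sample distance correlation for one-dimensional variables."""
    n = len(x)
    a, b = _centered_distances(x), _centered_distances(y)
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n ** 2
    dvar_x = sum(v * v for r in a for v in r) / n ** 2
    dvar_y = sum(v * v for r in b for v in r) / n ** 2
    denom = math.sqrt(dvar_x * dvar_y)
    if denom == 0:
        return 0.0
    return math.sqrt(max(dcov2, 0.0) / denom)


# Hypothetical toy data: a descriptor with a purely quadratic effect on activity.
descriptor = [-2.0, -1.0, 0.0, 1.0, 2.0]
activity = [v * v for v in descriptor]

print("Pearson:", pearson(descriptor, activity))
print("Distance correlation:", distance_correlation(descriptor, activity))
```

On this toy data the Pearson coefficient is zero while the distance correlation is clearly positive, which suggests why a linear indicator alone could miss descriptors with a nonlinear effect on activity.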
Keywords
Breast cancer; Activity prediction; Random forest; Feature selection; Bayesian hyperparameter optimization; AdaBoosting
Subject
Medicine and Pharmacology
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.