Preprint Article, Version 2. Preserved in Portico. This version is not peer-reviewed.

Androgen Receptor Binding Category Prediction with Deep Neural Networks and Structure-, Ligand-, and Statistically-Based Features

Version 1 : Received: 15 February 2021 / Approved: 16 February 2021 / Online: 16 February 2021 (10:04:48 CET)
Version 2 : Received: 18 February 2021 / Approved: 18 February 2021 / Online: 18 February 2021 (16:05:17 CET)
Version 3 : Received: 23 February 2021 / Approved: 24 February 2021 / Online: 24 February 2021 (13:14:01 CET)

A peer-reviewed article of this Preprint also exists.

García-Sosa, A.T. Androgen Receptor Binding Category Prediction with Deep Neural Networks and Structure-, Ligand-, and Statistically Based Features. Molecules 2021, 26, 1285.

Journal reference: Molecules 2021, 26, 1285
DOI: 10.3390/molecules26051285

Abstract

Substances that can modify the androgen receptor (AR) pathway in humans and animals are entering the environment and food chain. They have a proven ability to disrupt hormonal systems, leading to toxicity and to adverse effects related to reproduction, brain development, and prostate cancer, among others. State-of-the-art databases with experimental data on the effects of chemicals in humans, chimpanzees, and rats were used to build machine learning classifiers and regressors and to evaluate them on independent sets. Different featurizations, algorithms, and protein structures lead to different results: deep neural networks (DNNs) trained on user-defined, physicochemically relevant features developed for this work outperformed graph convolutional models, random forests, and large featurizations. These user-provided structure-, ligand-, and statistically based features, combined with specific DNNs, gave the best results as determined by AUC (0.87), MCC (0.47), and other metrics, as well as by the interpretability and chemical meaning of the descriptors/features. In addition, the same features performed better in the DNN method than in a multivariate logistic model: validation MCC = 0.468 and training MCC = 0.868 for the present work, compared to evaluation set MCC = 0.2036 and training set MCC = 0.5364 for the multivariate logistic regression on the full, unbalanced set. Techniques of this type may improve the description and prediction of AR binding and toxicity, and thereby the assessment and design of compounds. Source code and data are available at https://github.com/AlfonsoTGarcia-Sosa/ML
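
For readers unfamiliar with the two metrics quoted in the abstract, the following is a minimal sketch of how the Matthews correlation coefficient (MCC) and the area under the ROC curve (AUC) can be computed for a binary binder/non-binder classifier. It is not taken from the author's repository: the function names, the 0.5 decision threshold, and the toy labels and scores are assumptions made purely for illustration.

```python
import math


def matthews_corrcoef(y_true, y_pred):
    """MCC from the binary confusion matrix; returns 0.0 if undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random
    negative (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy, made-up data (NOT from the paper): 1 = binder, 0 = non-binder.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.2, 0.1, 0.1, 0.05]

# Hypothetical decision threshold of 0.5 to turn scores into classes.
toy_preds = [1 if s >= 0.5 else 0 for s in toy_scores]

print("MCC:", round(matthews_corrcoef(toy_labels, toy_preds), 3))
print("AUC:", round(roc_auc(toy_labels, toy_scores), 3))
```

MCC uses all four cells of the confusion matrix, which makes it a more informative summary than accuracy on unbalanced sets such as the full set mentioned above. AUC is threshold-free and depends only on how the scores rank the compounds.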

Keywords

Machine Learning; Artificial Intelligence; Androgen Receptor; Random Forest; Deep Neural Network; Convolutional

Comments (1)

Comment 1
Received: 18 February 2021
Commenter: Alfonso T. Garcia-Sosa
Commenter's Conflict of Interests: Author
Comment: Changes after revision