Version 1: Received: 3 January 2023 / Approved: 4 January 2023 / Online: 4 January 2023 (02:26:50 CET)
How to cite:
Jiang, H.; Shang, S.; Sha, Y.; Li, L. Edeepsadpr: An Extensive Deep-Learning Architecture for Prediction of the in Situ Crosstalks of Serine Phosphorylation and ADP-Ribosylation. Preprints 2023, 2023010040. https://doi.org/10.20944/preprints202301.0040.v1
APA Style
Jiang, H., Shang, S., Sha, Y., & Li, L. (2023). Edeepsadpr: An Extensive Deep-Learning Architecture for Prediction of the in Situ Crosstalks of Serine Phosphorylation and ADP-Ribosylation. Preprints. https://doi.org/10.20944/preprints202301.0040.v1
Chicago/Turabian Style
Jiang, H., S. Shang, Yutong Sha, and Lei Li. 2023. "Edeepsadpr: An Extensive Deep-Learning Architecture for Prediction of the in Situ Crosstalks of Serine Phosphorylation and ADP-Ribosylation." Preprints. https://doi.org/10.20944/preprints202301.0040.v1
Abstract
Protein phosphorylation and ADP-ribosylation (ADPr) are two types of post-translational modification (PTM) that add a phosphate group and ADP-ribose moieties to proteins, respectively. Although both PTM types can occur on many amino acid types, serine is the most common target. Serine phosphorylation (pS), serine ADPr (SADPr), and their in situ crosstalks (pSADPr) play essential roles in biological processes. Although in silico classifiers have been developed for predicting pS and SADPr sites, no classifier is available for predicting pSADPr sites. In this study, we developed classifiers to predict pSADPr sites. Specifically, we collected 3,250 human pSADPr, 7,520 SADPr, 151,227 pS and 80,096 unmodified serine sites. Based on these data, we investigated the characteristics of pSADPr sites and constructed three classifiers to predict pSADPr sites from the pS dataset, the SADPr dataset and the protein sequences, respectively. We built five deep-learning classifiers and evaluated them using ten-fold cross-validation and independent test datasets. Three of them (e.g., the Convolutional Neural Network with One-Hot encoding, dubbed CNNOH) performed better than the other two. For instance, CNNOH had AUC values of 0.700, 0.914 and 0.954 for distinguishing pSADPr sites from SADPr, pS and unmodified serine sites, respectively. Distinguishing pSADPr sites from SADPr sites is therefore more challenging than the other two tasks. This is consistent with our observation that the characteristics of pSADPr sites are more similar to those of SADPr sites than to those of the other site types. Furthermore, we used the classifiers as base classifiers to develop several stacking-based ensemble classifiers in an attempt to improve performance. However, none of the ensemble classifiers performed better, suggesting that the base classifiers already perform well enough. Finally, based on the CNNOH classifier, we developed an online tool for extensively predicting human pSADPr sites, dubbed EdeepSADPr.
It is freely available through http://edeepsadpr.bioinfogo.org/.
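As a rough illustration of the One-Hot encoding mentioned above, the following Python sketch turns a serine-centred sequence window into a binary matrix of the kind a convolutional network could take as input. The abstract does not state the window length, the padding symbol, or any function names, so `flank`, `PAD`, `extract_window` and `one_hot` are all assumptions made for this example and should not be read as the authors' implementation.

```python
# Hypothetical alphabet: the 20 standard amino acids plus a padding symbol.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "-"  # assumed placeholder for positions beyond the protein termini
ALPHABET = AMINO_ACIDS + PAD
INDEX = {residue: i for i, residue in enumerate(ALPHABET)}


def extract_window(sequence, position, flank=3):
    """Return the 2*flank+1 residues centred on the 0-based `position`.

    Positions that fall outside the sequence are filled with PAD.
    The flank size here is illustrative only.
    """
    if sequence[position] != "S":
        raise ValueError("centre residue is not serine")
    residues = []
    for i in range(position - flank, position + flank + 1):
        residues.append(sequence[i] if 0 <= i < len(sequence) else PAD)
    return "".join(residues)


def one_hot(window):
    """Encode each residue as a binary vector with a single 1.

    Residues outside the alphabet (e.g. 'X') are mapped to the PAD column.
    """
    matrix = []
    for residue in window:
        row = [0] * len(ALPHABET)
        row[INDEX.get(residue, INDEX[PAD])] = 1
        matrix.append(row)
    return matrix


window = extract_window("MKSAT", position=2, flank=3)
encoded = one_hot(window)
print(window, len(encoded), len(encoded[0]))
```

Each candidate serine site then becomes a matrix with one row per window position and one column per alphabet symbol; the choice of window length would need to follow the full paper rather than this sketch.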
Keywords
ADP-ribosylation; proteomics; post-translational modifications; deep-learning; stacking-based ensemble learning; protein network
Subject
Biology and Life Sciences, Biology and Biotechnology
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.