Submitted: 30 June 2023
Posted: 03 July 2023
Abstract

Keywords:
1. Introduction
2. Materials and Methods
2.1. Dataset
2.2. Feature extraction
2.3. Machine Learning Methods
2.4. Feature Selection Using Genetic Algorithm (GA)
2.5. Performance evaluation
3. Results
3.1. Comparison between different methods
| Methods / Metric | MAE | PCC | MAE (% imp.) | PCC (% imp.) | Average (% imp.) |
|---|---|---|---|---|---|
| State-of-the-art method [19] | 0.126 | 0.598 | - | - | - |
| Extra Trees Regressor | 0.122 | 0.741 | 3.57% | 23.88% | 13.73% |
| XGB Regressor | 0.123 | 0.727 | 2.67% | 21.57% | 12.12% |
| KNN Regressor | 0.129 | 0.681 | -2.30% | 13.89% | 5.79% |
| Decision Tree Regressor | 0.167 | 0.527 | -24.38% | -11.84% | -18.11% |
| LSTM | 0.125 | 0.678 | 1.13% | 13.35% | 7.24% |
| CNN | 0.166 | 0.608 | -24.21% | 1.68% | -11.27% |
| Tabnet | 0.117 | 0.736 | 7.26% | 23.09% | 15.18% |
| LightGBM Regressor | 0.118 | 0.745 | 6.59% | 24.50% | 15.54% |

| Methods / Metric | MAE | PCC | MAE (% imp.) | PCC (% imp.) | Average (% imp.) |
|---|---|---|---|---|---|
| State-of-the-art method [19] | 0.135 | 0.602 | - | - | - |
| Extra Trees Regressor | 0.131 | 0.729 | 2.77% | 21.10% | 11.94% |
| XGB Regressor | 0.132 | 0.715 | 2.22% | 18.73% | 10.48% |
| KNN Regressor | 0.139 | 0.670 | -2.63% | 11.24% | 4.31% |
| Decision Tree Regressor | 0.179 | 0.511 | -24.65% | -15.11% | -19.88% |
| LSTM | 0.132 | 0.665 | 2.29% | 10.48% | 6.38% |
| CNN | 0.144 | 0.702 | -6.46% | 16.61% | 5.07% |
| Tabnet | 0.126 | 0.724 | 7.24% | 20.28% | 13.76% |
| LightGBM Regressor | 0.127 | 0.733 | 6.09% | 21.84% | 13.96% |
3.2. Hyperparameters optimization
3.3. Feature window selection
3.4. Comparison with the State-of-the-art method
4. Discussion
The distribution of torsion-angle fluctuation

Relationship between Δφ and Δψ

Relationship between torsion-angle fluctuation and disordered regions


5. Conclusions
Author Contributions
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Cornell WD, Cieplak P, Bayly CI, Gould IR, Merz KM, Ferguson DM, Spellmeyer DC, Fox T, Caldwell JW, Kollman PA: A second generation force field for the simulation of proteins, nucleic acids, and organic molecules. Journal of the American Chemical Society 1995, 117(19):5179-5197.
- Nechab M, Mondal S, Bertrand MP: 1,n-Hydrogen-Atom Transfer (HAT) Reactions in Which n≠5: An Updated Inventory. Chemistry - A European Journal 2014, 20(49):16034-16059.
- Wright PE, Dyson HJ: Intrinsically unstructured proteins: re-assessing the protein structure-function paradigm. Journal of Molecular Biology 1999, 293(2):321-331.
- Quiocho FA: Carbohydrate-binding proteins: tertiary structures and protein-sugar interactions. Annual Review of Biochemistry 1986, 55(1):287-315.
- Mosimann S, Meleshko R, James MN: A critical assessment of comparative molecular modeling of tertiary structures of proteins. Proteins: Structure, Function, and Bioinformatics 1995, 23(3):301-317.
- Gao J, Yang Y, Zhou Y: Grid-based prediction of torsion angle probabilities of protein backbone and its application to discrimination of protein intrinsic disorder regions and selection of model structures. BMC Bioinformatics 2018, 19(1):29.
- Heffernan R, Yang Y, Paliwal K, Zhou Y: Capturing non-local interactions by long short-term memory bidirectional recurrent neural networks for improving prediction of protein secondary structure, backbone angles, contact numbers and solvent accessibility. Bioinformatics 2017, 33(18):2842-2849.
- Lyons J, Dehzangi A, Heffernan R, Sharma A, Paliwal K, Sattar A, Zhou Y, Yang Y: Predicting backbone Cα angles and dihedrals from protein sequences by stacked sparse auto-encoder deep neural network. Journal of Computational Chemistry 2014, 35(28):2040-2046.
- Yang Y, Faraggi E, Zhao H, Zhou Y: Improving protein fold recognition and template-based modeling by employing probabilistic-based matching between predicted one-dimensional structural properties of query and corresponding native properties of templates. Bioinformatics 2011, 27(15):2076-2082.
- Karchin R, Cline M, Mandel-Gutfreund Y, Karplus K: Hidden Markov models that use predicted local structure for fold recognition: alphabets of backbone geometry. Proteins: Structure, Function, and Bioinformatics 2003, 51(4):504-514.
- Rohl CA, Strauss CE, Misura KM, Baker D: Protein structure prediction using Rosetta. In: Methods in Enzymology. vol. 383: Elsevier; 2004: 66-93.
- Faraggi E, Yang Y, Zhang S, Zhou Y: Predicting continuous local structure and the effect of its substitution for secondary structure in fragment-free protein structure prediction. Structure 2009, 17(11):1515-1527.
- Wu S, Zhang Y: ANGLOR: a composite machine-learning algorithm for protein backbone torsion angle prediction. PLOS ONE 2008, 3(10):e3400.
- Yang Y, Gao J, Wang J, Heffernan R, Hanson J, Paliwal K, Zhou Y: Sixty-five years of the long march in protein secondary structure prediction: the final stretch? Brief Bioinform 2018, 19(3):482-494.
- Li H, Hou J, Adhikari B, Lyu Q, Cheng J: Deep learning methods for protein torsion angle prediction. BMC Bioinformatics 2017, 18(1):1-13.
- Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, Tunyasuvunakool K, Bates R, Žídek A, Potapenko A et al: Highly accurate protein structure prediction with AlphaFold. Nature 2021, 596(7873):583-589.
- Wu R, Ding F, Wang R, Shen R, Zhang X, Luo S, Su C, Wu Z, Xie Q, Berger B et al: High-resolution de novo structure prediction from primary sequence. bioRxiv 2022.
- Lin Z, Akin H, Rao R, Hie B, Zhu Z, Lu W, Santos Costa Ad, Fazel-Zarandi M, Sercu T, Candido S et al: Language models of protein sequences at the scale of evolution enable accurate structure prediction. bioRxiv 2022:2022.2007.2020.500902.
- Zhang T, Faraggi E, Zhou Y: Fluctuations of backbone torsion angles obtained from NMR-determined structures and their prediction. Proteins 2010, 78(16):3353-3362.
- Kabir MWU, Alawad DM, Mishra A, Hoque MT: Prediction of Phi and Psi Angle Fluctuations from Protein Sequences. In: Accepted for 20th IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology: 2023; Eindhoven, The Netherlands.
- Md Kauser A, Avdesh M, Md Tamjidul H: TAFPred: An Efficient Torsion Angle Fluctuation Predictor of a Protein from its Sequence. In.; 2018.
- Iqbal S, Mishra A, Hoque T: Improved Prediction of Accessible Surface Area Results in Efficient Energy Function Application. Journal of Theoretical Biology 2015, 380:380-391.
- Iqbal S, Hoque MT: PBRpredict-Suite: a suite of models to predict peptide-recognition domain residues from protein sequence. Bioinformatics 2018:bty352-bty352.
- Iqbal S, Hoque MT: Estimation of position specific energy as a feature of protein residues from sequence alone for structural classification. PLOS ONE 2016, 11(9):e0161452.
- Zhu L, Yang J, Song JN, Chou KC, Shen HB: Improving the accuracy of predicting disulfide connectivity by feature selection. Journal of Computational Chemistry 2010, 31(7):1478-1485.
- Islam MN, Iqbal S, Katebi AR, Hoque MT: A balanced secondary structure predictor. Journal of Theoretical Biology 2016, 389:60-71.
- Wright PE, Dyson HJ: Intrinsically unstructured proteins: re-assessing the protein structure-function paradigm. Journal of Molecular Biology 1999, 293(2):321-331.
- Liu J, Tan H, Rost B: Loopy proteins appear conserved in evolution. Journal of Molecular Biology 2002, 322(1):53-64.
- Tompa P: Intrinsically unstructured proteins. Trends in Biochemical Sciences 2002, 27(10):527-533.
- Ho TK: Random decision forests. In: Document Analysis and Recognition, 1995, Proceedings of the Third International Conference on; Montreal, Que., Canada. IEEE 1995: 278-282.
- Breiman L: Bagging predictors. Machine Learning 1996, 24(2):123-140.
- Geurts P, Ernst D, Wehenkel L: Extremely randomized trees. Machine Learning 2006, 63(1):3-42.
- Chen T, Guestrin C: XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, NY, USA: ACM; 2016: 785-794.
- Hastie T, Tibshirani R, Friedman J: The Elements of Statistical Learning, 2nd edn: Springer-Verlag New York; 2009.
- Szilágyi A, Skolnick J: Efficient prediction of nucleic acid binding function from low-resolution protein structures. Journal of Molecular Biology 2006, 358(3):922-933.
- Altman NS: An Introduction to Kernel and Nearest-Neighbor Nonparametric Regression. The American Statistician 1992, 46:175-185.
- Ke G, Meng Q, Finley T, Wang T, Chen W, Ma W, Ye Q, Liu T-Y: LightGBM: a highly efficient gradient boosting decision tree. In: Proceedings of the 31st International Conference on Neural Information Processing Systems; Long Beach, California, USA. Curran Associates Inc. 2017: 3149–3157.
- Arik SO, Pfister T: TabNet: Attentive Interpretable Tabular Learning. arXiv 2019.
- Hoque MT, Iqbal S: Genetic algorithm-based improved sampling for protein structure prediction. International Journal of Bio-Inspired Computation 2017, 9(3):129-141.
- Hoque MT, Chetty M, Sattar A: Protein Folding Prediction in 3D FCC HP Lattice Model using Genetic Algorithm. In: IEEE Congress on Evolutionary Computation (CEC) Singapore; Singapore. 2007: 4138-4145.
- Hoque MT, Chetty M, Lewis A, Sattar A, Avery VM: DFS Generated Pathways in GA Crossover for Protein Structure Prediction. Neurocomputing 2010, 73:2308-2316.





| Name of Metric | Definition |
|---|---|
| Pearson Correlation Coefficient (PCC) | |
| Mean Absolute Error (MAE) | |
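
The two metrics named in the table above can be sketched in a few lines. This is a minimal illustration that assumes the standard textbook definitions: MAE as the mean of the absolute differences between observed and predicted values, and PCC as the covariance of the two series divided by the product of their standard deviations. The function names and the sample values are hypothetical and are not taken from the paper.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Standard MAE: average of |observed - predicted| over all residues."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_correlation(y_true, y_pred):
    """Standard PCC: covariance divided by the product of standard deviations."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    denom = math.sqrt(var_t * var_p)
    if denom == 0:
        raise ValueError("PCC is undefined when either series is constant")
    return cov / denom


if __name__ == "__main__":
    # Hypothetical fluctuation values, for illustration only.
    observed = [0.10, 0.25, 0.40, 0.55]
    predicted = [0.12, 0.20, 0.43, 0.50]
    print(f"MAE = {mean_absolute_error(observed, predicted):.3f}")
    print(f"PCC = {pearson_correlation(observed, predicted):.3f}")
```

A lower MAE and a higher PCC both indicate better agreement, which is why the comparison tables report the two improvements separately.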

| Methods / Metric | MAE | PCC | MAE (% imp.) | PCC (% imp.) | Average (% imp.) |
|---|---|---|---|---|---|
| State-of-the-art method [19] | 0.126 | 0.598 | - | - | - |
| TAFPred | 0.114 | 0.746 | 10.08% | 24.83% | 17.45% |

| Methods / Metric | MAE | PCC | MAE (% imp.) | PCC (% imp.) | Average (% imp.) |
|---|---|---|---|---|---|
| State-of-the-art method [19] | 0.135 | 0.602 | - | - | - |
| TAFPred | 0.123 | 0.737 | 9.93% | 22.37% | 16.15% |
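
The Average (% imp.) column appears to be the arithmetic mean of the MAE and PCC improvement columns. That reading is inferred from the reported values and is not a definition stated in the text. The sketch below checks it against the two TAFPred rows; the function name and the row labels are hypothetical.

```python
def average_improvement(mae_imp_pct, pcc_imp_pct):
    """Assumed rule: mean of the two per-metric percentage improvements."""
    return (mae_imp_pct + pcc_imp_pct) / 2.0


if __name__ == "__main__":
    # (MAE % imp., PCC % imp., reported Average % imp.) copied from the TAFPred rows.
    tafpred_rows = {
        "first TAFPred table": (10.08, 24.83, 17.45),
        "second TAFPred table": (9.93, 22.37, 16.15),
    }
    for label, (mae_imp, pcc_imp, reported) in tafpred_rows.items():
        computed = average_improvement(mae_imp, pcc_imp)
        print(f"{label}: computed {computed:.3f}%, reported {reported:.2f}%")
```

The computed and reported averages agree to within rounding of the last digit.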
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).