Submitted: 28 August 2023
Posted: 29 August 2023
Abstract
Keywords:
1. Introduction
2. Materials and Methods
2.1. Data preparation
| Category | Training | Validation | Test | Total |
|---|---|---|---|---|
| Sweet | 637 | 91 | 178 | 906 |
| Nonsweet | 1184 | 169 | 342 | 1695 |
| Bitter | 769 | 118 | 239 | 1126 |
| Nonbitter | 1052 | 142 | 281 | 1475 |
| Umami | 71 | 8 | 19 | 98 |
| Nonumami | 1750 | 252 | 501 | 2503 |
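
The split sizes in the table can be checked with a few lines of arithmetic. The sketch below only re-adds the counts listed above. The helper name `split_summary` is introduced here for illustration and is not part of the original work.

```python
# Counts copied from the data-preparation table: (training, validation, test).
COUNTS = {
    "Sweet": (637, 91, 178),
    "Nonsweet": (1184, 169, 342),
    "Bitter": (769, 118, 239),
    "Nonbitter": (1052, 142, 281),
    "Umami": (71, 8, 19),
    "Nonumami": (1750, 252, 501),
}


def split_summary(positive, negative):
    """Return per-split sizes, split fractions, and the positive-class share."""
    # Hypothetical helper: adds the positive and negative rows of one task.
    sizes = tuple(p + n for p, n in zip(COUNTS[positive], COUNTS[negative]))
    total = sum(sizes)
    fractions = tuple(round(s / total, 3) for s in sizes)
    positive_share = round(sum(COUNTS[positive]) / total, 3)
    return sizes, fractions, positive_share


for pos, neg in [("Sweet", "Nonsweet"), ("Bitter", "Nonbitter"), ("Umami", "Nonumami")]:
    print(pos, split_summary(pos, neg))
```

Each task covers the same 2601 compounds, split into 1821 training, 260 validation, and 520 test compounds (about 70/10/20). The positive-class share differs widely: about 35% for sweet, 43% for bitter, and under 4% for umami, so accuracy alone says little for the umami task.
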

2.2. Molecular representation
2.2.1. Fingerprint
- (1) Morgan fingerprint [27]: A circular fingerprint that encodes structural information by considering the substructures at increasing radii around each atom.
- (2) PubChem fingerprint [28]: A binary fingerprint derived from the PubChem Compound database that represents molecular structure through predefined chemical substructures.
- (3) Daylight fingerprint: A descriptor developed by Daylight Chemical Information Systems that encodes chemical features by identifying fragments and substructures within a molecule.
- (4) RDKit fingerprint: The Daylight-like, path-based fingerprint implemented in the RDKit package. Its bit information can be returned as a dictionary with one entry per bit set in the fingerprint: the keys are the bit IDs, and the values are tuples of tuples containing bond indices.
- (5) ESPF fingerprint [29]: An explainable substructure partition fingerprint that captures extended connectivity patterns within a molecule, representing the presence of specific atom types and their surrounding environments.
- (6) ErG fingerprint [30]: A fingerprinting method that uses pharmacophore-type node descriptions to encode the relevant molecular properties.
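
The circular idea behind the Morgan fingerprint can be illustrated with a toy example. The sketch below is a deliberate simplification written in plain Python: it ignores bond orders, charges, and duplicate-substructure removal, so it is not the ECFP algorithm of [27] and not the RDKit implementation. The function name `circular_fingerprint`, the 64-bit length, and the hand-written molecular graphs are all assumptions made for illustration.

```python
import hashlib


def _hash(value):
    """Deterministic integer hash of a string (stable across Python runs)."""
    return int(hashlib.sha256(value.encode()).hexdigest(), 16)


def circular_fingerprint(atoms, bonds, radius=2, n_bits=64):
    """Toy circular fingerprint over a molecular graph.

    atoms: list of element symbols, one per heavy atom.
    bonds: list of (i, j) index pairs; bond orders are ignored here.
    """
    neighbors = {i: [] for i in range(len(atoms))}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    # Radius 0: each atom is identified by its element symbol only.
    ids = [_hash(symbol) for symbol in atoms]
    on_bits = {identifier % n_bits for identifier in ids}

    # Each iteration widens every atom's environment by one bond.
    for _ in range(radius):
        ids = [
            _hash(f"{ids[i]}|" + ",".join(str(x) for x in sorted(ids[j] for j in neighbors[i])))
            for i in range(len(atoms))
        ]
        on_bits.update(identifier % n_bits for identifier in ids)

    # Fold the identifiers into a fixed-length bit vector.
    return [1 if bit in on_bits else 0 for bit in range(n_bits)]


# Ethanol heavy atoms written in two different atom orders.
ethanol_a = circular_fingerprint(["C", "C", "O"], [(0, 1), (1, 2)])
ethanol_b = circular_fingerprint(["O", "C", "C"], [(0, 1), (1, 2)])
print(ethanol_a == ethanol_b)
```

Because each identifier depends only on an atom's own identifier and the sorted identifiers of its neighbors, the resulting bit vector does not depend on the order in which the atoms are listed.
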
2.2.2. Convolutional neural network
- (1) Simple CNN [31]
- (2) CNN-LSTM [33]
- (3) CNN-GRU [33]
2.2.3. Graph neural networks
- (1) GCN [35]
- (2) NeuralFP [36]
- (3) GIN-AttrMasking [37]
- (4) GIN-ContextPred [37]
- (5) AttentiveFP [38]
2.3. Predictor
3. Results
3.1. Evaluation metrics
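
The result tables report accuracy, precision, sensitivity, specificity, and F1 score. A minimal sketch of these metrics, assuming the standard confusion-matrix definitions, is given below. The confusion counts in the example are not reported in the source: they are inferred here so that they agree with the Morgan row of the first results table and with the 178 sweet and 342 nonsweet test compounds.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,  # recall on positives
        "specificity": tn / (tn + fp) if tn + fp else 0.0,  # recall on negatives
        "f1": 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0,
    }


# Inferred (assumed) counts: 151 + 27 = 178 sweet, 285 + 57 = 342 nonsweet.
metrics = classification_metrics(tp=151, fp=57, tn=285, fn=27)
print({name: round(value, 3) for name, value in metrics.items()})
# -> {'accuracy': 0.838, 'precision': 0.726, 'sensitivity': 0.848,
#     'specificity': 0.833, 'f1': 0.782}
```
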
3.2. Comparison of model performance
3.3. Voting/consensus model performance
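
The voting rule used for the consensus models is not stated in the text available here. As an illustration only, the sketch below assumes hard majority voting over binary labels, with ties resolved to the negative class. The function name `majority_vote` and the example predictions are hypothetical.

```python
def majority_vote(predictions):
    """Combine per-model binary predictions by hard majority voting.

    predictions: list of equal-length lists of 0/1 labels, one list per model.
    Assumption: a compound is labeled positive only if more than half of
    the models predict positive, so ties fall to the negative class.
    """
    n_models = len(predictions)
    return [1 if 2 * sum(votes) > n_models else 0 for votes in zip(*predictions)]


# Three hypothetical models scoring three compounds.
model_outputs = [
    [1, 0, 1],
    [1, 1, 0],
    [0, 0, 1],
]
print(majority_vote(model_outputs))  # -> [1, 0, 1]
```
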
3.4. In-silico compound taste database
4. Discussion
Supplementary Materials
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Chandrashekar, J.; Hoon, M.A.; Ryba, N.J.P.; Zuker, C.S. The Receptors and Cells for Mammalian Taste. Nature 2006, 444, 288–294.
2. Drewnowski, A.; Gomez-Carneros, C. Bitter Taste, Phytonutrients, and the Consumer: A Review. Am. J. Clin. Nutr. 2000, 72, 1424–1435.
3. Johnson, R.J.; Segal, M.S.; Sautin, Y.; Nakagawa, T.; Feig, D.I.; Kang, D.-H.; Gersch, M.S.; Benner, S.; Sánchez-Lozada, L.G. Potential Role of Sugar (Fructose) in the Epidemic of Hypertension, Obesity and the Metabolic Syndrome, Diabetes, Kidney Disease, and Cardiovascular Disease. Am. J. Clin. Nutr. 2007, 86, 899–906.
4. Rojas, C.; Ballabio, D.; Pacheco Sarmiento, K.; Pacheco Jaramillo, E.; Mendoza, M.; García, F. ChemTastesDB: A Curated Database of Molecular Tastants. Food Chemistry: Molecular Sciences 2022, 4, 100090.
5. Banerjee, P.; Preissner, R. BitterSweetForest: A Random Forest Based Binary Classifier to Predict Bitterness and Sweetness of Chemical Compounds. Front. Chem. 2018, 6.
6. Goel, M.; Sharma, A.; Chilwal, A.S.; Kumari, S.; Kumar, A.; Bagler, G. Machine Learning Models to Predict Sweetness of Molecules. Comput. Biol. Med. 2023, 152, 106441.
7. Fritz, F.; Preissner, R.; Banerjee, P. VirtualTaste: A Web Server for the Prediction of Organoleptic Properties of Chemical Compounds. Nucleic Acids Res. 2021, 49, W679–W684.
8. Zheng, S.; Chang, W.; Xu, W.; Xu, Y.; Lin, F. E-Sweet: A Machine-Learning Based Platform for the Prediction of Sweetener and Its Relative Sweetness. Front. Chem. 2019, 7, 1–14.
9. Rojas, C.; Todeschini, R.; Ballabio, D.; Mauri, A.; Consonni, V.; Tripaldi, P.; Grisoni, F. A QSTR-Based Expert System to Predict Sweetness of Molecules. Front. Chem. 2017, 5, 53.
10. Zheng, S.; Jiang, M.; Zhao, C.; Zhu, R.; Hu, Z.; Xu, Y.; Lin, F. E-Bitter: Bitterant Prediction by the Consensus Voting From the Machine-Learning Methods. Front. Chem. 2018, 6, 82.
11. Zheng, S.; Chang, W.; Xu, W.; Xu, Y.; Lin, F. E-Sweet: A Machine-Learning Based Platform for the Prediction of Sweetener and Its Relative Sweetness. Front. Chem. 2019, 7, 35.
12. Tuwani, R.; Wadhwa, S.; Bagler, G. BitterSweet: Building Machine Learning Models for Predicting the Bitter and Sweet Taste of Small Molecules. Sci. Rep. 2019, 9, 7155.
13. Bo, W.; Qin, D.; Zheng, X.; Wang, Y.; Ding, B.; Li, Y.; Liang, G. Prediction of Bitterant and Sweetener Using Structure-Taste Relationship Models Based on an Artificial Neural Network. Food Res. Int. 2022, 153, 110974.
14. Lee, I.; Keum, J.; Nam, H. DeepConv-DTI: Prediction of Drug-Target Interactions via Deep Learning with Convolution on Protein Sequences. PLoS Comput. Biol. 2019, 15, e1007129.
15. Xu, L.; Ru, X.; Song, R. Application of Machine Learning for Drug–Target Interaction Prediction. Front. Genet. 2021, 12, 680117.
16. Wen, M.; Zhang, Z.; Niu, S.; Sha, H.; Yang, R.; Yun, Y.; Lu, H. Deep-Learning-Based Drug-Target Interaction Prediction. J. Proteome Res. 2017, 16, 1401–1409.
17. Huang, K.; Fu, T.; Glass, L.M.; Zitnik, M.; Xiao, C.; Sun, J. DeepPurpose: A Deep Learning Library for Drug–Target Interaction Prediction. Bioinformatics 2021, 36, 5545–5547.
18. Ye, Q.; Zhang, X.; Lin, X. Drug–Target Interaction Prediction via Multiple Classification Strategies. BMC Bioinf. 2022, 22, 461.
19. Aldeghi, M.; Coley, C.W. A Graph Representation of Molecular Ensembles for Polymer Property Prediction. Chem. Sci. 2022, 13, 10486–10498.
20. Fang, X.; Liu, L.; Lei, J.; He, D.; Zhang, S.; Zhou, J.; Wang, F.; Wu, H.; Wang, H. Geometry-Enhanced Molecular Representation Learning for Property Prediction. Nat. Mach. Intell. 2022, 4, 127–134.
21. Chen, D.; Gao, K.; Nguyen, D.D.; Chen, X.; Jiang, Y.; Wei, G.-W.; Pan, F. Algebraic Graph-Assisted Bidirectional Transformers for Molecular Property Prediction. Nat. Commun. 2021, 12, 3521.
22. Cai, H.; Zhang, H.; Zhao, D.; Wu, J.; Wang, L. FP-GNN: A Versatile Deep Learning Architecture for Enhanced Molecular Property Prediction. Brief. Bioinform. 2022, 23, bbac408.
23. Yang, Q.; Ji, H.; Lu, H.; Zhang, Z. Prediction of Liquid Chromatographic Retention Time with Graph Neural Networks to Assist in Small Molecule Identification. Anal. Chem. 2021, 93, 2200–2206.
24. Rohani, A.; Mamarabadi, M. Free Alignment Classification of Dikarya Fungi Using Some Machine Learning Methods. Neural Comput. Appl. 2019, 31, 6995–7016.
25. Cui, T.; El Mekkaoui, K.; Reinvall, J.; Havulinna, A.S.; Marttinen, P.; Kaski, S. Gene–Gene Interaction Detection with Deep Learning. Commun. Biol. 2022, 5, 1238.
26. Kim, J.; Park, S.; Min, D.; Kim, W. Comprehensive Survey of Recent Drug Discovery Using Deep Learning. Int. J. Mol. Sci. 2021, 22, 9983.
27. Rogers, D.; Hahn, M. Extended-Connectivity Fingerprints. J. Chem. Inf. Model. 2010, 50, 742–754.
28. Kim, S.; Thiessen, P.A.; Bolton, E.E.; Bryant, S.H. PUG-SOAP and PUG-REST: Web Services for Programmatic Access to Chemical Information in PubChem. Nucleic Acids Res. 2015, 43, W605–W611.
29. Huang, K.; Xiao, C. Explainable Substructure Partition Fingerprint for Protein, Drug, and More.
30. Stiefl, N.; Watson, I.A.; Baumann, K.; Zaliani, A. ErG: 2D Pharmacophore Descriptions for Scaffold Hopping. J. Chem. Inf. Model. 2006, 46, 208–220.
31. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. Commun. ACM 2017, 60, 84–90.
32. Tsubaki, M.; Tomii, K.; Sese, J. Compound–Protein Interaction Prediction with End-to-End Learning of Neural Networks for Graphs and Sequences. Bioinformatics 2019, 35, 309–318.
33. Cho, K.; van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning Phrase Representations Using RNN Encoder–Decoder for Statistical Machine Translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP); Association for Computational Linguistics: Doha, Qatar, October 2014; pp. 1724–1734.
34. Li, M.; Zhou, J.; Hu, J.; Fan, W.; Zhang, Y.; Gu, Y.; Karypis, G. DGL-LifeSci: An Open-Source Toolkit for Deep Learning on Graphs in Life Science. ACS Omega 2021, 6, 27233–27238.
35. Jiang, B.; Zhang, Z.; Lin, D.; Tang, J.; Luo, B. Semi-Supervised Learning With Graph Learning-Convolutional Networks. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); June 2019; pp. 11305–11312.
36. Duvenaud, D.; Maclaurin, D.; Aguilera-Iparraguirre, J.; Gómez-Bombarelli, R.; Hirzel, T.; Aspuru-Guzik, A.; Adams, R.P. Convolutional Networks on Graphs for Learning Molecular Fingerprints. In Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 2; MIT Press: Cambridge, MA, USA, December 7 2015; pp. 2224–2232.
37. Hu, W.; Liu, B.; Gomes, J.; Zitnik, M.; Liang, P.; Pande, V.; Leskovec, J. Strategies for Pre-Training Graph Neural Networks 2020.
38. Xiong, Z.; Wang, D.; Liu, X.; Zhong, F.; Wan, X.; Li, X.; Li, Z.; Luo, X.; Chen, K.; Jiang, H.; et al. Pushing the Boundaries of Molecular Representation for Drug Discovery with the Graph Attention Mechanism. J. Med. Chem. 2020, 63, 8749–8760.



Test-set performance of individual models for sweet/nonsweet classification.

| Model | Accuracy | Precision | Sensitivity | Specificity | F1 score |
|---|---|---|---|---|---|
| Morgan | 0.838 | 0.726 | 0.848 | 0.833 | 0.782 |
| PubChem | 0.833 | 0.714 | 0.854 | 0.822 | 0.777 |
| Daylight | 0.825 | 0.784 | 0.674 | 0.904 | 0.725 |
| RDKit | 0.835 | 0.784 | 0.713 | 0.898 | 0.747 |
| ESPF | 0.800 | 0.715 | 0.691 | 0.857 | 0.703 |
| ErG | 0.840 | 0.757 | 0.787 | 0.868 | 0.771 |
| CNN | 0.813 | 0.701 | 0.792 | 0.825 | 0.744 |
| CNN_GRU | 0.833 | 0.757 | 0.753 | 0.874 | 0.755 |
| CNN_LSTM | 0.821 | 0.797 | 0.640 | 0.915 | 0.710 |
| GCN | 0.869 | 0.796 | 0.831 | 0.889 | 0.813 |
| NeuralFP | 0.869 | 0.799 | 0.826 | 0.892 | 0.812 |
| GIN_AttrMasking | 0.854 | 0.748 | 0.865 | 0.848 | 0.802 |
| GIN_ContextPred | 0.850 | 0.745 | 0.854 | 0.848 | 0.796 |
| AttentiveFP | 0.810 | 0.838 | 0.550 | 0.944 | 0.664 |

Test-set performance of individual models for bitter/nonbitter classification.

| Model | Accuracy | Precision | Sensitivity | Specificity | F1 score |
|---|---|---|---|---|---|
| Morgan | 0.862 | 0.868 | 0.824 | 0.893 | 0.845 |
| PubChem | 0.879 | 0.886 | 0.845 | 0.907 | 0.865 |
| Daylight | 0.837 | 0.821 | 0.824 | 0.847 | 0.823 |
| RDKit | 0.869 | 0.864 | 0.849 | 0.886 | 0.857 |
| ESPF | 0.815 | 0.787 | 0.820 | 0.811 | 0.803 |
| ErG | 0.858 | 0.884 | 0.795 | 0.911 | 0.837 |
| CNN | 0.823 | 0.911 | 0.682 | 0.943 | 0.780 |
| CNN_GRU | 0.825 | 0.898 | 0.699 | 0.932 | 0.786 |
| CNN_LSTM | 0.825 | 0.874 | 0.724 | 0.911 | 0.792 |
| GCN | 0.860 | 0.877 | 0.808 | 0.904 | 0.841 |
| NeuralFP | 0.896 | 0.904 | 0.866 | 0.922 | 0.885 |
| GIN_AttrMasking | 0.831 | 0.883 | 0.728 | 0.918 | 0.798 |
| GIN_ContextPred | 0.838 | 0.923 | 0.707 | 0.950 | 0.801 |
| AttentiveFP | 0.831 | 0.899 | 0.711 | 0.932 | 0.794 |

Test-set performance of individual models for umami/nonumami classification.

| Model | Accuracy | Precision | Sensitivity | Specificity | F1 score |
|---|---|---|---|---|---|
| Morgan | 0.992 | 1.000 | 0.789 | 1.000 | 0.882 |
| PubChem | 0.994 | 0.944 | 0.895 | 0.998 | 0.919 |
| Daylight | 0.985 | 0.762 | 0.842 | 0.990 | 0.800 |
| RDKit | 0.973 | 0.576 | 1.000 | 0.972 | 0.731 |
| ESPF | 0.985 | 0.789 | 0.789 | 0.992 | 0.789 |
| ErG | 0.992 | 0.941 | 0.842 | 0.998 | 0.889 |
| CNN | 0.983 | 0.813 | 0.684 | 0.994 | 0.743 |
| CNN_GRU | 0.992 | 1.000 | 0.789 | 1.000 | 0.882 |
| CNN_LSTM | 0.988 | 0.933 | 0.737 | 0.998 | 0.824 |
| GCN | 0.992 | 1.000 | 0.789 | 1.000 | 0.882 |
| NeuralFP | 0.982 | 0.708 | 0.895 | 0.986 | 0.791 |
| GIN_AttrMasking | 0.975 | 0.615 | 0.842 | 0.980 | 0.711 |
| GIN_ContextPred | 0.983 | 0.727 | 0.842 | 0.988 | 0.780 |
| AttentiveFP | 0.994 | 1.000 | 0.842 | 1.000 | 0.914 |

Test-set performance of consensus models for sweet/nonsweet classification.

| Model | Accuracy | Precision | Sensitivity | Specificity | F1 score |
|---|---|---|---|---|---|
| Consensus FP | 0.883 | 0.855 | 0.792 | 0.930 | 0.822 |
| Consensus CNN | 0.844 | 0.771 | 0.775 | 0.880 | 0.773 |
| Consensus GNN | 0.887 | 0.861 | 0.798 | 0.933 | 0.828 |
| FP + CNN | 0.881 | 0.796 | 0.876 | 0.883 | 0.834 |
| FP + GNN | 0.898 | 0.845 | 0.860 | 0.918 | 0.852 |
| CNN + GNN | 0.879 | 0.844 | 0.792 | 0.924 | 0.817 |
| FP + CNN + GNN | 0.896 | 0.841 | 0.860 | 0.915 | 0.850 |

Test-set performance of consensus models for bitter/nonbitter classification.

| Model | Accuracy | Precision | Sensitivity | Specificity | F1 score |
|---|---|---|---|---|---|
| Consensus FP | 0.883 | 0.884 | 0.858 | 0.904 | 0.870 |
| Consensus CNN | 0.852 | 0.905 | 0.757 | 0.932 | 0.825 |
| Consensus GNN | 0.877 | 0.935 | 0.787 | 0.954 | 0.855 |
| FP + CNN | 0.879 | 0.923 | 0.805 | 0.943 | 0.859 |
| FP + GNN | 0.896 | 0.922 | 0.845 | 0.940 | 0.882 |
| CNN + GNN | 0.879 | 0.956 | 0.791 | 0.954 | 0.857 |
| FP + CNN + GNN | 0.890 | 0.929 | 0.824 | 0.947 | 0.874 |

Test-set performance of consensus models for umami/nonumami classification.

| Model | Accuracy | Precision | Sensitivity | Specificity | F1 score |
|---|---|---|---|---|---|
| Consensus FP | 0.992 | 0.941 | 0.842 | 0.998 | 0.889 |
| Consensus CNN | 0.994 | 1.000 | 0.842 | 1.000 | 0.914 |
| Consensus GNN | 0.994 | 0.944 | 0.895 | 0.998 | 0.919 |
| FP + CNN | 0.992 | 1.000 | 0.789 | 1.000 | 0.882 |
| FP + GNN | 0.992 | 1.000 | 0.789 | 1.000 | 0.882 |
| CNN + GNN | 0.994 | 1.000 | 0.842 | 1.000 | 0.914 |
| FP + CNN + GNN | 0.992 | 1.000 | 0.789 | 1.000 | 0.882 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).