Submitted: 28 April 2026
Posted: 29 April 2026
Abstract
Keywords:
1. Introduction
2. Materials and Methods
2.1. The ChronobioticsDB Database
2.2. Molecular Representation and Model Architecture
2.3. Validation Protocols and Metrics
2.4. Optimisation and Loss Functions
2.5. Virtual Screening of the SAVI Library
3. Results
3.1. Evaluation of Baseline Predictive Capability (DEM)
3.2. Robustness Under Scaffold Validation
3.3. Comparison of Feature-Based and End-to-End Approaches
3.4. Influence of Effect-Labelling Variants
3.5. Results of Virtual Screening of the SAVI Library
3.5.1. Predicted Pharmacological Classes
3.5.2. Predicted Action Effects
3.5.3. Predicted Molecular Targets
3.5.4. Structural Features of the Top Candidates
| No. | SAVI ID | SMILES | Consensus Score |
|---|---|---|---|
| 1 | 1A275D1E3DD3524A_059EC9AC131E1D6F_2201_UN | CC1(C)COC2=C(C3=C(N=C12)C4=C(S3)NC(C(=C4)C(=O)O)=O)C5=CC=CC=C5 | 0.95093 |
| 2 | 8D3182EB37F23078_F37BCE2DDB9434D2_2201_UN | CCOC1=CC3=C(C=C1)OCC4=C(C)C2=C(C=CC(=C2)[N](=O)=O)N=C34 | 0.94279 |
| 3 | 37D4039641B3BC96_C51990CEE4ECD988_2201_UN | ClC1=C(C=CC=C1)C2=C(C4=C(N=C2C3CCOC3)N=CC=C4)C5=CC(=CC=C5)Cl | 0.93457 |
| 4 | D0AE13FD123CDC8C_7435451788DFFA67_2201_UN | ClC1=C(C=CC=C1)C3=C(C2=CC(=C(OC)C=C2N=C3C4=CSC=C4)OC)C | 0.93363 |
| 5 | 4A662B2114895A5C_17C8243AD7AA0E1B_2201_UN | ClC1=C(C=CC=C1)C2=C(C4=C(N=C2C3CCOC3)N=CC=C4)C5=CC(=CC(=C5)F)F | 0.93169 |
| 6 | 48D2262B63F9A9ED_AC5CCD9004562092_2201_UN | ClC1=C(C=CC=C1)C2=CC4=C(N=C2C3CCOC3)C=C(C=N4)F | 0.92927 |
| 7 | FCAE9B9C9F85925C_C93F992927D663B2_2201_UN | CC1=NC3=C(C(=C1CCCC=C)C2=CC=C(OC)C=C2)C4=C(S3)CCC4 | 0.92651 |
| 8 | 3DC78391E8CFE6FE_04382ADD4A0EE0BF_2201_DP | OC2C1=C(C3=C(N=C1C2(C)C)C4=C(S3)NC(C(=C4)C(=O)O)=O)C5=CC=CC=C5 | 0.92577 |
| 9 | F0EEAB259F5604BB_0784D55EEB07C643_2201_UN | CC2(C)COC3=C(C1=CC=C(C)C=C1)C4=C(N=C23)SC5=C4CCC(C5)(C)C | 0.92511 |
| 10 | DD91AFA3FD38EC01_EA8D819DAFE7BBFF_2201_UN | COC1=CC4=C(C=C1)C3=NC2=C(C=C(C=C2)Cl)C(=C3CC4)C5=C(C=CC=C5)F | 0.92311 |
| 11 | 7333F032F376DB51_EEF15F3C4B182B7F_2201_UN | ClC1=C(C=CC=C1)C3=C(C)C2=C(C=C(C=C2)F)N=C3C4CCOC4 | 0.92239 |
| 12 | 1B8724E4F079AF13_D8F1E5D17DCFE5AF_2201_UN | CCOC(=O)C2CCN(CC1=CC=CC=C1)CCC3=C(C4=C(N=C23)C=C(C=C4)Cl)C5COCCC5 | 0.92089 |
| 13 | D70D62A960D63F7C_332F7499CFA77E1C_2201_UN | ClC1=C(C=CC=C1)C3=C(C2=C(OC)C=CC=C2N=C3C4CCOC4)C | 0.91971 |
| 14 | 26B5CA1C071C41C7_E875983BE719FFF7_2201_UN | C2=NC1=C(C=CC=C1)C(=C2CC3=CON=C3)C[S](=O)(=O)N | 0.91788 |
| 15 | 3EA1FB1AFF1DA556_3973C278384A9C44_2201_UN | CC1=NC2=C(C(=C1CCCC=C)C(C)C)C3=C(S2)C(CC(C3)(C)C)(C)C | 0.91660 |
| 16 | 301F30FFC04DC940_6A2991726B857875_2201_UN | CCOC1=CC2=C(C=C1)OCC3=C(C4=C(N=C23)SC5=C4CCC5)C6=CN=CC=C6 | 0.91629 |
| 17 | AD6DC4D914DEF26E_1EB6E3F7DECCFB7A_2201_UN | CC2=NC1=C(C=NC=C1)C(=C2CCCC=C)C(F)(F)F | 0.91437 |
| 18 | C838FE0623C5B07D_D9320B3361EB2866_2201_UN | C3=NC1=C(C2=C(S1)CCCC2)C(=C3CC4=CON=C4)C5=CC(=CC=C5)F | 0.91286 |
| 19 | 4220E144E1F5944B_FF7C2A854CEED719_2201_UN | CCOC1=CC3=C(C=C1)OCC4=C(C)C2=C(C=CC(=C2)O)N=C34 | 0.91137 |
| 20 | B7C81F203CC1F6C5_F5508E7A730D0588_2201_UN | ClC1=C(C=CC=C1)C2=C(C4=C(N=C2C3CCOC3)C=CC(=C4)F)C5=CC=CC=C5 | 0.91109 |
| 21 | B88882A3FAA95561_2A5FE6C4841D3564_2201_DP | OC2C1=CC4=C(N=C1C23CCC3)C(=NC=C4)Br | 0.90852 |
| 22 | 5A8108D267EFBBF6_F1DF654F4F5B1E47_2201_DP | OC2C1=C5C4=C(N=C1C23CCC3)C(=CC(=C4C(C6=C5C=CC=C6)=O)Br)C(=O)O | 0.90731 |
| 23 | FF5E13C995B41E36_B5B6E33BB7C03FDF_2201_UN | COC1=CC5=C(C=C1)C4=NC2=C(C3=C(S2)CCC3)C(=C4CC5)C6=CC=NC=C6 | 0.90707 |
| 24 | 2E6BF2F0D8FA29A0_30D34167389A43FD_2201_DP | OC2C1=CC3=C(N=C1C2(C)C)N=CC=C3OC(F)(F)F | 0.90643 |
| 25 | 8779633B5BFF22F4_75EA71CFD58A7E89_2201_UN | CCOC1=CC3=C(C=C1)OCC4=C(C)C2=C(C=CC(=C2)Br)N=C34 | 0.90493 |
| 26 | C1941DBDD6103EBA_A0692FCDD1DF39C3_2201_UN | ClC1=C(C=CC=C1)C2=C(C4=C(N=C2C3CCOC3)C=CC=C4)C5=CC=CC=C5 | 0.90455 |
| 27 | F866369954281749_B36D81BEB5370B6F_2201_UN | CC2(C)COC3=C(C1=CC(=C(C)C=C1)F)C4=C(N=C23)SC5=C4CCC5 | 0.90379 |
| 28 | 2FBF185F383D3EDA_5F2563289E30B338_2201_UN | COC1=CC5=C(C=C1)C4=NC2=C(C3=C(S2)CCC3)C(=C4CC5)C6=CC(=CC=C6)Cl | 0.90329 |
| 29 | F9919567E86AD30F_4D6275FD2B17E549_2201_UN | CCOC(=O)C2CCN(CC1=CC=CC=C1)CCC3=C(C4=C(N=C23)C=C(C=C4)Cl)C5COCC5 | 0.90323 |
| 30 | 4BD3AF3106A1D416_F8C04917CCC7A808_2201_UN | CC(C)(C)C3=NC2=C([N]1CCCCC1=N2)C(=C3CC4=CC=CC=C4)C5=CC=C(C=C5)Cl | 0.90284 |
| 31 | D1CE029DC385F1BF_37224ED6F642E061_2201_DP | OC2C1=CC4=C(N=C1C23CCC3)N=CC(=N4)Br | 0.90280 |
| 32 | 57EA6E9327C0702B_E91BC4B810E319EB_2201_UN | C1=NC3=C(C(=C1CC2=CON=C2)C)SC4=C3C(=CC(=N4)C)C | 0.90236 |
| 33 | 26F6C2711940E9B9_CF3B95CDD8C70C89_2201_UN | CC(C)(C)C1=NC3=C(C(=C1CC2=CC=CC=C2)C(C)=C)C=CC=C3 | 0.90236 |
| 34 | 94EB09F58F06CB7C_9E67FB133423B583_2201_DP | OC4C3=C(C)C1=C(C2=C(S1)N=C(C)C=C2C)N=C3C4(C)C | 0.90148 |
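
The selection rule behind this table can be sketched in a few lines of Python. For illustration only, the sketch assumes that the consensus score is the unweighted mean of the four models' predicted probabilities and that super-candidates are the molecules scoring above 0.90. The averaging rule, the helper names and the `mol_A`/`mol_B`/`mol_C` values are hypothetical and are not taken from the screening pipeline.

```python
from statistics import mean


def consensus_score(model_scores):
    """Assumed consensus rule: unweighted mean of the per-model probabilities."""
    if not model_scores:
        raise ValueError("need at least one model score")
    return mean(model_scores)


def select_super_candidates(scores_by_id, threshold=0.90):
    """Return (id, consensus) pairs scoring above `threshold`, highest first."""
    ranked = sorted(
        ((mol_id, consensus_score(s)) for mol_id, s in scores_by_id.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    return [(mol_id, c) for mol_id, c in ranked if c > threshold]


# Hypothetical probabilities from four models for three placeholder molecules
example = {
    "mol_A": [0.97, 0.95, 0.93, 0.96],
    "mol_B": [0.91, 0.88, 0.94, 0.89],
    "mol_C": [0.70, 0.65, 0.72, 0.68],
}
for mol_id, score in select_super_candidates(example):
    print(mol_id, round(score, 4))
```

A weighted mean or a rank-based aggregate would be equally plausible consensus rules. Swapping one in only requires changing `consensus_score`.
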

3.5.5. Characterisation of the 34 Super-Candidates
4. Discussion
4.1. The Influence of Small Samples on Model Stability
4.2. Scaffold Validation and Generalisation Ability
4.3. The Role of Data Curation in Prediction Quality
4.4. Interpretation of the SAVI Screening Results
4.5. Comparison with Results in Chemoinformatics
5. Conclusion
Supplementary Materials
Funding
Authorship
Compliance with Ethical Standards
Conflicts of Interest
References

| Task | DEM ROC-AUC Micro | Repeats |
|---|---|---|
| classf | 0.571 | 20 |
| mechanism | 0.567 | 20 |
| target | 0.552 | 20 |
| effect | 0.521 | 20 |
| effect_coarse | 0.529 | 20 |
| effect_expert | 0.520 | 20 |
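
The micro-averaged ROC-AUC in this table pools every (molecule, label) pair into a single binary ranking problem, so frequent labels carry more weight than rare ones; a value of 0.5 corresponds to random ranking. The minimal sketch below computes the metric from scratch through its pairwise (Mann–Whitney) interpretation. The toy label matrix is hypothetical, and a production pipeline would more likely call a library routine such as scikit-learn's `roc_auc_score` with `average="micro"`.

```python
def binary_roc_auc(y_true, y_score):
    """ROC-AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def micro_roc_auc(Y_true, Y_score):
    """Micro-average: flatten the (molecule x label) matrices into one binary problem."""
    flat_true = [t for row in Y_true for t in row]
    flat_score = [s for row in Y_score for s in row]
    return binary_roc_auc(flat_true, flat_score)


# Hypothetical 3-molecule x 3-label example
Y_true = [[1, 0, 0], [0, 1, 0], [1, 0, 1]]
Y_score = [[0.9, 0.2, 0.1], [0.3, 0.6, 0.4], [0.2, 0.5, 0.7]]
print(micro_roc_auc(Y_true, Y_score))  # 16.5 of 20 positive/negative pairs ranked correctly -> 0.825
```

The pairwise loop is quadratic in the number of pairs and is meant only to make the definition explicit; sorting-based implementations give the same value faster.
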

| Protocol | F1 Micro | F1 Macro | ROC-AUC Micro | 95% CI Low | 95% CI High |
|---|---|---|---|---|---|
| single_split | 0.287 | 0.264 | 0.567 | — | — |
| repeated_5×5 | 0.319 | 0.254 | 0.587 | 0.299 | 0.340 |
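
The repeated_5×5 protocol can be read as five independent shuffles of the data, each split into five folds, giving 25 train/test evaluations whose scores are summarised by a mean and a 95% confidence interval. The interval method used for the table is not specified in this section, so the sketch below assumes a Student-t interval over the 25 fold scores (critical value 2.064 for 24 degrees of freedom); the seed handling and the example scores are likewise hypothetical.

```python
import random
from statistics import mean, stdev

# Two-sided 95% Student-t critical value for 24 degrees of freedom (25 scores).
T_CRIT_24 = 2.064


def repeated_kfold(n_samples, n_splits=5, n_repeats=5, seed=0):
    """Yield (train, test) index lists for n_repeats independent shuffles of k-fold CV."""
    for r in range(n_repeats):
        idx = list(range(n_samples))
        random.Random(seed + r).shuffle(idx)  # a fresh, reproducible shuffle per repeat
        folds = [idx[k::n_splits] for k in range(n_splits)]
        for k in range(n_splits):
            train = [i for j, fold in enumerate(folds) if j != k for i in fold]
            yield train, folds[k]


def mean_ci95(scores):
    """Mean and t-based 95% CI; assumes exactly 25 fold scores."""
    if len(scores) != 25:
        raise ValueError("T_CRIT_24 is only valid for 25 scores (5 repeats x 5 folds)")
    m = mean(scores)
    half_width = T_CRIT_24 * stdev(scores) / len(scores) ** 0.5
    return m, m - half_width, m + half_width


splits = list(repeated_kfold(n_samples=100))
# Hypothetical per-fold F1-micro scores, one per train/test split.
fold_scores = [0.30, 0.31, 0.32, 0.33, 0.34] * 5
print(len(splits), mean_ci95(fold_scores))
```

Because the 25 evaluations share overlapping training data, their scores are not independent, and a naive interval of this kind tends to be narrower than the true uncertainty.
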

| Configuration | F1 Micro | F1 Macro | ROC-AUC Micro |
|---|---|---|---|
| feature_based_pro | 0.304 | 0.232 | 0.597 |
| end2end_single | 0.223 | 0.133 | 0.470 |
| end2end_sweep_best | 0.376 | 0.144 | 0.666 |

| Variant | F1 Micro | F1 Macro | ROC-AUC Micro | Number of Labels |
|---|---|---|---|---|
| effect_raw | 0.500 | 0.148 | 0.832 | 6 |
| effect_coarse | 0.304 | 0.232 | 0.597 | 9 |
| effect_expert | 0.296 | 0.254 | 0.584 | 10 |
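
The gap between micro- and macro-averaged F1 (for example 0.500 versus 0.148 for effect_raw) is consistent with a label distribution in which a few frequent labels are predicted well and rare labels are mostly missed. Micro-F1 pools true positives, false positives and false negatives across all labels before computing one score, whereas macro-F1 averages the per-label scores with equal weight. The toy example below, with hypothetical labels and predictions, reproduces the pattern; scoring a label with no positives and no predictions as 0 is an assumed convention.

```python
def f1_from_counts(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0 (assumed convention)."""
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom


def micro_macro_f1(Y_true, Y_pred):
    """Return (micro-F1, macro-F1) for binary multi-label matrices (molecules x labels)."""
    n_labels = len(Y_true[0])
    per_label = []
    for j in range(n_labels):
        pairs = [(t[j], p[j]) for t, p in zip(Y_true, Y_pred)]
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        per_label.append((tp, fp, fn))
    # Micro: sum the counts over labels first, then compute a single F1.
    tp_sum, fp_sum, fn_sum = (sum(c[i] for c in per_label) for i in range(3))
    micro = f1_from_counts(tp_sum, fp_sum, fn_sum)
    # Macro: compute F1 per label, then average with equal weight.
    macro = sum(f1_from_counts(*c) for c in per_label) / n_labels
    return micro, macro


# Hypothetical data: label 0 is frequent and always predicted; labels 1 and 2 are rare and never predicted.
Y_true = [[1, 0, 0], [1, 0, 0], [1, 1, 0], [1, 0, 1]]
Y_pred = [[1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0]]
print(micro_macro_f1(Y_true, Y_pred))  # (0.8, 0.3333333333333333)
```
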

| Consensus Score Range | Number of Molecules | Interpretation |
|---|---|---|
| > 0.90 | 34 | Super-candidates (highest consensus confidence across all four models) |
| 0.80–0.90 | 401 | Very strong candidates |
| 0.70–0.80 | 1130 | Strong candidates |
| 0.60–0.70 | 2132 | Moderate candidates |
| 0.50–0.60 | 2299 | Weak candidates |
| < 0.50 | 4004 | Remainder of the top 10,000 |
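
Assigning each molecule to one of these bands is a simple binning step. The table does not state its boundary convention, so this sketch assumes left-closed intervals: a score of exactly 0.80 falls in the 0.80–0.90 band, and a score of exactly 0.90 is treated as a super-candidate (hence the `>= 0.90` label in the code). It also checks that the band counts reported in the table add up to the top 10,000 molecules.

```python
from bisect import bisect_right
from collections import Counter

# Band edges and labels from the score-range table; intervals are assumed left-closed, [lo, hi).
EDGES = [0.50, 0.60, 0.70, 0.80, 0.90]
LABELS = ["< 0.50", "0.50-0.60", "0.60-0.70", "0.70-0.80", "0.80-0.90", ">= 0.90"]

# Counts reported in the table; they should account for all of the top 10,000 molecules.
TABLE_COUNTS = {">= 0.90": 34, "0.80-0.90": 401, "0.70-0.80": 1130,
                "0.60-0.70": 2132, "0.50-0.60": 2299, "< 0.50": 4004}
assert sum(TABLE_COUNTS.values()) == 10_000


def score_band(score):
    """Map a consensus score to its band label."""
    return LABELS[bisect_right(EDGES, score)]


def band_counts(scores):
    """Count molecules per band, listing empty bands as 0."""
    counts = Counter(score_band(s) for s in scores)
    return {label: counts.get(label, 0) for label in LABELS}


# Hypothetical consensus scores, including values that sit exactly on band edges
print(band_counts([0.95, 0.91, 0.90, 0.85, 0.75, 0.65, 0.55, 0.50, 0.45]))
```
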

| Effect | Proportion | Potential Medical Application |
|---|---|---|
| Phase shift | 37.00% | Correction of jet lag and shift-work disruption |
| Circadian restoration | 31.55% | Neurodegeneration, oncology |
| Sleep modulation | 31.45% | Insomnia, disturbances of sleep architecture |

| Target | Proportion | Role in the Circadian Rhythm |
|---|---|---|
| CLOCK-BMAL1 | 60.49% | Principal activator of the circadian cycle |
| CRY1–PER2 | 28.02% | Negative regulator of CLOCK-BMAL1 |
| Arntl (Bmal1) | 6.72% | Genetic regulation of BMAL1 |
| Melatonin receptor | 1.85% | Melatonin signal transduction |
| Other | 2.92% | Auxiliary targets |
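
Proportions like those in the effect and target tables can be obtained by taking, for each molecule, its highest-ranked predicted label, counting how often each label occurs and dividing by the number of molecules. The sketch below assumes exactly one top label per molecule, and the `label_a`/`label_b` predictions are hypothetical placeholders. Under that assumption each table partitions the molecules, so the sketch also checks that the reported proportions sum to 100%.

```python
from collections import Counter

# Proportions (%) reported in the effect and target tables.
EFFECT_SHARES = {"Phase shift": 37.00, "Circadian restoration": 31.55, "Sleep modulation": 31.45}
TARGET_SHARES = {"CLOCK-BMAL1": 60.49, "CRY1-PER2": 28.02, "Arntl (Bmal1)": 6.72,
                 "Melatonin receptor": 1.85, "Other": 2.92}


def label_shares(top_labels):
    """Percentage of molecules whose top-ranked prediction is each label (one label per molecule assumed)."""
    if not top_labels:
        raise ValueError("no predictions to summarise")
    n = len(top_labels)
    return {label: round(100 * c / n, 2) for label, c in Counter(top_labels).most_common()}


# With one label per molecule, each table's proportions should sum to 100% (up to rounding).
for table in (EFFECT_SHARES, TARGET_SHARES):
    assert abs(sum(table.values()) - 100.0) < 0.01

# Hypothetical top-label predictions for four molecules
print(label_shares(["label_a", "label_a", "label_a", "label_b"]))  # {'label_a': 75.0, 'label_b': 25.0}
```
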

| Structural Feature | Frequency | Chemical Significance |
|---|---|---|
| Aromatic rings | 100% (30/30) | Binding within the active sites of proteins |
| Imine bond N=C | 77% (23/30) | Planar structure, π–π stacking |
| Chlorine substituent (Cl) | 43% (13/30) | Increased lipophilicity |
| Sulfur (thiophene/thiazole) | 40% (12/30) | Pharmacophoric heterocycles |
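
Frequencies of this kind can be approximated directly from the SMILES strings. The sketch below uses crude regular-expression patterns on Kekulé SMILES, as written in the candidate table, to flag chlorine, sulfur and explicitly double-bonded nitrogen. It is a text-level proxy and not necessarily the procedure behind the table: text patterns cannot reliably detect aromatic rings, and lowercase aromatic atoms (for example `s` in thiophene) would be missed. A substructure search with RDKit (`Chem.MolFromSmiles` and `Chem.MolFromSmarts`, then `HasSubstructMatch`) would be the more robust choice.

```python
import re

# Crude text patterns for Kekulé SMILES (uppercase atoms, explicit double bonds), as in the candidate table.
PATTERNS = {
    "chlorine": re.compile(r"Cl"),
    # Uppercase S not followed by a lowercase letter, which excludes Si, Se, Sn, Sb, ...
    "sulfur": re.compile(r"S(?![a-z])"),
    # Any explicitly written double bond to nitrogen, including pyridine-type N in Kekulé form.
    "double_bonded_N": re.compile(r"N=|=N"),
}


def feature_frequencies(smiles_list):
    """Return {feature: (number of molecules matching, total molecules)}."""
    n = len(smiles_list)
    return {name: (sum(1 for smi in smiles_list if pattern.search(smi)), n)
            for name, pattern in PATTERNS.items()}


# The first four super-candidates from the candidate table
top4 = [
    "CC1(C)COC2=C(C3=C(N=C12)C4=C(S3)NC(C(=C4)C(=O)O)=O)C5=CC=CC=C5",
    "CCOC1=CC3=C(C=C1)OCC4=C(C)C2=C(C=CC(=C2)[N](=O)=O)N=C34",
    "ClC1=C(C=CC=C1)C2=C(C4=C(N=C2C3CCOC3)N=CC=C4)C5=CC(=CC=C5)Cl",
    "ClC1=C(C=CC=C1)C3=C(C2=CC(=C(OC)C=C2N=C3C4=CSC=C4)OC)C",
]
print(feature_frequencies(top4))
# {'chlorine': (2, 4), 'sulfur': (2, 4), 'double_bonded_N': (4, 4)}
```
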

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).