Submitted: 12 November 2024
Posted: 14 November 2024
Abstract
The evolutionary timeline of SARS-CoV-2 is examined. We find an approximately linear relationship between the number of mutated sites (x) in the spike protein of a variant and the date on which its first sample was collected worldwide. Combining this linear relationship with the novel strains expected to emerge at a given x allows us to predict when macro-lineages will emerge. We forecast that macro-lineage Q will emerge shortly after the emergence of lineage P.
Keywords:
1. Introduction
2. Materials and Methods
2.1. Materials
2.2. Methods
3. Results
3.1. Each Macro-Lineage Has a Specific Survival Time. The Relationship Between the Number of Mutated Sites and Time t Is a Discontinuous Function
3.2. Linear Regression of the Number of Mutated Sites in a Variant Versus Sample Collection Date as a Good Approximation
3.3. Prediction of the Emergence of the Q Macro-Lineage
4. Discussion
4.1. Why Is the Increase in the Number of Mutated Sites of a Variant Approximately Linear with Respect to Its Emergence Time?
4.2. What Is the Relationship Between the Time Prediction for the Emergence of a Macro-Lineage and the Mutant Prediction in the A-X Model?
5. Conclusions
- The n-distance algorithm, applied in UPGMA, generates a phylogenetic tree of viral evolution based on amino acid mutations in the spike protein. The reconstructed tree aligns closely with established evolutionary data;
- The A-X model is introduced to simulate the generation of new strains on the phylogenetic tree. By combining set A (existing mutated sites) with set X (which includes x randomly generated sites), we can predict the emergence of novel strains. Expanding stochastic sampling to a larger scale reveals statistical patterns governing new strain production. As x increases, the proportions of the four macro-lineages change: lineage O surpasses N first, followed by lineage P surpassing O, and finally, lineage Q emerges;
- A linear regression of the number of mutated sites (NMS) of a variant (i.e., x) against the collection time of its first sample worldwide (i.e., t) provides a good approximation. This linearity arises from the combined effects of stepwise changes in NMS at lineage transitions and differing slopes of NMS versus time in neighboring lineages;
- By integrating the information on novel strain production at a given x from the A-X model and the linear relationship between x and t, we forecast that macro-lineage Q will emerge around February 2025 (when x≈79), and will reach a stage of strong outbreak approximately 23 months later (when x≈108).
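
The last two points can be illustrated with a minimal sketch, assuming an ordinary least-squares fit of NMS against the earliest sample collection date. The eight data points are an illustrative subset of the rows in the table of this paper, and the names `SAMPLES`, `ORIGIN`, `fit_line`, and `date_for_nms` are hypothetical helpers introduced here, not the authors' code. Because the subset and the fitting procedure are assumptions, the dates it prints need not coincide with the forecast stated above.

```python
from datetime import date, timedelta

# (variant, NMS, earliest sample collection date): an illustrative subset of the
# table rows in this paper, NOT the authors' full regression data set.
SAMPLES = [
    ("B.1", 1, date(2020, 1, 15)),
    ("B.1.1.7", 10, date(2020, 5, 14)),
    ("BA.1", 33, date(2021, 1, 27)),
    ("BA.5", 34, date(2021, 12, 9)),
    ("XBB.1.5", 42, date(2022, 6, 12)),
    ("JN.1", 60, date(2023, 1, 13)),
    ("KP.3", 63, date(2024, 1, 7)),
    ("XEC", 65, date(2024, 6, 28)),
]

ORIGIN = date(2020, 1, 1)  # assumed time origin; any fixed date would do


def fit_line(points):
    """Ordinary least squares for x = slope * t + intercept (t in days)."""
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_x = sum(x for _, x in points) / n
    stt = sum((t - mean_t) ** 2 for t, _ in points)
    stx = sum((t - mean_t) * (x - mean_x) for t, x in points)
    slope = stx / stt
    return slope, mean_x - slope * mean_t


def date_for_nms(x, slope, intercept):
    """Invert the fitted line: the date at which the line reaches NMS = x."""
    return ORIGIN + timedelta(days=(x - intercept) / slope)


points = [((d - ORIGIN).days, nms) for _, nms, d in SAMPLES]
slope, intercept = fit_line(points)

print(f"slope: {slope * 365.25:.1f} mutated sites per year")
for x in (79, 108):  # the x values quoted for macro-lineage Q
    print(f"x = {x}: {date_for_nms(x, slope, intercept)}")
```

Inverting the fitted line is what turns a statement about x (which strains can appear at a given number of mutated sites) into a statement about t (when that number of sites is expected to be reached).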
Author Contributions
Funding
Acknowledgements
Conflicts of Interest



| Macro-lineage | Variant | NMS * | ANMS † | Earliest date ‡ | Variant | NMS * | ANMS † | Earliest date ‡ |
|---|---|---|---|---|---|---|---|---|
| N-lineage | B.1 | 1 | 1 | 15 Jan 2020 | B.1.621 | 9 | 40 | 19 Sep 2020 |
| | B.1.177 | 2 | 2 | 7 Mar 2020 | C.37 | 14 | 52 | 8 Nov 2020 |
| | P.2 | 3 | 4 | 15 Apr 2020 | B.1.526 | 4 | 53 | 15 Nov 2020 |
| | B.1.1.7 | 10 | 13 | 14 May 2020 | B.1.525 | 9 | 57 | 11 Dec 2020 |
| | B.1.429 | 4 | 16 | 6 Jul 2020 | P.3 | 7 | 59 | 15 Jan 2021 |
| | B.1.351 | 10 | 23 | 9 Jul 2020 | AZ.2 | 6 | 60 | 5 Feb 2021 |
| | B.1.617.2 | 9 | 29 | 7 Sep 2020 | AV.1 | 10 | 64 | 23 Mar 2021 |
| | P.1 | 12 | 36 | 11 Sep 2020 | B.1.1.529 | 7 | 67 | 15 Apr 2021 |
| | B.1.617.1 | 5 | 37 | 15 Sep 2020 | C.1.2 | 15 | 71 | 11 May 2021 |
| O-lineage | BA.1 | 33 | 87 | 27 Jan 2021 | BN.1.2 | 40 | 106 | 7 Feb 2022 |
| | BA.1.1 | 35 | 88 | 28 Jan 2021 | CH.1.1 | 41 | 106 | 12 May 2022 |
| | BA.2 | 31 | 96 | 25 Mar 2021 | XBB.1.5 | 42 | 111 | 12 Jun 2022 |
| | BA.2.12.1 | 33 | 97 | 28 Sep 2021 | BM.4.1.1 | 39 | 111 | 20 Jul 2022 |
| | BA.2.65 | 31 | 97 | 11 Oct 2021 | CH.1.1.1 | 42 | 112 | 15 Oct 2022 |
| | BA.1.1.15 | 37 | 97 | 27 Nov 2021 | XBB.1.16 | 43 | 113 | 4 Jan 2023 |
| | BA.5 | 34 | 98 | 9 Dec 2021 | EG.1 | 43 | 114 | 16 Jan 2023 |
| | BA.4.1 | 35 | 99 | 14 Dec 2021 | HV.1 | 46 | 115 | 29 Jan 2023 |
| | BQ.1.1 | 37 | 101 | 20 Dec 2021 | HK.3 | 45 | 116 | 29 Jan 2023 |
| | BA.2.75 | 30 | 103 | 31 Dec 2021 | EG.5.1 | 44 | 116 | 31 Jan 2023 |
| | BF.5 | 35 | 104 | 8 Jan 2022 | DV.7.1 | 45 | 117 | 29 May 2023 |
| | BF.7 | 35 | 104 | 24 Jan 2022 | | | | |
| P-lineage | JN.1 | 60 | 132 | 13 Jan 2023 | XDQ.1 | 55 | 139 | 5 Jan 2024 |
| | BA.2.86.1 | 59 | 132 | 17 Jan 2023 | KP.3 | 63 | 139 | 7 Jan 2024 |
| | BA.2.86 | 58 | 132 | 11 Mar 2023 | LB.1 | 64 | 139 | 15 Jan 2024 |
| | JN.2 | 59 | 132 | 22 Jun 2023 | KP.1 | 63 | 140 | 1 Feb 2024 |
| | JN.1.7 | 62 | 134 | 25 Sep 2023 | KS.1 | 58 | 142 | 15 Feb 2024 |
| | JN.1.11.1 | 62 | 135 | 29 Dec 2023 | KP.1.1.3 | 65 | 142 | 23 Feb 2024 |
| | KP.3.1.1 | 64 | 136 | 1 Jan 2024 | XDV.1 | 56 | 142 | 26 Feb 2024 |
| | KP.2 | 59 | 136 | 2 Jan 2024 | LP.1 | 66 | 143 | 22 Apr 2024 |
| | JN.1.37 | 61 | 137 | 3 Jan 2024 | XED | 64 | 144 | 19 Jun 2024 |
| | XEB | 61 | 138 | 3 Jan 2024 | XEC | 65 | 145 | 28 Jun 2024 |
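
The conclusions attribute the overall linearity to two effects: stepwise changes in NMS at lineage transitions and differing slopes of NMS versus time within each lineage. The sketch below shows one way these two quantities could be read off the table above. It assumes a separate least-squares fit per macro-lineage and defines the step as the gap between the lowest NMS of one lineage and the highest NMS of the preceding one; both choices, the five-row subset per lineage, and the names `ROWS`, `slope_per_year`, and `step` are assumptions made for illustration, not the authors' procedure.

```python
from datetime import date

# Illustrative subset of the table rows, (NMS, earliest sample collection date),
# grouped by macro-lineage. This is NOT the full table.
ROWS = {
    "N": [(1, date(2020, 1, 15)), (10, date(2020, 5, 14)), (9, date(2020, 9, 7)),
          (14, date(2020, 11, 8)), (15, date(2021, 5, 11))],
    "O": [(33, date(2021, 1, 27)), (31, date(2021, 3, 25)), (34, date(2021, 12, 9)),
          (42, date(2022, 6, 12)), (46, date(2023, 1, 29))],
    "P": [(60, date(2023, 1, 13)), (58, date(2023, 3, 11)), (59, date(2024, 1, 2)),
          (64, date(2024, 1, 15)), (65, date(2024, 6, 28))],
}


def slope_per_year(rows):
    """Least-squares slope of NMS against time, in mutated sites per year."""
    t0 = min(d for _, d in rows)
    ts = [(d - t0).days / 365.25 for _, d in rows]
    xs = [x for x, _ in rows]
    mean_t = sum(ts) / len(ts)
    mean_x = sum(xs) / len(xs)
    stx = sum((t - mean_t) * (x - mean_x) for t, x in zip(ts, xs))
    stt = sum((t - mean_t) ** 2 for t in ts)
    return stx / stt


def step(prev_rows, next_rows):
    """Gap between the lowest NMS of the next lineage and the highest of the previous."""
    return min(x for x, _ in next_rows) - max(x for x, _ in prev_rows)


slopes = {name: slope_per_year(rows) for name, rows in ROWS.items()}
steps = {"N->O": step(ROWS["N"], ROWS["O"]), "O->P": step(ROWS["O"], ROWS["P"])}

for name, s in slopes.items():
    print(f"{name}-lineage: {s:.1f} mutated sites per year")
for name, gap in steps.items():
    print(f"{name}: step of {gap} mutated sites")
```

Comparing the within-lineage slopes with the size of the steps gives a rough sense of how much of the overall trend comes from gradual accumulation and how much from jumps between macro-lineages.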

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).