Submitted: 02 October 2025
Posted: 04 October 2025
Abstract

Keywords:
Introduction
Materials and Methods
Data Acquisition and Preprocessing
SARS-CoV-2 Genomic Dataset
Residue-Level Mutation Tracking
Codon Usage Bias Metrics and CUB6 Metric Suite (C6MS)
Shannon Entropy (H)
Relative Synonymous Codon Usage (RSCU) Variance
Effective Number of Codons (ENc)
Codon Adaptation Index (CAI)
tRNA Adaptation Index (tAI)
Codon Volatility Score (CVS)
Codon-Permissiveness Score (CPS) Calculation and Integration
CPS Derivation
Integration with Mutation Frequency
Structural Mapping and Epistasis Analysis
hACE2 Interface Definition
Epistatic Network Construction
Inter-Protein Epistasis Screening
Therapeutic Implications and mAb Design Framework
Introducing L450N and L450D Mutations in SARS-CoV-2 Spike Gene
Statistical Methods and Software
Results
Discovery of the Codon-Permissive Epistatic Backbone (CpEB) via De Novo Codon Analysis
Evolutionary Dynamics Reveal Three Distinct Mutation Regimes
| Position | Shannon Entropy | Interpretation |
|---|---|---|
| 450 | 0.19 | High permissiveness: multiple codons/mutations tolerated |
| 483 | 0.009 | Low but nonzero; dominated by E483V, with 15 unique mutations observed |
| 447 | 0.0007 | Very low; near fixation of N447G (99.998%) |
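
As a rough check on the entropy values above, per-position Shannon entropy can be sketched from amino-acid counts. The sketch assumes base-2 logarithms and uses the position 450 counts from the top-codon and top-amino-acid summary table later in this section (N = 8,287,354; D = 249,095; total matches = 8,537,393). All remaining observations are lumped into a single "other" class. The function name `shannon_entropy` and the lumping step are illustrative assumptions, not the authors' pipeline.

```python
import math

def shannon_entropy(counts):
    """Shannon entropy in bits for a list of non-negative category counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    # Zero-count categories contribute nothing (p * log p -> 0 as p -> 0).
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# Position 450 counts copied from the summary table; "other" is the remainder.
total_450 = 8537393
n_count, d_count = 8287354, 249095
other = total_450 - n_count - d_count  # lumped into one class (assumption)

h_450 = shannon_entropy([n_count, d_count, other])
print(f"H(450) = {h_450:.3f} bits")
```

Under these assumptions the result lands close to the 0.19 reported for position 450. Splitting the "other" class into its individual amino acids could only raise the value, so the lumped figure is a lower bound for this position.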
Intra-Spike Epistasis Confirmed by Statistical and Structural Validation
Generalizability to Emerging Pathogens

| SN | POSITION | AMINO ACID | CODON | COUNT | FREQUENCY IN AA | RSCU | CPS | TOP ACCESSIONS/LAB NUMBERS |
|---|---|---|---|---|---|---|---|---|
| 1 | 447 | L | TTA | 9227604 | 0.9999684 | 2.999905 | 2.999905 | USA/CA-LACPHL-AF03923/2021;USA/CA-CDC-FG-15377... |
| 2 | 447 | L | TTG | 193 | 2.09148E-05 | 0.000063 | 0.000063 | OV116804;England/MILK-2BD98CB/2021;England/MIL... |
| 3 | 447 | L | CTA | 99 | 1.07283E-05 | 0.000032 | 0.000032 | Denmark/DCGC-594637/2022;CMR/CERI-NPHL-K026065... |
| 4 | 447 | S | TCA | 706 | 1 | 1 | 1 | OY248724;OU120491;USA/WI-CDC-LC0101056/2021;US... |
| 5 | 447 | * | TAA | 42 | 0.9545455 | 1.909091 | 1.909091 | USA/UT-UPHL-241004923819/2024;USA/CA-CDPH-2000... |
| 6 | 447 | * | TGA | 2 | 0.04545455 | 0.090909 | 0.090909 | England/PHEC-3G072G79/2021;OY337846 |
| 7 | 447 | H | CAC | 49 | 1 | 1 | 1 | USA/WA-CDC-UW21090714000/2021;USA/WA-UW-210809... |
| 8 | 447 | V | GTA | 8 | 1 | 1 | 1 | OY067593;USA/IN-CDC-STM-000062452/2021;USA/CO-... |
| 9 | 447 | F | TTT | 2 | 0.5 | 1 | 1 | USA/CA-CDC-ASC210541580/2022;England/PHEP-YYNI... |
| 10 | 447 | F | TTC | 2 | 0.5 | 1 | 1 | USA/TN-CDC-ASC210508053/2022;USA/WA-UW-2107237... |
| 11 | 447 | I | ATA | 27 | 1 | 1 | 1 | Japan/SZ-NIG-Y221859/2022;Denmark/DCGC-574365/... |
| 12 | 447 | R | AGA | 1 | 1 | 1 | 1 | England/PHEC-3G06FG55/2021 |
| 13 | 450 | P | CCA | 9219182 | 0.9998956 | 3.999582 | 3.999582 | USA/CA-LACPHL-AF03923/2021;USA/CA-CDC-FG-15377... |
| 14 | 450 | P | CCT | 685 | 7.42938E-05 | 0.000297 | 0.000297 | Denmark/DCGC-258205/2021;USA/NM-CDC-LC0989442/... |
| 15 | 450 | P | CCG | 270 | 2.92837E-05 | 0.000117 | 0.000117 | USA/NY-CDC-FG-188531/2021;USA/CA-CDC-STM-00002... |
| 16 | 450 | P | CCC | 8 | 8.67665E-07 | 0.000003 | 0.000003 | England/MILK-1A72039/2021;OU933865;USA/CA-CDC-... |
| 17 | 450 | S | TCA | 10949 | 0.9997261 | 1.999452 | 1.999452 | USA/FL-CDC-STM-DCW34HKXZ/2021;USA/MA-CDCBI-CRS... |
| 18 | 450 | S | TCT | 3 | 0.000273923 | 0.000548 | 0.000548 | USA/CA-LACPHL-AY05654/2024;USA/MN-CDC-VSX-A176... |
| 19 | 450 | Q | CAA | 595 | 1 | 1 | 1 | OY309622;OY410570;England/MILK-3435BC8/2022;OY... |
| 20 | 450 | T | ACA | 179 | 1 | 1 | 1 | USA/WV126737/2021;USA/CA-CDPH-500125698/2023;U... |
| 21 | 450 | A | GCA | 15 | 1 | 1 | 1 | USA/NC-CDC-ASC210570058/2021;USA/GA-CDC-STM-Q5... |
| 22 | 450 | R | CGA | 8 | 1 | 1 | 1 | USA/WV-CDC-4051246-001/2021;FRA/IHUCOVID-06359... |
| 23 | 483 | N | AAC | 9227520 | 0.9997194 | 1.999439 | 1.999439 | USA/CA-LACPHL-AF03923/2021;USA/CA-CDC-FG-15377... |
| 24 | 483 | N | AAT | 2590 | 0.000280603 | 0.000561 | 0.000561 | USA/NY-PV35290/2021;England/MILK-286CCE2/2021;... |
| 25 | 483 | H | CAC | 289 | 1 | 1 | 1 | USA/NC-CDC-ASC210428372/2021;USA/IN-CDC-LC0008... |
| 26 | 483 | K | AAA | 81 | 0.9529412 | 1.905882 | 1.905882 | OY481541;OY560883;USA/WV-CDC-STM-EKBSE7VCU/202... |
| 27 | 483 | K | AAG | 4 | 0.04705882 | 0.094118 | 0.094118 | USA/CA-CDC-QDX48043622/2023;IMS-11088-CVDP-6B6... |
| 28 | 483 | D | GAC | 131 | 1 | 1 | 1 | OY097783;USA/MN-MDH-37622/2023;USA/MN-MDH-3299... |
| 29 | 483 | S | AGC | 151 | 1 | 1 | 1 | OY044683;USA/DE-DHSS-B1212533/2022;USA/OSPHL05... |
| 30 | 483 | T | ACC | 6 | 0.8571429 | 1.714286 | 1.714286 | USA/AZ-CDC-LC1040884/2023;USA/AZ-CDC-QDX803383... |
| 31 | 483 | T | ACA | 1 | 0.1428571 | 0.285714 | 0.285714 | IMS-10327-CVDP-622EAD5A-48FF-47FE-9A9B-163B348... |
| 32 | 483 | Y | TAC | 12 | 1 | 1 | 1 | USA/TX-HHD-2202029631/2022;OY481984;OY722046;O... |
| 33 | 483 | I | ATC | 3 | 1 | 1 | 1 | USA/NM-CDC-LC0949647/2022;IMS-11088-CVDP-2BC34... |
| 34 | 483 | E | GAA | 1 | 1 | 1 | 1 | England/PHEC-3G06FGDD/2021 |
| 35 | 483 | R | AGA | 2 | 1 | 1 | 1 | IMS-11088-CVDP-DC609B8E-F245-4914-A9FF-4675AD0... |
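
The RSCU column above is consistent with dividing each codon's count by the mean count of the synonymous codons observed for that amino acid at that position (equivalently, the frequency within the amino acid multiplied by the number of observed synonymous codons). The minimal sketch below reproduces the position 447 leucine rows under that assumption. The function name `rscu_observed` is hypothetical, and normalizing by observed rather than all possible synonymous codons is inferred from the tabulated values, not stated by the authors.

```python
def rscu_observed(codon_counts):
    """RSCU per codon, normalized by the synonymous codons actually observed.

    codon_counts maps codon -> count for ONE amino acid at ONE position.
    """
    total = sum(codon_counts.values())
    if total == 0:
        return {codon: 0.0 for codon in codon_counts}
    # Mean count across the observed synonymous codons.
    mean_count = total / len(codon_counts)
    return {codon: count / mean_count for codon, count in codon_counts.items()}

# Leucine codons at position 447, counts copied from the table above.
leu_447 = {"TTA": 9227604, "TTG": 193, "CTA": 99}
for codon, value in rscu_observed(leu_447).items():
    print(codon, round(value, 6))
```

The standard RSCU definition normalizes by the full synonymous family (six codons for leucine), which gives different values whenever some synonymous codons are unobserved at a position.
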

| Position | Total Matches | Top Codon 1 | Count (Codon 1) | Top Codon 2 | Count (Codon 2) | Top AA 1 | Count (AA 1) | Top AA 2 | Count (AA 2) |
|---|---|---|---|---|---|---|---|---|---|
| 417 | 8569189 | AAG | 4480548 | AAT | 4044454 | K | 4480706 | N | 4044879 |
| 446 | 8515299 | GGT | 6691457 | AGT | 1811673 | G | 6691583 | S | 1811724 |
| 447 | 8540308 | GGT | 8540009 | GGC | 125 | G | 8540143 | V | 83 |
| 449 | 8540342 | TAT | 8536950 | TAC | 1102 | Y | 8538052 | N | 815 |
| 450 | 8537393 | AAT | 8286447 | GAT | 249085 | N | 8287354 | D | 249095 |
| 453 | 8555427 | TAT | 8554170 | TTT | 795 | Y | 8554544 | F | 796 |
| 455 | 8546400 | TTG | 8286576 | TCG | 227467 | L | 8287245 | S | 227581 |
| 456 | 8545191 | TTT | 8285000 | CTT | 177236 | F | 8285413 | L | 258454 |
| 475 | 8938045 | GCC | 8914823 | GCT | 11789 | A | 8927257 | V | 9980 |
| 476 | 8936563 | GGT | 8934299 | AGT | 1804 | G | 8934510 | S | 1817 |
| 477 | 8928312 | AAC | 4616646 | AGC | 4277450 | N | 4621974 | S | 4297994 |
| 483 | 8708540 | GTT | 8703086 | TTT | 2624 | V | 8704199 | F | 2626 |
| 484 | 8936676 | GCA | 4335563 | GAA | 4260627 | A | 4337357 | E | 4260912 |
| 486 | 8955710 | TTT | 7215798 | GTT | 1093074 | F | 7216229 | V | 1093104 |
| 487 | 8965131 | AAT | 8961752 | GAT | 2415 | N | 8962456 | D | 2415 |
| 489 | 8959273 | TAC | 8945444 | TAT | 13483 | Y | 8958927 | H | 191 |
| 493 | 8950300 | CAA | 6029269 | CGA | 2825257 | Q | 6029484 | R | 2825435 |
| 494 | 8958293 | TCA | 8938768 | CCA | 16711 | S | 8939818 | P | 16716 |
| 498 | 8842093 | CGA | 4519230 | CAA | 4322081 | R | 4519641 | Q | 4322219 |
| 500 | 8856327 | ACT | 8854444 | ACC | 1165 | T | 8855722 | A | 214 |
| 501 | 8857981 | TAT | 5261026 | AAT | 3592163 | Y | 5261420 | N | 3592755 |
| 502 | 8860900 | GGT | 8860205 | GGC | 348 | G | 8860635 | A | 61 |
| 505 | 8848803 | CAC | 4520230 | TAC | 4326615 | H | 4521399 | Y | 4327315 |

Discussion
Why Codon-Level Analysis Matters
Limitations and Future Directions
Conclusion
Rationale
Rationale: Methodological Foundations and Validation Framework
Data Curation: Ensuring Representativeness and Minimizing Artifacts
Codon Usage Bias Metrics: Capturing Orthogonal Dimensions of Permissiveness
| Metric | Biological Interpretation |
|---|---|
| Shannon Entropy (H) | Measures codon usage diversity: low H = constrained; high H = permissive |
| Relative Synonymous Codon Usage (RSCU) Variance | Quantifies deviation from uniform synonymous codon usage: higher variance = greater plasticity |
| Effective Number of Codons (ENc) | Estimates overall codon bias independent of gene length and amino acid composition: lower ENc = stronger constraint |
| Codon Adaptation Index (CAI) | Reflects translational efficiency relative to highly expressed viral genes: high CAI = selection for speed/accuracy |
| tRNA Adaptation Index (tAI) | Incorporates host (human) tRNA abundance; measures translational compatibility with human cells |
| Codon Volatility Score (CVS) | Predicts propensity for mutation to alter amino acid identity: higher CVS = greater escape potential |
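
To make the CVS row concrete, here is a minimal sketch of one common volatility definition: the fraction of a codon's single-nucleotide neighbors (excluding stop codons) that encode a different amino acid. Whether the CVS used in this study follows exactly this definition is an assumption, and the names `CODON_TABLE` and `codon_volatility` are illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def codon_volatility(codon):
    """Fraction of non-stop single-nucleotide neighbors that change the amino acid."""
    own_aa = CODON_TABLE[codon]
    changed = 0
    considered = 0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            neighbor = codon[:pos] + base + codon[pos + 1:]
            neighbor_aa = CODON_TABLE[neighbor]
            if neighbor_aa == "*":
                continue  # stop-codon neighbors are excluded
            considered += 1
            changed += neighbor_aa != own_aa
    return changed / considered if considered else 0.0

for codon in ("AAT", "GGT", "TTA"):
    print(codon, round(codon_volatility(codon), 3))
```

Under this definition a codon whose third position is fully degenerate (such as GGT for glycine) scores lower than one where most point mutations are nonsynonymous (such as AAT for asparagine).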
Codon-Permissiveness Score (CPS): Integration via PCA with Biological Justification
Why PCA?
Statistical Rigor in Epistasis Detection: Justifying Extreme p-Values
Inter-Protein Epistasis Screening: Confirming Modularity
Therapeutic Design Pipeline: From CpEB to Predictive mAbs
Supplementary Material
Ethical Compliance
Data Availability Statement
Conflict of Interest
Abbreviations/Dictionary Panel
| Abbreviation | Full Name | Biological Interpretation |
|---|---|---|
| RSCU | Relative Synonymous Codon Usage | Measures preference among synonymous codons; higher variance = greater plasticity |
| CAI | Codon Adaptation Index | Reflects translational efficiency relative to highly expressed viral genes |
| CPB | Codon Pair Bias | Quantifies bias in adjacent codon pairs; influences translation speed/fidelity |
| ENc | Effective Number of Codons | Estimates overall codon bias (lower ENc = stronger constraint) |
| PR2 | Parity Rule 2 Asymmetry | Measures strand-specific nucleotide bias (A≠T, G≠C); correlates with mutational pressure |
| GC3 | GC Content at Third Codon Position | Indicates selection for/against GC-rich codons; affects stability and mutation rate |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).