Evolutionary insights into the envelope protein of SARS-CoV-2

M. Shaminur Rahman, M. Nazmul Hoque, M. Rafiul Islam, Israt Islam, Israt Dilruba Mishu, Md. Mizanur Rahaman, Munawar Sultana, M. Anwar Hossain
Department of Microbiology, University of Dhaka, Dhaka-1000, Bangladesh
Department of Gynecology, Obstetrics and Reproductive Health, Bangabandhu Sheikh Mujibur Rahman Agricultural University, Gazipur-1706, Bangladesh
Present address: Vice-Chancellor, Jessore University of Science and Technology, Jessore 7408, Bangladesh


The Study
SARS-CoV-2, the etiologic agent of COVID-19, has affected the entire world and created a public health emergency since December 2019 1,2. The inherently high mutation rate of the SARS-CoV-2 genome has already produced many descendants of the original Wuhan strain, thereby facilitating escape from host immune responses [3][4][5][6]. The SARS-CoV-2 genome encodes four major structural proteins, namely the spike (S), nucleocapsid (N), membrane (M), and small envelope (E) proteins, all of which are required to complete the viral replication cycle, including entry, assembly, packaging, and release of new virus particles within human cells [8][9][10][11]. The E protein is the smallest of the major structural proteins and is associated with viral assembly, budding, envelope formation, and pathogenesis 7. During the replication cycle, the virus expresses the E protein in high abundance inside the host cell; however, only a small portion is incorporated into the virion envelope. The protein carries out its functions by interacting with the membrane (M) protein, with accessory proteins such as ORF3a and ORF7a, and with host cell proteins 9,10. The ongoing rapid transmission and global spread of COVID-19 have raised the question of whether the evolution and adaptation of SARS-CoV-2 are driven by synonymous mutations, deletions, and/or replacements 5,12,13. Although the mutational spectra of the other structural proteins (S, M, and N) of SARS-CoV-2 have been reported by several research groups within a short period of time 5,6,13-15, the available literature on nucleotide- and amino acid (aa)-level mutations of the E protein is still limited.
To comprehensively analyze the mutational spectra of the E protein of SARS-CoV-2 as a continuation of coronavirus genomic mutational research 5,6,15,16, we retrieved 83,607 complete or near-complete SARS-CoV-2 genome sequences (human host) from the Global Initiative on Sharing All Influenza Data (GISAID) (https://www.gisaid.org/), originating from 159 countries or territories and available up to 20 August 2020 (Supplementary Data 1). We obtained 81,818 cleaned sequences (97.86%) after removing low-quality sequences. Multiple sequence alignment was performed in MAFFT using the Wuhan strain as the reference (NCBI accession no. NC_045512) 17, and nonsynonymous mutations were retrieved using previously reported methods 5,6,15. We also observed worldwide mutational variation within the primer and probe binding sites of the SARS-CoV-2 E gene (Table 1, Supplementary Data 1). In total, 74 nucleotide mutations fell within the binding sites of the primers and probes recommended by several research groups 18-20 for E gene-targeted, PCR-based detection of SARS-CoV-2 (Table 1). The forward primer of the Charité (Germany) set had 15 mismatches within the primer in viral strains from the USA, England, India, Scotland, and Wales, and one USA strain showed a 3´-end mismatch. The reverse primer and the probe of the same set exhibited 14 and 17 mismatches, respectively. No strain showed a 3´-end mismatch with the reverse primer, while one strain from the Netherlands showed a 5´-end mismatch in the probe of this primer set (Table 1). Overall, these findings warrant continuous monitoring and updating of primer-probe sequences based on regional viral genomic sequences for efficient and accurate detection of COVID-19.
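As an illustration of the kind of primer-site check described above, the following minimal Python sketch counts mismatches between an oligonucleotide and an aligned target site and flags mismatches at the 5´ or 3´ end. It is not the pipeline used in this study: the function names and the toy sequences are hypothetical and chosen only for demonstration, and the sketch assumes the target site has already been extracted from the alignment at the same length and in the same orientation as the oligonucleotide (for a reverse primer, after reverse-complementing the genomic site).

```python
# Illustrative sketch only; names and toy sequences are hypothetical,
# not the primers or the software used in the study.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def primer_mismatches(primer, site):
    """List mismatches as (position, primer_base, site_base).

    Positions are 1-based and counted from the 5' end of the primer.
    Both strings must have the same length and orientation.
    """
    if len(primer) != len(site):
        raise ValueError("primer and site must have the same length")
    pairs = zip(primer.upper(), site.upper())
    return [(i, p, s) for i, (p, s) in enumerate(pairs, start=1) if p != s]


def summarize_site(primer, site):
    """Count mismatches and flag those at the terminal 5' or 3' base."""
    mismatches = primer_mismatches(primer, site)
    positions = {pos for pos, _, _ in mismatches}
    return {
        "count": len(mismatches),
        "five_prime_end": 1 in positions,
        "three_prime_end": len(primer) in positions,
    }


if __name__ == "__main__":
    toy_primer = "ACGTACGTAC"            # hypothetical 10-nt primer
    toy_sites = {
        "strain_A": "ACGTACGTAC",        # perfect match
        "strain_B": "ACGTACGTAT",        # mismatch at the 3' terminal base
        "strain_C": "TCGTACGTAC",        # mismatch at the 5' terminal base
    }
    for name, site in toy_sites.items():
        print(name, summarize_site(toy_primer, site))
```

A 3´-end mismatch is flagged separately because it sits at the base from which the polymerase extends, so it is generally the mismatch of greatest concern for assay sensitivity.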
The primary structure of the E protein contains 75 aa, of which 84.0% (63/75) of sites underwent a total of 115 unique aa mutations (Fig. 1). Our analysis showed that 35 sites in the E protein underwent more than one aa mutation; among them, aa positions 5 and 72 had 4 and 6 different aa variants, respectively (Table 2). Comparing mutations at the individual strain level, we found that S68F was the most frequent mutation (250 strains), followed by L73F, R69I, and P71L in 100, 88, and 59 strains, respectively. The N-terminal domain, transmembrane domain (TMD), and C-terminal domain of the E protein had 7, 25, and 31 sites with aa substitutions, respectively (Fig. 1). Several earlier studies 5,8 also reported aa mutations at 10 sites (aa positions 26, 36, 37, 39, 46, 58, 68, 71, 72, and 73) of the E protein, corroborating our findings.
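The tallies reported above (unique aa mutations, mutated sites, and per-mutation strain counts) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method used in this study: the function names are hypothetical, the protein sequences are assumed to be already aligned to the reference at equal length, and the 10-residue toy reference and samples are invented for demonstration. The final lines only re-check the arithmetic stated in the text (7 + 25 + 31 = 63 sites; 63/75 = 84.0%).

```python
# Illustrative sketch only; names and toy sequences are hypothetical.
from collections import Counter


def call_substitutions(reference, sample):
    """Return substitutions such as 'S68F' for two aligned protein strings.

    Gaps ('-') and ambiguous residues ('X') are skipped, not called.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    subs = []
    for pos, (ref_aa, alt_aa) in enumerate(zip(reference, sample), start=1):
        if ref_aa != alt_aa and ref_aa != "-" and alt_aa not in "-X":
            subs.append(f"{ref_aa}{pos}{alt_aa}")
    return subs


def summarize_substitutions(reference, samples):
    """Tally strains per substitution, mutated sites, and fraction of sites."""
    counts = Counter()
    for sample in samples:
        counts.update(call_substitutions(reference, sample))
    sites = {int(name[1:-1]) for name in counts}   # 'V5L' -> 5
    return counts, sites, len(sites) / len(reference)


if __name__ == "__main__":
    toy_reference = "MYSFVSEETG"      # hypothetical 10-aa reference
    toy_samples = [
        "MYSFVSEETG",                 # identical to the reference
        "MYSFLSEETG",                 # V5L
        "MYSFLSEETG",                 # V5L again (second strain)
        "MYSFISEKTG",                 # V5I and E8K
    ]
    counts, sites, fraction = summarize_substitutions(toy_reference, toy_samples)
    print(counts.most_common())       # strains per substitution
    print(sorted(sites), fraction)    # mutated positions and their fraction

    # Arithmetic check of the figures quoted in the text.
    domain_sites = {"N-terminal": 7, "TMD": 25, "C-terminal": 31}
    total_sites = sum(domain_sites.values())
    print(total_sites, round(100 * total_sites / 75, 1))
```

Counting unique substitution names (the keys of `counts`) separately from mutated positions (`sites`) mirrors the distinction drawn in the text between 115 unique aa mutations and 63 mutated sites, since a single site can carry several different substitutions.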