Mapping the non-structural transmembrane proteins of SARS-CoV-2

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the virus responsible for COVID-19, has wreaked havoc on the health and economy of humanity. The disease is also observed in domestic and wild animals, and it has directly or indirectly affected every corner of the planet. Currently, there are no vaccines or effective therapies for COVID-19. SARS-CoV-2 is an enveloped virus with a single-stranded RNA genome of 29.8 kb. More than two-thirds of the genome comprises Orf1ab, which encodes 16 non-structural proteins (nsps), followed by mRNAs encoding the structural proteins spike (S), envelope (E), membrane (M), and nucleocapsid (N). These genes are interspersed with several accessory genes (open reading frames [Orf] 3a, 3b, 6, 7a, 7b, 8, 9b, 9c and 10). The functions of these proteins are of particular interest for understanding the pathogenesis of SARS-CoV-2. Several of the nsps (nsp3, nsp4, nsp6) and Orf3a are transmembrane proteins involved in regulating host immunity and in modifying host cell organelles for viral replication and escape, and are hence considered drug targets. In this paper we report the mapping of the transmembrane structure of the non-structural proteins of SARS-CoV-2.


INTRODUCTION
Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), emerged in late 2019, is responsible for the current pandemic, and is a major public health concern. The disease has resulted in mortality, massive hospitalizations, lockdowns, economic loss, layoffs, and school and college closings, and has affected every country. SARS-CoV-2 infects not only humans but also companion and wild animals. Even after a year, there are no effective medicines or vaccines to protect against the disease (Thomas, 2020). The only effective strategies to decrease the incidence of the disease are social distancing, use of masks, and lockdowns in seriously affected areas.
People with diabetes are at risk of the disease. We do not yet know why the virus was so successful in causing a pandemic within 3 months of its first report (Thomas, 2020). Understanding the structure and function of the proteins of SARS-CoV-2 should aid the development of effective vaccines and drugs against the virus.
The structural proteins of SARS-CoV-2 include the membrane glycoprotein (M), the envelope protein (E), the nucleocapsid protein (N) and the spike protein (S). SARS-CoV-2 contains a 29.8-kb single-stranded RNA genome wrapped in a helical nucleocapsid composed of multiple copies of the N protein, which in turn is surrounded by an envelope containing the S glycoprotein, the M glycoprotein, and the small E protein. The viral gene order is similar to that in other known coronaviruses, with the first two open reading frames (1a and 1b) encoding the viral replicase and the downstream mRNAs encoding the structural proteins. Using bioinformatics, we previously showed that the M protein of SARS-CoV-2 resembles sugar transporters and may be involved in functions favorable to the virus (Thomas, 2020). The non-structural proteins (nsps) of SARS-CoV-2 are involved in inhibiting innate immunity and in inducing virus replication. Two overlapping Orfs, Orf1a and Orf1b, are translated from the positive-strand genomic RNA and generate continuous polypeptides, which are cleaved into a total of 16 nsps. The genes coding for the structural proteins are interspersed with several accessory genes, including open reading frames (Orf) 3a, 3b, 6, 7a, 7b, 8, 9b, 9c and 10. The functions of these proteins are of particular interest for understanding the pathogenesis of SARS-CoV-2 (Yoshimoto, 2020; Gordon et al. 2020).
To evade detection by host innate immune sensors, viruses that replicate in the cytoplasm compartmentalize their genome transcription in organelle-like structures, thereby protecting the virus against host cell defenses and increasing the replication efficiency (Belov and van Kuppeveld, 2012). The transmembrane proteins nsp3, nsp4 and nsp6 are known to rearrange endoplasmic reticulum (ER) membranes, inducing the membrane curvature that is essential for virus replication. One reason for the lack of antiviral therapies is the paucity of knowledge regarding the β-coronavirus-host cell interface (Ghosh et al. 2020). In this paper we provide information on the transmembrane and luminal domains of the nsps of SARS-CoV-2.

Protein modeling
A comprehensive understanding of how protein complexes and networks operate in biological systems requires a detailed description of protein interactions and of the overall quaternary structure. Residue-based diagrams of proteins, also called snake diagrams or protein plots, are 2-D representations of a protein sequence that contain information about properties such as secondary structure (Skrabanek et al. 2003). To generate snake diagram models of the proteins we used Protter (http://wlab.ethz.ch/protter). Protter is an interactive and customizable web-based application that enables the integration and visualization of both annotated and predicted protein sequence features, together with experimental proteomic evidence for peptides and posttranslational modifications, onto the transmembrane topology of a protein. It allows users to choose from numerous annotation sources, integrate their own proteomics data files, select the best-suited peptides for targeted quantitative proteomics applications, and export publication-quality illustrations (Omasits et al. 2014).

Multiple sequence alignments
Multiple sequence alignments (MSAs) are essential in bioinformatics analyses that involve comparing homologous sequences (Thompson et al. 1994). ClustalW2 is a server for MSA that is also used for phylogenetic tree analysis. A sequence alignment between the Orf3a protein of SARS-CoV-2 and that of SARS-CoV was performed using the ClustalW2 server (http://www.ebi.ac.uk/tools/msa/clustalW2/).
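As an illustration of how a percentage figure can be derived from such an alignment, the following Python sketch counts identical residues over the aligned columns in which neither sequence has a gap. It is a minimal sketch under stated assumptions, not the calculation performed by the ClustalW2 server: the function name `percent_identity` and the two toy sequences are hypothetical, and alignment tools differ in the denominator they use and in whether they report identity or similarity.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where neither sequence has a gap.

    Both inputs must already be aligned (equal length, '-' for gaps).
    This is one of several possible definitions of percent identity.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns that contain a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical toy alignment (NOT real Orf3a sequence data).
toy_a = "MDLF-MRIFT"
toy_b = "MDLFAMRFFT"
identity = percent_identity(toy_a, toy_b)
print(f"{identity:.1f}% identity")
```

In practice the two aligned rows would be taken from the alignment file exported by the server, and the result should be compared with the value the server itself reports.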

RESULTS
As yet there are no effective therapies or vaccines for COVID-19. To meet the increasing demand for treatment of COVID-19, novel antiviral drug development needs to be accelerated. Target-based drug development may be a promising approach to achieve this goal (Liu et al. 2021). Identifying molecular targets could lead to the development of medications that protect against SARS-CoV-2.
Coronaviruses, including SARS-CoV-2, replicate in the cytoplasm and compartmentalize their genome transcription in organelle-like structures, thereby protecting the virus against host cell defenses and increasing the replication efficiency (Santerre et al. 2020). We previously mapped the structural proteins of SARS-CoV-2 and demonstrated that the M protein of the virus resembles the sugar transporter SemiSWEET (Thomas, 2020).
The nsps are critical elements of the replication and transcription complex (RTC) and of immune system evasion. By hijacking the endoplasmic reticulum (ER) membrane, nsps help the virus establish the RTC (Santerre et al. 2020). The structure and function of the nsps of SARS-CoV-2 are similar to those of SARS-CoV. In SARS-CoV, the primary structures of three nsps (nsp3, nsp4, and nsp6) contain hydrophobic stretches, and these proteins are predicted to be integral membrane proteins. Hence, they are likely to function in anchoring the replication complexes to the lipid bilayer (Oostra et al. 2007).
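The idea that hydrophobic stretches in a primary structure point to membrane-spanning segments can be illustrated with a classical hydropathy scan. The Python sketch below is an assumption-laden illustration and not the prediction method behind the Protter diagrams in this paper: it averages the Kyte-Doolittle hydropathy scale over a sliding window of 19 residues and flags regions whose average is at or above 1.6, a commonly used heuristic. The function names and the toy sequence are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window along the sequence."""
    # Unknown residues (e.g. 'X') are scored as 0.0; this is an assumption.
    scores = [KD.get(aa, 0.0) for aa in seq.upper()]
    if len(scores) < window:
        return []
    total = sum(scores[:window])
    profile = [total / window]
    for i in range(window, len(scores)):
        total += scores[i] - scores[i - window]  # slide the window by one
        profile.append(total / window)
    return profile


def candidate_tm_segments(seq, window=19, threshold=1.6):
    """Coarse 1-based (start, end) ranges covered by consecutive windows
    whose mean hydropathy is at or above the threshold."""
    segments = []
    start = None
    end = None
    for i, score in enumerate(hydropathy_profile(seq, window)):
        if score >= threshold:
            if start is None:
                start = i
            end = i
        elif start is not None:
            segments.append((start + 1, end + window))
            start = None
    if start is not None:
        segments.append((start + 1, end + window))
    return segments


# Hypothetical toy sequence: a 20-residue leucine stretch with polar flanks.
toy = "DKDKDKDKDK" + "L" * 20 + "DKDKDKDKDK"
print(candidate_tm_segments(toy))
```

Because every window that overlaps the hydrophobic core enough to pass the threshold is included, the reported range is wider than the core itself; dedicated topology predictors refine such boundaries and also assign the orientation of the loops.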
Using bioinformatics, we mapped the transmembrane nsps of SARS-CoV-2. We first mapped Orf1ab, which encodes a polyprotein that is cleaved into 16 nsps (Fig. 1). [...] between the third and fourth transmembrane domains (Fig. 3).
Nsp6 has six transmembrane domains and two small domains in the ER lumen, one between the third and fourth and one between the fifth and sixth transmembrane domains (Fig. 4).
Orf3a of SARS-CoV-2 is also a transmembrane protein. However, it was difficult to model with Protter. The sequence of Orf3a of SARS-CoV-2 is around 73% similar to that of SARS-CoV, as determined by ClustalW (Fig. 5). Hence, we used Orf3a of SARS-CoV as a template to model the snake diagram. Orf3a has three transmembrane domains and a long and a short luminal domain jutting into the ER lumen (Fig. 5).
The transmembrane nsps of SARS-CoV-2 are involved in inhibiting host immunity as well as in increasing the replication efficiency of the virus. In this paper we have mapped the transmembrane nsps of SARS-CoV-2; these could serve as targets to inhibit virus replication.

DISCUSSION
The COVID-19 pandemic caused by the SARS-CoV-2 virus has immobilized the world. It is the most severe pandemic of the twenty-first century. As of the first week of December 2020, the virus had infected 63 million people and caused 1.5 million deaths worldwide. The disease is more severe in older people than in children and young adults. Currently there are no therapies or vaccines for the disease; hence, there is an urgent need to develop them.
Although accessory proteins have been viewed as dispensable for viral replication in vitro, some have been shown to play an important role in virus-host interactions in vivo (Yoshimoto, 2020; Gordon et al. 2020).
The entry of SARS-CoV-2 into cells starts when the spike glycoprotein expressed on the viral envelope binds to the ACE2 receptor of the host cell. The virus enters cells by endocytosis, possibly facilitated by pH-dependent endosomal cysteine proteases (cathepsins). Once inside the cell, SARS-CoV-2 exploits the endogenous transcriptional machinery of the host to replicate and spread.
The virus activates or hijacks the intracellular pathways of the host in favor of its replication (Sureda et al. 2020).
Knowledge of the structure of the structural proteins of SARS-CoV-2 is essential to the development of vaccines. Most of the laboratory-developed vaccines that protect against SARS-CoV-2 are based on the S protein (Gu et al. 2020; van Doremalen et al. 2020; McKay et al. 2020). Several commercial entities are also developing vaccines based on spike mRNA, adenoviral vectors, or recombinant protein.
In a previous paper we showed the structures of the structural proteins of SARS-CoV-2.
In silico analysis showed that the M protein of SARS-CoV-2 resembled the prokaryotic sugar transporter SemiSWEET (Thomas, 2020). The nsps are involved in inhibiting host innate immunity, inducing RNA replication and virus exit. They are potential drug targets.
However, the transmembrane domains of these nsps are not clearly documented. In this paper we report the domains of the nsps that may be targets for potential drug candidates.
The nsps encoded by Orf1a/Orf1ab include the papain-like proteinase (PL proteinase, nsp3) and the 3-chymotrypsin-like proteinase (3CLpro). The PL proteinase of nsp3 cleaves nsps 1 to 3, and 3CLpro cleaves the polyprotein from nsp4 to nsp16. Nsp3 is the largest element of the RTC. In addition to cleaving the polyprotein, nsp3 alters cytokine expression to decrease the host innate immune response. Nsp3, nsp4, and nsp6 form a complex to induce double-membrane vesicles (DMVs) (Rohaim et al. 2020). Until recently it was not understood how newly synthesized genomes and messenger RNAs travel from the sealed replication compartments to the cytosol for translation and for the assembly of progeny virions. Wolff et al. (2020) showed that a molecular pore spans the DMVs and allows the export of newly synthesized viral RNA from the DMVs to the cytosol. They also demonstrated that nsp3, nsp4 and nsp6 are components of the pore. Several nsp3 domains, including the conserved N-terminal ubiquitin-like domain 1 (Ubl1), bind single-stranded RNA and the N protein, which also interacts with viral RNA. The double-membrane-spanning molecular pore may constitute the exit pathway for coronaviral RNA products from the DMV interior toward the cytosol, with the large and multifunctional nsp3 being its central component. In this paper we map the ER luminal domains of nsp3, nsp4 and nsp6 that may be involved in DMV formation.

Replication and transcription of the virus happen within a replication/transcription complex
Interferons (IFNs) are cytokines with strong antiviral activities and are the first line of defense against invading pathogens. Multiple nsps are involved in inhibiting IFN-I production. Nsp6 binds TANK-binding kinase 1 (TBK1) to suppress interferon regulatory factor 3 (IRF3) phosphorylation, nsp13 binds TBK1 and blocks its phosphorylation, and Orf6 binds the importin karyopherin α 2 (KPNA2) to inhibit IRF3 nuclear translocation. SARS-CoV-2 nsp1 and nsp6 suppress IFN-I signaling more efficiently than those of SARS-CoV and Middle East respiratory syndrome coronavirus (MERS-CoV) (Xia et al. 2020).
The Orf3a protein is expressed abundantly in infected and transfected cells, where it localizes to intracellular and plasma membranes (Hassan et al. 2020). Orf3a induces apoptosis of cells, mediated through caspase 3 (Ren et al. 2020). Issa et al. (2020) identified six functional domains (I to VI) in the SARS-CoV-2 Orf3a protein. The functional domains were linked to virulence, infectivity, ion channel formation, and virus release. Orf3a may also be involved in vesicle trafficking (Gordon et al. 2020). The sequence of Orf3a of SARS-CoV-2 is 73% similar to that of SARS-CoV. As it was difficult to model the Orf3a sequence of SARS-CoV-2 with Protter, we used the corresponding sequence of SARS-CoV.
Based on our analysis, Orf3a has three transmembrane domains and a long and a short luminal domain jutting into the ER lumen.
Overall, this paper maps the structure of the nsps that remodel the ER into DMVs so as to promote replication and subsequent exit from the host cell. Understanding how SARS-CoV-2 regulates the host cell to hide in DMVs and replicate may lead to the development of therapies that treat COVID-19 (Santerre et al. 2020).

Fig. 1. The topology of Orf1ab (snake diagram) determined using Protter. Orf1ab is cleaved into 16 nsps. Analysis of Orf1ab shows the three transmembrane non-structural proteins: nsp3, nsp4 and nsp6. The 12 transmembrane domains of the nsps are shown in Figs. 2-4.