Genetic and Epigenetic Variations of SARS-CoV-2 and Host Genes and Their Impact on the COVID-19 Pandemic

The COVID-19 pandemic continues to evolve and is now on the verge of a third wave, possibly driven by novel variants of concern of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Both the test positivity rate (TPR) and the case fatality rate (CFR) increased steeply in the second wave of COVID-19 compared with the first. In Kerala, a state in southern India, for example, the TPR rose from 1.33% at the peak of the first wave (10 June 2020) to 13.45% on 10 June 2021, during the second wave. SARS-CoV-2 is an enveloped, single-stranded RNA virus. Its spike protein attaches to angiotensin-converting enzyme 2 (ACE-2), a transmembrane surface protein present on multiple cell types in the human body. Genetic variations in SARS-CoV-2 and in the ACE-2 receptor can affect transmission, clinical manifestations, mortality, and the efficacy of drugs and vaccines for COVID-19. Mutations are the primary source of genetic variation. Given the high TPR and CFR, it is necessary to understand, at the molecular level, the variations in SARS-CoV-2 and in its cellular receptors. In this review, we summarize the impact of genetic and epigenetic variations on COVID-19 pathogenesis and disease outcome.

Middle East respiratory syndrome (MERS) spread to humans in the Arabian Peninsula [14]. The virus was named MERS coronavirus (MERS-CoV), and the CFR was 20% [15][16][17]. MERS-CoV was able to alter the host immune response [18]. In 2017, the World Health Organization (WHO) included SARS and MERS in its priority pathogen list. In December 2019, COVID-19 was first identified in Wuhan, Hubei Province, China, as a pneumonia caused by a novel coronavirus (2019-nCoV), termed novel coronavirus-infected pneumonia (NCIP) [19]. In this initial stage, the disease was characterised by an incubation period of 5.2 days and an estimated reproduction number of 2.2 [1]. The genomic sequence of the virus became available in early 2020 from samples isolated from patients with pneumonia in Wuhan [20,21]. SARS-CoV-2 shares greater amino acid homology with SARS-CoV than with MERS-CoV [20,22].
On January 30, 2020, WHO declared the novel coronavirus pneumonia epidemic caused by SARS-CoV-2 a public health emergency of international concern [23,24]. On February 11, 2020, the International Committee on Taxonomy of Viruses (ICTV) renamed 2019-nCoV as severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) [25]. Following SARS-CoV and MERS-CoV, SARS-CoV-2 is the seventh coronavirus known to cause disease in humans [26].
Although SARS-CoV-2 has a lower CFR than SARS-CoV and MERS-CoV, its highly contagious nature, with an estimated basic reproduction number (R0) of 2 to 6.47, makes COVID-19 a major public health concern [27][28][29]. The successive emergence of SARS-CoV-2 variants with highly variable pathogenicity and transmission potential makes studying the genetic diversity and evolution of SARS-CoV-2 an immediate priority [30][31][32]. COVID-19 disease dynamics, measured in terms of CFR, TPR, and R0, evolve continuously over time and differ between geographical locations [33,34,5]. These differences over time and space may be closely linked to rapidly evolving genetic variations in SARS-CoV-2 interacting with the diverse genetic backgrounds of hosts in different regions [35][36][37]. Ribosomal frameshifting in coronaviruses enables the virus to adapt to the host cell [38].
Bats are considered the reservoir of coronaviruses [39,40]. Wild and domestic animals act as intermediate hosts and facilitate mutation and recombination, which enhances the genetic diversity of these viruses [41]. The coronavirus genome comprises two large open reading frames (ORF1a and ORF1b), four structural proteins, and several accessory proteins [42]. Based on ORF1a and ORF1b, coronaviruses are divided into four groups: alpha and beta, which infect mammals, and gamma and delta, which infect mainly birds [43]. Inside the host cell, SARS-CoV-2 forms short- and long-range RNA-RNA interactomes that facilitate interactions between viral and host RNA [44]. RNA interactome analysis of SARS-CoV-2 identified 17 host and viral proteins with antiviral activity and 9 proviral host factors hijacked by the virus [45]. These RNA-RNA interactions, mediated by RNA-binding proteins (RBPs), enable SARS-CoV-2 to evade the host immune barrier. The seven human coronaviruses (SARS-CoV, MERS-CoV, SARS-CoV-2, HCoV-229E, HCoV-OC43, HCoV-NL63, and HCoV-HKU1) differ mainly in their accessory proteins. SARS-CoV-2 has one of the largest genomes among RNA viruses. The reference genome of SARS-CoV-2 (isolate Wuhan-Hu-1; accession number NC_045512) is a 29,903-nucleotide single-stranded RNA. The National Center for Biotechnology Information (NCBI) holds 630,559 Sequence Read Archive (SRA) runs and 570,941 nucleotide records of the SARS-CoV-2 genome (accessed on 07-06-2021). Variants are designated by comparing the genetic sequence of an emerging SARS-CoV-2 virus with that of the reference genome [46,47].
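Designating a variant relative to the reference genome amounts to walking an alignment position by position and recording each difference in the conventional reference-residue, position, alternative-residue notation (for example, D614G). The sketch below assumes the query has already been aligned to the reference by a separate step, that the reference contains no gaps so positions can be counted directly, and that '-' in the query marks a deleted residue. `call_mutations` is a hypothetical helper, not part of any named variant-calling tool; real pipelines also handle insertions, nucleotide-to-codon translation, and sequencing ambiguity.

```python
from typing import List


def call_mutations(reference: str, query: str, start: int = 1) -> List[str]:
    """List differences between an aligned query and the reference.

    Assumes both sequences are pre-aligned to equal length, the reference
    contains no gap characters, and '-' in the query marks a deletion.
    Positions are numbered in reference coordinates beginning at `start`.
    """
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    if "-" in reference:
        raise ValueError("insertions relative to the reference are not handled")
    mutations = []
    for offset, (ref_res, query_res) in enumerate(zip(reference, query)):
        if query_res == ref_res:
            continue
        position = start + offset
        if query_res == "-":
            mutations.append(f"Δ{ref_res}{position}")  # deletion, e.g. ΔY144
        else:
            mutations.append(f"{ref_res}{position}{query_res}")  # substitution, e.g. N501Y
    return mutations


# Toy protein fragments (not real spike sequence), numbered from residue 1.
toy_mutations = call_mutations("MKTAYIA", "MKTGY-A")  # -> ['A4G', 'ΔI6']
```

Passing `start` lets the same helper number a fragment in full-length coordinates, so a one-residue alignment of D against G with `start=614` yields `D614G`.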

Spike protein in SARS-CoV-2
Spike proteins enable the virus to bind to host cells and thus determine virus-host interactions [48,49]. In SARS-CoV-2, the region most susceptible to mutation is the spike (S) protein [50]. The spike protein consists of two subunits: S1 (the receptor-binding subunit) and S2 (the membrane fusion subunit). S1 is highly variable, whereas S2 is comparatively conserved. S1 provides the attachment site that enables the virus to bind to the ACE-2 receptor on the host cell surface [51]. Once the virus has bound to the host cell, S2 mediates fusion of the virus with the host cell membrane. The spike protein has an N-terminal domain (NTD), a C-terminal domain (CTD), and a receptor-binding domain (RBD). The RBD contains a receptor-binding motif (residues 437-508) whose residues bind the host receptor. Sites involved in host-cell binding and/or host immune evasion are potential mutation hotspots in SARS-CoV-2.

Host cellular receptors
Host immune competence is key in determining COVID-19 transmissibility and pathogenicity: immunity that prevents infection wanes earlier than immunity that alleviates disease, and one projected outcome is a SARS-CoV-2 variant that may cause acute disease in children [52]. On this reasoning, the authors of [52] suggest that, once COVID-19 becomes endemic, SARS-CoV-2 may lose virulence and cause only a common cold, except in children. Host cellular receptors differ among coronaviruses [53]. For SARS-CoV and SARS-CoV-2, the RBD binds to ACE-2 receptors on human cells. For other human coronaviruses, such as HCoV-HKU1 and HCoV-OC43, the NTD recognizes sugar derivatives on the cell surface, while HCoV-229E uses its RBD to bind aminopeptidase N (APN) on human cells. The mouse hepatitis coronavirus, by contrast, uses its NTD to bind carcinoembryonic antigen-related cell adhesion molecule 1 (CEACAM1), a host protein on the cell surface. In MERS-CoV, the viral RBD binds dipeptidyl peptidase 4 (DPP4) on the host cell surface.

SARS-CoV-2 variants
Commonly circulating, locally evolved variants of SARS-CoV-2 are thought to originate from recurrent changes in structural motifs specific to coronavirus lineages [54]. Such variants have also been demonstrated in animal models [55]. On the Italian island of Sardinia, four distinct viral clusters, each characterised by amino acid substitutions, were observed [56]. The probability of mutation varies widely across the S protein and is particularly high in the NTD and RBD, because these regions are structurally flexible, functionally unique, and important for virus-host cell binding. Of the two, the RBD is more prone to mutation. Whereas the SARS-CoV-2 RBD binds to ACE-2 receptors on the surface of human cells, the binding target of the NTD is not yet known. Mutations in the S protein affect the infectivity of SARS-CoV-2; for example, the D614G mutation in the spike protein increased viral infectivity [57,58].
The SARS-CoV-2 variant identified in southeast England in September 2020, known as B.1.1.7 or 501Y.V1, carries eight mutations in the spike gene in addition to D614G, including two deletions in the NTD (ΔH69/ΔV70 and ΔY144), one substitution in the RBD (N501Y), and one substitution close to the furin cleavage site (P681H). The second variant, identified in the Eastern Cape, South Africa, is known as B.1.351 or 501Y.V2. In addition to D614G, it carries nine spike mutations, including mutations in the NTD (R246I and Δ242-Δ244), three substitutions in the RBD (K417N, E484K, and N501Y), and one substitution close to the furin cleavage site (A701V). The third variant, known as P.1 or 501Y.V3, was identified in Brazil and carries three substitutions in the RBD (K417T, E484K, and N501Y). These mutations in the new variants of SARS-CoV-2 could reduce the effectiveness of monoclonal antibody treatment and the protection provided by vaccines [59].
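The mutation names above follow a compact notation: reference residue, position, new residue (N501Y), or Δ plus residue and position for a deletion (ΔY144), with slash-joined deletions such as ΔH69/ΔV70. The Python sketch below parses that notation and flags positions inside the receptor-binding motif, taking the 437-508 range given earlier in this review as an assumption. The helper names are hypothetical, and range notation such as Δ242-Δ244 is deliberately not handled.

```python
import re
from typing import List, NamedTuple, Optional

# Receptor-binding motif boundaries as stated in the review (residues 437-508).
RBM_START, RBM_END = 437, 508

SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")  # e.g. N501Y
DELETION = re.compile(r"^Δ([A-Z])(\d+)$")            # e.g. ΔY144


class SpikeMutation(NamedTuple):
    ref: str
    position: int
    alt: Optional[str]  # None marks a deletion

    @property
    def in_rbm(self) -> bool:
        return RBM_START <= self.position <= RBM_END

    def label(self) -> str:
        if self.alt is None:
            return f"Δ{self.ref}{self.position}"
        return f"{self.ref}{self.position}{self.alt}"


def parse_mutation(token: str) -> SpikeMutation:
    """Parse one substitution (N501Y) or single-residue deletion (ΔY144)."""
    token = token.strip()
    match = SUBSTITUTION.match(token)
    if match:
        ref, pos, alt = match.groups()
        return SpikeMutation(ref, int(pos), alt)
    match = DELETION.match(token)
    if match:
        ref, pos = match.groups()
        return SpikeMutation(ref, int(pos), None)
    raise ValueError(f"unrecognised mutation notation: {token!r}")


def parse_profile(notation: str) -> List[SpikeMutation]:
    """Parse a comma-separated list; slash-joined items such as ΔH69/ΔV70 are split."""
    return [
        parse_mutation(token)
        for item in notation.split(",")
        for token in item.split("/")
        if token.strip()
    ]


# B.1.1.7 spike mutations named in the text (a subset of its full profile).
b117 = parse_profile("ΔH69/ΔV70, ΔY144, N501Y, P681H, D614G")
print([m.label() for m in b117 if m.in_rbm])  # ['N501Y']
```

Applied to the B.1.351 RBD substitutions (K417N, E484K, N501Y), only E484K and N501Y fall inside the 437-508 range, which illustrates that "in the RBD" and "in the receptor-binding motif" are not the same claim.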

ACE-2 receptor protein
Genetic variability in host receptor proteins also influences the susceptibility, pathogenicity, and progression of viral diseases [60]. Because the ACE-2 receptor protein is the entry point and attachment site of SARS-CoV-2, genetic variability in ACE-2 is considered a risk factor for COVID-19. ACE-2 variants associated with other conditions, such as hypertension and cardiovascular disease, are also considered candidate markers for studying how ACE-2 genetic variation affects COVID-19 susceptibility.
The ACE-2 gene is located on chromosome Xp22 and has 22 exons. ACE-2 expression varies between individuals and, within an individual, is organ-specific. Human, bat (Chinese horseshoe bat), pig, and civet ACE-2 bind the SARS-CoV spike protein, whereas mouse ACE-2 does not. Three S protein trimers bind with two ACE-2 dimers, initiating entry of the virus into the host cell. Three amino acid residues in ACE-2 are specifically involved in SARS-CoV-2 binding. ACE-2 mutations that affect circulating ACE-2 levels have been identified, and these alleles may alter SARS-CoV-2 binding to host cells and, in turn, COVID-19 pathogenicity. For example, the ACE-2 variant rs2106809 was found to be associated with circulating ACE-2 levels. ACE-2 expression also varies with age, sex, and ethnicity. Correspondingly, COVID-19 severity varies among ethnic groups such as Asians, Africans, and Caucasians. Much of this variation has been explained by differences in the allele frequencies of expression quantitative trait loci (eQTLs) across populations. Angiotensin receptor blockers have been suggested as treatments for COVID-19 [61].

Other host proteins
Pattern recognition receptors (PRRs) are proteins that control the transcription of inflammatory genes and regulate intracellular signaling, and are thus involved in virus-induced inflammatory responses. The major PRRs in humans are intracellular DNA sensors, RIG-I-like receptors (RLRs), C-type lectin receptors (CLRs), Toll-like receptors (TLRs), and NOD-like receptors (NLRs). PRRs play an important role in the innate immune response because they recognize pathogen-associated molecular patterns (PAMPs) and damage-associated molecular patterns (DAMPs, molecules released from damaged cells). Overexpression of PRRs, particularly TLR4, is associated with cytokine storm-mediated morbidity and mortality. The age-related variation in COVID-19 severity has also been attributed to differential expression of PRRs. PRRs therefore play a critical role in mediating SARS-CoV-2-host cell interactions, particularly in individuals with comorbidities such as diabetes, obesity, and cardiac disease.
Studies show that epigenetic modifications at the transcriptional, post-transcriptional, and post-translational levels influence COVID-19 pathogenesis [62]. Both human and viral microRNAs (miRNAs) act as intermediaries in viral attachment and in subsequent virus-host interactions. Histone modification and decreased DNA methylation of the ACE-2 gene upregulate ACE-2 expression. Factors such as tissue type, sex, and age affect ACE-2 DNA methylation and can therefore alter ACE-2 expression. Gene expression profiling has revealed distinct inflammatory cytokine profiles in COVID-19 patients, with important cytokines such as CCL4/MIP-1B, CCL2/MCP-1, CCL3/MIP-1A, and CXCL10/IP-10 differentially expressed. Virus-stimulated activation of apoptosis and the p53 signaling pathway has been proposed as the cause of lymphopenia in COVID-19 [63].
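Statements that a cytokine is "differentially expressed" usually rest on a fold change between patient and control expression, often reported on a log2 scale. The Python sketch below computes a log2 fold change with a pseudocount and flags genes at |log2FC| ≥ 1. The gene labels, expression values, pseudocount, and threshold are all hypothetical, and real expression studies add replicate-level statistical testing and multiple-testing correction that this sketch omits.

```python
import math
from typing import Dict, Tuple


def log2_fold_change(patient_mean: float, control_mean: float,
                     pseudocount: float = 1.0) -> float:
    """log2((patient + pseudocount) / (control + pseudocount)); the pseudocount avoids log(0)."""
    return math.log2((patient_mean + pseudocount) / (control_mean + pseudocount))


def flag_changed(expression: Dict[str, Tuple[float, float]],
                 threshold: float = 1.0) -> Dict[str, float]:
    """Return genes whose |log2 fold change| meets the threshold."""
    changes = {gene: log2_fold_change(p, c) for gene, (p, c) in expression.items()}
    return {gene: lfc for gene, lfc in changes.items() if abs(lfc) >= threshold}


# Hypothetical mean expression as (patients, controls); invented for illustration only.
toy_expression = {
    "CYTOKINE_A": (255.0, 15.0),
    "CYTOKINE_B": (63.0, 31.0),
    "CYTOKINE_C": (7.0, 15.0),
    "CYTOKINE_D": (20.0, 19.0),
}
print(flag_changed(toy_expression))  # {'CYTOKINE_A': 4.0, 'CYTOKINE_B': 1.0, 'CYTOKINE_C': -1.0}
```

The log2 scale makes up- and down-regulation symmetric: a doubling scores +1 and a halving scores -1, so a single absolute threshold catches both directions.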

Significance of studies on genetic and epigenetic variations
In the rapidly spreading phase of COVID-19, it is very important to identify any sequence variability in circulating viruses and to detect the emergence of new variants [64]. It is also important to relate current preventive and therapeutic measures to the genetics and genomics of COVID-19 [65,22]. For example, Pachetti et al. (2020) evaluated the impact of lockdown on viral mutations and COVID-19 CFR in several countries in Europe and North America. They found that mutations in SARS-CoV-2 tended to stabilise at four genomic regions, whereas new nonsynonymous mutations were seen in samples from Sweden, where a soft lockdown was implemented [66].
Viral surveillance through genomic sequencing helps to characterise mutations in the virus genome and to formulate control and therapeutic strategies. Escape variants (mutated viruses that cause disease in vaccinated or recovered individuals) should also be monitored promptly. Exploring genetic variation in host cell receptors may also help to elucidate the mechanisms by which SARS-CoV-2 binds differentially to host cells, causing differences in susceptibility and pathogenicity among individuals and populations [67].

Epigenetic changes
COVID-19 has been reported to manifest disproportionately in older people. A set of 315 miRNAs common to age-related signaling and COVID-19, showing differential abundance and activity, has been identified, and their activity is reported to be lower in old age and in comorbid conditions [68]. Circular RNAs (circRNAs, formed by covalent joining of the 3′ and 5′ ends of an RNA) and long non-coding RNAs (lncRNAs, longer than 200 nucleotides), which regulate cell division, cell death, immune responses, and signaling pathways, are involved in COVID-19 pathogenesis. Wu et al. (2020) identified 114 circRNAs and 10 lncRNAs associated with exosomes in COVID-19 patients. Exosomes are vesicles 40-100 nm in size that mediate virus-host cell interactions and influence cellular responses to viral diseases. These circRNAs and lncRNAs have been proposed as biomarkers of COVID-19 severity [69].

Impact on clinical manifestations
The role of deletions in immune evasion, and their significance for the evolutionary trajectory of SARS-CoV-2 toward becoming an endemic virus, has been studied [70]. Deletions in the ORF7, ORF8, and ORF10 regions found in Bangladesh, and deletions in or near the polybasic furin cleavage site of the spike protein, have been associated with reduced virulence [71]. Three HLA alleles (HLA-A*11, -C*01, and -DQB1*04) in the Spanish population and HLA-DRB1*08 in the Italian population correlated with COVID-19 mortality [72]. Variants in cytokine and immune-related genes such as IL1B, IL1R1, IL1RN, IL6, IL17A, FCGR2A, and TNF could be related to disease susceptibility, cytokine storm, and other COVID-19 complications [72].
Several variants in ACE2 and TMPRSS2 that affect the expression of these virus entry-related proteins have been associated with COVID-19 susceptibility and risk.
Germline variants in UNC13D and AP3B1, two typical hemophagocytic lymphohistiocytosis-related genes, were found to be associated with the development of severe cytokine storm and fatal outcomes in COVID-19 [73]. Deaths during the first phase of the epidemic were found to be associated with the L84S mutation in ORF8 (a protein involved in immune evasion) and with two mutations in the NSP13 helicase (P504L and Y541C) [71]. Genome-wide association studies found associations between COVID-19 severity and the 3p21.31 chromosome region (LZTFL1, SLC6A20, CCR9, FYCO1, CXCR6, and XCR1) and the 9q34.2 region (ABO) [74]. The 501Y.V2 variant was associated with in-hospital mortality in South Africa that was 20% higher in the second wave than in the first. The B.1.1.7 variant was associated with a higher risk of death than pre-existing variants in the United Kingdom [75].

Impact on COVID-19 tests
Rapid and accurate COVID-19 tests are crucial for diagnosis, treatment, prevention, contact tracing, epidemiological characterization, and public health decision making. The major concern regarding COVID-19 diagnostic tests in relation to new variants is the potential failure of RT-PCR tests. New variants can affect the performance of a test if a mutation lies in a region of the genome targeted by that test. A high-frequency mutation at the 3′ (right) end of a primer or probe binding site in a target may produce more false-negative results. The impact of new variants on test sensitivity depends on the genomic sequence of the variant, the test design, and the prevalence of the variant in the tested population. Tests based on multiple genomic regions may be less affected by new variants than tests that rely on detecting a single region. The risk of false-negative results may increase when testing patients infected with the B.1.1.7 variant, which carries a double deletion at amino acid positions 69 and 70 of the spike protein, using assays such as the TaqPath COVID-19 Combo Kit and the Linea COVID-19 Assay Kit. Failure to detect the spike gene target was observed with the 501Y.V1 variant in the UK [76]. Relying on a single target for RT-PCR detection of SARS-CoV-2 infection is therefore not recommended. New variants should also be reliably recognized by the most widely used commercial and in-house tests. Novel genomic variants appearing during the ongoing pandemic represent an important diagnostic issue that needs continued monitoring [77]. PCR assays designed specifically for variant surveillance would help fill many gaps in knowledge about variant distribution and frequency.
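Two points in this paragraph lend themselves to a small illustration: a mutation near the 3′ end of a primer is the most damaging to amplification, and a multi-target assay can still call a sample positive when one target drops out, which is how S-gene target failure surfaced for 501Y.V1. The Python sketch below models both under stated assumptions: the 5-base 3′ window, the Ct cutoff of 37, the two-target rule, and the target names ORF1ab, N, and S (chosen to mirror a three-gene design) are illustrative choices, not the interpretation algorithm of the TaqPath, Linea, or any other kit.

```python
from typing import Dict, Optional


def near_primer_3prime_end(site_start: int, site_end: int, mutation_pos: int,
                           strand: str = "+", window: int = 5) -> bool:
    """True if a mutation lies within `window` bases of a primer's 3' end.

    `site_start`/`site_end` are inclusive genome coordinates of the binding
    site. A forward (+) primer extends toward higher coordinates, so its 3'
    end is `site_end`; a reverse (-) primer's 3' end is `site_start`.
    """
    if not site_start <= mutation_pos <= site_end:
        return False
    three_prime = site_end if strand == "+" else site_start
    return abs(mutation_pos - three_prime) < window


def interpret_multitarget(ct_values: Dict[str, Optional[float]],
                          ct_cutoff: float = 37.0) -> str:
    """Hypothetical call rule for a multi-target RT-PCR run.

    A target counts as detected when its Ct is present and <= ct_cutoff.
    Two or more detected targets give a positive call; if the S target is
    part of the panel but undetected, the positive is flagged as S-gene dropout.
    """
    detected = {t for t, ct in ct_values.items() if ct is not None and ct <= ct_cutoff}
    if not detected:
        return "negative"
    if len(detected) < 2:
        return "inconclusive"
    if "S" in ct_values and "S" not in detected:
        return "positive (S-gene target failure)"
    return "positive"


print(interpret_multitarget({"ORF1ab": 24.1, "N": 23.8, "S": None}))  # positive (S-gene target failure)
print(near_primer_3prime_end(1000, 1020, 1018, strand="+"))  # True
```

A sample with only one detected target returns "inconclusive" rather than "positive", in keeping with the point above that relying on a single target is not recommended.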

Impact on COVID-19 treatment
Several new compounds have been tested, and some existing drugs are being studied for repurposing, to target various SARS-CoV-2 proteins [78,79]. For instance, the compound ABBV-744 was found to bind the main protease of SARS-CoV-2 with high affinity (ΔGbind −45.43 kcal/mol) [80]. Viral proteases are also considered drug targets against COVID-19 [81]. For advanced lung injury in COVID-19, stem cell therapy using extracellular vesicles has been suggested [82]. Variations in TMPRSS2 and CD26 have been shown to enhance COVID-19 susceptibility [83].
Analysing the molecular pathways involved in humans also helps to formulate novel treatment protocols. For example, ATP1A1, part of the multisubunit Na+/K+-ATPase, is an integral membrane protein that maintains electrochemical gradients across the plasma membrane and transmits signals induced by cardiotonic steroid binding into the cell. ATP1A1 is also involved in viral entry into cells, blood coagulation, and oxidative stress during COVID-19 [84][85]. Burkard et al. showed that ATP1A1-mediated Src signalling blocks coronavirus entry into host cells [86]. The cardiac glycoside bufalin, which inhibits ATP1A1, has been suggested as an inhibitor of SARS-CoV-2 [87].
Non-epigenetic drugs repurposed to target the most significant pathways in SARS-CoV-2 pathogenesis have not been completely successful in eliminating the virus [88]. Recently authorised monoclonal antibody combinations targeting the S protein (bamlanivimab plus etesevimab, and casirivimab plus imdevimab), used for non-hospitalised patients with mild to moderate COVID-19, could limit progression to severe disease, particularly in patients who have not yet developed an endogenous antibody response [89]. Some SARS-CoV-2 variants (P.1, B.1.429/B.1.427, and B.1.526) have markedly reduced susceptibility to newer monoclonal antibodies such as bamlanivimab and may have lower sensitivity to etesevimab and casirivimab [90]. Ongoing population-based genomic surveillance of circulating SARS-CoV-2 variants, and of their susceptibility to available drugs including anti-SARS-CoV-2 monoclonal antibodies, will be important in defining the future utility of this strategy.
A molecular-level understanding of epigenetic mechanisms, including microRNAs, DNA methylation, chromosomal structural alterations, and microbiome involvement, will provide useful information on the importance of epigenetics in SARS-CoV-2 disease [91]. Epigenetic enzymes responsible for modifying ACE2 expression through DNA methylation and histone modification, such as DNMT1, histone acetyltransferase 1 (HAT1), histone deacetylase 2 (HDAC2), and lysine demethylase 5B (KDM5B), are potential targets for controlling the host immune response [92].
Although no drugs targeting epigenetic pathways are currently available to treat COVID-19, epigenetics-based drugs have a promising future. Epigenetic factors that participate in the immune response through different mechanisms (the response to interferon or the NF-κB complex), such as STAT5A and STAT1, are evident drug-target candidates for COVID-19 because drugs targeting them already exist. Furthermore, EP300 and RELA are new candidate druggable epigenetic targets that are acted on by drugs with anti-inflammatory or antiviral properties. TRIM25, TRIM28, and MOV10 are also considered good candidates for epigenetics-based drugs [93]. Having viral sequences at hand will also help clinicians develop new prognostics and diagnostics and facilitate vaccine development. Future studies focusing on host-SARS-CoV-2 interactions and the epigenetic basis of immune evasion may discover new drug targets and drug candidates whose potential has not yet been exploited.

Impact on vaccine efficacy
Neutralising antibodies against SARS-CoV-2 have been evaluated [94]. Experimental data from neutralization assays show that plasma from vaccinated individuals has a mild to modest reduction in neutralizing activity against mutants. Evidence from clinical trials is broadly consistent with the laboratory results, with the B.1.351 variant showing stronger signs of vaccine escape. The ChAdOx1 nCoV-19 vaccine showed clinical efficacy against the B.1.1.7 variant but failed to protect against mild to moderate disease caused by the B.1.351 variant, with efficacy against that variant estimated at 10.4% [95]. Preliminary clinical trial data indicate that the NVX-CoV2373 (Novavax) protein-based vaccine has 95.6% efficacy against the wild-type virus, moderately lower efficacy against the B.1.1.7 variant (85.6%), and further reduced efficacy against the B.1.351 variant (60.0%) [96]. The JNJ-78436735 (Johnson & Johnson/Janssen) vaccine provided 72% protection against moderate to severe SARS-CoV-2 infection in the USA, but only 57% in South Africa, at a time when the B.1.351 variant was widespread [97]. Sequencing of viruses from prolonged infections will provide useful information on mutations that could contribute to increased escape from vaccine-mediated immunity [98].
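The efficacy percentages above compare disease risk in vaccinated and unvaccinated (placebo) trial arms. A common crude form is VE = (1 - attack rate in vaccinated / attack rate in placebo) × 100; the Python sketch below implements that form on hypothetical case counts. The cited trials may have used person-time incidence or hazard ratios instead, so this illustrates the idea rather than reconstructing their analyses.

```python
def vaccine_efficacy(cases_vaccinated: int, n_vaccinated: int,
                     cases_placebo: int, n_placebo: int) -> float:
    """Crude vaccine efficacy (%) from attack rates in two trial arms."""
    if n_vaccinated <= 0 or n_placebo <= 0:
        raise ValueError("arm sizes must be positive")
    attack_vaccinated = cases_vaccinated / n_vaccinated
    attack_placebo = cases_placebo / n_placebo
    if attack_placebo == 0:
        raise ValueError("no cases in the placebo arm; efficacy is undefined")
    return 100.0 * (1.0 - attack_vaccinated / attack_placebo)


# Hypothetical arms of 10,000 participants each (not data from any cited trial).
print(f"{vaccine_efficacy(10, 10_000, 100, 10_000):.1f}%")  # 90.0%
```

Reading efficacy as a relative risk helps when comparing variants: under this crude form, a fall from 90% to 60% means the vaccinated arm's risk relative to placebo rose from 0.1 to 0.4, a fourfold increase.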

Conclusions and way forward
Genetic characterization of the propagating and/or emerging variants of SARS-CoV-2, and identification of the genetic diversity of host cell receptor proteins among distinct populations (e.g., individuals with mild versus severe COVID-19), could offer solutions for the management and prevention of COVID-19 in individuals and across populations. This can be achieved by setting up centrally coordinated local laboratories with genomic variant screening at the district level, linked to line departments in the field. Genomic surveillance in different regions can thus be strengthened to increase the number of sequences representing each region in global databases and to better understand the molecular epidemiology of the virus. Following One Health principles, surveillance systems should be established for both humans and animals, with coronavirus surveillance in humans and animals carried out alongside parallel environmental metagenomic studies within the district-level system.