Structural and Functional Characterization of RNA-binding Site of Rev and Rev-Response-Element RNA Complex via All Atom Molecular Dynamics Simulations

Nuclear export of viral mRNAs is an essential step in the HIV replication cycle. This role is played by a small regulatory protein of HIV-1 called Rev. The N-terminal region of Rev contains an arginine-rich motif (ARM), located between amino acids 38 and 50, which forms an alpha-helical secondary structure. Expression of the structural proteins of human immunodeficiency virus type 1 requires the direct interaction of multiple copies of the viral Rev protein with its highly structured RNA target sequence, the Rev response element (RRE). The major viral proteins are not produced if this transport of RNA is stopped. Knowledge of the Rev structure is therefore essential for understanding its cooperative binding to the RRE and the mechanism of HIV infection, and for developing antiviral drugs that interfere with Rev's essential functions and identifying good drug candidates for the treatment of AIDS. To understand how Rev interacts with the RRE of HIV RNA and forms an oligomeric complex, it is useful to characterize the structure of Rev domain by domain with regard to the function of each domain. Owing to the lack of structural data on Rev, no compound has been reported as a Rev inhibitor except antiviral drugs. Identifying a high-affinity RNA-binding site for the HIV-1 Rev protein is therefore important. The ARM is a highly specific sequence that allows both multimerization of Rev and binding of Rev to RNA. Here we explore, for the first time, the structural characteristics of the Rev protein both in free form and in complex with RNA at the level of domain function, in particular the role of the ARM as the RNA-binding site, using molecular dynamics (MD) simulation and homology modeling. The results indicate that the ARM is crucial for the stability of this complex. In the crystal structure of the Rev and Rev-response-element RNA complex, residues ARG38, 39, 41, 43, 44, 48, 50 and ASN40 interact most with the nucleobases of the RRE. Our study elaborates the binding of RNA to Rev and paves the way for further investigation of therapeutic agents against HIV.


Introduction
HIV is a retrovirus belonging to the family Retroviridae and contains ribonucleic acid (RNA) as its genetic material [1][2][3]. The HIV genome is about 9 kb in size and encodes 15 proteins [4][5]. Before this RNA is translated into protein, the virus employs strategies such as overlapping reading frames and alternative splicing [6][7][8]. Splicing yields three classes of mRNA transcripts: unspliced, singly spliced and fully spliced RNA [9][10]. HIV-1 regulatory proteins (including Rev) are translated from completely spliced mRNA transcripts, while structural proteins are translated from incompletely spliced transcripts [11]. Both the singly spliced viral mRNAs and the unspliced RNA must be exported from the nucleus to the cytoplasm, where they are translated into the essential structural proteins that make up complete new virions [12]. This nuclear export is promoted by the 13 kDa HIV-1 Rev protein [13][14]. Before the RNA is exported, Rev migrates into the nucleus, where it oligomerizes and then forms a complex with the RNA [15]. Rev recognizes a highly conserved, structured intronic RNA element of 350 nucleotides, called the Rev response element (RRE) [16]. Export of the RNA is then carried out by the Crm1 nuclear export pathway [17]. Rev first oligomerizes onto the RRE to form a ribonucleoprotein (RNP) complex of 200-300 kDa containing 8-10 Rev molecules [18][19]; this complex binds Crm1 (exportin-1), GTP-bound Ran and other host cell proteins via the Rev nuclear export sequence (NES) to promote nuclear export [20]. Rev consists of 116 amino acids in a single chain [21]. It contains multiple conserved domains that are known to be essential for its function. At the N-terminus it contains the arginine-rich RNA-binding motif (ARM), which binds specifically to a purine-rich bulge within stem loop IIb of the RRE [14][22]. The RRE contains several stem loops, such as stems iA, iiA, iii/ivA, vA, iiB and iiC. The best characterized is stem iiB, which provides the binding site for the Rev ARM domain [23][24]. HIV RNA and Rev form an RNP complex that has to be exported from the nucleus to the cytoplasm. For this export, Rev has a conserved sequence of amino acids at its C-terminus (residues 71-82) called the NES (nuclear export sequence) [1]. It is also called the Rev activation domain and is rich in leucine residues [17]. As befits their biological importance, the interactions whereby Rev recognizes RRE-containing RNAs have been intensively studied, as has Rev's propensity to oligomerize, both in the absence of RNA and in the context of the RRE [23][24][25][26].
Rev is a good target for antiviral therapies, since it is absolutely necessary for HIV-1 replication [27,28]. Various organic compounds have been shown to target the Rev/RRE interaction [29]. Neomycin B, diphenylfuran cation and proflavine are small molecules that can prevent Rev from binding to the RRE sequence [27,29,30]. If Rev cannot bind to the RRE on the pre-mRNA, the RNA is not exported to the cytoplasm, and the necessary structural proteins are not produced.
In this study we carried out large-scale molecular dynamics simulations of the crystal structure of the Rev and RRE complex and of the free Rev protein (homology model), in order to identify the structural parts of Rev that help in the assembly of Rev units and in the subsequent binding of these units to RNA. Our results suggest that, in the ARM (residues 38-50), the ARG and ASN residues are essential for binding to RNA, while some residues of the oligomerization domain (residues 9-26 and 51-65), such as LEU18, LEU22, ILE55, ILE59 and PHE21, provide a central hydrophobic core and are necessary for oligomerization. We also identified the nucleotides of HIV RNA that interact most with Rev: U45, U66, U72, G46, G47, G67, G70 and A44, which belong to stem iiB of the RRE, are in contact with Rev residues. The ARM of Rev is characterized structurally, and its importance for the interaction of Rev with its highly structured RNA target sequence, the RRE, is identified. We also describe the interactions of the RRE nucleotides with Rev.

Molecular Dynamics Simulation
The crystal structure of the Rev and Rev-response-element RNA complex (PDB ID: 4PMI) was downloaded. In the crystal structure, two molecules of Rev (residues 1-72) are bound to the RRE (stem iiB). The crystal structure was assigned partial charges, protonated and energy minimized with the MOE software. MD simulations of 4PMI and of the homology model were carried out with GROMACS version 2016.4. The "AMBER96 protein, nucleic AMBER94" force field was used throughout the MD simulations. We used a simple cubic box as the unit cell; the protein was centered in this box and placed at least 1.0 nm from the box edge (-d 1.0). The box was solvated with spc216.gro, a generic equilibrated three-point water model. Ions were added to neutralize the system using the genion tool of GROMACS. The structure was relaxed by energy minimization (EM) with the steepest descent algorithm. The system was equilibrated at a temperature of about 300 K and a pressure of 1 bar, using the Berendsen thermostat and the Parrinello-Rahman barostat, respectively. The LINCS algorithm was used to constrain all bonds, and the leap-frog algorithm was used for integration.
Electrostatic forces were treated with particle-mesh Ewald (PME) summation, with a cutoff of 10 Å for both electrostatic and van der Waals interactions. A 20 ns MD run was performed under these conditions for both the homology model and the crystal structure. After the MD simulations, the results were analyzed with GROMACS tools and VMD.
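The setup described above can be sketched as follows. This is a minimal illustration and not the authors' actual input: the 2 fs time step, the coupling time constants, the compressibility and all file names are assumptions that the text does not state. Only the box margin, the solvent file, the thermostat and barostat, the constraint and integration algorithms, PME, the 1.0 nm (10 Å) cutoffs and the 20 ns run length are taken from the description.

```python
def nsteps_for(total_ns: float, dt_ps: float) -> int:
    """Number of integration steps needed to cover total_ns at a step of dt_ps."""
    return round(total_ns * 1000.0 / dt_ps)


def production_mdp(total_ns: float = 20.0, dt_ps: float = 0.002,
                   cutoff_nm: float = 1.0) -> str:
    """Render GROMACS .mdp text for the production run described in the text."""
    params = {
        "integrator": "md",                 # leap-frog integrator
        "dt": dt_ps,                        # assumed time step (ps)
        "nsteps": nsteps_for(total_ns, dt_ps),
        "constraints": "all-bonds",         # all bonds constrained ...
        "constraint_algorithm": "lincs",    # ... with LINCS
        "coulombtype": "PME",
        "rcoulomb": cutoff_nm,              # 10 A = 1.0 nm
        "rvdw": cutoff_nm,
        "tcoupl": "berendsen",
        "tc_grps": "System",                # assumed single coupling group
        "tau_t": 0.1,                       # assumed (ps)
        "ref_t": 300,                       # K
        "pcoupl": "Parrinello-Rahman",
        "tau_p": 2.0,                       # assumed (ps)
        "ref_p": 1.0,                       # bar
        "compressibility": 4.5e-5,          # assumed, water (1/bar)
    }
    return "\n".join(f"{key:<22}= {value}" for key, value in params.items())


def setup_commands(structure: str = "complex.gro") -> list:
    """Box, solvation and neutralization commands; file names are hypothetical."""
    return [
        ["gmx", "editconf", "-f", structure, "-o", "boxed.gro",
         "-c", "-d", "1.0", "-bt", "cubic"],
        ["gmx", "solvate", "-cp", "boxed.gro", "-cs", "spc216.gro",
         "-o", "solvated.gro", "-p", "topol.top"],
        ["gmx", "grompp", "-f", "ions.mdp", "-c", "solvated.gro",
         "-p", "topol.top", "-o", "ions.tpr"],
        ["gmx", "genion", "-s", "ions.tpr", "-o", "neutral.gro",
         "-p", "topol.top", "-neutral"],
    ]


if __name__ == "__main__":
    print(production_mdp())
    for command in setup_commands():
        print(" ".join(command))
```

The commands are only assembled as argument lists here, so the sketch can be inspected without a GROMACS installation; each list could be passed to `subprocess.run` on a machine where `gmx` is available.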
I-TASSER (Iterative Threading ASSEmbly Refinement) is a hierarchical approach for predicting protein structure and, based on that structure, protein function [32]. The full-length sequence of the Rev protein, retrieved from the NCBI server (NCBI Reference Sequence NP_057854.1), was submitted to I-TASSER in FASTA format. The first step in the modeling is the identification of threading templates. For this purpose, I-TASSER uses LOMETS, a meta-server threading approach that searches for templates in the PDB library. From the selected templates, I-TASSER generates a set of structural conformations called decoys. To construct the 3-D models, I-TASSER uses the SPICKER program, which clusters all the decoys based on pair-wise structure similarity and reports up to five models. Biological annotation of the five selected models was then carried out with COFACTOR and COACH [21].

Homology modeling
Because no homologous protein structures of the HIV-1 Rev protein are deposited in the Protein Data Bank (PDB), we used protein threading, also known as fold recognition, to model the 3-D structure of Rev. In protein threading we searched for templates of known structure that have the same fold as the query. Threading differs from homology modeling in that it is used for proteins that have no homologous structures deposited in the PDB. It works by using statistical knowledge of the relationship between the structures deposited in the PDB and the sequence of the protein to be modeled.

Top 10 threading templates used by I-TASSER
I-TASSER modeling starts from the structure templates identified by LOMETS from the PDB library. LOMETS is a meta-server threading approach containing multiple threading programs, where each threading program can generate tens of thousands of template alignments [33]. I-TASSER uses only the templates of the highest significance in the threading alignments; the significance is measured by the Z-score, i.e. the difference between the raw and average scores in units of standard deviation. The templates in this section are the 10 best templates selected from the LOMETS threading programs. Table 1 lists the top 10 templates used in modeling, and their alignments against the query sequence are shown in Figure 1.
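The Z-score definition given above (raw score minus average score, in units of standard deviation) can be written out as a short sketch. The alignment scores are hypothetical numbers chosen for illustration, and the use of the population standard deviation is an assumption; the text does not say which estimator the threading programs use.

```python
from statistics import mean, pstdev


def z_score(raw: float, all_scores: list) -> float:
    """Distance of one alignment score from the mean, in standard deviations."""
    return (raw - mean(all_scores)) / pstdev(all_scores)


# Hypothetical raw alignment scores for five candidate templates.
scores = [10.0, 12.0, 14.0, 16.0, 18.0]
best = max(scores)
print(f"Z-score of the best template: {z_score(best, scores):.3f}")
```

A template whose score lies far above the average of all alignments receives a large Z-score and is therefore preferred.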

Top 5 final models predicted by I-TASSER
For each target, I-TASSER simulations generate a large ensemble of structural conformations, called decoys. To select the final models, I-TASSER uses the SPICKER program [34] to cluster all the decoys based on pair-wise structure similarity and reports up to five models, which correspond to the five largest structure clusters (Figure 2). The confidence of each model is measured quantitatively by the C-score, which is calculated from the significance of the threading template alignments and the convergence parameters of the structure assembly simulations. The C-score is typically in the range [-5, 2], where a higher value signifies a model with higher confidence and vice versa. Since the top 5 models are ranked by cluster size, lower-ranked models can, in rare cases, have a higher C-score. Although the first model has better quality in most cases, lower-ranked models can also have better quality than higher-ranked ones. If the I-TASSER simulations converge, fewer than 5 clusters may be generated; this usually indicates that the models are of good quality because the simulations converged.

Predicted function using COFACTOR and COACH
In the NMR structure of Rev, the α-helical ARM is bound to the IIB RNA hairpin [8], and a second binding site lies in stem IA [35]. To find out how the individual subunits of Rev recognize the RRE, we predicted the binding sites of the Rev model with COFACTOR and COACH, based on the I-TASSER structure prediction. COFACTOR [36] deduces protein functions (ligand-binding sites, EC and GO) using structure comparison and protein-protein networks, whereas COACH [37] is a meta-server approach that combines multiple function annotation results (on ligand-binding sites) from the COFACTOR, TM-SITE and S-SITE programs. We predicted only the ligand-binding sites of the Rev model. The results showed that the most interacting region of the Rev model, residues 36-60, is the ARM domain.
Table 2 and Figure 3 show the ligand-binding sites.

Secondary structure analysis of the Rev sequence with the PSIPRED online web server
The HIV Rev sequence (residues 1-116; NCBI Reference Sequence: NP_057854.1, www.ncbi.nlm.nih.gov) was used to predict the secondary structure of Rev with PSIPRED. The structure of Rev contains a highly conserved helix-loop-helix motif [41], and the secondary structure analysis of the Rev sequence also showed this conserved motif. The region 9-26 (oligomerization domain) forms an alpha helix (Figure 4); it contains conserved hydrophobic residues such as LEU, PRO and ILE and serves as a hydrophobic interface, providing surface area for contacts between the Rev units as they organize into a higher-order oligomer during binding to the RRE. The region 34-60 (arginine-rich motif, ARM) also forms an alpha helix. These two α-helices are connected by a short coiled loop. Coiled regions are present between the two helices, at the N-terminus (1-10) and in the C-terminal domain (60-116), giving the same conserved helix-loop-helix motif (Figure 4). The conformation of these helices changes, and they come close to each other, during binding to RNA. However, no β-strands are found anywhere in the sequence.
... with RNA, these residues help in the oligomerization of Rev molecules around the RNA. The ARM (38-50) also showed some structural changes: during the 20 ns simulation, it changes from alpha helix to turn and coil after about 10 ns. This also supports the view that these motifs become fully organized into an alpha-helical structure during RNA binding.
The NES domain also shows structural flexibility in free Rev (homology model), but not in the complex. Residue LEU81 forms a beta strand in this domain. Other C-terminal residues, GLU87, ASP88 and CYS89, also form a beta strand, which is not the case in the well-modeled 3-D structure of Rev. Overall, the MD simulation analysis clearly showed that the secondary structure elements of Rev become more rigid on binding to RNA. It is possible that the RRE enables Rev to change its shape and rigidity [22].
To assess structural stability and conformational changes, the RMSD of the backbone atoms was calculated over the whole 20 ns trajectories of the crystal structure of the Rev and Rev-response-element RNA complex and of the Rev homology model. Figure 6 depicts the RMSD values for the complex (black) and for the homology model (red). The two systems differ markedly in conformational change, with RMSD increases of 0.2 nm for the complex and 0.5 nm for the homology model. The RMSD of Rev in the crystal structure in complex with RNA exhibits very little structural fluctuation: it increases to ∼0.35 nm, stays around this value for ∼4 ns, decreases for a short while, and then stabilizes at ∼0.18 nm. The RMSD trace of the Rev homology model exhibits almost the same pattern as that of the crystal structure with regard to conformational stability. From the start to the end of the simulation, the RMSD of the Rev model is mostly stable.

The Root Mean-Square Fluctuation (RMSF) Analysis
To further analyze the trajectories of the crystal structure of the Rev and Rev-response-element RNA complex and of the Rev homology model, we computed the standard deviation of the atomic positions of each residue, i.e., the root mean-square fluctuation (RMSF). Figure 7 presents the RMSF of Rev in complex with RNA (black line) and of the Rev homology model (red line).
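The two quantities used in this analysis can be illustrated with a minimal sketch. It assumes that every frame has already been superimposed on the reference structure (the fitting step is omitted), and the two-atom toy trajectory is hypothetical; the values in the paper were obtained with the GROMACS analysis tools, not with this code.

```python
import math


def rmsd(frame, reference):
    """RMSD between one frame and a reference; both are lists of (x, y, z) in nm."""
    squared = [sum((a - b) ** 2 for a, b in zip(p, q))
               for p, q in zip(frame, reference)]
    return math.sqrt(sum(squared) / len(squared))


def rmsf(trajectory):
    """Per-atom RMSF: standard deviation of each atom's position over all frames."""
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    result = []
    for i in range(n_atoms):
        # Time-averaged position of atom i.
        mean_pos = [sum(frame[i][k] for frame in trajectory) / n_frames
                    for k in range(3)]
        # Mean squared deviation from that average position.
        msf = sum(sum((frame[i][k] - mean_pos[k]) ** 2 for k in range(3))
                  for frame in trajectory) / n_frames
        result.append(math.sqrt(msf))
    return result


# Hypothetical trajectory: atom 0 is fixed, atom 1 moves 0.2 nm along y.
toy = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.2, 0.0)],
]
print([rmsd(frame, toy[0]) for frame in toy])
print(rmsf(toy))
```

RMSD yields one value per frame and tracks the overall drift from the starting structure, whereas RMSF yields one value per atom (or residue) and shows which parts of the chain are mobile.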
It should be noted that PDB entry 4PMI, the crystal structure of the Rev and Rev-response-element RNA complex, contains only the N-terminal domain of Rev (amino acids 1-72). We also focused our study on this region, which is the RNA-binding region. In general, Rev in the crystal structure of the complex with RNA is more rigid and confined, and tends to be less flexible, than the protein without RNA [38]. In the free form (homology model), residues LEU21, 22, TYR23, GLN24, SER25 and ASN26 show higher fluctuations, with an average RMSF of 0.4 nm. This correlates with the structural flexibility that these residues showed in the secondary structure analysis. In the complex these residues become rigid and show very little flexibility, with RMSF values below 0.1 nm. The ARM (38-50) showed very little flexibility in the free form (RMSF of 0.15 nm), in line with its secondary structure analysis. In the complex the ARM becomes stable and shows almost no flexibility (0.05 nm). Two other regions of Rev in the free form, residues 65-70 and 90-100 at the C-terminus, show higher RMSF values of around 0.5 nm.

Hydrogen-bond analysis of Crystal Structure of Rev and Rev-Response-Element RNA Complex After MD
In the crystal structure of the Rev and Rev-response-element RNA complex [39], R38 makes hydrogen bonds with U66 and G67 of the RNA, R39 with G70, N40 with G47, R41 with G46, R43 with U72, R44 with U45, R48 with A44 and R50 with U72 (Figure 8).
The residues at positions 42, 45, 46 and 47 in the ARM make no interactions. After the MD simulation, hydrogen-bond analysis was carried out. During the 20 ns MD simulation, the average numbers of hydrogen bonds for R38 with U66/G67, R39 with G70, R41 with G46, R48 with A44 and R50 with U72 are 1.4, 1.9, 1.7, 1.2 and 1.3, respectively. The trajectories show that these hydrogen bonds persist throughout the MD simulation (Figure 9), whereas the hydrogen bonds N40-G47, R43-U72 and R44-U45 are not persistent (Figure 9).
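How an average hydrogen-bond count per residue pair is obtained from a trajectory can be sketched as follows. The geometric criterion used here (donor-acceptor distance of at most 0.35 nm and hydrogen-donor-acceptor angle of at most 30 degrees) is the default of the GROMACS hydrogen-bond tool and is an assumption, since the text does not state the criterion; the per-frame counts in the example are hypothetical.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _norm(v):
    return math.sqrt(sum(x * x for x in v))


def is_hbond(donor, hydrogen, acceptor, r_max=0.35, angle_max=30.0):
    """Geometric hydrogen-bond test on (x, y, z) coordinates in nm."""
    da = _sub(acceptor, donor)   # donor -> acceptor vector
    dh = _sub(hydrogen, donor)   # donor -> hydrogen vector
    dist = _norm(da)
    if dist > r_max:
        return False
    cos_angle = sum(x * y for x, y in zip(da, dh)) / (dist * _norm(dh))
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
    return angle <= angle_max


def mean_hbonds(counts):
    """Average number of hydrogen bonds per frame for one residue pair."""
    return sum(counts) / len(counts)


def occupancy(counts):
    """Fraction of frames in which at least one hydrogen bond is present."""
    return sum(1 for c in counts if c > 0) / len(counts)


# Hypothetical per-frame counts for one arginine-nucleotide pair.
counts = [1, 2, 1, 2]
print(mean_hbonds(counts), occupancy(counts))
```

An average above 1, as reported for the arginine contacts, means that the pair typically forms more than one hydrogen bond at a time, which is possible because the guanidinium group of arginine carries several donor hydrogens.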

Solvent Accessible Surface Area (SASA)
Prior to binding HIV RNA, Rev forms an oligomeric complex. The estimated solvation free energy as a function of time was about 0 for the Rev homology model, compared with about 5 for the crystal structure in complex with RNA (Figure 10 C and D), which also supports the idea that free Rev is able to interact.
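As an illustration of what a solvent accessible surface area measures, the sketch below implements the Shrake-Rupley point-counting scheme. This is a swapped-in textbook method chosen for brevity, not necessarily the algorithm behind the values reported here; the 0.14 nm probe radius, the number of test points and the atom coordinates and radii are all assumptions.

```python
import math


def sphere_points(n):
    """n roughly evenly spaced points on a unit sphere (golden-spiral layout)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        points.append((r * math.cos(golden * i), r * math.sin(golden * i), z))
    return points


def sasa(atoms, probe=0.14, n_points=200):
    """Per-atom SASA in nm^2; atoms is a list of ((x, y, z), radius) in nm."""
    unit = sphere_points(n_points)
    areas = []
    for i, (center_i, radius_i) in enumerate(atoms):
        big_r = radius_i + probe  # radius of the probe-expanded sphere
        exposed = 0
        for u in unit:
            point = tuple(c + big_r * x for c, x in zip(center_i, u))
            # A test point is buried if it falls inside any other expanded sphere.
            buried = any(
                j != i
                and sum((a - b) ** 2 for a, b in zip(point, center_j))
                < (radius_j + probe) ** 2
                for j, (center_j, radius_j) in enumerate(atoms)
            )
            if not buried:
                exposed += 1
        areas.append(4.0 * math.pi * big_r * big_r * exposed / n_points)
    return areas


# Hypothetical example: an isolated atom versus the same atom next to a neighbour.
alone = sasa([((0.0, 0.0, 0.0), 0.17)])
pair = sasa([((0.0, 0.0, 0.0), 0.17), ((0.2, 0.0, 0.0), 0.17)])
print(alone[0], pair[0])
```

Summing the per-atom values over the atoms of a residue or motif gives the kind of per-residue and per-domain areas compared in this section; buried hydrophobic residues contribute little, while exposed charged residues such as arginine contribute more.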

Conclusion
We have developed homology model of REV protein and MD simulation of crystal structure of Rev and Rev-response-element RNA complex, aiming to characterize the ligand-

Figure 1: The top 10 template-query alignments generated by LOMETS. All the residues are coloured in black; however, those residues in the template that are identical to the residue in the query sequence are highlighted in colour. The colouring scheme is based on the properties of the amino acids: polar residues are brightly coloured, while non-polar residues are coloured in a dark shade.
Table 1: Top 10 threading templates used by I-TASSER. Rank is the rank of the template among the top ten threading templates used by I-TASSER. Ident1 is the percentage sequence identity of the template in the threading-aligned region with the query sequence. Ident2 is the percentage sequence identity of the whole template chain with the query sequence. Cov is the coverage of the threading alignment and is equal to the number of aligned residues divided by the length of the query protein. Norm. Z-score is the normalized Z-score of the threading alignment. An alignment with a normalized Z-score >1 indicates a good alignment and vice versa.

Figure 3: Predicted ligand binding sites. Binding residues are shown in blue ball & stick.

Figure 4: Secondary structure analysis of the Rev sequence by PSIPRED.

Figure 5: Secondary structure analysis after MD simulation. (A) Crystal structure of the Rev and Rev-response-element RNA complex. (B) Rev protein homology model. The amino acid sequence is shown on the Y-axis and the frame number on the X-axis. The colour scale shows the secondary structure elements: T = turn, C = coil, E = extended conformation, B = isolated bridge, H = alpha-helix, G = 3-10 helix and I = pi-helix.

Figure 6: RMSD of the crystal structure of the Rev and Rev-response-element RNA complex (black) and of the Rev protein homology model (red).

Figure 7: RMSF of the crystal structure of the Rev and Rev-response-element RNA complex (black) and of the free Rev protein homology model (red).

Figure 8: (A) Crystal structure of the Rev and Rev-response-element RNA complex (PDB ID: 4PMI). (B) ARM (arginine-rich motif) of the Rev protein. (C) Crystal structure of the Rev complex after 20 ns of MD simulation. The ARM is shown as a yellow ribbon, interacting residues as blue ball & stick, and interacting nucleobases as green spheres.

Figure 10: Solvent accessible surface area (SASA) analysis of the crystal structure of Rev and of the homology model. (A, B) Total area as a function of time. (C, D) Estimated solvation free energy as a function of time.

Table 2: C-score is the confidence score of the prediction. C-score ranges [0-1], where a higher score indicates a more reliable prediction. Cluster size is the total number of templates

Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 21 February 2018 doi:10.20944/preprints201802.0138.v1
... [14]. The ARM domain has higher values, about 330 Å², whereas the NES domain, which recruits Crm1 [40][41], has lower values, 274 Å² [28], owing to its hydrophobic residues (LEU75, 78, 81, 83, PRO76, 77 and THR82). In the crystal structure the average SASA of the ARM is 328 Å², which does not differ much from the homology model. In the crystal structures of Rev and in the homology model, some residues of the Rev subunits position the ARMs on one side of the Rev oligomer to bind RNA [14]. The lower SASA value of residue ILE55 (283 Å² in the free form and 278 Å² in the complex) indicates that this residue is essential for maintaining the dimeric state of Rev. As Edgcomb et al. describe, the Rev subunit bound to the junction site forms an extensive surface with the RNA that causes the dimer interface to pivot around a single residue, Ile55, which is known to be critical for dimer integrity and function (Jain and Belasco, 2001; Edgcomb et al., 2008).
... calculated, which shows much similarity between the homology model and the crystal structure (Figure 11 A and B). In the homology model, ARG residues have the highest SASA values throughout the sequence owing to their hydrophilic nature.

binding sites of the Rev protein. The ligand-binding sites of the homology model were deduced with COFACTOR, indicating that Rev is able to bind nucleic acids such as RNA. The residues that interact most with nucleic acids belong to the ARM (arginine-rich motif). In the crystal structure of Rev we found that R38, 39, 41, 48, 50 and N40 are in contact with nucleotides of the RRE, and these interactions persist throughout the MD simulation. The COFACTOR results likewise identified the ARM as the domain that interacts with nucleic acids. Secondary structure analysis of the Rev model further indicates that conformational changes occur between the complex and free Rev. These conformational changes are confirmed by the RMSD and RMSF calculations for Rev in the free form and in the complex. The SASA calculation also divides the protein into its hydrophobic and hydrophilic parts and helps to identify the roles of the different domains within the structure of Rev. The RNA-binding site of Rev, the ARM, is thus well characterized structurally and functionally.