The Role of Zinc Finger Linkers in Zinc Finger Protein Binding to DNA

Zinc finger proteins (ZFPs) play important roles in cellular processes. The DNA-binding region of the ZFP studied here consists of three zinc finger DNA-binding domains connected by amino acid linkers: the sequence TGQKP connects ZF1 and ZF2, and TGEKP connects ZF2 and ZF3. Linkers tune the zinc fingers into the right position to bind the DNA target. The type of amino acid residues and the length of the linkers affect ZF1-ZF2-ZF3 interactions and contribute to the search for, and recognition of, the DNA target by the ZF protein. Linker mutations and the affinity of the resulting mutants for specific and nonspecific DNA targets were studied by MD simulations and MM-GB(PB)SA. The affinity of the mutants for DNA varied with the type and position of the mutated residue. Mutation of K in TGQKP resulted in a loss of affinity due to the loss of the interaction between the positively charged K and the phosphates, and mutation of G also reduced the affinity for DNA. The WT protein and all linker mutants showed a loss of affinity for a nonspecific DNA target. This finding confirms previous reports, which interpreted this loss as due to ZF1 having an anchoring role and ZF3 an explorer role in the binding mechanism. The change in ZFP-DNA affinity with linker mutations is discussed in view of the protein structure and the role of linker residues in binding.


Introduction
Zinc finger protein (ZFP) is present in the transcription factor TFIIIA as nine zinc finger domains [1]. Transcription factors containing a Cys2-His2 zinc finger domain function in specific DNA recognition [2][3][4]. The three-finger protein is organized in tandem arrays connected by peptide linkers (Figure 1). Zinc finger proteins play a role in DNA replication and repair, transcription and translation [2][5]. The protein structure is stabilized by a zinc ion tetrahedrally coordinated to two histidine residues in the α-helix and two cysteine residues in the β-sheet.
Figure 1: Major amino acid residues on each linker in the ZFP bound to its DNA target (5'-GCG TGG GCG-3'); average structure after MD simulation. Finger 1 (red); linker 1 (blue, with labeled residues T28G29E30K31P32); finger 2 (yellow); linker 2 (pink, with labeled residues T56G57G58K59P60); finger 3 (green).
Experimental work revealed that the zinc finger protein makes both specific and nonspecific contacts with DNA. In specific contacts, the amino acid residues in positions -1, +2, +3 and +6 of the α-helices bind specifically to the DNA sequence 5'-GCG TGG GCG-3' [7][8]. Binding of the protein to its specific DNA target is an enthalpy-driven process resulting from contacts of amino acid side chains with specific DNA bases [7][8][9][10]. Displacement of bound water molecules from the protein-DNA interface and hydrogen bonds to backbone phosphates also contribute to the entropy of binding [11][12][13].

F1-TGQKP-F2-TGEKP-F3 (Linker 1: TGQKP; Linker 2: TGEKP)
The lengths of linkers one and two are 14.5 Å and 14.4 Å, respectively, which gives the zinc fingers enough flexibility to bind DNA targets. Indeed, Pabo et al [6] used phage display and modeling of three separate fingers docked on B-DNA and obtained a distance of 18.1 Å between F1 and F2 and 17.7 Å between F2 and F3. This led them to conclude that finger-finger interaction and DNA twisting are needed to shorten these distances for binding to take place [6][14][15][27].
Linking more than three fingers with TGEKP to target more than 10 bases resulted in reduced affinity [19]. Designed ZF motifs with more flexible linkers were found to reduce the strain and to improve specificity. Such designs made the targeting of unique DNA sequences more feasible and aided the building of various sets of six fingers connected by varying linkers [19][26][27]. The linkers of the first three fingers of the zinc finger protein in TFIIIA are dynamically disordered in solution and adopt a well-ordered structure upon binding to DNA [16,20,28].
Choo et al [21] investigated the effect of mutating the linkers and measured the protein-DNA dissociation constants. The affinity of the point mutants T(L), K(S) and P(G) of the linker TGEKP was reduced by a factor ranging from 7 to 13, and a 24-fold loss in binding affinity was reported for the G29P mutant. Mutation of three amino acids in the linker (TGQKP) by site-directed mutagenesis was found to reduce the ZFP binding affinity to DNA 8-fold [17]. A 50-fold reduction in specific binding was reported upon mutation of linker one (TGEKP) to SEQKP, and this was interpreted as an effect on specificity [17]. An NMR spectroscopic study of zinc fingers showed that the interactions between fingers initiated by linkers contribute to the affinity of binding [29][20]. Crystal structures and NMR spectra of the ZFP-DNA complex revealed the formation of a hydrogen bond between the gamma oxygen (Oγ) of threonine (T1) on the α-helix and the backbone amide of the linker residue E3. This interaction is known as "C-capping" and acts as a lock on the specific site once the mobile ZFP finds its target [20].
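To put the reported fold changes on an energy scale, a fold change in the dissociation constant can be converted to a binding free energy difference through ΔΔG = RT ln(Kd,mutant/Kd,WT). The sketch below is illustrative only: the temperature of 298.15 K and the function name fold_to_ddg are assumptions and are not taken from the cited experiments.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def fold_to_ddg(fold_change, temperature_k=298.15):
    """Return ddG = RT * ln(fold_change) in kcal/mol.

    fold_change is Kd(mutant) / Kd(WT); values above 1 mean weaker binding.
    The default temperature is an assumption, not a value from the source.
    """
    if fold_change <= 0:
        raise ValueError("fold_change must be positive")
    return R_KCAL * temperature_k * math.log(fold_change)

# Fold losses quoted in the text above
for fold in (7, 8, 13, 24, 50):
    print(f"{fold:>2}-fold loss -> ddG = {fold_to_ddg(fold):.2f} kcal/mol")
```

At this assumed temperature, the 7- to 50-fold losses quoted above correspond to roughly 1.2 to 2.3 kcal/mol.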
Experiments demonstrated that the intrinsic flexibility of linkers is needed for diffusion of ZFP along the DNA in search of its cognate site [14] [15]. This aids in the dynamic DNA scanning process in which ZFP binds nonspecifically to DNA before it switches to its specific site [14].
Mutating key amino acids in certain positions, namely -1, 3 and 6, affects specificity and affinity [8][7]. Kinetic and linker mutation studies showed that linker flexibility and the charges of the residues involved are expressed in the dynamic nature of the DNA binding process. The positions of residues in the linker sequence enable the fingers to twist to an optimum position for binding.
Linkers establish a balance between the scanning of DNA (nonspecific DNA binding) and other thermodynamic aspects, on the way to establishing an equilibrium between search and recognition. Linker mutations shift this equilibrium so as to influence the affinity and/or accelerate the kinetically driven search process [14][15]. Experimental studies on mutating residues T28 to P32 in linker one (L1) showed a pronounced effect on protein binding to DNA [18][20][17][30].
In view of the above findings, it is worthwhile to study the energetic effect of linker mutations on the DNA binding process. The DNA binding energies of ZF proteins with varying linker sequences are expected to vary [7][8]. The outcome will help in building zinc finger proteins with various linker sequences and in tuning these motifs to extend binding to more specific DNA targets [28]. Linker design must ensure that the target DNA remains accessible to the ZFP and must take into consideration the major and minor grooves of DNA.
In order to preserve the coordination structure of the tetrahedral zinc ion with Cys2His2, zinc ion library files and coordination parameters were loaded into the Amber library [37]. The zinc library files were always loaded into the editor before the protein-DNA complex was loaded. The system was checked for changes in zinc ion coordination after each run, and no significant changes were observed. The complexes were solvated in a cubic TIP3P water box [34] with dimensions of 15.0 Å. Na+ ions were added to neutralize the system. The structure was checked for errors and then converted to topology and coordinate files.
The particle mesh Ewald method was used to treat long-range electrostatics, and a 9 Å cutoff was set for long-range interactions. The force field energy of each structure was minimized by progressively relaxing the system before starting the MD simulations.
Minimizations were performed using steepest descent followed by conjugate gradient minimization (1000 cycles in tandem). After relaxation, the system was heated to 300 K with a harmonic restraint (10 kcal/(mol·Å²)) on the solute. This was followed by an unrestrained 2 ns MD simulation at 300 K and 1 atm to equilibrate the system and adjust the density. The SHAKE algorithm was used to constrain bonds involving hydrogen atoms in order to allow a longer time step (2 fs) in the simulation. A Langevin thermostat with a collision frequency of 2 ps⁻¹ and a weak-coupling barostat with a relaxation time of 2 ps were employed. Production MD simulations were carried out for 50, 100 and 150 ns and gave converged trajectories. Trajectory frames were collected at 2 ps intervals and used to calculate the binding free energy with the 'MMPBSA.py' script; 50 or 100 frames were used in the calculations (for details see supplements S-methods-1). The binding energy of the Zif268-DNA complex was calculated using both the MM-GBSA and MM-PBSA methods, and entropy contributions were calculated using normal mode analysis. All properties were monitored during the simulations to detect any sudden jumps in system properties (see Figures 1-S, 2-S, 3-S and S-an-6 in supplements) [38]. The temperature rose steadily from 0 K to an equilibrium value of 300 K during the simulation, indicating that Langevin dynamics worked effectively. Over the remaining part of the simulation the system was kept under constant pressure to make sure that it had reached the equilibrium state [39][40][41].
Both binding sites used in this study, (5'-GCG TGG GCG-3') and (5'-GCG GGG GCG-3'), were reported as consensus high-affinity binding sites for Zif268, with a slight difference in binding affinity in favor of the former [42][43][44]. Nonspecific binding was studied using the nonspecific DNA site (5'-GCA GAT TCC-3') studied previously by 1H NMR [15][14].
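The run parameters stated above can be collected under Amber &cntrl keyword names, as in the following minimal sketch. It is not the authors' input file: the mapping of each stated value to a keyword, the choices ntb = 2, barostat = 1 and pres0 = 1.0 (in bar, approximating 1 atm), and the helper names are assumptions made for illustration.

```python
def equilibration_settings(length_ns=2.0, dt_ps=0.002, save_every_ps=2.0):
    """Map the stated equilibration protocol onto Amber &cntrl keywords (assumed mapping)."""
    return {
        "imin": 0,                                    # run MD, not minimization
        "nstlim": round(length_ns * 1000.0 / dt_ps),  # number of MD steps
        "dt": dt_ps,                                  # 2 fs time step
        "ntc": 2,                                     # SHAKE on bonds involving hydrogen
        "ntf": 2,                                     # skip forces for constrained bonds
        "cut": 9.0,                                   # 9 Angstrom cutoff
        "ntt": 3,                                     # Langevin thermostat
        "gamma_ln": 2.0,                              # collision frequency, 1/ps
        "temp0": 300.0,                               # target temperature, K
        "ntb": 2,                                     # constant-pressure periodic box (assumed)
        "ntp": 1,                                     # isotropic pressure scaling
        "barostat": 1,                                # weak-coupling (Berendsen) barostat
        "taup": 2.0,                                  # pressure relaxation time, ps
        "pres0": 1.0,                                 # bar, close to 1 atm (assumed)
        "ntwx": round(save_every_ps / dt_ps),         # steps between saved frames
    }

def to_mdin(settings, title="Unrestrained equilibration (illustrative)"):
    """Render the settings as a title line followed by a &cntrl namelist."""
    body = ",\n".join(f"  {key} = {value}" for key, value in settings.items())
    return f"{title}\n &cntrl\n{body}\n /\n"

print(to_mdin(equilibration_settings()))
```

With a 2 fs step, the 2 ns equilibration corresponds to 1,000,000 steps, and saving every 2 ps corresponds to writing a frame every 1000 steps.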

Analysis of MD trajectory:
For all trajectory files, the cpptraj utility in Amber was used to strip water and Na+ ions. RMSD analysis of the wild type protein and its mutants, with and without DNA, showed that all systems are stable and that both the proteins and their complexes are well equilibrated. All RMSD plots indicate reasonable stability for all mutants and the WT protein bound to DNA after 100 ns of MD simulation.
Pairwise RMSD for specific snapshots was computed using pytraj in Amber. The RMSD to the experimental reference structure was computed first; then the pairwise RMSD was computed for the first 50 snapshots, skipping every 10 frames (for output see Figure).
Clustering analysis [45][46] was performed using mdtraj, numpy and scipy [47] and pytraj (Amber Jupyter notebook) in AMBER. A centroid (a representative structure for a group of conformations) was then determined; such a group may come from clustering, here Ward hierarchical clustering. All pairwise RMSDs between conformations were computed with the md.rmsd [47] function, and these distances were transformed into similarity scores.
To check the effect of the MD simulation on the protein structure, the average structure was generated after the simulation and converted to PDB files (using the ambpdb program in Amber), and the structures were superimposed on the original structure using PyMOL [48]. No evident changes in the structure were observed (for details, see S-methods-4 in supplements).
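The formula used to turn pairwise RMSDs into similarity scores is not given in the text. One common choice, assumed here and not confirmed by the source, is s_ij = exp(-β d_ij / σ), where σ is the standard deviation of all pairwise distances and β = 1; the centroid is then the conformation with the largest summed similarity. The sketch below works on a plain nested list so that it runs without MD libraries; the function name and the toy matrix are hypothetical.

```python
import math
import statistics

def centroid_index(distances, beta=1.0):
    """Return the index of the conformation with the highest summed similarity.

    distances is a square matrix (list of lists) of pairwise RMSD values.
    The similarity s_ij = exp(-beta * d_ij / sigma) is an assumed form.
    """
    flat = [d for row in distances for d in row]
    sigma = statistics.pstdev(flat)  # population standard deviation of all entries
    if sigma == 0.0:
        return 0  # all conformations are identical; any index is a centroid
    scores = [sum(math.exp(-beta * d / sigma) for d in row) for row in distances]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy pairwise RMSD matrix (Angstrom) for three conformations; values are made up
toy_rmsd = [
    [0.0, 1.0, 5.0],
    [1.0, 0.0, 4.0],
    [5.0, 4.0, 0.0],
]
print(centroid_index(toy_rmsd))
```

In the toy matrix the middle conformation (index 1) is closest to both others and is therefore selected.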

Zinc Parameters and binding
In order to preserve the coordination structure of the tetrahedral zinc ions with Cys2His2, the coordinating amino acids were modified manually to enable binding to the zinc ions: CYS was changed to the Amber deprotonated residue CYM, and the histidines bound to zinc were changed from HIE to the Amber-recognized HID. Zinc ion library files and coordination parameters were prepared and loaded into the Amber library. The zinc library files were always loaded into the editor before the protein-DNA complex was loaded. Atom types for the ZAFF metal center were added first, then the library for atomic ions was loaded, followed by the ZAFF prep file and the ZAFF frcmod file; finally, each zinc was bonded to its CYM and HID residues using the bond command.
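The final bonding step can be scripted. The following sketch only generates text lines in the form of the LEaP bond command (bond unit.residue.atom unit.residue.atom); it does not reproduce the authors' setup. The unit name mol, the residue numbers, and the atom names (ZN for zinc, SG for the CYM sulfur, NE2 for the coordinating HID nitrogen) are hypothetical placeholders that would have to be checked against the actual structure and ZAFF files.

```python
def zinc_bond_commands(unit, zinc_sites):
    """Build LEaP-style bond commands for each zinc and its four ligand atoms.

    zinc_sites maps a zinc residue number to a list of
    (ligand residue number, ligand atom name) pairs.
    """
    lines = []
    for zn_res, ligands in zinc_sites.items():
        for lig_res, lig_atom in ligands:
            lines.append(f"bond {unit}.{zn_res}.ZN {unit}.{lig_res}.{lig_atom}")
    return lines

# Hypothetical numbering for a single Cys2His2 site; not taken from the source
example_sites = {91: [(5, "SG"), (10, "SG"), (23, "NE2"), (27, "NE2")]}
commands = zinc_bond_commands("mol", example_sites)
print("\n".join(commands))
```

Generating the commands from a table of sites keeps the three zinc centers consistent and makes the coordination easy to re-check after each run.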

Calculation of the Free Energy of Zinc finger interaction with DNA bases:
Before using MM-GBSA [31][32][49][50], system equilibration was verified by considering the temperature, density, total energy and root mean squared deviation of coordinates (RMSD). An RMSD value of 1.5 Å relative to the crystal structure was deemed acceptable. Extensive analysis of the trajectories was performed to make sure that the calculated energies are reliable with respect to the snapshots used (see Analysis section). The resulting trajectories were analyzed using the MMPBSA.py script [33][51][52].
The MM-GBSA method in AMBER18 was employed for the calculations. The RMSD analysis showed that both the wild type protein and its mutants reached a well-equilibrated state. The affinities of the WT ZFP and its mutants in Table 1 for a specific DNA target are used to study the structural role in ZFP DNA scanning before specific binding is achieved. The DNA binding process starts with scanning (search), followed by recognition (specific binding).
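As a reminder of how an end-point estimate of this kind is assembled, the sketch below averages the per-frame difference G(complex) - G(receptor) - G(ligand) and forms a relative value ΔΔG = ΔG(mutant) - ΔG(WT). It is a generic illustration under the usual single-trajectory assumption, not the MMPBSA.py implementation; the function names are hypothetical and the numbers are made-up placeholders, not results from this study.

```python
import math
import statistics

def binding_energy(complex_e, receptor_e, ligand_e):
    """Return (mean, standard error) of the per-frame binding energy in kcal/mol."""
    per_frame = [c - r - l for c, r, l in zip(complex_e, receptor_e, ligand_e)]
    mean = statistics.fmean(per_frame)
    if len(per_frame) > 1:
        sem = statistics.stdev(per_frame) / math.sqrt(len(per_frame))
    else:
        sem = 0.0  # a single frame gives no spread estimate
    return mean, sem

def relative_binding_energy(dg_mutant, dg_wt):
    """Return ddG = dG(mutant) - dG(WT); positive values mean a loss in affinity."""
    return dg_mutant - dg_wt

# Placeholder per-frame energies (kcal/mol) for two frames
dg, err = binding_energy([-110.0, -112.0], [-60.0, -61.0], [-40.0, -39.0])
print(f"dG = {dg:.1f} +/- {err:.1f} kcal/mol")
print(f"ddG = {relative_binding_energy(-8.0, dg):.1f} kcal/mol")
```

Reporting the standard error alongside the mean makes it easier to judge whether a ΔΔG between a mutant and the WT protein exceeds the frame-to-frame noise.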

Table 1: ZFP linker point mutations. The mutated residues are shown in red.
The equilibrium state results from a balance between dynamic (nonspecific) binding and specific binding. Any loss in affinity is interpreted as loose binding of ZF1 due to conformational changes in the protein resulting from the mutations, which confirms the anchoring role of ZF1 in the binding process. Weak nonspecific ZFP-DNA interaction shifts the conformation towards the search mode, while strengthening this interaction shifts the conformation towards the recognition mode. Point mutants of linker one (L1) and linker two (L2) were built using PyMOL as shown in Table 1. The enthalpies and free energies of binding of the mutants to the specific DNA sequence (5'-A GCG TGG GCG T-3') were calculated (see Tables 1-S, 1-SB, 2-S, 3-S and 4-S in supplements) and plotted along with the energies of WT protein binding (Figures 4 and 5) (Table 2) [54].
Some mutants in both L1 and L2 showed a loss in affinity, except Q30E in L1 and the T56Y and P60 mutants in L2 (see plots of ∆∆H_GBSA in Figure 4B and ∆∆G_GBSA in Figure 5A). These results are in agreement with the reported site-directed mutagenesis results, in which three linker amino acids in L1 and L2 were substituted with the corresponding amino acids from p43 [17].
The effect of residue type, van der Waals volume, charge and position in the linker on the affinity of the ZF protein for DNA is evident (see Table 5-S in supplements). In the K31D mutant, the loss in affinity accompanied the change from a positive to a negative residue with a smaller radius, whereas in L2, E58Q suffered the maximum loss in affinity among the L2 residues, with a change from negative E to neutral Q of comparable volume. The loss in affinity for DNA upon mutation of either T1 or E3 (Figure 5A) confirms the previous finding of a hydrogen bond between the backbone amide of E3 and the side-chain Oγ of T1 in the linker [20]. It was suggested that these DNA-induced C-capping interactions provide a means whereby the ZF protein, which is flexible in the unbound state while searching for its target DNA sequence, locks in place once the target sequence is found. These observations support a rationale for the conservation of the TGEKP linker sequences in zinc finger proteins.
In search mode, the positively charged ZF2 and ZF3 are bound to DNA while ZF1 remains dissociated from the nonspecific DNA target. Indeed, the NMR study revealed the dynamic nature of the binding process, expressed as a shift between search and recognition modes. The study of the binding of the WT zinc finger protein and the T23/Q32E linker mutants to the nonspecific DNA target revealed that ZF1 is the most mobile of the three fingers in the search process before binding the target [15].
The T28K and Q30E mutants showed a higher energy barrier than the WT protein, resulting in a kinetically slower binding process, i.e., the process shifts towards search [15]. The loss in binding to the nonspecific DNA target showed that ZF1 dissociates and causes the loss in affinity in the linker mutants; this change in binding is attributed to changes in the composition and distribution of charged residues in the binding interface.
Values of the binding energies of the wild type zinc finger protein and its single point mutants to DNA were calculated, see plots in

Hydrogen bonding:
The hydrogen bonds between the ZF protein and DNA were discussed in a previous report [8]. The method used here is the Baker-Hubbard method, which identifies hydrogen bonds based on cutoffs for the donor-acceptor distance and angle. The criterion employed is θ > 120° and r(H-acceptor) < 2.5 Å in at least 10% of the trajectory. The return value is a list of the indices of the atoms (donor, H, acceptor) that satisfy this criterion. An example plot of the variation of hydrogen bond frequency with donor-acceptor distance is shown in Figure 8. appeared in E58G mutant [8]. R-1 in ZF2 of the WT protein shows two specific contacts with G; these changed to three nonspecific contacts with phosphates in the linker mutants. In the K31N mutant, one specific contact was preserved. Residue S1 of ZF2 showed one nonspecific contact with G and another with C in the wild type, both of which disappeared in the mutant proteins (see Table 3) [7][8]. R-1 of ZF3 gives one nonspecific contact with G in the WT protein; in the K31N mutant this contact changed to a specific contact with G(O). The nonspecific contacts of histidine and lysine were preserved in both the WT and mutant proteins. The highest frequency of short H-bonds in the WT protein was observed for C2 as donor and Glu as acceptor (Figure 8 and Table 3), while in both the E58G and K31N mutants this high frequency of hydrogen bonds shifted to nonspecific arginine contacts with C and G bases (see Figure 8B and C; see 8-S in supplements for the hydrogen bond listing).
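The geometric criterion quoted above (θ > 120°, r(H-acceptor) < 2.5 Å, present in at least 10% of frames) can be illustrated with a few lines of standard-library Python. This is a minimal sketch of the test itself, not the library routine used in the study; the function names and the toy coordinates (in Å) are assumptions.

```python
import math

def angle_deg(donor, hydrogen, acceptor):
    """Return the donor-H...acceptor angle at the hydrogen, in degrees."""
    v1 = [d - h for d, h in zip(donor, hydrogen)]
    v2 = [a - h for a, h in zip(acceptor, hydrogen)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.sqrt(sum(x * x for x in v1)) * math.sqrt(sum(x * x for x in v2))
    cosine = max(-1.0, min(1.0, dot / norm))  # guard against rounding errors
    return math.degrees(math.acos(cosine))

def is_hbond(donor, hydrogen, acceptor, max_dist=2.5, min_angle=120.0):
    """Apply the distance and angle cutoffs to one frame."""
    close = math.dist(hydrogen, acceptor) < max_dist
    return close and angle_deg(donor, hydrogen, acceptor) > min_angle

def is_persistent(frames, freq=0.1):
    """frames holds one (donor, hydrogen, acceptor) coordinate triple per frame."""
    hits = sum(is_hbond(*frame) for frame in frames)
    return hits / len(frames) >= freq

# Toy geometries: a linear, short contact and a contact that is too long
linear = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0))
too_far = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (4.0, 0.0, 0.0))
print(is_hbond(*linear), is_hbond(*too_far))
print(is_persistent([linear] + [too_far] * 9))
```

In the toy trajectory the bond is formed in one frame out of ten, which just meets the 10% threshold.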
Direct recognition is due mainly to hydrogen bonding between residues and DNA bases, while indirect recognition is due to hydrogen bonding to the backbone phosphates. Other contributions to binding include electrostatic attraction, water-mediated hydrogen bonds, hydrophobic forces and ion release, in addition to DNA deformation. Figure 9 shows how the hydrogen bonding between the ZF protein and a nonspecific DNA target varies with simulation time, which implies a relationship between establishing the equilibrium position and H-bonding. Comparing the high-frequency, short-distance C(2)-Glu 96 hydrogen bond in the ZF-specific DNA complex (Figure 9(A)) with that in the ZF-nonspecific DNA complex, a loss in bonding is observed, and the bond was replaced by several Arg-backbone phosphate bonds in the nonspecific DNA complex.

Concluding remarks:
In previous reports [7][8], we established that the affinity of the ZFP for a specific DNA target is larger than the sum of the affinities of the individual fingers (no linkers involved). This difference was attributed to the protein structure, including the linkers. We also reported that a ZFP consisting of F1F2, which lacks F3, is destabilized more than an F2F3 protein, which lacks F1, and that mutating F1 had the largest effect among the three fingers in reducing the ZFP affinity for DNA. This finding gives a special role to ZF1 and consequently to L1, followed in importance by ZF3.
The affinity of the ZF protein for its specific DNA target showed sensitivity to linker mutations. The binding energy of the ZFP to its DNA target results from complex interactions in the major and minor grooves, i.e. hydrogen bonding of amino acid side chains in the fingers with DNA bases (specific) and with backbone phosphates (nonspecific). The ZFP structure has a direct effect on binding to DNA: finger-finger and finger-DNA interactions are important factors contributing to hydrogen bonding and hence to the binding energy (affinity). Mutating the protein sequence and measuring the affinity for specific and nonspecific DNA targets gives insight into the mechanism of binding. The protein-DNA binding process starts with scanning the DNA in search of the specific site (low affinity for nonspecific binding); then, upon recognition of the target, specific binding takes place (ZF1 anchoring and both ZF2 and ZF3 bound, high affinity for specific binding). The affinity of the WT ZFP and its mutants for the nonspecific DNA target is reduced, indicating a shift to search mode [14].
Binding of the ZFP to both specific and nonspecific DNA targets showed sensitivity to mutations in the linkers between ZF1, ZF2 and ZF3. This finding confirms the role of linkers in tuning the zinc fingers to an optimum position [23]. The largest reductions in affinity for the specific DNA target were for K31D (residue 4 in L1) and E58Q (residue 3 in L2). The reduction in affinity is related to the ZF1 search and recognition process. These results give more insight into the energetic effect on ZF protein-DNA binding of mutating both the protein fingers and the DNA target [7][8].