3.2. Structural and Physicochemical Characterization of LDM Molecules
A computational structural analysis of the maximum distance between the reactive centers was carried out for each molecule to define the reaction space available once the first moiety is attached to the protein (Table 1). Distances range from 12.5 Å (molecule 6) to 21.5 Å (molecule 9), ensuring a wide range of contact possibilities on a typical protein surface (Table 1).
In addition, key molecular properties, such as solubility in water, were estimated with the dedicated software Chemicalize (Table 2). The predicted water solubility indicated that the aromatic ring in the spacer, present in molecules 1-4, lowers the solubility of the LDM molecule to less than 0.2 mg/mL. Even without the aromatic ring, increasing the number of methylene groups above 4 resulted in a drop in water solubility (molecules 8 and 9). Solubility was then evaluated experimentally by preparing 1 mM stock solutions of the molecules in water supplemented with 1% DMSO and exploiting the spectroscopic features of the molecules, since the benzaldehyde group exhibits absorption maxima at 363 and 281 nm and fluorescence emission at 490 nm (Figure S1). These signals provide a straightforward method to determine the concentration and solubility of the LDM molecules (Figure S2).
Experimental data confirmed the solubility trend, since only molecules with a predicted solubility equal to or greater than 0.5 mg/mL were experimentally soluble as 1 mM stock solutions with 1% DMSO (Table 2). The pKa values were also estimated computationally and were 8.91 for all molecules.
To mimic the reactivity of amine side chains in a protein, the LDM molecules were reacted with NAL, an amino acid derivative in which only the side-chain Nε is available for reaction with an aldehyde group, which reproduces the reaction with a lysine residue within a protein sequence (Scheme S1). The reaction of the LDM molecules with NAL caused spectral changes: difference spectra relative to the unreacted molecules showed variations in the UV-Vis range, with maximum differences at 300 and 378 nm (spectra for molecule 5 are reported as an example in Figure 3a). The time course of the absorbance signal at 300 nm shown in Figure 3b was fitted to an exponential equation, yielding a half-life of 4.87 ± 0.29 h. This suggests that the reaction can take place within a time frame typical of protein sensor functionalization protocols.
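For a single-exponential time course, the half-life and the observed rate constant are related by t1/2 = ln 2 / k. The minimal Python sketch below assumes a mono-exponential approach to a plateau, A(t) = A_inf + (A_0 - A_inf)·exp(-k·t); the exact equation used for the fit is not stated here, and the function names are illustrative only.

```python
import math

def rate_constant_from_half_life(t_half_h: float) -> float:
    """Observed first-order rate constant (1/h) from a half-life in hours."""
    return math.log(2) / t_half_h

def fraction_reacted(t_h: float, t_half_h: float) -> float:
    """Fraction of the total signal change reached after t_h hours,
    assuming a single-exponential approach to the plateau."""
    k = rate_constant_from_half_life(t_half_h)
    return 1.0 - math.exp(-k * t_h)

# Half-life reported for the reaction with NAL (4.87 h)
k_obs = rate_constant_from_half_life(4.87)   # about 0.14 per hour
done_24h = fraction_reacted(24.0, 4.87)      # about 0.97 of the plateau
```

Under this assumption, an overnight incubation would bring the reaction close to completion.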
3.3. Structure-Based Analysis of GFP Reaction with the LDM Molecules
After characterizing the LDM molecules and their reactivity towards the target amino acid under protein-compatible conditions, we evaluated the application of this chemistry to a more complex system, selecting GFP as a model sensor protein. GFP and its engineered variants are widely employed as fluorescent biosensors, and a broad range of GFP variants has been developed to modulate spectral properties and confer sensitivity to environmental parameters such as pH, redox state, and ligand binding [17]. In this context, we considered the introduction of covalent modifications at defined positions on GFP variants with different sensing capabilities to be a valuable objective.
A necessary condition for the reaction to occur is that the distance between the two reactive centres on the LDM molecule is compatible, in terms of length and accessibility, with both reactions taking place on the protein. The nature of the spacer is therefore a determinant of this chemical modification approach, which is proposed to reach different single sites on the protein.
When a protein sensor is chemically modified for labelling or for anchoring to a solid support or surface, detailed structural information on the protein is required to identify which surface regions or residues can best tolerate a chemical modification without altering the sensing function. To test the potential of the LDM approach in these terms, we evaluated whether a structure-based analysis of the protein reactivity landscape can predict and direct chemical modification to a desired amino acid side chain by modulating the linker length. The proposed structure-based computational method is designed to compare GFP inter-residue distances with the spacer length of the LDM molecules.
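The inter-residue distances underlying such a comparison can be illustrated with a minimal plain-Python sketch. It is a stand-in for illustration, not the tool used in this work: it reads fixed-column PDB ATOM records, ignores chain identifiers and alternate conformations, and takes the lysine NZ atom and the histidine ND1/NE2 ring nitrogens as the reactive atoms (an assumption). The function names are hypothetical.

```python
import math

def parse_atoms(pdb_text, wanted):
    """Collect coordinates of selected atoms from PDB ATOM records.
    `wanted` maps residue name -> set of atom names to keep.
    Chain IDs and alternate locations are ignored in this sketch."""
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        atom_name = line[12:16].strip()
        res_name = line[17:20].strip()
        if atom_name in wanted.get(res_name, ()):
            res_id = f"{res_name}{int(line[22:26])}"
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            atoms.append((res_id, atom_name, xyz))
    return atoms

def lys_his_distances(pdb_text):
    """Shortest Lys NZ to His ring-nitrogen distance (in Å) per Lys-His pair."""
    atoms = parse_atoms(pdb_text, {"LYS": {"NZ"}, "HIS": {"ND1", "NE2"}})
    lys = [a for a in atoms if a[0].startswith("LYS")]
    his = [a for a in atoms if a[0].startswith("HIS")]
    best = {}
    for k_id, _, k_xyz in lys:
        for h_id, _, h_xyz in his:
            d = math.dist(k_xyz, h_xyz)
            key = (k_id, h_id)
            # Keep the closer of the two ring nitrogens for each pair
            best[key] = min(d, best.get(key, d))
    return best
```

Calling `lys_his_distances(open("model.pdb").read())` on a coordinate file (the file name is a placeholder) returns a dictionary keyed by residue pair, which can then be compared with the span of each LDM molecule.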
The NCONT module of the CCP4 suite enabled the calculation of all distances between selected histidines and lysines of the protein in a single procedure. We focused our comparison on molecules 5, 6 and 7, in view of the subsequent experimental validation, for two reasons: a) these molecules are the most soluble, with comparable solubility limits, thus avoiding any effect of partial solubilization on modification yield; b) they differ by only a minimal structural variation, i.e., the removal (molecule 6) or addition (molecule 7) of a single methylene unit relative to molecule 5 (Table 1). The resulting data were visualized as a heatmap (Figure S3), which allows direct comparison between inter-residue distances and the spacer lengths of LDM molecules. This graphical representation allowed rapid identification of residue pairs whose distances fell within the range accessible to the two reactive groups of the LDM molecules, while excluding residue pairs located too close or too far apart relative to the length of the molecules. The final output of this structure-based analysis is a set of residue/distance combinations, which were compared with the maximum distance for each LDM molecule. The distance analysis feeds a compatibility score, which was modelled to reflect the hypothesized conformational behavior of the linchpin reagent: it reaches a maximum when the inter-residue distance matches the optimized molecular span and progressively decreases for shorter distances. A shallow decay was applied within the first ~2 Å to account for limited conformational flexibility, which is likely given the rotational freedom of the spacers, followed by a steeper decline beyond this range to penalize increasing molecular distortion, with the score approaching zero at ~7 Å below the optimal distance (Figure S4). Distances exceeding the optimized span were assigned a score of zero, as such geometries are not physically accessible. The matching of the distances between reactive groups in LDM molecules with those of Lys-His pairs is reported in Figure 4, and the differences between these distances were used to calculate the compatibility score (from 1, green, to 0, red).
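The shape of the compatibility score described above can be sketched as a piecewise function. The exact functional form is given only graphically (Figure S4), so the version below is an assumption: it uses two linear segments, and the score value at the 2 Å knee (`score_at_knee`, here 0.9) is a hypothetical parameter chosen for illustration.

```python
def compatibility_score(pair_distance, span, knee=2.0, cutoff=7.0,
                        score_at_knee=0.9):
    """Score in [0, 1] for a residue pair at `pair_distance` (Å) and a
    reagent with optimized span `span` (Å).
    Piecewise-linear form and `score_at_knee` are illustrative assumptions."""
    shortfall = span - pair_distance
    # Pairs farther apart than the span cannot be bridged; pairs more than
    # `cutoff` Å shorter than the span are treated as too distorted.
    if shortfall < 0 or shortfall >= cutoff:
        return 0.0
    if shortfall <= knee:
        # Shallow decay: limited conformational adjustment of the spacer
        return 1.0 - (1.0 - score_at_knee) * shortfall / knee
    # Steeper decline down to zero at the cutoff
    return score_at_knee * (cutoff - shortfall) / (cutoff - knee)

# Example with the 12.5 Å span reported for molecule 6 and a hypothetical
# Lys-His pair 11.0 Å apart (shortfall of 1.5 Å, within the shallow region)
example = compatibility_score(11.0, 12.5)
```

Any smooth curve with the same landmarks (maximum at zero shortfall, shallow region of about 2 Å, zero at about 7 Å) would serve the same purpose; the linear segments are chosen only for transparency.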
The analysis shows that the compatibility score pattern differs markedly for each molecule, consistent with the idea that, if the reaction is mainly governed by distance, a single methylene group can be sufficient to direct the reaction to different sites, even though not to a unique one. Moreover, structural analysis suggests that several of these residue pairs are unlikely to engage in plausible interactions. Although they are spatially close, they are separated by intervening regions of the protein, preventing any feasible linker-mediated reaction. These pairs (in grey in Figure 4, and in red in Figure S3, panel B) were excluded from further consideration. However, the reactivity of both amino acid side chains cannot be taken for granted in all cases, since steric hindrance, local polarity, protonation state, etc., can affect reactivity in addition to geometric distance. An assessment of the intrinsic reactivity of lysine and histidine residues, as influenced by the surrounding environment, was therefore included in the computational workflow to evaluate the reactivity landscape. The results returned by the propKa tool [15] (Table S2) showed that the predicted pKa of lysines ranged from 10.5 (reference standard value) to 9.0. Since the reaction is carried out at pH 8.0, as in most protein chemical-modification protocols, all lysine residues are expected to have only a very small fraction in the unprotonated, nucleophilic form. Consequently, the observed differences should have only a marginal effect on reactivity toward the hydroxybenzaldehyde group, resulting in a slow but kinetically homogeneous multiderivatization process. On the other hand, predicted histidine pKa values showed larger variations. His25, His77, His81, and His217 show pKa values between 6.0 and 7.4 (Table S2). These values are not far from the canonical pKa of a free histidine (~6.0). In contrast, the remaining residues (His148, His16 and His181) are estimated to have pKa values of 4.0 or lower, suggesting a state in which the imidazole group is completely unprotonated and therefore more prone to react with the epoxide group of LDM molecules.
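The protonation argument can be made quantitative with the Henderson-Hasselbalch relation, which gives the unprotonated fraction of a titratable group as 1 / (1 + 10^(pKa - pH)). The sketch below treats each side chain as an isolated, ideal titratable site (an assumption) and uses the pKa limits quoted in the text; the function name is illustrative.

```python
def unprotonated_fraction(pka: float, ph: float) -> float:
    """Fraction of a titratable group in its unprotonated form at a given pH
    (Henderson-Hasselbalch, single ideal site)."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))

REACTION_PH = 8.0

# Lysine side chains: limits of the predicted pKa range
lys_high = unprotonated_fraction(10.5, REACTION_PH)  # about 0.003
lys_low = unprotonated_fraction(9.0, REACTION_PH)    # about 0.09

# Histidine side chains: low-pKa group versus the upper limit of the others
his_low = unprotonated_fraction(4.0, REACTION_PH)    # about 0.9999
his_high = unprotonated_fraction(7.4, REACTION_PH)   # about 0.80
```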
3.4. Experimental Validation of GFP Reactivity with LDM Molecules
GFPmut2 was overexpressed and purified by affinity chromatography in a fast and high-yield procedure (Figure S5). The absorption spectrum of the protein at pH 8.0 shows two peaks, one centered at 278 nm, accounting for the aromatic residues, and one centered at 485 nm, attributed to the deprotonated form of the chromophore, which has a pKa of about 6.0 [18]. The reactivity of GFP with LDM molecules was then tested. As with the free amino acid derivative, the conjugation of GFP lysine residues can be monitored by exploiting the intrinsic spectroscopic properties of LDM molecules. The results would clarify whether adding or removing one methylene in the spacer is sufficient to direct the final conjugation to different His residues, or whether larger differences or improved bond rigidity are necessary.
Notably, it is not possible to discriminate whether the reaction has reached Intermediate 1 (benzaldehyde bound to lysines) or has already proceeded to Intermediate 2 (Scheme 2), forming a bridge adduct with a histidine residue, because these two intermediates are not expected to differ spectroscopically: the epoxide reaction does not directly perturb the benzaldehyde chromophoric core. Superimposition of the difference spectra with those obtained in the presence of NAL confirmed that the observed spectral changes correspond to the conjugation of lysines, as the overall profiles were highly consistent and shared the characteristic band centered at ~300 nm (Figure 5a).
The spectroscopic properties of the LDM molecules allowed real-time tracking of the reaction and estimation of kinetic parameters. The half-life, determined by fitting the absorbance time profile at 302 nm with an exponential equation (inset of Figure 5a), was estimated to be 4.2 ± 0.12 h, a value comparable to that obtained for the free amino acid derivative NAL, which suggests a high and homogeneous reactivity of the lysine residues on the protein. Moreover, the reaction does not seem to perturb the protein structure, as demonstrated by the overlap of the absorption spectra in the visible region before and after the reaction. The spectroscopic features of LDM molecules were also helpful in determining the DOL, defined as the ratio of probe concentration to total protein fraction.
The DOL for molecules 5, 6 and 7 was comparable, with an average value of 8.33 ± 0.59, confirming that multiderivatization occurs, in agreement with the number of available lysine residues on GFP. The subsequent displacement reaction with hydroxylamine decreased the number of anchored LDM molecules, in agreement with the mechanism in Scheme 2, and led to the formation of Intermediate 3. For all molecules, the final DOL upon displacement is around 1, supporting a single derivatization for each molecule (Figure 5b).
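A DOL of this kind is commonly obtained from two absorbance readings through the Beer-Lambert law. The sketch below is a generic illustration under stated assumptions, not the procedure used here: it assumes that probe and protein can each be quantified at a wavelength free of spectral overlap, and every absorbance and extinction coefficient in the example is a placeholder rather than a measured value.

```python
def concentration(absorbance: float, epsilon: float, path_cm: float = 1.0) -> float:
    """Molar concentration from the Beer-Lambert law, c = A / (epsilon * l)."""
    return absorbance / (epsilon * path_cm)

def degree_of_labeling(a_probe: float, eps_probe: float,
                       a_protein: float, eps_protein: float,
                       path_cm: float = 1.0) -> float:
    """Ratio of probe concentration to protein concentration.
    Assumes no spectral overlap between the two readings."""
    c_probe = concentration(a_probe, eps_probe, path_cm)
    c_protein = concentration(a_protein, eps_protein, path_cm)
    return c_probe / c_protein

# Placeholder numbers only (not taken from this work)
dol_example = degree_of_labeling(a_probe=0.2, eps_probe=10_000,
                                 a_protein=0.5, eps_protein=50_000)
```

With overlapping bands, as may occur near 280 nm, a correction term for the probe contribution to the protein reading would be needed.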
Since the epoxide reaction is spectroscopically silent, we confirmed the formation of Intermediate 3 (Scheme 2) by MS using molecule 4. Upon derivatization, the GFP sample was analyzed as an intact protein, and a peak at 27740.77 m/z appeared in addition to the 27276.57 m/z peak of the unmodified GFP. The mass difference corresponds exactly to the Intermediate 3 adduct of molecule 4 with one histidine (464.20 g/mol).
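The mass assignment can be checked with simple arithmetic. The short sketch below uses only the values quoted above and treats the reported peak positions as deconvoluted masses (an assumption); the tolerance is an arbitrary illustrative choice.

```python
unmodified_peak = 27276.57   # unmodified GFP
modified_peak = 27740.77     # peak appearing after derivatization
expected_adduct = 464.20     # Intermediate 3 adduct of molecule 4

# Observed mass shift, rounded to the precision of the reported values
observed_shift = round(modified_peak - unmodified_peak, 2)

tolerance = 0.05             # illustrative tolerance, not an instrument spec
matches = abs(observed_shift - expected_adduct) <= tolerance
```

Here `observed_shift` evaluates to 464.2, so `matches` is true for a single adduct; two adducts would correspond to a shift of about 928.4.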
The observation that the initial multiderivatization step led to the formation of Intermediate 3 with a DOL of approximately 1 suggests that the reaction proceeds according to the proposed mechanism (Scheme 2). However, at this point it was not clear whether a single histidine was modified or a distribution of monoderivatized GFP molecules was formed, nor which residue (or residues) was involved. This information was fundamental to answering our starting question, i.e., whether it is possible to predict and direct the covalent modification to a specific amino acid, thus allowing site-selective labeling and anchoring of a protein biosensor without the need for site-directed mutagenesis.
To determine the modification site, GFPmut2 derivatized with molecules 5, 6 and 7 and subjected to the subsequent displacement was digested and analyzed by MS/MS. The mass spectrometry analysis showed that His181 was the only modified residue upon displacement, with no evidence of Lys adducts, as expected from the reaction mechanism (Scheme 2).
The experimental results were then evaluated structurally to explain the observed selectivity. First, we observed that His181 belongs to the histidine residues with low pKa and is hence completely deprotonated at the reaction pH (Table S2). The labeling site is compatible with the structural computational evaluation (compatibility score between 0.28 and 0.91) and is among the most reactive in terms of pKa. However, this site is the same for all LDM molecules, suggesting that the spacer was not relevant in the selection of the conjugation site. Indeed, structural evaluation of the modified histidine revealed that His181 adopts a conformation in which the imidazole of the side chain is rotated towards the interior of the barrel, occupying a solvated, accessible position, though not on the protein surface (Figure 6). Based on structural analysis, a role for Lys166 in promoting histidine conjugation is hypothesized: it is the only available residue according to the distance and steric hindrance evaluation (Figure S3).
The LDM reagent is thus kept in close proximity to the imidazole ring, limiting conformational freedom and increasing the local concentration of the reactive group compared with freely exposed histidines. Within the reaction time window, linchpin anchoring through a lysine residue is therefore expected to accelerate the epoxide reaction with the imidazole, a process that would otherwise require days to weeks in solution [19]. This structural context, together with the reactivity of the imidazole, which is surrounded by water molecules and available for proton exchange inside the sensor [20], makes this residue particularly suitable for derivatization.
These results highlight an alternative paradigm for biosensors, and more generally for protein bioconjugation: usually covalent modification (whether stry allows modification on internal yet accessible residues. This cavity-directed anchoring does not rely on a specific binding affinity, as observed for ligands targeting enzyme active sites, but rather on reversible interactions with multiple lysine residues on the protein surface, which increase its local concentration, thereby making the subsequent reaction kinetically accessible.
The comparable reactivity observed for LDM molecules endowed with spacers of different lengths indicates that, when accessible and reactive internal residues of this kind are available, spacer architecture is not the primary determinant of site-selectivity. This contrasts with conventional lysine-directed modification strategies, in which spacer geometry often directs the final conjugation to different sites. Instead, our results suggest that the protein scaffold can guide a geometrically constrained funneling effect that directs the reactive group toward a unique, accessible site.
The chemical modification did not perturb the protonation equilibrium of the chromophore: pH titration of the modified protein yielded an estimated pKa of 5.9 ± 0.4 (Figure S6), in full agreement with literature data for the unmodified protein [21]. To our knowledge, His181 is not among the residues that have been mutated to generate variants with different spectroscopic features, which extends the applicability of this chemistry to the series of sensors based on this protein scaffold.
These findings on GFP protein sensors expand the scope of LDM-type reagents, demonstrating that they can operate in both surface-directed and cavity-directed modes, and that their ultimate selectivity can be dictated by protein structure and surface landscape rather than solely by the spacer design. This alternative mechanism offers a strategy to access cryptic, pocket-located functionalization sites that are not addressable through conventional surface-based approaches, and may be a valuable alternative for obtaining protein sensor modifications that are better protected against degradation reactions that can impair sensor transduction and function.