Protein biochemistry remains the meat and potatoes of biological and medicinal research, and indeed of any discipline that harnesses purified proteins to understand interactions at the molecular level. With increased resolving power and detection sensitivity, similar experiments are now also routinely performed on partially purified protein mixtures isolated from cells or tissues. Regardless of the origin of the samples or the goals of the experiment, every biophysical method has inherent limits, as well as pros and cons, that need to be considered both when planning experiments and when interpreting the results.
In this mini-review, we briefly discuss techniques that we and others use to understand the structure, binding, and activity/function of purified proteins (either recombinantly expressed or native proteins isolated from cells). We focus on key scenarios that these methods can address, and on where they may fall short of delivering a holistic picture. We hope this discussion will help researchers decide which methods best match their specific research questions. We have in mind particularly interdisciplinary scientists and chemical biologists who are not specialists in structural biology or biophysics but nonetheless wish to exploit these tools. Indeed, biophysical techniques are particularly useful for getting up close and personal with proteins of specific interest. Such interest could arise from high-throughput screens, literature searches, or less formal routes such as conversations at scientific symposia. Either way, once we start to home in on a specific protein, some of our publications and key conclusions will likely hinge on correct biophysical characterization! This opinion piece, set out as the menu of a three-course dinner to engage a broader readership, is written in the hope that it can help researchers achieve this where applicable.
Protein structure – first orders
As our article focuses on methods to assess protein structure or structure-dependent function, we present, as an amuse-bouche, an overview of protein structure. Protein structure is a complex, multifactorial problem that cannot be described by a single parameter. This complexity is commonly captured by breaking structure down into four levels (Figure 1a):
1) primary structure, the specific sequence of amino acids that make up the protein;
2) secondary structure, local structure including β-sheets and helices (the most common of which is the α-helix, but others such as the π- and 3₁₀-helices exist);
3) tertiary structure, the overall fold of a single polypeptide chain, incorporating all secondary structural elements; and
4) quaternary structure, the convergence of multiple tertiary structures (e.g., oligomerization) as well as cofactor incorporation.
Unsurprisingly, different biophysical methods are suited to measuring different structural aspects; some go a stage further and focus on specific regions within a protein. We discuss this aspect in each section.
3. Methods to Determine Three-Dimensional Structure – Returning to the Fold for the Main Course
Having negotiated “the first course” and identified a protein, its modifications, and truncations, it is now important to start asking more specific questions. Indeed, although primary structure information is hugely important, protein folding (i.e., secondary structure and beyond) is ultimately the key that opens a window into the soul of one’s favorite protein. Several methods can answer these questions.
3.1. Crystallography – Peering into Infinity
Applicability: purified protein
Main structural aspect investigated: all structural elements and ligand associations
Other notes: little limit in size or other parameters
Is it quantitative? N/A
Crystallography is a venerable structural method with roots in the genesis of structural biology and structure-guided enzymology. David Chilton Phillips solved the structure of lysozyme in 1965, revealing several important aspects of how enzymes function. Ever since, protein structures have been regarded as a source of crucial insight into the inner workings of proteins. This insight has extended to ligand interactions and to the modes of action of inhibitors. With almost 60 years of work behind it, the method has produced a huge inventory of structures in the Protein Data Bank (PDB). This is a very useful resource for rationalizing interactions and for predicting the effects of mutations. For instance, when we identify an electrophile-sensitive cysteine, we typically inspect the crystal structure to assess its surface accessibility and surrounding residues. Of course, not all proteins have been crystallized, and even fewer have been crystallized with specific ligands bound. New additions to the structural armory have helped broaden the remit of our structural understanding, particularly AlphaFold [2], which predicts protein structure from primary sequence. Care should nonetheless be exercised when interpreting computed structures; common pitfalls of modern structure prediction tools have been nicely reviewed elsewhere. In our own work, we often use SWISS-MODEL for homology modeling, for instance of zebrafish proteins, for which experimental structures are rarely available [5].
In the context of empirical crystallographic data, several factors deserve closer attention. These are the conditions under which a given structure was obtained (pH, ionic strength, reducing agents, etc.), the presence or absence of ligands, the overall resolution, and the B-factors across the structure. The B-factor, often referred to as the temperature factor, reflects how well defined each atomic position is, capturing both thermal motion and static disorder. It should be relatively low for well-ordered atoms, but can become large for atoms whose positions are poorly defined. A similar logic applies to overall resolution: as resolution improves (i.e., a smaller value in Å), B-factors should generally decrease [6]. We illustrate this point by placing side by side crystal structures of glutathione-S-transferase at 3.00 Å and 1.97 Å resolution (Figure 2a); the B-factors of some atoms are clearly much larger in the former. Emerging frontiers of this method include observing chemical/enzymatic processes within crystals [7], although the rigid packing in crystals may limit the scope of such efforts [8].
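To make the B-factor comparison concrete, the sketch below averages atomic B-factors per residue from a legacy PDB-format file. It then flags residues that look unusually mobile or poorly ordered relative to the rest of the structure. It is a minimal sketch under stated assumptions. The function names, the normalization to the structure-wide median, and the 1.5x threshold are our own illustrative choices. Insertion codes are ignored, and structures distributed only in mmCIF format would need a different parser.

```python
from collections import defaultdict
from statistics import mean, median


def residue_b_factors(pdb_lines):
    """Mean B-factor per residue from PDB-format ATOM/HETATM records.

    Uses the fixed columns of the legacy PDB format (1-based): residue name 18-20,
    chain ID 22, residue number 23-26, B-factor (tempFactor) 61-66.
    Insertion codes (column 27) are ignored for simplicity.
    Returns {(chain, residue_number, residue_name): mean_B}.
    """
    per_residue = defaultdict(list)
    for line in pdb_lines:
        if not line.startswith(("ATOM  ", "HETATM")):
            continue  # skip REMARK, TER, HEADER and other records
        key = (line[21], int(line[22:26]), line[17:20].strip())
        per_residue[key].append(float(line[60:66]))
    return {key: mean(values) for key, values in per_residue.items()}


def flag_high_b(b_by_residue, factor=1.5):
    """Residues whose mean B exceeds `factor` times the structure-wide median.

    Normalizing to the median makes the cut-off relative rather than absolute;
    the 1.5x default is an arbitrary, illustrative choice.
    """
    cutoff = factor * median(b_by_residue.values())
    return sorted(key for key, b in b_by_residue.items() if b > cutoff)
```

For example, `flag_high_b(residue_b_factors(open("my_structure.pdb")))` lists candidate flexible or disordered residues (the file name is hypothetical). These can then be compared between structures solved at different resolutions, as in Figure 2a.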
3.2. Electron Microscopy (EM) – Is the Atmosphere Getting Electric?
Applicability: purified protein, or potentially mixtures
Main structural aspect investigated: all structural elements and ligand associations
Other notes: little limit in size or other parameters
Is it quantitative? specific particles in a sample can be classified and quantified
EM is becoming one of the most powerful structural methods. It can answer questions about gross structural arrangements, principally through negative stain EM, but it can also deliver high-resolution information (up to ~1.5 Å) through cryo-electron microscopy (cryo-EM).
Negative stain EM – It’s OK to focus on the negative
Negative stain EM is a robust technique that can give gross structural information, particularly on large proteins, complexes, and aggregates. Its resolution is relatively low (~10–20 Å), so it is not particularly useful for small proteins (< ~80 kDa), although some reports claim that smaller proteins can be visualized [9]. Negative stain EM is applicable to both homogeneous and relatively heterogeneous samples, and for the latter it is preferred over cryo-EM. In this way, negative stain EM can inform on conformational heterogeneity, for instance during enzymatic activation and the associated changes in conformational dynamics. We have used this method to classify conformationally distinct hexamers of ribonucleotide reductase subunit-α (RNR-α; monomer mass ~90 kDa). These hexamers are induced by different approved nucleotide therapeutics that target RNR-α (Figure 2b). We further supported these negative-stain-derived conformational assignments by demonstrating that the hexamers shaped by different drugs were differentially susceptible to protease digestion.
Cryo-EM – Putting things on ice to get a better perspective
Cryo-EM is a high-resolution method that can resolve structures in near-atomic detail. This is possible in part because, at very low temperatures, the damage caused by the electron beam (which is severe at room temperature) is limited. Cryo-EM typically requires relatively high structural homogeneity, although strategies to deal with heterogeneity are available [10].
3.3. NMR – Take Advantage of the Rough and Tumble
Applicability: purified proteins & either small-molecule or biomolecule-based ligands
Main structural aspect investigated: all structural elements and ligand associations
Other notes: large proteins are not amenable to this method
Is it quantitative? capable of providing a large amount of quantitative information
Solution NMR is a structural method that probes specific spin-active nuclei in a sample, for instance ¹H or ¹⁵N. Because the method is highly sensitive to the local environment of each nucleus, non-equivalent nuclei show up as distinct peaks in an NMR spectrum. Each peak sits at a defined position referred to as its chemical shift. In standard protein NMR measurements, ¹H and ¹⁵N are used in a two-dimensional experiment to separate peaks that would otherwise overlap in a one-dimensional spectrum. In this experiment, known as heteronuclear single quantum coherence (HSQC), each N–H group shows up as a peak in a two-dimensional grid. The technique thus reports on backbone amide N–H groups, as well as tryptophan indole N–H and asparagine/glutamine side-chain NH₂ groups (Figure 2c). Typical spectra span 6–11 ppm for ¹H and 105–140 ppm for ¹⁵N. Several of these groups have relatively characteristic chemical shifts that aid rapid assessment of spectral quality:
- glycine backbone N–H typically has relatively low ¹⁵N chemical shifts (~110 ppm);
- asparagine/glutamine NH₂ groups usually have low ¹H (~7.5 ppm) and ¹⁵N (~110 ppm) chemical shifts, and appear as two peaks (one per proton) sharing the same nitrogen shift;
- the tryptophan indole N–H usually has high ¹H (~10.5 ppm) and ¹⁵N (~130 ppm) chemical shifts.

It should be noted that ¹⁴N, the most abundant natural isotope of nitrogen, is unsuitable for this experiment. It has a nuclear spin of 1 and a quadrupole moment, which broadens its signals, whereas ¹⁵N (spin ½) is only ~0.4% abundant. It is thus necessary to grow the bacteria used for protein expression in minimal media containing heavy nitrogen (¹⁵N, typically supplied as ¹⁵N-labeled ammonium salts).
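The rules of thumb above can be turned into a quick first-pass screen of an HSQC peak list. The sketch below is hypothetical. The ppm windows are our own assumptions built around the typical values quoted in the text. The labels are tentative flags rather than assignments: real spectra overlap, and residues other than glycine can also have low ¹⁵N shifts. `label_hsqc_peaks` is not part of any NMR package.

```python
# Illustrative (1H, 15N) windows in ppm, centered on the typical values quoted in
# the text; these are assumptions for a first-pass screen, not assignment rules.
TRP_INDOLE = {"h": (9.8, 11.0), "n": (125.0, 135.0)}
SIDECHAIN_NH2 = {"h": (6.5, 8.0), "n": (106.0, 116.0)}
GLY_MAX_N = 112.0
SPECTRAL_WINDOW = {"h": (6.0, 11.0), "n": (105.0, 140.0)}


def _inside(value, window):
    low, high = window
    return low <= value <= high


def label_hsqc_peaks(peaks, n_tol=0.3):
    """Tentatively label each (1H ppm, 15N ppm) peak of a 1H-15N HSQC peak list.

    Asn/Gln side-chain NH2 groups give two peaks (one per proton) at nearly the
    same 15N shift, so a peak is flagged as NH2 only if such a partner exists.
    """
    labels = []
    for i, (h, n) in enumerate(peaks):
        if not (_inside(h, SPECTRAL_WINDOW["h"]) and _inside(n, SPECTRAL_WINDOW["n"])):
            labels.append("outside typical window")
            continue
        if _inside(h, TRP_INDOLE["h"]) and _inside(n, TRP_INDOLE["n"]):
            labels.append("Trp indole NH?")
            continue
        has_partner = any(
            j != i
            and abs(n2 - n) <= n_tol  # same nitrogen, within tolerance
            and abs(h2 - h) > 0.1  # but a distinct proton shift
            and _inside(h2, SIDECHAIN_NH2["h"])
            for j, (h2, n2) in enumerate(peaks)
        )
        in_nh2_region = _inside(h, SIDECHAIN_NH2["h"]) and _inside(n, SIDECHAIN_NH2["n"])
        if in_nh2_region and has_partner:
            labels.append("Asn/Gln NH2?")
        elif n <= GLY_MAX_N:
            labels.append("Gly backbone NH?")
        else:
            labels.append("backbone NH")
    return labels
```

Requiring a partner peak at the same ¹⁵N shift is the key design choice. It mirrors the "two protons on the same nitrogen" signature and keeps glycine backbone peaks in the same ¹⁵N region from being mislabeled as side-chain NH₂.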
Classically, one of the main limitations of NMR has been the size of the molecule that can be measured. This limitation traces back to molecular tumbling. Larger molecules tumble more slowly in solution, which speeds transverse relaxation and broadens the lines in the spectrum. With conventional experiments, this broadening becomes limiting for proteins larger than a few tens of kDa. High-field spectrometers and transverse relaxation-optimized spectroscopy (TROSY) pulse sequences limit this size-related line broadening. With their advent, NMR of proteins in excess of 50 kDa is now quite feasible, although very large proteins remain problematic.
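The size limitation can be rationalized with a back-of-the-envelope estimate of the rotational correlation time (τc), the characteristic time a molecule takes to tumble. The sketch below treats the protein as a rigid, hydrated sphere and applies the Stokes-Einstein-Debye relation, τc = 4πηr³/(3kBT). The partial specific volume (0.73 cm³/g), the 0.3 nm hydration shell, and the viscosity of water at 25 °C are typical values assumed for illustration. Real proteins also deviate from a spherical shape.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23
AVOGADRO_PER_MOL = 6.02214076e23


def tumbling_time_ns(mass_kda, temp_k=298.15, viscosity_pa_s=0.89e-3,
                     partial_specific_volume_cm3_g=0.73, hydration_nm=0.3):
    """Rotational correlation time (ns) of a rigid hydrated sphere.

    Stokes-Einstein-Debye: tau_c = 4*pi*eta*r^3 / (3*kB*T). The dry radius comes
    from the mass and an assumed partial specific volume; a uniform hydration
    shell of `hydration_nm` is then added.
    """
    molar_volume_m3 = mass_kda * 1e3 * partial_specific_volume_cm3_g * 1e-6
    volume_m3 = molar_volume_m3 / AVOGADRO_PER_MOL  # volume of one molecule
    radius_m = (3 * volume_m3 / (4 * math.pi)) ** (1 / 3) + hydration_nm * 1e-9
    tau_s = 4 * math.pi * viscosity_pa_s * radius_m ** 3 / (3 * BOLTZMANN_J_PER_K * temp_k)
    return tau_s * 1e9


for mass in (10, 25, 50, 100):
    print(f"{mass:>4} kDa: tau_c ~ {tumbling_time_ns(mass):.1f} ns")
```

Under these assumptions, τc comes out at roughly 5 ns for a 10 kDa protein and roughly 19 ns for a 50 kDa protein. Transverse relaxation, and hence line width, grows with τc, so a five-fold larger protein gives markedly broader lines. This is the problem that TROSY mitigates.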
NMR can give a large amount of both qualitative and quantitative structural information. In general, NMR spectra inform on protein folding: well-dispersed peaks usually indicate the complex, heterogeneous chemical environments provided by a folded protein structure. The folding states of different mutants can be compared by examining peak changes upon mutation. Because NMR is particularly sensitive to chemical environment, residues spatially close to the mutated site will generally shift, whereas residues distal to it should be unchanged. Such changes can be quantified using parameters such as minimal chemical shift perturbations. For each peak in one spectrum, this is the smallest shift to any peak in the other spectrum, so the peaks do not need to be assigned. A similar argument applies to ligand binding, where chemical shift perturbations occur only when an interaction takes place. There is indeed a huge range of programs designed specifically for NMR analysis, peak assignment, and the like [11].
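A minimal chemical shift perturbation analysis of the kind described above can be sketched in a few lines. We assume peak lists of (¹H, ¹⁵N) shifts in ppm and a combined shift Δδ = √(ΔδH² + (αΔδN)²). The ¹⁵N weighting α = 0.14 is an assumed, commonly used value; others are in use. The function names and the "mean plus one standard deviation" cut-off are hypothetical illustrations, and dedicated NMR software should be preferred for real analyses.

```python
import math


def combined_shift(d_h, d_n, alpha=0.14):
    """Weighted 1H/15N chemical shift difference in ppm."""
    return math.sqrt(d_h ** 2 + (alpha * d_n) ** 2)


def minimal_csp(reference, perturbed, alpha=0.14):
    """Minimal chemical shift perturbation for each reference peak.

    reference, perturbed: lists of (1H ppm, 15N ppm). Each reference peak is
    matched to its nearest peak in the perturbed spectrum, so no assignments
    are needed; the result is a lower bound on the true perturbation.
    """
    return [
        min(combined_shift(h2 - h1, n2 - n1, alpha) for h2, n2 in perturbed)
        for h1, n1 in reference
    ]


def significant(csps, n_sd=1.0):
    """Indices of peaks whose CSP exceeds mean + n_sd population standard deviations."""
    mu = sum(csps) / len(csps)
    sd = math.sqrt(sum((c - mu) ** 2 for c in csps) / len(csps))
    return [i for i, c in enumerate(csps) if c > mu + n_sd * sd]
```

Nearest-neighbor matching can underestimate a large shift if a peak moves onto another peak's position. The values are therefore best read as conservative, which suits a quick comparison between a wild-type protein and a mutant, or between a protein with and without ligand.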
Determining whether a protein is folded by NMR is more or less independent of assigning the peaks in its spectrum [12]. However, determining protein structure by NMR requires more detailed information, both in terms of peak assignment and the spatial relationships between the nuclei. Oftentimes, further dimensions are needed to better separate the peaks (usually using ¹³C). NMR experiments usually report on through-bond interactions (i.e., independent of spatial arrangement). However, NMR can also provide information on proximity by measuring through-space effects, using the nuclear Overhauser effect (nOe) [13]. Thus, NMR has all the properties necessary to solve protein structures. Such pursuits require a large amount of computational analysis, and often complex protein preparation procedures (e.g., isotope labeling), which can be technically difficult. Of course, numerous proteins already have solved NMR structures, including peak assignments, which simplifies matters considerably. One particularly striking aspect when comparing NMR structures of proteins with X-ray crystallographic data is that the former show many conformations of the protein (Figure 2d). This is because in solution the protein structure can “breathe”, whereas in the restricted environment of a crystal the protein is largely fixed in one or a small number of conformations.
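Because the nOe falls off steeply with distance (approximately as r⁻⁶ for an isolated pair of protons), relative nOe intensities can be converted into approximate distances. This is done by calibrating against a proton pair of known separation. The sketch below illustrates this isolated spin-pair approximation. It ignores spin diffusion and internal motion, which is one reason structure-calculation programs treat such values as loose distance ranges rather than exact distances. The function name and the 2.5 Å reference distance in the example are hypothetical.

```python
def noe_distance(intensity, ref_intensity, ref_distance_angstrom):
    """Distance estimate from an nOe intensity (isolated spin-pair approximation).

    nOe intensity scales as r^-6, so r = r_ref * (I_ref / I)^(1/6), calibrated
    against a proton pair of known separation r_ref.
    """
    if intensity <= 0 or ref_intensity <= 0:
        raise ValueError("intensities must be positive")
    return ref_distance_angstrom * (ref_intensity / intensity) ** (1 / 6)
```

The sixth-root dependence makes distances forgiving of intensity errors. A cross-peak 64 times weaker than a reference pair at 2.5 Å corresponds to about 5 Å (since 64^(1/6) = 2), and a 50% intensity error shifts the estimate by less than 10%.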
NMR is indeed a uniquely versatile method that lends itself to numerous specialized procedures that we cannot cover completely here. Heavy isotopes are used to make proteins visible by NMR. Beyond this, incorporating atoms not present endogenously, particularly fluorine (¹⁹F), offers ways to simplify spectra and potentially provides a probe that is highly responsive to its local environment. There are also numerous NMR approaches ideally suited to answer specific biological questions. Protein–ligand interactions and screening can be studied by several techniques, including saturation-transfer difference (STD) NMR [14], a technique based on the nOe. Moreover, proteins display a gamut of motions, ranging from simple bond rotations to larger conformational changes, and NMR is equipped to study many, if not most, of these dynamic processes [15].
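In an STD experiment, ligand signals are recorded with (on-resonance) and without (off-resonance) selective saturation of the protein. Ligand protons that sit close to the protein in the bound state receive more saturation. The sketch below computes three quantities:
- the fractional STD effect, (I₀ − I_sat)/I₀;
- the STD amplification factor (the fractional effect multiplied by the molar excess of ligand over protein);
- a simple epitope map normalized to the most strongly saturated proton.

The function names and the example intensities are hypothetical.

```python
def std_fraction(i_off, i_on):
    """Fractional STD effect (I0 - Isat) / I0 for one ligand signal."""
    return (i_off - i_on) / i_off


def std_amplification(i_off, i_on, ligand_conc, protein_conc):
    """STD amplification factor: fractional STD scaled by the ligand excess."""
    return std_fraction(i_off, i_on) * ligand_conc / protein_conc


def epitope_map(signals):
    """Relative STD effect (%) per ligand proton, normalized to the strongest one.

    signals: {proton_label: (I_off_resonance, I_on_resonance)}.
    """
    effects = {label: std_fraction(i0, isat) for label, (i0, isat) in signals.items()}
    strongest = max(effects.values())
    return {label: 100.0 * value / strongest for label, value in effects.items()}
```

For example, `epitope_map({"H1": (100.0, 90.0), "H2": (100.0, 95.0)})` gives H1 100% and H2 50%. This suggests that H1 sits closer to the protein surface in the bound state.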
3.4. Circular Dichroism – the Right Time to Split?
Applicability: purified protein
Main structural aspect investigated: secondary structure, stability, and ligand association
Other notes: no real limit in size, but high analyte concentration can affect low wavelength absorbance, limiting data acquisition
Is it quantitative? can give percentage of secondary structure, but this is a rough guide
Residue-specific information is typically critical for understanding, for instance, how mutations affect protein structure or how ligand binding occurs. However, in many instances a more global view of protein structure is sufficient, or perhaps even preferred. In such cases, circular dichroism (CD) can be particularly useful. This technique reports on protein secondary structure. Characteristic CD spectra for α-helices and β-sheets are well established, and they differ significantly from each other and from those of unfolded polypeptide chains. Based on these features, several programs can estimate secondary structural composition from CD spectra, although their output should be interpreted with care. One simple sanity check is to compare the estimates with published crystal (or, ideally, NMR) structures to see whether the secondary structural content is sensible.
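Comparing CD spectra across samples (e.g., mutants measured at different concentrations) first requires normalizing the raw signal. The sketch below converts ellipticity in millidegrees to mean residue ellipticity, [θ] = θ × MRW / (10 × l × c). Here MRW is the mean residue weight (molecular mass divided by the number of peptide bonds), l the path length in cm, and c the concentration in mg/mL. It then makes a crude two-state estimate of helical fraction from [θ] at 222 nm. This single-wavelength estimate is a deliberately simple stand-in and is not what dedicated deconvolution programs do. The reference values for fully helical (−36,000) and unfolded (+3,000 deg cm² dmol⁻¹) chains are assumptions, and published calibrations vary.

```python
def mean_residue_ellipticity(theta_mdeg, mass_da, n_residues, path_cm, conc_mg_ml):
    """Mean residue ellipticity in deg cm^2 dmol^-1 from a raw CD signal in mdeg."""
    mrw = mass_da / (n_residues - 1)  # mean residue weight: mass per peptide bond
    return theta_mdeg * mrw / (10 * path_cm * conc_mg_ml)


def helix_fraction_222(mre_222, helix_ref=-36000.0, coil_ref=3000.0):
    """Crude two-state helical fraction from [theta] at 222 nm (reference values assumed)."""
    fraction = (mre_222 - coil_ref) / (helix_ref - coil_ref)
    return min(max(fraction, 0.0), 1.0)  # clip to the physically meaningful range
```

As a usage example, consider a hypothetical 101-residue, 11 kDa protein at 0.2 mg/mL in a 1 mm cell giving −20 mdeg at 222 nm. This corresponds to [θ] ≈ −11,000 deg cm² dmol⁻¹, or roughly 36% helix under the assumed references. Such a number is most useful as a sanity check against known structures, as suggested above.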
We have regularly used CD spectra to compare gross structural similarities across recombinantly expressed mutants. We have also used CD to show that specific proteins undergo gross structural changes when treated with reactive electrophilic ligands. CD can also measure the thermal stability of proteins. This application is useful for assessing mutant proteins, which ideally should be about as stable as the wild-type protein. Ligand binding can be measured in the same way. Binding generally stabilizes the bound protein, typically shifting its melting temperature by an amount related to the Gibbs free energy of binding: the higher the affinity, the larger the thermal stabilization [12]. Assays that use fluorescent-dye binding to assess protein stability can give similar information and are amenable to high-throughput experimentation [16]. However, CD remains a preferred method when relatively few mutants or conditions are to be investigated.
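Thermal melts followed by CD (commonly at 222 nm) can be reduced to a melting temperature (Tm), and ligand-induced stabilization to ΔTm = Tm(ligand) − Tm(apo). The sketch below assumes a single cooperative two-state transition with flat pre- and post-transition baselines. It takes the first and last points as the folded and unfolded signals, and finds Tm by linear interpolation where the unfolded fraction crosses 0.5. A fuller analysis would instead fit sloping baselines and a thermodynamic model. The function names are hypothetical.

```python
def melting_temperature(temps, signal):
    """Tm (same units as temps) where the normalized unfolded fraction crosses 0.5.

    Assumes a two-state transition with flat baselines: signal[0] is taken as
    the folded signal and signal[-1] as the unfolded signal.
    """
    folded, unfolded = signal[0], signal[-1]
    if folded == unfolded:
        raise ValueError("no change in signal; cannot normalize")
    frac = [(s - folded) / (unfolded - folded) for s in signal]
    for i in range(len(temps) - 1):
        f1, f2 = frac[i], frac[i + 1]
        if f1 != f2 and (f1 - 0.5) * (f2 - 0.5) <= 0:
            t1, t2 = temps[i], temps[i + 1]
            return t1 + (0.5 - f1) * (t2 - t1) / (f2 - f1)  # linear interpolation
    raise ValueError("unfolded fraction never crosses 0.5")


def delta_tm(temps, apo_signal, ligand_signal):
    """Ligand-induced thermal shift; positive values indicate stabilization."""
    return melting_temperature(temps, ligand_signal) - melting_temperature(temps, apo_signal)
```

Normalizing to the end points makes the estimate independent of the sign and magnitude of the CD signal. This also lets the same function be reused for comparing mutants with the wild-type protein.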
3.5. Small Angle X-ray Scattering (SAXS)
Applicability: purified protein (ideally free of aggregates, and contaminants)
Main structural aspect investigated: protein aggregation, size, shape, and ligand binding (relatively low resolution)
Other notes: no real limit in size
Is it quantitative? can provide kinetic information on structural changes
SAXS is a solution method that can inform on relatively large structural perturbations, transitions, and polydispersity in macromolecules. In many ways it is thus complementary to crystallography and other methods that offer more detailed structural information, and in several instances it has been used in conjunction with them. SAXS provides several parameters that correlate with important physical properties in solution. These include the radius of gyration (Rg), which relates to the overall size of the molecule in solution, the molecular weight, and the maximum dimension (Dmax). These parameters can be compared with those calculated for specific proteins to understand aspects such as folding and flexibility. Advances in data collection and analysis have made SAXS experiments considerably faster and broadened their range of uses [17,18]. Nonetheless, SAXS requires an intense X-ray beam and is usually performed at specialist synchrotron facilities.
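As an example of how Rg is extracted from SAXS data, the sketch below applies the Guinier approximation, ln I(q) ≈ ln I(0) − q²Rg²/3. This holds at low scattering angles, conventionally q·Rg < 1.3 for globular particles. It is a minimal, iterative least-squares sketch under those assumptions. Dedicated SAXS packages also handle buffer subtraction, error weighting, and automatic choice of the fitting range. An upward curvature at the lowest q (reported here as a non-negative slope) is a common sign of aggregation. The function names are hypothetical.

```python
import math


def _line_fit(xs, ys):
    """Ordinary least-squares slope and intercept of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx


def guinier_fit(q, intensity, qrg_max=1.3, iterations=10):
    """Estimate Rg and I(0) from ln I(q) = ln I(0) - (Rg^2 / 3) q^2.

    q and Rg share length units (e.g. 1/Angstrom and Angstrom). Starting from a
    fit to all points, the fit is repeated using only points with q*Rg < qrg_max,
    the range over which the Guinier approximation holds for compact particles.
    """
    points = [(qi, ii) for qi, ii in zip(q, intensity) if ii > 0]
    rg = None
    for _ in range(iterations):
        kept = points if rg is None else [(qi, ii) for qi, ii in points if qi * rg < qrg_max]
        if len(kept) < 3:
            raise ValueError("too few points in the Guinier region")
        slope, intercept = _line_fit([qi ** 2 for qi, _ in kept],
                                     [math.log(ii) for _, ii in kept])
        if slope >= 0:
            raise ValueError("non-negative Guinier slope: check for aggregation")
        rg = math.sqrt(-3 * slope)
    return rg, math.exp(intercept)
```

For data on an absolute or calibrated scale, I(0) scales with molecular weight. The same fit therefore also feeds into the molecular weight estimate mentioned above.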