Open-Bundle Structure as the Unfolding Intermediate of Cytochrome c' Revealed by Small-Angle Neutron Scattering

The open-bundle structure of cytochrome c' (Cyt c') as an unfolding intermediate was determined by small-angle neutron scattering (SANS). The compactly folded four-α-helix bundle of Cyt c' at neutral or mildly alkaline pD (radius of gyration, Rg = 18 Å for the Cyt c' dimer) transitioned at pD ~13 to a markedly larger "open-bundle" structure (Rg = 25 Å for the Cyt c' monomer), which can be described as joint clubs: four clubs (the α-helices) connected by short loops. At pD 1.7, Cyt c' adopts an unstructured random-coil conformation (Rg = 25 Å for the Cyt c' monomer). Numerical scattering-function analysis with the joint-clubs model and ab initio modelling both gave structures consistent with the "open-bundle" model, which retains the α-helices but loses the bundle packing.


Introduction
Protein molecules fold into distinctive structures that allow them to perform their specific roles in biological systems. The mechanisms of protein folding and unfolding have been studied for a long time, because understanding them is key to predicting three-dimensional protein structures from amino acid sequences, with applications in the development of medicinal drugs and functional cosmetics.
Anfinsen showed that most small proteins fold spontaneously into their specific functional structures [1,2]. Many protein folding/unfolding experiments have since been performed to reveal the mechanisms of protein self-assembly, and several general mechanisms have been proposed. The "energy landscape" model is a widely accepted hypothesis for describing protein folding pathways. In this model, the polypeptide explores energetically downhill conformational arrangements along the potential energy surface, leading to the native structure, while many non-native structures can become trapped in local minima. A structurally ill-defined non-native state, the molten globule, is presumed to be an intermediate of the folding/unfolding reaction [3-7]. Numerous kinetic studies have probed the structures and properties of transient intermediate states [8,9] using far-UV CD, FT-IR, fluorescence, and NMR spectroscopy, as well as small-angle X-ray/neutron scattering and dynamic light scattering [7,10-16]. Englander et al. reported that the protein domains of cytochrome c and RNase are sequentially stabilized and folded in an orderly manner [17-19].
Together with experimental studies, molecular dynamics (MD) simulation is a useful technique for understanding protein folding/unfolding mechanisms [20]. An MD simulation of lysozyme unfolding showed that unfolding is triggered by the loss of hydrophobic contacts at the inter-domain surface of the enzyme [21]. Lindorff-Larsen et al. reported that protein folding first occurs locally, followed by stabilization of the local structural elements through intramolecular interactions that further promote the folding process [22].
Structural characterization of folding/unfolding intermediates is key to understanding the principles underlying protein structure formation. Small-angle X-ray scattering (SAXS) and small-angle neutron scattering (SANS) provide information on the flexibility, size, and shape of protein molecules in solution [23,24]. The unfolding intermediate of lysozyme was previously characterized using SAXS [16]: unfolding is triggered by extension of the β-domain, while the α-domain remains folded. Synchrotron SAXS provides data more rapidly than SANS, and combining SAXS with flow cells [25] or size-exclusion chromatography [26] reduces both radiation damage to the protein and aggregate formation.
SANS is particularly useful for determining the solution structures of proteins without radiation damage: the energy of the neutrons used in SANS experiments is on the order of meV, comparable to photon energies in the far-infrared region. Recent progress in the ab initio analysis of small-angle scattering data has made it possible to build low-resolution structural models that represent the overall shape of a protein in solution [11,27,28].
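As a rough check of the energy scale quoted above, the kinetic energy of a neutron follows from its de Broglie wavelength as E = h²/(2 m_n λ²). The sketch below evaluates this for the 5 Å wavelength used in the present measurements (KWS-1, see SANS measurements) and converts it to the wavenumber of a photon with the same energy. The constants are CODATA values hard-coded so that the example has no dependencies; the function names are illustrative only.

```python
# CODATA 2018 constants in SI units, hard-coded so the sketch has no dependencies
H = 6.62607015e-34        # Planck constant, J s
M_N = 1.67492749804e-27   # neutron rest mass, kg
EV = 1.602176634e-19      # joules per electronvolt
C_CM = 2.99792458e10      # speed of light, cm/s


def neutron_energy_mev(wavelength_angstrom: float) -> float:
    """Kinetic energy E = h^2 / (2 m_n lambda^2) of a neutron, in meV."""
    lam_m = wavelength_angstrom * 1e-10
    energy_j = H**2 / (2.0 * M_N * lam_m**2)
    return energy_j / EV * 1e3


def photon_wavenumber_cm1(energy_mev: float) -> float:
    """Wavenumber (cm^-1) of a photon carrying the same energy, E / (h c)."""
    return energy_mev * 1e-3 * EV / (H * C_CM)


e_mev = neutron_energy_mev(5.0)
print(f"5 Å neutron: {e_mev:.2f} meV, equivalent to {photon_wavenumber_cm1(e_mev):.0f} cm^-1")
# prints: 5 Å neutron: 3.27 meV, equivalent to 26 cm^-1
```

A few meV corresponds to the far-infrared (terahertz) range, several orders of magnitude below the ~10 keV photon energies typical of synchrotron SAXS.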
Cytochrome c' (Cyt c') is a member of the c-type cytochrome family. Unlike cytochrome c, Cyt c' adopts a four-α-helix bundle structure, with a heme prosthetic group embedded in the C-terminal region (Figure 1). Cyt c' undergoes a unique pH-dependent spin-state transition. Between pH 3 and 7, the spin state is likely a quantum-mechanical admixture of an intermediate-spin (IS: S = 3/2) and a high-spin (HS: S = 5/2) state. The transition to the HS state under alkaline conditions (8 < pH < 12) is triggered by breakage of the inter-helix hydrogen-bonding linkage between helix C and helix D [29-32]. At pH ~13, Cyt c' was suggested to adopt a six-coordinate low-spin (LS) state with either His120 imidazole/OH⁻ or two OH⁻ ions as the axial ligands, resulting in an "open-bundle" structure, as reported previously [33]. Very recently, AlphaFold, an artificial-intelligence program developed by Google DeepMind, succeeded in predicting three-dimensional protein structures from amino acid sequences [34]. Experimental determination of the solution structures of unfolded proteins is therefore important for verifying such computationally predicted structures.
With these points in mind, we determined the SANS solution structures of Cyt c' under different pD conditions to better understand the mechanism underlying its alkaline structural transition. This is the first report of the structure of the alkaline transition intermediate of Cyt c' at pD ~13. The solution structures of Cyt c' at pD 6.4, 9.6, and ~13 provide insight into protein folding/unfolding mechanisms and their relevance to the alkaline spin-state transition of Cyt c'.

Sample preparation for the SANS experiments
Cyt c' was extracted from Alcaligenes xylosoxidans and purified by cation-exchange chromatography (CM-Sephadex C-50, GE Healthcare) and size-exclusion chromatography (HiLoad Superdex 200 pg, GE Healthcare), followed by recrystallization according to a previously reported method [33]. The purity of Cyt c' was checked by sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE; Figure S1).
Samples for the SANS experiments were prepared at pD 1.7, 6.4, 9.6, and ~13 in 20 mM phosphate buffer. The buffer solutions were prepared using D2O, KD2PO4, K2DPO4, DCl, and NaOD; the exchangeable hydrogen atoms of potassium dihydrogen phosphate and dipotassium hydrogen phosphate were replaced with deuterium by repeated cycles of D2O addition and evaporation. The buffers for the pD 1.7 and ~13 samples were prepared by adding small amounts of DCl and NaOD, respectively. A small volume of concentrated Cyt c' D2O solution was added to the D2O buffer at the target pD (pD-jump method). The concentrations of Cyt c' were 5.5 mg/mL at pD 1.7 and 6.4 and 5.3 mg/mL at pD 9.6 and ~13. The electronic absorption spectrum of Cyt c' at each pD was measured using a NanoDrop 2000c spectrophotometer (Thermo Fisher) to check the electronic state of the heme (Figure S2). The pD values of the buffers and samples in D2O were measured using a pH electrode and corrected by adding 0.4 to the meter reading, based on an earlier report [35].
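The pD correction described above is a fixed additive offset applied to the reading of a pH electrode in D2O. A minimal sketch of the bookkeeping in both directions is shown below; the helper names are illustrative and not part of any instrument software.

```python
PD_OFFSET = 0.4  # pD = pH-meter reading + 0.4 for D2O solutions, as used in the text


def pd_from_reading(meter_reading: float) -> float:
    """Convert a pH-electrode reading taken in D2O to pD."""
    return meter_reading + PD_OFFSET


def reading_for_target_pd(target_pd: float) -> float:
    """Meter reading to titrate towards when aiming at a given pD."""
    return target_pd - PD_OFFSET


for pd in (1.7, 6.4, 9.6):
    print(f"pD {pd}: adjust with DCl/NaOD until the meter reads {reading_for_target_pd(pd):.1f}")
```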

SANS measurements
The SANS experiments were performed on the KWS-1 beamline of the Jülich Centre for Neutron Science (JCNS) at the Research Neutron Source Heinz Maier-Leibnitz Zentrum (FRM II) in Garching, Germany [36], using quartz cells with a 2 mm path length. The wavelength (λ) of the incident neutron beam was 5 Å, selected with a velocity selector, with a spread of Δλ/λ = 10%.
For the low-Q region, the collimation length and sample-to-detector distance were 8 m and 4 m, respectively; for the high-Q region, they were 4 m and 1.5 m. Background scattering from the buffer solutions and the empty cell was measured in the same configurations and subtracted. Plexiglas was used as a secondary calibration standard to put the scattering intensity on an absolute scale and to correct for detector sensitivity.

SANS data reduction and analysis
Data reduction from the raw 2D SANS images to 1D SANS curves, including all of the above corrections and calibrations, was conducted using QtiKWS. Radial averaging converted the 2D data to 1D intensity profiles, I(Q), with the scattering vector Q calibrated from the wavelength and scattering geometry. The data obtained for each sample at different sample-to-detector distances were merged, and the background was removed by subtracting the scattering intensity of the corresponding buffer solution.
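The radial averaging and buffer subtraction steps can be illustrated with the sketch below. It is not the QtiKWS implementation: it assumes an ideal flat detector, ignores transmission, dead-time, sensitivity, and absolute-scale corrections, and uses a hypothetical 128 × 128 detector with 5 mm pixels purely for the synthetic check. Each pixel is assigned Q = 4π sin(θ)/λ with 2θ = arctan(r/D), and pixels are averaged in Q bins.

```python
import numpy as np


def radial_average(image, pixel_mm, center_px, distance_mm, wavelength_a, n_bins=100):
    """Azimuthally average a 2D detector image into a 1D profile I(Q).

    For a flat detector at distance D, a pixel at radius r from the beam centre
    sees a scattering angle 2*theta = arctan(r / D), and Q = 4*pi*sin(theta)/lambda.
    """
    ny, nx = image.shape
    iy, ix = np.indices((ny, nx))
    r_mm = np.hypot(ix - center_px[0], iy - center_px[1]) * pixel_mm
    theta = 0.5 * np.arctan(r_mm / distance_mm)
    q = 4.0 * np.pi * np.sin(theta) / wavelength_a  # Å^-1

    # Bin pixels by Q and average Q and intensity within each annulus
    edges = np.linspace(q.min(), q.max(), n_bins + 1)
    idx = np.clip(np.digitize(q.ravel(), edges) - 1, 0, n_bins - 1)
    n_pix = np.bincount(idx, minlength=n_bins)
    q_sum = np.bincount(idx, weights=q.ravel(), minlength=n_bins)
    i_sum = np.bincount(idx, weights=image.ravel(), minlength=n_bins)
    filled = n_pix > 0  # skip empty annuli near the beam centre
    return q_sum[filled] / n_pix[filled], i_sum[filled] / n_pix[filled]


# Synthetic check on a hypothetical detector: 5 mm pixels, beam centre in the
# middle, 4 m sample-to-detector distance, 5 Å neutrons.
shape, center = (128, 128), (63.5, 63.5)
iy, ix = np.indices(shape)
r = np.hypot(ix - center[0], iy - center[1]) * 5.0
q_pix = 4.0 * np.pi * np.sin(0.5 * np.arctan(r / 4000.0)) / 5.0
rg = 18.0
sample_img = np.exp(-(q_pix * rg) ** 2 / 3.0) + 0.01   # "protein + buffer"
buffer_img = np.full(shape, 0.01)                       # flat "buffer"

q, i_sample = radial_average(sample_img, 5.0, center, 4000.0, 5.0)
_, i_buffer = radial_average(buffer_img, 5.0, center, 4000.0, 5.0)
i_protein = i_sample - i_buffer  # buffer-subtracted 1D profile
```

Because both images share the same geometry, the two calls return identical Q bins, so the subtraction is point by point.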
The radius of gyration (Rg) of Cyt c' under each condition was estimated from the linear low-Q region of the Guinier plot and from the pair distribution function, P(r), using the program PRIMUS [37] in the ATSAS package. The gradual increase in low-Q intensity at pD ~13 is likely due to a minor contribution from weak aggregation; this region was excluded from the Guinier and P(r) analyses. The data ranges used for the Guinier analysis and the P(r) calculation were 0.001 Å⁻² < Q² < 0.002 Å⁻² and 0.05 Å⁻¹ < Q < 0.25 Å⁻¹, respectively. Rg at pD 1.7 was estimated by fitting the Debye function, D(Q), which describes the scattering from a polymer in a random-walk conformation, using a numerical model described previously [38]. Rg values were also calculated from the crystal structures at pH 6.0 (PDB ID: 4WGZ) and 10.4 (PDB ID: 4WGY) using CRYSON [39]. Further analyses (joint-clubs model, molecular weight estimation, bead modelling, and estimation of the aggregated particle fraction) are described in Appendices A-D and Figure S3.
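The two Rg estimates described above can be sketched as follows: a linear Guinier fit of ln I(Q) against Q² restricted to the stated window, and a nonlinear fit of the Debye function I(Q) = I0 · 2(e^(-x) + x - 1)/x² with x = (Q Rg)². This is a generic least-squares sketch using NumPy and SciPy, not PRIMUS or the numerical model of ref. [38], and it is checked here only on noise-free synthetic curves.

```python
import numpy as np
from scipy.optimize import curve_fit


def guinier_fit(q, intensity, q2_min=0.001, q2_max=0.002):
    """Linear fit of ln I(Q) = ln I(0) - (Rg^2 / 3) Q^2 inside a Q^2 window (Å^-2).

    Returns (Rg in Å, I(0)).
    """
    sel = (q**2 > q2_min) & (q**2 < q2_max)
    slope, intercept = np.polyfit(q[sel] ** 2, np.log(intensity[sel]), 1)
    return np.sqrt(-3.0 * slope), np.exp(intercept)


def debye(q, i0, rg):
    """Debye function for a Gaussian chain: I0 * 2 (exp(-x) + x - 1) / x^2, x = (Q Rg)^2."""
    x = (q * rg) ** 2
    return i0 * 2.0 * (np.expm1(-x) + x) / x**2  # expm1 avoids cancellation at small x


def debye_fit(q, intensity, p0=(1.0, 20.0)):
    """Least-squares fit of the Debye function; returns (I0, Rg) and their standard errors."""
    popt, pcov = curve_fit(debye, q, intensity, p0=p0)
    return popt, np.sqrt(np.diag(pcov))


# Noise-free synthetic curves using Rg values of the size reported in the text
q = np.linspace(0.005, 0.3, 600)
rg_g, i0_g = guinier_fit(q, 2.0 * np.exp(-(q * 18.0) ** 2 / 3.0))
(i0_d, rg_d), errors = debye_fit(q, debye(q, 1.5, 25.7))
print(f"Guinier Rg = {rg_g:.1f} Å, Debye Rg = {abs(rg_d):.1f} Å")
```

The Guinier window 0.001 to 0.002 Å⁻² corresponds to Q ≈ 0.032 to 0.045 Å⁻¹, so QRg stays below about 1.3 for Rg up to about 28 Å, within the usual range of validity of the Guinier approximation.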

Results
SANS experiments with Cyt c' were performed at pD 1.7, 6.4, 9.6, and ~13. Figure 2 shows the SANS intensity of Cyt c' as a function of the scattering vector, Q. The SANS curves at pD 6.4 and 9.6 have almost identical profiles, with a shoulder at 0.2 Å⁻¹ that is absent at pD 1.7 and ~13. The scattering intensities at pD 1.7 and ~13 were lower than those at pD 6.4 and 9.6, likely because the dimer dissociates into monomers. Furthermore, the shoulder at 0.2 Å⁻¹ observed at pD 6.4 and 9.6 is reproduced in the SANS curve simulated for the dimer structure (Figure 2B and 2C) and is thus characteristic of the Cyt c' dimer. The experimental radius of gyration (Rg) values of Cyt c' at each pD were evaluated from the Guinier plot and the P(r) function (Figure 3) and are summarized in Table 1. The experimental Rg values and those calculated from the crystal structures of Cyt c' at pH 6.0 and 10.4 are summarized in Table 2. The experimental Rg values of Cyt c' were 18-19 Å at pD 6.4 and 9.6 and 25-28 Å at pD 1.7. The larger Rg at pD 1.7 could reflect either expansion or oligomerization of Cyt c'. The SANS profile at pD 1.7 was therefore analyzed with the Debye function (Figure 2A), which is used to describe disordered polymer structures [40,41]. The profile at pD 1.7 was fitted well by the Debye function with Rg = 25.7 Å, indicating that Cyt c' adopts an unfolded random-coil structure. This random-coil structure at pD 1.7 is consistent with the structure previously proposed on the basis of CD and ESI-MS data [33]. The crystal structure of Cyt c' from Alcaligenes xylosoxidans suggested a dimer, but its association state in solution was unknown [42]. The Rg values for the monomer and dimer structures of Cyt c' calculated from the crystal structures at pH 6.0 (PDB: 4WGZ) and pH 10.4 (PDB: 4WGY) were 17-19 Å (Table 2).
Thus, the calculated Rg values are in good agreement with the experimental values (Table 1), demonstrating that the quaternary structure of Cyt c' in solution at pD 6.4 and 9.6 is a dimer. The SANS curves calculated from the crystal structures of dimeric Cyt c' at pH 6.0 and 10.4 (Figure 2) closely resemble the experimental curves at pD 6.4 and 9.6, especially for Q < 0.1 Å⁻¹, the range that reflects the global structure of Cyt c'. This similarity between the calculated and experimental profiles strongly suggests that Cyt c' is a dimer in solution. Fetler et al. reported that differences between the experimental and simulated SAXS curves of aspartate transcarbamoylase in the high-Q region arise from quaternary structural differences between the solution and crystal structures [43-45]. Similarly, the discrepancy between the experimental and simulated curves of Cyt c' in the higher-Q region might reflect quaternary structural differences between the dimer in solution and in the crystal (Figure 2B, C). Kratky-plot analysis of SANS profiles can distinguish flexible from rigid structures [46]: a typical rigid, folded state gives a bell-shaped curve in the low-Q region, whereas a disordered, flexible structure lacks the low-Q bell and shows a plateau in the high-Q region [47,48].

The Kratky plot of Cyt c' at pD 1.7 shows a pronounced plateau in the high-Q region without a bell-shaped profile in the low-Q region (Figure 4A), strongly suggesting that Cyt c' adopts a highly flexible random-coil structure, in good agreement with the Debye-function analysis and previous CD and ESI-MS experiments [33]. The Kratky plots at pD 6.4 and 9.6 (Figure 4B and 4C) show clear bell-shaped profiles centered at Q = 0.1 Å⁻¹ with a plateau at higher Q (Q > 0.2 Å⁻¹), suggesting a flexible moiety within the protein structure. The Kratky plot at pD ~13 shows a weaker, broader bell-shaped pattern in the low-Q region, suggesting that the size and/or shape of Cyt c' differs from that at lower pD values.
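The contrast between a plateau and a bell can be reproduced with two textbook form factors: the Debye function of a random coil, whose Kratky curve Q²I(Q) rises monotonically to a plateau of 2/Rg², and a homogeneous sphere, whose Kratky curve passes through a maximum and then falls. The sketch below uses these idealized models only as reference shapes (Rg = 25.7 Å for the coil and Rg = 18 Å for the compact particle, matching the values in the text); it is not a model of Cyt c' itself.

```python
import numpy as np


def kratky(q, intensity):
    """Kratky representation: Q^2 I(Q) versus Q."""
    return q**2 * intensity


def debye(q, rg):
    """Normalized Debye function for a random coil, x = (Q Rg)^2."""
    x = (q * rg) ** 2
    return 2.0 * (np.expm1(-x) + x) / x**2


def sphere(q, radius):
    """Normalized form factor of a homogeneous sphere."""
    qr = q * radius
    return (3.0 * (np.sin(qr) - qr * np.cos(qr)) / qr**3) ** 2


q = np.linspace(0.005, 0.5, 500)
coil = kratky(q, debye(q, 25.7))                           # random coil, Rg = 25.7 Å
globule = kratky(q, sphere(q, np.sqrt(5.0 / 3.0) * 18.0))  # sphere with Rg = 18 Å

print("coil keeps rising towards its plateau:", bool(np.argmax(coil) == q.size - 1))
print("globule Kratky peak at Q =", round(float(q[np.argmax(globule)]), 3), "Å^-1")
```

For a sphere, Rg = √(3/5)·R, so the radius is set to √(5/3)·Rg.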

Discussion
In SANS curves, the forward scattering intensity extrapolated to Q = 0, I(0), is proportional to the product of the molecular weight (MW) and the weight concentration (C) of the scattering molecules.
The MW was determined according to eq. 1:

$$MW = MW_{st}\,\frac{I(0)/C}{I(0)_{st}/C_{st}} \qquad (1)$$

where MW_st, C_st, and I(0)_st are the molecular weight, weight concentration, and extrapolated intensity of the standard sample, respectively [49]. The MW ratios at the various pD values were calculated using the pD 1.7 sample as the standard. The I(0) values were obtained by extrapolating the SANS I(Q) curves (the Fourier transform of P(r)) and the Guinier plots (Figure S4). The calculated MW ratios of Cyt c' are summarized in Table 3. The MW ratios at pD 6.4 and 9.6 obtained from both the Guinier and P(r) analyses were twice that at pD 1.7, further supporting the dimeric structure of Cyt c' at pD 6.4 and 9.6. The I(0) value estimated at pD ~13 is essentially identical to that at pD 1.7, indicating that Cyt c' is monomeric at pD ~13.

Cyt c' has a four-α-helix bundle structure, and its structure at pH > 12 was proposed to be an "open-bundle" structure that retains the α-helices [33]. The SANS curve at pD ~13 was therefore analyzed numerically using a "joint-clubs model", which describes the scattering from four cylinder-shaped clubs connected by three short loops, based on the zig-zag chain model described in earlier reports [50,51]. This model directly represents the proposed "open-bundle" structure [33]. A schematic of the joint-clubs model is shown in Figure 2D; each α-helix of Cyt c' is modeled as a rigid club. Fitting with the joint-clubs model gave a club length (L) of 31.5 ± 3.1 Å. The Cyt c' structure consists of four α-helices: A (Ala3-Lys31), B (Asp37-Phe59), C (Ala76-Asp98), and D (Asp103-Arg124). The lengths of helices A, B, C, and D along their helix axes are 42.6, 35.4, 32.4, and 32.7 Å, respectively, as determined from the crystal structure (PDB ID: 4WGZ).
The average helix length, 35.8 Å, is reasonably consistent with the club length L = 31.5 ± 3.1 Å obtained from the joint-clubs model. The average club diameter (R) obtained from the fit, 10.6 ± 0.9 Å, agrees well with the average helix diameter of ~10 Å in the crystal structure. Therefore, the joint-clubs model supports the previously proposed "open-bundle" structure of Cyt c' at pD ~13.

Figure 5. Ab initio bead models generated by the program DENFERT [27,28] and fitting of the crystal structure into each volumetric map (bottom panel) at pD 6.4 (A), 9.6 (B), and ~13 (C). Four copies of helix B were docked into the volumetric map in (C) using the program Situs [52]. All models are drawn at the same scale.
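The sine integral Si(x) and the Bessel function J1(x) that enter the joint-clubs model (Appendix A) come from the scattering of a single rigid club. The sketch below shows only that ingredient, using the standard approximation that factorizes a long cylinder into an infinitely thin rod of length L times a circular cross-section of radius R. It is not the full model of eqs. A1-A5, which also contains the zig-zag interference between clubs and the Debye term for the linkers, and at L/R ≈ 6 the factorization is only approximate. The fitted values from the text are used, treating the reported 10.6 Å as a diameter.

```python
import numpy as np
from scipy.special import j1, sici


def thin_rod(q, length):
    """Orientationally averaged form factor of an infinitely thin rod of length L:
    2 Si(QL)/(QL) - [sin(QL/2) / (QL/2)]^2."""
    x = q * length
    si, _ = sici(x)
    return 2.0 * si / x - (np.sin(x / 2.0) / (x / 2.0)) ** 2


def circular_section(q, radius):
    """Cross-section factor [2 J1(QR) / (QR)]^2 for a circular cross-section of radius R."""
    x = q * radius
    return (2.0 * j1(x) / x) ** 2


def club(q, length, radius):
    """Approximate rigid-club (cylinder) form factor as thin rod x cross-section."""
    return thin_rod(q, length) * circular_section(q, radius)


q = np.linspace(0.005, 0.4, 400)
# Club length 31.5 Å and diameter 10.6 Å (radius 5.3 Å), as fitted in the text
p_club = club(q, length=31.5, radius=10.6 / 2.0)
```

Both factors tend to 1 as Q approaches 0, so the product is normalized to unit forward scattering.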
Ab initio analysis was conducted to clarify the low-resolution solution structure of Cyt c'. Figure 5 shows the ab initio bead models drawn at the same scale to compare the size of the protein under different pD conditions. The bead models at pD 6.4 (Figure 5A) and 9.6 (Figure 5B) have essentially the same shape at the resolution of the ab initio analysis, whereas the bead model at pD ~13 (Figure 5C) shows an extended structure. This elongation suggests two possibilities for the solution structure: oligomerization of Cyt c' or formation of the "open-bundle" structure. The first possibility can be excluded because the molecular weight ratio (Table 3) reflects the monomeric state, as described above.

Table 3. Concentration (C) of Cyt c', I(0), and estimated molecular weight ratio from the experimental SANS curves at each pD condition.

We also performed docking simulations [52] of the helices into the elongated bead model at pD ~13 to assess whether the bead model reflects an "open-bundle" structure of Cyt c' (Figure 5C, bottom). The volumetric map generated from the bead model at pD ~13 was fitted with four ideal helices. Docking with helices A, B, C, and D did not converge within the volume of the map, so helix B was chosen as a representative helix because its length is similar to the average length of the four helices in Cyt c'. The loops connecting the helices were omitted. Figure 5C shows a possible arrangement of the four helices in the "open-bundle" form of Cyt c' within the volumetric map; this arrangement is consistent with the joint-clubs model. The volumetric map at pD ~13 was also fitted with a monomer model, giving a correlation coefficient of 0.81 between the map and the volume calculated from the model (Appendix C, "Ab initio bead modelling and analysis of the bead model"). The map volume is insufficient to accommodate the dimer structure.
Therefore, the bead model for pD ~13 (Figure 5C) can reasonably be interpreted as a monomeric "open-bundle" structure with four α-helices, in contrast to the oligomerization of cytochrome c [53,54]. The oligomerization of cytochrome c is initiated by displacement of the C-terminal helix to the corresponding position of another monomer [47]. The SANS experiment with Cyt c' at pD ~13 clearly demonstrated the monomeric "open-bundle" structure, and the fraction of aggregated particles was estimated from the low-Q intensity to be only ~0.2% (Appendix D, "Estimation of the aggregated particle fraction at pD ~13"). Therefore, Cyt c' does not undergo the intermolecular structural reassembly observed for cytochrome c [53].
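The ~0.2% aggregate fraction comes from the forward-scattering analysis of Appendix D. Assuming that aggregates and single joint-clubs particles have the same scattering contrast, so that forward scattering scales as volume fraction times particle volume, the fitted amplitude ratio obeys S_agg(0)/S_jc(0) = x·V_agg / ((1 - x)·V_jc), where x is the fraction of Cyt c' in aggregates. The sketch below inverts this relation for x. It illustrates the arithmetic under that equal-contrast assumption only; the particle volumes are inputs to be supplied, and the values used here are hypothetical, not values from this study.

```python
def aggregate_fraction(s_ratio, v_jc, v_agg):
    """Solve S_agg(0)/S_jc(0) = x V_agg / ((1 - x) V_jc) for the aggregated fraction x.

    Assumes equal contrast for aggregates and single particles, so that the contrast
    and the total protein volume fraction cancel in the ratio.
    """
    k = s_ratio * v_jc / v_agg  # equals x / (1 - x)
    return k / (1.0 + k)


def forward_ratio(x, v_jc, v_agg):
    """Forward relation, used here only to check the inversion."""
    return x * v_agg / ((1.0 - x) * v_jc)


# Round trip with hypothetical volumes (Å^3): a 1% aggregate fraction is recovered
v_jc, v_agg = 1.0e4, 1.0e6
r = forward_ratio(0.01, v_jc, v_agg)
print(f"S_agg(0)/S_jc(0) = {r:.3f} -> x = {aggregate_fraction(r, v_jc, v_agg):.4f}")
```

Because V_agg is much larger than V_jc, a large amplitude ratio (9.2 in this study) can still correspond to a very small aggregated fraction; the estimate scales directly with the assumed volume ratio V_jc/V_agg.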

Conclusions
The pH-induced structural transitions of Cyt c' were studied by SANS solution structure determination and were found to proceed through (i) a random-coil monomer at pD 1.7, (ii) a folded dimer at pD 6.4, (iii) initial dimer dissociation at pD 9.6, and (iv) a monomeric "open-bundle" structure at pD ~13. The comprehensive SANS analysis in the present study is consistent with previous spectroscopic studies (mass spectrometry and CD/MCD spectroscopy) and crystal structure analyses. The relation between the pH-induced spin state and the structural transition of Cyt c' was described previously [33,55]. The present SANS study showed that the unfolding intermediate at pD ~13 is key to elucidating the structural transition mechanism. The alkaline transition structure at pD ~13 is the first folding/unfolding intermediate structure obtained for Cyt c', and it has not been determined by crystal structure analysis. The "open-bundle" structure of Cyt c' as an unfolding/folding intermediate could provide insight into the initial step of unfolding and the last step of folding. Opening of the four-α-helix bundle of Cyt c' into the intermediate "open-bundle" structure could be induced by the loss of inter-helix hydrogen bonds at alkaline pH, as reported previously [33].
Supplementary Materials: Figure S1: SDS-PAGE of purified Cyt c' on a 12.5% gel. Figure S2: Electronic absorption spectra of Cyt c' at pD 1.7, 6.4, 9.6, and ~13. The spectral bands are labeled charge-transfer 3 (CT3), Soret, CT2, Q, and CT1, from short to long wavelength. The samples were prepared by diluting the samples after the SANS experiments. Figure S3: Curves obtained by ab initio analyses (red lines) and the data points used for each condition (squares: pD 6.4, circles: pD 9.6, triangles: pD ~13). Figure S4

Funding: This work was supported in part by the Ministry of Education, Culture, Sports, Science and Technology (MEXT) of Japan under Contract Nos. 17-214, 18-190, and 19-198 to TK, and Grants-in-Aid for Scientific Research (Nos. 18550147 and 22550145 to TK) from the Japan Society for the Promotion of Science (JSPS).
Acknowledgments: Part of this study was performed under the approval of the Photon Factory Program Advisory Committee (Proposal Nos. 2018G018 and 2018G032). TY is grateful to the Early Researcher Encouragement Program of Ibaraki University. This work was performed using the KWS-1 instrument operated by the Jülich Centre for Neutron Science (JCNS) at the Heinz Maier-Leibnitz Zentrum (MLZ), Garching, Germany.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Appendix A. Joint-clubs model
The SANS curve at pD ~13 was analyzed using a numerical "joint-clubs" model, in which the length (L) and diameter (R) of a club rod correspond to the length and diameter of an α-helix, respectively. The remaining parameter, r, is the radius of the blob scattering from the polymeric linkers, described by the Debye function D(Q). The model involves the following functions (eqs. A1-A5):

Here, Si(x) is the sine integral and J1(x) is the first-order Bessel function of the first kind. The prefactors I0 and I1 determine the scattering powers of all the helices and of the linkers, respectively. In eq. A5, the function in the large square brackets describes the zig-zag line scattering, while B²(Q) describes the circular cross-section of the cylinder [50,51]. The total scattering intensity is given by the product of the single-particle scattering, I(Q), and a simple structure factor [56], S(Q) (eq. A6), which accounts for single particles and aggregates, respectively:

where Sagg(0) is the dimensionless amplitude of the aggregate contribution and ξ is a correlation length in the sense of van Hove [57], representing the size of an aggregated particle. The contribution of aggregates to the scattering was estimated to be ~0.2%.

Appendix B. Molecular weight estimation
The estimated I(0) values and the concentrations of Cyt c' were used to determine the ratio of the molecular weight, MW, to the molecular weight at pD 1.7 using eq. A7:

$$\frac{MW}{MW_{\mathrm{pD}\,1.7}} = \frac{I(0)/C}{I(0)_{\mathrm{pD}\,1.7}/C_{\mathrm{pD}\,1.7}} \qquad (\mathrm{A7})$$

where C is the weight concentration. The I(0) values were estimated by extrapolating the Guinier plots and the P(r) functions to Q = 0 Å⁻¹.
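Since the forward intensity is proportional to weight concentration times molecular weight, dividing I(0) by C and normalizing to a reference sample gives the molecular weight ratio, provided the contrast and partial specific volume are the same in both samples. The sketch below implements that ratio; the I(0) values are hypothetical numbers chosen for illustration and are not the values of Table 3.

```python
def mw_ratio(i0, conc, i0_ref, conc_ref):
    """MW / MW_ref = (I(0) / C) / (I(0)_ref / C_ref).

    Valid when sample and reference share the same contrast and partial specific
    volume, as for the same protein in the same D2O buffer system.
    """
    return (i0 / conc) / (i0_ref / conc_ref)


# Hypothetical values for illustration only (not the values of Table 3):
# I(0) in arbitrary units, concentrations in mg/mL.
i0_ref, conc_ref = 0.40, 5.5  # reference sample taken as a monomer
examples = {"sample A": (0.80, 5.5), "sample B": (0.385, 5.3)}

for label, (i0, conc) in examples.items():
    ratio = mw_ratio(i0, conc, i0_ref, conc_ref)
    print(f"{label}: MW/MW_ref = {ratio:.2f} (closest oligomeric state: {round(ratio)})")
```

Normalizing by concentration matters here because the samples were measured at slightly different concentrations (5.5 and 5.3 mg/mL).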

Appendix C. Ab initio bead modelling and analysis of the bead model
An indirect Fourier transform was performed with the program GNOM [58] to obtain regularized scattering curves for the conditions under which the protein retains its secondary structure (pD 6.4, 9.6, and ~13). The regularized data were used as input to the program DENFERT v.2 [27,28], which restores the low-resolution shape of the protein while accounting for the contribution of the hydration layer to the measured scattering. A 10% higher scattering length density was applied to the hydration layer. The scattering length densities of the protein and the buffer were set to 3.1 × 10⁻⁶ Å⁻² and 6.4 × 10⁻⁶ Å⁻², respectively. Ten independent runs were performed for each pD condition and compared using the program DAMAVER [59], and the model with the lowest normalized spatial discrepancy (NSD; a quantitative measure of similarity between sets of three-dimensional points) was chosen as the typical model. The average NSD was 0.6-0.7 in all cases, indicating stable, reproducible reconstructions. Figure S3 shows the curves from the ab initio analyses for pD 6.4 and 9.6 and from the joint-clubs model for pD ~13, together with the experimental SANS curves. The square roots of the reduced χ² were 1.8, 2.9, and 1.9 for pD 6.4, 9.6, and ~13, respectively. The bead radius in the ab initio analysis was set to 2 Å. The models constructed for the pD ~13 condition were converted to volumetric maps using the pdb2vol program, and these maps were used for docking analysis. Docking of the models into the maps was performed with the program Collage, which uses Powell's optimization, in the Situs v.2.8 package [52]. Docking at pD ~13 was carried out with four identical helices (helix B), and the correlation coefficient for full multi-body docking was 0.81.
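The correlation coefficient reported for the docking compares the experimental volumetric map with a map computed from the docked model. The sketch below shows one common definition, the normalized (Pearson) cross-correlation over voxels on a shared grid, applied to toy Gaussian blobs. Situs may use a different normalization or filtering (for example, an optional Laplacian filter), so this illustrates the idea and is not a reimplementation of Collage.

```python
import numpy as np


def map_correlation(map_a, map_b, mask=None):
    """Normalized (Pearson) cross-correlation between two maps on the same grid."""
    a = np.asarray(map_a, dtype=float)
    b = np.asarray(map_b, dtype=float)
    if mask is not None:  # optionally restrict to voxels inside a mask
        a, b = a[mask], b[mask]
    a = a - a.mean()
    b = b - b.mean()
    return float(np.sum(a * b) / np.sqrt(np.sum(a * a) * np.sum(b * b)))


# Toy 3D maps: Gaussian blobs (sigma = 5 Å) on a 1 Å grid, one shifted along x
g = np.arange(-20.0, 21.0)
x, y, z = np.meshgrid(g, g, g, indexing="ij")


def blob(shift):
    return np.exp(-((x - shift) ** 2 + y**2 + z**2) / (2.0 * 5.0**2))


reference = blob(0.0)
for shift in (0.0, 2.0, 6.0):
    print(f"shift {shift:.0f} Å: CC = {map_correlation(reference, blob(shift)):.3f}")
```

The coefficient equals 1 for identical maps and decreases as the model is displaced from the density, which is what a docking search maximizes.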
Appendix D. Estimation of the aggregated particle fraction at pD ~13
Forward scattering, S(0), is generally expressed by eq. A8:

$$S(0) = (\Delta\rho)^2\,\phi\,V \qquad (\mathrm{A8})$$

where (Δρ)² is the contrast of the scattering particle, φ is the volume fraction of the particles, and V is the volume of a particle. Assuming equal contrast for both species, the ratio of the forward scattering of the aggregates to that of the joint-clubs particles, Sagg(0)/Sjc(0), is described by the volume fraction of Cyt c' in the beam, φ_Cyt c', the fraction of aggregated particles, x, the volume of the aggregated particles, Vagg, and the volume of the joint-clubs structure, Vjc (eq. A9):

$$\frac{S_{agg}(0)}{S_{jc}(0)} = \frac{x\,\phi_{\mathrm{Cyt}\,c'}\,V_{agg}}{(1-x)\,\phi_{\mathrm{Cyt}\,c'}\,V_{jc}} = \frac{x\,V_{agg}}{(1-x)\,V_{jc}} \qquad (\mathrm{A9})$$

The structure factor, S(Q), is described by eq. A6 [56]. The experimental data at pD ~13 were fitted with the product of S(Q) and the scattering function of the joint-clubs model defined by eqs. A1-A5. Curve fitting (Figure 2D) gave Sagg(0)/Sjc(0) = 9.2 and ξ = 432 ± 64 Å. The ratio of the volume of the joint-clubs structure to that of the aggregated particles, Vjc/Vagg (eq. A10), was estimated: