2.1.1. Signal Peptides and Precursor Maltose Binding Protein Fusion Strategies
The fusion of a signal peptide to the N-terminus of eukaryotic TMPs was among the earliest strategies to produce these proteins in E. coli. Upon synthesis, eukaryotic, viral, and some bacterial TMPs are typically not recognized for membrane insertion and end up misfolded in inclusion bodies.37, 38 However, adding a short signal peptide of 20-30 amino acids to the target protein’s N-terminus makes the protein recognizable by the E. coli machinery that traffics proteins to the plasma membrane.12 Therefore, for expression in E. coli, periplasmic leader sequences derived from ompT, ompA, pelB, phoA, malE, lamB, and β-lactamase can generally be used to direct eukaryotic TMPs to the E. coli plasma membrane.12, 39 In this case, the signal peptide-TMP polypeptides are translocated post-translationally via the Sec-dependent pathway. Conversely, TMPs native to E. coli have highly hydrophobic signal peptides and are translocated co-translationally via the SRP-dependent pathway. These hydrophobic signal peptides (e.g., the one derived from the DsbA protein) can also be used as N-terminal tags to express heterologous TMPs.39
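At the sequence level, the N-terminal signal-peptide fusion described above amounts to prepending a leader to the target protein. The Python sketch below assumes the commonly used pelB leader (MKYLLPTAAAGLLLLAAQPAMA) and the E. coli OmpA signal peptide (MKKTAIAIAVALAGFATVAQA); the helper `fuse_signal_peptide` and the target sequence are hypothetical, and dropping the target’s own initiator methionine is an illustrative design choice rather than a rule taken from the cited studies. Real constructs are, of course, assembled at the DNA level.

```python
# Minimal sketch: assembling an N-terminal signal-peptide fusion at the protein level.
# PELB_LEADER and OMPA_LEADER are commonly used E. coli leader sequences;
# fuse_signal_peptide and the target sequence below are hypothetical illustrations.
PELB_LEADER = "MKYLLPTAAAGLLLLAAQPAMA"  # pelB leader, 22 aa
OMPA_LEADER = "MKKTAIAIAVALAGFATVAQA"   # OmpA signal peptide, 21 aa


def fuse_signal_peptide(leader: str, target: str) -> str:
    """Prepend a signal peptide to a target protein sequence.

    The leader supplies the initiator methionine, so a leading Met on the
    target is dropped (an illustrative design choice, not a fixed rule).
    """
    if not 20 <= len(leader) <= 30:
        raise ValueError(f"leader is {len(leader)} aa; typical signal peptides are 20-30 aa")
    mature = target[1:] if target.startswith("M") else target
    return leader + mature


if __name__ == "__main__":
    hypothetical_tmp = "MAAAWK"  # placeholder, not a real TMP
    print(fuse_signal_peptide(PELB_LEADER, hypothetical_tmp))  # MKYLLPTAAAGLLLLAAQPAMAAAAWK
```

The length check simply mirrors the 20-30 amino acid range quoted in the text; it does not verify that a leader is actually functional in E. coli.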
The application of the malE (maltose binding protein, MBP) signal peptide has been successful in the production of several members of the G protein-coupled receptor (GPCR) family. In these studies, the signal sequence that directs the E. coli-encoded MBP to the periplasm, or even the entire MBP with its signal peptide included (the precursor MBP, pMBP), was fused to the N-terminus of the GPCR (Figure 1). The resulting chimeric construct was directed to the plasma membrane, where the receptor adopted a natively folded, functional state.40-42 Initially, this method was used to express the serotonin 5-HT1A and neurotensin receptors in E. coli in a membrane-bound state.37, 40 Later, the strategy was applied to several other GPCRs, such as the rat NK-2 (neurokinin A) receptor,43 rat neurotensin receptor,41 M2 muscarinic acetylcholine receptor,42 peripheral cannabinoid receptor,44 and others. The success of these studies was partly due to the extracellular localization of the GPCRs’ N-terminus, which allowed the MBP signal sequence to direct this region to the E. coli periplasmic space and ensured the proper orientation of the receptors’ first TM helices.40, 45 These advances were instrumental in progressing structural and functional studies of GPCRs and aided pharmacological development. Henderson and colleagues, in their original work, and later studies40, 46 found that the neurotensin receptor expressed in E. coli membranes with an N-terminal signal sequence, with or without the entire MBP, could bind its ligand neurotensin; however, the presence of pMBP significantly increased the receptor-ligand affinity. Subsequently, high-resolution structures of GPCRs produced in E. coli were solved, further enhancing the understanding of these proteins’ structure-function relationships; for example, multiple X-ray structures of neurotensin receptor 1 have been solved at high resolution.46 Further, high-level expression of functional GPCRs in E. coli has greatly facilitated NMR studies of these proteins, providing structural and dynamic insights into their interactions with agonist and antagonist molecules.47, 48
All these studies were based on a similar construct design, with cloning into the E. coli expression vectors pRG/II-pMBP or pRG/III-hs-pMBP, created in the original neurotensin receptor studies, in which expression is controlled by the lac promoter and induced with IPTG.40, 41, 43 The thrombin cleavage site used in the original vector to remove the tag was later replaced by a more selective HRV 3C protease site because thrombin was found to cause aggregation of the GPCR.49 In addition to engineering the construct to incorporate a signal peptide, the high-yield production of functional GPCRs in E. coli was improved by optimizing the expression temperature (typically 22 °C or lower) and the IPTG concentration (typically low, 0.1-0.3 mM).42, 44
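The difference between the two tag-removal sites can be illustrated at the sequence level. The snippet below is a minimal sketch assuming the canonical recognition motifs LEVLFQ↓GP (HRV 3C) and LVPR↓GS (thrombin); the `cleave` helper and the construct sequence are hypothetical. It only finds exact motif matches, so it models neither thrombin’s cleavage at non-canonical sites nor the aggregation reported for the GPCR.

```python
# Minimal sketch: locating a tag-removal site in a fusion construct.
# Motifs are split at the scissile bond: HRV 3C cuts LEVLFQ^GP, thrombin cuts LVPR^GS.
# cleave() and the construct below are hypothetical; only exact motif matches are found.
SITES = {
    "HRV 3C": ("LEVLFQ", "GP"),
    "thrombin": ("LVPR", "GS"),
}


def cleave(seq: str, protease: str) -> list[str]:
    """Split seq at every exact occurrence of the protease's recognition motif."""
    left, right = SITES[protease]
    motif = left + right
    fragments: list[str] = []
    start = 0
    idx = seq.find(motif)
    while idx != -1:
        cut = idx + len(left)  # scissile bond lies between the left and right halves
        fragments.append(seq[start:cut])
        start = cut
        idx = seq.find(motif, cut)
    fragments.append(seq[start:])
    return fragments


if __name__ == "__main__":
    construct = "MKIEEGK" + "LEVLFQGP" + "SWNA"  # hypothetical tag + site + receptor
    print(cleave(construct, "HRV 3C"))    # ['MKIEEGKLEVLFQ', 'GPSWNA']
    print(cleave(construct, "thrombin"))  # ['MKIEEGKLEVLFQGPSWNA'] (no thrombin site)
```

Note that the 3C protease leaves the GP dipeptide on the released protein, and thrombin leaves GS; either residual stub is typically accepted in construct design.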
2.1.2. Mistic Protein Fusion Strategies
In other studies, mistic was fused to the N-termini of eukaryotic TMPs for expression in E. coli.20, 50 Mistic (an acronym for “membrane-integrating sequence for translation of integral membrane protein constructs”) is encoded by Bacillus species and was originally found in Bacillus subtilis.51, 52 The protein folds into a four-helix bundle with a hydrophobic core and a significant fraction of polar and charged amino acids (Figure 2 A).51 Mistic is found in both cytoplasmic and membrane-bound states.20, 53 The B. subtilis mistic protein (M110) comprises 110 amino acid residues and has a net charge of -12.0 at pH 7. It has been suggested that its acidic nature enables its tight association with the lipid bilayer, both alone and as a fusion tag, when expressed in E. coli.20 However, mistic constructs and orthologs shorter than M110, which are likewise highly acidic, e.g., the 84-residue C-terminally truncated version of M110 (referred to as M1) and mistic from B. licheniformis (M2) and B. mojavensis (M3), are highly soluble and localize almost exclusively to the cytoplasm.
In contrast, mistic from B. atrophaeus (M4) has a membrane affinity comparable to that of M110.20 Interestingly, outside the membrane, soluble mistic forms fibrils in which the protomer structure differs substantially from the NMR structures determined for non-fibrillar mistic.51, 52 The fibrillar structures may shield the charged regions of mistic and facilitate its interaction with hydrophobic membranes.52 To identify the membrane-associating regions, Marino et al. analyzed truncated mistic constructs containing individual or combined helices and found that helices 1, 2, and 4 interact with lipid membranes, whereas helix 3 is primarily soluble.53 The same study showed that helices 1, 2, and 4, each fused individually to the N-terminus of the Y4 GPCR, can direct the receptor to the E. coli membrane,53 similarly to full-length (FL) mistic.54 However, only the fusion to helix 2 yielded expression levels comparable to those obtained with FL mistic, and the segment “GLDAFIQLY” in helix 2 was identified as the minimal sequence required for mistic and its fusion proteins to interact with the membrane.53
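The acidity and membrane affinity discussed above are commonly quantified with two simple sequence-based estimates: net charge at a given pH (Henderson-Hasselbalch fractional charges summed over ionizable groups) and the grand average of hydropathy (GRAVY, Kyte-Doolittle scale). The sketch below applies both to the minimal helix 2 segment GLDAFIQLY. The pKa values are approximate assumptions and published sets differ, so a whole-protein estimate (e.g., for M110) computed this way would not necessarily reproduce the reported -12.0 exactly. Terminal groups are excluded for the segment because it is internal to the protein.

```python
# Minimal sketch: net charge (Henderson-Hasselbalch) and GRAVY (Kyte-Doolittle).
# The pKa values are approximate assumptions; published sets differ.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
PKA_BASIC = {"K": 10.8, "R": 12.5, "H": 6.5}            # positive when protonated
PKA_ACIDIC = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}  # negative when deprotonated
PKA_N_TERM, PKA_C_TERM = 8.6, 3.6


def gravy(seq: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def net_charge(seq: str, ph: float = 7.0, include_termini: bool = True) -> float:
    """Estimate net charge at a given pH by summing fractional charges of ionizable groups."""
    charge = 0.0
    for aa in seq:
        if aa in PKA_BASIC:
            charge += 1.0 / (1.0 + 10 ** (ph - PKA_BASIC[aa]))
        elif aa in PKA_ACIDIC:
            charge -= 1.0 / (1.0 + 10 ** (PKA_ACIDIC[aa] - ph))
    if include_termini:
        charge += 1.0 / (1.0 + 10 ** (ph - PKA_N_TERM))
        charge -= 1.0 / (1.0 + 10 ** (PKA_C_TERM - ph))
    return charge


if __name__ == "__main__":
    segment = "GLDAFIQLY"  # minimal membrane-interacting segment of helix 2
    print(f"GRAVY = {gravy(segment):.2f}")  # GRAVY = 0.89
    # Termini excluded: the segment is internal to the full protein.
    print(f"side-chain charge at pH 7 = {net_charge(segment, include_termini=False):.2f}")  # -1.00
```

A positive GRAVY indicates a net hydrophobic segment; these are bulk sequence averages and say nothing about helix amphipathicity or topology.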
It has been proposed that the absence of a detectable signal sequence, a unique feature of mistic, enables this protein to bypass the Sec translocon pathway of E. coli, so that expression of mistic and mistic-tagged TMPs does not overload the protein translocation machinery.20, 51 Consequently, high expression yields of heterologous TMPs can be achieved with mistic-TMP chimeras.20, 55 It has also been reported that mistic facilitates the expression of functional proteins with either the N-terminus inside or the N-terminus outside the cell,50, 56 suggesting that its membrane-bound topology adapts to accommodate the expression and folding of the target protein.
Besides GPCRs,53, 54 the voltage-gated potassium channel aKv1.1 and its six-transmembrane-helix (6TM) domain have also been successfully produced in E. coli as mistic fusion constructs (Figure 2 B).20 Expressing the aKv1.1 6TM domain and shortening the mistic-aKv1.1 6TM linker increased target protein expression levels, possibly owing to better interaction between mistic’s C-terminus and the aKv1.1 6TM domain, as well as reduced proteolysis in the linker region.20 It was further established that fusing aKv1.1 6TM to the C-termini of mistic M110 and mistic M4 resulted in comparable expression levels, possibly because M110 and M4 aid the membrane insertion of aKv1.1 6TM to a similar extent.20
The mistic fusion strategy has also facilitated studies of eukaryotic type I rhodopsins, because it enabled economical production of the functional form of these proteins in E. coli.22 Interestingly, this study found that two copies of mistic, fused to the N- and C-termini of the target proteins, were needed to direct them into the E. coli membrane; the study examined several eukaryotic rhodopsin variants, including ARI and CSRB, as well as other eukaryotic TMPs (Figure 2 C).22 The same study found that the mistic moieties of the fusion construct do not severely affect the proton transport function of ARI.22 This might be advantageous, because the expression levels of some heterologous TMPs are relatively low, and removal of the fusion tag typically reduces protein yields further.
Mistic-tagged eukaryotic TMPs expressed in and purified from E. coli have also been used as antigens for raising polyclonal antibodies. Antibodies raised against the mistic-TMP fusions recognized the corresponding TMPs in native membranes more efficiently than antibodies raised against only the soluble domains of these TMPs.57 As the study’s authors suggest, this could be because the soluble domains of the studied TMPs adopt a different conformation in the membrane-bound FL protein than in truncated soluble versions.57
Generally, the success of mistic-assisted expression of different membrane proteins depends on the target protein’s susceptibility to proteolysis, the induction conditions, and the number of amino acids linking mistic to the recombinant membrane protein.58 Mistic’s structure and membrane affinity are critical to its ability to facilitate the production of heterologous TMPs.58 This was confirmed by Tarmo and colleagues, who used three mistic variants (W13A, Q36E, and M75A) carrying amino acid substitutions in different helical regions and tested their expression levels in the cytoplasm and membrane, both alone and fused to aKv1.1. The M75A substitution destabilized the structure of mistic, as reflected by substantial partitioning of this mutant between the membrane and the cytoplasm, and when this mutant was fused to aKv1.1, no membrane expression of the fusion protein was observed.58
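Mutant labels such as W13A, Q36E, and M75A follow the usual convention of wild-type residue, 1-based position, and substituted residue (W13A replaces tryptophan 13 with alanine). The hypothetical helper below applies such a label to a sequence and checks that the wild-type residue matches before substituting; the example sequence is a placeholder, not the mistic sequence.

```python
import re

# Hypothetical helper: apply a point-substitution label such as "W13A"
# (wild-type residue, 1-based position, new residue) to a protein sequence.
MUTATION_LABEL = re.compile(r"([A-Z])(\d+)([A-Z])")


def apply_substitution(seq: str, label: str) -> str:
    match = MUTATION_LABEL.fullmatch(label)
    if match is None:
        raise ValueError(f"unrecognized mutation label: {label!r}")
    wild_type, position, new = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= position <= len(seq):
        raise ValueError(f"position {position} is outside a {len(seq)}-residue sequence")
    if seq[position - 1] != wild_type:
        # A mismatch usually means the numbering refers to a different construct.
        raise ValueError(f"expected {wild_type} at position {position}, found {seq[position - 1]}")
    return seq[: position - 1] + new + seq[position:]


if __name__ == "__main__":
    placeholder = "MKWLQAEM"  # placeholder sequence, not mistic
    print(apply_substitution(placeholder, "W3A"))  # MKALQAEM
```

The wild-type check is the useful part in practice: numbering in fusion constructs often shifts when tags or linkers are added, and a mismatch flags that.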
Mistic can also be combined with another fusion partner to increase the expression of some TMPs. Ananda et al. found that the CB2 receptor could be expressed only when mistic and TarCf were fused to its N- and C-termini, respectively, indicating a synergistic effect of the two tags on expression.50
2.1.3. Apolipoprotein A-I Fusion Strategy
Apolipoprotein A-I (apoAI) is the major protein component of spherical high-density lipoproteins (HDL), which are abundant in human plasma. ApoAI is a highly α-helical 28 kDa protein that in vivo serves as a “glue” holding HDL particles together.59 The protein is easily produced in E. coli. It has been widely used in structural and functional studies of membrane-reconstituted TMPs, because the ternary TMP-lipid-apoAI complex forms discoidal nanoparticles stabilized by a double belt of apoAI.8, 60-62
Recently, motivated by a study on soluble TMPs fused to the N-terminus of apoAI (discussed in more detail below),26 our research group designed and expressed in E. coli a chimeric construct of apoAI with the Mycobacterium tuberculosis EfpA (Mtb-EfpA) drug exporter.63 In this way, we produced, for the first time to the best of our knowledge, highly pure FL Mtb-EfpA in quantities sufficient for downstream in vitro characterization. Remarkably, upon reconstitution of the apoAI-EfpA fusion into lipids, we observed by electron microscopy the formation of protein-lipid nanoparticles,63 similar to previously described nanodiscs,60-62 owing to the presence of the apoAI moiety. This suggests that future studies of EfpA’s properties (e.g., drug binding, structure determination, and conformational dynamics) could be carried out using these two-component (apoAI-EfpA protein and lipid) nanoparticles. Moreover, the methodology could be adopted in studies of other TMPs.
Interestingly, apoAI is typically expressed as a soluble protein in E. coli,64 and we found that untagged EfpA is deposited in inclusion bodies upon expression.63 This raises the question of how and why apoAI-EfpA is directed to the membrane. One explanation could be that the additional sequence at EfpA’s N-terminus prevents the protein from misfolding during translation, as was previously proposed for the mechanism by which mistic prevents TMP aggregation.52 Similar effects on protein expression were also observed when TMPs were tagged at their N-termini with glutathione S-transferase (GST)65 or with YbeL and YnaI.66