Results
Our study is the first to provide evidence that the non-expressing genomic reservoir can be harnessed to accelerate drug discovery.
(a) Proof of concept: Six unique intergenic sequences, each exceeding 100 base pairs in length and lacking any prior evidence of transcriptional activity, were randomly selected. The selected intergenic DNA sequences were PCR-amplified and cloned into the pBAD202/D-TOPO expression vector. Upon induction, all six synthetically engineered genes demonstrated both transcription and translation activity. Western blot analysis confirmed successful expression of His-tagged proteins derived from the intergenic sequences. Notably, expression of one novel protein (Eka1) led to marked growth inhibition in E. coli, an effect that was completely reversed when cells were cultured in inducer-free conditions. Computational modeling predicted that two of the six de novo proteins adopt stable globular tertiary structures. This is the first report demonstrating the artificial synthesis of computationally predicted, non-native proteins from non-expressing genomic elements (Dhar et al 2009).
(b) Anti-Malaria: Using S. cerevisiae intergenic sequences, a library of synthetic peptides was computationally generated and screened for similarity to natural ligands of key Plasmodium falciparum invasion proteins: EBA-175, MSP-1(19), and AMA-1. Molecular docking simulations revealed favourable binding of selected peptides to their targets, suggesting potential to inhibit parasite invasion (Joshi et al 2013). Peptides were chemically synthesized and tested against clinical strains of the parasite in infected blood cell culture. Experimental studies showed that more than 60% of the parasites were unable to enter red blood cells (personal communication, Dr Shailja Singh, JNU). A follow-up study identified and characterized four novel microRNAs (ast-mir-2502, ast-mir-2559, ast-mir-3868, and ast-mir-9891) in Anopheles stephensi from over 3,000 transcriptome sequences. Computational analyses predicted 26 potential gene targets involved in essential processes such as gametogenesis, morphogenesis, protein translation, and signal transduction. The findings offer foundational insights into miRNA-mediated gene regulation in mosquitoes and suggest new molecular targets for vector control (Krishnan et al 2015).
(c) Anti-Alzheimer's: A library of 2,500 intergenic sequences was screened for open reading frames, resulting in 424 novel peptides with no known similarity to existing proteins. Using I-TASSER, secondary and tertiary structures of these peptides were predicted and virtually screened against beta-secretase 1 (BACE1), a key target in Alzheimer's disease. Docking studies using PatchDock and FireDock revealed peptides with strong binding affinities and favorable interactions at the BACE1 active site. Lead peptides exhibited optimal molecular weight (500–5000 Da) and functional sites such as N-glycosylation and phosphorylation, suggesting potential for enhanced bioavailability and further chemical modification via click chemistry. While this represents an early-stage finding, it demonstrates the therapeutic potential of mining genomic dark matter for developing molecules against neurodegenerative diseases (Raj et al 2015).
As a key outcome of this study, two peptides—ECOI2 and ECOI3—exhibited notable inhibitory activity against BACE1. In a FRET-based enzymatic assay, ECOI2 achieved up to 86.7% inhibition of BACE1 activity at a concentration of 1 µM, while ECOI3 displayed moderate inhibition. Treatment of SH-SY5Y neuroblastoma cells with ECOI2 led to a marked reduction in BACE1 protein levels, as evidenced by Western blot analysis, without affecting BACE1 mRNA expression—suggesting a post-translational mode of regulation. ELISA quantification further demonstrated that ECOI2 significantly decreased the levels of amyloidogenic Aβ1–40 and Aβ1–42 peptides. Notably, MTT assays confirmed the non-cytotoxic nature of both peptides in SH-SY5Y cells. Collectively, these findings position ECOI2 as a potent and biologically active BACE1 inhibitor, warranting further preclinical evaluation in relevant animal models (Verma et al 2023).
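As a rough illustration of the open-reading-frame screening step described in (c), the following Python sketch scans both strands of a DNA sequence in all three frames and returns ATG-to-stop peptides. The function name find_orfs, the min_codons cutoff, and the requirement of an ATG start are assumptions made for illustration; the text does not state which ORF-finding tool or thresholds were used.

```python
from itertools import product

# Standard genetic code, built in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(dna, min_codons=10):
    """Collect peptides from ATG-to-stop ORFs in all six reading frames.

    min_codons is a hypothetical length cutoff (coding codons, stop excluded).
    ORFs that run off the end of the sequence without a stop are ignored.
    """
    dna = dna.upper()
    peptides = []
    for strand in (dna, reverse_complement(dna)):
        for frame in range(3):
            current = None  # residues of the ORF being extended, if any
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                residue = CODON_TABLE.get(codon, "X")  # "X" for ambiguous bases
                if current is None:
                    if codon == "ATG":
                        current = ["M"]
                elif residue == "*":
                    if len(current) >= min_codons:
                        peptides.append("".join(current))
                    current = None
                else:
                    current.append(residue)
    return peptides


if __name__ == "__main__":
    toy_sequence = "CCATGGCTGCTGCTGCTTAAGG"  # made-up example, not study data
    print(find_orfs(toy_sequence, min_codons=3))
```

In a real screen, the resulting peptides would still need to be compared against known proteins to confirm that they have no similarity to existing sequences, which this sketch does not attempt.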
(d) Anti-Leishmania: Transfer RNAs (tRNAs), primarily recognized for their role in protein synthesis, have been little explored as peptide-encoding molecules because of intrinsic translational limitations at the ribosomal interface. Considering that tRNAs have not undergone translation throughout evolutionary history, we asked whether chemically synthesized peptides derived from tRNA sequences (tREPs) could exhibit functional activity. To explore this, 87 Escherichia coli tRNAs were computationally translated into peptide sequences, yielding 29 structurally stable candidates. Of these, chemically synthesized tREP-18 exhibited potent antileishmanial activity against Leishmania donovani Ag83 promastigotes (IC₅₀ = 22.13 nM) and the PKDL clinical isolate BS12 (IC₅₀ = 18 nM), with minimal toxicity to J774.A1 macrophages (CC₅₀ = 275 µM). Cell viability and LDH assays revealed dose- and time-dependent cytotoxicity. AFM and SEM analyses showed membrane disruption and cytoskeletal damage in treated parasites. This study reports the first functional tRNA-derived peptide with antiparasitic efficacy, establishing tREPs as a novel class of bioactive molecules and opening new avenues in drug discovery and synthetic biology.
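To put the potency and toxicity values reported in (d) on a common scale, one can compute the ratio CC₅₀/IC₅₀, commonly called the selectivity index. This ratio is not reported in the text; the sketch below simply derives it from the stated numbers, and the function and variable names are illustrative assumptions.

```python
NANOMOLAR = 1e-9   # mol/L
MICROMOLAR = 1e-6  # mol/L


def selectivity_index(cc50_molar, ic50_molar):
    """Ratio of host-cell cytotoxic concentration to antiparasitic potency.

    Both arguments must be in the same unit (here mol/L).
    A larger ratio means a wider margin between efficacy and host toxicity.
    """
    return cc50_molar / ic50_molar


# Values as stated in the text for tREP-18.
cc50_macrophage = 275 * MICROMOLAR   # J774.A1 macrophages
ic50_ag83 = 22.13 * NANOMOLAR        # L. donovani Ag83 promastigotes
ic50_bs12 = 18 * NANOMOLAR           # PKDL clinical isolate BS12

si_ag83 = selectivity_index(cc50_macrophage, ic50_ag83)
si_bs12 = selectivity_index(cc50_macrophage, ic50_bs12)

if __name__ == "__main__":
    print(f"Selectivity index, Ag83: {si_ag83:.0f}")
    print(f"Selectivity index, BS12: {si_bs12:.0f}")
```

With the reported values, both ratios come out above 10,000, consistent with the statement that toxicity to macrophages is minimal relative to antiparasitic potency.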
(e) Vaccines: We investigated the potential of tRNA-encoded peptides (tREPs) as a novel source for developing epitope-based vaccines against viral pathogens. Leveraging the growing recognition of functional synthetic peptides derived from non-expressing sequences, we developed a comprehensive computational pipeline that integrates curated data sources and standard prediction tools to identify and rank candidate epitopes from tRNA sequences. For each viral target, the top-ranking epitope, predicted to bind specific HLA molecules, was subjected to 200 ns molecular dynamics (MD) simulations and binding free energy analyses to assess stability and interaction strength. Our results highlight two promising tREP-derived epitopes: RRHIDIVV for Mamastrovirus 3 and IMVRFSAE for Norovirus GII, both demonstrating favorable binding and structural stability. These findings suggest that tREPs offer an untapped molecular reservoir for vaccine design. By demonstrating antiviral potential in tRNA-derived peptides, our work opens a novel path toward rational, computation-enabled vaccine development based on this unexplored class of molecules (Shanthappa et al 2024).
(f) Additional evidence: A large number of ORFs have been predicted from intergenic and antisense sequences of E. coli. Of these, a subset was selected based on coding potential, conservation, and structural predictions. Several peptides were predicted to affect bacterial growth, stress response, or metabolism, indicating potential regulatory or antimicrobial properties (Varughese et al 2016).
Antisense proteins: We examined full-length hypothetical genes located on antisense strands in both forward and reverse orientations. Sequences containing in-frame stop codons upon computational translation were excluded from further analysis. Our findings revealed untapped genomic potential in the form of full-length antisense and reverse antisense proteins in E. coli (0.7% and 5.1%), S. cerevisiae (0.15% and 0.5%), and D. melanogaster (0.2% and 2.1%), respectively. Predicted physicochemical properties indicated that many of these peptides could adopt stable structures with functional relevance. Subcellular localization predictions suggested diverse cellular roles, with some proteins showing potential for secretion. Functional annotations linked many candidates to enzymatic or transporter activity (Garg & Dhar 2023a).
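The filtering rule described above (translate a gene in an alternative orientation, then discard it if an in-frame stop codon appears) can be sketched in Python as follows. The two orientation definitions are assumptions made for illustration, since the text does not spell them out: "antisense" is taken to mean the reverse complement of the coding sequence, and "reverse_antisense" the antisense strand read in the opposite direction. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code, built in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def orient(cds, orientation):
    """Return the sequence to translate for the requested orientation.

    Assumed definitions (not taken from the source):
      antisense          -> reverse complement of the coding sequence
      reverse_antisense  -> complement only (antisense strand read backwards)
    """
    if orientation == "antisense":
        return cds.translate(COMPLEMENT)[::-1]
    if orientation == "reverse_antisense":
        return cds.translate(COMPLEMENT)
    raise ValueError(f"unknown orientation: {orientation}")


def full_length_protein(cds, orientation):
    """Translate in frame 0; return None if any stop codon is encountered."""
    seq = orient(cds.upper(), orientation)
    protein = "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")  # "X" for ambiguous bases
        for i in range(0, len(seq) - 2, 3)
    )
    return None if "*" in protein else protein


if __name__ == "__main__":
    toy_genes = ["ATGGCTGCA", "ATGTTA"]  # made-up examples, not study data
    for gene in toy_genes:
        for orientation in ("antisense", "reverse_antisense"):
            print(gene, orientation, full_length_protein(gene, orientation))
```

Applied genome-wide, the fraction of genes for which such a function returns a protein rather than None would correspond to the kind of percentages reported above, under whatever orientation definitions the original analysis actually used.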
Reverse proteins: Reverse proteins are full-length translational equivalents derived by reading existing protein-coding sequences in the reverse direction, specifically in the -1 frame. We systematically explored reverse proteins across E. coli, S. cerevisiae, and D. melanogaster, uncovering their structural, functional, and interaction profiles. Reverse proteins were computationally predicted to encode enzymes such as oxidoreductases, lyases, transferases, hydrolases, and ATP synthases. Subcellular localization predictions further indicate that reverse proteins may play compartment-specific roles within the cell. Our work uncovers an unrecognized layer of genomic coding potential, offering a platform for the discovery of first-in-class functional proteins. Although this study focused on full-length sequences, extending the framework to include partial-length reverse proteins could dramatically expand the synthetic proteome landscape. While bidirectional expression from a single locus may seem non-intuitive in natural systems, synthetically engineering reverse proteins introduces opportunities to enrich genomic, transcriptomic, and proteomic datasets and to unlock new applications in delivering novel therapeutic peptides and proteins (Nayak & Dhar 2023a).
Intronic proteins: Full-length intronic sequences from S. cerevisiae, C. elegans, and D. melanogaster were computationally translated to explore their potential as novel proteins with defined structural, physicochemical, and functional properties. This work revealed that intron-derived proteins, long thought to be non-coding byproducts of gene architecture, may actually fold into stable, structured entities. Ramachandran plot analysis confirmed that the majority of residues in these proteins occupy energetically favorable regions, supporting their potential to form biologically relevant conformations. Subcellular localization predictions mapped these intronic proteins to distinct compartments—including the nucleus, mitochondria, and cytoplasm—with several candidates exhibiting features of secretory proteins. Functional annotations suggested roles as membrane transporters, DNA-binding proteins, and enzymes, with predicted metal ion interactions involving calcium, zinc, and manganese—characteristics often seen in therapeutic targets (Garg & Dhar, 2023b).
Noncoding proteins: Non-coding RNAs (ncRNAs) are RNA molecules that do not encode proteins but play important regulatory roles within the cell. We computationally translated ncRNA sequences from C. elegans, D. melanogaster, A. thaliana, and H. sapiens into putative proteins and analyzed their structural, physicochemical, and functional properties. Structures predicted with I-TASSER showed stable folding, with favorable Ramachandran plots and physicochemical profiles suitable for cellular environments. Functional predictions revealed diverse enzymatic activities, including hydrolases, oxidoreductases, and kinases, along with roles as transporters, signaling molecules, and membrane proteins. Many proteins were predicted to localize to key cellular compartments such as the cytoplasm and nucleus. These findings suggest that ncRNA-derived proteins (non-coding proteins, NCPs) represent a novel class of biomolecules with potential applications in drug discovery (Nayak & Dhar 2023b).
As an extension of the above work, we studied proteins derived from intergenic sequences of E. coli and found strong evidence of antimicrobial properties against both Gram-negative and Gram-positive bacteria.
Figure 3. Predicted antimicrobial peptide from an E. coli intergenic sequence.