1. Introduction
The fundamental question of how life emerged on planet Earth, which is so far the only planet known to harbour life despite the thousands of exoplanets discovered, has accompanied mankind in belief and scientific reflection systems for millennia and remains unsolved. Even though it is unknown how life originated on early Earth and there are many conflicting models [
1,
2], the concept of a primordial soup by Alexander Oparin [
3] has inspired the question of what type of components would have been needed in such a prebiotic soup [
4]. Key features of living biological cells such as metabolism and self-organization, which are characterized by a complex organization in space and time of biocatalytic reaction networks coded in the cell-specific genome, have evolved over more than 3 billion years, with evidence for the oldest bacterial life forms near ancient submarine hydrothermal vents [
5]. Whatever origin of life model is favoured, metabolism is a key part, and criteria have been proposed which chemical reaction systems must satisfy to qualify as a simple metabolism able to support protocell growth and division [
6]. It is of much interest which catalysts and non-enzymatic reaction networks of ancient metabolism made use of simple starting materials and energy sources [
7,
8,
9], prior to enzymatic reactions catalyzed by proteins and RNA enzymes. The known enzyme functions and pathways, which have been discovered over more than a hundred years, already catalyze a tremendous diversity of reactions, not only in central metabolism but also in more remote and specialized metabolism, such as natural product biosynthesis [
10]. The universe of enzyme functions and pathways is however much bigger and is expanding with the investigations of enzymes and biosynthetic pathways in a growing number of microbes [
11], plants [
12] and animals [
13].
The great advances of molecular biology and genetics as well as the impressive scientific and technological developments and improvements in methods for a) analyzing and sequencing DNA [
14] and RNA [
15], and b) synthesizing DNA [
16] and RNA [
17,
18], have also revitalized the old interest in the influence of non-genetic factors such as nutrition, environment, competition, symbiosis, on health and disease of living organisms. The structural protein data collections, such as UniProt with more than 227 million protein sequences [
19], PDB with more than 180'000 three-dimensional structures of proteins [
20] and the more than 200 million predicted protein structures [
21], provide a tremendous structural knowledge base for protein science and are advancing at a fast pace. Keeping up the pace of correct and experimentally verified functional annotations of protein sequences is essential [
22], and the discovery of novel biocatalytic functions is supported by a variety of methodologies and tools linking genomics and enzymology [
23]. Advances in the analysis of the whole small molecular domain (metabolome) of biological cells by MS and NMR methods have significantly enlarged the tools for identifying protein functions [
24]. The simultaneous analysis of known protein functions in a biological cell, which has been shown for 132 enzyme catalytic rates in
Escherichia coli by extracting from omics data the maximum of the catalytic rate which has been observed for an enzyme inside cells [
25], is attractive for accelerating and extending the collection of enzyme kinetics data. For elucidating
in vivo effects on enzyme kinetics, for experimentally verifying the discovery of a novel enzyme function or for confirming an already known protein function, the synthesis of the respective protein and of the small molecules predicted as substrates is essential.
Fundamental discoveries, new approaches and instrumental advances have enabled classical molecular biology and classical metabolism to meet in a new integrated molecular view of life, where the molecular biology central dogma [
26] regarding the sequential information transfer and the utilized alphabet is still key and the entire small molecular world of a biological organism is now also considered as an important part of a truly molecular biology [
27,
28,
29,
30,
31]. The more than 55 classes of natural metabolite-sensing riboswitches, which have been discovered to control transcription termination, translation initiation and alternative splicing, demonstrate the importance of linking the monitoring of biologically relevant elements, electrons and ions, in the form of ligands to riboswitches, with gene expression control and complex metabolic pathway regulation. Riboswitches typically achieve this sensing and control by forming an mRNA domain in the 5′ untranslated region which is able to bind small molecules as ligands and partly overlaps with an expression platform. Expression control is achieved by having the riboswitch in a genetic on mode when no ligand is bound, and in a genetic off mode when the ligand is bound [
32]. Although more than 55 classes of riboswitches for common cofactors and nucleotide-based signalling molecules have already been discovered, many opportunities still exist for discovering novel riboswitch classes sensing important metabolites, such as fatty acids, lipids, terpenoids, or unmodified sugars [
33].
The opportunity to switch metabolic pathways and the involved enzymes on or off according to whether biologically relevant small molecules are present or absent in the nutrition or the environment connects metabolism with molecular biology and can be realized by different mechanisms. Metabolic pathway and network regulation can be achieved at multiple levels, such as genetic regulation [
34], activation of silent pathways [
35], or metabolite-protein interactions [
36]. Metabolic efficiency can be achieved by sequestering highly reactive metabolites from the intracellular space in a protein-bound form, such as γ-glutamyl phosphate bound in L-glutamine synthetase [
37]. The hypothesis of metabolite-enzyme coevolution is attractive for integrating the avoidance of reaction losses through toxic and inhibitory metabolites, recruitment of enzymes and the evolution of metabolic pathways and networks [
38].
The proper functioning of metabolic reactions also needs damage control and repair systems for the components of living biological cells which may be exposed to various kinds of damaging influences, either by external hazards and environmental stress, or by internal factors such as the working life span of an enzyme [
39], side reactions due to additional non-native catalytic functions or additional non-native substrate acceptance of enzymes [
40] or chemical reactions within biological cells, which can not only damage macromolecules like nucleic acids and proteins but also metabolites [
41]. Damaged metabolites may be merely useless when they have no biological effects, but in more severe cases they may have negative biological effects. Repair systems for damaged metabolites are therefore important for maintaining healthy life, and biocatalytic functions and pathways converting damaged metabolites back to useful and non-toxic pathway metabolites contribute to robustness and stability [
42]. The analysis of a minimized Mycoplasma JCVI-Syn3A genome has shown that metabolite damage repair systems are still in place and cannot therefore be separated from life itself [
43].
Metabolites also play key roles in regulatory processes through the enzymatic modification of essential biopolymers of molecular biology, such as DNA, RNA and proteins. The enzymatic modification of histones and DNA, which involves metabolites from nutrients and the microbiome, connects metabolism and epigenetics in a complex interplay [
44]. Other important interfaces between metabolism and molecular biology appear in gene silencing [
45], post-transcriptional RNA modification [
46] and post-translational modifications of proteins [
47].
Tremendous advances have been achieved in assigning enzyme functions to protein sequences and in the discovery of novel biocatalytic functions and pathways. More than 6000 enzymes, which have been recognized and categorized according to the reaction they catalyze by a four-digit EC number within the seven EC classes [
48], already display a large variety of enzyme activities. As the universe of enzymes is, however, much larger, moving the frontiers of enzyme function knowledge is highly important. This is clearly evident from the growing number of sequences and structures of known proteins for which the corresponding enzyme function is unknown or has not been experimentally verified. The main aim of this work is to bring attention back to metabolism as a key research area for the study of life and as a valuable resource for applications in biocatalysis. The great progress in genomic enzymology tools and methodologies, the discovery of unknown biocatalytic functions and pathways, as well as experimental technologies for analysis and synthesis enlarge the power and opportunities of biocatalysis [
49].
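The four-digit EC classification mentioned above can be illustrated with a small data structure. The following is a minimal Python sketch, not part of any official tool: the names `ECNumber` and `parse_ec` are hypothetical, the class names are the seven top-level EC classes, and partial identifiers containing dashes (for example 1.1.1.-) are deliberately rejected.

```python
from typing import NamedTuple

# The seven top-level EC classes.
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
    7: "Translocases",
}


class ECNumber(NamedTuple):
    """Hypothetical container for a complete four-digit EC number."""
    enzyme_class: int
    subclass: int
    sub_subclass: int
    serial: int

    @property
    def class_name(self) -> str:
        return EC_CLASSES[self.enzyme_class]


def parse_ec(text: str) -> ECNumber:
    """Parse strings such as 'EC 1.1.1.1' or '1.1.1.1' (illustrative only)."""
    cleaned = text.strip()
    if cleaned.upper().startswith("EC"):
        cleaned = cleaned[2:].strip()
    parts = cleaned.split(".")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        raise ValueError(f"not a complete four-digit EC number: {text!r}")
    numbers = tuple(int(p) for p in parts)
    if numbers[0] not in EC_CLASSES:
        raise ValueError(f"unknown EC class in {text!r}")
    return ECNumber(*numbers)


if __name__ == "__main__":
    ec = parse_ec("EC 1.1.1.1")  # alcohol dehydrogenase
    print(ec, ec.class_name)
```

Storing the four levels as separate integers keeps grouping by class or subclass trivial, which is the kind of operation needed when comparing predicted and experimentally verified annotations.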
In addition to biocatalysts having a single enzyme function, biocatalysts have been discovered which can catalyze more than one reaction. Some biocatalysts can switch their enzyme function depending on the pH, such as the terpene cyclases AaTPS and FgGS, which can act at basic pH as aromatic prenyltransferases for generating prenylindoles [
50]. Biocatalysts containing multiple enzyme functions in the same protein are well known and of much interest for stabilizing labile intermediates and directing cascade reactions selectively towards rapid generation of molecular complexity [
51]. Multifunctional enzymes, which are among the largest and most complex enzyme machineries, are involved as megasynthases in catalyzing the biosynthesis of numerous product groups, such as fatty acids, non-ribosomal peptides, polyketides, or terpenes, from simple natural building blocks. The biosynthesis of a large diversity of complex polyketides is catalyzed by multifunctional polyketide synthases, which can operate by the programmed use of the same enzyme functions repeatedly in an iterative mode, or by a linear channeling of the intermediates from one function to the subsequent function in an assembly-line mode [
52,
53,
54].
2. Discovery and Characterization of Proteins with Unknown Biocatalytic Functions
Life on planet Earth can be categorized into ancient organisms which are now extinct, organisms living today, and organisms which will evolve in the future. As the ongoing loss of biodiversity is of major concern [
55], it is of much interest to gain knowledge about the life of current biological organisms before they become extinct. From genes to proteins to metabolites the characterization of hidden biocatalytic functions and pathways, whether cryptic or silent [
56], is an exciting research area. One key area for understanding the metabolism of a biological organism in healthy and diseased conditions is knowledge about the sequence, structure and function of its constituent proteins. The knowledge of gene sequences coding for proteins is growing much faster than the experimental identification and verification of the corresponding protein functions. This widening gap represents a major challenge, as the assignment of biocatalytic functions to the corresponding gene products and their experimental characterization require substantial efforts. The deep learning model DeepECtransformer has been developed to predict known enzyme functions at the level of EC numbers in order to reduce the number of unannotated genes [
57]. From 464 un-annotated
E. coli genes, enzyme functions at the level of EC numbers were predicted for the corresponding proteins by DeepECtransformer, and three of these proteins, the predicted glucose 1-dehydrogenase YgfF, L-threonylcarbamoyladenylate synthase YciO, and phosphonoacetate hydrolase YjdM, were randomly selected [
57]. This facilitated experimental validation of the enzyme functions, which has been performed by
in vitro enzyme assays of the overexpressed and affinity-purified proteins YgfF, YciO, and YjdM [
57]. For special enzyme reactions or completely novel enzyme functions without EC numbers, a significant effort and time may be needed for the development of suitable analytical and preparative methods [
58,
59,
60]. These include the expression and purification of proteins, the synthesis of substrates, and the analysis of substrates and products, as the catalytic function of a protein needs to be demonstrated by its incubation with a potential substrate and the identification of the nature of the product formed from the substrate in a protein- and time-dependent way. Further experimental characterization is of much interest and includes the identification of the optimum reaction conditions and the measurement of catalytic performance parameters such as
kcat and
KM. The reporting of these functional datasets according to the STRENDA guidelines, which are recommended by an increasing number of journals, and their deposition in the STRENDA database provide a modular framework in the workflow of processing, storing and retrieving an increasing number of enzyme function data [
61]. The growing Pfam database of protein sequence families [
62], which have been generated according to a significant degree of sequence similarity of a protein domain, is of much interest for connecting to enzyme functions and the evolutionary history of proteins, and for guiding experiments. The activity of a known natural enzyme has also been a common starting point for engineering and evolving the properties of the enzyme towards optimum performance of a desired biocatalytic reaction under defined reaction conditions with respect to catalytic efficiency, selectivity, stability, or substrate scope [
63].
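To make the role of the catalytic performance parameters kcat and KM concrete, the following minimal Python sketch evaluates the Michaelis-Menten rate law v = kcat·[E]·[S]/(KM + [S]). It assumes simple single-substrate Michaelis-Menten kinetics, which is an assumption made here for illustration; the function names and all numerical values are hypothetical and are not taken from the cited works.

```python
def michaelis_menten_rate(kcat, km, enzyme_conc, substrate_conc):
    """Initial rate v = kcat*[E]*[S] / (KM + [S]) in M/s.

    Assumes single-substrate Michaelis-Menten kinetics.
    kcat in s^-1; km, enzyme_conc and substrate_conc in M.
    """
    if km <= 0:
        raise ValueError("KM must be positive")
    if substrate_conc < 0 or enzyme_conc < 0:
        raise ValueError("concentrations must be non-negative")
    return kcat * enzyme_conc * substrate_conc / (km + substrate_conc)


def catalytic_efficiency(kcat, km):
    """Catalytic efficiency kcat/KM in M^-1 s^-1."""
    if km <= 0:
        raise ValueError("KM must be positive")
    return kcat / km


if __name__ == "__main__":
    # Hypothetical example values, chosen only for illustration.
    KCAT = 10.0    # s^-1
    KM = 1e-4      # M
    ENZYME = 1e-8  # M
    for substrate in (1e-6, 1e-4, 1e-2):
        rate = michaelis_menten_rate(KCAT, KM, ENZYME, substrate)
        print(f"[S] = {substrate:.0e} M -> v = {rate:.3e} M/s")
    print(f"kcat/KM = {catalytic_efficiency(KCAT, KM):.3e} M^-1 s^-1")
```

At [S] = KM the rate equals half of the maximum rate kcat·[E], which is a convenient sanity check when fitting such parameters to experimental progress data.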
Whole genome sequencing of biological organisms has yielded a large dataset of genes which code for proteins whose function is completely unknown. The description of domains of unknown functions (DUF) for uncharacterized protein families started in 1998 with the first two members DUF1 and DUF2 [
64]. Since then, the DUFs, both in absolute numbers and as a percentage of all protein families, have been continuously increasing to more than 20% of all protein families over the years [
65], reaching DUF 6807 in release 35.0 of the Pfam database, which has now been integrated into the InterPro database [
66]. Genomic enzymology web tools, such as sequence similarity networks, enzyme similarity tools, genome neighbourhood tools and taxonomy tools, enable the exploration of databases towards the
in vitro characterization of uncharacterized proteins and the assignment of enzyme activities to them [
67,
68,
69,
70].
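The idea behind a sequence similarity network can be sketched in a few lines of Python: proteins are nodes, and an edge is drawn whenever a pairwise similarity score exceeds a chosen threshold, so that connected components suggest candidate isofunctional groups. The sketch below is a toy illustration only. It scores similarity as the fraction of identical positions without any alignment, whereas real genomic enzymology tools rely on proper pairwise alignment scores; the sequences, names and threshold are hypothetical.

```python
from itertools import combinations


def fraction_identity(a, b):
    """Toy similarity: identical positions over the shorter length (no alignment)."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / n


def similarity_network(seqs, threshold):
    """Return an adjacency mapping with an edge where similarity >= threshold."""
    edges = {name: set() for name in seqs}
    for p, q in combinations(seqs, 2):
        if fraction_identity(seqs[p], seqs[q]) >= threshold:
            edges[p].add(q)
            edges[q].add(p)
    return edges


def clusters(edges):
    """Connected components of the network, found by depth-first search."""
    seen = set()
    result = []
    for start in edges:
        if start in seen:
            continue
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(edges[node] - component)
        seen |= component
        result.append(sorted(component))
    return result


if __name__ == "__main__":
    # Hypothetical toy sequences, not real proteins.
    toy = {
        "protA": "MKTAYIAK",
        "protB": "MKTAYLAK",
        "protC": "GSHMLEDP",
        "protD": "GSHMLQDP",
    }
    network = similarity_network(toy, threshold=0.8)
    print(clusters(network))
```

Raising the threshold splits the network into smaller clusters, which mirrors how the stringency of a similarity network is tuned until clusters separate proteins with different functions.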
Screening of transport system proteins which bind solutes and applying sequence similarity networks and genome neighbourhood networks has enabled the identification of novel kinases, which are ATP-dependent and are acting on four-carbon sugar acids, from the DUF1537 protein family [
71]. Thereby the novel DUF1537 enzymes D-threonate kinase DtnK and D-erythronate kinase DenK (see
Figure 1) have been identified and characterized [
71]. A strategy for finding enzyme activities within protein families of unknown function has been based on defining a generic conserved reaction in the protein family, high-throughput screening, and analysis of genomic and metabolic context [
71].
The protein Cj1418 from
Campylobacter jejuni, which has been recombinantly expressed and affinity-purified, has been discovered as the first enzyme to directly phosphorylate an amide nitrogen [
72]. Cj1418 has been clearly demonstrated to act as ATP-dependent L-glutamine kinase (see
Figure 1), which corrected its former annotation as putative phosphoenolpyruvate synthase or pyruvate phosphate dikinase [
72]. The application of this approach to the DUF849 Pfam family enabled the discovery of various novel β-keto acid cleavage enzymes [
73].
For functional annotation of the proteins Ms0025 from
Mycoplasma synoviae and Mag6390 from
Mycoplasma agalactiae (see
Figure 1) as novel lactonases, a combination of approaches was needed, ranging from the consideration of genetic context, computational, empirical and structural screening, to the comparison of sequences and the addition of newly synthesized substrates to the original libraries [
74]. Both lactonases have been demonstrated to catalyze the hydrolysis of D-xylono-1,4-lactone-5-phosphate with a kcat/Km value of 5.7 × 10⁴ M⁻¹ s⁻¹ for Mag6390 and 4.7 × 10⁴ M⁻¹ s⁻¹ for Ms0025, and the hydrolysis of L-arabino-1,4-lactone-5-phosphate with a kcat/Km value of 2.2 × 10⁴ M⁻¹ s⁻¹ for Mag6390 and 1.3 × 10⁴ M⁻¹ s⁻¹ for Ms0025 [
74].
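The kcat/Km values quoted above can be compared directly, since at substrate concentrations well below Km the rate is approximately (kcat/Km)·[E]·[S]. The following Python sketch tabulates the four reported values and computes the preference of each lactonase for the D-xylono over the L-arabino substrate. The low-substrate approximation assumes Michaelis-Menten kinetics, and the helper names and the enzyme and substrate concentrations are hypothetical illustrations.

```python
XYLONO = "D-xylono-1,4-lactone-5-phosphate"
ARABINO = "L-arabino-1,4-lactone-5-phosphate"

# kcat/Km values in M^-1 s^-1 as quoted in the text.
EFFICIENCY = {
    ("Mag6390", XYLONO): 5.7e4,
    ("Ms0025", XYLONO): 4.7e4,
    ("Mag6390", ARABINO): 2.2e4,
    ("Ms0025", ARABINO): 1.3e4,
}


def preference_ratio(enzyme, substrate_a, substrate_b):
    """Ratio of catalytic efficiencies of one enzyme for two substrates."""
    return EFFICIENCY[(enzyme, substrate_a)] / EFFICIENCY[(enzyme, substrate_b)]


def low_substrate_rate(kcat_over_km, enzyme_conc, substrate_conc):
    """v ~ (kcat/Km)*[E]*[S] in M/s, valid only for [S] << Km (assumption)."""
    return kcat_over_km * enzyme_conc * substrate_conc


if __name__ == "__main__":
    for enzyme in ("Mag6390", "Ms0025"):
        ratio = preference_ratio(enzyme, XYLONO, ARABINO)
        print(f"{enzyme}: xylono/arabino efficiency ratio = {ratio:.2f}")
    # Hypothetical concentrations, for illustration only.
    rate = low_substrate_rate(EFFICIENCY[("Mag6390", XYLONO)], 1e-9, 1e-6)
    print(f"Mag6390 on {XYLONO}: v = {rate:.2e} M/s")
```

Both enzymes favour the D-xylono substrate by a factor of roughly two to four according to these values, which is a modest difference on the scale of enzyme specificities.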
3. Discovery and Characterization of Unknown Metabolic Pathways
Central metabolic cycles and pathways, such as glycolysis, mevalonate and methyl-erythritol phosphate pathways, the Calvin cycle, citric acid cycle, or urea cycle, have been discovered through significant scientific efforts and fundamental investigations, which have been honoured by many Nobel Prizes and have become standard biochemistry knowledge. In addition to the central metabolic pathways for sustaining healthy life in the large diversity of biological species, specialized pathways for preparing bioactive small molecules from the nutrients available may be connected with special living conditions, diseases or tasks. From microbes to plants, animals and humans, the identification of the relevant biochemical pathways and missing enzymes continues to be highly important not only for natural and synthetic pathways to known bioactive metabolites and natural products, but also for orphan, cryptic or silent pathways to unknown metabolites, salvage and repair pathways. Therefore methods for identifying functional genes, such as gene expression profiling in real time, knockouts or heterologous expression of all the target genes of a complete biosynthetic pathway [
75], combined analysis of genome and transcriptome data, as well as metabolome and enzymatic analysis are essential for elucidating biochemical pathways [
76]. This also requires outlining the organic chemistry of the biochemical pathways and connecting the metabolites with the genes that encode their biosynthesis [
10,
77]. Computational methods and tools using the databases of biochemical compounds and principles of biochemical reactions [
78] are of much interest for predicting potential unknown metabolic pathways and for shining light on the dark matter of metabolism. The identification of all enzymes and their functions along a biocatalytic pathway, as well as of the metabolic intermediates, is key for a molecular understanding of the natural pathway and for designing synthetic pathways.
The identification of missing enzymatic reaction steps in metabolic pathways has been a classical area in the discovery of now well-established metabolic pathways. Newly developed experimental tools and methods have, however, enabled fresh and straightforward approaches to identify missing enzymatic reaction steps, often leading to the discovery of entirely novel enzyme functions. The question of how nature catalyzes the synthesis of altemicidin by the gene products of a recently identified biosynthetic gene cluster has been addressed by a smart combination of various experimental techniques [
79]. This has led to the discovery of a fascinating novel pathway from β-nicotinamide adenine dinucleotide to altemicidin in eight enzymatic reaction steps (see
Figure 2), whereby a novel enzymatic [3+2]-annulation between β-nicotinamide adenine dinucleotide and S-adenosyl-L-methionine has been discovered [
79]. From the separately expressed genes of the Streptomyces lividans biosynthetic gene cluster sbz and functional analysis by untargeted metabolomics, SbzP has been found as the gatekeeping enzyme in the generation of the 6-azatetrahydroindane backbone [
79].
The identification of all missing enzymatic reaction steps for completing the whole biosynthetic pathway to metabolites traditionally extracted in low yields from biological species is not only of fundamental interest but also provides a starting point for developing sustainable multi-step enzyme-catalyzed processes for their production. The identification of all missing enzymes in the complex 31-step vinblastine biosynthetic pathway of Catharanthus roseus has demonstrated how these enzymes catalyze the resource-efficient generation of chemical complexity from the simple metabolites tryptophan and geranyl pyrophosphate by a combination of divergent and convergent synthesis strategies [
80,
81,
82,
83]. The question of how stemmadenine acetate, which is formed from strictosidine [
81], is converted by divergent biocatalytic reactions to the two metabolite building blocks catharanthine and tabersonine has been addressed by chemical investigations, sequence data, gene silencing, synthesis of metabolite standards, NMR and mass spectrometry [
80]. These methods and the validation of the biocatalytic reaction steps in vitro with expressed and purified proteins enabled the discovery of two novel redox enzymes which have been named precondylocarpine acetate synthase (PAS) and dihydroprecondylocarpine acetate synthase (DPAS), and the characterization of the two hydrolases tabersonine synthase (TS) and catharanthine synthase (CS) [
80]. The two enzymes PAS and DPAS have been shown to catalyze the conversion of stemmadenine acetate into the unstable intermediates precondylocarpine acetate and dihydroprecondylocarpine acetate; the latter is converted by TS- or CS-catalyzed desacetoxylation to dehydrosecodine [
80], which subsequently undergoes Diels-Alder cyclization, either in the TS-catalyzed reaction to tabersonine or in the CS-catalyzed reaction to catharanthine [
80]. After the biotransformation of tabersonine to vindoline, which is catalyzed by seven enzymes [
82], the convergent synthesis of the anticancer natural product vinblastine in the biosynthetic pathway is completed by the condensation of catharanthine and vindoline [
83]. The benefit of identifying all missing enzymes in a pathway has been demonstrated by the impressive achievement of engineering the thirty enzymes catalyzing the reactions to vindoline and catharanthine into yeast, using a chemical coupling reaction in the final step to vinblastine [
84].
It is also of much interest how the diversity of biological cells and their environments, from which the uptake of nutrients and energy is needed, is reflected in diverse core metabolic pathways for the biosynthesis of the central molecules of life. Investigating different domains of life for central metabolic pathways can provide not only insights into fundamental reactions, unusual pathways and evolution, but also valuable novel biocatalytic functions. The discovery of a reversible reductive tricarboxylic acid cycle in the chemolithotrophic thermophile
Thermosulfidibacter takaii required a combination of genomic, metabolomic and enzymatic analysis [
85]. The biodiversity and biosynthetic potential of the human gut microbiome has been demonstrated by the identification of 19890 primary metabolic gene clusters in 4240 genomes, which represents an important milestone for advancing the understanding of its role in human physiology [
86]. While the biosynthetic pathway to the essential cofactor coenzyme A has been well established in bacteria and eukarya, it was only recently that the entire coenzyme A biosynthetic pathway in archaea has been experimentally validated and demonstrated to be different from that in bacteria and eukarya [
87].
The fast growth of genome sequences in all domains of life has revealed the extent of unknown metabolic and biosynthetic capabilities of living organisms [
88,
89]. An impressive 231534 biosynthetic gene cluster regions have been selected from archaeal, bacterial and fungal genomes for the antiSMASH database version 4 [
90]. Specialized metabolic pathways to complex and uniquely functionalized natural products therefore remain a very promising and vast area for discovering novel biosynthetic logic and biocatalytic functions. The biosynthetic pathway for enediyne aromatic polyketides has been investigated by coexpressing in
E. coli combinations of genes which code for a polyketide synthase and a thioesterase from the enediyne biosynthetic gene clusters, and by using the recombinant
E. coli strains to complement mutant strains able to produce anthraquinone-fused enediynes but lacking the corresponding polyketide synthase [
91]. A combination of synthetic biology, chemical complementation and ¹³C stable isotope labeling experiments enabled the identification of the common linear polyene intermediate 1,3,5,7,9,11,13-pentadecaheptaene and the proposal of a unifying pathway (see
Figure 3) for enediyne aromatic polyketides [
91]. This provides an excellent groundwork for the exploration of the intriguing biocatalytic reactions by which the pathways are diverging from 1,3,5,7,9,11,13-pentadecaheptaene, whereby one molecule is transformed to the enediyne core, while the anthraquinone moiety is formed from a second one [
91].
The search for various types of completely unknown or hidden biosynthetic pathways, such as cryptic, silent or orphan pathways, leading to still unknown metabolites/natural products may not only provide novel biocatalytic functions and exciting new chemistry but also attractive novel scaffolds for biologically active small molecules [
35,
56]. A range of approaches have been developed for discovering novel structures of biologically active small molecules and for uncovering the links with the respective genes coding for the enzymes which catalyze the reactions leading to their biosynthesis [
92]. Specialized biologically active small molecules may be only needed at specific times or certain conditions of life and it is therefore not surprising that their biosynthesis is dependent on cultivation conditions, environmental signals, stress, presence of elicitors, inducers, or exogenous metabolites from co-cultivation [
92]. These empirical approaches may however be challenging and impractical for larger numbers of samples. With the advances in genomics and the findings that often the genes which code for the enzymes used in a specific pathway are organized in biosynthetic gene clusters [
93], genome-guided methods have attracted much interest [
92]. Bioinformatics analysis tools for discovering biosynthetic pathways and for identifying biosynthetic gene clusters are very promising [
94,
95,
96]. While genome sequences are important, much more is needed for connecting genes and the functions coded by them, and for deciphering the biosynthetic logic of the corresponding metabolic pathways. Important complementary information can be derived from the combination of chemical and biochemical knowledge with genomic context [
97], identification of co-expressed genes by RNA-sequencing and transcriptome-wide analysis of differential gene expression [
98], metabolomic analysis and their correlation with absent or present expression of a biosynthetic gene cluster [
99]. Finally, the integration of powerful analytical technologies with high information content is key for establishing the sequence of the biocatalytic reactions along the pathway, and the molecular structure of the metabolites and natural products [
100,
101].
Bacterial and fungal genomes also have numerous silent biosynthetic gene clusters [
88,
89,
90], which are poorly or not at all expressed under standard laboratory conditions. The question of how cryptic, orphan and silent biosynthetic gene clusters, which outnumber the active ones, can be expressed is of fundamental importance for the discovery of unknown metabolic pathways. Therefore much attention has been paid to general approaches and methods for activating and characterizing natural product biosynthetic routes as well as the corresponding genes encoding all the enzymes catalyzing the biosynthetic reactions. Various methods have been shown to be valuable, such as perturbation of epigenetic regulation, promoter exchange, control of the translation machinery by ribosome engineering, activator gene overexpression, or repressor gene inactivation [
92]. The novel urea natural product class gaburedins has been discovered via the derepression of its silent
gbnABC biosynthetic gene cluster, which has been achieved by deleting the putative regulatory gene
gbnR, which codes for the pathway-specific transcriptional repressor, in
Streptomyces venezuelae [
102]. Structure determination of the metabolites which were present in the derepressed
gbnR mutant but lacking in the silent wild type, feeding of possible precursors and the demonstration of no gaburedin biosynthesis when the
gbnB gene was deleted in the
gbnR mutant have led to the proposed roles of the enzymes GbnA and GbnB in gaburedin pathways [
102]. The expression of silent biosynthetic genes was derepressed in a
Streptomyces host by CRISPR/Cas9-mediated genome editing, after the capture of a specific
Streptomyces sclerotialus biosynthetic gene cluster, which was cryptic and silent, and its transfer into the
Streptomyces host [
103]. A novel natural product class has been discovered by this approach, as demonstrated by (2-(benzoyloxy)acetyl)-L-proline, named scleric acid, and its proposed biosynthesis [
103].
Other general approaches involve the insertion of promoters which are constitutively active using CRISPR-Cas9, identifying small molecules as inducers by high-throughput elicitor screening, and creating overproducing strains by reporter-guided mutant selection [
104,
105].
Biological cells may have various provisioning paths to more complex metabolites and natural products, ranging between their uptake from the environment and their complete biosynthesis from essential and simple low-molecular-weight biochemicals by natural or synthetic pathways, also termed de novo pathways. Nutrient supply issues may thereby be experienced at different levels, from intermediates and precursors to the more complex metabolites and natural products, in the uptake from the environment or in the biochemical degradation of biopolymers. Therefore the recycling and utilization of such intermediates and precursors by pathways to the more complex metabolites and natural products, also termed salvage pathways, not only avoids the accumulation of waste and ensures resource efficiency but also supports the life, resilience and stability of biological cells, for example by keeping adequate cofactor levels. Major coenzymes have been shown to be remarkably stable in vivo in Escherichia coli, Bacillus subtilis and Saccharomyces cerevisiae [
106].
The cofactor nicotinamide adenine dinucleotide (NAD⁺) with its de novo biosynthetic pathway from L-tryptophan via kynurenine is important for health and mitochondrial function [
107]. The maintenance of adequate NAD⁺ levels is critical for a multitude of enzymatic and cellular functions, and several additional biosynthetic pathways to NAD⁺ have been discovered. NAD⁺ biosynthesis can be achieved in three enzymatic reaction steps (Preiss-Handler pathway) from niacin (nicotinic acid), whereby the last two steps converge with the kynurenine pathway [
108,
109,
110]. Other biosynthetic pathways to NAD⁺ have been discovered which start from the intermediates nicotinamide [
111], nicotinamide riboside [
112], nicotinic acid riboside [
113] and its reduced form [
114,
115], and nicotinamide mononucleotide [
116]. As these pathways require fewer reaction steps to NAD⁺, due to their use of metabolites containing the pyridine structure, and recycle these for the production of NAD⁺, a more inclusive use of the term salvage pathway seems reasonable [
117,
118].
The recycling and re-utilization of materials in salvage pathways provides biological cells with additional flexibility for maintaining the levels of key cellular components, such as cofactors, under changing living conditions, while benefitting from shorter biosynthetic routes to complex cellular components. This re-utilization blueprint from nature is also of much interest for synthetic applications, for example in the recycling of cofactors. Further benefits of salvaging precursors have been demonstrated in populations of engineered Escherichia coli strains, where salvagers able to use the precursors cobinamide and 5,6-dimethylbenzimidazole for the biosynthesis of the complete cobamide vitamin B₁₂ make this cofactor available to nonproducing consumer strains, are not overexploited and remove nonfunctional and inhibiting precursors [
119]. For the cofactor S-adenosyl-L-methionine (SAM), several salvage pathways for L-methionine and SAM byproducts are known, and two novel oxygen-independent salvage pathways for the SAM byproducts 5ʹ-deoxyadenosine and 5ʹ-methylthioadenosine have been discovered (see
Figure 4) in Rhodospirillum rubrum and pathogenic Escherichia coli [
120]. Salvage pathways leading back to SAM, either from nature or by design, can provide important tools for advancing and broadening the synthetic applications of different classes of SAM-dependent enzymes [
121].
Genetically encoded biocatalytic systems for preventing the formation of damaged metabolites, or for repairing damaged metabolites and transforming them back into valuable metabolic intermediates which cells can utilize again, are of key importance for maintaining the health of living cells. Biocatalytic damage control systems may be as important for cellular life over the course of time as biosynthetic pathways, as they can systematically counteract the negative effects of damaged metabolites, which can be formed as toxic products from normal physiological metabolites by enzymatic side reactions or spontaneous chemical reactions [
41,
42,
43]. The enzymatic formation of damaged metabolites can occur under a variety of conditions, such as an unintended enzymatic transformation of a normal physiological metabolite, or an enzymatic transformation of an unintended substrate in addition to the normal physiological substrate. Spontaneous chemical reactions occurring under physiological conditions have been assembled in the database CD-MINE, which is the abbreviation for Chemical Damage - Metabolic In Silico Network Expansion [
122].
The activities of metabolite repair and clearance enzymes, which have been found to eliminate damaged metabolites and side-products in glycolysis [
123], citric acid cycle [
124], photosynthesis [
125], and other major pathways, are essential for the proper functioning of metabolic pathways. Damaged metabolites and side products which are toxic to biological cells have been a good starting point in the search for enzymes catalyzing their conversion to non-toxic natural metabolites which can be utilized in the corresponding cell metabolism. The repair of the toxic intermediate L-4-hydroxythreonine by its phosphorylation to L-4-hydroxythreonine-5-phosphate (see Figure 1), an essential metabolite of the pyridoxal-5-phosphate pathway, has been demonstrated to be catalyzed by kinase STM0162 (DUF1537) [
126]. The unnatural metabolite L-glyceraldehyde 3-phosphate, which can be taken up by E. coli cells through different transport systems, or can be formed by glycerol kinase-catalyzed phosphorylation of L-glyceraldehyde, or by a very slow non-enzymatic racemization of D-glyceraldehyde 3-phosphate [
127], is toxic to biological cells due to its action as a bactericidal agent and enzyme inhibitor [
128]. L-glyceraldehyde 3-phosphate reductase YghZ from E. coli has been discovered as an enzyme catalyzing the removal of toxic L-glyceraldehyde 3-phosphate by its conversion to L-glycerol-3-phosphate (see
Figure 5), a natural non-toxic metabolite for use in phospholipid biosynthesis or for bypassing the triosephosphate isomerase-catalyzed reaction [
128,
129].