“Drug-likeness” versus “natural product-likeness”

We discuss the concepts of “drug-likeness”, “lead-likeness” and “natural product-likeness” in further detail. The discussion first focuses on natural products as drugs, then reviews previous studies in which the scaffold complexity and chemical space of naturally occurring compounds have been compared with those of synthetic and semi-synthetic compounds and FDA-approved drugs. This is followed by guiding principles for designing “drug-like” natural product libraries for lead compound discovery. We end by presenting a tool for measuring the “natural product-likeness” of compounds, followed by a brief presentation of machine learning approaches for classifying drugs versus non-drugs and of a binary quantitative structure-activity relationship (QSAR) model for classifying natural versus non-natural compounds.


Introduction
In a previous chapter, focused on the definition and classification of natural products (NPs), NPs were defined as substances isolated from plants, micro-organisms, insects, mammals, etc. [1]. The term NPs does not include products of primary metabolism or compounds found in all living cells (e.g. proteins, nucleic acids, carbohydrates and compounds that are substrates for biological transporters); the definition of NPs is rather restricted to products of secondary metabolism. Thus, NPs could also be referred to as secondary metabolites (SMs). Consequently, the terms NP, SM and naturally occurring metabolite or compound (NOC) will be used interchangeably throughout the text. NPs and NP mimics comprise an important source of human therapeutics and are estimated to account for a significant share of approved drugs [2]. Throughout human history, nature's chemical library has proven to be a rich resource of biologically active medicinal leads and drugs. However, major changes in drug discovery trends have occurred during the last four decades. Following the emergence of in vitro assays and the development of large combinatorial chemical libraries, drug discovery programs started to focus on target-based methods. Screening requires access to large compound libraries, which was not readily achievable with classical NP extraction and purification methods; these changes therefore drove the shift from NP-based discovery programs to high-throughput screening (HTS) as the main strategy of target-based drug discovery [3].
Following the availability of huge chemical screening libraries from combinatorial synthesis and of valuable biological activity data collected from HTS, it became clear that medicinal chemists needed criteria to distinguish biologically active compounds from drugs. Christopher Lipinski analyzed a wealth of data accumulated from HTS and failed drug discovery programs, stored in the World Drug Index (WDI) during the 1980s and 1990s. He suggested that "drug-likeness" is linked to oral bioavailability, which led to the famous "rule of 5" (RO5). Oral bioavailability is the ability of a drug to be administered efficiently by the oral route, a concept often linked to the absorption, distribution, metabolism, excretion and toxicity (ADME/T) of drug molecules. It encompasses the ability of the drug to cross the intestinal wall, pass into the general blood circulation, reach its intended target site and stay there long enough to exert its pharmacological effect, and then be eliminated efficiently so that it does not accumulate to unsafe (toxic) levels in the body. The RO5 comprises a simple set of four physico-chemical property ranges which give a biologically active compound a higher probability of being orally bioavailable and of having a favorable pharmacokinetic (ADME/T) profile [4]. The RO5 criteria are:
1) molecular weight (MW) not exceeding 500 Da;
2) computed logarithm of the n-octanol/water partition coefficient (clogP) not exceeding 5;
3) no more than 10 hydrogen bond acceptors (HBA, defined as the number of N and O atoms);
4) no more than 5 hydrogen bond donors (HBD, defined as the sum of OH and NH groups).
The RO5 derives its name from the fact that all of these thresholds are multiples of 5.
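As an illustration, the four RO5 criteria can be checked against precomputed descriptor values. The following Python snippet is a minimal sketch, not an established implementation: it assumes that MW, clogP, HBA and HBD have already been computed by a cheminformatics toolkit, treats values exactly at the thresholds as compliant, and uses the common convention that a compound with at most one violation is still considered "drug-like". The names `Descriptors`, `ro5_violations` and `is_ro5_compliant`, and the example descriptor values, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed physico-chemical descriptors for one compound (hypothetical container)."""
    mw: float     # molecular weight in Da
    clogp: float  # computed log of the n-octanol/water partition coefficient
    hba: int      # hydrogen bond acceptors (count of N and O atoms)
    hbd: int      # hydrogen bond donors (count of OH and NH groups)


def ro5_violations(d: Descriptors) -> int:
    """Count how many of the four Lipinski criteria the compound violates."""
    return sum([
        d.mw > 500,
        d.clogp > 5,
        d.hba > 10,
        d.hbd > 5,
    ])


def is_ro5_compliant(d: Descriptors, max_violations: int = 1) -> bool:
    """Treat a compound as 'drug-like' if it violates at most `max_violations` criteria."""
    return ro5_violations(d) <= max_violations


# Illustrative values: a small aspirin-like molecule and a large, lipophilic compound
small = Descriptors(mw=180.2, clogp=1.2, hba=4, hbd=1)
large = Descriptors(mw=720.0, clogp=6.3, hba=12, hbd=3)
print(ro5_violations(small), is_ro5_compliant(small))  # 0 True
print(ro5_violations(large), is_ro5_compliant(large))  # 3 False
```

Counting violations rather than returning a single pass/fail flag keeps the threshold on the number of tolerated violations explicit, since studies differ in whether one violation is acceptable.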
A further proposal by Oprea and colleagues introduced more stringent rules to identify what is called lead-likeness [5,6]. By definition, compounds classified as "leads" 1) are less complex, with fewer chemical features; 2) display good biological activity together with a good ADME/T profile; and 3) are amenable to chemical optimization to improve their biological activity or pharmacokinetic properties.
Hence, Oprea and coworkers distinguished lead-like from drug-like chemical space using the following conditions, otherwise known as the Oprea lead-likeness filters:
1) MW: maximum 450 Da;
2) clogP: between -3.5 and +4.5;
3) HBA: maximum 8;
4) HBD: maximum 5.

A statistical analysis of three compound classes (natural products, molecules from combinatorial synthesis and drug molecules) revealed significant differences between combinatorial synthetic libraries and NP libraries [7]. In another chapter of this book, Saldívar-González et al. discuss major NP compound databases and the cheminformatics strategies that have been used to characterize the chemical space of natural products. The authors analyzed NPs from different sources and discussed their relationships with other compounds, using novel chemical descriptors and emerging data mining approaches for characterizing the chemical space of naturally occurring compounds [8]. In this chapter, our discussion first focuses on NPs as drugs, followed by previous studies comparing NPs and drugs, principles of designing NP libraries, the concept of "natural product-likeness" and, finally, tools used for predicting "drug-likeness" and NP-likeness.
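Similarly, the Oprea lead-likeness filters can be expressed as lower and upper bounds on the same four descriptors. In this minimal Python sketch (the names `LeadDescriptors`, `LEAD_LIKE_LIMITS`, `failed_lead_filters` and `is_lead_like` are hypothetical), a compound is considered lead-like only if it passes all four filters; this strictness, and the restriction to the four properties listed in the text, are assumptions of the example.

```python
from dataclasses import dataclass


@dataclass
class LeadDescriptors:
    """Precomputed descriptors for one compound (hypothetical container)."""
    mw: float
    clogp: float
    hba: int
    hbd: int


# Lead-likeness limits as listed in the text: (lower bound, upper bound); None = unbounded
LEAD_LIKE_LIMITS = {
    "mw": (None, 450.0),
    "clogp": (-3.5, 4.5),
    "hba": (None, 8),
    "hbd": (None, 5),
}


def failed_lead_filters(d: LeadDescriptors) -> list:
    """Return the names of the lead-likeness filters the compound fails."""
    failures = []
    for name, (low, high) in LEAD_LIKE_LIMITS.items():
        value = getattr(d, name)
        if (low is not None and value < low) or (high is not None and value > high):
            failures.append(name)
    return failures


def is_lead_like(d: LeadDescriptors) -> bool:
    """Lead-like only if no filter fails (an assumption: no violations tolerated)."""
    return not failed_lead_filters(d)


print(is_lead_like(LeadDescriptors(mw=320.4, clogp=2.1, hba=5, hbd=2)))          # True
print(failed_lead_filters(LeadDescriptors(mw=480.0, clogp=4.8, hba=7, hbd=2)))   # ['mw', 'clogp']
```

The second example compound would satisfy the RO5 yet fails two lead-likeness filters, which illustrates how lead-like space is a narrower region within drug-like space.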

2 Natural products as drugs

The proportion of natural products in catalogues of drugs
NPs play important roles in drug discovery, providing scaffolds as starting points for hit/lead discovery [9,10]. Several known drugs, e.g. the anticancer compounds 1 to 5 (Fig. 1), are from natural sources [11]. NPs continue to play a role as drugs [2], as biological probes, and as study targets for synthetic and analytical chemists [12]. About half of all drugs approved between 1981 and 2010 were shown to be NP-based [13]. The same study showed that, of all approved drugs, 6% were unaltered NPs, 26% were NP derivatives and 32% were NP mimics or derived from NP pharmacophores; NP-based compounds also accounted for 73% of small-molecule antibacterials and 50% of anticancer drugs (including taxol, vinblastine, vincristine, topotecan, etc.) (Table 1). This implies that successfully incorporating structural features provided by nature into synthetic drugs (SDs) would increase the chemical diversity available for small-molecule drug discovery [2]. However, the reasons for the pharmaceutical industry's declining interest in NPs during the last two decades include the time required to find NP lead compounds and the labor intensiveness of the whole process [14]. Within industrial settings, these hurdles have since been lowered by streamlined screening procedures and improved organism sourcing mechanisms [15].

The future of natural product drug discovery
Despite their evolving role in drug discovery [16,17], a recent cheminformatic study of all microbial and marine-derived compounds published since the 1940s (a dataset of 40,229 NPs) showed that most NPs being published today bear close similarity to previously published structures, with a plateau observed since the mid-1990s [18] (Figure 2). The authors observed that the rate of discovery of new NPs had flattened out since the 1990s and that structures with novel scaffolds had become scarce (Figure 3). In that study, two compounds were considered dissimilar when their Tanimoto coefficient (Tc) was below 0.4. The study thus suggested that the range of scaffolds readily accessible from nature is limited, i.e. that scientists were close to having described all of the chemical space covered by NPs, even though appreciable numbers of NPs with no structural precedent continue to be discovered. A reproduction of the same study using another dataset of 32,380 NPs showed the same trend [19]. However, by performing a similar analysis on a dataset of randomly selected compounds from the ZINC database with overall lower structural similarity, the authors of the latter study showed that such trends may be a feature of any growing database of chemical structures rather than a trend specific to NP discovery. In addition, a Kolmogorov-Smirnov test conducted on the dataset of 40,229 NPs (P = 6.2 × 10⁻¹⁴) showed that, since 1990, the rate of discovery of structurally novel compounds has dramatically outpaced random expectation [19]. This implies that NPs discovered within the last three decades have been characterized by unprecedented chemical diversity, suggesting that the prospect of continuously discovering new chemical structures from nature remains realistic. NPs are unique compared with SDs in that they often contain more complex scaffolds and more chiral centers, with more O atoms and aromatic groups [20] (Table 1).

In addition, a study comparing SDs with NPs showed that drugs derived from NP-based structures display greater chemical diversity and occupy wider regions of chemical space [2]. This reflects the fact that drugs synthesized on the basis of NP pharmacophores often exhibit lower hydrophobicity and greater stereochemical content than drugs of purely synthetic origin. Natural products are often potent, binding with high affinity to specific biological receptors, and their biological activities are often more selective than those of synthetic compounds. An analysis of the property distributions of three datasets, consisting of 3,287 NPs, 10,968 drug molecules and 13,506 randomly selected combinatorially derived lead candidates, covered the numbers of chiral centers, rotatable bonds, aromatic rings and complex ring systems, degrees of saturation, and the ratios of different heteroatoms (O, N, etc.) [7]. This study showed that the main structural differences between NPs and combinatorially derived libraries arise from properties introduced during synthesis to render combinatorial chemistry more efficient. Moreover, since drug molecules originate from both natural and synthetic sources, they were shown to occupy a joint area of chemical space spread between NPs and combinatorially derived compounds.
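The novelty criterion used in the time-series study of published NPs [18], in which a compound counts as structurally novel when its highest Tanimoto coefficient to all previously published compounds is below 0.4, can be sketched in Python as follows. Fingerprints are represented here as sets of "on" bit positions; the fingerprint type used in the original study is not reproduced, and the function names and toy data are hypothetical.

```python
from collections import Counter


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient Tc = |A ∩ B| / |A ∪ B| for fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def novel_per_year(compounds, cutoff=0.4):
    """Count, per year, compounds whose maximum Tc to all earlier compounds is below `cutoff`.

    `compounds` is an iterable of (year, fingerprint) pairs. They are sorted by year
    (stable sort, so same-year compounds are compared in input order, a simplification),
    and the first compound has no predecessors, so it always counts as novel.
    """
    earlier = []
    counts = Counter()
    for year, fp in sorted(compounds, key=lambda item: item[0]):
        max_tc = max((tanimoto(fp, prev) for prev in earlier), default=0.0)
        if max_tc < cutoff:
            counts[year] += 1
        earlier.append(fp)
    return counts


compounds = [
    (1990, frozenset({1, 2, 3, 4})),
    (1991, frozenset({1, 2, 3, 5})),  # Tc = 3/5 = 0.6 to the 1990 compound: not novel
    (1992, frozenset({10, 11, 12})),  # no shared bits with earlier compounds: novel
]
print(dict(novel_per_year(compounds)))  # {1990: 1, 1992: 1}
```

Comparing each compound with all earlier ones scales quadratically with dataset size, so an analysis of tens of thousands of compounds would typically use a vectorized or indexed similarity search instead of this plain loop.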
Although NPs are often said not to satisfy all criteria of the RO5, a large proportion of compounds in NP libraries provide very good leads for drug development. For example, 60% of the 126,140 unique compounds in the Dictionary of Natural Products (DNP) were found to be 'drug-like', with no violation of the RO5 [21]. Moreover, other investigations revealed that only 10% of the compounds in the analyzed NP libraries violated two or more of Lipinski's rules [17,22]. Attempts to quantify biosynthetic bias in screening libraries showed that 83% (12,977) of the core ring scaffolds present in NPs are missing from combinatorial databases [23,24], and that including these missing NP fragments in screening libraries would improve hit rates [23]. In order to bring the drug-like space of synthesized chemicals closer to the properties of natural products, a measure called the natural product-likeness score was proposed by Ertl et al. (section 7) [25].

The complexity and diversity of natural product scaffolds
NPs are generally compounds with large, diverse and structurally complex scaffolds (see previous chapter [1]), a consequence of their often elaborate biosynthetic pathways. The NPs contained in the DNP have previously been classified according to their origins using a classification tree approach [26], with the aim of analyzing the ring systems typical of each source. The high selectivity of natural products has been attributed to their greater complexity, higher number of stereogenic centers, more polar functional groups and different ratios of atom types, e.g. N, O, S and halogens [7]. This study shed light on the remarkable diversity of natural products, which occupy different regions of chemical space with distinct ranges of physico-chemical properties. The complexity and diversity of NPs have been illustrated using the tool ChemGPS-NP [27], which was designed to handle the chemical diversity encountered in natural products research, in contrast to the earlier chemical global positioning system (ChemGPS) [28], which focused on the much more restricted drug-like chemical space. Unlike ChemGPS, ChemGPS-NP achieves a better representation of biologically relevant chemical space by including complex structural examples from the creative chemistry of naturally occurring bioactive molecules. The rules for plotting the chemical space maps cover size, shape, lipophilicity, polarity, polarizability, flexibility, rigidity and hydrogen bonding capacity. In ChemGPS-NP, the chemical space map coordinates are t-scores derived from principal component analysis (PCA) [29], computed from a carefully selected set of 35 descriptors evaluated on a total of 1,779 chosen core and satellite structures [27]. In Figure 4, we illustrate the complexity of NPs through the diversity of the three most important principal components, or t-scores (t1, t2 and t3) [27].

4 Navigating the natural product chemical space
Several investigations of the three-dimensional (3D) chemical space occupied by compounds of synthetic and natural origin, using PCA, have been published [2,7,27,30-35]. It was generally observed that, compared with Food and Drug Administration (FDA)-approved drugs and SDs, NPs populate regions of chemical space that lack representation among synthetic medicinal chemistry compounds (Figure 5), showing that NPs have a much wider coverage of chemical space. In the following subsections, we examine a few case studies in more detail.
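To illustrate how PCA-based chemical space maps assign t-scores, the sketch below fits a PCA model on a reference descriptor matrix and projects new compounds onto its first three components (t1, t2, t3), in the spirit of a fixed reference map such as ChemGPS-NP. This is a generic autoscaled PCA written with NumPy, not the ChemGPS-NP model itself; the class name `ReferencePCA`, the autoscaling step and the random placeholder descriptors are assumptions made for the example.

```python
import numpy as np


class ReferencePCA:
    """Minimal PCA 'map' fitted on a reference descriptor matrix (rows = compounds).

    New compounds are projected onto the reference components, so their t-scores
    are expressed in the coordinate system defined by the reference set.
    """

    def __init__(self, n_components: int = 3):
        self.n_components = n_components

    def fit(self, X: np.ndarray) -> "ReferencePCA":
        self.mean_ = X.mean(axis=0)
        self.std_ = X.std(axis=0, ddof=1)
        self.std_[self.std_ == 0] = 1.0  # avoid dividing by zero for constant descriptors
        Z = (X - self.mean_) / self.std_  # autoscaling: mean 0, unit variance per descriptor
        # Rows of Vt are the principal axes (loadings), ordered by decreasing variance
        _, s, Vt = np.linalg.svd(Z, full_matrices=False)
        self.loadings_ = Vt[: self.n_components].T
        self.explained_variance_ = s[: self.n_components] ** 2 / (X.shape[0] - 1)
        return self

    def transform(self, X: np.ndarray) -> np.ndarray:
        """Return t-scores (t1, t2, ...) for the given compounds."""
        return ((X - self.mean_) / self.std_) @ self.loadings_


rng = np.random.default_rng(0)
reference = rng.normal(size=(200, 35))  # placeholder: 200 reference compounds x 35 descriptors
model = ReferencePCA(n_components=3).fit(reference)
new_compounds = rng.normal(size=(5, 35))
scores = model.transform(new_compounds)  # columns are t1, t2, t3
print(scores.shape)  # (5, 3)
```

The signs of principal components are arbitrary, so t-scores obtained from different PCA implementations may differ by a sign flip per component without any change in the relative positions of compounds.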
4.1 The Universal Natural Products Database (UNPD) versus FDA-approved drugs
Figure 5A shows an example of the visualization of the chemical space of NPs from the Universal Natural Products Database (UNPD), shown in green, compared with a dataset of drugs approved by the US Food and Drug Administration (FDA), shown in black [30]. In this study, Gu and colleagues collected a total of 197,201 NPs by including structures from the Reaxys database [36], the Chinese Natural Product Database (CNPD) [37], the Traditional Chinese Medicines Database (TCMD) [38] and the Chinese traditional medicinal herbs database (CHDD) [39]. The authors then used PCA to explore their chemical space by superposing it with that of FDA-approved drugs. This study showed a large overlap between NPs and FDA-approved drugs in chemical space, with the NPs covering a much larger region, indicating that the investigated NPs include a large number of potential lead compounds not yet approved by the FDA and that NPs have a vast chemical diversity compared with known drugs. In addition, the authors explored the properties of NP-target networks and found that polypharmacology was strongly enriched among compounds with a large degree and high betweenness centrality. Although a vast number of the NPs included in the UNPD had no known biological activities, by docking all the derived 3D structures towards the 332 target proteins of the FDA-

Antidiabetic medicinal plant-based bioactive natural products versus known antidiabetic drugs
In this study, the authors developed a docking score-weighted prediction model based on a drug-target network in order to evaluate the efficacy of medicinal plants for the treatment of type II diabetes mellitus (T2DM). The docking dataset comprised >208,000 medicinal plant-based NPs retrieved from the UNPD, together with FDA-approved T2DM drugs from DrugBank [40]. Both datasets were docked against the X-ray or NMR structures, from the RCSB Protein Data Bank (PDB) [41], of each protein related to T2DM pathogenesis, with these proteins identified using the KEGG Pathway database [42] and DrugBank. A binding free energy-based docking score was used to evaluate the affinity between each compound and each protein and was compared with the experimental binding affinities of the FDA-approved T2DM drugs for their respective target proteins. From this analysis, it could be inferred that most of the NPs are likely to be drug-like. In addition, the wide distribution of the investigated NPs in chemical space (Figure 5B) pointed to vast structural and functional diversity. Moreover, the large overlap between the NPs and the 25 FDA-approved small-molecule drugs for T2DM suggested that the NPs contained in medicinal plants hold promise for T2DM drug discovery.

Analysis of the World of Molecular Bioactivity (WOMBAT) dataset against the Dictionary of Natural Products (DNP)
In an attempt to show that NPs are a rich source of novel compound classes and new drugs, researchers from the working group of Tudor Oprea used the chemical space navigation tool ChemGPS-NP to evaluate the chemical space occupied by NPs (from the DNP) and by bioactive medicinal chemistry compounds from the World of Molecular Bioactivity (WOMBAT) database. Euclidean distances $d(\mathbf{p}, \mathbf{q})$ between points $\mathbf{p} = (p_1, p_2, \ldots, p_n)$ and $\mathbf{q} = (q_1, q_2, \ldots, q_n)$ in Euclidean $n$-space, computed from the ChemGPS-NP scores of the compounds (themselves derived from computed molecular descriptors), were determined using Equation 1:

$$d(\mathbf{p}, \mathbf{q}) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2} \qquad (1)$$

It was observed that the two sets differed in their coverage of chemical space (Figure 5C). In addition, several "lead-like" NPs were found to cover regions of chemical space not represented in WOMBAT. The authors also used property-based similarity calculations to identify NP neighbors of approved drugs and showed that several of these NPs exhibited the same activities as their drug neighbors in WOMBAT. It could be concluded that this method can identify NPs as useful lead compounds in the search for novel leads with unique properties. Figure 5C clearly shows that NPs cover parts of chemical space not represented in medicinal chemistry compound space; these areas remain to be investigated and could be of interest in drug discovery.
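The Euclidean distance of Equation 1 can be used to find, for each drug, its nearest NP neighbors in score space. The NumPy sketch below computes all pairwise distances by broadcasting and returns the k closest library compounds; the score coordinates are toy placeholders, and the function names `euclidean_distance` and `nearest_neighbors` are hypothetical rather than part of ChemGPS-NP.

```python
import numpy as np


def euclidean_distance(p: np.ndarray, q: np.ndarray) -> float:
    """Equation 1: d(p, q) = sqrt(sum_i (p_i - q_i)^2)."""
    return float(np.sqrt(np.sum((p - q) ** 2)))


def nearest_neighbors(queries: np.ndarray, library: np.ndarray, k: int = 1):
    """For each query row, return indices and distances of the k closest library rows."""
    # Pairwise distance matrix of shape (n_queries, n_library) via broadcasting
    diff = queries[:, None, :] - library[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    idx = np.argsort(dist, axis=1)[:, :k]
    return idx, np.take_along_axis(dist, idx, axis=1)


# Toy score coordinates (e.g. t1, t2, t3) for two drugs and three NPs
drug_scores = np.array([[0.0, 0.0, 0.0], [5.0, 5.0, 5.0]])
np_scores = np.array([[1.0, 0.0, 0.0], [4.0, 5.0, 5.0], [10.0, 10.0, 10.0]])
idx, dist = nearest_neighbors(drug_scores, np_scores, k=1)
print(idx.ravel(), dist.ravel())  # [0 1] [1. 1.]
```

Building the full distance matrix needs memory proportional to the product of the two set sizes; for databases the size of the DNP, a chunked computation or a spatial index would be more practical.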
5 The design of "drug-like" natural product libraries and implications for drug discovery

Strategy for designing a library with focused properties
Classical natural product drug discovery can only undertake drug-likeness analysis after compounds have been isolated and their structures elucidated. However, there are success stories with approaches that frontload both extracts and subsequent fractions with desired physico-chemical properties prior to screening [24,35]. NPs are often referred to as 'sources of inspiration', meaning that 'lead-like' libraries can be designed starting from NP scaffolds; many examples are available in the literature [24,35,43-46]. When an NP is used as the guiding structure for creating 'NP-like' libraries, controlling certain molecular descriptors (e.g. MW, clogP, etc.) during synthesis is of major importance for generating 'lead-like' libraries [24,35]. In other words, preparing an RO5-compliant library can help ensure the timely development of natural product lead compounds. The reader is invited to consult reference [24] for a summary of the key considerations when preparing a drug-like NP library.

Case study
NPs are known to contain fused medium-sized rings (Figure 6). In an attempt to mimic such NPs, Ventosa-Andrés et al. synthesized several molecular scaffolds containing medium-sized fused heterocycles from amino acids [46], which are useful building blocks used both by nature and in chemistry laboratories to create structural diversity. The authors employed traditional Merrifield solid-phase peptide synthesis, with cyclization carried out through acid-mediated tandem endocyclic N-acyliminium ion formation, followed by nucleophilic addition of internal nucleophiles. This led to seven-, eight- and nine-membered ring scaffolds, with newly generated stereogenic centers in most cases and a variety of heteroatoms (e.g. N, O and S) in the bicycles. The details of the synthetic strategy are beyond the scope of this discussion; the reader is invited to consult the original paper [46].

"Drug-likeness" prediction on available electronic natural products libraries for chemoinformatics analysis
An entire chapter on NP databases and datasets for virtual lead discovery is available in this collection [47].
Researchers within the Kirchmair group have provided a recent analysis of available virtual and physical (vendor and academic) NP compound libraries that are highly useful for lead compound discovery [48,49]. These include 25 virtual and 31 physical NP datasets suitable for cheminformatics projects, e.g. chemical space exploration, fragment-based design, NP mimicking and virtual screening. For each library, the authors provide detailed information on the extent of available structural information and on the overlap between the different datasets. The analysis showed that at least 10% of known NPs belong to the readily purchasable space (including small NPs for fragment-based design, and macrocycles) and that, with the renewed interest in NPs as lead structures, many more NPs and NP derivatives are being made available through on-demand sourcing, extraction and synthesis services.

Virtual libraries
Chen et al. recently described a large number of NP libraries, most of which can be freely accessed for cheminformatics purposes in lead compound discovery [48], and further characterized their chemical space [49]. Most of these libraries were curated by academic groups from literature information. Our recent analysis of these libraries showed ~1,500 unique compound entries, the major limitation being the absence of biological activity and compound sample accessibility information. Chen et al. compiled this collection from compound supplier catalogues and from collections of academic groups [48]. With the goal of characterizing the extent of chemical space covered by known and readily obtainable natural products and by individual NP databases, the authors compiled comprehensive datasets of known and readily obtainable NPs from 18 virtual databases (including the Dictionary of Natural Products), 9 physical libraries and the PDB [49]. After removing all sugars and sugar-like moieties, which are generally not of interest in drug discovery projects, the authors showed that readily obtainable NPs are highly diverse and populate regions of chemical space that are highly relevant to drug discovery. In some cases, substantial differences were observed in the coverage of NP classes and chemical space by the individual databases, while >2,000 NPs were found to be co-crystallized with at least one biomacromolecule in an X-ray crystal structure within the PDB.
6 Computational methods for estimating drug-likeness and ADME/T
It has regrettably been observed that many drugs fail to reach the market because of poor pharmacokinetic (ADME/T) profiles [50]. This has necessitated including pharmacokinetic considerations at earlier stages of drug discovery programs [51,52]. Because such experiments are costly, computer-based methods are often sufficient at early stages of lead discovery, saving time and cost [53-55]. For example, screening 20,000 molecules takes less than 1 minute with an in silico (computer-based) model, compared with 20 weeks in the "wet" laboratory [51]. In silico modeling of drug-likeness often employs standard filters established from ADME/T data accumulated in the late 1990s. Thus, many pharmaceutical companies now favor computational models which, in some cases, are replacing "wet" screens [51]. This has spurred the development of several theoretical methods and software programs for ADME/T prediction [56-59], even though some predictions can be disappointing [60]. Most software tools currently used for ADME/T prediction rely on statistical models such as quantitative structure-activity relationship (QSAR) modeling [60,61] or on knowledge-based methods [62-64]. A promising lead compound may therefore be defined as one which combines interesting biological activity against a drug target (potency) with an attractive ADME/T profile. Time and cost are saved by discarding compounds with unfavorable predicted ADME/T profiles from the list of potential drug candidates early on, even if they are highly potent. Alternatively, their drug metabolism and pharmacokinetics (DMPK) properties can be "fine-tuned" to improve their chances of reaching clinical trials [65]. Machine learning has now become very useful for the ADME/T profiling and drug-likeness prediction of compounds in drug discovery [66].
7 Computational methods for estimating natural product-likeness
7.1 The natural product-likeness score
The concept of 'NP-likeness' has been around for about a decade [25]. It describes the similarity of a molecule to the structural space covered by natural products and is a useful criterion for screening compound libraries and designing new lead compounds.
Ertl et al. used a Bayesian measure to determine how similar molecules are to the structural space covered by NPs. The NP-likeness score is an efficient way to separate NPs from synthetic molecules (in this section, SM denotes synthetic molecules rather than secondary metabolites). The score is very useful in virtual screening, in prioritizing compound libraries by NP-likeness, and in designing building blocks for the synthesis of 'NP-like' libraries [25]. The NP-likeness score (Equation 2) ranges from -5 to 5 and is computed for a whole molecule as the sum of the contributions, $\text{Frag}_i$, of the fragments in the molecule (considered to be independent of each other; Equation 3), normalized relative to the molecule size, i.e. divided by the number of atoms $N$:

$$\text{NP-likeness} = \frac{1}{N}\sum_{i} \text{Frag}_i \qquad (2)$$

$$\text{Frag}_i = \log\left(\frac{NP_i}{SM_i} \cdot \frac{SM_t}{NP_t}\right) \qquad (3)$$

where $NP_i$ is the number of NPs which contain fragment $i$, $SM_i$ is the number of SMs which contain fragment $i$, $NP_t$ is the total number of NPs, and $SM_t$ is the total number of SMs in the training set. This scoring system can be used as a filter for metabolites in computer-assisted structure elucidation, or to select natural product-like molecules from molecular libraries for use as leads in drug discovery. The distributions of the scores for the training (synthetic molecules and natural products) and test datasets are shown in Figure 7.
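The fragment-contribution scheme of the NP-likeness score can be sketched in Python as below, with each molecule represented as a set of fragment identifiers. This is not Ertl et al.'s implementation: the fragment identifiers are abstract placeholders (the original method derives fragments from the molecular structure), and the base-10 logarithm, the pseudocount that avoids division by zero, and the zero contribution of unseen fragments are assumptions; no rescaling to the -5 to 5 range is applied. The function names are hypothetical.

```python
import math
from collections import Counter


def fragment_contributions(np_fragment_sets, sm_fragment_sets, pseudocount=1.0):
    """Compute Frag_i = log10((NP_i / SM_i) * (SM_t / NP_t)) for every training fragment.

    Each training molecule is a set of fragment identifiers. NP_i and SM_i count the
    molecules containing fragment i; NP_t and SM_t are the training-set sizes. The
    pseudocount added to NP_i and SM_i (an assumption) avoids division by zero and
    log(0) for fragments seen in only one class.
    """
    np_counts = Counter(f for frags in np_fragment_sets for f in set(frags))
    sm_counts = Counter(f for frags in sm_fragment_sets for f in set(frags))
    np_total, sm_total = len(np_fragment_sets), len(sm_fragment_sets)
    contributions = {}
    for frag in np_counts.keys() | sm_counts.keys():
        ratio = (np_counts[frag] + pseudocount) / (sm_counts[frag] + pseudocount)
        contributions[frag] = math.log10(ratio * sm_total / np_total)
    return contributions


def np_likeness(fragments, n_atoms, contributions):
    """Sum fragment contributions and divide by the number of atoms (molecule size).

    Fragments absent from the training data contribute 0 (an assumption).
    """
    return sum(contributions.get(f, 0.0) for f in fragments) / n_atoms


np_train = [{"a", "b"}, {"a", "c"}]  # hypothetical fragment sets of natural products
sm_train = [{"d", "b"}, {"d", "e"}]  # hypothetical fragment sets of synthetic molecules
contrib = fragment_contributions(np_train, sm_train)
print(np_likeness({"a", "c"}, n_atoms=10, contributions=contrib) > 0)  # True: NP-typical fragments
print(np_likeness({"d", "e"}, n_atoms=10, contributions=contrib) < 0)  # True: SM-typical fragments
```

A fragment that is equally frequent in both classes (fragment "b" here) contributes 0, so only fragments that discriminate between NPs and synthetic molecules move the score away from zero.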

8 Machine learning methods to classify drugs from non-drugs
Attempts to define molecules likely to be drugs have long been limited to simple numerical rules based on computed physico-chemical parameters, such as the RO5, which was derived from a statistical analysis of known drugs; e.g. 70% of "drug-like" compounds are known to have 0 to 2 HBDs, 2 to 9 HBAs, 2 to 8 rotatable bonds and 1 to 4 rings. Although such rules are simple to implement and very fast to compute (molecules that fail two or more of the criteria are simply discarded), more sophisticated computational models of "drug-likeness" have been developed using machine learning techniques (e.g. neural networks or decision trees). Machine learning models start from a training set of compounds with divergent properties, e.g. drugs and non-drugs. A number of molecular descriptors are computed for each compound in the dataset, and the model is then developed using the training set and its computed descriptors. Using a dataset of drugs from the WDI, a set of compounds with no known activities from the Available Chemicals Directory (ACD) (regarded as non-drugs) and a set of Ghose-Crippen atom type count descriptors, Sadowski and Kubinyi developed a neural network model with 92 input nodes, 5 hidden nodes and 1 output node to predict "drug-likeness" [70]. This model correctly predicted 77% of the WDI drugs as drugs and 83% of the ACD molecules as non-drugs. Similar results have been obtained with neural networks [71,72] and decision trees [73] using the same databases of drugs and non-drugs and the same set of descriptors. The performance of the decision tree model was comparable to that of the neural networks, correctly predicting ~83% of a validation set not included in the initial model.
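The 92-5-1 neural network architecture described above can be sketched as a small feed-forward network in NumPy. The sigmoid activations, binary cross-entropy loss, full-batch gradient descent and random placeholder descriptors are all assumptions made for illustration; they are not details reported for the Sadowski and Kubinyi model, and the class name `DrugLikenessNet` is hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class DrugLikenessNet:
    """One-hidden-layer network (92 -> 5 -> 1); the output is the probability of 'drug'."""

    def __init__(self, n_inputs=92, n_hidden=5, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(scale=0.1, size=(n_inputs, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(scale=0.1, size=(n_hidden, 1))
        self.b2 = np.zeros(1)

    def forward(self, X):
        self.h = sigmoid(X @ self.W1 + self.b1)  # hidden-layer activations
        return sigmoid(self.h @ self.W2 + self.b2).ravel()

    def loss(self, X, y):
        """Mean binary cross-entropy between predictions and labels (1 = drug)."""
        p = np.clip(self.forward(X), 1e-12, 1 - 1e-12)
        return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

    def train_step(self, X, y, lr=0.5):
        """One full-batch gradient-descent step on the cross-entropy loss."""
        p = self.forward(X)
        d_out = (p - y)[:, None] / X.shape[0]  # gradient at the output pre-activation
        d_hidden = (d_out @ self.W2.T) * self.h * (1 - self.h)  # back-propagated to hidden layer
        self.W2 -= lr * (self.h.T @ d_out)
        self.b2 -= lr * d_out.sum(axis=0)
        self.W1 -= lr * (X.T @ d_hidden)
        self.b1 -= lr * d_hidden.sum(axis=0)


# Placeholder data: 100 "drugs" and 100 "non-drugs" with 92 synthetic descriptor values each
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.5, 1.0, size=(100, 92)), rng.normal(-0.5, 1.0, size=(100, 92))])
y = np.concatenate([np.ones(100), np.zeros(100)])

net = DrugLikenessNet()
initial_loss = net.loss(X, y)
for _ in range(300):
    net.train_step(X, y)
final_loss = net.loss(X, y)
predicted_drug = net.forward(X) >= 0.5
print(f"loss {initial_loss:.3f} -> {final_loss:.3f}, training accuracy {np.mean(predicted_drug == y):.2f}")
```

In a real application, the inputs would be the atom-type counts of each compound and performance would be judged on a held-out validation set rather than on the training data, as in the studies cited above.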
9 A binary QSAR model to classify natural products from synthetic molecules
Researchers within the working group of Jürgen Bajorath built a model to distinguish between NPs from the DNP and SDs from the ACD, based on Shannon entropy (SE) analysis [74]. The authors computed the values of 98 descriptors from 2D representations of 199,420 ACD molecules and 116,364 NPs from the DNP, respectively. SE values were then defined as in Equation 4:

$$SE = -\sum_{i} p_i \log_2 p_i \qquad (4)$$

where $p_i$ is the probability of observing a particular descriptor value, computed from the number of compounds whose descriptor value falls within a specific histogram bin, or "count" ($c_i$), for a specific data interval $i$. Thus, $p_i$ is calculated as in Equation 5:

$$p_i = \frac{c_i}{\sum_{i} c_i} \qquad (5)$$

The SE concept, initially introduced in digital communication theory, is now widely used in molecular descriptor analysis, and it has been combined with binary QSAR methodology, which correlates structural features and properties of compounds with a binary formulation of biological activity (i.e. active or inactive). The authors adapted this approach to correlate molecular features with chemical source (i.e. natural or synthetic) by applying different combinations of such descriptors and variably distributed structural keys to training sets of natural and synthetic molecules, and used it to derive predictive binary QSAR models. The derived models were then applied to predict the source of compounds, with >80% prediction accuracy for the best models.
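The Shannon entropy definition translates directly into a histogram-based computation. The sketch below, a minimal example with a hypothetical function name, bins the values of one descriptor with NumPy, converts bin counts into probabilities and returns SE in bits; SE is 0 when all compounds fall into one bin and reaches its maximum, log2 of the number of bins, when the values are spread evenly across the bins.

```python
import numpy as np


def shannon_entropy(values, bins=10, value_range=None):
    """Shannon entropy (in bits) of a descriptor's value distribution.

    Values are binned into a histogram; p_i = c_i / sum(c) and
    SE = -sum(p_i * log2(p_i)) over the non-empty bins.
    """
    counts, _ = np.histogram(values, bins=bins, range=value_range)
    p = counts / counts.sum()
    p = p[p > 0]  # empty bins contribute 0 (the limit of p * log p as p -> 0)
    return float(-(p * np.log2(p)).sum())


print(shannon_entropy([0, 1, 2, 3], bins=4))  # 2.0: four equally populated bins

# Comparing two datasets: use the same bin boundaries for both (placeholder values)
natural_logp = np.array([0.5, 1.2, 2.8, 3.1, -0.4])
synthetic_logp = np.array([2.0, 2.1, 2.2, 2.3, 2.4])
shared_range = (-1.0, 4.0)
se_natural = shannon_entropy(natural_logp, bins=5, value_range=shared_range)
se_synthetic = shannon_entropy(synthetic_logp, bins=5, value_range=shared_range)
```

Fixing `value_range` matters when SE values are compared across datasets, because otherwise each dataset would be binned over its own minimum-to-maximum range and the resulting entropies would not be directly comparable.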

Conclusions
NPs have often been said not to abide by the RO5, as noted by Chris Lipinski himself [75], although about 60% of compounds from the DNP showed no violation of any of these "rules" [21]. In this chapter, we have moved from simple rule-based approaches for determining whether compounds are likely to be orally bioavailable ('drug-like'), 'lead-like' or 'natural product-like', to more advanced approaches such as neural networks, decision trees and a combination of Shannon entropy and binary QSAR. We have shown that naturally occurring compounds represent a significant proportion of known drugs and that the chemical space occupied by natural compounds is much wider than that of synthetic compounds and known drugs, implying that a large proportion of potentially 'drug-like' space is yet to be investigated.

Figure 2: Presentation of structural diversity (A) by plotting the median maximum Tanimoto scores as a function of time, with the median average deviation shown as a shaded blue region; (B) by plotting the absolute number of low-similarity compounds (Tc < 0.4) per year [18]. Figures reproduced by permission.

Figure 3: Presentation of structural diversity by plotting the number of compounds published per year and rate of novel compounds isolated as a percentage of total natural products isolated [18].Figure reproduced by permission.

3 Natural versus synthetic drugs
3.1 The uniqueness and potential of natural products for drug discovery

Figure 5: The distribution of biologically relevant chemical space of NPs compared with SDs: (A) PCA of NPs in the Universal Natural Products Database (UNPD) and FDA-approved drugs. The green triangles and black dots represent natural products and FDA-approved drugs, respectively [30]; (B) PCA of NPs contained in medicinal plants and 25 FDA-approved drugs for the treatment of type II diabetes mellitus (T2DM). The black dots and green triangles represent natural products and FDA-approved drugs, respectively [32]; (C) Predicted score (tPS) plots of NPs (in green) and bioactive medicinal chemistry compounds from the World of Molecular Bioactivity (WOMBAT) database (in black) [31]; (D) Property space representation for lead-like molecules of some selected chemical libraries [24,35]. Figures reproduced by permission.

Figure 6: Fused structures and examples of natural products containing fused medium-sized rings.Figure reproduced by permission.

7.2 Implementations of the natural product-likeness score
The NP-likeness measure has now been implemented in several open-source, open-data tools, e.g. in a Taverna 2.2 workflow [67], which is available under the Creative Commons Attribution-Share Alike 3.0 Unported License [68]. It is also available for download as an executable stand-alone Java package under the Academic Free License [69].

Figure 7: Distribution of NP-likeness score for the training (synthetic molecules and natural products) and the test datasets [66].Figure reproduced by permission.

Table 2: List of abbreviations and definitions used in the text.