Unbiased Approach for the Identification of Molecular Mechanisms Sensitive to Chemical Exposures

Background: Targeted methods that dominated toxicological research until recently did not allow for screening of all molecular changes involved in toxic response. Therefore, it is difficult to infer if all major mechanisms of toxicity have already been discovered, or if some of them are still overlooked. Objectives: To identify molecular mechanisms sensitive to chemical exposures in an unbiased manner. Methods: We used data on 641,516 unique chemical-gene interactions from the Comparative Toxicogenomic Database. Only data from high-throughput gene expression experiments with human, rat or mouse cells/tissues were extracted. The total number of chemical-gene interactions was calculated for every gene, and used as a measure of gene sensitivity to chemical exposures. These values were further used in enrichment analyses to identify molecular mechanisms sensitive to chemical exposures. Results: Remarkably, use of different input subsets with non-overlapping lists of chemical compounds identified largely the same genes and molecular pathways as most sensitive to chemical exposures, indicative of an unbiased nature of our analysis. One of the most important findings of this study is that almost every known molecular mechanism may be affected by chemical exposures. Predictably, xenobiotic metabolism pathways and mechanisms of cellular response to stress and damage were among the most sensitive. Additionally, our analysis identified a range of highly sensitive molecular pathways, which are not widely recognized by modern toxicology as major targets of toxicants, including lipid metabolism pathways, longevity regulation cascade and cytokine mediated signaling. Discussion: Molecular mechanisms identified as the most sensitive to chemical exposures are relevant for significant public health problems, such as aging, cancer, metabolic and autoimmune disease. Thus, public health system will likely benefit from future research focus on these sensitive molecular mechanisms. Additionally, approach used in this study may guide identification of priority adverse outcome pathways (AOP) for in-vitro and in-silico toxicity testing methods.


Introduction
The total burden of disease costs associated with exposures to environmental chemicals likely exceeds 10% of the global domestic product (Grandjean and Bellanger 2017). The number of new chemicals is rising rapidly, with the Chemical Abstract Service Registry growing from 20 million to 156 million chemicals between 2002 and 2019 (Escher et al. 2020). This situation poses a significant challenge for regulatory toxicology and requires the development of new, rapid, costefficient, and reliable methods of toxicity testing. Indeed, in recent decades, regulatory toxicology has begun a revolutionary transformation that was outlined in a strategic vision in 2007 (NRC 2007). One important component of this transformation consists in a transition from outdated animal-based tests to in-vitro tests, targeting sensitive molecular mechanismstoxicity pathways or adverse outcome pathways (AOP) (Ankley et al. 2010;Haynes 2010;NRC 2007;OECD 2017;Vinken 2013). To ensure comprehensive coverage of all possible adverse effects using this approach, the battery of in-vitro tests should include all pathways that mechanistically connect exposures with adverse health outcomes. Given that the transformation of toxicology is going rapidly (EPA 2013;NASEM 2017), it is critically important to identify all of these pathways, or at least all major toxicity pathways.
Today, the toxicological community recognizes many molecular mechanisms responsible for a significant portion of all toxicity events, such as the formation of DNA adducts (Himmelstein et al. 2009;Jarabek et al. 2009), oxidative stress (Sies et al. 2017), activation of the AhR receptor (Strapacova et al. 2018), disruption of estrogenic signaling (Singleton and Khan 2003) and others. All these currently well-recognized mechanisms of toxicity were discovered during the period of toxicology development when high-throughput methods were unavailable. Therefore, it is impossible to infer if all major mechanisms of toxicity were already discovered, or if some of them were overlooked. Here, we hypothesized that data from toxicological -omics experiments rapidly accumulating in publicly accessible databases may help to answer this question.
Each chemical compound induces specific changes in gene expression in an exposed biological system. Some of these changes may be unique for a compound, but other may overlap with changes induced by other compound(s). Thus, by overlaying changes in genomic profiles resulting from many different exposures, we may identify genes and associated pathways that most commonly respond to exposure (Fig. 1A). In this study, we use publicly available data from the Comparative Toxicogenomic Database (CTD) (Davis et al. 2019) on changes in gene expression in response to a broad range of chemical compounds to identify, in an unbiased manner, the molecular mechanisms most sensitive to chemical exposure.

Database Creation
A database was created by extracting data on chemical-gene interactions from the CTD (Davis et al. 2019) on 08.24.2018 using the following filtering criteria. First, data was extracted only from experiments that used high-throughput approaches for gene expression analysis (microarray, RNA-seq). In addition, we selected data only from experiments that used human, rat or mouse cells, tissue or the whole organism for gene expression analysis in in vitro and in vivo studies. Further, we removed 1,846 olfactory receptor genes from the database, as the nomenclatures assigned to these genes in mouse, rat and human are different and independent. The resulting database included 641,516 entries, each representing one chemical-gene interaction (significant change in expression of a gene in response to exposure to a chemical compound) reported in a published study. Approach of this study for the unbiased identification of genes sensitive to chemical exposures. A -Each circle represents a group of genes differentially expressed in response to one chemical in one experiment. Genes sensitive to exposure to more than one chemical populate the zones where circles overlap. The numbers of chemical-gene interactions shown in every zone reflect sensitivity of corresponding genes to chemical exposures. B and C -The distribution of chemical-gene interactions by model organism and by annotation terms for chemical usage respectively in the analyzed database. Individual chemicals were annotated with up to 3 annotation terms in accordance with their most common uses.
At the next step, all chemical compounds in the database were manually annotated to identify their major uses. To obtain this information a search was conducted using the name of the chemical as the key word in Wikipedia, PubChem and PubMed. Based on the obtained descriptive annotation, every compound was categorized in accordance with the following terms: pharmaceutical, recreational drug, research, warfare, endobiotic, agricultural, cosmetics, environment, food components, industrial, and pollutant. The term "pharmaceutical" was used for all drugs, including prescription, over the counter, and traditional medicines. Chemicals that are currently going through preclinical and clinical trials were also categorized as "pharmaceutical". The term "research" was used for chemicals, which are mainly used for research purposes, such as components of molecular biology or analytical chemistry protocols. Some of these chemicals, such as protein kinase inhibitors, for example, have been or are currently tested as candidate pharmaceutical drugs. Given that information about preclinical testing is not always readily available (Piller 2020), we assigned both "pharmaceutical" and "research" annotation terms to them. The term "recreational drug" was used for compounds currently used as recreational drugs, their major metabolites, and compounds that are not currently used as recreational drugs but which possess these properties making them potential recreational drugs. The term "warfare" was used for compounds used as chemical weapons. The "endobiotic" annotation term was assigned to molecules synthetized in mammalian organisms as normal components of healthy physiology, such as cholesterol, hormones, bile acids and other. The term "agricultural" was used for fertilizers and pesticides. The term "cosmetics" was used for a broad range of chemical compounds used in cosmetics and perfumes. The annotation term "environment" was used for oxygen and ozone. The term "food component" was used for a broad range of compounds that can be found in food, such as dietary nutrients, food additives, byproducts of food processing and others. The term "industrial" was used to annotate a broad range of chemicals used as intermediates or final products in a variety of industrial processes, applications, and final products resulting from industrial process. Finally, the term "pollutant" was used for these chemicals, which do not have current uses but are produced by natural processes and human activity such as products of incomplete combustion of organic material, air and water pollutants, toxins produced by algal blooms and similar. For chemicals that fell into multiple categories, we used up to three annotation terms reflecting the most common uses. All authors of the manuscript annotated an equal numbers of chemicals and all annotations were than checked by the corresponding author to insure uniformity of annotation approaches.

Bioinformatic analysis
We hypothesized that due to the presence of a high number of pharmaceuticals (40%) and other biologically active compounds in the database, some findings may be clouded by the fact that many chemical compounds were intentionally designed to target important molecular pathways in mammalian cells. To differentiate between unintended molecular responses to chemical exposure and compare with responses to biologically active chemicals, we split the original database into 2 subsets. All chemicals annotated with one or more of the following terms: pharmaceutical, drug of abuse, research, warfare and endobiotic, -were combined into a database subset, named "Bioactive" (BA). The second subset "Non-bioactive" (NBA) included all chemicals, which did not have these listed terms in their annotations. Thus, this second subset included only chemicals annotated with the following terms: agricultural, cosmetics, environment, food components, industrial, and pollutant.
For every gene in a database, we calculated the total number of chemical-gene interactions. Each individual interaction used for this calculation was one line in the database and represented a unique combination of the original study, biological model and chemical compound. Similarly, for each gene we calculated the total number of activating (gene expression increases in response to exposure) and suppressive (gene expression decreases in response to exposure) interactions.
The resulting lists of genes were used in bioinformatic tools developed for enrichment analysis of gene expression data. The numbers of chemical-gene interactions were used in place of differential expression values. The short lists of top genes with highest numbers of interactions were uploaded to Metascape (Tripathi et al. 2015), and enriched functional terms were explored with default settings. The lists of all genes with their respective chemical-gene interactions numbers were further used for gene set enrichment analysis (GSEA). This approach is particularly effective for the identification of biologically significant changes in activity of the whole pathway or other biological category, as its enrichment function stems from all, even small, changes in multiple members of a gene set (pathway) (Boylan et al. 2015). The details of the method and statistical approaches used by GSEA are described elsewhere (Mootha et al. 2003;Subramanian et al. 2005), see also Supplementary materials 1. Originally, GSEA was developed to characterize the cumulative shift of all genes in a particular pathway towards an increase or decrease of expression. As such, it was designed for an input in which values of gene expression changes have semisymmetrical distribution, where comparable numbers of genes are up-and downregulated. To prepare datasets suitable for GSEA, we subtracted the same number from the values of chemicalgene interactions for every gene, to achieve equal negative and positive area under the curve for these values. The resulting gene lists were uploaded to GSEA and were analyzed against three independent open access databases of pathways: Hallmark (Liberzon et al. 2015), Reactome (Fabregat et al. 2018;Jassal et al. 2020) and KEGG (Kanehisa and Goto 2000). The following stringent criteria were used to identify molecular mechanisms most and list sensitive to chemical exposures: normalized enrichment score (NES) ≥ 1.9 or ≤ -1.9 and FDR q ≤ 0.05.

Database characteristics
The database used in this study (Supplementary Data File S1) consists of 641,516 individual chemical-gene interactions reported in 2,180 original studies using high-throughput gene expression analysis only. All these studies were done with 3 mammalian species (humans, mice, rats) or their cells and tissues (Fig. 1B). Chemical-gene interactions included 26,704 unique genes and 1,265 unique chemicals. Distribution of chemicals in accordance to their usage annotation terms is shown in Fig. 1C. This database was further split into 2 subsets defined by a purposeful design of chemicals to interact with essential molecular pathways. The bioactive (BA) subset included chemical compounds intentionally designed to interact with essential molecular mechanisms in mammalian cells. The non-bioactive (NBA) subset contained all other chemical compounds. Characteristics of these subsets are shown in Table 1.

Genes and pathways sensitive to "bioactive" and "non-bioactive" compounds
To test if our identification of molecular mechanisms sensitive to chemical exposures might be clouded by including chemicals intentionally designed to target specific molecular pathways in mammalian cells, we compared the number of chemical-gene interactions for every gene in the BA and NBA subsets of the database. Values of chemical-gene interactions were similar in both subsets, with Pearson correlation = 0.89 ( Fig. 2A).
We further compared molecular pathways enriched with genes sensitive to "bioactive" and "nonbioactive" chemicals. To do so, we conducted gene-set enrichment analysis (GSEA) with gene lists from BA and NBA subsets and compared obtained normalized enrichment scores (NES) for Hallmark, KEGG and Reactome pathways. The results demonstrate a high level of similarity between pathways enriched with genes from both subsets ( Fig. 2B-D). In these graphs, discordant distribution between BA and NBA datasets are mostly seen for pathways with non-significant enrichment and low positive and negative NES (≥ 1.5 or ≤ -1.5). The Pearson correlation for pathway enrichment in BA and NBA datasets was 0.97 for Hallmark, 0.94 for KEGG and 0.93 for Reactome collections.
Given that very similar lists of highly sensitive genes and molecular pathways were identified using 2 subsets of the database with non-overlapping lists of chemicals, we concluded that the number of chemical-gene interactions in both subsets is high enough to mask unique responses to individual compounds with specific bioactivity properties and subsequent analysis was done on the entire database.

Top genes and pathways activated or suppressed by chemicals
At the next step, we identified the number of activating or suppressive chemical-gene interactions for every gene in the database. The top 20 genes with the highest numbers of activating and suppressive interactions are shown in Table 3. Among the highest scoring genes for suppressive interactions there were important members of the GH-IGF signaling cascade (GHR, IGFBP3, and IGF1), cytokines (CXCL8, CXCL12, CCL2), cyclins (CCNB1, CDK1, CCNA2, CCND1), lipid metabolism genes (THRSP, HMGCS1, FASN) and more. Among the highest scoring genes for activating interactions, there were components of the AP-1 transcription factor (FOS, JUN, ATF3), and other transcription factors -regulators of growth, proliferation, and apoptosis (MYC, EGR1), proteins that promote cell cycle arrest and/or apoptosis in response to stress stimuli (GADD45A, DDIT3, CDKN1A), microsomal (CYP1B1, CYP1A1, HMOX1) and cytosolic (NQO1, SRXN1, TXNRD1) oxidoreductases, and enzymes of glutathione synthesis (SLC7A11, GCLC, GCLM), including 2 rate-limiting enzymes (CCL2 and GDF15). The distribution of all genes in the database in accordance to the numbers of suppressive and activating chemical-gene interactions is shown in Fig. 7A. This figure illustrates that most of the genes sensitive to chemical exposures have both suppressive and activating interactions. To identify genes, showing differential expression in one direction in response to chemical exposure, the ratio of activating to suppressive interactions for genes (RASIG) with more than 50 total chemical-gene interactions was calculated. Application of this threshold resulted in a list of 799 genes. Log2 RASIG for these genes has semi normal distribution, indicating that most genes sensitive to a variety of chemical exposures undergo expression changes in both directions (Fig.  7B). The top 20 genes on each margin of this distribution are shown in Table 4. Next, we used GSEA to identify which pathways are mostly regulated by chemical exposures in the direction of activation, suppression, or both. Pathway enrichment were identified for two lists of genes, one including only suppressive and another including only activating interactions. Pearson correlations (rp) for enrichment (NES) of pathways by the 2 gene lists were as follows: 0.80 for Hallmark (Fig. 7C), 0.79 for KEGG (Fig. 7D) and 0.74 for Reactome datasets (Fig. 7E). Thus, most pathways sensitive to exposures are sensitive to both activation and suppression by chemical compounds. The top pathways highly enriched by both gene lists are shown in Table 5. These pathway include response to chemical exposures and associated stress ("Xenobiotic metabolism", "UV response up", "Hypoxia", "Apoptosis", "Glutathione metabolism" and "P53 signaling pathway"); immune response ("Interferon gamma response", "TNF alpha signaling via NF-kappa B", IL4 and IL13 signaling", "Focal adhesion", and "Response to elevated platelet Ca2+"); regulation of lipid metabolism ("Fatty acid metabolism", "mTORC1 signaling", "PPAR signaling pathway", "Regulation of lipid metabolism by PPAR alpha"); response to xenoestrogens ("Estrogen response late", Estrogen response early"); and cell cycle, development and cancerrelated pathways ("Pancreatic cancer", Pathways in cancer", "Cell cycle", "Mitotic G1, G1/S phases", "Epithelial-mesenchymal transition", "Focal adhesion", "P53 signaling pathway", "mTORC1 signaling"). To identify pathways predominantly activated or predominantly suppressed by chemical exposures we ranked pathways in accordance with the ratio of NESs identified using lists of genes with suppressive or activating interactions respectively. Only pathways for which at least one NES was ≥ 1.9 or ≤ -1.9 and was identified with FDR q ≤ 0.05 were used for this analysis. We call the ratio of NESs used here RASEPratio of activating to suppressive enrichment of a pathway. Pathways with RASEP > 2 are enriched 2-fold higher by genes that are activated by chemicals than by genes that are suppressed by chemicals, and vice versa, pathways with RASEP < -2 are enriched 2-fold higher by genes that are suppressed by chemicals than by genes that are activated. No dataset from the Hallmark collection passed this threshold (Fig. 7F). In fact, all pathways, except one ("Pancreas beta cells", RASEP = -1.7), had RASEP values in the interval between -1.2 and 1.2. Among KEGG datasets, one passed our 2-fold criteria, -"Maturity onset diabetes of the young" (RASEP = -3.1) (Fig. 7G). Among Reactome datasets, three had a RASEP < -2 ("Regulation of gene expression in beta cells", RASEP = -2.6; "Regulation of beta cell development", RASEP = -2.2; "Complex I biogenesis", RASEP = -2.2 and "PKMTS methylate histone lysines", RASEP = -2) (Fig. 7H). One Reactome dataset had a RASEP > 2 ("Acyl chain remodeling of PE", RASEP = 2.7).

Discussion
Our study is an attempt to identify molecular mechanisms sensitive to chemical exposures in an unbiased manner. The approach used in this study is based on the assumption that sensitive molecular pathways can be identified as overlapping changes in gene expression from many individual experiments using different biological systems and chemical exposures.

We have enough data to identify molecular pathways sensitive to chemical exposures in an unbiased manner
To identify in an unbiased manner genes and pathways that represent the most generalized response to chemical exposure large numbers of overlaying datasets is needed. We asked, whether the database used in our study is truly big enough (saturated) to identify most sensitive genes and pathways, or if the results of the analysis change (unsaturated database) when more chemical-gene interactions from new studies are added. To answer to this question, we split our database into 2 non-overlapping subsets in terms of lists of chemicals (BA and NBA) and identified the number of chemical-gene interactions per gene and enrichment values of pathways for these 2 subsets independently. The results for the 2 subsets were highly concordant at the gene level ( Fig. 2A). Concordance was even higher at the pathway level ( Fig. 2B-D), with Pearson correlation of pathway enrichment values in the range of 0.93-0.98. Given that 2 non-overlapping subsets of the database produce highly similar results in the identification of the most sensitive genes and pathways, we concluded that our database is saturated enough to produce unbiased results.

The majority of known molecular pathways are sensitive to chemical exposures
One important finding of this study consists in the observation that majority of known molecular pathways is sensitive to chemical exposures. In fact, out of 1,735 combined Hallmark, KEGG and Reactome datasets 1,173 (67.6%) were positively enriched in GSEA analysis (Supplementary  Table S1) and 883 (50.9%) were significantly positively enriched (NES ≥ 1.9 or ≤ -1.9 and FDR q ≤ 0.05). The numbers of negative and significant negative enrichment (corresponding pathways were non-sensitive to chemical exposures) were 10 (0.58%) and 1 (0.06%) respectively. In the previous decade, a significant effort was made to develop in vitro toxicity testing approaches, which test perturbation of specific molecular mechanisms known to be connected with adverse phenotypic outcomes (toxicity pathways, adverse outcome pathways) (Ankley et al. 2010;Haynes 2010;NRC 2007;OECD 2020). One important question that remains unanswered is what pathways should be covered by in vitro assays to ensure that we do not miss possible toxicities of chemicals using this new paradigm of toxicity testing. Our data suggest that almost every known molecular pathway may be affected by chemical exposures. Should there be evidence connecting these pathways with adverse outcomes, these pathways must be included in the list of targets for in-vitro testing.

Unbiased analysis confirms sensitivity of previously recognized mechanisms of toxic response
Modern toxicology recognizes a range of molecular mechanisms responsible for a significant part of toxicity. We asked if these mechanisms are enriched in our analysis. The ability of our approach to identify these pathways as sensitive to chemical exposures may be considered a quality control for the efficiency of our strategy. Indeed, our analysis demonstrates that mechanisms involved in response to xenobiotics and metabolism of xenobiotics are among the most sensitive to chemical exposures.
Corresponding pathways were enriched in both Metascape (Fig. 3B), and GSEA ( Fig. 3C-D) analysis. AhR-regulated CYP1A1 and CYP1B1 were in the list of top 20 genes with highest numbers of chemical-gene interactions (Table 2). Many microsomal and cytosolic oxidoreductases were also among the highest scoring genes for activating chemical-gene interactions (Table 3), while the "xenobiotic metabolism" pathway was equally sensitive to activating and suppressive action of chemicals (Table 5). Furthermore, our data highlights the sensitivity of the "glutathione metabolism" pathway ( Fig. 3D, Table 3, Table 5), which is an indicator of ROS production in response to exposures (Mailloux et al. 2013). Some genes of the oxidative stress response pathway are among the highest scoring for the number of activating chemical-gene interactions (Table 3), although this pathway was highly enriched by both activating and suppressive chemical-gene interactions (Table 5). Additionally, our analysis identified many molecular pathways associated with "cellular response to stress" (Fig. 3B-E), including "response to unfolded protein" (Fig. 3B) and "UV response" (Fig.  3C, Table 5) as sensitive to chemical exposures. Specifically, "p53 signaling" was among the most highly enriched pathways in both Metascape and GSEA ( Fig. 3B-D, 4E, and 5), and was equally enriched by both activating and suppressive chemical-gene interactions (Table 5). In response to a stress signal, the p53 protein is activated by post-translational modifications, and this leads to either cell cycle arrest and senescence, or cellular apoptosis (Harris and Levine 2005;Jin and Levine 2001). Therefore, it is not surprising that "apoptosis", "cell cycle" and other related categories were also among the mechanisms most sensitive to chemical exposures (Fig, 3B-E, 4B, F, L). Both "apoptosois" and "cell cycle" were equally sensitive to activating and suppressive chemical-gene interactions ( Table 5). The p53 pathway is reciprocally connected with the hypoxic pathway (Sermeus and Michiels 2011), which is a long recognized response to a broad range of chemical compounds with different toxicities (Lee et al. 2007). "Hypoxia" or "HIF1 TF pathway" were among the top sensitive molecular mechanisms identified in our analysis (Fig. 3B, C, 4D) and this pathway was equally sensitive to activating and suppressive chemical-gene interactions (Table 5). Given high enrichments of pathways related to cellular stress and the cell cycle, it is not surprising that many cancer-related pathways were also highly enriched in our analysis (see Fig.  4G for example).
Finally, estrogenic signaling, recognized as a pathway affected by a broad range of chemicals (Singleton, Khan, 2003) was among the Hallmark datasets most sensitive to activating and suppressive interaction with chemical compounds (Table 5).
All these findings, taken together, suggest that our approach for the identification of molecular mechanisms sensitive to chemical exposures produced results concordant with the current state of toxicological knowledge.

Lipid metabolism and PPAR signaling
Regulation of lipid metabolism was identified by this study among the molecular mechanisms most sensitive to chemical exposures. This finding was supported by Metascape analysis of the most sensitive genes (Fig. 3B) and by GSEA analysis, where "fatty acid metabolism" was enriched in all 3 analyzed collections of datasets (Fig. 3 C-E, 4K), along with many other relevant categories: "cholesterol homeostasis" (Fig. 3C), "adipogenesis" (Fig. 3C), "PPAR signaling pathway" (Fig.  3D, 4H, 6), "regulation of lipid metabolism by PPAR alpha" (Fig. 3E). "Fatty acid metabolism" and "PPAR signaling pathway" were among the top categories equally sensitive to activating and suppressive chemical-gene interactions ( Table 5).
The peroxisome proliferator-activated receptors (PPARs) are a subfamily of nuclear receptors/transcription factors activated by a range of ligands, including free fatty acids, eicosanoids and Vitamin B3, which consists of three isotypes: PPARα, PPARβ/δ and PPARγ encoded by separate genes (Michalik et al. 2006). All three isotypes as well as downstream cascades were among the top scoring genes for chemical-gene interactions (Fig. 5). PPARs modulate genes that regulate energy balance, glucose homeostasis, triglyceride and lipoprotein metabolism, fatty acid synthesis, oxidation, storage, and export, as well as cell proliferation (Berger and Moller 2002;Gross et al. 2017;Liss and Finck 2017;Pawlak et al. 2015;Wang, Y. X. 2010). PPAR mediated dysregulation of these metabolic processes contributes to the pathogenesis of metabolic diseases such as obesity, metabolic syndrome, diabetes, and non-alcoholic fatty liver disease (NAFLD) (Gross et al. 2017;Pawlak et al. 2015;Semple et al. 2006). The ability of some xenobiotics, such as phthalates (Casals-Casas et al. 2008;Ito and Nakajima 2008;Li, Y. et al. 2017), per-and polyfluoroalkyl substances (PFAS) (Behr et al. 2019;Jacobsen et al. 2018;Zhao et al. 2017) and some others (Casals-Casas et al. 2008;Peraza et al. 2006) to interfere with PPAR signaling is long-recognized (Xi et al. 2020). However, to our knowledge, promiscuous sensitivity of the PPAR regulatory cascade to a broad range of chemical agents has not shown before.
The sensitivity of lipid metabolism pathways to chemical exposures may be relevant to the current epidemic of metabolic disease, one of the biggest public health issues in modern day. According to World Health Organization (WHO), worldwide the prevalence of obesity nearly tripled between 1975 and 2016 (WHO 2016). Other rapidly growing metabolic conditions include Type 2 diabetes, metabolic syndrome and non-alcoholic fatty liver disease (NAFLD). Approximately two out of every five Americans will develop Type 2 diabetes at some point during their adult lives (Gregg et al. 2014). About one third of US adults have metabolic syndrome (National Center for Health Statistics, Division of Health Interview Statistics 2012). Non-alcoholic fatty liver disease (NAFLD) is the most common form of chronic liver disease among adults and children (Kleiner et al. 2005;Patton et al. 2006) with 33% to 88% prevalence (Browning et al. 2004;Ryan et al. 2002;Soejima et al. 2003;Szczepaniak et al. 2005). Increased incidence of obesity and other metabolic conditions cannot be completely explained by poor diet and lack of exercise (Kelly et al. 2008;Wang, Y. and Lobstein 2006;Wang, Y. et al. 2008). Emerging evidence links epidemic of metabolic disease with exposures to environmental xenobiotics (Heindel et al. 2015;Mendrick et al. 2018), but the role of chemical exposure in metabolic health is not well understood. Our data provide an unbiased evidence, supporting sensitivity of lipid metabolism to a broad range of chemical agents. This finding may have significant public health implications and requires additional research.

GH/IGF/mTOR and GH/IGF/FOXO signaling cascades
Another molecular cascade identified in our study as highly sensitive to chemical exposures starts from growth hormone (GH)insulin-like growth factor (IGF) signaling and includes downstream pathways controlled by the GH-IGF axis. These pathways include mechanistic target of rapamycin (mTOR) and forkhead box O (FOXO) signaling cascades (Fig. 8). Indeed, "mTOR complex 1 signaling" was among molecular pathways with the highest NES (Fig 3C, 4C) and it was equally sensitive to both activating and suppressive chemical-gene interactions. "FOXO mediated transcription" (Fig 3E) and "glycolysis" (Fig. 3C), process activated by FOXO, were also identified as metabolic pathways sensitive to chemical exposures. "Regulation of IGF transport and uptake by IGFBPs" was also highly enriched (Fig. 3E). IGF1 was among the top genes with > 200 chemical gene interactions ( Table 2). The list of the top 20 genes scoring highest for suppressive interactions included three critical members of the GH-IGF signaling cascade (GHR, IGFBP3, and IGF1) ( Table 3). GH regulated insulin-like growth factor binding protein, acid labile subunit (IGFALS) had the second lowest RASIG of all 26,704 genes. Thus, it appears that the whole cascade is highly sensitive to chemical exposures and at the level of GH-IGF it is predominantly suppressed by chemicals.
IGFs are growth factors produced by tissues in response to GH secreted by pituitary. The majority of IGF1 in humans is produced by the liver (LeRoith and Yakar 2007). About 75-80% of serum IGFs are found in a ternary complex consisting of an IGF, IGFBP3 and IGFALS (Allard and Duan 2018). IGF-1 initiates intracellular signaling by binding to receptor tyrosine kinases (RTKs): the IGF-1 receptor (IGF1R), and the insulin receptor (IR). IGF-1 binds with 100-500 times higher affinity to IGF1R than to insulin receptor (IR), while insulin binds with 100 times higher affinity to IR than to IGF1R (Werner et al. 2008). Binding to RTKs activates downstream PI3K/Akt signaling, and IGF-1 is one of the most potent natural activators of this cascade (Peruzzi et al. 1999). Fig. 8. The growth, longevity, and metabolism cascade identified as highly sensitive to chemical exposures: 1 -GH initiates intracellular signaling via GH receptor/JAK2/STAT pathway and induces production of IGF1; 2 -about 75-80% of serum IGFs is bound in a complex with IGFBP3 and IGFALS; 3binding of IGF1 to IGF or insulin receptors activates PI3K/AKT signaling; 4activated Akt suppresses FOXO transcription factors and activates downstream lipogenesis, adipogenesis and glycolysis; 5activated Akt suppresses TSC complex, which is the major mTORC1 inhibitor; activated mTORC1 suppresses catabolism of lipids via PPARα dependent mechanism and activates lipogenesis and adipogenesis via PPARγ and SREBP transcription factors; 6a variety of stress-signaling pathways merge on mTORC1. All nodes shown in red represent the top 1% of genes most sensitive to chemical exposures (≥123 chemicalgene interactions), nodes shown in pink represent the top 10% of genes most sensitive to chemical exposures (≥ 56 chemical-gene interactions). Aktprotein kinase B; ATF4 -activating transcription factor 4; DYRK3dual specificity tyrosine phosphorylation regulated kinase 3; FOXOforkhead box O; GHgrowth hormone; HIF1hypoxia inducible factor 1; IGF1insulin-like growth factor 1; IGFALSinsulinlike growth factor binding protein, acid labile subunit; IGFBP3insulin-like growth factor binding protein 3; IRSinsulin receptor substrate; JAK2 -Janus kinase 2; mTORC1mechanistic target of rapamycin complex 1; p53tumor protein p53; PDK1pyruvate dehydrogenase kinase 1; PI3Kphosphatidylinositol-4,5-bisphosphate 3-kinase; PPARγperoxisome proliferator activated receptor alpha; PPARγperoxisome proliferator activated receptor gamma; PTENphosphatase and tensin homolog; REDD1regulated in development and DNA damage responses 1; RHEB -RAS homolog enriched in brain; SREBPsterol regulatory element-binding proteins; STATsignal transducer and activator of transcription; TSCtuberous sclerosis complex.
PI3K/Akt activates mTOR, a serine/threonine kinase, which emerged recently as a central masterswitch between anabolism and catabolism (Dibble and Manning 2013;Laplante and Sabatini 2012;Laplante and Sabatini 2013). Chronic activation and suppression of the mTOR signaling pathway are associated with obesity and Type 2 diabetes as well as cancer, neurodegeneration, aging and other aging-associated conditions (Khamzina et al. 2005;Laplante and Sabatini 2009;Laplante and Sabatini 2012;Tremblay et al. 2007;Zoncu et al. 2011). Both fatty acid metabolism (Antikainen et al. 2017;Ben-Sahra and Manning 2017;Lamming and Sabatini 2013;Mao and Zhang 2018), and PPAR signaling (Kim, J. E. and Chen 2004;Sengupta et al. 2010), discussed in the previous section, are regulated downstream of the mTOR pathway. Activation of PI3K/Akt also results in suppression of FOXO signaling via nuclear exclusion and degradation of this transcription factor (Matsuzaki et al. 2003). FOXO positively regulates glucose production via transcriptional activation of gluconeogenesis and glycogenolysis pathways (Daitoku and Fukamizu 2007) and negatively regulates adipogenesis (Farmer 2003) by suppressing transcription of PPARγ (Armoni et al. 2006).
IGF1 is a major regulator of growth, and its secretion patterns in different organisms are linked with different life-history strategies. Species with high IGF concentrations grow, reach reproductive age, reproduce, and age rapidly, while those with low basal levels of IGF have much longer periods of development, reach sexual maturity later, and have a longer lifespan (Swanson and Dantzer 2014). Multiple mouse and other models demonstrate that suppression of different nodes in the GH-IGF1 signaling cascade increase longevity (Bartke 2019;Junnila et al. 2013;Kim, S. S. and Lee 2019;Rincon et al. 2005). Human studies support these findings by revealing greater longevity in people with smaller body size (Chmielewski 2016;Samaras 2014;Samaras 2013). Downstream of GH-IGF1, both mTOR (Blagosklonny 2013a;Blagosklonny 2013b;Papadopoli et al. 2019;Wilkinson et al. 2012) and FOXO (Daitoku and Fukamizu 2007;Jiang et al. 2019) cascades are considered as central mechanisms of longevity regulation.
Ability of diverse chemicals to modulate the GH-IGF axis was noticed previously (Scarth 2006). However, this axis does not attract a lot of attention from researchers despite its relevance for the current epidemic of metabolic disease (see section 4.4) and its importance as a major regulator of growth, metabolism and aging. For example, a PubMed search done on February 14, 2020 for "endocrine disruption" in combination with different hormone names retrieved the following numbers of papers: 1100 for estrogen 502 for "androgen, 365 for testosterone, 353 for thyroid, 134 for progesterone, 71 for cortisol and only 30 for insulin-like growth factor or 24 for IGF. Lack of attention to endocrine disruption of the GH-IGF axis was also reported in our previous study (Suvorov and Takser 2008).

Cytokine mediated signaling
Another group of molecular pathways identified by our study as highly sensitive to chemical exposures consists of immune response pathways. "Cytokine mediated signaling pathway" and "TNF signaling pathway" were highly enriched in our Metascape analysis (Fig. 3B). Additionally, "TNF alpha signaling via NF kappa B" was the top enriched pathway among the Hallmark collection of datasets (Fig. 3C). Other immune-related pathways identified by GSEA were "Interferon gamma response", "inflammatory response", "IL2 STAT5 signaling" (Fig. 3C), "complement and coagulation cascades", "adipocytokine signaling pathway" (Fig. 3D), and "IL4 and IL13 signaling" (Fig. 3E). Among these listed pathways "Interferon gamma response", "TNF alpha signaling via NF kappa B" and "IL4 and IL13 signaling" were equally sensitive to both activating and suppressive chemical-gene interactions (Table 5).
These findings demonstrate a high sensitivity of cytokine-mediated pathways to chemical exposures. Although having a broad range of functions, TNF alpha and interferon gamma (IFN-γ) are both cytokines involved in systemic inflammation and activation of the acute phase reaction, including the induction of complement and coagulation factors (Gruys et al. 2005). IFN-γ plays an essential role in the development and severity of systemic autoimmunity (Pollard et al. 2013). IL-2 also plays an important role in the maintenance of peripheral self-tolerance. Mice lacking IL-2 or IL-2 receptor subunits develop lymphoproliferative autoimmune syndrome (Malek and Bayer 2004). Similarly, autoimmune disease development is linked to IL-2 deficiencies in humans (Caudy et al. 2007;Roifman 2000;Sharfe et al. 1997). Interleukin-4 and IL-13 are the signature cytokines of the type II inflammatory response. When dysregulated, their activity may trigger asthma, atopic dermatitis, allergic rhinitis, COPD, cancer, inflammatory bowel disease, autoimmune disease and fibrotic disease (May and Fung 2015).
Two major outcomes of a dysfunctional immune system are allergy and autoimmunity. Prevalence for both conditions follows upward trends. For example, among children aged 0-17 years, the prevalence of food allergies in the USA increased from 3.4% in 1997-1999 to 5.1% in 2009-2011. The prevalence of skin allergies increased from 7.4% to 12.5% during the same time-period (Jackson et al. 2013 (Schmidt 2011). Our data demonstrate that immune mechanisms dysfunctional in allergy and autoimmunity are among the molecular mechanisms that are highly sensitive to chemical exposures. This finding indicates the necessity of additional efforts to dissect the connection between chemical exposures and diseases of the immune system.

Many pathways sensitive to chemical exposures are central in cancer biology
A significant portion of the pathways discussed in previous paragraphs 4.3-4.6 and shown in figure  3B-E play important roles in different kinds of cancers. For example, mTOR is the major growth regulator, while p53 is the major growth suppressor and the two have an intricate pattern of crosstalk to determine whether the cell should grow or die in response to damage (Hasty et al. 2013). mTOR inhibitors are a prevalent cancer treatment as mTOR is hyperactivated and/or mutated in many cancers (Hua et al. 2019). Nearly half of human cancers harbor p53 mutations and mutant p53 promotes carcinogenesis, metastasis, tumor recurrence and chemoresistance (Di Agostino et al. 2019). Traditionally, cancer biology is focusing on mutations in proto-oncogenes, which are considered causative to cancer. Our data demonstrate that the same mechanisms that are known to be damaged by mutations in cancers can also undergo suppression or activation as a result of interaction with a broad range of chemical compounds. Given that genetically heterogeneous human populations are ubiquitously exposed to a mix of man-made chemical agents the risks of cancer may be elevated in some susceptible genotypes due to exposures. Identification of these susceptible genotypes results usually in a conclusion, that genetics is the causal factor rather than the environment (Takser and Hunting 2020), while the right interpretation here would be, that specific gene-environment interaction is responsible for increased cancer risks (Takser and Hunting 2020). Identification of molecular mechanisms involved in cancer pathogenesis sensitive to chemical exposures using our or similar approaches could prove useful in advancing our understanding of the biology of gene-environment interactions in heath and disease.

Suppression of pancreatic beta cell genes
Our analysis of molecular pathways that are predominantly enriched by suppressive chemical-gene interactions have identified pancreas as a potentially sensitive target of a broad range of chemical exposures. In fact, "Pancreas beta cells" (Hallmark), "Maturity onset diabetes of the young" (KEGG), "Regulation of gene expression in beta cells" and "Regulation of beta cell development" (Reactome) were datasets with the lowest RASEP values, indicating that genes involved in pancreas physiology and development are highly sensitive to suppressive but not to activating effect of chemical exposures. This phenomenon may be linked to very high susceptibility of pancreatic beta cells to oxidative stress due to a high endogenous production of reactive oxygen species and a low expression of antioxidative enzymes (Lenzen et al. 1996;Li, N. et al. 2008;Wang, D. et al. 2018). Additionally, beta cells development is positively controlled by oxygen in a dose-dependent manner through hypoxia-inducible factor I alpha (HIF-1α) (Heinis et al. 2010;Heinis et al. 2012). Both response to oxidative stress and hypoxia (HIF1a), were among the most sensitive mechanisms affected by chemical exposures in our study. Involvement of these mechanisms cannot completely explain the primarily suppressive effect of chemical compounds on pancreatic genes, however, given that both hypoxia and oxidative stress response ("glutathione metabolism") were equally sensitive to suppressive and activating effects of exposures (Table 5).
Insufficient function of beta cells is a critical characteristic of pathogenesis of all three major types of diabetestype 1 diabetes, type 2 diabetes, and gestational diabetes. According to the WHO, about 422 million people worldwide have diabetes (WHO 2018). Its prevalence has been steadily increasing for the past 3 decades worldwide, and the number of people with diabetes has nearly quadrupled since 1980. In 2012, diabetes was the direct cause of 1.5 million deaths globally. The ability of a broad range of chemicals to suppress expression of genes essential for beta cell development and function may be a significant factor that predisposes the modern population to diabetes development.

Identification of the most sensitive molecular pathways may provide new approaches for the study of mixed exposures
Today, the major approach for studying mixed exposures in toxicology is based on a similarity of mode of action (MOA) within a group of compounds. For example, toxicity of chemical compounds that act as ligands for the aryl hydrocarbon receptor (AhR), is expressed through toxic equivalency (TEQ) to the most potent AhR ligand, 2,3,7,8-tetrachlorodibenzo-p-dioxin. TEQ is established through toxic equivalent factor (TEF) -the ratio of the toxicity of a compound in this category to the toxicity of the most toxic compound (Ahlborg and Hanberg 1994;Van den Berg et al. 2006). TEQ allows for the expression of cumulative toxicity of all AhR ligands in a mixture as a single number. Similar approaches are being developed for other MOA. For example, cumulative effects of compounds with estrogenic activity were successfully predicted using concentration addition (CA) model, which assumes that one agent can be substituted for another in proportion to their relative potencies (Silva et al. 2002;Silva et al. 2011). This approach was further generalized to describe the combination effects of full agonists, partial agonists, and competitive antagonists (Howard and Webster 2009;Howard et al. 2010). In pharmacodynamic modeling, CA provides close approximation even for a mix of agents that affect similar pathways or produce common outcomes via different MOA (Webster 2013). The approach used in this study, identified molecular mechanisms most sensitive to chemical exposures regardless the nature of a chemical compound or its MOA. Use of these pathways and corresponding outcomes to analyze cumulative effects of heterogeneous mixtures may be useful in the future.

Limitations and future directions
The approach used in our study is based on a high level of generalization, which may introduce some bias. For example, it is not clear, how our overall results depend on the composition of target organ/tissues/cells used in toxicological experiments in our database. Also, we do not consider doses of chemicals used in different studies. It is likely that higher doses of different compounds may trigger different molecular changes (response to damage) than the same compounds at lower doses. In future research we plan to develop annotations of experiments, the sources of data for the database, to elucidate effects of dose, biological model, organ and other factors.

Supplementary Materials
Supplementary Materials 1. Explanation of data preparation for GSEA and results interpretation.
Supplementary Table S1. GSEA enrichment of Hallmark, KEGG and Reactome datasets with genes sensitive to chemical exposures.
Supplementary Data File S1. Chemical-gene interaction database extracted from CTD and manually annotated for chemical use.
Supplemental Material 1. Explanation of data preparation for GSEA and GSEA plots interpretation.
File with pre-ranked gene list was prepared the following way:

1.
All genes were ranked in accordance with numbers of chemical-gene interactions -see figure S1 below.