Submitted:
18 April 2024
Posted:
19 April 2024
Abstract
Keywords:
1. Introduction
2. Materials and Methods
2.1. Transcription Factor Databases
2.2. Gene Expression Databases
- The GTEx (Genotype-Tissue Expression) project database provides extensive reference data for gene expression and its variability in different normal human tissues, enabling researchers to understand how gene expression is influenced by genetic background on a broader scale [17].
- The TCGA (The Cancer Genome Atlas) database collects multidimensional cancer genomic data including gene expression, mutations, copy number variations, and epigenetic data, offering strong support for the discovery of cancer biomarkers and therapeutic targets [18].
- The CCLE (Cancer Cell Line Encyclopedia) database provides a wealth of gene expression, mutation, and epigenetic characteristic data for numerous cancer cell lines, serving as a crucial resource for studying cell line-specific responses and drug screening [19].
2.3. Software
3. Results
3.1. Module 1: Procedures for the Prediction of the Target Genes of TF
- 1) Input a TF name: Users start by selecting a TF of interest from a dropdown list.
- 2) Select datasets: Users then choose the predictive tools to include in the analysis. If “KnockTF” is selected, two additional controls appear: a slider for setting the log2 fold-change (Log2FC) threshold and a checkbox to include only downregulated genes. These settings are needed because KnockTF predictions are based on differential gene expression data following TF knockdown or knockout.
- 3) Initiate prediction: Clicking the “Go” button starts the predictive analysis.
- 4) View results: After a short wait, the prediction outcomes are displayed on the “All results” tab, which includes the individual tool results and the intersected findings.
- 5) Intersection selection: Because some tools may yield sparse predictions or lack data for a given TF, an option box lets users select the well-predicted datasets to include in the intersection analysis, which is visualized as a Venn diagram.
- 6) Visualize intersections: The “Venn diagram” tab shows the overlapping predictions across multiple tools as a Venn or petal diagram.
- 7) Individual dataset review: The “Individual dataset” tab allows users to view and download detailed information for each tool’s predictive results.
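The filtering and intersection logic behind steps 2 and 5 can be illustrated with a minimal Python sketch. It is not the tool's implementation (the application itself is built with R and Shiny); the gene names, the example values, and the helper functions `filter_knocktf` and `intersect_selected` are hypothetical, and applying the Log2FC threshold to the absolute value is an assumption.

```python
from functools import reduce

# Hypothetical per-tool predictions for one TF; gene names are illustrative only.
predictions = {
    "hTFtarget": {"GENE_A", "GENE_B", "GENE_C"},
    "TRRUST": {"GENE_B", "GENE_C"},
    "JASPAR": set(),  # a tool with no data for this TF
}

# Hypothetical KnockTF rows: (target gene, log2 fold change after TF knockdown).
knocktf_rows = [("GENE_B", -1.8), ("GENE_C", -0.4), ("GENE_D", 2.1)]


def filter_knocktf(rows, log2fc_cutoff=1.0, down_only=False):
    """Keep targets with |log2FC| >= cutoff; optionally only downregulated ones."""
    kept = set()
    for gene, log2fc in rows:
        if abs(log2fc) < log2fc_cutoff:
            continue  # change too small (assumed: threshold applies to |log2FC|)
        if down_only and log2fc > 0:
            continue  # upregulated after knockdown, excluded by the checkbox
        kept.add(gene)
    return kept


def intersect_selected(preds, selected):
    """Intersect the target sets of the tools chosen for intersection."""
    sets = [preds[name] for name in selected]
    return reduce(set.intersection, sets) if sets else set()


predictions["KnockTF"] = filter_knocktf(knocktf_rows, log2fc_cutoff=1.0, down_only=True)

# Leaving out the empty JASPAR set keeps the intersection from collapsing to nothing.
common = intersect_selected(predictions, ["hTFtarget", "TRRUST", "KnockTF"])
print(sorted(common))
```

Including a tool that returned no targets would empty the intersection, which is why being able to deselect sparse datasets in step 5 matters.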
3.2. Module 2: Procedures for the Prediction of Upstream TFs of Target Genes
- 1) Input a target gene symbol: For example, GAPDH.
- 2) Select datasets: Users then choose the predictive tools to include in the analysis. If “KnockTF” is selected, two additional controls appear: a slider for setting the log2 fold-change (Log2FC) threshold and a checkbox to include only downregulated genes. These settings are needed because KnockTF predictions are based on differential gene expression data following TF knockdown or knockout.
- 3) Correlation analysis: Users choose the data type for correlation analysis through the “Correlation” selection box, with options including data from TCGA, GTEx, or a combination of both.
- 4) Tissue type selection: Users select specific cancer types from the TCGA database and/or normal tissue types from the GTEx database to tailor the correlation analysis to their research interests.
- 5) Correlation parameter setting: Users set the correlation method and the cutoff for the correlation coefficient, which controls the stringency of the correlation criteria.
- 6) Initiate prediction: Clicking the “Go” button starts the prediction analysis with the specified correlation parameters.
- 7) Results display: After a brief processing period, the predictive results, including the outcomes from individual tools and the intersected data, are displayed on the “All results” tab.
- 8) Intersection selection: As in Module 1, users can select the datasets with robust predictions for intersection analysis, and the results are visualized as a Venn diagram.
- 9) Visualization of intersections: The “Venn diagram” tab shows the intersection results between the different datasets.
- 10) Dataset details: Detailed information on the predictive results from each tool can be viewed and downloaded from the “Individual dataset” tab.
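Steps 3 to 5 amount to filtering candidate upstream TFs by how strongly their expression correlates with the target gene. The Python sketch below illustrates that idea under stated assumptions: the expression values are invented, `filter_tfs` is a hypothetical helper, and the cutoff is assumed to apply to the absolute value of the coefficient. The tool itself retrieves TCGA and GTEx expression data through R packages.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """Average 1-based ranks, so tied values share the same rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the tied rank positions
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def filter_tfs(candidates, target_expr, tf_expr, method="pearson", cutoff=0.3):
    """Keep candidate TFs whose |correlation| with the target meets the cutoff."""
    corr = spearman if method == "spearman" else pearson
    kept = {}
    for tf in candidates:
        r = corr(tf_expr[tf], target_expr)
        if abs(r) >= cutoff:
            kept[tf] = round(r, 3)
    return kept


# Hypothetical expression values across five samples of one tissue type.
target_expr = [1.0, 2.0, 3.0, 4.0, 5.0]
tf_expr = {
    "TF_A": [2.0, 4.1, 6.2, 7.9, 10.0],  # tracks the target closely
    "TF_B": [3.0, 1.0, 3.0, 1.0, 3.0],   # unrelated to the target
}

kept = filter_tfs(["TF_A", "TF_B"], target_expr, tf_expr, method="pearson", cutoff=0.3)
print(kept)
```

Raising the cutoff makes the filter stricter, so fewer predicted TFs survive into the intersection step.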
3.3. Module 3: Pan-Tissue Correlation Analysis between the Expression of Predicted TF-Target Pairs
- 1) TF and target gene input: The user begins by selecting a transcription factor and entering the symbol for the target gene.
- 2) Database selection: The database(s) for analysis are chosen from among TCGA, GTEx, and CCLE. Notably, upon selecting TCGA, a popup menu appears, offering the user the option to include tumor data exclusively.
- 3) Correlation analysis parameters: Parameters for correlation analysis are set, including the selection of the analysis method and the establishment of thresholds for the correlation coefficient and p-value.
- 4) Initiate analysis: Data retrieval and correlation analysis are initiated by clicking the “Go” button.
- 5) Correlation results and scatter plot: Subsequently, the results of the correlation analysis are presented, along with a scatter plot illustrating the expression correlation.
- 6) Plotting parameter: Options are provided to adjust parameters relevant to the scatter plot visualization.
- 7) Detailed scatter plot: Clicking on a row within the results table prompts a popup window that displays a detailed scatter plot for the expression of the two genes within a single tissue type.
3.4. Module 4: TF-Targets Regulation Network Analysis
- 1) Data upload: Users upload their differential gene expression analysis results. The column names in the uploaded file must match those in the example data provided.
- 2) Differential gene selection criteria: Set the thresholds for selecting differentially expressed genes, specifically the log2 fold change (log2FC) and p-value.
- 3) Tool selection: Choose the predictive tools to be included in the analysis for identifying TF-target gene relationships.
- 4) TF list update: Once the inputs are complete, the “TF to analysis” input field automatically updates with a list of TFs. This list is the intersection of the differentially expressed genes from the uploaded results and the TFs contained within the chosen predictive tools.
- 5) TF differential expression result: The “TF result” page shows the differential analysis results of the TFs extracted from the user’s uploaded data.
- 6) Initiate prediction analysis: Clicking the “Go” button starts the predictive analysis.
- 7) Network visualization: After the analysis is complete, a network diagram is generated. Some TFs may appear without target genes in the network diagram if no target genes remain after intersecting the results from multiple tools. In such cases, reducing the number of tools included in the analysis may yield more extensive information.
- 8) Plotting data interface: The “Plotting data” interface presents the predicted TF-target gene results.
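The workflow above can be sketched in Python as three small steps: select differentially expressed genes, derive the TF list, and build network edges. The sketch rests on assumptions that go beyond the text: all gene names and values are invented, the helper functions are hypothetical, a TF is treated as "contained" in the tools if at least one selected tool covers it, and network targets are restricted to differentially expressed genes.

```python
# Hypothetical differential-expression results: gene -> (log2FC, p-value).
deg_table = {
    "TF_A": (2.3, 0.001),
    "TF_B": (-1.6, 0.020),
    "GENE_X": (1.9, 0.004),
    "GENE_Y": (-2.2, 0.010),
    "GENE_Z": (0.2, 0.700),
}

# Hypothetical tool predictions: tool -> {TF -> set of predicted targets}.
tools = {
    "hTFtarget": {"TF_A": {"GENE_X", "GENE_Y", "GENE_Z"}, "TF_B": {"GENE_X"}},
    "TRRUST": {"TF_A": {"GENE_X", "GENE_Z"}, "TF_B": {"GENE_Y"}},
}


def select_degs(table, log2fc_cutoff=1.0, p_cutoff=0.05):
    """Genes passing both the |log2FC| and the p-value thresholds."""
    return {g for g, (lfc, p) in table.items()
            if abs(lfc) >= log2fc_cutoff and p < p_cutoff}


def candidate_tfs(degs, tool_preds, selected):
    """Differentially expressed genes that are TFs in any selected tool (assumed)."""
    covered = set()
    for name in selected:
        covered |= set(tool_preds[name])
    return sorted(degs & covered)


def build_edges(tfs, degs, tool_preds, selected):
    """TF -> target edges supported by every selected tool, limited to DEGs (assumed)."""
    edges = []
    for tf in tfs:
        shared = None
        for name in selected:
            targets = tool_preds[name].get(tf, set())
            shared = targets if shared is None else shared & targets
        for target in sorted((shared or set()) & degs):
            edges.append((tf, target))
    return edges


degs = select_degs(deg_table)
tfs = candidate_tfs(degs, tools, ["hTFtarget", "TRRUST"])

# With two tools, TF_B has no target shared by both and drops out of the network.
edges_two_tools = build_edges(tfs, degs, tools, ["hTFtarget", "TRRUST"])
# With a single tool, more edges survive, including one for TF_B.
edges_one_tool = build_edges(tfs, degs, tools, ["hTFtarget"])
print(edges_two_tools)
print(edges_one_tool)
```

The two edge lists show the trade-off described in step 7: intersecting more tools gives stricter edges but can leave some TFs without any targets.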
4. Discussion
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Conflicts of Interest
References
1. He, H.; Yang, M.; Li, S.; Zhang, G.; Ding, Z.; Zhang, L.; Shi, G.; Li, Y. Mechanisms and biotechnological applications of transcription factors. Synthetic and Systems Biotechnology 2023, 8, 565–577.
2. Weidemüller, P.; Kholmatov, M.; Petsalaki, E.; Zaugg, J.B. Transcription factors: Bridge between cell signaling and gene regulation. Proteomics 2021, 21, e2000034.
3. Wade, J.T. Mapping Transcription Regulatory Networks with ChIP-seq and RNA-seq. In Prokaryotic Systems Biology; Krogan, P.N.J., Babu, P.M., Eds.; Springer International Publishing: Cham, 2015; pp. 119–134.
4. Mundade, R.; Ozer, H.G.; Wei, H.; Prabhu, L.; Lu, T. Role of ChIP-seq in the discovery of transcription factor binding sites, differential gene regulation mechanism, epigenetic marks and beyond. Cell Cycle 2014, 13, 2847–2852.
5. Castro-Mondragon, J.A.; Riudavets-Puig, R.; Rauluseviciute, I.; Lemma, R.B.; Turchi, L.; Blanc-Mathieu, R.; Lucas, J.; Boddie, P.; Khan, A.; Manosalva Pérez, N.; et al. JASPAR 2022: the 9th release of the open-access database of transcription factor binding profiles. Nucleic Acids Res 2022, 50, D165–D173.
6. Rauluseviciute, I.; Riudavets-Puig, R.; Blanc-Mathieu, R.; Castro-Mondragon, J.A.; Ferenc, K.; Kumar, V.; Lemma, R.B.; Lucas, J.; Chèneby, J.; Baranasic, D.; et al. JASPAR 2024: 20th anniversary of the open-access database of transcription factor binding profiles. Nucleic Acids Res 2024, 52, D174–D182.
7. Feng, C.; Song, C.; Liu, Y.; Qian, F.; Gao, Y.; Ning, Z.; Wang, Q.; Jiang, Y.; Li, Y.; Li, M.; et al. KnockTF: a comprehensive human gene expression profile database with knockdown/knockout of transcription factors. Nucleic Acids Res 2020, 48, D93–D100.
8. Feng, C.; Song, C.; Song, S.; Zhang, G.; Yin, M.; Zhang, Y.; Qian, F.; Wang, Q.; Guo, M.; Li, C. KnockTF 2.0: a comprehensive gene expression profile database with knockdown/knockout of transcription (co-)factors in multiple species. Nucleic Acids Res 2024, 52, D183–D193.
9. Daily, K.; Patel, V.R.; Rigor, P.; Xie, X.; Baldi, P. MotifMap: integrative genome-wide maps of regulatory motif sites for model species. BMC Bioinformatics 2011, 12, 495.
10. Xie, X.; Rigor, P.; Baldi, P. MotifMap: a human genome-wide map of candidate regulatory motif sites. Bioinformatics 2009, 25, 167–174.
11. Zhang, Q.; Liu, W.; Zhang, H.M.; Xie, G.Y.; Miao, Y.R.; Xia, M.; Guo, A.Y. hTFtarget: A Comprehensive Database for Regulations of Human Transcription Factors and Their Targets. Genomics Proteomics Bioinformatics 2020, 18, 120–128.
12. Han, H.; Shim, H.; Shin, D.; Shim, J.E.; Ko, Y.; Shin, J.; Kim, H.; Cho, A.; Kim, E.; Lee, T.; et al. TRRUST: a reference database of human transcriptional regulatory interactions. Sci Rep 2015, 5, 11432.
13. Han, H.; Cho, J.W.; Lee, S.; Yun, A.; Kim, H.; Bae, D.; Yang, S.; Kim, C.Y.; Lee, M.; Kim, E.; et al. TRRUST v2: an expanded reference database of human and mouse transcriptional regulatory interactions. Nucleic Acids Res 2018, 46, D380–D386.
14. Zheng, R.; Wan, C.; Mei, S.; Qin, Q.; Wu, Q.; Sun, H.; Chen, C.H.; Brown, M.; Zhang, X.; Meyer, C.A.; et al. Cistrome Data Browser: expanded datasets and new tools for gene regulatory analysis. Nucleic Acids Res 2019, 47, D729–D735.
15. An integrated encyclopedia of DNA elements in the human genome. Nature 2012, 489, 57–74.
16. Rouillard, A.D.; Gundersen, G.W.; Fernandez, N.F.; Wang, Z.; Monteiro, C.D.; McDermott, M.G.; Ma’ayan, A. The harmonizome: a collection of processed datasets gathered to serve and mine knowledge about genes and proteins. Database 2016, 2016, baw100.
17. The Genotype-Tissue Expression (GTEx) project. Nat Genet 2013, 45, 580–585.
18. Tomczak, K.; Czerwińska, P.; Wiznerowicz, M. The Cancer Genome Atlas (TCGA): an immeasurable source of knowledge. Contemp Oncol (Pozn) 2015, 19, A68–77.
19. Barretina, J.; Caponigro, G.; Stransky, N.; Venkatesan, K.; Margolin, A.A.; Kim, S.; Wilson, C.J.; Lehár, J.; Kryukov, G.V.; Sonkin, D.; et al. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 2012, 483, 603–607.
20. Goldman, M.J.; Craft, B.; Hastie, M.; Repečka, K.; McDade, F.; Kamath, A.; Banerjee, A.; Luo, Y.; Rogers, D.; Brooks, A.N.; et al. Visualizing and interpreting cancer genomics data via the Xena platform. Nature Biotechnology 2020, 38, 675–678.
21. Wang, S.; Xiong, Y.; Zhao, L.; Gu, K.; Li, Y.; Zhao, F.; Li, J.; Wang, M.; Wang, H.; Tao, Z.; et al. UCSCXenaShiny: an R/CRAN package for interactive analysis of UCSC Xena data. Bioinformatics 2022, 38, 527–529.
22. Luo, Y.; Hitz, B.C.; Gabdank, I.; Hilton, J.A.; Kagda, M.S.; Lam, B.; Myers, Z.; Sud, P.; Jou, J.; Lin, K.; et al. New developments on the Encyclopedia of DNA Elements (ENCODE) data portal. Nucleic Acids Res 2020, 48, D882–D889.
23. Blais, A.; Dynlacht, B.D. Constructing transcriptional regulatory networks. Genes Dev 2005, 19, 1499–1511.
24. Pavesi, G. ChIP-Seq Data Analysis to Define Transcriptional Regulatory Networks. Adv Biochem Eng Biotechnol 2017, 160, 1–14.
25. Levine, M.; Tjian, R. Transcription regulation and animal diversity. Nature 2003, 424, 147–151.




| Data type | Datasets | DB link | Data source | Evidence |
|---|---|---|---|---|
| TF database | MotifMap [9,10] | http://motifmap.ics.uci.edu/ | http://motifmap.ics.uci.edu/ | motifs |
| TF database | hTFtarget [11] | https://guolab.wchscu.cn/hTFtarget/#!/ | http://bioinfo.life.hust.edu.cn/hTFtarget#!/ | ChIP-Seq data |
| TF database | KnockTF [7,8] | https://bio.liclab.net/KnockTFv1/ | https://bio.liclab.net/KnockTF/index.php | Knockdown/knockout |
| TF database | TRRUST [12,13] | https://www.grnpedia.org/trrust/ | https://www.grnpedia.org/trrust/ | PubMed |
| TF database | Cistrome DB [14] | http://cistrome.org/db/ | http://cistrome.org/db/#/ | ChIP-Seq and DNase-Seq |
| TF database | ENCODE [15] | https://www.encodeproject.org/ | https://maayanlab.cloud/Harmonizome/dataset/ENCODE+Transcription+Factor+Targets | ChIP-Seq data |
| TF database | JASPAR [6] | https://jaspar.elixir.no/ | https://maayanlab.cloud/Harmonizome/dataset/JASPAR+Predicted+Transcription+Factor+Targets | motifs |
| Gene expression database | GTEx [17] | https://www.genome.gov/Funded-Programs-Projects/Genotype-Tissue-Expression-Project | https://xenabrowser.net/datapages/?dataset=gtex_rsem_isoform_tpm&host=https%3A%2F%2Ftoil.xenahubs.net | gene expression RNAseq |
| Gene expression database | TCGA [18] | https://portal.gdc.cancer.gov/ | https://xenabrowser.net/datapages/?dataset=tcga_RSEM_gene_tpm&host=https%3A%2F%2Ftoil.xenahubs.net | gene expression RNAseq |
| Gene expression database | CCLE [19] | https://sites.broadinstitute.org/ccle/ | https://xenabrowser.net/datapages/?dataset=ccle%2FCCLE_DepMap_18Q2_RNAseq_RPKM_20180502&host=https%3A%2F%2Fucscpublic.xenahubs.net | gene expression RNAseq |

| R Package | Function |
|---|---|
| shiny | Building interactive web application UI |
| bs4Dash | Advanced UI design with Bootstrap 4 integration |
| httr | Handling HTTP requests for web data retrieval |
| rvest | Web scraping functionalities |
| curl | Data transfer with URL syntax |
| jsonlite | Parsing JSON data |
| UCSCXenaShiny [21] | Extracting gene expression data from XENA database |
| RMySQL | Accessing and extracting database data |
| VennDiagram | Venn diagram visualization |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).