Submitted: 18 December 2024
Posted: 18 December 2024
Abstract
Keywords:
Introduction
Methods
Selection of G4 Sequencing Datasets
Selection of Peak Calling Algorithms
G4 Sequencing Data Analysis Pipeline
G4 Peak Identification and Preprocessing
Construction of G4 Benchmark Datasets
Calculating Precision, Recall, and Harmonic Mean for Each Candidate Algorithm
Proportion of pG4, oG4, and aG4 in Candidate G4 Peaks
Chromatin Accessibility Data Processing
Construction of the To-Be-Tested G4 Datasets for Hypothesis Testing Evaluation
Reads/Fragment Distribution Models and False Positive Rate Evaluation
Results and Discussion
Evaluating the G4 Benchmark Dataset
Overall Performance of the Candidate Algorithms
Evaluating Candidate Algorithms Based on Known G4 Information
Evaluating the Validity of Algorithm-Specific G4 Peaks
Overall Evaluation of the Candidate Algorithms
Conclusion
Supplementary Materials
Funding
References

| GEO Accession Number | Cell Line | Antibody Type | Sequencing Technique | Library Type |
|---|---|---|---|---|
| GSE107690 | K562 | BG4 | G4-ChIP-seq | Single-end |
| GSE145090 | K562 | BG4 | G4-ChIP-seq | Single-end |
| GSE133379 | HEK293T | G4P | G4P-ChIP-seq | Paired-end |
| GSE178668 | HEK293T | BG4 | G4-ChIP-seq | Paired-end |
| GSE178668 | HEK293T | BG4 | G4 CUT&Tag | Paired-end |
| GSE221437 | HEK293T | BG4 | G4 CUT&Tag | Paired-end |

| Candidate Algorithm | Proportion of pG4-overlapping peaks | Proportion of oG4-overlapping peaks | Proportion of aG4-overlapping peaks |
|---|---|---|---|
| SEACR | 99%-100% | 79%-100% | 83%-93% |
| SICER | 76%-98% | 53%-85% | 44%-83% |
| MACS2 | 76%-95% | 52%-78% | 54%-83% |
| PeakRanger | 66%-91% | 50%-71% | 60%-82% |
| GoPeaks | 73%-91% | 49%-72% | 58%-88% |
| HOMER | 50%-82% | 35%-61% | 54%-78% |
| GEM | 42%-69% | 29%-52% | 34%-76% |
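
The proportions in the table above are fractions of called peaks that overlap a reference set of G4 intervals (pG4, oG4, or aG4). The following is a minimal sketch of that calculation, assuming BED-style half-open coordinates and a one-base-pair overlap criterion. The function name `overlap_proportion`, the overlap criterion, and the toy coordinates are illustrative assumptions, not the authors' pipeline.

```python
from bisect import bisect_left
from collections import defaultdict


def overlap_proportion(peaks, reference):
    """Return the fraction of peaks that overlap at least one reference interval.

    Intervals are (chrom, start, end) tuples in BED-style half-open
    coordinates, so intervals that merely touch do not overlap.
    """
    if not peaks:
        return 0.0

    # Group reference intervals by chromosome.
    by_chrom = defaultdict(list)
    for chrom, start, end in reference:
        by_chrom[chrom].append((start, end))

    # Per chromosome: sorted starts plus a running maximum of interval ends.
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        starts = [s for s, _ in intervals]
        max_ends, running = [], float("-inf")
        for _, e in intervals:
            running = max(running, e)
            max_ends.append(running)
        index[chrom] = (starts, max_ends)

    hits = 0
    for chrom, start, end in peaks:
        if chrom not in index:
            continue
        starts, max_ends = index[chrom]
        # Number of reference intervals that start before the peak ends.
        n_before = bisect_left(starts, end)
        # The peak is hit if any of those intervals ends after the peak starts.
        if n_before and max_ends[n_before - 1] > start:
            hits += 1
    return hits / len(peaks)


# Toy data (hypothetical coordinates, for illustration only).
toy_peaks = [("chr1", 100, 200), ("chr1", 300, 400),
             ("chr2", 50, 60), ("chr1", 500, 600)]
toy_pg4 = [("chr1", 150, 160), ("chr1", 400, 450), ("chr2", 10, 55)]
toy_fraction = overlap_proportion(toy_peaks, toy_pg4)
print(f"{toy_fraction:.0%} of peaks overlap a pG4")
```

The running maximum of interval ends keeps each lookup logarithmic in the number of reference intervals. A genome-scale analysis would more likely rely on a dedicated interval tool than on this pure-Python version.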

| Peak calling algorithm | Performance on benchmark | Known G4 information: peak width | Known G4 information: proportion of overlapping peaks | Algorithm-specific peaks: percentage of pG4-overlapping peaks | Algorithm-specific peaks: consistency with epigenetic signals |
|---|---|---|---|---|---|
| MACS2 | High HM score | Moderate | Moderately high | High | High |
| PeakRanger | Highest HM score | Moderate | Moderately high | High | High |
| GoPeaks | - | Moderate | Moderately high | High | High |
| HOMER | Moderate HM score | Narrow | Low | Low | Low |
| GEM | Much fewer peaks | Narrow | Low | Low | Low |
| SICER | Moderate HM score | Excessively high | High | - | - |
| SEACR | - | Excessively high | High | - | - |
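
The "HM score" in the table above is the harmonic mean of precision and recall. Below is a minimal sketch of that arithmetic, assuming precision is the share of called peaks that hit the benchmark and recall is the share of benchmark regions recovered by at least one peak. These definitions of a hit, the function name, and the example counts are assumptions for illustration; this section does not give the authors' exact criteria.

```python
def precision_recall_hm(peaks_hit, peaks_total, benchmark_hit, benchmark_total):
    """Return (precision, recall, harmonic mean) from overlap counts.

    peaks_hit:       called peaks overlapping a benchmark region (assumed definition)
    peaks_total:     all called peaks
    benchmark_hit:   benchmark regions overlapped by a called peak (assumed definition)
    benchmark_total: all benchmark regions
    """
    precision = peaks_hit / peaks_total if peaks_total else 0.0
    recall = benchmark_hit / benchmark_total if benchmark_total else 0.0
    denom = precision + recall
    # The harmonic mean is dominated by the smaller of the two values.
    hm = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, hm


# Hypothetical counts, not taken from the paper.
p, r, hm = precision_recall_hm(80, 100, 60, 120)
print(f"precision={p:.2f} recall={r:.2f} HM={hm:.3f}")
```

Because the harmonic mean is pulled toward the lower value, an algorithm that calls very wide or very numerous peaks can reach high recall and still score poorly if its precision is low.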

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).