Version 1: Received: 4 January 2023 / Approved: 4 January 2023 / Online: 4 January 2023 (02:22:00 CET)
How to cite:
Bayati, M.; Rezaie, N.; Hamidi, M.; Tahaei, M. S.; Rabiee, H. A New R Package for Categorizing Coding and Non-Coding Genes. Preprints 2023, 2023010039. https://doi.org/10.20944/preprints202301.0039.v1
APA Style
Bayati, M., Rezaie, N., Hamidi, M., Tahaei, M. S., & Rabiee, H. (2023). A New R Package for Categorizing Coding and Non-Coding Genes. Preprints. https://doi.org/10.20944/preprints202301.0039.v1
Chicago/Turabian Style
Bayati, M., Maedeh Sadat Tahaei and Hamid Rabiee. 2023. "A New R Package for Categorizing Coding and Non-Coding Genes." Preprints. https://doi.org/10.20944/preprints202301.0039.v1
Abstract
Previous studies demonstrate the critical importance of non-coding RNAs that interface with chromatin-modifying machinery, resulting in promoter-enhancer-based gene regulation, and raise the possibility that many other enhancer-like RNAs operate via similar mechanisms. More than 80% of the disease-linked variants identified in genome-wide studies are located in non-coding regions of the genome, especially in non-coding RNAs, suggesting that non-coding RNAs are relevant to disease. Thus, a critical path toward understanding the role of non-coding RNAs, especially long non-coding RNAs, is to understand the transcriptional regulation of genomic regions, particularly non-coding regions. Here, we developed SomaGene, a user-friendly R package for studying and identifying enhancer-like non-coding RNAs that are enriched for somatic mutations in cancer genomes. SomaGene accepts several types of genomic variants (whole-genome or whole-exome somatic point mutations, structural variations, and copy number variations) and identifies RNAs that are significantly mutated in disease (e.g., cancer). It then uses multiple publicly available genomic and epigenomic datasets, including ENCODE epigenomic annotations, FANTOM5 tissue-specific expression profiles, disease-associated SNPs from genome-wide association studies, and tissue-specific eQTL pairs, to identify RNAs with potential enhancer function. SomaGene gives cancer researchers a way to study the roles of non-coding RNAs across different cancer genomes.
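To make the idea of "significantly mutated" concrete, the following is a minimal sketch of one common way such enrichment can be tested. It is not SomaGene's API or its statistical method, and it is written in Python for illustration rather than R. All names here (`count_mutations`, `enrichment_pvalues`, the genes `lncA` and `lncB`, and the toy coordinates) are hypothetical, and the uniform background mutation rate is a simplifying assumption.

```python
from math import comb


def count_mutations(genes, mutations):
    """Count point mutations inside each gene interval (0-based, half-open)."""
    counts = {name: 0 for name, _chrom, _start, _end in genes}
    for chrom, pos in mutations:
        for name, g_chrom, start, end in genes:
            if chrom == g_chrom and start <= pos < end:
                counts[name] += 1
    return counts


def binomial_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def enrichment_pvalues(genes, mutations, genome_length):
    """Test each gene for more mutations than a uniform background predicts.

    Assumption: every mutation lands uniformly at random along the genome,
    so a gene of length L receives each mutation with probability
    L / genome_length. Real analyses use richer background models.
    """
    counts = count_mutations(genes, mutations)
    n = len(mutations)
    pvals = {}
    for name, _chrom, start, end in genes:
        p = (end - start) / genome_length
        pvals[name] = binomial_upper_tail(counts[name], n, p)
    return counts, pvals


# Hypothetical toy data: (name, chromosome, start, end) and (chromosome, position).
genes = [("lncA", "chr1", 100, 200), ("lncB", "chr1", 500, 600)]
mutations = [("chr1", 110), ("chr1", 150), ("chr1", 199), ("chr1", 550), ("chr2", 120)]

counts, pvals = enrichment_pvalues(genes, mutations, genome_length=10_000)
for name in counts:
    print(name, counts[name], pvals[name])
```

In this toy input, `lncA` carries three of the five mutations and `lncB` carries one, so `lncA` receives the smaller p-value. A real analysis would also correct for multiple testing across all candidate RNAs and would handle structural and copy number variants separately from point mutations.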
Keywords
somatic point mutations; non-coding RNA; biomarker discovery; driver genes; non-coding RNAs prioritization; health data analytics
Subject
Biology and Life Sciences, Biochemistry and Molecular Biology
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.