Preprint Article Version 1 Preserved in Portico This version is not peer-reviewed

A New R Package for Categorizing Coding and Non-Coding Genes

Version 1 : Received: 4 January 2023 / Approved: 4 January 2023 / Online: 4 January 2023 (02:22:00 CET)

How to cite: Bayati, M.; Rezaie, N.; Hamidi, M.; Tahaei, M.S.; Rabiee, H. A New R Package for Categorizing Coding and Non-Coding Genes. Preprints 2023, 2023010039. https://doi.org/10.20944/preprints202301.0039.v1

Abstract

Previous studies have demonstrated that non-coding RNAs interact with chromatin-modifying machinery to drive promoter-enhancer-based gene regulation, raising the possibility that many other enhancer-like RNAs operate through similar mechanisms. Moreover, more than 80% of the disease-linked variants identified in genome-wide studies lie in non-coding regions of the genome, particularly in non-coding RNAs, which suggests that non-coding RNAs are relevant to disease. Understanding the role of non-coding RNAs, especially long non-coding RNAs, therefore requires understanding the transcriptional regulation of genomic regions, non-coding regions in particular. Here, we present SomaGene, a user-friendly R package for identifying and studying enhancer-like non-coding RNAs enriched for somatic mutations in cancer genomes. SomaGene accepts several types of genomic variants (whole-genome or whole-exome somatic point mutations, structural variations, and copy number variations) and identifies RNAs that are significantly mutated in a disease such as cancer. It then uses multiple publicly available genomic and epigenomic datasets, including ENCODE epigenomic annotations, FANTOM5 tissue-specific expression profiles, disease-associated SNPs from genome-wide association studies, and tissue-specific eQTL pairs, to identify RNAs with potential enhancer function. SomaGene thus gives cancer researchers a way to study the roles of non-coding RNAs across different cancer genomes.
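
The abstract does not say which statistical test SomaGene uses to call an RNA "significantly mutated". As a generic illustration only, and not the SomaGene API or its actual method, the sketch below assumes a uniform background mutation rate and asks how surprising the observed mutation count in one region is under a binomial model. The sketch is in Python rather than R, and the function names and example numbers are hypothetical.

```python
from math import exp, lgamma, log, log1p


def binomial_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p), summed in log space.

    Assumes 0 < p < 1. Log-space terms avoid overflow for large n.
    """
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (
            lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
            + i * log(p)
            + (n - i) * log1p(-p)
        )
        total += exp(log_pmf)
    return min(1.0, total)


def region_enrichment_pvalue(region_len, genome_len, hits, total_mutations):
    """P-value for seeing at least `hits` mutations in a region.

    Assumption (hypothetical model): each mutation lands in the region
    with probability region_len / genome_len, independently of the others.
    """
    p = region_len / genome_len
    return binomial_upper_tail(hits, total_mutations, p)


if __name__ == "__main__":
    # Hypothetical numbers: a 5 kb RNA locus in a 3 Mb callable region,
    # 2000 somatic point mutations in the cohort, 15 of them in the locus.
    pval = region_enrichment_pvalue(5_000, 3_000_000, 15, 2_000)
    print(f"enrichment p-value: {pval:.3g}")
```

A real analysis would need a background model that accounts for sequence context, replication timing, and other covariates, plus multiple-testing correction across all candidate RNAs; the uniform-rate assumption here is a deliberate simplification.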

Keywords

somatic point mutations; non-coding RNA; biomarker discovery; driver genes; non-coding RNAs prioritization; health data analytics

Subject

Biology and Life Sciences, Biochemistry and Molecular Biology
