Preprint Review Version 1 Preserved in Portico This version is not peer-reviewed

Application of Biological Domain Knowledge Based Feature Selection on Gene Expression Data

Version 1 : Received: 14 December 2020 / Approved: 15 December 2020 / Online: 15 December 2020 (12:10:44 CET)

A peer-reviewed article of this Preprint also exists.

Yousef, M.; Kumar, A.; Bakir-Gungor, B. Application of Biological Domain Knowledge Based Feature Selection on Gene Expression Data. Entropy 2021, 23, 2. Yousef, M.; Kumar, A.; Bakir-Gungor, B. Application of Biological Domain Knowledge Based Feature Selection on Gene Expression Data. Entropy 2021, 23, 2.

Journal reference: Entropy 2020, 23, 2
DOI: 10.3390/e23010002

Abstract

In the last two decades, there have been massive advancements in high throughput technologies, which resulted in the exponential growth of public repositories of gene expression datasets for various phenotypes. It is possible to unravel biomarkers by comparing the gene expression levels under different conditions, such as disease vs. control, treated vs. not treated, drug A vs. drug B, etc. This problem refers to a well-studied problem in the machine learning domain, i.e., the feature selection problem. In biological data analysis, most of the computational feature selection methodologies were taken from other fields, without considering the nature of the biological data. For gene expression data analysis, most of the existing feature selection methods rely on expression values alone to select the genes; and biological knowledge is integrated at the end of the analysis in order to gain biological insights or to support the initial findings. Thus, integrative approaches that utilize the biological knowledge while performing feature selection are necessary for this kind of data. The main idea behind the integrative gene selection process is to generate a ranked list of genes considering both the statistical metrics that are applied to the gene expression data, and the biological background information which is provided as external datasets. Since the integrative approach attracted attention in the gene expression domain, lately the gene selection process shifted from being purely data-centric to more incorporative analysis with additional biological knowledge.

Subject Areas

Feature Selection, Feature Ranking, Grouping, Clustering, Biological Knowledge.

Comments (0)

We encourage comments and feedback from a broad range of readers. See criteria for comments and our diversity statement.

Leave a public comment
Send a private comment to the author(s)
Views 0
Downloads 0
Comments 0
Metrics 0


×
Alerts
Notify me about updates to this article or when a peer-reviewed version is published.
We use cookies on our website to ensure you get the best experience.
Read more about our cookies here.