Preprint Article, Version 1. Preserved in Portico. This version is not peer-reviewed.

BiGAMi: Bi-Objective Genetic Algorithm Fitness Function for Feature Selection on Microbiome Datasets

Version 1: Received: 29 March 2022 / Approved: 31 March 2022 / Online: 31 March 2022 (08:00:03 CEST)

A peer-reviewed article of this Preprint also exists.

Leske, M.; Bottacini, F.; Afli, H.; Andrade, B.G.N. BiGAMi: Bi-Objective Genetic Algorithm Fitness Function for Feature Selection on Microbiome Datasets. Methods Protoc. 2022, 5, 42.

Journal reference: Methods Protoc. 2022, 5, 42
DOI: 10.3390/mps5030042

Abstract

The relationship between a host and its microbiome, that is, the assemblage of microorganisms (including bacteria, archaea, fungi, and viruses), has been shown to be crucial for the host's health and disease development. The high dimensionality of microbiome datasets is often cited as a major difficulty for data analysis, including the use of Machine Learning (ML) and Deep Learning (DL) models. Here we present BiGAMi, a bi-objective genetic algorithm (GA) fitness function for feature selection on microbiome datasets, used to train high-performing phenotype classifiers. The proposed fitness function allowed us to build classifiers that outperformed the baseline performance reported by the original studies while using as few as 0.04% to 2.32% of the features of the original dataset. In 19 out of 21 classification exercises, BiGAMi achieved its results by selecting 6-68% fewer features than a Sequential Forward Feature Selection algorithm needed at its highest performance. This study showed that applying a bi-objective GA fitness function to microbiome datasets succeeded in selecting small subsets of bacteria whose contribution to well-understood diseases and to the host state had already been demonstrated experimentally. Applying this feature selection approach to novel diseases is expected to quickly reveal the microbes most relevant to a specific condition.
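
To make the idea of a bi-objective fitness function concrete, the following is a minimal sketch and not the BiGAMi implementation. The abstract does not spell out the two objectives; the sketch assumes they are classifier performance (to maximise) and the number of selected features (to minimise). The names `bi_objective_fitness`, `dominates`, and `toy_score` are hypothetical, and `toy_score` is only a stand-in for a real cross-validated classifier score.

```python
from typing import Callable, Sequence, Tuple

Mask = Sequence[int]  # 1 = feature kept, 0 = feature dropped
Fitness = Tuple[float, int]  # (score to maximise, feature count to minimise)


def bi_objective_fitness(mask: Mask, score_fn: Callable[[Mask], float]) -> Fitness:
    """Evaluate one candidate feature subset on both assumed objectives."""
    n_selected = sum(mask)
    if n_selected == 0:
        # An empty subset cannot train a classifier: assign the worst values.
        return 0.0, len(mask)
    return score_fn(mask), n_selected


def dominates(a: Fitness, b: Fitness) -> bool:
    """True if a Pareto-dominates b: no worse on both objectives, better on one."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def toy_score(mask: Mask) -> float:
    # Hypothetical stand-in for classifier performance:
    # only features 0 and 3 carry signal in this toy setting.
    informative = (0, 3)
    return sum(mask[i] for i in informative) / len(informative)


all_features = bi_objective_fitness([1, 1, 1, 1, 1], toy_score)
small_subset = bi_objective_fitness([1, 0, 0, 1, 0], toy_score)
print(all_features, small_subset, dominates(small_subset, all_features))
```

Under these assumptions, a subset that reaches the same score with fewer features dominates the full feature set, which is the kind of trade-off a multi-objective GA selection step can exploit.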

Keywords

microbiome; genetic algorithm; feature selection; human health; machine learning

Subject

MATHEMATICS & COMPUTER SCIENCE, Artificial Intelligence & Robotics
