Preprint Article, Version 1. Preserved in Portico. This version is not peer-reviewed.

Entropic Ranks: A Methodology for Enhanced, Threshold-Free, Information-Rich Data Partition and Interpretation

Version 1: Received: 17 September 2020 / Approved: 18 September 2020 / Online: 18 September 2020 (09:11:04 CEST)

A peer-reviewed article of this Preprint also exists.

de Lastic, H.-X.; Liampa, I.; G. Georgakilas, A.; Zervakis, M.; Chatziioannou, A. Entropic Ranks: A Methodology for Enhanced, Threshold-Free, Information-Rich Data Partition and Interpretation. Appl. Sci. 2020, 10, 7077.

Abstract

Background: Traditional omic analysis relies on p-value and fold change as selection criteria. There is an ongoing debate on their effectiveness in delivering systemic and robust interpretation, due to their dependence on assumptions of conformity with various parametric distributions. Here, we propose a threshold-free selection method based on robust, non-parametric statistics, ensuring independence from the statistical distribution properties and broad applicability. Such methods can adapt to different initial data distributions, in contrast to statistical techniques based on fixed thresholds. Methods: Our work extends the Rank Products (RP) methodology with a neutral selection method of high information-extraction capacity. We introduce the calculation of the RP distribution’s entropy to isolate the features of interest by their contribution to the distribution’s information content. The aim is a methodology that performs threshold-free identification of the differentially expressed features that are highly informative about the phenomenon under scrutiny. Conclusions: Applying the proposed method to microarray (transcriptomic and DNA methylation) and RNAseq count data of varying sizes and noise levels, we observe robust convergence of the different parameterisations to stable cutoff points. Functional analysis through BioInfoMiner and EnrichR was used to evaluate the information potency of the resulting feature lists. Overall, the derived functional terms provide a systemic description highly compatible with the results of traditional statistical hypothesis testing techniques. The methodology behaves consistently across different data types. The feature lists are compact and information-rich, indicating phenotypic aspects specific to the tissue and biological phenomenon investigated. Selection by information content measures efficiently addresses problems emerging from arbitrary thresholding, thus facilitating the full automation of the analysis.
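
The two ingredients named in the Methods can be illustrated with a minimal Python sketch. It assumes the standard Rank Products definition (the geometric mean of a feature's ranks across replicates) and the Shannon entropy of the RP values after normalising them to sum to one. The function names, the toy data, and the normalisation step are assumptions made for illustration only. The abstract does not specify the entropy estimator or the rule that locates the cutoff, so neither is reproduced here.

```python
import math

def rank_products(replicates):
    """Geometric mean of each feature's rank across replicates.

    replicates: list of lists, one inner list of per-feature values
    (e.g. log fold changes) per replicate. Rank 1 is the largest value.
    """
    n_features = len(replicates[0])
    log_sums = [0.0] * n_features
    for values in replicates:
        # Feature indices sorted by descending value; ties keep index order.
        order = sorted(range(n_features), key=lambda i: -values[i])
        for rank, feature in enumerate(order, start=1):
            log_sums[feature] += math.log(rank)
    k = len(replicates)
    return [math.exp(s / k) for s in log_sums]

def entropy_contributions(values):
    """Per-element terms -p*log2(p) after normalising values to sum to 1.

    Illustrative assumption: the paper's own estimator may differ.
    """
    total = sum(values)
    probs = [v / total for v in values]
    return [-p * math.log2(p) if p > 0 else 0.0 for p in probs]

# Hypothetical toy data: 3 replicates, 4 features.
replicates = [
    [2.5, 0.1, 1.2, -0.3],
    [2.1, 0.3, 1.0, -0.1],
    [2.8, -0.2, 0.2, 0.9],
]
rp = rank_products(replicates)
contrib = entropy_contributions(rp)
entropy = sum(contrib)  # Shannon entropy (bits) of the normalised RP values
```

In this toy input the first feature ranks first in every replicate, so its rank product is the smallest; low RP values mark the features most consistently at the top of the ranking.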

Keywords

data analysis; threshold-free; differential analysis

Subject

Computer Science and Mathematics, Data Structures, Algorithms and Complexity
