Preprint Article, Version 1. Preserved in Portico. This version is not peer-reviewed.

Simple Method for Cutoff Point Identification In Descriptive High-Throughput Biological Studies

Version 1 : Received: 8 April 2021 / Approved: 8 April 2021 / Online: 8 April 2021 (10:13:21 CEST)

How to cite: Suvorov, A. Simple Method for Cutoff Point Identification In Descriptive High-Throughput Biological Studies. Preprints 2021, 2021040227. https://doi.org/10.20944/preprints202104.0227.v1

Abstract

The rapid development of high-throughput omics technologies has generated increasing interest in algorithms for cutoff point identification. Existing cutoff methods and tools identify cutoff points based on the association of continuous variables with another variable, such as phenotype, disease state, or treatment group. These approaches are not applicable to descriptive studies, in which continuous variables are reported without a known association with any biologically meaningful variable. The most common shape of the ranked distribution of continuous variables in high-throughput descriptive studies is a biphasic exponential/super-exponential curve: the first phase includes a large number of variables whose values grow slowly with rank, and the second phase includes a smaller number of variables whose values grow rapidly with rank. This study describes a simple algorithm that identifies the boundary between these phases for use as a cutoff point. The major assumption of this approach is that a small number of variables with high values dominate a biological system and determine its major processes and functions. The approach was tested on three different datasets: genes in the human cerebral cortex, mammalian genes sensitive to chemical exposures, and proteins expressed in the human heart. In every case, the described cutoff identification method produced shortlists of variables (genes, proteins) highly relevant to the dominant functions/pathways of the analyzed biological systems. Thus, the described method may be used to prioritize variables for focused functional analysis in situations where other methods of data dichotomization are unavailable.
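The abstract does not spell out the boundary-finding rule itself. As a minimal sketch, assuming a generic "knee" heuristic rather than the author's published algorithm, the boundary between the slow-growth and fast-growth phases of a ranked distribution can be located by sorting the values in ascending order, min-max normalizing both rank and value to [0, 1], and taking the point that lies farthest below the straight line joining the first and last points. The function name `knee_cutoff` and the choice of this heuristic are hypothetical illustrations.

```python
from typing import Sequence


def knee_cutoff(values: Sequence[float]) -> tuple[int, float]:
    """Return (rank index, value) of the knee of the ascending ranked curve.

    Hypothetical heuristic, not the author's published method: after
    min-max normalizing rank (x) and value (y) to [0, 1], the knee is the
    point farthest below the diagonal y = x that joins the first and last
    points. Variables with values above the returned value form the
    shortlist (the second, rapidly growing phase).
    """
    ranked = sorted(values)  # ascending rank order
    n = len(ranked)
    if n < 3:
        raise ValueError("need at least three values")
    lo, hi = ranked[0], ranked[-1]
    if hi == lo:
        raise ValueError("values must not all be equal")

    best_i, best_gap = 0, float("-inf")
    for i, v in enumerate(ranked):
        x = i / (n - 1)           # normalized rank
        y = (v - lo) / (hi - lo)  # normalized value
        gap = x - y               # proportional to distance below y = x
        if gap > best_gap:
            best_i, best_gap = i, gap
    return best_i, ranked[best_i]


if __name__ == "__main__":
    data = [1, 1, 2, 2, 3, 3, 4, 5, 50, 100]
    idx, threshold = knee_cutoff(data)
    shortlist = sorted(v for v in data if v > threshold)
    print(idx, threshold, shortlist)  # prints: 7 5 [50, 100]
```

With min-max normalization of both axes, the point farthest below the diagonal is also roughly where the discrete slope of the curve rises past 1, which is one intuitive way to mark the transition from slow to rapid growth with rank.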

Keywords

cutoff, dichotomization, threshold, -omics, bioinformatics, methods, values distribution

Subject

Medicine and Pharmacology, Immunology and Allergy
