Preprint. Article. Version 1. Preserved in Portico. This version is not peer-reviewed.

Correlations in Compositional Data without Log-Transformations

Version 1 : Received: 10 May 2023 / Approved: 10 May 2023 / Online: 10 May 2023 (08:57:50 CEST)

A peer-reviewed article of this Preprint also exists.

Monich, Y.V.; Nechipurenko, Y.D. Correlations in Compositional Data without Log Transformations. Axioms 2023, 12, 1084.


The article proposes a method for determining the p-value of correlations in compositional data, i.e., data obtained by dividing the original values by their sum. Data of this kind are typical of many fields of knowledge, but there is still no consensus on how to interpret correlations in them. In a space closed by a normalizing quantity, correlation coefficients behave differently than under ordinary conditions: their probabilities of occurrence do not coincide with those of the standard scale of estimates. In the 2010s, almost all newly proposed methods for estimating correlation in compositional data came to require log-transformation of the variable values. The method proposed here uses no log-transformations. We return to the early attempts to solve the problem and rely on the negative shifts in correlations in the multinomial distribution. In modeling the data, we use a hybrid method that combines the hypergeometric distribution with a distribution following any other law. During our work on the calculation method, we found that the number of degrees of freedom in compositional data takes discrete values only when all the normalizing sums are equal; when the sums are not equal, it decreases and becomes a continuously varying quantity. Estimating the number of degrees of freedom and the strength of its influence on the magnitude of the shift in the distribution of correlation coefficients is the basis of the proposed method.
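The negative shift in the multinomial distribution mentioned in the abstract can be illustrated with a small simulation. The sketch below is not the authors' method; it only shows the well-known effect that fixing the row total pushes the expected pairwise correlation of counts from 0 toward -1/(D-1) when the D parts are equally probable. The sample size, the number of parts, the row total, and the helper name `mean_pairwise_corr` are illustrative assumptions.

```python
import numpy as np

def mean_pairwise_corr(x):
    """Mean Pearson correlation over all distinct pairs of columns of x."""
    r = np.corrcoef(x, rowvar=False)
    d = r.shape[0]
    # Subtract the diagonal (all ones) and average the off-diagonal entries.
    return (r.sum() - np.trace(r)) / (d * (d - 1))

rng = np.random.default_rng(0)
n_samples, n_parts, total = 5000, 4, 100  # illustrative values, not from the article

# Independent counts: row totals vary freely, so correlations center on 0.
free = rng.poisson(lam=total / n_parts, size=(n_samples, n_parts))

# Multinomial counts: every row sums to `total`, which closes the space.
closed = rng.multinomial(total, [1.0 / n_parts] * n_parts, size=n_samples)

free_mean = mean_pairwise_corr(free)
closed_mean = mean_pairwise_corr(closed)
expected_shift = -1.0 / (n_parts - 1)  # theoretical value for equal probabilities

print(f"independent counts: {free_mean:+.3f}")
print(f"multinomial counts: {closed_mean:+.3f} (theory {expected_shift:+.3f})")
```

With four equally probable parts the theoretical correlation is -1/3, so a correlation of zero between two parts of a composition is not the natural reference point for a significance test.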


compositional data; mathematical expectation shift; loss of degrees of freedom; hybrid model


Computer Science and Mathematics, Probability and Statistics
