Submitted: 05 March 2024
Posted: 06 March 2024
Abstract
Keywords:
1. Introduction
1.1. Related Work
1.2. Contributions
- We propose a representation of distinct uncertainty and distinct information, which is used to demonstrate the unexpected behavior of the measure by Williams and Beer [1] (Section 2.2 and Section 3).
- We propose a decomposition for any f-information on both the redundancy lattice (Figure 1b) and synergy lattice (Figure 1c) that satisfies an inclusion-exclusion relation and provides a meaningful operational interpretation (Section 3.2).
- We prove that the proposed decomposition satisfies the original axioms of Williams and Beer [1] and guarantees non-negative partial information (Theorem 3).
- We propose to transform the non-negative decomposition of one information measure into another. This transformation maintains the non-negativity and its inclusion-exclusion relation under a re-definition of information addition (Section 3.3).
- We demonstrate the transformation of an f-information decomposition into a decomposition for Rényi- and Bhattacharyya-information (Section 3.3).
- We demonstrate that the proposed decomposition inherits different properties from different information measures and analyze the behavior of total variation in more detail (Section 4).
- We demonstrate the analysis of partial information flows through Markov chains (Figure 1) for each information measure on both the redundancy and synergy lattice (Section 4.2).
2. Background
2.1. Blackwell and Zonogon Order
2.2. Partial Information Decomposition
- A source is a non-empty set of visible variables.
- An atom is a set of sources constructed by Equation 8.
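The construction can be sketched by brute force; a minimal enumeration, assuming (as in Williams and Beer [1]) that the atoms of Equation 8 are the sets of sources in which no source is a subset of another:

```python
from itertools import combinations

def powerset(xs):
    """All subsets of xs as frozensets."""
    xs = list(xs)
    return [frozenset(c) for r in range(len(xs) + 1)
            for c in combinations(xs, r)]

def atoms(variables):
    """Antichain construction: an atom is a non-empty set of sources
    (non-empty subsets of the visible variables) in which no source is
    a proper subset of another source."""
    sources = [s for s in powerset(variables) if s]           # non-empty sources
    candidates = [frozenset(a) for a in powerset(sources) if a]
    return [a for a in candidates
            if not any(s1 < s2 for s1 in a for s2 in a)]      # antichain check

print(len(atoms([1, 2])))   # 4 atoms for two visible variables
```

For two variables this yields the four familiar atoms of the Williams–Beer lattice ({1}{2}, {1}, {2}, {12}); for three variables it yields 18.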
2.3. Information Measures
- f is convex,
- f(1) = 0,
- f(x) is finite for all x > 0.
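For concreteness, a minimal sketch of evaluating an f-divergence from a generator satisfying these conditions; the KL and total-variation generators below are standard choices and stand in for whichever generators Definition 15 admits:

```python
import math

def f_divergence(f, p, q):
    """D_f(p || q) = sum_i q_i * f(p_i / q_i) for discrete distributions.
    Requires f convex with f(1) = 0; terms with q_i == 0 would need the
    generator's limiting behaviour and are omitted in this sketch."""
    total = 0.0
    for pi, qi in zip(p, q):
        if qi > 0:
            total += qi * f(pi / qi)
    return total

kl = lambda t: t * math.log(t) if t > 0 else 0.0   # KL generator, f(1) = 0
tv = lambda t: 0.5 * abs(t - 1)                    # total variation generator

p, q = [0.5, 0.5], [0.9, 0.1]
print(f_divergence(kl, p, p))   # 0.0 for identical distributions
print(f_divergence(tv, p, q))   # total variation distance
```

Every generator meeting the three conditions induces a non-negative divergence that vanishes when both distributions coincide, which is the behaviour the decomposition relies on.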
3. Decomposition Methodology
- In a similar manner to how Finn and Lizier [11] used probability mass exclusion to differentiate distinct information, we use the achievable decision regions for each state of a target variable to differentiate distinct information.
- We propose applying the lattice re-graduation concepts discussed by Knuth [17] to PIDs in order to transform the decomposition of one information measure into that of another while maintaining its consistency.
3.1. Representing f-Information
- We define a function as shown in Equation 24a to quantify a vector, where .
- We define a target pointwise f-information function , as shown in Equation 24b, to quantify half the zonogon perimeter for the corresponding pointwise channel .
- The convexity of in is shown separately in Lemma A1 of Appendix A.
- That scales linearly in can directly be seen from Equation 24a.
- The triangle inequality of in is shown separately in Corollary A1 of Appendix A.
- A vector of slope one is quantified to zero, since f(1) = 0 is a requirement on the generator function of an f-divergence (Definition 15).
- The zero vector is quantified to zero by the convention of generator functions for an f-divergence (Definition 15).
- That the function maintains the ordering relation of the Blackwell order on binary input channels is shown separately in Lemma A2 of Appendix A (Equation 25a).
- The bottom element consists of a single vector of slope one, which is quantified to zero by Theorem 1 (Equation 25b). The combination with Equation 25a ensures the non-negativity.
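The properties listed above are characteristic of the perspective of f. Assuming Equation 24a quantifies a vector v = (v1, v2) via the perspective g(v) = v2 · f(v1/v2) — an assumption, since the equation itself is not reproduced in this extract — the listed properties can be checked numerically:

```python
import math

def perspective(f, v):
    """g(v1, v2) = v2 * f(v1 / v2): positively homogeneous and convex,
    hence subadditive (triangle inequality). Hypothetical stand-in for
    Equation 24a."""
    v1, v2 = v
    if v2 == 0:
        return 0.0  # convention for the degenerate vector
    return v2 * f(v1 / v2)

kl = lambda t: t * math.log(t) if t > 0 else 0.0

a, b = (0.3, 0.6), (0.5, 0.2)
s = (a[0] + b[0], a[1] + b[1])
print(perspective(kl, (0.4, 0.4)))   # 0.0: slope one, since f(1) = 0
print(math.isclose(perspective(kl, (0.6, 1.2)),
                   2 * perspective(kl, (0.3, 0.6))))        # True: linear scaling
print(perspective(kl, s) <= perspective(kl, a) + perspective(kl, b))  # True
```

The triangle inequality follows because the perspective of a convex function is convex and positively homogeneous, hence subadditive.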
3.2. Decomposing f-Information
- Axiom 1: The measure (Equation 29a) is invariant to permuting the order of sources in , since the join operator of the zonogon order is. Therefore, also satisfies Axiom 1.
- Axiom 2: The monotonicity of both and on the synergy lattice is shown separately as Corollary A2 in Appendix B.
- Non-negativity: The non-negativity of and is shown separately as Lemma A7 in Appendix B.
3.3. Decomposing Rényi-Information
- Axiom 1: is invariant to permuting the order of sources, since satisfies Axiom 1 (see Section 3.2).
- Axiom 2: satisfies monotonicity, since satisfies Axiom 2 (see Section 3.2) and the transformation function is monotonically increasing for .
- Axiom 3*: Since satisfies Axiom 3* (see Section 3.2, Equations 33 and 35), satisfies the self-redundancy axiom by definition, albeit with respect to a transformed operator: .
- Non-negativity: The decomposition of is non-negative, since is non-negative (see Section 3.2), the Möbius inverse is computed with the transformed operators (Equation 36b), and the function satisfies .
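The re-definition of information addition can be sketched abstractly: a strictly increasing transformation t induces an operator a ⊕ b = t(t⁻¹(a) + t⁻¹(b)) under which t maps ordinary sums to ⊕-sums, which is the consistency the transformed Möbius inverse relies on. The transformation below is a generic placeholder, not the paper's Rényi transform:

```python
import math

def regraduated_add(t, t_inv, a, b):
    """Knuth-style re-graduation: transport ordinary addition through a
    strictly increasing map t, so that t(x + y) equals t(x) (+) t(y)."""
    return t(t_inv(a) + t_inv(b))

# Generic placeholder transformation (NOT the paper's concrete transform):
t     = lambda x: math.log1p(x)   # strictly increasing on x > -1
t_inv = lambda y: math.expm1(y)

x, y = 0.7, 1.3
lhs = t(x + y)
rhs = regraduated_add(t, t_inv, t(x), t(y))
print(math.isclose(lhs, rhs))   # True: t is additive w.r.t. the new operator
```

Because t is strictly increasing, the transported operator inherits commutativity and associativity from ordinary addition, and non-negative summands stay non-negative.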
4. Evaluation
4.1. Partial Information Decomposition
4.1.1. Comparison of Different f-Information Measures
4.1.2. The Special Case of Total Variation
- a)
- b) For a non-empty set of pointwise channels , pointwise total variation quantifies the join element to the maximum of its individual channels (Equation 39b).
- c) The loss measure quantifies the meet for a set of sources on the synergy lattice to their minimum (Equation 39c).
4.2. Information Flow Analysis

5. Discussion
6. Conclusions
Author Contributions
Funding
Conflicts of Interest
Appendix A. Quantifying Zonogon Perimeters
Appendix B. The Non-Negativity of Partial f-Information
Appendix B.1. Properties of the Loss Measure on the Synergy Lattice
- If , then the implication holds for any since the bottom element is inferior (⊑) to any other channel.
- If , then is also a non-empty set since .
Appendix B.2. Mapping Subsets of Even and Odd Cardinality
- a) For any subset with even cardinality, the function returns a subset of function :
- b) The function which satisfies Equation A8 has an inverse on its first argument .

- At the base case , the sets of subsets are and . We define the function for any to satisfy both required properties:
- For the induction step, we show the definition of a function that satisfies both required properties. For sets , the subsets of even and odd cardinality can be expanded as shown in Equation A10. We define for and at any as shown in Equation A11, using the function and its inverse from the induction hypothesis. The function is defined for any subset in , as can be seen from Equation A10. Figure A1 provides an intuition for the definition of : the outcome of determines whether the function maintains or breaks the mapping of . The function F as defined in Equation A11 satisfies both requirements (Equations A8 and A9) for any :
- a) To demonstrate that the function satisfies the subset relation of Equation A8, we analyze the four cases for the return value of as defined in Equation A11 individually:
- holds, since the function always returns a subset of its input (Equation A7).
- holds by the induction hypothesis.
- if then : Since the input to function is not the empty set, the function returns a singleton subset of its input (Equation A7). If the element in the singleton subset is unequal to q, then it is a subset of .
- if then holds trivially.
- b) To demonstrate that the function has an inverse (Equation A9), we show that it is a bijection from to . Since the function is defined for all elements in and both sets have the same cardinality (, Equation A6), it suffices to show that the function is injective. The return value of has four cases, two of which return a set containing q (cases 1 and 4 in Equation A11), while the other two do not (cases 2 and 3 in Equation A11). It therefore remains to show that the return values within each pair cannot coincide for any input:
- Cases 2 and 3 in Equation A11: If the return values of both cases were equal, then and therefore . This leads to a contradiction, since the condition of case 3 ensures , while the condition of case 2 ensures . Hence, the return values of cases 2 and 3 are distinct.
- Cases 1 and 4 in Equation A11: If the return values of both cases were equal, then and therefore . This leads to a contradiction, since the condition of case 4 ensures , while the condition of case 1 ensures . Hence, the return values of cases 1 and 4 are distinct.
Since the function is a bijection, there exists an inverse .
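The function constructed above additionally satisfies the subset relation of Equation A8. If only the even/odd correspondence is needed, the standard parity-toggling map A ↦ A △ {q} for a fixed element q is already a bijection between subsets of even and odd cardinality; a sketch of that simpler map, for intuition only:

```python
from itertools import combinations

def subsets_by_parity(s):
    """Split all subsets of s into even- and odd-cardinality families."""
    s = list(s)
    even, odd = [], []
    for r in range(len(s) + 1):
        for c in combinations(s, r):
            (even if r % 2 == 0 else odd).append(frozenset(c))
    return even, odd

def toggle(a, q):
    """A -> A symmetric-difference {q}: flips membership of q, hence flips
    the parity of |A|, and is its own inverse (a bijection)."""
    return a ^ frozenset({q})

even, odd = subsets_by_parity({1, 2, 3})
images = sorted(map(sorted, (toggle(a, 1) for a in even)))
print(images == sorted(map(sorted, odd)))   # True: even subsets map onto odd ones
```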
Appendix B.3. The Non-Negativity of the Decomposition
Let , then its cover set is non-empty (). Additionally, we know that no atom in the cover set is the empty set (), since the empty atom is the top element (). Since it will be required later, note that the inclusion-exclusion principle applied to a constant returns the constant itself, as shown in Equation A16, since without the empty set there exists one more subset of odd cardinality than of even cardinality (see Equation A6). We can re-write the Möbius inverse as shown in Equation A17. Consider the non-empty set of channels ; then we obtain Equation A18b from Lemma A6. We can construct an upper bound on based on the cover set, as shown in Equation A19. By transitivity of Equations A18b and A19d, we obtain Equation A20. By Equations A17 and A20, we obtain the non-negativity of pointwise partial information, as shown in Equation A21.
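The identity of Equation A16 — the alternating inclusion-exclusion sum of a constant over the non-empty subsets of a finite set returns the constant itself — is easy to confirm numerically:

```python
from itertools import combinations

def alternating_constant_sum(n, c):
    """Sum of (-1)^(|T|+1) * c over all non-empty subsets T of an n-set.
    Equals c, since the non-empty subsets contain exactly one more
    odd-cardinality set than even-cardinality sets."""
    total = 0.0
    for r in range(1, n + 1):
        for _ in combinations(range(n), r):
            total += c if r % 2 == 1 else -c
    return total

for n in (1, 2, 3, 4, 5):
    print(alternating_constant_sum(n, 2.5))   # always 2.5
```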
Appendix C. Scaling f-Information Does Not Affect Its Transformation
Appendix D. Decomposition Example Distributions

Appendix E. The Relation of Total Variation to the Zonogon Height

represents the slope of vector




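The relation between total variation and the zonogon height can be checked numerically. Assuming the zonogon of a binary-input channel is the set of sums of sub-collections of its segment vectors (p_i, q_i) — the standard zonotope description — the maximal vertical gap above the diagonal equals the sum of the positive parts max(q_i − p_i, 0), which is exactly the channel's total variation:

```python
from itertools import combinations

def max_vertical_gap(segments):
    """Zonogon points include all sums of sub-collections of the segment
    vectors (p_i, q_i); the maximal height above the diagonal y = x is
    the largest value of (y - x) over those sums."""
    best = 0.0
    for r in range(len(segments) + 1):
        for sub in combinations(segments, r):
            x = sum(p for p, q in sub)
            y = sum(q for p, q in sub)
            best = max(best, y - x)
    return best

def total_variation(segments):
    """TV between the two conditional output distributions."""
    return 0.5 * sum(abs(q - p) for p, q in segments)

# A binary-input channel written as likelihood segments (p_i, q_i):
ch = [(0.5, 0.1), (0.3, 0.3), (0.2, 0.6)]
print(round(max_vertical_gap(ch), 10), round(total_variation(ch), 10))
```

The equality holds because both column sums are one, so the positive and negative parts of (q_i − p_i) balance and each equals half the absolute sum.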
corresponds to the convex hull of all individual zonogons (see Equation 7). The maximal height that the convex hull reaches above the diagonal is equal to the maximum of the maximal heights that the individual zonogons reach. Since pointwise total variation is a linear scaling of the (Euclidean) zonogon height above the diagonal (Lemma 2 a) shown above), the join element is evaluated to the maximum of its individual channels. □

Appendix F. Information Flow Example Parameters and Visualization


References
- Williams, P.L.; Beer, R.D. Nonnegative Decomposition of Multivariate Information. arXiv 2010, arXiv:1004.2515. [Google Scholar]
- Lizier, J.T.; Bertschinger, N.; Jost, J.; Wibral, M. Information Decomposition of Target Effects from Multi-Source Interactions: Perspectives on Previous, Current and Future Work. Entropy 2018, 20. [Google Scholar] [CrossRef]
- Griffith, V.; Chong, E.K.P.; James, R.G.; Ellison, C.J.; Crutchfield, J.P. Intersection Information Based on Common Randomness. Entropy 2014, 16, 1985–2000. [Google Scholar] [CrossRef]
- Bertschinger, N.; Rauh, J.; Olbrich, E.; Jost, J. Shared Information—New Insights and Problems in Decomposing Information in Complex Systems. In Proceedings of the European Conference on Complex Systems 2012; 2013; pp. 251–269. [Google Scholar]
- Harder, M.; Salge, C.; Polani, D. Bivariate measure of redundant information. Phys. Rev. E 2013, 87, 012130. [Google Scholar] [CrossRef]
- Finn, C. A New Framework for Decomposing Multivariate Information. PhD thesis, University of Sydney, 2019.
- Polyanskiy, Y.; Wu, Y. Information theory: From coding to learning. Book draft 2022.
- Mironov, I. Renyi Differential Privacy. In Proceedings of the 2017 IEEE 30th Computer Security Foundations Symposium (CSF); 2017; pp. 263–275. [Google Scholar] [CrossRef]
- Bertschinger, N.; Rauh, J.; Olbrich, E.; Jost, J.; Ay, N. Quantifying Unique Information. Entropy 2014, 16, 2161–2183. [Google Scholar] [CrossRef]
- Griffith, V.; Koch, C. Quantifying Synergistic Mutual Information. In Guided Self-Organization: Inception; Springer: Berlin/Heidelberg, 2014; pp. 159–190. [Google Scholar] [CrossRef]
- Finn, C.; Lizier, J.T. Pointwise Partial Information Decomposition Using the Specificity and Ambiguity Lattices. Entropy 2018, 20. [Google Scholar] [CrossRef]
- Ince, R.A.A. Measuring Multivariate Redundant Information with Pointwise Common Change in Surprisal. Entropy 2017, 19. [Google Scholar] [CrossRef]
- Rosas, F.E.; Mediano, P.A.M.; Rassouli, B.; Barrett, A.B. An operational information decomposition via synergistic disclosure. Journal of Physics A: Mathematical and Theoretical 2020, 53, 485001. [Google Scholar] [CrossRef]
- Kolchinsky, A. A Novel Approach to the Partial Information Decomposition. Entropy 2022, 24. [Google Scholar] [CrossRef]
- Bertschinger, N.; Rauh, J. The Blackwell relation defines no lattice. In Proceedings of the 2014 IEEE International Symposium on Information Theory; 2014; pp. 2479–2483. [Google Scholar] [CrossRef]
- Lizier, J.T.; Flecker, B.; Williams, P.L. Towards a synergy-based approach to measuring information modification. In Proceedings of the 2013 IEEE Symposium on Artificial Life (ALife); 2013; pp. 43–51. [Google Scholar] [CrossRef]
- Knuth, K.H. Lattices and Their Consistent Quantification. Annalen der Physik 2019, 531, 1700370. [Google Scholar] [CrossRef]
- Mages, T.; Rohner, C. Decomposing and Tracing Mutual Information by Quantifying Reachable Decision Regions. Entropy 2023, 25. [Google Scholar] [CrossRef]
- Blackwell, D. Equivalent Comparisons of Experiments. The Annals of Mathematical Statistics 1953, 265–272. [Google Scholar]
- Neyman, J.; Pearson, E.S. IX. On the problem of the most efficient tests of statistical hypotheses. Philosophical Transactions of the Royal Society of London. Series A, Containing Papers of a Mathematical or Physical Character 1933, 231, 289–337. [Google Scholar]
- Chicharro, D.; Panzeri, S. Synergy and Redundancy in Dual Decompositions of Mutual Information Gain and Information Loss. Entropy 2017, 19. [Google Scholar] [CrossRef]
- Csiszar, I. On information-type measure of difference of probability distributions and indirect observations. Studia Sci. Math. Hungar. 1967, 2, 299–318. [Google Scholar]
- Renyi, A. On measures of entropy and information. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Contributions to the Theory of Statistics; 1961; Volume 4, pp. 547–562. [Google Scholar]
- Sason, I.; Verdu, S. f-Divergence Inequalities. IEEE Transactions on Information Theory 2016, 62, 5973–6006. [Google Scholar] [CrossRef]
- Kailath, T. The divergence and Bhattacharyya distance measures in signal selection. IEEE transactions on communication technology 1967, 15, 52–60. [Google Scholar] [CrossRef]
- Arikan, E. Channel Polarization: A Method for Constructing Capacity-Achieving Codes for Symmetric Binary-Input Memoryless Channels. IEEE Transactions on Information Theory 2009, 55, 3051–3073. [Google Scholar] [CrossRef]
- Bhattacharyya, A. On a measure of divergence between two statistical populations defined by their probability distribution. Bulletin of the Calcutta Mathematical Society 1943, 35, 99–110. [Google Scholar]
- Mages, T.; Anastasiadi, E.; Rohner, C. Implementation: PID Blackwell specific information. https://github.com/uucore/pid-blackwell-specific-information, 2024.






| f-divergence |
| --- |
| KL divergence |
| TV |
| -divergence |
| Squared Hellinger distance |
| Le Cam distance |
| Jensen-Shannon divergence |
| Hellinger divergence with |
| -divergence with |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

