Submitted: 13 June 2026
Posted: 15 June 2026
Abstract

Keywords:
1. Introduction
- 1) Analyze the error in the free energy principle (FEP) and its mathematical cause.
- 2) Clarify the equivalence between the minimum free energy criterion and the maximum information efficiency criterion, in order to promote the use of the latter.
2. Two Expected VFEs and the Mathematical Cause of the above Error
- Friston also confirmed in his correspondence with the author that the expected VFE is F1.
- In his reply, Friston agreed with the author's interpretation of H(x) – F1 as semantic mutual information, which implies that the expected VFE he uses is close to Hθ(x|z).
- In machine learning, the maximum likelihood criterion is used to optimize the model parameter θ;
- In Active Inference, minimizing F1 is equivalent to minimizing average error.
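The relation between F1, H(x) – F1, and Shannon mutual information in the points above can be illustrated numerically. The following is a minimal sketch, not the author's code: it assumes F1 is read as the cross-entropy Hθ(x|z) = –Σ P(x,z) log Pθ(x|z) (the text only says the expected VFE is "close to" Hθ(x|z)), and the toy joint distribution, the two model tables, and all function names are hypothetical.

```python
import math

def entropy(p):
    """Shannon entropy in bits of a probability vector."""
    return -sum(v * math.log2(v) for v in p if v > 0)

def info_measures(joint, model):
    """joint[j][i] = P(x_i, z_j); model[j][i] = P_theta(x_i | z_j).

    Returns (R, G, F1) in bits, where
      R  = H(x) - H(x|z)        Shannon mutual information,
      F1 = H_theta(x|z)         cross-entropy (ASSUMED reading of F1),
      G  = H(x) - F1            semantic mutual information.
    """
    n_x = len(joint[0])
    p_x = [sum(row[i] for row in joint) for i in range(n_x)]
    p_z = [sum(row) for row in joint]
    h_x = entropy(p_x)
    h_x_given_z = -sum(
        p * math.log2(p / p_z[j])
        for j, row in enumerate(joint) for p in row if p > 0
    )
    f1 = -sum(
        p * math.log2(model[j][i])
        for j, row in enumerate(joint) for i, p in enumerate(row) if p > 0
    )
    return h_x - h_x_given_z, h_x - f1, f1

# Hypothetical toy data: two latent states z, three observations x.
joint = [[0.30, 0.15, 0.05],
         [0.05, 0.15, 0.30]]
exact_model = [[0.6, 0.3, 0.1],      # equals the true P(x|z)
               [0.1, 0.3, 0.6]]
mismatched_model = [[0.5, 0.3, 0.2],  # deliberately off
                    [0.2, 0.3, 0.5]]

r_exact, g_exact, f1_exact = info_measures(joint, exact_model)
r_mis, g_mis, f1_mis = info_measures(joint, mismatched_model)

print(f"exact model:      R={r_exact:.3f}  G={g_exact:.3f}  G/R={g_exact / r_exact:.2f}")
print(f"mismatched model: R={r_mis:.3f}  G={g_mis:.3f}  G/R={g_mis / r_mis:.2f}")
```

Under this reading, minimizing F1 over θ with P(x,z) fixed is the same as maximizing G, and G can reach R only when Pθ(x|z) matches P(x|z); otherwise the ratio G/R falls below 1.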
3. Equivalence Relation and Differences Between VB and SVB
4. Experimental Results
4.1. Performances of q(z|x) and F1 Under Different Tasks
4.2. F2 Decreases While F1 Increases During a Mixture Model’s Convergence
5. Discussion
6. Conclusions




| Task | Requirements | R (bits) | G (bits) | F1 (bits) | G/R |
|---|---|---|---|---|---|
| Mixture model, minimizing F2 | s=1, R=G, Hθ(x)≈H(x)=6.38 | 0.621=G | 0.621 | 5.76 | 1.0 |
| Classification, minimizing F1 | s→∞, Hθ(x)≈H(x) | 0.954=H(z) | 0.80 | 5.58 | 0.84 |
| Active Inference, minimizing F1 | Ha(x)≠H(x), H(X\|Z)=0 | 6.38=H(x) | 1.62 | 4.76 | 0.25 |
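The columns of this table appear to be tied together by F1 = H(x) – G, with G/R as the information efficiency. The short check below recomputes both from the reported values. It is a sketch under one assumption: that H(x) = 6.38 bits applies to all three tasks (the table states this value for the mixture model and Active Inference rows only). Variable names are illustrative.

```python
# Values copied from the table; H(x) = 6.38 bits is ASSUMED for every row.
H_X = 6.38

rows = [
    # (task, R, G, reported F1, reported G/R)
    ("Mixture model, minimizing F2",    0.621, 0.621, 5.76, 1.0),
    ("Classification, minimizing F1",   0.954, 0.80,  5.58, 0.84),
    ("Active Inference, minimizing F1", 6.38,  1.62,  4.76, 0.25),
]

# Half a unit in the last reported digit, plus float slack.
TOL = 0.005 + 1e-9

checks = []
for task, r, g, f1_reported, ratio_reported in rows:
    f1_recomputed = H_X - g      # F1 = H(x) - G
    ratio_recomputed = g / r     # information efficiency G/R
    f1_ok = abs(f1_recomputed - f1_reported) <= TOL
    ratio_ok = abs(ratio_recomputed - ratio_reported) <= TOL
    checks.append((task, f1_ok, ratio_ok))
    print(f"{task}: F1={f1_recomputed:.3f} (ok={f1_ok}), "
          f"G/R={ratio_recomputed:.3f} (ok={ratio_ok})")
```

Read this way, the Active Inference row has the lowest F1 but also the lowest efficiency, because its R equals the full H(x).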
| Component | c* (true) | σ* (true) | P*(Y) (true) | c (initial) | σ (initial) | P(Y) (initial) |
|---|---|---|---|---|---|---|
| y1 | 40 | 15 | 0.5 | 40 | 5 | 0.5 |
| y2 | 75 | 15 | 0.5 | 40 | 5 | 0.5 |
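To give a feel for how far the initial parameters are from the true model, the sketch below discretizes both Gaussian mixtures and compares the entropy H(x) of the true mixture with the cross-entropy Hθ(x) under the initial parameters. This is an illustration only, not the author's experiment: the integer grid from -60 to 180, the renormalization after truncation, and all names are assumptions.

```python
import math

def gaussian_mixture(xs, components):
    """Discretized mixture, renormalized on the grid.

    components = [(center, sigma, weight), ...]
    """
    dens = [
        sum(w * math.exp(-0.5 * ((x - c) / s) ** 2) / (s * math.sqrt(2 * math.pi))
            for c, s, w in components)
        for x in xs
    ]
    total = sum(dens)
    return [d / total for d in dens]

# ASSUMED grid; wide enough that truncation of the true mixture is negligible.
xs = range(-60, 181)

# Parameters from the table: (c, sigma, P(Y)) for y1 and y2.
true_p = gaussian_mixture(xs, [(40, 15, 0.5), (75, 15, 0.5)])
init_p = gaussian_mixture(xs, [(40, 5, 0.5), (40, 5, 0.5)])

# H(x): entropy of the true mixture, in bits.
h_true = -sum(p * math.log2(p) for p in true_p if p > 0)
# H_theta(x): cross-entropy of the initial model against the true mixture.
h_cross = -sum(p * math.log2(q) for p, q in zip(true_p, init_p) if p > 0)
# The gap is the KL divergence, which is zero only when the model matches.
kl_bits = h_cross - h_true

print(f"H(x) = {h_true:.2f} bits, H_theta(x) = {h_cross:.2f} bits, "
      f"KL = {kl_bits:.2f} bits")
```

A condition such as Hθ(x)≈H(x) corresponds to this gap shrinking toward zero as the parameters converge.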
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).