Submitted: 24 July 2025
Posted: 25 July 2025
Abstract
Keywords:
1. Introduction
1.1. Novelty and Contributions
1.2. Applications
2. Preliminaries
3. Characterization of in (1)
4. Characterizations of in (2)
When the reference measure Q is the Lebesgue measure
When the reference measure Q is the counting measure
5. Characterizations of in (6)
- The probability measures and are both absolutely continuous with respect to a given σ-finite measure Q; and
- The probability measures and are mutually absolutely continuous.
6. Final Remarks
Author Contributions
Funding
Conflicts of Interest
References
- Gama, J.; Medas, P.; Castillo, G.; Rodrigues, P. Learning with drift detection. In Proceedings of the 17th Brazilian Symposium on Artificial Intelligence, São Luís, Maranhão, Brazil, Oct. 2004; pp. 286–295.
- Webb, G.I.; Lee, L.K.; Goethals, B.; Petitjean, F. Analyzing concept drift and shift from sample data. Data Mining and Knowledge Discovery 2018, 32, 1179–1199.
- Oliveira, G.H.F.M.; Minku, L.L.; Oliveira, A.L. Tackling virtual and real concept drifts: An adaptive Gaussian mixture model approach. IEEE Transactions on Knowledge and Data Engineering 2021, 35, 2048–2060.
- Shannon, C.E. A mathematical theory of communication. The Bell System Technical Journal 1948, 27, 379–423.
- Shannon, C.E. A mathematical theory of communication. The Bell System Technical Journal 1948, 27, 623–656.
- Palomar, D.P.; Verdú, S. Lautum information. IEEE Transactions on Information Theory 2008, 54, 964–975.
- Perlaza, S.M.; Esnaola, I.; Bisson, G.; Poor, H.V. On the Validation of Gibbs Algorithms: Training Datasets, Test Datasets and their Aggregation. In Proceedings of the IEEE International Symposium on Information Theory (ISIT), Taipei, Taiwan, Jun. 2023.
- Perlaza, S.M.; Bisson, G.; Esnaola, I.; Jean-Marie, A.; Rini, S. Empirical Risk Minimization with Relative Entropy Regularization. IEEE Transactions on Information Theory 2024, 70, 5122–5161.
- Zou, X.; Perlaza, S.M.; Esnaola, I.; Altman, E. Generalization Analysis of Machine Learning Algorithms via the Worst-Case Data-Generating Probability Measure. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, Canada, Feb.
- Zou, X.; Perlaza, S.M.; Esnaola, I.; Altman, E.; Poor, H.V. The Worst-Case Data-Generating Probability Measure in Statistical Learning. IEEE Journal on Selected Areas in Information Theory 2024, 5, 175–189.
- Perlaza, S.M.; Zou, X. The Generalization Error of Machine Learning Algorithms. arXiv 2024.
- Chentsov, N.N. Nonsymmetrical distance between probability distributions, entropy and the theorem of Pythagoras. Mathematical Notes of the Academy of Sciences of the USSR 1968, 4, 686–691.
- Csiszár, I.; Matúš, F. Information projections revisited. IEEE Transactions on Information Theory 2003, 49, 1474–1490.
- Müller, A. Integral probability metrics and their generating classes of functions. Advances in Applied Probability 1997, 29, 429–443.
- Zolotarev, V.M. Probability metrics. Teoriya Veroyatnostei i ee Primeneniya 1983, 28, 264–287.
- Gretton, A.; Borgwardt, K.M.; Rasch, M.J.; Schölkopf, B.; Smola, A. A Kernel Two-Sample Test. Journal of Machine Learning Research 2012, 13, 723–773.
- Villani, C. Optimal Transport: Old and New, first ed.; Springer: Berlin, Germany, 2009.
- Liu, W.; Yu, G.; Wang, L.; Liao, R. An Information-Theoretic Framework for Out-of-Distribution Generalization with Applications to Stochastic Gradient Langevin Dynamics. arXiv 2024, arXiv:2403.19895.
- Liu, W.; Yu, G.; Wang, L.; Liao, R. An Information-Theoretic Framework for Out-of-Distribution Generalization. In Proceedings of the IEEE International Symposium on Information Theory (ISIT), Athens, Greece, Jul. 2024; pp. 2670–2675.
- Agrawal, R.; Horel, T. Optimal Bounds between f-Divergences and Integral Probability Metrics. Journal of Machine Learning Research 2021, 22, 1–59.
- Rahimian, H.; Mehrotra, S. Frameworks and results in distributionally robust optimization. Open Journal of Mathematical Optimization 2022, 3, 1–85.
- Xu, C.; Lee, J.; Cheng, X.; Xie, Y. Flow-based distributionally robust optimization. IEEE Journal on Selected Areas in Information Theory 2024, 5, 62–77.
- Hu, Z.; Hong, L.J. Kullback–Leibler divergence constrained distributionally robust optimization. Optimization Online 2013, 1, 9.
- Radon, J. Theorie und Anwendungen der absolut additiven Mengenfunktionen, first ed.; Hölder: Vienna, Austria, 1913.
- Nikodym, O. Sur une généralisation des intégrales de M. J. Radon. Fundamenta Mathematicae 1930, 15, 131–179.
- Aminian, G.; Bu, Y.; Toni, L.; Rodrigues, M.; Wornell, G. An Exact Characterization of the Generalization Error for the Gibbs Algorithm. Advances in Neural Information Processing Systems 2021, 34, 8106–8118.
- Perlaza, S.M.; Bisson, G.; Esnaola, I.; Jean-Marie, A.; Rini, S. Empirical Risk Minimization with Relative Entropy Regularization: Optimality and Sensitivity. In Proceedings of the IEEE International Symposium on Information Theory (ISIT), Espoo, Finland, Jul. 2022; pp. 684–689.
- Alquier, P.; Ridgway, J.; Chopin, N. On the properties of variational approximations of Gibbs posteriors. Journal of Machine Learning Research 2016, 17, 8374–8414.
- Bu, Y.; Aminian, G.; Toni, L.; Wornell, G.W.; Rodrigues, M. Characterizing and understanding the generalization error of transfer learning with Gibbs algorithm. In Proceedings of the 25th International Conference on Artificial Intelligence and Statistics (AISTATS), Virtual Conference, Mar. 2022; pp. 8673–8699.
- Raginsky, M.; Rakhlin, A.; Tsao, M.; Wu, Y.; Xu, A. Information-theoretic analysis of stability and bias of learning algorithms. In Proceedings of the IEEE Information Theory Workshop (ITW), Cambridge, UK, Sep. 2016; pp. 26–30.
- Zou, B.; Li, L.; Xu, Z. The Generalization Performance of ERM Algorithm with Strongly Mixing Observations. Machine Learning 2009, 75, 275–295.
- He, H.; Aminian, G.; Bu, Y.; Rodrigues, M.; Tan, V.Y. How Does Pseudo-Labeling Affect the Generalization Error of the Semi-Supervised Gibbs Algorithm? In Proceedings of the 26th International Conference on Artificial Intelligence and Statistics (AISTATS), Valencia, Spain, Apr. 2023; pp. 8494–8520.
- Hellström, F.; Durisi, G.; Guedj, B.; Raginsky, M. Generalization Bounds: Perspectives from Information Theory and PAC-Bayes. Foundations and Trends® in Machine Learning 2025, 18, 1–223.
- Jaynes, E.T. Information Theory and Statistical Mechanics I. Physical Review 1957, 106, 620–630.
- Jaynes, E.T. Information Theory and Statistical Mechanics II. Physical Review 1957, 108, 171–190.
- Kapur, J.N. Maximum Entropy Models in Science and Engineering, first ed.; Wiley: New York, NY, USA, 1989.
- Bermudez, Y.; Bisson, G.; Esnaola, I.; Perlaza, S.M. Proofs for Folklore Theorems on the Radon-Nikodym Derivative. Technical Report RR-9591, INRIA, Centre Inria d'Université Côte d'Azur, Sophia Antipolis, France, 2025.
- Donsker, M.D.; Varadhan, S.S. Asymptotic evaluation of certain Markov process expectations for large time, I. Communications on Pure and Applied Mathematics 1975, 28, 1–47.
- Heath, T.L. The Thirteen Books of Euclid's Elements, 2nd revised ed.; Dover Publications, Inc.: New York, NY, USA, 1956.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).