Submitted: 13 October 2025
Posted: 16 October 2025
Abstract
Keywords:
1. Mathematical Foundations
- Functional Approximation: Modeling complex, non-linear functions with neural networks. Functional approximation is a building block of deep learning and lies at the core of how neural networks tackle hard problems. In this context, functional approximation refers to the ability of neural networks to approximate complex, high-dimensional, non-linear functions that are difficult or impossible to model with standard mathematical methods.
- Optimization Theory: Efficiently solving non-convex optimization problems. Optimization theory is a key area of deep learning, since training a deep neural network amounts to finding the set of parameters (weights and biases) that minimizes a given objective, commonly referred to as the loss function. The objective is usually some measure of the discrepancy between the network outputs and the target values. Optimization methods govern the training process and determine the extent to which a neural network can learn from data.
- Statistical Learning Theory: Generalization behavior on unseen data. Statistical Learning Theory (SLT) provides the mathematical basis for the generalization behavior of machine learning methods, including deep learning. It offers insight into how models generalize from training data to novel data, which is essential for ensuring that deep learning models are not merely accurate on the training set but also perform well on new, unseen data. SLT addresses fundamental issues such as overfitting, the bias-variance tradeoff, and generalization error.
1.1. Problem Definition: Risk Functional as a Mapping Between Spaces
1.1.1. Measurable Function Spaces

1.1.1.1 Literature Review of Measurable Function Spaces
1.1.1.2 Analysis of Measurable Function Spaces
- $f$ belongs to a hypothesis space $\mathcal{H}$ of measurable functions $f: \mathcal{X} \to \mathcal{Y}$.
- $\mu$ is a Borel probability measure over $\mathcal{X} \times \mathcal{Y}$, satisfying $\mu(\mathcal{X} \times \mathcal{Y}) = 1$.
1.1.2. Risk as a Functional
1.1.2.1 Literature Review of Risk as a Functional
1.1.2.2 Analysis of Risk as a Functional
1.2. Approximation Spaces for Neural Networks
- VC-dimension theory for discrete hypotheses.
- Rademacher complexity for continuous spaces: $\hat{\mathcal{R}}_S(\mathcal{F}) = \mathbb{E}_{\sigma}\left[\sup_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^{n} \sigma_i f(x_i)\right]$, where $\sigma_1, \dots, \sigma_n$ are i.i.d. Rademacher random variables taking values in $\{-1, +1\}$ with equal probability.
1.2.1. VC-Dimension Theory for Discrete Hypotheses
1.2.1.1 Literature Review of VC-Dimension Theory for Discrete Hypotheses
1.2.1.2 Analysis of VC-Dimension Theory for Discrete Hypotheses
- Shattering Implies Non-empty Hypothesis Class: If a set S is shattered by H, then H is non-empty. This follows directly from the fact that for each labeling $y \in \{-1, +1\}^{|S|}$, there exists some $h \in H$ that produces the corresponding labeling. Therefore, H must contain at least one hypothesis.
- Upper Bound on Shattering: Given a hypothesis class H, let k be the size of the largest set that H can shatter; then no set of size greater than k is shattered by H. This gives us the crucial defining result that: $\mathrm{VC}(H) = \max\{|S| : S \text{ is shattered by } H\}$.
- Implication for Generalization: A central result in the theory of statistical learning is the connection between VC-dimension and the generalization error. Specifically, the VC-dimension bounds the ability of a hypothesis class to generalize to unseen data. The higher the VC-dimension, the more complex the hypothesis class, and the more likely it is to overfit the training data, leading to poor generalization.
- Example 1: Linear Classifiers in $\mathbb{R}^2$: Consider the hypothesis class H consisting of linear classifiers in $\mathbb{R}^2$. These classifiers are hyperplanes in two dimensions, defined by: $h(x) = \mathrm{sign}(w^{\top} x + b)$, where $w \in \mathbb{R}^2$ is the weight vector and $b \in \mathbb{R}$ is the bias term. The VC-dimension of linear classifiers in $\mathbb{R}^2$ is 3. This can be rigorously shown by noting that for any set of 3 points in general position in $\mathbb{R}^2$, the hypothesis class H can shatter these points: every possible binary labeling of the 3 points is achieved by some linear classifier. However, for 4 points in $\mathbb{R}^2$, it is impossible to realize all binary labelings (e.g., the XOR labeling of the four vertices of a convex quadrilateral), meaning the VC-dimension is 3.
- Example 2: Polynomial Classifiers of Degree d: Consider a polynomial hypothesis class in $\mathbb{R}^n$ of degree d. The hypothesis class H consists of classifiers of the form: $h(x) = \mathrm{sign}\left(\sum_{|\alpha| \le d} a_{\alpha} x^{\alpha}\right)$, where the $a_{\alpha}$ are coefficients and $x^{\alpha} = x_1^{\alpha_1} \cdots x_n^{\alpha_n}$. The VC-dimension of polynomial classifiers of degree d in $\mathbb{R}^n$ grows as $O(n^d)$ (it equals the number of monomials, $\binom{n+d}{d}$), implying that the complexity of the hypothesis class increases rapidly with both the degree d and the dimension n of the input space.
1.2.1.3 Python Code to Generate Figure 2 and Figure 3 Illustrating VC-Dimension Theory for Discrete Hypotheses
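As a minimal sketch of the kind of computation such a figure could be based on (assuming NumPy-free standard Python; the point sets and the search grid over classifiers are illustrative choices), the following brute-force check confirms that linear classifiers shatter 3 points but cannot realize the XOR labeling of 4 points:

```python
# Brute-force check of shattering by linear classifiers sign(w.x + b) in R^2.
from itertools import product

def realizable_labelings(points, weights, biases):
    """Collect every +/-1 labeling achieved by some classifier sign(w.x + b)."""
    seen = set()
    for w1, w2, b in product(weights, weights, biases):
        labels = tuple(1 if w1*x + w2*y + b > 0 else -1 for x, y in points)
        seen.add(labels)
    return seen

grid_w = [v / 2 for v in range(-8, 9)]      # weights in [-4, 4]
grid_b = [v / 2 for v in range(-8, 9)]      # biases in [-4, 4]

three = [(0, 0), (1, 0), (0, 1)]
labelings3 = realizable_labelings(three, grid_w, grid_b)
print(len(labelings3))                      # 8 = 2^3: the three points are shattered

four = [(0, 0), (1, 0), (0, 1), (1, 1)]
labelings4 = realizable_labelings(four, grid_w, grid_b)
print((1, -1, -1, 1) in labelings4)         # False: the XOR labeling is never realized
```

The impossibility of the XOR labeling can also be seen directly: the constraints $b > 0$, $w_1 + b < 0$, $w_2 + b < 0$, $w_1 + w_2 + b > 0$ are jointly inconsistent.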




1.2.2. Rademacher Complexity for Continuous Spaces
1.2.2.1 Literature Review of Rademacher Complexity for Continuous Spaces
1.2.2.2 Analysis of Rademacher Complexity for Continuous Spaces
1.2.2.3 Python Code to Generate Figure 4 Illustrating Rademacher Complexity vs. Bound
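A minimal sketch of the underlying computation (assuming NumPy; the sample size, dimension, and number of Monte Carlo draws are illustrative choices): for the class of linear functions with $\|w\| \le 1$, the supremum in the Rademacher complexity is attained in closed form, so the estimate can be compared against the structural bound $\sqrt{\sum_i \|x_i\|^2}/n$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 5
X = rng.normal(size=(n, d))          # fixed sample S = {x_1, ..., x_n}

# Empirical Rademacher complexity of {x -> <w, x> : ||w|| <= 1}:
# sup_w (1/n) sum_i sigma_i <w, x_i> = || (1/n) sum_i sigma_i x_i ||
num_draws = 2000
sigmas = rng.choice([-1.0, 1.0], size=(num_draws, n))
estimates = np.linalg.norm(sigmas @ X, axis=1) / n
rademacher = estimates.mean()

# Structural bound: R_S <= sqrt(sum_i ||x_i||^2) / n
bound = np.sqrt((X ** 2).sum()) / n
print(rademacher <= bound)           # True
```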



1.2.2.4 Python Code to Generate Figure 5 Illustrating Rademacher Complexity: Linear vs Quadratic Function Classes



1.2.3. Sobolev Embeddings
1.2.3.1 Literature Review of Sobolev Embeddings
1.2.3.2 Analysis of Sobolev Embeddings
- Semi-norm Dominance: The $W^{k,p}$-norm is controlled by the seminorm $|u|_{W^{k,p}} = \left(\sum_{|\alpha| = k} \|D^{\alpha} u\|_{L^p}^p\right)^{1/p}$, ensuring sensitivity to high-order derivatives.
- Poincaré Inequality: For Ω bounded, $u - u_{\Omega}$ satisfies: $\|u - u_{\Omega}\|_{L^p(\Omega)} \le C \|\nabla u\|_{L^p(\Omega)}$, where $u_{\Omega} = \frac{1}{|\Omega|} \int_{\Omega} u \, dx$ denotes the mean value of u over Ω.
- Sobolev Embedding Theorem: Let $\Omega \subset \mathbb{R}^n$ be a bounded domain with Lipschitz boundary. Then:
- If k > n/p, $W^{k,p}(\Omega) \hookrightarrow C^{m,\alpha}(\bar{\Omega})$ with m = ⌊k − n/p⌋ and α = k − n/p − m.
- If k = n/p, $W^{k,p}(\Omega) \hookrightarrow L^q(\Omega)$ for all q < ∞.
- If k < n/p, $W^{k,p}(\Omega) \hookrightarrow L^q(\Omega)$ where $\frac{1}{q} = \frac{1}{p} - \frac{k}{n}$.
- Rellich-Kondrachov Compactness Theorem: The embedding $W^{1,p}(\Omega) \hookrightarrow L^q(\Omega)$ is compact for $1 \le q < p^{*}$, where $\frac{1}{p^{*}} = \frac{1}{p} - \frac{1}{n}$. Compactness follows from:
- (a) Equicontinuity: $W^{1,p}$-boundedness ensures uniform control over oscillations.
- (b) Rellich’s Selection Principle: Strong convergence follows from uniform estimates and tightness.
1.2.3.3 Python Code to Generate Figure 6, Figure 7, Figure 8, and Figure 9 Illustrating Sobolev Embeddings







1.2.4. Rellich-Kondrachov Compactness Theorem
1.2.4.1 Literature Review of Rellich-Kondrachov Compactness Theorem
1.2.4.2 Analysis of Rellich-Kondrachov Compactness Theorem
- The sequence does not oscillate excessively at small scales.
- The sequence does not escape to infinity in a way that prevents strong convergence.
1.2.4.3 Python Code to Generate Figure 10 and Figure 11 Illustrating Rellich-Kondrachov Compactness Theorem



1.2.5. Fréchet-Kolmogorov Compactness Criterion
1.2.5.1 Literature Review of Fréchet-Kolmogorov Compactness Criterion
1.2.5.2 Analysis of Fréchet-Kolmogorov Compactness Criterion
1.2.5.3 Python Code to Generate Figure 12



1.2.6. Sobolev-Poincaré Inequality
1.2.6.1 Literature Review of Sobolev-Poincaré Inequality
1.2.6.2 Analysis of Sobolev-Poincaré Inequality
- Regularity of PDE Solutions: The Sobolev-Poincaré inequality is crucial in proving the existence and regularity of weak solutions to elliptic PDEs.
- Compactness and Rellich-Kondrachov Theorem: It plays a role in proving the compact embedding of $W^{1,p}(\Omega)$ into $L^q(\Omega)$, which is fundamental in functional analysis.
- Control of Function Oscillations: It quantifies how much a function can deviate from its mean, which is used in various areas of mathematical physics and geometry.
1.2.6.3 Python Code to Generate Figure 13 and Figure 14
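A minimal numerical sketch of the inequality on the interval (assuming NumPy; the test functions and grid resolution are illustrative choices): on $(0,1)$ the Poincaré–Wirtinger inequality reads $\|u - \bar{u}\|_{L^2} \le \frac{1}{\pi}\|u'\|_{L^2}$, with near-equality for the Neumann eigenfunction $\cos(\pi x)$.

```python
import numpy as np

# Numerical check of ||u - mean(u)||_{L^2} <= (1/pi) ||u'||_{L^2} on (0, 1).
x, dx = np.linspace(0.0, 1.0, 100001, retstep=True)

def l2_norm(v):
    # trapezoidal quadrature of the integral of v^2 dx
    w = v ** 2
    return np.sqrt((w.sum() - 0.5 * (w[0] + w[-1])) * dx)

def check(u):
    mean = (u.sum() - 0.5 * (u[0] + u[-1])) * dx   # |Omega| = 1
    return l2_norm(u - mean), l2_norm(np.gradient(u, dx)) / np.pi

lhs1, rhs1 = check(np.sin(2 * np.pi * x))
print(lhs1 <= rhs1)                    # True, with a factor-2 margin
lhs2, rhs2 = check(np.cos(np.pi * x))
print(abs(lhs2 - rhs2) < 1e-3)         # True: near equality at the extremal function
```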




2. Universal Approximation Theorem: Refined Proof
2.1. Literature Review of Universal Approximation Theorem
2.2. Approximation Using Convolution Operators
2.2.1. Python Code to Generate Figure 15 and Figure 16
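A minimal sketch of the convolution-operator mechanism (assuming NumPy; the target function, kernel bandwidths, and grid are illustrative choices): mollifying a continuous function with a narrowing normalized Gaussian kernel yields smooth approximants whose uniform error shrinks with the bandwidth.

```python
import numpy as np

# Mollification: convolve f with a discretized, normalized Gaussian kernel.
dx = 1e-3
x = np.arange(-1.0, 2.0, dx)
f = np.abs(x - 0.5)                       # continuous target with a kink

def mollify(sigma):
    half = int(5 * sigma / dx)
    t = np.arange(-half, half + 1) * dx   # symmetric, odd-length support
    kernel = np.exp(-t**2 / (2 * sigma**2))
    kernel /= kernel.sum()                # discrete normalization
    return np.convolve(f, kernel, mode="same")

interior = (x > 0) & (x < 1)              # stay away from boundary artifacts
errors = [np.abs(mollify(s) - f)[interior].max() for s in (0.2, 0.05, 0.01)]
print(errors[0] > errors[1] > errors[2])  # True: uniform convergence as sigma -> 0
```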




2.2.2. Stone-Weierstrass Application
2.2.2.1 Literature Review of Stone-Weierstrass Application
2.2.2.2 Analysis of Stone-Weierstrass Application
2.2.2.3 Python Code to Generate Figure 17, Figure 18 and Figure 19
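A minimal constructive sketch of the density statement behind Stone–Weierstrass (standard-library Python only; the target function and degrees are illustrative choices): Bernstein polynomials of a continuous function on [0, 1] converge uniformly, and the sup-norm error visibly decreases with the degree.

```python
import math

# Bernstein polynomial B_n(f)(x) = sum_k f(k/n) C(n,k) x^k (1-x)^(n-k)
def bernstein(f, n, x):
    return sum(f(k / n) * math.comb(n, k) * x**k * (1 - x)**(n - k)
               for k in range(n + 1))

f = lambda t: abs(t - 0.5)                    # continuous but not smooth
grid = [i / 200 for i in range(201)]

def sup_error(n):
    return max(abs(bernstein(f, n, t) - f(t)) for t in grid)

errs = [sup_error(n) for n in (10, 40, 160)]
print(errs[0] > errs[1] > errs[2])            # True: sup-norm error decreases with n
```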






2.3. Depth vs. Width: Capacity Analysis
2.3.1. Bounding the Expressive Power
2.3.1.1 Literature Review of Kolmogorov-Arnold Superposition Theorem
| Paper | Main Contribution | Impact |
|---|---|---|
| Kolmogorov (1957) | Original KST theorem | Laid foundation for function decomposition |
| Arnold (1963) | Refinement using 2-variable functions | Made KST more practical for computation |
| Lorentz (2008) | KST in approximation theory | Linked KST to function approximation errors |
| Pinkus (1999) | KST in neural networks | Theoretical basis for deep learning |
| Perdikaris (2024) | Deep learning reinterpretation | Proposed Kolmogorov-Arnold Networks |
| Alhafiz (2025) | KST-based turbulence modeling | Improved CFD simulations |
| Lorencin (2024) | KST in naval propulsion | Optimized ship energy efficiency |
2.3.1.2 Analysis of Kolmogorov-Arnold Superposition Theorem
2.3.1.3 Python Code to Generate Figure 20



2.3.2. Fourier Analysis of Expressivity
2.3.2.1 Literature Review of Fourier Analysis of Expressivity
2.3.2.2 Analysis of Fourier Analysis of Expressivity
| Activation Function | Fourier Decay Rate | Effect on Frequency Learning |
|---|---|---|
| Sigmoid | $O(e^{-c\lvert\omega\rvert})$ (Exponential) | Strong low-pass filter, retains only low frequencies |
| Tanh | $O(e^{-c\lvert\omega\rvert})$ (Exponential) | Strong low-pass filter, smooth approximations |
| ReLU | $O(1/\lvert\omega\rvert^2)$ (Power-law) | Allows moderate frequency learning |
| Leaky ReLU | $O(1/\lvert\omega\rvert^2)$ (Power-law) | Similar to ReLU with slightly improved high-frequency retention |
| Sinusoidal | No decay (energy at its own frequency) | Captures all frequencies, highly oscillatory functions |
2.3.2.3 Python Code to Generate Figure 21 and Figure 22
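A minimal numerical proxy for these decay rates (assuming NumPy; the window, cutoff, and sampling interval are illustrative choices): applying a smooth Hann window suppresses boundary artifacts, after which the smooth sigmoid concentrates its spectral energy at low frequencies while the ReLU kink leaves a power-law tail.

```python
import numpy as np

# Fraction of Fourier energy above a frequency cutoff, after Hann windowing.
N = 4096
x = np.linspace(-8, 8, N, endpoint=False)
window = np.hanning(N)

def high_freq_fraction(y, cutoff=50):
    spec = np.abs(np.fft.rfft(window * y)) ** 2
    return spec[cutoff:].sum() / spec.sum()

sigmoid = 1 / (1 + np.exp(-x))        # smooth -> rapidly decaying spectrum
relu = np.maximum(x, 0)               # kink at 0 -> power-law spectrum

print(high_freq_fraction(sigmoid) < high_freq_fraction(relu))   # True
```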




2.3.2.4 Python Code to Generate Figure 23 and Figure 24




2.3.3. Fourier Transforms of Various Activation Functions
2.3.3.1 Fourier Transform of the Sigmoid Function

2.3.3.2 Fourier Transform of the Hyperbolic Tangent Function

2.3.3.3 Fourier Transform of the ReLU Function

2.3.3.4 Fourier Transform of the Leaky ReLU Function
2.3.3.5 Fourier Transform of the Sinusoidal Activation Function

2.4. The Connection Between Different Mathematics Problems and Deep Learning
2.4.1. Basel Problem and Deep Learning
2.4.1.1 The Basel Problem, Fourier Series, and Function Approximation in Deep Learning
2.4.1.2 The Role of the Basel Problem in Regularization and Weight Decay
2.4.1.3 Spectral Bias in Deep Learning and the Basel Problem
3. Training Dynamics and NTK Linearization
3.1. Python Code to Generate Figure 30 Illustrating the Training Dynamics vs NTK Linearization
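A minimal sketch of the comparison such a figure illustrates (assuming NumPy; the width, learning rate, and toy regression task are illustrative choices): for a wide two-layer ReLU network, gradient-descent residuals closely track the NTK linearization $r_{t+1} = (I - \eta K_0)\, r_t$, where $K_0$ is the empirical NTK at initialization.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, eta, steps = 5, 8000, 0.1, 200

x = np.linspace(-1.0, 1.0, n)
y = np.sin(3 * x)
w = rng.normal(size=m)                    # hidden weights
a = rng.normal(size=m)                    # output weights

def forward(w, a):
    return np.maximum(np.outer(x, w), 0.0) @ a / np.sqrt(m)

def jacobians(w, a):
    pre = np.outer(x, w)
    d_a = np.maximum(pre, 0.0) / np.sqrt(m)            # df/da, shape (n, m)
    d_w = (pre > 0) * a * x[:, None] / np.sqrt(m)      # df/dw, shape (n, m)
    return d_a, d_w

d_a0, d_w0 = jacobians(w, a)
K0 = d_a0 @ d_a0.T + d_w0 @ d_w0.T       # empirical NTK at initialization

r_lin = forward(w, a) - y
for _ in range(steps):
    r = forward(w, a) - y                # gradient descent on 0.5 * ||f - y||^2
    d_a, d_w = jacobians(w, a)
    a = a - eta * d_a.T @ r
    w = w - eta * d_w.T @ r
    r_lin = r_lin - eta * K0 @ r_lin     # NTK-linearized residual dynamics

r_net = forward(w, a) - y
print(np.linalg.norm(r_net - r_lin) / np.linalg.norm(y) < 0.2)   # True: O(1/sqrt(m)) gap
```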




3.2. Literature Review of Training Dynamics and NTK Linearization
3.3. Gradient Flow and Stationary Points
3.3.1. Literature Review of Gradient Flow and Stationary Points
3.3.2. Analysis of Gradient Flow and Stationary Points
3.3.3. Hessian Structure
3.3.4. NTK Linearization
3.3.5. Python Code to Generate Figure 31 Illustrating the Gradient Flow and Stationary Points



3.3.6. Python Code to Generate Figure 32 and Figure 33 Illustrating the Hessian Structure





3.3.7. Python Code to Generate Figure 34 Illustrating the Final Fit & Training Dynamics (NN vs NTK)




3.4. NTK Regime
3.4.1. Literature Review of NTK Regime
3.4.2. Analysis of NTK Regime
3.4.3. Python Code to Generate Figure 35 Illustrating the Final Predictions in NTK Regime & Training Dynamics (NN vs NTK)




4. Generalization Bounds: PAC-Bayes and Spectral Analysis
4.1. PAC-Bayes Formalism
4.1.1. Literature Review of PAC-Bayes Formalism
4.1.2. Analysis of PAC-Bayes Formalism
- Empirical Risk: The term $\hat{L}(Q)$ captures how well the posterior Q fits the training data.
- Complexity: The KL divergence $\mathrm{KL}(Q \,\|\, P)$ ensures that Q remains close to the prior P, discouraging overfitting and promoting generalization.
- Confidence: The remaining $O\!\left(\sqrt{\log(1/\delta)/n}\right)$ term shrinks with increasing sample size, tightening the bound and enhancing reliability.
4.1.2.1 Python Code to Generate Figure 36, Figure 37, and Figure 38 Illustrating PAC-Bayes Bound vs. Sample Size
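A minimal sketch of the bound's dependence on the sample size (assuming NumPy; the McAllester-style form, empirical risk, KL value, and δ below are illustrative choices, not taken from the original script): the complexity-plus-confidence term shrinks as n grows, so the bound tightens toward the empirical risk.

```python
import numpy as np

# McAllester-style PAC-Bayes bound (illustrative form):
#   L(Q) <= L_hat(Q) + sqrt( (KL(Q||P) + ln(2*sqrt(n)/delta)) / (2n) )
def pac_bayes_bound(emp_risk, kl, n, delta=0.05):
    return emp_risk + np.sqrt((kl + np.log(2 * np.sqrt(n) / delta)) / (2 * n))

ns = np.array([100, 1000, 10000, 100000])
bounds = pac_bayes_bound(0.1, kl=5.0, n=ns)
print(bounds)                          # monotonically decreasing toward 0.1
print(np.all(np.diff(bounds) < 0))     # True
```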





4.1.3. KL Divergence
4.1.3.1 Python Code to Generate Figure 39, Figure 40 Illustrating KL Divergence in Bernoulli Random Variables and Figure 41, Figure 42 Illustrating KL Divergence in Gaussian Random Variables
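A minimal sketch of the closed-form divergences behind these figures (assuming NumPy; the parameter values are illustrative choices): KL divergence for Bernoulli and univariate Gaussian pairs, showing that it vanishes iff the distributions coincide and that it is asymmetric.

```python
import numpy as np

def kl_bernoulli(p, q):
    # KL( Ber(p) || Ber(q) )
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

def kl_gauss(mu1, s1, mu2, s2):
    # KL( N(mu1, s1^2) || N(mu2, s2^2) )
    return np.log(s2 / s1) + (s1**2 + (mu1 - mu2) ** 2) / (2 * s2**2) - 0.5

print(kl_bernoulli(0.3, 0.3))           # 0.0: divergence vanishes iff p == q
print(kl_gauss(0.0, 1.0, 0.0, 1.0))     # 0.0
# Asymmetry: KL(P||Q) != KL(Q||P) in general
print(kl_bernoulli(0.2, 0.7), kl_bernoulli(0.7, 0.2))
```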






4.1.4. Rényi Divergence
4.1.4.1 Python Code to Generate Figure 43 and Figure 44 Illustrating Rényi Divergence
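A minimal sketch for the equal-variance Gaussian case (assuming NumPy; the means and variance are illustrative choices): here the Rényi divergence has the closed form $D_{\alpha} = \alpha(\mu_1-\mu_2)^2/(2\sigma^2)$, which is nondecreasing in α and recovers the KL divergence as α → 1.

```python
import numpy as np

# Renyi divergence between N(mu1, s^2) and N(mu2, s^2) (equal variances):
#   D_alpha = alpha * (mu1 - mu2)^2 / (2 s^2)
def renyi_gauss(alpha, mu1, mu2, s):
    return alpha * (mu1 - mu2) ** 2 / (2 * s**2)

kl = (1.0 - 3.0) ** 2 / 2.0                 # KL limit for mu1=1, mu2=3, s=1
alphas = np.array([0.5, 0.9, 0.99, 0.999])
values = renyi_gauss(alphas, 1.0, 3.0, 1.0)
print(np.all(np.diff(values) > 0))          # True: nondecreasing in alpha
print(abs(values[-1] - kl) < 0.01)          # True: approaches KL as alpha -> 1
```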




4.1.5. Wasserstein Distance
4.1.5.1 Python Code to Generate Figure 45, Figure 46, and Figure 47 Illustrating Wasserstein Distance
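A minimal sketch of the one-dimensional case (assuming NumPy; the sample sizes and shift are illustrative choices): between equal-size empirical measures on the line, the 1-Wasserstein distance reduces to an average of sorted-sample differences, since the optimal coupling matches order statistics.

```python
import numpy as np

def w1(xs, ys):
    # 1-Wasserstein distance between equal-size empirical measures in 1D
    return np.mean(np.abs(np.sort(xs) - np.sort(ys)))

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=5000)
b = a + 2.5                                   # same shape, shifted by 2.5

d_shift = w1(a, b)
d_same = w1(a, rng.normal(0.0, 1.0, size=5000))
print(d_shift)                                # 2.5: a pure translation costs 2.5
print(d_same < 0.1)                           # True: near-identical distributions
```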





4.2. Spectral Regularization
4.2.1. Literature Review of Spectral Regularization
4.2.2. Analysis of Spectral Regularization
4.2.3. Python Code to Generate Figure 48 and Figure 49 Illustrating Spectral Regularization
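A minimal sketch of the core operation (assuming NumPy; the matrix size and iteration count are illustrative choices): estimating the spectral norm by power iteration, the workhorse behind spectral normalization, and verifying that dividing by it caps the layer's Lipschitz constant at 1.

```python
import numpy as np

def spectral_norm(W, iters=300):
    # Power iteration on W^T W for the largest singular value of W
    v = np.ones(W.shape[1]) / np.sqrt(W.shape[1])
    for _ in range(iters):
        u = W @ v
        u /= np.linalg.norm(u)
        v = W.T @ u
        v /= np.linalg.norm(v)
    return float(u @ W @ v)

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 32))
exact = np.linalg.svd(W, compute_uv=False)[0]
est = spectral_norm(W)
print(abs(est - exact) < 1e-6)                # True

# Spectral normalization: W / ||W||_2 has spectral norm 1.
W_sn = W / est
top_sn = np.linalg.svd(W_sn, compute_uv=False)[0]
print(abs(top_sn - 1.0) < 1e-6)               # True
```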



4.2.4. Python Code to Generate Figure 50 Illustrating Singular Value Spectrum of First Layer (MNIST)



5. Game-Theoretic Formulations of Deep Neural Networks
5.1. Literature Review of Game-Theoretic Formulations of Deep Neural Networks
5.2. Analysis of Game-Theoretic Formulations of Deep Neural Networks
5.2.1. Python Code to Generate Figure 51, Figure 52, and Figure 53 Illustrating Game-Theoretic Formulations of Deep Neural Networks





5.2.2. Min-Max (Saddle) Dynamics:
5.2.3. Potential/Cooperative Game (Gradient Descent on Shared Phi)
5.2.4. Replicator Dynamics (Rock-Paper-Scissors) — Cyclic Behavior
5.2.5. Python Code to Generate Figure 54, Figure 55 and Figure 56 Illustrating Min-Max (Saddle) Dynamics, Potential/Cooperative Game (Gradient Descent on Shared Phi), and Replicator Dynamics (Rock-Paper-Scissors) — Cyclic Behavior
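A minimal sketch of the saddle dynamics (assuming NumPy; the step size and iteration count are illustrative choices): on the bilinear game $f(x, y) = xy$, simultaneous gradient descent-ascent spirals outward by a factor $\sqrt{1+\eta^2}$ per step, while the extragradient method contracts toward the saddle at the origin.

```python
import numpy as np

eta = 0.1
F = lambda z: np.array([z[1], -z[0]])     # (df/dx, -df/dy) for f(x, y) = x*y

z_gda = np.array([1.0, 0.0])
z_eg = np.array([1.0, 0.0])
for _ in range(100):
    z_gda = z_gda - eta * F(z_gda)        # simultaneous GDA (explicit Euler)
    z_half = z_eg - eta * F(z_eg)         # extragradient lookahead step
    z_eg = z_eg - eta * F(z_half)         # corrected step using the lookahead

print(np.linalg.norm(z_gda) > 1.0)        # True: GDA diverges from the saddle
print(np.linalg.norm(z_eg) < 1.0)         # True: extragradient converges
```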







5.3. Game-Theoretic Formulations of Deep Neural Networks (DNNs) Through Evolutionary Game Dynamics
5.3.1. Python Code to Generate Figure 57, Figure 58 and Figure 59 Illustrating Game-Theoretic Formulations of Deep Neural Networks (DNNs) Through Evolutionary Game Dynamics





5.4. Analysis of Deep Neural Networks (DNNs) Through Variational Inequalities
5.4.1. Monotone/Potential Operator (Gradient of a Convex Potential)
5.4.2. Saddle/Non-Monotone Bilinear Operator (Models min–max Behavior)
- The block-Jacobian $J(z) = \begin{pmatrix} 0 & A \\ -A^{\top} & 0 \end{pmatrix}$, which is skew-symmetric,
- The identity $\langle F(z), z \rangle = 0$ for the bilinear case,
- Conservation of quadratic energies in continuous time, and
- Discrete amplification $\|z_{k+1}\|^2 = \|z_k\|^2 + \eta^2 \|F(z_k)\|^2$ for explicit Euler.
5.4.3. Skew (Rotation)
5.4.4. Skew + Dissipation
5.4.5. Python Code to Generate Figure 60, Figure 61 Illustrating Monotone/Potential Operator, Figure 62 Illustrating Saddle Bilinear F (min-max), Figure 63 Illustrating Skew (Rotation) $F(z) = Rz$, and Figure 64 Illustrating Skew + Dissipation $F(z) = (R + \alpha I)z$










5.5. Optimal Control in Reinforcement Learning (RL)-Based Deep Neural Network (DNN) Training
5.5.1. Backpropagation as Pontryagin’s Maximum Principle (PMP)
5.5.2. The Method of Successive Approximation (MSA)
5.5.3. Differentiable Optimization Layers
5.5.4. Neural Ordinary Differential Equations (Neural ODEs)
5.5.5. Stochastic Optimal Control for RL and Robust Training
5.5.6. Stable and Efficient Network Design
5.5.7. Optimal Control of the Training Process Itself
5.5.8. Connections to Reinforcement Learning
5.6. Differential Game Theory and Training Dynamics
6. Optimal Transport Theory in Deep Neural Networks
6.1. Literature Review of Optimal Transport Theory in Deep Neural Networks
6.2. Analysis of Optimal Transport Theory in Deep Neural Networks
6.3. Jensen-Shannon Divergence
6.4. Matching Latent and Data Distributions in Probabilistic Autoencoders
6.4.1. Probabilistic Autoencoders
6.4.2. 2-Wasserstein Distance
6.5. Optimal Transport (OT)-Based Priors in Bayesian Deep Learning
6.6. Application of Optimal Transport in Learning Energy-Based Models
6.7. Sinkhorn-Knopp Algorithm
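A minimal sketch of the Sinkhorn-Knopp iteration (assuming NumPy; the grid, cost, regularization strength, and iteration count are illustrative choices): alternately rescale the rows and columns of the Gibbs kernel $K = e^{-C/\varepsilon}$ until the transport plan has the prescribed marginals.

```python
import numpy as np

def sinkhorn(a, b, C, epsilon=0.05, iters=2000):
    # Entropy-regularized OT: alternate scaling of K = exp(-C / epsilon)
    K = np.exp(-C / epsilon)
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]     # plan P = diag(u) K diag(v)

n = 8
xs = np.linspace(0, 1, n)
a = np.ones(n) / n                          # uniform source marginal
b = np.ones(n) / n                          # uniform target marginal
C = (xs[:, None] - xs[None, :]) ** 2        # squared-distance cost

P = sinkhorn(a, b, C)
print(np.allclose(P.sum(axis=1), a))        # True: row marginals match
print(np.allclose(P.sum(axis=0), b))        # True: column marginals match
```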
6.7.1. Sinkhorn Distance
6.7.2. Wasserstein GANs
6.8. Kantorovich Duality
6.9. Entropy Regularization
6.9.1. Classical Monge-Kantorovich Formulation
6.9.2. Fenchel-Rockafellar Duality Theorem
6.10. Optimal Transport Theory-Based Training Dynamics and Sinkhorn Divergences
6.11. Optimal Transport Theory in Neural Architecture and Gradient Flows
6.12. Geometric Regularization and Barycenters
7. Categorical Foundations of Deep Learning
7.1. Introduction
7.1.1. Motivation: Why Category Theory for Deep Learning?
- Abstraction over computation: Neural networks are fundamentally morphisms (maps) between data representations, and Category Theory provides tools to study compositions of such maps.
- Generalization across architectures: Whether working with CNNs, RNNs, or Transformers, Category Theory allows us to describe them as instances of categorical constructs (e.g., monoidal categories, functors).
- Handling complex data flows: Many Deep Learning constructs (e.g., attention mechanisms, residual connections) are naturally expressed via universal properties (limits, adjunctions).
- Bridging discrete and continuous reasoning: Gradient-based optimization (backpropagation) can be formulated categorically via lenses or reverse derivative categories.
- Formal reasoning about model behavior.
- Systematic architecture design via compositionality.
- Novel generalizations (e.g., neural networks over graphs, higher-order interactions).
7.1.2. Limits of Traditional Mathematical Formalism in Modern Machine Learning
- Lack of Compositionality: Deep Learning models are built via hierarchical composition of layers, but linear algebra treats them as sequences of matrix operations without inherent structure. The Category Theory Solution is to represent networks as morphisms in a category, where composition is associative by definition.
- Poor Handling of Complex Data Types: Tensors, graphs, and sequences require ad-hoc notation. Backpropagation, for instance, is often derived via index manipulation rather than structurally. The Category Theory Solution is to use monoidal categories to model tensors and string diagrams for graphical reasoning.
- Opaque Optimization Dynamics: Gradient descent is typically analyzed via local approximations (Taylor expansions), obscuring global behavior. The Category Theory Solution is to use reverse derivative categories (RDCs) to give an abstract formulation of backpropagation.
- Difficulty in Generalizing Architectures: Novel components (e.g., attention, memory) are introduced empirically without a unifying framework. The Category Theory Solution is to use universal properties (e.g., adjunctions) to guide principled extensions.
- Weak Theoretical Guarantees: Traditional theory (e.g., VC dimension) fails to explain the success of overparametrized Deep Learning models. The Category Theory Solution is to use functorial semantics to connect learning to algebraic invariants.
7.1.3. Overview of the Chapter and Learning Goals
- Categories and Functors: At its core, a category formalizes the notion of objects (such as data spaces) and morphisms (such as neural network layers) that can be composed associatively. Functors extend this by mapping one category to another, preserving structure—allowing us to describe how neural architectures transform data systematically. We shall discuss the basic definitions, examples (e.g., FeedForwardNet, the category of neural networks).
- Universal Constructions: Universal constructions, including products and limits, characterize optimal ways to combine or relate objects. In Deep Learning, these appear in architectures like residual networks, where skip connections arise as coproducts, or attention mechanisms, which aggregate information universally. The ability to reason about such constructions enables principled architecture design rather than relying on intuition alone. We shall discuss the products, coproducts, and how they relate to branching architectures (e.g., ResNet skip connections).
- Monoidal Categories: Monoidal categories introduce tensor products, capturing parallel computation and multi-dimensional data flow. This framework elegantly models operations in convolutional networks or transformers, where tensor contractions and parallel processing are fundamental. String diagrams, a graphical notation from monoidal categories, offer an intuitive way to visualize and manipulate complex network structures. We shall discuss the modeling of tensors and parallel computation (e.g., CNNs as functors).
- Lenses and Backpropagation: Differentiation and optimization, central to Deep Learning, are naturally expressed using lenses and reverse derivative categories. A lens pairs a forward pass (evaluation) with a backward pass (gradient propagation), abstracting backpropagation as a compositional process. This shifts gradient-based learning from an algorithmic procedure to a categorical construct, clarifying its mathematical essence. We shall discuss the categorical differentiation via reverse derivative categories.
- Higher-Order Abstractions: Higher-order abstractions like adjunctions and monads further extend the expressive power of the framework. Adjunctions formalize relationships between paired functors, such as encoder-decoder networks in autoencoders, while monads model computational effects like stochasticity in dropout or reinforcement learning. These tools allow us to describe complex behaviors—such as memory, recursion, or attention—in a unified way. We shall discuss the adjunctions, monads, and their role in attention/memory mechanisms.
- Design models more systematically.
- Analyze them more rigorously.
- Discover new architectures more confidently.
7.2. Category Theory Primer for Machine Learners
7.2.1. Basic Definitions: Categories, Functors, Natural Transformations, Adjunctions, Monoidal Categories
7.2.1.1 Categories A category $\mathcal{C}$ consists of:
- A collection of objects (e.g., sets, vector spaces).
- For every pair of objects $(A, B)$, a set of morphisms (arrows) $\mathrm{Hom}(A, B)$.
- A composition rule ∘ such that for $f: A \to B$ and $g: B \to C$, there exists $g \circ f: A \to C$.
- An identity morphism $\mathrm{id}_A$ for each object A, satisfying: $f \circ \mathrm{id}_A = f = \mathrm{id}_B \circ f$ for every $f: A \to B$.
- Associativity: For $f: A \to B$, $g: B \to C$, $h: C \to D$, $h \circ (g \circ f) = (h \circ g) \circ f$.
7.2.1.2 Functors A (covariant) functor $F: \mathcal{C} \to \mathcal{D}$ between categories consists of:
- A mapping $A \mapsto F(A)$ on objects.
- For each morphism $f: A \to B$ in $\mathcal{C}$, a morphism $F(f): F(A) \to F(B)$ in $\mathcal{D}$, preserving:
- − Identities: $F(\mathrm{id}_A) = \mathrm{id}_{F(A)}$.
- − Composition: $F(g \circ f) = F(g) \circ F(f)$.
7.2.1.3 Natural Transformations Given functors $F, G: \mathcal{C} \to \mathcal{D}$, a natural transformation $\eta: F \Rightarrow G$ assigns to each object $A$ a morphism $\eta_A: F(A) \to G(A)$ such that for every $f: A \to B$ in $\mathcal{C}$, the following naturality condition holds: $G(f) \circ \eta_A = \eta_B \circ F(f)$.
7.2.1.4 Adjunctions
7.2.1.5 Monoidal Categories
7.2.2. Use of String Diagrams and Abstraction and Generalization in Machine Learning
7.2.2.1 String Diagrams as Graphical Calculus for Neural Networks
7.2.2.2 Abstraction and Generalization in Machine Learning
7.2.3. Examples from Familiar Settings
7.2.3.1 Category of Sets ($\mathbf{Set}$)
- Objects: Sets.
- Morphisms: Functions $f: A \to B$.
- Functors:
- − The powerset functor $\mathcal{P}: \mathbf{Set} \to \mathbf{Set}$ maps A to its power set $\mathcal{P}(A)$ and $f: A \to B$ to the direct image $\mathcal{P}(f)(S) = f(S)$.
- Natural Transformation:
- − The singleton map $\eta_A: A \to \mathcal{P}(A)$, where $\eta_A(a) = \{a\}$, is natural because for any $f: A \to B$, $\mathcal{P}(f) \circ \eta_A = \eta_B \circ f$.
7.2.3.2 Category of Vector Spaces ($\mathbf{Vect}$)
- Objects: Vector spaces over $\mathbb{R}$.
- Morphisms: Linear maps $T: V \to W$.
- Functors:
- − The dual space functor maps V to $V^{*}$ and $T: V \to W$ to its transpose $T^{*}: W^{*} \to V^{*}$.
- Natural Transformation:
- − The double dual embedding $\iota_V: V \to V^{**}$, where $\iota_V(v)(\varphi) = \varphi(v)$, is natural because for any linear $T: V \to W$, $T^{**} \circ \iota_V = \iota_W \circ T$.
7.2.3.3 Category of Neural Networks (Informal Example)
- Objects: Data spaces (e.g., $\mathbb{R}^n$).
- Morphisms: Neural network layers (e.g., affine maps $x \mapsto \sigma(Wx + b)$).
- Functors:
- − A training functor could map a network architecture to its trained parameter space.
7.2.4. Diagrams as Proofs and Reasoning Tools
7.2.4.1 Commutative Diagrams A diagram commutes if all paths between two objects yield the same morphism. For example, the naturality square for $\eta: F \Rightarrow G$ commutes precisely when $G(f) \circ \eta_A = \eta_B \circ F(f)$.
7.2.4.2 Universal Properties via Diagrams
7.2.4.3 Applications in ML
- Backpropagation as a Functorial Diagram: The chain rule can be expressed as a commuting diagram in the category of differentiable functions.
- Attention Mechanisms: The query-key-value interaction in transformers can be modeled as a limit diagram.
7.2.4.3.1 Backpropagation as a Functorial Diagram
7.2.4.3.2 Attention Mechanisms
7.2.5. Summary
- Abstracting computation via categories and functors.
- Unifying structures (e.g., vector spaces, neural networks) under common principles.
- Enabling diagrammatic proofs for complex architectures.
7.3. Neural Networks as Composable Morphisms
7.3.1. Layers as Morphisms; Networks as Compositions
7.3.1.1 Neural Networks as Composable Functions
- Each layer is a function $f_i: X_{i-1} \times \Theta_i \to X_i$, where:
- − $X_{i-1}$ is the input space (e.g., $\mathbb{R}^{n_{i-1}}$),
- − $\Theta_i$ is the parameter space (e.g., weights and biases $(W_i, b_i)$),
- − $X_i$ is the output space.
- A network N with k layers is the composition: $N(x, (\theta_1, \dots, \theta_k)) = f_k(\cdots f_2(f_1(x, \theta_1), \theta_2) \cdots, \theta_k)$, where $\theta_i \in \Theta_i$.
7.3.1.2 Category-Theoretic Interpretation
- Objects are spaces (e.g., $\mathbb{R}^n$),
- Morphisms are pairs $(f, \Theta)$, where: $f: X \times \Theta \to Y$ is a smooth function, and $\Theta$ is its parameter space.
- Composition of morphisms $(f, \Theta)$ and $(g, \Phi)$ is given by: $(g, \Phi) \circ (f, \Theta) = (h, \Theta \times \Phi)$, where: $h(x, (\theta, \phi)) = g(f(x, \theta), \phi)$.
- Identity morphism is given by $(\mathrm{id}_X, \{\ast\})$ (no parameters).
7.3.2. The Category of Parameterized Functions
7.3.2.1 Definition of the Category Para
- Objects: Euclidean spaces $\mathbb{R}^n$ (or more generally, smooth manifolds).
- Morphisms: A morphism $X \to Y$ is a smooth function: $f: X \times \Theta \to Y$, where $\Theta$ is the parameter space (also Euclidean).
- Composition: As defined above, composition is associative due to the associativity of function composition.
- Identity: The identity morphism is the projection $X \times \{\ast\} \to X$.
Equivalently, each morphism is determined by:
- A parameter space $\Theta$ (a measurable space, often $\mathbb{R}^p$)
- A measurable function $f: X \times \Theta \to Y$
7.3.2.2 Functoriality of Neural Networks
- Each node $v$ maps to a space $X_v$,
- Each edge $e: v \to w$ maps to a parameterized morphism $f_e: X_v \times \Theta_e \to X_w$,
- Functoriality ensures that compositions in the graph correspond to compositions in Para.
- Object Preservation: each object is sent to its underlying data space.
- Morphism Action: each parameterized morphism $(f, \Theta)$ acts as $x \mapsto f(x, \theta)$, where $\theta \in \Theta$ is the chosen parameter.
- Compositionality: $F(g \circ f) = F(g) \circ F(f)$ for composable $f, g$, with parameter spaces concatenated: $\Theta_{g \circ f} = \Theta_f \times \Theta_g$.
- Identity Preservation: $F(\mathrm{id}_X) = \mathrm{id}_{F(X)}$.
7.3.2.3 Monoidal Structure (for Parallel Composition)
- Tensor product ⊗ combines spaces and parameters: $(X_1, \Theta_1) \otimes (X_2, \Theta_2) = (X_1 \times X_2, \Theta_1 \times \Theta_2)$, with $(f_1 \otimes f_2)(x_1, x_2, \theta_1, \theta_2) = (f_1(x_1, \theta_1), f_2(x_2, \theta_2))$.
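A small sketch of Para in code (the layer functions and parameter values below are illustrative, not a library API): a morphism is a parameterized function, and composition threads the input through while pairing up the parameters.

```python
# Composition in Para: (g . f)(x, (theta_f, theta_g)) = g(f(x, theta_f), theta_g)
def compose(g, f):
    return lambda x, params: g(f(x, params[0]), params[1])

affine = lambda x, p: p[0] * x + p[1]     # parameters theta = (w, b)
scale = lambda x, s: s * x                # parameter theta = s
relu = lambda x, _: max(x, 0.0)           # parameter-free (dummy parameter)

net = compose(scale, affine)
out = net(2.0, ((3.0, 1.0), 10.0))        # 10 * (3*2 + 1)
print(out)                                # 70.0

# Associativity: regrouping the same layers yields the same function,
# with the parameter tuple re-bracketed accordingly.
left = compose(relu, compose(scale, affine))
right = compose(compose(relu, scale), affine)
same = left(-1.5, (((3.0, 1.0), 10.0), None)) == right(-1.5, ((3.0, 1.0), (10.0, None)))
print(same)                               # True
```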
7.3.3. Role of Associativity and Identity in Sequential Models
7.3.3.1 Associativity of Composition
- The associativity law in Para ensures that composing layers in any order (while respecting dependencies) yields the same function: $(f_3 \circ f_2) \circ f_1 = f_3 \circ (f_2 \circ f_1)$.
- This justifies modular network design: we can group layers into submodules without changing behavior.
7.3.3.2 Identity Morphisms and Skip Connections
- The identity morphism allows for skip connections (e.g., ResNet): $x \mapsto x + f(x, \theta)$, the sum of the identity and a residual block.
- Identity ensures that a “null” layer (doing nothing) is a valid network component.
7.3.3.3 Universality of Sequential Models
- The ability to compose morphisms arbitrarily allows for universal approximation (e.g., via deep feedforward networks).
- The categorical framework generalizes to recurrent networks by working in a category of dynamical systems, where morphisms are parameterized recurrent cells.
7.3.4. Conclusion
- Layers as morphisms, with networks as their compositions.
- A rigorous category of parameterized functions, supporting functorial and monoidal structures.
- Associativity and identity as fundamental properties enabling modular and correct-by-construction network design.
8. Open Set Learning
- Gaussian Distribution Model (GDM): The fundamental assumption is that the feature distribution of each known class follows a multivariate normal distribution parameterized by the mean vector $\mu_c$ and covariance matrix $\Sigma_c$. The likelihood of a given sample $x$ belonging to class c is: $p(x \mid c) = \mathcal{N}(x; \mu_c, \Sigma_c)$. To quantify the confidence in assigning $x$ to class c, we compute the Mahalanobis distance: $d_M(x, c) = \sqrt{(x - \mu_c)^{\top} \Sigma_c^{-1} (x - \mu_c)}$. A sample is rejected as unknown if: $\min_c d_M(x, c) > \tau$, where $\tau$ is a threshold chosen based on extreme value statistics.
- Gaussian Mixture Model (GMM): The Gaussian assumption can be generalized using a Gaussian Mixture Model (GMM), which represents each class as a weighted sum of multiple Gaussian components: $p(x \mid c) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x; \mu_k, \Sigma_k)$, where $\pi_k$ are the mixture weights satisfying $\sum_{k=1}^{K} \pi_k = 1$, and each Gaussian component is given by: $\mathcal{N}(x; \mu_k, \Sigma_k) = (2\pi)^{-d/2} \lvert\Sigma_k\rvert^{-1/2} \exp\left(-\tfrac{1}{2}(x - \mu_k)^{\top} \Sigma_k^{-1} (x - \mu_k)\right)$. Unknown samples are rejected based on low maximum likelihood estimation (MLE) scores: $\max_c p(x \mid c) < \tau$, where $\tau$ is a predefined threshold.
- Dirichlet Process Gaussian Mixture Model (DP-GMM): A Dirichlet Process (DP) prior can be introduced to allow the number of mixture components K to grow dynamically. The prior over the mixture weights follows the stick-breaking construction: $\pi \sim \mathrm{GEM}(\alpha)$, where $\alpha$ is the concentration parameter controlling cluster sparsity. This enables automatic adaptation of the number of mixture components to better capture class distributions. The likelihood of $x$ follows the same mixture form as the GMM, but with a nonparametric prior that ensures more flexible decision boundaries.
- Extreme Value Theory (EVT) Models: The tail distribution of softmax probabilities is modeled using an Extreme Value Theory (EVT) approach. Given softmax scores $s(x)$, we fit a Weibull distribution to the tail: $F(s) = 1 - \exp\left(-(s/\lambda)^{\kappa}\right)$. A sample is rejected if its calibrated tail probability under the fitted Weibull model falls below a threshold.
- Bayesian Neural Networks (BNNs): BNNs introduce uncertainty estimation by placing priors over network weights: $w \sim p(w)$. Posterior inference is performed via Bayesian updating: $p(w \mid \mathcal{D}) \propto p(\mathcal{D} \mid w) \, p(w)$. A sample is rejected if the entropy of the predictive distribution is high: $H[p(y \mid x, \mathcal{D})] > \tau$.
- Support Vector Models (OC-SVM and SVDD) One-Class SVM (OC-SVM): Finds a separating hyperplane such that: $w^{\top} \phi(x_i) \ge \rho - \xi_i$, subject to $\xi_i \ge 0$. A sample is rejected if: $w^{\top} \phi(x) - \rho < 0$.
- Support Vector Data Description (SVDD): Finds a minimum enclosing hypersphere with center $a$ and radius R: $\min_{R, a, \xi} R^2 + C \sum_i \xi_i$, subject to: $\|\phi(x_i) - a\|^2 \le R^2 + \xi_i$, $\xi_i \ge 0$. A sample is rejected if: $\|\phi(x) - a\|^2 > R^2$.
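A minimal sketch of the Gaussian Distribution Model rejection rule described above (assuming NumPy; the class means, sample sizes, and threshold τ are illustrative choices): fit $(\mu_c, \Sigma_c)$ per known class and reject samples whose minimum Mahalanobis distance exceeds τ.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two known classes with synthetic 2D features
known = {c: rng.normal(loc=mu, scale=0.5, size=(500, 2))
         for c, mu in {0: (0, 0), 1: (4, 4)}.items()}

# Per-class Gaussian parameters (mu_c, Sigma_c)
params = {c: (X.mean(axis=0), np.cov(X.T)) for c, X in known.items()}

def mahalanobis(x, mu, Sigma):
    d = x - mu
    return float(np.sqrt(d @ np.linalg.inv(Sigma) @ d))

def predict(x, tau=3.5):
    # Assign to the nearest class, or reject as unknown beyond threshold tau
    dists = {c: mahalanobis(x, *p) for c, p in params.items()}
    c_star = min(dists, key=dists.get)
    return c_star if dists[c_star] <= tau else "unknown"

print(predict(np.array([0.1, -0.2])))    # 0: close to class 0
print(predict(np.array([4.2, 3.9])))     # 1: close to class 1
print(predict(np.array([10.0, -8.0])))   # unknown: far from both classes
```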
8.1. Literature Review of Deep Neural Network-Based Open Set Learning
| Author(s) | Contribution |
|---|---|
| Scheirer et al. (2012) [1215] | Introduced the concept of open space risk and proposed the 1-vs-Set Machine classifier, which minimizes both empirical and open space risk to identify unseen categories. |
| Bendale and Boult (2015) [1216] | Extended the framework to open world learning with the Nearest Non-Outlier (NNO) algorithm, allowing incremental learning from evolving data. |
| Busto and Gall (2017) [1217] | Proposed the Assign-and-Transform Iterative (ATI) method for domain adaptation when the target domain includes unknown classes not present in the source. |
| Saito et al. (2018) [1218] | Used adversarial training to align features of known classes and distinguish unknowns, improving generalization in open set domain adaptation scenarios. |
| Geng et al. (2020) [1219] | Offered a comprehensive taxonomy and theoretical framework for Open Set Learning methods, becoming a foundational survey in the field. |
| Chen et al. (2020) [1221] | Introduced Reciprocal Points to define extra-class space and improve the separation between known and unknown classes. |
| Authors (Year) | Citation Key | Contribution / Summary |
|---|---|---|
| Liu et al. (2020) | [1222] | Introduced the PEELER algorithm combining meta-learning and entropy maximization for few-shot open set recognition. |
| Kong and Ramanan (2021) | [1223] | Used GANs to generate diverse open-set examples, improving classifier robustness by explicitly modeling the open space. |
| Fang et al. (2021) | [1224] | Proposed generalization bounds and the Auxiliary Open-Set Risk (AOSR) algorithm for robust decision-making in open-world conditions. |
| Mandivarapu et al. (2022) | [1225] | Linked active learning with open set recognition by querying unknown instances for improved adaptivity. |
| Engelbrecht and du Preez (2020) | [1226] | Proposed a semi-supervised OSL model using positive and unlabeled learning to increase robustness to unknowns. |
| Zhou et al. (2024) | [1235] | Introduced a contrastive learning framework with an "unknown score" to enhance known-unknown separation. |
| Shao et al. (2022) | [1227] | Analyzed distributional shifts between training and testing phases, improving OSL generalization. |
| Park et al. (2024) | [1228] | Provided theoretical insights on distinguishing known and unknowns using Jacobian-based metrics in neural networks. |
| Liu et al. (2022) | [1230] | Developed a hybrid OSL object detection model combining labeled and unlabeled data for complex visual scenes. |
| Abouzaid et al. (2023) | [1236] | Used D-band FMCW radar with deep learning and open-set recognition for reliable material characterization. |
| Cevikalp et al. (2023) | [1237] | Unified anomaly detection and OSL using compact hypersphere models to define decision boundaries. |
| Palechor et al. (2023) | [1238] | Proposed large-scale ImageNet-based open-set protocols and a new validation metric for realistic OSL evaluation. |
| Cen et al. (2023) | [1240] | Introduced FS-KNNS for few-shot Unified Open-Set Recognition, analyzing uncertainty distributions and pretraining impacts. |
| Authors (Year) | Main Contribution |
|---|---|
| Huang et al. (2022) [1241] | Propose a semantic reconstruction approach that focuses on class-specific feature recovery, enhancing rejection of out-of-distribution samples by bridging the gap between known and unknown classes. |
| Wang et al. (2022) [1242] | Introduce an AUC-optimized objective function that trains deep networks to balance closed-set accuracy and unknown detection, improving open-set decision boundary learning. |
| Alliegro et al. (2022) [1243] | Present a benchmark dataset for 3D open-set learning in object point cloud classification, emphasizing the need for improved 3D feature representation. |
| Grieggs et al. (2021) [1244] | Apply OSL to handwriting recognition by leveraging human perception to identify transcription errors from unfamiliar handwriting styles, expanding OSL beyond classification. |
| Liu et al. (2022) [1230] | Propose a semi-supervised framework for open-world object detection using both labeled and unlabeled data, enabling dynamic adaptation to emerging object classes. |
| Grcić et al. (2022) [1245] | Combine anomaly detection and deep feature learning to enhance open-set semantic segmentation performance in dense prediction tasks. |
| Moon et al. (2022) [1246] | Introduce a simulator for generating synthetic unknown samples to improve model robustness against unfamiliar data distributions. |
| Kuchibhotla et al. (2022) [1248] | Develop an incremental learning framework for adapting to new unknown categories without retraining, suitable for continual learning and autonomous systems. |
| Katsumata et al. (2022) [1249] | Propose a GAN-based framework for semi-supervised open-set image generation that aligns synthesized images with both known and unknown class features. |
| Bao et al. (2022) [1250] | Extend OSL to temporal action recognition by detecting and localizing unseen human actions in video, applicable to surveillance and activity monitoring. |
| Dietterich and Guyer (2022) [1251] | Provide a theoretical analysis of why deep networks fail in open-set generalization, attributing it to feature familiarity levels and proposing architectural considerations. |
| Authors (Year) | Main Contribution |
|---|---|
| Cai et al. (2022) [1253] | Propose a method to localize unfamiliar samples in long-tailed distributions using feature similarity measures, enabling outlier rejection and integrating OSL with long-tailed classification. |
| Wang et al. (2022) [1254] | Present a framework for adapting open-world learning to user-defined tasks, enhancing model adaptability to dynamic real-world data distributions. |
| Zhang et al. (2022) [1256] | Introduce an architecture search algorithm tailored to OSL, highlighting the importance of network design in effectively rejecting unknown instances. |
| Lu et al. (2022) [1257] | Develop a prototype-based method that refines decision boundaries and improves open-set rejection by mining robust feature prototypes from known classes. |
| Xia et al. (2021) [1258] | Propose the Adversarial Motorial Prototype Framework (AMPF), using adversarial learning to refine class prototypes and explicitly model uncertainty boundaries. |
| Kong and Ramanan (2021) [1259] | Introduce OpenGAN, which uses GANs to synthesize OOD data for improving generalization, although requiring auxiliary OOD data. |
| Huang et al. (2021) [1260] | Present a semi-supervised cross-modal method (Trash to Treasure) for mining OOD samples from unlabeled data, with dependency on multi-modal data availability. |
| Wang et al. (2021) [1262] | Develop an energy-based model (EBM) for uncertainty calibration that offers principled confidence measures without OOD data, albeit with high computational cost. |
| Zhang and Ding (2021) [1263] | Adapt prototypical matching for zero-shot segmentation with open-set rejection, achieving efficiency but relying on pre-defined class embeddings. |
| Author(s) | Main Contribution |
|---|---|
| Girish et al. [1264] | Propose a framework for detecting GAN-generated images using contrastive learning and clustering to discover novel synthetic sources in open-world scenarios. |
| Wang et al. [1265] | Introduce a benchmark for open-world video object segmentation combining uncertainty estimation and spatio-temporal consistency to reject unknowns and learn new categories. |
| Cen et al. [1266] | Use deep metric learning with prototype-based margin separation to improve open-set semantic segmentation by distinguishing known and unknown classes. |
| Wu et al. [1267] | Present NGC, a framework combining graph-based label propagation and uncertainty-aware sample selection for robust learning under noisy and open-world conditions. |
| Bastan et al. [1268] | Address large-scale open-set logo detection using hierarchical clustering and outlier-aware loss to manage noisy real-world open-set data. |
| Saito et al. [1269] | Propose OpenMatch, a semi-supervised learning approach that integrates consistency regularization with open-set outlier rejection. |
| Esmaeilpour et al. [1270] | Extend CLIP to zero-shot open-set detection, leveraging vision-language models to detect novel categories without labeled data, but note limitations in fine-grained unknown discrimination. |
| Chen et al. [1272] | Introduce Adversarial Reciprocal Points Learning using adversarial optimization to define class boundaries while rejecting unknowns via a geometric margin constraint. |
| Guo et al. [1273] | Develop a Conditional Variational Capsule Network combining capsules and VAEs for hierarchical uncertainty modeling in open-set recognition. |
| Bao et al. [1274] | Apply Evidential Deep Learning and subjective logic to explicitly model epistemic uncertainty in video-based action recognition. |
| Sun et al. [1275] | Propose M2IOSR, an information-theoretic model that maximizes mutual information for compact, separable class manifolds and robust unknown rejection. |
| Hwang et al. [1276] | Address open-set panoptic segmentation using prototype learning to distinguish known and unknown objects, integrating metric learning in dense prediction. |
| Balasubramanian et al. [1278] | Focus on real-world detection of unknown traffic scenarios using ensemble diversity to improve uncertainty estimation and robustness. |
| Author(s) | Main Contribution |
|---|---|
| Salomon et al. [1285] | Apply metric learning to distinguish known from unknown classes in open-set face recognition with small galleries. |
| Jia and Chan [1284] | Incorporate margin-based constraints into feature learning to improve discriminability in OSR. |
| Jia and Chan [1283] | Learn robust representations through reconstruction of original images from augmented views to generalize to unknowns. |
| Yue et al. [1282] | Generate synthetic unknowns to refine decision boundaries using counterfactual reasoning, bridging OSR and zero-shot learning. |
| Cevikalp et al. [1281] | Model known classes as convex cones using a deep polyhedral conic classifier to enable open-set robustness. |
| Zhou et al. [1280] | Learn placeholder prototypes for potential unknown classes during training to dynamically adjust decision boundaries. |
| Jang and Kim [1279] | Introduce a teacher-explorer-student (TES) meta-learning framework where an explorer guides the student using challenging open-set samples. |
| Sun et al. [1287] | Propose a Conditional Gaussian Distribution Learning (CGDL) method to model class-conditional distributions for uncertainty-based OSR. |
| Perera et al. [1288] | Combine variational autoencoders (VAEs) with discriminative classifiers in a hybrid framework to separate known and unknown classes. |
| Ditria et al. [1289] | Present OpenGAN, which generates synthetic outliers to improve open-set detection by training the discriminator to reject unknowns. |
| Geng and Chen [1290] | Propose a collective decision framework that aggregates multiple classifiers to improve robustness in open-set scenarios. |
| Jang and Kim [1291] | Develop a One-vs-Rest deep probability model to estimate the probability of a sample belonging to an unknown class. |
| Zhang et al. [1292] | Explore hybrid models that combine discriminative and generative components for joint optimization of feature learning and OSR. |
| Author(s) | Main Contribution |
|---|---|
| Shao et al. [1293] | Developed Open-set Adversarial Defense by integrating OSR robustness into adversarial training, enabling resilience to both adversarial and unknown-class intrusions. |
| Yu et al. [1294] | Proposed a Multi-Task Curriculum Framework for semi-supervised OSR, balancing supervised and unsupervised learning to progressively handle unknown classes. |
| Miller et al. [1295] | Introduced Class Anchor Clustering, a distance-based loss to form compact class clusters while maximizing inter-class separation in the feature space. |
| Jia and Chan [1296] | Proposed MMF loss to enhance intra-class compactness and inter-class separation, improving discriminative feature learning in OSR. |
| Oliveira et al. [1300] | Extended OSR to semantic segmentation via Fully Convolutional Open Set Segmentation with uncertainty-aware pixel-wise rejection. |
| Yang et al. [1301] | Proposed S2OSC, a semi-supervised OSR framework combining consistency regularization and entropy minimization to exploit both labeled and unlabeled data. |
| Sun et al. [1302] | Used conditional probabilistic generative models to estimate likelihoods and reject unknowns based on uncertainty thresholds. |
| Yang et al. [1303] | Introduced Convolutional Prototype Network (CPN), learning prototypes for known classes and using distance-based rejection for OSR. |
| Dhamija et al. [1304] | Highlighted limitations in open set detection for object recognition and introduced an evaluation framework stressing real-world challenges. |
| Meyer and Drummond [1305] | Advocated for metric learning in robotic vision OSR, emphasizing active learning and incremental unknown class discovery. |
| Oza and Patel [1306] | Proposed a multi-task learning method using autoencoders for joint classification and reconstruction to detect outliers in OSR. |
| Yoshihashi et al. [1307] | Introduced CROSR, which combines classification and reconstruction to use reconstruction error as a cue for open set rejection. |
| Malalur and Jaakkola [1308] | Proposed an alignment-based matching network using metric learning for one-shot OSR with focus on feature alignment. |
| Schlachter et al. [1309] | Developed intra-class splitting to improve decision boundaries by subdividing known classes into more refined sub-clusters. |
| Imoscopi et al. [1310] | Focused on speaker identification in OSR using confidence thresholds in discriminatively trained neural networks. |
| Mundt et al. [1311] | Showed that uncertainty-based methods like softmax entropy and Monte Carlo dropout can rival generative models in OSR. |
| Author(s) | Main Contribution |
|---|---|
| Liu et al. [1313] | Proposed a decoupled learning framework for large-scale, long-tailed recognition that improves balance across head and tail classes while rejecting unknowns. |
| Perera and Patel [1314] | Explored deep transfer learning for novelty detection using pre-trained models to identify multiple unknown classes. |
| Xiong et al. [1315] | Presented a spatial divide-and-conquer framework for open-set to closed-set object counting, applying OSR concepts to counting tasks. |
| Yang et al. [1316] | Applied open-set recognition to human activity recognition using micro-Doppler radar signatures to distinguish known and unknown movements. |
| Oza and Patel [1317] | Introduced C2AE, a class-conditioned autoencoder that separates known and unknown samples using reconstruction error thresholds. |
| Liu et al. [1318] | Provided PAC-based theoretical guarantees for open category detection, offering bounds on detection error. |
| Venkataram et al. [1319] | Adapted CNNs for open-set text classification using prototype-based rejection mechanisms. |
| Hassen and Chan [1320] | Proposed a representation learning technique that models uncertainty to improve robustness to unknown data. |
| Shu et al. [1321] | Developed a framework for open-world classification, enabling discovery of new classes incrementally. |
| Dhamija et al. [1322] | Tackled overconfidence on unknowns by designing loss functions that reduce incorrect high-confidence predictions. |
| Zheng et al. [1324] | Investigated adversarial attacks in open-set systems and proposed defenses to enhance unknown-class detection reliability. |
| Author(s) | Main Contribution |
|---|---|
| Neal et al. [1325] | Introduced counterfactual image generation to simulate unknown classes and improve classifier robustness using synthetic outliers. |
| Rudd et al. [1326] | Proposed the Extreme Value Machine (EVM), leveraging EVT to model sample inclusion probabilities for open set recognition. |
| Vignotto and Engelke [1327] | Compared GPD and GEV classifiers for EVT-based modeling of tail distributions in open set recognition. |
| Cardoso et al. [1328] | Explored weightless neural networks using probabilistic memory structures, enabling dynamic adaptation to new data without retraining. |
| Rozsa et al. [1329] | Compared Softmax and Openmax under adversarial conditions, showing Openmax’s superior ability to reject uncertain samples. |
| Shu et al. [1330] | Developed DOC, a deep open classification framework for text, modeling semantic boundaries to detect unknown classes. |
| Ge et al. [1331] | Introduced Generative Openmax, synthesizing unknown class samples to improve multi-class open set classification. |
| Yu et al. [1332] | Used adversarial sample generation to train classifiers for distinguishing between known and unknown categories. |
| Vaze et al. [1231] | Claimed well-trained closed-set classifiers can inherently perform open set recognition without specific modifications. |
| Barcina-Blanco et al. [1232] | Provided a comprehensive literature review on OSL, highlighting its ties to out-of-distribution detection and uncertainty estimation. |
| iCGY96 (GitHub) [1233] | Curated a repository of papers and resources for open set learning research. |
8.2. Literature Review of Traditional Machine Learning Open Set Learning
8.2.1. Foundational Theoretical Frameworks of Open Set Recognition
8.2.2. Sparse and Hyperplane-Based Models for OSR
8.2.3. Support Vector and Nearest Neighbor Approaches
8.2.4. Domain Transfer and Zero-Shot Learning Integration
8.2.5. Application-Centric Advances: Face Recognition and Text Classification
8.2.6. Ensemble and Fusion-Based Enhancements
8.3. Mahalanobis Distance
8.4. Literature Review of Bayesian Formulation in Open Set Learning
8.4.1. Literature Review of Bayesian Neural Networks (BNNs) for OSL
8.4.2. Literature Review of Dirichlet-Based Uncertainty for OSL
8.4.3. Literature Review of Gaussian Processes (GPs) for OSL
8.4.4. Literature Review of Variational Autoencoders (VAEs) and Bayesian Generative Models
8.4.5. Critical Synthesis
8.5. Analysis of Bayesian Formulation in Open Set Learning
8.6. Gaussian Mixture Model (GMM)
8.7. Dirichlet Process Gaussian Mixture Model (DP-GMM)
8.8. Conjugate Normal-Inverse-Wishart (NIW) Distribution
8.9. Extreme Value Theory (EVT) Models
8.10. Bayesian Neural Networks (BNNs)
8.11. Support Vector Models
8.12. Support Vector Data Description
9. Zero-Shot Learning
9.1. Literature Review of Zero-Shot Learning
9.2. Analysis of Zero-Shot Learning
9.3. Energy-Based Models for Zero-Shot Learning
9.3.1. Various Loss Functions Used in Energy-Based Models for Zero-Shot Learning
9.3.2. Generative Constraints in Energy-Based Models for Zero-Shot Learning
9.3.3. Use of Inference in Energy-Based Models for Zero-Shot Learning
9.4. Meta-Learning Approaches for Zero-Shot Learning
9.4.1. Model-Agnostic Meta-Learning (MAML)
9.4.2. Meta-Embedding Strategy
9.4.3. Metric-Based Meta-Learning
9.4.4. Graph-Based Meta-Learning
9.4.5. Bayesian Meta-Learning
10. Neural Network Basics
10.1. Literature Review of Neural Network Basics
10.2. Perceptrons and Artificial Neurons
10.3. Feedforward Neural Networks
10.4. Activation Functions
10.5. Loss Functions
10.5.1. Loss Functions for Regression Tasks
10.5.1.1 Mean Squared Error (MSE/L2 Loss)
10.5.1.2 Mean Absolute Error (MAE/L1 Loss)
10.5.1.3 Huber Loss (Smooth Mean Absolute Error)
10.5.1.4 Python Code to Generate Figure 65 Illustrating Mean Squared Error (MSE/L2) Loss Function
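The listing for this figure is not included in this version. Below is a minimal sketch, assuming NumPy and Matplotlib (headless Agg backend) and an illustrative output filename, of a script that could produce a comparable MSE curve; the styling of the published figure is unknown.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt

def mse_loss(error):
    """Pointwise squared-error contribution: (y - y_hat)**2."""
    return np.square(error)

error = np.linspace(-3.0, 3.0, 300)  # prediction error y - y_hat
plt.figure(figsize=(6, 4))
plt.plot(error, mse_loss(error), color="tab:blue")
plt.xlabel(r"Prediction error $y - \hat{y}$")
plt.ylabel("Loss")
plt.title("Mean Squared Error (MSE / L2) Loss")
plt.grid(True)
plt.savefig("figure65_mse_loss.png", dpi=150)  # illustrative filename
plt.close()
```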


10.5.1.5 Python Code to Generate Figure 66 Illustrating Mean Absolute Error (MAE/L1) Loss Function
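The original listing is likewise missing; a minimal sketch of the MAE curve, under the same assumptions (NumPy, Matplotlib, illustrative filename):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

def mae_loss(error):
    """Pointwise absolute-error contribution: |y - y_hat|."""
    return np.abs(error)

error = np.linspace(-3.0, 3.0, 300)
plt.figure(figsize=(6, 4))
plt.plot(error, mae_loss(error), color="tab:orange")
plt.xlabel(r"Prediction error $y - \hat{y}$")
plt.ylabel("Loss")
plt.title("Mean Absolute Error (MAE / L1) Loss")
plt.grid(True)
plt.savefig("figure66_mae_loss.png", dpi=150)  # illustrative filename
plt.close()
```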


10.5.1.6 Python Code to Generate Figure 67 Illustrating Huber Loss Function
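A minimal sketch for the Huber loss figure, assuming the standard piecewise definition (quadratic inside the threshold delta, linear outside) and the same illustrative plotting conventions as above:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

def huber_loss(error, delta=1.0):
    """Quadratic for |error| <= delta, linear beyond; continuous at delta."""
    abs_err = np.abs(error)
    quadratic = 0.5 * abs_err ** 2
    linear = delta * (abs_err - 0.5 * delta)
    return np.where(abs_err <= delta, quadratic, linear)

error = np.linspace(-3.0, 3.0, 300)
plt.figure(figsize=(6, 4))
for delta in (0.5, 1.0, 2.0):  # show sensitivity to the threshold
    plt.plot(error, huber_loss(error, delta), label=f"delta = {delta}")
plt.xlabel(r"Prediction error $y - \hat{y}$")
plt.ylabel("Loss")
plt.title("Huber Loss (Smooth MAE)")
plt.legend()
plt.grid(True)
plt.savefig("figure67_huber_loss.png", dpi=150)  # illustrative filename
plt.close()
```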



10.5.1.7 Python Code to Generate Figure 68 Comparing the Loss Functions: Huber, MSE, MAE
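A minimal sketch of the three-way comparison, reusing the standard definitions of the three regression losses (assumed, since the original listing is absent):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

def mse_loss(error):
    return np.square(error)

def mae_loss(error):
    return np.abs(error)

def huber_loss(error, delta=1.0):
    abs_err = np.abs(error)
    return np.where(abs_err <= delta,
                    0.5 * abs_err ** 2,
                    delta * (abs_err - 0.5 * delta))

error = np.linspace(-3.0, 3.0, 300)
plt.figure(figsize=(6, 4))
plt.plot(error, mse_loss(error), label="MSE (L2)")
plt.plot(error, mae_loss(error), label="MAE (L1)")
plt.plot(error, huber_loss(error), label="Huber (delta = 1)")
plt.xlabel(r"Prediction error $y - \hat{y}$")
plt.ylabel("Loss")
plt.title("Comparison of Regression Losses")
plt.legend()
plt.grid(True)
plt.savefig("figure68_loss_comparison.png", dpi=150)  # illustrative filename
plt.close()
```

The plot makes the usual point visible: Huber matches MSE near zero error but grows only linearly for large errors, like MAE, which is why it is less sensitive to outliers.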



10.5.2. Loss Functions for Classification Tasks
10.5.2.1 Binary Cross-Entropy (Log Loss)
10.5.2.2 Categorical Cross-Entropy
10.5.2.3 Sparse Categorical Cross-Entropy
10.5.2.4 Kullback-Leibler Divergence (KL Divergence)
10.5.2.5 Python Code to Generate Figure 69 Illustrating Binary Cross-Entropy (BCE) Loss Function
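A minimal sketch for the binary cross-entropy figure, assuming the standard BCE definition with probability clipping for numerical stability (the original listing is not reproduced here):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

def bce_loss(p, y):
    """Binary cross-entropy for predicted probability p and label y in {0, 1}."""
    eps = 1e-12  # clip to avoid log(0)
    p = np.clip(p, eps, 1.0 - eps)
    return -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

p = np.linspace(0.001, 0.999, 400)
plt.figure(figsize=(6, 4))
plt.plot(p, bce_loss(p, 1), label="true label y = 1")
plt.plot(p, bce_loss(p, 0), label="true label y = 0")
plt.xlabel("Predicted probability p")
plt.ylabel("Loss")
plt.title("Binary Cross-Entropy (Log Loss)")
plt.legend()
plt.grid(True)
plt.savefig("figure69_bce_loss.png", dpi=150)  # illustrative filename
plt.close()
```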



10.5.2.6 Python Code to Generate Figure 70 Illustrating Binary Cross-Entropy Loss Surface



10.5.2.7 Python Code to Generate Figure 71 Illustrating Categorical Cross-Entropy Loss
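A minimal sketch for the categorical cross-entropy figure. Since only the true class contributes to the sum, the curve reduces to -log of the probability assigned to the true class; the helper name and filename below are illustrative:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

def categorical_ce(probs, onehot):
    """Categorical cross-entropy: -sum(onehot * log(probs))."""
    probs = np.clip(np.asarray(probs, dtype=float), 1e-12, 1.0)
    return -np.sum(np.asarray(onehot, dtype=float) * np.log(probs))

p_true = np.linspace(0.01, 1.0, 300)  # probability assigned to the true class
loss = [categorical_ce([p, (1 - p) / 2, (1 - p) / 2], [1, 0, 0]) for p in p_true]

plt.figure(figsize=(6, 4))
plt.plot(p_true, loss, color="tab:green")
plt.xlabel("Predicted probability of the true class")
plt.ylabel("Loss")
plt.title("Categorical Cross-Entropy Loss")
plt.grid(True)
plt.savefig("figure71_categorical_ce.png", dpi=150)  # illustrative filename
plt.close()
```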


10.5.2.8 Python Code to Generate Figure 72 Illustrating Categorical Cross-Entropy Loss



10.5.2.9 Python Code to Generate Figure 73 Illustrating Sparse Categorical Cross-Entropy Loss



10.5.2.10 Python Code to Generate Figure 74 Illustrating Surface Plot of Sparse Categorical Cross-Entropy Loss



10.5.2.11 Python Code to Generate Figure 75 Illustrating KL Divergence
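A minimal sketch for the KL divergence figure, assuming a discrete KL(P || Q) and, for the 1D curve, Bernoulli distributions against a fixed Bernoulli(0.5) reference (these choices are illustrative, as the original listing is absent):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

def kl_divergence(p, q):
    """KL(P || Q) for discrete distributions; terms with p_i = 0 contribute 0."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

ps = np.linspace(0.01, 0.99, 200)
kls = [kl_divergence([p, 1.0 - p], [0.5, 0.5]) for p in ps]

plt.figure(figsize=(6, 4))
plt.plot(ps, kls, color="tab:red")
plt.xlabel("Bernoulli parameter p of distribution P")
plt.ylabel("KL(P || Bernoulli(0.5))")
plt.title("Kullback-Leibler Divergence")
plt.grid(True)
plt.savefig("figure75_kl_divergence.png", dpi=150)  # illustrative filename
plt.close()
```

The curve is zero at p = 0.5 (the distributions coincide) and grows toward both endpoints, illustrating non-negativity and asymmetry of KL.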



10.5.2.12 Python Code to Generate Figure 76 Illustrating 2D Surface Plot of KL Divergence



10.5.3. Advanced and Specialized Loss Functions
10.5.3.1 Loss Functions for Generative Adversarial Networks (GANs)
10.5.3.1.1 Wasserstein Loss (Earth Mover’s Distance)
10.5.3.1.2 Least Squares GAN (LSGAN) Loss
10.5.3.1.3 Adversarial Loss (Standard GAN Loss)
10.5.3.2 Loss Functions for Siamese and Metric Learning
10.5.3.2.1 Contrastive Loss
10.5.3.2.2 Triplet Loss
10.5.3.2.3 Center Loss
10.5.3.3 Loss Functions for Style Transfer and Super-Resolution
10.5.3.3.1 Perceptual Loss
10.5.3.3.2 Style Loss
10.5.3.3.3 Total Variation (TV) Loss
10.5.3.4 Loss Functions for Uncertainty Estimation and Bayesian Deep Learning
10.5.3.4.1 Evidence Lower Bound (ELBO) Loss
10.5.3.4.2 Negative Log-Likelihood (NLL) Loss
10.5.3.5 Loss Functions for Domain Adaptation
10.5.3.5.1 Domain Adversarial Loss
10.5.3.5.2 Maximum Mean Discrepancy (MMD) Loss
10.5.3.6 Loss Functions for Object Detection and Segmentation
10.5.3.6.1 IoU Loss (Intersection over Union) and Its Variants (GIoU, DIoU, CIoU)
10.5.3.6.2 Dice Loss
10.5.3.6.3 Focal Loss
10.5.3.7 Loss Functions for Knowledge Distillation
10.5.3.8 Loss Functions for Reinforcement Learning
10.5.3.8.1 Hinge Loss
10.5.3.8.2 Value Loss
10.5.3.8.3 Policy Loss
10.5.3.8.4 Entropy Regularization
10.5.3.9 Loss Functions for Sparse and Structured Outputs
10.5.3.9.1 Kullback-Leibler (KL) Divergence Loss (as a Sparsity Constraint)
10.5.3.9.2 Structured Support Vector Machine (SSVM) Loss
10.5.3.9.3 Conditional Random Field (CRF) Loss
10.5.3.9.4 Connectionist Temporal Classification (CTC) Loss
10.5.3.9.5 Maximum Margin Markov Networks Loss
10.5.3.9.6 Listwise Ranking Losses
10.5.3.9.7 Pairwise Ranking Losses
10.5.3.9.8 Contrastive Divergence (CD) Loss
10.5.3.9.9 Persistent Contrastive Divergence (PCD) Loss
10.5.3.9.10 Hausdorff Distance Loss
10.5.3.10 Python Code to Generate Figure 77 Illustrating the 1D Wasserstein Distance Between Two Normal Distributions



10.5.3.11 Python Code to Generate Figure 78 Illustrating the Multidimensional Wasserstein Distance Between Two Distributions



10.5.3.12 Python Code to Generate Figure 79 Comparing Equal CDF, WGAN Critic, GAN Discriminator



10.5.3.13 Python Code to Generate Figure 80 Illustrating Least Squares GAN (LSGAN) Loss Functions



10.5.3.14 Python Code to Generate Figure 81 Illustrating Standard GAN (Adversarial) Loss Functions



10.5.3.15 Python Code to Generate Figure 82 Illustrating Contrastive Loss Functions
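A minimal sketch for the contrastive loss figure, assuming the standard Hadsell-style formulation (squared distance for similar pairs, squared hinge on a margin for dissimilar pairs); names and filename are illustrative:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

def contrastive_loss(d, y, margin=1.0):
    """y = 1 for similar pairs, y = 0 for dissimilar pairs; d = embedding distance."""
    similar = 0.5 * d ** 2
    dissimilar = 0.5 * np.maximum(0.0, margin - d) ** 2
    return y * similar + (1 - y) * dissimilar

d = np.linspace(0.0, 2.0, 300)
plt.figure(figsize=(6, 4))
plt.plot(d, contrastive_loss(d, 1), label="similar pair (y = 1)")
plt.plot(d, contrastive_loss(d, 0), label="dissimilar pair (y = 0)")
plt.xlabel("Embedding distance d")
plt.ylabel("Loss")
plt.title("Contrastive Loss (margin = 1)")
plt.legend()
plt.grid(True)
plt.savefig("figure82_contrastive_loss.png", dpi=150)  # illustrative filename
plt.close()
```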



10.5.3.16 Python Code to Generate Figure 83 Illustrating Contrastive Loss Landscapes



10.5.3.17 Python Code to Generate Figure 84 Illustrating 3D Contrastive Loss Landscapes



10.5.3.18 Python Code to Generate Figure 85 Illustrating Triplet Loss Function


10.5.3.19 Python Code to Generate Figure 86 Illustrating Triplet Loss Landscape



10.5.3.20 Python Code to Generate Figure 87 Illustrating Triplet Loss Contour Plot



10.5.3.21 Python Code to Generate Figure 88 Illustrating Center Loss Function



10.5.3.22 Python Code to Generate Figure 89 Illustrating Center Loss Visualization: Features vs Class Centers



10.5.3.23 Python Code to Generate Figure 90 Illustrating 3D Center Loss Surface


10.5.3.24 Python Code to Generate Figure 91 Illustrating Perceptual Loss Function


10.5.3.25 Python Code to Generate Figure 92 Illustrating Pixel-Wise MSE vs Perceptual Loss (Nonlinear Features)



10.5.3.26 Python Code to Generate Figure 93 Illustrating Style Loss Function in Neural Style Transfer



10.5.3.27 Python Code to Generate Figure 94 Illustrating Style Loss vs Scaling of Generated Features



10.5.3.28 Python Code to Generate Figure 95 Illustrating Total Variation (TV) Loss
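A minimal sketch for the total variation loss figure, assuming the anisotropic TV definition (sum of absolute neighbor differences) and an illustrative noisy-image experiment:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

def tv_loss(img):
    """Anisotropic total variation: sum of absolute neighbor differences."""
    img = np.asarray(img, dtype=float)
    return (np.abs(np.diff(img, axis=0)).sum()
            + np.abs(np.diff(img, axis=1)).sum())

rng = np.random.default_rng(0)
base = np.zeros((32, 32))  # a perfectly smooth image has TV = 0
sigmas = np.linspace(0.0, 1.0, 20)
tv_vals = [tv_loss(base + rng.normal(0.0, s, base.shape)) for s in sigmas]

plt.figure(figsize=(6, 4))
plt.plot(sigmas, tv_vals, marker="o")
plt.xlabel("Noise standard deviation")
plt.ylabel("TV loss")
plt.title("Total Variation (TV) Loss vs Noise Level")
plt.grid(True)
plt.savefig("figure95_tv_loss.png", dpi=150)  # illustrative filename
plt.close()
```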



10.5.3.29 Python Code to Generate Figure 96 Illustrating Total Variation Loss vs Noise Level



10.5.3.30 Python Code to Generate Figure 97 Illustrating Evidence Lower Bound (ELBO) Loss During Training


10.5.3.31 Python Code to Generate Figure 98 Illustrating Negative Log-Likelihood (NLL) Loss
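A minimal sketch for the NLL figure, plotting -log of the probability assigned to the observed outcome (the listing itself is not reproduced here; names and filename are illustrative):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

def nll_loss(p):
    """Negative log-likelihood of the probability assigned to the observed outcome."""
    return -np.log(np.clip(p, 1e-12, 1.0))

p = np.linspace(0.01, 1.0, 300)
plt.figure(figsize=(6, 4))
plt.plot(p, nll_loss(p), color="tab:purple")
plt.xlabel("Probability assigned to the observed outcome")
plt.ylabel("NLL loss")
plt.title("Negative Log-Likelihood (NLL) Loss")
plt.grid(True)
plt.savefig("figure98_nll_loss.png", dpi=150)  # illustrative filename
plt.close()
```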


10.5.3.32 Python Code to Generate Figure 99 Illustrating Domain Adversarial Loss During Training



10.5.3.33 Python Code to Generate Figure 100 Illustrating Maximum Mean Discrepancy (MMD) Loss During Training



10.5.3.34 Python Code to Generate Figure 101 Illustrating IoU Loss and Variants (GIoU Loss, DIoU Loss, CIoU Loss)



10.5.3.35 Python Code to Generate Figure 102 Illustrating Dice Loss During Training



10.5.3.36 Python Code to Generate Figure 103 Illustrating Focal Loss vs Predicted Probability
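A minimal sketch for the focal loss figure, assuming the standard focal form -(1 - p_t)^gamma * log(p_t), where p_t is the probability of the true class; gamma = 0 recovers ordinary cross-entropy:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

def focal_loss(p_t, gamma=2.0):
    """Focal loss on the true-class probability; down-weights easy examples."""
    p_t = np.clip(p_t, 1e-12, 1.0)
    return -((1.0 - p_t) ** gamma) * np.log(p_t)

p_t = np.linspace(0.01, 1.0, 300)
plt.figure(figsize=(6, 4))
for gamma in (0.0, 1.0, 2.0, 5.0):
    plt.plot(p_t, focal_loss(p_t, gamma), label=f"gamma = {gamma}")
plt.xlabel("Predicted probability of the true class $p_t$")
plt.ylabel("Loss")
plt.title("Focal Loss vs Predicted Probability")
plt.legend()
plt.grid(True)
plt.savefig("figure103_focal_loss.png", dpi=150)  # illustrative filename
plt.close()
```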


10.5.3.37 Python Code to Generate Figure 104 Illustrating Focal Loss vs Standard Binary Cross-Entropy


10.5.3.38 Python Code to Generate Figure 105 Illustrating Hinge Loss vs Prediction Score
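A minimal sketch for the hinge loss figure, assuming the standard max(0, 1 - y * score) form with labels in {-1, +1}; names and filename are illustrative:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

def hinge_loss(score, y=1):
    """Hinge loss max(0, 1 - y * score) for a label y in {-1, +1}."""
    return np.maximum(0.0, 1.0 - y * score)

score = np.linspace(-2.0, 3.0, 300)
plt.figure(figsize=(6, 4))
plt.plot(score, hinge_loss(score, y=1), label="true label y = +1")
plt.xlabel("Prediction score")
plt.ylabel("Loss")
plt.title("Hinge Loss vs Prediction Score")
plt.legend()
plt.grid(True)
plt.savefig("figure105_hinge_loss.png", dpi=150)  # illustrative filename
plt.close()
```

The loss is zero once the score clears the margin (y * score >= 1) and grows linearly with margin violation, which is why correctly classified points far from the boundary contribute nothing to the gradient.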



10.5.3.39 Python Code to Generate Figure 106 Illustrating Comparison of the Standard Hinge Loss and the Squared Hinge Loss



10.5.3.40 Python Code to Generate Figure 107 Illustrating Value Loss (MSE) for Different Targets


10.5.3.41 Python Code to Generate Figure 108 Illustrating Value Loss (MSE) vs Huber Loss



10.5.3.42 Python Code to Generate Figure 109 Illustrating Policy Loss vs Action Probability



10.5.3.43 Python Code to Generate Figure 110 Illustrating PPO Clipped Surrogate Objective



10.5.3.44 Python Code to Generate Figure 111 Illustrating PPO Loss Comparison



10.5.3.45 Python Code to Generate Figure 112 Illustrating PPO Clipped Surrogate Loss with Entropy Regularization


10.5.3.46 Python Code to Generate Figure 113 Comparing PPO Unclipped, PPO Clipped, PPO + Entropy



10.5.3.47 Python Code to Generate Figure 114 Illustrating Soft Actor-Critic (SAC) Policy Loss with Entropy Regularization


10.5.3.48 Python Code to Generate Figure 115 Illustrating KL Divergence Loss as Sparsity Constraint



10.5.3.49 Python Code to Generate Figure 116 Illustrating Total KL Sparsity Loss for 10 Hidden Units



10.5.3.50 Python Code to Generate Figure 117 Illustrating Total KL Sparsity Loss for 10 Hidden Units with Random Activations



10.5.3.51 Python Code to Generate Figure 118 Illustrating Structured SVM (SSVM) Hinge Loss



10.5.3.52 Python Code to Generate Figure 119 Illustrating Structured SVM Loss Surface



10.5.3.53 Python Code to Generate Figure 120 Illustrating Structured SVM Loss Contour


10.5.3.54 Python Code to Generate Figure 121 Illustrating Simplified CRF Loss vs Correct Label Score



10.5.3.55 Python Code to Generate Figure 122 Illustrating CRF Loss Surface (Single Timestep, 2 Labels)



10.5.3.56 Python Code to Generate Figure 123 Illustrating CRF Loss Contour (Single Timestep, 2 Labels)



10.5.3.57 Python Code to Generate Figure 124 Illustrating CTC Loss vs Probability of Correct Sequence



10.5.3.58 Python Code to Generate Figure 125 Illustrating Approximate CTC Loss vs Per-Timestep Probability for Different Sequence Lengths



10.5.3.59 Python Code to Generate Figure 126 Illustrating CTC Loss Surface vs Sequence Length and Per-Timestep Probability


10.5.3.60 Python Code to Generate Figure 127 Illustrating CTC Loss Contour vs Sequence Length and Per-Timestep Probability



10.5.3.61 Python Code to Generate Figure 128 Illustrating Maximum Margin Markov Networks (M³N) Loss



10.5.3.62 Python Code to Generate Figure 129 Illustrating M³N Loss Surface



10.5.3.63 Python Code to Generate Figure 130 Illustrating Contour Plot of M³N Loss



10.5.3.64 Python Code to Generate Figure 131 Illustrating Listwise Ranking Losses (ListNet vs ListMLE)




10.5.3.65 Python Code to Generate Figure 132 Illustrating Listwise Ranking Losses (3D Surfaces)




10.5.3.66 Python Code to Generate Figure 133 Illustrating Listwise Ranking Loss Contours



10.5.3.67 Python Code to Generate Figure 134 Illustrating Pairwise Ranking Losses



10.5.3.68 Python Code to Generate Figure 135 Illustrating Pairwise Ranking Loss Landscapes



10.5.3.69 Python Code to Generate Figure 136 Illustrating 3D Surfaces of Pairwise Ranking Losses



10.5.3.70 Python Code to Generate Figure 137 Illustrating Contrastive Divergence (CD) Loss vs Weight


10.5.3.71 Python Code to Generate Figure 138 Illustrating Contrastive Divergence (CD) Loss Landscape



10.5.3.72 Python Code to Generate Figure 139 Illustrating Contrastive Divergence (CD) Loss Landscape (3D)



10.5.3.73 Python Code to Generate Figure 140 Illustrating Persistent Contrastive Divergence (PCD) Loss Landscape



10.5.3.74 Python Code to Generate Figure 141 Illustrating Persistent Contrastive Divergence (PCD) Loss Landscape (3D)



10.5.3.75 Python Code to Generate Figure 142 Illustrating Hausdorff Distance Loss Visualization



10.5.3.76 Python Code to Generate Figure 143 Illustrating Hausdorff Distance Heatmap



11. Few-Shot Learning
11.1. Meta-Learning Formulation in Few-Shot Learning
11.2. Bayesian Methods in Few-Shot Learning
11.3. Prototypical Networks in Few-Shot Learning
11.4. Model-Agnostic Meta-Learning (MAML) in Few-Shot Learning
11.5. Metric-Based Learning in Few-Shot Learning
11.6. Bayesian Methods in Few-Shot Learning
12. Metric Learning
12.1. Large Margin Nearest Neighbors (LMNN) Approach to Metric Learning
12.2. Information-Theoretic Metric Learning (ITML) Framework Approach to Metric Learning
12.3. Deep Metric Learning
12.4. Normalized Temperature-Scaled Cross-Entropy Loss (NT-Xent) Approach to Metric Learning
13. Adversarial Learning
13.1. Fast Gradient Sign Method (FGSM) in Adversarial Learning
13.2. Projected Gradient Descent (PGD) Method in Adversarial Learning
13.3. Generative Approach in Adversarial Learning
13.4. Interpreting Generative Adversarial Networks Within the Framework of Energy-Based Models
14. Causal Inference in Deep Neural Networks
14.1. Structural Causal Model (SCM)
- Abduction: Infer the exogenous variables consistent with the observed data.
- Action: Modify the SCM by replacing the structural equation for X with the intervened value.
- Prediction: Solve the modified SCM for the counterfactual outcome of interest.
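The three counterfactual steps can be sketched on a toy linear SCM. The structural equations below (X := U_X, Y := 2X + U_Y) are hypothetical choices made for illustration, not taken from the source:

```python
# Toy SCM for a counterfactual query:  X := U_X,  Y := 2*X + U_Y

def abduction(x_obs, y_obs):
    """Step 1: infer the exogenous variables consistent with the observation."""
    u_x = x_obs               # X := U_X, so U_X equals the observed X
    u_y = y_obs - 2 * x_obs   # invert Y := 2*X + U_Y
    return u_x, u_y

def action_and_prediction(u_y, x_intervened):
    """Steps 2 and 3: replace the equation for X with the intervention do(X = x),
    then solve the modified SCM for Y while keeping the inferred U_Y fixed."""
    return 2 * x_intervened + u_y

# Observe (X = 1, Y = 3); ask what Y would have been under do(X = 2).
u_x, u_y = abduction(x_obs=1.0, y_obs=3.0)
y_cf = action_and_prediction(u_y, x_intervened=2.0)
print(y_cf)  # → 5.0
```

Fixing the inferred exogenous noise while intervening on X is what distinguishes a counterfactual from a plain interventional prediction.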
14.2. Counterfactual Reasoning in Causal Inference for Deep Neural Networks
14.3. Domain Adaptation in Causal Inference Within Deep Neural Networks
14.4. Invariant Risk Minimization (IRM) in Causal Inference for Deep Neural Networks
14.5. Empirical Risk Minimization (ERM) in Causal Inference for Deep Neural Networks
15. Network Architecture Search (NAS) in Deep Neural Networks
15.1. Evolutionary Algorithms in Network Architecture Search


15.2. Reinforcement Learning in Network Architecture Search

15.3. Policy Gradient Methods in Network Architecture Search

15.4. Neural Tangent Kernels (NTKs) in Network Architecture Search


16. Learning Paradigms
16.1. Unsupervised Learning
16.1.1. Literature Review of Unsupervised Learning
| Authors (Year) | Contribution |
|---|---|
| MacQueen (1967) [1035] | Introduced the k-means algorithm, a foundational clustering method that minimizes intra-cluster variance through iterative centroid updates and point reassignment based on Euclidean distance. |
| Dempster, Laird, and Rubin (1977) [1036] | Developed the Expectation-Maximization (EM) algorithm, a general framework for maximizing likelihood estimates in models with latent variables, forming the basis for Gaussian Mixture Models (GMMs). |
| Kohonen (1982) [1037] | Proposed self-organizing maps (SOMs), a neural-inspired model for competitive learning, which preserves topological relationships and has been instrumental in feature extraction. |
| Belkin and Niyogi (2003) [1038] | Introduced Laplacian Eigenmaps, which use a graph Laplacian to capture local geometric properties of data manifolds, providing a foundation for spectral clustering and nonlinear dimensionality reduction. |
| Tishby, Pereira, and Bialek (2000) [1039] | Proposed the Information Bottleneck (IB) method, optimizing mutual information to balance compression and predictive efficiency, influencing representation learning in autoencoders. |
| Hinton and Salakhutdinov (2006) [1040] | Demonstrated deep belief networks (DBNs), where layer-wise training of restricted Boltzmann machines (RBMs) enables hierarchical unsupervised representation learning. |
| Kingma and Welling (2013) [1041] | Developed variational autoencoders (VAEs), leveraging variational inference for probabilistic generative modeling of complex data distributions. |
| Goodfellow et al. (2020) [121] | Introduced generative adversarial networks (GANs), an adversarial framework where a generator and discriminator compete, leading to advances in synthetic data generation. |
| van der Maaten and Hinton (2008) [1042] | Developed t-distributed stochastic neighbor embedding (t-SNE), a probabilistic approach for high-dimensional data visualization that preserves local similarities. |
| Roweis and Saul (2000) [1043] | Introduced Locally Linear Embedding (LLE), a nonlinear dimensionality reduction technique that preserves local geometric relationships and is effective for manifold learning. |
| Bell and Sejnowski (1995) [1044] | Developed Independent Component Analysis (ICA), an information-theoretic method for blind source separation, leveraging higher-order statistics to extract statistically independent signals. |
16.1.2. Recent Literature Review of Unsupervised Learning
| Authors (Year) | Contribution |
|---|---|
| Parmar (2025) [1045] | Introduced an unsupervised learning framework for identifying unknown defects in semiconductor manufacturing, leveraging clustering and anomaly detection to improve quality control in industrial settings. |
| Raikwar and Gupta (2025) [1046] | Developed an AI-driven trust management framework for wireless ad hoc networks, combining unsupervised and supervised learning to classify network nodes based on trustworthiness and detect malicious activity. |
| Moustakidis et al. (2025) [1047] | Proposed deep learning autoencoders for FFT-based clustering in structural health monitoring, enabling automated detection of temporal damage evolution in composite materials. |
| Liu et al. (2025) [1048] | Designed an unsupervised feature selection algorithm using L2,p-norm feature reconstruction, reducing redundant features and improving clustering performance for high-dimensional datasets. |
| Zhou et al. (2025) [1049] | Applied unsupervised clustering techniques to metabolic profiles, identifying hidden metabolic subtypes associated with hypertriglyceridemia and disease risks, advancing personalized medicine. |
| Lin et al. (2025) [1050] | Developed an unsupervised learning-based risk control model for health insurance fund management, effectively identifying high-risk groups and fraudulent claims through anomaly detection. |
| Huang et al. (2025) [1051] | Proposed an unsupervised domain adaptation method for open-world object detection, enabling models to generalize across different environments without extensive labeled datasets. |
| Wu and Liu (2025) [1052] | Designed a VQ-VAE-2-based unsupervised detection algorithm for concrete crack identification, automating structural health monitoring and reducing manual inspection efforts. |
| Nagelli and Saleena (2025) [1053] | Developed an aspect-based sentiment analysis model using self-attention mechanisms, enabling multilingual sentiment analysis without labeled training data. |
| Ekanayake (2025) [1054] | Applied deep learning-based unsupervised learning for MRI reconstruction and super-resolution, reducing scan times while maintaining high image quality in medical imaging. |
16.1.3. Mathematical Analysis of Unsupervised Learning
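In general form, unsupervised learning can be posed as follows (a generic formulation, not specific to any one method): given unlabeled samples $\{x_i\}_{i=1}^{n}$ drawn i.i.d. from an unknown distribution $p(x)$, the goal is to fit a parametric model of the data's structure, for example by maximum likelihood or by minimizing reconstruction error through an encoder $f_\theta$ and decoder $g_\theta$:

$$\hat{\theta} = \arg\max_{\theta} \sum_{i=1}^{n} \log p_{\theta}(x_i) \quad \text{or} \quad \hat{\theta} = \arg\min_{\theta} \frac{1}{n} \sum_{i=1}^{n} \big\| x_i - g_{\theta}\big(f_{\theta}(x_i)\big) \big\|^2 .$$

Clustering, dimensionality reduction, and density estimation are all special cases of these two templates.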
16.1.4. Information Bottleneck (IB) Method
16.1.4.1 Literature Review of Information Bottleneck (IB) Method
| Authors (Year) | Contribution |
|---|---|
| Tishby et al. (1999) [1068] | Introduced the Information Bottleneck (IB) method, formulating an optimization problem that balances mutual information terms to extract relevant information from a random variable while minimizing redundancy. Developed an iterative variational algorithm for solving the IB problem, demonstrating its application in clustering and representation learning. |
| Chechik et al. (2003) [1069] | Extended the IB method to jointly Gaussian variables, deriving analytical solutions that connect IB with canonical correlation analysis and PCA. Provided a rigorous foundation for applying IB to real-world Gaussian data. |
| Chechik and Tishby (2002) [1070] | Developed a variant of the IB framework incorporating side information, enabling extraction of representations that retain information about one target while obfuscating another. Applied to privacy-preserving and fairness-aware machine learning. |
| Tishby and Zaslavsky (2015) [1071] | Proposed that deep neural network training follows an IB perspective, consisting of an initial fitting phase followed by a compression phase, explaining generalization through information-theoretic principles. |
| Saxe et al. (2019) [1072] | Critically examined the IB hypothesis in deep learning, showing that the presence of a compression phase depends on network architecture and activation functions, challenging the universality of IB in training dynamics. |
| Shwartz-Ziv and Tishby (2017) [1073] | Conducted empirical analysis of information flow in deep networks using information plane visualizations, providing evidence for compression in networks trained with SGD and reinforcing IB-based interpretations. |
| Noshad et al. (2019) [1074] | Developed a new mutual information estimator using dependence graphs to improve the scalability and accuracy of IB-based analyses in high-dimensional settings, addressing limitations of traditional estimators. |
| Goldfeld et al. (2018) [1075] | Provided refined mutual information estimation techniques for deep networks, offering rigorous mathematical justifications for compression effects in neural representations. |
| Geiger (2021) [1077] | Reviewed information plane analyses in neural classifiers, evaluating the strengths and weaknesses of IB interpretations, highlighting cases where IB fails to accurately characterize training dynamics. |
| Kawaguchi et al. (2023) [1078] | Analyzed generalization properties of neural networks under IB, linking information compression to generalization error bounds and establishing IB as a regularization mechanism for improved performance. |
16.1.4.2 Recent Literature Review of Information Bottleneck (IB) Method
| Authors (Year) | Contribution |
|---|---|
| Dardour et al. (2025) [1079] | Introduced a novel approach to enhance adversarial robustness in stochastic neural networks. By leveraging inter-separability and intra-concentration, their study demonstrated that IB constraints help neural networks learn more robust latent features, effectively mitigating adversarial perturbations. |
| Krinner et al. (2025) [1080] | Applied IB principles to reinforcement learning by designing state-space world models that accelerate learning efficiency. Their study showed that IB-based methods help an agent discard irrelevant environmental noise while retaining essential features, leading to improved exploration efficiency. |
| Yildirim et al. (2024) [1081] | Explored how IB constraints affect StyleGAN-based image editing. They demonstrated that GAN-based inversion techniques often suffer from excessive compression-induced detail loss, proposing refined inversion methods that better preserve fine-grained image features. |
| Yang et al. (2025) [1082] | Developed a cognitive-load-aware activation mechanism for large language models (LLMs), improving efficiency by dynamically activating only the necessary model parameters. Their study used IB principles to retain relevant contextual representations while discarding redundant computations, reducing computational overhead. |
| Liu et al. (2025) [1083] | Incorporated IB principles in a structure-aware Vision Mamba network for crack segmentation in infrastructure. Their method efficiently filters out redundant spatial information, enhancing computational efficiency and segmentation accuracy, making it crucial for real-time applications in structural health monitoring. |
| Stierle and Valtere (2025) [1084] | Applied IB theory to medical innovation, examining how information bottlenecks in regulatory and patent frameworks slow down gene therapy advancements. Their work analyzed how such bottlenecks in medical research and policy impede technological progress. |
| Chen et al. (2025) [1085] | Applied IB concepts to quantum computing, particularly in optimizing construction supply chains. Their work demonstrated that quantum models integrated with IB techniques efficiently compress relevant data while filtering out extraneous information, improving decision-making processes. |
| Yuan et al. (2025) [1086] | Extended IB applications to plant metabolomics by proposing a novel feature selection approach that retains highly informative metabolite interactions while discarding non-essential data. This method improved interpretability in plant metabolic studies. |
| Dey et al. (2025) [1087] | Utilized IB principles in spatio-temporal prediction models for NDVI (Normalized Difference Vegetation Index), which is crucial for rice crop yield forecasting. Their IB-augmented neural network improved prediction accuracy by filtering out irrelevant environmental variables. |
| Li (2025) [1088] | Applied IB principles in robotic path planning, developing an optimized method for navigation path extraction in mobile robots. Their approach eliminated irrelevant environmental noise while preserving crucial navigational data, improving robotic movement efficiency. |
16.1.4.3 Mathematical Analysis of Information Bottleneck (IB) Method
- $I(X;T)$ is the mutual information between the input $X$ and the compressed representation $T$, which measures the amount of information retained about $X$ in $T$.
- $I(T;Y)$ is the mutual information between $T$ and the target variable $Y$, ensuring that the compressed representation remains useful for predicting $Y$.
- $\beta$ is a Lagrange multiplier that controls the trade-off between compression and prediction accuracy.
- $Z(x,\beta)$ is a normalization constant ensuring that $p(t \mid x)$ is a valid probability distribution.
- $D_{\mathrm{KL}}\left[p(y \mid x) \,\|\, p(y \mid t)\right]$ is the Kullback-Leibler (KL) divergence between the posterior distributions $p(y \mid x)$ and $p(y \mid t)$, ensuring that $T$ retains relevant information about $Y$.
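These quantities enter the standard IB variational problem of Tishby et al. (1999): minimize over encoders $p(t \mid x)$ the functional

$$\mathcal{L}\big[p(t \mid x)\big] = I(X;T) - \beta\, I(T;Y),$$

whose self-consistent solution takes the exponential form

$$p(t \mid x) = \frac{p(t)}{Z(x,\beta)} \exp\!\Big(-\beta\, D_{\mathrm{KL}}\big[p(y \mid x) \,\|\, p(y \mid t)\big]\Big).$$

Small $\beta$ favors aggressive compression of $X$; large $\beta$ favors fidelity to $Y$.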
16.1.5. Restricted Boltzmann Machines (RBMs)
16.1.5.1 Literature Review of Restricted Boltzmann Machines (RBMs)
| Authors (Year) | Contribution |
|---|---|
| Smolensky (1986)[1090] | Introduced the concept of the Harmonium, providing the theoretical foundation for energy-based models and probabilistic representations in neural networks. |
| Hinton and Salakhutdinov (2006)[1040] | Demonstrated how RBMs could be stacked to form Deep Belief Networks (DBNs), enabling efficient unsupervised pretraining and improving deep learning architectures. |
| Carreira-Perpiñán and Hinton (2005)[1091] | Analyzed the Contrastive Divergence (CD) algorithm, providing insights into its convergence properties and limitations for RBM training. |
| Hinton (2012)[1092] | Provided a practical guide for training RBMs, detailing hyperparameter tuning, initialization strategies, and best practices. |
| Fischer and Igel (2014)[1093] | Offered a comprehensive introduction to RBMs, covering theoretical foundations, training methodologies, and practical applications. |
| Larochelle and Bengio (2008)[1094] | Introduced a discriminative variant of RBMs tailored for classification tasks, demonstrating their adaptability to supervised learning. |
| Salakhutdinov, Mnih, and Hinton (2007)[1095] | Applied RBMs to collaborative filtering, showing their effectiveness in recommender systems by capturing latent user-item interactions. |
| Coates, Lee, and Ng (2011)[1096] | Analyzed RBMs for unsupervised feature learning, demonstrating their ability to extract hierarchical representations from raw data. |
| Salakhutdinov and Hinton (2009)[1097] | Proposed the Replicated Softmax model, extending RBMs for modeling word counts in natural language processing tasks. |
| Adachi and Henderson (2015)[1098] | Investigated the use of quantum annealing for RBM training, exploring potential acceleration of learning using quantum computing techniques. |
16.1.5.2 Recent Literature Review of Restricted Boltzmann Machines (RBMs)
| Authors (Year) | Contribution |
|---|---|
| Salloum et al. (2024) [1099] | Compared classical RBMs with quantum-restricted Boltzmann machines for MNIST classification, demonstrating that quantum models exhibit superior performance in certain optimization scenarios. |
| Joudaki (2025) [1100] | Conducted a comprehensive literature review on RBMs and Deep Belief Networks (DBNs) for human action recognition, identifying key challenges such as overfitting and slow convergence. |
| Prat Pou et al. (2025) [1101] | Proposed an improved method for evaluating the partition function in RBMs using annealed importance sampling, which enhances accuracy in statistical physics applications. |
| Decelle et al. (2025) [1102] | Investigated the ability of RBMs to infer high-order dependencies in complex systems, particularly in protein interaction networks and spin glasses. |
| Savitha et al. (2025) [1103] | Integrated RBMs within DBNs for cardiovascular disease prediction, leveraging optimization techniques such as the Harris Hawks Search algorithm to improve diagnostic accuracy. |
| Béreux et al. (2025) [1104] | Developed an efficient training strategy for RBMs that accelerates convergence while maintaining strong generalization capabilities in large-scale machine learning problems. |
| Thériault et al. (2024) [1105] | Explored structured learning in RBMs within a teacher-student setting, demonstrating that incorporating structured priors enhances generalization beyond seen data. |
| Manimurugan et al. (2024) [1106] | Combined Bi-LSTM networks with RBMs for underwater object detection, showcasing the ability of RBMs to effectively capture spatial dependencies in sonar and optical imagery. |
| Hossain et al. (2025) [1107] | Benchmarked RBMs against classical and deep learning models for human activity recognition, highlighting their effectiveness in extracting latent features. |
| Qin et al. (2025) [1108] | Integrated RBMs with magnetic tunnel junctions for magnetic anomaly detection, demonstrating their potential in neuromorphic computing for energy-efficient AI systems. |
16.1.5.3 Mathematical Analysis of Restricted Boltzmann Machines (RBMs)
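An RBM defines a joint distribution over binary visible units $\mathbf{v}$ and hidden units $\mathbf{h}$ through an energy function (standard formulation, with visible biases $\mathbf{a}$, hidden biases $\mathbf{b}$, and weight matrix $W$):

$$E(\mathbf{v}, \mathbf{h}) = -\mathbf{a}^\top \mathbf{v} - \mathbf{b}^\top \mathbf{h} - \mathbf{v}^\top W \mathbf{h}, \qquad p(\mathbf{v}, \mathbf{h}) = \frac{e^{-E(\mathbf{v}, \mathbf{h})}}{Z},$$

where $Z = \sum_{\mathbf{v}, \mathbf{h}} e^{-E(\mathbf{v}, \mathbf{h})}$ is the partition function. Because the bipartite graph has no intra-layer connections, the conditionals factorize:

$$p(h_j = 1 \mid \mathbf{v}) = \sigma\Big(b_j + \sum_i v_i W_{ij}\Big), \qquad p(v_i = 1 \mid \mathbf{h}) = \sigma\Big(a_i + \sum_j W_{ij} h_j\Big),$$

with $\sigma(x) = 1/(1 + e^{-x})$. Contrastive Divergence (CD-1) approximates the likelihood gradient with one Gibbs step:

$$\Delta W_{ij} \propto \langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{recon}}.$$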

16.1.6. Deep Belief Networks (DBNs)
16.1.6.1 Literature Review of Deep Belief Networks (DBNs)
| Authors (Year) | Contribution |
|---|---|
| Hinton et al. (2006) [854] | Introduced a fast learning algorithm for DBNs using a greedy layer-wise pre-training strategy based on Restricted Boltzmann Machines (RBMs). Addressed the vanishing gradient problem and established DBNs as foundational deep learning architectures. |
| Lee et al. (2009) [1109] | Developed Convolutional Deep Belief Networks (CDBNs) by incorporating convolutional structures into DBNs, introducing local receptive fields and weight sharing for improved scalability in image and speech processing. |
| Mohamed et al. (2012) [1111] | Pioneered the application of DBNs for acoustic modeling in speech recognition, demonstrating superior performance over traditional Gaussian Mixture Models (GMMs) in conjunction with Hidden Markov Models (HMMs). |
| Zhang and Zhao (2017) [1113] | Applied DBNs for fault diagnosis in chemical processes, showing that DBNs effectively model dependencies in multivariate datasets and enhance fault detection accuracy. |
| Peng et al. (2019) [1112] | Developed a DBN-based health indicator construction framework for bearing fault diagnosis, enabling automatic extraction of degradation features from vibration signals for early failure detection. |
| Zhang et al. (2018) [1115] | Integrated DBNs with feature selection methods for predicting clinical outcomes in lung cancer patients, enhancing predictive accuracy and interpretability in medical prognosis. |
| Zhong et al. (2017) [1118] | Demonstrated that DBNs could learn meaningful representations even with limited training data, reinforcing their utility in scenarios with scarce labeled datasets. |
| Liu (2018) [1114] | Combined DBNs with the Autoregressive Integrated Moving Average (ARIMA) model for stock trend forecasting, leveraging DBNs’ pattern recognition capabilities with ARIMA’s time-series forecasting strengths. |
| Hoang and Kang (2018) [1116] | Developed a novel fault diagnosis framework by integrating DBNs with Dempster–Shafer evidence theory, enhancing fault detection accuracy through probabilistic reasoning. |
16.1.6.2 Recent Literature Review of Deep Belief Networks (DBNs)
| Authors (Year) | Contribution |
|---|---|
| Joudaki (2025) [1100] | Provides an extensive literature review on the theoretical foundations of DBNs and RBMs, emphasizing their role in human action recognition. Demonstrates their effectiveness in capturing complex patterns in human gestures and postures for applications such as motion tracking and gesture-based interface design. |
| Alzughaibi (2025) [1119] | Applies DBNs in pest detection, integrating them with a modified artificial hummingbird algorithm. Enhances pest classification accuracy using deep hierarchical feature extraction on large image datasets. |
| Savitha et al. (2025) [1103] | Employs DBNs for cardiovascular disease prediction, integrating them with the Harris Hawks Search optimization algorithm. Demonstrates improved feature selection and classification accuracy in medical diagnosis applications. |
| Tausani et al. (2025) [1120] | Investigates the top-down inference capabilities of DBNs compared to other deep generative models. Explores their interpretability and efficiency in artificial intelligence and cognitive computing tasks. |
| Kumar and Ravi (2025) [1121] | Introduces XDATE, an explainable deep learning framework that combines DBNs with auto-encoders. Uses the Garson Algorithm to enhance feature attribution, balancing accuracy and interpretability in classification tasks. |
| Alhajlah (2024) [1122] | Applies DBNs in medical image analysis for automated lesion detection in gastrointestinal endoscopic images. Integrates DBNs with a genetic algorithm-based segmentation technique to enhance diagnostic precision and reduce false positives. |
| Hossain et al. (2025) [1107] | Benchmarks DBNs against classical and deep learning models for human activity recognition. Demonstrates DBNs’ robustness and adaptability, particularly in small and medium-sized datasets. |
| Pavithra et al. (2025) [1123] | Develops a hybrid RNN-DBN model for IoT attack detection, capturing temporal dependencies in network traffic for anomaly detection and threat mitigation. |
| Bhadane and Verma (2024) [1124] | Explores DBNs for personality trait classification, comparing their performance with CNNs and RNNs. Highlights the advantages of DBNs in processing high-dimensional psychological datasets. |
| Keivanimehr and Akbari (2025) [1125] | Investigates DBNs for edge computing applications in cardiovascular disease monitoring. Discusses feasibility in TinyML environments for computational efficiency and real-time processing. |
16.1.6.3 Mathematical Analysis of Deep Belief Networks (DBNs)
- $p(\mathbf{v} \mid \mathbf{h}^{(1)})$ represents the conditional distribution of the visible layer given the first hidden layer.
- $p(\mathbf{h}^{(k)} \mid \mathbf{h}^{(k+1)})$ represents the conditional dependency between consecutive hidden layers.
- $p(\mathbf{h}^{(L-1)}, \mathbf{h}^{(L)})$ represents the top-layer prior, which is modeled as a Restricted Boltzmann Machine (RBM).
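These factors compose the standard DBN joint distribution (Hinton et al., 2006):

$$p\big(\mathbf{v}, \mathbf{h}^{(1)}, \ldots, \mathbf{h}^{(L)}\big) = p\big(\mathbf{v} \mid \mathbf{h}^{(1)}\big) \left[ \prod_{k=1}^{L-2} p\big(\mathbf{h}^{(k)} \mid \mathbf{h}^{(k+1)}\big) \right] p\big(\mathbf{h}^{(L-1)}, \mathbf{h}^{(L)}\big).$$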

- Train the first RBM on the input $\mathbf{v} = \mathbf{x}$ to obtain hidden activations $\mathbf{h}^{(1)} \sim p(\mathbf{h}^{(1)} \mid \mathbf{v})$.
- Use the hidden activations $\mathbf{h}^{(1)}$ as input for training the second RBM, yielding $\mathbf{h}^{(2)} \sim p(\mathbf{h}^{(2)} \mid \mathbf{h}^{(1)})$.
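The greedy pretraining steps above can be sketched in a short, self-contained NumPy script. This is a minimal illustration with toy binary data and arbitrary layer sizes (20-10-5), not any paper's original experimental setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Bernoulli RBM trained with one-step Contrastive Divergence (CD-1)."""
    def __init__(self, n_visible, n_hidden, lr=0.1):
        self.W = rng.normal(0, 0.01, size=(n_visible, n_hidden))
        self.b = np.zeros(n_visible)   # visible biases
        self.c = np.zeros(n_hidden)    # hidden biases
        self.lr = lr

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.c)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b)

    def cd1_step(self, v0):
        # Positive phase: sample hidden units from the data
        ph0 = self.hidden_probs(v0)
        h0 = (rng.random(ph0.shape) < ph0).astype(float)
        # Negative phase: one Gibbs step back to a reconstruction
        pv1 = self.visible_probs(h0)
        ph1 = self.hidden_probs(pv1)
        # Batch-averaged CD-1 updates: <vh>_data - <vh>_recon
        n = v0.shape[0]
        self.W += self.lr * (v0.T @ ph0 - pv1.T @ ph1) / n
        self.b += self.lr * (v0 - pv1).mean(axis=0)
        self.c += self.lr * (ph0 - ph1).mean(axis=0)

# Greedy layer-wise pretraining of a two-layer DBN on toy binary data.
X = (rng.random((200, 20)) < 0.3).astype(float)

rbm1 = RBM(n_visible=20, n_hidden=10)
for _ in range(50):
    rbm1.cd1_step(X)
H1 = rbm1.hidden_probs(X)      # first-layer activations become the next input

rbm2 = RBM(n_visible=10, n_hidden=5)
for _ in range(50):
    rbm2.cd1_step(H1)
H2 = rbm2.hidden_probs(H1)     # second-layer activations

print(H2.shape)  # (200, 5)
```

Each RBM is trained in isolation on the activations of the layer below it, which is exactly what makes the procedure "greedy": no gradient ever flows through the whole stack during pretraining.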
16.1.7. t-Distributed Stochastic Neighbor Embedding (t-SNE)
16.1.7.1 Literature Review of t-Distributed Stochastic Neighbor Embedding
| Authors (Year) | Contribution |
|---|---|
| van der Maaten and Hinton (2008) [1042] | Introduced t-SNE as an extension of Stochastic Neighbor Embedding (SNE) by incorporating a Student’s t-distribution to address the crowding problem. This modification improved the preservation of both local and global structures and formulated t-SNE as an optimization problem minimizing Kullback-Leibler divergence. |
| Kobak and Berens (2019) [1126] | Developed "opt-SNE," an automated parameter selection framework for fine-tuning perplexity and learning rate in t-SNE. Their work demonstrated that proper parameter selection significantly affects embedding quality and reduces misleading visualizations in single-cell transcriptomics. |
| Belkina et al. (2019) [1127] | Proposed an optimized pipeline for selecting t-SNE parameters, focusing on reproducibility and interpretability in large-scale biological datasets. Their work emphasized the importance of standardizing parameter selection to improve clustering outcomes in t-SNE applications. |
| Linderman and Steinerberger (2019) [1128] | Provided a mathematical proof that t-SNE reliably recovers well-separated clusters under specific conditions, bridging the gap between empirical observations and theoretical guarantees in clustering applications. |
| Amorim and Mirkin (2012) [1129] | Investigated the role of distance metrics in t-SNE clustering, proposing the use of Minkowski metrics and feature weighting to refine cluster separation. Their work highlighted how different distance functions influence embedding outcomes. |
| Wattenberg et al. (2016) [1130] | Analyzed interpretational pitfalls in t-SNE, cautioning against misinterpretation of distances in low-dimensional embeddings. Their study outlined best practices for applying t-SNE in real-world data visualization tasks. |
| Pezzotti et al. (2016) [1131] | Developed a real-time, user-steerable t-SNE method that enabled progressive visual analytics. Their approach reduced computational overhead and allowed for interactive exploration of embeddings in large-scale applications. |
| Kobak and Linderman (2021) [1132] | Investigated initialization strategies for t-SNE and UMAP, demonstrating that different initialization schemes lead to vastly different embeddings. Their study emphasized the importance of careful initialization to preserve global data structures. |
| Becht et al. (2019) [1133] | Introduced UMAP as an alternative to t-SNE, showing that UMAP provides improved scalability and better preservation of global structures while requiring significantly less computational time. |
| Moon et al. (2019) [1134] | Proposed PHATE as a dimensionality reduction technique designed to capture both local and global structures more effectively than t-SNE. Their study demonstrated PHATE’s superiority in visualizing continuous biological trajectories, such as cell differentiation processes. |
16.1.7.2 Recent Literature Review of t-Distributed Stochastic Neighbor Embedding
| Authors (Year) | Contribution |
|---|---|
| Rivera and Deniega (2025) [1135] | Demonstrated the efficacy of t-SNE in automating flow cytometry gating, improving cell population classification using clustering techniques such as DBSCAN and PCA. |
| Chang (2025) [1136] | Provided a survey of dimensionality reduction techniques, analyzing the strengths and weaknesses of t-SNE compared to PCA and UMAP, emphasizing its performance on non-linear data distributions. |
| Chern et al. (2025) [1137] | Applied t-SNE for visualizing metal defect classification in YOLO-based deep learning models, enhancing interpretability in industrial defect detection. |
| Li et al. (2025) [1138] | Utilized t-SNE for olfactory sensor data analysis in detecting aflatoxin B1 contamination in wheat, demonstrating its utility in chemical data visualization. |
| Singh and Singh (2025) [1139] | Developed a hybrid medical image retrieval approach by integrating deep learning features with t-SNE, improving clustering and accuracy in gastric image retrieval systems. |
| Sun et al. (2025) [1142] | Investigated t-SNE’s application in biomechanics, reducing the dimensionality of electromyography (EMG) data for improved muscle fatigue classification. |
| Su et al. (2025) [1143] | Incorporated t-SNE in seismic fragility analysis of earth-rock dams, enhancing predictive accuracy when integrated into a deep residual shrinkage network. |
| Yousif and Al-Sarray (2025) [1144] | Combined t-SNE with spectral clustering via convex optimization for breast cancer gene classification, achieving superior clustering performance over conventional methods. |
| Park et al. (2025) [1145] | Assessed the use of t-SNE in flow cytometry for hematologic malignancies, highlighting its superiority in preserving local structures compared to UMAP. |
| Qiao et al. (2025) [1146] | Applied t-SNE to analyze cancer-associated fibroblasts (CAFs) in pancreatic ductal adenocarcinoma, identifying gene expression subclusters for patient stratification. |
| Su et al. (2025) [1143] | Employed t-SNE for damage quantification in aircraft structures, integrating it into a deep learning framework to enhance structural health monitoring. |
16.1.7.3 Mathematical Analysis of t-Distributed Stochastic Neighbor Embedding
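In standard form (van der Maaten and Hinton, 2008), t-SNE converts pairwise distances between high-dimensional points $x_i$ into conditional probabilities under a Gaussian kernel, with $\sigma_i$ set per point from a user-chosen perplexity:

$$p_{j \mid i} = \frac{\exp\big(-\|x_i - x_j\|^2 / 2\sigma_i^2\big)}{\sum_{k \neq i} \exp\big(-\|x_i - x_k\|^2 / 2\sigma_i^2\big)}, \qquad p_{ij} = \frac{p_{j \mid i} + p_{i \mid j}}{2n}.$$

In the low-dimensional embedding $\{y_i\}$, similarities use a heavy-tailed Student's $t$-distribution with one degree of freedom, which alleviates the crowding problem:

$$q_{ij} = \frac{\big(1 + \|y_i - y_j\|^2\big)^{-1}}{\sum_{k \neq l} \big(1 + \|y_k - y_l\|^2\big)^{-1}}.$$

The embedding minimizes the Kullback-Leibler divergence between the two distributions by gradient descent:

$$C = \mathrm{KL}(P \,\|\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}, \qquad \frac{\partial C}{\partial y_i} = 4 \sum_{j} (p_{ij} - q_{ij})\, (y_i - y_j)\, \big(1 + \|y_i - y_j\|^2\big)^{-1}.$$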


16.1.8. Locally Linear Embedding (LLE)
16.1.8.1 Literature Review of Locally Linear Embedding (LLE)
| Authors (Year) | Contribution |
|---|---|
| Roweis and Saul (2000) [1043] | Introduced Locally Linear Embedding (LLE), a nonlinear dimensionality reduction method that preserves local geometric properties while embedding high-dimensional data into a lower-dimensional space. LLE represents each data point as a linear combination of its nearest neighbors and determines an embedding that best preserves these local reconstructions. |
| Saul and Roweis (2000) [1147] | Provided an in-depth mathematical exposition of the LLE algorithm, detailing its optimization formulation and demonstrating its effectiveness in uncovering nonlinear manifold structures without requiring explicit parametrization of the data manifold. |
| Polito and Perona (2001) [1148] | Extended LLE to clustering and dimensionality reduction, showing that LLE naturally groups data points based on intrinsic geometric properties, allowing for soft clustering useful in vision tasks. |
| Zhang and Zha (2004) [1150] | Proposed Local Tangent Space Alignment (LTSA), which aligns local tangent spaces rather than preserving local linear relationships, addressing LLE’s sensitivity to variations in neighborhood density and improving global manifold reconstruction. |
| Donoho and Grimes (2003) [1151] | Introduced Hessian Eigenmaps, a variation of LLE utilizing Hessian-based quadratic forms to capture local curvature, reducing distortion in embeddings for data with significant variations in local density. |
| Zhang and Wang (2006) [1152] | Developed Modified Locally Linear Embedding (MLLE), incorporating multiple weights in each neighborhood to improve numerical stability and robustness in low-dimensional representations. |
| Liang (2005) [1153] | Applied LLE to semi-supervised learning in natural language processing, leveraging the geometric structure of unlabeled data to discover meaningful feature representations. |
| Coates and Ng (2012) [1154] | Compared LLE with other unsupervised learning techniques, such as K-means clustering, highlighting LLE’s strengths and weaknesses in automatic feature extraction and representation learning. |
| Hyvärinen and Oja (2000) [1155] | Explored Independent Component Analysis (ICA) as an alternative method for structured representation learning, contrasting ICA’s focus on statistical independence with LLE’s preservation of local geometric relationships. |
| Lee et al. (2006) [1156] | Developed efficient sparse coding algorithms, discussing differences between sparse coding techniques and manifold learning approaches like LLE, emphasizing the advantages of sparsity constraints in generating interpretable features. |
16.1.8.2 Recent Literature Review of Locally Linear Embedding (LLE)
| Authors (Year) | Contribution |
|---|---|
| Yang et al. (2025) [1157] | Utilized LLE in international tourism competitiveness analysis, integrating it with entropy-TOPSIS and GRA models to enhance ranking systems for city-level data. |
| Wang et al. (2025) [1158] | Introduced a hybrid model combining LLE with Transformer and LightGBM to predict thermal conductivity of natural rock materials, improving feature selection in geothermal energy applications. |
| Jin et al. (2025) [1159] | Proposed Neighbor-Adapted LLE (NALLE) for SAR image processing, enabling zero-shot learning and improving maritime object classification in remote sensing. |
| Li et al. (2024) [1160] | Developed a novel variation of LLE using the Ali Baba and The Forty Thieves Algorithm, enhancing computational efficiency while preserving data manifold accuracy. |
| Jafari et al. (2025) [1161] | Conducted an extensive review of LLE and its variants, focusing on feature extraction, non-linear dimensionality reduction, and data visualization in biological and big data applications. |
| Zhou et al. (2025) [1162] | Applied LLE in nondestructive testing (NDT) of thermal barrier coatings, demonstrating improved feature extraction from terahertz imaging data for stress detection. |
| Dou et al. (2024) [1164] | Proposed an LLE-based method for fault detection in high-speed train traction systems, facilitating early system failure detection and enhancing train safety. |
| Bagherzadeh et al. (2021) [1165] | Combined LLE with K-means clustering for test case prioritization in software testing, improving defect detection efficiency and reducing debugging costs. |
| Liu et al. (2025) [1166] | Developed an intelligent recognition algorithm for analyzing substation secondary wiring diagrams using a denoised LLE (D-LLE) variant, enhancing power system automation and maintenance. |
16.1.8.3 Mathematical Analysis of Locally Linear Embedding (LLE)
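LLE (Roweis and Saul, 2000) proceeds in two optimization stages. First, each point $x_i$ is reconstructed from its $k$ nearest neighbors $\mathcal{N}(x_i)$ by weights minimizing

$$\varepsilon(W) = \sum_i \Big\| x_i - \sum_j W_{ij}\, x_j \Big\|^2, \qquad \text{s.t.} \quad \sum_j W_{ij} = 1, \quad W_{ij} = 0 \text{ unless } x_j \in \mathcal{N}(x_i).$$

Second, holding the weights fixed, the low-dimensional coordinates $Y$ minimize the same reconstruction cost:

$$\Phi(Y) = \sum_i \Big\| y_i - \sum_j W_{ij}\, y_j \Big\|^2 = \mathrm{tr}\big(Y^\top M Y\big), \qquad M = (I - W)^\top (I - W),$$

subject to the centering and unit-covariance constraints $\sum_i y_i = 0$ and $\frac{1}{n} Y^\top Y = I$. The solution is given by the $d$ bottom eigenvectors of $M$, discarding the constant eigenvector associated with the zero eigenvalue.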
16.1.8.4 Python Code to Generate Figure 154 Illustrating the Locally Linear Embedding (LLE) Applied to a Swiss Roll Dataset
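A figure of this kind can be produced with a minimal NumPy-only sketch of LLE applied to a synthetically sampled Swiss roll. The neighborhood size, regularization constant, and sample count below are illustrative choices, not parameters taken from the original listing:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_swiss_roll(n=400):
    """Sample n points from a 3-D Swiss roll; t parametrizes the spiral."""
    t = 1.5 * np.pi * (1 + 2 * rng.random(n))
    height = 21 * rng.random(n)
    X = np.column_stack([t * np.cos(t), height, t * np.sin(t)])
    return X, t

def lle(X, n_neighbors=12, n_components=2, reg=1e-3):
    """Basic Locally Linear Embedding (Roweis & Saul, 2000)."""
    n = X.shape[0]
    # 1. k nearest neighbors (excluding the point itself)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)
    nbrs = np.argsort(d2, axis=1)[:, :n_neighbors]
    # 2. Reconstruction weights: solve the regularized local Gram systems
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[nbrs[i]] - X[i]                          # centered neighbors
        C = Z @ Z.T
        C += reg * np.trace(C) * np.eye(n_neighbors)   # ridge for stability
        w = np.linalg.solve(C, np.ones(n_neighbors))
        W[i, nbrs[i]] = w / w.sum()                    # enforce sum-to-one
    # 3. Embedding: bottom eigenvectors of M = (I - W)^T (I - W)
    M = (np.eye(n) - W).T @ (np.eye(n) - W)
    vals, vecs = np.linalg.eigh(M)
    return vecs[:, 1:n_components + 1]                 # skip constant eigenvector

X, t = make_swiss_roll(400)
Y = lle(X)
print(Y.shape)  # (400, 2)
```

Plotting the two embedding columns (e.g., a matplotlib scatter of `Y[:, 0]` against `Y[:, 1]` colored by the roll parameter `t`) shows the spiral unrolled into a flat two-dimensional strip, which is the visual signature of a successful LLE embedding.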



16.1.9. Independent Component Analysis (ICA)
16.1.9.1 Literature Review of Independent Component Analysis (ICA)
| Authors (Year) | Contribution |
|---|---|
| Comon (1994) [1167] | Established the theoretical framework of ICA, emphasizing statistical independence over uncorrelatedness for blind source separation. Introduced mutual information minimization as a means to achieve independence and linked ICA with higher-order statistics, particularly kurtosis. |
| Jutten and Herault (1991) [1168] | Developed an adaptive learning algorithm for ICA using neuromorphic principles. Introduced nonlinear functions to iteratively adjust the weight matrix based on higher-order statistical moments. This work laid the foundation for iterative computational approaches in ICA. |
| Hyvärinen and Oja (1997) [1169] | Proposed the FastICA algorithm, a fixed-point method maximizing non-Gaussianity using negentropy approximation. Improved convergence speed and incorporated PCA-based whitening for signal decorrelation. |
| Bell and Sejnowski (1995) [1044] | Developed the Infomax algorithm using maximum likelihood estimation, linking ICA to entropy maximization in neural networks. Provided a biological perspective on ICA and its role in sensory processing. |
| Cardoso and Souloumiac (1993) [1170] | Introduced the JADE algorithm, employing joint approximate diagonalization of eigenmatrices to separate independent sources. Used fourth-order cumulants to achieve robust and efficient source separation. |
| Amari et al. (1995) [1171] | Developed an ICA algorithm based on natural gradient learning, utilizing a Riemannian metric for efficient optimization. Provided an information-geometric perspective, improving convergence speed. |
| Lee et al. (1999) [1172] | Extended the Infomax algorithm to handle both sub-Gaussian and super-Gaussian source distributions using a nonlinear adaptive function. Enhanced ICA applicability to biomedical signals with mixed statistical distributions. |
| Pham and Garat (1997) [1173] | Introduced a quasi-maximum likelihood estimation (QMLE) approach to ICA, formulating it as an optimization problem within statistical estimation theory. Improved robustness in noisy and low-sample data environments. |
| Højen-Sørensen et al. (2002) [1174] | Proposed a variational Bayesian framework for ICA using mean-field approximations. Enabled ICA to incorporate uncertainty quantification and handle missing data, extending its robustness. |
| Stone (2004) [1175] | Authored a comprehensive textbook on ICA, detailing its mathematical principles, mutual information, higher-order statistics, and applications in signal processing and machine learning. Provided a systematic exploration of ICA’s theoretical and practical aspects. |
16.1.9.2 Recent Literature Review of Independent Component Analysis (ICA)
| Authors (Year) | Contribution |
|---|---|
| Behzadfar et al. (2025) [1176] | Proposed a multi-frequency ICA-based approach for fMRI data processing, enhancing component extraction and eliminating non-gray matter signals for improved frequency-based brain imaging accuracy. |
| Eierud et al. (2025) [1177] | Developed the NeuroMark PET ICA framework, enabling ICA decomposition of whole-brain PET signals into networks to construct multivariate molecular imaging brain atlases. |
| Wang et al. (2025) [1178] | Applied ICA to analyze terrestrial water storage anomaly (TWSA) trends in the Yangtze River Basin, identifying independent spatial trends and improving hydrological cycle assessments. |
| Heurtebise et al. (2025) [1179] | Utilized ICA to stabilize estimators in hydrological dataset analysis, enhancing the reliability of multivariate mutual information measurements. |
| Ouyang and Li (2025) [1180] | Integrated ICA with PCA for semi-automated EEG preprocessing, facilitating artifact removal and improving signal quality in large-scale EEG studies. |
| Zhang and Luck (2025) [1181] | Investigated ICA-based artifact correction in brain-computer interfaces, demonstrating significant improvements in SVM-based EEG decoding accuracy. |
| Kirsten and Süssmuth (2025) [1182] | Applied ICA to financial time-series data, filtering noise and identifying independent market-driving factors, enhancing cryptocurrency price prediction when combined with ARIMA modeling. |
| Jung et al. (2025) [1183] | Developed a hybrid fault detection system integrating ICA with auto-associative kernel regression (AAKR) for power plant monitoring, improving anomaly detection and predictive maintenance. |
| Wang et al. (2025) [1184] | Implemented ICA for noise filtering in passive acoustic localization, significantly enhancing underwater object detection through improved signal clarity. |
| Luo et al. (2025) [1185] | Applied ICA in brain-computer interfaces (BCIs) to eliminate electrical noise and enhance neural signal transmission, improving the reliability of noninvasive BCIs. |
16.1.9.3 Mathematical Analysis of Independent Component Analysis (ICA)
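In the standard noiseless ICA model, observed signals $x \in \mathbb{R}^m$ are linear mixtures of statistically independent, non-Gaussian sources $s$:

$$x = A s, \qquad \hat{s} = W x,$$

where the mixing matrix $A$ is unknown and an unmixing matrix $W \approx A^{-1}$ is estimated (up to permutation and scaling of the sources). FastICA (Hyvärinen and Oja, 1997) first whitens the data to $z$ and then maximizes non-Gaussianity of each projection via a negentropy approximation, using the fixed-point iteration

$$w^{+} = \mathbb{E}\big[z\, g(w^\top z)\big] - \mathbb{E}\big[g'(w^\top z)\big]\, w, \qquad w \leftarrow w^{+} / \|w^{+}\|,$$

with a smooth nonlinearity such as $g(u) = \tanh(u)$, followed by decorrelation of the rows of $W$.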
16.1.9.4 Python Code to Generate Figure 155 Illustrating the Independent Component Analysis (ICA) of Mixed Signals
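A figure of this kind can be produced with a minimal NumPy-only FastICA sketch that separates two synthetic mixed signals. The source waveforms, mixing matrix, and iteration counts below are illustrative choices, not those of the original listing:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two independent sources: a sine wave and a square wave, slightly noisy
n = 2000
time = np.linspace(0, 8, n)
S = np.column_stack([np.sin(2 * time), np.sign(np.sin(3 * time))])
S += 0.02 * rng.normal(size=S.shape)
A = np.array([[1.0, 0.5], [0.5, 1.0]])   # mixing matrix (assumed unknown)
X = S @ A.T                              # observed mixtures

def fastica(X, n_iter=200, tol=1e-6):
    """Symmetric FastICA with a tanh nonlinearity (Hyvarinen & Oja, 1997)."""
    # 1. Center and whiten the mixtures
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    d, E = np.linalg.eigh(cov)
    K = E @ np.diag(d ** -0.5) @ E.T     # whitening matrix cov^{-1/2}
    Z = Xc @ K.T
    m = Z.shape[1]
    W = rng.normal(size=(m, m))
    for _ in range(n_iter):
        # 2. Fixed-point update: E[z g(w'z)] - E[g'(w'z)] w, with g = tanh
        G = np.tanh(Z @ W.T)
        gprime = 1.0 - G ** 2
        W_new = (G.T @ Z) / len(Z) - np.diag(gprime.mean(axis=0)) @ W
        # 3. Symmetric decorrelation: W <- (W W^T)^{-1/2} W  (via SVD)
        u, s, vt = np.linalg.svd(W_new)
        W_new = u @ vt
        # Converged when each new row aligns (up to sign) with the old one
        if np.max(np.abs(np.abs(np.diag(W_new @ W.T)) - 1.0)) < tol:
            W = W_new
            break
        W = W_new
    return Z @ W.T                       # estimated independent components

S_hat = fastica(X)
print(S_hat.shape)  # (2000, 2)
```

Plotting the columns of `X` and `S_hat` side by side shows the sine and square waves recovered from their mixtures, up to the sign and ordering ambiguity inherent to ICA.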



16.2. Supervised Learning
16.2.1. Literature Review of Supervised Learning
| Authors (Year) | Contribution |
|---|---|
| Vapnik (1995) [143] | Introduced statistical learning theory, including the Vapnik-Chervonenkis (VC) dimension, structural risk minimization (SRM), and support vector machines (SVMs). Established the kernel trick for computing inner products in high-dimensional feature spaces. |
| Bishop (2006) [124] | Provided a probabilistic treatment of supervised learning, including Bayesian inference, Gaussian processes, Bayesian linear regression, and probabilistic neural networks. Discussed mixture models and the Expectation-Maximization (EM) algorithm. |
| Breiman (2001) [754] | Introduced Random Forests and bootstrap aggregation (bagging) for variance reduction. Theoretical justification of random feature selection and the out-of-bag (OOB) error estimation method. |
| Friedman, Hastie, and Tibshirani (2000) [1056] | Provided a statistical interpretation of boosting, demonstrating its connection to stagewise optimization of an exponential loss function. Established the foundation for gradient boosting machines (GBMs). |
| Schapire (1990) [1058] | Proved that weak classifiers can be transformed into strong classifiers through boosting. Developed a mathematical framework for iterative sample reweighting to reduce error bounds. |
| LeCun et al. (1998) [1055] | Introduced convolutional neural networks (CNNs) with applications to handwritten digit recognition. Provided rigorous derivations of backpropagation and weight sharing for reducing parameters. |
| Srivastava et al. (2014) [141] | Proposed dropout as a regularization technique in deep learning, providing a probabilistic interpretation as model averaging. Demonstrated improvements in generalization performance. |
| Rosenblatt (1958) [1057] | Introduced the perceptron algorithm and proved the perceptron convergence theorem for linearly separable data. Inspired the development of multi-layer perceptrons (MLPs) and deep networks. |
| Hastie, Tibshirani, and Friedman (2009) [139] | Provided a rigorous mathematical treatment of supervised learning, including regularization methods (ridge regression, Lasso), bias-variance tradeoff, kernel methods, decision trees, and ensemble learning. Theoretical analyses of algorithm convergence properties. |
| Kingma and Ba (2014) [176] | Developed the Adam optimization algorithm, combining Adagrad and RMSProp principles. Provided a mathematical derivation of Adam’s update rules for adaptive learning rates in deep learning models. |
16.2.2. Recent Literature Review of Supervised Learning
| Authors (Year) | Contribution |
| Raikwar and Gupta (2025) [1046] | Developed an AI-driven trust management framework integrating supervised and unsupervised learning to classify security levels in wireless ad hoc networks. Enhanced decentralized communication robustness and improved detection of malicious nodes. |
| Rafiei et al. (2025) [1059] | Applied supervised multi-output classification models, including Random Forest and SVMs, to optimize lipid nanoparticle design for mRNA delivery, improving drug formulation predictions. |
| Pei et al. (2025) [1060] | Proposed a weakly supervised learning approach for segmenting vegetation from UAV images, enhancing classification accuracy in precision farming. |
| Efendi et al. (2025) [1061] | Designed an IoT-based health monitoring system that uses supervised learning algorithms to classify and predict health anomalies in elderly patients, improving remote patient care. |
| Pang et al. (2025) [1062] | Introduced DeepPath, a supervised deep learning framework integrating active learning for protein transition pathway prediction, reducing dependency on extensive labeled datasets. |
| Curry et al. (2025) [1063] | Compared supervised classification techniques and unsupervised clustering in geoscience, optimizing machine learning models for geological pattern detection and hazard assessment. |
| Li et al. (2025) [1064] | Developed -PhenoDrug, a deep learning-based framework employing supervised learning for phenotypic drug screening, accelerating drug candidate identification. |
| Liu et al. (2025) [1065] | Integrated supervised learning with molecular docking and simulations to identify ASGR1 and HMGCR dual-target inhibitors, streamlining drug discovery. |
| Dutta and Karmakar (2025) [1067] | Investigated Random Forest applications in business analytics, demonstrating superior predictive modeling accuracy for optimizing organizational decision-making. |
| Ekanayake (2025) [1054] | Applied supervised deep learning models to enhance MRI reconstruction and super-resolution, improving medical diagnostics by reducing scan times. |
16.2.3. Mathematical Analysis of Supervised Learning
16.2.4. Python Code to Generate Figure 156 Illustrating the Supervised Learning in Regression
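The listing for this figure is missing here. The sketch below illustrates supervised learning in regression in the simplest form consistent with the heading: a noisy linear target fit by least squares. The true slope/intercept, noise level, and filename are assumptions for illustration.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
x = np.linspace(0, 10, 100)
y_true = 2.5 * x + 1.0                        # underlying function
y = y_true + rng.normal(0, 2.0, size=x.size)  # noisy labels

slope, intercept = np.polyfit(x, y, deg=1)    # least-squares fit
y_hat = slope * x + intercept

plt.figure(figsize=(7, 5))
plt.scatter(x, y, s=15, alpha=0.6, label="training data")
plt.plot(x, y_true, "g--", label="true function")
plt.plot(x, y_hat, "r-", label=f"fitted line (slope={slope:.2f})")
plt.xlabel("x"); plt.ylabel("y"); plt.legend()
plt.title("Supervised learning in regression")
plt.savefig("figure_156_regression.png")
```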



16.2.5. Bias-Variance Tradeoff of Supervised Learning
16.2.5.1 Literature Review of Bias-Variance Tradeoff of Supervised Learning
| Reference | Contribution |
| Geman, Bienenstock, and Doursat (1992) | Introduced the bias-variance decomposition in neural networks, demonstrating how model complexity influences predictive error. They formulated a theoretical framework explaining the tradeoff between underfitting (high bias) and overfitting (high variance). |
| Hastie, Tibshirani, and Friedman (2001) | Provided a comprehensive treatment of the bias-variance tradeoff in statistical learning, including rigorous mathematical derivations, practical insights, and empirical analyses across multiple machine learning models. |
| Belkin et al. (2019) | Challenged the classical U-shaped bias-variance curve by introducing the "double descent" phenomenon, showing that over-parameterized models can exhibit improved generalization after an initial increase in test error. |
| Neal et al. (2018) | Analyzed the bias-variance tradeoff in deep learning, arguing that implicit regularization induced by stochastic gradient descent prevents excessive variance growth in large neural networks, explaining their surprising generalization ability. |
| Rocks and Mehta (2020) | Derived analytical expressions for bias and variance in over-parameterized models using statistical physics, uncovering a phase transition where increasing model complexity leads to test error reduction despite interpolation. |
| Guest and Martin (2021) | Extended the bias-variance framework to cognitive science, demonstrating how cognitive models suffer from similar tradeoffs between flexibility and generalization. They argued that model complexity in human cognition follows analogous patterns. |
| Almeida et al. (2020) | Developed a method for mitigating label uncertainty near class boundaries, proposing an adaptive weighting mechanism to reduce both bias and variance, thereby improving model generalization. |
| Zhou et al. (2021) | Investigated knowledge distillation through the lens of the bias-variance tradeoff, demonstrating that soft labels act as implicit regularizers and proposing an optimal weighting scheme to balance bias and variance at the sample level. |
| Gupta et al. (2022) | Provided a rigorous analysis of ensemble methods, proving that ensembling reduces variance without substantially increasing bias, explaining why ensemble learning consistently outperforms single-model approaches. |
| Ranglani (2024) | Conducted an extensive empirical study of the bias-variance tradeoff across various supervised learning models, quantifying bias and variance components and offering practical guidelines for model optimization. |
16.2.5.2 Recent Literature Review of Bias-Variance Tradeoff of Supervised Learning
| Reference | Summary of Contribution |
| Rahman & Rahman (2024) | Provides a foundational introduction to the bias-variance tradeoff in machine learning, focusing on logistic regression, linear classifiers, and regularization techniques. Discusses how adjusting model complexity impacts generalization performance. |
| Tran, Mitra, & Nguyen (2024) | Explores how an ensemble-based neural network effectively balances the bias-variance tradeoff in power system optimization. Demonstrates the model’s ability to enhance stability and accuracy in real-world energy distribution scenarios. |
| George (2024) | Investigates how different machine learning models handle the bias-variance tradeoff in character recognition. Uses the Kaggle digits dataset to optimize performance through regularization and model selection strategies. |
| Du et al. (2025) | Develops a theoretical framework linking bias-variance tradeoff to margin theory and optimization techniques. Proposes computational trade-offs to improve model selection in ensemble learning methods. |
| Polson & Sokolov (2024) | Explains the role of ridge and lasso regression in controlling the bias-variance tradeoff. Provides empirical studies demonstrating how hyperparameter tuning influences model performance and generalization. |
| Jogo (2025) | Covers the statistical foundations of the bias-variance tradeoff, linking it to support vector machines, unsupervised learning, and computational efficiency in high-dimensional spaces. |
| Wang & Pope (2025) | Challenges the traditional U-shaped bias-variance tradeoff by showing that increased complexity beyond a certain point can improve generalization in deep learning models. |
| Chen, Schmidt-Hieber, & Donnat (2024) | Analyzes the impact of graph convolutional networks (GCNs) on bias-variance tradeoff, providing a theoretical framework for evaluating convolutional layer depth in regression models. |
| Obster, Ciolacu, & Humpe (2024) | Investigates the tradeoff between predictive accuracy and interpretability in machine learning models. Introduces a scoring system for model complexity optimization while maintaining an optimal bias-variance balance. |
| Owen, Dick, & Whigham (2024) | Extends the bias-variance decomposition for stochastic learning algorithms, demonstrating how bagging-based ensemble methods improve generalization. Highlights hybrid model selection strategies for variance reduction. |
16.2.5.3 Mathematical Analysis of Bias-Variance Tradeoff of Supervised Learning
- Bias: This term quantifies how far the expected prediction is from the true function $f(x)$. Formally, it is defined as $\mathrm{Bias}[\hat{f}(x)] = \mathbb{E}[\hat{f}(x)] - f(x)$. A high-bias model makes systematic errors because it fails to capture the complexity of the data. This often occurs in underfitting, where the model is too simple to represent the underlying structure of the data.
- Variance: This term quantifies the variability of the model’s predictions around its expected value, given by $\mathrm{Var}[\hat{f}(x)] = \mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]$. A high-variance model is highly sensitive to fluctuations in the training data and does not generalize well to unseen data. This typically occurs in overfitting, where the model captures noise instead of the true signal.
- Irreducible Error: The term $\sigma^2$ represents noise inherent in the data that no model can eliminate.
16.2.5.4 Python Code to Generate Figure 157 Illustrating the Bias-Variance Tradeoff of Supervised Learning
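The original listing is absent. The sketch below empirically estimates the bias² and variance terms defined above by refitting polynomials of increasing degree to many noisy resamples of a sine target; the target function, noise level, and degree range are illustrative assumptions.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
f = lambda x: np.sin(2 * np.pi * x)           # assumed true function
x_train = np.linspace(0, 1, 30)
x_test = np.linspace(0, 1, 100)
degrees = range(1, 10)
n_trials, sigma = 200, 0.3

bias2, variance = [], []
for d in degrees:
    preds = np.empty((n_trials, x_test.size))
    for t in range(n_trials):
        y = f(x_train) + sigma * rng.standard_normal(x_train.size)
        coef = np.polyfit(x_train, y, d)
        preds[t] = np.polyval(coef, x_test)
    mean_pred = preds.mean(axis=0)
    bias2.append(np.mean((mean_pred - f(x_test)) ** 2))   # squared bias
    variance.append(preds.var(axis=0).mean())             # variance

bias2, variance = np.array(bias2), np.array(variance)
plt.figure(figsize=(7, 5))
plt.plot(degrees, bias2, "o-", label="bias$^2$")
plt.plot(degrees, variance, "s-", label="variance")
plt.plot(degrees, bias2 + variance, "k--", label="bias$^2$ + variance")
plt.xlabel("polynomial degree"); plt.ylabel("error"); plt.legend()
plt.title("Bias-variance tradeoff")
plt.savefig("figure_157_bias_variance.png")
```

Low degrees underfit (high bias), high degrees overfit (high variance), reproducing the classical U-shaped total-error curve.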



16.2.6. Support Vector Machine
16.2.6.1 Literature Review of Support Vector Machine
| Reference | Contribution |
| Vladimir N. Vapnik (1995) | Introduced Structural Risk Minimization (SRM), the foundational principle behind SVMs. Established the theoretical basis for SVMs within statistical learning theory. |
| Bernhard Schölkopf and Alexander J. Smola (2002) | Provided a rigorous mathematical treatment of kernel methods, including reproducing kernel Hilbert spaces (RKHS) and Mercer’s theorem, formalizing the kernel trick. |
| Nello Cristianini and John Shawe-Taylor (2000) | Made SVM theory accessible by providing a practical introduction to the mathematical foundations and applications of SVMs. |
| Ingo Steinwart and Andreas Christmann (2008) | Offered a statistical analysis of SVMs, covering consistency, robustness, and learning rates, providing insight into their asymptotic properties. |
| Bernhard Schölkopf, Christopher J.C. Burges, and Alexander J. Smola (1999) | Compiled major advances in kernel methods, including extensions of SVMs to regression (SVR) and novel kernel functions. Showcased applications in image processing and bioinformatics. |
| Harris Drucker et al. (1997) | Developed Support Vector Regression (SVR), adapting SVMs for continuous-valued predictions. Introduced the ε-insensitive loss function. |
| Thorsten Joachims (1999) | Introduced Transductive SVMs (TSVMs), which leverage both labeled and unlabeled data for improved classification, particularly in text mining applications. |
| Bernhard Schölkopf, Alexander Smola, and Klaus-Robert Müller (1998) | Developed Kernel Principal Component Analysis (Kernel PCA), extending SVM-related methods to nonlinear dimensionality reduction and feature extraction. |
| Christopher J.C. Burges (1998) | Provided an accessible and mathematically detailed tutorial on SVMs, explaining concepts such as margin maximization, convex optimization, and duality theory. |
| Bernhard Schölkopf et al. (2001) | Introduced One-Class SVMs for anomaly detection, providing a method to estimate the support of a high-dimensional distribution. Applied in fraud detection and network security. |
16.2.6.2 Recent Literature Review of Support Vector Machine
| Study | Contribution | Domain |
| Guo & Sun (2025) | Applied SVM to analyze neuroimaging data, assessing stroke rehabilitation effects in brain tumor patients. Demonstrated SVM’s potential in biomedical imaging and clinical decision-making. | Medical Imaging, Neuroscience |
| Diao et al. (2025) | Optimized Bi-LSTM networks for lung cancer detection. Found that SVM achieved the highest classification accuracy when extracting GLCM features. | Medical Diagnosis, Deep Learning |
| Lin et al. (2025) | Integrated SVM with deep transfer learning in MRI-based sinonasal malignancy detection, achieving 92.6% accuracy. Highlighted SVM’s potential in radiomics. | Radiomics, MRI Analysis |
| Çetintaş (2025) | Used an optimized SVM model with Grid Search for monkeypox detection. Improved classification performance on imbalanced datasets. | Medical Image Classification |
| Wang & Zhao (2025) | Compared sentiment lexicon and machine learning methods for citation sentiment identification. Demonstrated SVM’s effectiveness in text classification. | NLP, Sentiment Analysis |
| Muralinath et al. (2025) | Explored multichannel EEG classification using spectral graph kernels. Showed SVM’s robustness in epilepsy detection. | EEG Analysis, Neurology |
| Hu et al. (2025) | Addressed class imbalance in sarcasm detection using ensemble-based oversampling techniques. Showed SVM’s high performance in NLP. | NLP, Social Media Analysis |
| Wang et al. (2025) | Developed an SVR-based predictive model for tensile properties of automotive steels. Proved its effectiveness in mechanical engineering. | Materials Science, Engineering |
| Husain et al. (2025) | Applied SVR for modeling shear thickening fluid behavior, showing its ability to forecast nonlinear physical behaviors. | Physics, Fluid Mechanics |
| Iqbal & Siddiqi (2025) | Integrated SVM into a hybrid deep learning model for seasonal streamflow prediction, demonstrating SVM’s utility in hydrological modeling. | Hydrology, Environmental Science |
16.2.6.3 Mathematical Analysis of Support Vector Machine
16.2.6.4 Python Code to Generate Figure 158 Illustrating the Support Vector Machine (SVM) with Linear Kernel
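The listing for this figure is missing. A minimal scikit-learn sketch of a linear-kernel SVM on two Gaussian blobs is given below; the cluster centers, regularization constant, and filename are assumptions made for illustration.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.r_[rng.normal([-2, -2], 1.0, size=(50, 2)),
          rng.normal([2, 2], 1.0, size=(50, 2))]
y = np.r_[np.zeros(50), np.ones(50)]

clf = SVC(kernel="linear", C=1.0).fit(X, y)
acc = clf.score(X, y)

# evaluate the decision function on a grid to draw boundary and margins
xx, yy = np.meshgrid(np.linspace(-5, 5, 200), np.linspace(-5, 5, 200))
Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

plt.figure(figsize=(6, 6))
plt.contour(xx, yy, Z, levels=[-1, 0, 1], colors="k",
            linestyles=["--", "-", "--"])
plt.scatter(X[:, 0], X[:, 1], c=y, cmap="coolwarm", s=25)
plt.scatter(*clf.support_vectors_.T, s=90, facecolors="none",
            edgecolors="k", label="support vectors")
plt.legend(); plt.title(f"Linear SVM (train acc = {acc:.2f})")
plt.savefig("figure_158_svm.png")
```

The dashed contours at decision-function values ±1 mark the margin; circled points are the support vectors that determine it.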




16.2.7. Linear Regression
16.2.7.1 Literature Review of Linear Regression
| Reference | Contribution |
| Legendre (1805) | Introduced the least squares method for estimating parameters by minimizing the sum of squared residuals, initially applied to astronomical data. |
| Gauss (1809, 1821) | Formally justified the least squares method under normal error assumptions, proving its best linear unbiased estimation (BLUE) property and laying the foundation for the Gauss-Markov theorem. Developed statistical inference for regression. |
| Pearson (1901) | Developed principal component analysis (PCA), establishing the geometric relationship between regression and orthogonal projections, which later connected to singular value decomposition (SVD). |
| Fisher (1922) | Formalized the statistical inference framework for regression, deriving the sampling distributions of regression coefficients, hypothesis testing procedures, and maximum likelihood estimation (MLE) methods. |
| Koopmans (1937) | Extended regression to time series analysis, addressing issues such as serial correlation, heteroskedasticity, and multicollinearity, which are critical in econometric models. |
| Goldberger (1964) | Developed generalized least squares (GLS) to handle correlated errors and non-constant variance, providing a rigorous treatment of assumption violations in ordinary least squares (OLS). |
| Rao (1973) | Expanded regression theory by introducing ridge regression to handle multicollinearity, unifying regression within the broader framework of multivariate statistical inference. |
| Huber (1964) | Developed robust regression techniques, particularly M-estimation, which mitigates the impact of outliers and non-Gaussian error distributions, improving model stability. |
| Hastie, Tibshirani, and Friedman (2009) | Introduced regularized regression techniques such as ridge regression and LASSO, incorporating penalization terms to prevent overfitting and improve generalization in high-dimensional data. |
16.2.7.2 Recent Literature Review of Linear Regression
| Paper | Key Contribution |
| Ramadhan & Ali (2025) | Introduces a multivariate wavelet shrinkage approach for quantile regression, improving accuracy in high-dimensional data. |
| Zhou et al. (2025) | Integrates linear regression with machine learning techniques (PLS, DT) for real-time environmental monitoring of chemical degradation. |
| Zhong et al. (2025) | Uses multiple linear regression to analyze factors influencing nurses’ attitudes and practices in postural management of premature infants. |
| Liu et al. (2025) | Applies regression analysis to evaluate research capabilities of pediatric clinical nurses, identifying key influencing factors. |
| Dietze et al. (2025) | Examines the impact of opioid overdose prevention training using multivariable regression, evaluating demographic influences on knowledge retention. |
| Ming-jun & Jian-ya (2025) | Constructs multiple linear regression models to analyze the economic effects of environmental taxation policies. |
| Hasan & Ghosal (2025) | Uses regression techniques to quantify inequities in healthcare access, identifying determinants like affordability and availability. |
| Zeng et al. (2025) | Employs LASSO regression for maize yield prediction, integrating remote sensing data for enhanced agricultural forecasting. |
| Baird et al. (2025) | Uses AI-driven regression to analyze orthopedic surgery trends, particularly in ACL reconstruction and hip arthroscopy. |
| Overton & Eicker (2025) | Applies linear and logistic regression to assess fertility and milk production efficiency in Holstein dairy cows. |

16.2.7.3 Mathematical Analysis of Linear Regression
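Since this subsection's body is absent in this version, a minimal numerical sketch of the central result — the OLS estimator $\hat{\beta} = (X^\top X)^{-1} X^\top y$ from the normal equations — may be useful. The design, true coefficients, and noise level below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 3
X = np.c_[np.ones(n), rng.standard_normal((n, p))]  # design with intercept
beta_true = np.array([1.0, 2.0, -1.0, 0.5])
y = X @ beta_true + 0.1 * rng.standard_normal(n)

# normal equations: solve (X^T X) beta = X^T y rather than forming an inverse
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

residuals = y - X @ beta_hat
sigma2_hat = residuals @ residuals / (n - p - 1)    # unbiased noise variance
```

Solving the linear system is preferred to explicit inversion for numerical stability; for ill-conditioned designs, `np.linalg.lstsq` (QR/SVD based) is the standard alternative.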


16.2.8. Logistic Regression
16.2.8.1 Literature Review of Logistic Regression
| Reference | Contribution |
|---|---|
| Cox (1958) | Formulated logistic regression using the logit function and MLE. |
| Nelder & Wedderburn (1972) | Introduced GLMs, formalizing logistic regression in a unified statistical framework. |
| Haberman (1973) | Developed deviance tests for logistic regression goodness-of-fit. |
| Hosmer & Lemeshow (1980) | Introduced the Hosmer-Lemeshow test for model calibration. |
| McCullagh & Nelder (1983) | Provided a rigorous theoretical foundation for GLMs, including logistic regression. |
| Hastie, Tibshirani & Friedman (2001) | Discussed regularization methods and logistic regression in the context of statistical learning. |
| Green & Silverman (1994) | Extended logistic regression to nonparametric settings. |
| Firth (1993) | Proposed bias reduction techniques for small-sample logistic regression (Firth’s correction). |
| King & Zeng (2001) | Addressed logistic regression’s biases in rare event data. |
| Gelman & Hill (2007) | Developed Bayesian and hierarchical logistic regression models. |
16.2.8.2 Recent Literature Review of Logistic Regression
| Study (Year) | Main Contribution |
| Sani et al. 2025 | Logistic regression used to study sociodemographic predictors of contraception adoption. |
| Dorsey et al. 2025 | Assesses how visual exposure influences shorebird nesting behavior. |
| Slawny et al. 2025 | Examined how language dominance affects bilingual family communication. |
| Waller et al. 2025 | Identifies birth defect risk factors using logistic regression analysis. |
| Beyeler et al. 2025 | Evaluated impact of vessel characteristics on stroke interventions. |
| Yedavalli et al. 2025 | Uses logistic regression to assess hypoperfusion intensity ratio and stroke outcomes. |
| Aarakit et al. 2025 | Examines social network effects on renewable energy choices. |
| Yang et al. 2025 | Studied how cardiac health influences neurological risks. |
| Cortese 2025 | Identifies increased perinatal risks in ADHD-diagnosed mothers. |
| Gaspar et al. 2025 | Evaluated factors affecting bleeding risks under anticoagulant therapy. |

16.2.8.3 Mathematical Analysis of Logistic Regression
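As the analysis body is missing here, a compact sketch of logistic regression's maximum likelihood estimation by gradient descent is given below. The synthetic data, learning rate, and iteration count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400
X = np.c_[np.ones(n), rng.standard_normal((n, 2))]
w_true = np.array([-0.5, 2.0, -1.5])
p_true = 1 / (1 + np.exp(-X @ w_true))
y = (rng.random(n) < p_true).astype(float)     # Bernoulli labels

def nll(w):
    """Negative log-likelihood: sum of log(1+e^z) - y*z, z = Xw."""
    z = X @ w
    return np.sum(np.logaddexp(0, z) - y * z)

w = np.zeros(3)
lr = 0.5
losses = [nll(w)]
for _ in range(500):
    p = 1 / (1 + np.exp(-X @ w))
    w -= lr * X.T @ (p - y) / n                # averaged NLL gradient
    losses.append(nll(w))

acc = np.mean((1 / (1 + np.exp(-X @ w)) > 0.5) == y)
```

The gradient $X^\top(p - y)$ follows directly from differentiating the Bernoulli log-likelihood through the sigmoid, and the loss is convex, so plain gradient descent converges to the MLE.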
16.2.9. Linear Discriminant Analysis
16.2.9.1 Literature Review of Linear Discriminant Analysis
| Reference | Contribution |
|---|---|
| Fisher (1936) | Introduced LDA, maximizing inter-class separation while minimizing intra-class variance. |
| Anderson (1958) | Established LDA’s statistical foundation under multivariate Gaussian assumptions. |
| Rao (1948) | Generalized LDA via canonical discriminant analysis and established its statistical properties. |
| Duda, Hart & Stork (2001) | Compared LDA to other classifiers and provided geometric/probabilistic insights. |
| McLachlan (2004) | Discussed regularized LDA, kernel LDA, and mixture model extensions. |
| Hastie, Tibshirani & Friedman (2009) | Presented LDA in a machine learning context, linking it to logistic regression and SVMs. |
| Belhumeur et al. (1997) | Developed Fisherfaces, applying LDA to face recognition for robustness. |
| Mika et al. (1999) | Proposed Kernel Fisher Discriminant Analysis (KFDA) for nonlinear classification. |
| Ye (2005) | Analyzed Generalized Discriminant Analysis (GDA) for high-dimensional, low-sample-size (HDLSS) problems. |
| Sugiyama (2007) | Developed Local Fisher Discriminant Analysis (LFDA) for multimodal classification. |
16.2.9.2 Recent Literature Review of Linear Discriminant Analysis
| Authors (Year) | Contribution |
| W. Wolff, C.S. Martarelli (2025) | Used LDA for emotion classification via bodily sensation mapping, revealing distinct bodily activation patterns for different emotional states. |
| A. Rincón Santamaría, F.E. Hoyos (2025) | Achieved 91.67% accuracy in classifying P. falciparum-infected red blood cells using LDA, demonstrating its effectiveness in medical diagnostics. |
| B. Li, S. Jiang (2025) | Applied LDA with graph convolutional networks for petroleum reservoir fluid classification, improving feature selection in geophysical studies. |
| B.A. Caceres, A. Nyembwe (2025) | Used LDA to analyze the psychological and physiological effects of discrimination on blood pressure among young Black mothers. |
| S.K. Singh, M. Kumar (2025) | Integrated LDA for dimensionality reduction in a CNN-BiLSTM model, enhancing facial expression recognition performance. |
| T. Akter, M.A. Faqeerzada (2025) | Compared LDA with SVM and PLS-DA for hyperspectral imaging-based fruit defect classification, validating LDA’s effectiveness in food quality assessment. |
| F. Deng, L. Zhang (2025) | Applied LDA to classify causes of death in colorectal and lung cancer patients, showcasing its utility in predictive healthcare analytics. |
| H.M. Chick, N. Sparks (2025) | Used LDA in microbiome analysis to classify bacterial strains linked to gut inflammation in broiler chickens. |
| X. Miao, L. Xu (2025) | Developed an LDA-PCA hybrid model for breast cancer molecular subtyping using spectral imaging data, improving diagnostic accuracy. |
| D. Rohan, G.P. Reddy (2025) | Compared LDA with ensemble AI techniques for heart disease prediction, demonstrating LDA’s limitations in non-linear data analysis. |

16.2.9.3 Mathematical Analysis of Linear Discriminant Analysis
16.2.9.4 Multiclass Linear Discriminant Analysis
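A from-scratch sketch of multiclass LDA may help fix ideas: build the within-class and between-class scatter matrices and solve the generalized eigenproblem $S_B w = \lambda S_W w$ (here symmetrized via $S_W^{-1/2}$). The three-class Gaussian data below is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
means = np.array([[0, 0], [4, 0], [2, 4]], dtype=float)
X = np.vstack([rng.normal(m, 1.0, size=(60, 2)) for m in means])
y = np.repeat(np.arange(3), 60)

overall = X.mean(axis=0)
S_W = np.zeros((2, 2)); S_B = np.zeros((2, 2))
for c in range(3):
    Xc = X[y == c]
    mc = Xc.mean(axis=0)
    S_W += (Xc - mc).T @ (Xc - mc)            # within-class scatter
    d = (mc - overall)[:, None]
    S_B += len(Xc) * (d @ d.T)                # between-class scatter

# symmetrized generalized eigenproblem: eig of S_W^{-1/2} S_B S_W^{-1/2}
vals, vecs = np.linalg.eigh(S_W)
S_W_inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T
evals, evecs = np.linalg.eigh(S_W_inv_sqrt @ S_B @ S_W_inv_sqrt)
W = S_W_inv_sqrt @ evecs[:, ::-1]             # discriminant directions
Z = (X - overall) @ W                         # projected data
```

With $C$ classes, $S_B$ has rank at most $C-1$, so at most $C-1$ discriminant directions carry nonzero Fisher ratio.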


16.2.9.5 Python Code to Generate Figure 167 Illustrating the Eigenvalue Perturbation Analysis of Linear Discriminant Analysis (LDA)
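The listing is absent from this version. The sketch below tracks how the eigenvalues of a symmetric Fisher-type matrix move under perturbations of growing size, against the Weyl bound $\|\epsilon E\|_2$; the matrix dimension and perturbation direction are illustrative assumptions.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
d = 6
A = rng.standard_normal((d, d))
F = A @ A.T                                   # symmetric PSD "Fisher" matrix

E0 = rng.standard_normal((d, d)); E0 = (E0 + E0.T) / 2
E0 /= np.linalg.norm(E0, 2)                   # unit-spectral-norm direction

eps_grid = np.logspace(-3, 0, 20)
lam = np.linalg.eigvalsh(F)
shifts = []
for eps in eps_grid:
    lam_p = np.linalg.eigvalsh(F + eps * E0)
    shifts.append(np.abs(lam_p - lam).max())  # worst eigenvalue shift
shifts = np.array(shifts)

plt.figure(figsize=(6, 5))
plt.loglog(eps_grid, shifts, "o-", label="max eigenvalue shift")
plt.loglog(eps_grid, eps_grid, "k--", label=r"Weyl bound $\|\epsilon E\|_2$")
plt.xlabel("perturbation size"); plt.ylabel("shift"); plt.legend()
plt.savefig("figure_167_perturbation.png")
```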



16.2.9.6 Python Code to Generate Figure 168 Illustrating the Spectral Decomposition of the Discriminant Space in Linear Discriminant Analysis (LDA)
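The listing is missing; a minimal sketch of the spectral decomposition $F = V \Lambda V^\top$ of a symmetric discriminant-space matrix follows. The random PSD matrix stands in for the Fisher matrix and is an illustrative assumption.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
F = A @ A.T                       # symmetric PSD stand-in for the Fisher matrix

lam, V = np.linalg.eigh(F)        # F = V diag(lam) V^T, lam ascending
F_rec = V @ np.diag(lam) @ V.T    # spectral reconstruction

plt.figure(figsize=(6, 4))
plt.stem(range(1, 6), lam[::-1])  # spectrum in descending order
plt.xlabel("index"); plt.ylabel("eigenvalue")
plt.title("Spectrum of the discriminant-space matrix")
plt.savefig("figure_168_spectrum.png")
```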



16.2.9.7 Python Code to Generate Figure 169 Illustrating the Asymptotic Convergence of the Decision Boundary in Linear Discriminant Analysis (LDA)
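The listing is missing. The sketch below estimates the LDA direction $\hat{w} = \hat{\Sigma}^{-1}(\hat{\mu}_1 - \hat{\mu}_0)$ at increasing sample sizes and plots its error against the population direction, with an $O(n^{-1/2})$ reference line; the two-Gaussian setup is an illustrative assumption.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
mu0, mu1 = np.array([0.0, 0.0]), np.array([2.0, 1.0])
Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])
w_pop = np.linalg.solve(Sigma, mu1 - mu0)        # population LDA direction
L = np.linalg.cholesky(Sigma)

ns = [50, 100, 200, 400, 800, 1600, 3200]
errs = []
for n in ns:
    trial_errs = []
    for _ in range(100):
        X0 = mu0 + rng.standard_normal((n, 2)) @ L.T
        X1 = mu1 + rng.standard_normal((n, 2)) @ L.T
        S = 0.5 * (np.cov(X0.T) + np.cov(X1.T))  # pooled covariance
        w_hat = np.linalg.solve(S, X1.mean(0) - X0.mean(0))
        trial_errs.append(np.linalg.norm(w_hat - w_pop))
    errs.append(np.mean(trial_errs))
errs = np.array(errs)

plt.figure(figsize=(6, 5))
plt.loglog(ns, errs, "o-", label="mean $\\|\\hat{w} - w\\|$")
plt.loglog(ns, errs[0] * (ns[0] / np.array(ns)) ** 0.5, "k--",
           label="$O(n^{-1/2})$ reference")
plt.xlabel("samples per class"); plt.ylabel("error"); plt.legend()
plt.savefig("figure_169_convergence.png")
```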



16.2.9.8 Python Code to Generate Figure 170 Illustrating the Exact Asymptotic Bounds for the Misclassification Probability in Linear Discriminant Analysis (LDA)
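The listing is missing. For two Gaussians with shared covariance and equal priors, the exact misclassification probability of the population LDA rule is $\Phi(-\Delta/2)$ with Mahalanobis distance $\Delta$; the sketch below compares this asymptotic expression with Monte Carlo estimates. The separation grid and sample counts are illustrative assumptions.

```python
import numpy as np
from math import erf, sqrt
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

Phi = lambda x: 0.5 * (1 + erf(x / sqrt(2)))   # standard normal CDF
rng = np.random.default_rng(0)
Sigma = np.eye(2)
mu0 = np.zeros(2)

seps = np.linspace(0.5, 4.0, 8)
analytic, empirical = [], []
for s in seps:
    mu1 = np.array([s, 0.0])
    delta = sqrt((mu1 - mu0) @ np.linalg.solve(Sigma, mu1 - mu0))
    analytic.append(Phi(-delta / 2))
    # Monte Carlo error of the population LDA rule
    w = np.linalg.solve(Sigma, mu1 - mu0)
    b = -0.5 * w @ (mu0 + mu1)
    X0 = rng.multivariate_normal(mu0, Sigma, 20000)
    X1 = rng.multivariate_normal(mu1, Sigma, 20000)
    err = 0.5 * np.mean(X0 @ w + b > 0) + 0.5 * np.mean(X1 @ w + b <= 0)
    empirical.append(err)

plt.figure(figsize=(6, 5))
plt.semilogy(seps, analytic, "k--", label=r"$\Phi(-\Delta/2)$")
plt.semilogy(seps, empirical, "o", label="Monte Carlo")
plt.xlabel("class separation"); plt.ylabel("misclassification probability")
plt.legend(); plt.savefig("figure_170_error_bounds.png")
```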



16.2.9.9 Python Code to Generate Figure 171 Illustrating the Condition Number of the Fisher Matrix in Linear Discriminant Analysis (LDA)
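The listing is missing. The sketch below measures how the condition number of an estimated scatter matrix improves with sample size when the population covariance is the identity; dimension, sample grid, and trial count are illustrative assumptions.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

def mean_condition(n, d=10, trials=50):
    """Mean condition number of a sample covariance with true covariance I."""
    kappas = []
    for _ in range(trials):
        X = rng.standard_normal((n, d))
        kappas.append(np.linalg.cond(np.cov(X.T)))
    return np.mean(kappas)

ns = [15, 20, 40, 80, 160, 320, 640]
kappas = [mean_condition(n) for n in ns]

plt.figure(figsize=(6, 5))
plt.semilogy(ns, kappas, "o-")
plt.axhline(1.0, color="k", ls="--", label="perfectly conditioned")
plt.xlabel("sample size $n$"); plt.ylabel(r"condition number $\kappa(S)$")
plt.legend(); plt.savefig("figure_171_condition_number.png")
```

When $n$ is only slightly larger than $d$, the estimate is severely ill-conditioned, which is exactly the regime where inverting $S_W$ in LDA becomes unstable.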



16.2.9.10 Python Code to Generate Figure 172 Illustrating the Random Matrix Theory (RMT) Perspective on Linear Discriminant Analysis (LDA)
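The listing is missing. The standard RMT illustration compares the eigenvalue histogram of a white-noise sample covariance against the Marchenko-Pastur density; the dimensions ($n=2000$, $p=400$) are illustrative assumptions.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n, p = 2000, 400                      # aspect ratio gamma = p/n = 0.2
gamma = p / n
X = rng.standard_normal((n, p))
S = X.T @ X / n                       # sample covariance (true covariance = I)
evals = np.linalg.eigvalsh(S)

lam_minus = (1 - np.sqrt(gamma)) ** 2
lam_plus = (1 + np.sqrt(gamma)) ** 2
x = np.linspace(lam_minus, lam_plus, 400)
mp_pdf = np.sqrt((lam_plus - x) * (x - lam_minus)) / (2 * np.pi * gamma * x)

plt.figure(figsize=(6, 5))
plt.hist(evals, bins=40, density=True, alpha=0.6, label="eigenvalues of $S$")
plt.plot(x, mp_pdf, "r-", lw=2, label="Marchenko-Pastur density")
plt.xlabel("eigenvalue"); plt.ylabel("density"); plt.legend()
plt.savefig("figure_172_rmt.png")
```

Even with identity population covariance, the sample spectrum spreads across $[(1-\sqrt{\gamma})^2, (1+\sqrt{\gamma})^2]$: an RMT warning that raw scatter eigenvalues in high-dimensional LDA are biased.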



16.2.9.11 Python Code to Generate Figure 173 Illustrating the Exact Rate of Eigenvalue Concentration in Linear Discriminant Analysis (LDA)
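The listing is missing. The sketch below measures how fast sample-covariance eigenvalues concentrate around their population values as $n$ grows, against an $O(n^{-1/2})$ reference; the population spectrum and sample grid are illustrative assumptions.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
d = 5
Sigma = np.diag([5.0, 4.0, 3.0, 2.0, 1.0])   # population covariance
lam_true = np.diag(Sigma)[::-1]              # ascending, matching eigvalsh
L = np.sqrt(Sigma)                           # elementwise sqrt of diagonal

ns = [100, 200, 400, 800, 1600, 3200]
devs = []
for n in ns:
    trial = []
    for _ in range(100):
        X = rng.standard_normal((n, d)) @ L
        lam_hat = np.linalg.eigvalsh(np.cov(X.T))
        trial.append(np.abs(lam_hat - lam_true).max())
    devs.append(np.mean(trial))
devs = np.array(devs)

plt.figure(figsize=(6, 5))
plt.loglog(ns, devs, "o-", label=r"mean $\max_i|\hat\lambda_i-\lambda_i|$")
plt.loglog(ns, devs[0] * np.sqrt(ns[0] / np.array(ns)), "k--",
           label="$O(n^{-1/2})$ reference")
plt.xlabel("sample size $n$"); plt.ylabel("deviation"); plt.legend()
plt.savefig("figure_173_concentration.png")
```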



16.2.9.12 Python Code to Generate Figure 174 Illustrating the Weyl Bounds on the Eigenvalues of the Fisher Matrix in Linear Discriminant Analysis (LDA)
- The original eigenvalues of $F$.
- The perturbed eigenvalues of $F + \Delta F$.
- The Weyl bounds around each eigenvalue, i.e. the intervals $[\lambda_i - \|\Delta F\|_2,\ \lambda_i + \|\Delta F\|_2]$.
- Blue curve: eigenvalues of the unperturbed Fisher matrix.
- Red curve: eigenvalues after perturbation.
- Gray dashed vertical intervals: Weyl bounds, showing the range each eigenvalue could move under perturbation.

The red points always fall within these intervals, verifying Weyl’s theorem. Figure 174. Weyl bounds on the eigenvalues of the Fisher matrix in Linear Discriminant Analysis (LDA). Blue points denote the original eigenvalues, red points denote the perturbed eigenvalues under a random perturbation $\Delta F$, and gray dashed intervals show the Weyl bounds $\lambda_i \pm \|\Delta F\|_2$. The perturbed eigenvalues lie within the predicted Weyl intervals.
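The missing listing can be reconstructed as below: it compares the eigenvalues of a symmetric Fisher-type matrix before and after a small symmetric perturbation and draws the Weyl intervals $\lambda_i \pm \|\Delta F\|_2$. The matrix dimension and perturbation scale are illustrative assumptions.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
d = 8
A = rng.standard_normal((d, d))
F = A @ A.T                                   # symmetric "Fisher" matrix
E = rng.standard_normal((d, d))
E = 0.05 * (E + E.T)                          # small symmetric perturbation

lam = np.linalg.eigvalsh(F)
lam_p = np.linalg.eigvalsh(F + E)
bound = np.linalg.norm(E, 2)                  # spectral norm of perturbation

idx = np.arange(1, d + 1)
plt.figure(figsize=(7, 5))
plt.plot(idx, lam, "bo-", label="eigenvalues of $F$")
plt.plot(idx, lam_p, "rx--", label="eigenvalues of $F+\\Delta F$")
plt.vlines(idx, lam - bound, lam + bound, colors="gray", linestyles="--",
           label=r"Weyl bounds $\lambda_i \pm \|\Delta F\|_2$")
plt.xlabel("index $i$"); plt.ylabel("eigenvalue"); plt.legend()
plt.savefig("figure_174_weyl.png")
```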




16.2.9.13 Python Code to Generate Figure 175 Illustrating the Ky Fan Norm Bound of the Eigenvalues of the Linear Discriminant Analysis (LDA)
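The listing is missing. The Ky Fan $k$-norm (sum of the $k$ largest singular values) is a norm, so it obeys the triangle inequality $\|F+\Delta F\|_{(k)} \le \|F\|_{(k)} + \|\Delta F\|_{(k)}$; the sketch below verifies this numerically for a Fisher-type matrix. The matrices involved are illustrative assumptions.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
d = 8

def ky_fan(M, k):
    """Ky Fan k-norm: sum of the k largest singular values."""
    return np.linalg.svd(M, compute_uv=False)[:k].sum()

A0 = rng.standard_normal((d, d)); F = A0 @ A0.T          # Fisher-like matrix
B0 = rng.standard_normal((d, d)); E = 0.5 * (B0 + B0.T)  # perturbation

ks = np.arange(1, d + 1)
lhs = np.array([ky_fan(F + E, k) for k in ks])
rhs = np.array([ky_fan(F, k) + ky_fan(E, k) for k in ks])

plt.figure(figsize=(6, 5))
plt.plot(ks, lhs, "o-", label=r"$\|F+\Delta F\|_{(k)}$")
plt.plot(ks, rhs, "s--", label=r"$\|F\|_{(k)} + \|\Delta F\|_{(k)}$")
plt.xlabel("$k$"); plt.ylabel("Ky Fan $k$-norm"); plt.legend()
plt.savefig("figure_175_kyfan.png")
```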



16.2.9.14 Python Code to Generate Figure 176 Illustrating the Bauer-Fike Theorem Applied to Eigenvalues of the Fisher Matrix in LDA
- Blue points: original eigenvalues of the Fisher matrix $F$.
- Red crosses: eigenvalues after perturbation.
- Gray shaded interval: Bauer-Fike bound.
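The listing is missing. For a symmetric $F$ the eigenvector matrix is orthogonal, so the Bauer-Fike theorem gives the radius $\|\Delta F\|_2$: every eigenvalue of $F+\Delta F$ lies within that distance of some eigenvalue of $F$. The sketch below checks this; matrix size and perturbation scale are illustrative assumptions.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
d = 8
A = rng.standard_normal((d, d))
F = A @ A.T                                   # symmetric, so kappa(V) = 1
E = rng.standard_normal((d, d))
E = 0.1 * (E + E.T)

lam = np.linalg.eigvalsh(F)
mu = np.linalg.eigvalsh(F + E)
bf_bound = np.linalg.norm(E, 2)               # Bauer-Fike radius (normal case)

# distance from each perturbed eigenvalue to the nearest original one
dist = np.array([np.abs(lam - m).min() for m in mu])

plt.figure(figsize=(7, 5))
plt.plot(np.arange(1, d + 1), lam, "bo", label="eigenvalues of $F$")
plt.plot(np.arange(1, d + 1), mu, "rx", label="eigenvalues of $F+\\Delta F$")
for l in lam:
    plt.axhspan(l - bf_bound, l + bf_bound, color="gray", alpha=0.15)
plt.xlabel("index"); plt.ylabel("eigenvalue")
plt.legend(); plt.title("Bauer-Fike bound (normal case)")
plt.savefig("figure_176_bauer_fike.png")
```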



16.2.9.15 Python Code to Generate Figure 177 Illustrating the Generalization Error of Linear Discriminant Analysis (LDA) Using PAC-Bayesian Bounds
- $P$ = prior distribution over classifiers,
- $Q$ = posterior distribution over classifiers,
- $\mathrm{KL}(Q \,\|\, P)$ = KL divergence between posterior and prior,
- $n$ = sample size.
- Empirical error
- PAC-Bayesian bound as a function of sample size.
- Blue curve = empirical misclassification error (test error)
- Red dashed curve = PAC-Bayesian upper bound on generalization error
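The listing is missing. The sketch below trains a simple LDA-style classifier at increasing sample sizes and overlays a McAllester-style PAC-Bayesian bound of the form $\hat{R} + \sqrt{(\mathrm{KL} + \ln(2\sqrt{n}/\delta))/(2n)}$. The two-Gaussian data, the placeholder value $\mathrm{KL}=1$, and $\delta=0.05$ are illustrative assumptions, not quantities from the original figure.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
mu0, mu1 = np.array([0.0, 0.0]), np.array([2.0, 0.0])
kl, delta = 1.0, 0.05                    # placeholder KL(Q||P) and confidence

def lda_test_error(n_train, n_test=5000):
    X0 = mu0 + rng.standard_normal((n_train, 2))
    X1 = mu1 + rng.standard_normal((n_train, 2))
    w = X1.mean(0) - X0.mean(0)          # identity-covariance LDA direction
    b = -0.5 * w @ (X0.mean(0) + X1.mean(0))
    T0 = mu0 + rng.standard_normal((n_test, 2))
    T1 = mu1 + rng.standard_normal((n_test, 2))
    return 0.5 * np.mean(T0 @ w + b > 0) + 0.5 * np.mean(T1 @ w + b <= 0)

ns = np.array([50, 100, 200, 400, 800, 1600, 3200])
emp = np.array([lda_test_error(n) for n in ns])
slack = np.sqrt((kl + np.log(2 * np.sqrt(ns) / delta)) / (2 * ns))
bound = emp + slack                      # McAllester-style PAC-Bayes bound

plt.figure(figsize=(6, 5))
plt.semilogx(ns, emp, "b-o", label="empirical test error")
plt.semilogx(ns, bound, "r--s", label="PAC-Bayesian upper bound")
plt.xlabel("sample size $n$"); plt.ylabel("error"); plt.legend()
plt.savefig("figure_177_pac_bayes.png")
```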



16.2.10. Naïve Bayes Classifier
16.2.10.4 Python Code to Generate Figure 178 Illustrating the Naïve Bayes Classifier Decision Boundary
- The contour plot shows the predicted class regions (decision boundary).
- Scatter points show the training data.
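The listing is missing. A minimal Gaussian Naïve Bayes sketch matching the description (contoured class regions plus training scatter) is given below; the cluster layout and filename are illustrative assumptions.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X = np.r_[rng.normal([-2, 0], 1.0, size=(60, 2)),
          rng.normal([2, 0], 1.0, size=(60, 2))]
y = np.r_[np.zeros(60), np.ones(60)]

clf = GaussianNB().fit(X, y)
acc = clf.score(X, y)

xx, yy = np.meshgrid(np.linspace(-6, 6, 300), np.linspace(-4, 4, 300))
Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

plt.figure(figsize=(7, 5))
plt.contourf(xx, yy, Z, alpha=0.25, cmap="coolwarm")  # predicted class regions
plt.scatter(X[:, 0], X[:, 1], c=y, cmap="coolwarm", edgecolors="k", s=30)
plt.title(f"Gaussian Naive Bayes (train acc = {acc:.2f})")
plt.savefig("figure_178_naive_bayes.png")
```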



16.2.11. Decision Tree Learning
16.2.11.1 Literature Review of Decision Tree Learning
| Reference | Key Contribution |
| Quinlan (1986) | Introduced the ID3 algorithm, a fundamental top-down greedy approach that selects attributes based on maximum information gain, enabling efficient decision tree construction. |
| Quinlan (1993) | Developed the C4.5 algorithm, which extended ID3 by incorporating continuous attributes, handling missing values, and implementing pruning techniques to mitigate overfitting. |
| Breiman et al. (1984) | Proposed the CART methodology, which introduced binary decision trees, the Gini impurity measure, cost-complexity pruning, and regression trees, laying a rigorous statistical foundation. |
| Kohavi & John (1997) | Introduced the wrapper method for feature selection, demonstrating how optimal feature subset selection improves decision tree accuracy and computational efficiency. |
| Breiman (1996) | Developed the bagging (bootstrap aggregating) technique, which improves decision tree stability and reduces variance by averaging predictions from multiple bootstrapped models. |
| Freund & Schapire (1997) | Introduced the AdaBoost algorithm, an adaptive boosting approach that iteratively adjusts training sample weights to focus on misclassified instances, significantly enhancing decision tree performance. |
| Breiman (2001) | Created the Random Forests algorithm, which enhances decision trees using ensemble learning by constructing multiple randomized trees and aggregating their predictions. |
| Domingos & Hulten (2000) | Developed the Very Fast Decision Tree (VFDT) algorithm, designed for real-time data stream mining using Hoeffding bounds for incremental tree construction with fixed memory constraints. |
| Freund & Mason (1999) | Proposed the Alternating Decision Tree (ADTree), integrating decision trees with boosting to create interpretable models with superior generalization properties. |
| Quinlan (1993) | Developed Oblique Decision Trees, which introduce linear combination-based decision boundaries, improving classification performance by allowing non-axis-aligned splits. |
16.2.11.2 Recent Literature Review of Decision Tree Learning
| Authors (Year) | Title | Contribution |
| Usman et al. (2025) | Identifying the Best-Selling Product using Machine Learning Algorithms | Explores decision trees for product sales forecasting, comparing single decision trees with ensemble models like random forests to enhance predictive accuracy. |
| Abbas et al. (2025) | Low Back Pain Among Health Sciences Undergraduates | Uses decision tree classifiers to predict low back pain patterns in students, demonstrating applications in healthcare analytics. |
| Deng et al. (2025) | Prediction of Retail Commodity Hot-Spots | Investigates boosting techniques where successive decision trees correct errors from previous ones, improving retail demand forecasting. |
| Eili et al. (2025) | Predicting Clinical Pathways of Traumatic Brain Injuries (TBI) | Integrates decision trees with Markov models for patient treatment pathway prediction, showcasing applications in dynamic decision-making. |
| Yin et al. (2025) | Gamma-Glutamyl Transferase Plus Carcinoembryonic Antigen Ratio Index | Compares decision trees with logistic regression for predicting cancer treatment responses, showing tree-based methods’ effectiveness in handling nonlinear relationships. |
| Liu et al. (2025) | The Influence of Different Factors on the Bond Strength of Lithium Disilicate Glass–Ceramics to Resin | Applies decision trees in dental materials research, analyzing bond strength factors and highlighting feature importance. |
| Barghouthi et al. (2025) | A Fused Multi-Channel Prediction Model of Pressure Injury | Develops a hybrid model integrating decision trees with K-nearest neighbors and gradient boosting for improved predictive healthcare analytics. |
| Jewan (2025) | Remote Sensing Technology and Machine Learning for Crop Yield Prediction | Utilizes decision tree classifiers for agricultural forecasting, modeling the influence of environmental factors on crop productivity. |
| Akbal et al. (2025) | Accurate Indoor Home Location Classification through Sound Analysis | Implements decision trees in indoor localization using sound analysis, demonstrating its effectiveness in hierarchical feature classification. |
| Mokan et al. (2025) | Pixel-Wise Classification of the Retinal Vasculature into Arteries and Veins | Uses decision trees for medical image segmentation, distinguishing arteries and veins in retinal scans, improving diagnostic precision. |

16.2.11.3 Mathematical Analysis of Decision Tree Learning
16.2.12. k-Nearest Neighbors Algorithm
16.2.12.1 Literature Review of k-Nearest Neighbors (KNN) Algorithm
| Reference | Key Contribution |
| Fix and Hodges (1951) | Introduced the k-NN algorithm as a non-parametric classification method, forming the foundation of instance-based learning. |
| Cover and Hart (1967) | Provided theoretical analysis proving that the k-NN error rate is at most twice the Bayes error, establishing its consistency. |
| Devroye, Györfi and Lugosi (1996) | Developed a probabilistic framework for pattern recognition, including bounds on k-NN performance. |
| Toussaint (2005) | Explored geometric proximity graphs to improve k-NN efficiency by structuring data spatially. |
| Arya et al. (1998) | Introduced an optimal approximate nearest neighbor search algorithm for high-dimensional spaces. |
| Terrell and Scott (1992) | Discussed variable kernel density estimation, offering insights into adaptive density-based k-NN improvements. |
| Samworth (2012) | Proposed optimal weighted k-NN classifiers, improving performance in non-uniform distributions. |
| Bremner et al. (2005) | Developed output-sensitive algorithms for computing k-NN decision boundaries efficiently. |
| Ramaswamy et al. (2000) | Applied k-NN for outlier detection in large datasets, aiding in anomaly detection applications. |
| Cover and Thomas (1991) | Connected k-NN to information theory concepts such as entropy and mutual information. |
16.2.12.2 Recent Literature Review of k-Nearest Neighbors (KNN) Algorithm
| Author (Year) | Summary of Contributions |
| Alaca & Emin (2024) | Evaluates KNN within hybrid models for medical kidney image classification, comparing it with SVM and RF. Demonstrates how KNN performs in medical imaging tasks. |
| Chen, Hung & Yang (2025) | Proposes a Probability-Integrated Projection (PIP)-based KNN algorithm, improving classification accuracy for epidemic spread prediction through optimized distance metrics. |
| Liu et al. (2025) | Uses KNN for predicting bond strength of dental materials. Highlights KNN’s performance against kernel-based methods like SVM. |
| Barghouthi et al. (2025) | Develops a multi-channel fusion model using KNN for pressure injury prediction in hospitalized patients, integrating it with deep learning techniques. |
| Jewan (2025) | Examines KNN’s effectiveness in remote sensing applications for crop yield prediction using UAV images, comparing it with Decision Trees and RF. |
| Moldovanu et al. (2025) | Investigates how data corruption affects KNN’s accuracy, proposing feature transformation techniques to improve robustness. |
| HosseinpourFardi & Alizadeh (2025) | Introduces a hardware-accelerated KNN model for incremental learning, optimizing KNN’s performance in embedded systems. |
| Afrin et al. (2025) | Utilizes KNN to classify oil pipeline failure causes, demonstrating its effectiveness in industrial failure prediction. |
| Hussain et al. (2025) | Applies KNN to geospatial flood susceptibility prediction, comparing it with RF and XGBoost, showing its viability in disaster risk assessment. |
| Reddy & Murthy (2025) | Combines Particle Swarm Optimization (PSO) with KNN to enhance accuracy in cardiovascular disease prediction, demonstrating KNN’s adaptability in medical applications. |
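At its core, the rule introduced by Fix and Hodges is simply "sort by distance, vote among the k closest". A minimal plain-Python sketch (the helper `knn_predict` is illustrative, not any cited implementation):

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """train: list of (point, label) pairs; returns the majority label
    among the k training points nearest to query (Euclidean distance)."""
    dists = sorted((math.dist(x, query), y) for x, y in train)
    votes = Counter(y for _, y in dists[:k])
    return votes.most_common(1)[0][0]

train = [((0.0, 0.0), "blue"), ((0.0, 1.0), "blue"),
         ((1.0, 0.0), "blue"), ((5.0, 5.0), "red"),
         ((5.0, 6.0), "red"),  ((6.0, 5.0), "red")]
print(knn_predict(train, (0.5, 0.5), k=3))  # -> blue
print(knn_predict(train, (5.5, 5.5), k=3))  # -> red
```

The spatial indexing work surveyed above (e.g. Arya et al.; Toussaint) replaces the brute-force `sorted` scan with approximate or graph-based neighbor search.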

16.2.12.3 Mathematical Analysis of k-Nearest Neighbors (KNN) Algorithm
16.2.13. Similarity Learning
16.2.13.1 Literature Review of Similarity Learning
| Reference | Contribution |
| Chen et al. (2009) | Established a mathematical foundation for similarity-based classification by systematically converting similarity measures into kernels and evaluating their performance in various learning scenarios. |
| Chechik et al. (2010) | Developed OASIS, an efficient online algorithm that learns a bilinear similarity function for large-scale image ranking, optimizing a margin-based criterion to enhance ranking performance. |
| Wang et al. (2013) | Proposed a similarity learning framework for content-based image retrieval (CBIR) that incorporates relative comparisons, aligning retrieval results with human perception of image similarity. |
| Kar and Jain (2011) | Introduced a similarity embedding framework that maps similarity functions into data-driven feature spaces, bridging the gap between similarity learning and traditional classification approaches. |
| Liu et al. (2011) | Addressed positive and unlabeled (PU) learning by leveraging similarity-based weighting of uncertain examples, improving classification performance when labeled data is scarce. |
| Zhang et al. (2024) | Conducted an extensive survey on deep learning techniques for similarity learning, covering applications in sequence modeling, graph-based learning, and high-dimensional data similarity computation. |
| Wikipedia Contributors | Documented theoretical aspects of semantic similarity, including classical methods such as node-based and edge-based similarity computations, and their applications in knowledge representation. |
| PingCAP (2024) | Explored NLP tools like Word2Vec and BERT for semantic similarity computation, analyzing trade-offs between computational complexity and accuracy in various language processing applications. |
| ResearchGate Contributors (2023) | Demonstrated the application of similarity-based scoring techniques in document retrieval and clustering, showcasing improvements in contextual understanding using deep learning models. |
| Co-citation Proximity Analysis | Investigated citation-based similarity measures, introducing co-citation proximity analysis to quantify the relationship between academic articles based on their citation network structures. |
16.2.13.2 Recent Literature Review of Similarity Learning
| Reference | Title | Contribution |
| Nanyonga et al. (2025) | Multi-Head Attention-Based Transformer Model for Predicting Causes in Aviation Incidents | Introduces a transformer-based similarity training approach to predict aviation incidents with a multi-head attention mechanism, achieving a similarity score of 0.697. |
| Fan & Chung (2025) | Integrating Image Processing Technology and Deep Learning to Identify Crops in UAV Orthoimages | Uses similarity training to classify crops in UAV images, leveraging RGB and vegetation indices (VARI) to improve accuracy. |
| Bakaev et al. (2025) | Who Will Author the Synthetic Texts? | Implements cosine similarity and Mahalanobis distance for evaluating synthetic text similarity, improving text coherence in LLMs. |
| Ahn et al. (2025) | Deep Learning-Based Automated Guide for Developmental Dysplasia of the Hip Screening | Employs the Dice Similarity Coefficient to optimize deep learning-based segmentation in medical imaging. |
| Peng et al. (2025) | Range and Bird’s Eye View Fused Cross-Modal Visual Place Recognition | Introduces similarity label supervision to refine visual place recognition through descriptor similarity search. |
| Zhao et al. (2025) | Privacy-Preserved Federated Clustering with Non-IID Data via GANs | Uses similarity-based clustering and GANs to enhance privacy-preserved federated learning. |
| Wang et al. (2025) | Accurate Genomic Prediction for Maize Hybrids Using Multi-Environment Data | Develops a similarity matrix (W matrix) for genomic prediction, improving maize yield and moisture content predictions. |
| Xu et al. (2025) | Medical Image Registration Meets Vision Foundation Model: Prototype Learning and Contour Awareness | Introduces similarity loss functions to optimize image registration in medical imaging applications. |
| Sun et al. (2025) | Idiosyncrasies in Large Language Models | Evaluates similarity training effects on LLM-generated text, analyzing divergence from human-authored content using ROUGE-1 similarity scores. |
| Liang et al. (2025) | NaturalL2S: High-Quality Multi-Speaker Lip-to-Speech Synthesis | Employs similarity-based training to enhance lip-to-speech synthesis, achieving high speaker similarity. |
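Two measures recur throughout the table: plain cosine similarity (as in the synthetic-text evaluation) and a learned bilinear similarity of the OASIS type (Chechik et al., 2010). A stdlib-only sketch of both (function names are illustrative; the matrix `W` stands in for a learned parameter):

```python
import math

def cosine_similarity(u, v):
    """cos(u, v) = <u, v> / (||u|| * ||v||)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def bilinear_similarity(W, u, v):
    """OASIS-style bilinear score s_W(u, v) = u^T W v; with W = I this
    reduces to the plain dot product."""
    return sum(u[i] * W[i][j] * v[j]
               for i in range(len(u)) for j in range(len(v)))

print(cosine_similarity([1, 0], [1, 0]))  # -> 1.0 (identical directions)
print(cosine_similarity([1, 0], [0, 1]))  # -> 0.0 (orthogonal)
print(bilinear_similarity([[2, 0], [0, 1]], [1, 0], [1, 0]))  # -> 2.0
```

Similarity learning proper consists of fitting `W` (or a deep embedding) so that these scores agree with supervision such as relative comparisons or ranking triplets.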


16.2.13.3 Mathematical Analysis of Similarity Learning
16.3. Self Learning
16.3.1. Literature Review of Self Learning
| Reference | Contribution |
| Schmidhuber (1991) | Introduced curiosity-driven learning and intrinsic motivation for self-learning agents. |
| Sutton and Barto (1998) | Formalized reinforcement learning through MDPs, TD learning, and actor-critic architectures. |
| Silver et al. (2017) | Demonstrated self-play in AlphaZero, mastering games without human input using MCTS. |
| Bengio et al. (2009) | Introduced curriculum learning, simulating human cognitive progression in deep learning. |
| He et al. (2020) | Developed contrastive learning through Momentum Contrast (MoCo) for self-supervised learning. |
| Grill et al. (2020) | Introduced BYOL, showing that self-supervised representation learning succeeds without negative pairs. |
| Hinton et al. (2006) | Pioneered deep belief networks (DBNs) for hierarchical self-learning representations. |
| Finn et al. (2017) | Formalized model-agnostic meta-learning (MAML) for few-shot learning. |
| Jaderberg et al. (2017) | Proposed auxiliary self-learning objectives in RL to enhance policy learning. |
| Dosovitskiy et al. (2020) | Developed Vision Transformers (ViTs), enabling self-learning representations in vision tasks. |
16.3.2. Recent Literature Review of Self Learning
| Paper | Contribution |
| Mousavi (2025) | Examines the ethical implications of self-aware AGI and whether it possesses moral standing. Utilizes fuzzy logic to assess AI consciousness, influencing AI governance debates. |
| Bjerregaard et al. (2025) | Demonstrates the application of self-supervised learning to structural biology, improving molecular structure prediction and advancing computational drug discovery. |
| Cui et al. (2025) | Develops a dual-level self-supervised learning model to enhance generalization in physics-based AI, particularly in interatomic potential modeling. |
| Jia et al. (2025) | Introduces a graph-based self-supervised learning model for molecular property prediction, utilizing retrosynthetic fragmentation to improve AI-driven drug design. |
| Hou (2025) | Investigates the psychological effects of AI-driven adaptive learning on self-esteem and academic mindfulness, highlighting AI’s role in personalized education. |
| Liu et al. (2025) | Proposes a reinforcement learning-based scheduling system for elective surgeries, optimizing hospital efficiency and reducing patient wait times. |
| Song et al. (2025) | Develops a deep self-supervised learning framework for anomaly detection in time-series data, improving AI applications in finance and industry. |
| Li et al. (2025) | Explores generative AI’s ability to provide real-time adaptive scaffolding for personalized self-regulated learning, enhancing online education strategies. |
| Chaudary et al. (2025) | Presents an EEG-based AI model for emotion recognition using self-learning algorithms, improving human-computer interaction and mental health monitoring. |
| Tautan et al. (2025) | Conducts a systematic review of unsupervised learning methods for epilepsy detection using EEG data, showcasing AI’s diagnostic potential in neurology. |
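The simplest concrete instance of self-learning is self-training with pseudo-labels: a model fit on a small labeled set labels the unlabeled points and retrains on its own outputs. A toy nearest-centroid sketch (illustrative only; the self-supervised methods surveyed above, such as MoCo and BYOL, are far more elaborate):

```python
import math

def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def self_train(labeled, unlabeled, rounds=2):
    """labeled: dict point -> label; pseudo-labels unlabeled points by
    nearest class centroid and absorbs them into the labeled set."""
    labeled = dict(labeled)
    for _ in range(rounds):
        cents = {lab: centroid([p for p, l in labeled.items() if l == lab])
                 for lab in set(labeled.values())}
        for p in list(unlabeled):
            lab = min(cents, key=lambda l: math.dist(p, cents[l]))
            labeled[p] = lab
            unlabeled.remove(p)
    return labeled

labeled = {(0.0, 0.0): "a", (10.0, 10.0): "b"}
unlabeled = [(1.0, 1.0), (9.0, 9.0)]
result = self_train(labeled, unlabeled)
print(result[(1.0, 1.0)], result[(9.0, 9.0)])  # -> a b
```

The key risk, addressed by confidence thresholds in practical systems, is that wrong pseudo-labels get reinforced on later rounds.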
16.3.3. Mathematical Analysis of Self Learning
16.3.4. Python Code to Generate Figure 183



16.4. Reinforcement Learning
16.4.1. Literature Review of Reinforcement Learning
| Reference | Contribution |
| Bellman (1957) | Introduced Dynamic Programming, laying the mathematical foundation for reinforcement learning. Developed the Bellman equation, which enables recursive computation of optimal policies in Markov Decision Processes (MDPs). |
| Sutton and Barto (1998, 2018) | Formalized reinforcement learning as a computational framework. Introduced key concepts such as temporal difference (TD) learning, actor-critic methods, and policy evaluation. Their textbook serves as the primary resource for both theoretical and applied RL. |
| Watkins and Dayan (1992) | Developed Q-learning, a model-free off-policy algorithm that enables agents to learn optimal policies without requiring an explicit model of the environment. Provided proof of Q-learning’s convergence under certain conditions. |
| Mnih et al. (2015) | Introduced Deep Q Networks (DQN), combining deep learning with Q-learning to handle high-dimensional state spaces. Innovations include experience replay and target networks, leading to stable and sample-efficient training. Achieved superhuman performance on Atari games. |
| Silver et al. (2016) | Developed AlphaGo, which integrated Monte Carlo Tree Search (MCTS) with deep reinforcement learning. Demonstrated the ability to learn complex planning tasks with deep policy and value networks. Paved the way for AlphaZero, which generalized the approach to chess and shogi. |
| Konda and Tsitsiklis (2000) | Proposed actor-critic methods, separating policy learning (actor) from value estimation (critic). Their framework improved policy stability and inspired modern policy gradient methods such as Proximal Policy Optimization (PPO). |
| Schulman et al. (2017) | Developed Proximal Policy Optimization (PPO), a policy gradient method using a clipped objective function to stabilize training. PPO is widely used due to its balance between sample efficiency and simplicity. |
| Lillicrap et al. (2016) | Introduced Deep Deterministic Policy Gradient (DDPG), extending reinforcement learning to continuous action spaces. Utilized deterministic policies, target networks, and batch normalization to improve stability. |
| Haarnoja et al. (2018) | Developed Soft Actor-Critic (SAC), which incorporates entropy maximization for improved exploration and stability. SAC’s stochastic policy formulation and automatic entropy adjustment enhanced sample efficiency. |
| Levine et al. (2016) | Applied reinforcement learning to robotics using guided policy search. Combined RL with supervised learning to improve sample efficiency, demonstrating end-to-end learning from raw sensory inputs to control outputs. |
16.4.2. Recent Literature Review of Reinforcement Learning
| Authors, Year, and Source | Key Contribution |
| H. Shah (2025), ResearchGate | Explores the security vulnerabilities of reinforcement learning (RL) models to adversarial attacks and proposes mitigation techniques for model robustness. |
| B. Hengzhi, W. Haichao, H. Rongrong, et al. (2025), Chinese Journal of Aeronautics (Elsevier) | Uses multi-agent reinforcement learning (MARL) to optimize UAV relay covert communication, improving security and efficiency in real-time transmission. |
| R. Pan, Q. Yuan, G. Luo, B. Chen, et al. (2025), SSRN | Introduces a novel Markov Decision Process (MDP) graph-based approach to improve sample efficiency in multi-task reinforcement learning (MTRL). |
| G. Soman, M.V. Judy, A.M. Abou (2025), Cognitive Systems Research (Elsevier) | Applies reinforcement learning to enhance Retrieval-Augmented Generation (RAG) for AI-driven mental health support systems. |
| D.R.X. Oliveira, G.J.P. Moreira, A.R. Duarte (2025), Environmental and Ecological Statistics (Springer) | Develops RL-based spatial cluster detection techniques to improve geospatial data analysis beyond traditional statistical methods. |
| Z. Ajanovi, T. Gros, F. Den Hengst, et al. (2025), AAAI Conference on Artificial Intelligence (IBM Research) | Integrates AI planning techniques with RL to create more interpretable and structured decision-making models. |
| H. Chen, W. Guo, W. Bao, et al. (2025), Energy and Buildings (Elsevier) | Introduces an interpretable RL framework for energy management in smart buildings, ensuring transparency in decision-making. |
| H. Liu, D. Li, B. Zeng, Y. Xu (2025), Applied Intelligence (Springer) | Enhances multi-hop reasoning tasks by using RL to optimize knowledge graph reasoning efficiency. |
| W. Zhao, Y. Lv, K.M. Lee, et al. (2025), Computers and Industrial Engineering (Elsevier) | Uses reinforcement learning-enhanced LSTM models to improve predictive maintenance and fault detection in industrial systems. |
| G.A. Anwar, M.Z. Akber (2025), Computers and Structures (Elsevier) | Applies multi-agent RL to optimize structural resilience under extreme environmental conditions, improving infrastructure durability. |

16.4.3. Mathematical Analysis of Reinforcement Learning
16.4.4. Python Code to Generate Figure 185



16.4.5. Policy Gradient Method
16.4.5.1 Literature Review of Policy Gradient Method
| Reference | Contribution |
| Sutton et al. (1999) | Introduced policy gradient methods with function approximation, deriving an unbiased gradient estimator for direct policy optimization. Established the foundation for parameterized policies independent of value functions. |
| Kakade (2001) | Developed Natural Policy Gradient (NPG), which utilizes the Fisher information matrix to normalize gradient updates, making learning invariant to policy parameterization. Improved stability and efficiency in policy optimization. |
| Schulman et al. (2015) | Proposed Trust Region Policy Optimization (TRPO), a theoretically motivated approach that enforces a KL divergence constraint on policy updates, ensuring monotonic performance improvement and preventing catastrophic performance drops. |
| Schulman et al. (2017) | Introduced Proximal Policy Optimization (PPO), which simplifies TRPO by using a clipped surrogate objective, striking a balance between computational efficiency and stability, making it widely adopted in deep reinforcement learning. |
| Agarwal et al. (2021) | Provided theoretical analysis on policy gradient optimality, sample complexity, and performance under distribution shift, giving insights into when policy gradient methods effectively converge to near-optimal solutions. |
| Liu et al. (2024) | Conducted a rigorous theoretical study on projected and natural policy gradients in discounted Markov Decision Processes (MDPs), deriving convergence rates and properties of different policy optimization methods. |
| Lorberbom et al. (2020) | Developed Direct Policy Gradients (DirPG), an approach optimized for discrete action spaces that maximizes return-to-go trajectories, allowing integration of domain knowledge into policy learning. |
| McCracken et al. (2020) | Analyzed policy gradient methods in exactly solvable Partially Observable Markov Decision Processes (POMDPs), deriving analytical results on value distributions and probabilistic convergence behavior. |
| Lehmann (2024) | Provided a definitive theoretical guide to policy gradients in deep reinforcement learning, covering entropy regularization, KL divergence constraints, and their impact on sample efficiency and stability. |
| Sutton et al. (2000) | Conducted a comparative study on policy-gradient algorithms, evaluating their theoretical convergence properties and empirical efficiency, guiding practitioners in selecting appropriate methods. |
16.4.5.2 Recent Literature Review of Policy Gradient Method
| Authors (Year) | Contribution |
| Mustafa et al. (2025) | Utilizes Proximal Policy Optimization (PPO) to optimize offloading decisions in vehicular communication networks, reducing latency and enhancing efficiency. |
| Huang et al. (2025) | Introduces the Knowledge Collaboration Actor-Critic Policy Gradient (KCACPG) method to optimize reinforcement learning in traffic management using knowledge transfer. |
| Yang et al. (2025) | Employs Deep Deterministic Policy Gradient (DDPG) within a hierarchical reinforcement learning framework to optimize vehicular resource allocation. |
| Jamshidiha et al. (2025) | Combines Graph Neural Networks (GNN) with DDPG to improve mobile user association in cellular networks through dynamic traffic-aware optimization. |
| Raei et al. (2025) | Develops a DDPG-based framework for robotic nonprehensile manipulation, enabling efficient object sliding with adaptive policy gradient optimization. |
| Ting-Ting et al. (2025) | Proposes a Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm to optimize UAV coordination under limited communication conditions. |
| Zhang et al. (2025) | Integrates neuromorphic computing with reinforcement learning to develop a Hybrid Deep Deterministic Policy Gradient (Neuro-HDDPG) for underwater robot navigation. |
| Nguyen et al. (2025) | Implements REINFORCE and RELAX policy gradient algorithms to improve text-to-SQL model fine-tuning with reward optimization. |
| Chathuranga Brahmanage et al. (2025) | Introduces a constraint-aware policy gradient algorithm that learns from violation signals, optimizing decision-making within strict operational constraints. |
| Li et al. (2025) | Proposes FedDDPG, a federated learning extension of Deep Deterministic Policy Gradient (DDPG) to optimize vehicle trajectory prediction while preserving data privacy. |
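The estimator underlying all of these methods is the score-function gradient, grad_theta log pi_theta(a) times the return. A minimal REINFORCE sketch on a two-armed Bernoulli bandit with a softmax policy (reward probabilities, step count, and learning rate are illustrative):

```python
import math
import random

def softmax(theta):
    z = [math.exp(t) for t in theta]
    s = sum(z)
    return [p / s for p in z]

def reinforce(true_rewards=(0.2, 0.8), steps=2000, lr=0.1, seed=0):
    """Two-armed bandit: arm k pays 1 with probability true_rewards[k]."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    for _ in range(steps):
        probs = softmax(theta)
        a = 0 if rng.random() < probs[0] else 1
        r = 1.0 if rng.random() < true_rewards[a] else 0.0
        # score function: d/d theta_k log pi(a) = 1{k=a} - pi_k
        for k in range(2):
            theta[k] += lr * r * ((1.0 if k == a else 0.0) - probs[k])
    return softmax(theta)

probs = reinforce()
print(probs[1])  # probability of the better arm, close to 1 after training
```

NPG, TRPO, and PPO in the table above differ mainly in how they constrain or precondition exactly this gradient step.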
16.4.5.3 Mathematical Analysis of Policy Gradient Method
16.4.5.4 Python Code to Generate Figure 186



16.4.6. Deep Reinforcement Learning
16.4.6.1 Literature Review of Deep Reinforcement Learning
| Reference | Contribution |
| Mnih et al. (2015) | Introduced Deep Q-Learning for high-dimensional state spaces using deep neural networks. Proposed experience replay to break correlation among samples and target networks for stable Q-value updates. Enabled deep reinforcement learning for discrete action spaces. |
| Lillicrap et al. (2016) | Extended Q-learning to continuous action spaces using an actor-critic framework. Leveraged off-policy learning with target smoothing to improve convergence and stability in high-dimensional environments. |
| Schulman et al. (2015) | Developed a stable policy gradient method with constraints on policy divergence using a trust region approach. Ensured monotonic policy improvement and was particularly effective in robotic control tasks. |
| Schulman et al. (2017) | Simplified TRPO by introducing a clipped surrogate objective for stable and efficient policy updates. Achieved state-of-the-art performance while being computationally efficient and less sensitive to hyperparameters. |
| Haarnoja et al. (2018) | Introduced entropy-regularized reinforcement learning to encourage exploration. Optimized a stochastic policy while balancing reward accumulation and entropy maximization, improving sample efficiency and robustness. |
| Hessel et al. (2018) | Unified multiple improvements to DQN, including Double DQN, Prioritized Experience Replay, Dueling Networks, Multi-step Learning, Noisy Networks, and Distributional RL. Provided a comprehensive framework with enhanced sample efficiency and stability. |
| Silver et al. (2016) | Combined Monte Carlo Tree Search (MCTS) with deep policy and value networks to achieve superhuman performance in the game of Go. Demonstrated the power of planning-based reinforcement learning. |
| Levine et al. (2016) | Integrated deep learning with guided policy search for end-to-end robotic manipulation. Enabled robots to learn complex tasks directly from raw visual inputs. |
| Bellemare et al. (2017) | Proposed predicting the distribution of future rewards instead of the expected value. Led to improved stability and sample efficiency, forming the foundation for quantile-based RL methods such as QR-DQN and IQN. |
| Sutton and Barto (2018) | Provided a rigorous mathematical foundation for reinforcement learning. Covered fundamental topics such as value iteration, policy iteration, and temporal difference learning. Served as a definitive reference for both theoretical and practical RL research. |
16.4.6.2 Recent Literature Review of Deep Reinforcement Learning
| Authors (Year) | Contribution |
| K. Xue, L. Zhai, Y. Li, Z. Lu, W. Zhou (2025) | Proposes a DRL-based algorithm to optimize task offloading and caching in UAV-assisted MEC networks, addressing computational efficiency and resource allocation. |
| O.A. Amodu, R.A.R. Mahmood, H. Althumali (2025) | Provides a comprehensive review of DRL applications in UAV-based MEC and IoT systems, identifying challenges and proposing future research directions. |
| A. Silvestri, D. Coraci, S. Brandi, A. Capozzoli (2025) | Investigates imitation learning as an alternative to DRL for real-world building control systems to enhance energy efficiency while maintaining occupant comfort. |
| R. Sunder, U.K. Lilhore, A.K. Rai, E. Ghith, M. Tlija (2025) | Introduces SmartAPM, a DRL-based adaptive power management system that optimizes battery life based on environmental and user activity data. |
| L. Yan, Q. Wang, G. Hu, W. Chen, B.R. Noack (2025) | Develops a mutual information-based transfer learning approach for DRL in active flow control, reducing training costs and improving aerodynamic simulations. |
| E. Mustafa, J. Shuja, F. Rehman, A. Namoun (2025) | Proposes a PPO-based DRL framework for vehicular computation offloading, significantly improving latency and energy efficiency in intelligent transport systems. |
| S.A. Alajaji, R. Sabzian, Y. Wang, A.S. Sultan, R. Wang (2025) | Reviews the use of DRL in medical imaging, highlighting its potential in infrared spectroscopy-based cancer diagnosis for improved accuracy and early detection. |
| N. Ali, G. Wallace (2025) | Explores DRL applications in cybersecurity, particularly in autonomous cyber defense for SOCs, enhancing threat detection and response capabilities. |
| F.A. Sarigül, I. Bayezit (2025) | Applies DRL to fixed-wing aircraft heading control, improving maneuverability and autonomy over traditional control techniques. |
| R. Mukhamadiarov (2025) | Investigates the application of DRL for controlling stochastic dynamical systems by integrating reinforcement learning with control theory to optimize stability. |
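One of the two stabilizers that made DQN (Mnih et al., 2015) work is experience replay: transitions are stored in a bounded buffer and sampled uniformly, breaking the temporal correlation between consecutive updates. A stdlib-only sketch (the `ReplayBuffer` class is illustrative, not from any cited codebase):

```python
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity, seed=0):
        self.buf = deque(maxlen=capacity)  # oldest transitions evicted FIFO
        self.rng = random.Random(seed)

    def push(self, s, a, r, s2, done):
        self.buf.append((s, a, r, s2, done))

    def sample(self, batch_size):
        """Uniform minibatch of stored (s, a, r, s', done) transitions."""
        return self.rng.sample(list(self.buf), batch_size)

    def __len__(self):
        return len(self.buf)

buf = ReplayBuffer(capacity=100)
for t in range(150):          # exceed capacity: the first 50 are dropped
    buf.push(t, 0, 0.0, t + 1, False)
batch = buf.sample(4)
print(len(buf), len(batch))  # -> 100 4
```

Prioritized variants (Hessel et al.'s Rainbow; Cicek et al.'s BPER) replace the uniform `sample` with importance-weighted draws.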
16.4.6.3 Mathematical Analysis of Deep Reinforcement Learning
16.4.6.3.1 Python Code to Generate Figure 187




16.4.6.4 Advantage Actor-Critic (A2C)
16.4.6.4.1 Literature Review of Advantage Actor-Critic (A2C)
| Reference | Contribution |
| Mnih et al. (2016) | Introduced the Asynchronous Advantage Actor-Critic (A3C) algorithm, which inspired the synchronous A2C algorithm by stabilizing policy gradient methods through multi-threaded training. |
| Wang et al. (2022) | Developed RLS-based A2C variants that improve sample efficiency and learning stability in deep reinforcement learning environments. |
| Rubell Marion Lincy et al. (2023) | Applied A2C for stock trading using technical indicators, showing improved decision-making for buy and sell strategies. |
| Paczolay et al. (2020) | Adapted A2C for multi-agent scenarios, addressing coordination and competition among agents with improved training mechanisms. |
| Zhang et al. (2024) | Applied A2C for industrial optimization problems, demonstrating enhanced cycle time minimization in disassembly line balancing. |
| Kölle et al. (2024) | Explored quantum implementations of A2C, benchmarking QA2C and HA2C against classical A2C architectures for efficiency gains. |
| Benhamou (2019) | Provided a mathematical framework for reducing variance in A2C methods, optimizing control variate estimators for policy gradient algorithms. |
| Peng et al. (2018) | Introduced Adversarial A2C, incorporating a discriminator to enhance exploration in dialogue policy learning. |
| van Veldhuizen (2022) | Applied A2C for adaptive control in robotics, optimizing PID tuning for an apple-harvesting robot. |
16.4.6.4.2 Recent Literature Review of Advantage Actor-Critic (A2C)
| Authors and Year | Contribution and Key Findings |
| Wang and Liu (2025) | Developed a transformer-enhanced A2C model for portfolio optimization in petroleum futures trading. Their approach improves risk-sensitive decision-making by dynamically adjusting trading strategies, outperforming traditional risk-assessment methods. |
| Thongkairat and Yamaka (2025) | Applied A2C for automated stock trading, demonstrating its superiority over Deep Q-Networks (DQN) in handling volatile markets. The model enables more stable convergence and improved long-term returns through policy gradient optimization. |
| Dey and Ghosh (2025) | Proposed an A2C-based intrusion detection system for QUIC-based Denial of Service (DoS) attacks. Unlike static cybersecurity systems, their model adapts dynamically to evolving network threats, achieving higher detection accuracy. |
| Zhao et al. (2025) | Designed an A2C-based UAV trajectory optimization system, enabling drones to optimize real-time flight paths and task offloading. The approach minimizes energy consumption while improving autonomous navigation efficiency. |
| Mounesan et al. (2025) | Introduced Infer-EDGE, an A2C-driven deep learning inference optimization system. The model dynamically balances computational cost, latency, and resource allocation in edge-AI environments, enhancing performance efficiency. |
| Hou et al. (2025) | Developed a multi-agent cooperative A2C (MAC-A2C) framework for fuel cell degradation prediction in automotive applications. The model extends fuel cell lifespan and reduces operational costs by optimizing predictive maintenance. |
| Radaideh et al. (2025) | Applied asynchronous A2C algorithms to optimize neutron flux distribution in nuclear microreactors. Their model enhances reactor control strategies, reducing the risk of critical failures through reinforcement learning-based adaptation. |
| Li et al. (2025) | Compared A2C with Soft Actor-Critic (SAC) for edge computing task offloading. The results indicate that A2C’s on-policy learning approach provides greater stability and efficiency in resource-constrained cloud environments. |
| Khan et al. (2025) | Used A2C for optimizing route planning in supply chain logistics, particularly in dairy product distribution. Their reinforcement learning approach reduces delivery costs and dynamically adapts to traffic conditions for improved efficiency. |
| Yuan et al. (2025) | Developed a transformer-enhanced A2C (TR-A2C) framework to improve multi-user semantic communication in vehicle networks. The approach enhances data transmission efficiency and ensures better adaptation to real-time network changes. |
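What distinguishes A2C from plain REINFORCE is the advantage A(s_t, a_t) = G_t - V(s_t): subtracting the critic's baseline lowers the variance of the policy gradient without adding bias. A sketch of the return and advantage computation only, with the networks themselves omitted (function names and the toy values are illustrative):

```python
def discounted_returns(rewards, gamma=0.99, bootstrap=0.0):
    """Backward pass: G_t = r_t + gamma * G_{t+1}, seeded with an
    optional bootstrap value for truncated rollouts."""
    G, out = bootstrap, []
    for r in reversed(rewards):
        G = r + gamma * G
        out.append(G)
    return list(reversed(out))

def advantages(rewards, values, gamma=0.99, bootstrap=0.0):
    """A_t = G_t - V(s_t), where values are the critic's estimates."""
    returns = discounted_returns(rewards, gamma, bootstrap)
    return [G - v for G, v in zip(returns, values)]

rewards = [0.0, 0.0, 1.0]
values  = [0.5, 0.5, 0.5]   # critic's estimates V(s_t)
adv = advantages(rewards, values, gamma=1.0)
print(adv)  # -> [0.5, 0.5, 0.5]
```

The actor is then updated with grad log pi(a_t | s_t) * A_t, while the critic regresses V(s_t) toward G_t.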
16.4.6.4.3 Mathematical Analysis of Advantage Actor-Critic (A2C)
16.4.6.4.4 Python Code to Generate Figure 188




16.4.6.5 Deep Deterministic Policy Gradient
16.4.6.5.1 Literature Review of Deep Deterministic Policy Gradient
| Authors & Year | Contribution & Key Improvements |
| Lillicrap et al. (2015) | Introduced Deep Deterministic Policy Gradient (DDPG), combining deterministic policy gradients with deep Q-learning in an actor-critic framework. Enabled off-policy learning for high-dimensional continuous action spaces, stabilized training using target networks and experience replay. |
| Silver et al. (2014) | Developed the deterministic policy gradient theorem, proving that deterministic policies yield lower variance gradients compared to stochastic policy gradients. Provided the theoretical foundation for DDPG, ensuring efficient learning in continuous control environments. |
| Cicek et al. (2021) | Proposed Batch Prioritized Experience Replay (BPER) to improve off-policy learning by prioritizing data points based on their likelihood of being generated by the current policy. This reduced policy divergence and improved sample efficiency. |
| Han et al. (2021) | Developed the Regularly Updated Deterministic (RUD) Policy Gradient Algorithm to mitigate Q-value overestimation bias and variance. Introduced a structured update mechanism that improved stability in policy learning. |
| Pan et al. (2020) | Introduced Softmax Deep Double Deterministic Policy Gradient (SD3), incorporating a Boltzmann softmax operator to smooth policy updates. This reduced overestimation errors in Q-learning and improved stability during training. |
| Luck et al. (2019) | Integrated latent trajectory optimization with DDPG to enhance exploration in sparse-reward environments. Improved exploration by leveraging model-based trajectory estimation, enabling better generalization in complex environments. |
| Dong et al. (2023) | Extended DDPG for robotic arm control by integrating adaptive reward shaping and improved experience replay mechanisms. Increased policy robustness and efficiency in real-world dexterous manipulation tasks. |
| Jesus et al. (2019) | Applied DDPG to autonomous mobile robot navigation, demonstrating its effectiveness in real-time obstacle avoidance. Showed that learning-based control could be achieved without explicit localization or map-based planning. |
| Lin et al. (2023) | Used DDPG for personalized medicine dosing strategies to optimize treatment decisions. Demonstrated reinforcement learning’s capability in healthcare applications by personalizing medical treatment plans based on patient-specific characteristics. |
| Sumalatha et al. (2024) | Provided a comprehensive survey of DDPG advancements, consolidating various algorithmic improvements and practical applications across multiple domains. Analyzed the evolution of DDPG and its role in real-world implementations. |
16.4.6.5.2 Recent Literature Review of Deep Deterministic Policy Gradient
| Study | Contribution |
| --- | --- |
| Yang et al. (2025) | Introduced a three-stage hierarchical optimization framework (3SHO) where DDPG optimizes resource allocation, improving efficiency in vehicular networks. |
| Jamshidiha and Pourahmadi (2025) | Employed DDPG within a traffic-aware graph neural network to optimize user association, enhancing network adaptability in cellular networks. |
| Tian et al. (2025) | Developed Multi-State Iteration DDPG (SIDDPG) to optimize task partitioning and reduce latency and energy consumption in Internet of Vehicles (IoV). |
| Saad et al. (2025) | Applied Twin Delayed DDPG (TD3) for edge server selection in 5G-enabled industrial applications, minimizing computation delays. |
| Deng et al. (2025) | Proposed a Multi-Agent DDPG (MADDPG) for satellite communication networks, improving distributed coordination in non-terrestrial networks. |
| Raei et al. (2025) | Designed a DDPG-based reinforcement learning framework for robotic manipulation through sliding, improving control efficiency. |
| Ting-Ting et al. (2025) | Developed MADDPG with inter-agent communication for UAV clusters under constrained communication environments. |
| Zhang et al. (2025) | Utilized TD3 to optimize multi-vehicle and multi-edge computing resource allocation, reducing system delays in vehicular edge computing. |
| Chen et al. (2025) | Integrated DDPG into UAV-based computation offloading to improve task scheduling and computational efficiency in cloud-edge collaborative computing. |
| Anwar and Akber (2025) | Applied MADDPG to enhance building resilience through multi-agent reinforcement learning, optimizing utility interactions in structural engineering. |
16.4.6.5.3 Mathematical Analysis of Deep Deterministic Policy Gradient
16.4.6.5.4 Python Code to Generate Figure 189
- Actor (policy network) outputs continuous actions.
- Critic (Q-network) evaluates the action-value $Q(s, a)$.
- Target networks stabilize learning.
- Replay buffer stores transitions.
- Environment generates experience.
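The interplay of these components can be sketched in a few lines. This is a minimal illustrative sketch, not the reference implementation: the actor and critic networks are abstracted to plain parameter lists, and the hyperparameter values `GAMMA` and `TAU` are assumptions.

```python
# Minimal DDPG bookkeeping: replay buffer, soft target update, TD target.

GAMMA, TAU = 0.99, 0.005   # discount factor and soft-update rate (assumed)

replay_buffer = []         # stores (state, action, reward, next_state)

def store(transition, capacity=10_000):
    """Append a transition, evicting the oldest once capacity is reached."""
    replay_buffer.append(transition)
    if len(replay_buffer) > capacity:
        replay_buffer.pop(0)

def soft_update(target_params, online_params):
    """Target-network update: theta' <- tau*theta + (1 - tau)*theta'."""
    return [TAU * p + (1.0 - TAU) * tp
            for tp, p in zip(target_params, online_params)]

def td_target(reward, q_next):
    """Critic regression target: y = r + gamma * Q'(s', mu'(s'))."""
    return reward + GAMMA * q_next
```

The slow soft update (small `TAU`) is what keeps the moving regression target stable during training.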




16.4.6.6 Proximal Policy Optimization
16.4.6.6.1 Literature Review of Proximal Policy Optimization
| Reference | Contribution |
| --- | --- |
| Schulman et al. (2017) | Introduced Proximal Policy Optimization (PPO), a first-order policy gradient method that balances stability and efficiency. PPO replaces the trust region constraint in TRPO with a clipped surrogate objective, simplifying implementation while maintaining strong empirical performance across various reinforcement learning tasks. |
| Huang and Dossa (2022) | Provided a comprehensive checklist of 37 crucial implementation details necessary for reproducing PPO’s reported performance. This work highlighted the impact of hyperparameter choices, architectural considerations, and training strategies on PPO’s consistency and reliability. |
| Zhang et al. (2023) | Proposed a dynamic clipping strategy where the clipping threshold adapts based on task feedback, ensuring optimal constraint selection for policy updates. This method improves training stability and sample efficiency while addressing PPO’s sensitivity to fixed clipping bounds. |
| Zhang et al. (2020) | Developed an exploration-enhancing variant of PPO that incorporates uncertainty estimation. Their approach improves sample efficiency by encouraging exploration in regions of high uncertainty, particularly in continuous control tasks. |
| Kobayashi (2022) | Introduced a threshold adaptation mechanism for PPO using the symmetric relative density ratio, leading to a more theoretically grounded method for selecting the clipping bound. This modification enhances stability and improves policy update efficiency. |
| Kobayashi (2020) | Formulated PPO using relative Pearson divergence, providing a principled policy divergence constraint that ensures smoother and more stable training updates, improving theoretical soundness. |
| Piao and Zhuo (2021) | Extended PPO to multi-agent reinforcement learning by introducing Coordinated Proximal Policy Optimization (CoPPO), which adapts step sizes dynamically for multiple interacting agents. This approach improves training stability and performance in decentralized multi-agent environments. |
| Zhang and Wang (2020) | Proposed a dynamic clipping bound method that adjusts the threshold throughout training, further refining the balance between exploration and exploitation in PPO-based learning. |
| Wang et al. (2019) | Introduced Truly Proximal Policy Optimization (TPPO), which explicitly incorporates a trust region constraint into PPO’s objective function, leading to more reliable and theoretically grounded policy updates. |
| Henderson et al. (2018) | Critically examined the reproducibility of deep reinforcement learning algorithms, including PPO. The study demonstrated the sensitivity of performance metrics to hyperparameter tuning and implementation details, emphasizing the need for rigorous benchmarking and standardization in reinforcement learning research. |
16.4.6.6.2 Recent Literature Review of Proximal Policy Optimization
| Author(s) & Year | Contribution |
| --- | --- |
| Cuéllar et al., 2024 | Uses PPO for optimizing interplanetary trajectory design, ensuring efficient orbital maneuvers with fuel management optimization. |
| Guan et al., 2025 | Integrates PPO with a transformer-based attention mechanism for vehicle routing, improving convergence and efficiency. |
| Wu & Xie, 2025 | Proposes PPO for optimizing HVAC energy efficiency in smart buildings, significantly reducing carbon footprint. |
| Chandrasiri & Meedeniya, 2025 | Combines PPO with Graph Neural Networks to optimize cloud workflow scheduling, reducing latency and energy use. |
| Liu et al., 2025 | Employs PPO to optimize real-time navigation for underwater robots, adapting to environmental disturbances. |
| Mustafa et al., 2025 | Uses PPO for dynamic task offloading in edge-assisted vehicular networks, enhancing real-time resource management. |
| Figueroa et al., 2025 | Applies PPO for humanoid robot gait adaptation, enabling robots to walk efficiently across different terrains. |
| Xu et al., 2025 | Develops a PPO-based Lyapunov-guided model for optimizing resource allocation in cognitive radio networks. |
| Bukhari et al., 2025 | Integrates PPO with causal learning for natural language-based robotic control, improving adaptability in robotics. |
| Dai et al., 2025 | Utilizes PPO with control barrier functions for autonomous vehicle trajectory planning, ensuring safe and efficient merging. |
16.4.6.6.3 Mathematical Analysis of Proximal Policy Optimization
16.4.6.6.4 Python Code to Generate Figure 190
- Environment → generates transitions.
- Policy (Actor) → produces actions from states.
- Value Function (Critic) → estimates state values $V(s)$.
- Advantage Estimator → computes advantages $\hat{A}_t$ using rewards and values.
- Clipped Objective → updates the policy with constraints to prevent large steps.
- Replay/Trajectory Buffer → stores collected rollouts.
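The advantage estimator and the clipped objective above can be sketched directly. The following is an illustrative implementation; the GAE coefficients `gamma` and `lam` are assumed defaults, not values prescribed by the text.

```python
def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimation for one trajectory; `values` carries
    one extra bootstrap entry for the state after the final step."""
    advantages, running = [0.0] * len(rewards), 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD error
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Clipped surrogate: min(r*A, clip(r, 1-eps, 1+eps)*A), where r is the
    probability ratio pi_new(a|s) / pi_old(a|s)."""
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)
```

Taking the minimum makes the objective pessimistic: the policy gains nothing from pushing the ratio beyond the clip range, which is what prevents large destructive updates.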




16.4.6.7 Soft Actor-Critic (SAC) Algorithm
16.4.6.7.1 Literature Review of Soft Actor-Critic (SAC) Algorithm
| Authors (Year) | Contribution |
| --- | --- |
| Haarnoja et al. (2018) | Introduced Soft Actor-Critic (SAC), integrating the maximum entropy principle into reinforcement learning to enhance sample efficiency, stability, and policy robustness. SAC introduced an entropy-regularized objective, encouraging high-entropy policies for better exploration and generalization. |
| Haarnoja et al. (2019) | Developed an automatic temperature tuning mechanism for SAC, dynamically adjusting the entropy coefficient to balance exploration and exploitation. This improvement alleviates the need for manual hyperparameter tuning, leading to more adaptive learning. |
| Wu et al. (2023) | Applied SAC to LiDAR-based robot navigation, demonstrating its effectiveness in dynamic obstacle avoidance. The study showed improved training efficiency and higher navigation success rates, making SAC more viable for real-world robotic applications. |
| Hossain et al. (2022) | Proposed an inhibitory network-based modification to SAC for UAV control, enhancing adaptability and retraining speed in fast-changing environments. This modification allows SAC to better handle non-stationary reinforcement learning settings. |
| Ishfaq et al. (2025) | Introduced Langevin Soft Actor-Critic (LSAC), combining SAC with Thompson sampling and distributional Langevin Monte Carlo updates to improve uncertainty estimation and exploration efficiency. This modification leads to more robust decision-making under uncertainty. |
| Verma et al. (2023) | Developed the Soft Actor Retrospective Critic (SARC), incorporating a retrospective loss term in the critic network to accelerate convergence and stabilize training dynamics, improving sample efficiency in high-dimensional problems. |
| Tasdighi et al. (2023) | Introduced PAC-Bayesian Soft Actor-Critic, integrating a PAC-Bayesian objective into SAC’s critic network to reduce uncertainty and improve sample efficiency. This approach provides theoretical guarantees on policy performance. |
| Duan (2021) | Proposed Distributional Soft Actor-Critic (DSAC), learning a Gaussian distribution over stochastic returns to mitigate value overestimation errors in standard SAC implementations. This leads to more reliable policy learning in continuous control tasks. |
| Papers with Code Analysis | Provided a comprehensive breakdown of SAC’s theoretical foundations, benchmarking its performance across various continuous control environments, and highlighting its superiority over traditional policy optimization methods. |
| Haarnoja et al. (Various Analyses) | Conducted in-depth studies on SAC’s ability to handle stochasticity, demonstrating its advantages in learning multi-modal policy distributions while maintaining computational efficiency. |
16.4.6.7.2 Recent Literature Review of Soft Actor-Critic (SAC) Algorithm
| Authors (Year) | Contribution |
| --- | --- |
| Ewers et al. (2025) | Implement SAC and PPO in a recurrent autoencoder-based reinforcement learning framework for search and rescue operations, demonstrating SAC’s superior performance in dynamic and uncertain environments. |
| Yan et al. (2025) | Develop a mutual information-based knowledge transfer learning (MIKT-SAC) method to enhance SAC’s generalization across domains, improving active flow control in bluff body flows. |
| Asmat et al. (2025) | Propose a Digital Twin (DT) framework integrated with SAC to enable intelligent cyber-physical systems, facilitating the transition from Industry 4.0 to Industry 5.0. |
| Chao & Jiao (2025) | Apply SAC for network spectrum resource allocation, optimizing spectrum usage in dynamic wireless environments to enhance telecommunications efficiency. |
| Ma et al. (2025) | Introduce SIE-SAC, a novel reinforcement learning mechanism for UAV navigation in adversarial conditions, particularly in GPS/INS-integrated spoofing scenarios. |
| Walia et al. (2025) | Combine SAC with causal generative adversarial networks (GANs) and large language models (LLMs) to improve bond yield predictions and enhance financial forecasting. |
| Lalor & Swishchuk (2025) | Extend SAC to non-Markovian market-making, demonstrating its effectiveness in handling long-term dependencies and stochastic pricing models. |
| Zhang et al. (2025) | Propose a diffusion-based SAC framework for multi-UAV networks in the Metaverse, optimizing cooperative task allocation and resource distribution. |
| Zhao et al. (2025) | Utilize SAC for energy management in hybrid storage systems for urban rail transit, optimizing power distribution in traction power supply systems. |
| Tresca et al. (2025) | Apply SAC to develop adaptive energy management strategies for hybrid electric vehicles, enhancing fuel efficiency and battery longevity through dynamic energy consumption adjustments. |
16.4.6.7.3 Mathematical Analysis of Soft Actor-Critic (SAC) Algorithm
16.4.6.7.4 Python Code to Generate Figure 191
- Actor (Policy Network) → outputs actions.
- Critics (Q-networks) → estimate Q-values.
- Target networks → stabilize learning.
- Entropy term → adds exploration via the temperature parameter $\alpha$.
- Replay buffer → stores experiences.
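The entropy-regularized critic target that ties these components together can be sketched as follows; the temperature `alpha` and discount `gamma` values are assumed for illustration.

```python
def sac_td_target(reward, q1_next, q2_next, logp_next,
                  alpha=0.2, gamma=0.99, done=False):
    """Soft TD target: y = r + gamma * (min(Q1', Q2') - alpha * log pi(a'|s')).
    The twin-critic minimum curbs Q-value overestimation, and the entropy
    bonus -alpha * log pi rewards stochastic, exploratory behavior."""
    if done:
        return reward
    soft_value = min(q1_next, q2_next) - alpha * logp_next
    return reward + gamma * soft_value
```

Automatic temperature tuning (Haarnoja et al., 2019) would adjust `alpha` during training rather than fixing it as done here.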




16.5. Neuroevolution
16.5.1. Python Code to Generate Figure 192
- Population of neural networks (at a given generation).
- Evolutionary process: selection → crossover/mutation → next generation.
- A population of small neural networks.
- Arrows showing how parents produce offspring.
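The selection → crossover/mutation → next-generation cycle above can be sketched as a minimal loop. The flat weight-list genome, Gaussian mutation scale, and truncation selection are simplifying assumptions, not a specific published algorithm.

```python
import random

def evolve(population, fitness, n_generations=10, mut_scale=0.1, seed=0):
    """Minimal generational loop: evaluate, keep the fitter half as parents,
    refill the population with Gaussian-mutated offspring. Because parents
    survive unchanged, the best individual is never lost (elitism)."""
    rng = random.Random(seed)
    for _ in range(n_generations):
        population.sort(key=fitness, reverse=True)       # evaluate and rank
        parents = population[: len(population) // 2]     # truncation selection
        offspring = [[w + rng.gauss(0.0, mut_scale) for w in p]
                     for p in parents]                   # mutated copies
        population = parents + offspring
    return max(population, key=fitness)
```

With an even population size the loop keeps the population constant; crossover is omitted here and added in the neuro-genetic sketch later in this section.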




16.5.2. Neuro-Genetic Evolution
16.5.2.1 Literature Review of Neuro-Genetic Evolution
| Authors (Year) | Contribution |
| --- | --- |
| Stanley & Miikkulainen (2002) | Introduced NeuroEvolution of Augmenting Topologies (NEAT), a genetic algorithm that evolves both the structure and weights of neural networks. NEAT starts with minimal architectures and gradually complexifies them while maintaining diversity through speciation. |
| Stanley et al. (2005) | Extended the NEAT framework, demonstrating its ability to evolve increasingly complex and high-performing neural architectures through systematic augmentation. |
| Gauci & Stanley (2007) | Developed HyperNEAT, which leverages compositional pattern-producing networks (CPPNs) to encode connectivity patterns, enabling the evolution of large-scale networks with inherent symmetries and regularities. |
| Kassahun & Sommer (2005) | Proposed Evolutionary Acquisition of Neural Topologies (EANT), where neural networks start simple and gain complexity over time, reducing computational costs while improving network efficiency. |
| Miikkulainen et al. (2024) | Introduced CoDeepNEAT, an extension of NEAT that optimizes deep learning architectures by evolving topology, hyperparameters, and components, achieving performance comparable to human-designed networks. |
| Liang et al. (2019) | Developed the Learning Evolutionary AI Framework (LEAF), which applies evolutionary algorithms to optimize both neural architectures and hyperparameters, achieving state-of-the-art results in medical imaging and natural language processing. |
| Vargas & Murata (2016) | Proposed Spectrum-Diverse Neuroevolution, a method that preserves diversity at the behavioral level to enhance the robustness of evolved networks. |
| Such et al. (2017) | Demonstrated that simple genetic algorithms can rival traditional gradient-based approaches such as Q-learning and policy gradients in reinforcement learning tasks, particularly in sparse reward environments. |
| Assunção et al. (2021) | Developed Fast-DENSER, which utilizes grammatical evolution to efficiently search for deep neural network architectures while reducing computational overhead. |
| Rempis (2012) | Introduced Interactively Constrained Neuro-Evolution (ICONE), incorporating constraint masks to restrict the search space and evolve specialized neural controllers with domain-specific knowledge. |
16.5.2.2 Recent Literature Review of Neuro-Genetic Evolution
| Authors (Year) | Contribution |
| --- | --- |
| Stanley et al. (2019) | Pioneered NeuroEvolution of Augmenting Topologies (NEAT), enabling artificial neural networks to evolve in structure and parameters, outperforming traditional gradient-based methods. |
| Bertens and Lee (2019) | Explored synergies between neural networks and evolutionary algorithms, particularly in biological neural modeling and robotics, showcasing adaptability improvements in AI through nature-inspired selection. |
| Wang et al. (2023) | Demonstrated genetic evolution in feature selection for bioinformatics, improving genomic data classification and disease prediction accuracy through evolutionary deep learning. |
| Pagliuca et al. (2020) | Proposed a neuro-evolutionary approach for robotic movement refinement via self-adaptive evolutionary deep learning, enhancing efficiency in industrial automation and autonomous decision-making. |
| Behjat et al. (2019) | Introduced AGENT, a neuro-evolution framework evolving both topology and weights of neural networks to prevent premature convergence and stagnation, applied to UAV collision avoidance. |
| Ahmed et al. (2023) | Developed a genetic reinforcement learning mechanism to optimize Deep Q-networks for real-time applications such as self-driving cars and robotic control. |
| Miikkulainen et al. (2023) | Applied evolutionary strategies for hyperparameter tuning in deep learning, achieving a 40% improvement in training speed and accuracy compared to traditional methods. |
| Kannan et al. (2024) | Designed a neuro-genetic deep learning framework for IoT security, enabling adaptive, self-learning anomaly detection systems for RPL attack detection. |
| Zeng et al. (2022) | Developed a hybrid genetic-deep learning model for financial forecasting, demonstrating increased accuracy and robustness in predicting volatile market trends. |
| S KV and Swamy (2024) | Explored ensemble-based neuro-genetic models for software quality improvement, refining feature selection and defect prediction using genetic evolution strategies. |
16.5.2.3 Mathematical Analysis of Neuro-Genetic Evolution
16.5.2.4 Python Code to Generate Figure 193
- Each individual is a neural network (nodes + weighted connections).
- A population of these networks exists at each generation.
- Crossover and mutation create new networks.
- Selection propagates the fittest.
- Population at one generation → networks side by side.
- Evolutionary flow → how parents produce offspring.
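The crossover, mutation, and selection operators listed above can be sketched as follows. Uniform crossover, gene-wise Gaussian mutation, and the tournament size `k` are illustrative choices, not taken from any one algorithm in the table.

```python
import random

def crossover(parent_a, parent_b, rng):
    """Uniform crossover: each weight gene is drawn from either parent."""
    return [a if rng.random() < 0.5 else b
            for a, b in zip(parent_a, parent_b)]

def mutate(genome, rng, rate=0.2, scale=0.1):
    """Gene-wise Gaussian perturbation applied with probability `rate`."""
    return [w + rng.gauss(0.0, scale) if rng.random() < rate else w
            for w in genome]

def tournament(population, fitness, rng, k=3):
    """Selection: return the fittest of k randomly drawn contestants."""
    return max(rng.sample(population, k), key=fitness)
```

These three operators slot directly into the generational loop sketched earlier: select two parents, cross them, mutate the child, repeat until the next population is full.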




16.5.3. Cellular Encoding (CE)
16.5.3.1 Literature Review of Cellular Encoding (CE)
| Authors (Year) | Contribution |
| --- | --- |
| Gruau (1993) | Introduced Cellular Encoding (CE) as a graph grammar-based approach to representing neural networks. Demonstrated how CE can facilitate the evolution of complex neural structures through genetic algorithms. |
| Gruau (1996) | Compared Cellular Encoding with direct encoding methods, proving CE’s superiority in producing compact, efficient, and generalizable neural networks. Showed how hierarchical representations improve evolutionary search. |
| Gruau & Whitley (1993) | Applied Cellular Encoding to neurocontrol problems, illustrating its ability to evolve adaptive and robust neural controllers for dynamic systems. |
| Gutierrez et al. (2004) | Investigated the capacity of CE to generate diverse feedforward neural network topologies, providing empirical evidence of its flexibility and scalability. |
| Zhang & Muhlenbein (1993) | Explored the application of CE with genetic algorithms under Occam’s Razor, demonstrating how CE can evolve minimal yet high-performing neural architectures. |
| Kitano (1990) | Introduced a graph generation system for evolving neural networks using genetic algorithms, laying a theoretical foundation for Cellular Encoding approaches. |
| Miller & Turner (2015) and Miller (2020) | Developed Cartesian Genetic Programming (CGP), a related graph-based encoding approach that influenced CE by reinforcing modular and reusable neural components. |
| Stanley & Miikkulainen (2002) | Proposed the NEAT algorithm, which evolves neural networks through progressive augmentation of topologies. Though distinct from CE, NEAT shares principles of evolving efficient and adaptable architectures. |
| Hernandez Ruiz et al. (2021) | Extended Cellular Encoding to Neural Cellular Automata (NCA), demonstrating its application in generative models and image synthesis using convolutional mechanisms. |
| Hajij, Istvan, & Zamzmi (2020) | Introduced Cell Complex Neural Networks (CXNs), generalizing message-passing schemes to higher-dimensional structures, providing a new theoretical framework for encoding neural computations. |
16.5.3.2 Recent Literature Review of Cellular Encoding (CE)
| Authors (Year) | Contribution |
| --- | --- |
| Sun et al. (2025) | Investigated how learning transforms hippocampal neural activity into an orthogonalized state machine, demonstrating how structured encoding optimizes memory retrieval and spatial navigation. |
| Hu et al. (2025) | Developed an ensemble deep learning framework for long non-coding RNA (lncRNA) subcellular localization, showcasing how cellular encoding enhances gene regulation analysis. |
| Guan et al. (2025) | Proposed a graph neural structure encoding method for semantic segmentation of nuclei in pathological tissues, enabling improved cellular structure identification in biomedical imaging. |
| Ghosh et al. (2025) | Designed a deep learning-based transcription factor binding site predictor using DNABERT and convolutional neural networks (CNNs) to extract and encode DNA sequence motifs. |
| Sun et al. (2025) | Introduced a perturbation proteomics-based virtual cell model, integrating protein interaction networks with deep learning to simulate cellular responses under environmental and pharmacological perturbations. |
| Grosjean et al. (2025) | Developed a self-supervised learning approach for detecting genetic modifiers of neuronal activity, using a network-aware encoding strategy to enhance high-content phenotypic screening. |
| de Carvalho et al. (2025) | Conducted a gene network analysis on autism spectrum disorder (ASD), encoding synaptic and cellular alterations linked to transcription factor mutations affecting neuronal communication. |
| Gonzalez et al. (2025) | Created an in vivo single-cell electroporation tool, enabling real-time encoding of hippocampal neurons through genetically encoded calcium indicators and voltage sensors. |
| Sprecher (2025) | Investigated how neural networks encode and regulate brain-wide connectivity, providing a computational model for synaptic excitability and neural dynamics. |
| Li et al. (2025) | Examined non-neuronal contributions to neural encoding, revealing that glial cells and extracellular matrix components play an active role in shaping encoded neural signals. |
16.5.3.3 Mathematical Analysis of Cellular Encoding (CE)
16.5.3.4 Python Code to Generate Figure 194 and Figure 195
- A derivation tree (the CE program).
- The neural network produced after applying the developmental rules.





16.5.4. GeNeralized Acquisition of Recurrent Links (GNARL)
16.5.4.1 Literature Review of GeNeralized Acquisition of Recurrent Links (GNARL)
| Authors (Year) | Contribution |
| --- | --- |
| Saunders, Angeline, and Pollack (1993) | Introduced GNARL as an evolutionary algorithm for evolving both the topology and weights of recurrent neural networks (RNNs). Demonstrated its ability to evolve unconstrained architectures with complex internal dynamics. |
| Angeline, Saunders, and Pollack (1994) | Further refined GNARL by comparing its efficiency with traditional methods such as genetic algorithms and gradient-based approaches. Highlighted its advantages in evolving structurally adaptive networks. |
| Stanley and Miikkulainen (2002) | Developed NEAT, an evolutionary algorithm for augmenting neural network topologies. Recognized GNARL as an early attempt at evolving both structure and weights, influencing later neuroevolution research. |
| Schmidhuber (1996) | Discussed GNARL in the context of self-improving AI systems. Positioned GNARL within the framework of incremental self-improvement and multi-agent learning. |
| Yao (1999) | Provided a comprehensive review of evolutionary artificial neural networks, citing GNARL as a foundational approach for evolving both network structure and parameters. Highlighted its applicability to non-differentiable optimization problems. |
| Floreano, Dürr, and Mattiussi (2008) | Examined GNARL’s role in the historical development of neuroevolutionary methods. Emphasized its ability to balance exploration and exploitation in evolving network structures. |
| Gomez and Miikkulainen (1999) | Applied GNARL-inspired neuroevolution techniques to solve non-Markovian control tasks. Showed the importance of evolving memory-capable recurrent networks for reinforcement learning. |
| Kassahun and Sommer (2005) | Investigated reinforcement learning through evolutionary neural network optimization. Referenced GNARL as a key precursor to modern neuroevolutionary strategies. |
| Moriarty and Miikkulainen (1996) | Explored reinforcement learning using symbiotic evolution. Built on GNARL’s principles to demonstrate how co-evolutionary strategies enhance learning in neural networks. |
| Gomez and Miikkulainen (1997) | Extended GNARL-based ideas to incremental evolution of complex behaviors. Emphasized the importance of evolving modular and hierarchical structures for adaptive learning. |
16.5.4.2 Mathematical Analysis of GeNeralized Acquisition of Recurrent Links (GNARL)
16.5.4.3 Python Code to Generate Figure 196
- Input, hidden, and output layers.
- Recurrent links (self-loops or feedback connections).
- Weighted edges (blue = positive, red = negative, width proportional to magnitude).
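GNARL's hallmark, structural mutation whose severity scales with a fitness-derived "temperature", can be sketched as follows; the temperature-to-edge-count mapping is an illustrative assumption.

```python
import random

def gnarl_mutate(nodes, edges, fitness, rng, max_fitness=1.0):
    """GNARL-style structural mutation: severity is driven by the network's
    temperature T = 1 - fitness/max_fitness, so poorly performing networks
    are perturbed more aggressively. Recurrent links, including self-loops,
    are permitted. Simplified illustrative sketch."""
    temperature = 1.0 - fitness / max_fitness
    n_new = 1 + int(temperature * 3)      # more structural change when T is high
    edges = dict(edges)                   # leave the parent network intact
    for _ in range(n_new):
        src, dst = rng.choice(nodes), rng.choice(nodes)  # dst may equal src
        edges[(src, dst)] = rng.gauss(0.0, 1.0)          # random initial weight
    return edges
```

Because any node may connect to any node (including itself), arbitrary recurrent topologies can emerge, which is exactly the "unconstrained architectures" property highlighted in the table above.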




16.5.5. Neuroevolution of Augmenting Topologies (NEAT)
16.5.5.1 Python Code to Generate Figure 197
- Input, hidden, and output nodes.
- Edges with weights (positive = blue, negative = red).
- Disabled connections (often drawn as dashed or faded).
- Labels showing node IDs.
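The characteristic NEAT "add node" mutation, which splits an existing connection while preserving innovation history, can be sketched as follows. The weight convention (1.0 into the new node, the old weight out of it) follows the original NEAT description; the dict-based gene encoding is an assumption.

```python
def add_node_mutation(genome, conn_index, next_node, innovation):
    """NEAT 'add node' mutation: the chosen connection gene is disabled and
    replaced by two new genes routed through a fresh hidden node. Each new
    gene receives its own innovation number, which is what lets NEAT align
    genomes during crossover."""
    genes = [dict(gene) for gene in genome]   # leave the parent genome intact
    old = genes[conn_index]
    old["enabled"] = False                    # disabled, not deleted
    genes.append({"in": old["in"], "out": next_node, "weight": 1.0,
                  "enabled": True, "innov": innovation})
    genes.append({"in": next_node, "out": old["out"], "weight": old["weight"],
                  "enabled": True, "innov": innovation + 1})
    return genes
```

The weight convention keeps the new two-hop path initially close in behavior to the old direct connection, so the mutation complexifies the topology without destroying what the network has already learned.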



16.5.6. Hypercube-Based NeuroEvolution of Augmenting Topologies (HyperNEAT)
16.5.6.1 Python Code to Generate Figure 198
- Nodes are in a 2D (or 3D) substrate.
- Connections are determined by the CPPN, usually sparse but patterned.
- Visualization should highlight: (a) node positions (substrate coordinates); (b) connection weights (width proportional to magnitude, color by sign).
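The substrate-query mechanism can be sketched as follows; the CPPN here is a fixed hand-written function standing in for the evolved network, and the expression `threshold` is an assumed value.

```python
import math

def cppn(x1, y1, x2, y2):
    """Stand-in CPPN: a fixed composition of symmetric/periodic functions.
    In HyperNEAT proper, this network is itself evolved with NEAT."""
    return math.sin(2.0 * (x1 - x2)) * math.exp(-((y1 - y2) ** 2))

def build_substrate(coords, threshold=0.2):
    """Query the CPPN for every ordered pair of substrate nodes; only
    sufficiently strong outputs are expressed as connections, yielding a
    sparse but geometrically patterned network."""
    weights = {}
    for i, a in enumerate(coords):
        for j, b in enumerate(coords):
            w = cppn(*a, *b)
            if abs(w) > threshold:      # weak outputs are pruned
                weights[(i, j)] = w
    return weights
```

Because the weight is a function of the two nodes' coordinates, regularities of the CPPN (symmetry, periodicity) become regularities of the connectivity pattern, which is the core idea behind HyperNEAT.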



16.5.7. Evolvable Substrate Hypercube-Based NeuroEvolution of Augmenting Topologies (ES-HyperNEAT)
16.5.7.1 Python Code to Generate Figure 199
- Nodes are arranged in a geometric substrate (usually 2D or 3D).
- Connections are created based on the CPPN output across node coordinates.
- Visualizations typically show:
- Node positions in substrate (x, y).
- Weighted connections (colored by sign, thickness by magnitude).





16.5.8. Evolutionary Acquisition of Neural Topologies (EANT/EANT2)
16.5.8.1 Python Code to Generate Figure 200
- Input layer, hidden nodes, output layer.
- Evolved connections (weighted, possibly sparse).
- Mutations such as added nodes or connections.
- Generation info (to highlight evolutionary growth).
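EANT's incremental complexification, genomes that start minimal and only ever grow, can be sketched as follows; the gene tuple format and the mutation probability `p_node` are illustrative assumptions.

```python
import random

def complexify(genome, rng, p_node=0.3):
    """Structural mutation: occasionally add a neuron gene, otherwise add a
    connection gene; existing genes are never removed, so genomes only grow."""
    genome = list(genome)
    if rng.random() < p_node:
        genome.append(("neuron", len(genome)))
    else:
        genome.append(("connection", rng.gauss(0.0, 1.0)))
    return genome

def evolve_structure(generations=20, seed=0):
    """Start from a minimal two-neuron genome and complexify it gradually."""
    rng = random.Random(seed)
    genome = [("neuron", 0), ("neuron", 1)]
    history = [len(genome)]
    for _ in range(generations):
        genome = complexify(genome, rng)
        history.append(len(genome))
    return genome, history
```

Tracking `history` across generations is exactly the "evolutionary growth" the figure is meant to highlight: genome length increases monotonically.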


16.5.9. Interactively Constrained Neuro-Evolution (ICONE)
16.5.9.1 Python Code to Generate Figure 201
- The user (human-in-the-loop) constrains evolution by enforcing rules (e.g., disallow certain connections, force modularity, or bias toward certain structures).
- Architectures evolve under both evolutionary pressure and interactive constraints.
- Visualization should therefore highlight: (a) neurons and their types; (b) connections (weights, enabled/disabled); (c) constraints applied (e.g., disallowed connections, frozen neurons).
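The constraint-mask idea can be sketched as a filter over candidate structural mutations; the predicate-based representation of constraints is an illustrative simplification of ICONE's richer constraint masks.

```python
def constrained_mutations(candidate_edges, constraints):
    """Keep only the candidate structural mutations that satisfy every
    user-supplied constraint predicate; violating candidates are rejected
    before they can enter the evolving network."""
    return [e for e in candidate_edges
            if all(allowed(e) for allowed in constraints)]

# Example constraints (illustrative): forbid self-loops, and protect the
# input neuron (node 0) from receiving incoming connections.
no_self_loops = lambda e: e[0] != e[1]
protect_input = lambda e: e[1] != 0
```

The human-in-the-loop contributes by editing the constraint list, which restricts the search space while evolutionary pressure operates only over the mutations that pass the filter.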





16.5.10. Deus Ex Neural Network (DXNN)
- Initialization: A population of neural networks is initialized with random topologies and weights. Each network’s topology can be represented as a graph $G = (V, E)$, where $V$ denotes neurons and $E$ denotes synaptic connections.
- Fitness Evaluation: Each network $i$ in the population is evaluated based on a fitness function $F_i$, which measures its performance on a given task. The fitness function could be defined as $F_i = \frac{1}{N} \sum_{j=1}^{N} L(y_j, \hat{y}_j)$, where $N$ is the number of samples, $y_j$ is the true output, $\hat{y}_j$ is the network’s output, and $L$ is a loss function, such as the mean squared error $L(y_j, \hat{y}_j) = (y_j - \hat{y}_j)^2$.
- Selection: Networks are selected for reproduction based on their fitness scores. A common selection method is tournament selection, where a subset of networks is chosen, and the one with the highest fitness is selected for reproduction.
- Crossover (Recombination): Pairs of selected networks undergo crossover to produce offspring. This involves combining the topologies and weights of parent networks. For example, given parent networks with weight matrices $W_1$ and $W_2$, an offspring’s weight matrix could be $W_{\text{off}} = \alpha W_1 + (1 - \alpha) W_2$, where $\alpha \in [0, 1]$ is a crossover coefficient.
- Mutation: Offspring networks undergo mutations to introduce variability. Mutations can affect both the topology and the weights. For weight mutation, $W' = W + \Delta W$, where $\Delta W$ is a perturbation matrix, often sampled from a normal distribution, $\Delta W_{ij} \sim \mathcal{N}(0, \sigma^2)$. For topology mutation, connections can be added or removed with probability $p_{\text{add}}$ or $p_{\text{remove}}$.
- Local Optimization (Memetic Component): After mutation, local search methods, such as gradient-based optimization, are applied to fine-tune the weights of the offspring networks. This involves minimizing the loss function $L$ with respect to the weights: $W \leftarrow W - \eta \nabla_W L$, where $\eta$ is the learning rate and $\nabla_W L$ is the gradient of the loss function with respect to the weights.
- Replacement: The new generation of networks replaces the old population, and the process repeats from the fitness evaluation step until a termination criterion is met, such as a predefined number of generations or a satisfactory fitness level.
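The seven steps above can be sketched as one memetic generation. The toy task (fitting a fixed weight vector `TARGET`), the tournament size, and all hyperparameter values are assumptions for illustration.

```python
import random

# Each "network" is a flat weight vector w, scored by the loss
# L(w) = sum_i (w_i - t_i)^2 against a fixed target vector TARGET.

TARGET = [0.3, -0.7, 0.5]

def loss(w):
    return sum((wi - ti) ** 2 for wi, ti in zip(w, TARGET))

def local_search(w, eta=0.1, steps=5):
    """Memetic component: a few exact gradient steps, dL/dw_i = 2(w_i - t_i)."""
    for _ in range(steps):
        w = [wi - eta * 2.0 * (wi - ti) for wi, ti in zip(w, TARGET)]
    return w

def dxnn_generation(population, rng, alpha=0.5, sigma=0.05):
    """One generation: tournament selection, blend crossover
    W_off = alpha*W1 + (1-alpha)*W2, Gaussian weight mutation, then local
    optimization of every offspring before it enters the next population."""
    def tournament(k=2):
        return min(rng.sample(population, k), key=loss)   # lower loss wins
    next_pop = []
    while len(next_pop) < len(population):
        p1, p2 = tournament(), tournament()
        child = [alpha * a + (1.0 - alpha) * b for a, b in zip(p1, p2)]
        child = [w + rng.gauss(0.0, sigma) for w in child]   # mutation
        next_pop.append(local_search(child))                 # memetic step
    return next_pop
```

Repeating `dxnn_generation` until a fitness threshold or generation budget is reached implements the replacement loop of the final step.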
16.5.10.1 Python Code to Generate Figure 202
- Neurons form Cores (like sub-networks).
- Cores are grouped into a DXNN organism.
- The Core graph (inter-core connectivity).
- Inside each core, the Neuron graph (intra-core connectivity).





16.5.11. Spectrum-Diverse Unified Neuroevolution Architecture (SUNA)
16.5.11.1 Python Code to Generate Figure 203






17. Training Neural Networks
17.1. Literature Review of Training Neural Networks
17.2. Backpropagation Algorithm
17.2.0.1 Python Code to Generate Figure 204



17.3. Gradient Descent Variants
17.3.1. SGD (Stochastic Gradient Descent) Optimizer
17.3.1.1 Literature Review of SGD (Stochastic Gradient Descent) Optimizer
17.3.1.2 Analysis of SGD (Stochastic Gradient Descent) Optimizer
17.3.1.3 Python Code to Generate Figure 205



17.3.2. Nesterov Accelerated Gradient Descent (NAG)
17.3.2.1 Literature Review of Nesterov Accelerated Gradient Descent (NAG)
17.3.2.2 Analysis of Nesterov Accelerated Gradient Descent (NAG)
- Look-Ahead Gradient Computation: By computing $\nabla f(\theta_t + \mu v_t)$ instead of $\nabla f(\theta_t)$, NAG effectively anticipates the next move, leading to improved convergence rates.
- Adaptive Step Size: The effective step size is modified dynamically by the momentum term, stabilizing the trajectory.
- Choice of $\mu$: A common schedule is $\mu_t = \frac{t-1}{t+2}$, which attains the optimal $O(1/t^2)$ rate for smooth convex objectives.
- Adaptive Learning Rate: Choosing $\eta \le 1/L$, where $L$ is the Lipschitz constant of $\nabla f$, ensures convergence.
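The look-ahead update can be sketched as follows; the quadratic test objective and the fixed `eta`, `mu` values in the usage example are assumptions for illustration.

```python
def nag(grad, theta0, eta=0.1, mu=0.9, steps=100):
    """Nesterov accelerated gradient: the gradient is evaluated at the
    look-ahead point theta + mu*v rather than at theta itself."""
    theta, v = list(theta0), [0.0] * len(theta0)
    for _ in range(steps):
        lookahead = [t + mu * vi for t, vi in zip(theta, v)]
        g = grad(lookahead)                                # look-ahead gradient
        v = [mu * vi - eta * gi for vi, gi in zip(v, g)]   # velocity update
        theta = [t + vi for t, vi in zip(theta, v)]        # parameter update
    return theta

# Minimizing f(theta) = theta^2 / 2, whose gradient is theta itself:
theta_final = nag(lambda x: list(x), [1.0])
```

The only difference from classical momentum is where the gradient is evaluated; that single change is what produces the anticipatory correction described above.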
17.3.2.3 Python Code to Generate Figure 206



17.3.3. Adam (Adaptive Moment Estimation) Optimizer
17.3.3.1 Literature Review of Adam (Adaptive Moment Estimation) Optimizer
17.3.3.2 Analysis of Adam (Adaptive Moment Estimation) Optimizer
17.3.3.3 Python Code to Generate Figure 207



17.3.4. RMSProp (Root Mean Squared Propagation) Optimizer
17.3.4.1 Literature Review of RMSProp (Root Mean Squared Propagation) Optimizer
17.3.4.2 Analysis of RMSProp (Root Mean Squared Propagation) Optimizer
- $v_t$ is a biased estimator of $\mathbb{E}[g_t^2]$ for finite $t$, but unbiased in the limit $t \to \infty$.
- The corrected estimate $\hat{v}_t = v_t / (1 - \beta^t)$ converges to $\mathbb{E}[g^2]$ in expectation, in variance, and almost surely.
- This ensures stable and adaptive learning rates in RMSProp.
- Without Bias Correction: since $v_0 = 0$, in early iterations $\mathbb{E}[v_t] = (1 - \beta^t)\,\mathbb{E}[g^2] < \mathbb{E}[g^2]$. Because $v_t$ underestimates the true second moment, the denominator $\sqrt{v_t} + \epsilon$ in the update is too small, leading to excessively large steps and instability.
- With Bias Correction: since $\mathbb{E}[\hat{v}_t] = \mathbb{E}[g_t^2]$, the denominator $\sqrt{\hat{v}_t} + \epsilon$ matches the scale of the true gradient, resulting in stable step sizes and improved convergence.
- Bias correction ensures $\mathbb{E}[\hat{v}_t] = \mathbb{E}[g_t^2]$, removing the early-iteration underestimation.
- Almost sure convergence guarantees asymptotically stable second-moment estimation.
- Stable step sizes prevent instability in early iterations.
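The effect of bias correction can be made concrete with a short sketch; the hyperparameter defaults are assumed values.

```python
import math

def rmsprop(grad, theta0, eta=0.01, beta=0.9, eps=1e-8,
            steps=200, correct_bias=True):
    """RMSProp with optional bias correction of the second-moment estimate,
    matching the discussion above: v_hat_t = v_t / (1 - beta^t)."""
    theta, v = list(theta0), [0.0] * len(theta0)
    for t in range(1, steps + 1):
        g = grad(theta)
        v = [beta * vi + (1.0 - beta) * gi * gi for vi, gi in zip(v, g)]
        v_hat = [vi / (1.0 - beta ** t) for vi in v] if correct_bias else v
        theta = [th - eta * gi / (math.sqrt(vh) + eps)
                 for th, gi, vh in zip(theta, g, v_hat)]
    return theta
```

On the first iteration the corrected estimate is exactly $\hat{v}_1 = g_1^2$, so the step has magnitude close to `eta`; without correction the denominator is shrunk by $\sqrt{1-\beta}$ and the first step is roughly three times too large for $\beta = 0.9$, which is the instability the bullets above describe.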
17.3.4.3 Python Code to Generate Figure 208
- RMSProp keeps an exponentially decaying average of squared gradients.
- Each update is scaled by $1/(\sqrt{v_t} + \epsilon)$, which adaptively adjusts the learning rate per dimension.
- This stabilizes training and avoids oscillations.
- The trajectory will show smoother convergence compared to plain SGD.
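A minimal NumPy sketch of this update rule, including the bias correction discussed above (the badly scaled quadratic test objective and hyperparameter values are illustrative):

```python
import numpy as np

def rmsprop(grad, theta0, eta=0.01, beta=0.9, eps=1e-8, steps=2000):
    """Minimal RMSProp sketch: per-dimension step sizes scaled by an
    exponentially decaying average of squared gradients."""
    theta = np.array(theta0, dtype=float)
    v = np.zeros_like(theta)
    for t in range(1, steps + 1):
        g = grad(theta)
        v = beta * v + (1 - beta) * g**2        # decaying average of g^2
        v_hat = v / (1 - beta**t)               # bias correction for early steps
        theta -= eta * g / (np.sqrt(v_hat) + eps)
    return theta

# Badly scaled quadratic: one global step size oscillates in the steep
# direction, whereas RMSProp equalizes the effective step per coordinate.
A = np.diag([1.0, 100.0])
theta = rmsprop(lambda th: A @ th, [1.0, 1.0])
print(theta)  # both coordinates end up near 0
```

Note that with a constant learning rate the iterates hover in a small neighborhood of the optimum (at the scale of eta) rather than converging exactly; a decaying eta removes this residual oscillation.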



17.4. Overfitting and Regularization Techniques
17.4.1. Literature Review of Overfitting and Regularization Techniques
17.4.2. Analysis of Overfitting and Regularization Techniques
17.4.3. Dropout
17.4.3.1 Literature Review of Dropout
17.4.3.2 Analysis of Dropout
17.4.3.3 Python Code to Generate Figure 209
- Without Dropout → tends to overfit.
- With Dropout → smoother decision boundary, better generalization.
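A minimal sketch of inverted dropout, the variant used by most modern frameworks (the rate p = 0.5 and array sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, p=0.5, train=True):
    """Inverted dropout: zero each unit with probability p at train time and
    rescale the survivors by 1/(1-p), so the expected activation equals x
    and no rescaling is needed at test time."""
    if not train:
        return x
    mask = (rng.random(x.shape) >= p) / (1.0 - p)
    return x * mask

x = np.ones((100000, 4))
y = dropout(x, p=0.5)
print(y.mean())  # close to 1.0: the expectation is preserved
```

At test time `dropout(x, train=False)` is the identity, which is exactly why the train-time rescaling by 1/(1-p) is applied.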



17.4.4. L1/L2 Regularization and Overfitting
17.4.4.1 Literature Review of L1 (Lasso) Regularization
17.4.4.2 Literature Review of L2 (Ridge Regression) Regularization
17.4.4.3 Analysis of L1/L2 Regularization and Overfitting
17.4.5. Elastic Net Regularization
17.4.5.1 Literature Review of Elastic Net Regularization
17.4.5.2 Analysis of Elastic Net Regularization
17.4.6. Early Stopping
17.4.6.1 Literature Review of Early Stopping
17.4.6.2 Analysis of Early Stopping
17.4.7. Data Augmentation
17.4.7.1 Literature Review of Data Augmentation
17.4.7.2 Analysis of Data Augmentation
17.4.8. Cross-Validation
17.4.8.1 Literature Review of Cross-Validation
17.4.8.2 Analysis of Cross-Validation
17.4.9. Pruning
17.4.9.1 Literature Review of Pruning
17.4.9.2 Analysis of Pruning
17.4.10. Ensemble Methods
17.4.10.1 Literature Review of Ensemble Methods
17.4.10.2 Analysis of Ensemble Methods
17.4.11. Noise Injection
17.4.11.1 Literature Review of Noise Injection
17.4.11.2 Analysis of Noise Injection
17.4.12. Batch Normalization
17.4.12.1 Literature Review of Batch Normalization
17.4.12.2 Analysis of Batch Normalization
17.4.13. Weight Decay
17.4.13.1 Literature Review of Weight Decay
17.4.13.2 Analysis of Weight Decay
17.4.14. Max Norm Constraints
17.4.14.1 Literature Review of Max Norm Constraints
17.4.14.2 Analysis of Max Norm Constraints
17.4.15. Transfer Learning
17.4.15.1 Literature Review of Transfer Learning
17.4.15.2 Analysis of Transfer Learning
- Under-regularization: Low bias, high variance ⇒ overfitting.
- Over-regularization: High bias, low variance ⇒ underfitting.
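The under- vs. over-regularization tradeoff can be seen directly in a small ridge-regression experiment; the polynomial features, noise level, and the three lambda values below are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)

def fit_ridge(X, y, lam):
    """Closed-form ridge solution (X^T X + lam I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Degree-9 polynomial features on 30 noisy samples: prone to overfitting.
x = rng.uniform(-1, 1, 30)
X = np.vander(x, 10)
y = np.sin(3 * x) + 0.3 * rng.normal(size=30)

x_test = rng.uniform(-1, 1, 200)
X_test, y_test = np.vander(x_test, 10), np.sin(3 * x_test)

errs = {lam: float(np.mean((X_test @ fit_ridge(X, y, lam) - y_test) ** 2))
        for lam in (0.0, 1e-3, 1e3)}
print(errs)  # a moderate lambda typically gives the lowest test error
```

Here lambda = 0 corresponds to under-regularization (high variance) and lambda = 1e3 to over-regularization (high bias, predictions shrunk toward zero).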
17.5. Hyperparameter Tuning
17.5.1. Literature Review of Hyperparameter Tuning
17.5.2. Analysis of Hyperparameter Tuning
17.5.3. Grid Search
17.5.3.1 Literature Review of Grid Search
17.5.3.2 Analysis of Grid Search
- Guaranteed to find the best combination within the search space.
- Easy to implement and parallelize.
- Computationally expensive, especially for high-dimensional hyperparameter spaces.
- Inefficient if some hyperparameters have little impact on performance.
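Exhaustive grid search reduces to iterating over the Cartesian product of the candidate values; the toy validation-loss surrogate below stands in for a real model evaluation:

```python
import itertools
import numpy as np

# Toy stand-in for a validation loss; in practice this trains and
# evaluates a model for the given hyperparameters.
def val_loss(lr, reg):
    return (np.log10(lr) + 2) ** 2 + (np.log10(reg) + 4) ** 2

grid = {"lr": [1e-4, 1e-3, 1e-2, 1e-1], "reg": [1e-6, 1e-5, 1e-4, 1e-3]}

# Evaluate every combination and keep the best one.
best = min(itertools.product(grid["lr"], grid["reg"]),
           key=lambda c: val_loss(*c))
print(best)  # the grid point closest to the optimum: (1e-2, 1e-4)
```

The cost is the product of the grid sizes, which is exactly why the approach becomes infeasible in high-dimensional hyperparameter spaces.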
17.5.4. Random Search
17.5.4.1 Literature Review of Random Search
17.5.4.2 Analysis of Random Search
- More efficient than grid search, especially when some hyperparameters are less important.
- Can explore a larger search space with fewer evaluations.
- No guarantee of finding the optimal hyperparameters.
- May still require many iterations for high-dimensional spaces.
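Random search with the same evaluation budget samples each hyperparameter independently, typically log-uniformly; compared with a 4x4 grid, 16 random trials cover 16 distinct values per axis instead of 4:

```python
import numpy as np

rng = np.random.default_rng(0)

def val_loss(lr, reg):  # toy stand-in for a validation metric
    return (np.log10(lr) + 2) ** 2 + (np.log10(reg) + 4) ** 2

# Sample hyperparameters log-uniformly over plausible ranges.
trials = [(10 ** rng.uniform(-5, 0), 10 ** rng.uniform(-7, -2))
          for _ in range(16)]
best = min(trials, key=lambda c: val_loss(*c))
print(best, val_loss(*best))
```

When only one or two hyperparameters actually matter, this per-axis coverage is what makes random search more sample-efficient than a grid.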
17.5.5. Bayesian Optimization
17.5.5.1 Literature Review of Bayesian Optimization
17.5.5.2 Analysis of Bayesian Optimization
- Efficient and requires fewer evaluations compared to grid/random search.
- Balances exploration (trying new regions) and exploitation (focusing on promising regions).
- Computationally expensive to build and update the surrogate model.
- May struggle with high-dimensional spaces or noisy objective functions.
17.5.6. Genetic Algorithms
17.5.6.1 Literature Review of Genetic Algorithms
17.5.6.2 Analysis of Genetic Algorithms
- Can explore a wide range of hyperparameter combinations.
- Suitable for non-differentiable or discontinuous objective functions.
- Computationally expensive and slow to converge.
- Requires careful tuning of mutation and crossover parameters.
17.5.7. Hyperband
17.5.7.1 Literature Review of Hyperband
17.5.7.2 Analysis of Hyperband
- $f(x)$ is a black-box function with no known analytical form.
- Evaluating $f$ with a budget $b$ (e.g., number of epochs, dataset size) yields an approximation $\hat{f}(x, b)$, where $\hat{f}(x, b) \to f(x)$ as $b \to R$, and $R$ is the maximum budget.
- Start with $n$ configurations and allocate a small budget $b$ to each.
- Evaluate all configurations and keep the top $1/\eta$ fraction.
- Increase the budget by a factor of $\eta$ and repeat until one configuration remains.
- Allocate budget $b_i = b\,\eta^{i}$ to each configuration at rung $i$.
- Evaluate $\hat{f}(x_j, b_i)$ for all $j$.
- Keep the top $\lfloor n_i/\eta \rfloor$ configurations based on $\hat{f}(x_j, b_i)$.
- For small $s$, it explores many configurations with small budgets.
- For large $s$, it exploits fewer configurations with large budgets.
- Near-Optimality: The best configuration found by HyperBand converges to the true optimum $x^{\star}$ as $R \to \infty$.
- Logarithmic Scaling: The total cost scales logarithmically with the number of configurations.
- Large-Scale Optimization: It scales to high-dimensional hyperparameter spaces.
- Parallelization: Configurations can be evaluated independently, enabling distributed computation.
- Adaptability: It works for both continuous and discrete hyperparameter spaces.
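The successive-halving loop at the core of HyperBand can be sketched as follows; the latent "quality" of each configuration and the budget-dependent noise model are illustrative stand-ins for real validation losses:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy evaluation: the observed loss approaches the configuration's true
# quality as the budget b approaches the maximum budget R.
def evaluate(quality, b, R=81):
    return quality + (1.0 - b / R) * rng.normal(scale=0.1)

def successive_halving(n=27, b0=1, eta=3, R=81):
    configs = list(rng.uniform(0, 1, size=n))   # latent true qualities
    b = b0
    while len(configs) > 1 and b <= R:
        scores = [evaluate(q, b, R) for q in configs]
        keep = max(1, len(configs) // eta)      # keep the top 1/eta fraction
        order = np.argsort(scores)[:keep]
        configs = [configs[i] for i in order]
        b *= eta                                # grow the budget by factor eta
    return configs[0]

best = successive_halving()
print(best)  # a low-quality-score (i.e., good) configuration survives
```

HyperBand wraps this loop in an outer sweep over brackets $s$, trading off many cheap evaluations against few expensive ones.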
17.5.8. Gradient-Based Optimization
17.5.8.1 Literature Review of Gradient-Based Optimization
17.5.8.2 Analysis of Gradient-Based Optimization
- Hypothesis space: $\mathcal{H}$ as a Banach space equipped with norm $\|\cdot\|_{\mathcal{H}}$.
- Parameter space: $\Theta \subset \mathbb{R}^{p}$, where $\Theta$ is a closed, convex subset of $\mathbb{R}^{p}$.
17.5.9. Population-Based Training (PBT)
17.5.9.1 Literature Review of Population-Based Training (PBT)
17.5.9.2 Analysis of Population-Based Training (PBT)
- $\theta_i \in \mathbb{R}^{d}$ represents the model parameters, with $d$ being the dimensionality of the model parameter space.
- $h_i \in \mathcal{H}$ represents the hyperparameters of the $i$-th model, with $m$ being the dimensionality of the hyperparameter space $\mathcal{H} \subset \mathbb{R}_{+}^{m}$. The set $\mathcal{H}$ is a bounded subset of the positive real numbers, such as learning rates, batch sizes, or regularization factors.
At each iteration $t$, we perform:
- $N$ forward passes to compute the losses $L(\theta_1), \ldots, L(\theta_N)$.
- $N$ selection and mutation operations for updating the population.
- This leads to a time complexity of $O(N)$ per iteration.
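The exploit-and-explore loop can be sketched on a toy problem; the population size, perturbation factors, and the one-dimensional objective below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# PBT sketch: N workers train in parallel; periodically the worst worker
# copies the best (exploit) and perturbs its hyperparameter (explore).
N, steps = 8, 30
lrs = 10 ** rng.uniform(-4, -1, N)       # per-worker hyperparameter
thetas = np.ones(N)                      # per-worker parameter, minimizing x^2

for t in range(steps):
    thetas = thetas - lrs * 2 * thetas   # one local gradient step per worker
    losses = thetas ** 2                 # N forward passes -> O(N) per iteration
    best, worst = np.argmin(losses), np.argmax(losses)
    # Exploit: the worst worker copies the best; explore: perturb its lr.
    thetas[worst] = thetas[best]
    lrs[worst] = lrs[best] * rng.choice([0.8, 1.2])

print(losses.min())  # the best worker's loss shrinks rapidly
```

The key property is that hyperparameters are adapted during a single training run instead of across independent full runs.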
17.5.10. Optuna
17.5.10.1 Literature Review of Optuna
17.5.10.2 Analysis of Optuna
17.5.11. Successive Halving
17.5.11.1 Literature Review of Successive Halving
17.5.11.2 Analysis of Successive Halving
17.5.12. Reinforcement Learning (RL)
17.5.12.1 Literature Review of Reinforcement Learning (RL)
17.5.12.2 Analysis of Reinforcement Learning (RL)
- State Space ($S$): The state $s_t$ encodes the current hyperparameter configuration, the history of performance metrics, and any other relevant information (e.g., computational resources used).
- Action Space ($A$): The action $a_t$ represents a perturbation to the hyperparameters, producing the next configuration from the current one.
- Transition Dynamics ($P$): The transition probability $P(s_{t+1} \mid s_t, a_t)$ describes the stochastic evolution of the state. This includes the effect of training the model and evaluating it on the validation set.
- Reward Function ($R$): The reward $r_t$ quantifies the improvement in model performance, e.g., the change in validation accuracy between successive configurations.
- Discount Factor ($\gamma$): The discount factor $\gamma \in [0, 1)$ balances immediate and future rewards.
- Neural Network Function Approximation: Use deep neural networks to parameterize the policy $\pi_\theta(a \mid s)$ and value function $V_\phi(s)$.
- Parallelization: Distribute the evaluation of hyperparameter configurations across multiple workers.
- Early Stopping: Use techniques like Hyperband to terminate poorly performing configurations early.
17.5.13. Meta-Learning
17.5.13.1 Literature Review of Meta-Learning
17.5.13.2 Analysis of Meta-Learning
18. Convolutional Neural Networks
18.1. Literature Review of Convolutional Neural Networks
18.2. Key Concepts
18.3. Applications in Image Processing
18.3.1. Image Classification
18.3.1.1 Literature Review of Image Classification
18.3.1.2 Analysis of Image Classification
18.3.2. Object Detection
18.3.2.1 Literature Review of Object Detection
18.3.2.2 Analysis of Object Detection
18.4. Real-World Applications
18.4.1. Medical Imaging
18.4.1.1 Literature Review of Medical Imaging
18.4.1.2 Analysis of Medical Imaging
18.4.2. Autonomous Vehicles
18.4.2.1 Literature Review of Autonomous Vehicles
18.4.2.2 Analysis of Autonomous Vehicles
18.5. Popular CNN Architectures
18.5.1. Literature Review of Popular CNN Architectures
18.5.2. AlexNet
18.5.3. ResNet
18.5.4. VGG
19. Recurrent Neural Networks (RNNs)
19.1. Literature Review of Recurrent Neural Networks (RNNs)
19.2. Key Concepts
19.3. Sequence Modeling and Long Short-Term Memory (LSTM) and GRUs
19.3.1. Literature Review of Sequence Modeling and Long Short-Term Memory (LSTM) and GRUs
19.3.2. Analysis of Sequence Modeling and Long Short-Term Memory (LSTM) and GRUs
19.4. Applications in Natural Language Processing
19.4.1. Literature Review of Applications in Natural Language Processing
19.4.2. Analysis of Applications in Natural Language Processing
19.5. Deep Learning and the Collatz Conjecture
19.5.1. Literature Review of Deep Learning and the Collatz Conjecture
19.5.2. Analysis of Deep Learning and the Collatz Conjecture
19.6. Mertens Function and the Collatz Conjecture
19.6.1. Literature Review of Bounds of Mertens Function
- Kotnik & van de Lune (2004) [436] performed numerical experiments on the order of $M(x)$ and gave heuristic evidence and computational data about the local maxima/minima of $M(x)/\sqrt{x}$.
- Greg Hurst (2018) [437] improved computations of $M(x)$ up to $x = 10^{16}$ and used modern algorithms to push the numerically observed bounds for $\limsup$ and $\liminf$ of $M(x)/\sqrt{x}$ beyond those coming from Odlyzko–te Riele methods. Hurst also described improved algorithms for computing $M(x)$ asymptotically faster than naive summation.
- Compared to other explicit (rigorous) bounds, the bound proposed by Cox et al. (2021) [1214] is weaker in magnitude than the best known unconditional or conditional bounds, but interesting in that it ties the growth of $M(x)$ to a quantity much smaller than the typical $\sqrt{x}$ scale for large $x$. Its main value is conceptual: exploring novel arithmetic identities and conditional paths toward RH rather than giving the sharpest bounds.
- More recently (2024–2025), Kim & Nguyen applied advanced lattice-reduction and algorithmic improvements (moving beyond classic LLL to BKZ and modern CVP techniques) to substantially lower the proven upper bound on the smallest counterexample; their arXiv / journal work gives the current best rigorous upper bound. These works continue the Odlyzko–te Riele [865] program but exploit decades of progress in lattice algorithms motivated by cryptography.
- Explicit formula/complex analysis/Perron's formula: express $M(x)$ in terms of nontrivial zeta zeros and use zero-location information to bound the resulting sums over zeros. This is the bridge linking $M(x)$ to RH and zero statistics.
- Zero-density / spacing and moment methods (analytic): control of zero heights and related moments feed into conditional upper bounds (Maier, Montgomery, Soundararajan). Soundararajan's argument bounds the frequency of abnormal clustering of zeros and yields the currently best conditional growth bound.
- Diophantine/lattice methods (Odlyzko–te Riele and descendants): translate the existence of large values of the truncated explicit formula into an inhomogeneous simultaneous approximation problem. Then use lattice-basis reduction (LLL, BKZ) and CVP/Aggregation algorithms to produce rigorous upper bounds on the first counterexample. Modern work improves reduction quality and thus the proven exponential bounds.
- Heavy computation / algorithmic summation: faster summation algorithms, GPU methods, blockwise techniques, and practical searches compute $M(x)$ up to very large thresholds and provide empirical data (Kotnik, Hurst, Deléglise–Rivat and others).
- Exact order of growth: the unconditional asymptotic order of $M(x)$ is unknown; conditional heuristics and evidence point to $\sqrt{x}$ times slowly growing iterated-log factors (Gonek/Ng), but no proof exists.
- Explicit smallest counterexample: While Odlyzko–te Riele proved a counterexample exists and later work vastly lowered the proven upper bound for the least counterexample (from astronomically huge values down by many orders of magnitude), an explicit small counterexample is still unknown; current rigorous upper bounds are enormous exponentials (though much reduced by modern lattice work of Kim–Nguyen).
- Sharp conditional bounds under RH / GRH: Soundararajan’s conditional bound remains the strongest in the general literature; further progress on zero statistics (moments, spacing) may refine this.
19.6.2. Deep Learning Approaches
19.6.3. Experiment 1: Sequence Prediction of the Möbius Function
19.6.4. Experiment 2: Estimating Growth Bounds on the Mertens Function Using Regression Models
- Fully connected deep neural networks (DNNs)
- Convolutional neural networks (CNNs) applied to structured embeddings of n
- Recurrent neural networks (RNNs) incorporating sequential dependencies in arithmetic functions
19.6.5. Experiment 3: Neural Network Approximation of the Zeta Function Zeros
- Numerical Evidence for Spectral Connections: If the model can learn meaningful patterns in the distribution of zeros, this might provide further empirical support for the spectral interpretation of prime number distributions.
- Predictive Utility: A model capable of estimating new zero locations could refine our understanding of the error terms in prime number theorems and potentially guide new conjectures in analytic number theory.
- Deep Learning as a Theoretical Tool: While deep learning does not offer rigorous mathematical proofs, its ability to approximate highly nonlinear functions could lead to novel heuristic insights, paving the way for new analytical techniques to study the Mertens function and related zeta function properties.
19.6.6. Experiment 4: Graph Neural Networks for Prime Factorization Trees
19.6.7. Experiment 5: Autoencoders for Dimensionality Reduction of Number-Theoretic Functions
20. Advanced Architectures
20.1. Transformers and Attention Mechanisms
20.1.1. Literature Review of Transformers and Attention Mechanisms
20.1.2. Analysis of Transformers and Attention Mechanisms
20.2. Generative Adversarial Networks (GANs)
20.2.1. Literature Review of Generative Adversarial Networks (GANs)
20.2.2. Analysis of Generative Adversarial Networks (GANs)
20.3. Autoencoders and Variational Autoencoders
20.3.1. Literature Review of Autoencoders and Variational Autoencoders
20.3.2. Analysis of Autoencoders and Variational Autoencoders
20.4. Graph Neural Networks (GNNs)
20.4.1. Literature Review of Graph Neural Networks (GNNs)
20.4.2. Analysis of Graph Neural Networks (GNNs)
20.5. Physics Informed Neural Networks (PINNs)
20.5.1. Literature Review of Physics Informed Neural Networks (PINNs)
20.5.2. Analysis of Physics Informed Neural Networks (PINNs)
- $\mathcal{N}[\cdot]$ is a differential operator, for instance the Laplace operator, or the Navier–Stokes operator for fluid dynamics.
- $u(x)$ is the unknown solution we wish to approximate.
- $f(x)$ is a known source term, which could represent external forces or other sources in the system.
- $\Omega$ is the domain in which the equation is valid, such as a bounded region in $\mathbb{R}^{d}$ (e.g., $d = 2$ or $d = 3$).
- $\sigma$ is a nonlinear activation function, such as ReLU or sigmoid.
- $W_i$ and $b_i$ are the weight matrices and bias vectors of the $i$-th layer.
- The function $u_\theta(x)$ is a feedforward neural network with multiple layers.
- Data-driven loss term: this term enforces agreement between the model predictions and any available data points (boundary or initial conditions).
- Physics-driven loss term: this term enforces the satisfaction of the governing PDE at collocation points within the domain $\Omega$.
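The two-term loss can be sketched without any deep-learning library for the 1D Poisson problem $-u''(x) = \pi^2 \sin(\pi x)$ on $(0,1)$ with zero boundary values; the tiny Fourier "network", collocation grid, and optimizer settings are illustrative assumptions, not the PINN architectures used in practice:

```python
import numpy as np

# Trial model u_w(x) = sum_k w_k sin((k+1) pi x): a stand-in for a network.
# Its basis already satisfies u(0) = u(1) = 0, so the data term vanishes here.
K = 4
def u_w(w, x):
    return sum(w[k] * np.sin((k + 1) * np.pi * x) for k in range(K))

def u_w_xx(w, x):  # second derivative of the trial model (analytic here)
    return sum(-w[k] * ((k + 1) * np.pi) ** 2 * np.sin((k + 1) * np.pi * x)
               for k in range(K))

x_col = np.linspace(0.05, 0.95, 50)              # collocation points
f = np.pi ** 2 * np.sin(np.pi * x_col)           # source term

def loss(w):
    physics = np.mean((-u_w_xx(w, x_col) - f) ** 2)   # PDE residual term
    data = u_w(w, 0.0) ** 2 + u_w(w, 1.0) ** 2        # boundary-condition term
    return physics + data

# Gradient descent on the composite loss (finite-difference gradients).
w = np.zeros(K)
for _ in range(2000):
    g = np.array([(loss(w + 1e-5 * e) - loss(w - 1e-5 * e)) / 2e-5
                  for e in np.eye(K)])
    w -= 5e-5 * g
print(w)  # w[0] approaches 1, recovering the exact solution u(x) = sin(pi x)
```

Real PINNs replace the Fourier model with a neural network and the finite differences with automatic differentiation, but the composite data-plus-physics loss has exactly this structure.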
20.6. Implementation of the Deep Galerkin Methods (DGM) Using the Physics-Informed Neural Networks (PINNs)
21. Deep Kolmogorov Methods
21.1. Literature Review of Deep Kolmogorov Methods
21.2. The Kolmogorov Backward Equation and Its Functional Formulation
21.3. The Feynman-Kac Representation and Its Justification
21.4. Deep Kolmogorov Method: Neural Network Approximation
- Neural Network Approximation Error: the error incurred by approximating the true solution with a finite-capacity network.
- Monte Carlo Sampling Error: of order $O(1/\sqrt{N})$, where $N$ is the number of samples used in SGD.
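The $O(1/\sqrt{N})$ sampling error can be checked empirically on a simple Feynman–Kac-style expectation; the test functional $\varphi(w) = w^2$ with $\mathbb{E}[W_T^2] = T$ for Brownian motion is an illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(0)

phi = lambda w: w ** 2        # test functional; E[phi(W_T)] = T
T = 1.0
exact = T

# Estimate the mean absolute Monte Carlo error over 200 repetitions,
# for two sample sizes N differing by a factor of 100.
errors = {}
for N in (100, 10000):
    samples = phi(rng.normal(0.0, np.sqrt(T), size=(200, N)))
    errors[N] = float(np.mean(np.abs(samples.mean(axis=1) - exact)))
print(errors)  # the error at N=10000 is roughly 10x smaller than at N=100
```

A 100-fold increase in samples shrinking the error by about 10x is exactly the $1/\sqrt{N}$ scaling stated above.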
22. Reinforcement Learning
22.1. Literature Review of Reinforcement Learning
22.2. Key Concepts
- $\mathcal{S}$ is the state space,
- $\mathcal{A}$ is the action space,
- $P(s' \mid s, a)$ is the state transition probability,
- $R(s, a)$ is the reward function,
- $\gamma \in [0, 1)$ is the discount factor.
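These five components suffice to run tabular Q-learning; the 5-state chain MDP below, with a reward only at the rightmost state, is an illustrative toy environment:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy chain MDP: actions 0/1 move left/right; reward 1 on entering state 4.
n_states, n_actions, gamma, alpha, eps = 5, 2, 0.9, 0.1, 0.1
Q = np.zeros((n_states, n_actions))

def step(s, a):
    s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == n_states - 1 else 0.0)

for _ in range(2000):
    s = int(rng.integers(n_states))          # random episode start
    for _ in range(20):
        # Epsilon-greedy action selection
        a = int(rng.integers(n_actions)) if rng.random() < eps \
            else int(np.argmax(Q[s]))
        s2, r = step(s, a)
        # Q-learning update: bootstrap with the greedy value of s2
        Q[s, a] += alpha * (r + gamma * np.max(Q[s2]) - Q[s, a])
        s = s2

print(np.argmax(Q, axis=1))  # the learned policy moves right in every state
```

The update uses $\max_a Q(s', a)$ regardless of the action actually taken next, which is what makes Q-learning off-policy (in contrast to SARSA, covered below).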
22.3. Deep Q-Learning
22.3.1. Literature Review of Deep Q-Learning
22.3.2. Analysis of Deep Q-Learning
22.3.3. Analysis of Double Q-Learning
22.3.4. Analysis of Dueling Q-Learning
22.3.5. Analysis of Prioritized Experience Replay
22.3.6. Analysis of Rainbow Deep Q-Network
22.4. Tabular Q-Learning
22.5. SARSA (State–Action–Reward–State–Action)
22.6. Expected SARSA (State–Action–Reward–State–Action)
22.7. Applications in Games and Robotics
22.7.1. Literature Review of Applications in Games and Robotics
22.7.2. Analysis of Applications in Games and Robotics
- $\mathcal{S}$ is the state space, which represents all possible states the agent can be in.
- $\mathcal{A}$ is the action space, which represents all possible actions the agent can take.
- $P(s' \mid s, a)$ is the state transition probability, which defines the probability of transitioning from state $s$ to state $s'$ under action $a$.
- $R(s, a)$ is the reward function, which defines the immediate reward received after taking action $a$ in state $s$.
- $\gamma \in [0, 1)$ is the discount factor, which determines the importance of future rewards.
23. Federated Learning

23.1. Literature Review of Federated Learning
- Foundations of Federated Learning: The concept of Federated Learning was introduced by McMahan et al. (2017) [1186] in their seminal work, Communication-Efficient Learning of Deep Networks from Decentralized Data. They proposed Federated Averaging (FedAvg), an algorithm that allows distributed devices to train local models and share only the model updates instead of raw data. This work laid the foundation for privacy-preserving machine learning. Kairouz et al. (2021) [1187] provided a comprehensive survey on FL, covering its mathematical framework, privacy concerns, optimization techniques, and future research directions.
- Privacy and Security in Federated Learning: One of the primary motivations for FL is preserving user privacy. Various studies have explored privacy-enhancing techniques, such as:
  - Differential Privacy (DP): Abadi et al. (2016) [1188] introduced differentially private SGD, which limits the influence of individual data points on model updates. This has been incorporated into FL to ensure user-level privacy.
  - Secure Aggregation: Bonawitz et al. (2017) [1189] developed cryptographic protocols to securely aggregate model updates, preventing adversaries from accessing individual updates.
  - Adversarial Attacks and Defenses: Zhao et al. (2018) [1190] studied model inversion attacks, highlighting the vulnerability of FL to privacy leakage and proposing defenses such as secure multi-party computation (MPC) and homomorphic encryption.
- Communication Efficiency in Federated Learning: Efficient communication is crucial for FL due to the distributed nature of training. Several methods have been proposed to optimize communication:
  - Compression Techniques: Sattler et al. (2019) [1192] explored gradient compression techniques, such as quantization and sparsification, to reduce communication overhead.
  - Adaptive Federated Optimization: Reddi et al. (2020) [1193] proposed adaptive federated optimization methods, including FedProx and FedOpt, to enhance convergence and stability in heterogeneous data settings.
- Personalization and Heterogeneous Data Handling: Unlike traditional centralized learning, FL operates on non-IID (not independent and identically distributed) data across clients. Researchers have developed personalized FL approaches to address data heterogeneity:
  - Clustered FL: Sattler et al. (2020) [1194] introduced methods to group clients with similar data distributions for better model convergence.
  - Meta-Learning in FL: Fallah et al. (2020) applied meta-learning techniques in FL to improve model generalization across diverse clients.
- Applications of Federated Learning: Federated Learning has been applied in various domains, including:
  - Healthcare: Sheller et al. (2020) [1196] demonstrated FL in medical imaging, allowing hospitals to collaboratively train models without sharing patient data.
  - Finance: Byrd and Polychroniadou (2020) explored FL in fraud detection, improving prediction accuracy while ensuring data confidentiality.
  - Edge Computing: Jagatheesaperumal et al. (2021) [1198] studied FL for Internet of Things (IoT) applications, reducing reliance on cloud-based computation.
- Scalability: Efficient handling of a large number of clients remains an open research problem.
- Privacy-Utility Trade-off: Balancing privacy protection with model accuracy is an ongoing research area.
- Fairness and Bias: Addressing biases in FL models due to non-representative data distributions is a critical issue.
23.2. Recent Literature Review of Federated Learning
- Fundamentals and Privacy-Preserving Mechanisms in Federated Learning: Meduri et al. (2024) [1199] discuss a novel FL architecture for privacy-preserving analysis of electronic health records (EHRs), highlighting its benefits in rare disease research. The study introduces a secure communication framework that enhances data confidentiality in multi-institutional research. Tzortzis et al. (2025) [1200] explore generalizable FL in medical imaging, with a case study on mammography data. Their research compares centralized and decentralized training approaches, demonstrating that federated models can improve diagnosis accuracy while maintaining data sovereignty. Szelag et al. (2025) [1201] present a survey on adaptive adversaries in Byzantine-robust FL, discussing attacks on FL networks and countermeasures such as differential privacy, secure aggregation, and robust optimization strategies.
- Federated Learning for IoT and Smart Systems: Ferretti et al. (2025) [1203] propose a blockchain-based federated learning system for resilient and decentralized coordination, improving reliability and traceability in edge AI environments. Their approach ensures secure and tamper-proof federated model updates. Chen et al. (2025) [1204] introduce Federated Hyperdimensional Computing (FHC) for quality monitoring in smart manufacturing, which leverages hierarchical learning strategies to improve anomaly detection and predictive maintenance in industrial settings. Mei et al. (2025) [1205] explore semi-asynchronous FL control strategies in satellite networks, focusing on optimizing communication efficiency and reducing training latency in federated AI for space applications.
- Advances in Federated Learning for Edge AI and Security: Rawas and Samala (2025) [1206] introduce Edge-Assisted Federated Learning (EAFL) for real-time disease prediction, integrating FL with Edge AI to enhance processing efficiency while preserving patient data privacy. Becker et al. (2025) [1207] examine combined reconstruction and poisoning attacks on FL systems, assessing vulnerabilities and proposing mitigation strategies such as federated adversarial learning and model verification techniques.
- Optimization and Personalization in Federated Learning: Fu et al. (2025) present Personalized Federated Learning (Reads) [1208], incorporating fine-grained layer aggregation and decentralized clustering to address data heterogeneity among FL clients. Li et al. (2025) [1209] propose UltraFlwr, an efficient federated medical object detection framework designed to optimize federated model aggregation for medical imaging datasets. Shi et al. (2025) [1210] introduce FedLWS, a novel FL technique that applies layer-wise weight shrinking to improve training stability and reduce the risk of model overfitting.
- Federated Learning in Financial and Fraud Detection Applications: Choudhary (2025) [1212] reports on the integration of federated learning in fraud detection, demonstrating a 72% reduction in privacy-related risks while maintaining high accuracy in financial anomaly detection. Zhou et al. (2025) [1213] introduce Blockchain-Empowered Cluster Distillation FL, which optimizes training efficiency in heterogeneous smart grids, ensuring robust energy management and fraud prevention.
- Enhancing secure model aggregation using cryptographic techniques.
- Reducing computational costs for real-world FL deployment.
- Improving federated AI in healthcare, finance, and IoT by refining optimization algorithms.
23.3. Formal Definition of Federated Learning
23.4. Federated Learning and Distributed Optimization Framework
23.4.1. Clients
- $\ell(w; x, y)$ is the loss function (e.g., cross-entropy, mean squared error) evaluated on a data point $(x, y)$.
- $\mathcal{D}_k$ is the local dataset of client $k$, which may differ significantly from other clients' datasets (non-IID data).
- Non-IID Data: The local data distribution $\mathcal{D}_k$ may differ significantly from the global data distribution $\mathcal{D}$, leading to statistical heterogeneity. This can be quantified by the gradient divergence $\delta_k = \|\nabla F_k(w) - \nabla F(w)\|$, where $\delta_k$ measures the degree of non-IIDness.
- Resource Constraints: Clients often have limited computational resources (e.g., CPU, memory) and communication bandwidth. This necessitates efficient algorithms for local training and model compression.
- Privacy and Security: Clients must ensure that their local data is not exposed during training. Techniques such as differential privacy and secure multi-party computation (SMPC) are employed to protect client data.
23.4.2. Server
- Heterogeneous Client Participation: Clients may have varying computational resources, communication bandwidth, and availability, leading to asynchronous participation. The server must handle this heterogeneity to ensure efficient training.
- Non-IID Data: The local data distributions may differ significantly across clients, leading to statistical heterogeneity. This can cause client drift and slow convergence. The server must account for this by using robust aggregation methods.
- Privacy and Security: The server must ensure that the global model updates do not leak sensitive information about the clients’ local data. Techniques such as secure aggregation and differential privacy are employed to protect client privacy.
23.4.3. Local Updates
23.4.4. Model Aggregation
23.5. Detailed Steps in a Communication Round
- Client Selection: At the start of each communication round $t$, the server selects a subset of clients $\mathcal{S}_t$ to participate. The selection may be random or based on criteria such as client availability, computational resources, or data distribution. The probability of selecting client $k$ is denoted by $p_k$, where $\sum_{k=1}^{K} p_k = 1$.
- Model Distribution: The server sends the current global model parameters $w_t$ to the selected clients $\mathcal{S}_t$.
- Local Training: Each selected client $k \in \mathcal{S}_t$ performs $\tau$ steps of local stochastic gradient descent (SGD) on its dataset $\mathcal{D}_k$ to compute updated parameters. The local update rule at local step $s$ is $w_{t, s+1}^{k} = w_{t, s}^{k} - \eta_t \,\nabla \ell(w_{t, s}^{k}; \xi_{t, s}^{k})$, where $\eta_t$ is the learning rate at round $t$ and the stochastic gradient is computed on a mini-batch $\xi_{t, s}^{k} \subseteq \mathcal{D}_k$. After $\tau$ steps, the client sends the updated parameters $w_t^{k} = w_{t, \tau}^{k}$ back to the server.
- Model Aggregation: The server aggregates the local updates from the selected clients using Federated Averaging (FedAvg): $w_{t+1} = \sum_{k \in \mathcal{S}_t} \frac{n_k}{n}\, w_t^{k}$, where $n_k = |\mathcal{D}_k|$ is the number of data points on client $k$ and $n = \sum_{k \in \mathcal{S}_t} n_k$ is the total number of data points across the selected clients.
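The four steps above can be sketched end-to-end on a toy least-squares problem; the client count, local step count, and learning rate below are illustrative, and for simplicity every client participates in every round:

```python
import numpy as np

rng = np.random.default_rng(0)

# K clients, each holding a noisy linear-regression dataset of random size.
K, d = 5, 3
w_true = np.array([1.0, -2.0, 0.5])
data = []
for n in rng.integers(20, 100, K):
    X = rng.normal(size=(int(n), d))
    y = X @ w_true + 0.1 * rng.normal(size=int(n))
    data.append((X, y))

w_global = np.zeros(d)
for t in range(50):                        # communication rounds
    updates, sizes = [], []
    for X, y in data:                      # S_t = all clients here
        w = w_global.copy()
        for _ in range(5):                 # tau = 5 local SGD steps
            i = int(rng.integers(len(X)))
            w -= 0.05 * (X[i] @ w - y[i]) * X[i]
        updates.append(w)
        sizes.append(len(X))
    # FedAvg: aggregate local models weighted by local dataset size n_k / n.
    n_total = float(sum(sizes))
    w_global = sum(nk / n_total * wk for nk, wk in zip(sizes, updates))

print(w_global)  # close to w_true
```

Only model parameters cross the client/server boundary in this loop; the raw `(X, y)` data never leaves each client, which is the defining property of FL.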
23.5.1. Challenges in Communication Rounds
- Heterogeneous Client Participation: Clients may have varying computational resources, communication bandwidth, and availability, leading to asynchronous participation. This can slow down the training process and introduce bias in the model updates.
- Non-IID Data: The local data distributions may differ significantly across clients, leading to statistical heterogeneity. This can cause client drift and slow convergence.
- Communication Bottlenecks: The communication between the server and clients can be a bottleneck, especially in large-scale FL systems with millions of clients. Techniques such as model compression and sparse updates are used to reduce communication costs.
23.5.1.1 Heterogeneous Client Participation
23.5.1.2 Communication Bottlenecks
23.5.1.3 Non-IID Data
23.5.1.4 Python Code to Generate Figure 211, Figure 212, and Figure 213 Illustrating Heterogeneous Client Participation in Federated Learning





23.5.1.5 Python Code to Generate Figure 214 and Figure 215 Illustrating Communication Bottlenecks and Cumulative Communication over Rounds in Federated Learning




23.5.1.6 Python Code to Generate Figure 216 Illustrating Client Drift in Federated Learning, Measured as L2 Distance from the Global Model



23.5.1.7 Python Code to Generate Figure 217 Illustrating Comparative Client Drift Under Different Heterogeneity Scenarios in Federated Learning



23.5.1.8 Python Code to Generate Figure 218 Illustrating Effect of Client Drift on Global Model Accuracy in Federated Learning



23.5.2. Advanced Techniques for Communication Rounds
- Adaptive Client Selection: The server can use adaptive client selection strategies to prioritize clients with higher data quality or computational resources. For example, clients with larger datasets or lower gradient divergence may be selected more frequently.
- Local Step Adaptation: The number of local steps can be adapted dynamically based on the client’s computational resources and data distribution. For example, clients with more data may perform more local steps to reduce communication frequency.
- Secure Aggregation: To ensure privacy, the server can use secure multi-party computation (SMPC) to aggregate client updates without revealing individual contributions.
23.5.2.1 Adaptive Client Selection
23.5.2.2 Local Step Adaptation
23.5.2.3 Secure Aggregation
23.5.2.4 Python Code to Generate Figure 219 and Figure 222 Illustrating Adaptive Client Selection in Federated Learning




23.5.2.5 Python Code to Generate Figure 221 and Figure 222 Illustrating Local Step Adaptation in Federated Learning




23.5.2.6 Python Code to Generate Figure 223, Figure 224, and Figure 225 Illustrating Secure Aggregation in Federated Learning






23.6. Theoretical Foundations
23.6.1. Smoothness and Convexity
- $F$ is $L$-smooth: $\|\nabla F(w) - \nabla F(w')\| \le L \|w - w'\|$ for all $w, w'$.
- $F$ is $\mu$-strongly convex: $F(w') \ge F(w) + \langle \nabla F(w), w' - w \rangle + \frac{\mu}{2}\|w' - w\|^2$.
- $F_k$ represents the local loss function associated with client $k$.
- The coefficients $p_k$ are non-negative weights assigned to each client, which sum to one, i.e., $\sum_{k=1}^{K} p_k = 1$.
23.6.2. Bounded Variance
23.6.2.1 Python Code to Generate Figure 226 Illustrating the Stochastic Gradient Bounded Variance in Federated Learning



23.6.3. Heterogeneity
23.6.3.1 Python Code to Generate Figure 227 and Figure 228 Illustrating the System Heterogeneity in Federated Learning




23.7. Convergence Analysis
23.7.1. Convergence Rate

23.7.1.1 Python Code to Generate Figure 230 Illustrating the Convergence Rate in Federated Learning Under Different Conditions



23.7.1.2 Python Code to Generate Figure 230 Illustrating the Convergence Rate in Federated Learning (log-log Scale)


23.7.2. Communication Complexity
23.8. Advanced Techniques
23.8.1. Adaptive Optimization
- If a client has high gradient variance, $\sigma_k^2$ is large, leading to a small learning rate $\eta_k$.
- If a client has low gradient variance, $\sigma_k^2$ is small, leading to a large learning rate $\eta_k$.
23.8.2. Differential Privacy
23.8.3. Sparse Updates
23.9. Statistical Learning Perspective
23.10. Open Problems and Future Directions
- Theoretical Limits: Deriving tight lower bounds on communication complexity and convergence rates.
- Robustness: Developing algorithms resilient to adversarial clients and Byzantine failures.
- Scalability: Scaling FL to massive networks with millions of clients.
23.11. Conclusion
24. Diffusion Models and Score-Based Generative Models
24.1. Literature Review of Diffusion Models and Score-Based Generative Models
24.2. Analysis of Diffusion Models and Score-Based Generative Models
24.3. Key Conceptual Components of Diffusion Models and Score-Based Generative Models
24.3.1. Forward Diffusion Process
- $f(x, t)$ is the drift coefficient, dictating the deterministic evolution of $x_t$.
- $g(t)$ is the diffusion coefficient, controlling the rate of noise injection.
- $w_t$ is a standard Wiener process (Brownian motion), introducing Gaussian noise.
- Forward SDE: $dx = f(x, t)\,dt + g(t)\,dw$.
- Perturbation Kernel: $p_{0t}(x_t \mid x_0) = \mathcal{N}(x_t;\, \alpha_t x_0,\, \sigma_t^2 I)$.
- Fokker-Planck Equation: $\frac{\partial p_t(x)}{\partial t} = -\nabla_x \cdot \big(f(x, t)\, p_t(x)\big) + \frac{1}{2} g(t)^2 \Delta_x p_t(x)$.
- Infinitesimal Generator: $\mathcal{A}\varphi = f(x, t) \cdot \nabla_x \varphi + \frac{1}{2} g(t)^2 \Delta_x \varphi$.
- Score-Based Perturbation: $\nabla_{x_t} \log p_{0t}(x_t \mid x_0) = -\frac{x_t - \alpha_t x_0}{\sigma_t^2}$.
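The forward process can be simulated directly with an Euler-Maruyama discretization; the sketch below (an illustrative simulation assuming the variance-preserving SDE $dx = -\tfrac{1}{2}\beta(t) x\,dt + \sqrt{\beta(t)}\,dw$ with a linear noise schedule) shows how an arbitrary initial distribution is driven toward the standard normal prior.

```python
import numpy as np

rng = np.random.default_rng(5)

# Euler-Maruyama simulation of the VP forward SDE
#   dx = -0.5 * beta(t) * x dt + sqrt(beta(t)) dw.
beta = lambda t: 0.1 + 19.9 * t          # linear schedule on [0, 1]
N, d, steps = 5000, 2, 1000
dt = 1.0 / steps

x = rng.normal(loc=3.0, scale=0.2, size=(N, d))   # data far from the prior
for i in range(steps):
    t = i * dt
    x = x - 0.5 * beta(t) * x * dt + np.sqrt(beta(t) * dt) * rng.normal(size=(N, d))

# The terminal marginal is close to the N(0, I) prior.
assert abs(x.mean()) < 0.1
assert abs(x.std() - 1.0) < 0.1
```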
24.3.2. Reverse Diffusion Process
- $\nabla_x \log p_t(x)$ is the score function of the perturbed data distribution at time $t$,
- $\bar{w}_t$ is a reverse-time Wiener process,
- The term $-g(t)^2 \nabla_x \log p_t(x)$ is the drift correction ensuring the marginal distribution $p_t$ is preserved.
- $\lambda(t)$ is a weighting function (often $\lambda(t) = \sigma_t^2$),
- $p_{0t}(x_t \mid x_0)$ is the perturbation kernel,
- $s_\theta(x_t, t)$ is the neural network approximating the score.
- Reverse SDE: $dx = \big[f(x, t) - g(t)^2 \nabla_x \log p_t(x)\big]\,dt + g(t)\,d\bar{w}$.
- Probability Flow ODE: $\frac{dx}{dt} = f(x, t) - \frac{1}{2} g(t)^2 \nabla_x \log p_t(x)$.
- DSM Objective: $\mathcal{L}(\theta) = \mathbb{E}_{t,\, x_0,\, x_t}\big[\lambda(t)\, \| s_\theta(x_t, t) - \nabla_{x_t} \log p_{0t}(x_t \mid x_0) \|^2\big]$.
- Langevin Dynamics: $x_{i+1} = x_i + \epsilon\, \nabla_x \log p(x_i) + \sqrt{2\epsilon}\, z_i$, with $z_i \sim \mathcal{N}(0, I)$.
- Likelihood Computation: $\log p_0(x_0) = \log p_T(x_T) + \int_0^T \nabla_x \cdot \tilde{f}(x_t, t)\, dt$, where $\tilde{f}$ is the probability flow drift.
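The Langevin update above can be demonstrated with a target whose score is known in closed form; the sketch below (illustrative, using the analytic score of a Gaussian rather than a learned $s_\theta$) shows unadjusted Langevin dynamics converging to the target distribution.

```python
import numpy as np

rng = np.random.default_rng(6)

# Unadjusted Langevin dynamics  x <- x + eps * score(x) + sqrt(2*eps) * z,
# with the analytic score of N(2, 0.5^2):  score(x) = -(x - 2) / 0.25.
score = lambda x: -(x - 2.0) / 0.25
eps = 1e-3

x = rng.normal(size=10000)               # arbitrary initialization
for _ in range(2000):
    x = x + eps * score(x) + np.sqrt(2 * eps) * rng.normal(size=x.shape)

# Samples match the target's mean and standard deviation.
assert abs(x.mean() - 2.0) < 0.05
assert abs(x.std() - 0.5) < 0.05
```

In score-based generative modeling, the analytic `score` is replaced by the trained network $s_\theta(x, t)$, typically within an annealed (noise-level-dependent) schedule.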
24.3.3. Probability Flow ODE
24.3.4. Training Objective
24.3.5. Sampling
24.3.6. Score-Based Generative Models
24.3.7. Langevin Dynamics for Sampling
24.3.8. Connection to Diffusion Models
24.3.9. Likelihood Computation
- Exploding positive eigenvalues causing local volume expansion
- Large negative eigenvalues leading to numerical instabilities
- Ill-conditioned transformations in the probability flow ODE
24.3.10. Conclusion
24.4. Stable Diffusion
24.4.1. Literature Review of Stable Diffusion
24.4.2. Analysis of Stable Diffusion
- Compactness: The encoder $\mathcal{E}$ maps images to a bounded subspace of the latent space $\mathbb{R}^{h \times w \times c}$.
- Invertibility: The decoder $\mathcal{D}$ should approximately satisfy $\mathcal{D}(\mathcal{E}(x)) \approx x$.
- Gaussian Prior: The final latent $z_T$ converges to $\mathcal{N}(0, I)$.
- The Chapman-Kolmogorov equation ensures consistency: $p(z_t \mid z_0) = \int p(z_t \mid z_s)\, p(z_s \mid z_0)\, dz_s$ for $0 < s < t$.
- The Fokker-Planck equation describes the evolution of the probability density: $\frac{\partial p_t(z)}{\partial t} = -\nabla_z \cdot \big(f(z, t)\, p_t(z)\big) + \frac{1}{2} g(t)^2 \Delta_z p_t(z)$.
24.4.3. Latent Variable Model and Diffusion Process
24.4.4. Reverse Diffusion and Denoising
24.4.5. Training Objective
24.4.6. Architecture: U-Net with Cross-Attention
24.4.7. Latent Space and Autoencoder
24.4.8. Sampling Process
24.4.9. Classifier-Free Guidance
25. Kernel Regression
25.1. Literature Review
25.2. Analysis of Kernel Regression
25.3. Nadaraya–Watson Kernel Estimator
25.3.1. Literature Review
25.3.2. Analysis of Nadaraya–Watson Kernel Estimator
- The eigenvalue decay rate controls approximation power.
- Spectral filtering via regularization prevents high-frequency noise.
- Generalization is optimized when balancing bias and variance.
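The Nadaraya–Watson estimator itself is a locally weighted average, $\hat{m}(x) = \sum_i K\!\big(\tfrac{x - x_i}{h}\big) y_i \,\big/\, \sum_i K\!\big(\tfrac{x - x_i}{h}\big)$; the sketch below (an illustrative implementation with a Gaussian kernel on synthetic data) shows the bias-variance balance in action at a moderate bandwidth.

```python
import numpy as np

rng = np.random.default_rng(7)

def nw_estimate(x0, x, y, h):
    # Nadaraya-Watson: kernel-weighted average of the responses.
    w = np.exp(-0.5 * ((x0 - x) / h) ** 2)
    return np.sum(w * y) / np.sum(w)

# Noisy samples of a smooth regression function m(x) = sin(x).
x = np.sort(rng.uniform(0, 2 * np.pi, 400))
y = np.sin(x) + 0.1 * rng.normal(size=x.size)

grid = np.linspace(0.5, 2 * np.pi - 0.5, 50)   # interior points (avoids boundary bias)
m_hat = np.array([nw_estimate(g, x, y, h=0.3) for g in grid])

assert np.max(np.abs(m_hat - np.sin(grid))) < 0.2
```

Shrinking `h` reduces bias but inflates variance; growing it does the reverse, which is exactly the spectral-filtering tradeoff described above.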
25.4. Priestley–Chao Kernel Estimator
25.4.1. Literature Review
25.4.2. Analysis of Priestley–Chao Kernel Estimator
- Uniform kernel: $K(u) = \frac{1}{2}\, \mathbf{1}_{\{|u| \le 1\}}$
- Epanechnikov kernel (optimal in the MSE sense): $K(u) = \frac{3}{4}(1 - u^2)\, \mathbf{1}_{\{|u| \le 1\}}$
- Gaussian kernel: $K(u) = \frac{1}{\sqrt{2\pi}} e^{-u^2/2}$
25.5. Gasser–Müller Kernel Estimator
25.5.1. Literature Review
25.5.2. Analysis of Gasser–Müller Kernel Estimator
25.6. Parzen-Rosenblatt Method
25.6.1. Literature Review
25.6.2. Analysis of Parzen-Rosenblatt Method
- Normalization Condition: $\int_{-\infty}^{\infty} K(u)\, du = 1$. This ensures that the kernel behaves like a proper probability density function and does not introduce artificial bias into the estimation.
- Symmetry Condition: $K(-u) = K(u)$. Symmetry guarantees that the kernel function does not introduce directional bias in the estimation of the density.
- Non-negativity: $K(u) \ge 0$. While not strictly necessary, this property ensures that the estimate remains a valid probability density in a practical sense.
- Finite Second Moment (Variance Condition): $\int u^2 K(u)\, du < \infty$. This ensures that the kernel function does not assign an excessive amount of probability mass far from the origin, preserving local smoothness properties.
- Unbiasedness Condition (Mean Zero Constraint): $\int u\, K(u)\, du = 0$. This ensures that the kernel function does not introduce artificial shifts in the density estimate.
- Gaussian Kernel: $K(u) = \frac{1}{\sqrt{2\pi}} e^{-u^2/2}$. This kernel has the advantage of being infinitely differentiable and providing smooth density estimates.
- Epanechnikov Kernel: $K(u) = \frac{3}{4}(1 - u^2)\, \mathbf{1}_{\{|u| \le 1\}}$. This kernel is optimal in the mean integrated squared error (MISE) sense, minimizing the asymptotic MISE while preserving local smoothness properties.
- Uniform Kernel: $K(u) = \frac{1}{2}\, \mathbf{1}_{\{|u| \le 1\}}$. This kernel is simple but suffers from discontinuities, making it less desirable for smooth density estimation.
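The Parzen-Rosenblatt estimate $\hat{f}(x) = \frac{1}{nh}\sum_i K\!\big(\tfrac{x - X_i}{h}\big)$ can be implemented in a few lines; the sketch below (illustrative, using the Epanechnikov kernel on standard normal samples) checks the estimate against the true density.

```python
import numpy as np

rng = np.random.default_rng(8)

def kde(x0, samples, h):
    # Parzen-Rosenblatt estimate with the Epanechnikov kernel
    #   K(u) = 0.75 * (1 - u^2) for |u| <= 1, else 0.
    u = (x0 - samples) / h
    k = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)
    return k.mean() / h

samples = rng.normal(size=20000)
true_pdf = lambda x: np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)

grid = np.linspace(-2, 2, 41)
est = np.array([kde(g, samples, h=0.3) for g in grid])

assert np.max(np.abs(est - true_pdf(grid))) < 0.04
```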
26. Natural Language Processing (NLP)
26.1. Literature Review
26.2. Text Classification
26.2.1. Literature Review of Text Classification
26.2.2. Analysis of Text Classification
- Tokenization: Breaking the text into words or tokens.
- Stopword Removal: Removing common words (such as "and", "the", etc.) that do not carry significant meaning.
- Stemming and Lemmatization: Reducing words to their base or root form, e.g., "running" becomes "run".
- Lowercasing: Converting all words to lowercase to ensure consistency.
- Punctuation Removal: Removing punctuation marks.
- Bag-of-Words (BoW) model
- Term Frequency-Inverse Document Frequency (TF-IDF)
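The TF-IDF weighting named above can be computed from scratch in a few lines; the sketch below (an illustrative toy corpus, using the plain $\mathrm{tf}(t, d) \cdot \log\frac{N}{\mathrm{df}(t)}$ form) shows how it downweights common words relative to discriminative ones.

```python
import math
from collections import Counter

# Toy corpus: three pre-tokenized documents.
docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "cats and dogs".split(),
]
N = len(docs)
df = Counter(t for d in docs for t in set(d))   # document frequency per term

def tfidf(term, doc):
    tf = doc.count(term) / len(doc)             # term frequency in this document
    idf = math.log(N / df[term])                # inverse document frequency
    return tf * idf

# "the" appears in 2 of 3 documents while "cat" appears in only 1, so
# "cat" is the more discriminative feature for the first document.
assert tfidf("cat", docs[0]) > tfidf("the", docs[0])
```

Library implementations (e.g., scikit-learn's `TfidfVectorizer`) apply smoothing and normalization on top of this basic form.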
26.3. Machine Translation
26.3.1. Literature Review of Machine Translation
26.3.2. Analysis of Machine Translation
26.4. Chatbots and Conversational AI
26.4.1. Literature Review of Chatbots and Conversational AI
26.4.2. Analysis of Chatbots and Conversational AI
26.5. Representation Learning and Optimization
26.6. Structured Prediction and Decoding
27. Deep Learning Frameworks
27.1. TensorFlow
27.1.1. Literature Review of TensorFlow
27.1.2. Analysis of TensorFlow
27.2. PyTorch
27.2.1. Literature Review of PyTorch
27.2.2. Analysis of PyTorch
27.3. JAX
27.3.1. Literature Review of JAX
27.3.2. Analysis of JAX
Acknowledgments
28. Appendix
28.1. Linear Algebra Essentials
28.1.1. Matrices and Vector Spaces
- Addition: Defined entrywise: $(A + B)_{ij} = A_{ij} + B_{ij}$.
- Scalar Multiplication: For $\alpha \in \mathbb{R}$, $(\alpha A)_{ij} = \alpha A_{ij}$.
- Matrix Multiplication: If $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{n \times p}$, then the product $C = AB \in \mathbb{R}^{m \times p}$ is given by $C_{ij} = \sum_{k=1}^{n} A_{ik} B_{kj}$. This is only defined when the number of columns of $A$ equals the number of rows of $B$.
- Transpose: The transpose of $A$, denoted $A^{\top}$, satisfies $(A^{\top})_{ij} = A_{ji}$.
- Determinant: If $A \in \mathbb{R}^{n \times n}$, then its determinant is given recursively by $\det(A) = \sum_{j=1}^{n} (-1)^{1+j} A_{1j} \det(M_{1j})$, where $M_{1j}$ is the submatrix obtained by removing the first row and $j$-th column.
- Inverse: A square matrix $A$ is invertible if there exists $A^{-1}$ such that $A A^{-1} = A^{-1} A = I$, where $I$ is the identity matrix.
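These definitions imply several identities worth remembering; the sketch below verifies three of them numerically on random matrices.

```python
import numpy as np

rng = np.random.default_rng(9)

A = rng.normal(size=(3, 3))
B = rng.normal(size=(3, 3))

# Transpose of a product reverses the order: (AB)^T = B^T A^T.
assert np.allclose((A @ B).T, B.T @ A.T)

# The determinant is multiplicative: det(AB) = det(A) det(B).
assert np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B))

# The inverse satisfies A A^{-1} = I (a random A is invertible with probability 1).
assert np.allclose(A @ np.linalg.inv(A), np.eye(3), atol=1e-6)
```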
28.1.2. Vector Spaces and Linear Transformations
- Vector Addition: $u + v \in V$ for all $u, v \in V$
- Scalar Multiplication: $\alpha v \in V$ for all $\alpha \in \mathbb{F}$ and $v \in V$
- It is linearly independent: $\sum_{i} c_i v_i = 0$ implies $c_i = 0$ for all $i$.
- It spans $V$, meaning every $v \in V$ can be written as $v = \sum_{i} c_i v_i$.
28.1.3. Eigenvalues and Eigenvectors
28.1.4. Singular Value Decomposition (SVD)
28.2. Probability and Statistics
28.2.1. Probability Distributions
- $P(X = x_i) \ge 0$ for each $x_i$.
- The sum of probabilities across all possible outcomes is 1: $\sum_{i} P(X = x_i) = 1$.
- $f_X(x) \ge 0$ for all $x$.
- The total probability over the entire range of $X$ is 1: $\int_{-\infty}^{\infty} f_X(x)\, dx = 1$.
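Both normalization conditions are easy to check concretely; the sketch below verifies them for a Binomial(10, 0.3) pmf (discrete case) and the standard normal density (continuous case, by numerical integration).

```python
import numpy as np
from math import comb

# Discrete case: a Binomial(10, 0.3) pmf is non-negative and sums to 1.
pmf = [comb(10, k) * 0.3 ** k * 0.7 ** (10 - k) for k in range(11)]
assert all(p >= 0 for p in pmf)
assert abs(sum(pmf) - 1.0) < 1e-12

# Continuous case: the standard normal density integrates to 1
# (Riemann sum over [-10, 10]; the tails beyond are negligible).
x = np.linspace(-10, 10, 200001)
f = np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)
dx = x[1] - x[0]
assert abs(f.sum() * dx - 1.0) < 1e-6
```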
28.2.2. Bayes’ Theorem
28.2.3. Statistical Measures
- Measures of Central Tendency (e.g., mean, median, mode)
- Measures of Dispersion (e.g., variance, standard deviation, interquartile range)
- Measures of Shape (e.g., skewness, kurtosis)
- Measures of Association (e.g., covariance, correlation)
- Information-Theoretic Measures (e.g., entropy, mutual information)
- Expectation is linear: $\mathbb{E}[aX + bY] = a\,\mathbb{E}[X] + b\,\mathbb{E}[Y]$
- Variance is translation invariant but scales quadratically: $\mathrm{Var}(aX + b) = a^2\, \mathrm{Var}(X)$
28.3. Optimization Techniques
28.3.1. Gradient Descent (GD)
- $x_k$ is the current point in the $n$-dimensional space (iteration index $k$),
- $\nabla f(x_k)$ is the gradient of the objective function at $x_k$,
- $\eta > 0$ is the learning rate (step size).
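The update $x_{k+1} = x_k - \eta\, \nabla f(x_k)$ can be sketched on a least-squares objective, where convergence is guaranteed for $\eta < 2/L$ with $L = \lambda_{\max}(A^\top A)$ (the problem instance below is illustrative).

```python
import numpy as np

# Gradient descent on f(x) = 0.5 * ||A x - b||^2, gradient A^T (A x - b).
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])

L = np.linalg.eigvalsh(A.T @ A).max()   # smoothness constant
eta = 1.0 / L                           # safely below 2 / L

x = np.zeros(2)
for _ in range(500):
    x = x - eta * A.T @ (A @ x - b)

# A is invertible, so the unique minimizer solves A x = b exactly.
x_star = np.linalg.solve(A, b)
assert np.allclose(x, x_star, atol=1e-6)
```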
28.3.2. Stochastic Gradient Descent (SGD)
28.3.3. Second-Order Methods
- Gradient Descent (GD): An optimization algorithm that updates the parameter vector in the direction opposite to the gradient of the objective function. Convergence is guaranteed under convexity assumptions with an appropriately chosen step size.
- Stochastic Gradient Descent (SGD): A variant of GD that uses a random subset of the data to estimate the gradient at each iteration. While faster and less computationally intensive, its convergence is slower and more noisy, requiring variance reduction techniques for efficient training.
- Second-Order Methods: These methods use the Hessian (second derivatives of the objective function) to accelerate convergence, often exhibiting quadratic convergence near the optimum. However, the computational cost of calculating the Hessian restricts their practical use. Quasi-Newton methods, such as BFGS, approximate the Hessian to improve efficiency.
28.4. Matrix Calculus
28.4.1. Matrix Differentiation
- Matrix trace: For a matrix $A \in \mathbb{R}^{n \times n}$, the derivative of the trace with respect to $A$ is the identity matrix: $\frac{\partial\, \mathrm{tr}(A)}{\partial A} = I$.
- Matrix product: Let $A$ and $B$ be matrices, and consider the product $AB$. The derivative of its trace with respect to $A$ is: $\frac{\partial\, \mathrm{tr}(AB)}{\partial A} = B^{\top}$.
- Matrix inverse: The derivative of the inverse of $A$ with respect to a scalar parameter $t$ is: $\frac{\partial A^{-1}}{\partial t} = -A^{-1} \frac{\partial A}{\partial t} A^{-1}$.
28.4.2. Tensor Differentiation
28.5. Information Theory
28.5.1. Entropy: The Fundamental Measure of Uncertainty
- Continuity: $H(p_1, \dots, p_n)$ is a continuous function of the probabilities $p_i$.
- Maximality: The uniform distribution $p_i = 1/n$ for all $i$ maximizes entropy: $H(X) \le \log n$.
- Additivity: For two independent random variables $X$ and $Y$, entropy satisfies $H(X, Y) = H(X) + H(Y)$.
- Monotonicity: Conditioning reduces entropy: $H(X \mid Y) \le H(X)$.
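The maximality and additivity properties can be checked directly from the definition $H(X) = -\sum_i p_i \log_2 p_i$; the sketch below verifies both on small distributions.

```python
import numpy as np

def H(p):
    # Shannon entropy in bits; zero-probability outcomes contribute nothing.
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Maximality: the uniform distribution over 4 outcomes attains H = log2(4) = 2,
# and any non-uniform distribution falls strictly below it.
assert np.isclose(H([0.25] * 4), 2.0)
assert H([0.7, 0.1, 0.1, 0.1]) < 2.0

# Additivity: for independent X and Y, H(X, Y) = H(X) + H(Y).
px, py = np.array([0.5, 0.5]), np.array([0.2, 0.8])
joint = np.outer(px, py).ravel()
assert np.isclose(H(joint), H(px) + H(py))
```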
28.5.2. Source Coding Theorem: Fundamental Limits of Compression
- Achievability: Given a discrete memoryless source (DMS) $X$ with entropy $H(X)$, for any $\epsilon > 0$ there exists a source code that compresses sequences of length $n$ to approximately $H(X) + \epsilon$ bits per symbol and allows for decoding with vanishing error probability as $n \to \infty$.
- Converse: No source code can achieve an average code length per symbol smaller than $H(X)$ without the error probability increasing to 1.
28.5.3. Noisy Channel Coding Theorem: Fundamental Limits of Communication
- If $R < C$, there exists a code of rate $R$ whose error probability can be made arbitrarily small, so asymptotically error-free transmission is possible.
- If $R > C$, the error probability approaches 1.
- $E_1$: The transmitted codeword is not jointly typical with the received sequence $y^n$.
- $E_2$: Some other codeword $x^n(m')$ (with $m' \neq m$) is jointly typical with $y^n$.
- Use Fano’s inequality to relate the error probability to the conditional entropy $H(W \mid Y^n)$.
- Apply the data processing inequality to bound the mutual information $I(W; \hat{W})$.
- Show that if $R > C$, the error probability cannot vanish.
28.5.4. Rate-Distortion Theory: Lossy Data Compression
- The mutual information $I(X; \hat{X})$ is a convex function of the conditional distribution $p(\hat{x} \mid x)$,
- The distortion constraint $\mathbb{E}[d(X, \hat{X})] \le D$ is a linear (and thus convex) constraint.
- Stationarity: the gradient of the Lagrangian vanishes at the optimum.
- Primal Feasibility: $\mathbb{E}[d(X, \hat{X})] \le D$
- Dual Feasibility: $\lambda \ge 0$
- Complementary Slackness: $\lambda \big(\mathbb{E}[d(X, \hat{X})] - D\big) = 0$
28.5.5. Applications of Information Theory
- Factor Graph Representation: The decoding process is represented as message passing on a factor graph, where the nodes correspond to variables and constraints. The Bethe free energy provides a variational characterization of the decoding problem.
- EXIT Charts: The extrinsic information transfer (EXIT) chart is a tool to analyze the convergence of iterative decoding. The area theorem relates the area under the EXIT curve to the gap to capacity.
- The solution to the constrained optimization problem exists and is unique.
- The maximum entropy distribution is the unique global maximizer of subject to the constraints.
- Sanov’s Theorem: A result in large deviation theory that characterizes the probability of observing an empirical distribution deviating from the true distribution.
- Gibbs’ Inequality: The Shannon entropy is maximized by the uniform distribution when no constraints are imposed.
- Convex Duality: The Lagrange multipliers are dual variables that encode the sensitivity of the entropy to changes in the constraints.
- The Boltzmann distribution for the canonical ensemble.
- The Fermi-Dirac and Bose-Einstein distributions for quantum systems.
- The Gibbs distribution for systems with multiple conserved quantities.
- It assumes knowledge of the correct constraints.
- It may not apply to systems with long-range correlations or non-Markovian dynamics.
- Extensions to non-equilibrium systems remain an active area of research.
28.5.6. Conclusion: Information Theory as a Universal Mathematical Principle
References
- Rao, N., Farid, M., and Raiz, M. (2024). Symmetric Properties of λ-Szász Operators Coupled with Generalized Beta Functions and Approximation Theory. Symmetry, 16(12), 1703.
- Mukhopadhyay, S.N., Ray, S. (2025). Function Spaces. In: Measure and Integration. University Texts in the Mathematical Sciences. Springer, Singapore.
- Szołdra, T. (2024). Ergodicity breaking in quantum systems: from exact time evolution to machine learning (Doctoral dissertation).
- Song, W. X., Chen, H., Cui, C., Liu, Y. F., Tong, D., Guo, F., ... and Xiao, C. W. (2025). Theoretical, methodological, and implementation considerations for establishing a sustainable urban renewal model. Journal of Natural Resources, 40(1), 20-38.
- El Mennaoui, O., Kharou, Y., and Laasri, H. (2025). Evolution families in the framework of maximal regularity. Evolution Equations and Control Theory, 0-0.
- Pedroza, G. (2024). On the Conditions for Domain Stability for Machine Learning: a Mathematical Approach. arXiv preprint arXiv:2412.00464.
- Cerreia-Vioglio, S., and Ok, E. A. (2024). Abstract integration of set-valued functions. Journal of Mathematical Analysis and Applications, 129169.
- Averin, A. (2024). Formulation and Proof of the Gravitational Entropy Bound. arXiv preprint arXiv:2412.02470.
- Potter, T. (2025). Subspaces of L2(ℝn) Invariant Under Crystallographic Shifts. arXiv e-prints, arXiv-2501.
- Lee, M. (2025). Emergence of Self-Identity in Artificial Intelligence: A Mathematical Framework and Empirical Study with Generative Large Language Models. Axioms, 14(1), 44.
- Wang, R., Cai, L., Wu, Q., and Niyato, D. (2025). Service Function Chain Deployment with Intrinsic Dynamic Defense Capability. IEEE Transactions on Mobile Computing.
- Duim, J. L., and Mesquita, D. P. (2025). Artificial Intelligence Value Alignment via Inverse Reinforcement Learning. Proceeding Series of the Brazilian Society of Computational and Applied Mathematics, 11(1), 1-2.
- Khayat, M., Barka, E., Serhani, M. A., Sallabi, F., Shuaib, K., and Khater, H. M. (2025). Empowering Security Operation Center with Artificial Intelligence and Machine Learning–A Systematic Literature Review. IEEE Access.
- Agrawal, R. (2025). 46 Detection of melanoma using DenseNet-based adaptive weighted loss function. Emerging Trends in Computer Science and Its Application, 283.
- Hailemichael, H., and Ayalew, B. Adaptive and Safe Fast Charging of Lithium-Ion Batteries Via Hybrid Model Learning and Control Barrier Functions. Available at SSRN 5110597.
- Nguyen, E., Xiao, J., Fan, Z., and Ruan, D. Contrast-free Full Intracranial Vessel Geometry Estimation from MRI with Metric Learning based Inference. In Medical Imaging with Deep Learning.
- Luo, Z., Bi, Y., Yang, X., Li, Y., Wang, S., and Ye, Q. A Novel Machine Vision-Based Collision Risk Warning Method for Unsignalized Intersections on Arterial Roads. Frontiers in Physics, 13, 1527956.
- Bousquet, N., Thomassé, S. (2015). VC-dimension and Erdős–Pósa property. Discrete Mathematics, 338(12), 2302-2317.
- Aslan, O., Yildiz, O. T., Alpaydin, E. (2009, September). Calculating the VC-dimension of decision trees. In 2009 24th International Symposium on Computer and Information Sciences (pp. 193-198). IEEE.
- Zhang, C., Bian, W., Tao, D., Lin, W. (2012). Discretized-Vapnik-Chervonenkis dimension for analyzing complexity of real function classes. IEEE transactions on neural networks and learning systems, 23(9), 1461-1472.
- Riondato, M., Akdere, M., Çetintemel, U., Zdonik, S. B., Upfal, E. (2011). The VC-dimension of SQL queries and selectivity estimation through sampling. In Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2011, Athens, Greece, September 5-9, 2011, Proceedings, Part II 22 (pp. 661-676). Springer Berlin Heidelberg.
- Bane, M., Riggle, J., Sonderegger, M. (2010). The VC dimension of constraint-based grammars. Lingua, 120(5), 1194-1208.
- Anderson, A. (2023). Fuzzy VC Combinatorics and Distality in Continuous Logic. arXiv preprint arXiv:2310.04393.
- Fox, J., Pach, J., Suk, A. (2021). Bounded VC-dimension implies the Schur-Erdős conjecture. Combinatorica, 41(6), 803-813.
- Johnson, H. R. (2021). Binary strings of finite VC dimension. arXiv preprint arXiv:2101.06490.
- Janzing, D. (2018). Merging joint distributions via causal model classes with low VC dimension. arXiv preprint arXiv:1804.03206.
- Hüllermeier, E., Fallah Tehrani, A. (2012, July). On the vc-dimension of the choquet integral. In International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems (pp. 42-50). Berlin, Heidelberg: Springer Berlin Heidelberg.
- Mohri, M. (2018). Foundations of machine learning.
- Cucker, F., Zhou, D. X. (2007). Learning theory: an approximation theory viewpoint (Vol. 24). Cambridge University Press.
- Shalev-Shwartz, S., Ben-David, S. (2014). Understanding machine learning: From theory to algorithms. Cambridge university press.
- Truong, L. V. (2022). On rademacher complexity-based generalization bounds for deep learning. arXiv preprint arXiv:2208.04284.
- Gnecco, G., and Sanguineti, M. (2008). Approximation error bounds via Rademacher complexity. Applied Mathematical Sciences, 2, 153-176.
- Astashkin, S. V. (2010). Rademacher functions in symmetric spaces. Journal of Mathematical Sciences, 169(6), 725-886.
- Ying, Y., and Campbell, C. (2010). Rademacher chaos complexities for learning the kernel problem. Neural Computation, 22(11), 2858-2886.
- Zhu, J., Gibson, B., and Rogers, T. T. (2009). Human rademacher complexity. Advances in neural information processing systems, 22.
- Astashkin, S. V. (2020). The Rademacher system in function spaces. Basel: Birkhäuser.
- Sachs, S., van Erven, T., Hodgkinson, L., Khanna, R., and Şimşekli, U. (2023, July). Generalization Guarantees via Algorithm-dependent Rademacher Complexity. In The Thirty Sixth Annual Conference on Learning Theory (pp. 4863-4880). PMLR.
- Ma and Wang (2020). Rademacher complexity and the generalization error of residual networks. Communications in Mathematical Sciences, 18(6), 1755-1774.
- Bartlett, P. L., and Mendelson, S. (2002). Rademacher and Gaussian complexities: Risk bounds and structural results. Journal of Machine Learning Research, 3(Nov), 463-482.
- McDonald, D. J., and Shalizi, C. R. (2011). Rademacher complexity of stationary sequences. arXiv preprint arXiv:1106.0730.
- Abderachid, S., and Kenza, B. Embeddings in Riemann–Liouville fractional Sobolev spaces and applications.
- Giang, T. H., Tri, N. M., and Tuan, D. A. (2024). On some Sobolev and Pólya-Sezgö type inequalities with weights and applications. arXiv preprint arXiv:2412.15490.
- Ruiz, P. A., and Fragkiadaki, V. (2024). Fractional Sobolev embeddings and algebra property: A dyadic view. arXiv preprint arXiv:2412.12051.
- Bilalov, B., Mamedov, E., Sezer, Y., and Nasibova, N. (2025). Compactness in Banach function spaces: Poincaré and Friedrichs inequalities. Rendiconti del Circolo Matematico di Palermo Series 2, 74(1), 68.
- Cheng, M., and Shao, K. (2025). Ground states of the inhomogeneous nonlinear fractional Schrödinger-Poisson equations. Complex Variables and Elliptic Equations, 1-17.
- Wei, J., and Zhang, L. (2025). Ground State Solutions of Nehari-Pohozaev Type for Schrödinger-Poisson Equation with Zero-Mass and Weighted Hardy Sobolev Subcritical Exponent. The Journal of Geometric Analysis, 35(2), 48.
- Zhang, X., and Qi, W. (2025). Multiplicity result on a class of nonhomogeneous quasilinear elliptic system with small perturbations in RN. arXiv preprint arXiv:2501.01602.
- Xiao, J., and Yue, C. (2025). A Trace Principle for Fractional Laplacian with an Application to Image Processing. La Matematica, 1-26.
- Pesce, A., and Portaro, S. (2025). Fractional Sobolev spaces related to an ultraparabolic operator. arXiv preprint arXiv:2501.05898.
- LASSOUED, D. (2026). A STUDY OF FUNCTIONS ON THE TORUS AND MULTI-PERIODIC FUNCTIONS. Kragujevac Journal of Mathematics, 50(2), 297-337.
- Chen, H., Chen, H. G., and Li, J. N. (2024). Sharp embedding results and geometric inequalities for Hörmander vector fields. arXiv preprint arXiv:2404.19393.
- Adams, R. A., and Fournier, J. J. (2003). Sobolev spaces. Elsevier.
- Cox, D., & Ghosh, S. (2022). An Analogue of Mertens’ Function.
- Brezis, H., and Brézis, H. (2011). Functional analysis, Sobolev spaces and partial differential equations (Vol. 2, No. 3, p. 5). New York: Springer.
- Evans, L. C. (2022). Partial differential equations (Vol. 19). American Mathematical Society.
- Maz’ya, V. G. (2011). Sobolev Spaces: With Applications to Elliptic Partial Differential Equations. Springer.
- Hornik, K., Stinchcombe, M., and White, H. (1989). Multilayer feedforward networks are universal approximators. Neural networks, 2(5), 359-366.
- Gupta, A., Aberkane, I. J., Ghosh, S., Abold, A., Rahn, A., & Sultanow, E. (2022). Rotating Binaries. AppliedMath 2022, 2, 104–117.
- Cybenko, G. (1989). Approximation by superpositions of a sigmoidal function. Mathematics of control, signals and systems, 2(4), 303-314.
- Barron, A. R. (1993). Universal approximation bounds for superpositions of a sigmoidal function. IEEE Transactions on Information theory, 39(3), 930-945.
- Pinkus, A. (1999). Approximation theory of the MLP model in neural networks. Acta numerica, 8, 143-195.
- Lu, Z., Pu, H., Wang, F., Hu, Z., and Wang, L. (2017). The expressive power of neural networks: A view from the width. Advances in neural information processing systems, 30.
- Hanin, B., and Sellke, M. (2017). Approximating continuous functions by relu nets of minimal width. arXiv preprint arXiv:1710.11278.
- Ghosh, S., Kumawat, K., Sajish, S. D., Arul, J., & Bhattacharya, B. (2025). Time-dependent fatigue reliability of main vessel steel structural components in sodium cooled fast breeder reactors. Nuclear Engineering and Design, 433, 113820.
- Garcıa-Cervera, C. J., Kessler, M., Pedregal, P., and Periago, F. Universal approximation of set-valued maps and DeepONet approximation of the controllability map.
- Majee, S., Abhishek, A., Strauss, T., and Khan, T. (2024). MCMC-Net: Accelerating Markov Chain Monte Carlo with Neural Networks for Inverse Problems. arXiv preprint arXiv:2412.16883.
- Toscano, J. D., Wang, L. L., and Karniadakis, G. E. (2024). KKANs: Kurkova-Kolmogorov-Arnold Networks and Their Learning Dynamics. arXiv preprint arXiv:2412.16738.
- Son, H. (2025). ELM-DeepONets: Backpropagation-Free Training of Deep Operator Networks via Extreme Learning Machines. arXiv preprint arXiv:2501.09395.
- Rudin, W. (1964). Principles of mathematical analysis (Vol. 3). New York: McGraw-hill.
- Stein, E. M., and Shakarchi, R. (2009). Real analysis: measure theory, integration, and Hilbert spaces. Princeton University Press.
- Cox, D., and Ghosh, S. (2022). A Partial Factorization Algorithm Using Mertens’ Function. ResearchGate Publications, November 2022.
- Conway, J. B. (2019). A course in functional analysis (Vol. 96). Springer.
- Dieudonné, J. (2020). History of Functional Analysis. In Functional Analysis, Holomorphy, and Approximation Theory (pp. 119-129). CRC Press.
- Cox, D., Ghosh, S., & Sultanow, E. (2022). A Generalization of the Sum of Divisors Function.
- Folland, G. B. (1999). Real analysis: modern techniques and their applications (Vol. 40). John Wiley and Sons.
- Sugiura, S. (2024). On the Universality of Reservoir Computing for Uniform Approximation.
- Liu, Y., Liu, S., Huang, Z., and Zhou, P. Normed modules and the categorification of integrations, series expansions, and differentiations.
- Barreto, D. M. (2025). Stone-Weierstrass Theorem.
- Chang, S. Y., and Wei, Y. (2024). Generalized Choi–Davis–Jensen’s Operator Inequalities and Their Applications. Symmetry, 16(9), 1176.
- Caballer, M., Dantas, S., and Rodríguez-Vidanes, D. L. (2024). Searching for linear structures in the failure of the Stone-Weierstrass theorem. arXiv preprint arXiv:2405.06453.
- Chen, D. (2024). The Machado–Bishop theorem in the uniform topology. Journal of Approximation Theory, 304, 106085.
- Rafiei, H., and Akbarzadeh-T, M. R. (2024). Hedge-embedded Linguistic Fuzzy Neural Networks for Systems Identification and Control. IEEE Transactions on Artificial Intelligence.
- Kolmogorov, A. N. (1957). On the representation of continuous functions of many variables by superposition of continuous functions of one variable and addition. In Doklady Akademii Nauk (Vol. 114, No. 5, pp. 953-956). Russian Academy of Sciences.
- Cox, D., & Ghosh, S. (2022). Farey Sequences and the Franel-Landau Theorem.
- Arnold, V. I. (2009). On the representation of functions of several variables as a superposition of functions of a smaller number of variables. Collected works: Representations of functions, celestial mechanics and KAM theory, 1957–1965, 25-46.
- Lorentz, G. G. (1966). Approximation of functions, athena series. Selected Topics in Mathematics.
- Guilhoto, L. F., and Perdikaris, P. (2024). Deep learning alternatives of the Kolmogorov superposition theorem. arXiv preprint arXiv:2410.01990.
- Alhafiz, M. R., Zakaria, K., Dung, D. V., Palar, P. S., Dwianto, Y. B., and Zuhal, L. R. (2025). Kolmogorov-Arnold Networks for Data-Driven Turbulence Modeling. In AIAA SCITECH 2025 Forum (p. 2047).
- Lorencin, I., Mrzljak, V., Poljak, I., and Etinger, D. (2024, September). Prediction of CODLAG Propulsion System Parameters Using Kolmogorov-Arnold Network. In 2024 IEEE 22nd Jubilee International Symposium on Intelligent Systems and Informatics (SISY) (pp. 173-178). IEEE.
- Ghosh, S. (2024). Analytical Solution of Burgers Equation using Cole-Hopf Transformation: Part 1.
- Trevisan, D., Cassara, P., Agazzi, A., and Scardera, S. NTK Analysis of Knowledge Distillation.
- Bonfanti, A., Bruno, G., and Cipriani, C. (2024). The Challenges of the Nonlinear Regime for Physics-Informed Neural Networks. arXiv preprint arXiv:2402.03864.
- Jacot, A., Gabriel, F., and Hongler, C. (2018). Neural tangent kernel: Convergence and generalization in neural networks. Advances in neural information processing systems, 31.
- Lee, J., Xiao, L., Schoenholz, S., Bahri, Y., Novak, R., Sohl-Dickstein, J., and Pennington, J. (2019). Wide neural networks of any depth evolve as linear models under gradient descent. Advances in neural information processing systems, 32.
- Yang, G., and Hu, E. J. (2020). Feature learning in infinite-width neural networks. arXiv preprint arXiv:2011.14522.
- Xiang, L., Dudziak, Ł., Abdelfattah, M. S., Chau, T., Lane, N. D., and Wen, H. (2021). Zero-Cost Operation Scoring in Differentiable Architecture Search. arXiv preprint arXiv:2106.06799.
- McAllester, D. A. (1999, July). PAC-Bayesian model averaging. In Proceedings of the twelfth annual conference on Computational learning theory (pp. 164-170).
- Catoni, O. (2007). PAC-Bayesian supervised classification: the thermodynamics of statistical learning. arXiv preprint arXiv:0712.0248.
- Germain, P., Lacasse, A., Laviolette, F., and Marchand, M. (2009, June). PAC-Bayesian learning of linear classifiers. In Proceedings of the 26th Annual International Conference on Machine Learning (pp. 353-360).
- Seeger, M. (2002). PAC-Bayesian generalisation error bounds for Gaussian process classification. Journal of machine learning research, 3(Oct), 233-269.
- Alquier, P., Ridgway, J., and Chopin, N. (2016). On the properties of variational approximations of Gibbs posteriors. Journal of Machine Learning Research, 17(236), 1-41.
- Dziugaite, G. K., and Roy, D. M. (2017). Computing nonvacuous generalization bounds for deep (stochastic) neural networks with many more parameters than training data. arXiv preprint arXiv:1703.11008.
- Ghosh, S. (2020). Inequalities.
- Rivasplata, O., Kuzborskij, I., Szepesvári, C., and Shawe-Taylor, J. (2020). PAC-Bayes analysis beyond the usual bounds. Advances in Neural Information Processing Systems, 33, 16833-16845.
- Lever, G., Laviolette, F., and Shawe-Taylor, J. (2013). Tighter PAC-Bayes bounds through distribution-dependent priors. Theoretical Computer Science, 473, 4-28.
- Rivasplata, O., Parrado-Hernández, E., Shawe-Taylor, J. S., Sun, S., and Szepesvári, C. (2018). PAC-Bayes bounds for stable algorithms with instance-dependent priors. Advances in Neural Information Processing Systems, 31.
- Lindemann, L., Zhao, Y., Yu, X., Pappas, G. J., and Deshmukh, J. V. (2024). Formal verification and control with conformal prediction. arXiv preprint arXiv:2409.00536.
- Jin, G., Wu, S., Liu, J., Huang, T., and Mu, R. (2025). Enhancing Robust Fairness via Confusional Spectral Regularization. arXiv preprint arXiv:2501.13273.
- Ye, F., Xiao, J., Ma, W., Jin, S., and Yang, Y. (2025). Detecting small clusters in the stochastic block model. Statistical Papers, 66(2), 37.
- Bhattacharjee, A., and Bharadwaj, P. (2025). Coherent Spectral Feature Extraction Using Symmetric Autoencoders. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing.
- Wu, Q., Hu, B., Liu, C. et al. (2025). Velocity Analysis Using High-resolution Hyperbolic Radon Transform with Lq1 − Lq2 Regularization. Pure Appl. Geophys.
- Ortega, I., Hannigan, J. W., Baier, B. C., McKain, K., and Smale, D. (2025). Advancing CH4 and N2O retrieval strategies for NDACC/IRWG high-resolution direct-sun FTIR observations. EGUsphere, 2025, 1-32.
- Kazmi, S. H. A., Hassan, R., Qamar, F., Nisar, K., and Al-Betar, M. A. (2025). Federated Conditional Variational Auto Encoders for Cyber Threat Intelligence: Tackling Non-IID Data in SDN Environments. IEEE Access.
- Zhao, Y., Bi, Z., Zhu, P., Yuan, A., and Li, X. (2025). Deep Spectral Clustering with Projected Adaptive Feature Selection. IEEE Transactions on Geoscience and Remote Sensing.
- Saranya, S., and Menaka, R. (2025). A Quantum-Based Machine Learning Approach for Autism Detection using Common Spatial Patterns of EEG Signals. IEEE Access.
- Dhalbisoi, S., Mohapatra, A., and Rout, A. (2024, March). Design of Cell-Free Massive MIMO for Beyond 5G Systems with MMSE and RZF Processing. In International Conference on Machine Learning, IoT and Big Data (pp. 263-273). Singapore: Springer Nature Singapore.
- Wei, C., Li, Z., Hu, T., Zhao, M., Sun, Z., Jia, K., ... and Jiang, S. (2025). Model-based convolution neural network for 3D Near-infrared spectral tomography. IEEE Transactions on Medical Imaging.
- Goodfellow, I. (2016). Deep learning (Vol. 196). MIT press.
- Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., ... and Bengio, Y. (2020). Generative adversarial networks. Communications of the ACM, 63(11), 139-144.
- Haykin, S. (2009). Neural networks and learning machines, 3/E. Pearson Education India.
- Schmidhuber, J. (2015). Deep learning in neural networks: An overview.
- Bishop, C. M., and Nasrabadi, N. M. (2006). Pattern recognition and machine learning (Vol. 4, No. 4, p. 738). New York: Springer.
- Poggio, T., and Smale, S. (2003). The mathematics of learning: Dealing with data. Notices of the AMS, 50(5), 537-544.
- LeCun, Y., Bengio, Y., and Hinton, G. (2015). Deep learning. Nature, 521(7553), 436-444.
- Tishby, N., and Zaslavsky, N. (2015, April). Deep learning and the information bottleneck principle. In 2015 ieee information theory workshop (itw) (pp. 1-5). IEEE.
- Sorrenson, P. (2025). Free-Form Flows: Generative Models for Scientific Applications (Doctoral dissertation).
- Liu, W., and Shi, X. (2025). An Enhanced Neural Network Forecasting System for the July Precipitation over the Middle-Lower Reaches of the Yangtze River.
- Das, P., Mondal, D., Islam, M. A., Al Mohotadi, M. A., and Roy, P. C. (2025). Analytical Finite-Integral-Transform and Gradient-Enhanced Machine Learning Approach for Thermoelastic Analysis of FGM Spherical Structures with Arbitrary Properties. Theoretical and Applied Mechanics Letters, 100576.
- Zhang, R. (2025). Physics-informed Parallel Neural Networks for the Identification of Continuous Structural Systems.
- Ali, S., and Hussain, A. (2025). A neuro-intelligent heuristic approach for performance prediction of triangular fuzzy flow system. Proceedings of the Institution of Mechanical Engineers, Part N: Journal of Nanomaterials, Nanoengineering and Nanosystems, 23977914241310569.
- Li, S. (2025). Scalable, generalizable, and offline methods for imperfect-information extensive-form games.
- Cox, D., and Ghosh, S. (2022). Farey Sequences, a Companion Function of Mertens’ Function, and Zeta Function Zeros. ResearchGate Publications, September 2022.
- Hu, T., Jin, B., and Wang, F. (2025). An Iterative Deep Ritz Method for Monotone Elliptic Problems. Journal of Computational Physics, 113791.
- Chen, P., Zhang, A., Zhang, S., Dong, T., Zeng, X., Chen, S., ... and Zhou, Q. (2025). Maritime near-miss prediction framework and model interpretation analysis method based on Transformer neural network model with multi-task classification variables. Reliability Engineering and System Safety, 110845.
- Sun, G., Liu, Z., Gan, L., Su, H., Li, T., Zhao, W., and Sun, B. (2025). SpikeNAS-Bench: Benchmarking NAS Algorithms for Spiking Neural Network Architecture. IEEE Transactions on Artificial Intelligence.
- Zhang, Z., Wang, X., Shen, J., Zhang, M., Yang, S., Zhao, W., ... and Wang, J. (2025). Unfixed Bias Iterator: A New Iterative Format. IEEE Access.
- Rosa, G. J. (2010). The Elements of Statistical Learning: Data Mining, Inference, and Prediction by Hastie, T., Tibshirani, R., and Friedman, J.
- Murphy, K. P. (2012). Machine learning: a probabilistic perspective. MIT Press.
- Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. (2014). Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1), 1929-1958.
- Zou, H., and Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society Series B: Statistical Methodology, 67(2), 301-320.
- Vapnik, V. (2013). The nature of statistical learning theory. Springer Science and Business Media.
- Cox, D., and Ghosh, S. (2022). Pólya’s Conjecture and Generator Functions for the Möbius and Liouville Functions.
- Ng, A. Y. (2004, July). Feature selection, L1 vs. L2 regularization, and rotational invariance. In Proceedings of the Twenty-First International Conference on Machine Learning (p. 78).
- Li, T. (2025). Optimization of Clinical Trial Strategies for Anti-HER2 Drugs Based on Bayesian Optimization and Deep Learning.
- Yasuda, M., and Sekimoto, K. (2024). Gaussian-discrete restricted Boltzmann machine with sparse-regularized hidden layer. Behaviormetrika, 1-19.
- Luo, X., Cruz, W. C., Zhang, X.-L., and Xiao, H. (2023). Hyper-parameter optimization for improving the performance of localization in an iterative ensemble smoother. Geoenergy Science and Engineering, 231(Part B), 212404.
- Alrayes, F.S., Maray, M., Alshuhail, A. et al. (2025) Privacy-preserving approach for IoT networks using statistical learning with optimization algorithm on high-dimensional big data environment. Sci Rep 15, 3338. [CrossRef]
- Cho, H., Kim, Y., Lee, E., Choi, D., Lee, Y., and Rhee, W. (2020). Basic enhancement strategies when using Bayesian optimization for hyperparameter tuning of deep neural networks. IEEE Access, 8, 52588-52608.
- Ibrahim, M. M. W. (2025). Optimizing Tuberculosis Treatment Predictions: A Comparative Study of XGBoost with Hyperparameter in Penang, Malaysia. Sains Malaysiana, 54(1), 3741-3752.
- Abdel-salam, M., Elhoseny, M. and El-hasnony, I.M. Intelligent and Secure Evolved Framework for Vaccine Supply Chain Management Using Machine Learning and Blockchain. SN COMPUT. SCI. 6, 121 (2025). [CrossRef]
- Vali, M. H. (2025). Vector quantization in deep neural networks for speech and image processing.
- Vincent, A.M., Jidesh, P. An improved hyperparameter optimization framework for AutoML systems using evolutionary algorithms. Sci Rep 13, 4737 (2023). [CrossRef]
- Razavi-Termeh, S. V., Sadeghi-Niaraki, A., Ali, F., and Choi, S. M. (2025). Improving flood-prone areas mapping using geospatial artificial intelligence (GeoAI): A non-parametric algorithm enhanced by math-based metaheuristic algorithms. Journal of Environmental Management, 375, 124238.
- Kiran, M., and Ozyildirim, M. (2022). Hyperparameter tuning for deep reinforcement learning applications. arXiv preprint arXiv:2201.11182.
- Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. Advances in neural information processing systems, 25.
- Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2017). ImageNet classification with deep convolutional neural networks. Communications of the ACM, 60(6), 84-90.
- Simonyan, K., and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
- He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770-778).
- Cohen, T., and Welling, M. (2016, June). Group equivariant convolutional networks. In International conference on machine learning (pp. 2990-2999). PMLR.
- Zeiler, M. D., and Fergus, R. (2014). Visualizing and understanding convolutional networks. In Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part I 13 (pp. 818-833). Springer International Publishing.
- Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., ... and Guo, B. (2021). Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 10012-10022).
- Lin, M., Chen, Q., and Yan, S. (2013). Network in network. arXiv preprint arXiv:1312.4400.
- Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323(6088), 533-536.
- Bensaid, B., Poëtte, G., and Turpault, R. (2024). Convergence of the Iterates for Momentum and RMSProp for Local Smooth Functions: Adaptation is the Key. arXiv preprint arXiv:2407.15471.
- Liu, Q., and Ma, W. (2024). The Epochal Sawtooth Effect: Unveiling Training Loss Oscillations in Adam and Other Optimizers. arXiv preprint arXiv:2410.10056.
- Li, H. (2024). Smoothness and Adaptivity in Nonlinear Optimization for Machine Learning Applications (Doctoral dissertation, Massachusetts Institute of Technology).
- Heredia, C. (2024). Modeling AdaGrad, RMSProp, and Adam with Integro-Differential Equations. arXiv preprint arXiv:2411.09734.
- Ye, Q. (2024). Preconditioning for Accelerated Gradient Descent Optimization and Regularization. arXiv preprint arXiv:2410.00232.
- Compagnoni, E. M., Liu, T., Islamov, R., Proske, F. N., Orvieto, A., and Lucchi, A. (2024). Adaptive Methods through the Lens of SDEs: Theoretical Insights on the Role of Noise. arXiv preprint arXiv:2411.15958.
- Yao, B., Zhang, Q., Feng, R., and Wang, X. (2024). System response curve based first-order optimization algorithms for cyber-physical-social intelligence. Concurrency and Computation: Practice and Experience, 36(21), e8197.
- Wen, X., and Lei, Y. (2024, June). A Fast ADMM Framework for Training Deep Neural Networks Without Gradients. In 2024 International Joint Conference on Neural Networks (IJCNN) (pp. 1-8). IEEE.
- Hannibal, S., Jentzen, A., and Thang, D. M. (2024). Non-convergence to global minimizers in data driven supervised deep learning: Adam and stochastic gradient descent optimization provably fail to converge to global minimizers in the training of deep neural networks with ReLU activation. arXiv preprint arXiv:2410.10533.
- Yang, Z. (2025). Adaptive Biased Stochastic Optimization. IEEE Transactions on Pattern Analysis and Machine Intelligence.
- Kingma, D. P., and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
- Reddi, S. J., Kale, S., and Kumar, S. (2019). On the convergence of Adam and beyond. arXiv preprint arXiv:1904.09237.
- Jin, L., Nong, H., Chen, L., and Su, Z. (2024). A Method for Enhancing Generalization of Adam by Multiple Integrations. arXiv preprint arXiv:2412.12473.
- Adly, A. M. (2024). EXAdam: The Power of Adaptive Cross-Moments. arXiv preprint arXiv:2412.20302.
- Liu, Y., Cao, Y., and Lin, J. Convergence Analysis of the ADAM Algorithm for Linear Inverse Problems.
- Park, K., and Lee, S. (2024). SMMF: Square-Matricized Momentum Factorization for Memory-Efficient Optimization. arXiv preprint arXiv:2412.08894.
- Mahjoubi, M. A., Lamrani, D., Saleh, S., Moutaouakil, W., Ouhmida, A., Hamida, S., ... and Raihani, A. (2025). Optimizing ResNet50 Performance Using Stochastic Gradient Descent on MRI Images for Alzheimer’s Disease Classification. Intelligence-Based Medicine, 100219.
- Seini, A. B., and Adam, I. O. (2024). Human-AI Collaboration for Adaptive Working and Learning Outcomes: An Activity Theory Perspective.
- Teessar, J. (2024). The Complexities of Truthful Responding in Questionnaire-Based Research: A Comprehensive Analysis.
- Lauand, C. K., and Meyn, S. (2025). Markovian Foundations for Quasi-Stochastic Approximation. SIAM Journal on Control and Optimization, 63(1), 402-430.
- Maranjyan, A., Tyurin, A., and Richtárik, P. (2025). Ringmaster ASGD: The First Asynchronous SGD with Optimal Time Complexity. arXiv preprint arXiv:2501.16168.
- Gao, Z., and Gündüz, D. (2025). Graph Neural Networks over the Air for Decentralized Tasks in Wireless Networks. IEEE Transactions on Signal Processing.
- Yoon, T., Choudhury, S., and Loizou, N. (2025). Multiplayer Federated Learning: Reaching Equilibrium with Less Communication. arXiv preprint arXiv:2501.08263.
- Verma, K., and Maiti, A. (2025). Sine and cosine based learning rate for gradient descent method. Applied Intelligence, 55(5), 352.
- Borowski, M., and Miasojedow, B. (2025). Convergence of projected stochastic approximation algorithm. arXiv e-prints, arXiv-2501.
- Dong, K., Chen, S., Dan, Y., Zhang, L., Li, X., Liang, W., ... and Sun, Y. (2025). A new perspective on brain stimulation interventions: Optimal stochastic tracking control of brain network dynamics. arXiv preprint arXiv:2501.08567.
- Jiang, Y., Kang, H., Liu, J., and Xu, D. (2025). On the Convergence of Decentralized Stochastic Gradient Descent with Biased Gradients. IEEE Transactions on Signal Processing.
- Sonobe, N., Momozaki, T., and Nakagawa, T. (2025). Sampling from Density power divergence-based Generalized posterior distribution via Stochastic optimization. arXiv preprint arXiv:2501.07790.
- Zhang, X., and Jia, G. (2025). Convergence of Policy Gradient for Stochastic Linear Quadratic Optimal Control Problems in Infinite Horizon. Journal of Mathematical Analysis and Applications, 129264.
- Thiriveedhi, A., Ghanta, S., Biswas, S., and Pradhan, A. K. (2025). ALL-Net: integrating CNN and explainable-AI for enhanced diagnosis and interpretation of acute lymphoblastic leukemia. PeerJ Computer Science, 11, e2600.
- Ramos-Briceño, D. A., Flammia-D’Aleo, A., Fernández-López, G., Carrión-Nessi, F. S., and Forero-Peña, D. A. (2025). Deep learning-based malaria parasite detection: convolutional neural networks model for accurate species identification of Plasmodium falciparum and Plasmodium vivax. Scientific Reports, 15(1), 3746.
- Espino-Salinas, C. H., Luna-García, H., Cepeda-Argüelles, A., Trejo-Vázquez, K., Flores-Chaires, L. A., Mercado Reyna, J., ... and Villalba-Condori, K. O. (2025). Convolutional Neural Network for Depression and Schizophrenia Detection. Diagnostics, 15(3), 319.
- Ran, T., Huang, W., Qin, X., Xie, X., Deng, Y., Pan, Y., ... and Zou, D. (2025). Liquid-based cytological diagnosis of pancreatic neuroendocrine tumors using hyperspectral imaging and deep learning. EngMedicine, 2(1), 100059.
- Araujo, B. V. S., Rodrigues, G. A., de Oliveira, J. H. P., Xavier, G. V. R., Lebre, U., Cordeiro, C., ... and Ferreira, T. V. (2025). Monitoring ZnO surge arresters using convolutional neural networks and image processing techniques combined with signal alignment. Measurement, 116889.
- Sari, I. P., Elvitaria, L., and Rudiansyah, R. (2025). Data-driven approach for batik pattern classification using convolutional neural network (CNN). Jurnal Mandiri IT, 13(3), 323-331.
- Wang, D., An, K., Mo, Y., Zhang, H., Guo, W., and Wang, B. Cf-Wiad: Consistency Fusion with Weighted Instance and Adaptive Distribution for Enhanced Semi-Supervised Skin Lesion Classification. Available at SSRN 5109182.
- Cai, P., Zhang, Y., He, H., Lei, Z., and Gao, S. (2025). DFNet: A Differential Feature-Incorporated Residual Network for Image Recognition. Journal of Bionic Engineering, 1-14.
- Vishwakarma, A. K., and Deshmukh, M. (2025). CNNM-FDI: Novel Convolutional Neural Network Model for Fire Detection in Images. IETE Journal of Research, 1-14.
- Ranjan, P., Kaushal, A., Girdhar, A., and Kumar, R. (2025). Revolutionizing hyperspectral image classification for limited labeled data: unifying autoencoder-enhanced GANs with convolutional neural networks and zero-shot learning. Earth Science Informatics, 18(2), 1-26.
- Naseer, A., and Jalal, A. Multimodal Deep Learning Framework for Enhanced Semantic Scene Classification Using RGB-D Images.
- Wang, Z., and Wang, J. (2025). Personalized Icon Design Model Based on Improved Faster-RCNN. Systems and Soft Computing, 200193.
- Ramana, R., Vasudevan, V., and Murugan, B. S. (2025). Spectral Pyramid Pooling and Fused Keypoint Generation in ResNet-50 for Robust 3D Object Detection. IETE Journal of Research, 1-13.
- Shin, S., Land, O., Seider, W., Lee, J., and Lee, D. (2025). Artificial Intelligence-Empowered Automated Double Emulsion Droplet Library Generation.
- Taca, B. S., Lau, D., and Rieder, R. (2025). A comparative study between deep learning approaches for aphid classification. IEEE Latin America Transactions, 23(3), 198-204.
- Ulaş, B., Szklenár, T., and Szabó, R. (2025). Detection of Oscillation-like Patterns in Eclipsing Binary Light Curves using Neural Network-based Object Detection Algorithms. arXiv preprint arXiv:2501.17538.
- Valensi, D., Lupu, L., Adam, D., and Topilsky, Y. Semi-Supervised Learning, Foundation Models and Image Processing for Pleural Line Detection and Segmentation in Lung Ultrasound.
- V, A., V, P., and Kumar, D. An effective object detection via BS2ResNet and LTK-Bi-LSTM. Multimedia Tools and Applications (2025). [CrossRef]
- Zhu, X., Chen, W., and Jiang, Q. (2025). High-transferability black-box attack of binary image segmentation via adversarial example augmentation. Displays, 102957.
- Guo, X., Zhu, Y., Li, S., Wu, S., and Liu, S. (2025). Research and Implementation of Agronomic Entity and Attribute Extraction Based on Target Localization. Agronomy, 15(2), 354.
- Yousif, M., Jassam, N. M., Salim, A., Bardan, H. A., Mutlak, A. F., Sallibi, A. D., and Ataalla, A. F. Melanoma Skin Cancer Detection Using Deep Learning Methods and Binary GWO Algorithm.
- Rahman, S. I. U., Abbas, N., Ali, S., Salman, M., Alkhayat, A., Khan, J., ... and Gu, Y. H. (2025). Deep Learning and Artificial Intelligence-Driven Advanced Methods for Acute Lymphoblastic Leukemia Identification and Classification: A Systematic Review. Computer Modeling in Engineering and Sciences, 142(2).
- Pratap Joshi, K., Gowda, V. B., Bidare Divakarachari, P., Siddappa Parameshwarappa, P., and Patra, R. K. (2025). VSA-GCNN: Attention Guided Graph Neural Networks for Brain Tumor Segmentation and Classification. Big Data and Cognitive Computing, 9(2), 29.
- Ng, B., Eyre, K., and Chetrit, M. (2025). Prediction of ischemic cardiomyopathy using a deep neural network with non-contrast cine cardiac magnetic resonance images. Journal of Cardiovascular Magnetic Resonance, 27.
- Nguyen, H. T., Lam, T. B., Truong, T. T. N., Duong, T. D., and Dinh, V. Q. Mv-Trams: An Efficient Tumor Region-Adapted Mammography Synthesis Under Multi-View Diagnosis. Available at SSRN 5109180.
- Chen, W., Xu, T., and Zhou, W. (2025). Task-based Regularization in Penalized Least-Squares for Binary Signal Detection Tasks in Medical Image Denoising. arXiv preprint arXiv:2501.18418.
- Richards, G., Dutta, S., and Ghosh, S. (2020). Rayleigh Benard Convection and modeling it under the Stochastic Framework.
- Pradhan, P. D., Talmale, G., and Wazalwar, S. Deep dive into precision (DDiP): Unleashing advanced deep learning approaches in diabetic retinopathy research for enhanced detection and classification of retinal abnormalities. In Recent Advances in Sciences, Engineering, Information Technology and Management (pp. 518-530). CRC Press.
- Örenç, S., Acar, E., Özerdem, M. S., Şahin, S., and Kaya, A. (2025). Automatic Identification of Adenoid Hypertrophy via Ensemble Deep Learning Models Employing X-ray Adenoid Images. Journal of Imaging Informatics in Medicine, 1-15.
- Jiang, M., Wang, S., Chan, K. H., Sun, Y., Xu, Y., Zhang, Z., ... and Tan, T. (2025). Multimodal Cross Global Learnable Attention Network for MR images denoising with arbitrary modal missing. Computerized Medical Imaging and Graphics, 102497.
- Al-Haidri, W., Levchuk, A., Zotov, N., Belousova, K., Ryzhkov, A., Fokin, V., ... and Brui, E. (2025). Quantitative analysis of myocardial fibrosis using a deep learning-based framework applied to the 17-Segment model. Biomedical Signal Processing and Control, 105, 107555.
- Osorio, S. L. J., Ruiz, M. A. R., Mendez-Vazquez, A., and Rodriguez-Tello, E. (2024). Fourier Series Guided Design of Quantum Convolutional Neural Networks for Enhanced Time Series Forecasting. arXiv preprint arXiv:2404.15377.
- Umeano, C., and Kyriienko, O. (2024). Ground state-based quantum feature maps. arXiv preprint arXiv:2404.07174.
- Liu, N., He, X., Laurent, T., Di Giovanni, F., Bronstein, M. M., and Bresson, X. (2024). Advancing Graph Convolutional Networks via General Spectral Wavelets. arXiv preprint arXiv:2405.13806.
- Vlasic, A. (2024). Quantum Circuits, Feature Maps, and Expanded Pseudo-Entropy: A Categorical Theoretic Analysis of Encoding Real-World Data into a Quantum Computer. arXiv preprint arXiv:2410.22084.
- Kim, M., Hioka, Y., and Witbrock, M. (2024). Neural Fourier Modelling: A Highly Compact Approach to Time-Series Analysis. arXiv preprint arXiv:2410.04703.
- Xie, Y., Daigavane, A., Kotak, M., and Smidt, T. (2024). The price of freedom: Exploring tradeoffs between expressivity and computational efficiency in equivariant tensor products. In ICML 2024 Workshop on Geometry-grounded Representation Learning and Generative Modeling.
- Liu, G., Wei, Z., Zhang, H., Wang, R., Yuan, A., Liu, C., ... and Cao, G. (2024, April). Extending Implicit Neural Representations for Text-to-Image Generation. In ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 3650-3654). IEEE.
- Zhang, M. (2024). Lock-in spectrum: a tool for representing long-term evolution of bearing fault in the time–frequency domain using vibration signal. Sensor Review, 44(5), 598-610.
- Hamed, M., and Lachiri, Z. (2024, July). Expressivity Transfer In Transformer-Based Text-To-Speech Synthesis. In 2024 IEEE 7th International Conference on Advanced Technologies, Signal and Image Processing (ATSIP) (Vol. 1, pp. 443-448). IEEE.
- Lehmann, F., Gatti, F., Bertin, M., Grenié, D., and Clouteau, D. (2024). Uncertainty propagation from crustal geologies to rock-site ground motion with a Fourier Neural Operator. European Journal of Environmental and Civil Engineering, 28(13), 3088-3105.
- Jurafsky, D., and Martin, J. H. (2000). Speech and language processing.
- Manning, C., and Schütze, H. (1999). Foundations of statistical natural language processing. MIT Press.
- Liu, Y., and Zhang, M. (2018). Neural network methods for natural language processing.
- Allen, J. (1988). Natural language understanding. Benjamin-Cummings Publishing Co., Inc.
- Li, Z., Zhao, Y., Zhang, X., Han, H., and Huang, C. (2025). Word embedding factor based multi-head attention. Artificial Intelligence Review, 58(4), 1-21.
- Hempelmann, C. F., Rayz, J., Dong, T., and Miller, T. (2025, January). Proceedings of the 1st Workshop on Computational Humor (CHum). In Proceedings of the 1st Workshop on Computational Humor (CHum).
- Koehn, P. (2009). Statistical machine translation. Cambridge University Press.
- Eisenstein, J. (2019). Introduction to natural language processing. The MIT Press.
- Otter, D. W., Medina, J. R., and Kalita, J. K. (2020). A survey of the usages of deep learning for natural language processing. IEEE Transactions on Neural Networks and Learning Systems, 32(2), 604-624.
- Mitkov, R. (Ed.). (2022). The Oxford handbook of computational linguistics. Oxford University Press.
- Liu, X., Tao, Z., Jiang, T., Chang, H., Ma, Y., and Huang, X. (2024). ToDA: Target-oriented Diffusion Attacker against Recommendation System. arXiv preprint arXiv:2401.12578.
- Çekik, R. (2025). Effective Text Classification Through Supervised Rough Set-Based Term Weighting. Symmetry, 17(1), 90.
- Zhu, H., Xia, J., Liu, R., and Deng, B. (2025). SPIRIT: Structural Entropy Guided Prefix Tuning for Hierarchical Text Classification. Entropy, 27(2), 128.
- Matrane, Y., Benabbou, F., and Ellaky, Z. (2024). Enhancing Moroccan Dialect Sentiment Analysis through Optimized Preprocessing and transfer learning Techniques. IEEE Access.
- Ghosh, S. (2024). Theory and Applications of the Eshelby Ellipsoidal Elastic Inclusion Problem.
- Moqbel, M., and Jain, A. (2025). Mining the truth: A text mining approach to understanding perceived deceptive counterfeits and online ratings. Journal of Retailing and Consumer Services, 84, 104149.
- Kumar, V., Iqbal, M. I., and Rathore, R. (2025). Natural Language Processing (NLP) in Disease Detection—A Discussion of How NLP Techniques Can Be Used to Analyze and Classify Medical Text Data for Disease Diagnosis. AI in Disease Detection: Advancements and Applications, 53-75.
- Yin, S. (2024). The Current State and Challenges of Aspect-Based Sentiment Analysis. Applied and Computational Engineering, 114, 25-31.
- Raghavan, M. (2024). Are you who AI says you are? Exploring the role of Natural Language Processing algorithms for “predicting” personality traits from text (Doctoral dissertation, University of South Florida).
- Semeraro, A., Vilella, S., Improta, R., De Duro, E. S., Mohammad, S. M., Ruffo, G., and Stella, M. (2025). EmoAtlas: An emotional network analyzer of texts that merges psychological lexicons, artificial intelligence, and network science. Behavior Research Methods, 57(2), 77.
- Cai, F., and Liu, X. Data Analytics for Discourse Analysis with Python: The Case of Therapy Talk, by Dennis Tay. New York: Routledge, 2024. ISBN: 9781032419015 (HB: USD 41.24), xiii+ 182 pages. Natural Language Processing, 1-4.
- Ghosh, S. (2023). Stability Analysis of 2nd and 4th Order Runge Kutta Method.
- Wu, Y. (2016). Google’s neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144.
- Hettiarachchi, H., Ranasinghe, T., Rayson, P., Mitkov, R., Gaber, M., Premasiri, D., ... and Uyangodage, L. (2024). Overview of the First Workshop on Language Models for Low-Resource Languages (LoResLM 2025). arXiv preprint arXiv:2412.16365.
- Das, B. R., and Sahoo, R. (2024). Word Alignment in Statistical Machine Translation: Issues and Challenges. Nov Joun of Appl Sci Res, 1 (6), 01-03.
- Oluwatoki, T. G., Adetunmbi, O. A., and Boyinbode, O. K. A Transformer-Based Yoruba to English Machine Translation (TYEMT) System with Rouge Score.
- Uçkan, T., and Kurt, E. Word Embeddings in NLP. Pioneer and Innovative Studies in Computer Sciences and Engineering, 58.
- Pastor, G. C., Monti, J., Mitkov, R., and Hidalgo-Ternero, C. M. (2024). Recent Advances in Multiword Units in Machine Translation and Translation Technology.
- Fernandes, R. M. Decoding spatial semantics: a comparative analysis of the performance of open-source LLMs against NMT systems in translating EN-PT-BR subtitles (Doctoral dissertation, Universidade de São Paulo).
- Jozić, K. (2024). Testing ChatGPT’s Capabilities as an English-Croatian Machine Translation System in a Real-World Setting: eTranslation versus ChatGPT at the European Central Bank (Doctoral dissertation, University of Zagreb. Faculty of Humanities and Social Sciences. Department of English language and literature).
- Yang, M. (2025). Adaptive Recognition of English Translation Errors Based on Improved Machine Learning Methods. International Journal of High Speed Electronics and Systems, 2540236.
- Linnemann, G. A., and Reimann, L. E. (2024). Artificial Intelligence as a New Field of Activity for Applied Social Psychology–A Reasoning for Broadening the Scope.
- Merkel, S., and Schorr, S. OPP: Application Fields and Innovative Technologies.
- Kushwaha, N. S., and Singh, P. (2022). Artificial Intelligence based Chatbot: A Case Study. Journal of Management and Service Science (JMSS), 2(1), 1-13.
- Macedo, P., Madeira, R. N., Santos, P. A., Mota, P., Alves, B., and Pereira, C. M. (2024). A Conversational Agent for Empowering People with Parkinson’s Disease in Exercising Through Motivation and Support. Applied Sciences, 15(1), 223.
- Gupta, R., Nair, K., Mishra, M., Ibrahim, B., and Bhardwaj, S. (2024). Adoption and impacts of generative artificial intelligence: Theoretical underpinnings and research agenda. International Journal of Information Management Data Insights, 4(1), 100232.
- Foroughi, B., Iranmanesh, M., Yadegaridehkordi, E., Wen, J., Ghobakhloo, M., Senali, M. G., and Annamalai, N. (2025). Factors Affecting the Use of ChatGPT for Obtaining Shopping Information. International Journal of Consumer Studies, 49(1), e70008.
- Jandhyala, V. S. V. (2024). Building AI Chatbots and Virtual Assistants: A Technical Guide for Aspiring Professionals. International Journal of Research in Computer Applications and Information Technology (IJRCAIT), 7(2), 448-463.
- Pavlović, N., and Savić, M. (2024). The Impact of the ChatGPT Platform on Consumer Experience in Digital Marketing and User Satisfaction. Theoretical and Practical Research in Economic Fields, 15(3), 636-646.
- Mannava, V., Mitrevski, A., and Plöger, P. G. (2024, August). Exploring the Suitability of Conversational AI for Child-Robot Interaction. In 2024 33rd IEEE International Conference on Robot and Human Interactive Communication (ROMAN) (pp. 1821-1827). IEEE.
- Sherstinova, T., Mikhaylovskiy, N., Kolpashchikova, E., and Kruglikova, V. (2024, April). Bridging Gaps in Russian Language Processing: AI and Everyday Conversations. In 2024 35th Conference of Open Innovations Association (FRUCT) (pp. 665-674). IEEE.
- Lipton, Z. C. (2015). A Critical Review of Recurrent Neural Networks for Sequence Learning. arXiv Preprint, CoRR, abs/1506.00019.
- Pascanu, R., Mikolov, T., and Bengio, Y. (2013). On the difficulty of training recurrent neural networks. arXiv preprint arXiv:1211.5063.
- Jaeger, H. (2001). The “echo state” approach to analysing and training recurrent neural networks-with an erratum note. Bonn, Germany: German National Research Center for Information Technology GMD Technical Report, 148(34), 13.
- Hochreiter, S., and Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735-1780.
- Kawakami, K. (2008). Supervised sequence labelling with recurrent neural networks (Doctoral dissertation).
- Bengio, Y., Simard, P., and Frasconi, P. (1994). Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2), 157-166.
- Bhattamishra, S., Patel, A., and Goyal, N. (2020). On the computational power of transformers and its implications in sequence modeling. arXiv preprint arXiv:2006.09286.
- Siegelmann, H. T. (1993). Theoretical foundations of recurrent neural networks.
- Sutton, R. S., and Barto, A. G. (2018). Reinforcement learning: An introduction. A Bradford Book.
- Barto, A. G. (2021). Reinforcement Learning: An Introduction. By Richard S. Sutton. SIAM Review, 6(2), 423.
- Bertsekas, D. P. (1996). Neuro-dynamic programming. Athena Scientific.
- Kakade, S. M. (2003). On the sample complexity of reinforcement learning. University of London, University College London (United Kingdom).
- Szepesvári, C. (2022). Algorithms for reinforcement learning. Springer Nature.
- Haarnoja, T., Zhou, A., Abbeel, P., and Levine, S. (2018, July). Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In International conference on machine learning (pp. 1861-1870). PMLR.
- Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., ... and Hassabis, D. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529-533.
- Konda, V., and Tsitsiklis, J. (1999). Actor-critic algorithms. Advances in neural information processing systems, 12.
- Levine, S. (2018). Reinforcement learning and control as probabilistic inference: Tutorial and review. arXiv preprint arXiv:1805.00909.
- Mannor, S., Mansour, Y., and Tamar, A. (2022). Reinforcement Learning: Foundations. Online manuscript.
- Borkar, V. S. (2008). Stochastic approximation: a dynamical systems viewpoint (Vol. 9). Cambridge: Cambridge University Press.
- Takhsha, A. R., Rastgarpour, M., and Naderi, M. (2025). A Feature-Level Ensemble Model for COVID-19 Identification in CXR Images using Choquet Integral and Differential Evolution Optimization. arXiv preprint arXiv:2501.08241.
- Singh, P., and Raman, B. (2025). Graph Neural Networks: Extending Deep Learning to Graphs. In Deep Learning Through the Prism of Tensors (pp. 423-482). Singapore: Springer Nature Singapore.
- Yao, L., Shi, Q., Yang, Z., Shao, S., and Hariri, S. (2024). Development of an Edge Resilient ML Ensemble to Tolerate ICS Adversarial Attacks. arXiv preprint arXiv:2409.18244.
- Chen, K., Bi, Z., Niu, Q., Liu, J., Peng, B., Zhang, S., ... and Feng, P. (2024). Deep learning and machine learning, advancing big data analytics and management: Tensorflow pretrained models. arXiv preprint arXiv:2409.13566.
- Dumić, E. (2024). Learning neural network design with TensorFlow and Keras. In ICERI2024 Proceedings (pp. 10689-10696). IATED.
- Bajaj, K., Bordoloi, D., Tripathy, R., Mohapatra, S. K., Sarangi, P. K., and Sharma, P. (2024, September). Convolutional Neural Network Based on TensorFlow for the Recognition of Handwritten Digits in the Odia. In 2024 International Conference on Advances in Computing Research on Science Engineering and Technology (ACROSET) (pp. 1-5). IEEE.
- Abbass, A. M., and Fyath, R. S. (2024). Enhanced approach for artificial neural network-based optical fiber channel modeling: Geometric constellation shaping WDM system as a case study. Journal of Applied Research and Technology, 22(6), 768-780.
- Prabha, D., Subramanian, R. S., Dinesh, M. G., and Girija, P. (2024). Sustainable Farming Through AI-Enabled Precision Agriculture. In Artificial Intelligence for Precision Agriculture (pp. 159-182). Auerbach Publications.
- Abdelmadjid, S. A. A. D., and Abdeldjallil, A. I. D. I. (2024, November). Optimized Deep Learning Models For Edge Computing: A Comparative Study on Raspberry PI4 For Real-Time Plant Disease Detection. In 2024 4th International Conference on Embedded and Distributed Systems (EDiS) (pp. 273-278). IEEE.
- Mlambo, F. (2024). What are Bayesian Neural Networks?
- Team, G. Y. Bifang: A New Free-Flying Cubic Robot for Space Station.
- Tabel, L. (2024). Delay Learning in Spiking.
- Naderi, S., Chen, B., Yang, T., Xiang, J., Heaney, C. E., Latham, J. P., ... and Pain, C. C. (2024). A discrete element solution method embedded within a Neural Network. Powder Technology, 448, 120258.
- Polaka, S. K. R. (2024). Verifica delle reti neurali per l’apprendimento rinforzato sicuro [Verification of neural networks for safe reinforcement learning].
- Erdogan, L. E., Kanakagiri, V. A. R., Keutzer, K., and Dong, Z. (2024). Stochastic Communication Avoidance for Recommendation Systems. arXiv preprint arXiv:2411.01611.
- Liao, F., Tang, Y., Du, Q., Wang, J., Li, M., and Zheng, J. (2024). Domain Progressive Low-dose CT Imaging using Iterative Partial Diffusion Model. IEEE Transactions on Medical Imaging.
- Sekhavat, Y. (2024). Looking for creative basis of artificial intelligence art in the midst of order and chaos based on Nietzsche’s theories. Theoretical Principles of Visual Arts.
- Cai, H., Yang, Y., Tang, Y., Sun, Z., and Zhang, W. (2025). Shapley value-based class activation mapping for improved explainability in neural networks. The Visual Computer, 1-19.
- Na, W. (2024). Rach-Space: Novel Ensemble Learning Method With Applications in Weakly Supervised Learning (Master’s thesis, Tufts University).
- Khajah, M. M. (2024). Supercharging BKT with Multidimensional Generalizable IRT and Skill Discovery. Journal of Educational Data Mining, 16(1), 233-278.
- Zhang, Y., Duan, Z., Huang, Y., and Zhu, F. (2024). Theoretical Bound-Guided Hierarchical VAE for Neural Image Codecs. arXiv preprint arXiv:2403.18535.
- Wang, L., and Huang, W. (2025). On the convergence analysis of over-parameterized variational autoencoders: a neural tangent kernel perspective. Machine Learning, 114(1), 15.
- Li, C. N., Liang, H. P., Zhao, B. Q., Wei, S. H., and Zhang, X. (2024). Machine learning assisted crystal structure prediction made simple. Journal of Materials Informatics, 4(3).
- Huang, Y. (2024). Research Advanced in Image Generation Based on Diffusion Probability Model. Highlights in Science, Engineering and Technology, 85, 452-456.
- Chenebuah, E. T. (2024). Artificial Intelligence Simulation and Design of Energy Materials with Targeted Properties (Doctoral dissertation, Université d’Ottawa| University of Ottawa).
- Furth, N., Imel, A., and Zawodzinski, T. A. (2024, November). Graph Encoders for Redox Potentials and Solubility Predictions. In Electrochemical Society Meeting Abstracts prime2024 (No. 3, pp. 344-344). The Electrochemical Society, Inc.
- Gong, J., Deng, Z., Xie, H., Qiu, Z., Zhao, Z., and Tang, B. Z. (2025). Deciphering Design of Aggregation-Induced Emission Materials by Data Interpretation. Advanced Science, 12(3), 2411345.
- Kim, H., Lee, C. H., and Hong, C. (2024, July). VATMAN: Video Anomaly Transformer for Monitoring Accidents and Nefariousness. In 2024 IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS) (pp. 1-7). IEEE.
- Albert, S. W., Doostan, A., and Schaub, H. (2024). Dimensionality Reduction for Onboard Modeling of Uncertain Atmospheres. Journal of Spacecraft and Rockets, 1-13.
- Sharma, D. K., Hota, H. S., and Rababaah, A. R. (2024). Machine Learning for Real World Applications (Doctoral dissertation, Department of Computer Science and Engineering, Indian Institute of Technology Patna).
- Li, T., Shi, Z., Dale, S. G., Vignale, G., and Lin, M. Jrystal: A JAX-based Differentiable Density Functional Theory Framework for Materials.
- Ghosh, S., and Bhattacharya, B. (2022). A nested hierarchy of second order upper bounds on system failure probability. Probabilistic Engineering Mechanics, 70, 103335.
- Bieberich, S., Li, P., Ngai, J., Patel, K., Vogt, R., Ranade, P., ... and Stafford, S. (2024). Conducting Quantum Machine Learning Through The Lens of Solving Neural Differential Equations On A Theoretical Fault Tolerant Quantum Computer: Calibration and Benchmarking.
- Dagréou, M., Ablin, P., Vaiter, S., and Moreau, T. (2024). How to compute Hessian-vector products?. In The Third Blogpost Track at ICLR 2024.
- Lohoff, J., and Neftci, E. (2024). Optimizing Automatic Differentiation with Deep Reinforcement Learning. arXiv preprint arXiv:2406.05027.
- Legrand, N., Weber, L., Waade, P. T., Daugaard, A. H. M., Khodadadi, M., Mikuš, N., and Mathys, C. (2024). pyhgf: A neural network library for predictive coding. arXiv preprint arXiv:2410.09206.
- Alzás, P. B., and Radev, R. (2024). Differentiable nuclear deexcitation simulation for low energy neutrino physics. arXiv preprint arXiv:2404.00180.
- Edenhofer, G., Frank, P., Roth, J., Leike, R. H., Guerdi, M., Scheel-Platz, L. I., ... and Enßlin, T. A. (2024). Re-envisioning numerical information field theory (NIFTy. re): A library for Gaussian processes and variational inference. arXiv preprint arXiv:2402.16683.
- Chan, S., Kulkarni, P., Paul, H. Y., and Parekh, V. S. (2024, September). Expanding the Horizon: Enabling Hybrid Quantum Transfer Learning for Long-Tailed Chest X-Ray Classification. In 2024 IEEE International Conference on Quantum Computing and Engineering (QCE) (Vol. 1, pp. 572-582). IEEE.
- Ye, H., Hu, Z., Yin, R., Boyko, T. D., Liu, Y., Li, Y., ... and Li, Y. (2025). Electron transfer at birnessite/organic compound interfaces: mechanism, regulation, and two-stage kinetic discrepancy in structural rearrangement and decomposition. Geochimica et Cosmochimica Acta, 388, 253-267.
- Khan, M., Ludl, A. A., Bankier, S., Björkegren, J. L., and Michoel, T. (2024). Prediction of causal genes at GWAS loci with pleiotropic gene regulatory effects using sets of correlated instrumental variables. PLoS genetics, 20(11), e1011473.
- Ojala, K., and Zhou, C. (2024). Determination of outdoor object distances from monocular thermal images.
- Popordanoska, T., and Blaschko, M. (2024). Advancing Calibration in Deep Learning: Theory, Methods, and Applications.
- Alfieri, A., Cortes, J. M. P., Pastore, E., Castiglione, C., and Rey, G. M. Z. A Deep Q-Network Approach to Job Shop Scheduling with Transport Resources.
- Zanardelli, R. (2025). Statistical learning methods for decision-making, with applications in Industry 4.0.
- Norouzi, M., Hosseini, S. H., Khoshnevisan, M., and Moshiri, B. (2025). Applications of pre-trained CNN models and data fusion techniques in Unity3D for connected vehicles. Applied Intelligence, 55(6), 390.
- Wang, R., Yang, T., Liang, C., Wang, M., and Ci, Y. (2025). Reliable Autonomous Driving Environment Perception: Uncertainty Quantification of Semantic Segmentation. Journal of Transportation Engineering, Part A: Systems, 151(3), 04024117.
- Xia, Q., Chen, P., Xu, G., Sun, H., Li, L., and Yu, G. (2024). Adaptive Path-Tracking Controller Embedded With Reinforcement Learning and Preview Model for Autonomous Driving. IEEE Transactions on Vehicular Technology.
- Liu, Q., Tang, Y., Li, X., Yang, F., Wang, K., and Li, Z. (2024). MV-STGHAT: Multi-View Spatial-Temporal Graph Hybrid Attention Network for Decision-Making of Connected and Autonomous Vehicles. IEEE Transactions on Vehicular Technology.
- Chakraborty, D., and Deka, B. (2025). Deep Learning-based Selective Feature Fusion for Litchi Fruit Detection using Multimodal UAV Sensor Measurements. IEEE Transactions on Artificial Intelligence.
- Mirindi, D., Khang, A., and Mirindi, F. (2025). Artificial Intelligence (AI) and Automation for Driving Green Transportation Systems: A Comprehensive Review. Driving Green Transportation System Through Artificial Intelligence and Automation: Approaches, Technologies and Applications, 1-19.
- Choudhury, B., Rajakumar, K., Badhale, A. A., Roy, A., Sahoo, R., and Margret, I. N. (2024, June). Comparative Analysis of Advanced Models for Satellite-Based Aircraft Identification. In 2024 International Conference on Smart Systems for Electrical, Electronics, Communication and Computer Engineering (ICSSEECC) (pp. 483-488). IEEE.
- Almubarok, W., Rosiani, U. D., and Asmara, R. A. (2024, November). MobileNetV2 Pruning for Improved Efficiency in Catfish Classification on Resource-Limited Devices. In 2024 IEEE 10th Information Technology International Seminar (ITIS) (pp. 271-277). IEEE.
- Ding, Q. (2024, February). Classification Techniques of Tongue Manifestation Based on Deep Learning. In 2024 IEEE 3rd International Conference on Electrical Engineering, Big Data and Algorithms (EEBDA) (pp. 802-810). IEEE.
- He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770-778).
- Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems, 25.
- Sultana, F., Sufian, A., and Dutta, P. (2018, November). Advancements in image classification using convolutional neural network. In 2018 Fourth International Conference on Research in Computational Intelligence and Communication Networks (ICRCICN) (pp. 122-129). IEEE.
- Sattler, T., Zhou, Q., Pollefeys, M., and Leal-Taixe, L. (2019). Understanding the limitations of cnn-based absolute camera pose regression. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 3302-3312).
- Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30.
- Nannepagu, M., Babu, D. B., and Madhuri, C. B. Leveraging Hybrid AI Models: DQN, Prophet, BERT, ART-NN, and Transformer-Based Approaches for Advanced Stock Market Forecasting.
- De Rose, L., Andresini, G., Appice, A., and Malerba, D. (2024). VINCENT: Cyber-threat detection through vision transformers and knowledge distillation. Computers and Security, 103926.
- Buehler, M. J. (2025). Graph-Aware Isomorphic Attention for Adaptive Dynamics in Transformers. arXiv preprint arXiv:2501.02393.
- Tabibpour, S. A., and Madanizadeh, S. A. (2024). Solving High-Dimensional Dynamic Programming Using Set Transformer. Available at SSRN 5040295.
- Li, S., and Dong, P. (2024, October). Mixed Attention Transformer Enhanced Channel Estimation for Extremely Large-Scale MIMO Systems. In 2024 16th International Conference on Wireless Communications and Signal Processing (WCSP) (pp. 394-399). IEEE.
- Asefa, S. H., and Assabie, Y. (2024). Transformer-Based Amharic-to-English Machine Translation with Character Embedding and Combined Regularization Techniques. IEEE Access.
- Liao, M., and Chen, M. (2024, November). A new deepfake detection method by vision transformers. In International Conference on Algorithms, High Performance Computing, and Artificial Intelligence (AHPCAI 2024) (Vol. 13403, pp. 953-957). SPIE.
- Jiang, L., Cui, J., Xu, Y., Deng, X., Wu, X., Zhou, J., and Wang, Y. (2024, August). SCFormer: Spatial and Channel-wise Transformer with Contrastive Learning for High-Quality PET Image Reconstruction. In 2024 IEEE International Conference on Cybernetics and Intelligent Systems (CIS) and IEEE International Conference on Robotics, Automation and Mechatronics (RAM) (pp. 26-31). IEEE.
- Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., ... and Bengio, Y. (2014). Generative adversarial nets. Advances in neural information processing systems, 27.
- Chappidi, J., and Sundaram, D. M. (2024). Dual Q-learning with graph neural networks: A novel approach to animal detection in challenging ecosystems. Journal of Theoretical and Applied Information Technology, 102(23).
- Joni, R. (2024). Delving into Deep Learning: Illuminating Techniques and Visual Clarity for Image Analysis (No. 12808). EasyChair.
- Kalaiarasi, G., Sudharani, B., Jonnalagadda, S. C., Battula, H. V., and Sanagala, B. (2024, July). A Comprehensive Survey of Image Steganography. In 2024 2nd International Conference on Sustainable Computing and Smart Systems (ICSCSS) (pp. 1225-1230). IEEE.
- Arjmandi-Tash, A. M., Mansourian, A., Rahsepar, F. R., and Abdi, Y. (2024). Predicting Photodetector Responsivity through Machine Learning. Advanced Theory and Simulations, 2301219.
- Gao, Y. (2024). Neural networks meet applied mathematics: GANs, PINNs, and transformers. HKU Theses Online (HKUTO).
- Hisama, K., Ishikawa, A., Aspera, S. M., and Koyama, M. (2024). Theoretical Catalyst Screening of Multielement Alloy Catalysts for Ammonia Synthesis Using Machine Learning Potential and Generative Artificial Intelligence. The Journal of Physical Chemistry C, 128(44), 18750-18758.
- Wang, M., and Zhang, Y. (2024). Image Segmentation in Complex Backgrounds using an Improved Generative Adversarial Network. International Journal of Advanced Computer Science and Applications, 15(5).
- Alonso, N. I., and Arias, F. (2025). The Mathematics of Q-Learning and the Hamilton-Jacobi-Bellman Equation. Available at SSRN (January 05, 2025).
- Lu, C., Shi, L., Chen, Z., Wu, C., and Wierman, A. (2024). Overcoming the Curse of Dimensionality in Reinforcement Learning Through Approximate Factorization. arXiv preprint arXiv:2411.07591.
- Humayoo, M. (2024). Time-Scale Separation in Q-Learning: Extending TD(Δ) for Action-Value Function Decomposition. arXiv preprint arXiv:2411.14019.
- Jia, L., Qi, N., Su, Z., Chu, F., Fang, S., Wong, K. K., and Chae, C. B. (2024). Game theory and reinforcement learning for anti-jamming defense in wireless communications: Current research, challenges, and solutions. IEEE Communications Surveys and Tutorials.
- Chai, J., Chen, E., and Fan, J. (2025). Deep Transfer Q-Learning for Offline Non-Stationary Reinforcement Learning. arXiv preprint arXiv:2501.04870.
- Yao, J., and Gong, X. (2024, October). Communication-Efficient and Resilient Distributed Deep Reinforcement Learning for Multi-Agent Systems. In 2024 IEEE International Conference on Unmanned Systems (ICUS) (pp. 1521-1526). IEEE.
- Liu, Y., Yang, T., Tian, L., and Pei, J. (2025). SGD-TripleQNet: An Integrated Deep Reinforcement Learning Model for Vehicle Lane-Change Decision. Mathematics, 13(2), 235.
- Masood, F., Ahmad, J., Al Mazroa, A., Alasbali, N., Alazeb, A., and Alshehri, M. S. (2025). Multi IRS-Aided Low-Carbon Power Management for Green Communication in 6G Smart Agriculture Using Deep Game Theory. Computational Intelligence, 41(1), e70022.
- Patrick, B. Reinforcement Learning for Dynamic Economic Models.
- El Mimouni, I., and Avrachenkov, K. (2025, January). Deep Q-Learning with Whittle Index for Contextual Restless Bandits: Application to Email Recommender Systems. In Northern Lights Deep Learning Conference 2025.
- Shefin, R. S., Rahman, M. A., Le, T., and Alqahtani, S. (2024). xSRL: Safety-Aware Explainable Reinforcement Learning–Safety as a Product of Explainability. arXiv preprint arXiv:2412.19311.
- Khlifi, A., Othmani, M., and Kherallah, M. (2025). A Novel Approach to Autonomous Driving Using DDQN-Based Deep Reinforcement Learning.
- Kuczkowski, D. (2024). Energy efficient multi-objective reinforcement learning algorithm for traffic simulation.
- Krauss, R., Zielasko, J., and Drechsler, R. Large-Scale Evolutionary Optimization of Artificial Neural Networks Using Adaptive Mutations.
- Ahamed, M. S., Pey, J. J. J., Samarakoon, S. B. P., Muthugala, M. V. J., and Elara, M. R. (2025). Reinforcement Learning for Reconfigurable Robotic Soccer. IEEE Access.
- Elmquist, A., Serban, R., and Negrut, D. (2024). A methodology to quantify simulation-vs-reality differences in images for autonomous robots. IEEE Sensors Journal.
- Kobanda, A., Portelas, R., Maillard, O. A., and Denoyer, L. (2024). Hierarchical Subspaces of Policies for Continual Offline Reinforcement Learning. arXiv preprint arXiv:2412.14865.
- Xu, J., Xie, G., Zhang, Z., Hou, X., Zhang, S., Ren, Y., and Niyato, D. (2025). UPEGSim: An RL-Enabled Simulator for Unmanned Underwater Vehicles Dedicated in the Underwater Pursuit-Evasion Game. IEEE Internet of Things Journal, 12(3), 2334-2346.
- Patadiya, K., Jain, R., Moteriya, J., Palaniappan, D., Kumar, P., and Premavathi, T. (2024, December). Application of Deep Learning to Generate Auto Player Mode in Car Based Game. In 2024 IEEE 16th International Conference on Computational Intelligence and Communication Networks (CICN) (pp. 233-237). IEEE.
- Janjua, J. I., Kousar, S., Khan, A., Ihsan, A., Abbas, T., and Saeed, A. Q. (2024, December). Enhancing Scalability in Reinforcement Learning for Open Spaces. In 2024 International Conference on Decision Aid Sciences and Applications (DASA) (pp. 1-8). IEEE.
- Yang, L., Li, Y., Wang, J., and Sherratt, R. S. (2020). Sentiment analysis for E-commerce product reviews in Chinese based on sentiment lexicon and deep learning. IEEE access, 8, 23522-23530.
- Manikandan, C., Kumar, P. S., Nikitha, N., Sanjana, P. G., and Dileep, Y. Filtering Emails Using Natural Language Processing.
- Isiaka, S. O., Babatunde, R. S., and Isiaka, R. M. Exploring Artificial Intelligence (AI) Technologies in Predictive Medicine: A Systematic Review.
- Petrov, A., Zhao, D., Smith, J., Volkov, S., Wang, J., and Ivanov, D. Deep Learning Approaches for Emotional State Classification in Textual Data.
- Liang, M. (2025). Leveraging natural language processing for automated assessment and feedback production in virtual education settings. Journal of Computational Methods in Sciences and Engineering, 14727978251314556.
- Jin, L. (2025). Research on Optimization Strategies of Artificial Intelligence Algorithms for the Integration and Dissemination of Pharmaceutical Science Popularization Knowledge. Scientific Journal of Technology, 7(1), 45-55.
- McNicholas, B. A., Madden, M. G., and Laffey, J. G. (2025). Natural language processing in critical care: opportunities, challenges, and future directions. Intensive Care Medicine, 1-5.
- Abd Al Abbas, M., and Khammas, B. M. (2024). Efficient IoT Malware Detection Technique Using Recurrent Neural Network. Iraqi Journal of Information and Communication Technology, 7(3), 29-42.
- Kalonia, S., and Upadhyay, A. (2025). Deep learning-based approach to predict software faults. In Artificial Intelligence and Machine Learning Applications for Sustainable Development (pp. 326-348). CRC Press.
- Han, S. C., Weld, H., Li, Y., Lee, J., and Poon, J. Natural Language Understanding in Conversational AI with Deep Learning.
- Potter, K., and Egon, A. Recurrent Neural Networks (RNNs) for Time Series Forecasting.
- Yatkin, M. A., Kõrgesaar, M., and Işlak, Ü. (2025). A Topological Approach to Enhancing Consistency in Machine Learning via Recurrent Neural Networks. Applied Sciences, 15(2), 933.
- Saifullah, S. (2024). Comparative Analysis of LSTM and GRU Models for Chicken Egg Fertility Classification using Deep Learning.
- Noguer i Alonso, M. (2024). The Mathematics of Recurrent Neural Networks (October 27, 2024). Available at SSRN: https://ssrn.com/abstract=5001243. [CrossRef]
- Tu, Z., Jeffries, S. D., Morse, J., and Hemmerling, T. M. (2024). Comparison of time-series models for predicting physiological metrics under sedation. Journal of Clinical Monitoring and Computing, 1-11.
- Zuo, Y., Jiang, J., and Yada, K. (2025). Application of hybrid gate recurrent unit for in-store trajectory prediction based on indoor location system. Scientific Reports, 15(1), 1055.
- Lima, R., Scardua, L. A., and De Almeida, G. M. (2024). Predicting Temperatures Inside a Steel Slab Reheating Furnace Using Neural Networks. Authorea Preprints.
- Khan, S., Muhammad, Y., Jadoon, I., Awan, S. E., and Raja, M. A. Z. (2025). Leveraging LSTM-SMI and ARIMA architecture for robust wind power plant forecasting. Applied Soft Computing, 112765.
- Guo, Z., and Feng, L. (2024). Multi-step prediction of greenhouse temperature and humidity based on temporal position attention LSTM. Stochastic Environmental Research and Risk Assessment, 1-28.
- Abdelhamid, N. M., Khechekhouche, A., Mostefa, K., Brahim, L., and Talal, G. (2024). Deep-RNN based model for short-time forecasting photovoltaic power generation using IoT. Studies in Engineering and Exact Sciences, 5(2), e11461-e11461.
- Rohman, F. N., and Farikhin, B. S. Hyperparameter Tuning of Random Forest Algorithm for Diabetes Classification.
- Rahman, M. Utilizing Machine Learning Techniques for Early Brain Tumor Detection.
- Nandi, A., Singh, H., Majumdar, A., Shaw, A., and Maiti, A. Optimizing Baby Sound Recognition using Deep Learning through Class Balancing and Model Tuning.
- Sianga, B. E., Mbago, M. C., and Msengwa, A. S. (2025). PREDICTING THE PREVALENCE OF CARDIOVASCULAR DISEASES USING MACHINE LEARNING ALGORITHMS. Intelligence-Based Medicine, 100199.
- Li, L., Hu, Y., Yang, Z., Luo, Z., Wang, J., Wang, W., ... and Zhang, Z. (2025). Exploring the assessment of post-cardiac valve surgery pulmonary complication risks through the integration of wearable continuous physiological and clinical data. BMC Medical Informatics and Decision Making, 25(1), 1-11.
- Lázaro, F. L., Madeira, T., Melicio, R., Valério, D., and Santos, L. F. (2025). Identifying Human Factors in Aviation Accidents with Natural Language Processing and Machine Learning Models. Aerospace, 12(2), 106.
- Li, Z., Zhong, J., Wang, H., Xu, J., Li, Y., You, J., ... and Dev, S. (2025). RAINER: A Robust Ensemble Learning Grid Search-Tuned Framework for Rainfall Patterns Prediction. arXiv preprint arXiv:2501.16900.
- Khurshid, M. R., Manzoor, S., Sadiq, T., Hussain, L., Khan, M. S., and Dutta, A. K. (2025). Unveiling diabetes onset: Optimized XGBoost with Bayesian optimization for enhanced prediction. PloS one, 20(1), e0310218.
- Kanwar, M., Pokharel, B., and Lim, S. (2025). A new random forest method for landslide susceptibility mapping using hyperparameter optimization and grid search techniques. International Journal of Environmental Science and Technology, 1-16.
- Fadil, M., Akrom, M., and Herowati, W. (2025). Utilization of Machine Learning for Predicting Corrosion Inhibition by Quinoxaline Compounds. Journal of Applied Informatics and Computing, 9(1), 173-177.
- Ghosh, S. (2020). Counting connected labeled graphs.
- Emmanuel, J., Isewon, I., and Oyelade, J. (2025). An Optimized Deep-Forest Algorithm Using a Modified Differential Evolution Optimization Algorithm: A Case of Host-Pathogen Protein-Protein Interaction Prediction. Computational and Structural Biotechnology Journal.
- Gaurav, A., Gupta, B. B., Attar, R. W., Alhomoud, A., Arya, V., and Chui, K. T. (2025). Driver identification in advanced transportation systems using osprey and salp swarm optimized random forest model. Scientific Reports, 15(1), 2453.
- Ning, C., Ouyang, H., Xiao, J., Wu, D., Sun, Z., Liu, B., ... and Huang, G. (2025). Development and validation of an explainable machine learning model for mortality prediction among patients with infected pancreatic necrosis. eClinicalMedicine, 80.
- Muñoz, V., Ballester, C., Copaci, D., Moreno, L., and Blanco, D. (2025). Accelerating hyperparameter optimization with a secretary. Neurocomputing, 129455.
- Balcan, M. F., Nguyen, A. T., and Sharma, D. (2025). Sample complexity of data-driven tuning of model hyperparameters in neural networks with structured parameter-dependent dual function. arXiv preprint arXiv:2501.13734.
- Azimi, H., Kalhor, E. G., Nabavi, S. R., Behbahani, M., and Vardini, M. T. (2025). Data-based modeling for prediction of supercapacitor capacity: Integrated machine learning and metaheuristic algorithms. Journal of the Taiwan Institute of Chemical Engineers, 170, 105996.
- Shibina, V., and Thasleema, T. M. (2025). Voice feature-based diagnosis of Parkinson’s disease using nature inspired squirrel search algorithm with ensemble learning classifiers. Iran Journal of Computer Science, 1-25.
- Chang, F., Dong, S., Yin, H., Ye, X., Wu, Z., Zhang, W., and Zhu, H. (2025). 3D displacement time series prediction of a north-facing reservoir landslide powered by InSAR and machine learning. Journal of Rock Mechanics and Geotechnical Engineering.
- Cihan, P. (2025). Bayesian Hyperparameter Optimization of Machine Learning Models for Predicting Biomass Gasification Gases. Applied Sciences, 15(3), 1018.
- Makomere, R., Rutto, H., Alugongo, A., Koech, L., Suter, E., and Kohitlhetse, I. (2025). Enhanced dry SO2 capture estimation using Python-driven computational frameworks with hyperparameter tuning and data augmentation. Unconventional Resources, 100145.
- Bakır, H. (2025). A new method for tuning the CNN pre-trained models as a feature extractor for malware detection. Pattern Analysis and Applications, 28(1), 26.
- Liu, Y., Yin, H., and Li, Q. (2025). Sound absorption performance prediction of multi-dimensional Helmholtz resonators based on deep learning and hyperparameter optimization. Physica Scripta.
- Ma, Z., Zhao, M., Dai, X., and Chen, Y. (2025). Anomaly detection for high-speed machining using hybrid regularized support vector data description. Robotics and Computer-Integrated Manufacturing, 94, 102962.
- Kotnik, T., and van de Lune, J. (2004). On the order of the Mertens function. Experimental Mathematics, 13(4), 473-481.
- Hurst, G. (2018). Computations of the Mertens function and improved bounds on the Mertens conjecture. Mathematics of Computation, 87(310), 1013-1028.
- El-Bouzaidi, Y. E. I., Hibbi, F. Z., and Abdoun, O. (2025). Optimizing Convolutional Neural Network Impact of Hyperparameter Tuning and Transfer Learning. In Innovations in Optimization and Machine Learning (pp. 301-326). IGI Global Scientific Publishing.
- Mustapha, B., Zhou, Y., Shan, C., and Xiao, Z. (2025). Enhanced Pneumonia Detection in Chest X-Rays Using Hybrid Convolutional and Vision Transformer Networks. Current Medical Imaging, e15734056326685.
- Adly, S., and Attouch, H. (2024). Complexity Analysis Based on Tuning the Viscosity Parameter of the Su-Boyd-Candès Inertial Gradient Dynamics. Set-Valued and Variational Analysis, 32(2), 17.
- Wang, Z., and Peypouquet, J. G. Nesterov’s Accelerated Gradient Method for Strongly Convex Functions: From Inertial Dynamics to Iterative Algorithms.
- Hermant, J., Renaud, M., Aujol, J. F., and Rondepierre, C. D. A. (2024). Nesterov momentum for convex functions with interpolation: is it faster than Stochastic gradient descent?. Book of abstracts PGMO DAYS 2024, 68.
- Alavala, S., and Gorthi, S. (2024). 3D CBCT Challenge 2024: Improved Cone Beam CT Reconstruction using SwinIR-Based Sinogram and Image Enhancement. arXiv preprint arXiv:2406.08048.
- Li, C. J. (2024). Unified Momentum Dynamics in Stochastic Gradient Optimization. Available at SSRN 4981009.
- Gupta, K., and Wojtowytsch, S. (2024). Nesterov acceleration in benignly non-convex landscapes. arXiv preprint arXiv:2410.08395.
- Razzouki, O. F., Charroud, A., El Allali, Z., Chetouani, A., and Aslimani, N. (2024, December). A Survey of Advanced Gradient Methods in Machine Learning. In 2024 7th International Conference on Advanced Communication Technologies and Networking (CommNet) (pp. 1-7). IEEE.
- Wang, J., Du, B., Su, Z., Hu, K., Yu, J., Cao, C., ... and Guo, H. (2025). A fast LMS-based digital background calibration technique for 16-bit SAR ADC with modified shuffling scheme. Microelectronics Journal, 156, 106547.
- Naeem, K., Bukhari, A., Daud, A., Alsahfi, T., Alshemaimri, B., and Alhajlah, M. (2024). Machine Learning and Deep Learning Optimization Algorithms for Unconstrained Convex Optimization Problem. IEEE Access.
- Campos, C. M., de Diego, D. M., and Torrente, J. (2024). Momentum-based gradient descent methods for Lie groups. arXiv preprint arXiv:2404.09363.
- Li, J., Chen, H., Othman, M. S., Salim, N., Yusuf, L. M., and Kumaran, S. R. (2025). NFIoT-GATE-DTL IDS: Genetic algorithm-tuned ensemble of deep transfer learning for NetFlow-based intrusion detection system for internet of things. Engineering Applications of Artificial Intelligence, 143, 110046. [CrossRef]
- Gül, M. F., and Bakır, H. (2025). GA-ML: enhancing the prediction of water electrical conductivity through genetic algorithm-based end-to-end hyperparameter tuning. Earth Science Informatics, 18, 191. [CrossRef]
- Sen, A., Sen, U., Paul, M., Padhy, A. P., Sai, S., Mallik, A., and Mallick, C. (2025). QGAPHEnsemble: Combining Hybrid QLSTM Network Ensemble via Adaptive Weighting for Short Term Weather Forecasting. arXiv preprint arXiv:2501.10866.
- Roy, A., Sen, A., Gupta, S., Haldar, S., Deb, S., Vankala, T. N., and Das, A. (2025). DeepEyeNet: Adaptive Genetic Bayesian Algorithm Based Hybrid ConvNeXtTiny Framework For Multi-Feature Glaucoma Eye Diagnosis. arXiv preprint arXiv:2501.11168.
- Jiang, T., Lu, W., Lu, L., Xu, L., Xi, W., Liu, J., and Zhu, Y. (2025). Inlet Passage Hydraulic Performance Optimization of Coastal Drainage Pump System Based on Machine Learning Algorithms. Journal of Marine Science and Engineering, 13(2), 274.
- Borah, J., and Chandrasekaran, M. (2025). Application of Machine Learning-Based Approach to Predict and Optimize Mechanical Properties of Additively Manufactured Polyether Ether Ketone Biopolymer Using Fused Deposition Modeling. Journal of Materials Engineering and Performance, 1-17.
- Tan, Q., He, D., Sun, Z., Yao, Z., Zhou, J. X., and Chen, T. (2025). A deep reinforcement learning based metro train operation control optimization considering energy conservation and passenger comfort. Engineering Research Express.
- García-Galindo, A., López-De-Castro, M., and Armañanzas, R. (2025). Fair prediction sets through multi-objective hyperparameter optimization. Machine Learning, 114(1), 27.
- Montufar, G. F., Pascanu, R., Cho, K., and Bengio, Y. (2014). On the number of linear regions of deep neural networks. Advances in neural information processing systems, 27.
- Schmidt-Hieber, J. (2020). Nonparametric regression using deep neural networks with ReLU activation function. Annals of Statistics, 48(4), 1875-1897.
- Yarotsky, D. (2017). Error bounds for approximations with deep ReLU networks. Neural networks, 94, 103-114.
- Telgarsky, M. (2016, June). Benefits of depth in neural networks. In Conference on learning theory (pp. 1517-1539). PMLR.
- Lu, Z., Pu, H., Wang, F., Hu, Z., and Wang, L. (2017). The expressive power of neural networks: A view from the width. Advances in neural information processing systems, 30.
- Zhang, C., Bengio, S., Hardt, M., Recht, B., and Vinyals, O. (2021). Understanding deep learning (still) requires rethinking generalization. Communications of the ACM, 64(3), 107-115.
- Scarselli, F., Gori, M., Tsoi, A. C., Hagenbuchner, M., and Monfardini, G. (2008). The graph neural network model. IEEE transactions on neural networks, 20(1), 61-80.
- Kipf, T. N., and Welling, M. (2016). Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907.
- Hamilton, W., Ying, Z., and Leskovec, J. (2017). Inductive representation learning on large graphs. Advances in neural information processing systems, 30.
- Veličković, P., Cucurull, G., Casanova, A., Romero, A., Lio, P., and Bengio, Y. (2017). Graph attention networks. arXiv preprint arXiv:1710.10903.
- Xu, K., Hu, W., Leskovec, J., and Jegelka, S. (2018). How powerful are graph neural networks?. arXiv preprint arXiv:1810.00826.
- Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O., and Dahl, G. E. (2017, July). Neural message passing for quantum chemistry. In International conference on machine learning (pp. 1263-1272). PMLR.
- Battaglia, P. W., Hamrick, J. B., Bapst, V., Sanchez-Gonzalez, A., Zambaldi, V., Malinowski, M., ... and Pascanu, R. (2018). Relational inductive biases, deep learning, and graph networks. arXiv preprint arXiv:1806.01261.
- Bruna, J., Zaremba, W., Szlam, A., and LeCun, Y. (2013). Spectral networks and locally connected networks on graphs. arXiv preprint arXiv:1312.6203.
- Ying, R., He, R., Chen, K., Eksombatchai, P., Hamilton, W. L., and Leskovec, J. (2018, July). Graph convolutional neural networks for web-scale recommender systems. In Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 974-983).
- Zhou, J., Cui, G., Hu, S., Zhang, Z., Yang, C., Liu, Z., ... and Sun, M. (2020). Graph neural networks: A review of methods and applications. AI open, 1, 57-81.
- Raissi, M., Perdikaris, P., and Karniadakis, G. E. (2019). Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational physics, 378, 686-707.
- Karniadakis, G. E., Kevrekidis, I. G., Lu, L., Perdikaris, P., Wang, S., and Yang, L. (2021). Physics-informed machine learning. Nature Reviews Physics, 3(6), 422-440.
- Lu, L., Meng, X., Mao, Z., and Karniadakis, G. E. (2021). DeepXDE: A deep learning library for solving differential equations. SIAM review, 63(1), 208-228.
- Sirignano, J., and Spiliopoulos, K. (2018). DGM: A deep learning algorithm for solving partial differential equations. Journal of computational physics, 375, 1339-1364.
- Wang, S., Teng, Y., and Perdikaris, P. (2021). Understanding and mitigating gradient flow pathologies in physics-informed neural networks. SIAM Journal on Scientific Computing, 43(5), A3055-A3081.
- Mishra, S., and Molinaro, R. (2023). Estimates on the generalization error of physics-informed neural networks for approximating PDEs. IMA Journal of Numerical Analysis, 43(1), 1-43.
- Zhang, D., Guo, L., and Karniadakis, G. E. (2020). Learning in modal space: Solving time-dependent stochastic PDEs using physics-informed neural networks. SIAM Journal on Scientific Computing, 42(2), A639-A665.
- Jin, X., Cai, S., Li, H., and Karniadakis, G. E. (2021). NSFnets (Navier-Stokes flow nets): Physics-informed neural networks for the incompressible Navier-Stokes equations. Journal of Computational Physics, 426, 109951.
- Chen, Y., Lu, L., Karniadakis, G. E., and Dal Negro, L. (2020). Physics-informed neural networks for inverse problems in nano-optics and metamaterials. Optics Express, 28(8), 11618-11633.
- Psichogios, D. C., and Ungar, L. H. (1992). A hybrid neural network-first principles approach to process modeling. AIChE Journal, 38(10), 1499-1511.
- Chizat, L., and Bach, F. (2018). On the global convergence of gradient descent for over-parameterized models using optimal transport. Advances in neural information processing systems, 31.
- Du, S., Lee, J., Li, H., Wang, L., and Zhai, X. (2019, May). Gradient descent finds global minima of deep neural networks. In International conference on machine learning (pp. 1675-1685). PMLR.
- Arora, S., Du, S., Hu, W., Li, Z., and Wang, R. (2019, May). Fine-grained analysis of optimization and generalization for overparameterized two-layer neural networks. In International Conference on Machine Learning (pp. 322-332). PMLR.
- Allen-Zhu, Z., Li, Y., and Song, Z. (2019, May). A convergence theory for deep learning via over-parameterization. In International conference on machine learning (pp. 242-252). PMLR.
- Cao, Y., and Gu, Q. (2019). Generalization bounds of stochastic gradient descent for wide and deep neural networks. Advances in neural information processing systems, 32.
- Yang, G. (2019). Scaling limits of wide neural networks with weight sharing: Gaussian process behavior, gradient independence, and neural tangent kernel derivation. arXiv preprint arXiv:1902.04760.
- Huang, J., and Yau, H. T. (2020, November). Dynamics of deep neural networks and neural tangent hierarchy. In International conference on machine learning (pp. 4542-4551). PMLR.
- Belkin, M., Ma, S., and Mandal, S. (2018, July). To understand deep learning we need to understand kernel learning. In International Conference on Machine Learning (pp. 541-549). PMLR.
- Sra, S., Nowozin, S., and Wright, S. J. (Eds.). (2011). Optimization for machine learning. MIT Press.
- Choromanska, A., Henaff, M., Mathieu, M., Arous, G. B., and LeCun, Y. (2015, February). The loss surfaces of multilayer networks. In Artificial intelligence and statistics (pp. 192-204). PMLR.
- Arora, S., Cohen, N., and Hazan, E. (2018, July). On the optimization of deep networks: Implicit acceleration by overparameterization. In International conference on machine learning (pp. 244-253). PMLR.
- Baratin, A., George, T., Laurent, C., Hjelm, R. D., Lajoie, G., Vincent, P., and Lacoste-Julien, S. (2020). Implicit regularization in deep learning: A view from function space. arXiv preprint arXiv:2008.00938.
- Balduzzi, D., Racaniere, S., Martens, J., Foerster, J., Tuyls, K., and Graepel, T. (2018, July). The mechanics of n-player differentiable games. In International Conference on Machine Learning (pp. 354-363). PMLR.
- Han, J., and Jentzen, A. (2017). Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations. Communications in Mathematics and Statistics, 5(4), 349-380.
- Beck, C., Becker, S., Grohs, P., Jaafari, N., and Jentzen, A. (2021). Solving the Kolmogorov PDE by means of deep learning. Journal of Scientific Computing, 88, 1-28.
- Han, J., Jentzen, A., and E, W. (2018). Solving high-dimensional partial differential equations using deep learning. Proceedings of the National Academy of Sciences, 115(34), 8505-8510.
- Jentzen, A., Salimova, D., and Welti, T. (2018). A proof that deep artificial neural networks overcome the curse of dimensionality in the numerical approximation of Kolmogorov partial differential equations with constant diffusion and nonlinear drift coefficients. arXiv preprint arXiv:1809.07321.
- Yu, B. (2018). The deep Ritz method: a deep learning-based numerical algorithm for solving variational problems. Communications in Mathematics and Statistics, 6(1), 1-12.
- Khoo, Y., Lu, J., and Ying, L. (2021). Solving parametric PDE problems with artificial neural networks. European Journal of Applied Mathematics, 32(3), 421-435.
- Hutzenthaler, M., and Kruse, T. (2020). Multilevel Picard approximations of high-dimensional semilinear parabolic differential equations with gradient-dependent nonlinearities. SIAM Journal on Numerical Analysis, 58(2), 929-961.
- Li, L., Jamieson, K., DeSalvo, G., Rostamizadeh, A., and Talwalkar, A. (2018). Hyperband: A novel bandit-based approach to hyperparameter optimization. Journal of Machine Learning Research, 18(185), 1-52.
- Guha, K., and Ghosh, S. (2021). On the Generalization of Locker Problem.
- Falkner, S., Klein, A., and Hutter, F. (2018, July). BOHB: Robust and efficient hyperparameter optimization at scale. In International conference on machine learning (pp. 1437-1446). PMLR.
- Li, L., Jamieson, K., Rostamizadeh, A., Gonina, E., Ben-Tzur, J., Hardt, M., ... and Talwalkar, A. (2020). A system for massively parallel hyperparameter tuning. Proceedings of Machine Learning and Systems, 2, 230-246.
- Snoek, J., Larochelle, H., and Adams, R. P. (2012). Practical Bayesian optimization of machine learning algorithms. Advances in neural information processing systems, 25.
- Slivkins, A., Zhou, X., Sankararaman, K. A., and Foster, D. J. (2024). Contextual Bandits with Packing and Covering Constraints: A Modular Lagrangian Approach via Regression. Journal of Machine Learning Research, 25(394), 1-37.
- Hazan, E., Klivans, A., and Yuan, Y. (2017). Hyperparameter optimization: A spectral approach. arXiv preprint arXiv:1706.00764.
- Domhan, T., Springenberg, J. T., and Hutter, F. (2015, June). Speeding up automatic hyperparameter optimization of deep neural networks by extrapolation of learning curves. In Twenty-fourth international joint conference on artificial intelligence.
- Agrawal, T. (2021). Hyperparameter optimization in machine learning: make your machine learning and deep learning models more efficient (pp. 109-129). New York, NY, USA: Apress.
- Shekhar, S., Bansode, A., and Salim, A. (2021, December). A comparative study of hyper-parameter optimization tools. In 2021 IEEE Asia-Pacific Conference on Computer Science and Data Engineering (CSDE) (pp. 1-6). IEEE.
- Bergstra, J., Bardenet, R., Bengio, Y., and Kégl, B. (2011). Algorithms for hyper-parameter optimization. Advances in neural information processing systems, 24.
- Zoph, B., and Le, Q. V. (2016). Neural architecture search with reinforcement learning. arXiv preprint arXiv:1611.01578.
- Maclaurin, D., Duvenaud, D., and Adams, R. (2015, June). Gradient-based hyperparameter optimization through reversible learning. In International conference on machine learning (pp. 2113-2122). PMLR.
- Pedregosa, F. (2016, June). Hyperparameter optimization with approximate gradient. In International conference on machine learning (pp. 737-746). PMLR.
- Franceschi, L., Frasconi, P., Salzo, S., Grazzi, R., and Pontil, M. (2018, July). Bilevel programming for hyperparameter optimization and meta-learning. In International conference on machine learning (pp. 1568-1577). PMLR.
- Franceschi, L., Donini, M., Frasconi, P., and Pontil, M. (2017, July). Forward and reverse gradient-based hyperparameter optimization. In International Conference on Machine Learning (pp. 1165-1173). PMLR.
- Liu, H., Simonyan, K., and Yang, Y. (2018). Darts: Differentiable architecture search. arXiv preprint arXiv:1806.09055.
- Lorraine, J., Vicol, P., and Duvenaud, D. (2020, June). Optimizing millions of hyperparameters by implicit differentiation. In International conference on artificial intelligence and statistics (pp. 1540-1552). PMLR.
- Liang, J., Gonzalez, S., Shahrzad, H., and Miikkulainen, R. (2021, June). Regularized evolutionary population-based training. In Proceedings of the Genetic and Evolutionary Computation Conference (pp. 323-331).
- Jaderberg, M., Dalibard, V., Osindero, S., Czarnecki, W. M., Donahue, J., Razavi, A., ... and Kavukcuoglu, K. (2017). Population based training of neural networks. arXiv preprint arXiv:1711.09846.
- Co-Reyes, J. D., Miao, Y., Peng, D., Real, E., Levine, S., Le, Q. V., ... and Faust, A. (2021). Evolving reinforcement learning algorithms. arXiv preprint arXiv:2101.03958.
- Song, C., Ma, Y., Xu, Y., and Chen, H. (2024). Multi-population evolutionary neural architecture search with stacked generalization. Neurocomputing, 587, 127664.
- Wan, X., Lu, C., Parker-Holder, J., Ball, P. J., Nguyen, V., Ru, B., and Osborne, M. (2022, September). Bayesian generational population-based training. In International conference on automated machine learning (pp. 14-1). PMLR.
- García-Valdez, M., Mancilla, A., Castillo, O., and Merelo-Guervós, J. J. (2023). Distributed and asynchronous population-based optimization applied to the optimal design of fuzzy controllers. Symmetry, 15(2), 467.
- Akiba, T., Sano, S., Yanase, T., Ohta, T., and Koyama, M. (2019, July). Optuna: A next-generation hyperparameter optimization framework. In Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 2623-2631).
- Akiba, T., Shing, M., Tang, Y., Sun, Q., and Ha, D. (2025). Evolutionary optimization of model merging recipes. Nature Machine Intelligence, 1-10.
- Kadhim, Z. S., Abdullah, H. S., and Ghathwan, K. I. (2022). Artificial Neural Network Hyperparameters Optimization: A Survey. International Journal of Online and Biomedical Engineering, 18(15).
- Jeba, J. A. (2021). Case study of Hyperparameter optimization framework Optuna on a Multi-column Convolutional Neural Network (Doctoral dissertation, University of Saskatchewan).
- Sousa, J. R., and Ghosh, S. (2025). A New Generalization of the Riemann Functional Equation.
- Yang, L., and Shami, A. (2020). On hyperparameter optimization of machine learning algorithms: Theory and practice. Neurocomputing, 415, 295-316.
- Wang, T. (2024). Multi-objective hyperparameter optimisation for edge machine learning.
- Frazier, P. I. (2018). A tutorial on Bayesian optimization. arXiv preprint arXiv:1807.02811.
- Hutter, F., Kotthoff, L., and Vanschoren, J. (2019). Automated machine learning: methods, systems, challenges (p. 219). Springer Nature.
- Jamieson, K., and Talwalkar, A. (2016, May). Non-stochastic best arm identification and hyperparameter optimization. In Artificial intelligence and statistics (pp. 240-248). PMLR.
- Schmucker, R., Donini, M., Zafar, M. B., Salinas, D., and Archambeau, C. (2021). Multi-objective asynchronous successive halving. arXiv preprint arXiv:2106.12639.
- Dong, X., Shen, J., Wang, W., Shao, L., Ling, H., and Porikli, F. (2019). Dynamical hyperparameter optimization via deep reinforcement learning in tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(5), 1515-1529.
- Rijsdijk, J., Wu, L., Perin, G., and Picek, S. (2021). Reinforcement learning for hyperparameter tuning in deep learning-based side-channel analysis. IACR Transactions on Cryptographic Hardware and Embedded Systems, 2021(3), 677-707.
- Jaafra, Y., Laurent, J. L., Deruyver, A., and Naceur, M. S. (2019). Reinforcement learning for neural architecture search: A review. Image and Vision Computing, 89, 57-66.
- Afshar, R. R., Zhang, Y., Vanschoren, J., and Kaymak, U. (2022). Automated reinforcement learning: An overview. arXiv preprint arXiv:2201.05000.
- Wu, J., Chen, S., and Liu, X. (2020). Efficient hyperparameter optimization through model-based reinforcement learning. Neurocomputing, 409, 381-393.
- Iranfar, A., Zapater, M., and Atienza, D. (2021). Multiagent reinforcement learning for hyperparameter optimization of convolutional neural networks. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 41(4), 1034-1047.
- He, X., Zhao, K., and Chu, X. (2021). AutoML: A survey of the state-of-the-art. Knowledge-based systems, 212, 106622.
- Gomaa, I., Zidane, A., Mokhtar, H. M., and El-Tazi, N. (2022). SML-AutoML: A Smart Meta-Learning Automated Machine Learning Framework.
- Khan, A. N., Khan, Q. W., Rizwan, A., Ahmad, R., and Kim, D. H. (2025). Consensus-Driven Hyperparameter Optimization for Accelerated Model Convergence in Decentralized Federated Learning. Internet of Things, 30, 101476.
- Morrison, N., and Ma, E. Y. (2025). Efficiency of machine learning optimizers and meta-optimization for nanophotonic inverse design tasks. APL Machine Learning, 3(1).
- Berdyshev, D. A., Grachev, A. M., Shishkin, S. L., and Kozyrskiy, B. L. (2024). EEG-Reptile: An Automatized Reptile-Based Meta-Learning Library for BCIs. arXiv preprint arXiv:2412.19725.
- Pratellesi, C. (2025). Meta Learning for Flow Cytometry Cell Classification (Doctoral dissertation, Technische Universität Wien).
- García, C. A., Gil-de-la-Fuente, A., Barbas, C., and Otero, A. (2022). Probabilistic metabolite annotation using retention time prediction and meta-learned projections. Journal of Cheminformatics, 14(1), 33.
- Deng, L., Raissi, M., and Xiao, M. (2024). Meta-Learning-Based Surrogate Models for Efficient Hyperparameter Optimization. Authorea Preprints.
- Jae, J., Hong, J., Choo, J., and Kwon, Y. D. (2024). Reinforcement learning to learn quantum states for Heisenberg scaling accuracy. arXiv preprint arXiv:2412.02334.
- Upadhyay, R., Phlypo, R., Saini, R., and Liwicki, M. (2025). Meta-Sparsity: Learning Optimal Sparse Structures in Multi-task Networks through Meta-learning. arXiv preprint arXiv:2501.12115.
- Paul, S., Ghosh, S., Das, D., and Sarkar, S. K. (2025). Advanced Methodologies for Optimal Neural Network Design and Performance Enhancement. In Nature-Inspired Optimization Algorithms for Cyber-Physical Systems (pp. 403-422). IGI Global Scientific Publishing.
- Egele, R., Mohr, F., Viering, T., and Balaprakash, P. (2024). The unreasonable effectiveness of early discarding after one epoch in neural network hyperparameter optimization. Neurocomputing, 127964.
- Wojciuk, M., Swiderska-Chadaj, Z., Siwek, K., and Gertych, A. (2024). Improving classification accuracy of fine-tuned CNN models: Impact of hyperparameter optimization. Heliyon, 10(5).
- Geissler, D., Zhou, B., Suh, S., and Lukowicz, P. (2024). Spend More to Save More (SM2): An Energy-Aware Implementation of Successive Halving for Sustainable Hyperparameter Optimization. arXiv preprint arXiv:2412.08526.
- Hosseini Sarcheshmeh, A., Etemadfard, H., Najmoddin, A., and Ghalehnovi, M. (2024). Hyperparameters’ role in machine learning algorithm for modeling of compressive strength of recycled aggregate concrete. Innovative Infrastructure Solutions, 9(6), 212.
- Sankar, S. U., Dhinakaran, D., Selvaraj, R., Verma, S. K., Natarajasivam, R., and Kishore, P. P. (2024). Optimizing diabetic retinopathy disease prediction using PNAS, ASHA, and transfer learning. In Advances in Networks, Intelligence and Computing (pp. 62-71). CRC Press.
- Zhang, X., and Duh, K. (2024, September). Best Practices of Successive Halving on Neural Machine Translation and Large Language Models. In Proceedings of the 16th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Track) (pp. 130-139).
- Aach, M., Sarma, R., Neukirchen, H., Riedel, M., and Lintermann, A. (2024). Resource-Adaptive Successive Doubling for Hyperparameter Optimization with Large Datasets on High-Performance Computing Systems. arXiv preprint arXiv:2412.02729.
- Jang, D., Yoon, H., Jung, K., and Chung, Y. D. (2024). QHB+: Accelerated Configuration Optimization for Automated Performance Tuning of Spark SQL Applications. IEEE Access.
- Chen, Y., Wen, Z., Chen, J., and Huang, J. (2024, May). Enhancing the Performance of Bandit-based Hyperparameter Optimization. In 2024 IEEE 40th International Conference on Data Engineering (ICDE) (pp. 967-980). IEEE.
- Zhang, Y., Wu, H., and Yang, Y. (2024). FlexHB: a More Efficient and Flexible Framework for Hyperparameter Optimization. arXiv preprint arXiv:2402.13641.
- Srivastava, N. (2013). Improving neural networks with dropout (Master's thesis). University of Toronto.
- Baldi, P., and Sadowski, P. J. (2013). Understanding dropout. Advances in neural information processing systems, 26.
- Gal, Y., and Ghahramani, Z. (2016, June). Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In international conference on machine learning (pp. 1050-1059). PMLR.
- Gal, Y., Hron, J., and Kendall, A. (2017). Concrete dropout. Advances in neural information processing systems, 30.
- Gal, Y., and Ghahramani, Z. (2016). A theoretically grounded application of dropout in recurrent neural networks. Advances in neural information processing systems, 29.
- Friedman, J. H., Hastie, T., and Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of statistical software, 33, 1-22.
- Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B: Statistical Methodology, 58(1), 267-288.
- Meinshausen, N. (2007). Relaxed lasso. Computational Statistics and Data Analysis, 52(1), 374-393.
- Carvalho, C. M., Polson, N. G., and Scott, J. G. (2009, April). Handling sparsity via the horseshoe. In Artificial intelligence and statistics (pp. 73-80). PMLR.
- Hoerl, A. E., and Kennard, R. W. (1970). Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12(1), 55-67.
- Cesa-Bianchi, N., Conconi, A., and Gentile, C. (2004). On the generalization ability of on-line learning algorithms. IEEE Transactions on Information Theory, 50(9), 2050-2057.
- Devroye, L., Györfi, L., and Lugosi, G. (2013). A probabilistic theory of pattern recognition (Vol. 31). Springer Science and Business Media.
- Abu-Mostafa, Y. S., Magdon-Ismail, M., and Lin, H. T. (2012). Learning from data (Vol. 4, p. 4). New York: AMLBook.
- Shalev-Shwartz, S., and Ben-David, S. (2014). Understanding machine learning: From theory to algorithms. Cambridge university press.
- Bühlmann, P., and Van De Geer, S. (2011). Statistics for high-dimensional data: methods, theory and applications. Springer Science and Business Media.
- James, G., Witten, D., Hastie, T., and Tibshirani, R. (2013). An introduction to statistical learning: with applications in R. Springer.
- Efron, B., Hastie, T., Johnstone, I., and Tibshirani, R. (2004). Least angle regression. The Annals of Statistics, 32(2), 407-499.
- Fan, J., and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American statistical Association, 96(456), 1348-1360.
- Meinshausen, N., and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. The Annals of Statistics, 34(3), 1436-1462.
- Montavon, G., Orr, G., and Müller, K. R. (Eds.). (2012). Neural networks: tricks of the trade (Vol. 7700). Springer.
- Prechelt, L. (2002). Early stopping-but when?. In Neural Networks: Tricks of the trade (pp. 55-69). Berlin, Heidelberg: Springer Berlin Heidelberg.
- Brownlee, J. (2019). Develop deep learning models on Theano and TensorFlow using Keras. Machine Learning Mastery.
- Zhang, H., Cisse, M., Dauphin, Y. N., and Lopez-Paz, D. (2017). mixup: Beyond empirical risk minimization. arXiv preprint arXiv:1710.09412.
- Shorten, C., and Khoshgoftaar, T. M. (2019). A survey on image data augmentation for deep learning. Journal of big data, 6(1), 1-48.
- Perez, L., and Wang, J. (2017). The effectiveness of data augmentation in image classification using deep learning. arXiv preprint arXiv:1712.04621.
- Cubuk, E. D., Zoph, B., Mane, D., Vasudevan, V., and Le, Q. V. (2018). Autoaugment: Learning augmentation policies from data. arXiv preprint arXiv:1805.09501.
- Domingos, P. (2012). A few useful things to know about machine learning. Communications of the ACM, 55(10), 78-87.
- Stone, M. (1974). Cross-validatory choice and assessment of statistical predictions. Journal of the royal statistical society: Series B (Methodological), 36(2), 111-133.
- LeCun, Y., Denker, J., and Solla, S. (1989). Optimal brain damage. Advances in neural information processing systems, 2.
- Li, H., Kadav, A., Durdanovic, I., Samet, H., and Graf, H. P. (2016). Pruning filters for efficient convnets. arXiv preprint arXiv:1608.08710.
- Frankle, J., and Carbin, M. (2018). The lottery ticket hypothesis: Finding sparse, trainable neural networks. arXiv preprint arXiv:1803.03635.
- Han, S., Pool, J., Tran, J., and Dally, W. (2015). Learning both weights and connections for efficient neural network. Advances in neural information processing systems, 28.
- Liu, Z., Sun, M., Zhou, T., Huang, G., and Darrell, T. (2018). Rethinking the value of network pruning. arXiv preprint arXiv:1810.05270.
- Cheng, Y., Wang, D., Zhou, P., and Zhang, T. (2017). A survey of model compression and acceleration for deep neural networks. arXiv preprint arXiv:1710.09282.
- Frankle, J., Dziugaite, G. K., Roy, D. M., and Carbin, M. (2020). Pruning neural networks at initialization: Why are we missing the mark?. arXiv preprint arXiv:2009.08576.
- Breiman, L. (1996). Bagging predictors. Machine learning, 24, 123-140.
- Breiman, L. (2001). Random forests. Machine learning, 45, 5-32.
- Freund, Y., and Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of computer and system sciences, 55(1), 119-139.
- Friedman, J. H. (2001). Greedy function approximation: a gradient boosting machine. Annals of statistics, 1189-1232.
- Zhou, Z. H. (2025). Ensemble methods: foundations and algorithms. CRC press.
- Dietterich, T. G. (2000). An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization. Machine learning, 40, 139-157.
- Chen, T., and Guestrin, C. (2016, August). XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining (pp. 785-794).
- Bühlmann, P., and Yu, B. (2003). Boosting with the L2 loss: regression and classification. Journal of the American Statistical Association, 98(462), 324-339.
- Hinton, G. E., and Van Camp, D. (1993, August). Keeping the neural networks simple by minimizing the description length of the weights. In Proceedings of the sixth annual conference on Computational learning theory (pp. 5-13).
- Bishop, C. M. (1995). Training with noise is equivalent to Tikhonov regularization. Neural computation, 7(1), 108-116.
- Grandvalet, Y., and Bengio, Y. (2004). Semi-supervised learning by entropy minimization. Advances in neural information processing systems, 17.
- Wager, S., Wang, S., and Liang, P. S. (2013). Dropout training as adaptive regularization. Advances in neural information processing systems, 26.
- Pei, Z., Zhang, Z., Chen, J., Liu, W., Chen, B., Huang, Y., ... and Lu, Y. (2025). KAN–CNN: A Novel Framework for Electric Vehicle Load Forecasting with Enhanced Engineering Applicability and Simplified Neural Network Tuning. Electronics, 14(3), 414.
- Chen, H. (2024). Augmenting image data using noise, rotation and shifting.
- An, D., Liu, P., Feng, Y., Ding, P., Zhou, W., and Yu, B. (2024). Dynamic weighted knowledge distillation for brain tumor segmentation. Pattern Recognition, 155, 110731.
- Song, Y. F., and Liu, Y. (2024). Fast adversarial training method based on data augmentation and label noise. Journal of Computer Applications, 0.
- Hosseini, S. A., Servaes, S., Rahmouni, N., Therriault, J., Tissot, C., Macedo, A. C., ... and Rosa-Neto, P. (2024). Leveraging T1 MRI Images for Amyloid Status Prediction in Diverse Cognitive Conditions Using Advanced Deep Learning Models. Alzheimer’s and Dementia, 20, e094153.
- Cakmakci, U. B. Deep Learning Approaches for Pediatric Bone Age Prediction from Hand Radiographs.
- Surana, A. V., Pawar, S. E., Raha, S., Mali, N., and Mukherjee, T. (2024). Ensemble fine-tuned multi-layer perceptron for predictive analysis of weather patterns and rainfall forecasting from satellite data. ICTACT Journal on Soft Computing, 15(2).
- Chanda, A. An In-Depth Analysis of CIFAR-100 Using Inception v3.
- Zaitoon, R., Mohanty, S. N., Godavarthi, D., and Ramesh, J. V. N. (2024). SPBTGNS: Design of an Efficient Model for Survival Prediction in Brain Tumour Patients using Generative Adversarial Network with Neural Architectural Search Operations. IEEE Access.
- Bansal, A., Sharma, D. R., and Kathuria, D. M. Bayesian-Optimized Ensemble Approach for Fall Detection: Integrating Pose Estimation with Temporal Convolutional and Graph Neural Networks. Available at SSRN 4974349.
- Kusumaningtyas, E. M., Ramadijanti, N., and Rijal, I. H. K. (2024, August). Convolutional Neural Network Implementation with MobileNetV2 Architecture for Indonesian Herbal Plants Classification in Mobile App. In 2024 International Electronics Symposium (IES) (pp. 521-527). IEEE.
- Yadav, A. C., Alam, Z., and Mufeed, M. (2024, August). U-Net-Driven Advancements in Breast Cancer Detection and Segmentation. In 2024 International Conference on Electrical Electronics and Computing Technologies (ICEECT) (Vol. 1, pp. 1-6). IEEE.
- Alshamrani, A. F. A., and Alshomran, F. (2024). Optimizing Breast Cancer Mammogram Classification through a Dual Approach: A Deep Learning Framework Combining ResNet50, SMOTE, and Fully Connected Layers for Balanced and Imbalanced Data. IEEE Access.
- Cox, D., Ghosh, S., and Sultanow, E. (2022). International Journal of Pure and Applied Mathematics Research.
- Zamindar, N. (2024). Using Artificial Intelligence for Thermographic Image Analysis: Applications to the Arc Welding Process (Doctoral dissertation, Politecnico di Torino).
- Xu, M., Yin, H., and Zhong, S. (2024, July). Enhancing Generalization and Convergence in Neural Networks through a Dual-Phase Regularization Approach with Excitatory-Inhibitory Transition. In 2024 International Conference on Electrical, Computer and Energy Technologies (ICECET) (pp. 1-4). IEEE.
- Elshamy, R., Abu-Elnasr, O., Elhoseny, M., and Elmougy, S. (2024). Enhancing colorectal cancer histology diagnosis using modified deep neural networks optimizer. Scientific Reports, 14(1), 19534.
- Vinay, K., Kodipalli, A., Swetha, P., and Kumaraswamy, S. (2024, May). Analysis of prediction of pneumonia from chest X-ray images using CNN and transfer learning. In 2024 5th International Conference for Emerging Technology (INCET) (pp. 1-6). IEEE.
- Gai, S., and Huang, X. (2024). Regularization method for reduced biquaternion neural network. Applied Soft Computing, 166, 112206.
- Xu, Y. (2025). Deep regularization techniques for improving robustness in noisy record linkage task. Advances in Engineering Innovation, 15, 9-13.
- Liao, Z., Li, S., Zhou, P., and Zhang, C. (2025). Decay regularized stochastic configuration networks with multi-level data processing for UAV battery RUL prediction. Information Sciences, 701, 121840.
- Dong, Z., Yang, C., Li, Y., Huang, L., An, Z., and Xu, Y. (2024, May). Class-wise Image Mixture Guided Self-Knowledge Distillation for Image Classification. In 2024 27th International Conference on Computer Supported Cooperative Work in Design (CSCWD) (pp. 310-315). IEEE.
- Ba, Y., Mancenido, M. V., and Pan, R. (2024). How Does Data Diversity Shape the Weight Landscape of Neural Networks?. arXiv preprint arXiv:2410.14602.
- Li, Z., Zhang, Y., and Li, W. (2024, September). Fusion of L2 Regularisation and Hybrid Sampling Methods for Multi-Scale SincNet Audio Recognition. In 2024 IEEE 7th Information Technology, Networking, Electronic and Automation Control Conference (ITNEC) (Vol. 7, pp. 1556-1560). IEEE.
- Zang, X., and Yan, A. (2024, May). A Stochastic Configuration Network with Attenuation Regularization and Multi-kernel Learning and Its Application. In 2024 36th Chinese Control and Decision Conference (CCDC) (pp. 2385-2390). IEEE.
- Moradi, R., Berangi, R., and Minaei, B. (2020). A survey of regularization strategies for deep models. Artificial Intelligence Review, 53(6), 3947-3986.
- Rodríguez, P., Gonzalez, J., Cucurull, G., Gonfaus, J. M., and Roca, X. (2016). Regularizing cnns with locally constrained decorrelations. arXiv preprint arXiv:1611.01967.
- Tian, Y., and Zhang, Y. (2022). A comprehensive survey on regularization strategies in machine learning. Information Fusion, 80, 146-166.
- Ghosh, S. Markov Chain Monte Carlo Approach to Sample Size Reestimation.
- Cong, Y., Liu, J., Fan, B., Zeng, P., Yu, H., and Luo, J. (2017). Online similarity learning for big data with overfitting. IEEE Transactions on Big Data, 4(1), 78-89.
- Salman, S., and Liu, X. (2019). Overfitting mechanism and avoidance in deep neural networks. arXiv preprint arXiv:1901.06566.
- Wang, K., Muthukumar, V., and Thrampoulidis, C. (2021). Benign overfitting in multiclass classification: All roads lead to interpolation. Advances in Neural Information Processing Systems, 34, 24164-24179.
- Poggio, T., Kawaguchi, K., Liao, Q., Miranda, B., Rosasco, L., Boix, X., ... and Mhaskar, H. (2017). Theory of deep learning III: explaining the non-overfitting puzzle. arXiv preprint arXiv:1801.00173.
- Oyedotun, O. K., Olaniyi, E. O., and Khashman, A. (2017). A simple and practical review of over-fitting in neural network learning. International Journal of Applied Pattern Recognition, 4(4), 307-328.
- Luo, X., Chang, X., and Ban, X. (2016). Regression and classification using extreme learning machine based on L1-norm and L2-norm. Neurocomputing, 174, 179-186.
- Zhou, Y., Yang, Y., Wang, D., Zhai, Y., Li, H., and Xu, Y. (2024). Innovative Ghost Channel Spatial Attention Network with Adaptive Activation for Efficient Rice Disease Identification. Agronomy, 14(12), 2869.
- Omole, O. J., Rosa, R. L., Saadi, M., and Rodriguez, D. Z. (2024). AgriNAS: Neural Architecture Search with Adaptive Convolution and Spatial–Time Augmentation Method for Soybean Diseases. AI, 5(4), 2945-2966.
- Tripathi, L., Dubey, P., Kalidoss, D., Prasad, S., Sharma, G., and Dubey, P. (2024, December). Deep Learning Approaches for Brain Tumour Detection Using VGG-16 Architecture. In 2024 IEEE 16th International Conference on Computational Intelligence and Communication Networks (CICN) (pp. 256-261). IEEE.
- Singla, S., and Gupta, R. (2024, December). Pneumonia Detection from Chest X-Ray Images Using Transfer Learning with EfficientNetB1. In 2024 International Conference on IoT Based Control Networks and Intelligent Systems (ICICNIS) (pp. 894-899). IEEE.
- Al-Adhaileh, M. H., Alsharbi, B. M., Aldhyani, T., Ahmad, S., Almaiah, M., Ahmed, Z. A., ... and Singh, S. DLAAD-Deep Learning Algorithms Assisted Diagnosis of Chest Disease Using Radiographic Medical Images. Frontiers in Medicine, 11, 1511389.
- Gopal, M. S., and Ghosh, S. (2021). Special primes and some of their properties.
- Harvey, E., Petrov, M., and Hughes, M. C. (2025). Learning Hyperparameters via a Data-Emphasized Variational Objective. arXiv preprint arXiv:2502.01861.
- Mahmood, T., Saba, T., Al-Otaibi, S., Ayesha, N., and Almasoud, A. S. (2025). AI-Driven Microscopy: Cutting-Edge Approach for Breast Tissue Prognosis Using Microscopic Images. Microscopy Research and Technique.
- Shen, Q. (2025). Predicting the value of football players: machine learning techniques and sensitivity analysis based on FIFA and real-world statistical datasets. Applied Intelligence, 55(4), 265.
- Guo, X., Wang, M., Xiang, Y., Yang, Y., Ye, C., Wang, H., and Ma, T. (2025). Uncertainty Driven Adaptive Self-Knowledge Distillation for Medical Image Segmentation. IEEE Transactions on Emerging Topics in Computational Intelligence.
- Zambom, A. Z., and Dias, R. (2013). A review of kernel density estimation with applications to econometrics. International Econometric Review, 5(1), 20-42.
- Reyes, M., Francisco-Fernández, M., and Cao, R. (2016). Nonparametric kernel density estimation for general grouped data. Journal of Nonparametric Statistics, 28(2), 235-249.
- Tenreiro, C. (2024). A Parzen–Rosenblatt type density estimator for circular data: exact and asymptotic optimal bandwidths. Communications in Statistics-Theory and Methods, 53(20), 7436-7452.
- Devroye, L., and Penrod, C. S. (1984). The consistency of automatic kernel density estimates. The Annals of Statistics, 1231-1249.
- El Machkouri, M. (2011). Asymptotic normality of the Parzen–Rosenblatt density estimator for strongly mixing random fields. Statistical Inference for Stochastic Processes, 14, 73-84.
- Slaoui, Y. (2018). Bias reduction in kernel density estimation. Journal of Nonparametric Statistics, 30(2), 505-522.
- Michalski, A. (2016). The use of kernel estimators to determine the distribution of groundwater level. Meteorology Hydrology and Water Management. Research and Operational Applications, 4(1), 41-46.
- Gramacki, A., and Gramacki, A. (2018). Kernel density estimation. Nonparametric Kernel Density Estimation and Its Computational Aspects, 25-62.
- Desobry, F., Davy, M., and Fitzgerald, W. J. (2007, April). Density kernels on unordered sets for kernel-based signal processing. In 2007 IEEE International Conference on Acoustics, Speech and Signal Processing-ICASSP’07 (Vol. 2, pp. II-417). IEEE.
- Gasser, T., and Müller, H. G. (1979). Kernel estimation of regression functions. In Smoothing Techniques for Curve Estimation: Proceedings of a Workshop held in Heidelberg, April 2–4, 1979 (pp. 23-68). Springer Berlin Heidelberg.
- Gasser, T., and Müller, H. G. (1984). Estimating regression functions and their derivatives by the kernel method. Scandinavian journal of statistics, 171-185.
- Härdle, W., and Gasser, T. (1985). On robust kernel estimation of derivatives of regression functions. Scandinavian journal of statistics, 233-240.
- Müller, H. G. (1987). Weighted local regression and kernel methods for nonparametric curve fitting. Journal of the American Statistical Association, 82(397), 231-238.
- Chu, C. K. (1993). A new version of the Gasser-Mueller estimator. Journal of Nonparametric Statistics, 3(2), 187-193.
- Peristera, P., and Kostaki, A. (2005). An evaluation of the performance of kernel estimators for graduating mortality data. Journal of Population Research, 22, 185-197.
- Müller, H. G. (1991). Smooth optimum kernel estimators near endpoints. Biometrika, 78(3), 521-530.
- Gasser, T., Gervini, D., Molinari, L., Hauspie, R. C., and Cameron, N. (2004). Kernel estimation, shape-invariant modelling and structural analysis. Cambridge Studies in Biological and Evolutionary Anthropology, 179-204.
- Jennen-Steinmetz, C., and Gasser, T. (1988). A unifying approach to nonparametric regression estimation. Journal of the American Statistical Association, 83(404), 1084-1089.
- Müller, H. G. (1997). Density adjusted kernel smoothers for random design nonparametric regression. Statistics and probability letters, 36(2), 161-172.
- Neumann, M. H., and Thorarinsdottir, T. L. (2006). Asymptotic minimax estimation in nonparametric autoregression. Mathematical Methods of Statistics, 15(4), 374.
- Steland, A. The average run length of kernel control charts for dependent time series.
- Makkulau, A. T. A., Baharuddin, M., and Agusrawati, A. T. P. M. (2023, December). Multivariable Semiparametric Regression Used Priestley-Chao Estimators. In Proceedings of the 5th International Conference on Statistics, Mathematics, Teaching, and Research 2023 (ICSMTR 2023) (Vol. 109, p. 118). Springer Nature.
- Staniswalis, J. G. (1989). The kernel estimate of a regression function in likelihood-based models. Journal of the American Statistical Association, 84(405), 276-283.
- Mack, Y. P., and Müller, H. G. (1988). Convolution type estimators for nonparametric regression. Statistics and probability letters, 7(3), 229-239.
- Jones, M. C., Davies, S. J., and Park, B. U. (1994). Versions of kernel-type regression estimators. Journal of the American Statistical Association, 89(427), 825-832.
- Ghosh, S. (2015). Surface estimation under local stationarity. Journal of Nonparametric Statistics, 27(2), 229-240.
- Liu, C. W., and Luor, D. C. (2023). Applications of fractal interpolants in kernel regression estimations. Chaos, Solitons and Fractals, 175, 113913.
- Agua, B. M., and Bouzebda, S. (2024). Single index regression for locally stationary functional time series. AIMS Math, 9, 36202-36258.
- Bouzebda, S., Nezzal, A., and Elhattab, I. (2024). Limit theorems for nonparametric conditional U-statistics smoothed by asymmetric kernels. AIMS Mathematics, 9(9), 26195-26282.
- Zhao, H., Qian, Y., and Qu, Y. (2025). Mechanical performance degradation modelling and prognosis method of high-voltage circuit breakers considering censored data. IET Science, Measurement and Technology, 19(1), e12235.
- Patil, M. D., Kannaiyan, S., and Sarate, G. G. (2024). Signal denoising based on bias-variance of intersection of confidence interval. Signal, Image and Video Processing, 18(11), 8089-8103.
- Kakani, K., and Radhika, T. S. L. (2024). Nonparametric and nonlinear approaches for medical data analysis. International Journal of Data Science and Analytics, 1-19.
- Kato, M. (2024). Debiased Regression for Root-N-Consistent Conditional Mean Estimation. arXiv preprint arXiv:2411.11748.
- Sadek, A. M., and Mohammed, L. A. (2024). Evaluation of the Performance of Kernel Non-parametric Regression and Ordinary Least Squares Regression. JOIV: International Journal on Informatics Visualization, 8(3), 1352-1360.
- Ghosh, S. (2020). Withdrawn: Chebyshev’s Estimate of the Prime Counting Function.
- Gong, A., Choi, K., and Dwivedi, R. (2024). Supervised Kernel Thinning. arXiv preprint arXiv:2410.13749.
- Zavatone-Veth, J. A., and Pehlevan, C. (2025). Nadaraya–Watson kernel smoothing as a random energy model. Journal of Statistical Mechanics: Theory and Experiment, 2025(1), 013404.
- Ferrigno, S. (2024, December). Nonparametric estimation of reference curves. In CMStatistics 2024.
- Fan, X., Leng, C., and Wu, W. (2025). Causal Inference under Interference: Regression Adjustment and Optimality. arXiv preprint arXiv:2502.06008.
- Atanasov, A., Bordelon, B., Zavatone-Veth, J. A., Paquette, C., and Pehlevan, C. (2025). Two-Point Deterministic Equivalence for Stochastic Gradient Dynamics in Linear Models. arXiv preprint arXiv:2502.05074.
- Ghosh, S. (2020). The Basel Problem. arXiv preprint arXiv:2010.03953.
- Mishra, U., Gupta, D., Sarkar, A., and Hazarika, B. B. (2025). A hybrid approach for plant leaf detection using ResNet50-intuitionistic fuzzy RVFL (ResNet50-IFRVFLC) classifier. Computers and Electrical Engineering, 123, 110135.
- Elsayed, M. M., and Nazier, H. (2025). Technology and evolution of occupational employment in Egypt (1998–2018): a task-based framework. Review of Economics and Political Science.
- Kong, X., Li, C., and Pan, Y. (2025). Association Between Heavy Metals Mixtures and Life’s Essential 8 Score in General US Adults. Cardiovascular Toxicology, 1-12.
- Bracale, D., Banerjee, M., Sun, Y., Stoll, K., and Turki, S. (2025). Dynamic Pricing in the Linear Valuation Model using Shape Constraints. arXiv preprint arXiv:2502.05776.
- Köhne, F., Philipp, F. M., Schaller, M., Schiela, A., and Worthmann, K. (2024). L∞-error bounds for approximations of the Koopman operator by kernel extended dynamic mode decomposition. arXiv preprint arXiv:2403.18809.
- Sadeghi, R., and Beyeler, M. (2025). Efficient Spatial Estimation of Perceptual Thresholds for Retinal Implants via Gaussian Process Regression. arXiv preprint arXiv:2502.06672.
- Naresh, E., Patil, A., and Bhuvan, S. (2025, February). Enhancing network security with eBPF-based firewall and machine learning. In Data Science and Exploration in Artificial Intelligence: Proceedings of the First International Conference On Data Science and Exploration in Artificial Intelligence (CODE-AI 2024) Bangalore, India, 3rd-4th July, 2024 (Volume 1) (p. 169). CRC Press.
- Zhao, W., Chen, H., Liu, T., Tuo, R., and Tian, C. From Deep Additive Kernel Learning to Last-Layer Bayesian Neural Networks via Induced Prior Approximation. In The 28th International Conference on Artificial Intelligence and Statistics.
- Nanyonga, A., Wasswa, H., Joiner, K., Turhan, U., and Wild, G. (2025). A Multi-Head Attention-Based Transformer Model for Predicting Causes in Aviation Incidents.
- Fan, C. L., and Chung, Y. J. (2025). Integrating Image Processing Technology and Deep Learning to Identify Crops in UAV Orthoimages.
- Bakaev, M., Gorovaia, S., and Mitrofanova, O. (2025). Who Will Author the Synthetic Texts? Evoking Multiple Personas from Large Language Models to Represent Users’ Associative Thesauri. Big Data and Cognitive Computing, 9(2), 46.
- Celli, M., Ghosh, S., and Prakash, A. (2020). The Pentagon.
- Ahn, K. S., Choi, J. H., Kwon, H., Lee, S., Cho, Y., and Jang, W. Y. (2025). Deep learning-based automated guide for defining a standard imaging plane for developmental dysplasia of the hip screening using ultrasonography: a retrospective imaging analysis. BMC Medical Informatics and Decision Making, 25(1), 1-8.
- Peng, J., Lu, F., Li, B., Huang, Y., Qu, S., and Chen, G. (2025). Range and Bird’s Eye View Fused Cross-Modal Visual Place Recognition. arXiv preprint arXiv:2502.11742.
- Zhao, J., Wang, W., Wang, J., Zhang, S., Fan, Z., and Matwin, S. (2025). Privacy-preserved federated clustering with Non-IID data via GANs. The Journal of Supercomputing, 81(4), 1-37.
- Wang, J., Liu, L., He, K., Gebrewahid, T. W., Gao, S., Tian, Q., ... and Li, H. (2025). Accurate genomic prediction for grain yield and grain moisture content of maize hybrids using multi-environment data. Journal of Integrative Plant Biology.
- Xu, H., Xue, T., Fan, J., Liu, D., Chen, Y., Zhang, F., ... and Cai, W. (2025). Medical Image Registration Meets Vision Foundation Model: Prototype Learning and Contour Awareness. arXiv preprint arXiv:2502.11440.
- Sun, M., Yin, Y., Xu, Z., Kolter, J. Z., and Liu, Z. (2025). Idiosyncrasies in Large Language Models. arXiv preprint arXiv:2502.12150.
- Liang, Y., Liu, F., Li, A., Li, X., and Zheng, C. (2025). NaturalL2S: End-to-End High-quality Multispeaker Lip-to-Speech Synthesis with Differential Digital Signal Processing. arXiv preprint arXiv:2502.12002.
- Fix, E., and Hodges, J. L. (1951). Discriminatory analysis. Nonparametric discrimination: Consistency properties. USAF School of Aviation Medicine, Randolph Field, TX.
- Cover, T., and Hart, P. (1967). Nearest neighbor pattern classification. IEEE transactions on information theory, 13(1), 21-27.
- Devroye, L., Györfi, L., and Lugosi, G. (2013). A probabilistic theory of pattern recognition (Vol. 31). Springer Science and Business Media.
- Toussaint, G. (2005). Geometric proximity graphs for improving nearest neighbor methods in instance-based learning and data mining. International Journal of Computational Geometry and Applications, 15(02), 101-150.
- Cox, D., Ghosh, S., and Sultanow, E. (2021). Collatz Cycles and 3n+c Cycles. arXiv preprint arXiv:2101.04067.
- Arya, S., Mount, D. M., Netanyahu, N. S., Silverman, R., and Wu, A. Y. (1998). An optimal algorithm for approximate nearest neighbor searching fixed dimensions. Journal of the ACM (JACM), 45(6), 891-923.
- Terrell, G. R., and Scott, D. W. (1992). Variable kernel density estimation. The Annals of Statistics, 1236-1265.
- Samworth, R. J. (2012). Optimal weighted nearest neighbour classifiers.
- Bremner, D., Demaine, E., Erickson, J., Iacono, J., Langerman, S., Morin, P., and Toussaint, G. (2005). Output-sensitive algorithms for computing nearest-neighbour decision boundaries. Discrete and Computational Geometry, 33, 593-604.
- Ramaswamy, S., Rastogi, R., and Shim, K. (2000, May). Efficient algorithms for mining outliers from large data sets. In Proceedings of the 2000 ACM SIGMOD international conference on Management of data (pp. 427-438).
- Cover, T. M. (1999). Elements of information theory. John Wiley and Sons.
- Alaca, Y., and Emin, B. Performance evaluation of hybrid approaches combining deep learning models and machine learning methods for medical kidney image classification.
- Chen, J. S., Hung, R. W., and Yang, C. Y. (2025). An Efficient Target-to-Area Classification Strategy with a PIP-Based KNN Algorithm for Epidemic Management. Mathematics, 13(4), 661.
- Liu, J., Tu, S., Wang, M., Chen, D., Chen, C., and Xie, H. (2025). The influence of different factors on the bond strength of lithium disilicate-reinforced glass–ceramics to Resin: a machine learning analysis. BMC Oral Health, 25(1), 1-12.
- Barghouthi, E. A. D., Owda, A. Y., Owda, M., and Asia, M. (2025). A Fused Multi-Channel Prediction Model of Pressure Injury for Adult Hospitalized Patients—The “EADB” Model. AI, 6(2), 39.
- Jewan, S. Y. Y. Remote sensing technology and machine learning algorithms for crop yield prediction in Bambara groundnut and grapevines (Doctoral dissertation, University of Nottingham).
- Moldovanu, S., Munteanu, D., and Sîrbu, C. (2025). Impact on Classification Process Generated by Corrupted Features. Big Data and Cognitive Computing, 9(2), 45.
- HosseinpourFardi, N., and Alizadeh, B. (2025). AILIS: effective hardware accelerator for incremental learning with intelligent selection in classification. The Journal of Supercomputing, 81(4), 1-30.
- Afrin, T., Yodo, N., and Huang, Y. (2025). AI-Driven Framework for Predicting Oil Pipeline Failure Causes Based on Leak Properties and Financial Impact. Journal of Pipeline Systems Engineering and Practice, 16(2), 04025009.
- Hussain, M. A., Chen, Z., Zhou, Y., Ullah, H., and Ying, M. (2025). Spatial analysis of flood susceptibility in Coastal area of Pakistan using machine learning models and SAR imagery. Environmental Earth Sciences, 84(5), 1-23.
- Reddy, S. R., and Murthy, G. V. (2025). Cardiovascular Disease Prediction Using Particle Swarm Optimization and Neural Network Based an Integrated Framework. SN Computer Science, 6(2), 186.
- Chen, Y., Garcia, E. K., Gupta, M. R., Rahimi, A., and Cazzanti, L. (2009). Similarity-based classification: Concepts and algorithms. Journal of Machine Learning Research, 10(3).
- Chechik, G., Sharma, V., Shalit, U., and Bengio, S. (2010). Large scale online learning of image similarity through ranking. Journal of Machine Learning Research, 11(3).
- Huang, W., Zhang, P., and Wan, M. (2013). A novel similarity learning method via relative comparison for content-based medical image retrieval. Journal of digital imaging, 26, 850-865.
- Yang, P., Wang, H., Yang, J., Qian, Z., Zhang, Y., and Lin, X. (2024). Deep learning approaches for similarity computation: A survey. IEEE Transactions on Knowledge and Data Engineering.
- Xiao, Y., Liu, B., Yin, J., Cao, L., Zhang, C., and Hao, Z. (2011, July). Similarity-based approach for positive and unlabeled learning. In IJCAI Proceedings-International Joint Conference on Artificial Intelligence (Vol. 22, No. 1, p. 1577).
- Kar, P., and Jain, P. (2011). Similarity-based learning via data driven embeddings. Advances in neural information processing systems, 24.
- PingCAP. Top 10 tools for calculating semantic similarity. https://www.pingcap.com/article/top-10-tools-for-calculating-semantic-similarity/.
- Co-citation proximity analysis. (n.d.). In Wikipedia. Retrieved February 22, 2025, from https://en.wikipedia.org/wiki/Co-citation_Proximity_Analysis.
- Choi, S. (2022). Internet News User Analysis Using Deep Learning and Similarity Comparison. Electronics, 11(4), 569.
- Quinlan, J. R. (1986). Induction of decision trees. Machine learning, 1, 81-106.
- Quinlan, J. R. (2014). C4.5: Programs for machine learning. Elsevier.
- Breiman, L., Friedman, J., Olshen, R. A., and Stone, C. J. (2017). Classification and regression trees. Routledge.
- Kohavi, R., and John, G. H. (1997). Wrappers for feature subset selection. Artificial intelligence, 97(1-2), 273-324.
- Breiman, L. (1996). Bagging predictors. Machine learning, 24, 123-140.
- Freund, Y., and Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of computer and system sciences, 55(1), 119-139.
- Breiman, L. (2001). Random forests. Machine learning, 45, 5-32.
- Domingos, P., and Hulten, G. (2000, August). Mining high-speed data streams. In Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 71-80).
- Freund, Y., and Mason, L. (1999, June). The alternating decision tree learning algorithm. In icml (Vol. 99, pp. 124-133).
- Quinlan, J. R. (1993). C4.5: Programs for machine learning. Morgan Kaufmann.
- Usman, S. A., Bhattacharjee, M., Alsukhailah, A. A., Shahzad, A. D., Razick, M. S. A., and Amin, N. (2025). Identifying the Best-Selling Product using Machine Learning Algorithms.
- Abbas, J., Yousef, M., Hamoud, K., and Joubran, K. (2025). Low Back Pain Among Health Sciences Undergraduates: Results Obtained from a Machine-Learning Analysis.
- Deng, C., Liu, X., Zhang, J., Mo, Y., Li, P., Liang, X., and Li, N. (2025). Prediction of retail commodity hot-spots: a machine learning approach. Data Science and Management.
- Eili, M. Y., Rezaeenour, J., and Roozbahani, M. H. (2025). Predicting clinical pathways of traumatic brain injuries (TBIs) through process mining. npj Digital Medicine, 8(1), 1-12.
- Yin, Y., Xu, B., Chang, J., Li, Z., Bi, X., Wei, Z., ... and Cai, J. (2025). Gamma-Glutamyl Transferase Plus Carcinoembryonic Antigen Ratio Index: A Promising Biomarker Associated with Treatment Response to Neoadjuvant Chemotherapy for Patients with Colorectal Cancer Liver Metastases. Current Oncology, 32(2), 117.
- Abdullahi, N., Akbal, E., Dogan, S., Tuncer, T., and Erman, U. Accurate Indoor Home Location Classification through Sound Analysis: The 1D-ILQP Approach. Firat University Journal of Experimental and Computational Engineering, 4(1), 12-29.
- Mokan, M., Gabrani, G., and Relan, D. (2025). Pixel-wise classification of the whole retinal vasculature into arteries and veins using supervised learning. Biomedical Signal Processing and Control, 106, 107691.
- Maron, M. E. (1961). Automatic indexing: an experimental inquiry. Journal of the ACM (JACM), 8(3), 404-417.
- Minsky, M. (1961). Steps toward artificial intelligence. Proceedings of the IRE, 49(1), 8-30.
- Mosteller, F., and Wallace, D. L. (1963). Inference in an authorship problem: A comparative study of discrimination methods applied to the authorship of the disputed Federalist Papers. Journal of the American Statistical Association, 58(302), 275-309.
- Domingos, P., and Pazzani, M. (1997). On the optimality of the simple Bayesian classifier under zero-one loss. Machine learning, 29, 103-130.
- Hand, D. J., and Yu, K. (2001). Idiot’s Bayes—not so stupid after all?. International statistical review, 69(3), 385-398.
- Rish, I. (2001, August). An empirical study of the naive Bayes classifier. In IJCAI 2001 workshop on empirical methods in artificial intelligence (Vol. 3, No. 22, pp. 41-46).
- Ng, A., and Jordan, M. (2001). On discriminative vs. generative classifiers: A comparison of logistic regression and naive bayes. Advances in neural information processing systems, 14.
- Webb, G. I., Boughton, J. R., and Wang, Z. (2005). Not so naive Bayes: aggregating one-dependence estimators. Machine learning, 58, 5-24.
- Boullé, M. (2007). Compression-based averaging of selective naive Bayes classifiers. The Journal of Machine Learning Research, 8, 1659-1685.
- Larsen, B., and Aone, C. (1999, August). Fast and effective text mining using linear-time document clustering. In Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 16-22).
- Shannaq, B. (2025). Does dataset splitting impact Arabic text classification more than preprocessing? An empirical analysis in big data analytics. Journal of Theoretical and Applied Information Technology, 103(3).
- Goldstein, D., Aldrich, C., Shao, Q., and O’Connor, L. (2025). A Machine Learning Classification Approach to Geotechnical Characterisation Using Measure-While-Drilling Data.
- Ntamwiza, J. M. V., and Bwire, H. (2025). Predicting biking preferences in Kigali city: A comparative study of traditional statistical models and ensemble machine learning models. Transport Economics and Management.
- EL Fadel, N. (2025). Facial Recognition Algorithms: A Systematic Literature Review. Journal of Imaging, 11(2), 58.
- RaviKumar, S., Pandian, C. A., Hameed, S. S., Muralidharan, V., and Ali, M. S. W. (2025). Application of machine learning for fault diagnosis and operational efficiency in EV motor test benches using vibration analysis. Engineering Research Express, 7(1), 015355.
- Kavitha, D., Srujankumar, G., Akhil, C., and Sumanth, P. Uncovering the Truth: A Machine Learning Approach to Detect Fake Product Reviews and Analyze Sentiment. Explainable IoT Applications: A Demystification, 309.
- Nusantara, R. M. (2025). Analisis Sentimen Masyarakat terhadap Pelayanan Bank Central Asia: Text Mining Cuitan Satpam BCA pada Twitter [Public sentiment analysis of Bank Central Asia services: Text mining of Satpam BCA tweets on Twitter]. Co-Value Jurnal Ekonomi Koperasi dan Kewirausahaan, 15(9).
- Ahmadi, M., Khajavi, M., Varmaghani, A., Ala, A., Danesh, K., and Javaheri, D. (2025). Leveraging Large Language Models for Cybersecurity: Enhancing SMS Spam Detection with Robust and Context-Aware Text Classification. arXiv preprint arXiv:2502.11014.
- Takaki, T., Matsuoka, R., Fujita, Y., and Murakami, S. (2025). Development and clinical evaluation of an AI-assisted respiratory state classification system for chest X-rays: A BMI-Specific approach. Computers in Biology and Medicine, 188, 109854.
- Fisher, R. A. (1936). The use of multiple measurements in taxonomic problems. Annals of eugenics, 7(2), 179-188.
- Anderson, T. W. (1958). An introduction to multivariate statistical analysis (Vol. 2, pp. 3-5). New York: Wiley.
- Rao, C. R. (1948). The utilization of multiple measurements in problems of biological classification. Journal of the Royal Statistical Society. Series B (Methodological), 10(2), 159-203.
- Duda, R. O., and Hart, P. E. (2006). Pattern classification. John Wiley and Sons.
- McLachlan, G. J. (2005). Discriminant analysis and statistical pattern recognition. John Wiley and Sons.
- Belhumeur, P. N., Hespanha, J. P., and Kriegman, D. J. (1997). Eigenfaces vs. fisherfaces: Recognition using class specific linear projection. IEEE Transactions on pattern analysis and machine intelligence, 19(7), 711-720.
- Mika, S., Ratsch, G., Weston, J., Scholkopf, B., and Mullers, K. R. (1999, August). Fisher discriminant analysis with kernels. In Neural networks for signal processing IX: Proceedings of the 1999 IEEE Signal Processing Society Workshop (Cat. No. 98TH8468) (pp. 41-48). IEEE.
- Ye, J., and Yu, B. (2005). Characterization of a family of algorithms for generalized discriminant analysis on undersampled problems. Journal of Machine Learning Research, 6(4).
- Sugiyama, M. (2007). Dimensionality reduction of multimodal labeled data by local fisher discriminant analysis. Journal of machine learning research, 8(5).
- Hartmann, M., Wolff, W., and Martarelli, C. S. Unpleasant mind, deactivated body: A distinct somatic signature of boredom through bodily sensation mapping.
- Garrido-Tamayo, M. A., Rincón Santamaría, A., Hoyos, F. E., González Vega, T., and Laroze, D. (2025). Autofluorescence of Red Blood Cells Infected with P. falciparum as a Preliminary Analysis of Spectral Sweeps to Predict Infection. Biosensors, 15(2), 123.
- Li, B., and Jiang, S. (2025). Reservoir Fluid PVT High-Pressure Physical Property Analysis Based on Graph Convolutional Network Model. Applied Sciences, 15(4), 2209.
- Nyembwe, A., Zhao, Y., Caceres, B. A., Hall, K., Prescott, L., Potts-Thompson, S., ... and Taylor, J. Y. (2025). Moderating effect of coping strategies on the association between perceived discrimination and blood pressure outcomes among young Black mothers in the InterGEN study. AIMS Public Health, 12(1), 217-232.
- Singh, S. K., Kumar, M., Khan, I. M., Jayanthiladevi, A., and Agarwal, C. (2025). An Attention-based Model for Recognition of Facial Expressions using CNN-BiLSTM. Polytechnic Journal, 15(1), 4.
- Akter, T., Faqeerzada, M. A., Kim, Y., Pahlawan, M. F. R., Aline, U., Kim, H., ... and Cho, B. K. (2025). Hyperspectral imaging with multivariate analysis for detection of exterior flaws for quality evaluation of apples and pears. Postharvest Biology and Technology, 223, 113453.
- Feng, C. H., Deng, F., Disis, M. L., Gao, N., and Zhang, L. (2025). Towards machine learning fairness in classifying multicategory causes of deaths in colorectal or lung cancer patients. bioRxiv, 2025-02.
- Ghosh, S. (2021, February 24). Another Proof of Basel Problem.
- Chick, H. M., Williams, L. K., Sparks, N., Khattak, F., Vermeij, P., Frantzen, I., ... and Wilkinson, T. S. (2025). Campylobacter jejuni ST353 and ST464 cause localized gut inflammation, crypt damage, and extraintestinal spread during large-and small-scale infection in broiler chickens. Applied and Environmental Microbiology, e01614-24.
- Miao, X., Xu, L., Sun, L., Xie, Y., Zhang, J., Xu, X., ... and Lin, J. (2025). Highly Sensitive Detection and Molecular Subtyping of Breast Cancer Cells Using Machine Learning-assisted SERS Technology. Nano Biomedicine and Engineering.
- Rohan, D., Reddy, G. P., Kumar, Y. P., Prakash, K. P., and Reddy, C. P. (2025). An extensive experimental analysis for heart disease prediction using artificial intelligence techniques. Scientific Reports, 15(1), 6132.
- Cox, D. R. (1958). The regression analysis of binary sequences. Journal of the Royal Statistical Society Series B: Statistical Methodology, 20(2), 215-232.
- Jahangiri, J. (2022). 106.45 A generalisation of a classical open-top box problem. The Mathematical Gazette, 106(567), 526-531.
- Nelder, J. A., and Wedderburn, R. W. (1972). Generalized linear models. Journal of the Royal Statistical Society Series A: Statistics in Society, 135(3), 370-384.
- Haberman, S., and Renshaw, A. E. (1990). Generalised linear models and excess mortality from peptic ulcers. Insurance: Mathematics and Economics, 9(1), 21-32.
- Hosmer, D. W., and Lemesbow, S. (1980). Goodness of fit tests for the multiple logistic regression model. Communications in statistics-Theory and Methods, 9(10), 1043-1069.
- McCullagh, P. (2019). Generalized linear models. Routledge.
- Firth, D. (1993). Bias reduction of maximum likelihood estimates. Biometrika, 80(1), 27-38.
- King, G., and Zeng, L. (2001). Logistic regression in rare events data. Political analysis, 9(2), 137-163.
- Gelman, A., and Hill, J. (2007). Data analysis using regression and multilevel/hierarchical models. Cambridge university press.
- Sani, J., Oluyomi, A. O., Wali, I. G., Ahmed, M. M., and Halane, S. (2025). Regional disparities on contraceptive intention and its sociodemographic determinants among reproductive women in Nigeria. Contraception and Reproductive Medicine, 10(1), 1-10.
- Dorsey, S. S., Catlin, D. H., Ritter, S. J., Wails, C. N., Robinson, S. G., Oliver, K. W., ... and Fraser, J. D. (2025). The importance of viewshed in nest site selection of a ground-nesting shorebird. PLOS ONE, 20(2), e0319021.
- Slawny, C., Libersky, E., and Kaushanskaya, M. (2025). The Roles of Language Ability and Language Dominance in Bilingual Parent–Child Language Alignment. Journal of Speech, Language, and Hearing Research, 1-13.
- Waller, D. K., Dass, N. L. M., Oluwafemi, O. O., Agopian, A. J., Tark, J. Y., Hoyt, A. T., ... and Study, N. B. D. P. (2025). Maternal Diarrhea During the Periconceptional Period and the Risk of Birth Defects, National Birth Defects Prevention Study, 2006-2011. Birth defects research, 117(2), e2438.
- Beyeler, M., Rohner, R., Ijäs, P., Eker, O. F., Cognard, C., Bourcier, R., ... and Kaesmacher, J. (2025). Susceptibility Vessel Sign and Intravenous Alteplase in Stroke Patients Treated with Thrombectomy. Clinical Neuroradiology, 1-11.
- Yedavalli, V., Salim, H. A., Balar, A., Lakhani, D. A., Mei, J., Lu, H., ... and Heit, J. J. (2025). Hypoperfusion Intensity Ratio Less Than 0.4 is Associated with Favorable Outcomes in Unsuccessfully Reperfused Acute Ischemic Stroke with Large-Vessel Occlusion. American Journal of Neuroradiology.
- Aarakit, S. M., Ssennono, F. V., Nalweyiso, G., Murungi, H., and Adaramola, M. S. Do Social Networks and Neighbourhood Effects Matter in Solar Adoption? Insights from Uganda National Household Survey.
- Yang, Y., Cai, X., Zhou, M., Chen, Y., Pi, J., Zhao, M., ... and Wang, Y. (2025). Association of Left Ventricular Function With Cerebral Small Vessel Disease in a Community-Based Population. CNS neuroscience and therapeutics, 31(2), e70226.
- Cortese, S. (2025). Advancing our knowledge on the maternal and neonatal outcomes in women with ADHD. Evidence-Based Nursing.
- Gaspar, P., Mittal, P., Cohen, H., and Isenberg, D. A. (2025). Risk factors for bleeding in patients with thrombotic antiphospholipid syndrome during antithrombotic therapy. Lupus, 09612033251322927.
- Schölkopf, B., and Smola, A. J. (2002). Learning with kernels: support vector machines, regularization, optimization, and beyond. MIT press.
- Cristianini, N., and Shawe-Taylor, J. (2000). An introduction to support vector machines and other kernel-based learning methods. Cambridge university press.
- Christmann, A., and Steinwart, I. (2008). Support vector machines.
- Schölkopf, B., Burges, C. J., and Smola, A. J. (Eds.). (1999). Advances in kernel methods: support vector learning. MIT press.
- Drucker, H., Burges, C. J., Kaufman, L., Smola, A., and Vapnik, V. (1996). Support vector regression machines. Advances in neural information processing systems, 9.
- Joachims, T. (1999, June). Transductive inference for text classification using support vector machines. In Icml (Vol. 99, pp. 200-209).
- Schölkopf, B., Smola, A., and Müller, K. R. (1998). Nonlinear component analysis as a kernel eigenvalue problem. Neural computation, 10(5), 1299-1319.
- Burges, C. J. (1998). A tutorial on support vector machines for pattern recognition. Data mining and knowledge discovery, 2(2), 121-167.
- Schölkopf, B., Platt, J. C., Shawe-Taylor, J., Smola, A. J., and Williamson, R. C. (2001). Estimating the support of a high-dimensional distribution. Neural computation, 13(7), 1443-1471.
- Gauss, C. F. (1809). Theoria motus corporum coelestium in sectionibus conicis solem ambientium auctore Carolo Friderico Gauss. sumtibus Frid. Perthes et IH Besser.
- Legendre, A. M. (1806). Nouvelles méthodes pour la détermination des orbites des comètes: avec un supplément contenant divers perfectionnemens de ces méthodes et leur application aux deux comètes de 1805. Courcier.
- Pearson, K. (1901). LIII. On lines and planes of closest fit to systems of points in space. The London, Edinburgh, and Dublin philosophical magazine and journal of science, 2(11), 559-572.
- Fisher, R. A. (1922). The goodness of fit of regression formulae, and the distribution of regression coefficients. Journal of the Royal Statistical Society, 597-612.
- Koopmans, T. C. (1937). Linear regression analysis of economic time series.
- Goldberger, A. S. (1991). A course in econometrics. Harvard University Press.
- Rao, C. R. (1973). Linear statistical inference and its applications (Vol. 2, pp. 263-270). New York: Wiley.
- Huber, P. J. (1992). Robust estimation of a location parameter. In Breakthroughs in statistics: Methodology and distribution (pp. 492-518). New York, NY: Springer New York.
- Ramadhan, D. L., and Ali, T. H. (2025). A Multivariate Wavelet Shrinkage in Quantile Regression Models.
- Zhou, F., Chu, J., Lu, F., Ouyang, W., Liu, Q., and Wu, Z. (2025). Real-time monitoring of methyl orange degradation in non-thermal plasma by integrating Raman spectroscopy with a hybrid machine learning model. Environmental Technology and Innovation, 104100.
- Zhong, X., Cai, S., Wang, H., Wu, L., and Sun, Y. (2025). The knowledge, attitude and practice of nurses on the posture management of premature infants: status quo and coping strategies. BMC Health Services Research, 25(1), 288.
- Liu, J., Wang, S., Tang, Y., Pan, F., and Xia, J. (2025). Current status and influencing factors of pediatric clinical nurses’ scientific research ability: a survey. BMC nursing, 24(1), 1-8.
- Ming-jun, C., and Jian-ya, Z. (2025). Research on the comprehensive effect of the Porter hypothesis of environmental protection tax regulation in China. Environmental Sciences Europe, 37(1), 28.
- Dietze, P., Colledge-Frisby, S., Gerra, G., Poznyak, V., Campello, G., Kashino, W., ... and Krupchanka, D. (2025). Impact of UNODC/WHO SOS (stop-overdose-safely) training on opioid overdose knowledge and attitudes among people at high or low risk of opioid overdose in Kazakhstan, Kyrgyzstan, Tajikistan and Ukraine. Harm Reduction Journal, 22, 20.
- Hasan, M. S., and Ghosal, S. (2025). Unravelling Inequities in Access to Public Healthcare Services in West Bengal, India: Multiple Dimensions, Geographic Pattern, and Association with Health Outcomes. Global Social Welfare, 1-18.
- Ghosh, S., and Jain, P. (2021). On Fermat Numbers and Munafo’s Conjecture.
- Zeng, S., Hou, X., Luo, X., and Wei, Q. Enhancing Maize Yield Prediction Under Stress Conditions Using Solar-Induced Chlorophyll Fluorescence and Deep Learning. Available at SSRN 5146460.
- Baird, H. B., Allen, W., Gallegos, M., Ashy, C., Slone, H. S., and Pullen, W. M. (2025). Artificial Intelligence-Driven Analysis Identifies Anterior Cruciate Ligament Reconstruction, Hip Arthroscopy and Femoroacetabular Impingement Syndrome, and Shoulder Instability as the Most Commonly Published Topics in Arthroscopy. Arthroscopy, Sports Medicine, and Rehabilitation, 101108.
- Overton, M. W., and Eicker, S. (2025). Associations between days open and dry period length versus milk production, replacement, and fertility in the subsequent lactation in Holstein dairy cows. Journal of Dairy Science.
- Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., ... and Hassabis, D. (2017). Mastering chess and shogi by self-play with a general reinforcement learning algorithm. arXiv preprint arXiv:1712.01815.
- He, K., Fan, H., Wu, Y., Xie, S., and Girshick, R. (2020). Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 9729-9738).
- Grill, J. B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., ... and Valko, M. (2020). Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems, 33, 21271-21284.
- Hinton, G. E., Osindero, S., and Teh, Y. W. (2006). A fast learning algorithm for deep belief nets. Neural computation, 18(7), 1527-1554.
- Finn, C., Abbeel, P., and Levine, S. (2017, July). Model-agnostic meta-learning for fast adaptation of deep networks. In International conference on machine learning (pp. 1126-1135). PMLR.
- Jaderberg, M., Mnih, V., Czarnecki, W. M., Schaul, T., Leibo, J. Z., Silver, D., and Kavukcuoglu, K. (2016). Reinforcement learning with unsupervised auxiliary tasks. arXiv preprint arXiv:1611.05397.
- Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., ... and Houlsby, N. (2020). An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929.
- Mousavi, S. M. H. Is Deleting the Dataset of a Self-Aware AGI Ethical? Does It Possess a Soul by Self-Awareness?
- Bjerregaard, A., Groth, P. M., Hauberg, S., Krogh, A., and Boomsma, W. (2025). Foundation models of protein sequences: A brief overview. Current Opinion in Structural Biology, 91, 103004.
- Cui, T., Tang, C., Zhou, D., Li, Y., Gong, X., Ouyang, W., ... and Zhang, S. (2025). Online test-time adaptation for better generalization of interatomic potentials to out-of-distribution data. Nature Communications, 16(1), 1891.
- Jia, Q., Zhang, Y., Wang, Y., Ruan, T., Yao, M., and Wang, L. (2025). Fragment-level Feature Fusion Method Using Retrosynthetic Fragmentation Algorithm for molecular property prediction. Journal of Molecular Graphics and Modelling, 108985.
- Hou, L. Unboxing the intersections between self-esteem and academic mindfulness with test emotions, psychological wellness and academic achievement in artificial intelligence-supported learning environments: Evidence from English as a foreign language learners. British Educational Research Journal.
- Liu, Y., Huang, Y., Dai, Z., and Gao, Y. (2025). Self-optimized learning algorithm for multi-specialty multi-stage elective surgery scheduling. Engineering Applications of Artificial Intelligence, 147, 110346.
- Song, Q., Li, C., Fu, J., Zeng, Q., and Xie, N. (2025). Self-supervised heterogeneous graph neural network based on deep and broad neighborhood encoding. Applied Intelligence, 55(6), 467.
- Odlyzko, A. M., and te Riele, H. J. J. (1985). Disproof of the Mertens conjecture.
- Davenport, H. (1937). On some infinite series involving arithmetical functions (II). The Quarterly Journal of Mathematics, (1), 313-320.
- Soundararajan, K. (2007). Partial sums of the Möbius function. arXiv preprint arXiv:0705.0723.
- Soundararajan, K. (2009). Partial sums of the Möbius function. Journal für die reine und angewandte Mathematik, 2009(631), 141-152.
- Li, T., Nath, D., Cheng, Y., Fan, Y., Li, X., Raković, M., ... and Gašević, D. (2025, March). Turning Real-Time Analytics into Adaptive Scaffolds for Self-Regulated Learning Using Generative Artificial Intelligence. In Proceedings of the 15th International Learning Analytics and Knowledge Conference (pp. 667-679).
- Chaudary, E., Khan, S. A., and Mumtaz, W. (2025). EEG-CNN-Souping: Interpretable emotion recognition from EEG signals using EEG-CNN-souping model and explainable AI. Computers and Electrical Engineering, 123, 110189.
- Tautan, A. M., Andrei, A. G., Smeralda, C. L., Vatti, G., Rossi, S., and Ionescu, B. (2025). Unsupervised learning from EEG data for epilepsy: A systematic literature review. Artificial Intelligence in Medicine, 103095.
- Guo, X., and Sun, L. (2025). Evaluation of stroke sequelae and rehabilitation effect on brain tumor by neuroimaging technique: A comparative study. PLOS ONE, 20(2), e0317193.
- Diao, S., Wan, Y., Huang, D., Huang, S., Sadiq, T., Khan, M. S., ... and Mazhar, T. (2025). Optimizing Bi-LSTM networks for improved lung cancer detection accuracy. PLOS ONE, 20(2), e0316136.
- Lin, N., Shi, Y., Ye, M., Zhang, Y., and Jia, X. (2025). Deep transfer learning radiomics for distinguishing sinonasal malignancies: a preliminary MRI study. Future Oncology, 1-8.
- Çetintaş, D. (2025). Efficient monkeypox detection using hybrid lightweight CNN architectures and optimized SVM with grid search on imbalanced data. Signal, Image and Video Processing, 19(4), 1-12.
- Wang, X., and Zhao, D. (2025). A comparative experimental study of citation sentiment identification based on the Athar-Corpus. Data Science and Informetrics.
- Muralinath, R. N., Pathak, V., and Mahanti, P. K. (2025). Metastable Substructure Embedding and Robust Classification of Multichannel EEG Data Using Spectral Graph Kernels. Future Internet, 17(3), 102.
- Hu, Y. H., Liu, T. H., Tsai, C. F., and Lin, Y. J. (2025). Handling Class Imbalanced Data in Sarcasm Detection with Ensemble Oversampling Techniques. Applied Artificial Intelligence, 39(1), 2468534.
- Wang, H., Lv, F., Zhan, Z., Zhao, H., Li, J., and Yang, K. (2025). Predicting the Tensile Properties of Automotive Steels at Intermediate Strain Rates via Interpretable Ensemble Machine Learning. World Electric Vehicle Journal, 16(3), 123.
- Husain, M., Aftab, R. A., Zaidi, S., and Rizvi, S. J. A. (2025). Shear thickening fluid: A multifaceted rheological modeling integrating phenomenology and machine learning approach. Journal of Molecular Liquids, 127223.
- Iqbal, A., and Siddiqi, T. A. (2025). Enhancing seasonal streamflow prediction using multistage hybrid stochastic data-driven deep learning methodology with deep feature selection. Environmental and Ecological Statistics, 1-51.
- Geman, S., Bienenstock, E., and Doursat, R. (1992). Neural networks and the bias/variance dilemma. Neural computation, 4(1), 1-58.
- Belkin, M., Hsu, D., Ma, S., and Mandal, S. (2019). Reconciling modern machine-learning practice and the classical bias–variance trade-off. Proceedings of the National Academy of Sciences, 116(32), 15849-15854.
- Neal, B., Mittal, S., Baratin, A., Tantia, V., Scicluna, M., Lacoste-Julien, S., and Mitliagkas, I. (2018). A modern take on the bias-variance tradeoff in neural networks. arXiv preprint arXiv:1810.08591.
- Rocks, J. W., and Mehta, P. (2022). Memorizing without overfitting: Bias, variance, and interpolation in overparameterized models. Physical review research, 4(1), 013201.
- Doroudi, S., and Rastegar, S. A. (2023). The bias–variance tradeoff in cognitive science. Cognitive Science, 47(1), e13241.
- Almeida, M., Zhuang, Y., Ding, W., Crouter, S. E., and Chen, P. (2021). Mitigating class-boundary label uncertainty to reduce both model bias and variance. ACM Transactions on Knowledge Discovery from Data (TKDD), 15(2), 1-18.
- Zhou, H., Song, L., Chen, J., Zhou, Y., Wang, G., Yuan, J., and Zhang, Q. (2021). Rethinking soft labels for knowledge distillation: A bias-variance tradeoff perspective. arXiv preprint arXiv:2102.00650.
- Gupta, N., Smith, J., Adlam, B., and Mariet, Z. (2022). Ensembling over classifiers: a bias-variance perspective. arXiv preprint arXiv:2206.10566.
- Ranglani, H. (2024). Empirical Analysis of the Bias-Variance Tradeoff Across Machine Learning Models. Machine Learning and Applications: An International Journal, 11, 1-12.
- Bellman, R. (1954). The theory of dynamic programming. Bulletin of the American Mathematical Society, 60(6), 503-515.
- Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Van Den Driessche, G., ... and Hassabis, D. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), 484-489.
- Watkins, C. J., and Dayan, P. (1992). Q-learning. Machine learning, 8, 279-292.
- Schulman, J., Wolski, F., Dhariwal, P., Radford, A., and Klimov, O. (2017). Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347.
- Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., ... and Wierstra, D. (2015). Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971.
- Shah, H. Towards Safe AI: Ensuring Security in Machine Learning and Reinforcement Learning Models.
- Ajanovi, Z., Gros, T., Den Hengst, F., Holler, D., Kokel, H., and Taitler, A. (2025, February). Bridging the Gap Between AI Planning and Reinforcement Learning. In AAAI Conference on Artificial Intelligence.
- Oliveira, D. R., Moreira, G. J., and Duarte, A. R. (2025). Arbitrarily shaped spatial cluster detection via reinforcement learning algorithms. Environmental and Ecological Statistics, 1-23.
- Bai, H., Wang, H., He, R., Du, J., Li, G., Xu, Y., and Jiao, Y. (2025). Multi-hop UAV relay covert communication: A multi-agent reinforcement learning approach. Chinese Journal of Aeronautics, 103440.
- Pan, R., Yuan, Q., Luo, G., Chen, B., Liu, Y., and Li, J. Tg-Mg: Task Grouping Based on Mdp Graph for Multi-Task Reinforcement Learning. Available at SSRN 5149163.
- Liu, H., Li, D., Zeng, B., and Xu, Y. (2025). Learning discriminative features for multi-hop knowledge graph reasoning. Applied Intelligence, 55(6), 1-14.
- Chen, H., Guo, W., Bao, W., Cui, M., Wang, X., and Zhao, Q. (2025). A novel interpretable decision rule extracting method for deep reinforcement learning-based energy management in building complexes. Energy and Buildings, 115514.
- Anwar, G. A., and Akber, M. Z. (2025). Multi-agent deep reinforcement learning for resilience optimization of building structures considering utility interactions for functionality. Computers and Structures, 310, 107703.
- Zhao, W., Lv, Y., Lee, K. M., and Li, W. (2025). An intelligent data-driven adaptive health state assessment approach for rolling bearings under single and multiple working conditions. Computers and Industrial Engineering, 110988.
- Soman, G., Judy, M. V., and Abou, A. M. (2025). Human guided empathetic AI agent for mental health support leveraging reinforcement learning-enhanced retrieval-augmented generation. Cognitive Systems Research, 101337.
- Sutton, R. S., McAllester, D., Singh, S., and Mansour, Y. (1999). Policy gradient methods for reinforcement learning with function approximation. Advances in neural information processing systems, 12.
- Ghosh, S. (2024). Tensor Derivative in Curvilinear Coordinates.
- Kakade, S. M. (2001). A natural policy gradient. Advances in neural information processing systems, 14.
- Schulman, J., Levine, S., Abbeel, P., Jordan, M., and Moritz, P. (2015, June). Trust region policy optimization. In International conference on machine learning (pp. 1889-1897). PMLR.
- Agarwal, A., Kakade, S. M., Lee, J. D., and Mahajan, G. (2021). On the theory of policy gradient methods: Optimality, approximation, and distribution shift. Journal of Machine Learning Research, 22(98), 1-76.
- Liu, J., Li, W., and Wei, K. (2024). Elementary analysis of policy gradient methods. arXiv preprint arXiv:2404.03372.
- Lorberbom, G., Maddison, C. J., Heess, N., Hazan, T., and Tarlow, D. (2020). Direct policy gradients: Direct optimization of policies in discrete action spaces. Advances in Neural Information Processing Systems, 33, 18076-18086.
- McCracken, G., Daniels, C., Zhao, R., Brandenberger, A., Panangaden, P., and Precup, D. (2020). A Study of Policy Gradient on a Class of Exactly Solvable Models. arXiv preprint arXiv:2011.01859.
- Lehmann, M. (2024). The definitive guide to policy gradients in deep reinforcement learning: Theory, algorithms and implementations. arXiv preprint arXiv:2401.13662.
- Rahn, A., Sultanow, E., Henkel, M., Ghosh, S., and Aberkane, I. J. (2021). An algorithm for linearizing the Collatz convergence. Mathematics, 9(16), 1898.
- Sutton, R. S., Singh, S., and McAllester, D. (2000). Comparing policy-gradient algorithms. IEEE Transactions on Systems, Man, and Cybernetics, 30(4), 467-477.
- Mustafa, E., Shuja, J., Rehman, F., Namoun, A., Bilal, M., and Iqbal, A. (2025). Computation offloading in vehicular communications using PPO-based deep reinforcement learning. The Journal of Supercomputing, 81(4), 1-24.
- Yang, C., Chen, J., Huang, X., Lian, J., Tang, Y., Chen, X., and Xie, S. (2025). Joint Driving Mode Selection and Resource Management in Vehicular Edge Computing Networks. IEEE Internet of Things Journal.
- Jamshidiha, S., Pourahmadi, V., and Mohammadi, A. (2025). A Traffic-Aware Graph Neural Network for User Association in Cellular Networks. IEEE Transactions on Mobile Computing.
- Raei, H., De Momi, E., and Ajoudani, A. (2025). A Reinforcement Learning Approach to Non-prehensile Manipulation through Sliding. arXiv preprint arXiv:2502.17221.
- Ting-Ting, Z., Yan, C., Ren-zhi, D., Tao, C., Yan, L., Kai-Ge, Z., ... and Yu-Shi, L. (2025). Autonomous decision-making of UAV cluster with communication constraints based on reinforcement learning. Journal of Cloud Computing, 14(1), 12.
- Zhang, B., Xing, H., Zhang, Z., and Feng, W. (2025). Autonomous obstacle avoidance decision method for spherical underwater robot based on brain-inspired spiking neural network. Expert Systems with Applications, 127021.
- Nguyen, X. B., Phan, X. H., and Piccardi, M. (2025). Fine-tuning text-to-SQL models with reinforcement-learning training objectives. Natural Language Processing Journal, 100135.
- Brahmanage, J. C., Ling, J., and Kumar, A. (2025). Leveraging Constraint Violation Signals For Action-Constrained Reinforcement Learning. arXiv preprint arXiv:2502.10431.
- Huang, Z., Dai, W., Zou, Y., Li, D., Cai, J., Gadekallu, T. R., and Wang, W. (2025). Cooperative Traffic Scheduling in Transportation Network: A Knowledge Transfer Method. IEEE Transactions on Intelligent Transportation Systems.
- Li, J., Li, R., Ma, G., Wang, H., Yang, W., and Gu, Z. Fedddpg: A Reinforcement Learning Method For Federated Learning-Based Vehicle Trajectory Prediction. Available at SSRN 5148441.
- Hessel, M., Modayil, J., Van Hasselt, H., Schaul, T., Ostrovski, G., Dabney, W., ... and Silver, D. (2018, April). Rainbow: Combining improvements in deep reinforcement learning. In Proceedings of the AAAI conference on artificial intelligence (Vol. 32, No. 1).
- Levine, S., Finn, C., Darrell, T., and Abbeel, P. (2016). End-to-end training of deep visuomotor policies. Journal of Machine Learning Research, 17(39), 1-40.
- Bellemare, M. G., Dabney, W., and Munos, R. (2017, July). A distributional perspective on reinforcement learning. In International conference on machine learning (pp. 449-458). PMLR.
- Xue, K., Zhai, L., Li, Y., Lu, Z., and Zhou, W. (2025). Task Offloading and Multi-cache Placement Based on DRL in UAV-assisted MEC Networks. Vehicular Communications, 100900.
- Amodu, O. A., Mahmood, R. A. R., Althumali, H., Jarray, C., Adnan, M. H., Bukar, U. A., ... and Zukarnain, Z. A. (2025). A question-centric review on DRL-based optimization for UAV-assisted MEC sensor and IoT applications, challenges, and future directions. Vehicular Communications, 100899.
- Silvestri, A., Coraci, D., Brandi, S., Capozzoli, A., and Schlueter, A. (2025). Practical deployment of reinforcement learning for building controls using an imitation learning approach. Energy and Buildings, 115511.
- Sarigul, F. A., and Bayezit, I. Deep Reinforcement Learning Based Autonomous Heading Control of a Fixed-Wing Aircraft.
- Mukhamadiarov, R. (2025). Controlling dynamics of stochastic systems with deep reinforcement learning. arXiv preprint arXiv:2502.18111.
- Ali, N., and Wallace, G. (2025). The Future of SOC Operations: Autonomous Cyber Defense with AI and Machine Learning.
- Yan, L., Wang, Q., Hu, G., Chen, W., and Noack, B. R. (2025). Deep reinforcement cross-domain transfer learning of active flow control for three-dimensional bluff body flow. Journal of Computational Physics, 113893.
- Alajaji, S. A., Sabzian, R., Wang, Y., Sultan, A. S., and Wang, R. (2025). A Scoping Review of Infrared Spectroscopy and Machine Learning Methods for Head and Neck Precancer and Cancer Diagnosis and Prognosis. Cancers, 17(5), 796.
- Wang, X., and Liu, L. (2025). Risk-Sensitive DRL for Portfolio Optimization in Petroleum Futures.
- Thongkairat, S., and Yamaka, W. (2025). A Combined Algorithm Approach for Optimizing Portfolio Performance in Automated Trading: A Study of SET50 Stocks. Mathematics, 13(3), 461.
- Dey, D., and Ghosh, N. Iquic: An Intelligent Framework for Defending Quic Connection Id-Based Dos Attack Using Advantage Actor-Critic Rl. Available at SSRN 5129475.
- Zhao, K., Peng, L., and Tak, B. (2025). Joint DRL-Based UAV Trajectory Planning and TEG-Based Task Offloading. IEEE Transactions on Consumer Electronics.
- Mounesan, M., Zhang, X., and Debroy, S. (2025). Infer-EDGE: Dynamic DNN Inference Optimization in 'Just-in-time' Edge-AI Implementations. arXiv preprint arXiv:2501.18842.
- Hou, Y., Yin, C., Sheng, X., Xu, D., Chen, J., and Tang, H. (2025). Automotive Fuel Cell Performance Degradation Prediction Using Multi-Agent Cooperative Advantage Actor-Critic Model. Energy, 134899.
- Radaideh, M. I., Tunkle, L., Price, D., Abdulraheem, K., Lin, L., and Elias, M. (2025). Multistep Criticality Search and Power Shaping in Nuclear Microreactors with Deep Reinforcement Learning. Nuclear Science and Engineering, 1-13.
- Li, B., Shen, L., Zhao, C., and Fei, Z. (2025). Robust Resource Optimization in Integrated Sensing, Communication, and Computing Networks Based on Soft Actor-Critic, 47(3), 1-10.
- Khan, N., Ahmad, S., Raza, S., Khan, A., and Younas, M. (2025). COST EFFECTIVE ROUTE OPTIMIZATION FOR DAIRY PRODUCT DELIVERY. Kashf Journal of Multidisciplinary Research, 2(02), 13-26.
- Yuan, Y., Zhang, J., Xu, X., Wang, B., Han, S., Sun, M., and Zhang, P. (2025). Learning-Based Task-Centric Multi-User Semantic Communication Solution for Vehicle Networks. IEEE Transactions on Vehicular Technology.
- Mnih, V., Badia, A. P., Mirza, M., Graves, A., Lillicrap, T., Harley, T., ... and Kavukcuoglu, K. (2016, June). Asynchronous methods for deep reinforcement learning. In International conference on machine learning (pp. 1928-1937). PMLR.
- Wang, Y., Zhang, C., Yu, T., and Ma, M. (2022). Recursive Least Squares Advantage Actor-Critic Algorithms. arXiv preprint arXiv:2201.05918.
- Rubell Marion Lincy, G., Sagar, S., Narayanan, V., Binu, D., Selby, N., and Thomas, S. E. Advantage Actor-Critic Reinforcement Learning with Technical Indicators for Stock Trading Decisions.
- Paczolay, G., and Harmati, I. (2020, October). A new advantage actor-critic algorithm for multi-agent environments. In 2020 23rd International Symposium on Measurement and Control in Robotics (ISMCR) (pp. 1-6). IEEE.
- Qin, S., Xie, X., Wang, J., Guo, X., Qi, L., Cai, W., ... and Talukder, Q. T. A. (2024). An Optimized Advantage Actor-Critic Algorithm for Disassembly Line Balancing Problem Considering Disassembly Tool Degradation. Mathematics, 12(6), 836.
- Ghosh, S. (2023). Relationship of Galerkin FEM with Central Difference Method.
- Kölle, M., Hgog, M., Ritz, F., Altmann, P., Zorn, M., Stein, J., and Linnhoff-Popien, C. (2024). Quantum advantage actor-critic for reinforcement learning. arXiv preprint arXiv:2401.07043.
- Benhamou, E. (2019). Variance reduction in actor critic methods (ACM). arXiv preprint arXiv:1907.09765.
- Peng, B., Li, X., Gao, J., Liu, J., Chen, Y. N., and Wong, K. F. (2018, April). Adversarial advantage actor-critic model for task-completion dialogue policy learning. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 6149-6153). IEEE.
- van Veldhuizen, V. (2022). Autotuning PID control using Actor-Critic Deep Reinforcement Learning. arXiv preprint arXiv:2212.00013.
- Cicek, D. C., Duran, E., Saglam, B., Mutlu, F. B., and Kozat, S. S. (2021, November). Off-policy correction for deep deterministic policy gradient algorithms via batch prioritized experience replay. In 2021 IEEE 33rd International Conference on Tools with Artificial Intelligence (ICTAI) (pp. 1255-1262). IEEE.
- Han, S., Zhou, W., Lü, S., and Yu, J. (2021). Regularly updated deterministic policy gradient algorithm. Knowledge-Based Systems, 214, 106736.
- Pan, L., Cai, Q., and Huang, L. (2020). Softmax deep double deterministic policy gradients. Advances in neural information processing systems, 33, 11767-11777.
- Luck, K. S., Vecerik, M., Stepputtis, S., Amor, H. B., and Scholz, J. (2019, November). Improved exploration through latent trajectory optimization in deep deterministic policy gradient. In 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (pp. 3704-3711). IEEE.
- Dong, R., Du, J., Liu, Y., Heidari, A. A., and Chen, H. (2023). An enhanced deep deterministic policy gradient algorithm for intelligent control of robotic arms. Frontiers in Neuroinformatics, 17, 1096053.
- Jesus, J. C., Bottega, J. A., Cuadros, M. A., and Gamarra, D. F. (2019, December). Deep deterministic policy gradient for navigation of mobile robots in simulated environments. In 2019 19th International Conference on Advanced Robotics (ICAR) (pp. 362-367). IEEE.
- Lin, T., Zhang, X., Gong, J., Tan, R., Li, W., Wang, L., ... and Gao, J. (2023). A dosing strategy model of deep deterministic policy gradient algorithm for sepsis patients. BMC Medical Informatics and Decision Making, 23(1), 81.
- Sumalatha, V., and Pabboju, S. (2024). Optimal Index Selection using Optimized Deep Deterministic Policy Gradient for NoSQL Database. Engineering, Technology and Applied Science Research, 14(6), 18125-18130.
- Tian, S., Zhu, X., Feng, B., Zheng, Z., Liu, H., and Li, Z. (2025). Partial Offloading Strategy Based on Deep Reinforcement Learning in the Internet of Vehicles. IEEE Transactions on Mobile Computing.
- Chen, H., Cui, H., Wang, J., Cao, P., He, Y., and Guizani, M. (2025). Computation Offloading Optimization for UAV-Based Cloud-Edge Collaborative Task Scheduling Strategy. IEEE Transactions on Cognitive Communications and Networking.
- Deng, J., Zhou, H., and Alouini, M. S. (2025). Distributed Coordination for Heterogeneous Non-Terrestrial Networks. arXiv preprint arXiv:2502.17366.
- Zhang, Y., Fan, W., Yu, Y., and Liu, Y. A. (2025). DRL-Based Resource Orchestration for Vehicular Edge Computing With Multi-Edge and Multi-Vehicle Assistance. IEEE Transactions on Intelligent Transportation Systems.
- Cuéllar, R., Posada, D., Henderson, T., and Karimi, R. R. Orbital Maneuver and Interplanetary Trajectory Design via Reinforcement Learning.
- Liu, L., Sun, M., Zhao, E., and Zhu, K. (2025). Three-Dimensional Dynamic Trajectory Planning for Autonomous Underwater Robots Under the PPO-IIFDS Framework. Journal of Marine Science and Engineering, 13(3), 445.
- Figueroa, N. F., Tafur, J. C., and Kheddar, A. (2025). Fast Autolearning for Multimodal Walking in Humanoid Robots with Variability of Experience. IEEE Robotics and Automation Letters.
- Xu, C., Zhang, P., and Yu, H. (2025). Lyapunov-Guided Resource Allocation and Task Scheduling for Edge Computing Cognitive Radio Networks via Deep Reinforcement Learning. IEEE Sensors Journal.
- Li, L., Jing, X., Liu, H., Lei, H., and Chen, Q. (2025). Adaptive Anti-Jamming Resource Allocation Scheme in Dynamic Jamming Environment. IEEE Transactions on Vehicular Technology.
- Chandrasiri, S., and Meedeniya, D. (2025). Energy-Efficient Dynamic Workflow Scheduling in Cloud Environments Using Deep Learning. Sensors, 25(5), 1428.
- Cox, D., and Ghosh, S. (2021). A Uniformly Distributed Congruence.
- Wu, Y., and Xie, N. (2025). Design of digital low-carbon system for smart buildings based on PPO algorithm. Sustainable Energy Research, 12(1), 1-14.
- Guan, Q., Cao, H., Jia, L., Yan, D., and Chen, B. (2025). Synergetic attention-driven transformer: A Deep reinforcement learning approach for vehicle routing problems. Expert Systems with Applications, 126961.
- Zhang, B., Wang, Y., and Dhillon, P. S. (2025). Policy Learning with a Natural Language Action Space: A Causal Approach. arXiv preprint arXiv:2502.17538.
- Zhang, C., Dai, L., Zhang, H., and Wang, Z. (2025). Control Barrier Function-Guided Deep Reinforcement Learning for Decision-Making of Autonomous Vehicle at On-Ramp Merging. IEEE Transactions on Intelligent Transportation Systems.
- Stanley, K. O., and Miikkulainen, R. (2002). Evolving neural networks through augmenting topologies. Evolutionary computation, 10(2), 99-127.
- Stanley, K. O., Bryant, B. D., and Miikkulainen, R. (2005). Real-time neuroevolution in the NERO video game. IEEE transactions on evolutionary computation, 9(6), 653-668.
- Gauci, J., and Stanley, K. (2007, July). Generating large-scale neural networks through discovering geometric regularities. In Proceedings of the 9th annual conference on Genetic and evolutionary computation (pp. 997-1004).
- Metzen, J. H., Edgington, M., Kassahun, Y., and Kirchner, F. (2007, December). Performance evaluation of EANT in the robocup keepaway benchmark. In Sixth International Conference on Machine Learning and Applications (ICMLA 2007) (pp. 342-347). IEEE.
- Kassahun, Y., and Sommer, G. (2005, April). Efficient reinforcement learning through Evolutionary Acquisition of Neural Topologies. In ESANN (pp. 259-266).
- Siebel, N. T., and Sommer, G. (2007). Evolutionary reinforcement learning of artificial neural networks. International Journal of Hybrid Intelligent Systems, 4(3), 171-183.
- Ghosh, S. (2025). Analysis of Creep Deformation. Preprints.
- Siebel, N. T., and Sommer, G. (2008, June). Learning defect classifiers for visual inspection images by neuro-evolution using weakly labelled training data. In 2008 IEEE Congress on Evolutionary Computation (IEEE World Congress on Computational Intelligence) (pp. 3925-3931). IEEE.
- Miikkulainen, R., Liang, J., Meyerson, E., Rawal, A., Fink, D., Francon, O., ... and Hodjat, B. (2024). Evolving deep neural networks. In Artificial intelligence in the age of neural networks and brain computing (pp. 269-287). Academic Press.
- Ghosh, S. Rate-Dependent Plastic Deformation Model of Euler-Bernoulli Beams.
- Liang, J., Meyerson, E., Hodjat, B., Fink, D., Mutch, K., and Miikkulainen, R. (2019, July). Evolutionary neural automl for deep learning. In Proceedings of the genetic and evolutionary computation conference (pp. 401-409).
- Vargas, D. V., and Murata, J. (2016). Spectrum-diverse neuroevolution with unified neural models. IEEE transactions on neural networks and learning systems, 28(8), 1759-1773.
- Such, F. P., Madhavan, V., Conti, E., Lehman, J., Stanley, K. O., and Clune, J. (2017). Deep neuroevolution: Genetic algorithms are a competitive alternative for training deep neural networks for reinforcement learning. arXiv preprint arXiv:1712.06567.
- Assunção, F., Lourenço, N., Ribeiro, B., and Machado, P. (2021). Fast-DENSER: Fast deep evolutionary network structured representation. SoftwareX, 14, 100694.
- Rempis, C. W. (2012). Evolving complex neuro-controllers with interactively constrained neuro-evolution (Doctoral dissertation, University of Osnabrück).
- Stanley, K. O., Clune, J., Lehman, J., and Miikkulainen, R. (2019). Designing neural networks through neuroevolution. Nature Machine Intelligence, 1(1), 24-35.
- Bertens, P., and Lee, S. W. (2019). Network of evolvable neural units: Evolving to learn at a synaptic level. arXiv preprint arXiv:1912.07589.
- Wang, Z., Zhou, Y., Takagi, T., Song, J., Tian, Y. S., and Shibuya, T. (2023). Genetic algorithm-based feature selection with manifold learning for cancer classification using microarray data. BMC bioinformatics, 24(1), 139.
- Pagliuca, P., Milano, N., and Nolfi, S. (2020). Efficacy of modern neuro-evolutionary strategies for continuous control optimization. Frontiers in Robotics and AI, 7, 98.
- Behjat, A., Chidambaran, S., and Chowdhury, S. (2019, May). Adaptive genomic evolution of neural network topologies (agent) for state-to-action mapping in autonomous agents. In 2019 International Conference on Robotics and Automation (ICRA) (pp. 9638-9644). IEEE.
- Ahmed, S. F., Alam, M. S. B., Hassan, M., Rozbu, M. R., Ishtiak, T., Rafa, N., ... and Gandomi, A. H. (2023). Deep learning modelling techniques: current progress, applications, advantages, and challenges. Artificial Intelligence Review, 56(11), 13521-13617.
- Miikkulainen, R. (2023, July). Evolution of neural networks. In Proceedings of the Companion Conference on Genetic and Evolutionary Computation (pp. 1008-1025).
- Kannan, A., Selvi, M., Santhosh Kumar, S. V. N., Thangaramya, K., and Shalini, S. (2024). Machine Learning Based Intelligent RPL Attack Detection System for IoT Networks. In Advanced Machine Learning with Evolutionary and Metaheuristic Techniques (pp. 241-256). Singapore: Springer Nature Singapore.
- Zeng, X., Cai, J., Liang, C., and Yuan, C. (2022). A hybrid model integrating long short-term memory with adaptive genetic algorithm based on individual ranking for stock index prediction. Plos one, 17(8), e0272637.
- KV, S., and Swamy, A. (2024). Enhancing Software Quality with Ensemble Machine Learning and Evolutionary Approaches.
- Gruau, F. (1993, April). Cellular encoding as a graph grammar. In IEE colloquium on grammatical inference: Theory, applications and alternatives (pp. 17-1). IET.
- Gruau, F., Whitley, D., and Pyeatt, L. (1996, July). A comparison between cellular encoding and direct encoding for genetic neural networks. In Proceedings of the 1st annual conference on genetic programming (pp. 81-89).
- Gruau, F., and Whitley, D. (1993). Adding learning to the cellular development of neural networks: Evolution and the Baldwin effect. Evolutionary computation, 1(3), 213-233.
- Gutierrez, G., Galvan, I., Molina, J., and Sanchis, A. (2004, July). Studying the capacity of cellular encoding to generate feedforward neural network topologies. In 2004 IEEE International Joint Conference on Neural Networks (IEEE Cat. No. 04CH37541) (Vol. 1, pp. 211-215). IEEE.
- Zhang, B. T., and Muhlenbein, H. (1993). Evolving optimal neural networks using genetic algorithms with Occam’s razor. Complex systems, 7(3), 199-220.
- Kitano, H. (1990). Designing neural networks using genetic algorithms with graph generation system. Complex Systems, 4(4), 461-476.
- Miller, J., and Turner, A. (2015, July). Cartesian genetic programming. In Proceedings of the Companion Publication of the 2015 Annual Conference on Genetic and Evolutionary Computation (pp. 179-198).
- Miller, J. F. (2020). Cartesian genetic programming: its status and future. Genetic Programming and Evolvable Machines, 21(1), 129-168.
- Hernández Ruiz, A. J., Vilalta Arias, A., and Moreno-Noguer, F. (2021). Neural cellular automata manifold. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 10015-10023). IEEE Computer Society Conference Publishing Services (CPS).
- Hajij, M., Istvan, K., and Zamzmi, G. (2020). Cell complex neural networks. arXiv preprint arXiv:2010.00743.
- Sun, W., Winnubst, J., Natrajan, M., Lai, C., Kajikawa, K., Bast, A., ... and Spruston, N. (2025). Learning produces an orthogonalized state machine in the hippocampus. Nature, 1-11.
- Guan, B., Chu, G., Wang, Z., Li, J., and Yi, B. (2025). Instance-level semantic segmentation of nuclei based on multimodal structure encoding. BMC bioinformatics, 26(1), 42.
- Ghosh, N., Dutta, P., and Santoni, D. (2025). TFBS-Finder: Deep Learning-based Model with DNABERT and Convolutional Networks to Predict Transcription Factor Binding Sites. arXiv preprint arXiv:2502.01311.
- Sun, R., Qian, L., Li, Y., Cheng, H., Xue, Z., Zhang, X., ... and Guo, T. (2025). A perturbation proteomics-based foundation model for virtual cell construction. bioRxiv, 2025-02.
- Grosjean, P., Shevade, K., Nguyen, C., Ancheta, S., Mader, K., Franco, I., ... and Kampmann, M. (2025). Network-aware self-supervised learning enables high-content phenotypic screening for genetic modifiers of neuronal activity dynamics. bioRxiv, 2025-02.
- Gonzalez, K. C., Noguchi, A., Zakka, G., Yong, H. C., Terada, S., Szoboszlay, M., ... and Losonczy, A. (2025). Visually guided in vivo single-cell electroporation for monitoring and manipulating mammalian hippocampal neurons. Nature Protocols, 1-17.
- de Carvalho, L. M., Carvalho, V. M., Camargo, A. P., and Papes, F. (2025). Gene network analysis identifies dysregulated pathways in an autism spectrum disorder caused by mutations in Transcription Factor 4. Scientific Reports, 15(1), 4993.
- Sprecher, S. G. (2025). Disentangling how the brain is wired. Fly, 19(1), 2440950.
- Li, S., Cai, Y., and Xia, Z. (2025). Function and regulation of non-neuronal cells in the nervous system. Frontiers in Cellular Neuroscience, 19, 1550903.
- Saunders, G., Angeline, P., and Pollack, J. (1993). Structural and behavioral evolution of recurrent networks. Advances in Neural Information Processing Systems, 6.
- Angeline, P. J., Saunders, G. M., and Pollack, J. B. (1994). An evolutionary algorithm that constructs recurrent neural networks. IEEE transactions on Neural Networks, 5(1), 54-65.
- Schmidhuber, J. (1999). A general method for incremental self-improvement and multi-agent learning. In Evolutionary Computation: Theory and Applications (pp. 81-123).
- Yao, X. (1999). Evolving artificial neural networks. Proceedings of the IEEE, 87(9), 1423-1447.
- Floreano, D., Dürr, P., and Mattiussi, C. (2008). Neuroevolution: from architectures to learning. Evolutionary intelligence, 1, 47-62.
- Gomez, F. J., and Miikkulainen, R. (1999, July). Solving non-Markovian control tasks with neuroevolution. In IJCAI (Vol. 99, pp. 1356-1361).
- Moriarty, D. E., and Miikkulainen, R. (1996). Efficient reinforcement learning through symbiotic evolution. Machine Learning, 22(1), 11-32.
- Gomez, F., and Miikkulainen, R. (1997). Incremental evolution of complex general behavior. Adaptive Behavior, 5(3-4), 317-342.
- MacQueen, J. (1967, January). Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Statistics (Vol. 5, pp. 281-298). University of California press.
- Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the royal statistical society: series B (methodological), 39(1), 1-22.
- Kohonen, T. (1982). Self-organized formation of topologically correct feature maps. Biological cybernetics, 43(1), 59-69.
- Belkin, M., and Niyogi, P. (2003). Laplacian eigenmaps for dimensionality reduction and data representation. Neural computation, 15(6), 1373-1396.
- Tishby, N., Pereira, F. C., and Bialek, W. (2000). The information bottleneck method. arXiv preprint physics/0004057.
- Hinton, G. E., and Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786), 504-507.
- Kingma, D. P., and Welling, M. (2013, December). Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114.
- Van der Maaten, L., and Hinton, G. (2008). Visualizing data using t-SNE. Journal of machine learning research, 9(11).
- Roweis, S. T., and Saul, L. K. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500), 2323-2326.
- Bell, A. J., and Sejnowski, T. J. (1995). An information-maximization approach to blind separation and blind deconvolution. Neural computation, 7(6), 1129-1159.
- Parmar, T. (2020). Leveraging Unsupervised Learning for Identifying Unknown Defects in New Semiconductor Products.
- Raikwar, T., and Gupta, D. (2025). AI-Driven Trust Management Framework for Secure Wireless Ad Hoc Networks.
- Moustakidis, S., Stergiou, K., Gee, M., Roshanmanesh, S., Hayati, F., Karlsson, P., and Papaelias, M. (2025). Deep Learning Autoencoders for Fast Fourier Transform-Based Clustering and Temporal Damage Evolution in Acoustic Emission Data from Composite Materials. Infrastructures, 10(3), 51.
- Liu, W., Ning, Q., Liu, G., Wang, H., Zhu, Y., and Zhong, M. (2025). Unsupervised feature selection algorithm based on L2,p-norm feature reconstruction. PLoS ONE, 20(3), e0318431.
- Zhou, M., Sun, T., Yan, Y., Jing, M., Gao, Y., Jiang, B., ... and Zhao, J. (2025). Metabolic subtypes in hypertriglyceridemia and associations with diseases: insights from population-based metabolome atlas. Journal of Translational Medicine, 23(1), 1-5.
- Lin, P., Cai, Y., Wu, H., Yin, J., and Luorang, Z. (2025). AI-Driven Risk Control for Health Insurance Fund Management: A Data-Driven Approach. International Journal of Computers Communications and Control, 20(2).
- Huang, Y., Hu, J., and Luo, R. (2025). FMDL: Enhancing Open-World Object Detection with foundation models and dynamic learning. Expert Systems with Applications, 127050.
- Wu, J., and Liu, C. (2025). VQ-VAE-2 Based Unsupervised Algorithm for Detecting Concrete Structural Apparent Cracks. Materials Today Communications, 112075.
- Nagelli, A., and Saleena, B. (2025). Aspect-based Sentiment Analysis with Ontology-assisted Recommender System on Multilingual Data using Optimised Self-attention and Adaptive Deep Learning Network. Journal of Information and Knowledge Management.
- Ekanayake, M. B. Deep Learning for Magnetic Resonance Image Reconstruction and Super-resolution (Doctoral dissertation, Monash University).
- LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278-2324.
- Friedman, J., Hastie, T., and Tibshirani, R. (2000). Additive logistic regression: a statistical view of boosting (with discussion and a rejoinder by the authors). The annals of statistics, 28(2), 337-407.
- Rosenblatt, F. (1958). The perceptron: a probabilistic model for information storage and organization in the brain. Psychological review, 65(6), 386.
- Schapire, R. E. (1990). The strength of weak learnability. Machine learning, 5, 197-227.
- Rafiei, M., Shojaei, A., and Chau, Y. (2025). Machine learning-assisted design of immunomodulatory lipid nanoparticles for delivery of mRNA to repolarize hyperactivated microglia. Drug Delivery, 32(1), 2465909.
- Pei, Z., Wu, X., Wu, X., Xiao, Y., Yu, P., Gao, Z., ... and Guo, W. (2025). Segmenting Vegetation from UAV Images via Spectral Reconstruction in Complex Field Environments. Plant Phenomics, 100021.
- Efendi, A., Ammarullah, M. I., Isa, I. G. T., Sari, M. P., Izza, J. N., Nugroho, Y. S., ... and Alfian, D. (2025). IoT-Based Elderly Health Monitoring System Using Firebase Cloud Computing. Health Science Reports, 8(3), e70498.
- Pang, Y. T., Kuo, K. M., Yang, L., and Gumbart, J. C. (2025). DeepPath: Overcoming data scarcity for protein transition pathway prediction using physics-based deep learning. bioRxiv, 2025-02.
- Curry, A., Singer, M., Musu, A., and Caricchi, L. Supervised and Unsupervised Machine Learning Applied to an Ignimbrite Flare-Up in the Central San Juan Caldera Cluster, Colorado.
- Li, X., Ouyang, Q., Han, M., Liu, X., He, F., Zhu, Y., ... and Ma, J. (2025). π-PhenoDrug: A Comprehensive Deep Learning-Based Pipeline for Phenotypic Drug Screening in High-Content Analysis. Advanced Intelligent Systems, 2400635.
- Liu, Y., Deng, L., Ding, F., Zhang, W., Zhang, S., Zeng, B., ... and Wu, L. (2025). Discovery of ASGR1 and HMGCR dual-target inhibitors based on supervised learning, molecular docking, molecular dynamic simulations, and biological evaluation. Bioorganic Chemistry, 108326.
- Dutta, R., and Karmakar, S. (2024, March). Ransomware Detection in Healthcare Organizations Using Supervised Learning Models: Random Forest Technique. In International Conference on Emerging Trends and Technologies on Intelligent Systems (pp. 385-395). Singapore: Springer Nature Singapore.
- Chechik, G., Globerson, A., Tishby, N., and Weiss, Y. (2003). Information bottleneck for Gaussian variables. Advances in Neural Information Processing Systems, 16.
- Chechik, G., and Tishby, N. (2002). Extracting relevant structures with side information. Advances in Neural Information Processing Systems, 15.
- Tishby, N., and Zaslavsky, N. (2015, April). Deep learning and the information bottleneck principle. In 2015 IEEE Information Theory Workshop (ITW) (pp. 1-5). IEEE.
- Saxe, A. M., Bansal, Y., Dapello, J., Advani, M., Kolchinsky, A., Tracey, B. D., and Cox, D. D. (2019). On the information bottleneck theory of deep learning. Journal of Statistical Mechanics: Theory and Experiment, 2019(12), 124020.
- Shwartz-Ziv, R., and Tishby, N. (2017). Opening the black box of deep neural networks via information. arXiv preprint arXiv:1703.00810.
- Noshad, M., Zeng, Y., and Hero, A. O. (2019, May). Scalable mutual information estimation using dependence graphs. In ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 2962-2966). IEEE.
- Goldfeld, Z., Berg, E. V. D., Greenewald, K., Melnyk, I., Nguyen, N., Kingsbury, B., and Polyanskiy, Y. (2018). Estimating information flow in deep neural networks. arXiv preprint arXiv:1810.05728.
- Geiger, B. C. (2021). On information plane analyses of neural network classifiers—A review. IEEE Transactions on Neural Networks and Learning Systems, 33(12), 7039-7051.
- Kawaguchi, K., Deng, Z., Ji, X., and Huang, J. (2023, July). How does information bottleneck help deep learning?. In International Conference on Machine Learning (pp. 16049-16096). PMLR.
- Dardour, O., Aguilar, E., Radeva, P., and Zaied, M. (2025). Inter-separability and intra-concentration to enhance stochastic neural network adversarial robustness. Pattern Recognition Letters.
- Krinner, M., Aljalbout, E., Romero, A., and Scaramuzza, D. (2025). Accelerating Model-Based Reinforcement Learning with State-Space World Models. arXiv preprint arXiv:2502.20168.
- Yildirim, A. B., Pehlivan, H., and Dundar, A. (2024). Warping the residuals for image editing with stylegan. International Journal of Computer Vision, 1-16.
- Yang, Y., Wang, Y., Ma, C., Yu, L., Chersoni, E., and Huang, C. R. (2025). Sparse Brains are Also Adaptive Brains: Cognitive-Load-Aware Dynamic Activation for LLMs. arXiv preprint arXiv:2502.19078.
- Liu, H., Jia, C., Shi, F., Cheng, X., and Chen, S. (2025). SCSegamba: Lightweight Structure-Aware Vision Mamba for Crack Segmentation in Structures. arXiv preprint arXiv:2503.01113.
- Stierle, M., and Valtere, L. Addressing the Gene Therapy Bottleneck in the EU: Patent vs. Regulatory Incentives. Gewerblicher Rechtsschutz und Urheberrecht. Internationaler Teil.
- Chen, Z. S., Tan, Y., Ma, Z., Zhu, Z., and Skibniewski, M. J. (2025). Unlocking the potential of quantum computing in prefabricated construction supply chains: Current trends, challenges, and future directions. Information Fusion, 103043.
- Yuan, X., Smith, N. S., and Moghe, G. D. (2025). Analysis of plant metabolomics data using identification-free approaches. Applications in Plant Sciences, e70001.
- Dey, A., Sarkar, S., Mondal, A., and Mitra, P. (2025). Spatio-Temporal NDVI Prediction for Rice Crop. SN Computer Science, 6(3), 1-13.
- Li, W. (2025). Navigation path extraction for garden mobile robot based on road median point. EURASIP Journal on Advances in Signal Processing, 2025(1), 6.
- Smolensky, P. (1986). Information processing in dynamical systems: Foundations of harmony theory.
- Carreira-Perpinan, M. A., and Hinton, G. (2005, January). On contrastive divergence learning. In International workshop on artificial intelligence and statistics (pp. 33-40). PMLR.
- Hinton, G. E. (2012). A practical guide to training restricted Boltzmann machines. In Neural Networks: Tricks of the Trade: Second Edition (pp. 599-619). Berlin, Heidelberg: Springer Berlin Heidelberg.
- Fischer, A., and Igel, C. (2014). Training restricted Boltzmann machines: An introduction. Pattern Recognition, 47(1), 25-39.
- Larochelle, H., and Bengio, Y. (2008, July). Classification using discriminative restricted Boltzmann machines. In Proceedings of the 25th international conference on Machine learning (pp. 536-543).
- Salakhutdinov, R., Mnih, A., and Hinton, G. (2007, June). Restricted Boltzmann machines for collaborative filtering. In Proceedings of the 24th international conference on Machine learning (pp. 791-798).
- Coates, A., Ng, A., and Lee, H. (2011, June). An analysis of single-layer networks in unsupervised feature learning. In Proceedings of the fourteenth international conference on artificial intelligence and statistics (pp. 215-223). JMLR Workshop and Conference Proceedings.
- Hinton, G. E., and Salakhutdinov, R. R. (2009). Replicated softmax: an undirected topic model. Advances in neural information processing systems, 22.
- Adachi, S. H., and Henderson, M. P. (2015). Application of quantum annealing to training of deep neural networks. arXiv preprint arXiv:1510.06356.
- Salloum, H., Nayal, L., and Mazzara, M. Evaluating the Advantage 2 Quantum Annealer Prototype: A Comparative Evaluation with Advantage 1 and Hybrid Solver and Classical Restricted Boltzmann Machines on MNIST Classification.
- Joudaki, M. (2025). A Comprehensive Literature Review on the Use of Restricted Boltzmann Machines and Deep Belief Networks for Human Action Recognition.
- Prat Pou, A., Romero, E., Martí, J., and Mazzanti, F. (2025). Mean Field Initialization of the Annealed Importance Sampling Algorithm for an Efficient Evaluation of the Partition Function Using Restricted Boltzmann Machines. Entropy, 27(2), 171.
- Decelle, A., Gómez, A. D. J. N., and Seoane, B. (2025). Inferring High-Order Couplings with Neural Networks. arXiv preprint arXiv:2501.06108.
- Savitha, S., Kannan, A. R., and Logeswaran, K. (2025). Augmenting Cardiovascular Disease Prediction Through CWCF Integration Leveraging Harris Hawks Search in Deep Belief Networks. Cognitive Computation, 17(1), 52.
- Béreux, N., Decelle, A., Furtlehner, C., Rosset, L., and Seoane, B. (2025, April). Fast training and sampling of Restricted Boltzmann Machines. In 13th International Conference on Learning Representations-ICLR 2025.
- Thériault, R., Tosello, F., and Tantari, D. (2024). Modelling structured data learning with restricted boltzmann machines in the teacher-student setting. arXiv preprint arXiv:2410.16150.
- Manimurugan, S., Karthikeyan, P., Narmatha, C., Aborokbah, M. M., Paul, A., Ganesan, S., ... and Ammad-Uddin, M. (2024). A hybrid Bi-LSTM and RBM approach for advanced underwater object detection. PloS one, 19(11), e0313708.
- Hossain, M. M., Han, T. A., Ara, S. S., and Shamszaman, Z. U. (2025). Benchmarking Classical, Deep, and Generative Models for Human Activity Recognition. arXiv preprint arXiv:2501.08471.
- Qin, Y., Peng, Z., Miao, L., Chen, Z., Ouyang, J., and Yang, X. (2025). Integrating nanodevice and neuromorphic computing for enhanced magnetic anomaly detection. Measurement, 244, 116532.
- Lee, H., Grosse, R., Ranganath, R., and Ng, A. Y. (2009, June). Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In Proceedings of the 26th annual international conference on machine learning (pp. 609-616).
- Mohamed, A. R., Dahl, G. E., and Hinton, G. (2011). Acoustic modeling using deep belief networks. IEEE transactions on audio, speech, and language processing, 20(1), 14-22.
- Peng, K., Jiao, R., Dong, J., and Pi, Y. (2019). A deep belief network based health indicator construction and remaining useful life prediction using improved particle filter. Neurocomputing, 361, 19-28.
- Zhang, Z., and Zhao, J. (2017). A deep belief network based fault diagnosis model for complex chemical processes. Computers and chemical engineering, 107, 395-407.
- Liu, H. (2018). Leveraging financial news for stock trend prediction with attention-based recurrent neural network. arXiv preprint arXiv:1811.06173.
- Zhang, D., Zou, L., Zhou, X., and He, F. (2018). Integrating feature selection and feature extraction methods with deep learning to predict clinical outcome of breast cancer. IEEE Access, 6, 28936-28944.
- Hoang, D. T., and Kang, H. J. (2018, June). Deep belief network and dempster-shafer evidence theory for bearing fault diagnosis. In 2018 IEEE 27th international symposium on industrial electronics (ISIE) (pp. 841-846). IEEE.
- Zhong, P., Gong, Z., Li, S., and Schönlieb, C. B. (2017). Learning to diversify deep belief networks for hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing, 55(6), 3516-3530.
- Alzughaibi, A. (2025). Leveraging Pattern Recognition based Fusion Approach for Pest Detection using Modified Artificial Hummingbird Algorithm with Deep Learning. Appl. Math, 19(3), 509-518.
- Tausani, L., Testolin, A., and Zorzi, M. (2025). Investigating the intrinsic top-down dynamics of deep generative models. Scientific Reports, 15(1), 2875.
- Kumar, S., and Ravi, V. (2025, January). XDATE: eXplainable Deep belief network-based Auto-encoder with exTended Garson Algorithm. In 2025 17th International Conference on COMmunication Systems and NETworks (COMSNETS) (pp. 108-113). IEEE.
- Alhajlah, M. (2024). Automated lesion detection in gastrointestinal endoscopic images: leveraging deep belief networks and genetic algorithm-based Segmentation. Multimedia Tools and Applications, 1-15.
- Pavithra, D., Bharathraj, R., Poovizhi, P., Libitharan, K., and Nivetha, V. (2025). Detection of IoT Attacks Using Hybrid RNN-DBN Model. Generative Artificial Intelligence: Concepts and Applications, 209-225.
- Bhadane, S. N., and Verma, P. (2024, November). Review of Machine Learning and Deep Learning algorithms for Personality traits classification. In 2024 2nd DMIHER International Conference on Artificial Intelligence in Healthcare, Education and Industry (IDICAIEI) (pp. 1-6). IEEE.
- Keivanimehr, A. R., and Akbari, M. (2025). TinyML and edge intelligence applications in cardiovascular disease: A survey. Computers in Biology and Medicine, 186, 109653.
- Kobak, D., and Berens, P. (2019). The art of using t-SNE for single-cell transcriptomics. Nature communications, 10(1), 5416.
- Belkina, A. C., Ciccolella, C. O., Anno, R., Halpert, R., Spidlen, J., and Snyder-Cappione, J. E. (2019). Automated optimized parameters for T-distributed stochastic neighbor embedding improve visualization and analysis of large datasets. Nature communications, 10(1), 5415.
- Linderman, G. C., and Steinerberger, S. (2019). Clustering with t-SNE, provably. SIAM journal on mathematics of data science, 1(2), 313-332.
- De Amorim, R. C., and Mirkin, B. (2012). Minkowski metric, feature weighting and anomalous cluster initializing in K-Means clustering. Pattern Recognition, 45(3), 1061-1075.
- Wattenberg, M., Viégas, F., and Johnson, I. (2016). How to use t-SNE effectively. Distill, 1(10), e2.
- Pezzotti, N., Lelieveldt, B. P., Van Der Maaten, L., Höllt, T., Eisemann, E., and Vilanova, A. (2016). Approximated and user steerable tSNE for progressive visual analytics. IEEE transactions on visualization and computer graphics, 23(7), 1739-1752.
- Kobak, D., and Linderman, G. C. (2021). Initialization is critical for preserving global data structure in both t-SNE and UMAP. Nature biotechnology, 39(2), 156-157.
- Becht, E., McInnes, L., Healy, J., Dutertre, C. A., Kwok, I. W., Ng, L. G., ... and Newell, E. W. (2019). Dimensionality reduction for visualizing single-cell data using UMAP. Nature biotechnology, 37(1), 38-44.
- Moon, K. R., Van Dijk, D., Wang, Z., Gigante, S., Burkhardt, D. B., Chen, W. S., ... and Krishnaswamy, S. (2019). Visualizing structure and transitions in high-dimensional biological data. Nature biotechnology, 37(12), 1482-1492.
- Rivera, G., and Deniega, J. V. Artificial Intelligence-Driven Automation of Flow Cytometry Gating. Capstone Chronicles, 186.
- Chang, Y. C. I. (2025). A Survey: Potential Dimensionality Reduction Methods. arXiv preprint arXiv:2502.11036.
- Chern, W. C., Gunay, E., Okudan-Kremer, G. E., and Kremer, P. Exploring the Impact of Defect Attributes and Augmentation Variability on Recent Yolo Variants for Metal Defect Detection. Available at SSRN 5149346.
- Li, D., Monteiro, D. D. G. N., Jiang, H., and Chen, Q. (2025). Qualitative analysis of wheat aflatoxin B1 using olfactory visualization technique based on natural anthocyanins. Journal of Food Composition and Analysis, 107359.
- Singh, M., and Singh, M. K. (2025). Content-Based Gastric Image Retrieval Using Fusion of Deep Learning Features with Dimensionality Reduction. SN Computer Science, 6(2), 1-12.
- Sun, J. Q., Zhang, C., Liu, G. D., and Zhang, C. Detecting Muscle Fatigue during Lower Limb Isometric Contractions Tasks: A Machine Learning Approach. Frontiers in Physiology, 16, 1547257.
- Su, Z., Xiao, X., Tong, D., Wang, X., Zhong, Z., Zhao, P., and Yu, J. (2025, March). Seismic fragility of earth-rock dams with heterogeneous compacted materials using deep learning-aided intensity measure. In Structures (Vol. 73, p. 108373). Elsevier.
- Yousif, A. Y., and Al-Sarray, B. (2025, March). Integrating t-SNE and spectral clustering via convex optimization for enhanced breast cancer gene expression data diagnosing. In AIP Conference Proceedings (Vol. 3264, No. 1). AIP Publishing.
- Park, M. S., Lee, J. K., Kim, B., Ju, H. Y., Yoo, K. H., Jung, C. W., ... and Kim, H. Y. (2025). Assessing the clinical applicability of dimensionality reduction algorithms in flow cytometry for hematologic malignancies. Clinical Chemistry and Laboratory Medicine (CCLM), (0).
- Qiao, S., YANG, L., ZHANG, G., LU, A., and LI, F. (2025). Abstract B097: Cancer-associated fibroblasts in pancreatic ductal adenocarcinoma patients defined by a core inflammatory gene network exhibited inflammatory characteristics with high CCN2 expression. Cancer Immunology Research, 13(2-Supplement), B097-B097.
- Saul, L. K., and Roweis, S. T. (2000). An introduction to locally linear embedding. Unpublished. Available at: http://www.cs.toronto.edu/~roweis/lle/publications.html.
- Polito, M., and Perona, P. (2001). Grouping and dimensionality reduction by locally linear embedding. Advances in neural information processing systems, 14.
- Zhang, Z., and Zha, H. (2004). Principal manifolds and nonlinear dimensionality reduction via tangent space alignment. SIAM journal on scientific computing, 26(1), 313-338.
- Donoho, D. L., and Grimes, C. (2003). Hessian eigenmaps: Locally linear embedding techniques for high-dimensional data. Proceedings of the National Academy of Sciences, 100(10), 5591-5596.
- Zhang, Z., and Wang, J. (2006). MLLE: Modified locally linear embedding using multiple weights. Advances in neural information processing systems, 19.
- Liang, P. (2005). Semi-supervised learning for natural language (Doctoral dissertation, Massachusetts Institute of Technology).
- Coates, A., and Ng, A. Y. (2012). Learning feature representations with k-means. In Neural Networks: Tricks of the Trade: Second Edition (pp. 561-580). Berlin, Heidelberg: Springer Berlin Heidelberg.
- Hyvärinen, A., and Oja, E. (2000). Independent component analysis: algorithms and applications. Neural networks, 13(4-5), 411-430.
- Lee, H., Battle, A., Raina, R., and Ng, A. (2006). Efficient sparse coding algorithms. Advances in neural information processing systems, 19.
- Yang, B., Gu, X., An, S., Song, K., Wang, S., Qiu, X., and Meng, X. (2025). Assessment of Chinese Cities' International Tourism Competitiveness Using an Integrated Entropy-TOPSIS and GRA Model.
- Wang, Y., Ma, T., Shen, L., Wang, X., and Luo, R. (2025). Prediction of thermal conductivity of natural rock materials using LLE-transformer-lightGBM model for geothermal energy applications. Energy Reports, 13, 2516-2530.
- Jin, X., Li, H., Xu, X., Xu, Z., and Su, F. (2025). Inverse Synthetic Aperture Radar Image Multi-Modal Zero-Shot Learning Based on the Scattering Center Model and Neighbor-Adapted Locally Linear Embedding. Remote Sensing, 17(4), 725.
- Li, X., Zhu, Z., Hui, L., Ma, X., Li, D., Yang, Z., and Nai, W. (2024, December). Locally Linear Embedding Based on Neiderreit Sequence Initialized Ali Baba and The Forty Thieves Algorithm. In 2024 IEEE 4th International Conference on Information Technology, Big Data and Artificial Intelligence (ICIBA) (Vol. 4, pp. 1466-1470). IEEE.
- Jafari, P., Espandar, E., Baharifard, F., and Chakraverty, S. (2025). Linear local embedding. In Dimensionality Reduction in Machine Learning (pp. 129-156). Morgan Kaufmann.
- Zhou, X., Ye, D., Yin, C., Wu, Y., Chen, S., Ge, X., ... and Liu, Q. (2025). Application of Machine Learning in Terahertz-Based Nondestructive Testing of Thermal Barrier Coatings with High-Temperature Growth Stresses. Coatings, 15(1), 49.
- Dou, F., Ju, Y., and Cheng, C. (2024, December). Fault detection based on locally linear embedding for traction systems in high-speed trains. In Fourth International Conference on Testing Technology and Automation Engineering (TTAE 2024) (Vol. 13439, pp. 314-319). SPIE.
- Bagherzadeh, M., Kahani, N., and Briand, L. (2021). Reinforcement learning for test case prioritization. IEEE Transactions on Software Engineering, 48(8), 2836-2856.
- Liu, H., Yang, B., Kang, F., Li, Q., and Zhang, H. (2025). Intelligent recognition algorithm of connection relation of substation secondary wiring drawing based on D-LLE algorithm. Discover Applied Sciences, 7(1), 1-12.
- Comon, P. (1994). Independent component analysis, a new concept?. Signal processing, 36(3), 287-314.
- Jutten, C., and Herault, J. (1991). Blind separation of sources, part I: An adaptive algorithm based on neuromimetic architecture. Signal processing, 24(1), 1-10.
- Hyvärinen, A., and Oja, E. (1997). A fast fixed-point algorithm for independent component analysis. Neural computation, 9(7), 1483-1492.
- Cardoso, J. F., and Souloumiac, A. (1993, December). Blind beamforming for non-Gaussian signals. In IEE proceedings F (radar and signal processing) (Vol. 140, No. 6, pp. 362-370). IEE.
- Amari, S. I., Cichocki, A., and Yang, H. (1995). A new learning algorithm for blind signal separation. Advances in neural information processing systems, 8.
- Lee, T. W., Girolami, M., and Sejnowski, T. J. (1999). Independent component analysis using an extended infomax algorithm for mixed subgaussian and supergaussian sources. Neural computation, 11(2), 417-441.
- Pham, D. T., and Garat, P. (1997). Blind separation of mixture of independent sources through a quasi-maximum likelihood approach. IEEE transactions on Signal Processing, 45(7), 1712-1725.
- Højen-Sørensen, P. A., Winther, O., and Hansen, L. K. (2002). Mean-field approaches to independent component analysis. Neural Computation, 14(4), 889-918.
- Stone, J. V. (2004). Independent Component Analysis: A Tutorial Introduction. MIT Press.
- Behzadfar, N., Mathalon, D., Preda, A., Iraji, A., and Calhoun, V. D. (2025). A multi-frequency ICA-based approach for estimating voxelwise frequency difference patterns in fMRI data. Aperture Neuro, 5.
- Eierud, C., Norgaard, M., Bilgel, M., Petropoulos, H., Fu, Z., Iraji, A., ... and Calhoun, V. (2025). Building Multivariate Molecular Imaging Brain Atlases Using the NeuroMark PET Independent Component Analysis Framework. bioRxiv, 2025-02.
- Wang, J., Shen, Y., Awange, J., Tangdamrongsub, N., Feng, T., Hu, K., ... and Wang, X. (2025). Exploring potential drivers of terrestrial water storage anomaly trends in the Yangtze River Basin (2002–2019). Journal of Hydrology: Regional Studies, 58, 102264.
- Heurtebise, A., Chehab, O., Ablin, P., Gramfort, A., and Hyvärinen, A. (2025). Identifiable Multi-View Causal Discovery Without Non-Gaussianity. arXiv preprint arXiv:2502.20115.
- Ouyang, G., and Li, Y. (2025). Protocol for semi-automatic EEG preprocessing incorporating independent component analysis and principal component analysis. STAR Protocols, 6(1), 103682.
- Zhang, G., and Luck, S. (2025). Assessing the impact of artifact correction and artifact rejection on the performance of SVM-based decoding of EEG signals. bioRxiv, 2025-02.
- Kirsten, O., and Süssmuth, B. (2025). Forecasting the unforecastable: An independent component analysis for majority game-like global cryptocurrencies. Physica A: Statistical Mechanics and its Applications, 130472.
- Jung, S., Kim, J., and Kim, S. (2025). A hybrid fault detection method of independent component analysis and auto-associative kernel regression for process monitoring in power plant. IEEE Access.
- Wang, Z., Hu, L., Wang, Y., Li, H., Li, J., Tian, Z., and Zhou, H. (2025, February). A dual five-element stereo array passive acoustic localization fusion method. In Fourth International Computational Imaging Conference (CITA 2024) (Vol. 13542, pp. 17-28). SPIE.
- Luo, W., Xiong, S., Li, Y., and Jiang, P. (2025, March). Research on brain signal acquisition and transmission via noninvasive brain-computer interface. In Third International Conference on Algorithms, Network, and Communication Technology (ICANCT 2024) (Vol. 13545, pp. 366-374). SPIE.
- McMahan, B., Moore, E., Ramage, D., Hampson, S., and y Arcas, B. A. (2017, April). Communication-efficient learning of deep networks from decentralized data. In Artificial intelligence and statistics (pp. 1273-1282). PMLR.
- Kairouz, P., McMahan, H. B., Avent, B., Bellet, A., Bennis, M., Bhagoji, A. N., ... and Zhao, S. (2021). Advances and open problems in federated learning. Foundations and Trends® in Machine Learning, 14(1–2), 1-210.
- Abadi, M., Chu, A., Goodfellow, I., McMahan, H. B., Mironov, I., Talwar, K., and Zhang, L. (2016, October). Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC conference on computer and communications security (pp. 308-318).
- Bonawitz, K., Ivanov, V., Kreuter, B., Marcedone, A., McMahan, H. B., Patel, S., ... and Seth, K. (2017, October). Practical secure aggregation for privacy-preserving machine learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security (pp. 1175-1191).
- Zhao, Y., Li, M., Lai, L., Suda, N., Civin, D., and Chandra, V. (2018). Federated learning with non-iid data. arXiv preprint arXiv:1806.00582.
- Cox, D., and Ghosh, S. (2022). An Analogue of Lagarias’ Inequality Pertaining to the Riemann Hypothesis. Global Journal of Pure and Applied Mathematics, 18(2), 735-752.
- Sattler, F., Wiedemann, S., Müller, K. R., and Samek, W. (2019). Robust and communication-efficient federated learning from non-iid data. IEEE transactions on neural networks and learning systems, 31(9), 3400-3413.
- Reddi, S., Charles, Z., Zaheer, M., Garrett, Z., Rush, K., Konečný, J., ... and McMahan, H. B. (2020). Adaptive federated optimization. arXiv preprint arXiv:2003.00295.
- Sattler, F., Marban, A., Rischke, R., and Samek, W. (2020). Communication-efficient federated distillation. arXiv preprint arXiv:2012.00632.
- Fallah, A., Mokhtari, A., and Ozdaglar, A. (2020). Personalized federated learning with theoretical guarantees: A model-agnostic meta-learning approach. Advances in neural information processing systems, 33, 3557-3568.
- Sheller, M. J., Edwards, B., Reina, G. A., Martin, J., Pati, S., Kotrotsou, A., ... and Bakas, S. (2020). Federated learning in medicine: facilitating multi-institutional collaborations without sharing patient data. Scientific reports, 10(1), 12598.
- Byrd, D., and Polychroniadou, A. (2020, October). Differentially private secure multi-party computation for federated learning in financial applications. In Proceedings of the first ACM international conference on AI in finance (pp. 1-9).
- Jagatheesaperumal, S. K., Rahouti, M., Ahmad, K., Al-Fuqaha, A., and Guizani, M. (2021). The duo of artificial intelligence and big data for industry 4.0: Applications, techniques, challenges, and future research directions. IEEE Internet of Things Journal, 9(15), 12861-12885.
- Meduri, K., Nadella, G. S., Yadulla, A. R., Kasula, V. K., Maturi, M. H., Brown, S., ... and Gonaygunta, H. (2024). Leveraging Federated Learning for Privacy-Preserving Analysis of Multi-Institutional Electronic Health Records in Rare Disease Research. Journal of Economy and Technology.
- Tzortzis, I. N., Gutierrez-Torre, A., Sykiotis, S., Agulló, F., Bakalos, N., Doulamis, A., ... and Berral, J. L. (2025). Towards generalizable Federated Learning in Medical Imaging: A real-world case study on mammography data. Computational and Structural Biotechnology Journal.
- Szelag, J. K., Chin, J. J., and Yip, S. C. (2025). Adaptive Adversaries in Byzantine-Robust Federated Learning: A survey. Cryptology ePrint Archive.
- Cox, D., Ghosh, S., and Sultanow, E. (2022). Abundant Numbers and the Riemann Hypothesis. Global Journal of Pure and Applied Mathematics, 18(2), 613-637.
- Ferretti, S., Cassano, L., Cialone, G., D’Abramo, J., and Imboccioli, F. (2025). Decentralized coordination for resilient federated learning: A blockchain-based approach with smart contracts and decentralized storage. Computer Communications, 108112.
- Chen, Z., Hoang, D., Piran, F. J., Chen, R., and Imani, F. (2025). Federated Hyperdimensional Computing for hierarchical and distributed quality monitoring in smart manufacturing. Internet of Things, 101568.
- Mei, Q., Huang, R., Li, D., Li, J., Shi, N., Du, M., ... and Tian, C. (2025). Intelligent hierarchical federated learning system based on semi-asynchronous and scheduled synchronous control strategies in satellite network. Autonomous Intelligent Systems, 5(1), 9.
- Rawas, S., and Samala, A. D. (2025). EAFL: Edge-Assisted Federated Learning for real-time disease prediction using privacy-preserving AI. Iran Journal of Computer Science, 1-11.
- Becker, C., Peregrina, J. A., Beccard, F., Mohr, M., and Zirpins, C. (2025). A Study on the Efficiency of Combined Reconstruction and Poisoning Attacks in Federated Learning. Journal of Data Science and Intelligent Systems.
- Fu, H., Tian, F., Deng, G., Liang, L., and Zhang, X. (2025). Reads: A Personalized Federated Learning Framework with Fine-grained Layer Aggregation and Decentralized Clustering. IEEE Transactions on Mobile Computing.
- Li, Y., Kundu, S. S., Boels, M., Mahmoodi, T., Ourselin, S., Vercauteren, T., ... and Granados, A. (2025). UltraFlwr–An Efficient Federated Medical and Surgical Object Detection Framework. arXiv preprint arXiv:2503.15161.
- Shi, C., Li, J., Zhao, H., and Chang, Y. (2025). FedLWS: Federated Learning with Adaptive Layer-wise Weight Shrinking. arXiv preprint arXiv:2503.15111.
- Cox, D., Ghosh, S., and Sultanow, E. (2021). Fermat’s Last Theorem and Related Problems. Journal of Advances in Mathematics and Computer Science, 36(5), 6-34.
- Choudhary, S. K. Real-time fraud detection using AI-driven analytics in the cloud: Success stories and applications.
- Zhou, Z., He, Y., Zhang, W., Ding, Z., Wu, B., and Xiao, K. Blockchain-Empowered Cluster Distillation Federated Learning for Heterogeneous Smart Grids. Available at SSRN 5187086.
- Cox, D., Ghosh, S., and Sultanow, E. (2020). Bounds of the Mertens Function. arXiv preprint arXiv:2012.11756.
- Scheirer, W. J., de Rezende Rocha, A., Sapkota, A., and Boult, T. E. (2012). Toward open set recognition. IEEE transactions on pattern analysis and machine intelligence, 35(7), 1757-1772.
- Bendale, A., and Boult, T. (2015). Towards open world recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1893-1902).
- Panareda Busto, P., and Gall, J. (2017). Open set domain adaptation. In Proceedings of the IEEE international conference on computer vision (pp. 754-763).
- Saito, K., Yamamoto, S., Ushiku, Y., and Harada, T. (2018). Open set domain adaptation by backpropagation. In Proceedings of the European conference on computer vision (ECCV) (pp. 153-168).
- Geng, C., Huang, S. J., and Chen, S. (2020). Recent advances in open set recognition: A survey. IEEE transactions on pattern analysis and machine intelligence, 43(10), 3614-3631.
- Cox, D., Ghosh, S., and Sultanow, E. (2021). The Farey Sequence and the Mertens Function. arXiv preprint arXiv:2105.12352.
- Chen, G., Qiao, L., Shi, Y., Peng, P., Li, J., Huang, T., ... and Tian, Y. (2020). Learning open set network with discriminative reciprocal points. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part III 16 (pp. 507-522). Springer International Publishing.
- Liu, B., Kang, H., Li, H., Hua, G., and Vasconcelos, N. (2020). Few-shot open-set recognition using meta-learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 8798-8807).
- Kong, S., and Ramanan, D. (2021). Opengan: Open-set recognition via open data generation. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 813-822).
- Fang, Z., Lu, J., Liu, A., Liu, F., and Zhang, G. (2021, July). Learning bounds for open-set learning. In International conference on machine learning (pp. 3122-3132). PMLR.
- Mandivarapu, J. K., Camp, B., and Estrada, R. (2022). Deep active learning via open-set recognition. Frontiers in Artificial Intelligence, 5, 737363.
- Engelbrecht, E. R., and Preez, J. A. D. (2020). Open-set learning with augmented categories by exploiting unlabelled data. arXiv preprint arXiv:2002.01368.
- Shao, J. J., Yang, X. W., and Guo, L. Z. (2024). Open-set learning under covariate shift. Machine Learning, 113(4), 1643-1659.
- Park, J., Park, H., Jeong, E., and Teoh, A. B. J. (2024). Understanding open-set recognition by Jacobian norm and inter-class separation. Pattern Recognition, 145, 109942.
- Jormakka, J., and Ghosh, S. (2021). On primes which are congruent numbers. Journal of the Ramanujan Mathematical Society.
- Liu, Y. C., Ma, C. Y., Dai, X., Tian, J., Vajda, P., He, Z., and Kira, Z. (2022, October). Open-set semi-supervised object detection. In European conference on computer vision (pp. 143-159). Cham: Springer Nature Switzerland.
- Vaze, S., Han, K., Vedaldi, A., and Zisserman, A. (2021). Open-set recognition: A good closed-set classifier is all you need?
- Barcina-Blanco, M., Lobo, J. L., Garcia-Bringas, P., and Del Ser, J. (2023). Managing the unknown: a survey on open set recognition and tangential areas. arXiv preprint arXiv:2312.08785.
- iCGY96. (2023). Awesome Open Set Recognition List. GitHub. Retrieved April 1, 2025, from https://github.com/iCGY96/awesome_OpenSetRecognition_list.
- Wikipedia contributors. (n.d.). Topological deep learning. Wikipedia, The Free Encyclopedia. Retrieved April 1, 2025, from https://en.wikipedia.org/wiki/Topological_deep_learning.
- Zhou, Y., Fang, S., Li, S., Wang, B., and Kung, S. Y. (2024). Contrastive learning based open-set recognition with unknown score. Knowledge-Based Systems, 296, 111926.
- Abouzaid, S., Jaeschke, T., Kueppers, S., Barowski, J., and Pohl, N. (2023). Deep learning-based material characterization using FMCW radar with open-set recognition technique. IEEE Transactions on Microwave Theory and Techniques, 71(11), 4628-4638.
- Cevikalp, H., Uzun, B., Salk, Y., Saribas, H., and Köpüklü, O. (2023). From anomaly detection to open set recognition: Bridging the gap. Pattern Recognition, 138, 109385.
- Palechor, A., Bhoumik, A., and Günther, M. (2023). Large-scale open-set classification protocols for imagenet. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (pp. 42-51).
- Cox, D., Ghosh, S., and Sultanow, E. (2021). Sign of the Mertens function. Global Journal of Pure and Applied Mathematics, 17(2), 201-208.
- Cen, J., Luan, D., Zhang, S., Pei, Y., Zhang, Y., Zhao, D., ... and Chen, Q. (2023). The devil is in the wrongly-classified samples: Towards unified open-set recognition. arXiv preprint arXiv:2302.04002.
- Huang, H., Wang, Y., Hu, Q., and Cheng, M. M. (2022). Class-specific semantic reconstruction for open set recognition. IEEE transactions on pattern analysis and machine intelligence, 45(4), 4214-4228.
- Wang, Z., Xu, Q., Yang, Z., He, Y., Cao, X., and Huang, Q. (2022). Openauc: Towards auc-oriented open-set recognition. Advances in Neural Information Processing Systems, 35, 25033-25045.
- Alliegro, A., Borlino, F. C., and Tommasi, T. (2022). Towards open set 3d learning: A benchmark on object point clouds. arXiv preprint arXiv:2207.11554, 2(3).
- Grieggs, S., Shen, B., Rauch, G., Li, P., Ma, J., Chiang, D., ... and Scheirer, W. J. (2021). Measuring human perception to improve handwritten document transcription. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(10), 6594-6601.
- Grcić, M., Bevandić, P., and Šegvić, S. (2022, October). Densehybrid: Hybrid anomaly detection for dense open-set recognition. In European Conference on Computer Vision (pp. 500-517). Cham: Springer Nature Switzerland.
- Moon, W., Park, J., Seong, H. S., Cho, C. H., and Heo, J. P. (2022, October). Difficulty-aware simulator for open set recognition. In European conference on computer vision (pp. 365-381). Cham: Springer Nature Switzerland.
- Cox, D., Sultanow, E., and Ghosh, S. (2021). The Energy Spectral Density of the Mertens Function. Global Journal of Pure and Applied Mathematics, 17(2), 197-199.
- Kuchibhotla, H. C., Malagi, S. S., Chandhok, S., and Balasubramanian, V. N. (2022). Unseen classes at a later time? no problem. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 9245-9254).
- Katsumata, K., Vo, D. M., and Nakayama, H. (2022). Ossgan: Open-set semi-supervised image generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 11185-11193).
- Bao, W., Yu, Q., and Kong, Y. (2022). Opental: Towards open set temporal action localization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 2979-2989).
- Dietterich, T. G., and Guyer, A. (2022). The familiarity hypothesis: Explaining the behavior of deep open set methods. Pattern Recognition, 132, 108931.
- Ghosh, S. (2020). Reverse Cuthill–McKee Ordering [Computer software]. GitHub. https://github.com/SourangshuGhosh/Reverse-Cuthill-McKee-Ordering.
- Cai, J., Wang, Y., Hsu, H. M., Hwang, J. N., Magrane, K., and Rose, C. S. (2022, June). Luna: Localizing unfamiliarity near acquaintance for open-set long-tailed recognition. In Proceedings of the AAAI conference on artificial intelligence (Vol. 36, No. 1, pp. 131-139).
- Wang, Q. F., Geng, X., Lin, S. X., Xia, S. Y., Qi, L., and Xu, N. (2022, June). Learngene: From open-world to your learning task. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 36, No. 8, pp. 8557-8565).
- Cox, D., and Ghosh, S. (2021). A Generalization of the Prime Number Theorem. Global Journal of Pure and Applied Mathematics, 17(1), 693-712.
- Zhang, X., Cheng, X., Zhang, D., Bonnington, P., and Ge, Z. (2022, June). Learning network architecture for open-set recognition. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 36, No. 3, pp. 3362-3370).
- Lu, J., Xu, Y., Li, H., Cheng, Z., and Niu, Y. (2022, June). Pmal: Open set recognition via robust prototype mining. In Proceedings of the AAAI conference on artificial intelligence (Vol. 36, No. 2, pp. 1872-1880).
- Xia, Z., Wang, P., Dong, G., and Liu, H. (2023). Adversarial kinetic prototype framework for open set recognition. IEEE Transactions on Neural Networks and Learning Systems.
- Huang, J., Fang, C., Chen, W., Chai, Z., Wei, X., Wei, P., ... and Li, G. (2021). Trash to treasure: Harvesting ood data with cross-modal matching for open-set semi-supervised learning. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 8310-8319).
- Cox, D., Ghosh, S., and Sultanow, E. (2020). Sequences and Polynomial Congruence. arXiv preprint arXiv:2012.11373.
- Wang, Y., Li, B., Che, T., Zhou, K., Liu, Z., and Li, D. (2021). Energy-based open-world uncertainty modeling for confidence calibration. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 9302-9311).
- Zhang, H., and Ding, H. (2021). Prototypical matching and open set rejection for zero-shot semantic segmentation. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 6974-6983).
- Girish, S., Suri, S., Rambhatla, S. S., and Shrivastava, A. (2021). Towards discovery and attribution of open-world gan generated images. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 14094-14103).
- Wang, W., Feiszli, M., Wang, H., and Tran, D. (2021). Unidentified video objects: A benchmark for dense, open-world segmentation. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 10776-10785).
- Cen, J., Yun, P., Cai, J., Wang, M. Y., and Liu, M. (2021). Deep metric learning for open world semantic segmentation. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 15333-15342).
- Wu, Z. F., Wei, T., Jiang, J., Mao, C., Tang, M., and Li, Y. F. (2021). Ngc: A unified framework for learning with open-world noisy data. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 62-71).
- Bastan, M., Wu, H. Y., Cao, T., Kota, B., and Tek, M. (2019). Large scale open-set deep logo detection. arXiv preprint arXiv:1911.07440.
- Saito, K., Kim, D., and Saenko, K. (2021). Openmatch: Open-set consistency regularization for semi-supervised learning with outliers. arXiv preprint arXiv:2105.14148.
- Esmaeilpour, S., Liu, B., Robertson, E., and Shu, L. (2022, June). Zero-shot out-of-distribution detection based on the pre-trained model clip. In Proceedings of the AAAI conference on artificial intelligence (Vol. 36, No. 6, pp. 6568-6576).
- Maji, M., Eswaran, K. S., Ghosh, S., Seshasayanan, K., and Shukla, V. (2023). Equivalence of nonequilibrium ensembles: Two-dimensional turbulence with a dual cascade. Physical Review E, 108(1), 015102.
- Chen, G., Peng, P., Wang, X., and Tian, Y. (2021). Adversarial reciprocal points learning for open set recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(11), 8065-8081.
- Guo, Y., Camporese, G., Yang, W., Sperduti, A., and Ballan, L. (2021). Conditional variational capsule network for open set recognition. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 103-111).
- Bao, W., Yu, Q., and Kong, Y. (2021). Evidential deep learning for open set action recognition. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 13349-13358).
- Sun, X., Ding, H., Zhang, C., Lin, G., and Ling, K. V. (2021). M2iosr: Maximal mutual information open set recognition. arXiv preprint arXiv:2108.02373.
- Hwang, J., Oh, S. W., Lee, J. Y., and Han, B. (2021). Exemplar-based open-set panoptic segmentation network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 1175-1184).
- Ghosh, S. (2022). 106.44 Another proof of the irrationality of N^(p/q). The Mathematical Gazette, 106(567), 525-526.
- Balasubramanian, L., Kruber, F., Botsch, M., and Deng, K. (2021, July). Open-set recognition based on the combination of deep learning and ensemble method for detecting unknown traffic scenarios. In 2021 IEEE Intelligent Vehicles Symposium (IV) (pp. 674-681). IEEE.
- Jang, J., and Kim, C. O. (2023). Teacher–explorer–student learning: A novel learning method for open set recognition. IEEE Transactions on Neural Networks and Learning Systems.
- Zhou, D. W., Ye, H. J., and Zhan, D. C. (2021). Learning placeholders for open-set recognition. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 4401-4410).
- Cevikalp, H., Uzun, B., Köpüklü, O., and Ozturk, G. (2021). Deep compact polyhedral conic classifier for open and closed set recognition. Pattern Recognition, 119, 108080.
- Yue, Z., Wang, T., Sun, Q., Hua, X. S., and Zhang, H. (2021). Counterfactual zero-shot and open-set visual recognition. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 15404-15414).
- Jia, J., and Chan, P. K. (2022, September). Self-supervised detransformation autoencoder for representation learning in open set recognition. In International Conference on Artificial Neural Networks (pp. 471-483). Cham: Springer Nature Switzerland.
- Jia, J., and Chan, P. K. (2021). MMF: A loss extension for feature learning in open set recognition. In Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part II 30 (pp. 319-331). Springer International Publishing.
- Salomon, G., Britto, A., Vareto, R. H., Schwartz, W. R., and Menotti, D. (2020, July). Open-set face recognition for small galleries using siamese networks. In 2020 International conference on systems, signals and image processing (IWSSIP) (pp. 161-166). IEEE.
- Jormakka, J., and Ghosh, S. (2021). Why Recessive Lethal Alleles Have Not Disappeared?
- Sun, X., Yang, Z., Zhang, C., Ling, K. V., and Peng, G. (2020). Conditional gaussian distribution learning for open set recognition. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 13480-13489).
- Perera, P., Morariu, V. I., Jain, R., Manjunatha, V., Wigington, C., Ordonez, V., and Patel, V. M. (2020). Generative-discriminative feature representations for open-set recognition. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 11814-11823).
- Ditria, L., Meyer, B. J., and Drummond, T. (2020). Opengan: Open set generative adversarial networks. In Proceedings of the Asian conference on computer vision.
- Geng, C., and Chen, S. (2020). Collective decision for open set recognition. IEEE Transactions on Knowledge and Data Engineering, 34(1), 192-204.
- Jang, J., and Kim, C. O. (2020). One-vs-rest network-based deep probability model for open set recognition. arXiv preprint arXiv:2004.08067.
- Zhang, H., Li, A., Guo, J., and Guo, Y. (2020). Hybrid models for open set recognition. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part III 16 (pp. 102-117). Springer International Publishing.
- Shao, R., Perera, P., Yuen, P. C., and Patel, V. M. (2020, August). Open-set adversarial defense. In European Conference on Computer Vision (pp. 682-698). Cham: Springer International Publishing.
- Yu, Q., Ikami, D., Irie, G., and Aizawa, K. (2020). Multi-task curriculum framework for open-set semi-supervised learning. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XII 16 (pp. 438-454). Springer International Publishing.
- Miller, D., Sunderhauf, N., Milford, M., and Dayoub, F. (2021). Class anchor clustering: A loss for distance-based open set recognition. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (pp. 3570-3578).
- Jormakka, J., and Ghosh, S. (2021). Analysis of risks and costs in intruder detection with Markov Decision Processes.
- Jormakka, J., and Ghosh, S. (2021). Calculating Cost Distributions of a Multiservice Loss System. arXiv preprint arXiv:2108.12277.
- Anantha Krishna, B., Gopal, M. S., and Ghosh, S. (2021). Special Primes And Some Of Their Properties. Global Journal of Pure and Applied Mathematics, 17(2), 257-263.
- Oliveira, H., Silva, C., Machado, G. L., Nogueira, K., and Dos Santos, J. A. (2023). Fully convolutional open set segmentation. Machine Learning, 1-52.
- Yang, Y., Wei, H., Sun, Z. Q., Li, G. Y., Zhou, Y., Xiong, H., and Yang, J. (2021). S2OSC: A holistic semi-supervised approach for open set classification. ACM Transactions on Knowledge Discovery from Data (TKDD), 16(2), 1-27.
- Sun, X., Zhang, C., Lin, G., and Ling, K. V. (2020). Open set recognition with conditional probabilistic generative models. arXiv preprint arXiv:2008.05129.
- Yang, H. M., Zhang, X. Y., Yin, F., Yang, Q., and Liu, C. L. (2020). Convolutional prototype network for open set recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(5), 2358-2370.
- Dhamija, A., Gunther, M., Ventura, J., and Boult, T. (2020). The overlooked elephant of object detection: Open set. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (pp. 1021-1030).
- Meyer, B. J., and Drummond, T. (2019, May). The importance of metric learning for robotic vision: Open set recognition and active learning. In 2019 International Conference on Robotics and Automation (ICRA) (pp. 2924-2931). IEEE.
- Oza, P., and Patel, V. M. (2019). Deep CNN-based multi-task learning for open-set recognition. arXiv preprint arXiv:1903.03161.
- Yoshihashi, R., Shao, W., Kawakami, R., You, S., Iida, M., and Naemura, T. (2019). Classification-reconstruction learning for open-set recognition. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 4016-4025).
- Malalur, P., and Jaakkola, T. (2019). Alignment based matching networks for one-shot classification and open-set recognition. arXiv preprint arXiv:1903.06538.
- Schlachter, P., Liao, Y., and Yang, B. (2019, September). Open-set recognition using intra-class splitting. In 2019 27th European signal processing conference (EUSIPCO) (pp. 1-5). IEEE.
- Imoscopi, S., Grancharov, V., Sverrisson, S., Karlsson, E., and Pobloth, H. (2019). Experiments on Open-Set Speaker Identification with Discriminatively Trained Neural Networks. arXiv preprint arXiv:1904.01269.
- Mundt, M., Pliushch, I., Majumder, S., and Ramesh, V. (2019). Open set recognition through deep neural network uncertainty: Does out-of-distribution detection require generative classifiers?. In Proceedings of the IEEE/CVF international conference on computer vision workshops (pp. 0-0).
- Gupta, A., Aberkane, I. J., Ghosh, S., Abold, A., Rahn, A., and Sultanow, E. (2022). Rotating Binaries. AppliedMath, 2(1), 104-117.
- Liu, Z., Miao, Z., Zhan, X., Wang, J., Gong, B., and Yu, S. X. (2019). Large-scale long-tailed recognition in an open world. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 2537-2546).
- Perera, P., and Patel, V. M. (2019). Deep transfer learning for multiple class novelty detection. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 11544-11552).
- Xiong, H., Lu, H., Liu, C., Liu, L., Cao, Z., and Shen, C. (2019). From open set to closed set: Counting objects by spatial divide-and-conquer. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 8362-8371).
- Yang, Y., Hou, C., Lang, Y., Guan, D., Huang, D., and Xu, J. (2019). Open-set human activity recognition based on micro-Doppler signatures. Pattern Recognition, 85, 60-69.
- Oza, P., and Patel, V. M. (2019). C2AE: Class conditioned auto-encoder for open-set recognition. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 2307-2316).
- Liu, S., Garrepalli, R., Dietterich, T., Fern, A., and Hendrycks, D. (2018, July). Open category detection with PAC guarantees. In International Conference on Machine Learning (pp. 3169-3178). PMLR.
- Venkataram, V. M. (2018). Open set text classification using neural networks. University of Colorado Colorado Springs.
- Hassen, M., and Chan, P. K. (2020). Learning a neural-network-based representation for open set recognition. In Proceedings of the 2020 SIAM International Conference on Data Mining (pp. 154-162). Society for Industrial and Applied Mathematics.
- Shu, L., Xu, H., and Liu, B. (2018). Unseen class discovery in open-world classification. arXiv preprint arXiv:1801.05609.
- Dhamija, A. R., Günther, M., and Boult, T. (2018). Reducing network agnostophobia. Advances in Neural Information Processing Systems, 31.
- Cox, D., Sultanow, E., and Ghosh, S. (2022). Quadratic, Cubic, Biquadratic, and Quintic Reciprocity. International Journal of Pure and Applied Mathematics Research, 2(1), 15-39.
- Zheng, Z., Zheng, L., Hu, Z., and Yang, Y. (2018). Open set adversarial examples. arXiv preprint arXiv:1809.02681, 3.
- Neal, L., Olson, M., Fern, X., Wong, W. K., and Li, F. (2018). Open set learning with counterfactual images. In Proceedings of the European conference on computer vision (ECCV) (pp. 613-628).
- Rudd, E. M., Jain, L. P., Scheirer, W. J., and Boult, T. E. (2017). The extreme value machine. IEEE transactions on pattern analysis and machine intelligence, 40(3), 762-768.
- Vignotto, E., and Engelke, S. (2018). Extreme Value Theory for Open Set Classification–GPD and GEV Classifiers. arXiv preprint arXiv:1808.09902.
- Cardoso, D. O., Gama, J., and França, F. M. (2017). Weightless neural networks for open set recognition. Machine Learning, 106(9), 1547-1567.
- Rozsa, A., Günther, M., and Boult, T. E. (2017). Adversarial robustness: Softmax versus openmax. arXiv preprint arXiv:1708.01697.
- Shu, L., Xu, H., and Liu, B. (2017). Doc: Deep open classification of text documents. arXiv preprint arXiv:1709.08716.
- Ge, Z., Demyanov, S., Chen, Z., and Garnavi, R. (2017). Generative openmax for multi-class open set classification. arXiv preprint arXiv:1707.07418.
- Yu, Y., Qu, W. Y., Li, N., and Guo, Z. (2017). Open-category classification by adversarial sample generation. arXiv preprint arXiv:1705.08722.
- Júnior, P. R. M., Boult, T. E., Wainer, J., and Rocha, A. (2016). Specialized support vector machines for open-set recognition. arXiv preprint arXiv:1606.03802, 2.
- Mendes Júnior, P. R., De Souza, R. M., Werneck, R. D. O., Stein, B. V., Pazinato, D. V., De Almeida, W. R., ... and Rocha, A. (2017). Nearest neighbors distance ratio open-set classifier. Machine Learning, 106(3), 359-386.
- Dong, H., Fu, Y., Hwang, S. J., Sigal, L., and Xue, X. (2022). Learning the compositional domains for generalized zero-shot learning. Computer Vision and Image Understanding, 221, 103454.
- Vareto, R., Silva, S., Costa, F., and Schwartz, W. R. (2017, October). Towards open-set face recognition using hashing functions. In 2017 IEEE international joint conference on biometrics (IJCB) (pp. 634-641). IEEE.
- Fei, G., and Liu, B. (2016, June). Breaking the closed world assumption in text classification. In Proceedings of the 2016 conference of the North American chapter of the association for computational linguistics: human language technologies (pp. 506-514).
- Cox, D., and Ghosh, S. (2023). A Note on Colossally Abundant Numbers.
- Neira, M. A. C., Júnior, P. R. M., Rocha, A., and Torres, R. D. S. (2018). Data-fusion techniques for open-set recognition problems. IEEE access, 6, 21242-21265.
- Scheirer, W. J., Jain, L. P., and Boult, T. E. (2014). Probability models for open set recognition. IEEE transactions on pattern analysis and machine intelligence, 36(11), 2317-2324.
- Jain, L. P., Scheirer, W. J., and Boult, T. E. (2014). Multi-class open set recognition using probability of inclusion. In Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part III 13 (pp. 393-409). Springer International Publishing.
- Zhang, H., and Patel, V. M. (2016). Sparse representation-based open set recognition. IEEE transactions on pattern analysis and machine intelligence, 39(8), 1690-1696.
- Cevikalp, H. (2016). Best fitting hyperplanes for classification. IEEE transactions on pattern analysis and machine intelligence, 39(6), 1076-1088.
- Cevikalp, H., and Serhan Yavuz, H. (2017). Fast and accurate face recognition with image sets. In Proceedings of the IEEE International Conference on Computer Vision Workshops (pp. 1564-1572).
- Gal, Y., and Ghahramani, Z. (2016). Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. Proceedings of the 33rd International Conference on Machine Learning (ICML), 1050–1059.
- Lakshminarayanan, B., Pritzel, A., and Blundell, C. (2017). Simple and scalable predictive uncertainty estimation using deep ensembles. Advances in Neural Information Processing Systems (NeurIPS), 30.
- Rudd, E. M., Jain, L. P., Scheirer, W. J., and Boult, T. E. (2017). The extreme value machine. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 40(3), 762–768.
- Malinin, A., and Gales, M. (2018). Predictive uncertainty estimation via prior networks. Advances in Neural Information Processing Systems (NeurIPS), 31.
- Liu, W., Wang, X., Owens, J., and Li, Y. (2020). Energy-based out-of-distribution detection. Advances in Neural Information Processing Systems (NeurIPS), 33.
- Chen, G., Peng, P., Ma, L., Li, J., Du, L., and Tian, Y. (2021). Bayesian open-world learning. International Conference on Learning Representations (ICLR).
- Nandy, J., Hsu, W., and Lee, M. L. (2020). Towards maximizing the representation gap between in-domain and out-of-distribution examples. Advances in Neural Information Processing Systems (NeurIPS), 33.
- Ghosh, S. (2019). Bayesian Beer Market Estimation Simulating Nash Equilibrium Market Outcomes with Bayesian Analysis of Choice-Based Conjoint Data.
- Mukhoti, J., Kirsch, A., van Amersfoort, J., Torr, P. H., and Gal, Y. (2021). Deterministic neural networks with inductive biases capture epistemic and aleatoric uncertainty. Proceedings of the 38th International Conference on Machine Learning (ICML).
- Kristiadi, A., Hein, M., and Hennig, P. (2020). Being Bayesian, even just a bit, fixes overconfidence in ReLU networks. Proceedings of the 37th International Conference on Machine Learning (ICML).
- Ovadia, Y., Fertig, E., Ren, J., Nado, Z., Sculley, D., Nowozin, S., ... and Snoek, J. (2019). Can you trust your model’s uncertainty? Evaluating predictive uncertainty under dataset shift. Advances in Neural Information Processing Systems (NeurIPS), 32.
- Sensoy, M., Kaplan, L., and Kandemir, M. (2018). Evidential deep learning to quantify classification uncertainty. Advances in Neural Information Processing Systems (NeurIPS), 31.
- Bendale, A., and Boult, T. E. (2016). Towards open set deep networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1563–1572.
- Neal, L., Olson, M., Fern, X., Wong, W. K., and Li, F. (2018). Open set learning with counterfactual images. Proceedings of the European Conference on Computer Vision (ECCV).
- Zhang, H., Li, A., Guo, J., and Guo, Y. (2020). Hybrid models for open set recognition. Proceedings of the European Conference on Computer Vision (ECCV).
- Charoenphakdee, N., Lee, J., and Sugiyama, M. (2019). On symmetric losses for learning from corrupted labels. Proceedings of the 36th International Conference on Machine Learning (ICML).
- Hendrycks, D., Mazeika, M., and Dietterich, T. (2019). Deep anomaly detection with outlier exposure. International Conference on Learning Representations (ICLR).
- Vaze, S., Han, K., Vedaldi, A., and Zisserman, A. (2022). Generalized category discovery. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
- Liu, J., Lin, Z., Padhy, S., Tran, D., Bedrax Weiss, T., and Lakshminarayanan, B. (2020). Simple and principled uncertainty estimation with deterministic deep learning via distance awareness. Advances in Neural Information Processing Systems (NeurIPS), 33.
- Van Amersfoort, J., Smith, L., Teh, Y. W., and Gal, Y. (2020). Uncertainty estimation using a single deep deterministic neural network. Proceedings of the 37th International Conference on Machine Learning (ICML).
- Smith, L., and Gal, Y. (2018). Understanding measures of uncertainty for adversarial example detection. Conference on Uncertainty in Artificial Intelligence (UAI).
- Fort, S., Hu, H., and Lakshminarayanan, B. (2019). Deep ensembles: A loss landscape perspective. Advances in Neural Information Processing Systems (NeurIPS), 32.
- Ober, S. W., Rasmussen, C. E., and van der Wilk, M. (2021). The promises and pitfalls of deep kernel learning. Proceedings of the 38th International Conference on Machine Learning (ICML).
- Sun, S., Zhang, G., Shi, J., and Grosse, R. (2019). Functional variational Bayesian neural networks. International Conference on Learning Representations (ICLR).
- Bradshaw, J., Matthews, A. G., and Ghahramani, Z. (2017). Adversarial examples, uncertainty, and transfer testing robustness in Gaussian process hybrid deep networks. Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS).
- Daxberger, E., Kristiadi, A., Immer, A., Eschenhagen, R., Bauer, M., and Hennig, P. (2021). Laplace redux—effortless Bayesian deep learning. Advances in Neural Information Processing Systems (NeurIPS), 34.
- Dellaporta, C., Knoblauch, J., Damoulas, T., and Briol, F.-X. (2022). Robust Bayesian inference for simulator-based models via the MMD posterior bootstrap. Proceedings of the 25th International Conference on Artificial Intelligence and Statistics (AISTATS).
- Pidhorskyi, S., Almohsen, R., and Doretto, G. (2018). Generative probabilistic novelty detection with adversarial autoencoders. Advances in Neural Information Processing Systems (NeurIPS), 31.
- Schlegl, T., Seeböck, P., Waldstein, S. M., Schmidt-Erfurth, U., and Langs, G. (2017). Unsupervised anomaly detection with generative adversarial networks to guide marker discovery. Information Processing in Medical Imaging (IPMI).
- Xiao, Z., Yan, Q., and Amit, Y. (2020). Likelihood regret: An out-of-distribution detection score for variational auto-encoder. Advances in Neural Information Processing Systems (NeurIPS), 33.
- Nalisnick, E., Matsukawa, A., Teh, Y. W., Gorur, D., and Lakshminarayanan, B. (2019). Do deep generative models know what they don’t know? International Conference on Learning Representations (ICLR).
- Choi, H., Jang, E., and Alemi, A. A. (2018). WAIC, but why? Generative ensembles for robust anomaly detection. Advances in Neural Information Processing Systems (NeurIPS), 31.
- Denouden, T., Salay, R., Czarnecki, K., Abdelzad, V., Phan, B., and Vernekar, S. (2018). Improving reconstruction autoencoder out-of-distribution detection with Mahalanobis distance. NeurIPS Workshop on Bayesian Deep Learning.
- Kirichenko, P., Izmailov, P., and Wilson, A. G. (2020). Why normalizing flows fail to detect out-of-distribution data. Advances in Neural Information Processing Systems (NeurIPS), 33.
- Serra, J., Alvarez, D., Gomez, V., Slizovskaia, O., Núñez, J. F., and Luque, J. (2020). Input complexity and out-of-distribution detection with likelihood-based generative models. Proceedings of the 37th International Conference on Machine Learning (ICML).
- Morningstar, W., Ham, C., Gallagher, A., Lakshminarayanan, B., Alemi, A., and Dillon, J. (2021). Density-supervised deep learning for uncertainty quantification. International Conference on Learning Representations (ICLR).
- Ruff, L., Kauffmann, J. R., Vandermeulen, R. A., Montavon, G., Samek, W., Kloft, M., ... and Müller, K. R. (2021). A unifying review of deep and shallow anomaly detection. Proceedings of the IEEE, 109(5), 756-795.
- Lampert, C. H., Nickisch, H., and Harmeling, S. (2009). Learning to detect unseen object classes by between-class attribute transfer. In 2009 IEEE Conference on Computer Vision and Pattern Recognition (pp. 951-958). IEEE.
- Akata, Z., Reed, S., Walter, D., Lee, H., and Schiele, B. (2013). Evaluation of output embeddings for fine-grained image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 2927-2936).
- Romera-Paredes, B., and Torr, P. H. (2015). An embarrassingly simple approach to zero-shot learning. In International Conference on Machine Learning (pp. 2152-2161). PMLR.
- Ghosh, S. (2020). Withdrawn: A Proof to the Riemann Hypothesis.
- Xian, Y., Lampert, C. H., Schiele, B., and Akata, Z. (2017). Zero-shot learning: A comprehensive evaluation of the good, the bad and the ugly. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(9), 2251-2265.
- Zhang, L., Xiang, T., and Gong, S. (2017). Learning a deep embedding model for zero-shot learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 2021-2030).
- Fu, Y., Hospedales, T. M., Xiang, T., and Gong, S. (2015). Transductive multi-view zero-shot learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(11), 2332-2345.
- Kodirov, E., Xiang, T., and Gong, S. (2017). Semantic autoencoder for zero-shot learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 3174-3183).
- Changpinyo, S., Chao, W. L., Gong, B., and Sha, F. (2016). Synthesized classifiers for zero-shot learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 5327-5336).
- Kampffmeyer, M., Chen, Y., Liang, X., Wang, H., Zhang, Y., and Xing, E. P. (2019). Rethinking knowledge graph propagation for zero-shot learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 11487-11496).
- Wang, X., Ye, Y., and Gupta, A. (2018). Zero-shot recognition via semantic embeddings and knowledge graphs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 6857-6866).
- Li, J., Jing, M., Lu, K., Ding, Z., Zhu, L., and Huang, Z. (2019). Leveraging the invariant side of generative zero-shot learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 7402-7411).
- Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., et al. (2021). Learning transferable visual models from natural language supervision. In International Conference on Machine Learning (pp. 8748-8763). PMLR.
- Chao, W. L., Changpinyo, S., Gong, B., and Sha, F. (2016). An empirical study and analysis of generalized zero-shot learning for object recognition in the wild. In European Conference on Computer Vision (pp. 52-68). Springer.
- Verma, V. K., Rai, P., and Namboodiri, A. (2018). Generalized zero-shot learning via synthesized examples. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 4281-4289).
- Huynh, D., and Elhamifar, E. (2020). Fine-grained generalized zero-shot learning via dense attribute-based attention. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 4483-4493).
- Palatucci, M., Pomerleau, D., Hinton, G. E., and Mitchell, T. M. (2009). Zero-shot learning with semantic output codes. In Advances in Neural Information Processing Systems (pp. 1410-1418).
- Socher, R., Ganjoo, M., Manning, C. D., and Ng, A. (2013). Zero-shot learning through cross-modal transfer. In Advances in Neural Information Processing Systems (pp. 935-943).
- Hariharan, B., and Girshick, R. (2017). Low-shot visual recognition by shrinking and hallucinating features. In Proceedings of the IEEE International Conference on Computer Vision (pp. 3018-3027).
- Xian, Y., Lorenz, T., Schiele, B., and Akata, Z. (2018). Feature generating networks for zero-shot learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 5542-5551).
- Scheirer, W. J., Rocha, A., Sapkota, A., and Boult, T. E. (2013). Toward open set recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(7), 1757-1772.
- Yang, J., Zhou, K., Li, Y., and Liu, Z. (2021). Generalized out-of-distribution detection: A survey. arXiv preprint arXiv:2110.11334.
- Sohl-Dickstein, J., Weiss, E., Maheswaranathan, N., and Ganguli, S. (2015, June). Deep unsupervised learning using nonequilibrium thermodynamics. In International Conference on Machine Learning (pp. 2256-2265). PMLR.
- Song, Y., & Ermon, S. (2019). Generative modeling by estimating gradients of the data distribution. Advances in Neural Information Processing Systems, 32.
- Song, Y., Sohl-Dickstein, J., Kingma, D. P., Kumar, A., Ermon, S., & Poole, B. (2020). Score-based generative modeling through stochastic differential equations. arXiv preprint arXiv:2011.13456.
- Ho, J., Jain, A., & Abbeel, P. (2020). Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33, 6840-6851.
- Nichol, A. Q., & Dhariwal, P. (2021, July). Improved denoising diffusion probabilistic models. In International Conference on Machine Learning (pp. 8162-8171). PMLR.
- Dhariwal, P., & Nichol, A. (2021). Diffusion models beat GANs on image synthesis. Advances in Neural Information Processing Systems, 34, 8780-8794.
- Song, Y., Durkan, C., Murray, I., & Ermon, S. (2021). Maximum likelihood training of score-based diffusion models. Advances in Neural Information Processing Systems, 34.
- Popov, V., Vovk, I., Gogoryan, V., Sadekova, T., Kudinov, M., & Wei, J. (2022). Diffusion-based voice conversion with fast maximum likelihood sampling scheme. International Conference on Learning Representations.
- Vahdat, A., Kreis, K., & Kautz, J. (2021). Score-based generative modeling in latent space. Advances in Neural Information Processing Systems, 34, 11287-11302.
- Karras, T., Aittala, M., Hellsten, J., Laine, S., Lehtinen, J., & Aila, T. (2022). Analyzing and improving the training dynamics of diffusion models. International Conference on Learning Representations.
- Bansal, A., Borgnia, E., Chu, H.-M., Li, J. S., Kazemi, H., Huang, F., Goldblum, M., & Goldstein, T. (2022). Cold diffusion: Inverting arbitrary image transforms without noise. Advances in Neural Information Processing Systems, 35, 21440-21458.
- Ghosh, S. Burgers Equation Analytical Solution.
- Song, J., Meng, C., & Ermon, S. (2021). Denoising diffusion implicit models. International Conference on Learning Representations.
- Dockhorn, T., Vahdat, A., & Kreis, K. (2022). Score-based generative modeling with critically-damped Langevin diffusion. International Conference on Learning Representations.
- Vincent, P., Meng, L., Song, Y., & Ermon, S. (2022). Uncertainty estimation in score-based models. Proceedings of the 25th International Conference on Artificial Intelligence and Statistics, 151, 10347-10358.
- Li, X. L., Thickstun, J., Gulrajani, I., Liang, P., & Hashimoto, T. B. (2022). Diffusion-LM improves controllable text generation. Advances in Neural Information Processing Systems, 35, 4328-4343.
- Janner, M., Du, Y., Tenenbaum, J., & Levine, S. (2022). Planning with diffusion for flexible behavior synthesis. International Conference on Machine Learning.
- Ho, J., & Salimans, T. (2022). Classifier-free diffusion guidance. International Conference on Learning Representations.
- Nie, W., Guo, B., Huang, Y., Xiao, C., Vahdat, A., & Anandkumar, A. (2022). Diffusion models for adversarial purification. International Conference on Machine Learning.
- Block, A., Mroueh, Y., & Rakhlin, A. (2022). On the convergence of diffusion models: A non-asymptotic approach. Advances in Neural Information Processing Systems, 35, 2986-3000.
- Liu, L., Ren, Y., Lin, Z., & Zhao, Z. (2022). Diffusion models for adversarial purification and optimal transport. International Conference on Machine Learning, 162, 13951-13969.
- Dai, Z., Gifford, D., & Khosla, M. (2022). Score-based generative models with learned constraints. Advances in Neural Information Processing Systems, 35, 21144-21156.
- Luo, S., Hu, W., Zhang, Y., Liu, H., & Wang, H. (2022). Diffusion models for structured data. Advances in Neural Information Processing Systems, 35, 28382-28394.
- Xu, M., Zhang, J., Ju, F., & Tang, J. (2022). Score-based generative models for molecular design. Advances in Neural Information Processing Systems, 35, 14284-14298.
- Cox, D., Ghosh, S., & Sultanow, E. (2021). Euler’s Totient Function, the Mangoldt Function, and a Sequence of Mertens Function Values.
- Rombach, R., Blattmann, A., Lorenz, D., Esser, P., & Ommer, B. (2022). High-resolution image synthesis with latent diffusion models. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 10684-10695.
- Saharia, C., Chan, W., Saxena, S., Li, L., Whang, J., Denton, E., ... & Norouzi, M. (2022). Photorealistic text-to-image diffusion models with deep language understanding. Advances in Neural Information Processing Systems, 35, 36479-36494.
- Ramesh, A., Dhariwal, P., Nichol, A., Chu, C., & Chen, M. (2022). Hierarchical text-conditional image generation with CLIP latents. arXiv preprint arXiv:2204.06125.
- Chung, H., Sim, B., & Ye, J. C. (2022). Come-closer-diffuse-faster: Accelerating conditional diffusion models for inverse problems through stochastic contraction. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 12413-12422.
- Ghosh, S. (2023). Equivalence of Galerkin Finite Element Formulation and Central Difference Formulation of the Advection-Diffusion Equation.
- Ghosh, S. (2023). Stability Analysis of Forward Euler, Backward Euler, and Central Difference Time Integration Schemes.
- Watson, D., Chan, W., Ho, J., & Norouzi, M. (2022). Learning fast samplers for diffusion models by differentiating through sample quality. International Conference on Learning Representations.
- Poole, B., Jain, A., Barron, J. T., & Mildenhall, B. (2023). DreamFusion: Text-to-3D using 2D diffusion. International Conference on Learning Representations.
- Harvey, W., Naderiparizi, S., Masrani, V., Weilbach, C., & Wood, F. (2023). Flexible diffusion modeling of long videos. Advances in Neural Information Processing Systems, 36.
- Tewel, Y., Gal, R., Chechik, G., & Atzmon, Y. (2023). Key-Locked Rank One Editing for Text-to-Image Personalization. ACM Transactions on Graphics, 42(4).
- Xiao, Z., Kreis, K., & Vahdat, A. (2022). Tackling the generative learning trilemma with denoising diffusion GANs. International Conference on Learning Representations.
- Nash, C., Menick, J., Dieleman, S., & Battaglia, P. (2023). Generating images with sparse representations. International Conference on Machine Learning.
- Austin, J., Johnson, D. D., Ho, J., Tarlow, D., & van den Berg, R. (2021). Structured denoising diffusion models in discrete state-spaces. Advances in Neural Information Processing Systems, 34.
- Chen, T., Zhang, R., & Hinton, G. (2023). Analog bits: Generating discrete data using diffusion models with self-conditioning. International Conference on Learning Representations.
- De Bortoli, V., Mathieu, E., Hutchinson, M., Thornton, J., Teh, Y. W., & Doucet, A. (2023). Riemannian score-based generative modelling. Advances in Neural Information Processing Systems, 36.
- Cho, J., Zala, A., & Bansal, M. (2023). DALL-Eval: Probing the reasoning skills and social biases of text-to-image generation models. Proceedings of the IEEE/CVF International Conference on Computer Vision.
- Fernandez, P., Sablayrolles, A., Furon, T., Jégou, H., & Douze, M. (2023). Watermarking images in self-supervised latent spaces. IEEE International Conference on Acoustics, Speech and Signal Processing.
- Gupta, A., Xiong, W., Nie, Y., Allingham, J. U., & Zhou, M. (2023). An ethical framework for generative AI and its application to text-to-image synthesis. Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society.
- Ryu, M., Lee, K., & Ye, J. C. (2023). Memory-efficient diffusion models via gradient checkpointing. International Conference on Machine Learning.
- Sanchez-Gonzalez, A., Heess, N., Springenberg, J. T., Merel, J., Riedmiller, M., Hadsell, R., & Battaglia, P. (2018). Graph networks as learnable physics engines for inference and control. Proceedings of the 35th International Conference on Machine Learning.
- Ellis, K., Wong, C., Nye, M., Sablé-Meyer, M., Cary, L., Morales, L., ... & Solar-Lezama, A. (2023). DreamCoder: Growing generalizable, interpretable knowledge with wake-sleep Bayesian program learning. Communications of the ACM, 66(7), 76-86.
- Olah, C., Cammarata, N., Schubert, L., Goh, G., Petrov, M., & Carter, S. (2020). Zoom in: An introduction to circuits. Distill, 5(3).
- Amodei, D., Steinhardt, J., Christiano, P., & Leike, J. (2023). Concrete problems in AI safety. Communications of the ACM, 66(9), 38-47.
- Jormakka, J., & Ghosh, S. (2021). Applications of generating functions to stochastic processes and to the complexity of the knapsack problem.
- Kingma, D. P., Salimans, T., Poole, B., & Ho, J. (2023). Variational diffusion models. Journal of Machine Learning Research, 24(136), 1-62.
- Meng, C., Song, Y., Ermon, S., & Kingma, D. P. (2023). On distillation of guided diffusion models. Proceedings of the 40th International Conference on Machine Learning, 202, 25342-25356.
- Peebles, W., & Xie, S. (2023). Scalable diffusion models with transformers. Proceedings of the IEEE/CVF International Conference on Computer Vision, 4195-4205.
- Zhang, L., Rao, A., & Agrawala, M. (2023). Adding conditional control to text-to-image diffusion models. Proceedings of the IEEE/CVF International Conference on Computer Vision, 3836-3847.
- Liu, X., Park, D. H., Azadi, S., Zhang, G., Chopra, S., Kim, S., & Schwing, A. G. (2023). More control for free! Image synthesis with semantic diffusion guidance. Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 289-299.
- Al-Shedivat, M., Bansal, T., Burda, Y., Sutskever, I., Mordatch, I., & Abbeel, P. (2018). Continuous adaptation via meta-learning in nonstationary and competitive environments. arXiv preprint arXiv:1710.03641.
- Arjovsky, M., Chintala, S., & Bottou, L. (2017). Wasserstein GAN. arXiv preprint arXiv:1701.07875.
- Balduzzi, D., Racanière, S., Martens, J., Foerster, J., Tuyls, K., & Graepel, T. (2018). The mechanics of n-player differentiable games. International Conference on Machine Learning (pp. 354-363). PMLR.
- Carmona, R., Laurière, M., & Tan, Z. (2019). Linear-quadratic mean-field reinforcement learning: Convergence of policy gradient methods. arXiv preprint arXiv:1910.04295.
- Chizat, L., & Bach, F. (2018). On the global convergence of gradient descent for over-parameterized models using optimal transport. Advances in Neural Information Processing Systems, 31.
- Czarnecki, W. M., Gidel, G., Tracey, B., Tuyls, K., Omidshafiei, S., Pascanu, R., & Jaderberg, M. (2020). Real world games look like spinning tops. Advances in Neural Information Processing Systems, 33, 17443-17454.
- Daskalakis, C., Ilyas, A., Syrgkanis, V., & Zeng, H. (2018). Training GANs with optimism. International Conference on Learning Representations.
- Ghosh, S. (2021). Inequalities-Part 1. At Right Angles, (10), 93-96.
- Dütting, P., Feng, Z., Narasimhan, H., Parkes, D. C., & Ravindranath, S. S. (2019). Optimal auctions through deep learning. International Conference on Machine Learning (pp. 1706-1715). PMLR.
- Feng, Z., Narasimhan, H., & Parkes, D. C. (2020). Deep learning for revenue-optimal auctions with budgets. Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems (pp. 354-362).
- Cox, D., Ghosh, S., & Sultanow, E. Generalizing Halbeisen's and Hungerbühler's optimal bounds for the length of Collatz cycles to 3n + c cycles.
- Foerster, J., Chen, R. Y., Al-Shedivat, M., Whiteson, S., Abbeel, P., & Mordatch, I. (2018). Learning with opponent-learning awareness. Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems (pp. 122-130).
- Frye, C., Rowat, C., & Feige, I. (2020). Asymmetric Shapley values: Incorporating causal knowledge into model-agnostic explainability. Advances in Neural Information Processing Systems, 33, 1229-1239.
- Gemp, I., Anthony, T., Bachrach, Y., Lever, G., Pérolat, J., Tuyls, K., & Lanctot, M. (2021). Negotiating team formation using deep reinforcement learning. Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems (pp. 464-472).
- Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., & Bengio, Y. (2014). Generative adversarial nets. Advances in Neural Information Processing Systems, 27.
- Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., & Courville, A. (2017). Improved training of Wasserstein GANs. Advances in Neural Information Processing Systems, 30.
- Hadfield-Menell, D., Dragan, A., Abbeel, P., & Russell, S. (2016). Cooperative inverse reinforcement learning. Advances in Neural Information Processing Systems, 29.
- Hardt, M., Megiddo, N., Papadimitriou, C., & Wootters, M. (2016). Strategic classification. Proceedings of the 2016 ACM Conference on Innovations in Theoretical Computer Science (pp. 111-122).
- Heinrich, J., & Silver, D. (2016). Deep reinforcement learning from self-play in imperfect-information games. arXiv preprint arXiv:1603.01121.
- Janzing, D., Minorics, L., & Blöbaum, P. (2020). Feature relevance quantification in explainable AI: A causal problem. International Conference on Artificial Intelligence and Statistics (pp. 2907-2916). PMLR.
- Kang, J., Xiong, Z., Niyato, D., Xie, S., & Zhang, J. (2020). Incentive mechanism for reliable federated learning: A joint optimization approach to combining reputation and contract theory. IEEE Internet of Things Journal, 6(6), 10700-10714.
- Cox, D., & Ghosh, S. (2020). The 3n+ 1 Problem-Generalized Dead Limbs and Cycles with Attachment Points.
- Lanctot, M., Zambaldi, V., Gruslys, A., Lazaridou, A., Tuyls, K., Pérolat, J., ... & Graepel, T. (2017). A unified game-theoretic approach to multiagent reinforcement learning. Advances in Neural Information Processing Systems, 30.
- Letcher, A., Balduzzi, D., Racanière, S., Martens, J., Foerster, J., Tuyls, K., & Graepel, T. (2019). Differentiable game mechanics. Journal of Machine Learning Research, 20(84), 1-40.
- Li, T., Sahu, A. K., Zaheer, M., Sanjabi, M., Talwalkar, A., & Smith, V. (2020). Federated optimization in heterogeneous networks. Proceedings of Machine Learning and Systems, 2, 429-450.
- Littman, M. L. (1994). Markov games as a framework for multi-agent reinforcement learning. Proceedings of the Eleventh International Conference on Machine Learning (pp. 157-163).
- Lowe, R., Wu, Y., Tamar, A., Harb, J., Abbeel, P., & Mordatch, I. (2017). Multi-agent actor-critic for mixed cooperative-competitive environments. Advances in Neural Information Processing Systems, 30.
- Lundberg, S. M., & Lee, S. I. (2017). A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems, 30.
- Madry, A., Makelov, A., Schmidt, L., Tsipras, D., & Vladu, A. (2018). Towards deep learning models resistant to adversarial attacks. International Conference on Learning Representations.
- Mertikopoulos, P., Lecouat, B., Zenati, H., Foo, C. S., Chandrasekhar, V., & Piliouras, G. (2018). Mirror descent in saddle-point problems: Going the extra (gradient) mile. arXiv preprint arXiv:1807.02629.
- OpenAI. (2019). Emergent tool use from multi-agent interaction. arXiv preprint arXiv:1909.07528.
- Perolat, J., De Vylder, B., Hennes, D., Tarassov, E., Strub, F., de Boer, V., ... & Tuyls, K. (2022). Mastering the game of Stratego with model-free multiagent reinforcement learning. Science, 378(6623), 990-996.
- Abhyankar, S. S., Abramov, V., Adem, A., Aizenberg, L., Albeverio, S., Alías, L. J., ... & Michor, P. W. (2002). Encyclopaedia of mathematics, supplement III. Dordrecht: Springer Netherlands.
- Perrin, S., Pérolat, J., Laurière, M., Geist, M., Elie, R., & Pietquin, O. (2020). Fictitious play for mean field games: Continuous time analysis and applications. Advances in Neural Information Processing Systems, 33, 13199-13213.
- Raghunathan, A., Steinhardt, J., & Liang, P. (2018). Certified defenses against adversarial examples. International Conference on Learning Representations.
- Raghu, A., Raghu, M., Bengio, S., & Vinyals, O. (2019). Rapid learning or feature reuse? Towards understanding the effectiveness of MAML. International Conference on Learning Representations.
- Schäfer, F., & Anandkumar, A. (2019). Competitive gradient descent. Advances in Neural Information Processing Systems, 32.
- Shalev-Shwartz, S., Shamir, O., Srebro, N., & Sridharan, K. (2010). Learnability, stability and uniform convergence. Journal of Machine Learning Research, 11, 2635-2670.
- Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., ... & Hassabis, D. (2017). Mastering the game of Go without human knowledge. Nature, 550(7676), 354-359.
- Sinha, A., Namkoong, H., & Duchi, J. (2018). Certifiable distributional robustness with principled adversarial training. International Conference on Learning Representations.
- Such, F. P., Madhavan, V., Conti, E., Lehman, J., Stanley, K. O., & Clune, J. (2017). Deep neuroevolution: Genetic algorithms are a competitive alternative for training deep neural networks for reinforcement learning. arXiv preprint arXiv:1712.06567.
- Sundararajan, M., Taly, A., & Yan, Q. (2017). Axiomatic attribution for deep networks. International Conference on Machine Learning (pp. 3319-3328). PMLR.
- Tishby, N., Pereira, F. C., & Bialek, W. (2000). The information bottleneck method. arXiv preprint physics/0004057.
- Wang, R., He, X., Yu, R., Qiu, W., An, B., & Rabinovich, Z. (2019). Learning efficient multi-agent communication: An information bottleneck approach. International Conference on Machine Learning (pp. 9905-9915). PMLR.
- Zhang, H., Yu, Y., Jiao, J., Xing, E., El Ghaoui, L., & Jordan, M. (2019). Theoretically principled trade-off between robustness and accuracy. International Conference on Machine Learning (pp. 7472-7482). PMLR.
- Ambrosio, L., Gigli, N., & Savaré, G. (2008). Gradient flows: In metric spaces and in the space of probability measures. Birkhäuser.
- Courty, N., Flamary, R., Habrard, A., & Rakotomamonjy, A. (2017). Joint distribution optimal transportation for domain adaptation. Advances in Neural Information Processing Systems (NeurIPS), 30.
- Cuturi, M. (2013). Sinkhorn distances: Lightspeed computation of optimal transport. Advances in Neural Information Processing Systems (NeurIPS), 26, 2292–2300.
- Deshpande, I., Hu, Y.-T., Sun, R., Pyrros, A., Siddiqui, N., Koyejo, S., Zhao, Z., Forsyth, D., & Schwing, A. G. (2018). Generative modeling using the sliced Wasserstein distance. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 3483–3491.
- Evans, L. C. (2021). The theory of optimal transport and its applications. American Mathematical Society.
- Frogner, C., Zhang, C., Mobahi, H., Araya, M., & Poggio, T. (2015). Learning with a Wasserstein loss. Advances in Neural Information Processing Systems (NeurIPS), 28.
- Genevay, A., Peyré, G., & Cuturi, M. (2018). Learning generative models with Sinkhorn divergences. Proceedings of the 21st International Conference on Artificial Intelligence and Statistics (AISTATS), 84, 1601–1610.
- Huang, J., Zhao, W., Jin, Q., & Liu, W. (2021). Optimal transport maps for deep generative models. International Conference on Learning Representations (ICLR).
- Janati, H., Cuturi, M., & Gramfort, A. (2020). Entropic optimal transport between unbalanced Gaussian measures. Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS), 108, 3186–3196.
- Li, Y., Swersky, K., & Zemel, R. (2015). Generative moment matching networks. Proceedings of the 32nd International Conference on Machine Learning (ICML), 37, 1718–1727.
- Mei, S., Montanari, A., & Nguyen, P.-M. (2018). A mean field view of the landscape of two-layer neural networks. Proceedings of the National Academy of Sciences (PNAS), 115(33), E7665–E7671.
- Ozair, S., Li, Y., & Zemel, R. (2019). Wasserstein dependency measure for representation learning. Advances in Neural Information Processing Systems (NeurIPS), 32.
- Peyré, G., & Cuturi, M. (2019). Computational optimal transport. Foundations and Trends in Machine Learning, 11(5–6), 355–607.
- Rachev, S. T., & Rüschendorf, L. (1998). Mass transportation problems: Volume I: Theory. Springer.
- Ramdas, A., García Trillos, N., & Cuturi, M. (2017). On Wasserstein two-sample testing and related families of nonparametric tests. Entropy, 19(2), 47.
- Santambrogio, F. (2015). Optimal transport for applied mathematicians. Birkhäuser.
- Taghvaei, A., & Jalali, A. (2019). 2-Wasserstein approximation via restricted convex potentials with application to improved training for GANs. Advances in Neural Information Processing Systems (NeurIPS), 32.
- Tolstikhin, I., Bousquet, O., Gelly, S., & Schölkopf, B. (2018). Wasserstein auto-encoders. International Conference on Learning Representations (ICLR).
- Villani, C. (2009). Optimal transport: Old and new. Springer.
- Adams, R. A., & Fournier, J. J. F. (2003). Sobolev spaces (2nd ed.). Academic Press.
- Albiac, F., & Kalton, N. J. (2006). Topics in Banach space theory. Springer.
- Billingsley, P. (1968). Convergence of probability measures. Wiley.
- Chow, P.-L. (2007). Stochastic partial differential equations. Chapman & Hall/CRC.
- Ciarlet, P. G. (1978). The finite element method for elliptic problems. North-Holland.
- Deimling, K. (1985). Nonlinear functional analysis. Springer.
- Evans, L. C. (2010). Partial differential equations (2nd ed.). American Mathematical Society.
- Evans, L. C., & Gariepy, R. F. (1992). Measure theory and fine properties of functions. CRC Press.
- Folland, G. B. (1992). Fourier analysis and its applications. Brooks/Cole.
- Grafakos, L. (2008). Classical and modern Fourier analysis. Pearson.
- Karatzas, I., & Shreve, S. E. (1991). Brownian motion and stochastic calculus (2nd ed.). Springer.
- Katok, A., & Hasselblatt, B. (1995). Introduction to the modern theory of dynamical systems. Cambridge University Press.
- Lions, J.-L. (1971). Optimal control of systems governed by partial differential equations. Springer.
- Morton, K. W., & Mayers, D. F. (1994). Numerical solution of partial differential equations. Cambridge University Press.
- Petersen, K. (1983). Ergodic theory. Cambridge University Press.
- Royden, H. L. (1968). Real analysis (2nd ed.). Macmillan.
- Rudin, W. (1973). Functional analysis. McGraw-Hill.
- Temam, R. (1988). Infinite-dimensional dynamical systems in mechanics and physics. Springer.
- van der Vaart, A. W., & Wellner, J. A. (1996). Weak convergence and empirical processes: With applications to statistics. Springer.
- Fréchet, M. (1906). Sur quelques points du calcul fonctionnel. Rendiconti del Circolo Matematico di Palermo, 22(1), 1–72.
- Kolmogorov, A. (1931). Über die Kompaktheit der Funktionenmengen bei der Konvergenz im Mittel. Nachrichten von der Gesellschaft der Wissenschaften zu Göttingen, Mathematisch-Physikalische Klasse, 1931, 60–63.
- Appell, J., & Zabrejko, P. P. (1990). Compactness in Lp spaces. Journal of Mathematical Analysis and Applications, 147(2), 303–317.
- Ambrosio, L., Gigli, N., & Savaré, G. (2014). Calculus and heat flow in metric measure spaces and applications to spaces with Ricci bounds from below. Inventiones Mathematicae, 195(2), 289–391.
- Ambrosio, L., Gigli, N., & Savaré, G. (2008). Gradient flows in metric spaces and in the space of probability measures (2nd ed.). Birkhäuser.
- Ambrosetti, A., & Malchiodi, A. (2007). Nonlinear analysis and semilinear elliptic problems. Cambridge University Press.
- Bakry, D., Gentil, I., & Ledoux, M. (2014). Analysis and geometry of Markov diffusion operators. Springer.
- Baudoin, F., & Bonnefont, M. (2012). The Poincaré inequality for subelliptic operators. Journal of Functional Analysis.
- Brezis, H. (2011). Functional analysis, Sobolev spaces and partial differential equations. Springer.
- Buser, P. (1982). A note on the isoperimetric constant. Annales Scientifiques de l’École Normale Supérieure.
- Capogna, L., Danielli, D., & Garofalo, N. (2001). The geometric Sobolev embedding for vector fields and the isoperimetric inequality. Communications in Analysis and Geometry.
- Davies, E. B. (1989). Heat kernels and spectral theory. Cambridge University Press.
- Federer, H. (1969). Geometric measure theory. Springer.
- Garofalo, N., & Nhieu, D.-M. (1996). Isoperimetric and Sobolev inequalities for Carnot–Carathéodory spaces and the existence of minimal surfaces. Communications on Pure and Applied Mathematics, 49(10), 1081–1144.
- Gromov, M. (1999). Metric structures for Riemannian and non-Riemannian spaces. Birkhäuser.
- Hajłasz, P., & Koskela, P. (2000). Sobolev met Poincaré. Memoirs of the American Mathematical Society, 145(688).
- Heinonen, J., Koskela, P., Shanmugalingam, N., & Tyson, J. (2005). Sobolev spaces on metric measure spaces: An approach based on upper gradients. Cambridge University Press.
- Lieb, E. H., & Loss, M. (2001). Analysis (2nd ed.). American Mathematical Society.
- Maz’ya, V. (1985). Sobolev spaces. Springer.
- von Renesse, M.-K., & Sturm, K.-T. (2005). Transport inequalities, gradient estimates, entropy and Ricci curvature. Communications on Pure and Applied Mathematics, 58(7), 923–940.
- Zhang, Q. S. (2011). Sobolev inequalities, heat kernels under Ricci flow, and the Poincaré conjecture. CRC Press.
- Gilbarg, D., & Trudinger, N. S. (2001). Elliptic partial differential equations of second order (2nd ed.). Springer.
- Heinonen, J. (2001). Lectures on analysis on metric spaces. Springer.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).