Submitted: 14 February 2025
Posted: 14 February 2025
Abstract
Keywords:
1. Mathematical Foundations
- Functional Approximation: Representing complex, non-linear functions using neural networks. Functional approximation is a fundamental concept in deep learning and underlies how neural networks solve complex problems. In this context, it refers to the ability of neural networks to represent complex, high-dimensional, non-linear functions that are difficult or infeasible to model with traditional mathematical techniques.
- Optimization Theory: Solving non-convex optimization problems efficiently. Optimization theory plays a central role in deep learning, because training a deep neural network amounts to finding a set of parameters (weights and biases) that minimizes a predefined objective, usually called the loss function. This objective typically measures the discrepancy between the network’s predictions and the true values. Optimization techniques guide the training process and determine how well a neural network can learn from data.
- Statistical Learning Theory: Understanding generalization behavior on unseen data. Statistical Learning Theory (SLT) provides the mathematical foundation for analyzing machine learning algorithms, including deep learning models. It explains how models generalize from training data to unseen data, which is critical for ensuring that deep learning models are accurate not only on the training set but also on new, previously unseen data. SLT addresses fundamental challenges such as overfitting, the bias-variance tradeoff, and generalization error.
1.1. Problem Definition: Risk Functional as a Mapping Between Spaces
1.1.1. Measurable Function Spaces
- f belongs to a hypothesis space $\mathcal{F}$.
- $\mu$ is a Borel probability measure over $\mathcal{X}$, satisfying $\mu(\mathcal{X}) = 1$.
1.1.2. Risk as a Functional
1.2. Approximation Spaces for Neural Networks
- VC-dimension theory for discrete hypotheses.
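- Rademacher complexity for continuous spaces: $\hat{\mathfrak{R}}_S(\mathcal{F}) = \mathbb{E}_{\sigma}\left[\sup_{f \in \mathcal{F}} \frac{1}{n}\sum_{i=1}^{n} \sigma_i f(x_i)\right]$, where $\sigma_1, \dots, \sigma_n$ are i.i.d. Rademacher random variables, i.e., $\mathbb{P}(\sigma_i = 1) = \mathbb{P}(\sigma_i = -1) = 1/2$.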
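As an illustration of the definition, the following sketch estimates the empirical Rademacher complexity of a finite hypothesis class by Monte Carlo sampling of the sign vector $\sigma$. The class of one-dimensional threshold classifiers, the ten sample points, and the number of draws are assumptions chosen only for illustration.

```python
import random

def empirical_rademacher(hypotheses, xs, n_draws=2000, seed=0):
    """Monte Carlo estimate of E_sigma[ sup_{h in H} (1/n) * sum_i sigma_i * h(x_i) ]."""
    rng = random.Random(seed)
    n = len(xs)
    # Precompute each hypothesis' outputs on the sample S = (x_1, ..., x_n).
    outputs = [[h(x) for x in xs] for h in hypotheses]
    total = 0.0
    for _ in range(n_draws):
        sigma = [rng.choice((-1, 1)) for _ in range(n)]  # i.i.d. Rademacher signs
        # Supremum over the finite class of the empirical correlation with sigma.
        total += max(sum(s * o for s, o in zip(sigma, out)) for out in outputs) / n
    return total / n_draws

def threshold(t):
    """Hypothetical 1-D threshold classifier h_t(x) = +1 if x > t else -1."""
    return lambda x: 1 if x > t else -1

xs = [i / 10 for i in range(10)]                               # sample S of size n = 10
thresholds = [threshold(t) for t in [-1.0] + [x + 0.05 for x in xs]]
constant = [threshold(-1.0)]                                   # a class with one hypothesis
rich = empirical_rademacher(thresholds, xs)
single = empirical_rademacher(constant, xs)
```

The threshold class can correlate with more sign patterns, so its estimate is clearly larger than that of the single-hypothesis class, whose true Rademacher complexity is exactly zero.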
1.2.1. VC-Dimension Theory for Discrete Hypotheses
- Shattering Implies Non-empty Hypothesis Class: If a set S is shattered by H, then H is non-empty. This follows directly from the fact that for each labeling $(y_1, \dots, y_m) \in \{-1, +1\}^m$ of $S = \{x_1, \dots, x_m\}$, there exists some $h \in H$ with $h(x_i) = y_i$ for all i. Therefore, H must contain at least one hypothesis.
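- Upper Bound on Shattering: If H shatters a set S, it also shatters every subset of S. Hence, if no set of size k + 1 can be shattered by H, then no set of size greater than k can be shattered either. This gives us the crucial result that the VC-dimension is well defined as the size of the largest shattered set: $\mathrm{VCdim}(H) = \max\{|S| : S \text{ is shattered by } H\}$.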
- Implication for Generalization: A central result in statistical learning theory connects the VC-dimension to the generalization error. For a hypothesis class of finite VC-dimension d and a sample of size n, with probability at least $1 - \delta$ every $h \in H$ satisfies $R(h) \le \hat{R}_n(h) + \mathcal{O}\left(\sqrt{\frac{d \log(n/d) + \log(1/\delta)}{n}}\right)$. The higher the VC-dimension, the more complex the hypothesis class, and the more likely it is to overfit the training data, leading to poor generalization.
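- Example 1: Linear Classifiers in $\mathbb{R}^2$: Consider the hypothesis class H consisting of linear classifiers in $\mathbb{R}^2$. These classifiers are half-planes bounded by lines, defined by $h(x) = \operatorname{sign}(w^\top x + b)$, where $w \in \mathbb{R}^2$ is the weight vector and $b \in \mathbb{R}$ is the bias term. The VC-dimension of linear classifiers in $\mathbb{R}^2$ is 3. To show this, note first that there exists a set of 3 points in $\mathbb{R}^2$, namely any 3 non-collinear points, that H shatters: each of the $2^3 = 8$ binary labelings of these points is achieved by some linear classifier. (Three collinear points cannot be shattered, since the labeling +, −, + is not linearly separable, but the definition only requires one shattered set.) However, no set of 4 points in $\mathbb{R}^2$ can be shattered: if the four points are the vertices of a convex quadrilateral, the labeling that assigns +1 to one pair of opposite vertices and −1 to the other pair is not achievable; otherwise, one point lies in the convex hull of the other three, and labeling it differently from them is not achievable. Hence the VC-dimension is 3.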
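- Example 2: Polynomial Classifiers of Degree d: Consider a polynomial hypothesis class in $\mathbb{R}^n$ of degree d. The hypothesis class H consists of classifiers of the form $h(x) = \operatorname{sign}\left(\sum_{|\alpha| \le d} c_\alpha x^\alpha\right)$, where the $c_\alpha$ are coefficients, $\alpha = (\alpha_1, \dots, \alpha_n) \in \mathbb{N}^n$ is a multi-index with $|\alpha| = \alpha_1 + \cdots + \alpha_n$, and $x^\alpha = x_1^{\alpha_1} \cdots x_n^{\alpha_n}$. Since such a classifier is linear in the $\binom{n+d}{d}$ monomial features, its VC-dimension is at most $\binom{n+d}{d}$, which grows as $\mathcal{O}(n^d)$ for fixed d, implying that the complexity of the hypothesis class increases rapidly with both the degree d and the dimension n of the input space.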
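Example 1 can be probed numerically. The sketch below enumerates all labelings of a point set and runs the perceptron, which is guaranteed to converge on linearly separable data, to look for a separating line. The point sets and the epoch cap are assumptions, and treating non-convergence within the cap as "not separable" is a heuristic, so this is a sanity check rather than a proof.

```python
from itertools import product

def perceptron_separates(points, labels, max_epochs=1000):
    """Return True if the perceptron finds (w, b) with sign(w.x + b) == label for every point."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for (x1, x2), y in zip(points, labels):
            if y * (w[0] * x1 + w[1] * x2 + b) <= 0:  # misclassified or on the boundary
                w[0] += y * x1
                w[1] += y * x2
                b += y
                mistakes += 1
        if mistakes == 0:
            return True
    return False  # heuristic: non-convergence within the cap is treated as non-separable

def shattered(points):
    """Check whether every +/-1 labeling of `points` is realised by some line."""
    return all(perceptron_separates(points, labels)
               for labels in product((-1, 1), repeat=len(points)))

triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]             # 3 non-collinear points
collinear = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]            # 3 collinear points
square = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]   # convex quadrilateral
```

On these inputs, `shattered(triangle)` holds, while the collinear triple fails on the labeling +, −, + and the square fails on the diagonal (XOR) labeling, matching the argument above.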
1.2.2. Rademacher Complexity for Continuous Spaces
1.2.3. Sobolev Embeddings
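- Semi-norm Dominance: On $W_0^{k,p}(\Omega)$ with $\Omega$ bounded, the full $W^{k,p}$-norm is controlled by the seminorm $|u|_{W^{k,p}(\Omega)} = \left(\sum_{|\alpha| = k}\|D^\alpha u\|_{L^p(\Omega)}^p\right)^{1/p}$, ensuring sensitivity to high-order derivatives.
- Poincaré Inequality: For $\Omega$ bounded, every $u \in W_0^{1,p}(\Omega)$ satisfies $\|u\|_{L^p(\Omega)} \le C_\Omega \|\nabla u\|_{L^p(\Omega)}$, where the constant $C_\Omega$ depends only on $\Omega$ and p.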
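- Sobolev Embedding Theorem: Let $\Omega \subset \mathbb{R}^n$ be a bounded domain with Lipschitz boundary. Then:
  - If $kp < n$: $W^{k,p}(\Omega) \hookrightarrow L^q(\Omega)$ for $1 \le q \le p^*$, with $\frac{1}{p^*} = \frac{1}{p} - \frac{k}{n}$.
  - If $kp = n$: $W^{k,p}(\Omega) \hookrightarrow L^q(\Omega)$ for $1 \le q < \infty$.
  - If $kp > n$: $W^{k,p}(\Omega) \hookrightarrow C^{k - \lfloor n/p \rfloor - 1, \gamma}(\overline{\Omega})$, where $\gamma = \lfloor n/p \rfloor + 1 - n/p$ if $n/p \notin \mathbb{Z}$ (and any $\gamma \in (0,1)$ if $n/p \in \mathbb{Z}$).
- Rellich-Kondrachov Compactness Theorem: For $1 \le p < n$, the embedding $W^{1,p}(\Omega) \hookrightarrow L^q(\Omega)$ is compact for $1 \le q < p^* = \frac{np}{n-p}$. Compactness follows from:
  - (a) Equicontinuity: $W^{1,p}$-boundedness ensures uniform control over oscillations.
  - (b) Rellich’s Selection Principle: Strong convergence follows from uniform estimates and tightness.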
1.2.4. Rellich-Kondrachov Compactness Theorem
- The sequence does not oscillate excessively at small scales.
- The sequence does not escape to infinity in a way that prevents strong convergence.
2. Universal Approximation Theorem: Refined Proof
2.1. Approximation Using Convolution Operators
2.1.1. Stone-Weierstrass Application
2.2. Depth vs. Width: Capacity Analysis
2.2.1. Bounding the Expressive Power
| Paper | Main Contribution | Impact |
| Kolmogorov (1957) | Original KST theorem | Laid foundation for function decomposition |
| Arnold (1963) | Refinement using 2-variable functions | Made KST more practical for computation |
| Lorentz (2008) | KST in approximation theory | Linked KST to function approximation errors |
| Pinkus (1999) | KST in neural networks | Theoretical basis for deep learning |
| Perdikaris (2024) | Deep learning reinterpretation | Proposed Kolmogorov-Arnold Networks |
| Alhafiz (2025) | KST-based turbulence modeling | Improved CFD simulations |
| Lorencin (2024) | KST in naval propulsion | Optimized ship energy efficiency |
2.2.2. Fourier Analysis of Expressivity
3. Training Dynamics and NTK Linearization
3.1. Gradient Flow and Stationary Points
3.1.1. Hessian Structure
3.1.2. NTK Linearization
3.2. NTK Regime
4. Generalization Bounds: PAC-Bayes and Spectral Analysis
4.1. PAC-Bayes Formalism
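- Empirical Risk: $\hat{R}_S(Q) = \mathbb{E}_{h \sim Q}[\hat{R}_S(h)]$ captures how well the posterior Q fits the training sample S.
- Complexity: The KL divergence $\mathrm{KL}(Q \,\|\, P)$ penalizes posteriors that move far from the prior P, discouraging overfitting and promoting generalization.
- Confidence: The confidence term, which depends on the failure probability $\delta$ and the sample size n, shrinks as n grows, tightening the bound and enhancing reliability.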
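To see how the three terms interact, the sketch below evaluates one common form of the PAC-Bayes bound for losses in [0, 1] (McAllester's bound in the version refined by Maurer): $R(Q) \le \hat{R}_S(Q) + \sqrt{\frac{\mathrm{KL}(Q\|P) + \ln(2\sqrt{n}/\delta)}{2n}}$. Taking P and Q to be diagonal Gaussians over the weights, this particular form of the bound, and all numbers are assumptions for illustration; the text above does not fix a specific version.

```python
import math

def kl_diag_gaussians(mu_q, sigma_q, mu_p, sigma_p):
    """KL(Q || P) for Q = N(mu_q, diag(sigma_q^2)) and P = N(mu_p, diag(sigma_p^2))."""
    return sum(
        math.log(sp / sq) + (sq ** 2 + (mq - mp) ** 2) / (2.0 * sp ** 2) - 0.5
        for mq, sq, mp, sp in zip(mu_q, sigma_q, mu_p, sigma_p)
    )

def pac_bayes_bound(empirical_risk, kl, n, delta=0.05):
    """R(Q) <= R_hat(Q) + sqrt((KL(Q||P) + ln(2 sqrt(n) / delta)) / (2n)) for a loss in [0, 1]."""
    return empirical_risk + math.sqrt((kl + math.log(2.0 * math.sqrt(n) / delta)) / (2.0 * n))

# Hypothetical 3-weight model: prior N(0, 1) per weight, posterior centred on trained weights.
prior_mu, prior_sigma = [0.0, 0.0, 0.0], [1.0, 1.0, 1.0]
post_mu, post_sigma = [0.8, -0.5, 1.2], [0.3, 0.3, 0.3]
kl = kl_diag_gaussians(post_mu, post_sigma, prior_mu, prior_sigma)
bounds = {n: pac_bayes_bound(0.05, kl, n) for n in (1_000, 10_000, 100_000)}
```

For a fixed posterior and empirical risk, the gap between the bound and $\hat{R}_S(Q)$ shrinks as n grows, while moving the posterior further from the prior increases the KL term and loosens the bound.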
4.2. Spectral Regularization
5. Neural Network Basics
5.1. Perceptrons and Artificial Neurons
5.2. Feedforward Neural Networks
5.3. Activation Functions
5.4. Loss Functions
6. Training Neural Networks
6.1. Backpropagation Algorithm
6.2. Gradient Descent Variants
6.2.1. SGD (Stochastic Gradient Descent) Optimizer
6.2.2. Nesterov Accelerated Gradient Descent (NAG)
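- Look-Ahead Gradient Computation: With the update $v_{t+1} = \mu v_t - \eta \nabla f(\theta_t + \mu v_t)$, $\theta_{t+1} = \theta_t + v_{t+1}$, NAG evaluates the gradient at the look-ahead point, computing $\nabla f(\theta_t + \mu v_t)$ instead of $\nabla f(\theta_t)$. It thus effectively anticipates the next move, leading to improved convergence rates.
- Adaptive Step Size: The momentum term changes the effective step size dynamically: along directions where successive gradients agree, steps grow (up to about $\eta/(1-\mu)$), while oscillating components partially cancel, stabilizing the trajectory.
- Choice of $\mu$: For an L-smooth, m-strongly convex objective with condition number $\kappa = L/m$, the standard momentum choice is $\mu = \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1}$.
- Learning Rate: Choosing $\eta \le 1/L$, where L is the Lipschitz constant of $\nabla f$, ensures convergence.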
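The look-ahead update can be sketched on a strongly convex quadratic $f(\theta) = \frac{1}{2}\sum_i a_i \theta_i^2$ with step size $\eta = 1/L$ and momentum $\mu = (\sqrt{\kappa}-1)/(\sqrt{\kappa}+1)$, next to plain gradient descent with the same step size. The diagonal quadratic, its eigenvalues, and the iteration count are assumptions chosen for illustration.

```python
def objective(theta, eigs):
    """f(theta) = 0.5 * sum_i eigs[i] * theta[i]^2 (a diagonal quadratic)."""
    return 0.5 * sum(a * t * t for a, t in zip(eigs, theta))

def gradient(theta, eigs):
    return [a * t for a, t in zip(eigs, theta)]

def nesterov(theta0, eigs, steps=200):
    L, m = max(eigs), min(eigs)
    kappa = L / m
    eta = 1.0 / L                                        # step size 1/L
    mu = (kappa ** 0.5 - 1.0) / (kappa ** 0.5 + 1.0)     # momentum (sqrt(k) - 1) / (sqrt(k) + 1)
    theta, v = list(theta0), [0.0] * len(theta0)
    for _ in range(steps):
        lookahead = [t + mu * vi for t, vi in zip(theta, v)]   # theta_t + mu * v_t
        g = gradient(lookahead, eigs)                          # gradient at the look-ahead point
        v = [mu * vi - eta * gi for vi, gi in zip(v, g)]       # v_{t+1}
        theta = [t + vi for t, vi in zip(theta, v)]            # theta_{t+1} = theta_t + v_{t+1}
    return theta

def gradient_descent(theta0, eigs, steps=200):
    eta = 1.0 / max(eigs)
    theta = list(theta0)
    for _ in range(steps):
        theta = [t - eta * gi for t, gi in zip(theta, gradient(theta, eigs))]
    return theta

eigs = [1.0, 100.0]          # hypothetical spectrum, condition number kappa = 100
theta0 = [1.0, 1.0]
f_nag = objective(nesterov(theta0, eigs), eigs)
f_gd = objective(gradient_descent(theta0, eigs), eigs)
```

With $\kappa = 100$, gradient descent contracts the slow direction by roughly $1 - 1/\kappa$ per step, whereas NAG contracts it by roughly $1 - 1/\sqrt{\kappa}$, so after 200 steps NAG's objective value is many orders of magnitude smaller.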
6.2.3. Adam (Adaptive Moment Estimation) Optimizer
6.2.4. RMSProp (Root Mean Squared Propagation) Optimizer
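- $v_t = \beta v_{t-1} + (1 - \beta) g_t^2$ with $v_0 = 0$ is a biased estimator of $\mathbb{E}[g_t^2]$ for finite t: under stationary gradients, $\mathbb{E}[v_t] = (1 - \beta^t)\,\mathbb{E}[g_t^2]$, so it is unbiased only in the limit $t \to \infty$.
- The bias-corrected estimate $\hat{v}_t = v_t / (1 - \beta^t)$ satisfies $\mathbb{E}[\hat{v}_t] = \mathbb{E}[g_t^2]$ for every t. For a fixed $\beta$ its variance does not vanish; convergence in variance and almost surely additionally requires the averaging weight $1 - \beta$ to decay over time.
- This ensures stable and adaptive learning rates in RMSProp.
- Without Bias Correction: If $\beta$ is close to 1, then in early iterations $v_t \approx (1 - \beta^t)\,\mathbb{E}[g_t^2]$. Since $1 - \beta^t \ll 1$, the denominator $\sqrt{v_t} + \epsilon$ in the update $\theta_{t+1} = \theta_t - \eta\, g_t / (\sqrt{v_t} + \epsilon)$ is too small, leading to excessively large steps and causing instability.
- With Bias Correction: Since $\hat{v}_t = v_t / (1 - \beta^t)$, we have $\mathbb{E}[\hat{v}_t] = \mathbb{E}[g_t^2]$, so the update $\theta_{t+1} = \theta_t - \eta\, g_t / (\sqrt{\hat{v}_t} + \epsilon)$ has stable step sizes and improved convergence.
- Bias correction ensures $\mathbb{E}[\hat{v}_t] = \mathbb{E}[g_t^2]$, removing the early underestimation.
- Asymptotic unbiasedness (together with almost sure convergence when the averaging weight decays) guarantees asymptotically stable second-moment estimation.
- Stable step sizes prevent instability in early iterations.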
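The bias correction discussed above can be sketched in a few lines. The first function tracks the raw estimate $v_t$ and the corrected estimate $\hat{v}_t = v_t/(1-\beta^t)$ for a given gradient sequence; the second is a scalar RMSProp loop with an optional correction. The hyperparameter defaults and the quadratic test objective are assumptions for illustration.

```python
import math

def second_moment_estimates(grads, beta=0.99):
    """Return the raw EMA v_t and the bias-corrected v_hat_t for a sequence of gradients."""
    v, raw, corrected = 0.0, [], []
    for t, g in enumerate(grads, start=1):
        v = beta * v + (1.0 - beta) * g * g       # v_0 = 0, so early v_t is biased towards 0
        raw.append(v)
        corrected.append(v / (1.0 - beta ** t))   # divide by 1 - beta^t to remove the bias
    return raw, corrected

def rmsprop(grad_fn, theta, lr=0.05, beta=0.99, eps=1e-8, steps=1000, bias_correction=True):
    """Scalar RMSProp; with bias_correction=True the denominator uses v_hat_t."""
    v = 0.0
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        v = beta * v + (1.0 - beta) * g * g
        denom = v / (1.0 - beta ** t) if bias_correction else v
        theta -= lr * g / (math.sqrt(denom) + eps)
    return theta

raw, corrected = second_moment_estimates([2.0] * 5)              # constant gradient g = 2
theta_final = rmsprop(lambda th: 2.0 * (th - 3.0), theta=0.0)    # minimise (theta - 3)^2
```

With a constant gradient $g = 2$ and $\beta = 0.99$, the raw estimate starts at $v_1 = 0.04$, while $\hat{v}_t = 4 = g^2$ for every t. An uncorrected first step would therefore be ten times larger ($g/\sqrt{v_1} = 10$ versus $g/\sqrt{\hat{v}_1} = 1$).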
6.3. Overfitting and Regularization Techniques
6.3.1. Dropout
6.3.2. L1/L2 Regularization and Overfitting
6.3.3. Elastic Net Regularization
6.3.4. Early Stopping
6.3.5. Data Augmentation
6.3.6. Cross-Validation
6.3.7. Pruning
6.3.8. Ensemble Methods
6.3.9. Noise Injection
6.3.10. Batch Normalization
6.3.11. Weight Decay
6.3.12. Max Norm Constraints
6.3.13. Transfer Learning
- Under-regularization: Low bias, high variance ⇒ overfitting.
- Over-regularization: High bias, low variance ⇒ underfitting.
6.4. Hyperparameter Tuning
6.4.1. Grid Search
- Guaranteed to find the best combination among the grid points in the search space.
- Easy to implement and parallelize.
- Computationally expensive, especially for high-dimensional hyperparameter spaces.
- Inefficient if some hyperparameters have little impact on performance.
6.4.2. Random Search
- More efficient than grid search, especially when some hyperparameters are less important.
- Can explore a larger search space with fewer evaluations.
- No guarantee of finding the optimal hyperparameters.
- May still require many iterations for high-dimensional spaces.
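A small comparison of the two strategies under the same budget of nine evaluations. The objective is hypothetical and deliberately depends strongly on one hyperparameter (a log learning rate) and only weakly on the other (a regularization weight), the situation in which random search tends to help. The grid values, bounds, and seed are assumptions.

```python
import itertools
import random

def validation_loss(lr_exp, reg):
    """Hypothetical objective: sensitive to lr_exp, nearly insensitive to reg."""
    return (lr_exp + 2.3) ** 2 + 0.01 * (reg - 0.5) ** 2

def grid_search(objective, grid_a, grid_b):
    # Evaluate every combination in the Cartesian product of the two grids.
    return min((objective(a, b), a, b) for a, b in itertools.product(grid_a, grid_b))

def random_search(objective, n_trials, bounds_a, bounds_b, seed=0):
    # Sample each hyperparameter independently and uniformly within its bounds.
    rng = random.Random(seed)
    trials = [(rng.uniform(*bounds_a), rng.uniform(*bounds_b)) for _ in range(n_trials)]
    return min((objective(a, b), a, b) for a, b in trials)

# Same budget of 9 evaluations for both strategies; results are (loss, lr_exp, reg).
grid_result = grid_search(validation_loss, [-4.0, -2.5, -1.0], [0.0, 0.5, 1.0])
random_result = random_search(validation_loss, 9, (-4.0, -1.0), (0.0, 1.0))
```

The 3×3 grid probes only three distinct values of the important hyperparameter, whereas random search probes nine; whether that yields a lower loss on any single run depends on the random draws.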
6.4.3. Bayesian Optimization
- Efficient and requires fewer evaluations compared to grid/random search.
- Balances exploration (trying new regions) and exploitation (focusing on promising regions).
- Computationally expensive to build and update the surrogate model.
- May struggle with high-dimensional spaces or noisy objective functions.
6.4.4. Genetic Algorithms
- Can explore a wide range of hyperparameter combinations.
- Suitable for non-differentiable or discontinuous objective functions.
- Computationally expensive and slow to converge.
- Requires careful tuning of mutation and crossover parameters.
6.4.5. Hyperband
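- The objective $f(\lambda)$, e.g., the validation loss of a hyperparameter configuration $\lambda$, is a black-box function with no known analytical form.
- Evaluating $\lambda$ with a budget b (e.g., number of epochs, dataset size) yields an approximation $f_b(\lambda)$, where $f_b(\lambda) \to f(\lambda)$ as $b \to R$, and R is the maximum budget.
- Start with n configurations and allocate a small budget b to each.
- Evaluate all configurations and keep the top $1/\eta$ fraction.
- Increase the budget by a factor of $\eta$ and repeat until one configuration remains.
- Allocate budget $r_i = r\,\eta^{i}$ to each configuration.
- Evaluate $f_{r_i}(\lambda_j)$ for all j.
- Keep the top $\lfloor n_i / \eta \rfloor$ configurations based on $f_{r_i}(\lambda_j)$.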
- For small s, it explores many configurations with small budgets.
- For large s, it exploits fewer configurations with large budgets.
- Near-Optimality: The best configuration found by HyperBand converges to $\lambda^* = \arg\min_{\lambda} f(\lambda)$ as the total budget grows.
- Logarithmic Scaling: The number of brackets, $s_{\max} + 1 = \lfloor \log_\eta R \rfloor + 1$, grows only logarithmically with the maximum budget R.
- Large-Scale Optimization: It scales to high-dimensional hyperparameter spaces.
- Parallelization: Configurations can be evaluated independently, enabling distributed computation.
- Adaptability: It works for both continuous and discrete hyperparameter spaces.
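The bracket structure above can be sketched as follows, with maximum budget $R = 27$ and $\eta = 3$ as assumed defaults. `sample_config` and `evaluate` are hypothetical stand-ins for drawing a random configuration and training it with a given budget. One small deviation for robustness: each round keeps at least one configuration, and the best loss seen across all evaluations is tracked.

```python
import math
import random

def hyperband(sample_config, evaluate, R=27, eta=3, seed=0):
    """Hyperband sketch: successive halving run over several brackets s = s_max, ..., 0."""
    rng = random.Random(seed)
    s_max = 0
    while eta ** (s_max + 1) <= R:            # s_max = floor(log_eta R), computed with integers
        s_max += 1
    best_loss, best_config = math.inf, None
    for s in range(s_max, -1, -1):
        # Configurations in this bracket: ceil((s_max + 1) * eta^s / (s + 1)).
        n = -(-((s_max + 1) * eta ** s) // (s + 1))
        configs = [sample_config(rng) for _ in range(n)]
        for i in range(s + 1):
            n_i = n // eta ** i
            r_i = R / eta ** (s - i)           # budget per configuration in round i
            results = sorted(((evaluate(c, r_i), c) for c in configs), key=lambda p: p[0])
            if results[0][0] < best_loss:
                best_loss, best_config = results[0]
            keep = max(1, n_i // eta)          # successive halving: keep the top 1/eta
            configs = [c for _, c in results[:keep]]
    return best_config, best_loss

def sample_config(rng):
    return rng.uniform(0.0, 1.0)               # hypothetical 1-D hyperparameter

def evaluate(config, budget):
    # Hypothetical learning curve: the loss approaches (config - 0.3)^2 as the budget grows.
    return (config - 0.3) ** 2 + 1.0 / budget

best_config, best_loss = hyperband(sample_config, evaluate)
```

With $R = 27$ and $\eta = 3$, the four brackets start with 27, 12, 6, and 4 configurations at budgets 1, 3, 9, and 27, respectively, trading breadth of exploration against budget per configuration.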
6.4.6. Gradient-Based Optimization
- Hypothesis space: $\mathcal{H}$ as a Banach space equipped with norm $\|\cdot\|_{\mathcal{H}}$.
- Parameter space: $\lambda \in \Lambda$, where $\Lambda$ is a closed, convex subset of $\mathbb{R}^m$.
6.4.7. Population-Based Training (PBT)
- $\theta_i \in \mathbb{R}^d$ represents the model parameters of the i-th model, with d being the dimensionality of the model parameter space.
- $h_i \in \mathcal{H}$ represents the hyperparameters of the i-th model, with m being the dimensionality of the hyperparameter space $\mathcal{H}$. The set $\mathcal{H} \subset \mathbb{R}_{>0}^m$ is bounded and contains positive-valued hyperparameters such as learning rates, batch sizes, or regularization factors.
- At each iteration t, we perform:
  - N forward passes to compute the losses $\mathcal{L}(\theta_i, h_i)$, $i = 1, \dots, N$.
  - N selection and mutation operations for updating the population.
- This leads to a time complexity of $O(N)$ model evaluations per iteration.
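A toy version of the exploit/explore loop, under heavy simplifying assumptions: each population member holds a scalar parameter $\theta_i$ trained by gradient descent on $\theta^2$, its only hyperparameter is a learning rate, and at every "ready" step the bottom quarter copies the parameters of a randomly chosen top-quarter member and perturbs the copied learning rate by a factor of 0.8 or 1.2. The population size, schedule, and clipping range are illustrative, not prescribed by the text.

```python
import random

def pbt(population_size=8, steps=40, ready_every=5, seed=0):
    """Toy Population-Based Training on f(theta) = theta^2 with a per-member learning rate."""
    rng = random.Random(seed)
    # Each member holds model parameters (theta) and a hyperparameter (lr).
    population = [{"theta": 5.0, "lr": 10 ** rng.uniform(-3.0, -0.5)}
                  for _ in range(population_size)]

    def loss(member):
        return member["theta"] ** 2

    for t in range(1, steps + 1):
        for member in population:                        # one training step per member
            member["theta"] -= member["lr"] * 2.0 * member["theta"]
        if t % ready_every == 0:
            population.sort(key=loss)                    # rank by current performance
            q = max(1, population_size // 4)
            for weak in population[-q:]:
                strong = rng.choice(population[:q])
                weak["theta"] = strong["theta"]          # exploit: copy parameters
                # explore: perturb the copied hyperparameter, clipped to (0, 0.45]
                weak["lr"] = min(0.45, strong["lr"] * rng.choice((0.8, 1.2)))
    return min(population, key=loss)

best = pbt()
```

Each step costs N training updates plus at most N copy-and-perturb operations, consistent with the $O(N)$ per-iteration cost stated above.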
6.4.8. Optuna
6.4.9. Successive Halving
6.4.10. Reinforcement Learning (RL)
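- State Space (S): The state $s_t \in S$ encodes the current hyperparameter configuration $\lambda_t$, the history of performance metrics, and any other relevant information (e.g., computational resources used).
- Action Space (A): The action $a_t \in A$ represents a perturbation to the hyperparameters, such that $\lambda_{t+1} = \lambda_t + a_t$.
- Transition Dynamics (P): The transition probability $P(s_{t+1} \mid s_t, a_t)$ describes the stochastic evolution of the state. This includes the effect of training the model and evaluating it on the validation set $\mathcal{D}_{\text{val}}$.
- Reward Function (R): The reward $r_t$ quantifies the improvement in model performance, e.g., $r_t = \mathcal{M}(\lambda_{t+1}) - \mathcal{M}(\lambda_t)$, where $\mathcal{M}(\lambda)$ denotes the validation performance (such as accuracy) under configuration $\lambda$.
- Discount Factor ($\gamma$): The discount factor $\gamma \in [0, 1)$ balances immediate and future rewards.
- Neural Network Function Approximation: Use deep neural networks to parameterize the policy $\pi_\phi(a \mid s)$ and value function $V_\psi(s)$.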
- Parallelization: Distribute the evaluation of hyperparameter configurations across multiple workers.
- Early Stopping: Use techniques like Hyperband to terminate poorly performing configurations early.
6.4.11. Meta-Learning
7. Convolution Neural Networks
7.1. Key Concepts
7.2. Applications in Image Processing
7.2.1. Image Classification
7.2.2. Object Detection
7.3. Real-World Applications
7.3.1. Medical Imaging
7.3.2. Autonomous Vehicles
7.4. Popular CNN Architectures
7.4.1. AlexNet
7.4.2. ResNet
7.4.3. VGG
8. Recurrent Neural Networks (RNNs)
8.1. Key Concepts
8.2. Sequence Modeling and Long Short-Term Memory (LSTM) and GRUs
8.3. Applications in Natural Language Processing
9. Advanced Architectures
9.1. Transformers and Attention Mechanisms
9.2. Generative Adversarial Networks (GANs)
9.3. Autoencoders and Variational Autoencoders
9.4. Graph neural networks (GNNs)
9.5. Physics Informed Neural Networks (PINNs)
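- $\mathcal{N}$ is a differential operator, for instance, the Laplace operator $\Delta$, or the Navier-Stokes operator for fluid dynamics.
- $u(x)$ is the unknown solution we wish to approximate.
- $f(x)$ is a known source term, which could represent external forces or other sources in the system.
- $\Omega$ is the domain in which the equation is valid, such as a bounded region in $\mathbb{R}^d$ (e.g., $\Omega = [0,1]^d$).
- $\sigma$ is a nonlinear activation function, such as ReLU or sigmoid. When $\mathcal{N}$ involves second derivatives, smooth activations such as tanh are usually preferred, because the second derivative of ReLU vanishes almost everywhere.
- $W_i$ and $b_i$ are the weight matrices and bias vectors of the i-th layer.
- The function $u_\theta(x)$ is a feedforward neural network with multiple layers.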
- Data-driven loss term: This term enforces the agreement between the model predictions and any available data points (boundary or initial conditions).
- Physics-driven loss term: This term enforces the satisfaction of the governing PDE at collocation points within the domain .
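To show how the two loss terms combine, the sketch below evaluates (without training) a physics-informed loss for the one-dimensional Poisson problem $-u''(x) = \pi^2 \sin(\pi x)$ on $\Omega = (0,1)$ with $u(0) = u(1) = 0$, whose exact solution is $u(x) = \sin(\pi x)$. The single-hidden-layer tanh network, the equal weighting of the two terms, and the collocation grid are assumptions. The second derivative is computed in closed form for this small network; a practical implementation would normally use automatic differentiation.

```python
import math

def network(params, x):
    """u_theta(x) = sum_k a_k tanh(w_k x + c_k) + d, returned with its second derivative in x."""
    a, w, c, d = params
    u, u_xx = d, 0.0
    for ak, wk, ck in zip(a, w, c):
        th = math.tanh(wk * x + ck)
        u += ak * th
        u_xx += ak * wk ** 2 * (-2.0 * th * (1.0 - th ** 2))  # d^2/dx^2 of tanh(w x + c)
    return u, u_xx

def source(x):
    return math.pi ** 2 * math.sin(math.pi * x)     # f(x) in -u'' = f

def pinn_loss(params, n_collocation=50):
    # Physics-driven term: mean squared PDE residual at interior collocation points.
    xs = [(i + 0.5) / n_collocation for i in range(n_collocation)]
    physics = sum((-network(params, x)[1] - source(x)) ** 2 for x in xs) / n_collocation
    # Data-driven term: mismatch with the boundary conditions u(0) = u(1) = 0.
    boundary = [(0.0, 0.0), (1.0, 0.0)]
    data = sum((network(params, x)[0] - y) ** 2 for x, y in boundary) / len(boundary)
    return data, physics, data + physics

zero_net = ([0.0] * 3, [1.0] * 3, [0.0] * 3, 0.0)   # u_theta = 0 everywhere
random_net = ([0.5, -0.3, 0.8], [2.0, -1.0, 3.0], [0.1, 0.4, -0.2], 0.05)
```

For the zero network, the boundary term vanishes while the residual term equals the mean of $\pi^4 \sin^2(\pi x)$ over the midpoint grid, namely $\pi^4/2$. Training would adjust the parameters to drive both terms down together.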
9.6. Implementation of the Deep Galerkin Methods (DGM) using the Physics-Informed Neural Networks (PINNs)
10. Deep Kolmogorov Methods
10.1. The Kolmogorov Backward Equation and Its Functional Formulation
10.2. The Feynman-Kac Representation and Its Justification
10.3. Deep Kolmogorov Method: Neural Network Approximation
- Neural Network Approximation Error:
- Monte Carlo Sampling Error: $\mathcal{O}(N^{-1/2})$, where N is the number of samples used in SGD.
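The $\mathcal{O}(N^{-1/2})$ sampling rate can be observed directly in the Feynman-Kac representation. As an assumed test case, the sketch uses the heat equation $\partial_t u = \frac{\sigma^2}{2} \partial_{xx} u$ with $u(0, x) = \varphi(x) = x^2$, whose solution $u(t, x) = \mathbb{E}[\varphi(x + \sigma W_t)] = x^2 + \sigma^2 t$ is known in closed form. No neural network is trained here; only the Monte Carlo sampling step is shown.

```python
import math
import random

def feynman_kac_estimate(phi, x, t, sigma, n_samples, seed=0):
    """Estimate u(t, x) = E[phi(x + sigma * W_t)], W_t ~ N(0, t), with its standard error."""
    rng = random.Random(seed)
    scale = sigma * math.sqrt(t)
    values = [phi(x + scale * rng.gauss(0.0, 1.0)) for _ in range(n_samples)]
    mean = sum(values) / n_samples
    variance = sum((v - mean) ** 2 for v in values) / (n_samples - 1)
    return mean, math.sqrt(variance / n_samples)   # standard error decays like N^(-1/2)

def phi(y):
    return y * y                                   # assumed condition phi(x) = x^2

exact = 1.0 ** 2 + 1.0 ** 2 * 1.0                  # x^2 + sigma^2 t at x = t = sigma = 1
results = {n: feynman_kac_estimate(phi, 1.0, 1.0, 1.0, n) for n in (100, 10_000, 200_000)}
```

Increasing N by a factor of 100 reduces the standard error by roughly a factor of 10, which is the $N^{-1/2}$ behaviour stated above.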
11. Reinforcement Learning
11.1. Key Concepts
- $\mathcal{S}$ is the state space,
- $\mathcal{A}$ is the action space,
- $P(s' \mid s, a)$ is the state transition probability,
- $R(s, a)$ is the reward function,
- $\gamma \in [0, 1)$ is the discount factor.
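A minimal value-iteration sketch for a finite MDP $(\mathcal{S}, \mathcal{A}, P, R, \gamma)$, which repeatedly applies the Bellman optimality operator until successive value functions agree to within a tolerance. The two-state MDP is a hypothetical example whose optimal values are easy to verify by hand: staying in state 1 earns reward 1 forever, so $V^*(1) = 1/(1-\gamma) = 10$, and the best policy from state 0 is to switch, giving $V^*(0) = \gamma V^*(1) = 9$ for $\gamma = 0.9$.

```python
def value_iteration(states, actions, P, R, gamma, tol=1e-10):
    """Iterate V(s) <- max_a [R(s, a) + gamma * sum_s' P(s' | s, a) V(s')] to a fixed point."""
    V = {s: 0.0 for s in states}
    while True:
        V_new = {
            s: max(R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a].items())
                   for a in actions)
            for s in states
        }
        if max(abs(V_new[s] - V[s]) for s in states) < tol:
            return V_new
        V = V_new

# Hypothetical two-state MDP: "stay" keeps the state, "switch" moves to the other state.
states = [0, 1]
actions = ["stay", "switch"]
P = {0: {"stay": {0: 1.0}, "switch": {1: 1.0}},
     1: {"stay": {1: 1.0}, "switch": {0: 1.0}}}
R = {0: {"stay": 0.0, "switch": 0.0},
     1: {"stay": 1.0, "switch": 0.0}}            # reward 1 only for staying in state 1
V = value_iteration(states, actions, P, R, gamma=0.9)
```

Because the Bellman operator is a $\gamma$-contraction, the iteration converges geometrically from any starting values.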
11.2. Deep Q-Learning
11.3. Applications in Games and Robotics
- $\mathcal{S}$ is the state space, which represents all possible states the agent can be in.
- $\mathcal{A}$ is the action space, which represents all possible actions the agent can take.
- $P(s' \mid s, a)$ is the state transition probability, which defines the probability of transitioning from state s to state $s'$ under action a.
- $R(s, a)$ is the reward function, which defines the immediate reward received after taking action a in state s.
- $\gamma \in [0, 1)$ is the discount factor, which determines the importance of future rewards.
12. Kernel Regression
12.1. Nadaraya–Watson kernel regression
- The eigenvalue decay rate controls approximation power.
- Spectral filtering via regularization prevents high-frequency noise.
- Generalization is optimized when balancing bias and variance.
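For reference, the Nadaraya–Watson estimator $\hat{m}_h(x) = \sum_i K\left(\frac{x - x_i}{h}\right) y_i \big/ \sum_i K\left(\frac{x - x_i}{h}\right)$ can be sketched as follows. The Gaussian kernel, the bandwidth values, and the test data are assumptions; the bandwidth h controls the bias-variance balance mentioned above, with larger h giving smoother estimates.

```python
import math

def gaussian_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def nadaraya_watson(x, xs, ys, h, kernel=gaussian_kernel):
    """Locally weighted average: weights K((x - x_i) / h), normalised to sum to one."""
    weights = [kernel((x - xi) / h) for xi in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights vanished; increase the bandwidth h")
    return sum(w * yi for w, yi in zip(weights, ys)) / total

xs = [i / 100 for i in range(101)]                 # hypothetical equally spaced design
linear = nadaraya_watson(0.5, xs, xs, h=0.05)      # y = x, symmetric around x = 0.5
constant = nadaraya_watson(0.3, xs, [2.0] * len(xs), h=0.1)
```

Because the weights are normalised, a constant response is reproduced exactly, and a linear response is reproduced at interior points where the design is symmetric around x.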
12.2. Priestley–Chao kernel estimator
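- Uniform kernel: $K(u) = \frac{1}{2}\,\mathbf{1}\{|u| \le 1\}$
- Epanechnikov kernel (optimal in the asymptotic MISE sense): $K(u) = \frac{3}{4}(1 - u^2)\,\mathbf{1}\{|u| \le 1\}$
- Gaussian kernel: $K(u) = \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}$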
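A sketch of the Priestley–Chao estimator $\hat{m}_h(x) = \frac{1}{h}\sum_{i=2}^{n} (x_i - x_{i-1})\, K\left(\frac{x - x_i}{h}\right) y_i$ for sorted design points, using the three kernels listed above. The equally spaced design, the bandwidth, and the constant test response are assumptions, chosen so that the estimate should be close to 1 at interior points.

```python
import math

def uniform_kernel(u):
    return 0.5 if abs(u) <= 1.0 else 0.0

def epanechnikov_kernel(u):
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def gaussian_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def priestley_chao(x, xs, ys, h, kernel):
    """(1/h) * sum_{i>=2} (x_i - x_{i-1}) K((x - x_i)/h) y_i, assuming xs is sorted."""
    return sum(
        (xs[i] - xs[i - 1]) * kernel((x - xs[i]) / h) * ys[i]
        for i in range(1, len(xs))
    ) / h

xs = [i / 1000 for i in range(1001)]      # hypothetical sorted design on [0, 1]
ones = [1.0] * len(xs)                    # constant response y = 1
estimates = {k.__name__: priestley_chao(0.5, xs, ones, 0.05, k)
             for k in (uniform_kernel, epanechnikov_kernel, gaussian_kernel)}
```

Unlike Nadaraya–Watson, the spacings $x_i - x_{i-1}$ replace the normalising denominator, so the estimator is best suited to fixed, ordered designs.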
12.3. Gasser–Müller kernel estimator
12.4. Parzen-Rosenblatt method
- Normalization Condition: $\int_{-\infty}^{\infty} K(u)\,du = 1$. This ensures that the kernel behaves like a proper probability density function and does not introduce artificial bias into the estimation.
- Symmetry Condition: $K(-u) = K(u)$ for all u. Symmetry guarantees that the kernel function does not introduce directional bias in the estimation of $f(x)$.
- Non-negativity: $K(u) \ge 0$ for all u. While not strictly necessary, this property ensures that $\hat{f}_h(x)$ remains a valid probability density estimate in a practical sense.
- Finite Second Moment (Variance Condition): $\mu_2(K) = \int_{-\infty}^{\infty} u^2 K(u)\,du < \infty$. This ensures that the kernel function does not assign an excessive amount of probability mass far from the origin, preserving local smoothness properties.
- Unbiasedness Condition (Mean Zero Constraint): $\int_{-\infty}^{\infty} u\,K(u)\,du = 0$. This ensures that the kernel function does not introduce artificial shifts in the density estimate.
- Gaussian Kernel: $K(u) = \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}$. This kernel has the advantage of being infinitely differentiable and providing smooth density estimates.
- Epanechnikov Kernel: $K(u) = \frac{3}{4}(1 - u^2)\,\mathbf{1}\{|u| \le 1\}$. This kernel is optimal in the asymptotic mean integrated squared error (AMISE) sense: among non-negative kernels with unit integral and a fixed second moment, it minimizes $\int K(u)^2\,du$, the factor that governs the variance of $\hat{f}_h$, while preserving local smoothness properties.
- Uniform Kernel: $K(u) = \frac{1}{2}\,\mathbf{1}\{|u| \le 1\}$. This kernel is simple but discontinuous, which makes it less desirable for smooth density estimation.
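A sketch of the Parzen-Rosenblatt estimator $\hat{f}_h(x) = \frac{1}{nh}\sum_{i=1}^{n} K\left(\frac{x - X_i}{h}\right)$ with the three kernels above, plus a midpoint-rule check that each estimate integrates to approximately 1, as the normalization condition implies. The simulated N(0, 1) sample, its size, and the bandwidth h = 0.3 are assumptions.

```python
import math
import random

KERNELS = {
    "gaussian": lambda u: math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi),
    "epanechnikov": lambda u: 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0,
    "uniform": lambda u: 0.5 if abs(u) <= 1.0 else 0.0,
}

def parzen_rosenblatt(x, samples, h, kernel="gaussian"):
    """Kernel density estimate f_hat_h(x) = (1 / (n h)) * sum_i K((x - X_i) / h)."""
    K = KERNELS[kernel]
    return sum(K((x - xi) / h) for xi in samples) / (len(samples) * h)

def midpoint_integral(f, a, b, n=2000):
    """Midpoint rule; used only to check that an estimate integrates to about 1."""
    step = (b - a) / n
    return step * sum(f(a + (i + 0.5) * step) for i in range(n))

rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(200)]   # hypothetical data drawn from N(0, 1)
density_at_zero = {k: parzen_rosenblatt(0.0, samples, h=0.3, kernel=k) for k in KERNELS}
```

Because each kernel is a non-negative density, every estimate is itself a density; the kernels differ mainly in the smoothness of the resulting curve.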
13. Natural Language Processing (NLP)
13.1. Text Classification
- Tokenization: Breaking the text into words or tokens.
- Stopword Removal: Removing common words (such as "and", "the", etc.) that do not carry significant meaning.
- Stemming and Lemmatization: Reducing words to their base or root form, e.g., "running" becomes "run".
- Lowercasing: Converting all words to lowercase to ensure consistency.
- Punctuation Removal: Removing punctuation marks.
- Bag-of-Words (BoW) model
- Term Frequency-Inverse Document Frequency (TF-IDF)
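A compact sketch of the preprocessing steps and a TF-IDF representation, using only the Python standard library. The tiny stopword list, the regular expression used for punctuation removal, and the particular TF-IDF variant (tf = count / document length, idf = $\log(N/\mathrm{df})$) are assumptions. Stemming and lemmatization are omitted, and production systems typically rely on a dedicated NLP library.

```python
import math
import re
from collections import Counter

STOPWORDS = {"and", "the", "a", "is", "of", "to", "in"}   # tiny illustrative list

def preprocess(text):
    """Lowercase, replace punctuation with spaces, tokenize on whitespace, drop stopwords."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return [tok for tok in text.split() if tok not in STOPWORDS]

def tf_idf(documents):
    """One {term: weight} dict per document, with tf = count / len(doc), idf = log(N / df)."""
    tokenized = [preprocess(doc) for doc in documents]
    n_docs = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors

docs = ["The movie is great, and the cast is great!",
        "The movie is boring.",
        "A great soundtrack and a boring plot."]
vectors = tf_idf(docs)
```

Terms that occur in every document receive an idf of zero, while rarer terms such as "cast" receive the largest weights; the Bag-of-Words model corresponds to keeping only the raw counts.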
13.2. Machine Translation
13.3. Chatbots and Conversational AI
14. Deep Learning Frameworks
14.1. TensorFlow
14.2. PyTorch
14.3. JAX
Acknowledgments
Appendix
Appendix 15.1. Linear Algebra Essentials
Appendix 15.1.1. Matrices and Vector Spaces
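- Addition: Defined entrywise: $(A + B)_{ij} = a_{ij} + b_{ij}$.
- Scalar Multiplication: For $\lambda \in \mathbb{R}$, $(\lambda A)_{ij} = \lambda a_{ij}$.
- Matrix Multiplication: If $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{n \times p}$, then the product $C = AB \in \mathbb{R}^{m \times p}$ is given by $c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}$. This is only defined when the number of columns of A equals the number of rows of B.
- Transpose: The transpose of A, denoted $A^\top$, satisfies $(A^\top)_{ij} = a_{ji}$.
- Determinant: If $A \in \mathbb{R}^{n \times n}$, then its determinant is given recursively by $\det(A) = \sum_{j=1}^{n} (-1)^{1+j} a_{1j} \det(M_{1j})$, where $M_{1j}$ is the submatrix obtained by removing the first row and j-th column.
- Inverse: A square matrix A is invertible if there exists $A^{-1}$ such that $A A^{-1} = A^{-1} A = I$, where I is the identity matrix.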
Appendix 15.1.2. Vector Spaces and Linear Transformations
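- Vector Addition: $u + v \in V$ for $u, v \in V$
- Scalar Multiplication: $\alpha v \in V$ for $\alpha \in \mathbb{R}$ and $v \in V$
- It is linearly independent: $\sum_{i=1}^{n} c_i v_i = 0 \implies c_1 = \cdots = c_n = 0$
- It spans V, meaning every $v \in V$ can be written as $v = \sum_{i=1}^{n} c_i v_i$ for some scalars $c_i$.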
Appendix 15.1.3. Eigenvalues and Eigenvectors
Appendix 15.1.4. Singular Value Decomposition (SVD)
Appendix 15.2. Probability and Statistics
Appendix 15.2.1. Probability Distributions
- $p(x) \ge 0$ for each $x$.
- The sum of probabilities across all possible outcomes is 1: $\sum_{x} p(x) = 1$.
- $f(x) \ge 0$ for all x.
- The total probability over the entire range of X is 1: $\int_{-\infty}^{\infty} f(x)\,dx = 1$.
Appendix 15.2.2. Bayes’ Theorem
Appendix 15.2.3. Statistical Measures
- Measures of Central Tendency (e.g., mean, median, mode)
- Measures of Dispersion (e.g., variance, standard deviation, interquartile range)
- Measures of Shape (e.g., skewness, kurtosis)
- Measures of Association (e.g., covariance, correlation)
- Information-Theoretic Measures (e.g., entropy, mutual information)
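- Expectation is linear: $\mathbb{E}[aX + bY] = a\,\mathbb{E}[X] + b\,\mathbb{E}[Y]$.
- Variance is translation invariant but scales quadratically: $\mathrm{Var}(aX + b) = a^2\,\mathrm{Var}(X)$.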
Appendix 15.3. Optimization Techniques
Appendix 15.3.1. Gradient Descent (GD)
- $x_k \in \mathbb{R}^n$ is the current point in the n-dimensional space (iteration index k),
- $\nabla f(x_k)$ is the gradient of the objective function at $x_k$,
- $\eta > 0$ is the learning rate (step size).
Appendix 15.3.2. Stochastic Gradient Descent (SGD)
Appendix 15.3.3. Second-Order Methods
- Gradient Descent (GD): An optimization algorithm that updates the parameter vector in the direction opposite to the gradient of the objective function. For convex, L-smooth objectives and a step size $\eta \le 1/L$, the objective value is guaranteed to converge to its minimum at rate $\mathcal{O}(1/k)$.
- Stochastic Gradient Descent (SGD): A variant of GD that uses a random subset of the data to estimate the gradient at each iteration. Each iteration is much cheaper, but gradient noise makes convergence noisier and typically slower in terms of iteration count, which motivates decaying step sizes or variance reduction techniques for efficient training.
- Second-Order Methods: These methods use the Hessian (second derivatives of the objective function) to accelerate convergence; Newton's method, for example, converges quadratically near a minimizer with a positive definite Hessian. However, the cost of computing and inverting the Hessian restricts their practical use. Quasi-Newton methods, such as BFGS, build Hessian approximations from gradient information to improve efficiency.
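To contrast the three families, the sketch below fits a hypothetical noiseless least-squares problem $y = 2x + 1$ with full-batch gradient descent, single-sample SGD, and one Newton step. The data, step sizes, and iteration counts are assumptions (the GD step size 0.01 is below $2/\lambda_{\max} \approx 0.034$ for this Hessian). Because the objective is quadratic, a single Newton step lands exactly on the minimizer, while GD and SGD approach it iteratively.

```python
import random

xs = [float(i) for i in range(10)]
ys = [2.0 * x + 1.0 for x in xs]       # hypothetical noiseless data, true (w, b) = (2, 1)
data = list(zip(xs, ys))

def gradient(w, b, batch):
    """Gradient of the mean squared error over the (x, y) pairs in `batch`."""
    n = len(batch)
    gw = sum(2.0 * (w * x + b - y) * x for x, y in batch) / n
    gb = sum(2.0 * (w * x + b - y) for x, y in batch) / n
    return gw, gb

def gradient_descent(eta=0.01, steps=5000):
    w = b = 0.0
    for _ in range(steps):
        gw, gb = gradient(w, b, data)                    # full-batch gradient
        w, b = w - eta * gw, b - eta * gb
    return w, b

def sgd(eta=0.005, steps=20000, seed=0):
    rng = random.Random(seed)
    w = b = 0.0
    for _ in range(steps):
        gw, gb = gradient(w, b, [rng.choice(data)])      # single-sample gradient estimate
        w, b = w - eta * gw, b - eta * gb
    return w, b

def newton_step(w=0.0, b=0.0):
    """One Newton step (w, b) - H^{-1} g; exact for this quadratic objective."""
    n = len(xs)
    # Hessian of the MSE: (2/n) * [[sum x^2, sum x], [sum x, n]]
    h11 = 2.0 * sum(x * x for x in xs) / n
    h12 = 2.0 * sum(xs) / n
    h22 = 2.0
    gw, gb = gradient(w, b, data)
    det = h11 * h22 - h12 * h12
    dw = (h22 * gw - h12 * gb) / det                     # solve the 2x2 system H d = g
    db = (-h12 * gw + h11 * gb) / det
    return w - dw, b - db
```

Each SGD step touches one data point instead of ten, which is why many more (cheaper) iterations are used; the Newton step needs the full Hessian, which is trivial here but expensive in high dimensions.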
Appendix 15.4. Matrix Calculus
Appendix 15.4.1. Matrix Differentiation
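- Matrix trace: For a square matrix $A \in \mathbb{R}^{n \times n}$, the derivative of the trace with respect to A is the identity matrix: $\frac{\partial\, \mathrm{tr}(A)}{\partial A} = I$.
- Matrix product: Let A and B be matrices such that AB is square, and consider the scalar $\mathrm{tr}(AB)$. Its derivative with respect to A is: $\frac{\partial\, \mathrm{tr}(AB)}{\partial A} = B^\top$.
- Matrix inverse: For an invertible matrix A that depends on a scalar parameter t, the derivative of the inverse is: $\frac{\partial A^{-1}}{\partial t} = -A^{-1} \frac{\partial A}{\partial t} A^{-1}$.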
Appendix 15.4.2. Tensor Differentiation
Appendix 15.5. Information Theory
Appendix 15.5.1. Entropy: The Fundamental Measure of Uncertainty
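- Continuity: $H(p_1, \dots, p_n)$ is a continuous function of $(p_1, \dots, p_n)$.
- Maximality: The uniform distribution $p_i = 1/n$ for all $i$ maximizes entropy: $H(X) \le \log n$, with equality if and only if X is uniform.
- Additivity: For two independent random variables X and Y, entropy satisfies: $H(X, Y) = H(X) + H(Y)$.
- Monotonicity: Conditioning does not increase entropy: $H(X \mid Y) \le H(X)$, with equality if and only if X and Y are independent.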
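The maximality, additivity, and monotonicity properties can be checked numerically on small distributions. The sketch below computes Shannon entropy in bits (with the convention $0 \log 0 = 0$), together with marginal and conditional entropies from a joint probability table; the example distributions are assumptions.

```python
import math
from itertools import product

def entropy(probs):
    """Shannon entropy in bits, H = -sum p log2 p, with 0 log 0 = 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def marginals(joint):
    """Marginal distributions of X and Y from a table {(x, y): p(x, y)}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return px, py

def conditional_entropy(joint):
    """H(X | Y) = H(X, Y) - H(Y)."""
    _, py = marginals(joint)
    return entropy(joint.values()) - entropy(py.values())

# Maximality: the uniform distribution on 4 outcomes attains log2(4) = 2 bits.
uniform_h = entropy([0.25] * 4)
skewed_h = entropy([0.7, 0.1, 0.1, 0.1])

# Additivity: for independent X and Y, H(X, Y) = H(X) + H(Y).
px, py = [0.5, 0.5], [0.9, 0.1]
independent = {(x, y): px[x] * py[y] for x, y in product(range(2), range(2))}

# Monotonicity: for correlated X and Y, H(X | Y) < H(X).
correlated = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
```

For the correlated table, $H(X) = 1$ bit while $H(X \mid Y) \approx 0.72$ bits, so observing Y removes part of the uncertainty about X.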
Appendix 15.5.2. Source Coding Theorem: Fundamental Limits of Compression
- Achievability: Given a discrete memoryless source (DMS) X with entropy $H(X)$, for any $\epsilon > 0$, there exists a source code that compresses sequences of length n to approximately $H(X) + \epsilon$ bits per symbol and allows for decoding with vanishing error probability as $n \to \infty$.
- Converse: No source code can achieve an average code length per symbol smaller than $H(X)$ without the error probability approaching 1 as $n \to \infty$.
Appendix 15.5.3. Noisy Channel Coding Theorem: Fundamental Limits of Communication
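- If $R < C$, there exist codes of rate R whose error probability tends to 0 as the block length $n \to \infty$ (arbitrarily reliable, though not literally error-free, transmission).
- If $R > C$, the error probability approaches 1.
- $E_1$: The transmitted codeword is not jointly typical with the channel output: $(X^n(m), Y^n) \notin A_\epsilon^{(n)}$.
- $E_2$: Some other codeword $X^n(m')$ (with $m' \neq m$) satisfies $(X^n(m'), Y^n) \in A_\epsilon^{(n)}$.
- Use Fano’s inequality to relate the error probability $P_e^{(n)}$ to the conditional entropy $H(W \mid Y^n)$.
- Apply the data processing inequality to bound the mutual information: $I(W; Y^n) \le I(X^n; Y^n) \le nC$.
- Show that if $R > C$, the error probability cannot vanish.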
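As a concrete instance of the theorem, the capacity of a binary symmetric channel (BSC) with crossover probability p is $C = 1 - H_b(p)$ bits per channel use, achieved by a uniform input. The sketch below computes C and checks, over a grid of input distributions, that the mutual information $I(X;Y) = H(Y) - H(Y \mid X)$ never exceeds it. The crossover value 0.11 is an arbitrary assumption.

```python
import math

def binary_entropy(p):
    """H_b(p) in bits, with H_b(0) = H_b(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def bsc_capacity(crossover):
    """Capacity C = 1 - H_b(p) of a binary symmetric channel with crossover probability p."""
    return 1.0 - binary_entropy(crossover)

def mutual_information_bsc(input_p1, crossover):
    """I(X; Y) = H(Y) - H(Y | X) for X ~ Bernoulli(input_p1) sent over a BSC."""
    output_p1 = input_p1 * (1.0 - crossover) + (1.0 - input_p1) * crossover
    return binary_entropy(output_p1) - binary_entropy(crossover)

grid = [q / 100 for q in range(101)]                    # candidate input distributions
rates = [mutual_information_bsc(q, 0.11) for q in grid]
```

Any rate below $C \approx 0.5$ bits per use is achievable on this channel, and the maximum of the grid is attained at the uniform input q = 0.5.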
Appendix 15.5.4. Rate-Distortion Theory: Lossy Data Compression
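- The mutual information $I(X; \hat{X})$ is a convex function of the test channel $p(\hat{x} \mid x)$ for a fixed source distribution $p(x)$,
- The distortion constraint $\mathbb{E}[d(X, \hat{X})] \le D$ is a linear (and thus convex) constraint in $p(\hat{x} \mid x)$.
- Stationarity: $\nabla_{p(\hat{x} \mid x)} \left[ I(X; \hat{X}) + \lambda \left( \mathbb{E}[d(X, \hat{X})] - D \right) \right] = 0$, with additional multipliers enforcing $\sum_{\hat{x}} p(\hat{x} \mid x) = 1$ for each x.
- Primal Feasibility: $\mathbb{E}[d(X, \hat{X})] \le D$.
- Dual Feasibility: $\lambda \ge 0$.
- Complementary Slackness: $\lambda \left( \mathbb{E}[d(X, \hat{X})] - D \right) = 0$.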
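The optimality conditions above suggest the Blahut–Arimoto iteration, which alternates between the optimal test channel $p(\hat{x} \mid x) \propto q(\hat{x})\, e^{-\beta d(x, \hat{x})}$ for a fixed reproduction marginal q and the marginal induced by that channel. Here $\beta > 0$ plays the role of the multiplier $\lambda$ (measured in nats). The sketch below traces one point of the rate-distortion curve; the binary source with Hamming distortion and the value $\beta = 2$ are assumptions, chosen because the exact curve $R(D) = H_b(p) - H_b(D)$ is known for $D \le \min(p, 1-p)$.

```python
import math

def binary_entropy(p):
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def blahut_arimoto(p_x, distortion, beta, iterations=500):
    """One point (D, R) on the rate-distortion curve; R in bits, beta = slope multiplier."""
    n_x, n_xh = len(p_x), len(distortion[0])
    q = [1.0 / n_xh] * n_xh                             # reproduction marginal q(x_hat)
    for _ in range(iterations):
        # Optimal test channel for the current q: p(x_hat | x) ∝ q(x_hat) exp(-beta d(x, x_hat))
        cond = []
        for x in range(n_x):
            w = [q[j] * math.exp(-beta * distortion[x][j]) for j in range(n_xh)]
            z = sum(w)
            cond.append([wj / z for wj in w])
        # Marginal induced by that channel: q(x_hat) = sum_x p(x) p(x_hat | x)
        q = [sum(p_x[x] * cond[x][j] for x in range(n_x)) for j in range(n_xh)]
    D = sum(p_x[x] * cond[x][j] * distortion[x][j] for x in range(n_x) for j in range(n_xh))
    R = sum(p_x[x] * cond[x][j] * math.log2(cond[x][j] / q[j])
            for x in range(n_x) for j in range(n_xh) if cond[x][j] > 0)
    return D, R

hamming = [[0.0, 1.0], [1.0, 0.0]]                      # d(x, x_hat) = 1 if x != x_hat
D, R = blahut_arimoto([0.7, 0.3], hamming, beta=2.0)    # hypothetical Bernoulli(0.3) source
```

Sweeping $\beta$ traces out the whole curve: large $\beta$ penalizes distortion heavily (small D, high R), while small $\beta$ tolerates distortion in exchange for a lower rate.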
Appendix 15.5.5. Applications of Information Theory
- Factor Graph Representation: The decoding process is represented as message passing on a factor graph, where the nodes correspond to variables and constraints. The Bethe free energy provides a variational characterization of the decoding problem.
- EXIT Charts: The extrinsic information transfer (EXIT) chart is a tool to analyze the convergence of iterative decoding. The area theorem relates the area under the EXIT curve to the gap to capacity.
- The solution to the constrained optimization problem exists and is unique.
- The maximum entropy distribution is the unique global maximizer of subject to the constraints.
- Sanov’s Theorem: A result in large deviation theory that characterizes the probability of observing an empirical distribution deviating from the true distribution.
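- Gibbs’ Inequality: For any two distributions p and q on the same finite alphabet, $-\sum_i p_i \log p_i \le -\sum_i p_i \log q_i$, with equality if and only if $p = q$ (equivalently, $D(p \,\|\, q) \ge 0$). Taking q uniform over n outcomes gives $H(p) \le \log n$, so the Shannon entropy is maximized by the uniform distribution when no constraints are imposed.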
- Convex Duality: The Lagrange multipliers are dual variables that encode the sensitivity of the entropy to changes in the constraints.
- The Boltzmann distribution for the canonical ensemble.
- The Fermi-Dirac and Bose-Einstein distributions for quantum systems.
- The Gibbs distribution for systems with multiple conserved quantities.
- It assumes knowledge of the correct constraints.
- It may not apply to systems with long-range correlations or non-Markovian dynamics.
- Extensions to non-equilibrium systems remain an active area of research.
Appendix 15.5.6. Conclusion: Information Theory as a Universal Mathematical Principle
References
- Rao, N. , Farid, M., and Raiz, M. Symmetric Properties of λ-Szász Operators Coupled with Generalized Beta Functions and Approximation Theory. Symmetry 2024, 16, 1703. [Google Scholar] [CrossRef]
- Mukhopadhyay, S.N. , Ray, S. (2025). Function Spaces. In: Measure and Integration. University Texts in the Mathematical Sciences. Springer, Singapore.
- Szołdra, T. (2024). Ergodicity breaking in quantum systems: From exact time evolution to machine learning (Doctoral dissertation).
- SONG, W. X. , CHEN, H., CUI, C., LIU, Y. F., TONG, D., GUO, F., ... and XIAO, C. W. Theoretical, methodological, and implementation considerations for establishing a sustainable urban renewal model. Journal of Natural Resources 2025, 40, 20–38. [Google Scholar] [CrossRef]
- El Mennaoui, O. , Kharou, Y., and Laasri, H. Evolution families in the framework of maximal regularity. Evolution Equations and Control Theory 2025, 0–0. [Google Scholar] [CrossRef]
- Pedroza, G. On the Conditions for Domain Stability for Machine Learning: A Mathematical Approach. arXiv 2024, arXiv:2412.00464. [Google Scholar]
- Cerreia-Vioglio, S. , and Ok, E. A. Abstract integration of set-valued functions. Journal of Mathematical Analysis and Applications 2024, 129169. [Google Scholar]
- Averin, A. Formulation and Proof of the Gravitational Entropy Bound. arXiv 2024, arXiv:2412.02470. [Google Scholar]
- Potter, T. Subspaces of L2(Rn) Invariant Under Crystallographic Shifts. arXiv-2501. arXiv 2025. [Google Scholar]
- Lee, M. Emergence of Self-Identity in Artificial Intelligence: A Mathematical Framework and Empirical Study with Generative Large Language Models. Axioms 2025, 14, 44. [Google Scholar] [CrossRef]
- Wang, R. , Cai, L., Wu, Q., and Niyato, D. Service Function Chain Deployment with Intrinsic Dynamic Defense Capability. IEEE Transactions on Mobile Computing 2025. [Google Scholar]
- Duim, J. L. , and Mesquita, D. P. Artificial Intelligence Value Alignment via Inverse Reinforcement Learning. Proceeding Series of the Brazilian Society of Computational and Applied Mathematics 2025, 11, 1–2. [Google Scholar]
- Khayat, M. , Barka, E., Serhani, M. A., Sallabi, F., Shuaib, K., and Khater, H. M. Empowering Security Operation Center with Artificial Intelligence and Machine Learning—A Systematic Literature Review. IEEE Access 2025. [Google Scholar] [CrossRef]
- Agrawal, R. 46 Detection of melanoma using DenseNet-based adaptive weighted loss function. Emerging Trends in Computer Science and Its Application 2025, 283. [Google Scholar]
- Hailemichael, H. , and Ayalew, B. Adaptive and Safe Fast Charging of Lithium-Ion Batteries Via Hybrid Model Learning and Control Barrier Functions. Available at SSRN 5110597.
- Nguyen, E. , Xiao, J., Fan, Z., and Ruan, D. Contrast-free Full Intracranial Vessel Geometry Estimation from MRI with Metric Learning based Inference. In Medical Imaging with Deep Learning.
- Luo, Z. , Bi, Y., Yang, X., Li, Y., Wang, S., and Ye, Q. A Novel Machine Vision-Based Collision Risk Warning Method for Unsignalized Intersections on Arterial Roads. Frontiers in Physics 2025, 13, 1527956. [Google Scholar]
- Bousquet, N. , Thomassé, S. VC-dimension and Erdős–Pósa property. Discrete Mathematics 2015, 338, 2302–2317. [Google Scholar] [CrossRef]
- Asian, O. , Yildiz, O. T., Alpaydin, E. (2009, September). Calculating the VC-dimension of decision trees. In 2009 24th International Symposium on Computer and Information Sciences (pp. 193–198). IEEE.
- Zhang, C. , Bian, W., Tao, D., Lin, W. Discretized-Vapnik-Chervonenkis dimension for analyzing complexity of real function classes. IEEE transactions on neural networks and learning systems 2012, 23, 1461–1472. [Google Scholar] [CrossRef] [PubMed]
- Riondato, M. , Akdere, M., Çetintemel, U., Zdonik, S. B., Upfal, E. (2011). The VC-dimension of SQL queries and selectivity estimation through sampling. In Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2011, Athens, Greece, September 5–9, 2011, Proceedings, Part II 22 (pp. 661–676). Springer Berlin Heidelberg.
- Bane, M. , Riggle, J., Sonderegger, M. The VC dimension of constraint-based grammars. Lingua 2010, 120, 1194–1208. [Google Scholar] [CrossRef]
- Anderson, A. Fuzzy VC Combinatorics and Distality in Continuous Logic. arXiv 2023, arXiv:2310.04393. [Google Scholar]
- Fox, J. , Pach, J., Suk, A. Bounded VC-dimension implies the Schur-Erdős conjecture. Combinatorica 2021, 41, 803–813. [Google Scholar] [CrossRef]
- Johnson, H. R. Binary strings of finite VC dimension. arXiv 2021, arXiv:2101.06490. [Google Scholar]
- Janzing, D. Merging joint distributions via causal model classes with low VC dimension. arXiv 2018, arXiv:1804.03206. [Google Scholar]
- Hüllermeier, E. , Fallah Tehrani, A. (2012, July). On the vc-dimension of the choquet integral. In International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems (pp. 42–50). Berlin, Heidelberg: Springer Berlin Heidelberg.
- Mohri, M. (2018). Foundations of machine learning.
- Cucker, F. , Zhou, D. X. (2007). Learning theory: An approximation theory viewpoint (Vol. 24). Cambridge University Press.
- Shalev-Shwartz, S. , Ben-David, S. (2014). Understanding machine learning: From theory to algorithms. Cambridge university press.
- Truong, L. V. On rademacher complexity-based generalization bounds for deep learning. arXiv 2022, arXiv:2208.04284. [Google Scholar]
- Gnecco, G. , and Sanguineti, M. Approximation error bounds via Rademacher complexity. Applied Mathematical Sciences 2008, 2, 153–176. [Google Scholar]
- Astashkin, S. V. Rademacher functions in symmetric spaces. Journal of Mathematical Sciences 2010, 169, 725–886. [Google Scholar] [CrossRef]
- Ying and Campbell Rademacher chaos complexities for learning the kernel problem. Neural computation 2010, 22, 2858–2886. [CrossRef]
- Zhu, J. , Gibson, B., and Rogers, T. T. Human rademacher complexity. Advances in neural information processing systems 2009, 22. [Google Scholar]
- Astashkin, S. V. , Astashkin, S. V., and Mazlum. (2020). The Rademacher system in function spaces. Basel: Birkhäuser.
- Sachs, S., van Erven, T., Hodgkinson, L., Khanna, R., and Şimşekli, U. (2023, July). Generalization Guarantees via Algorithm-dependent Rademacher Complexity. In The Thirty Sixth Annual Conference on Learning Theory (pp. 4863–4880). PMLR.
- Ma and Wang. Rademacher complexity and the generalization error of residual networks. Communications in Mathematical Sciences 2020, 18, 1755–1774.
- Bartlett, P. L., and Mendelson, S. Rademacher and Gaussian complexities: Risk bounds and structural results. Journal of Machine Learning Research 2002, 3, 463–482.
- Bartlett, P. L., and Mendelson, S. Rademacher and Gaussian complexities: Risk bounds and structural results. Journal of Machine Learning Research 2002, 3, 463–482.
- McDonald, D. J., and Shalizi, C. R. (2011). Rademacher complexity of stationary sequences. arXiv 2011, arXiv:1106.0730.
- Abderachid, S., and Kenza, B. Embeddings in Riemann–Liouville fractional Sobolev spaces and applications.
- Giang, T. H., Tri, N. M., and Tuan, D. A. On some Sobolev and Pólya–Szegő type inequalities with weights and applications. arXiv 2024, arXiv:2412.15490.
- Ruiz, P. A., and Fragkiadaki, V. (2024). Fractional Sobolev embeddings and algebra property: A dyadic view. arXiv 2024, arXiv:2412.12051.
- Bilalov, B., Mamedov, E., Sezer, Y., and Nasibova, N. Compactness in Banach function spaces: Poincaré and Friedrichs inequalities. Rendiconti del Circolo Matematico di Palermo Series 2 2025, 74, 68.
- Cheng, M., and Shao, K. Ground states of the inhomogeneous nonlinear fractional Schrödinger-Poisson equations. Complex Variables and Elliptic Equations 2025, 1–17.
- Wei, J., and Zhang, L. Ground State Solutions of Nehari-Pohozaev Type for Schrödinger-Poisson Equation with Zero-Mass and Weighted Hardy Sobolev Subcritical Exponent. The Journal of Geometric Analysis 2025, 35, 48.
- Zhang, X., and Qi, W. Multiplicity result on a class of nonhomogeneous quasilinear elliptic system with small perturbations in ℝ^N. arXiv 2025, arXiv:2501.01602.
- Xiao, J., and Yue, C. A Trace Principle for Fractional Laplacian with an Application to Image Processing. La Matematica 2025, 1–26.
- Pesce, A., and Portaro, S. (2025). Fractional Sobolev spaces related to an ultraparabolic operator. arXiv 2025, arXiv:2501.05898.
- Lassoued, D. A study of functions on the torus and multi-periodic functions. Kragujevac Journal of Mathematics 2026, 50, 297–337.
- Chen, H., Chen, H. G., and Li, J. N. (2024). Sharp embedding results and geometric inequalities for Hörmander vector fields. arXiv 2024, arXiv:2404.19393.
- Adams, R. A., and Fournier, J. J. (2003). Sobolev spaces. Elsevier.
- Brezis, H. (2011). Functional analysis, Sobolev spaces and partial differential equations. New York: Springer.
- Evans, L. C. (2022). Partial differential equations (Vol. 19). American Mathematical Society.
- Maz’â, V. G. (2011). Sobolev Spaces: With Applications to Elliptic Partial Differential Equations. Springer.
- Hornik, K., Stinchcombe, M., and White, H. Multilayer feedforward networks are universal approximators. Neural Networks 1989, 2, 359–366.
- Cybenko, G. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems 1989, 2, 303–314.
- Barron, A. R. Universal approximation bounds for superpositions of a sigmoidal function. IEEE Transactions on Information Theory 1993, 39, 930–945.
- Pinkus, A. Approximation theory of the MLP model in neural networks. Acta Numerica 1999, 8, 143–195.
- Lu, Z., Pu, H., Wang, F., Hu, Z., and Wang, L. The expressive power of neural networks: A view from the width. Advances in Neural Information Processing Systems 2017, 30.
- Hanin, B., and Sellke, M. (2017). Approximating continuous functions by ReLU nets of minimal width. arXiv 2017, arXiv:1710.11278.
- García-Cervera, C. J., Kessler, M., Pedregal, P., and Periago, F. Universal approximation of set-valued maps and DeepONet approximation of the controllability map.
- Majee, S., Abhishek, A., Strauss, T., and Khan, T. (2024). MCMC-Net: Accelerating Markov Chain Monte Carlo with Neural Networks for Inverse Problems. arXiv 2024, arXiv:2412.16883.
- Toscano, J. D., Wang, L. L., and Karniadakis, G. E. (2024). KKANs: Kurkova-Kolmogorov-Arnold Networks and Their Learning Dynamics. arXiv 2024, arXiv:2412.16738.
- Son, H. ELM-DeepONets: Backpropagation-Free Training of Deep Operator Networks via Extreme Learning Machines. arXiv 2025, arXiv:2501.09395.
- Rudin, W. (1964). Principles of mathematical analysis (Vol. 3). New York: McGraw-Hill.
- Stein, E. M., and Shakarchi, R. (2009). Real analysis: Measure theory, integration, and Hilbert spaces. Princeton University Press.
- Conway, J. B. (2019). A course in functional analysis (Vol. 96). Springer.
- Dieudonné, J. (2020). History of Functional Analysis. In Functional Analysis, Holomorphy, and Approximation Theory (pp. 119–129). CRC Press.
- Folland, G. B. (1999). Real analysis: Modern techniques and their applications (Vol. 40). John Wiley and Sons.
- Sugiura, S. (2024). On the Universality of Reservoir Computing for Uniform Approximation.
- Liu, Y., Liu, S., Huang, Z., and Zhou, P. Normed modules and the categorification of integrations, series expansions, and differentiations.
- Barreto, D. M. (2025). Stone-Weierstrass Theorem.
- Chang, S. Y., and Wei, Y. Generalized Choi–Davis–Jensen’s Operator Inequalities and Their Applications. Symmetry 2024, 16, 1176.
- Caballer, M., Dantas, S., and Rodríguez-Vidanes, D. L. (2024). Searching for linear structures in the failure of the Stone-Weierstrass theorem. arXiv 2024, arXiv:2405.06453.
- Chen, D. The Machado–Bishop theorem in the uniform topology. Journal of Approximation Theory 2024, 304, 106085.
- Rafiei, H., and Akbarzadeh-T, M. R. Hedge-embedded Linguistic Fuzzy Neural Networks for Systems Identification and Control. IEEE Transactions on Artificial Intelligence 2024, 5, 4928–4937.
- Kolmogorov, A. N. On the representation of continuous functions of many variables by superposition of continuous functions of one variable and addition. Doklady Akademii Nauk Russian Academy of Sciences 1957, 114, 953–956.
- Arnold, V. I. On the representation of functions of several variables as a superposition of functions of a smaller number of variables. In Collected Works: Representations of Functions, Celestial Mechanics and KAM Theory, 1957–1965 (2009), pp. 25–46.
- Lorentz, G. G. (1966). Approximation of Functions. Athena Series: Selected Topics in Mathematics.
- Guilhoto, L. F., and Perdikaris, P. Deep learning alternatives of the Kolmogorov superposition theorem. arXiv 2024, arXiv:2410.01990.
- Alhafiz, M. R., Zakaria, K., Dung, D. V., Palar, P. S., Dwianto, Y. B., and Zuhal, L. R. (2025). Kolmogorov-Arnold Networks for Data-Driven Turbulence Modeling. In AIAA SCITECH 2025 Forum (p. 2047).
- Lorencin, I., Mrzljak, V., Poljak, I., and Etinger, D. (2024, September). Prediction of CODLAG Propulsion System Parameters Using Kolmogorov-Arnold Network. In 2024 IEEE 22nd Jubilee International Symposium on Intelligent Systems and Informatics (SISY) (pp. 173–178). IEEE.
- Trevisan, D., Cassara, P., Agazzi, A., and Scardera, S. NTK Analysis of Knowledge Distillation.
- Bonfanti, A., Bruno, G., and Cipriani, C. (2024). The Challenges of the Nonlinear Regime for Physics-Informed Neural Networks. arXiv 2024, arXiv:2402.03864.
- Jacot, A., Gabriel, F., and Hongler, C. Neural tangent kernel: Convergence and generalization in neural networks. Advances in Neural Information Processing Systems 2018, 31.
- Lee, J., Xiao, L., Schoenholz, S., Bahri, Y., Novak, R., Sohl-Dickstein, J., and Pennington, J. Wide neural networks of any depth evolve as linear models under gradient descent. Advances in Neural Information Processing Systems 2019, 32.
- Yang, G., and Hu, E. J. (2020). Feature learning in infinite-width neural networks. arXiv 2020, arXiv:2011.14522.
- Xiang, L., Dudziak, Ł., Abdelfattah, M. S., Chau, T., Lane, N. D., and Wen, H. (2021). Zero-Cost Operation Scoring in Differentiable Architecture Search. arXiv 2021, arXiv:2106.06799.
- Lee, J., Xiao, L., Schoenholz, S., Bahri, Y., Novak, R., Sohl-Dickstein, J., and Pennington, J. Wide neural networks of any depth evolve as linear models under gradient descent. Advances in Neural Information Processing Systems 2019, 32.
- McAllester, D. A. (1999, July). PAC-Bayesian model averaging. In Proceedings of the Twelfth Annual Conference on Computational Learning Theory (pp. 164–170).
- Catoni, O. PAC-Bayesian supervised classification: The thermodynamics of statistical learning. arXiv 2007, arXiv:0712.0248.
- Germain, P., Lacasse, A., Laviolette, F., and Marchand, M. (2009, June). PAC-Bayesian learning of linear classifiers. In Proceedings of the 26th Annual International Conference on Machine Learning (pp. 353–360).
- Seeger, M. PAC-Bayesian generalisation error bounds for Gaussian process classification. Journal of Machine Learning Research 2002, 3, 233–269.
- Alquier, P., Ridgway, J., and Chopin, N. On the properties of variational approximations of Gibbs posteriors. Journal of Machine Learning Research 2016, 17, 1–41.
- Dziugaite, G. K., and Roy, D. M. (2017). Computing nonvacuous generalization bounds for deep (stochastic) neural networks with many more parameters than training data. arXiv 2017, arXiv:1703.11008.
- Rivasplata, O., Kuzborskij, I., Szepesvári, C., and Shawe-Taylor, J. PAC-Bayes analysis beyond the usual bounds. Advances in Neural Information Processing Systems 2020, 33, 16833–16845.
- Lever, G., Laviolette, F., and Shawe-Taylor, J. Tighter PAC-Bayes bounds through distribution-dependent priors. Theoretical Computer Science 2013, 473, 4–28.
- Rivasplata, O., Parrado-Hernández, E., Shawe-Taylor, J. S., Sun, S., and Szepesvári, C. PAC-Bayes bounds for stable algorithms with instance-dependent priors. Advances in Neural Information Processing Systems 2018, 31.
- Lindemann, L., Zhao, Y., Yu, X., Pappas, G. J., and Deshmukh, J. V. (2024). Formal verification and control with conformal prediction. arXiv 2024, arXiv:2409.00536.
- Jin, G., Wu, S., Liu, J., Huang, T., and Mu, R. (2025). Enhancing Robust Fairness via Confusional Spectral Regularization. arXiv 2025, arXiv:2501.13273.
- Ye, F., Xiao, J., Ma, W., Jin, S., and Yang, Y. Detecting small clusters in the stochastic block model. Statistical Papers 2025, 66, 37.
- Bhattacharjee, A., and Bharadwaj, P. Coherent Spectral Feature Extraction Using Symmetric Autoencoders. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 2025.
- Wu, Q., Hu, B., Liu, C., et al. (2025). Velocity Analysis Using High-resolution Hyperbolic Radon Transform with Lq1-Lq2 Regularization. Pure and Applied Geophysics 2025.
- Ortega, I., Hannigan, J. W., Baier, B. C., McKain, K., and Smale, D. Advancing CH₄ and N₂O retrieval strategies for NDACC/IRWG high-resolution direct-sun FTIR Observations. EGUsphere 2025, 2025, 1–32.
- Kazmi, S. H. A., Hassan, R., Qamar, F., Nisar, K., and Al-Betar, M. A. Federated Conditional Variational Auto Encoders for Cyber Threat Intelligence: Tackling Non-IID Data in SDN Environments. IEEE Access 2025.
- Zhao, Y., Bi, Z., Zhu, P., Yuan, A., and Li, X. Deep Spectral Clustering with Projected Adaptive Feature Selection. IEEE Transactions on Geoscience and Remote Sensing 2025.
- Saranya, S., and Menaka, R. A Quantum-Based Machine Learning Approach for Autism Detection using Common Spatial Patterns of EEG Signals. IEEE Access 2025.
- Dhalbisoi, S., Mohapatra, A., and Rout, A. (2024, March). Design of Cell-Free Massive MIMO for Beyond 5G Systems with MMSE and RZF Processing. In International Conference on Machine Learning, IoT and Big Data (pp. 263–273). Singapore: Springer Nature Singapore.
- Wei, C., Li, Z., Hu, T., Zhao, M., Sun, Z., Jia, K., ... and Jiang, S. (2025). Model-based convolution neural network for 3D Near-infrared spectral tomography. IEEE Transactions on Medical Imaging 2025.
- Goodfellow, I. (2016). Deep learning. MIT Press.
- Haykin, S. (2009). Neural networks and learning machines, 3/E. Pearson Education India.
- Schmidhuber, J. (2015). Deep learning in neural networks: An overview.
- Bishop, C. M. (2006). Pattern recognition and machine learning. New York: Springer.
- Poggio, T., and Smale, S. The mathematics of learning: Dealing with data. Notices of the AMS 2003, 50, 537–544.
- LeCun, Y., Bengio, Y., and Hinton, G. Deep learning. Nature 2015, 521, 436–444.
- Tishby, N., and Zaslavsky, N. (2015, April). Deep learning and the information bottleneck principle. In 2015 IEEE Information Theory Workshop (ITW) (pp. 1–5). IEEE.
- Sorrenson, P. (2025). Free-Form Flows: Generative Models for Scientific Applications (Doctoral dissertation).
- Liu, W., and Shi, X. (2025). An Enhanced Neural Network Forecasting System for the July Precipitation over the Middle-Lower Reaches of the Yangtze River.
- Das, P., Mondal, D., Islam, M. A., Al Mohotadi, M. A., and Roy, P. C. Analytical Finite-Integral-Transform and Gradient-Enhanced Machine Learning Approach for Thermoelastic Analysis of FGM Spherical Structures with Arbitrary Properties. Theoretical and Applied Mechanics Letters 2025, 100576.
- Zhang, R. (2025). Physics-informed Parallel Neural Networks for the Identification of Continuous Structural Systems.
- Ali, S., and Hussain, A. A neuro-intelligent heuristic approach for performance prediction of triangular fuzzy flow system. Proceedings of the Institution of Mechanical Engineers, Part N: Journal of Nanomaterials, Nanoengineering and Nanosystems 2025, 23977914241310569.
- Li, S. (2025). Scalable, generalizable, and offline methods for imperfect-information extensive-form games.
- Hu, T., Jin, B., and Wang, F. An Iterative Deep Ritz Method for Monotone Elliptic Problems. Journal of Computational Physics 2025, 113791.
- Chen, P., Zhang, A., Zhang, S., Dong, T., Zeng, X., Chen, S., ... and Zhou, Q. Maritime near-miss prediction framework and model interpretation analysis method based on Transformer neural network model with multi-task classification variables. Reliability Engineering and System Safety 2025, 110845.
- Sun, G., Liu, Z., Gan, L., Su, H., Li, T., Zhao, W., and Sun, B. SpikeNAS-Bench: Benchmarking NAS Algorithms for Spiking Neural Network Architecture. IEEE Transactions on Artificial Intelligence 2025.
- Zhang, Z., Wang, X., Shen, J., Zhang, M., Yang, S., Zhao, W., ... and Wang, J. (2025). Unfixed Bias Iterator: A New Iterative Format. IEEE Access 2025.
- Rosa, G. J. (2010). The Elements of Statistical Learning: Data Mining, Inference, and Prediction by Hastie, T., Tibshirani, R., and Friedman, J.
- Murphy, K. P. (2012). Machine learning: A probabilistic perspective. MIT Press.
- Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research 2014, 15, 1929–1958.
- Zou, H., and Hastie, T. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society Series B: Statistical Methodology 2005, 67, 301–320.
- Vapnik, V. (2013). The nature of statistical learning theory. Springer Science and Business Media.
- Ng, A. Y. (2004, July). Feature selection, L1 vs. L2 regularization, and rotational invariance. In Proceedings of the Twenty-First International Conference on Machine Learning (p. 78).
- Li, T. (2025). Optimization of Clinical Trial Strategies for Anti-HER2 Drugs Based on Bayesian Optimization and Deep Learning.
- Yasuda, M., and Sekimoto, K. Gaussian-discrete restricted Boltzmann machine with sparse-regularized hidden layer. Behaviormetrika, 1–19.
- Luo, X., Cruz, W. C., Zhang, X.-L., and Xiao, H. Hyper-parameter optimization for improving the performance of localization in an iterative ensemble smoother. Geoenergy Science and Engineering 2023, 231 Pt B, 212404.
- Alrayes, F. S., Maray, M., Alshuhail, A., et al. Privacy-preserving approach for IoT networks using statistical learning with optimization algorithm on high-dimensional big data environment. Scientific Reports 2025, 15, 3338.
- Cho, H., Kim, Y., Lee, E., Choi, D., Lee, Y., and Rhee, W. Basic enhancement strategies when using Bayesian optimization for hyperparameter tuning of deep neural networks. IEEE Access 2020, 8, 52588–52608.
- Ibrahim, M. M. W. Optimizing Tuberculosis Treatment Predictions: A Comparative Study of XGBoost with Hyperparameter in Penang, Malaysia. Sains Malaysiana 2025, 54, 3741–3752.
- Abdel-salam, M., Elhoseny, M., and El-hasnony, I. M. Intelligent and Secure Evolved Framework for Vaccine Supply Chain Management Using Machine Learning and Blockchain. SN Computer Science 2025, 6, 121.
- Vali, M. H. (2025). Vector quantization in deep neural networks for speech and image processing.
- Vincent, A. M., and Jidesh, P. An improved hyperparameter optimization framework for AutoML systems using evolutionary algorithms. Scientific Reports 2023, 13, 4737.
- Razavi-Termeh, S. V., Sadeghi-Niaraki, A., Ali, F., and Choi, S. M. Improving flood-prone areas mapping using geospatial artificial intelligence (GeoAI): A non-parametric algorithm enhanced by math-based metaheuristic algorithms. Journal of Environmental Management 2025, 375, 124238.
- Kiran, M., and Ozyildirim, M. (2022). Hyperparameter tuning for deep reinforcement learning applications. arXiv 2022, arXiv:2201.11182.
- Krizhevsky, A., Sutskever, I., and Hinton, G. E. ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems 2012, 25.
- Krizhevsky, A., Sutskever, I., and Hinton, G. E. ImageNet classification with deep convolutional neural networks. Communications of the ACM 2017, 60, 84–90.
- Simonyan, K., and Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
- He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 770–778).
- Cohen, T., and Welling, M. (2016, June). Group equivariant convolutional networks. In International Conference on Machine Learning (pp. 2990–2999). PMLR.
- Zeiler, M. D., and Fergus, R. (2014). Visualizing and understanding convolutional networks. In Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6–12, 2014, Proceedings, Part I (pp. 818–833). Springer International Publishing.
- Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., ... and Guo, B. (2021). Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 10012–10022).
- Lin, M. (2013). Network in network. arXiv 2013, arXiv:1312.4400.
- Rumelhart, D. E., Hinton, G. E., and Williams, R. J. Learning representations by back-propagating errors. Nature 1986, 323, 533–536.
- Bensaid, B., Poëtte, G., and Turpault, R. Convergence of the Iterates for Momentum and RMSProp for Local Smooth Functions: Adaptation is the Key. arXiv 2024, arXiv:2407.15471.
- Liu, Q., and Ma, W. The Epochal Sawtooth Effect: Unveiling Training Loss Oscillations in Adam and Other Optimizers. arXiv 2024, arXiv:2410.10056.
- Li, H. (2024). Smoothness and Adaptivity in Nonlinear Optimization for Machine Learning Applications (Doctoral dissertation, Massachusetts Institute of Technology).
- Heredia, C. Modeling AdaGrad, RMSProp, and Adam with Integro-Differential Equations. arXiv 2024, arXiv:2411.09734.
- Ye, Q. Preconditioning for Accelerated Gradient Descent Optimization and Regularization. arXiv 2024, arXiv:2410.00232.
- Compagnoni, E. M., Liu, T., Islamov, R., Proske, F. N., Orvieto, A., and Lucchi, A. Adaptive Methods through the Lens of SDEs: Theoretical Insights on the Role of Noise. arXiv 2024, arXiv:2411.15958.
- Yao, B., Zhang, Q., Feng, R., and Wang, X. System response curve based first-order optimization algorithms for cyber-physical-social intelligence. Concurrency and Computation: Practice and Experience 2024, 36, e8197.
- Wen, X., and Lei, Y. (2024, June). A Fast ADMM Framework for Training Deep Neural Networks Without Gradients. In 2024 International Joint Conference on Neural Networks (IJCNN) (pp. 1–8). IEEE.
- Hannibal, S., Jentzen, A., and Thang, D. M. Non-convergence to global minimizers in data driven supervised deep learning: Adam and stochastic gradient descent optimization provably fail to converge to global minimizers in the training of deep neural networks with ReLU activation. arXiv 2024, arXiv:2410.10533.
- Yang, Z. Adaptive Biased Stochastic Optimization. IEEE Transactions on Pattern Analysis and Machine Intelligence 2025.
- Kingma, D. P., and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
- Reddi, S. J., Kale, S., and Kumar, S. On the convergence of Adam and beyond. arXiv 2019, arXiv:1904.09237.
- Jin, L., Nong, H., Chen, L., and Su, Z. A Method for Enhancing Generalization of Adam by Multiple Integrations. arXiv 2024, arXiv:2412.12473.
- Adly, A. M. EXAdam: The Power of Adaptive Cross-Moments. arXiv 2024, arXiv:2412.20302.
- Liu, Y., Cao, Y., and Lin, J. Convergence Analysis of the ADAM Algorithm for Linear Inverse Problems.
- Yang, Z. Adaptive Biased Stochastic Optimization. IEEE Transactions on Pattern Analysis and Machine Intelligence 2025, 47, 3067–3078.
- Park, K., and Lee, S. SMMF: Square-Matricized Momentum Factorization for Memory-Efficient Optimization. arXiv 2024, arXiv:2412.08894.
- Mahjoubi, M. A., Lamrani, D., Saleh, S., Moutaouakil, W., Ouhmida, A., Hamida, S., ... and Raihani, A. Optimizing ResNet50 Performance Using Stochastic Gradient Descent on MRI Images for Alzheimer’s Disease Classification. Intelligence-Based Medicine 2025, 100219.
- Seini, A. B., and Adam, I. O. (2024). Human-AI collaboration for adaptive working and learning outcomes: an activity theory perspective.
- Teessar, J. (2024). The Complexities of Truthful Responding in Questionnaire-Based Research: A Comprehensive Analysis.
- Lauand, C. K., and Meyn, S. Markovian Foundations for Quasi-Stochastic Approximation. SIAM Journal on Control and Optimization 2025, 63, 402–430.
- Maranjyan, A., Tyurin, A., and Richtárik, P. Ringmaster ASGD: The First Asynchronous SGD with Optimal Time Complexity. arXiv 2025, arXiv:2501.16168.
- Gao, Z., and Gündüz, D. Graph Neural Networks over the Air for Decentralized Tasks in Wireless Networks. IEEE Transactions on Signal Processing 2025, 73, 721–737.
- Yoon, T., Choudhury, S., and Loizou, N. Multiplayer Federated Learning: Reaching Equilibrium with Less Communication. arXiv 2025, arXiv:2501.08263.
- Verma, K., and Maiti, A. Sine and cosine based learning rate for gradient descent method. Applied Intelligence 2025, 55, 352.
- Borowski, M., and Miasojedow, B. (2025). Convergence of projected stochastic approximation algorithm. arXiv e-prints, arXiv-2501.
- Dong, K., Chen, S., Dan, Y., Zhang, L., Li, X., Liang, W., ... and Sun, Y. A new perspective on brain stimulation interventions: Optimal stochastic tracking control of brain network dynamics. arXiv 2025, arXiv:2501.08567.
- Jiang, Y., Kang, H., Liu, J., and Xu, D. On the Convergence of Decentralized Stochastic Gradient Descent with Biased Gradients. IEEE Transactions on Signal Processing 2025, 73, 549–558.
- Sonobe, N., Momozaki, T., and Nakagawa, T. Sampling from Density power divergence-based Generalized posterior distribution via Stochastic optimization. arXiv 2025, arXiv:2501.07790.
- Zhang, X., and Jia, G. Convergence of Policy Gradient for Stochastic Linear Quadratic Optimal Control Problems in Infinite Horizon. Journal of Mathematical Analysis and Applications 2025, 129264.
- Thiriveedhi, A., Ghanta, S., Biswas, S., and Pradhan, A. K. ALL-Net: Integrating CNN and explainable-AI for enhanced diagnosis and interpretation of acute lymphoblastic leukemia. PeerJ Computer Science 2025, 11, e2600.
- Ramos-Briceño, D. A., Flammia-D’Aleo, A., Fernández-López, G., Carrión-Nessi, F. S., and Forero-Peña, D. A. Deep learning-based malaria parasite detection: Convolutional neural networks model for accurate species identification of Plasmodium falciparum and Plasmodium vivax. Scientific Reports 2025, 15, 3746.
- Espino-Salinas, C. H., Luna-García, H., Cepeda-Argüelles, A., Trejo-Vázquez, K., Flores-Chaires, L. A., Mercado Reyna, J., ... and Villalba-Condori, K. O. Convolutional Neural Network for Depression and Schizophrenia Detection. Diagnostics 2025, 15, 319.
- Ran, T., Huang, W., Qin, X., Xie, X., Deng, Y., Pan, Y., ... and Zou, D. Liquid-based cytological diagnosis of pancreatic neuroendocrine tumors using hyperspectral imaging and deep learning. EngMedicine 2025, 2, 100059.
- Araujo, B. V. S., Rodrigues, G. A., de Oliveira, J. H. P., Xavier, G. V. R., Lebre, U., Cordeiro, C., ... and Ferreira, T. V. Monitoring ZnO surge arresters using convolutional neural networks and image processing techniques combined with signal alignment. Measurement 2025, 116889.
- Sari, I. P., Elvitaria, L., and Rudiansyah, R. Data-driven approach for batik pattern classification using convolutional neural network (CNN). Jurnal Mandiri IT 2025, 13, 323–331.
- Wang, D., An, K., Mo, Y., Zhang, H., Guo, W., and Wang, B. Cf-Wiad: Consistency Fusion with Weighted Instance and Adaptive Distribution for Enhanced Semi-Supervised Skin Lesion Classification. Available at SSRN 5109182.
- Cai, P., Zhang, Y., He, H., Lei, Z., and Gao, S. DFNet: A Differential Feature-Incorporated Residual Network for Image Recognition. Journal of Bionic Engineering 2025, 1–14.
- Vishwakarma, A. K., and Deshmukh, M. CNNM-FDI: Novel Convolutional Neural Network Model for Fire Detection in Images. IETE Journal of Research 2025, 1–14.
- Ranjan, P., Kaushal, A., Girdhar, A., and Kumar, R. Revolutionizing hyperspectral image classification for limited labeled data: Unifying autoencoder-enhanced GANs with convolutional neural networks and zero-shot learning. Earth Science Informatics 2025, 18, 1–26.
- Naseer, A., and Jalal, A. Multimodal Deep Learning Framework for Enhanced Semantic Scene Classification Using RGB-D Images.
- Wang, Z., and Wang, J. Personalized Icon Design Model Based on Improved Faster-RCNN. Systems and Soft Computing 2025, 200193.
- Ramana, R., Vasudevan, V., and Murugan, B. S. Spectral Pyramid Pooling and Fused Keypoint Generation in ResNet-50 for Robust 3D Object Detection. IETE Journal of Research 2025, 1–13.
- Shin, S., Land, O., Seider, W., Lee, J., and Lee, D. (2025). Artificial Intelligence-Empowered Automated Double Emulsion Droplet Library Generation.
- Taca, B. S., Lau, D., and Rieder, R. A comparative study between deep learning approaches for aphid classification. IEEE Latin America Transactions 2025, 23, 198–204.
- Ulaş, B., Szklenár, T., and Szabó, R. Detection of Oscillation-like Patterns in Eclipsing Binary Light Curves using Neural Network-based Object Detection Algorithms. arXiv 2025, arXiv:2501.17538.
- Valensi, D., Lupu, L., Adam, D., and Topilsky, Y. Semi-Supervised Learning, Foundation Models and Image Processing for Pleural Line Detection and Segmentation in Lung Ultrasound.
- V, A., V, P., and Kumar, D. An effective object detection via BS2ResNet and LTK-Bi-LSTM. Multimedia Tools and Applications 2025.
- Zhu, X., Chen, W., and Jiang, Q. High-transferability black-box attack of binary image segmentation via adversarial example augmentation. Displays 2025, 102957.
- Guo, X., Zhu, Y., Li, S., Wu, S., and Liu, S. Research and Implementation of Agronomic Entity and Attribute Extraction Based on Target Localization. Agronomy 2025, 15, 354.
- Yousif, M., Jassam, N. M., Salim, A., Bardan, H. A., Mutlak, A. F., Sallibi, A. D., and Ataalla, A. F. Melanoma Skin Cancer Detection Using Deep Learning Methods and Binary GWO Algorithm.
- Rahman, S. I. U., Abbas, N., Ali, S., Salman, M., Alkhayat, A., Khan, J., ... and Gu, Y. H. Deep Learning and Artificial Intelligence-Driven Advanced Methods for Acute Lymphoblastic Leukemia Identification and Classification: A Systematic Review. Computer Modeling in Engineering and Sciences 2025, 142.
- Pratap Joshi, K., Gowda, V. B., Bidare Divakarachari, P., Siddappa Parameshwarappa, P., and Patra, R. K. VSA-GCNN: Attention Guided Graph Neural Networks for Brain Tumor Segmentation and Classification. Big Data and Cognitive Computing 2025, 9, 29.
- Ng, B., Eyre, K., and Chetrit, M. Prediction of ischemic cardiomyopathy using a deep neural network with non-contrast cine cardiac magnetic resonance images. Journal of Cardiovascular Magnetic Resonance 2025, 27.
- Nguyen, H. T., Lam, T. B., Truong, T. T. N., Duong, T. D., and Dinh, V. Q. Mv-Trams: An Efficient Tumor Region-Adapted Mammography Synthesis Under Multi-View Diagnosis. Available at SSRN 5109180.
- Chen, W., Xu, T., and Zhou, W. Task-based Regularization in Penalized Least-Squares for Binary Signal Detection Tasks in Medical Image Denoising. arXiv 2025, arXiv:2501.18418.
- Pradhan, P. D., Talmale, G., and Wazalwar, S. Deep dive into precision (DDiP): Unleashing advanced deep learning approaches in diabetic retinopathy research for enhanced detection and classification of retinal abnormalities. In Recent Advances in Sciences, Engineering, Information Technology and Management (pp. 518–530). CRC Press.
- Örenç, S., Acar, E., Özerdem, M. S., Şahin, S., and Kaya, A. Automatic Identification of Adenoid Hypertrophy via Ensemble Deep Learning Models Employing X-ray Adenoid Images. Journal of Imaging Informatics in Medicine 2025, 1–15.
- Jiang, M., Wang, S., Chan, K. H., Sun, Y., Xu, Y., Zhang, Z., ... and Tan, T. Multimodal Cross Global Learnable Attention Network for MR images denoising with arbitrary modal missing. Computerized Medical Imaging and Graphics 2025, 102497.
- Al-Haidri, W., Levchuk, A., Zotov, N., Belousova, K., Ryzhkov, A., Fokin, V., ... and Brui, E. Quantitative analysis of myocardial fibrosis using a deep learning-based framework applied to the 17-Segment model. Biomedical Signal Processing and Control 2025, 105, 107555.
- Osorio, S. L. J., Ruiz, M. A. R., Mendez-Vazquez, A., and Rodriguez-Tello, E. Fourier Series Guided Design of Quantum Convolutional Neural Networks for Enhanced Time Series Forecasting. arXiv 2024, arXiv:2404.15377.
- Umeano, C., and Kyriienko, O. Ground state-based quantum feature maps. arXiv 2024, arXiv:2404.07174.
- Liu, N., He, X., Laurent, T., Di Giovanni, F., Bronstein, M. M., and Bresson, X. Advancing Graph Convolutional Networks via General Spectral Wavelets. arXiv 2024, arXiv:2405.13806.
- Vlasic, A. Quantum Circuits, Feature Maps, and Expanded Pseudo-Entropy: A Categorical Theoretic Analysis of Encoding Real-World Data into a Quantum Computer. arXiv 2024, arXiv:2410.22084.
- Kim, M., Hioka, Y., and Witbrock, M. Neural Fourier Modelling: A Highly Compact Approach to Time-Series Analysis. arXiv 2024, arXiv:2410.04703.
- Xie, Y., Daigavane, A., Kotak, M., and Smidt, T. (2024). The price of freedom: Exploring tradeoffs between expressivity and computational efficiency in equivariant tensor products. In ICML 2024 Workshop on Geometry-grounded Representation Learning and Generative Modeling.
- Liu, G., Wei, Z., Zhang, H., Wang, R., Yuan, A., Liu, C., ... and Cao, G. (2024, April). Extending Implicit Neural Representations for Text-to-Image Generation. In ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 3650–3654). IEEE.
- Zhang, M. Lock-in spectrum: A tool for representing long-term evolution of bearing fault in the time–frequency domain using vibration signal. Sensor Review 2024, 44, 598–610.
- Hamed, M., and Lachiri, Z. (2024, July). Expressivity Transfer In Transformer-Based Text-To-Speech Synthesis. In 2024 IEEE 7th International Conference on Advanced Technologies, Signal and Image Processing (ATSIP) (Vol. 1, pp. 443–448). IEEE.
- Lehmann, F., Gatti, F., Bertin, M., Grenié, D., and Clouteau, D. Uncertainty propagation from crustal geologies to rock-site ground motion with a Fourier Neural Operator. European Journal of Environmental and Civil Engineering 2024, 28, 3088–3105.
- Jurafsky, D. (2000). Speech and language processing.
- Manning, C., and Schütze, H. (1999). Foundations of statistical natural language processing. MIT Press.
- Liu, Y., and Zhang, M. (2018). Neural network methods for natural language processing.
- Allen, J. (1988). Natural language understanding. Benjamin-Cummings Publishing Co., Inc.
- Li, Z., Zhao, Y., Zhang, X., Han, H., and Huang, C. Word embedding factor based multi-head attention. Artificial Intelligence Review 2025, 58, 1–21.
- Hempelmann, C. F., Rayz, J., Dong, T., and Miller, T. (2025, January). Proceedings of the 1st Workshop on Computational Humor (CHum).
- Koehn, P. (2009). Statistical machine translation. Cambridge University Press.
- Eisenstein, J. (2019). Introduction to natural language processing. The MIT Press.
- Otter, D. W., Medina, J. R., and Kalita, J. K. A survey of the usages of deep learning for natural language processing. IEEE Transactions on Neural Networks and Learning Systems 2020, 32, 604–624.
- Mitkov, R. (Ed.). (2022). The Oxford handbook of computational linguistics. Oxford University Press.
- Liu, X., Tao, Z., Jiang, T., Chang, H., Ma, Y., and Huang, X. ToDA: Target-oriented Diffusion Attacker against Recommendation System. arXiv 2024, arXiv:2401.12578.
- Çekik, R. Effective Text Classification Through Supervised Rough Set-Based Term Weighting. Symmetry 2025, 17, 90.
- Zhu, H., Xia, J., Liu, R., and Deng, B. SPIRIT: Structural Entropy Guided Prefix Tuning for Hierarchical Text Classification. Entropy 2025, 27, 128.
- Matrane, Y., Benabbou, F., and Ellaky, Z. Enhancing Moroccan Dialect Sentiment Analysis through Optimized Preprocessing and Transfer Learning Techniques. IEEE Access 2024, 12, 87756–187777.
- Moqbel, M., and Jain, A. Mining the truth: A text mining approach to understanding perceived deceptive counterfeits and online ratings. Journal of Retailing and Consumer Services 2025, 84, 104149.
- Kumar, V., Iqbal, M. I., and Rathore, R. Natural Language Processing (NLP) in Disease Detection—A Discussion of How NLP Techniques Can Be Used to Analyze and Classify Medical Text Data for Disease Diagnosis. AI in Disease Detection: Advancements and Applications 2025, 53–75.
- Yin, S. The Current State and Challenges of Aspect-Based Sentiment Analysis. Applied and Computational Engineering 2024, 114, 25–31.
- Raghavan, M. (2024). Are you who AI says you are? Exploring the role of Natural Language Processing algorithms for “predicting” personality traits from text (Doctoral dissertation, University of South Florida).
- Semeraro, A., Vilella, S., Improta, R., De Duro, E. S., Mohammad, S. M., Ruffo, G., and Stella, M. EmoAtlas: An emotional network analyzer of texts that merges psychological lexicons, artificial intelligence, and network science. Behavior Research Methods 2025, 57, 77.
- Cai, F., and Liu, X. Data Analytics for Discourse Analysis with Python: The Case of Therapy Talk, by Dennis Tay. New York: Routledge, 2024. ISBN: 9781032419015 (HB: USD 41.24), xiii + 182 pages. Natural Language Processing, 1–4.
- Wu, Y. Google’s neural machine translation system: Bridging the gap between human and machine translation. arXiv 2016, arXiv:1609.08144.
- Hettiarachchi, H., Ranasinghe, T., Rayson, P., Mitkov, R., Gaber, M., Premasiri, D., ... and Uyangodage, L. Overview of the First Workshop on Language Models for Low-Resource Languages (LoResLM 2025). arXiv 2024, arXiv:2412.16365.
- Das, B. R., and Sahoo, R. Word Alignment in Statistical Machine Translation: Issues and Challenges. Nov Joun of Appl Sci Res 2024, 1, 01–03.
- Oluwatoki, T. G., Adetunmbi, O. A., and Boyinbode, O. K. A Transformer-Based Yoruba to English Machine Translation (TYEMT) System with Rouge Score.
- Uçkan, T., and Kurt, E. Word Embeddings in NLP. Pioneer and Innovative Studies in Computer Sciences and Engineering, 58.
- Pastor, G. C., Monti, J., Mitkov, R., and Hidalgo-Ternero, C. M. (2024). Recent Advances in Multiword Units in Machine Translation and Translation Technology. Recent Advances in Multiword Units in Machine Translation and Translation Technology.
- Fernandes, R. M. Decoding spatial semantics: A comparative analysis of the performance of open-source LLMs against NMT systems in translating EN-PT-BR subtitles (Doctoral dissertation, Universidade de São Paulo).
- Jozić, K. (2024). Testing ChatGPT’s Capabilities as an English-Croatian Machine Translation System in a Real-World Setting: ETranslation versus ChatGPT at the European Central Bank (Doctoral dissertation, University of Zagreb. Faculty of Humanities and Social Sciences. Department of English language and literature).
- Yang, M. Adaptive Recognition of English Translation Errors Based on Improved Machine Learning Methods. International Journal of High Speed Electronics and Systems 2025, 2540236.
- Linnemann, G. A., and Reimann, L. E. (2024). Artificial Intelligence as a New Field of Activity for Applied Social Psychology–A Reasoning for Broadening the Scope.
- Merkel, S., and Schorr, S. OPP: Application Fields and Innovative Technologies.
- Kushwaha, N. S., and Singh, P. Artificial Intelligence based Chatbot: A Case Study. Journal of Management and Service Science (JMSS) 2022, 2, 1–13.
- Macedo, P., Madeira, R. N., Santos, P. A., Mota, P., Alves, B., and Pereira, C. M. A Conversational Agent for Empowering People with Parkinson’s Disease in Exercising Through Motivation and Support. Applied Sciences 2024, 15, 223.
- Gupta, R., Nair, K., Mishra, M., Ibrahim, B., and Bhardwaj, S. Adoption and impacts of generative artificial intelligence: Theoretical underpinnings and research agenda. International Journal of Information Management Data Insights 2024, 4, 100232.
- Foroughi, B., Iranmanesh, M., Yadegaridehkordi, E., Wen, J., Ghobakhloo, M., Senali, M. G., and Annamalai, N. Factors Affecting the Use of ChatGPT for Obtaining Shopping Information. International Journal of Consumer Studies 2025, 49, e70008.
- Jandhyala, V. S. V. Building AI chatbots and virtual assistants: A technical guide for aspiring professionals. International Journal of Research in Computer Applications and Information Technology (IJRCAIT) 2024, 7, 448–463.
- Pavlović, N., and Savić, M. The Impact of the ChatGPT Platform on Consumer Experience in Digital Marketing and User Satisfaction. Theoretical and Practical Research in Economic Fields 2024, 15, 636–646.
- Mannava, V., Mitrevski, A., and Plöger, P. G. (2024, August). Exploring the Suitability of Conversational AI for Child-Robot Interaction. In 2024 33rd IEEE International Conference on Robot and Human Interactive Communication (RO-MAN) (pp. 1821–1827). IEEE.
- Sherstinova, T., Mikhaylovskiy, N., Kolpashchikova, E., and Kruglikova, V. (2024, April). Bridging Gaps in Russian Language Processing: AI and Everyday Conversations. In 2024 35th Conference of Open Innovations Association (FRUCT) (pp. 665–674). IEEE.
- Lipton, Z. C. A Critical Review of Recurrent Neural Networks for Sequence Learning. arXiv 2015.
- Pascanu, R. On the difficulty of training recurrent neural networks. arXiv 2013, arXiv:1211.5063.
- Jaeger, H. The “echo state” approach to analysing and training recurrent neural networks, with an erratum note. Bonn, Germany: German National Research Center for Information Technology GMD Technical Report 2001, 148, 13.
- Hochreiter, S. (1997). Long Short-Term Memory. Neural Computation, MIT Press.
- Kawakami, K. (2008). Supervised sequence labelling with recurrent neural networks (Doctoral dissertation, Ph. D. thesis).
- Bengio, Y., Simard, P., and Frasconi, P. Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks 1994, 5, 157–166.
- Bhattamishra, S., Patel, A., and Goyal, N. On the computational power of transformers and its implications in sequence modeling. arXiv 2020, arXiv:2006.09286.
- Siegelmann, H. T. (1993). Theoretical foundations of recurrent neural networks.
- Sutton, R. S. (2018). Reinforcement learning: An introduction. A Bradford Book.
- Barto, A. G. Reinforcement Learning: An Introduction. By Richard S. Sutton. SIAM Rev 2021, 6, 423.
- Bertsekas, D. P. (1996). Neuro-dynamic programming. Athena Scientific.
- Kakade, S. M. (2003). On the sample complexity of reinforcement learning. University of London, University College London (United Kingdom).
- Szepesvári, C. (2022). Algorithms for reinforcement learning. Springer nature.
- Haarnoja, T., Zhou, A., Abbeel, P., and Levine, S. (2018, July). Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In International Conference on Machine Learning (pp. 1861–1870). PMLR.
- Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., ... and Hassabis, D. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533.
- Konda, V., and Tsitsiklis, J. Actor-critic algorithms. Advances in Neural Information Processing Systems 1999, 12.
- Levine, S. Reinforcement learning and control as probabilistic inference: Tutorial and review. arXiv 2018, arXiv:1805.00909.
- Mannor, S., Mansour, Y., and Tamar, A. (2022). Reinforcement Learning: Foundations. Online manuscript.
- Borkar, V. S. (2008). Stochastic approximation: A dynamical systems viewpoint (Vol. 9). Cambridge: Cambridge University Press.
- Takhsha, A. R., Rastgarpour, M., and Naderi, M. A Feature-Level Ensemble Model for COVID-19 Identification in CXR Images using Choquet Integral and Differential Evolution Optimization. arXiv 2025, arXiv:2501.08241.
- Singh, P., and Raman, B. (2025). Graph Neural Networks: Extending Deep Learning to Graphs. In Deep Learning Through the Prism of Tensors (pp. 423–482). Singapore: Springer Nature Singapore.
- Yao, L., Shi, Q., Yang, Z., Shao, S., and Hariri, S. Development of an Edge Resilient ML Ensemble to Tolerate ICS Adversarial Attacks. arXiv 2024, arXiv:2409.18244.
- Chen, K., Bi, Z., Niu, Q., Liu, J., Peng, B., Zhang, S., ... and Feng, P. Deep learning and machine learning, advancing big data analytics and management: Tensorflow pretrained models. arXiv 2024, arXiv:2409.13566.
- Dumić, E. (2024). Learning neural network design with TensorFlow and Keras. In ICERI2024 Proceedings (pp. 10689–10696). IATED.
- Bajaj, K., Bordoloi, D., Tripathy, R., Mohapatra, S. K., Sarangi, P. K., and Sharma, P. (2024, September). Convolutional Neural Network Based on TensorFlow for the Recognition of Handwritten Digits in the Odia. In 2024 International Conference on Advances in Computing Research on Science Engineering and Technology (ACROSET) (pp. 1–5). IEEE.
- Abbass, A. M., and Fyath, R. S. Enhanced approach for artificial neural network-based optical fiber channel modeling: Geometric constellation shaping WDM system as a case study. Journal of Applied Research and Technology 2024, 22, 768–780.
- Prabha, D., Subramanian, R. S., Dinesh, M. G., and Girija, P. (2024). Sustainable Farming Through AI-Enabled Precision Agriculture. In Artificial Intelligence for Precision Agriculture (pp. 159–182). Auerbach Publications.
- Saad, A., and Aidi, A. (2024, November). Optimized Deep Learning Models For Edge Computing: A Comparative Study on Raspberry PI4 For Real-Time Plant Disease Detection. In 2024 4th International Conference on Embedded and Distributed Systems (EDiS) (pp. 273–278). IEEE.
- Mlambo, F. (2024). What are Bayesian Neural Networks?
- Team, G. Y. Bifang: A New Free-Flying Cubic Robot for Space Station.
- Tabel, L. (2024). Delay Learning in Spiking.
- Naderi, S., Chen, B., Yang, T., Xiang, J., Heaney, C. E., Latham, J. P., ... and Pain, C. C. A discrete element solution method embedded within a Neural Network. Powder Technology 2024, 448, 120258.
- Polaka, S. K. R. (2024). Verifica delle reti neurali per l’apprendimento rinforzato sicuro [Verification of neural networks for safe reinforcement learning].
- Erdogan, L. E., Kanakagiri, V. A. R., Keutzer, K., and Dong, Z. Stochastic Communication Avoidance for Recommendation Systems. arXiv 2024, arXiv:2411.01611.
- Liao, F., Tang, Y., Du, Q., Wang, J., Li, M., and Zheng, J. Domain Progressive Low-dose CT Imaging using Iterative Partial Diffusion Model. IEEE Transactions on Medical Imaging 2024.
- Sekhavat, Y. (2024). Looking for creative basis of artificial intelligence art in the midst of order and chaos based on Nietzsche’s theories. Theoretical Principles of Visual Arts.
- Cai, H., Yang, Y., Tang, Y., Sun, Z., and Zhang, W. Shapley value-based class activation mapping for improved explainability in neural networks. The Visual Computer 2025, 1–19.
- Na, W. (2024). Rach-Space: Novel Ensemble Learning Method With Applications in Weakly Supervised Learning (Master’s thesis, Tufts University).
- Khajah, M. M. Supercharging BKT with Multidimensional Generalizable IRT and Skill Discovery. Journal of Educational Data Mining 2024, 16, 233–278.
- Zhang, Y., Duan, Z., Huang, Y., and Zhu, F. Theoretical Bound-Guided Hierarchical VAE for Neural Image Codecs. arXiv 2024, arXiv:2403.18535.
- Wang, L., and Huang, W. On the convergence analysis of over-parameterized variational autoencoders: A neural tangent kernel perspective. Machine Learning 2025, 114, 15.
- Li, C. N., Liang, H. P., Zhao, B. Q., Wei, S. H., and Zhang, X. Machine learning assisted crystal structure prediction made simple. Journal of Materials Informatics 2024, 4.
- Huang, Y. Research Advanced in Image Generation Based on Diffusion Probability Model. Highlights in Science, Engineering and Technology 2024, 85, 452–456.
- Chenebuah, E. T. (2024). Artificial Intelligence Simulation and Design of Energy Materials with Targeted Properties (Doctoral dissertation, Université d’Ottawa| University of Ottawa).
- Furth, N., Imel, A., and Zawodzinski, T. A. (2024, November). Graph Encoders for Redox Potentials and Solubility Predictions. In Electrochemical Society Meeting Abstracts prime2024 (No. 3, pp. 344–344). The Electrochemical Society, Inc.
- Gong, J., Deng, Z., Xie, H., Qiu, Z., Zhao, Z., and Tang, B. Z. Deciphering Design of Aggregation-Induced Emission Materials by Data Interpretation. Advanced Science 2025, 12, 2411345.
- Kim, H., Lee, C. H., and Hong, C. (2024, July). VATMAN: Video Anomaly Transformer for Monitoring Accidents and Nefariousness. In 2024 IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS) (pp. 1–7). IEEE.
- Albert, S. W., Doostan, A., and Schaub, H. Dimensionality Reduction for Onboard Modeling of Uncertain Atmospheres. Journal of Spacecraft and Rockets 2024, 1–13.
- Sharma, D. K., Hota, H. S., and Rababaah, A. R. (2024). Machine Learning for Real World Applications (Doctoral dissertation, Department of Computer Science and Engineering, Indian Institute of Technology Patna).
- Li, T., Shi, Z., Dale, S. G., Vignale, G., and Lin, M. Jrystal: A JAX-based Differentiable Density Functional Theory Framework for Materials.
- Bieberich, S., Li, P., Ngai, J., Patel, K., Vogt, R., Ranade, P., ... and Stafford, S. (2024). Conducting Quantum Machine Learning Through The Lens of Solving Neural Differential Equations On A Theoretical Fault Tolerant Quantum Computer: Calibration and Benchmarking.
- Dagréou, M., Ablin, P., Vaiter, S., and Moreau, T. (2024). How to compute Hessian-vector products? In The Third Blogpost Track at ICLR 2024.
- Lohoff, J., and Neftci, E. Optimizing Automatic Differentiation with Deep Reinforcement Learning. arXiv 2024, arXiv:2406.05027.
- Legrand, N., Weber, L., Waade, P. T., Daugaard, A. H. M., Khodadadi, M., Mikuš, N., and Mathys, C. pyhgf: A neural network library for predictive coding. arXiv 2024, arXiv:2410.09206.
- Alzás, P. B., and Radev, R. Differentiable nuclear deexcitation simulation for low energy neutrino physics. arXiv 2024, arXiv:2404.00180.
- Edenhofer, G., Frank, P., Roth, J., Leike, R. H., Guerdi, M., Scheel-Platz, L. I., ... and Enßlin, T. A. Re-envisioning numerical information field theory (NIFTy.re): A library for Gaussian processes and variational inference. arXiv 2024, arXiv:2402.16683.
- Chan, S., Kulkarni, P., Paul, H. Y., and Parekh, V. S. (2024, September). Expanding the Horizon: Enabling Hybrid Quantum Transfer Learning for Long-Tailed Chest X-Ray Classification. In 2024 IEEE International Conference on Quantum Computing and Engineering (QCE) (Vol. 1, pp. 572–582). IEEE.
- Ye, H., Hu, Z., Yin, R., Boyko, T. D., Liu, Y., Li, Y., ... and Li, Y. Electron transfer at birnessite/organic compound interfaces: Mechanism, regulation, and two-stage kinetic discrepancy in structural rearrangement and decomposition. Geochimica et Cosmochimica Acta 2025, 388, 253–267.
- Khan, M., Ludl, A. A., Bankier, S., Björkegren, J. L., and Michoel, T. Prediction of causal genes at GWAS loci with pleiotropic gene regulatory effects using sets of correlated instrumental variables. PLoS Genetics 2024, 20, e1011473.
- Ojala, K., and Zhou, C. (2024). Determination of outdoor object distances from monocular thermal images.
- Popordanoska, T., and Blaschko, M. (2024). Advancing Calibration in Deep Learning: Theory, Methods, and Applications.
- Alfieri, A., Cortes, J. M. P., Pastore, E., Castiglione, C., and Rey, G. M. Z. A Deep Q-Network Approach to Job Shop Scheduling with Transport Resources.
- Zanardelli, R. (2025). Statistical learning methods for decision-making, with applications in Industry 4.0.
- Norouzi, M., Hosseini, S. H., Khoshnevisan, M., and Moshiri, B. Applications of pre-trained CNN models and data fusion techniques in Unity3D for connected vehicles. Applied Intelligence 2025, 55, 390.
- Wang, R., Yang, T., Liang, C., Wang, M., and Ci, Y. Reliable Autonomous Driving Environment Perception: Uncertainty Quantification of Semantic Segmentation. Journal of Transportation Engineering, Part A: Systems 2025, 151, 04024117.
- Xia, Q., Chen, P., Xu, G., Sun, H., Li, L., and Yu, G. Adaptive Path-Tracking Controller Embedded With Reinforcement Learning and Preview Model for Autonomous Driving. IEEE Transactions on Vehicular Technology 2024, 74, 3736–3750.
- Liu, Q., Tang, Y., Li, X., Yang, F., Wang, K., and Li, Z. MV-STGHAT: Multi-View Spatial-Temporal Graph Hybrid Attention Network for Decision-Making of Connected and Autonomous Vehicles. IEEE Transactions on Vehicular Technology 2024.
- Chakraborty, D., and Deka, B. Deep Learning-based Selective Feature Fusion for Litchi Fruit Detection using Multimodal UAV Sensor Measurements. IEEE Transactions on Artificial Intelligence 2025.
- Mirindi, D., Khang, A., and Mirindi, F. Artificial Intelligence (AI) and Automation for Driving Green Transportation Systems: A Comprehensive Review. Driving Green Transportation System Through Artificial Intelligence and Automation: Approaches, Technologies and Applications 2025, pp. 1–19.
- Choudhury, B., Rajakumar, K., Badhale, A. A., Roy, A., Sahoo, R., and Margret, I. N. (2024, June). Comparative Analysis of Advanced Models for Satellite-Based Aircraft Identification. In 2024 International Conference on Smart Systems for Electrical, Electronics, Communication and Computer Engineering (ICSSEECC) (pp. 483–488). IEEE.
- Almubarok, W., Rosiani, U. D., and Asmara, R. A. (2024, November). MobileNetV2 Pruning for Improved Efficiency in Catfish Classification on Resource-Limited Devices. In 2024 IEEE 10th Information Technology International Seminar (ITIS) (pp. 271–277). IEEE.
- Ding, Q. (2024, February). Classification Techniques of Tongue Manifestation Based on Deep Learning. In 2024 IEEE 3rd International Conference on Electrical Engineering, Big Data and Algorithms (EEBDA) (pp. 802–810). IEEE.
- He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 770–778).
- Krizhevsky, A., Sutskever, I., and Hinton, G. E. ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems 2012, 25.
- Sultana, F., Sufian, A., and Dutta, P. (2018, November). Advancements in image classification using convolutional neural network. In 2018 Fourth International Conference on Research in Computational Intelligence and Communication Networks (ICRCICN) (pp. 122–129). IEEE.
- Sattler, T., Zhou, Q., Pollefeys, M., and Leal-Taixé, L. (2019). Understanding the limitations of CNN-based absolute camera pose regression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 3302–3312).
- Vaswani, A. (2017). Attention is all you need. Advances in Neural Information Processing Systems.
- Nannepagu, M., Babu, D. B., and Madhuri, C. B. Leveraging Hybrid AI Models: DQN, Prophet, BERT, ART-NN, and Transformer-Based Approaches for Advanced Stock Market Forecasting.
- De Rose, L., Andresini, G., Appice, A., and Malerba, D. VINCENT: Cyber-threat detection through vision transformers and knowledge distillation. Computers and Security 2024, 103926.
- Buehler, M. J. (2025). Graph-Aware Isomorphic Attention for Adaptive Dynamics in Transformers. arXiv:2501.02393.
- Tabibpour, S. A., and Madanizadeh, S. A. (2024). Solving High-Dimensional Dynamic Programming Using Set Transformer. Available at SSRN 5040295.
- Li, S., and Dong, P. (2024, October). Mixed Attention Transformer Enhanced Channel Estimation for Extremely Large-Scale MIMO Systems. In 2024 16th International Conference on Wireless Communications and Signal Processing (WCSP) (pp. 394–399). IEEE.
- Asefa, S. H., and Assabie, Y. Transformer-Based Amharic-to-English Machine Translation with Character Embedding and Combined Regularization Techniques. IEEE Access 2024, 13, 1090–1105.
- Liao, M., and Chen, M. (2024, November). A new deepfake detection method by vision transformers. In International Conference on Algorithms, High Performance Computing, and Artificial Intelligence (AHPCAI 2024) (Vol. 13403, pp. 953–957). SPIE.
- Jiang, L., Cui, J., Xu, Y., Deng, X., Wu, X., Zhou, J., and Wang, Y. (2024, August). SCFormer: Spatial and Channel-wise Transformer with Contrastive Learning for High-Quality PET Image Reconstruction. In 2024 IEEE International Conference on Cybernetics and Intelligent Systems (CIS) and IEEE International Conference on Robotics, Automation and Mechatronics (RAM) (pp. 26–31). IEEE.
- Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., ... and Bengio, Y. Generative adversarial nets. Advances in Neural Information Processing Systems 2014, 27.
- Chappidi, J., and Sundaram, D. M. Dual Q-learning with graph neural networks: A novel approach to animal detection in challenging ecosystems. Journal of Theoretical and Applied Information Technology 2024, 102.
- Joni, R. (2024). Delving into Deep Learning: Illuminating Techniques and Visual Clarity for Image Analysis (No. 12808). EasyChair.
- Kalaiarasi, G., Sudharani, B., Jonnalagadda, S. C., Battula, H. V., and Sanagala, B. (2024, July). A Comprehensive Survey of Image Steganography. In 2024 2nd International Conference on Sustainable Computing and Smart Systems (ICSCSS) (pp. 1225–1230). IEEE.
- Arjmandi-Tash, A. M., Mansourian, A., Rahsepar, F. R., and Abdi, Y. Predicting Photodetector Responsivity through Machine Learning. Advanced Theory and Simulations 2024, 2301219.
- Gao, Y. (2024). Neural networks meet applied mathematics: GANs, PINNs, and transformers. HKU Theses Online (HKUTO).
- Hisama, K., Ishikawa, A., Aspera, S. M., and Koyama, M. Theoretical Catalyst Screening of Multielement Alloy Catalysts for Ammonia Synthesis Using Machine Learning Potential and Generative Artificial Intelligence. The Journal of Physical Chemistry C 2024, 128, 18750–18758.
- Wang, M., and Zhang, Y. Image Segmentation in Complex Backgrounds using an Improved Generative Adversarial Network. International Journal of Advanced Computer Science and Applications 2024, 15.
- Noguer i Alonso, M., and Arias, F. (2025, January 5). The Mathematics of Q-Learning and the Hamilton-Jacobi-Bellman Equation.
- Lu, C., Shi, L., Chen, Z., Wu, C., and Wierman, A. Overcoming the Curse of Dimensionality in Reinforcement Learning Through Approximate Factorization. arXiv 2024, arXiv:2411.07591.
- Humayoo, M. Time-Scale Separation in Q-Learning: Extending TD(Δ) for Action-Value Function Decomposition. arXiv 2024, arXiv:2411.14019.
- Jia, L., Qi, N., Su, Z., Chu, F., Fang, S., Wong, K. K., and Chae, C. B. Game theory and reinforcement learning for anti-jamming defense in wireless communications: Current research, challenges, and solutions. IEEE Communications Surveys and Tutorials 2024.
- Chai, J., Chen, E., and Fan, J. Deep Transfer Q-Learning for Offline Non-Stationary Reinforcement Learning. arXiv 2025, arXiv:2501.04870.
- Yao, J., and Gong, X. (2024, October). Communication-Efficient and Resilient Distributed Deep Reinforcement Learning for Multi-Agent Systems. In 2024 IEEE International Conference on Unmanned Systems (ICUS) (pp. 1521–1526). IEEE.
- Liu, Y., Yang, T., Tian, L., and Pei, J. SGD-TripleQNet: An Integrated Deep Reinforcement Learning Model for Vehicle Lane-Change Decision. Mathematics 2025, 13, 235.
- Masood, F., Ahmad, J., Al Mazroa, A., Alasbali, N., Alazeb, A., and Alshehri, M. S. Multi IRS-Aided Low-Carbon Power Management for Green Communication in 6G Smart Agriculture Using Deep Game Theory. Computational Intelligence 2025, 41, e70022.
- Patrick, B. Reinforcement Learning for Dynamic Economic Models.
- El Mimouni, I., and Avrachenkov, K. (2025, January). Deep Q-Learning with Whittle Index for Contextual Restless Bandits: Application to Email Recommender Systems. In Northern Lights Deep Learning Conference 2025.
- Shefin, R. S., Rahman, M. A., Le, T., and Alqahtani, S. xSRL: Safety-Aware Explainable Reinforcement Learning–Safety as a Product of Explainability. arXiv 2024, arXiv:2412.19311.
- Khlifi, A., Othmani, M., and Kherallah, M. (2025). A Novel Approach to Autonomous Driving Using DDQN-Based Deep Reinforcement Learning.
- Kuczkowski, D. (2024). Energy efficient multi-objective reinforcement learning algorithm for traffic simulation.
- Krauss, R., Zielasko, J., and Drechsler, R. Large-Scale Evolutionary Optimization of Artificial Neural Networks Using Adaptive Mutations.
- Ahamed, M. S., Pey, J. J. J., Samarakoon, S. B. P., Muthugala, M. V. J., and Elara, M. R. Reinforcement Learning for Reconfigurable Robotic Soccer. IEEE Access 2025.
- Elmquist, A., Serban, R., and Negrut, D. A methodology to quantify simulation-vs-reality differences in images for autonomous robots. IEEE Sensors Journal 2024, 25, 6522–6533.
- Kobanda, A., Portelas, R., Maillard, O. A., and Denoyer, L. Hierarchical Subspaces of Policies for Continual Offline Reinforcement Learning. arXiv 2024, arXiv:2412.14865.
- Xu, J., Xie, G., Zhang, Z., Hou, X., Zhang, S., Ren, Y., and Niyato, D. UPEGSim: An RL-Enabled Simulator for Unmanned Underwater Vehicles Dedicated in the Underwater Pursuit-Evasion Game. IEEE Internet of Things Journal 2025, 12, 2334–2346.
- Patadiya, K., Jain, R., Moteriya, J., Palaniappan, D., Kumar, P., and Premavathi, T. (2024, December). Application of Deep Learning to Generate Auto Player Mode in Car Based Game. In 2024 IEEE 16th International Conference on Computational Intelligence and Communication Networks (CICN) (pp. 233–237). IEEE.
- Janjua, J. I., Kousar, S., Khan, A., Ihsan, A., Abbas, T., and Saeed, A. Q. (2024, December). Enhancing Scalability in Reinforcement Learning for Open Spaces. In 2024 International Conference on Decision Aid Sciences and Applications (DASA) (pp. 1–8). IEEE.
- Yang, L., Li, Y., Wang, J., and Sherratt, R. S. Sentiment analysis for E-commerce product reviews in Chinese based on sentiment lexicon and deep learning. IEEE Access 2020, 8, 23522–23530.
- Manikandan, C., Kumar, P. S., Nikitha, N., Sanjana, P. G., and Dileep, Y. Filtering Emails Using Natural Language Processing.
- Isiaka, S. O., Babatunde, R. S., and Isiaka, R. M. Exploring Artificial Intelligence (AI) Technologies in Predictive Medicine: A Systematic Review.
- Petrov, A., Zhao, D., Smith, J., Volkov, S., Wang, J., and Ivanov, D. Deep Learning Approaches for Emotional State Classification in Textual Data.
- Liang, M. Leveraging natural language processing for automated assessment and feedback production in virtual education settings. Journal of Computational Methods in Sciences and Engineering 2025, 14727978251314556.
- Jin, L. Research on Optimization Strategies of Artificial Intelligence Algorithms for the Integration and Dissemination of Pharmaceutical Science Popularization Knowledge. Scientific Journal of Technology 2025, 7, 45–55.
- McNicholas, B. A., Madden, M. G., and Laffey, J. G. Natural language processing in critical care: Opportunities, challenges, and future directions. Intensive Care Medicine 2025, 1–5.
- Abd Al Abbas, M., and Khammas, B. M. Efficient IoT Malware Detection Technique Using Recurrent Neural Network. Iraqi Journal of Information and Communication Technology 2024, 7, 29–42.
- Kalonia, S., and Upadhyay, A. (2025). Deep learning-based approach to predict software faults. In Artificial Intelligence and Machine Learning Applications for Sustainable Development (pp. 326–348). CRC Press.
- Han, S. C., Weld, H., Li, Y., Lee, J., and Poon, J. Natural Language Understanding in Conversational AI with Deep Learning.
- Potter, K., and Egon, A. Recurrent Neural Networks (RNNs) for Time Series Forecasting.
- Yatkin, M. A., Kõrgesaar, M., and Işlak, Ü. A Topological Approach to Enhancing Consistency in Machine Learning via Recurrent Neural Networks. Applied Sciences 2025, 15, 933.
- Saifullah, S. (2024). Comparative Analysis of LSTM and GRU Models for Chicken Egg Fertility Classification using Deep Learning.
- Noguer i Alonso, M. (2024). The Mathematics of Recurrent Neural Networks. Available at SSRN: https://ssrn.com/abstract=5001243 or http://dx.doi.org/10.2139/ssrn.5001243.
- Tu, Z., Jeffries, S. D., Morse, J., and Hemmerling, T. M. Comparison of time-series models for predicting physiological metrics under sedation. Journal of Clinical Monitoring and Computing 2024, 1–11.
- Zuo, Y., Jiang, J., and Yada, K. Application of hybrid gate recurrent unit for in-store trajectory prediction based on indoor location system. Scientific Reports 2025, 15, 1055.
- Lima, R., Scardua, L. A., and De Almeida, G. M. (2024). Predicting Temperatures Inside a Steel Slab Reheating Furnace Using Neural Networks. Authorea Preprints.
- Khan, S., Muhammad, Y., Jadoon, I., Awan, S. E., and Raja, M. A. Z. Leveraging LSTM-SMI and ARIMA architecture for robust wind power plant forecasting. Applied Soft Computing 2025, 170, 112765.
- Guo, Z., and Feng, L. Multi-step prediction of greenhouse temperature and humidity based on temporal position attention LSTM. Stochastic Environmental Research and Risk Assessment 2024, 1–28.
- Abdelhamid, N. M., Khechekhouche, A., Mostefa, K., Brahim, L., and Talal, G. Deep-RNN based model for short-time forecasting photovoltaic power generation using IoT. Studies in Engineering and Exact Sciences 2024, 5, e11461.
- Rohman, F. N., and Farikhin, B. S. Hyperparameter Tuning of Random Forest Algorithm for Diabetes Classification.
- Rahman, M. Utilizing Machine Learning Techniques for Early Brain Tumor Detection.
- Nandi, A., Singh, H., Majumdar, A., Shaw, A., and Maiti, A. Optimizing Baby Sound Recognition using Deep Learning through Class Balancing and Model Tuning.
- Sianga, B. E., Mbago, M. C., and Msengwa, A. S. Predicting the prevalence of cardiovascular diseases using machine learning algorithms. Intelligence-Based Medicine 2025, 100199.
- Li, L. , Hu, Y., Yang, Z., Luo, Z., Wang, J., Wang, W., ... and Zhang, Z. Exploring the assessment of post-cardiac valve surgery pulmonary complication risks through the integration of wearable continuous physiological and clinical data. BMC Medical Informatics and Decision Making 2025, 25, 1–11. [Google Scholar] [CrossRef]
- Lázaro, F. L., Madeira, T., Melicio, R., Valério, D., and Santos, L. F. Identifying Human Factors in Aviation Accidents with Natural Language Processing and Machine Learning Models. Aerospace 2025, 12, 106.
- Li, Z., Zhong, J., Wang, H., Xu, J., Li, Y., You, J., ... and Dev, S. RAINER: A Robust Ensemble Learning Grid Search-Tuned Framework for Rainfall Patterns Prediction. arXiv 2025, arXiv:2501.16900.
- Khurshid, M. R., Manzoor, S., Sadiq, T., Hussain, L., Khan, M. S., and Dutta, A. K. Unveiling diabetes onset: Optimized XGBoost with Bayesian optimization for enhanced prediction. PLoS ONE 2025, 20, e0310218.
- Kanwar, M., Pokharel, B., and Lim, S. A new random forest method for landslide susceptibility mapping using hyperparameter optimization and grid search techniques. International Journal of Environmental Science and Technology 2025, 1–16.
- Fadil, M., Akrom, M., and Herowati, W. Utilization of Machine Learning for Predicting Corrosion Inhibition by Quinoxaline Compounds. Journal of Applied Informatics and Computing 2025, 9, 173–177.
- Emmanuel, J., Isewon, I., and Oyelade, J. An Optimized Deep-Forest Algorithm Using a Modified Differential Evolution Optimization Algorithm: A Case of Host-Pathogen Protein-Protein Interaction Prediction. Computational and Structural Biotechnology Journal 2025, 27, 595–611.
- Gaurav, A., Gupta, B. B., Attar, R. W., Alhomoud, A., Arya, V., and Chui, K. T. Driver identification in advanced transportation systems using osprey and salp swarm optimized random forest model. Scientific Reports 2025, 15, 2453.
- Ning, C., Ouyang, H., Xiao, J., Wu, D., Sun, Z., Liu, B., ... and Huang, G. Development and validation of an explainable machine learning model for mortality prediction among patients with infected pancreatic necrosis. eClinicalMedicine 2025, 80, 103074.
- Muñoz, V., Ballester, C., Copaci, D., Moreno, L., and Blanco, D. Accelerating hyperparameter optimization with a secretary. Neurocomputing 2025, 129455.
- Balcan, M. F., Nguyen, A. T., and Sharma, D. Sample complexity of data-driven tuning of model hyperparameters in neural networks with structured parameter-dependent dual function. arXiv, arXiv:2501.13734.
- Azimi, H., Kalhor, E. G., Nabavi, S. R., Behbahani, M., and Vardini, M. T. Data-based modeling for prediction of supercapacitor capacity: Integrated machine learning and metaheuristic algorithms. Journal of the Taiwan Institute of Chemical Engineers 2025, 170, 105996.
- Shibina, V., and Thasleema, T. M. Voice feature-based diagnosis of Parkinson’s disease using nature inspired squirrel search algorithm with ensemble learning classifiers. Iran Journal of Computer Science 2025, 1–25.
- Chang, F., Dong, S., Yin, H., Ye, X., Wu, Z., Zhang, W., and Zhu, H. 3D displacement time series prediction of a north-facing reservoir landslide powered by InSAR and machine learning. Journal of Rock Mechanics and Geotechnical Engineering 2025.
- Cihan, P. Bayesian Hyperparameter Optimization of Machine Learning Models for Predicting Biomass Gasification Gases. Applied Sciences 2025, 15, 1018.
- Makomere, R., Rutto, H., Alugongo, A., Koech, L., Suter, E., and Kohitlhetse, I. Enhanced dry SO2 capture estimation using Python-driven computational frameworks with hyperparameter tuning and data augmentation. Unconventional Resources 2025, 100145.
- Bakır, H. A new method for tuning the CNN pre-trained models as a feature extractor for malware detection. Pattern Analysis and Applications 2025, 28, 26.
- Liu, Y., Yin, H., and Li, Q. (2025). Sound absorption performance prediction of multi-dimensional Helmholtz resonators based on deep learning and hyperparameter optimization. Physica Scripta.
- Ma, Z., Zhao, M., Dai, X., and Chen, Y. Anomaly detection for high-speed machining using hybrid regularized support vector data description. Robotics and Computer-Integrated Manufacturing 2025, 94, 102962.
- El-Bouzaidi, Y. E. I., Hibbi, F. Z., and Abdoun, O. (2025). Optimizing Convolutional Neural Network Impact of Hyperparameter Tuning and Transfer Learning. In Innovations in Optimization and Machine Learning (pp. 301–326). IGI Global Scientific Publishing.
- Mustapha, B., Zhou, Y., Shan, C., and Xiao, Z. Enhanced Pneumonia Detection in Chest X-Rays Using Hybrid Convolutional and Vision Transformer Networks. Current Medical Imaging 2025, e15734056326685.
- Adly, S., and Attouch, H. Complexity Analysis Based on Tuning the Viscosity Parameter of the Su-Boyd-Candès Inertial Gradient Dynamics. Set-Valued and Variational Analysis 2024, 32, 17.
- Wang, Z., and Peypouquet, J. G. Nesterov’s Accelerated Gradient Method for Strongly Convex Functions: From Inertial Dynamics to Iterative Algorithms.
- Hermant, J., Renaud, M., Aujol, J. F., and Rondepierre, C. D. A. Nesterov momentum for convex functions with interpolation: Is it faster than Stochastic gradient descent? Book of abstracts PGMO DAYS 2024 2024, 68.
- Alavala, S., and Gorthi, S. (2024). 3D CBCT Challenge 2024: Improved Cone Beam CT Reconstruction using SwinIR-Based Sinogram and Image Enhancement. arXiv:2406.08048.
- Li, C. J. (2024). Unified Momentum Dynamics in Stochastic Gradient Optimization. Available at SSRN 4981009.
- Gupta, K., and Wojtowytsch, S. Nesterov acceleration in benignly non-convex landscapes. arXiv, arXiv:2410.08395.
- Razzouki, O. F., Charroud, A., El Allali, Z., Chetouani, A., and Aslimani, N. (2024, December). A Survey of Advanced Gradient Methods in Machine Learning. In 2024 7th International Conference on Advanced Communication Technologies and Networking (CommNet) (pp. 1–7). IEEE.
- Wang, J., Du, B., Su, Z., Hu, K., Yu, J., Cao, C., ... and Guo, H. A fast LMS-based digital background calibration technique for 16-bit SAR ADC with modified shuffling scheme. Microelectronics Journal 2025, 156, 106547.
- Naeem, K., Bukhari, A., Daud, A., Alsahfi, T., Alshemaimri, B., and Alhajlah, M. Machine Learning and Deep Learning Optimization Algorithms for Unconstrained Convex Optimization Problem. IEEE Access 2024.
- Campos, C. M., de Diego, D. M., and Torrente, J. Momentum-based gradient descent methods for Lie groups. arXiv, arXiv:2404.09363.
- Li, J., Chen, H., Othman, M. S., Salim, N., Yusuf, L. M., and Kumaran, S. R. NFIoT-GATE-DTL IDS: Genetic algorithm-tuned ensemble of deep transfer learning for NetFlow-based intrusion detection system for internet of things. Engineering Applications of Artificial Intelligence 2025, 143, 110046.
- Gül, M. F., and Bakır, H. GA-ML: Enhancing the prediction of water electrical conductivity through genetic algorithm-based end-to-end hyperparameter tuning. Earth Science Informatics 2025, 18, 191.
- Sen, A., Sen, U., Paul, M., Padhy, A. P., Sai, S., Mallik, A., and Mallick, C. QGAPHEnsemble: Combining Hybrid QLSTM Network Ensemble via Adaptive Weighting for Short Term Weather Forecasting. arXiv, arXiv:2501.10866.
- Roy, A., Sen, A., Gupta, S., Haldar, S., Deb, S., Vankala, T. N., and Das, A. DeepEyeNet: Adaptive Genetic Bayesian Algorithm Based Hybrid ConvNeXtTiny Framework For Multi-Feature Glaucoma Eye Diagnosis. arXiv, arXiv:2501.11168.
- Jiang, T., Lu, W., Lu, L., Xu, L., Xi, W., Liu, J., and Zhu, Y. Inlet Passage Hydraulic Performance Optimization of Coastal Drainage Pump System Based on Machine Learning Algorithms. Journal of Marine Science and Engineering 2025, 13, 274.
- Borah, J., and Chandrasekaran, M. Application of Machine Learning-Based Approach to Predict and Optimize Mechanical Properties of Additively Manufactured Polyether Ether Ketone Biopolymer Using Fused Deposition Modeling. Journal of Materials Engineering and Performance 2025, 1–17.
- Tan, Q., He, D., Sun, Z., Yao, Z., Zhou, J. X., and Chen, T. (2025). A deep reinforcement learning based metro train operation control optimization considering energy conservation and passenger comfort. Engineering Research Express.
- García-Galindo, A., López-De-Castro, M., and Armañanzas, R. Fair prediction sets through multi-objective hyperparameter optimization. Machine Learning 2025, 114, 27.
- Montufar, G. F., Pascanu, R., Cho, K., and Bengio, Y. On the number of linear regions of deep neural networks. Advances in Neural Information Processing Systems 2014, 27.
- Schmidt-Hieber, J. (2020). Nonparametric regression using deep neural networks with ReLU activation function.
- Yarotsky, D. Error bounds for approximations with deep ReLU networks. Neural Networks 2017, 94, 103–114.
- Telgarsky, M. (2016, June). Benefits of depth in neural networks. In Conference on Learning Theory (pp. 1517–1539). PMLR.
- Lu, Z., Pu, H., Wang, F., Hu, Z., and Wang, L. The expressive power of neural networks: A view from the width. Advances in Neural Information Processing Systems 2017, 30.
- Zhang, C., Bengio, S., Hardt, M., Recht, B., and Vinyals, O. Understanding deep learning (still) requires rethinking generalization. Communications of the ACM 2021, 64, 107–115.
- Scarselli, F., Gori, M., Tsoi, A. C., Hagenbuchner, M., and Monfardini, G. The graph neural network model. IEEE Transactions on Neural Networks 2008, 20, 61–80.
- Kipf, T. N., and Welling, M. Semi-supervised classification with graph convolutional networks. arXiv, arXiv:1609.02907.
- Hamilton, W., Ying, Z., and Leskovec, J. Inductive representation learning on large graphs. Advances in Neural Information Processing Systems 2017, 30.
- Veličković, P., Cucurull, G., Casanova, A., Romero, A., Lio, P., and Bengio, Y. Graph attention networks. arXiv 2017, arXiv:1710.10903.
- Xu, K., Hu, W., Leskovec, J., and Jegelka, S. How powerful are graph neural networks? arXiv 2018, arXiv:1810.00826.
- Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O., and Dahl, G. E. (2017, July). Neural message passing for quantum chemistry. In International Conference on Machine Learning (pp. 1263–1272). PMLR.
- Battaglia, P. W., Hamrick, J. B., Bapst, V., Sanchez-Gonzalez, A., Zambaldi, V., Malinowski, M., ... and Pascanu, R. Relational inductive biases, deep learning, and graph networks. arXiv, arXiv:1806.01261.
- Bruna, J., Zaremba, W., Szlam, A., and LeCun, Y. Spectral networks and locally connected networks on graphs. arXiv 2013, arXiv:1312.6203.
- Ying, R., He, R., Chen, K., Eksombatchai, P., Hamilton, W. L., and Leskovec, J. (2018, July). Graph convolutional neural networks for web-scale recommender systems. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 974–983).
- Zhou, J., Cui, G., Hu, S., Zhang, Z., Yang, C., Liu, Z., ... and Sun, M. Graph neural networks: A review of methods and applications. AI Open 2020, 1, 57–81.
- Raissi, M., Perdikaris, P., and Karniadakis, G. E. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics 2019, 378, 686–707.
- Karniadakis, G. E., Kevrekidis, I. G., Lu, L., Perdikaris, P., Wang, S., and Yang, L. Physics-informed machine learning. Nature Reviews Physics 2021, 3, 422–440.
- Lu, L., Meng, X., Mao, Z., and Karniadakis, G. E. DeepXDE: A deep learning library for solving differential equations. SIAM Review 2021, 63, 208–228.
- Sirignano, J., and Spiliopoulos, K. DGM: A deep learning algorithm for solving partial differential equations. Journal of Computational Physics 2018, 375, 1339–1364.
- Wang, S., Teng, Y., and Perdikaris, P. Understanding and mitigating gradient flow pathologies in physics-informed neural networks. SIAM Journal on Scientific Computing 2021, 43, A3055–A3081.
- Mishra, S., and Molinaro, R. Estimates on the generalization error of physics-informed neural networks for approximating PDEs. IMA Journal of Numerical Analysis 2023, 43, 1–43.
- Zhang, D., Guo, L., and Karniadakis, G. E. Learning in modal space: Solving time-dependent stochastic PDEs using physics-informed neural networks. SIAM Journal on Scientific Computing 2020, 42, A639–A665.
- Jin, X., Cai, S., Li, H., and Karniadakis, G. E. NSFnets (Navier-Stokes flow nets): Physics-informed neural networks for the incompressible Navier-Stokes equations. Journal of Computational Physics 2021, 426, 109951.
- Chen, Y., Lu, L., Karniadakis, G. E., and Dal Negro, L. Physics-informed neural networks for inverse problems in nano-optics and metamaterials. Optics Express 2020, 28, 11618–11633.
- Psichogios, D. C., and Ungar, L. H. A hybrid neural network-first principles approach to process modeling. AIChE Journal 1992, 38, 1499–1511.
- Chizat, L., and Bach, F. On the global convergence of gradient descent for over-parameterized models using optimal transport. Advances in Neural Information Processing Systems 2018, 31.
- Du, S., Lee, J., Li, H., Wang, L., and Zhai, X. (2019, May). Gradient descent finds global minima of deep neural networks. In International Conference on Machine Learning (pp. 1675–1685). PMLR.
- Arora, S., Du, S., Hu, W., Li, Z., and Wang, R. (2019, May). Fine-grained analysis of optimization and generalization for overparameterized two-layer neural networks. In International Conference on Machine Learning (pp. 322–332). PMLR.
- Allen-Zhu, Z., Li, Y., and Song, Z. (2019, May). A convergence theory for deep learning via over-parameterization. In International Conference on Machine Learning (pp. 242–252). PMLR.
- Cao, Y., and Gu, Q. Generalization bounds of stochastic gradient descent for wide and deep neural networks. Advances in Neural Information Processing Systems 2019, 32.
- Yang, G. Scaling limits of wide neural networks with weight sharing: Gaussian process behavior, gradient independence, and neural tangent kernel derivation. arXiv 2019, arXiv:1902.04760.
- Huang, J., and Yau, H. T. (2020, November). Dynamics of deep neural networks and neural tangent hierarchy. In International Conference on Machine Learning (pp. 4542–4551). PMLR.
- Belkin, M., Ma, S., and Mandal, S. (2018, July). To understand deep learning we need to understand kernel learning. In International Conference on Machine Learning (pp. 541–549). PMLR.
- Sra, S., Nowozin, S., and Wright, S. J. (Eds.). (2011). Optimization for machine learning. MIT Press.
- Choromanska, A., Henaff, M., Mathieu, M., Arous, G. B., and LeCun, Y. (2015, February). The loss surfaces of multilayer networks. In Artificial Intelligence and Statistics (pp. 192–204). PMLR.
- Arora, S., Cohen, N., and Hazan, E. (2018, July). On the optimization of deep networks: Implicit acceleration by overparameterization. In International Conference on Machine Learning (pp. 244–253). PMLR.
- Baratin, A., George, T., Laurent, C., Hjelm, R. D., Lajoie, G., Vincent, P., and Lacoste-Julien, S. Implicit regularization in deep learning: A view from function space. arXiv, arXiv:2008.00938.
- Balduzzi, D., Racaniere, S., Martens, J., Foerster, J., Tuyls, K., and Graepel, T. (2018, July). The mechanics of n-player differentiable games. In International Conference on Machine Learning (pp. 354–363). PMLR.
- E, W., Han, J., and Jentzen, A. Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations. Communications in Mathematics and Statistics 2017, 5, 349–380.
- Beck, C., Becker, S., Grohs, P., Jaafari, N., and Jentzen, A. Solving the Kolmogorov PDE by means of deep learning. Journal of Scientific Computing 2021, 88, 1–28.
- Han, J., Jentzen, A., and E, W. Solving high-dimensional partial differential equations using deep learning. Proceedings of the National Academy of Sciences 2018, 115, 8505–8510.
- Jentzen, A., Salimova, D., and Welti, T. A proof that deep artificial neural networks overcome the curse of dimensionality in the numerical approximation of Kolmogorov partial differential equations with constant diffusion and nonlinear drift coefficients. arXiv 2018, arXiv:1809.07321.
- E, W., and Yu, B. The deep Ritz method: A deep learning-based numerical algorithm for solving variational problems. Communications in Mathematics and Statistics 2018, 6, 1–12.
- Khoo, Y., Lu, J., and Ying, L. Solving parametric PDE problems with artificial neural networks. European Journal of Applied Mathematics 2021, 32, 421–435.
- Hutzenthaler, M., and Kruse, T. Multilevel Picard approximations of high-dimensional semilinear parabolic differential equations with gradient-dependent nonlinearities. SIAM Journal on Numerical Analysis 2020, 58, 929–961.
- Li, L., Jamieson, K., DeSalvo, G., Rostamizadeh, A., and Talwalkar, A. Hyperband: A novel bandit-based approach to hyperparameter optimization. Journal of Machine Learning Research 2018, 18, 1–52.
- Falkner, S., Klein, A., and Hutter, F. (2018, July). BOHB: Robust and efficient hyperparameter optimization at scale. In International Conference on Machine Learning (pp. 1437–1446). PMLR.
- Li, L., Jamieson, K., Rostamizadeh, A., Gonina, E., Ben-Tzur, J., Hardt, M., ... and Talwalkar, A. A system for massively parallel hyperparameter tuning. Proceedings of Machine Learning and Systems 2020, 2, 230–246.
- Snoek, J., Larochelle, H., and Adams, R. P. Practical Bayesian optimization of machine learning algorithms. Advances in Neural Information Processing Systems 2012, 25.
- Slivkins, A., Zhou, X., Sankararaman, K. A., and Foster, D. J. Contextual Bandits with Packing and Covering Constraints: A Modular Lagrangian Approach via Regression. Journal of Machine Learning Research 2024, 25, 1–37.
- Hazan, E., Klivans, A., and Yuan, Y. Hyperparameter optimization: A spectral approach. arXiv, arXiv:1706.00764.
- Domhan, T., Springenberg, J. T., and Hutter, F. (2015, June). Speeding up automatic hyperparameter optimization of deep neural networks by extrapolation of learning curves. In Twenty-Fourth International Joint Conference on Artificial Intelligence.
- Agrawal, T. (2021). Hyperparameter optimization in machine learning: Make your machine learning and deep learning models more efficient (pp. 109–129). New York, NY, USA: Apress.
- Shekhar, S., Bansode, A., and Salim, A. (2021, December). A comparative study of hyper-parameter optimization tools. In 2021 IEEE Asia-Pacific Conference on Computer Science and Data Engineering (CSDE) (pp. 1–6). IEEE.
- Bergstra, J., Bardenet, R., Bengio, Y., and Kégl, B. Algorithms for hyper-parameter optimization. Advances in Neural Information Processing Systems 2011, 24.
- Zoph, B., and Le, Q. V. Neural architecture search with reinforcement learning. arXiv 2016, arXiv:1611.01578.
- Maclaurin, D., Duvenaud, D., and Adams, R. (2015, June). Gradient-based hyperparameter optimization through reversible learning. In International Conference on Machine Learning (pp. 2113–2122). PMLR.
- Pedregosa, F. (2016, June). Hyperparameter optimization with approximate gradient. In International Conference on Machine Learning (pp. 737–746). PMLR.
- Franceschi, L., Frasconi, P., Salzo, S., Grazzi, R., and Pontil, M. (2018, July). Bilevel programming for hyperparameter optimization and meta-learning. In International Conference on Machine Learning (pp. 1568–1577). PMLR.
- Franceschi, L., Donini, M., Frasconi, P., and Pontil, M. (2017, July). Forward and reverse gradient-based hyperparameter optimization. In International Conference on Machine Learning (pp. 1165–1173). PMLR.
- Liu, H., Simonyan, K., and Yang, Y. DARTS: Differentiable architecture search. arXiv 2018, arXiv:1806.09055.
- Lorraine, J., Vicol, P., and Duvenaud, D. (2020, June). Optimizing millions of hyperparameters by implicit differentiation. In International Conference on Artificial Intelligence and Statistics (pp. 1540–1552). PMLR.
- Liang, J., Gonzalez, S., Shahrzad, H., and Miikkulainen, R. (2021, June). Regularized evolutionary population-based training. In Proceedings of the Genetic and Evolutionary Computation Conference (pp. 323–331).
- Jaderberg, M., Dalibard, V., Osindero, S., Czarnecki, W. M., Donahue, J., Razavi, A., ... and Kavukcuoglu, K. Population based training of neural networks. arXiv 2017, arXiv:1711.09846.
- Co-Reyes, J. D., Miao, Y., Peng, D., Real, E., Levine, S., Le, Q. V., ... and Faust, A. Evolving reinforcement learning algorithms. arXiv 2021, arXiv:2101.03958.
- Song, C., Ma, Y., Xu, Y., and Chen, H. Multi-population evolutionary neural architecture search with stacked generalization. Neurocomputing 2024, 587, 127664.
- Wan, X., Lu, C., Parker-Holder, J., Ball, P. J., Nguyen, V., Ru, B., and Osborne, M. (2022, September). Bayesian generational population-based training. In International Conference on Automated Machine Learning (pp. 14–1). PMLR.
- García-Valdez, M., Mancilla, A., Castillo, O., and Merelo-Guervós, J. J. Distributed and asynchronous population-based optimization applied to the optimal design of fuzzy controllers. Symmetry 2023, 15, 467.
- Akiba, T., Sano, S., Yanase, T., Ohta, T., and Koyama, M. (2019, July). Optuna: A next-generation hyperparameter optimization framework. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 2623–2631).
- Akiba, T., Shing, M., Tang, Y., Sun, Q., and Ha, D. Evolutionary optimization of model merging recipes. Nature Machine Intelligence 2025, 1–10.
- Kadhim, Z. S., Abdullah, H. S., and Ghathwan, K. I. Artificial Neural Network Hyperparameters Optimization: A Survey. International Journal of Online and Biomedical Engineering 2022, 18.
- Jeba, J. A. (2021). Case study of Hyperparameter optimization framework Optuna on a Multi-column Convolutional Neural Network (Doctoral dissertation, University of Saskatchewan).
- Yang, L., and Shami, A. On hyperparameter optimization of machine learning algorithms: Theory and practice. Neurocomputing 2020, 415, 295–316.
- Wang, T. (2024). Multi-objective hyperparameter optimisation for edge machine learning.
- Frazier, P. I. A tutorial on Bayesian optimization. arXiv 2018, arXiv:1807.02811.
- Hutter, F., Kotthoff, L., and Vanschoren, J. (2019). Automated machine learning: Methods, systems, challenges (p. 219). Springer Nature.
- Jamieson, K., and Talwalkar, A. (2016, May). Non-stochastic best arm identification and hyperparameter optimization. In Artificial Intelligence and Statistics (pp. 240–248). PMLR.
- Schmucker, R., Donini, M., Zafar, M. B., Salinas, D., and Archambeau, C. Multi-objective asynchronous successive halving. arXiv, arXiv:2106.12639.
- Dong, X., Shen, J., Wang, W., Shao, L., Ling, H., and Porikli, F. Dynamical hyperparameter optimization via deep reinforcement learning in tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence 2019, 43, 1515–1529.
- Rijsdijk, J., Wu, L., Perin, G., and Picek, S. Reinforcement learning for hyperparameter tuning in deep learning-based side-channel analysis. IACR Transactions on Cryptographic Hardware and Embedded Systems 2021, 2021, 677–707.
- Jaafra, Y., Laurent, J. L., Deruyver, A., and Naceur, M. S. Reinforcement learning for neural architecture search: A review. Image and Vision Computing 2019, 89, 57–66.
- Afshar, R. R., Zhang, Y., Vanschoren, J., and Kaymak, U. Automated reinforcement learning: An overview. arXiv, arXiv:2201.05000.
- Wu, J., Chen, S., and Liu, X. Efficient hyperparameter optimization through model-based reinforcement learning. Neurocomputing 2020, 409, 381–393.
- Iranfar, A., Zapater, M., and Atienza, D. Multiagent reinforcement learning for hyperparameter optimization of convolutional neural networks. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 2021, 41, 1034–1047.
- He, X., Zhao, K., and Chu, X. AutoML: A survey of the state-of-the-art. Knowledge-Based Systems 2021, 212, 106622.
- Gomaa, I., Zidane, A., Mokhtar, H. M., and El-Tazi, N. (2022). SML-AutoML: A Smart Meta-Learning Automated Machine Learning Framework.
- Khan, A. N., Khan, Q. W., Rizwan, A., Ahmad, R., and Kim, D. H. Consensus-Driven Hyperparameter Optimization for Accelerated Model Convergence in Decentralized Federated Learning. Internet of Things 2025, 30, 101476.
- Morrison, N., and Ma, E. Y. Efficiency of machine learning optimizers and meta-optimization for nanophotonic inverse design tasks. APL Machine Learning 2025, 3.
- Berdyshev, D. A., Grachev, A. M., Shishkin, S. L., and Kozyrskiy, B. L. EEG-Reptile: An Automatized Reptile-Based Meta-Learning Library for BCIs. arXiv, arXiv:2412.19725.
- Pratellesi, C. (2025). Meta Learning for Flow Cytometry Cell Classification (Doctoral dissertation, Technische Universität Wien).
- García, C. A., Gil-de-la-Fuente, A., Barbas, C., and Otero, A. Probabilistic metabolite annotation using retention time prediction and meta-learned projections. Journal of Cheminformatics 2022, 14, 33.
- Deng, L., Raissi, M., and Xiao, M. (2024). Meta-Learning-Based Surrogate Models for Efficient Hyperparameter Optimization. Authorea Preprints.
- Jae, J., Hong, J., Choo, J., and Kwon, Y. D. (2024). Reinforcement learning to learn quantum states for Heisenberg scaling accuracy. arXiv:2412.02334.
- Upadhyay, R., Phlypo, R., Saini, R., and Liwicki, M. Meta-Sparsity: Learning Optimal Sparse Structures in Multi-task Networks through Meta-learning. arXiv, arXiv:2501.12115.
- Paul, S., Ghosh, S., Das, D., and Sarkar, S. K. (2025). Advanced Methodologies for Optimal Neural Network Design and Performance Enhancement. In Nature-Inspired Optimization Algorithms for Cyber-Physical Systems (pp. 403–422). IGI Global Scientific Publishing.
- Egele, R., Mohr, F., Viering, T., and Balaprakash, P. The unreasonable effectiveness of early discarding after one epoch in neural network hyperparameter optimization. Neurocomputing 2024, 127964.
- Wojciuk, M., Swiderska-Chadaj, Z., Siwek, K., and Gertych, A. Improving classification accuracy of fine-tuned CNN models: Impact of hyperparameter optimization. Heliyon 2024, 10.
- Geissler, D., Zhou, B., Suh, S., and Lukowicz, P. Spend More to Save More (SM2): An Energy-Aware Implementation of Successive Halving for Sustainable Hyperparameter Optimization. arXiv, arXiv:2412.08526.
- Hosseini Sarcheshmeh, A., Etemadfard, H., Najmoddin, A., and Ghalehnovi, M. Hyperparameters’ role in machine learning algorithm for modeling of compressive strength of recycled aggregate concrete. Innovative Infrastructure Solutions 2024, 9, 212.
- Sankar, S. U., Dhinakaran, D., Selvaraj, R., Verma, S. K., Natarajasivam, R., and Kishore, P. P. (2024). Optimizing diabetic retinopathy disease prediction using PNAS, ASHA, and transfer learning. In Advances in Networks, Intelligence and Computing (pp. 62–71). CRC Press.
- Zhang, X., and Duh, K. (2024, September). Best Practices of Successive Halving on Neural Machine Translation and Large Language Models. In Proceedings of the 16th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Track) (pp. 130–139).
- Aach, M., Sarma, R., Neukirchen, H., Riedel, M., and Lintermann, A. Resource-Adaptive Successive Doubling for Hyperparameter Optimization with Large Datasets on High-Performance Computing Systems. arXiv, arXiv:2412.02729.
- Jang, D., Yoon, H., Jung, K., and Chung, Y. D. QHB+: Accelerated Configuration Optimization for Automated Performance Tuning of Spark SQL Applications. IEEE Access 2024.
- Chen, Y., Wen, Z., Chen, J., and Huang, J. (2024, May). Enhancing the Performance of Bandit-based Hyperparameter Optimization. In 2024 IEEE 40th International Conference on Data Engineering (ICDE) (pp. 967–980). IEEE.
- Zhang, Y., Wu, H., and Yang, Y. FlexHB: A More Efficient and Flexible Framework for Hyperparameter Optimization. arXiv, arXiv:2402.13641.
- Srivastava, N. Improving neural networks with dropout. University of Toronto 2013, 182, 7.
- Baldi, P., and Sadowski, P. J. Understanding dropout. Advances in Neural Information Processing Systems 2013, 26.
- Gal, Y., and Ghahramani, Z. (2016, June). Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In International Conference on Machine Learning (pp. 1050–1059). PMLR.
- Gal, Y., Hron, J., and Kendall, A. Concrete dropout. Advances in Neural Information Processing Systems 2017, 30.
- Gal, Y., and Ghahramani, Z. A theoretically grounded application of dropout in recurrent neural networks. Advances in Neural Information Processing Systems 2016, 29.
- Friedman, J. H., Hastie, T., and Tibshirani, R. Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software 2010, 33, 1–22.
- Tibshirani, R. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B: Statistical Methodology 1996, 58, 267–288.
- Meinshausen, N. Relaxed lasso. Computational Statistics and Data Analysis 2007, 52, 374–393.
- Carvalho, C. M., Polson, N. G., and Scott, J. G. (2009, April). Handling sparsity via the horseshoe. In Artificial Intelligence and Statistics (pp. 73–80). PMLR.
- Hoerl, A. E., and Kennard, R. W. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics 1970, 12, 55–67.
- Cesa-Bianchi, N., Conconi, A., and Gentile, C. On the generalization ability of on-line learning algorithms. IEEE Transactions on Information Theory 2004, 50, 2050–2057.
- Devroye, L., Györfi, L., and Lugosi, G. (2013). A probabilistic theory of pattern recognition (Vol. 31). Springer Science and Business Media.
- Abu-Mostafa, Y. S., Magdon-Ismail, M., and Lin, H. T. (2012). Learning from data (Vol. 4, p. 4). New York: AMLBook.
- Shalev-Shwartz, S., and Ben-David, S. (2014). Understanding machine learning: From theory to algorithms. Cambridge University Press.
- Bühlmann, P. , and Van De Geer, S. (2011). Statistics for high-dimensional data: Methods, theory and applications. Springer Science and Business Media.
- James, G., Witten, D., Hastie, T., and Tibshirani, R. (2013). An introduction to statistical learning: With applications in R. Springer.
- Efron, B., Hastie, T., Johnstone, I., and Tibshirani, R. (2004). Least angle regression.
- Fan, J., and Li, R. Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American statistical Association 2001, 96, 1348–1360.
- Meinshausen, N., and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso.
- Montavon, G., Orr, G., and Müller, K. R. (Eds.). (2012). Neural networks: Tricks of the trade (Vol. 7700). Springer.
- Prechelt, L. (2002). Early stopping - but when? In Neural Networks: Tricks of the trade (pp. 55–69). Berlin, Heidelberg: Springer Berlin Heidelberg.
- Brownlee, J. Develop deep learning models on Theano and TensorFlow using Keras. J Chem Inf Model 2019, 53, 1689–1699.
- Zhang, H. mixup: Beyond empirical risk minimization. arXiv 2017, arXiv:1710.09412.
- Shorten, C., and Khoshgoftaar, T. M. A survey on image data augmentation for deep learning. Journal of big data 2019, 6, 1–48.
- Perez, L. The effectiveness of data augmentation in image classification using deep learning. arXiv 2017, arXiv:1712.04621.
- Cubuk, E. D., Zoph, B., Mane, D., Vasudevan, V., and Le, Q. V. AutoAugment: Learning augmentation policies from data. arXiv, arXiv:1805.09501.
- Domingos, P. A few useful things to know about machine learning. Communications of the ACM 2012, 55, 78–87.
- Stone, M. Cross-validatory choice and assessment of statistical predictions. Journal of the royal statistical society: Series B (Methodological) 1974, 36, 111–133.
- LeCun, Y., Denker, J., and Solla, S. Optimal brain damage. Advances in neural information processing systems 1989, 2.
- Li, H., Kadav, A., Durdanovic, I., Samet, H., and Graf, H. P. Pruning filters for efficient convnets. arXiv 2016, arXiv:1608.08710.
- Frankle, J., and Carbin, M. The lottery ticket hypothesis: Finding sparse, trainable neural networks. arXiv, arXiv:1803.03635.
- Han, S., Pool, J., Tran, J., and Dally, W. Learning both weights and connections for efficient neural network. Advances in neural information processing systems 2015, 28.
- Liu, Z., Sun, M., Zhou, T., Huang, G., and Darrell, T. Rethinking the value of network pruning. arXiv 2018, arXiv:1810.05270.
- Cheng, Y., Wang, D., Zhou, P., and Zhang, T. A survey of model compression and acceleration for deep neural networks. arXiv 2017, arXiv:1710.09282.
- Frankle, J., Dziugaite, G. K., Roy, D. M., and Carbin, M. Pruning neural networks at initialization: Why are we missing the mark? arXiv 2020, arXiv:2009.08576.
- Breiman, L. Bagging predictors. Machine learning 1996, 24, 123–140.
- Breiman, L. Random forests. Machine learning 2001, 45, 5–32.
- Freund, Y., and Schapire, R. E. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of computer and system sciences 1997, 55, 119–139.
- Friedman, J. H. Greedy function approximation: A gradient boosting machine. Annals of statistics 2001, 1189–1232.
- Zhou, Z. H. (2025). Ensemble methods: Foundations and algorithms. CRC Press.
- Dietterich, T. G. An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization. Machine learning 2000, 40, 139–157.
- Chen, T., and Guestrin, C. (2016, August). XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 785–794).
- Bühlmann, P., and Yu, B. Boosting with the L2 loss: Regression and classification. Journal of the American Statistical Association 2003, 98, 324–339.
- Hinton, G. E., and Van Camp, D. (1993, August). Keeping the neural networks simple by minimizing the description length of the weights. In Proceedings of the sixth annual conference on Computational learning theory (pp. 5–13).
- Bishop, C. M. Training with noise is equivalent to Tikhonov regularization. Neural computation 1995, 7, 108–116.
- Grandvalet, Y., and Bengio, Y. Semi-supervised learning by entropy minimization. Advances in neural information processing systems 2004, 17.
- Wager, S., Wang, S., and Liang, P. S. Dropout training as adaptive regularization. Advances in neural information processing systems 2013, 26.
- Pei, Z., Zhang, Z., Chen, J., Liu, W., Chen, B., Huang, Y., ... and Lu, Y. KAN–CNN: A Novel Framework for Electric Vehicle Load Forecasting with Enhanced Engineering Applicability and Simplified Neural Network Tuning. Electronics 2025, 14, 414.
- Chen, H. (2024). Augmenting image data using noise, rotation and shifting.
- An, D., Liu, P., Feng, Y., Ding, P., Zhou, W., and Yu, B. Dynamic weighted knowledge distillation for brain tumor segmentation. Pattern Recognition 2024, 155, 110731.
- Song, Y. F., and Liu, Y. Fast adversarial training method based on data augmentation and label noise. Journal of Computer Applications 2024, 0.
- Hosseini, S. A., Servaes, S., Rahmouni, N., Therriault, J., Tissot, C., Macedo, A. C., ... and Rosa-Neto, P. Leveraging T1 MRI Images for Amyloid Status Prediction in Diverse Cognitive Conditions Using Advanced Deep Learning Models. Alzheimer’s and Dementia 2024, 20, e094153.
- Cakmakci, U. B. Deep Learning Approaches for Pediatric Bone Age Prediction from Hand Radiographs.
- Surana, A. V., Pawar, S. E., Raha, S., Mali, N., and Mukherjee, T. Ensemble fine tuned multi layer perceptron for predictive analysis of weather patterns and rainfall forecasting from satellite data. ICTACT Journal on Soft Computing 2024, 15.
- Chanda, A. An In-Depth Analysis of CIFAR-100 Using Inception v3.
- Zaitoon, R., Mohanty, S. N., Godavarthi, D., and Ramesh, J. V. N. SPBTGNS: Design of an Efficient Model for Survival Prediction in Brain Tumour Patients using Generative Adversarial Network with Neural Architectural Search Operations. IEEE Access 2024.
- Bansal, A., Sharma, D. R., and Kathuria, D. M. Bayesian-Optimized Ensemble Approach for Fall Detection: Integrating Pose Estimation with Temporal Convolutional and Graph Neural Networks. Available at SSRN 4974349.
- Kusumaningtyas, E. M., Ramadijanti, N., and Rijal, I. H. K. (2024, August). Convolutional Neural Network Implementation with MobileNetV2 Architecture for Indonesian Herbal Plants Classification in Mobile App. In 2024 International Electronics Symposium (IES) (pp. 521–527). IEEE.
- Yadav, A. C., Alam, Z., and Mufeed, M. (2024, August). U-Net-Driven Advancements in Breast Cancer Detection and Segmentation. In 2024 International Conference on Electrical Electronics and Computing Technologies (ICEECT) (Vol. 1, pp. 1–6). IEEE.
- Alshamrani, A. F. A., and Alshomran, F. Optimizing Breast Cancer Mammogram Classification through a Dual Approach: A Deep Learning Framework Combining ResNet50, SMOTE, and Fully Connected Layers for Balanced and Imbalanced Data. IEEE Access 2024.
- Zamindar, N. (2024). Using Artificial Intelligence for Thermographic Image Analysis: Applications to the Arc Welding Process (Doctoral dissertation, Politecnico di Torino).
- Xu, M., Yin, H., and Zhong, S. (2024, July). Enhancing Generalization and Convergence in Neural Networks through a Dual-Phase Regularization Approach with Excitatory-Inhibitory Transition. In 2024 International Conference on Electrical, Computer and Energy Technologies (ICECET) (pp. 1–4). IEEE.
- Elshamy, R., Abu-Elnasr, O., Elhoseny, M., and Elmougy, S. Enhancing colorectal cancer histology diagnosis using modified deep neural networks optimizer. Scientific Reports 2024, 14, 19534.
- Vinay, K., Kodipalli, A., Swetha, P., and Kumaraswamy, S. (2024, May). Analysis of prediction of pneumonia from chest X-ray images using CNN and transfer learning. In 2024 5th International Conference for Emerging Technology (INCET) (pp. 1–6). IEEE.
- Gai, S., and Huang, X. Regularization method for reduced biquaternion neural network. Applied Soft Computing 2024, 166, 112206.
- Xu, Y. Deep regularization techniques for improving robustness in noisy record linkage task. Advances in Engineering Innovation 2025, 15, 9–13.
- Liao, Z., Li, S., Zhou, P., and Zhang, C. Decay regularized stochastic configuration networks with multi-level data processing for UAV battery RUL prediction. Information Sciences 2025, 701, 121840.
- Dong, Z., Yang, C., Li, Y., Huang, L., An, Z., and Xu, Y. (2024, May). Class-wise Image Mixture Guided Self-Knowledge Distillation for Image Classification. In 2024 27th International Conference on Computer Supported Cooperative Work in Design (CSCWD) (pp. 310–315). IEEE.
- Ba, Y., Mancenido, M. V., and Pan, R. How Does Data Diversity Shape the Weight Landscape of Neural Networks? arXiv 2024, arXiv:2410.14602.
- Li, Z., Zhang, Y., and Li, W. (2024, September). Fusion of L2 Regularisation and Hybrid Sampling Methods for Multi-Scale SincNet Audio Recognition. In 2024 IEEE 7th Information Technology, Networking, Electronic and Automation Control Conference (ITNEC) (Vol. 7, pp. 1556–1560). IEEE.
- Zang, X., and Yan, A. (2024, May). A Stochastic Configuration Network with Attenuation Regularization and Multi-kernel Learning and Its Application. In 2024 36th Chinese Control and Decision Conference (CCDC) (pp. 2385–2390). IEEE.
- Moradi, R., Berangi, R., and Minaei, B. A survey of regularization strategies for deep models. Artificial Intelligence Review 2020, 53, 3947–3986.
- Rodríguez, P., Gonzalez, J., Cucurull, G., Gonfaus, J. M., and Roca, X. Regularizing CNNs with locally constrained decorrelations. arXiv 2016, arXiv:1611.01967.
- Tian, Y., and Zhang, Y. A comprehensive survey on regularization strategies in machine learning. Information Fusion 2022, 80, 146–166.
- Cong, Y., Liu, J., Fan, B., Zeng, P., Yu, H., and Luo, J. Online similarity learning for big data with overfitting. IEEE Transactions on Big Data 2017, 4, 78–89.
- Salman, S., and Liu, X. Overfitting mechanism and avoidance in deep neural networks. arXiv 2019, arXiv:1901.06566.
- Wang, K., Muthukumar, V., and Thrampoulidis, C. Benign overfitting in multiclass classification: All roads lead to interpolation. Advances in Neural Information Processing Systems 2021, 34, 24164–24179.
- Poggio, T., Kawaguchi, K., Liao, Q., Miranda, B., Rosasco, L., Boix, X., ... and Mhaskar, H. Theory of deep learning III: Explaining the non-overfitting puzzle. arXiv, arXiv:1801.00173.
- Oyedotun, O. K., Olaniyi, E. O., and Khashman, A. A simple and practical review of over-fitting in neural network learning. International Journal of Applied Pattern Recognition 2017, 4, 307–328.
- Luo, X., Chang, X., and Ban, X. Regression and classification using extreme learning machine based on L1-norm and L2-norm. Neurocomputing 2016, 174, 179–186.
- Zhou, Y., Yang, Y., Wang, D., Zhai, Y., Li, H., and Xu, Y. Innovative Ghost Channel Spatial Attention Network with Adaptive Activation for Efficient Rice Disease Identification. Agronomy 2024, 14, 2869.
- Omole, O. J., Rosa, R. L., Saadi, M., and Rodriguez, D. Z. AgriNAS: Neural Architecture Search with Adaptive Convolution and Spatial–Time Augmentation Method for Soybean Diseases. AI 2024, 5, 2945–2966.
- Tripathi, L., Dubey, P., Kalidoss, D., Prasad, S., Sharma, G., and Dubey, P. (2024, December). Deep Learning Approaches for Brain Tumour Detection Using VGG-16 Architecture. In 2024 IEEE 16th International Conference on Computational Intelligence and Communication Networks (CICN) (pp. 256–261). IEEE.
- Singla, S., and Gupta, R. (2024, December). Pneumonia Detection from Chest X-Ray Images Using Transfer Learning with EfficientNetB1. In 2024 International Conference on IoT Based Control Networks and Intelligent Systems (ICICNIS) (pp. 894–899). IEEE.
- Al-Adhaileh, M. H., Alsharbi, B. M., Aldhyani, T., Ahmad, S., Almaiah, M., Ahmed, Z. A., ... and Singh, S. DLAAD-Deep Learning Algorithms Assisted Diagnosis of Chest Disease Using Radiographic Medical Images. Frontiers in Medicine 2025, 11, 1511389.
- Harvey, E., Petrov, M., and Hughes, M. C. Learning Hyperparameters via a Data-Emphasized Variational Objective. arXiv, arXiv:2502.01861.
- Mahmood, T., Saba, T., Al-Otaibi, S., Ayesha, N., and Almasoud, A. S. (2025). AI-Driven Microscopy: Cutting-Edge Approach for Breast Tissue Prognosis Using Microscopic Images. Microscopy Research and Technique.
- Shen, Q. Predicting the value of football players: Machine learning techniques and sensitivity analysis based on FIFA and real-world statistical datasets. Applied Intelligence 2025, 55, 265.
- Guo, X., Wang, M., Xiang, Y., Yang, Y., Ye, C., Wang, H., and Ma, T. Uncertainty Driven Adaptive Self-Knowledge Distillation for Medical Image Segmentation. IEEE Transactions on Emerging Topics in Computational Intelligence 2025.
- Zambom, A. Z., and Dias, R. A review of kernel density estimation with applications to econometrics. International Econometric Review 2013, 5, 20–42.
- Reyes, M., Francisco-Fernández, M., and Cao, R. Nonparametric kernel density estimation for general grouped data. Journal of Nonparametric Statistics 2016, 28, 235–249.
- Tenreiro, C. A Parzen–Rosenblatt type density estimator for circular data: Exact and asymptotic optimal bandwidths. Communications in Statistics-Theory and Methods 2024, 53, 7436–7452.
- Devroye, L., and Penrod, C. S. The consistency of automatic kernel density estimates. The Annals of Statistics 1984, 1231–1249.
- El Machkouri, M. Asymptotic normality of the Parzen–Rosenblatt density estimator for strongly mixing random fields. Statistical Inference for Stochastic Processes 2011, 14, 73–84.
- Slaoui, Y. Bias reduction in kernel density estimation. Journal of Nonparametric Statistics 2018, 30, 505–522.
- Michalski, A. The use of kernel estimators to determine the distribution of groundwater level. Meteorology Hydrology and Water Management. Research and Operational Applications 2016, 4, 41–46.
- Gramacki, A. Kernel density estimation. Nonparametric Kernel Density Estimation and Its Computational Aspects 2018, 25–62.
- Desobry, F., Davy, M., and Fitzgerald, W. J. (2007, April). Density kernels on unordered sets for kernel-based signal processing. In 2007 IEEE International Conference on Acoustics, Speech and Signal Processing-ICASSP’07 (Vol. 2, pp. II–417). IEEE.
- Gasser, T., and Müller, H. G. (1979). Kernel estimation of regression functions. In Smoothing Techniques for Curve Estimation: Proceedings of a Workshop held in Heidelberg, April 2–4, 1979 (pp. 23–68). Springer Berlin Heidelberg.
- Gasser, T., and Müller, H. G. Estimating regression functions and their derivatives by the kernel method. Scandinavian journal of statistics 1984, 171–185.
- Härdle, W., and Gasser, T. On robust kernel estimation of derivatives of regression functions. Scandinavian journal of statistics 1985, 233–240.
- Müller, H. G. Weighted local regression and kernel methods for nonparametric curve fitting. Journal of the American Statistical Association 1987, 82, 231–238.
- Chu, C. K. A new version of the Gasser-Mueller estimator. Journal of Nonparametric Statistics 1993, 3, 187–193.
- Peristera, P., and Kostaki, A. An evaluation of the performance of kernel estimators for graduating mortality data. Journal of Population Research 2005, 22, 185–197.
- Müller, H. G. Smooth optimum kernel estimators near endpoints. Biometrika 1991, 78, 521–530.
- Gasser, T., Gervini, D., Molinari, L., Hauspie, R. C., and Cameron, N. Kernel estimation, shape-invariant modelling and structural analysis. Cambridge Studies in Biological and Evolutionary Anthropology 2024, 179–204.
- Jennen-Steinmetz, C., and Gasser, T. A unifying approach to nonparametric regression estimation. Journal of the American Statistical Association 1988, 83, 1084–1089.
- Müller, H. G. Density adjusted kernel smoothers for random design nonparametric regression. Statistics and probability letters 1997, 36, 161–172.
- Neumann, M. H., and Thorarinsdottir, T. L. Asymptotic minimax estimation in nonparametric autoregression. Mathematical Methods of Statistics 2006, 15, 374.
- Steland, A. The average run length of kernel control charts for dependent time series.
- Makkulau, A. T. A., Baharuddin, M., and Agusrawati, A. T. P. M. (2023, December). Multivariable Semiparametric Regression Used Priestley-Chao Estimators. In Proceedings of the 5th International Conference on Statistics, Mathematics, Teaching, and Research 2023 (ICSMTR 2023) (Vol. 109, p. 118). Springer Nature.
- Staniswalis, J. G. The kernel estimate of a regression function in likelihood-based models. Journal of the American Statistical Association 1989, 84, 276–283.
- Mack, Y. P., and Müller, H. G. Convolution type estimators for nonparametric regression. Statistics and probability letters 1988, 7, 229–239.
- Jones, M. C., Davies, S. J., and Park, B. U. Versions of kernel-type regression estimators. Journal of the American Statistical Association 1994, 89, 825–832.
- Ghosh, S. Surface estimation under local stationarity. Journal of Nonparametric Statistics 2015, 27, 229–240.
- Liu, C. W., and Luor, D. C. Applications of fractal interpolants in kernel regression estimations. Chaos, Solitons and Fractals 2023, 175, 113913.
- Agua, B. M., and Bouzebda, S. Single index regression for locally stationary functional time series. AIMS Math 2024, 9, 36202–36258.
- Bouzebda, S., Nezzal, A., and Elhattab, I. Limit theorems for nonparametric conditional U-statistics smoothed by asymmetric kernels. AIMS Mathematics 2024, 9, 26195–26282.
- Zhao, H., Qian, Y., and Qu, Y. Mechanical performance degradation modelling and prognosis method of high-voltage circuit breakers considering censored data. IET Science, Measurement and Technology 2025, 19, e12235.
- Patil, M. D., Kannaiyan, S., and Sarate, G. G. Signal denoising based on bias-variance of intersection of confidence interval. Signal, Image and Video Processing 2024, 18, 8089–8103.
- Kakani, K., and Radhika, T. S. L. Nonparametric and nonlinear approaches for medical data analysis. International Journal of Data Science and Analytics 2024, 1–19.
- Kato, M. Debiased Regression for Root-N-Consistent Conditional Mean Estimation. arXiv 2024, arXiv:2411.11748.
- Sadek, A. M., and Mohammed, L. A. Evaluation of the Performance of Kernel Non-parametric Regression and Ordinary Least Squares Regression. JOIV: International Journal on Informatics Visualization 2024, 8, 1352–1360.
- Gong, A., Choi, K., and Dwivedi, R. Supervised Kernel Thinning. arXiv 2024, arXiv:2410.13749.
- Zavatone-Veth, J. A., and Pehlevan, C. Nadaraya–Watson kernel smoothing as a random energy model. Journal of Statistical Mechanics: Theory and Experiment 2025, 2025, 013404.
- Ferrigno, S. (2024, December). Nonparametric estimation of reference curves. In CMStatistics 2024.
- Fan, X., Leng, C., and Wu, W. Causal Inference under Interference: Regression Adjustment and Optimality. arXiv, arXiv:2502.06008.
- Atanasov, A., Bordelon, B., Zavatone-Veth, J. A., Paquette, C., and Pehlevan, C. Two-Point Deterministic Equivalence for Stochastic Gradient Dynamics in Linear Models. arXiv, arXiv:2502.05074.
- Mishra, U., Gupta, D., Sarkar, A., and Hazarika, B. B. A hybrid approach for plant leaf detection using ResNet50-intuitionistic fuzzy RVFL (ResNet50-IFRVFLC) classifier. Computers and Electrical Engineering 2025, 123, 110135.
- Elsayed, M. M., and Nazier, H. (2025). Technology and evolution of occupational employment in Egypt (1998–2018): A task-based framework. Review of Economics and Political Science.
- Kong, X., Li, C., and Pan, Y. Association Between Heavy Metals Mixtures and Life’s Essential 8 Score in General US Adults. Cardiovascular Toxicology 2025, 1–12.
- Bracale, D., Banerjee, M., Sun, Y., Stoll, K., and Turki, S. Dynamic Pricing in the Linear Valuation Model using Shape Constraints. arXiv 2025, arXiv:2502.05776.
- Köhne, F., Philipp, F. M., Schaller, M., Schiela, A., and Worthmann, K. L∞-error bounds for approximations of the Koopman operator by kernel extended dynamic mode decomposition. arXiv, arXiv:2403.18809.
- Sadeghi, R., and Beyeler, M. Efficient Spatial Estimation of Perceptual Thresholds for Retinal Implants via Gaussian Process Regression. arXiv 2025, arXiv:2502.06672.
- Naresh, E., A., and Bhuvan, S. (2025, February). Enhancing network security with eBPF-based firewall and machine learning. In Data Science and Exploration in Artificial Intelligence: Proceedings of the First International Conference On Data Science and Exploration in Artificial Intelligence (CODE-AI 2024) Bangalore, India, 3rd-4th July, 2024 (Volume 1) (p. 169). CRC Press.
- Zhao, W., Chen, H., Liu, T., Tuo, R., and Tian, C. From Deep Additive Kernel Learning to Last-Layer Bayesian Neural Networks via Induced Prior Approximation. In The 28th International Conference on Artificial Intelligence and Statistics.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
