Submitted:
02 September 2025
Posted:
04 September 2025
Abstract
Keywords:
1. Introduction
2. Related Work
3. Preliminaries and Theoretical Foundations
3.1. Convolutional Neural Networks
3.2. Pooling Strategies: Formal Definitions
MaxPooling:
AveragePooling:
MedianPooling:
MinPooling:
KernelPooling.
| Algorithm 1:KernelPooling ( kernel, stride s) |
|
Input: Feature map ; learnable kernels
Output: Downsampled feature map y
for c = 1 to C do
  for each window over the grid with stride s do
|
4. ECA110-Based Pooling Mechanism for CNNs
4.1. Definition of ECA110-Pooling
4.2. Algorithm: ECA110-Pooling (Elementary Cellular Automaton Rule 110 Operator)
| Algorithm 2: ECA110-Pooling (window , stride s) |
|
Input:
Output:
for c = 1 to C do
  for each window over the spatial grid with stride s do
    // Indicator-based binarization at threshold
    // Apply one (or T) Rule-110 evolution step(s)
    // Normalized-sum reduction (mean of transformed states) |
4.3. Main Steps of ECA110-Pooling on Window
1. Flattening: The activations in the window are extracted and arranged into a one-dimensional vector.
2. Binarization: The elements of this vector are mapped to binary states via thresholding (relative to the mean or median), producing a binary sequence.
3. Application of Rule 110: The binary sequence evolves according to the Rule 110 local update rule, generating a transformed sequence.
4. Reduction: The normalized sum of the transformed sequence provides the final pooled activation.
Illustrative Example of ECA110 Pooling on a window.
1. Flattening. The window is rearranged into a one-dimensional vector:
2. Binarization. Using the mean of the window () as threshold, values are mapped to 1 and those to 0:
3. Application of Rule 110. The binary sequence is evolved according to the update function . For illustration, after one iteration step the transformed sequence is:
4. Reduction by normalized summation. The final pooled activation is computed as:
- Normalized Sum (Mean Pooling):
- Maximum / Minimum Reduction:
- Weighted Mean Reduction: with learnable or fixed weights :
- Median Reduction:
- $L_p$-Norm Reduction: where $p = 1$ corresponds to the mean and $p \to \infty$ approaches the maximum.
- Entropy-Based Reduction: measuring the diversity of the transformed activations: where ϵ is a small constant for numerical stability.
- Learnable Reduction (Attention / MLP): In more flexible designs, y can be obtained through a learnable function such as attention pooling or a small neural layer , where g is trained jointly with the CNN backbone.
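The non-learnable reductions listed above can be sketched as small Python functions operating on the transformed sequence. The exact formulas are not recoverable from the text, so the forms below are assumptions: the weighted mean is normalized by the sum of the weights, the $L_p$ reduction is the generalized (power) mean, and the entropy is computed over the sequence normalized to sum to one. All function names are hypothetical.

```python
import math
from statistics import median


def reduce_mean(z):
    """Normalized sum (mean pooling)."""
    return sum(z) / len(z)


def reduce_weighted_mean(z, w):
    """Weighted mean; normalizing by sum(w) is an assumption."""
    return sum(wi * zi for wi, zi in zip(w, z)) / sum(w)


def reduce_median(z):
    """Median of the transformed states."""
    return median(z)


def reduce_p_norm(z, p):
    """Generalized mean: p = 1 is the mean, large p approaches the max."""
    return (sum(abs(zi) ** p for zi in z) / len(z)) ** (1.0 / p)


def reduce_entropy(z, eps=1e-8):
    """Shannon entropy of the normalized sequence (assumed form).

    eps is the small stabilizing constant mentioned in the text.
    """
    total = sum(z) + eps
    probs = [zi / total for zi in z]
    return -sum(pi * math.log(pi + eps) for pi in probs)


z = [1, 1, 0, 1]  # hypothetical transformed binary sequence
summary = {
    "mean": reduce_mean(z),
    "max": max(z),
    "min": min(z),
    "median": reduce_median(z),
    "p_norm_50": reduce_p_norm(z, 50),
    "entropy": reduce_entropy(z),
}
```

On a binary sequence the maximum, minimum, and median collapse to 0 or 1, which is one reason a mean-like reduction is the more informative choice after binarization.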
5. Experimental Framework
5.1. Datasets, Data Splits, and Training Setup
5.2. CNN - Network Architecture
- A first convolutional layer with 64 filters of size , followed by the evaluated pooling operator.
- A second convolutional layer with 128 filters of size , again followed by the selected pooling operator.
- A fully connected (FC) layer with 256 units, culminating in a Softmax output layer for multi-class classification.
5.3. Algorithmic Framework
1. Training and Evaluation Algorithm. Algorithm 3 details the training procedure of a CNN with interchangeable pooling operators. Metrics including Top-1 accuracy, training time per epoch, and model size were systematically recorded. The components and functionality of Algorithm 3 can be described as follows:
- Inputs and Outputs: The algorithm takes as input the datasets (collections of labeled samples used for training and validation), the number of classes K, the pooling type, the number of training epochs E, and the optimization hyperparameters: learning rate, momentum m, and weight decay. The outputs are the trained model and the evaluation metrics (Top-1 accuracy, time/epoch, model size).
- Network Initialization: The CNN backbone is fixed across experiments, differing only in the pooling operator: Conv1 (64 filters) → ReLU → Pool1 → Conv2 (128 filters) → ReLU → Pool2 → Flatten → FC(256) → ReLU → FC(K) → Softmax. ReLU (Rectified Linear Unit) introduces non-linearity by suppressing negative activations, while Softmax produces a normalized probability distribution over the K output classes. The pooling operator P, applied with window size k and stride s, is the sole interchangeable element within the otherwise fixed backbone. This design ensures that observed performance variations are attributable primarily to the pooling mechanism rather than to architectural or parametric differences.
- Optimization: Training uses Stochastic Gradient Descent (SGD) with a learning rate, momentum m, and weight decay. SGD iteratively minimizes the loss by computing parameter updates from mini-batches, with momentum accelerating convergence and weight decay acting as regularization. The cross-entropy loss is computed on the network outputs, followed by the standard gradient update steps (zero_grad(), backward(), step()). These functions behave as follows. CrossEntropy(): computes the negative log-likelihood loss on one-hot labels or class indices; zero_grad(): resets all previously accumulated gradients; backward(): performs backpropagation, accumulating gradients with respect to the network parameters; step(): updates the model parameters using the SGD rule with momentum and weight decay.
- Training Loop: For each epoch: (i) batches are forwarded through the network (Forward()), the loss is computed, and the parameters are updated; (ii) validation is performed on the validation set, logging Top-1 accuracy, time/epoch, and model size.
2. Forward Pass. Algorithm 4 specifies the forward propagation pipeline, in which feature maps are progressively transformed through convolution, nonlinearity, pooling, and classification layers. The function Forward() applies: (i) convolutional layers (Conv2D) for local feature extraction; (ii) ReLU activations to introduce non-linearity by suppressing negative responses; (iii) the interchangeable pooling operator P with window size k and stride s, responsible for spatial downsampling; (iv) flattening of feature maps into a vector representation; (v) fully connected layers (FC) for global integration of features, followed by a final Softmax that converts logits into class probabilities, where logits denote the raw, unnormalized outputs of the final fully connected layer. The pooling operator Pool() supports three cases: (a) standard operators (max, average, median, min), (b) learnable weighted aggregation in KernelPooling, and (c) a transform–reduce scheme in ECA110, which consists of flattening, binarization via thresholding, evolution under Rule 110, and normalized-sum reduction. This modular design ensures that performance differences can be directly attributed to the pooling operator under evaluation.
3. Pooling Operator. Algorithm 5 defines the pooling layer implementation, where the input tensor has four dimensions: B denotes the batch size, C the number of channels, H the height, and W the width of the feature maps. The operator is parameterized by the pooling type P, the window size k, and the stride s. For standard operators (max, average, median, min), the algorithm applies the corresponding reduction function over each local region. In the case of KernelPooling, each channel c is associated with a learnable kernel, enabling adaptive weighted aggregation of local activations. For the proposed ECA110-Pooling, a transform–reduce framework is employed: each local window is flattened into a vector, binarized through a threshold, and then evolved for T iterations under Elementary Cellular Automaton Rule 110. The transformed sequence is subsequently reduced via normalized summation, producing the scalar output for each spatial location. Unless otherwise specified, our default choice is . This unified formulation allows the Pool() function to encompass conventional, learnable, and automaton-driven mechanisms within a single modular framework, facilitating rigorous and fair comparisons across pooling strategies.
| Algorithm 3: Training and Evaluating a CNN with a Pluggable Pooling Operator |
|
Input: Datasets, number of classes K, pooling type, epochs E, learning rate, momentum m, weight decay
Output: Trained model; metrics: Top-1 accuracy, time/epoch, model size
Network initialization:
  Conv1: 64 filters; ReLU
  Pool1: P with window k, stride s
  Conv2: 128 filters; ReLU
  Pool2: P with window k, stride s
  Flatten → FC(256) → ReLU → FC(K) → Softmax
Optimization setup: SGD; loss: CrossEntropy
for e = 1 to E do
  // Training phase
  for each batch in the training set do
    compute the predictions with Forward() and the CrossEntropy loss; zero_grad(); backward(); step()
  // Validation phase
  Evaluate: compute Top-1 on the validation set; log time/epoch and model size |
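The loss and the `step()` update used in Algorithm 3 can be illustrated without any deep learning framework. The sketch below is a plain-Python illustration under stated assumptions: the update follows one common convention for SGD with momentum and L2 weight decay (the gradient is augmented with `weight_decay * param`, accumulated into a velocity buffer, and then applied), and the names `softmax`, `cross_entropy`, and `sgd_step` are hypothetical helpers rather than the authors' code.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Negative log-likelihood of the target class index."""
    return -math.log(softmax(logits)[target])


def sgd_step(params, grads, velocity, lr, momentum, weight_decay):
    """In-place SGD update with momentum and L2 weight decay (assumed convention)."""
    for i in range(len(params)):
        g = grads[i] + weight_decay * params[i]     # add the weight-decay term
        velocity[i] = momentum * velocity[i] + g    # accumulate momentum
        params[i] -= lr * velocity[i]               # parameter update


# Hypothetical one-parameter example with a constant gradient of 0.5.
params, velocity = [1.0], [0.0]
sgd_step(params, [0.5], velocity, lr=0.1, momentum=0.9, weight_decay=0.0)
after_first = params[0]    # 1.0 - 0.1 * 0.5
sgd_step(params, [0.5], velocity, lr=0.1, momentum=0.9, weight_decay=0.0)
after_second = params[0]   # 0.95 - 0.1 * (0.9 * 0.5 + 0.5)
```

The second step moves the parameter further than the first even though the gradient is unchanged, which is the acceleration effect attributed to momentum.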
| Algorithm 4:Forward |
|
return |
| Algorithm 5: Pool (X has shape $B \times C \times H \times W$) |
|
Result: U with shape
if P is a standard operator (max, average, median, min) then
  return the corresponding standard operator with window and stride s
else
  if P is KernelPooling then
    // KernelPooling: learnable weighted aggregation
    for each channel c do
      aggregate each window with the channel's kernel (learnable parameters)
    return U
  if P is ECA110 then
    // ECA110: transform–reduce with Rule 110 and normalized sum
    for each window do
      Extract window
      // threshold
      // indicator function
      // apply T evol. steps; by default
      // normalized sum (mean)
    return U |
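The dispatch logic of Algorithm 5 can be sketched for a single-channel 2D feature map in plain Python. This is an illustrative sketch only: the function names, the string keys for the pooling types, the flat-list kernel layout, the absence of padding, the `>=` comparison against the mean threshold, and the periodic boundary in the Rule 110 step are all assumptions.

```python
from statistics import mean, median


def eca110_reduce(flat, steps=1):
    """Binarize at the mean, evolve under Rule 110, return the mean state."""
    tau = mean(flat)
    bits = [1 if v >= tau else 0 for v in flat]
    n = len(bits)
    for _ in range(steps):
        # Bit (4*left + 2*center + right) of the number 110 is the new state;
        # bits[i - 1] wraps around at i == 0 (periodic boundary, assumed).
        bits = [(110 >> (4 * bits[i - 1] + 2 * bits[i] + bits[(i + 1) % n])) & 1
                for i in range(n)]
    return sum(bits) / n


def pool2d(x, kind, k, s, kernel=None, steps=1):
    """Pool a 2D list `x` with a k x k window and stride s (no padding)."""
    reducers = {"max": max, "avg": mean, "median": median, "min": min}
    height, width = len(x), len(x[0])
    out = []
    for i in range(0, height - k + 1, s):
        row = []
        for j in range(0, width - k + 1, s):
            flat = [x[i + di][j + dj] for di in range(k) for dj in range(k)]
            if kind in reducers:              # (a) standard operators
                row.append(reducers[kind](flat))
            elif kind == "kernel":            # (b) weighted aggregation
                row.append(sum(w * v for w, v in zip(kernel, flat)))
            elif kind == "eca110":            # (c) transform-reduce
                row.append(eca110_reduce(flat, steps))
            else:
                raise ValueError(f"unknown pooling type: {kind}")
        out.append(row)
    return out


# Hypothetical 4x4 feature map pooled with a 2x2 window and stride 2.
x = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12],
     [13, 14, 15, 16]]
pooled_max = pool2d(x, "max", 2, 2)
pooled_eca = pool2d(x, "eca110", 2, 2)
```

Because only the reduction applied to each flattened window changes, every operator produces an output of the same shape, which is what makes the pooling layer swappable inside a fixed backbone.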
5.4. Evaluation Metrics
- Top-1 Classification Accuracy. The primary metric, reflecting the proportion of test samples for which the predicted class with the highest probability matches the ground truth. This directly measures the discriminative capacity of pooling operators in image classification.
- Error Rate. Defined as the complement of Top-1 Accuracy ($1 - \text{Accuracy}$), this metric emphasizes the proportion of misclassified samples and provides an intuitive measure of classification mistakes.
- F1-Score. The harmonic mean of precision and recall, balancing false positives and false negatives. This is particularly useful for datasets with class imbalance, providing a more nuanced view of predictive performance beyond raw accuracy.
- Training Time per Epoch. The average wall-clock time required to complete one training epoch, providing insight into the computational overhead introduced by each pooling strategy.
- Model Size. The memory footprint of the trainable parameters, reported in megabytes (MB). This is particularly relevant for learnable pooling mechanisms such as KernelPooling, which add parameters to the model.
- Convergence Behavior. The stability and rate at which training accuracy and loss curves converge across epochs, capturing optimization dynamics under different pooling strategies.
- Statistical Significance. Observed performance differences were validated using statistical tests across multiple runs: one-way ANOVA with Tukey’s HSD post-hoc test, complemented by paired comparisons (Wilcoxon Signed-Rank and paired t-test). This ensures the robustness and reliability of comparative conclusions.
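The three classification metrics above can be computed from predicted and true labels as in the following sketch. The averaging mode for the F1-score is not stated in the text, so macro averaging (an unweighted mean of per-class F1) is assumed here; the function names and the sample labels are hypothetical.

```python
def top1_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the ground truth."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def error_rate(y_true, y_pred):
    """Complement of Top-1 accuracy."""
    return 1.0 - top1_accuracy(y_true, y_pred)


def macro_f1(y_true, y_pred, num_classes):
    """Unweighted mean of per-class F1 (macro averaging is an assumption)."""
    scores = []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / num_classes


# Hypothetical labels for a 3-class problem.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
acc = top1_accuracy(y_true, y_pred)
err = error_rate(y_true, y_pred)
f1 = macro_f1(y_true, y_pred, num_classes=3)
```

Here four of six predictions are correct, so the accuracy is 2/3 and the error rate 1/3, while the per-class F1 values (0.5, 0.8, 2/3) average to roughly 0.656.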
6. Experimental Results
6.1. Classification Performance Across Epochs
- ECA110-Pooling consistently surpasses MinPooling and MedianPooling across all epochs, while providing performance on par with or superior to MaxPooling and AveragePooling.
- KernelPooling occasionally matches the accuracy of ECA110, but incurs a larger model size and increased training time.
- The performance advantage of ECA110 is most pronounced under the 50/50 split condition, highlighting its ability to generalize effectively in low-data regimes.
- Under long training schedules, the advantage of ECA110 stabilizes, while the standard pooling operators show diminishing returns.
- ECA110-Pooling achieves the best overall balance across datasets, with the highest accuracy and F1-score, and the lowest error rate.
- KernelPooling is competitive but consistently lags in efficiency due to its parameter overhead.
- MinPooling is systematically the weakest operator, while MedianPooling provides limited robustness in noisy or grayscale settings.
6.2. Convergence Dynamics
6.3. Computational Complexity
- MaxPooling, AveragePooling, MedianPooling, MinPooling. Each of these operators requires evaluating all activations within the pooling window for every channel, resulting in a computational cost of $O(k^2)$ per window and channel for a $k \times k$ window. MedianPooling incurs an additional constant sorting factor per window but remains of the same asymptotic order.
- KernelPooling. This operator performs a weighted sum of activations using learnable kernels of size per channel. The computational cost is therefore identical to the classical operators, but it introduces additional parameters, which increase memory usage and training time due to gradient updates.
- ECA110-Pooling. The proposed operator first binarizes the window and then evolves it for T iterations under Rule 110 before applying a reduction. This yields a per-window cost of $O(T \cdot k^2)$ for a $k \times k$ window, where T is the number of automaton steps. Since T is a small constant in practice, the complexity remains effectively linear in the number of window elements, incurring only a modest constant-factor overhead relative to standard pooling.
6.4. Computational Efficiency
- Standard pooling operators (Max, Average, Min, Median) are efficient in terms of both execution time and memory footprint.
- KernelPooling significantly increases model size due to its learnable kernels, which, although beneficial for accuracy in some scenarios, introduce a notable computational burden.
- ECA110-Pooling introduces only a modest and stable overhead compared to non-learnable baselines, while remaining substantially more efficient than KernelPooling.
- Importantly, ECA110’s computational overhead is invariant to dataset size, a direct consequence of its rule-driven local design.
6.5. Statistical Validation
6.6. Statistical Validation
6.7. Benchmarking ECA110-Pooling Against State-of-the-Art Methods
- On Fashion-MNIST, ECA110-Pooling not only matches but occasionally surpasses SOTA models, confirming its robustness in grayscale image classification tasks.
- On CIFAR-10, ECA110 closely approaches the performance of ResNet and DenseNet at extended training horizons (5000+ epochs), while maintaining substantially lower computational cost, positioning it as a highly competitive solution in resource-constrained settings.
- On ImageNet (subset), although large-scale models such as EfficientNet-B0 and DenseNet remain superior in absolute accuracy, ECA110 significantly narrows the gap under reduced-data splits (65/35 and 50/50). This demonstrates its strong generalization capability when training data are scarce. Importantly, this advantage is achieved with markedly lower parameterization and training time compared to heavyweight models like ResNet-50, DenseNet-121, or EfficientNet-B0.
7. Discussion
8. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| CNN | Convolutional Neural Network |
| ECA | Elementary Cellular Automaton |
| ECA110 | Elementary Cellular Automaton, Rule 110 |
| ECA110Pooling | Pooling operator based on ECA Rule 110 |
| SOTA | State-of-the-Art |
| Top-1 Acc. | Top-1 Accuracy |
| F1 | F1-Score (harmonic mean of precision and recall) |
| Ep. | Training Epochs |
| ANOVA | Analysis of Variance |
| HSD | Honestly Significant Difference (Tukey test) |
| ViT | Vision Transformer |
| ResNet | Residual Neural Network |
| DenseNet | Densely Connected Convolutional Network |
| EfficientNet | Efficient Convolutional Neural Network Family |
| MobileNetV2 | Mobile Network Version 2 |
| MLP | Multi-Layer Perceptron |
| FMNIST | Fashion-MNIST Dataset |
| CIFAR-10 | Canadian Institute For Advanced Research, 10 classes |
| ImageNet | Large Visual Recognition Challenge Dataset |
| MB | Megabyte (memory size) |
| s/epoch | Seconds per epoch (training time) |
| MaxPool | Maximum Pooling |
| AvgPool | Average Pooling |
| MedPool | Median Pooling |
| MinPool | Minimum Pooling |
| KerPool | Kernel-based Pooling |






| Triplet | 111 | 110 | 101 | 100 | 011 | 010 | 001 | 000 |
|---|---|---|---|---|---|---|---|---|
| New state | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 0 |
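The transition table above is simply the binary expansion of the rule number: 110 in binary is 01101110, and bit $4l + 2c + r$ gives the new state for the neighborhood $(l, c, r)$. The short sketch below rebuilds the table from the rule number as a consistency check; the function name is a hypothetical helper.

```python
def eca_rule_table(rule_number):
    """Map each (left, center, right) triplet to its new state.

    Triplets are listed from 111 down to 000, matching the table order.
    """
    table = {}
    for index in range(7, -1, -1):
        triplet = ((index >> 2) & 1, (index >> 1) & 1, index & 1)
        table[triplet] = (rule_number >> index) & 1
    return table


rule110 = eca_rule_table(110)
new_states = list(rule110.values())
```

The resulting `new_states` list reads `[0, 1, 1, 0, 1, 1, 1, 0]`, which agrees with the row of new states in the table.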
| Dataset | #Classes | #Images | Modality |
|---|---|---|---|
| ImageNet (subset) | 100 | 100,000 | RGB |
| CIFAR-10 | 10 | 60,000 | RGB |
| Fashion-MNIST | 10 | 70,000 | Grayscale |
| Case | Training Set | Testing Set |
|---|---|---|
| Case 1 | 80% | 20% |
| Case 2 | 65% | 35% |
| Case 3 | 50% | 50% |
| Metric | Description |
|---|---|
| Top-1 Accuracy | The proportion of test samples for which the predicted class with the highest posterior probability matches the ground-truth label. |
| Error Rate | The complement of accuracy, reporting the proportion of misclassified test samples ($1 - \text{Accuracy}$). |
| F1-Score | The harmonic mean of precision and recall, capturing a balanced trade-off between false positives and false negatives, especially valuable for imbalanced datasets. |
| Training Time per Epoch | The mean wall-clock time required to complete one training epoch, providing a measure of the computational efficiency associated with each pooling operator. |
| Model Size | The memory occupied by the trainable parameters, expressed in megabytes (MB), particularly relevant for pooling operators that introduce additional learnable components (e.g., KernelPooling). |
| Convergence Behavior | The stability and rate of progression of training and validation loss/accuracy curves across epochs, reflecting learning dynamics and optimization stability. |
| Statistical Significance | Formal validation of observed performance differences using one-way ANOVA, followed by Tukey’s HSD post-hoc analysis, and complemented by non-parametric tests (Wilcoxon signed-rank) and paired t-tests. |
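| Method | Split | Acc (20 Ep.) | Err (20 Ep.) | F1 (20 Ep.) | Acc (100 Ep.) | Err (100 Ep.) | F1 (100 Ep.) | Acc (500 Ep.) | Err (500 Ep.) | F1 (500 Ep.) | Acc (1000 Ep.) | Err (1000 Ep.) | F1 (1000 Ep.) | Acc (5000 Ep.) | Err (5000 Ep.) | F1 (5000 Ep.) | Acc (10000 Ep.) | Err (10000 Ep.) | F1 (10000 Ep.) | Acc (50000 Ep.) | Err (50000 Ep.) | F1 (50000 Ep.) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|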
| MaxPooling | 80/20 | 58.2 | 41.8 | 58.0 | 65.5 | 34.5 | 65.2 | 70.3 | 29.7 | 70.1 | 72.0 | 28.0 | 71.8 | 72.8 | 27.2 | 72.6 | 73.0 | 27.0 | 72.8 | 73.1 | 26.9 | 72.9 |
| | 65/35 | 56.7 | 43.3 | 56.5 | 64.2 | 35.8 | 64.0 | 69.1 | 30.9 | 68.9 | 70.8 | 29.2 | 70.6 | 71.4 | 28.6 | 71.2 | 71.6 | 28.4 | 71.4 | 71.7 | 28.3 | 71.5 |
| | 50/50 | 55.0 | 45.0 | 54.7 | 62.9 | 37.1 | 62.7 | 67.6 | 32.4 | 67.4 | 69.5 | 30.5 | 69.3 | 70.1 | 29.9 | 69.9 | 70.2 | 29.8 | 70.0 | 70.3 | 29.7 | 70.1 |
| AveragePooling | 80/20 | 57.5 | 42.5 | 57.3 | 64.3 | 35.7 | 64.1 | 69.2 | 30.8 | 69.0 | 70.8 | 29.2 | 70.6 | 71.6 | 28.4 | 71.4 | 71.8 | 28.2 | 71.6 | 71.9 | 28.1 | 71.7 |
| | 65/35 | 55.9 | 44.1 | 55.7 | 63.1 | 36.9 | 62.9 | 68.0 | 32.0 | 67.8 | 69.7 | 30.3 | 69.5 | 70.4 | 29.6 | 70.2 | 70.6 | 29.4 | 70.4 | 70.7 | 29.3 | 70.5 |
| | 50/50 | 54.2 | 45.8 | 54.0 | 61.8 | 38.2 | 61.6 | 66.6 | 33.4 | 66.4 | 68.3 | 31.7 | 68.1 | 69.0 | 31.0 | 68.8 | 69.1 | 30.9 | 68.9 | 69.2 | 30.8 | 69.0 |
| MedianPooling | 80/20 | 58.0 | 42.0 | 57.8 | 64.8 | 35.2 | 64.6 | 69.8 | 30.2 | 69.6 | 71.2 | 28.8 | 71.0 | 71.9 | 28.1 | 71.7 | 72.0 | 28.0 | 71.8 | 72.0 | 28.0 | 71.8 |
| | 65/35 | 56.4 | 43.6 | 56.2 | 63.6 | 36.4 | 63.4 | 68.5 | 31.5 | 68.3 | 70.1 | 29.9 | 69.9 | 70.8 | 29.2 | 70.6 | 70.9 | 29.1 | 70.7 | 71.0 | 29.0 | 70.8 |
| | 50/50 | 54.8 | 45.2 | 54.6 | 62.3 | 37.7 | 62.1 | 67.1 | 32.9 | 66.9 | 68.8 | 31.2 | 68.6 | 69.4 | 30.6 | 69.2 | 69.5 | 30.5 | 69.3 | 69.5 | 30.5 | 69.3 |
| MinPooling | 80/20 | 48.6 | 51.4 | 48.0 | 54.2 | 45.8 | 53.7 | 58.7 | 41.3 | 58.3 | 60.5 | 39.5 | 60.0 | 61.0 | 39.0 | 60.5 | 61.1 | 38.9 | 60.6 | 61.2 | 38.8 | 60.7 |
| | 65/35 | 47.0 | 53.0 | 46.5 | 52.8 | 47.2 | 52.4 | 57.5 | 42.5 | 57.0 | 59.3 | 40.7 | 59.0 | 59.8 | 40.2 | 59.4 | 60.0 | 40.0 | 59.6 | 60.0 | 40.0 | 59.6 |
| | 50/50 | 45.3 | 54.7 | 44.8 | 51.5 | 48.5 | 51.0 | 56.0 | 44.0 | 55.5 | 58.0 | 42.0 | 57.6 | 58.4 | 41.6 | 58.0 | 58.5 | 41.5 | 58.1 | 58.6 | 41.4 | 58.2 |
| KernelPooling | 80/20 | 59.3 | 40.7 | 59.1 | 66.0 | 34.0 | 65.8 | 71.0 | 29.0 | 70.8 | 72.6 | 27.4 | 72.4 | 73.5 | 26.5 | 73.3 | 73.7 | 26.3 | 73.5 | 73.8 | 26.2 | 73.6 |
| | 65/35 | 57.7 | 42.3 | 57.5 | 64.7 | 35.3 | 64.5 | 69.7 | 30.3 | 69.5 | 71.4 | 28.6 | 71.2 | 72.2 | 27.8 | 72.0 | 72.4 | 27.6 | 72.2 | 72.5 | 27.5 | 72.3 |
| | 50/50 | 56.1 | 43.9 | 55.9 | 63.3 | 36.7 | 63.1 | 68.3 | 31.7 | 68.1 | 70.1 | 29.9 | 69.9 | 70.9 | 29.1 | 70.7 | 71.0 | 29.0 | 70.8 | 71.1 | 28.9 | 70.9 |
| ECA110Pooling | 80/20 | 60.0 | 40.0 | 59.8 | 66.9 | 33.1 | 66.7 | 71.7 | 28.3 | 71.5 | 73.0 | 27.0 | 72.8 | 73.8 | 26.2 | 73.6 | 74.0 | 26.0 | 73.8 | 74.1 | 25.9 | 73.9 |
| | 65/35 | 58.4 | 41.6 | 58.2 | 65.6 | 34.4 | 65.4 | 70.5 | 29.5 | 70.3 | 71.9 | 28.1 | 71.7 | 72.7 | 27.3 | 72.5 | 72.9 | 27.1 | 72.7 | 73.0 | 27.0 | 72.8 |
| | 50/50 | 56.8 | 43.2 | 56.6 | 64.1 | 35.9 | 63.9 | 69.0 | 31.0 | 68.8 | 70.7 | 29.3 | 70.5 | 71.4 | 28.6 | 71.2 | 71.6 | 28.4 | 71.4 | 71.7 | 28.3 | 71.5 |
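| Method | Split | Acc (20 Ep.) | Err (20 Ep.) | F1 (20 Ep.) | Acc (100 Ep.) | Err (100 Ep.) | F1 (100 Ep.) | Acc (500 Ep.) | Err (500 Ep.) | F1 (500 Ep.) | Acc (1000 Ep.) | Err (1000 Ep.) | F1 (1000 Ep.) | Acc (5000 Ep.) | Err (5000 Ep.) | F1 (5000 Ep.) | Acc (10000 Ep.) | Err (10000 Ep.) | F1 (10000 Ep.) | Acc (50000 Ep.) | Err (50000 Ep.) | F1 (50000 Ep.) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|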
| MaxPooling | 80/20 | 75.4 | 24.6 | 75.0 | 85.1 | 14.9 | 84.9 | 90.2 | 9.8 | 90.0 | 91.5 | 8.5 | 91.3 | 91.9 | 8.1 | 91.7 | 92.0 | 8.0 | 91.8 | 92.1 | 7.9 | 91.9 |
| | 65/35 | 74.0 | 26.0 | 73.6 | 84.0 | 16.0 | 83.8 | 89.3 | 10.7 | 89.1 | 90.7 | 9.3 | 90.5 | 91.1 | 8.9 | 90.9 | 91.2 | 8.8 | 91.0 | 91.3 | 8.7 | 91.1 |
| | 50/50 | 72.6 | 27.4 | 72.2 | 82.8 | 17.2 | 82.6 | 88.4 | 11.6 | 88.2 | 89.9 | 10.1 | 89.7 | 90.3 | 9.7 | 90.1 | 90.4 | 9.6 | 90.2 | 90.5 | 9.5 | 90.3 |
| AveragePooling | 80/20 | 74.8 | 25.2 | 74.5 | 84.6 | 15.4 | 84.4 | 89.7 | 10.3 | 89.5 | 91.1 | 8.9 | 91.0 | 91.5 | 8.5 | 91.3 | 91.6 | 8.4 | 91.4 | 91.7 | 8.3 | 91.5 |
| | 65/35 | 73.3 | 26.7 | 73.0 | 83.4 | 16.6 | 83.2 | 88.8 | 11.2 | 88.6 | 90.3 | 9.7 | 90.1 | 90.7 | 9.3 | 90.5 | 90.8 | 9.2 | 90.6 | 90.9 | 9.1 | 90.7 |
| | 50/50 | 71.9 | 28.1 | 71.6 | 82.1 | 17.9 | 81.9 | 87.9 | 12.1 | 87.7 | 89.5 | 10.5 | 89.3 | 89.9 | 10.1 | 89.7 | 90.0 | 10.0 | 89.8 | 90.1 | 9.9 | 89.9 |
| MedianPooling | 80/20 | 75.1 | 24.9 | 74.8 | 84.9 | 15.1 | 84.7 | 90.0 | 10.0 | 89.8 | 91.3 | 8.7 | 91.1 | 91.7 | 8.3 | 91.5 | 91.8 | 8.2 | 91.6 | 91.9 | 8.1 | 91.7 |
| | 65/35 | 73.6 | 26.4 | 73.3 | 83.7 | 16.3 | 83.5 | 89.0 | 11.0 | 88.8 | 90.5 | 9.5 | 90.3 | 90.9 | 9.1 | 90.7 | 91.0 | 9.0 | 90.8 | 91.1 | 8.9 | 90.9 |
| | 50/50 | 72.2 | 27.8 | 71.9 | 82.5 | 17.5 | 82.3 | 88.2 | 11.8 | 88.0 | 89.7 | 10.3 | 89.4 | 90.1 | 9.9 | 89.8 | 90.2 | 9.8 | 89.9 | 90.3 | 9.7 | 90.0 |
| MinPooling | 80/20 | 68.9 | 31.1 | 68.6 | 77.8 | 22.2 | 77.5 | 85.5 | 14.5 | 85.2 | 87.3 | 12.7 | 86.9 | 87.7 | 12.3 | 87.3 | 87.8 | 12.2 | 87.4 | 87.9 | 12.1 | 87.5 |
| | 65/35 | 67.5 | 32.5 | 67.2 | 76.5 | 23.5 | 76.2 | 84.6 | 15.4 | 84.2 | 86.4 | 13.6 | 86.0 | 86.8 | 13.2 | 86.4 | 86.9 | 13.1 | 86.5 | 87.0 | 13.0 | 86.6 |
| | 50/50 | 66.0 | 34.0 | 65.6 | 75.2 | 24.8 | 74.9 | 83.8 | 16.2 | 83.4 | 85.6 | 14.4 | 85.2 | 86.0 | 14.0 | 85.6 | 86.1 | 13.9 | 85.7 | 86.2 | 13.8 | 85.8 |
| KernelPooling | 80/20 | 76.0 | 24.0 | 75.7 | 85.5 | 14.5 | 85.3 | 90.6 | 9.4 | 90.4 | 91.9 | 8.1 | 91.7 | 92.3 | 7.7 | 92.1 | 92.4 | 7.6 | 92.2 | 92.5 | 7.5 | 92.3 |
| | 65/35 | 74.5 | 25.5 | 74.2 | 84.3 | 15.7 | 84.1 | 89.7 | 10.3 | 89.5 | 91.1 | 8.9 | 90.9 | 91.5 | 8.5 | 91.3 | 91.6 | 8.4 | 91.4 | 91.7 | 8.3 | 91.5 |
| | 50/50 | 73.1 | 26.9 | 72.8 | 83.0 | 17.0 | 82.8 | 88.8 | 11.2 | 88.6 | 90.4 | 9.6 | 90.2 | 90.8 | 9.2 | 90.6 | 90.9 | 9.1 | 90.7 | 91.0 | 9.0 | 90.8 |
| ECA110Pooling | 80/20 | 76.8 | 23.2 | 76.5 | 86.3 | 13.7 | 86.1 | 91.2 | 8.8 | 91.0 | 92.5 | 7.5 | 92.3 | 92.9 | 7.1 | 92.7 | 93.0 | 7.0 | 92.8 | 93.1 | 6.9 | 92.9 |
| | 65/35 | 75.3 | 24.7 | 75.0 | 85.0 | 15.0 | 84.8 | 90.3 | 9.7 | 90.1 | 91.7 | 8.3 | 91.5 | 92.1 | 7.9 | 91.9 | 92.2 | 7.8 | 92.0 | 92.3 | 7.7 | 92.1 |
| | 50/50 | 73.9 | 26.1 | 73.6 | 83.8 | 16.2 | 83.6 | 89.4 | 10.6 | 89.2 | 91.0 | 9.0 | 90.8 | 91.4 | 8.6 | 91.2 | 91.5 | 8.5 | 91.3 | 91.6 | 8.4 | 91.4 |
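| Method | Split | Acc (20 Ep.) | Err (20 Ep.) | F1 (20 Ep.) | Acc (100 Ep.) | Err (100 Ep.) | F1 (100 Ep.) | Acc (500 Ep.) | Err (500 Ep.) | F1 (500 Ep.) | Acc (1000 Ep.) | Err (1000 Ep.) | F1 (1000 Ep.) | Acc (5000 Ep.) | Err (5000 Ep.) | F1 (5000 Ep.) | Acc (10000 Ep.) | Err (10000 Ep.) | F1 (10000 Ep.) | Acc (50000 Ep.) | Err (50000 Ep.) | F1 (50000 Ep.) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|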
| MaxPooling | 80/20 | 89.8 | 10.2 | 89.6 | 93.6 | 6.4 | 93.5 | 95.3 | 4.7 | 95.2 | 95.7 | 4.3 | 95.6 | 95.9 | 4.1 | 95.8 | 96.0 | 4.0 | 95.9 | 96.0 | 4.0 | 95.9 |
| | 65/35 | 89.0 | 11.0 | 88.8 | 92.9 | 7.1 | 92.8 | 94.8 | 5.2 | 94.7 | 95.0 | 5.0 | 94.8 | 95.2 | 4.8 | 95.1 | 95.3 | 4.7 | 95.2 | 95.3 | 4.7 | 95.2 |
| | 50/50 | 88.3 | 11.7 | 88.1 | 92.2 | 7.8 | 92.1 | 94.3 | 5.7 | 94.2 | 94.4 | 5.6 | 94.3 | 94.6 | 5.4 | 94.5 | 94.6 | 5.4 | 94.5 | 94.6 | 5.4 | 94.5 |
| AveragePooling | 80/20 | 89.5 | 10.5 | 89.3 | 93.3 | 6.7 | 93.2 | 95.0 | 5.0 | 94.9 | 95.4 | 4.6 | 95.3 | 95.6 | 4.4 | 95.5 | 95.7 | 4.3 | 95.6 | 95.7 | 4.3 | 95.6 |
| | 65/35 | 88.7 | 11.3 | 88.5 | 92.6 | 7.4 | 92.5 | 94.5 | 5.5 | 94.4 | 94.7 | 5.3 | 94.6 | 94.9 | 5.1 | 94.8 | 95.0 | 5.0 | 94.9 | 95.0 | 5.0 | 94.9 |
| | 50/50 | 87.9 | 12.1 | 87.7 | 91.9 | 8.1 | 91.8 | 94.0 | 6.0 | 93.9 | 94.1 | 5.9 | 94.0 | 94.3 | 5.7 | 94.2 | 94.3 | 5.7 | 94.2 | 94.3 | 5.7 | 94.2 |
| MedianPooling | 80/20 | 90.0 | 10.0 | 89.8 | 93.9 | 6.1 | 93.8 | 95.6 | 4.4 | 95.5 | 95.9 | 4.1 | 95.8 | 96.1 | 3.9 | 96.0 | 96.2 | 3.8 | 96.1 | 96.2 | 3.8 | 96.1 |
| | 65/35 | 89.2 | 10.8 | 89.0 | 93.2 | 6.8 | 93.1 | 95.0 | 5.0 | 94.9 | 95.2 | 4.8 | 95.1 | 95.4 | 4.6 | 95.3 | 95.5 | 4.5 | 95.4 | 95.5 | 4.5 | 95.4 |
| | 50/50 | 88.5 | 11.5 | 88.3 | 92.5 | 7.5 | 92.4 | 94.5 | 5.5 | 94.4 | 94.6 | 5.4 | 94.5 | 94.8 | 5.2 | 94.7 | 94.9 | 5.1 | 94.8 | 94.9 | 5.1 | 94.8 |
| MinPooling | 80/20 | 85.6 | 14.4 | 85.3 | 89.7 | 10.3 | 89.5 | 92.1 | 7.9 | 91.9 | 92.6 | 7.4 | 92.3 | 92.8 | 7.2 | 92.5 | 92.9 | 7.1 | 92.6 | 92.9 | 7.1 | 92.6 |
| 65/35 | 84.8 | 15.2 | 84.5 | 89.0 | 11.0 | 88.7 | 91.4 | 8.6 | 91.1 | 91.9 | 8.1 | 91.6 | 92.1 | 7.9 | 91.8 | 92.2 | 7.8 | 91.9 | 92.2 | 7.8 | 91.9 | |
| 50/50 | 84.1 | 15.9 | 83.8 | 88.3 | 11.7 | 88.0 | 90.8 | 9.2 | 90.5 | 91.3 | 8.7 | 91.0 | 91.5 | 8.5 | 91.2 | 91.5 | 8.5 | 91.2 | 91.5 | 8.5 | 91.2 | |
| KernelPooling | 80/20 | 90.4 | 9.6 | 90.2 | 94.3 | 5.7 | 94.2 | 96.0 | 4.0 | 96.0 | 96.1 | 3.9 | 96.0 | 96.2 | 3.8 | 96.1 | 96.2 | 3.8 | 96.1 | 96.2 | 3.8 | 96.1 |
| 65/35 | 89.6 | 10.4 | 89.4 | 93.6 | 6.4 | 93.5 | 95.4 | 4.6 | 95.3 | 95.5 | 4.5 | 95.4 | 95.6 | 4.4 | 95.5 | 95.6 | 4.4 | 95.5 | 95.6 | 4.4 | 95.5 | |
| 50/50 | 88.9 | 11.1 | 88.7 | 92.9 | 7.1 | 92.8 | 94.9 | 5.1 | 94.8 | 95.0 | 5.0 | 94.9 | 95.1 | 4.9 | 95.0 | 95.1 | 4.9 | 95.0 | 95.1 | 4.9 | 95.0 | |
| ECA110Pooling | 80/20 | 90.9 | 9.1 | 90.7 | 94.7 | 5.3 | 94.6 | 96.2 | 3.8 | 96.1 | 96.4 | 3.6 | 96.3 | 96.5 | 3.5 | 96.4 | 96.6 | 3.4 | 96.5 | 96.6 | 3.4 | 96.5 |
| 65/35 | 90.1 | 9.9 | 89.9 | 94.1 | 5.9 | 94.0 | 95.8 | 4.2 | 95.7 | 95.9 | 4.1 | 95.8 | 96.0 | 4.0 | 95.9 | 96.1 | 3.9 | 96.0 | 96.1 | 3.9 | 96.0 | |
| 50/50 | 89.4 | 10.6 | 89.2 | 93.4 | 6.6 | 93.3 | 95.3 | 4.7 | 95.2 | 95.5 | 4.5 | 95.4 | 95.6 | 4.4 | 95.5 | 95.7 | 4.3 | 95.6 | 95.7 | 4.3 | 95.6 | |

| Pooling Method | Top-1 Accuracy (%) | Error Rate (%) | F1-Score (%) |
|---|---|---|---|
| MaxPooling | 85.0 | 15.0 | 84.7 |
| AveragePooling | 84.5 | 15.5 | 84.2 |
| MedianPooling | 84.8 | 15.2 | 84.5 |
| MinPooling | 80.0 | 20.0 | 79.6 |
| KernelPooling | 86.0 | 14.0 | 85.8 |
| ECA110-Pooling | 87.2 | 12.8 | 87.0 |

| Pooling Method | Time Complexity | Extra Parameters | Remarks |
|---|---|---|---|
| MaxPooling | | None | Selects strongest activations |
| AveragePooling | | None | Computes local averages |
| MedianPooling | | None | Requires sorting per window |
| MinPooling | | None | Selects weakest activations |
| KernelPooling | | | Learnable weighted aggregation |
| ECA110-Pooling | | None | Rule-based transform + reduction |
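
The "rule-based transform + reduction" entry can be made concrete with a minimal single-window sketch of the four steps from Section 4.3 (flatten, binarize at the window mean, apply Rule 110, take the normalized sum). The function names `rule110_step` and `eca110_pool_window` are hypothetical, and two details are assumptions rather than statements from the paper: values greater than or equal to the mean map to 1, and the boundary is periodic (wrap-around). Under these assumptions each evolution step touches every cell once, so the cost grows linearly with the number of elements in the window and no learnable parameters are involved.

```python
RULE = 110  # Wolfram code: bit i of 110 is the next state for neighbourhood i


def rule110_step(bits):
    """One synchronous Rule-110 update (assumption: periodic boundary)."""
    n = len(bits)
    out = []
    for i in range(n):
        # bits[i - 1] wraps to the last cell when i == 0
        left, centre, right = bits[i - 1], bits[i], bits[(i + 1) % n]
        idx = 4 * left + 2 * centre + right  # neighbourhood as a 3-bit number
        out.append((RULE >> idx) & 1)
    return out


def eca110_pool_window(window, steps=1):
    """Pool one 2-D window (list of rows) to a single value in [0, 1]."""
    flat = [v for row in window for v in row]            # 1. flattening
    mean = sum(flat) / len(flat)
    bits = [1 if v >= mean else 0 for v in flat]         # 2. binarization (assumed >=)
    for _ in range(steps):                               # 3. Rule-110 evolution
        bits = rule110_step(bits)
    return sum(bits) / len(bits)                         # 4. normalized sum


if __name__ == "__main__":
    print(eca110_pool_window([[1.0, 2.0], [3.0, 4.0]]))
```

A full layer would slide this function over each channel with stride s; that loop is omitted here to keep the sketch short.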

| Pooling Method | ImageNet (subset): Time (s/epoch) | ImageNet (subset): Size (MB) | CIFAR-10: Time (s/epoch) | CIFAR-10: Size (MB) | Fashion-MNIST: Time (s/epoch) | Fashion-MNIST: Size (MB) |
|---|---|---|---|---|---|---|
| MaxPooling | 128.0 | 4.75 | 35.2 | 4.75 | 12.4 | 4.75 |
| AveragePooling | 130.6 | 4.75 | 35.9 | 4.75 | 12.7 | 4.75 |
| MedianPooling | 162.6 | 4.75 | 44.7 | 4.75 | 15.8 | 4.75 |
| MinPooling | 127.5 | 4.75 | 35.1 | 4.75 | 12.4 | 4.75 |
| KernelPooling | 148.5 | 4.80 | 40.8 | 4.80 | 14.4 | 4.80 |
| ECA110-Pooling | 133.1 | 4.76 | 36.6 | 4.76 | 12.9 | 4.76 |
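
To read the timing columns as a relative cost, the short sketch below expresses each method's time per epoch as a percentage overhead over MaxPooling. The times are copied from the table above; the dictionary `TIMES` and the helper `overhead_pct` are hypothetical names introduced only for this illustration, and choosing MaxPooling as the baseline is an assumption.

```python
# Time per epoch in seconds, in the order: ImageNet (subset), CIFAR-10, Fashion-MNIST.
TIMES = {
    "MaxPooling": (128.0, 35.2, 12.4),
    "AveragePooling": (130.6, 35.9, 12.7),
    "MedianPooling": (162.6, 44.7, 15.8),
    "MinPooling": (127.5, 35.1, 12.4),
    "KernelPooling": (148.5, 40.8, 14.4),
    "ECA110-Pooling": (133.1, 36.6, 12.9),
}


def overhead_pct(method, baseline="MaxPooling"):
    """Percentage increase in time per epoch relative to the baseline, per dataset."""
    return [
        round(100.0 * (t - b) / b, 1)
        for t, b in zip(TIMES[method], TIMES[baseline])
    ]


if __name__ == "__main__":
    for name in TIMES:
        print(name, overhead_pct(name))
```

On these figures ECA110-Pooling costs about 4% more time per epoch than MaxPooling on all three datasets, while MedianPooling costs roughly 27% more.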

| Epochs | ImageNet (subset), 80/20 | ImageNet (subset), 65/35 | ImageNet (subset), 50/50 | CIFAR-10, 80/20 | CIFAR-10, 65/35 | CIFAR-10, 50/50 | Fashion-MNIST, 80/20 | Fashion-MNIST, 65/35 | Fashion-MNIST, 50/50 |
|---|---|---|---|---|---|---|---|---|---|
| 20 | F=3.41, p=0.017, Yes | 3.12, 0.021, Yes | 2.89, 0.032, Yes | 4.89, 0.003, Yes | 4.51, 0.005, Yes | 4.18, 0.008, Yes | 3.02, 0.028, Yes | 2.81, 0.034, Yes | 2.64, 0.041, Yes |
| 100 | 4.26, 0.007, Yes | 4.05, 0.009, Yes | 3.87, 0.011, Yes | 6.72, 0.001, Yes | 6.38, 0.001, Yes | 6.05, 0.002, Yes | 4.75, 0.004, Yes | 4.51, 0.006, Yes | 4.26, 0.008, Yes |
| 500 | 6.18, 0.001, Yes | 5.91, 0.002, Yes | 5.66, 0.002, Yes | 8.05, <0.001, Yes | 7.81, <0.001, Yes | 7.54, <0.001, Yes | 6.41, 0.001, Yes | 6.12, 0.001, Yes | 5.91, 0.002, Yes |
| 1000 | 7.34, <0.001, Yes | 7.12, <0.001, Yes | 6.94, <0.001, Yes | 9.47, <0.001, Yes | 9.12, <0.001, Yes | 8.91, <0.001, Yes | 7.96, <0.001, Yes | 7.64, <0.001, Yes | 7.31, <0.001, Yes |
| 5000 | 9.11, <0.001, Yes | 8.87, <0.001, Yes | 8.65, <0.001, Yes | 11.38, <0.001, Yes | 11.03, <0.001, Yes | 10.77, <0.001, Yes | 9.88, <0.001, Yes | 9.55, <0.001, Yes | 9.11, <0.001, Yes |
| 10000 | 11.02, <0.001, Yes | 10.74, <0.001, Yes | 10.43, <0.001, Yes | 12.24, <0.001, Yes | 11.87, <0.001, Yes | 11.59, <0.001, Yes | 11.07, <0.001, Yes | 10.77, <0.001, Yes | 10.41, <0.001, Yes |
| 50000 | 12.85, <0.001, Yes | 12.45, <0.001, Yes | 12.12, <0.001, Yes | 13.10, <0.001, Yes | 12.73, <0.001, Yes | 12.41, <0.001, Yes | 12.31, <0.001, Yes | 11.98, <0.001, Yes | 11.64, <0.001, Yes |
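
As a reminder of how an F value of the kind reported above is formed, the following sketch computes the one-way ANOVA F statistic from groups of repeated measurements. It assumes that each group holds the per-run accuracies of one pooling method; the number of runs per method is not given in this section. The function name `one_way_anova_f` is hypothetical, and the numbers in the usage line are arbitrary toy values, not results from the paper.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over lists of values."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the individual values around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Toy values only; replace with per-run accuracies of each pooling method.
    print(one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]]))
```

The p-value then follows from the upper tail of the F distribution with these two degrees of freedom; a statistics library such as SciPy (`scipy.stats.f_oneway`) returns both quantities directly.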

| Dataset | Epochs | 80/20 | 65/35 | 50/50 |
|---|---|---|---|---|
| ImageNet (subset) | 20 | Max>Min (p<0.05); n.s. (ECA vs Max/Avg/Ker) | Max>Min (p<0.05); n.s. | Max>Min (p<0.05); n.s. |
| | 500 | ECA>Avg (p<0.05); Max>Min (p<0.001); Ker>Min (p<0.01) | ECA>Avg (p<0.05); Max>Min (p<0.001) | ECA>Avg (p<0.05); Max>Min (p<0.001) |
| | 5000 | ECA>Max (p<0.05); ECA>Avg (p<0.01); ECA>Min (p<0.001) | ECA>Max (p<0.05); ECA>Avg (p<0.01); ECA>Min (p<0.001) | ECA>Max (p<0.05); ECA>Avg (p<0.01); ECA>Min (p<0.001) |
| CIFAR-10 | 20 | Max>Min (p<0.01); Med>Min (p<0.05); n.s. (ECA vs Max) | Max>Min (p<0.01); Med>Min (p<0.05); n.s. | Max>Min (p<0.01); Med>Min (p<0.05); n.s. |
| | 500 | ECA>Avg (p<0.01); ECA>Min (p<0.001); Max>Min (p<0.001) | same pattern | same pattern |
| | 5000 | ECA>Max (p<0.01); ECA>Avg (p<0.01); ECA>Min (p<0.001) | same pattern | same pattern |
| Fashion-MNIST | 20 | Med>Min (p<0.05); n.s. (ECA vs Med/Max) | Med>Min (p<0.05); n.s. | Med>Min (p<0.05); n.s. |
| | 500 | ECA>Min (p<0.001); Med>Min (p<0.001); Max>Min (p<0.001) | same pattern | same pattern |
| | 5000 | ECA>Max (p<0.05); ECA>Avg (p<0.01); ECA>Min (p<0.001) | same pattern | same pattern |

| Dataset | Epochs | 80/20 | 65/35 | 50/50 |
|---|---|---|---|---|
| ImageNet (subset) | 20 | ECA>Min () | ECA>Min () | ECA>Min () |
| | 500 | ECA>Max (); ECA>Avg (); ECA>Med (); ECA>Min () | ECA>Max (); ECA>Avg (); ECA>Min () | ECA>Avg (); ECA>Min () |
| | 5000 | ECA>Max (); ECA>Avg (); ECA>Med (); ECA>Min () | ECA>Max (); ECA>Avg (); ECA>Min () | ECA>Avg (); ECA>Min () |
| CIFAR-10 | 20 | ECA>Min () | ECA>Min () | ECA>Min () |
| | 500 | ECA>Max (); ECA>Avg (); ECA>Med (); ECA>Min () | ECA>Max (); ECA>Avg (); ECA>Min () | ECA>Avg (); ECA>Min () |
| | 5000 | ECA>Max (); ECA>Avg (); ECA>Med (); ECA>Min (); ECA>Ker () | ECA>Max (); ECA>Avg (); ECA>Min () | ECA>Avg (); ECA>Min () |
| Fashion-MNIST | 20 | ECA>Min () | ECA>Min () | ECA>Min () |
| | 500 | ECA>Max (); ECA>Avg (); ECA>Min () | ECA>Avg (); ECA>Min () | ECA>Min () |
| | 5000 | ECA>Max (); ECA>Avg (); ECA>Med (); ECA>Min () | ECA>Avg (); ECA>Min () | ECA>Min () |

| Dataset | Epochs | 80/20 | 65/35 | 50/50 |
|---|---|---|---|---|
| ImageNet (subset) | 20 | ECA>Min () | ECA>Min () | ECA>Min () |
| | 500 | ECA>Max (); ECA>Avg (); ECA>Min () | ECA>Avg (); ECA>Min () | ECA>Min () |
| | 5000 | ECA>Max (); ECA>Avg (); ECA>Med (); ECA>Min () | ECA>Max (); ECA>Avg (); ECA>Min () | ECA>Avg (); ECA>Min () |
| CIFAR-10 | 20 | ECA>Min () | ECA>Min () | ECA>Min () |
| | 500 | ECA>Max (); ECA>Avg (); ECA>Min () | ECA>Avg (); ECA>Min () | ECA>Min () |
| | 5000 | ECA>Max (); ECA>Avg (); ECA>Med (); ECA>Min (); ECA>Ker () | ECA>Max (); ECA>Avg (); ECA>Min () | ECA>Avg (); ECA>Min () |
| Fashion-MNIST | 20 | ECA>Min () | ECA>Min () | ECA>Min () |
| | 500 | ECA>Max (); ECA>Avg (); ECA>Min () | ECA>Avg (); ECA>Min () | ECA>Min () |
| | 5000 | ECA>Max (); ECA>Avg (); ECA>Med (); ECA>Min () | ECA>Avg (); ECA>Min () | ECA>Min () |

| Dataset | Method | 80/20, 500 ep. | 80/20, 5000 ep. | 80/20, 10000 ep. | 65/35, 500 ep. | 65/35, 5000 ep. | 65/35, 10000 ep. | 50/50, 500 ep. | 50/50, 5000 ep. | 50/50, 10000 ep. |
|---|---|---|---|---|---|---|---|---|---|---|
| ImageNet (subset) | ResNet-50 | 70.3 / 29.7 / 70.0 | 75.9 / 24.1 / 75.7 | 76.1 / 23.9 / 75.9 | 68.4 / 31.6 / 68.2 | 74.3 / 25.7 / 74.0 | 75.0 / 25.0 / 74.8 | 66.5 / 33.5 / 66.2 | 72.5 / 27.5 / 72.3 | 73.0 / 27.0 / 72.8 |
| | DenseNet-121 | 71.0 / 29.0 / 70.8 | 76.8 / 23.2 / 76.7 | 76.9 / 23.1 / 76.8 | 69.0 / 31.0 / 68.7 | 75.2 / 24.8 / 75.0 | 75.8 / 24.2 / 75.6 | 67.3 / 32.7 / 67.0 | 73.6 / 26.4 / 73.4 | 74.1 / 25.9 / 73.9 |
| | EfficientNet-B0 | 72.1 / 27.9 / 71.9 | 77.5 / 22.5 / 77.3 | 77.7 / 22.3 / 77.5 | 70.5 / 29.5 / 70.3 | 76.0 / 24.0 / 75.8 | 76.4 / 23.6 / 76.2 | 68.4 / 31.6 / 68.2 | 74.2 / 25.8 / 74.0 | 74.7 / 25.3 / 74.5 |
| | MobileNetV2 | 69.2 / 30.8 / 68.9 | 74.1 / 25.9 / 73.9 | 74.3 / 25.7 / 74.1 | 67.3 / 32.7 / 67.0 | 72.6 / 27.4 / 72.3 | 73.2 / 26.8 / 73.0 | 65.5 / 34.5 / 65.2 | 70.8 / 29.2 / 70.5 | 71.4 / 28.6 / 71.1 |
| | ViT-Small | 70.7 / 29.3 / 70.5 | 76.2 / 23.8 / 76.0 | 76.4 / 23.6 / 76.2 | 69.1 / 30.9 / 68.8 | 74.9 / 25.1 / 74.7 | 75.4 / 24.6 / 75.2 | 67.2 / 32.8 / 66.9 | 73.3 / 26.7 / 73.1 | 73.9 / 26.1 / 73.7 |
| | ECA110-Pooling | 71.7 / 28.3 / 71.5 | 73.8 / 26.2 / 73.6 | 74.0 / 26.0 / 73.9 | 70.5 / 29.5 / 70.3 | 72.7 / 27.3 / 72.5 | 72.9 / 27.1 / 72.7 | 69.0 / 31.0 / 68.8 | 71.4 / 28.6 / 71.2 | 71.7 / 28.3 / 71.5 |
| CIFAR-10 | ResNet-50 | 88.6 / 11.4 / 88.5 | 94.4 / 5.6 / 94.3 | 94.6 / 5.4 / 94.5 | 87.3 / 12.7 / 87.2 | 93.8 / 6.2 / 93.6 | 94.1 / 5.9 / 93.9 | 85.9 / 14.1 / 85.7 | 92.4 / 7.6 / 92.2 | 92.7 / 7.3 / 92.5 |
| | DenseNet-121 | 89.2 / 10.8 / 89.1 | 94.9 / 5.1 / 94.7 | 95.1 / 4.9 / 94.9 | 87.9 / 12.1 / 87.7 | 94.3 / 5.7 / 94.1 | 94.6 / 5.4 / 94.4 | 86.5 / 13.5 / 86.3 | 92.9 / 7.1 / 92.7 | 93.2 / 6.8 / 93.0 |
| | EfficientNet-B0 | 89.8 / 10.2 / 89.6 | 95.4 / 4.6 / 95.2 | 95.6 / 4.4 / 95.4 | 88.5 / 11.5 / 88.3 | 94.8 / 5.2 / 94.6 | 95.0 / 5.0 / 94.8 | 87.0 / 13.0 / 86.8 | 93.5 / 6.5 / 93.3 | 93.9 / 6.1 / 93.7 |
| | MobileNetV2 | 87.3 / 12.7 / 87.1 | 93.2 / 6.8 / 93.0 | 93.5 / 6.5 / 93.3 | 85.9 / 14.1 / 85.7 | 92.7 / 7.3 / 92.5 | 93.1 / 6.9 / 92.9 | 84.5 / 15.5 / 84.3 | 91.4 / 8.6 / 91.2 | 91.9 / 8.1 / 91.7 |
| | ViT-Small | 88.2 / 11.8 / 88.0 | 94.0 / 6.0 / 93.8 | 94.2 / 5.8 / 94.0 | 86.8 / 13.2 / 86.6 | 93.4 / 6.6 / 93.2 | 93.7 / 6.3 / 93.5 | 85.4 / 14.6 / 85.2 | 92.0 / 8.0 / 91.8 | 92.4 / 7.6 / 92.2 |
| | ECA110-Pooling | 91.2 / 8.8 / 91.0 | 92.9 / 7.1 / 92.7 | 93.0 / 7.0 / 92.8 | 90.3 / 9.7 / 90.1 | 92.1 / 7.9 / 91.9 | 92.3 / 7.7 / 92.1 | 89.4 / 10.6 / 89.2 | 91.4 / 8.6 / 91.2 | 91.6 / 8.4 / 91.4 |
| Fashion-MNIST | ResNet-50 | 95.0 / 5.0 / 94.9 | 96.2 / 3.8 / 96.1 | 96.3 / 3.7 / 96.2 | 94.2 / 5.8 / 94.1 | 95.6 / 4.4 / 95.5 | 95.8 / 4.2 / 95.7 | 93.5 / 6.5 / 93.4 | 94.9 / 5.1 / 94.8 | 95.1 / 4.9 / 95.0 |
| | DenseNet-121 | 95.3 / 4.7 / 95.2 | 96.4 / 3.6 / 96.3 | 96.5 / 3.5 / 96.4 | 94.5 / 5.5 / 94.4 | 95.8 / 4.2 / 95.7 | 96.0 / 4.0 / 95.9 | 93.8 / 6.2 / 93.7 | 95.2 / 4.8 / 95.1 | 95.4 / 4.6 / 95.3 |
| | EfficientNet-B0 | 95.6 / 4.4 / 95.5 | 96.6 / 3.4 / 96.5 | 96.7 / 3.3 / 96.6 | 94.8 / 5.2 / 94.7 | 96.0 / 4.0 / 95.9 | 96.2 / 3.8 / 96.1 | 94.1 / 5.9 / 94.0 | 95.5 / 4.5 / 95.4 | 95.7 / 4.3 / 95.6 |
| | MobileNetV2 | 94.7 / 5.3 / 94.6 | 95.9 / 4.1 / 95.8 | 96.0 / 4.0 / 95.9 | 94.0 / 6.0 / 93.9 | 95.3 / 4.7 / 95.2 | 95.5 / 4.5 / 95.4 | 93.3 / 6.7 / 93.2 | 94.7 / 5.3 / 94.6 | 94.9 / 5.1 / 94.8 |
| | ViT-Small | 95.1 / 4.9 / 95.0 | 96.2 / 3.8 / 96.1 | 96.3 / 3.7 / 96.2 | 94.4 / 5.6 / 94.3 | 95.6 / 4.4 / 95.5 | 95.8 / 4.2 / 95.7 | 93.6 / 6.4 / 93.5 | 95.0 / 5.0 / 94.9 | 95.2 / 4.8 / 95.1 |
| | ECA110-Pooling | 96.2 / 3.8 / 96.1 | 96.5 / 3.5 / 96.4 | 96.6 / 3.4 / 96.5 | 95.8 / 4.2 / 95.7 | 96.0 / 4.0 / 95.9 | 96.1 / 3.9 / 96.0 | 95.3 / 4.7 / 95.2 | 95.6 / 4.4 / 95.5 | 95.7 / 4.3 / 95.6 |

| Method | Number of Parameters | Memory Footprint (MB) | Observations |
|---|---|---|---|
| MaxPooling | 0 | ≈ 0 | Fixed operator, no trainable parameters. |
| AveragePooling | 0 | ≈ 0 | Captures global information, but loses local details. |
| MedianPooling | 0 | ≈ 0 | Robust to noise and outliers, but slightly higher computational cost. |
| MinPooling | 0 | ≈ 0 | Rarely used, generally yields weak performance. |
| KernelPooling | ∼50k | ∼0.2–0.5 | Introduces learnable kernels; modest increase in model size. |
| ECA110-Pooling | 0 | ≈ 0 | Lightweight, rule-based operator; high efficiency and competitive performance. |
| ResNet-50 | ∼25M | ∼98 | Classical SOTA architecture, highly accurate but computationally expensive. |
| DenseNet-121 | ∼8M | ∼33 | Dense connections; strong accuracy but higher inference cost. |
| EfficientNet-B0 | ∼5M | ∼20 | Balanced trade-off between performance and efficiency. |
| MobileNetV2 | ∼3.5M | ∼14 | Optimized for mobile devices; very efficient. |
| ViT-Small | ∼22M | ∼85 | Vision Transformer; strong performance but high memory and data requirements. |
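
The memory column above is roughly what one obtains by storing every parameter as a 32-bit float. The sketch below makes that arithmetic explicit; the 4-bytes-per-parameter figure and the decimal megabyte (10^6 bytes) are assumptions, the helper name `footprint_mb` is hypothetical, and the parameter counts are the rounded values from the table.

```python
def footprint_mb(n_params, bytes_per_param=4):
    """Approximate weight storage in MB (assumption: float32, 1 MB = 10**6 bytes)."""
    return n_params * bytes_per_param / 1e6


if __name__ == "__main__":
    # Rounded parameter counts taken from the table above.
    for name, n in [
        ("KernelPooling", 50_000),
        ("ResNet-50", 25_000_000),
        ("DenseNet-121", 8_000_000),
        ("EfficientNet-B0", 5_000_000),
        ("MobileNetV2", 3_500_000),
        ("ViT-Small", 22_000_000),
    ]:
        print(f"{name}: ~{footprint_mb(n):.1f} MB")
```

Small differences from the tabulated values are expected, because the parameter counts are rounded and the exact storage convention is not stated.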
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).