Submitted: 17 April 2025
Posted: 18 April 2025
Abstract
Keywords:
1. Introduction
1.1. Importance and Challenges of Database Anomaly Detection
1.2. Potential of Sample Difficulty-Based Approaches in Anomaly Detection
2. Related Work
2.1. Overview of Traditional Database Anomaly Detection Methods
2.2. Deep Learning-Based Anomaly Detection Techniques
2.3. Applications of Sample Difficulty Estimation in Machine Learning
3. Anomaly Detection Framework Based on Sample Difficulty Estimation
3.1. Design of Sample Difficulty Estimation Model
3.2. Difficulty-Oriented Priority Assignment Mechanism
3.3. Adaptive Computational Resource Allocation Strategy
4. Experiments and Result Analysis
4.1. Experimental Setup and Datasets
4.2. Efficiency and Accuracy Evaluation
4.3. Comparative Analysis with Existing Methods
5. Conclusions
5.1. Research Summary
5.2. Limitations Discussion
Acknowledgment
References
- Mosin, V., Staron, M., Durisic, D., de Oliveira Neto, F. G., Pandey, S. K., & Koppisetty, A. C. (2022, August). Comparing input prioritization techniques for testing deep learning algorithms. In 2022 48th Euromicro Conference on Software Engineering and Advanced Applications (SEAA) (pp. 76-83). IEEE.
- Zhao, X., & Huang, C. (2024, September). Efficient Anomaly Detection Algorithm for Operational Data Based on Fuzzy Cognitive Map. In 2024 3rd International Conference on Artificial Intelligence, Internet of Things and Cloud Computing Technology (AIoTC) (pp. 201-204). IEEE.
- Liu, Y., Lou, Y., & Huang, S. (2020, June). Parallel algorithm of flow data anomaly detection based on isolated forest. In 2020 international conference on artificial intelligence and electromechanical automation (AIEA) (pp. 132-135). IEEE.
- Pan, J., Dong, Y., Chen, B., Fu, J., & Huang, A. (2023, August). Research on parallel detection of heterogeneous cloud resources with multiple anomalies in cross-type database. In 2023 11th International Conference on Information Technology: IoT and Smart City (ITIoTSC) (pp. 68-72). IEEE.
- Shirbhate, D. D., & Gupta, S. R. (2024, November). Unveiling Covert Databases: A Comprehensive Detection Framework. In 2024 2nd DMIHER International Conference on Artificial Intelligence in Healthcare, Education and Industry (IDICAIEI) (pp. 1-6). IEEE.
- Ma, X., Bi, W., Li, M., Liang, P., & Wu, J. (2025). An Enhanced LSTM-based Sales Forecasting Model for Functional Beverages in Cross-Cultural Markets. Applied and Computational Engineering, 118, 55-63.
- Wang, J., Zhao, Q., & Xi, Y. (2025). Cross-lingual Search Intent Understanding Framework Based on Multi-modal User Behavior. Annals of Applied Sciences, 6(1).
- Huang, D., Yang, M., & Zheng, W. (2024). Using Deep Reinforcement Learning for Optimizing Process Parameters in CHO Cell Cultures for Monoclonal Antibody Production. Artificial Intelligence and Machine Learning Review, 5(3), 12-27.
- Huang, T., Xu, Z., Yu, P., Yi, J., & Xu, X. (2025). A Hybrid Transformer Model for Fake News Detection: Leveraging Bayesian Optimization and Bidirectional Recurrent Unit. arXiv preprint arXiv:2502.09097.
- Weng, J., Jiang, X., & Chen, Y. (2024). Real-time Squat Pose Assessment and Injury Risk Prediction Based on Enhanced Temporal Convolutional Neural Networks.
- Xu, X., Yu, P., Xu, Z., & Wang, J. (2025). A hybrid attention framework for fake news detection with large language models. arXiv preprint arXiv:2501.11967.
- Ma, X., & Fan, S. (2024). Research on Cross-national Customer Churn Prediction Model for Biopharmaceutical Products Based on LSTM-Attention Mechanism. Academia Nexus Journal, 3(3).
- Bi, W., Trinh, T. K., & Fan, S. (2024). Machine Learning-Based Pattern Recognition for Anti-Money Laundering in Banking Systems. Journal of Advanced Computing Systems, 4(11), 30-41.
- Yu, P., Xu, Z., Wang, J., & Xu, X. (2025). The Application of Large Language Models in Recommendation Systems. arXiv preprint arXiv:2501.02178.
- Chen, J., Yan, L., Wang, S., & Zheng, W. (2024). Deep Reinforcement Learning-Based Automatic Test Case Generation for Hardware Verification. Journal of Artificial Intelligence General Science (JAIGS), 6(1), 409-429.
- Weng, J., & Jiang, X. (2024). Research on Movement Fluidity Assessment for Professional Dancers Based on Artificial Intelligence Technology. Artificial Intelligence and Machine Learning Review, 5(4), 41-54.
- Ma, D. (2024). AI-Driven Optimization of Intergenerational Community Services: An Empirical Analysis of Elderly Care Communities in Los Angeles. Artificial Intelligence and Machine Learning Review, 5(4), 10-25.
- Wang, P., Varvello, M., Ni, C., Yu, R., & Kuzmanovic, A. (2021, May). Web-lego: trading content strictness for faster webpages. In IEEE INFOCOM 2021-IEEE Conference on Computer Communications (pp. 1-10). IEEE.
- Ni, C., Zhang, C., Lu, W., Wang, H., & Wu, J. (2024). Enabling Intelligent Decision Making and Optimization in Enterprises through Data Pipelines.
- Diao, S., Wan, Y., Huang, D., Huang, S., Sadiq, T., Khan, M. S., ... & Mazhar, T. (2025). Optimizing Bi-LSTM networks for improved lung cancer detection accuracy. PloS one, 20(2), e0316136.
- Zhang, C., Lu, W., Ni, C., Wang, H., & Wu, J. (2024, June). Enhanced user interaction in operating systems through machine learning language models. In International Conference on Image, Signal Processing, and Pattern Recognition (ISPP 2024) (Vol. 13180, pp. 1623-1630). SPIE.
- Xiao, J., Xu, W., & Chen, J. (2024). Social media emotional state classification prediction based on Arctic Puffin Algorithm (APO) optimization of Transformer mode. Authorea Preprints.
- Chen, J., Xu, W., Ding, Z., Xu, J., Yan, H., & Zhang, X. (2024). Advancing Prompt Recovery in NLP: A Deep Dive into the Integration of Gemma-2b-it and Phi2 Models. arXiv preprint arXiv:2407.05233.
- Wang, H., Wu, J., Zhang, C., Lu, W., & Ni, C. (2024). Intelligent security detection and defense in operating systems based on deep learning. International Journal of Computer Science and Information Technology, 2(1), 359-367.
- Lu, W., Ni, C., Wang, H., Wu, J., & Zhang, C. (2024). Machine learning-based automatic fault diagnosis method for operating systems.
- Jiang, C., Zhang, H., & Xi, Y. (2024). Automated Game Localization Quality Assessment Using Deep Learning: A Case Study in Error Pattern Recognition. Journal of Advanced Computing Systems, 4(10), 25-37.
- Liu, Y., Xu, Y., & Zhou, S. (2024). Enhancing User Experience through Machine Learning-Based Personalized Recommendation Systems: Behavior Data-Driven UI Design. Authorea Preprints.






| Dataset | Sample Size | Anomaly Percentage | Pearson Correlation | Spearman Correlation |
|---|---|---|---|---|
| MNIST | 70,000 | 9.2% | -0.721 | -0.683 |
| CIFAR-10 | 60,000 | 10.0% | -0.694 | -0.651 |
| STL-10 | 13,000 | 8.5% | -0.758 | -0.722 |
| CloudDB | 45,000 | 2.3% | -0.812 | -0.793 |

| Difficulty Metric | AUC-ROC | Precision | Recall | F1-Score | Computational Overhead (ms) |
|---|---|---|---|---|---|
| Isolation Score | 0.878 | 0.912 | 0.865 | 0.888 | 12.3 |
| Density-Based | 0.891 | 0.889 | 0.903 | 0.896 | 18.7 |
| Surprise Adequacy | 0.914 | 0.927 | 0.882 | 0.904 | 25.2 |
| Combined (Our Approach) | 0.946 | 0.935 | 0.921 | 0.928 | 29.8 |

| Function Type | Mathematical Formulation | Sensitivity to High Difficulty | Discrimination Power | Selected |
|---|---|---|---|---|
| Linear | $p = \alpha d + \beta$ | Low | Moderate | No |
| Exponential | $p = e^{kd}$ | Very High | Poor | No |
| Sigmoid | $p = 1/(1 + e^{-k(d - v)})$ | Moderate | High | Yes |
| Logarithmic | $p = \log(d + 1)$ | Moderate | Low | No |
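
The selected sigmoid mapping from difficulty d to priority p can be illustrated with a minimal sketch. The steepness `k = 10.0` and midpoint `v = 0.5` are placeholder values chosen only for illustration, and the function name `sigmoid_priority` is hypothetical; the table gives the functional form but not the parameter values.

```python
import math


def sigmoid_priority(d: float, k: float = 10.0, v: float = 0.5) -> float:
    """Map a difficulty score d to a priority p = 1 / (1 + exp(-k * (d - v))).

    k (steepness) and v (midpoint) are illustrative assumptions,
    not values taken from the paper.
    """
    return 1.0 / (1.0 + math.exp(-k * (d - v)))


# Priorities grow monotonically with difficulty and saturate near 0 and 1,
# so mid-range difficulties are separated most strongly.
for d in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"d={d:.1f} -> p={sigmoid_priority(d):.3f}")
```

With these assumed parameters a sample at the midpoint d = v receives priority 0.5, and larger k makes the transition around v sharper.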

| Priority Range | Processing Tier | Detection Models Applied | Feature Set | Computational Resources | Time Budget (ms) |
|---|---|---|---|---|---|
| 0.0-0.3 | Tier 1 | Statistical Only | Basic | 10% | 5.2 |
| 0.3-0.6 | Tier 2 | Statistical + Light ML | Extended | 25% | 12.8 |
| 0.6-0.8 | Tier 3 | Statistical + Advanced ML | Full | 30% | 18.5 |
| 0.8-1.0 | Tier 4 | Statistical + Deep Learning | Full+ | 35% | 27.3 |
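
The tier table can be read as a lookup from priority to processing configuration. The sketch below encodes the four rows as data; the names `Tier`, `TIERS`, and `assign_tier` are hypothetical, and the boundary rule (a priority exactly on a boundary such as 0.3 goes to the higher tier) is an assumption, since the table's ranges overlap at their endpoints.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    upper: float           # upper bound of the priority range
    models: str
    feature_set: str
    time_budget_ms: float


# Rows copied from the tier table; only the data layout is an assumption.
TIERS = (
    Tier("Tier 1", 0.3, "Statistical Only", "Basic", 5.2),
    Tier("Tier 2", 0.6, "Statistical + Light ML", "Extended", 12.8),
    Tier("Tier 3", 0.8, "Statistical + Advanced ML", "Full", 18.5),
    Tier("Tier 4", 1.0, "Statistical + Deep Learning", "Full+", 27.3),
)


def assign_tier(priority: float) -> Tier:
    """Return the tier whose half-open range [lower, upper) contains priority."""
    if not 0.0 <= priority <= 1.0:
        raise ValueError("priority must lie in [0, 1]")
    for tier in TIERS:
        if priority < tier.upper:
            return tier
    return TIERS[-1]  # priority == 1.0 falls into the top tier


for p in (0.05, 0.30, 0.75, 1.0):
    t = assign_tier(p)
    print(f"p={p:.2f} -> {t.name} ({t.models}, {t.time_budget_ms} ms)")
```

Keeping the thresholds in a data table rather than in branching logic makes it straightforward to retune the ranges without touching the dispatch code.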

| Dataset | Source | Records | Features | Anomaly Percentage | Database Type |
|---|---|---|---|---|---|
| MNIST-AD | MNIST (Modified) | 70,000 | 784 | 9.21% | Image Data Store |
| CIFAR-10-AD | CIFAR-10 (Modified) | 60,000 | 3,072 | 10.03% | Image Data Store |
| CloudDB | Enterprise Cloud | 102,457 | 147 | 1.82% | Operational DB |
| Financial-Trans | Financial Institution | 284,807 | 29 | 0.17% | Transaction DB |
| IoT-Sensors | Smart Manufacturing | 943,528 | 21 | 2.41% | Time-Series DB |

| Metric | Dataset | Baseline (Uniform) | Proposed (Difficulty-Based) | Improvement (%) |
|---|---|---|---|---|
| Avg. Processing Time (ms/record) | MNIST-AD | 18.72 | 8.43 | 54.97 |
| Avg. Processing Time (ms/record) | CIFAR-10-AD | 27.35 | 13.82 | 49.47 |
| Avg. Processing Time (ms/record) | CloudDB | 14.58 | 6.24 | 57.20 |
| Avg. Processing Time (ms/record) | Financial-Trans | 5.83 | 2.91 | 50.09 |
| Avg. Processing Time (ms/record) | IoT-Sensors | 3.47 | 1.65 | 52.45 |
| CPU Utilization (%) | Combined | 78.4 | 42.3 | 46.05 |
| Memory Footprint (GB) | Combined | 34.2 | 18.7 | 45.32 |
| Energy Consumption (kWh) | Combined | 1.73 | 0.89 | 48.55 |
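
The improvement column is consistent with a relative reduction, (baseline - proposed) / baseline, expressed as a percentage. The short sketch below recomputes a few rows under that assumption; the function name `relative_improvement` is hypothetical.

```python
def relative_improvement(baseline: float, proposed: float) -> float:
    """Relative reduction of a cost metric, in percent of the baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (baseline - proposed) / baseline * 100.0


# (baseline, proposed) pairs copied from the efficiency table.
rows = {
    "MNIST-AD processing time": (18.72, 8.43),
    "CloudDB processing time": (14.58, 6.24),
    "CPU utilization": (78.4, 42.3),
    "Energy consumption": (1.73, 0.89),
}

for name, (baseline, proposed) in rows.items():
    print(f"{name}: {relative_improvement(baseline, proposed):.2f}%")
```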

| Dataset | Method | Precision | Recall | F1-Score | AUC-ROC |
|---|---|---|---|---|---|
| MNIST-AD | Baseline | 0.921 | 0.914 | 0.917 | 0.943 |
| MNIST-AD | Proposed | 0.934 | 0.928 | 0.931 | 0.956 |
| CIFAR-10-AD | Baseline | 0.887 | 0.872 | 0.879 | 0.912 |
| CIFAR-10-AD | Proposed | 0.902 | 0.893 | 0.897 | 0.928 |
| CloudDB | Baseline | 0.953 | 0.927 | 0.940 | 0.968 |
| CloudDB | Proposed | 0.962 | 0.945 | 0.953 | 0.974 |
| Financial-Trans | Baseline | 0.892 | 0.814 | 0.851 | 0.931 |
| Financial-Trans | Proposed | 0.908 | 0.857 | 0.882 | 0.947 |
| IoT-Sensors | Baseline | 0.927 | 0.901 | 0.914 | 0.952 |
| IoT-Sensors | Proposed | 0.942 | 0.917 | 0.929 | 0.961 |
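
The F1 values in this table agree with the standard harmonic mean of precision and recall, F1 = 2PR / (P + R). A minimal sketch that recomputes three rows follows; the function name `f1_score` is hypothetical and the rounding to three decimals mirrors the table's precision.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the accuracy table.
rows = {
    "MNIST-AD baseline": (0.921, 0.914),
    "MNIST-AD proposed": (0.934, 0.928),
    "Financial-Trans baseline": (0.892, 0.814),
}

for name, (p, r) in rows.items():
    print(f"{name}: F1 = {f1_score(p, r):.3f}")
```

Because the harmonic mean is dominated by the smaller of the two inputs, the recall gain on Financial-Trans (0.814 to 0.857) accounts for most of that dataset's F1 improvement.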

| Method | Avg. F1-Score | Avg. Detection Time (ms) | Scalability Factor | APFD Score |
|---|---|---|---|---|
| Statistical (Z-score) | 0.742 | 2.83 | 0.62 | 0.532 |
| Isolation Forest | 0.831 | 7.24 | 0.78 | 0.679 |
| Liu et al. (2020) [8] | 0.857 | 15.47 | 0.83 | 0.709 |
| Mosin et al. (2022) [9] | 0.912 | 17.82 | 0.71 | 0.891 |
| Zhao et al. (2024) [10] | 0.895 | 12.35 | 0.84 | 0.827 |
| Pan et al. (2023) [11] | 0.908 | 11.73 | 0.79 | 0.864 |
| Proposed Approach | 0.918 | 6.61 | 0.92 | 0.915 |
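
The APFD column refers to Average Percentage of Faults Detected, a metric from test prioritization defined as APFD = 1 - (TF_1 + ... + TF_m) / (n * m) + 1 / (2n), where n is the number of ordered items, m the number of faults, and TF_i the 1-based position of the first item revealing fault i. The sketch below implements that standard formula. Treating each record as an item and each anomaly as a fault is an assumption about how the metric is applied here, and the function name `apfd` is hypothetical.

```python
from typing import Sequence


def apfd(first_detection_positions: Sequence[int], n_items: int) -> float:
    """Average Percentage of Faults Detected for one prioritized ordering.

    first_detection_positions: 1-based position, in processing order, of the
    first item that reveals each fault (here assumed: each anomaly).
    n_items: total number of items in the ordering.
    """
    m = len(first_detection_positions)
    if m == 0 or n_items <= 0:
        raise ValueError("need at least one fault and one item")
    if any(not 1 <= pos <= n_items for pos in first_detection_positions):
        raise ValueError("positions must lie in 1..n_items")
    return 1.0 - sum(first_detection_positions) / (n_items * m) + 1.0 / (2 * n_items)


# Toy example: 10 records, 4 anomalies.
early = apfd([1, 2, 3, 4], n_items=10)   # anomalies surfaced first
late = apfd([7, 8, 9, 10], n_items=10)   # anomalies surfaced last
print(f"early ordering: {early:.2f}, late ordering: {late:.2f}")
```

An ordering that surfaces anomalies earlier yields a higher score, which is why the metric suits a comparison of prioritization schemes.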

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).