Submitted:
04 May 2025
Posted:
05 May 2025
Abstract
Keywords:
1. Introduction
2. Literature Review
3. Taxonomy of Anomaly Detection Methods
3. Key Findings, Research Gaps and Proposed Directions
3.1. Higher-Level and Niche Domain Selection
3.2. Identified Research Gaps and Opportunities for Improvement
- Ensemble and Hybrid Modeling: Very few studies have explored the combination of multiple machine learning paradigms (e.g., clustering, density estimation, reconstruction, forecasting) into ensemble or hybrid architectures. This remains a promising avenue to enhance model robustness.
- Limited Focus on Options Data: The majority of studies concentrate on equities or futures; options markets, despite their growing importance, remain largely neglected in anomaly detection frameworks.
- False Positive and False Negative Challenges: Many existing systems suffer from high false positive or false negative rates, necessitating the development of more precise detection models.
- Underutilization of Transformer-Based Attention: Transformer architectures, particularly attention mechanisms, have not been sufficiently exploited for sequential financial data, especially in the context of anomaly detection.
- Insufficient Semi-Supervised Learning: Semi-supervised approaches, which are critical when labeled anomaly data is scarce, are not extensively applied in the current literature.
- Minimal Use of Clustering Techniques: Clustering methods, particularly for unsupervised anomaly detection, have been underutilized.
- Explainable AI (XAI) for Trustworthiness: While anomaly detection models are advancing, their interpretability remains poor. Techniques such as SHAP and LIME could be deployed to explain model predictions, thereby improving trust and regulatory acceptance.
3.3. Metrics and Evaluation Criteria
- Classification Metrics: ROC-AUC, Precision, Recall, and F1-Score to measure detection performance.
- Imbalanced Dataset Metrics: Matthews Correlation Coefficient (MCC) for assessing performance under data imbalance.
- Operational Metrics: Confusion matrix analysis and latency measurements for real-time deployment assessment.
- Profitability Metrics: Sharpe ratio calculation to evaluate the impact on risk-adjusted returns.
- Portfolio Stability: Volatility reduction measures in the managed portfolio.
- Robustness: Stress testing under extreme market conditions to assess model resilience.
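
The classification and imbalance metrics listed above can be computed directly from labels and anomaly scores. The following is a minimal sketch in plain Python, not the evaluation code of any reviewed study; the function names, the toy labels, and the 0.5 decision threshold are assumptions made for illustration.

```python
import math

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 marks an anomaly."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn

def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and their harmonic mean (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

def mcc(tp, fp, tn, fn):
    """Matthews Correlation Coefficient; uses all four confusion-matrix cells."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def roc_auc(y_true, scores):
    """Chance that a random anomaly outscores a random normal point (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data (hypothetical): 8 normal points, 2 anomalies.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.15, 0.3, 0.05, 0.7, 0.25, 0.1, 0.9, 0.4]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
print(precision, recall, f1, mcc(tp, fp, tn, fn), roc_auc(y_true, scores))
```

On this toy data the threshold-based metrics and the ranking-based ROC-AUC diverge noticeably, which is one reason to report several metrics side by side when anomalies are rare.
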
3.4. Specific Findings and Observations
- Deep Learning Strengths: Deep neural networks effectively learn complex, hierarchical data structures, crucial for identifying subtle market anomalies.
- Autoencoders and Anomaly Detection: Autoencoders, particularly in unsupervised or semi-supervised settings, are highly effective. LSTM Autoencoders outperform standard LSTM models in certain scenarios.
- Simulated Fraud Data: Due to the rarity of real-world manipulation cases, synthetic fraud datasets are often generated by injecting simulated anomalies into clean datasets.
- Feature Engineering Importance: Features such as price spreads, volatility measures, trading volume anomalies, and divergence between option and underlying asset prices are essential signals.
- Options Market Specificity: Metrics like implied volatility, option Greeks (delta, gamma, theta, vega), and order flow dynamics are critical in detecting anomalies unique to options markets.
- Volatility Prediction Techniques: GARCH models, SVMs, random forests, and LSTMs have demonstrated varying levels of success in volatility prediction tasks.
- Explainability Need: It is advisable to build inherently interpretable models rather than relying exclusively on post-hoc explainability techniques.
- Handling Dimensionality: Techniques such as feature selection (MRMR, CMIM), dimensionality reduction (PCA, t-SNE), and deep autoencoders are necessary to combat the curse of dimensionality.
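
The simulated-data practice noted above, injecting anomalies into a clean series, can be sketched as follows. This is only an illustration under stated assumptions: the function `inject_spikes`, the multiplicative spike model, and the 5% magnitude are hypothetical choices, and real manipulation patterns such as spoofing or layering are considerably more structured than isolated price spikes.

```python
import random

def inject_spikes(prices, n_anomalies, magnitude, seed=0):
    """Return (series, labels).

    `series` is a copy of `prices` in which `n_anomalies` randomly chosen
    points are scaled by (1 + magnitude) or (1 - magnitude); `labels` holds
    1 at the injected positions and 0 elsewhere, giving ground truth for
    later evaluation.
    """
    rng = random.Random(seed)  # seeded so the synthetic set is reproducible
    series = list(prices)
    labels = [0] * len(series)
    for i in rng.sample(range(len(series)), n_anomalies):
        direction = rng.choice((-1, 1))
        series[i] *= 1 + direction * magnitude
        labels[i] = 1
    return series, labels

# Hypothetical clean series: a slow upward drift.
clean = [100.0 + 0.1 * t for t in range(50)]
noisy, labels = inject_spikes(clean, n_anomalies=3, magnitude=0.05)
print(sum(labels), [i for i, flag in enumerate(labels) if flag])
```

Keeping the labels alongside the altered series is what makes the synthetic set usable for the supervised metrics discussed in Section 3.3.
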
3.5. Proposed Research Directions
- Hybrid Architecture: Develop models combining LSTM for sequence modeling, Transformers for attention mechanisms, and GANs for synthetic anomaly generation.
- Multi-Paradigm Anomaly Detection: Create ensemble frameworks that integrate clustering, density estimation, reconstruction, and forecasting models.
- Options Market Focus: Build dedicated anomaly detection systems for options data, considering open interest (OI) spikes, implied volatility shifts, and unusual order flows.
- Real-Time Processing Pipelines: Leverage Apache Kafka for ingestion and Apache Flink or Spark Streaming for real-time anomaly detection.
- Model Update Mechanisms: Implement reinforcement learning or incremental learning strategies to enable continuous model adaptation without full retraining.
- Explainable AI Integration: Integrate SHAP and LIME explainability from model inception to enhance trustworthiness and facilitate regulatory compliance.
- Stock Grouping Strategies: Cluster stocks by industry (e.g., healthcare, FMCG) to capture sector-specific anomaly patterns.
- Stress Testing Models: Regularly subject detection models to extreme market simulation scenarios to ensure robustness.
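
As a very small illustration of the incremental-update idea in the directions above, the sketch below keeps a running mean and variance with Welford's algorithm and flags points by z-score, so the detector adapts to each new observation without any retraining pass. It is a deliberately simple stand-in, not the LSTM, Transformer, or reinforcement learning components proposed above; the class name, the threshold of 3.0, and the warm-up length are assumptions.

```python
import math

class OnlineZScoreDetector:
    """Streaming z-score detector with O(1) memory (Welford's algorithm)."""

    def __init__(self, threshold=3.0, warmup=10):
        self.threshold = threshold  # assumed cut-off in standard deviations
        self.warmup = warmup        # observations to see before flagging
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0               # running sum of squared deviations

    def update(self, x):
        """Score x against the statistics seen so far, then fold x into them."""
        is_anomaly = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0 and abs(x - self.mean) / std > self.threshold:
                is_anomaly = True
        # Welford update: incorporate x without storing the history.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return is_anomaly

# Hypothetical stream: a stable signal, one spike, then back to normal.
stream = [10 + 0.1 * ((i % 5) - 2) for i in range(30)] + [15.0, 10.0]
detector = OnlineZScoreDetector()
flags = [detector.update(x) for x in stream]
print([i for i, flag in enumerate(flags) if flag])
```

One design choice worth noting: flagged points are still folded into the running statistics here, which keeps the code short but lets a large spike inflate the variance; a production system might exclude or down-weight them.
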
4. Research Motivation, Objectives, and Proposed Framework
4.1. Research Motivation
4.2. Research Objectives
4.3. Proposed Methodological Framework
4.4. Expected Contributions
5. Conclusion
6. Glossary of Terms
References
- Brugman, S. R. D. "The development of a real-time monitoring system for fatigue detection on truckers." Bachelor's thesis, University of Twente, 2022.
- Veryzhenko, Iryna, Nohade Nasrallah, and Henri Garcia. "Detecting spoofing in high frequency trading using machine learning techniques."
- Rizvi, Baqar, Ammar Belatreche, and Ahmed Bouridane. "Stock Price Manipulation Detection using Empirical Mode Decomposition based Kernel Density Estimation Clustering Method." (2018).
- Zhao, Hang, Yujing Wang, Juanyong Duan, Congrui Huang, Defu Cao, Yunhai Tong, Bixiong Xu, Jing Bai, Jie Tong, and Qi Zhang. "Multivariate time-series anomaly detection via graph attention network." In 2020 IEEE international conference on data mining (ICDM), pp. 841-850. IEEE, 2020.
- Mejri, Nesryne, Laura Lopez-Fuentes, Kankana Roy, Pavel Chernakov, Enjie Ghorbel, and Djamila Aouada. "Unsupervised anomaly detection in time-series: An extensive evaluation and analysis of state-of-the-art methods." Expert Systems with Applications (2024): 124922.
- Chen, Yuexing, Maoxi Li, Mengying Shu, Wenyu Bi, and Siwei Xia. "Multi-modal Market Manipulation Detection in High-Frequency Trading Using Graph Neural Networks." Journal of Industrial Engineering and Applied Science 2, no. 6 (2024): 111-120. [CrossRef]
- Xi, Yue, Yining Zhang, and Hanqing Zhang. "Real-time Multimodal Route Optimization and Anomaly Detection for Cross-border Logistics Using Deep Reinforcement Learning." Academia Nexus Journal 3, no. 3 (2024). [CrossRef]
- Guanghe, Cao, Shuaiqi Zheng, Yibang Liu, and Maoxi Li. "Real-time anomaly detection in dark pool trading using enhanced transformer networks." Journal of Knowledge Learning and Science Technology 3, no. 4 (2024): 320-329. [CrossRef]
- Cao, Guanghe, Yitian Zhang, Qi Lou, and Gaike Wang. "Optimization of High-Frequency Trading Strategies Using Deep Reinforcement Learning." Journal of Artificial Intelligence General science (JAIGS) 6, no. 1 (2024): 230-257. [CrossRef]
- Li, Maoxi, Mengying Shu, and Tianyu Lu. "Anomaly Pattern Detection in High-Frequency Trading Using Graph Neural Networks." Journal of Industrial Engineering and Applied Science 2, no. 6 (2024): 77-85. [CrossRef]
- Alaminos, David, M. Belén Salas, and Antonio Partal-Ureña. "Hybrid ARMA-GARCH-Neural Networks for intraday strategy exploration in high-frequency trading." Pattern Recognition 148 (2024): 110139. [CrossRef]
- Shanmuganathan, V., and Annamalai Suresh. "Markov enhanced I-LSTM approach for effective anomaly detection for time series sensor data." International Journal of Intelligent Networks 5 (2024): 154-160. [CrossRef]
- Bello, Halima Oluwabunmi, Adebimpe Bolatito Ige, and Maxwell Nana Ameyaw. "Deep learning in high-frequency trading: conceptual challenges and solutions for real-time fraud detection." World Journal of Advanced Engineering Technology and Sciences 12, no. 02 (2024): 035-046. [CrossRef]
- Li, Lin, Yitian Zhang, Jiayi Wang, and Ke Xiong. "Deep Learning-Based Network Traffic Anomaly Detection: A Study in IoT Environments." (2024). [CrossRef]
- M. Jin et al., "A Survey on Graph Neural Networks for Time Series: Forecasting, Classification, Imputation, and Anomaly Detection," in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 46, no. 12, pp. 10466-10485, Dec. 2024. [CrossRef]
- Haq, Ijaz Ul, Byung Suk Lee, Donna M. Rizzo, and Julia N. Perdrial. "An automated machine learning approach for detecting anomalous peak patterns in time series data from a research watershed in the northeastern United States critical zone." Machine Learning with Applications 16 (2024): 100543. [CrossRef]
- Bakumenko, Alexander, and Ahmed Elragal. "Detecting anomalies in financial data using machine learning algorithms." Systems 10, no. 5 (2022): 130. [CrossRef]
- Poutré, Cédric, Didier Chételat, and Manuel Morales. "Deep unsupervised anomaly detection in high-frequency markets." The Journal of Finance and Data Science 10 (2024): 100129. [CrossRef]
- Yang, Mingxuan, Decheng Huang, Weixiang Wan, and Meizhizi Jin. "Federated learning for privacy-preserving medical data sharing in drug development." Applied and Computational Engineering 108 (2024): 7-13. [CrossRef]
- Guanghe, Cao, Shuaiqi Zheng, Yibang Liu, and Maoxi Li. "Real-time anomaly detection in dark pool trading using enhanced transformer networks." Journal of Knowledge Learning and Science Technology 3, no. 4 (2024): 320–329. [CrossRef]
- Ashtiani, Matin N., and Bijan Raahemi. "Intelligent fraud detection in financial statements using machine learning and data mining: a systematic literature review." IEEE Access 10 (2021): 72504-72525. [CrossRef]
- Wang, Jiayi, Tianyu Lu, Lin Li, and Decheng Huang. "Enhancing personalized search with ai: a hybrid approach integrating deep learning and cloud computing." Journal of Advanced Computing Systems 4, no. 10 (2024): 1-13. [CrossRef]
- Wang, Shikai, Haotian Zheng, Xin Wen, and Shang Fu. "Distributed high-performance computing methods for accelerating deep learning training." Journal of Knowledge Learning and Science Technology 3, no. 3 (2024): 108-126.
- Ibikunle, Gbenga, Ben Moews, Dmitriy Muravyev, and Khaladdin Rzayev. "Can machine learning unlock new insights into high-frequency trading?" arXiv preprint arXiv:2405.08101 (2024).
- Zhang, Lu, and Lei Hua. "Major Issues in High-Frequency Financial Data Analysis: A Survey of Solutions." Mathematics 13, no. 3 (2025): 347. [CrossRef]
- Bello, Halima Oluwabunmi, Adebimpe Bolatito Ige, and Maxwell Nana Ameyaw. "Adaptive machine learning models: concepts for real-time financial fraud prevention in dynamic environments." World Journal of Advanced Engineering Technology and Sciences 12, no. 02 (2024): 021-034. [CrossRef]
- Basit, Jamshaid, Danish Hanif, and Madiha Arshad. "Quantum Variational Autoencoders for Predictive Analytics in High Frequency Trading Enhancing Market Anomaly Detection." International Journal of Emerging Multidisciplinaries: Computer Science & Artificial Intelligence 3, no. 1. [CrossRef]
- Nguyen, Huu Du, Kim Phuc Tran, Sébastien Thomassey, and Moez Hamad. "Forecasting and Anomaly Detection approaches using LSTM and LSTM Autoencoder techniques with the applications in supply chain management." International Journal of Information Management 57 (2021): 102282. [CrossRef]
- Zhao, Zihan, Longfei Li, Qingyun Wu, and Liuyi Yao. "DeepLOB: Deep Convolutional Neural Networks for Limit Order Books." IEEE Transactions on Signal Processing 68 (2020): 1441–1452.
- Sirignano, Justin, and Rama Cont. "Universal Features of Price Formation in Financial Markets: Perspectives from Deep Learning." Quantitative Finance 19, no. 9 (2019): 1449–1459. [CrossRef]
- Zhang, Ziyu, Yichuan Charlie Hu, and Srinivasan Parthasarathy. "Spatio-Temporal Modeling of Financial Data via Temporal Graph Attention Networks." Proceedings of the 29th ACM International Conference on Information and Knowledge Management (CIKM), 2020.
- Zhu, Zhiwei Steven, and Timothy Chan. "Anomaly Detection in High-Frequency Trading Data Using LSTM Networks." Journal of Financial Data Science 2, no. 1 (2020): 55–69.

| Sr. No. | Paper (First Author + Year) | Problem Addressed | Methods Used | Dataset Used | Key Findings | Limitations |
|---|---|---|---|---|---|---|
| 1 | Brugman (2022) | Real-time fatigue detection in trucking industry | Real-time monitoring system, CNN | Physiological data from truckers | Demonstrated feasibility of real-time fatigue monitoring | Limited to physiological signals, not financial domain |
| 2 | Veryzhenko (2023) | Spoofing detection in high-frequency trading (HFT) | Machine Learning classifiers (SVM, Random Forest) | Simulated order book data | Improved spoofing detection performance with ML models | Absence of real HFT trading data for validation |
| 3 | Rizvi (2018) | Detection of stock price manipulation | Clustering using Empirical Mode Decomposition (EMD) and Kernel Density Estimation (KDE) | Simulated financial time series | Proposed an effective clustering approach for unsupervised manipulation detection | Scalability and real-time deployment not addressed |
| 4 | Zhao (2020) | Multivariate time-series anomaly detection | Graph Attention Networks (GAT) | Public industrial datasets (not specific to finance) | Outperformed traditional time-series models using graph-based approach | Model complexity; real-time inference not tested |
| 5 | Mejri (2024) | Evaluation of time-series anomaly detection methods | Comparative study of 10+ algorithms (Forecasting, Reconstruction) | Public time-series datasets | Provided detailed benchmarking of unsupervised techniques | Lack of focus on financial or high-frequency datasets |
| 6 | Chen (2024) | Market manipulation detection using multi-modal data | Graph Neural Networks (GNN) on order book and trade network data | Simulated multi-modal HFT datasets | Achieved better detection by combining multiple data modalities | Scalability to live data streams remains a challenge |
| 7 | Xi (2024) | Route optimization and anomaly detection in logistics | Deep Reinforcement Learning (DRL) | Cross-border logistics datasets | Proposed a multimodal DRL framework | Application outside finance; financial anomalies not considered |
| 8 | Guanghe (2024) | Real-time anomaly detection in dark pool trading | Enhanced Transformer Networks | Simulated dark pool transaction data | Improved performance for opaque trading environments | Lack of real dark pool datasets for training and validation |
| 9 | Cao (2024) | Optimization of HFT strategies | Deep Reinforcement Learning (DRL) | Simulated HFT trading datasets | DRL models achieved superior performance in dynamic strategy optimization | Focused on strategy optimization, not anomaly detection |
| 10 | Li (2024) | Anomaly pattern detection in HFT | Graph Neural Networks | Financial transaction datasets | Successfully captured complex transaction patterns for anomaly detection | Scalability and real-time deployment issues |
| 11 | Alaminos (2024) | Intraday strategy exploration in HFT | Hybrid ARMA-GARCH-Neural Network model | High-frequency trading data | Demonstrated improved intraday trading strategy prediction using hybrid models | Limited real-time adaptability; offline model focus |
| 12 | Shanmuganathan (2024) | Anomaly detection in time-series sensor data | Markov-enhanced I-LSTM | Sensor data (non-financial) | Enhanced anomaly detection accuracy through Markov modeling | Not directly validated on financial datasets |
| 13 | Bello (2024) | Fraud detection in high-frequency trading | Deep Learning architectures | Conceptual framework, no specific dataset used | Addressed conceptual challenges and proposed real-time fraud detection solutions | Lack of empirical results and benchmarks |
| 14 | Li (2024) | Network traffic anomaly detection in IoT environments | Deep Learning models (CNN, LSTM) | IoT network datasets | Showed effectiveness of DL models for network anomaly detection | Application domain outside of financial trading |
| 15 | Jin (2024) | Survey of GNNs for time-series tasks | Review of forecasting, classification, imputation, anomaly detection using GNNs | Various time-series datasets | Summarized the potential of GNNs across multiple time-series applications | Limited specific financial use-case demonstrations |
| 16 | Haq (2024) | Detecting anomalous peak patterns in watershed time series | Automated ML (AutoML) approach | Watershed environmental datasets | Demonstrated effectiveness of AutoML for peak anomaly detection | Application domain is environmental science, not finance |
| 17 | Bakumenko (2022) | Financial data anomaly detection using ML | Machine Learning algorithms (XGBoost, RF, SVM) | Financial transaction datasets | ML models successfully identified financial anomalies | Need for higher-dimensional and real-time adaptation |
| 18 | Poutré (2024) | Deep unsupervised anomaly detection in HFT markets | Deep Autoencoder models | High-frequency market data | Achieved strong anomaly detection without labeled data | Challenges in scaling for large real-time systems |
| 19 | Yang (2024) | Privacy-preserving data sharing using federated learning | Federated Learning frameworks | Medical datasets (non-financial) | Demonstrated privacy-preserving anomaly detection using federated learning | No application to financial markets or HFT scenarios |
| 20 | Cao (2024) | Detection of anomalies in real-time within dark pool trading | Enhanced Transformer Networks | Dark pool transaction datasets | Developed an enhanced transformer model for opaque market conditions | Dataset quality limitations; repeated listing suggests overlap |
| 21 | Ashtiani (2021) | Fraud detection in financial statements | Machine Learning and Data Mining | Financial statement data | Provided a systematic literature review for intelligent fraud detection | Focused on offline financial reports, not HFT or real-time data |
| 22 | Wang, Jiayi (2024) | Personalized search enhancement using AI | Hybrid Deep Learning and Cloud Computing | User interaction datasets | Proposed hybrid model improving search personalization | Domain not related to financial trading or anomaly detection |
| 23 | Wang, Shikai (2024) | Accelerating deep learning training | Distributed High-Performance Computing methods | Deep learning training datasets | Improved computational efficiency for training deep models | Focused on system acceleration, not anomaly detection |
| 24 | Ibikunle (2024) | Potential of machine learning in HFT analysis | Survey and conceptual insights | Various public and private datasets reviewed | Highlighted the opportunities ML offers in understanding HFT behaviors | Lacks experimental model validation and implementation examples |
| 25 | Zhang, Lu (2025) | Major issues in high-frequency financial data analysis | Survey and solution categorization | Multiple HFT datasets | Presented key challenges and survey of solutions in financial HFT data analysis | No experimental model or comparative benchmarks provided |
| 26 | Bello, Halima (2024) | Adaptive machine learning models for real-time fraud detection | Conceptual discussion of adaptive learning | Financial fraud environments (conceptual) | Addressed need for dynamic models in fraud prevention systems | Lack of experimental setups or real-world deployments |
| 27 | Basit (2024) | Quantum Variational Autoencoders for HFT anomaly detection | Quantum Machine Learning (QVAE) | Simulated high-frequency trading datasets | Proposed using quantum variational autoencoders to enhance anomaly detection | Early-stage; quantum computing feasibility remains uncertain |
| 28 | Nguyen (2021) | Forecasting and anomaly detection in supply chains | LSTM and LSTM Autoencoders | Supply chain time-series datasets | Demonstrated LSTM-based models improving anomaly detection and forecasting | Non-financial datasets; supply chain context, not HFT |
| 29 | Zhao, Zihan (2020) | Limit order book modeling for price prediction | Deep Convolutional Neural Networks (DeepLOB) | Public limit order book datasets | Achieved strong predictive performance using CNNs for order book data | Focused on price prediction, not explicitly on anomaly detection |
| 30 | Sirignano (2019) | Universal features of price formation using deep learning | Deep Neural Networks | Extensive limit order book data | Identified deep universal patterns in financial market behavior | Anomaly detection not the primary focus; descriptive analysis |
| 31 | Zhang, Ziyu (2020) | Spatio-temporal modeling of financial data | Temporal Graph Attention Networks (TGAT) | Financial transaction datasets | Improved modeling of complex spatio-temporal relationships | Scalability to extremely high-frequency real-time data untested |
| 32 | Zhu, Zhiwei (2020) | Anomaly detection in HFT using LSTM networks | LSTM-based anomaly detection | High-frequency trading datasets | Demonstrated LSTM networks effectively detecting anomalies in HFT | Real-time latency considerations not deeply addressed |

| Category | Key Techniques | Data Type | Supervision | Advantages | Limitations |
|---|---|---|---|---|---|
| Statistical Methods | Z-score, Histogram-based | Numeric | Unsupervised | Simple, interpretable | Assumes specific data distribution |
| Proximity-Based Methods | k-Nearest Neighbors (k-NN), LOF | Numeric | Unsupervised | Flexible, non-parametric | Computationally expensive in large/high-D |
| Clustering-Based Methods | k-Means, DBSCAN | Numeric | Unsupervised | Needs no labels; DBSCAN detects arbitrarily shaped clusters | Sensitive to parameter choice |
| Classification-Based | SVM, One-Class SVM | Numeric | Supervised / Semi-Supervised | High accuracy with labels | Requires labeled data |
| Reconstruction-Based | Autoencoders, PCA | High-dimensional | Unsupervised | Captures complex patterns | May reconstruct anomalies as normal |
| Ensemble Methods | Isolation Forest, Feature Bagging | Various | Unsupervised | Robust, captures diverse patterns | Increased model complexity |
| Deep Learning-Based | CNNs, RNNs (LSTM/GRU), VAEs | Unstructured | Unsupervised | Models non-linear, hierarchical features | Data- and compute-intensive |
| Graph-Based Methods | Graph Neural Networks, Subgraph Analysis | Relational | Unsupervised / Semi-Supervised | Captures structural relationships | Graph construction and scaling challenges |
| Hybrid Methods | Combinations of above (e.g., clustering + autoencoder) | Mixed | Varies | Leverages complementary strengths | Design and tuning complexity |
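
To make the hybrid and ensemble rows of the taxonomy concrete, the sketch below combines a statistical detector (absolute z-score) with a proximity-based one (mean distance to the k nearest neighbours) by averaging their rank-normalized scores. It is an illustrative assumption about how such a combination might look, not a method taken from the reviewed papers; the function names, `k=3`, and the toy values are hypothetical.

```python
import statistics

def zscore_scores(values):
    """Statistical detector: absolute z-score of each point."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [abs(v - mean) / std if std else 0.0 for v in values]

def knn_scores(values, k=3):
    """Proximity detector: mean distance to the k nearest other points."""
    scores = []
    for i, v in enumerate(values):
        dists = sorted(abs(v - w) for j, w in enumerate(values) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores

def rank_normalize(scores):
    """Map scores to [0, 1] by rank so detectors on different scales can be averaged."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    for rank, i in enumerate(order):
        ranks[i] = rank / (len(scores) - 1)
    return ranks

def ensemble_scores(values, k=3):
    """Average the rank-normalized scores of both detectors."""
    parts = [
        rank_normalize(zscore_scores(values)),
        rank_normalize(knn_scores(values, k)),
    ]
    return [sum(column) / len(parts) for column in zip(*parts)]

# Hypothetical one-dimensional feature with a single outlying value.
values = [10.1, 9.9, 10.0, 10.2, 9.8, 10.05, 14.0, 9.95]
combined = ensemble_scores(values)
top = max(range(len(values)), key=lambda i: combined[i])
print(top, values[top])
```

Rank normalization is used here because the two raw scores live on different scales; averaging them directly would let one detector dominate.
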

| Feature | Description | Relevance to Anomaly Detection |
|---|---|---|
| Implied Volatility | The market's forecast of future volatility, inferred from option prices. | Identifies sudden changes in perceived market risk. |
| Historical Volatility | Past realized volatility of the underlying asset. | Detects discrepancies between past and expected volatility. |
| Delta | The responsiveness of an option's price to variations in the underlying asset's price. | Captures unusual hedging activity or manipulation. |
| Gamma | Rate of change of Delta relative to the underlying asset price. | Highlights nonlinear price dynamics during anomalies. |
| Theta | Time decay of an option's value. | Observes pricing anomalies related to time-value erosion. |
| Vega | The responsiveness of option prices to shifts in volatility. | Identifies volatility-driven anomalies. |
| Bid-Ask Spread | Difference between the lowest ask (selling) price and the highest bid (buying) price. | Captures liquidity disruptions or artificial widening. |
| Order Flow Imbalance | Net difference between buy and sell orders. | Detects spoofing, momentum ignition, and liquidity shifts. |
| Best Bid and Ask Prices | The maximum bid and minimum ask prices currently offered. | Useful for detecting order book spoofing or layering. |
| Open Interest Dynamics | Number of open contracts outstanding. | Identifies abnormal buildup or unwinding positions. |
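
The Greeks in the table above can be derived from an option pricing model. The sketch below uses the Black-Scholes formulas for a European call on a non-dividend-paying asset, which is an assumption introduced here for illustration (the table does not commit to a pricing model); the function name `call_greeks` and the example inputs are likewise hypothetical.

```python
import math

def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def norm_cdf(x):
    """Standard normal cumulative distribution, via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def call_greeks(spot, strike, rate, vol, years):
    """Black-Scholes Greeks for a European call (no dividends).

    rate and vol are annualized; vega is per 1.00 change in volatility
    and theta is per year.
    """
    sqrt_t = math.sqrt(years)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * years) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t
    return {
        "delta": norm_cdf(d1),
        "gamma": norm_pdf(d1) / (spot * vol * sqrt_t),
        "vega": spot * norm_pdf(d1) * sqrt_t,
        "theta": (-spot * norm_pdf(d1) * vol / (2 * sqrt_t)
                  - rate * strike * math.exp(-rate * years) * norm_cdf(d2)),
    }

# Hypothetical at-the-money call: one year to expiry, 20% volatility, zero rate.
greeks = call_greeks(spot=100.0, strike=100.0, rate=0.0, vol=0.2, years=1.0)
for name, value in greeks.items():
    print(f"{name}: {value:.4f}")
```

In a detection setting, values like these could serve as model-implied baselines, with large gaps between quoted and model-implied sensitivities treated as candidate signals.
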

| Metric | Purpose | Notes on Importance |
|---|---|---|
| ROC-AUC | Assesses the model's capacity to separate anomalous from normal instances across all thresholds. | Threshold-independent; can overstate performance when anomalies are very rare, so it is best read alongside precision and recall. |
| Precision | Proportion of predicted positives that are true positives. | Crucial when false positives incur high costs. |
| Recall | Proportion of true positives detected among all actual positives. | Important to catch rare but critical anomalies. |
| F1-Score | The harmonic mean of precision and recall. | Achieves a balance between false positives and false negatives. |
| Matthews Correlation Coefficient (MCC) | Evaluates the quality of binary classification, particularly in cases of class imbalance. | Critical for rare-event anomaly detection. |
| Sharpe Ratio Impact | Evaluates the change in risk-adjusted portfolio return after anomaly detection is applied. | Links model accuracy with trading strategy profitability. |
| Latency Measurement | Time taken to detect anomalies after occurrence. | Essential for real-time financial market applications. |
| Stress Testing Under Market Regimes | Testing model under extreme volatility and low liquidity conditions. | Validates robustness and practical usability of the model. |
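
The two operational metrics in the table above, Sharpe ratio impact and detection latency, can be sketched as follows. The helper names, the 252-period annualization factor, and the toy return series are assumptions for illustration; the returns are invented numbers, not results from any study.

```python
import math
import statistics

def sharpe_ratio(returns, risk_free_per_period=0.0, periods_per_year=252):
    """Annualized Sharpe ratio: mean excess return over its sample std deviation."""
    excess = [r - risk_free_per_period for r in returns]
    std = statistics.stdev(excess)
    if std == 0:
        return 0.0
    return statistics.fmean(excess) / std * math.sqrt(periods_per_year)

def detection_latencies(event_times, alert_times):
    """Delay from each true anomaly to the first alert at or after it.

    Returns None for anomalies that were never followed by an alert.
    """
    alerts = sorted(alert_times)
    latencies = []
    for t in event_times:
        later = [a for a in alerts if a >= t]
        latencies.append(later[0] - t if later else None)
    return latencies

# Hypothetical daily returns with and without acting on anomaly alerts.
baseline = [0.01, -0.02, 0.015, -0.03, 0.02]
filtered = [0.01, -0.005, 0.015, -0.01, 0.02]
print(sharpe_ratio(baseline), sharpe_ratio(filtered))

# Hypothetical timestamps in milliseconds: three anomalies, two alerts.
print(detection_latencies([10, 50, 90], [12, 55]))
```

Comparing the Sharpe ratio of the same strategy with and without the detector is one way to express "impact"; other baselines are possible and should be stated explicitly when results are reported.
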

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).