Submitted: 16 April 2025
Posted: 17 April 2025
Abstract
Keywords:
1. Introduction
1.1. Background and Motivation
1.2. Research Challenges in AML Detection
1.3. Research Objectives
2. Literature Review and Related Work
2.1. Traditional AML Detection Methods
2.2. Machine Learning in AML Detection
2.3. Deep Learning Methods
2.4. Reinforcement Learning in Financial Crime Detection
3. Proposed Dynamic Reinforcement Learning Framework
3.1. System Architecture Overview

3.2. Multi-layer Transaction Network Construction

3.3. Dynamic Feature Extraction

3.4. Adaptive Strategy Optimization Module
3.5. Reinforcement Learning Model Design
- Layer 1: Graph Convolutional Layer (Input: 512, Output: 256)
- Layer 2: Temporal Attention Layer (Input: 256, Output: 128)
- Layer 3: Policy Network (Input: 128, Output: Action_Space)
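
The three layers listed above can be read as a pipeline from a 512-dimensional node input to a distribution over actions. The following is a minimal sketch of that data flow in plain Python, not the authors' implementation: the mean-aggregation form of the graph convolution, the single learned attention query, the random weights, and the three-action space (`N_ACTIONS`) are all assumptions made for illustration. Only the layer dimensions come from the list.

```python
import math
import random

DIMS = {"graph_in": 512, "graph_out": 256, "attn_out": 128}  # from the layer list
N_ACTIONS = 3  # hypothetical action space, e.g. clear / flag / escalate


def init_linear(n_in, n_out, rng):
    """Small uniform random weights and zero biases for a dense layer."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(x, layer):
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(values):
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def graph_conv(node, neighbours, layer):
    """Layer 1: average a node with its neighbours, project 512 -> 256, apply ReLU."""
    group = [node] + neighbours
    mean = [sum(col) / len(group) for col in zip(*group)]
    return [max(0.0, v) for v in linear(mean, layer)]


def temporal_attention(steps, layer, query):
    """Layer 2: project each time step 256 -> 128, then attention-pool over time."""
    projected = [linear(step, layer) for step in steps]
    scale = math.sqrt(len(query))
    scores = [sum(q * p for q, p in zip(query, vec)) / scale for vec in projected]
    weights = softmax(scores)
    return [sum(w * vec[k] for w, vec in zip(weights, projected)) for k in range(len(query))]


def policy(state, layer):
    """Layer 3: map the 128-dimensional state to one probability per action."""
    return softmax(linear(state, layer))


rng = random.Random(0)
conv = init_linear(DIMS["graph_in"], DIMS["graph_out"], rng)
attn = init_linear(DIMS["graph_out"], DIMS["attn_out"], rng)
head = init_linear(DIMS["attn_out"], N_ACTIONS, rng)
query = [rng.uniform(-1, 1) for _ in range(DIMS["attn_out"])]


def random_vec(n):
    return [rng.uniform(-1, 1) for _ in range(n)]


# Toy input: three time steps of one account, each with two neighbouring accounts.
snapshots = [(random_vec(512), [random_vec(512), random_vec(512)]) for _ in range(3)]
embedded = [graph_conv(node, nbrs, conv) for node, nbrs in snapshots]
state = temporal_attention(embedded, attn, query)
action_probs = policy(state, head)
print(len(embedded[0]), len(state), len(action_probs))  # 256 128 3
```

The printed sizes confirm that each layer's output width matches the next layer's input width. A trained system would learn these weights rather than sample them.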
4. Implementation and Experimental Results
4.1. Dataset Description and Preprocessing

4.2. Experimental Setup and Parameters

4.3. Performance Evaluation Metrics
4.4. Comparative Analysis with Baseline Methods
4.5. Ablation Studies
- Network architecture variations
- Feature extraction mechanisms
- Reinforcement learning components
- Optimization strategies

| Component | Base Performance | Performance with Component Removed | Relative Change |
|---|---|---|---|
| Graph Layers | 0.925 | 0.847 | -8.43% |
| Temporal Features | 0.913 | 0.856 | -6.24% |
| Attention Mechanism | 0.925 | 0.879 | -4.97% |
| Experience Replay | 0.925 | 0.891 | -3.68% |
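
The percentage column in the ablation table is consistent with a relative change computed as (ablated score - base score) / base score. The short check below reproduces the four reported values from the two score columns. The function name `relative_change` is illustrative, and the metric behind the scores is not stated in the table.

```python
# (base score, score with the component removed), copied from the ablation table.
ablation = {
    "Graph Layers": (0.925, 0.847),
    "Temporal Features": (0.913, 0.856),
    "Attention Mechanism": (0.925, 0.879),
    "Experience Replay": (0.925, 0.891),
}


def relative_change(base, ablated):
    """Relative change in percent; negative means removing the component hurts."""
    return (ablated - base) / base * 100.0


for name, (base, ablated) in ablation.items():
    print(f"{name}: {relative_change(base, ablated):+.2f}%")
```

Note that the Temporal Features row uses a base score of 0.913 rather than 0.925, so its percentage is relative to a different baseline than the other three rows.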
5. Conclusions
5.1. Summary of Contributions
5.2. Limitations and Challenges
6. Acknowledgment

| Component | Input | Output | Key Functions |
|---|---|---|---|
| Data Preprocessing | Raw transaction data | Formatted transaction records | Data cleaning, normalization |
| Network Construction | Processed transactions | Multi-layer network | Node mapping, edge weighting |
| Feature Extraction | Network structure | Feature vectors | Temporal-spatial feature computation |
| RL Optimization | Feature vectors, rewards | Detection policies | Policy update, strategy adaptation |

| Layer | Node Type | Edge Type | Weight Computation |
|---|---|---|---|
| Account | Entity accounts | Direct transactions | Transaction volume |
| Entity | Legal entities | Business relationships | Interaction frequency |
| Community | Account clusters | Fund flow patterns | Flow intensity |
| Temporal | Time-stamped nodes | Sequential links | Time-weighted flows |
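
As a rough illustration of the weight computations in the table above, the sketch below builds two of the four layers from a toy transaction list: an account layer weighted by total transaction volume and an entity layer weighted by interaction frequency. Deriving entity edges from transactions through an account-to-entity mapping (`owner_of`) is an assumption made for this example. The table itself describes entity edges as business relationships and does not say how they are obtained.

```python
from collections import defaultdict


def account_layer(transactions):
    """Directed account graph whose edge weight is the total transferred volume."""
    weights = defaultdict(float)
    for sender, receiver, amount in transactions:
        weights[(sender, receiver)] += amount
    return dict(weights)


def entity_layer(transactions, owner_of):
    """Entity graph whose edge weight counts cross-entity interactions."""
    weights = defaultdict(int)
    for sender, receiver, _amount in transactions:
        src, dst = owner_of[sender], owner_of[receiver]
        if src != dst:  # transfers inside one entity create no entity-level edge
            weights[(src, dst)] += 1
    return dict(weights)


# Toy data: (sender account, receiver account, amount). All values are hypothetical.
transactions = [
    ("A", "B", 100.0),
    ("A", "B", 50.0),
    ("B", "C", 70.0),
    ("A", "C", 30.0),
]
owner_of = {"A": "E1", "B": "E1", "C": "E2"}  # hypothetical account ownership

account_edges = account_layer(transactions)
entity_edges = entity_layer(transactions, owner_of)
print(account_edges)
print(entity_edges)
```

Here the two A to B transfers collapse into one account edge of weight 150.0, while the two transfers that cross from E1 to E2 give that entity edge a frequency of 2.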

| Layer | Embedding Dimension | Update Frequency | Initialization Method |
|---|---|---|---|
| Account | 128 | Real-time | Random uniform |
| Entity | 256 | Daily | Xavier normal |
| Community | 512 | Weekly | Orthogonal |
| Temporal | 64 | Hourly | He initialization |

| Feature Type | Computation Method | Update Interval | Dimension |
|---|---|---|---|
| Topological | Graph convolution | Real-time | 64 |
| Temporal | LSTM encoding | Hourly | 128 |
| Behavioural | Attention mechanism | Daily | 256 |
| Risk | Multi-head attention | Real-time | 32 |

| Parameter | Value | Description | Optimization Range |
|---|---|---|---|
| Learning Rate | 0.0001 | Policy network update rate | [0.00001, 0.001] |
| Discount Factor | 0.99 | Future reward discount | [0.95, 0.999] |
| Batch Size | 256 | Training batch size | [64, 512] |
| Buffer Size | 100000 | Experience replay capacity | [50000, 200000] |
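
One way to keep the values and search ranges in the table above together is a small validated configuration object. The sketch below is an assumption about how such a configuration might be organised, not the authors' code: the names `RLConfig`, `SEARCH_RANGES`, and `ReplayBuffer` are hypothetical, and the buffer is a plain fixed-capacity queue with uniform sampling.

```python
import random
from collections import deque
from dataclasses import dataclass

# Optimization ranges from the parameter table, as (low, high) inclusive bounds.
SEARCH_RANGES = {
    "learning_rate": (1e-5, 1e-3),
    "discount_factor": (0.95, 0.999),
    "batch_size": (64, 512),
    "buffer_size": (50_000, 200_000),
}


@dataclass(frozen=True)
class RLConfig:
    learning_rate: float = 1e-4
    discount_factor: float = 0.99
    batch_size: int = 256
    buffer_size: int = 100_000

    def __post_init__(self):
        # Reject any value outside the range given in the table.
        for name, (low, high) in SEARCH_RANGES.items():
            value = getattr(self, name)
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside [{low}, {high}]")


class ReplayBuffer:
    """Fixed-capacity experience store; the oldest transitions are evicted first."""

    def __init__(self, capacity):
        self._items = deque(maxlen=capacity)

    def add(self, transition):
        self._items.append(transition)

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement, capped at the current fill level.
        return rng.sample(list(self._items), min(batch_size, len(self._items)))

    def __len__(self):
        return len(self._items)


config = RLConfig()
buffer = ReplayBuffer(config.buffer_size)
for step in range(1_000):
    # Placeholder transition: (state, action, reward, next_state).
    buffer.add((step, "flag", 0.0, step + 1))
batch = buffer.sample(config.batch_size)
print(len(buffer), len(batch))  # 1000 256
```

With a discount factor of 0.99, rewards roughly 100 steps ahead still carry about a third of their weight (0.99^100 ≈ 0.37), which is one way to read the "future reward discount" entry.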

| Category | Value | Description |
|---|---|---|
| Total Transactions | 10,205,218 | Complete transaction records |
| Unique Accounts | 484,932 | Individual account entities |
| Period | 24 months | Transaction period |
| Suspicious Cases | 11,816 | Confirmed suspicious patterns |
| Transaction Types | 8 | Different transaction categories |

| Processing Step | Method | Parameters | Output Format |
|---|---|---|---|
| Missing Value | MICE imputation | n_iterations=5 | Complete matrix |
| Normalization | Min-Max scaling | range=(0,1) | Normalized values |
| Feature Engineering | Graph embedding | dim=128 | Feature vectors |
| Temporal Alignment | Time window | window=1h | Aligned sequences |
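
Two of the preprocessing steps in the table above are simple enough to sketch directly: Min-Max scaling to the range (0, 1) and alignment into one-hour windows. The example below is a minimal illustration under assumed inputs (a list of timestamped amounts); the function names are hypothetical, and the MICE imputation and graph embedding steps are not covered here.

```python
from collections import defaultdict
from datetime import datetime


def min_max_scale(values, low=0.0, high=1.0):
    """Rescale values linearly so the minimum maps to low and the maximum to high."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:  # constant column: avoid division by zero
        return [low for _ in values]
    return [low + (v - vmin) * (high - low) / (vmax - vmin) for v in values]


def bucket_by_hour(transactions):
    """Group (timestamp, amount) pairs into one-hour windows keyed by window start."""
    windows = defaultdict(list)
    for timestamp, amount in transactions:
        window_start = timestamp.replace(minute=0, second=0, microsecond=0)
        windows[window_start].append(amount)
    return dict(windows)


# Toy transactions; timestamps and amounts are hypothetical.
transactions = [
    (datetime(2024, 1, 1, 9, 5), 100.0),
    (datetime(2024, 1, 1, 9, 40), 250.0),
    (datetime(2024, 1, 1, 10, 15), 400.0),
]

scaled = min_max_scale([amount for _, amount in transactions])
windows = bucket_by_hour(transactions)
print(scaled)  # [0.0, 0.5, 1.0]
for start, amounts in sorted(windows.items()):
    print(start.isoformat(), amounts)
```

In practice the scaling bounds would be fitted on training data only and reused at inference time, so that later transactions do not leak into the normalization.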

| Component | Specification | Usage |
|---|---|---|
| CPU | Intel Xeon 64-core | Model training |
| GPU | NVIDIA A100 80GB | Network processing |
| Memory | 512GB DDR4 | Data handling |
| Storage | 8TB NVMe SSD | Dataset storage |

| Metric | Formula | Range | Optimal Value |
|---|---|---|---|
| Detection Rate | TP/(TP+FN) | [0,1] | 1.0 |
| False Positive Rate | FP/(FP+TN) | [0,1] | 0.0 |
| AUC-ROC | Area under curve | [0,1] | 1.0 |
| F1-Score | 2*(P*R)/(P+R) | [0,1] | 1.0 |
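
The formulas in the metrics table can be checked against a small worked example. The sketch below computes each metric from confusion-matrix counts, where P and R in the F1 formula are taken to be precision, TP/(TP+FP), and recall (the detection rate). The AUC-ROC function uses the standard rank interpretation: the probability that a randomly chosen suspicious case receives a higher score than a randomly chosen legitimate one. All counts and scores are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # suspicious cases correctly flagged
    fp: int  # legitimate cases wrongly flagged
    tn: int  # legitimate cases correctly passed
    fn: int  # suspicious cases missed


def safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def detection_rate(c):
    return safe_div(c.tp, c.tp + c.fn)


def false_positive_rate(c):
    return safe_div(c.fp, c.fp + c.tn)


def f1_score(c):
    precision = safe_div(c.tp, c.tp + c.fp)
    recall = detection_rate(c)
    return safe_div(2 * precision * recall, precision + recall)


def auc_roc(pos_scores, neg_scores):
    """Rank-based AUC over all positive/negative pairs; ties count as half a win."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos_scores
        for n in neg_scores
    )
    return wins / (len(pos_scores) * len(neg_scores))


counts = ConfusionCounts(tp=80, fp=10, tn=890, fn=20)  # hypothetical counts
print(f"detection rate:      {detection_rate(counts):.4f}")
print(f"false positive rate: {false_positive_rate(counts):.4f}")
print(f"F1-score:            {f1_score(counts):.4f}")
print(f"AUC-ROC:             {auc_roc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2, 0.1]):.4f}")
```

The pairwise AUC computation is quadratic in the number of cases, so it suits small checks like this one. On a dataset with millions of transactions a sort-based implementation would be the usual choice.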

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).