Preprint Article, Version 1 (not peer-reviewed)

Multi-Entity Real-Time Fraud Detection System using Machine Learning: Improving Fraud Detection Efficiency using FROST-Enhanced Oversampling

Version 1 : Received: 20 April 2024 / Approved: 22 April 2024 / Online: 23 April 2024 (11:50:51 CEST)

How to cite: PRASAD, M.; SRIKANTH, T. Multi-Entity Real-Time Fraud Detection System using Machine Learning: Improving Fraud Detection Efficiency using FROST-Enhanced Oversampling. Preprints 2024, 2024041461. https://doi.org/10.20944/preprints202404.1461.v1

Abstract

Fraudulent transactions pose a significant threat to financial institutions and e-commerce platforms. Machine learning models trained on historical labeled data (fraudulent vs. legitimate transactions) are often employed to identify and prevent fraud. However, real-world datasets frequently exhibit class imbalance, where fraudulent transactions (the minority class) are vastly outnumbered by legitimate transactions (the majority class). This imbalance can degrade model performance, biasing classifiers toward the majority class and causing fraud to go undetected. This paper proposes a novel approach to address class imbalance and improve fraud detection accuracy. We explore the implementation of FROST (Feature space RObust Synthetic saTuration) oversampling, a technique specifically designed to generate synthetic samples for the minority class. The FROST function leverages the k-nearest neighbors (KNN) algorithm and a user-defined amplification factor (m) to create synthetic data points that closely resemble existing minority class instances. We integrate FROST-enhanced oversampling into the machine learning pipeline for fraud detection, evaluate its effectiveness against traditional oversampling methods, and analyze its impact on classification accuracy metrics.
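The abstract does not include the FROST implementation, but its description (KNN neighborhoods combined with an amplification factor m) suggests a SMOTE-style interpolation scheme. The sketch below is a minimal illustration under that assumption only; the function name frost_oversample and its parameters are hypothetical and are not taken from the paper.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def frost_oversample(X_minority, m=2, k=5, random_state=None):
    """Hypothetical KNN-based oversampling sketch (SMOTE-style).

    For each minority-class instance, generate m synthetic points by
    interpolating toward randomly chosen samples among its k nearest
    minority-class neighbours. X_minority is a 2-D NumPy array.
    """
    rng = np.random.default_rng(random_state)
    # k + 1 neighbours because each point is its own nearest neighbour
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_minority)
    _, idx = nn.kneighbors(X_minority)

    synthetic = []
    for i, neighbours in enumerate(idx[:, 1:]):   # drop the self-neighbour
        for _ in range(m):
            j = rng.choice(neighbours)            # random neighbour index
            gap = rng.random()                    # interpolation weight in [0, 1)
            synthetic.append(X_minority[i] + gap * (X_minority[j] - X_minority[i]))
    return np.vstack(synthetic)
```

Under this sketch, a minority class of n samples yields n * m synthetic points, so the augmented minority class grows to n * (m + 1) instances before the classifier (e.g., a random forest) is trained.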

Keywords

classification, sampling, majority class, minority class, classifier, SMOTE, FROST, k-nearest neighbors, random forest

Subject

Computer Science and Mathematics, Computer Science

