Real-Time Fraud ML on Spark Structured Streaming: Micro-Batch vs. Continuous Processing

Authors

  • Bhargav Vadgama

Keywords

Real-Time Fraud Detection, Apache Spark Structured Streaming, Micro-Batch vs. Continuous Processing, Machine Learning Inference, Scalability and Latency

Abstract

The growing volume and velocity of digital financial transactions have made them highly vulnerable to fraud, which makes real-time fraud detection systems a necessity. This article critically examines machine learning-based fraud detection on Apache Spark Structured Streaming, which offers two processing modes: Micro-Batch and Continuous Processing. A working pipeline is built and trained on the IEEE-CIS Fraud Detection dataset, combining feature engineering, supervised learning models, and stream processing to perform near-real-time fraud classification. Micro-Batch mode supports complex transformations and fault tolerance, and it demonstrates reliability and analytical power at the cost of a marginal increase in latency. Continuous Processing mode offers significantly lower latency and higher throughput and is best suited to issuing alerts quickly in high-risk environments, but it is limited in the transformations and recovery actions it supports. Comprehensive experiments compare the two modes in terms of latency, precision, recall, resource utilization, and reliability, and assess which operations each mode can support. The results indicate that neither mode is inherently optimal; the choice should instead follow the requirements of the particular use case. The paper concludes with practical advice, outlines the current limitations of Spark's Continuous mode, and presents opportunities for future work, including active learning, graph ML, and hybrid systems that combine the strengths of both modes for real-time fraud mitigation.
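As an illustration of how the two modes differ at the API level, the sketch below selects between them through the trigger passed to the stream writer. It is a minimal sketch under stated assumptions, not the pipeline used in the paper: the helper names `trigger_kwargs` and `start_scoring_query`, the console sink, and the checkpoint path are hypothetical, and `scored_stream` is assumed to be a streaming DataFrame that already carries model predictions. The `processingTime` and `continuous` keywords are those of PySpark's `DataStreamWriter.trigger`.

```python
from typing import Dict


def trigger_kwargs(mode: str, interval: str = "1 second") -> Dict[str, str]:
    """Map a mode name to keyword arguments for DataStreamWriter.trigger().

    In micro-batch mode the interval is how often a new batch is started.
    In continuous mode it is the checkpoint interval, not a batch size.
    """
    if mode == "micro-batch":
        return {"processingTime": interval}
    if mode == "continuous":
        return {"continuous": interval}
    raise ValueError(f"unknown mode: {mode!r}")


def start_scoring_query(scored_stream, mode: str, interval: str = "1 second",
                        checkpoint_dir: str = "/tmp/fraud-checkpoint"):
    """Start a streaming query that writes scored transactions to the console.

    `scored_stream` is assumed to be a streaming DataFrame (hypothetical here).
    The console sink and checkpoint path are placeholders for illustration.
    """
    return (
        scored_stream.writeStream
        .format("console")
        .option("checkpointLocation", checkpoint_dir)
        .trigger(**trigger_kwargs(mode, interval))
        .start()
    )


if __name__ == "__main__":
    print(trigger_kwargs("micro-batch", "5 seconds"))
    print(trigger_kwargs("continuous"))
```

Because Continuous mode accepts only map-like operations such as projections and filters, a query that includes aggregations would need to run with the micro-batch trigger instead.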

Published

2025-09-09

How to Cite

Bhargav Vadgama. (2025). Real-Time Fraud ML on Spark Structured Streaming: Micro-Batch vs. Continuous Processing. Utilitas Mathematica, 122(2), 856–879. Retrieved from https://utilitasmathematica.com/index.php/Index/article/view/2786
