Mathematical Optimization of AI-Based Document Processing Workflows Using Markov Decision Processes

Authors

  • Ranadheer Reddy Charabuddi

Keywords

Markov Decision Process, Deep Reinforcement Learning, Document Workflow, PPO, OCR Receipts Dataset

Abstract

Conventional models tend to break down under stochastic task arrivals, uncertain processing times, and priority conflicts. This study presents a scalable, intelligent optimization framework that formulates document routing as a Markov Decision Process (MDP) and applies Deep Reinforcement Learning (DRL), specifically Proximal Policy Optimization (PPO), to learn optimal routing policies. The OCR Receipts Dataset from Kaggle (2023), which contains annotated receipt images, is used for evaluation. Key techniques include feature-vector encoding of workflow states, noise filtering, and a reward function that balances latency, accuracy, and cost. A Python implementation combines state-action modeling with multi-objective optimization to improve task scheduling, resource utilization, and decision-making. Compared with baseline models, the framework achieves higher accuracy, lower processing latency, and adaptive performance. It supports scalable automation in finance, law, and the public sector, and the PPO agent demonstrates robust learning across varying workflow conditions. The proposed system can also be extended to other domains that require intelligent, real-time document processing and task routing.
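To make the abstract's state-action modeling and multi-objective reward concrete, the sketch below implements a toy document-routing MDP in Python. Everything here is an illustrative assumption: the queue-backlog state, the three-pipeline action space, the reward weights (w_latency, w_accuracy, w_cost), and the 50-step horizon are not taken from the paper, and the random rollout at the bottom stands in for the trained PPO agent (which an RL library such as Stable-Baselines3 could supply).

```python
# Minimal sketch of the abstract's state-action and reward structure.
# All dynamics, weights, and constants below are illustrative assumptions,
# not values from the paper; the random policy at the bottom is a stand-in
# for the trained PPO agent.
import random

NUM_QUEUES = 3  # hypothetical number of processing pipelines to route into


class DocRoutingMDP:
    """State: per-queue backlog plus the current document's priority.
    Action: index of the queue to route the document to.
    Reward: weighted trade-off of latency, accuracy, and cost."""

    def __init__(self, w_latency=0.5, w_accuracy=0.3, w_cost=0.2):
        self.w = (w_latency, w_accuracy, w_cost)

    def reset(self):
        self.backlog = [0] * NUM_QUEUES
        self.t = 0
        return self._obs()

    def _obs(self):
        # Feature vector: queue backlogs plus a stochastic arrival priority.
        self.priority = random.random()
        return self.backlog + [self.priority]

    def step(self, action):
        # Latency grows with the chosen queue's backlog (toy dynamics).
        latency = self.backlog[action] + random.uniform(0.5, 2.0)
        # Assume slower, costlier queues are more accurate, and vice versa.
        accuracy = 0.7 + 0.1 * action + random.uniform(-0.05, 0.05)
        cost = 1.0 + 0.5 * action
        w_l, w_a, w_c = self.w
        # Multi-objective reward: high-priority documents penalize
        # latency more strongly.
        reward = -w_l * latency * (1 + self.priority) + w_a * accuracy - w_c * cost
        # Queue dynamics: the chosen queue grows, the others drain one task.
        self.backlog[action] += 1
        for q in range(NUM_QUEUES):
            if q != action and self.backlog[q] > 0:
                self.backlog[q] -= 1
        self.t += 1
        done = self.t >= 50  # fixed episode horizon (assumption)
        return self._obs(), reward, done


env = DocRoutingMDP()
obs, total, done = env.reset(), 0.0, False
while not done:
    action = random.randrange(NUM_QUEUES)  # stand-in for the PPO policy
    obs, reward, done = env.step(action)
    total += reward
print(f"Episode return under a random routing policy: {total:.2f}")
```

In this formulation the negative latency and cost terms compete with the positive accuracy term, so a learned policy must trade the three objectives off against one another rather than optimize any single one.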

Published

2025-09-27

How to Cite

Ranadheer Reddy Charabuddi. (2025). Mathematical Optimization of AI-Based Document Processing Workflows Using Markov Decision Processes. Utilitas Mathematica, 122(2), 1372–1384. Retrieved from https://utilitasmathematica.com/index.php/Index/article/view/2865
