AN EXPLAINABLE ENSEMBLE LEARNING FRAMEWORK USING SHAP AND XGBOOST FOR EARLY PREDICTION OF HEART DISEASE FROM CLINICAL AND LIFESTYLE DATA
Keywords:
Heart Disease, XGBoost, SHAP, Explainable AI, Clinical Data
Abstract
Cardiovascular diseases remain a leading cause of death worldwide, which creates a need for early diagnostic tools that are reliable, explainable, and accurate. This research introduces a hybrid machine learning framework that combines Extreme Gradient Boosting (XGBoost) with SHapley Additive exPlanations (SHAP) to predict heart disease from a combination of clinical and lifestyle attributes. The model uses the UCI Heart Disease dataset, augmented with synthetically generated lifestyle features intended to increase feature diversity and improve generalization. Data cleaning and feature selection are applied before training to improve predictive performance. Compared with conventional algorithms such as logistic regression, support vector machines, and random forest, XGBoost achieved the best results, with an accuracy of 92.6% and an AUC-ROC of 0.96. SHAP was used to explain the contribution of individual features both at the dataset level and for specific predictions; chest pain type, age, and cholesterol had the largest influence. The resulting interpretable framework shows potential for clinical decision support, since it pairs predictive accuracy with an explanation of each model decision.
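To make the per-prediction explanations concrete, the following minimal sketch computes exact Shapley values for a single patient by enumerating every feature coalition. It is an illustration only: `toy_risk_model`, its coefficients, the feature names `chest_pain`, `age`, and `chol`, and the reference patient are all hypothetical stand-ins, not the trained XGBoost model or the data used in this work. In practice, the SHAP library computes these attributes for tree ensembles with a much faster tree-specific algorithm rather than by enumeration.

```python
from itertools import combinations
from math import factorial


def toy_risk_model(x):
    """Hypothetical risk score (NOT the paper's XGBoost model)."""
    score = 0.10
    score += 0.30 * x["chest_pain"]           # 1 = symptomatic, 0 = not
    score += 0.004 * (x["age"] - 50)          # years above a reference age
    score += 0.001 * (x["chol"] - 200)        # mg/dL above a reference level
    # Illustrative interaction: chest pain matters more in older patients.
    score += 0.05 * x["chest_pain"] * (x["age"] > 60)
    return score


def shapley_values(model, instance, baseline):
    """Exact Shapley values of `instance` relative to `baseline`.

    Features outside a coalition are replaced by their baseline value.
    The cost grows as 2^n, so this is only practical for a few features.
    """
    features = list(instance)
    n = len(features)

    def value(coalition):
        mixed = {f: instance[f] if f in coalition else baseline[f]
                 for f in features}
        return model(mixed)

    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                # Marginal contribution of f when added to coalition s.
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


# Hypothetical patient and reference patient, chosen for illustration.
patient = {"chest_pain": 1, "age": 65, "chol": 260}
reference = {"chest_pain": 0, "age": 50, "chol": 200}

phi = shapley_values(toy_risk_model, patient, reference)
for name, contribution in phi.items():
    print(f"{name}: {contribution:+.3f}")
```

The attributions sum to the difference between the patient's score and the reference score, which is the additivity property that lets a clinician read each value as that feature's share of the prediction. The interaction term is split equally between `chest_pain` and `age`, while `chol` receives only its own additive effect.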











