Integrating Deep Learning and Sentiment Analysis for English Premier League Match Forecasting
Keywords:
LSTM, KOD, ANN, RNN
Abstract
This research presents an ensemble framework that predicts the outcomes of English Premier League matches by integrating historical match statistics with Twitter sentiment analysis. Following the CRISP-DM methodology, the study combines a Long Short-Term Memory (LSTM) model trained on twenty seasons of match data with sentiment signals derived from tweets collected in the week before each game. Feature engineering and dimensionality reduction were applied to improve model efficiency and to address multicollinearity. The LSTM model achieved a prediction accuracy of 70%, outperforming other machine learning algorithms such as Decision Trees, Random Forests, and SVMs. Sentiment analysis of over 10,000 tweets per week provided additional predictive power. When the outputs of the LSTM and sentiment models were combined using a weighted average (70:30 ratio), the system consistently predicted 7 to 8 match outcomes correctly per week. The results indicate that integrating social media signals with historical data improves predictive accuracy, offering a robust approach for forecasting football match results.
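The weighted-average ensembling step can be illustrated with a minimal sketch. It assumes that both models emit a probability for each of three outcomes (home win, draw, away win) and that the 70% weight belongs to the LSTM; the abstract states only the 70:30 ratio, so the class order, the function names, and the example probabilities below are hypothetical.

```python
# Assumed label order; the abstract does not specify the outcome encoding.
CLASSES = ("home_win", "draw", "away_win")


def ensemble_probs(lstm_probs, sentiment_probs, w_lstm=0.7):
    """Blend two per-class probability lists with a weighted average.

    w_lstm is the weight on the LSTM output (assumed to be 0.7 here);
    the sentiment model receives the remaining 1 - w_lstm.
    """
    if len(lstm_probs) != len(sentiment_probs):
        raise ValueError("both models must score the same set of classes")
    if not 0.0 <= w_lstm <= 1.0:
        raise ValueError("w_lstm must lie in [0, 1]")
    combined = [
        w_lstm * p + (1.0 - w_lstm) * q
        for p, q in zip(lstm_probs, sentiment_probs)
    ]
    total = sum(combined)
    if total <= 0.0:
        raise ValueError("combined scores must have a positive sum")
    # Renormalise in case the inputs do not sum exactly to 1.
    return [c / total for c in combined]


def predict_outcome(lstm_probs, sentiment_probs, w_lstm=0.7):
    """Return the label with the highest blended probability."""
    combined = ensemble_probs(lstm_probs, sentiment_probs, w_lstm)
    best_index = max(range(len(combined)), key=combined.__getitem__)
    return CLASSES[best_index]


if __name__ == "__main__":
    # Hypothetical scores for a single fixture, for illustration only.
    lstm = [0.5, 0.3, 0.2]
    sentiment = [0.2, 0.2, 0.6]
    print(ensemble_probs(lstm, sentiment))
    print(predict_outcome(lstm, sentiment))
```

With a 70:30 split the LSTM dominates the blend, so the sentiment model changes the predicted outcome only when the LSTM's top two classes are close or the sentiment signal is strongly one-sided.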