A Machine Learning Framework for Team Success Classification in Professional Football: A Pilot Study Using Premier League Performance Data

Rayvanth Sankar Ravichandran; Nor Samsiah Sani

doi:10.20944/preprints202605.1076.v1

Submitted: 14 May 2026

Posted: 18 May 2026


Abstract

In the era of data-driven decision-making, the pursuit of competitive excellence in professional football has evolved beyond instinct and tradition. This research asks what makes a football team successful, adopting a team-centric machine learning approach grounded in performance analytics. Using a comprehensive dataset of Premier League player statistics from 1992 to 2019, the study aims to develop predictive models that identify the key performance indicators (KPIs) driving team success over time. Chapter I establishes the research background, problem statement, and objectives, emphasizing the growing relevance of artificial intelligence in modern football analysis. Chapter II presents a critical review of existing literature on sports analytics and machine learning, highlighting methodological gaps in explainable, team-focused success modelling. Chapter III details a structured methodology based on the CRISP-DM framework, encompassing data preprocessing, feature engineering, performance tier formulation, feature selection strategies, and supervised learning model development. Three supervised classification models (Logistic Regression, Random Forest, and Gradient Boosting) were implemented and evaluated using Accuracy, F1-Score, ROC-AUC, and confusion matrices. Ensemble learning techniques, including voting and stacking, were further explored to enhance predictive robustness. Model stability was assessed through 5-fold stratified cross-validation, and paired t-tests on the cross-validated F1-scores indicated no statistically significant performance differences between models (p > 0.05). Gradient Boosting demonstrated consistently strong performance (mean F1-score ≈ 1.00), low variance across folds, and superior interpretability, supporting its selection as the primary base learner within the final ensemble framework.
To address model transparency, SHAP (SHapley Additive exPlanations) was applied at both team and player levels, enabling granular interpretation of feature contributions to success predictions. The findings reveal that attacking efficiency, defensive stability, and disciplinary control consistently influence successful team outcomes. Beyond predictive accuracy, the study proposes practical decision-support extensions, such as performance tiering, that highlight the real-world applicability of the framework. The project ultimately aims not only to predict success but also to uncover why certain teams win, offering insights that could inform coaching, scouting, and strategy. The outcome is a step forward in applying AI to help the beautiful game evolve further.
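The model comparison described in the abstract (paired t-tests on per-fold F1-scores from 5-fold cross-validation) can be illustrated with a minimal sketch. This is not the authors' code: the function name `paired_t_statistic` and the fold scores below are hypothetical and invented purely for illustration, and the test is written with the Python standard library, comparing the statistic against the two-sided critical value of Student's t for 4 degrees of freedom at the 0.05 level (2.776) rather than computing an exact p-value.

```python
import math
import statistics

# Two-sided critical value of Student's t for alpha = 0.05 and
# 4 degrees of freedom (5 folds - 1).
T_CRIT_DF4 = 2.776


def paired_t_statistic(scores_a, scores_b):
    """Return the paired t statistic for two equal-length score lists."""
    if len(scores_a) != len(scores_b):
        raise ValueError("score lists must have the same number of folds")
    # Paired test: work on the per-fold differences, not the raw scores.
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all fold differences are identical; t is undefined")
    return mean_diff / (sd_diff / math.sqrt(n))


# Hypothetical per-fold F1-scores, invented for illustration only;
# they are NOT results from the study.
f1_model_a = [0.98, 1.00, 0.97, 1.00, 0.99]
f1_model_b = [0.97, 1.00, 0.98, 0.99, 0.99]

t_stat = paired_t_statistic(f1_model_a, f1_model_b)
significant = abs(t_stat) > T_CRIT_DF4
print(f"t = {t_stat:.3f}, significant at 0.05: {significant}")
```

With only five folds the test has few degrees of freedom, so small differences in mean F1-score rarely reach significance. In practice a library routine such as `scipy.stats.ttest_rel` would return the exact p-value for the same comparison.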

Keywords: machine learning; explainable AI; SHAP; football analytics; gradient boosting; sports data mining

Subject: Computer Science and Mathematics - Artificial Intelligence and Machine Learning

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
