Submitted:
29 June 2025
Posted:
30 June 2025
Abstract
Keywords:
1. Introduction
- To propose and validate an integrated, multi-stage analytics framework tailored to elite sports environments;
- To demonstrate its practical effectiveness using detailed synthetic scenarios representative of real-world challenges;
- To quantify the framework’s impact on injury rates, tactical accuracy, and biomechanical optimization, bridging the gap between theoretical analytics and actionable interventions.
2. Literature Review
2.1. Foundations
2.2. Decision-Making and Big Data in Sports
3. Materials and Methods
3.1. Synthetic Data Generation and Validation
3.1.1. Group Sizes
- Football: a total of 40 synthetic players were simulated to represent a full-season squad, each of them being “active” throughout all simulated events.
- Basketball: we simulated a standard National Basketball Association (NBA)-style roster over an 82-game season, with each game generating approximately 50 decision events per team, providing detailed contextual and fatigue-related scenarios.
- Athletics: we created data for 12 synthetic sprinters, covering a distribution of baseline 100 m personal best times ranging between 10.00 s and 15.00 s. Joint-angle distributions (hip flexion angle (HFA), knee extension velocity (KEV), ankle dorsiflexion at initial contact (ADIC)) were drawn from a Gaussian mixture model fitted to a small reference dataset (n = 50) compiled from publicly reported sprint biomechanics in elite athlete cohorts [18].
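As an illustration of the athletics setup above, the following minimal sketch draws joint-angle values from a one-dimensional Gaussian mixture and assigns baseline personal bests within the stated 10.00 to 15.00 s range. The mixture weights, means, and standard deviations are hypothetical placeholders rather than the fitted values, and the uniform spread of personal bests is an assumption, since the text does not specify the distribution used.

```python
import numpy as np

def sample_gmm(weights, means, stds, n, rng):
    """Draw n samples from a 1-D Gaussian mixture."""
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    stds = np.asarray(stds, dtype=float)
    # Pick a mixture component for every sample, then draw from that component.
    component = rng.choice(len(weights), size=n, p=weights / weights.sum())
    return rng.normal(means[component], stds[component])

rng = np.random.default_rng(42)

# Hypothetical mixture parameters for hip flexion angle (degrees); not from the paper.
hfa = sample_gmm(weights=[0.6, 0.4], means=[43.0, 48.0], stds=[2.0, 1.5], n=12, rng=rng)

# Baseline 100 m personal bests for 12 sprinters (uniform spread is an assumption).
pb_100m = rng.uniform(10.0, 15.0, size=12)
```

The same helper could be reused for KEV and ADIC with their own component parameters.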
3.1.2. Data Generation Methodology
- Football scenario: synthetic data were generated based on typical physiological (heart rate, muscle oxygen saturation), biomechanical (GPS-derived speed, acceleration), and subjective (wellness questionnaires) parameters. Athlete workload metrics (e.g., acute:chronic distance ratios, oxygen depletion) were sampled from normal distributions reflecting literature-reported ranges [19]. Injury events were probabilistically generated using a Bernoulli model with a baseline injury rate of approximately 12%, aligning with established epidemiological reports in football [20].
- Basketball scenario: decision-making under fatigue was simulated by creating datasets integrating biomechanical fatigue metrics (jump power, agility scores) and tactical variables (score differentials, time remaining). Decision events were generated using probabilistic decision trees, parameterized by domain-expert consensus and recent basketball analytics studies [21,22].
- Empirical Distribution Fitting: we collected publicly available statistics, such as hamstring-strain rates and joint-angle means and variances for athletes based on existing literature [23,25,26]. For each continuous feature (e.g., PlayerLoad™, hip flexion angle, knee extension velocity), we fitted either a Gaussian mixture model (if bimodal) or a single Gaussian distribution (if unimodal) to the real-world sample data [27].
- Noise Injection and Variability: after sampling from the fitted distributions, we injected zero-mean Gaussian noise with a standard deviation equal to 5% of the feature’s mean. For example, if the mean hip flexion angle (HFA) at block exit was 45° in the real data, we simulated individual athlete angles as HFA_i ∼ N(45°, (0.05 × 45°)^2), i.e., with a standard deviation of 2.25° [28].
- Event Undersampling & Synthetic Minority Oversampling Technique for Time Series (SMOTE-TS): for rare events (e.g., hamstring strains, decision-making errors under fatigue), we utilized SMOTE-TimeSeries to oversample minority-class sequences [29]. This ensured that approximately 10% of match/training instances in football carried an “injury-risk” label, closely approximating real-world season-long injury incidence rates.
- Time-Stamp & Sequence Construction: for Football and Basketball scenarios, we generated timestamped sequences for each event at 1-Hz resolution during training sessions (lasting 90 minutes each) and at 10 Hz during match simulations. For Athletics, joint-angle streams were initially generated at 1,000 Hz (simulating high-speed camera data), then downsampled to 100 Hz with Kalman smoothing applied prior to analytical modeling [30].
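The generation steps above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the pipeline used in the study: the function names are hypothetical, downsampling is done by plain block averaging, and the Kalman smoothing step is omitted.

```python
import numpy as np

def inject_noise(samples, feature_mean, rng, rel_sd=0.05):
    """Add zero-mean Gaussian noise with SD = rel_sd * feature_mean."""
    samples = np.asarray(samples, dtype=float)
    return samples + rng.normal(0.0, rel_sd * abs(feature_mean), size=samples.shape)

def simulate_injuries(n_players, rng, rate=0.12):
    """Bernoulli injury flags, one per player, at the given baseline rate."""
    return rng.random(n_players) < rate

def downsample_mean(signal, factor):
    """Block-average downsampling, e.g. 1,000 Hz -> 100 Hz with factor=10."""
    signal = np.asarray(signal, dtype=float)
    usable = len(signal) - len(signal) % factor  # drop an incomplete final block
    return signal[:usable].reshape(-1, factor).mean(axis=1)

rng = np.random.default_rng(0)

# 40 athletes around a 45 degree mean HFA, i.e. noise SD of 2.25 degrees.
hfa = inject_noise(np.full(40, 45.0), feature_mean=45.0, rng=rng)

# Injury flags for the 40-player football squad at the ~12% baseline rate.
injured = simulate_injuries(40, rng)

# One second of a 1,000 Hz stream reduced to 100 samples.
stream_100hz = downsample_mean(np.arange(1000.0), factor=10)
```

Block averaging is used here only because it needs no extra dependencies; a production pipeline would normally apply an anti-aliasing filter or the smoothing step described in the text.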
3.1.3. Data Validation Procedures
- Stratified 5-fold cross-validation: selected due to its optimal balance between bias reduction and variance control in moderately sized datasets [31].
- Train-validation-test split (60/20/20): standard practice in sports analytics, ensuring adequate data for training, validation, and independent testing [32].
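Both validation schemes can be expressed as index generators. The sketch below is a dependency-light illustration written with NumPy only; the function names are hypothetical, and in practice a library routine such as scikit-learn's StratifiedKFold would serve the same purpose.

```python
import numpy as np

def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs with class proportions preserved per fold."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    fold_of = np.empty(len(labels), dtype=int)
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        # Deal each class's samples round-robin across the k folds.
        fold_of[idx] = np.arange(len(idx)) % k
    for fold in range(k):
        yield np.flatnonzero(fold_of != fold), np.flatnonzero(fold_of == fold)

def split_60_20_20(n, seed=0):
    """Shuffle n indices and split them 60/20/20 into train, validation, test."""
    idx = np.random.default_rng(seed).permutation(n)
    n_train, n_val = (n * 60) // 100, (n * 20) // 100
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

# Illustrative labels: 10 "injury-risk" instances among 100 (about 10%).
labels = np.array([1] * 10 + [0] * 90)
folds = list(stratified_kfold(labels, k=5))
train_idx, val_idx, test_idx = split_60_20_20(len(labels))
```

With these illustrative labels, each of the five test folds holds 20 instances, two of which carry the minority label.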
3.2. Data Collection

3.3. Data Processing
3.4. Analytical Modeling and Training Procedure
3.5. Validation & Feedback
- Cross-Validation Techniques - each analytical model underwent rigorous cross-validation tailored to its specific data structure:
- for the injury risk classification model (Football), a stratified 5-fold cross-validation approach was implemented to preserve class balance between injured and non-injured instances. This method was selected for its capacity to provide robust model validation while maintaining computational efficiency, thereby effectively addressing the bias–variance trade-off, as recommended in the sports analytics literature [58,60].
- for the tactical decision engine (Basketball), rolling window validation across game sequences simulated real-time deployment scenarios, maintaining temporal fidelity [61].
- Additionally, leave-one-team-out validation was employed for synthetic datasets to mimic variability across different team environments, enhancing generalizability [33].
- for the performance prediction network (Athletics), a 60/20/20 train–validation–test split was employed, with early stopping criteria to mitigate overfitting [56,57]. Training data were derived from the synthetic datasets described in Section 3.1, with labels generated through rule-based logic reflecting domain knowledge. For instance, injury events were tagged using probabilistic thresholds on load variables [62]; tactical decisions were labeled according to simplified game-state reward structures [63]; and sprint improvements were associated with specific biomechanical configurations [54]. Stratified sampling was applied where relevant to preserve class balance [60]. This approach ensured clear separation between training, model optimization, and independent evaluation datasets, aligning with standard practices in predictive analytics [58].
- Statistical analyses - to evaluate the statistical significance of intervention outcomes, the following statistical tests were conducted:
- Paired-sample t-tests: used to evaluate whether the difference in pre- and post-intervention metrics within the same group is statistically significant - that is, unlikely to be due to random chance. Significance was defined as p < 0.05 [67].
- Effect sizes: calculated using Cohen’s d, which quantifies the standardized difference between pre- and post-intervention values. This provides an interpretable scale of effect magnitude (e.g., d = 0.2 = small, 0.5 = medium, 0.8 = large) [68].
- All statistical analyses were performed using standard statistical software (SPSS or Python-based packages) [69].
- Proof-of-Concept Validation and Future Deployment - although all data used in this study were synthetic, model performance was evaluated using rigorous statistical validation (e.g., cross-validation, train-test splits) and expert review sessions involving coaches and sports scientists. This validation approach does not aim to prove empirical generalizability, but rather to assess the internal logic, robustness, and operational feasibility of the proposed framework under controlled and replicable conditions. Such simulation-based validation is widely used as a precursor to real-world deployment, particularly when working with sensitive or restricted data [74,75]. Future work will apply this framework to empirical datasets, pending ethical approval and institutional access.
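Two of the procedures in this section lend themselves to a short sketch: rolling-window splits over an ordered sequence of games, and the paired-sample t statistic with Cohen's d. The code below uses only the standard library. The window sizes and sprint times are hypothetical, and Cohen's d is computed as the mean paired difference divided by the standard deviation of the differences (often written d_z), which is one of several conventions and is assumed here because the text does not state which was used.

```python
import math
from statistics import mean, stdev

def rolling_window_splits(n_games, train_size, test_size=1):
    """Yield (train, test) game-index lists; test games always follow training games."""
    start = 0
    while start + train_size + test_size <= n_games:
        train = list(range(start, start + train_size))
        test = list(range(start + train_size, start + train_size + test_size))
        yield train, test
        start += test_size  # slide the window forward by one test block

def paired_t_and_d(pre, post):
    """Return (t statistic, degrees of freedom, Cohen's d_z) for paired samples."""
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    sd = stdev(diffs)                      # sample SD of the paired differences
    t = mean(diffs) / (sd / math.sqrt(n))  # paired-sample t statistic
    d_z = mean(diffs) / sd                 # standardized mean difference
    return t, n - 1, d_z

# 82-game season, 20-game training window, 1-game test window (hypothetical sizes).
splits = list(rolling_window_splits(82, train_size=20, test_size=1))

# Hypothetical pre/post 100 m times in seconds for five sprinters.
pre = [11.2, 10.9, 12.4, 11.8, 13.0]
post = [10.8, 10.7, 11.9, 11.5, 12.4]
t_stat, dof, d_z = paired_t_and_d(pre, post)
```

A p-value can then be obtained from the t distribution with `dof` degrees of freedom, for example with `scipy.stats.ttest_rel(pre, post)`, which computes the same statistic directly.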
3.6. Visualization and Decision Support Systems
3.7. Ethical Considerations and Synthetic Data Justification
4. Decision Making Frameworks
- Define Objectives - the process begins by convening a cross-functional steering group (coaches, sport scientists, medical staff, and data engineers) to translate organizational performance goals into Specific, Measurable, Achievable, Relevant, Time-bound (SMART) objectives. Targets may include reductions in non-contact injuries, improvements in sprint times, or tactical efficiency gains. These are linked to key performance indicators (KPIs) and acceptable risk thresholds, with clear success criteria and failure modes documented. Governance protocols around data ethics, privacy (e.g., GDPR/HIPAA), and stakeholder approvals are established here.
- Data Acquisition - based on the defined objectives, a gap analysis of instrumentation is performed. Appropriate sensors (e.g., Global Positioning System (GPS)/Real-Time Kinematic (RTK) systems, inertial measurement units (IMUs) ≥ 500 Hz, stereoscopic cameras, heart rate monitors) are selected and piloted. Athlete self-report tools (e.g., wellness questionnaires, rating of perceived exertion (RPE) scales) are also deployed. Data schemas and anticipated volume are documented to inform infrastructure design.
- Data Integration - heterogeneous data streams are ingested into a centralized data lake or feature store. ETL workflows (e.g., Airflow Directed Acyclic Graphs (DAGs)) standardize time bases (via PTP), correct for sensor drift, and impute missing data using statistical models. Schema registries enforce data structure standards, and metadata catalogs track data lineage. Incremental Change Data Capture (CDC) ensures timely updates from training and competition sources.
- Analytics and Modeling - analytics progress through four stages: 1) Descriptive: automated dashboards summarize load, movement, and performance metrics; 2) Diagnostic: correlation matrices and causal inference models identify performance-influencing factors; 3) Predictive: machine learning models forecast injury risk, fatigue, or game outcomes; 4) Prescriptive: optimization models or reinforcement learning agents recommend interventions aligned with the defined KPIs. Models are deployed to production environments with version control and monitored for performance drift.
- Decision Support - insights are delivered via multimodal interfaces: dashboards, mobile alerts, or smart devices (e.g., watches, helmets). Each recommendation includes explanations (e.g., SHAP, CI - confidence intervals, counterfactuals), ensuring transparency. Access control restricts data visibility based on stakeholder roles.
- Implementation and Monitoring - interventions (e.g., training load changes or tactical adjustments) are implemented through A/B testing or controlled pilots. Adherence metrics and outcomes are logged. Dashboards monitor KPI progression, and automated model retraining occurs monthly or when drift is detected. Feedback loops ensure continuous optimization.
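The retraining rule in the final step (monthly, or earlier when drift is detected) can be written as a small predicate. This is a hedged sketch only: the function name, the 30-day approximation of "monthly", and the use of an AUC drop as the drift signal with a 0.05 tolerance are all assumptions, since the framework does not specify how drift is measured.

```python
from datetime import date

def should_retrain(last_trained, today, baseline_auc, recent_auc,
                   max_age_days=30, max_auc_drop=0.05):
    """Return True when the model is older than max_age_days or its AUC has drifted."""
    too_old = (today - last_trained).days >= max_age_days
    drifted = (baseline_auc - recent_auc) > max_auc_drop
    return too_old or drifted

# Hypothetical checks against a baseline AUC of 0.87.
fresh_and_stable = should_retrain(date(2025, 1, 1), date(2025, 1, 10), 0.87, 0.86)
fresh_but_drifted = should_retrain(date(2025, 1, 1), date(2025, 1, 10), 0.87, 0.80)
stale = should_retrain(date(2025, 1, 1), date(2025, 2, 5), 0.87, 0.87)
```

In a deployed pipeline, a scheduler (for example, the Airflow DAGs mentioned under Data Integration) could evaluate such a predicate after each monitoring run.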

5. Case Studies



6. Results
6.1. Key Quantitative Outcomes
6.2. Quantitative Impact
6.3. Visualization for Decision Support
7. Discussion
8. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Ethics Statement
References
- Camomilla V, Bergamini E, Fantozzi S, Vannozzi G. Trends supporting the in-field use of wearable inertial sensors for sport performance evaluation: A systematic review. Sensors. 2018, 18, 873.
- Claudino JG, Cardoso Filho CA, Bittencourt NFN, et al. Integrated approaches to athlete monitoring and injury prevention: lessons from the field. Sports Med. 2019, 49, 1245–1259.
- Cust EE, Sweeting AJ, Ball K, Robertson S. Machine and deep learning for sport-specific movement recognition: A systematic review of model development and performance. J Sports Sci. 2019, 37, 568–600.
- Rossi A, Perri E, Trecroci A, Savino M, Alberti G, Iaia FM. GPS data reflect players’ internal load in soccer: A comparison with physiological variables. Int J Sports Physiol Perform. 2017, 12, 1220–1224.
- Lundberg SM, Lee SI. A unified approach to interpreting model predictions. Adv Neural Inf Process Syst. 2017, 30, 4765–74.
- Stein M, Janetzko H, Lamprecht A, Breitkreutz T, Zimmermann P, Goldlücke B, et al. Bring it to the pitch: Combining video and movement data to enhance team sport analysis. IEEE Trans Vis Comput Graph. 2018, 24, 13–22.
- Herold M, Goes F, Nopp S, Bauer P, Thompson C, Meyer T. Machine learning in men’s professional football: Current applications and future directions for improving attacking play. Sports Med Open. 2019, 5, 39.
- Perin C, Vuillemot R, Stolper CD, Stasko JT, Wood J, Carpendale S. State of the art of sports data visualization. Comput Graph Forum. 2018, 37, 663–86.
- Chambers R, Gabbett TJ, Cole MH, Beard A. The use of wearable microsensors to quantify sport-specific movements. Sports Med. 2015, 45, 1065–81.
- Gill SS, Tuli S, Xu M, Singh I, Singh KV, Lindsay D, et al. Transformative effects of IoT, blockchain and artificial intelligence on cloud computing: Evolution, vision, trends and open challenges. Internet Things. 2019, 8, 100118.
- Qiao X, Yang Z, Liu Y, Liu X. Performance evaluation of containerization in edge–cloud computing stacks for industrial applications: a client perspective. J Cloud Comput. 2020, 9, XX.
- Amazon Web Services, Inc. MQTT bridge and edge compute architecture in AWS IoT Greengrass. AWS IoT Greengrass Developer Guide [Internet]. 2024. Available from: AWS Documentation.
- Chen J, Ran X. Deep Learning With Edge Computing: A Review. Proc IEEE. 2019, 107, 1655–74.
- Ranjan R, Rana O, Nepal S, Yousif M, James P, Wen Z, et al. The next grand challenges: Integrating the Internet of Things and Data Science. IEEE Cloud Comput. 2018, 5, 12–26.
- Hesse G, Matthies C, Perscheid M, Uflacker M. Real-time stream processing in sports analytics. IEEE Trans Parallel Distrib Syst. 2019, 30, 1715–28.
- Link D, Lang S, Seidenschwarz P. Real time quantification of dangerousity in football using spatiotemporal tracking data. PLoS ONE. 2016, 11, e0168768.
- Thornton HR, Delaney JA, Duthie GM, Dascombe BJ. Importance of various training-load measures in injury incidence of professional rugby league athletes. Int J Sports Physiol Perform. 2017, 12, 819–24.
- Reynolds DA. Gaussian Mixture Models. In: Li SZ, Jain AK, editors. Encyclopedia of Biometrics. Boston, MA: Springer; 2009. p. 659–663.
- Malone S, Owen A, Newton M, Mendes B, Collins KD, Gabbett TJ. The acute:chronic workload ratio in relation to injury risk in professional soccer. J Sci Med Sport. 2017, 20, 561–5.
- Ekstrand J, Hägglund M, Waldén M. Epidemiology of muscle injuries in professional football (soccer). Am J Sports Med. 2011, 39, 1226–1232.
- Sampaio J, McGarry T, Calleja-González J, Jiménez Sáiz SL, Schelling i del Alcázar X, Balciunas M. Exploring game performance in the NBA using player tracking data. PLoS ONE. 2015, 10, e0132894.
- Badau, D.; Badau, A.; Ene-Voiculescu, V.; Ene-Voiculescu, C.; Teodor, D. F.; Sufaru, C.; Dinciu, C. C.; Dulceata, V.; Manescu, D. C.; Manescu, C. O. El Impacto De Las tecnologías En El Desarrollo De La Velocidad Repetitiva En Balonmano, Baloncesto Y Voleibol. Retos 2025, 64, 809–824.
- Bezodis NE, Kerwin DG, Salo AI. Joint angular kinematics in sprint acceleration: a comparison across performance levels. J Sports Sci. 2021, 39, 583–92.
- Badau, D.; Badau, A.; Joksimović, M.; Manescu, C. O.; Manescu, D. C.; Dinciu, C. C.; Margarit, I.R.; Tudor, V.; Mujea, A.M.; Neofit, A.; et al. Identifying the Level of Symmetrization of Reaction Time According to Manual Lateralization between Team Sports Athletes, Individual Sports Athletes, and Non-Athletes. Symmetry 2024, 16.
- UEFA. UEFA Elite Club Injury Study 2022: Season report. Union of European Football Associations (UEFA); 2022.
- FIFA Medical Network. FIFA Injury Report: Men’s Football World Cup Russia 2018; FIFA: Zurich, 2019.
- Bishop CM. Pattern recognition and machine learning; Springer: New York, NY, USA, 2006.
- Ibañez J, Serrano JI, Castillo MD, Minguez J, Pons JL. Evaluating artificial variability in EMG signals for neuromuscular modeling. IEEE Trans Neural Syst Rehabil Eng. 2015, 23, 399–407.
- Luo C, Li J, Zhang B, Wang H, Song Q. T-SMOTE: Temporal-oriented synthetic minority oversampling for imbalanced time-series classification. In: Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence (IJCAI-22); 2022 Jul 23–29; Vienna, Austria. pp. 2406–14.
- De Groote F, De Laet T, Jonkers I, De Schutter J. Kalman smoothing improves the estimation of joint kinematics and kinetics in marker-based human gait analysis. J Biomech. 2008, 41, 3390–8.
- Kohavi R. A study of cross-validation and bootstrap for accuracy estimation and model selection. In: Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI); 1995 Aug 20; Montréal, Québec, Canada. p. 1137–43.
- Claudino JG, Capanema DO, de Souza TV, Serrão JC, Machado Pereira AC, Nassis GP. Current approaches to the use of artificial intelligence for injury risk assessment and performance prediction in team sports: A systematic review. Sports Med Open. 2019, 5, 28.
- Hastie T, Tibshirani R, Friedman J. The elements of statistical learning: Data mining, inference, and prediction. 2nd ed. New York: Springer; 2009. Chapter 7, Model assessment and selection; p. 219–260.
- Bartlett JD, O’Connor F, Pitchford N, Torres-Ronda L, Robertson SJ. Relationships between internal and external training load in team-sport athletes: Evidence for an individualized approach. Int J Sports Physiol Perform. 2017, 12, 230–4.
- Cronin NJ. Using deep neural networks for kinematic analysis: Challenges and opportunities. J Biomech. 2021, 123, 110460.
- Saw AE, Main LC, Gastin PB. Monitoring athletes through self-report: Factors influencing implementation. J Sports Sci Med. 2015, 14, 137–46.
- Van Eetvelde H, Mendonça LD, Ley C, Seil R, Tischer T. Machine learning methods in sport injury prediction and prevention: A systematic review. J Exp Orthop. 2021, 8, 27.
- Kreps J, Narkhede N, Rao J. Kafka: A distributed messaging system for log processing. In: Proceedings of the 6th International Workshop on Networking Meets Databases (NetDB’11); 2011 Jun; Athens, Greece. p. 1–7.
- Rossi A, Pappalardo L, Cintia P, Iaia FM, Fernández J, Medina D. Effective injury forecasting in soccer with GPS training data and machine learning. PLoS ONE. 2018, 13, e0201264.
- Ke G, Meng Q, Finley T, Wang T, Chen W, Ma W, et al. LightGBM: A highly efficient gradient boosting decision tree. In: Proceedings of the 31st International Conference on Neural Information Processing Systems (NeurIPS 2017); 2017 Dec 4–9; Long Beach, CA.
- Buchheit M, Simpson BM. Player-tracking technology: half-full or half-empty glass? Int J Sports Physiol Perform. 2017, 12 (Suppl 2), S235–S241.
- Bergstra J, Bengio Y. Random search for hyper-parameter optimization. J Mach Learn Res. 2012, 13, 281–305.
- Fawcett T. An introduction to ROC analysis. Pattern Recognit Lett. 2006, 27, 861–74.
- Lundberg SM, Erion GG, Lee SI. Consistent individualized feature attribution for tree ensembles. arXiv [Preprint]. 2018 Feb. arXiv:1802.03888.
- Stoltzfus JC. Logistic regression: a brief primer. Acad Emerg Med. 2011, 18, 1099–104.
- Breiman L. Random forests. Mach Learn. 2001, 45, 5–32.
- Ke G, Meng Q, Finley T, Wang T, Chen W, Ma W, et al. LightGBM: A highly efficient gradient boosting decision tree. Adv Neural Inf Process Syst (NeurIPS). 2017, 30, 3146–3154.
- Puterman ML. Markov Decision Processes: Discrete Stochastic Dynamic Programming. New York: Wiley-Interscience; 2005. Chapter 2, Foundations; p. 17–60. ISBN 978-0471727828.
- Mnih V, Kavukcuoglu K, Silver D, et al. Human-level control through deep reinforcement learning. Nature. 2015, 518, 529–33.
- Sutton RS, Barto AG. Reinforcement learning: An introduction. 2nd ed. Cambridge, MA: MIT Press; 2018. Chapter 6–8. ISBN 978-0262039246.
- Bartlett R. Introduction to Sports Biomechanics: Analysing Human Movement Patterns. London: Routledge; 2007.
- Cronin J, Hansen KT. Strength and power predictors of sports speed. J Strength Cond Res. 2006, 20, 349–357.
- Nagahara R, Mizutani M, Matsuo A, Kanehisa H, Fukunaga T. Association of sprint performance with ground reaction forces during acceleration and maximal speed phases in a single sprint. J Appl Biomech. 2018, 34, 104–110.
- Morin JB, Bourdin M, Edouard P, Peyrot N, Samozino P, Lacour JR. Mechanical determinants of 100-m sprint running performance. Eur J Appl Physiol. 2012, 112, 3921–3930.
- Bezodis NE, North JS, Razavet JL. Alterations to the orientation of the ground reaction force vector affect sprint acceleration performance in team sports athletes. J Sports Sci. 2017, 35, 1–8. doi:10.1080/02640414.2016.1239024.
- Kingma DP, Ba J. Adam: A method for stochastic optimization. arXiv [Internet]. 2014; arXiv:1412.6980.
- Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R. Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res. 2014, 15, 1929–1958.
- James G, Witten D, Hastie T, Tibshirani R. An Introduction to Statistical Learning: with Applications in R, Springer: New York, 2013.
- Japkowicz N, Shah M. Evaluating Learning Algorithms: A Classification Perspective, 2011.
- Fernández A, Garcia S, Herrera F, Chawla NV. SMOTE for learning from imbalanced data: progress and challenges, marking the 15-year anniversary. J Artif Intell Res. 2018, 61, 863–905.
- Hyndman RJ, Athanasopoulos G. Forecasting: principles and practice. 2nd ed. Melbourne: OTexts; 2018.
- Bring J. How to standardize regression coefficients. Am Stat. 1994, 48, 209–213.
- Bahr R, Holme I. Risk factors for sports injuries—a methodological approach. Br J Sports Med. 2003, 37, 384–392.
- Rein R, Raabe D, Memmert D. “Which pass is better?” Novel approaches to assess passing effectiveness in elite soccer. Hum Mov Sci. 2017, 55, 172–181.
- Field A. Discovering Statistics Using IBM SPSS Statistics. 5th ed. London: Sage Publications; 2017.
- Cohen J. Statistical Power Analysis for the Behavioral Sciences. 2nd ed. Hillsdale, NJ: Lawrence Erlbaum Associates; 1988.
- Virtanen P, Gommers R, Oliphant TE, et al. SciPy 1.0: Fundamental algorithms for scientific computing in Python. Nat Methods. 2020, 17, 261–272.
- Friedman JH. Greedy function approximation: A gradient boosting machine. Ann Stat. 2001, 29, 1189–1232.
- Robertson PS, Manley AJ, Davis O, Hooper SL. Expert consensus as a method of validation in sports science. Int J Sports Physiol Perform. 2021, 16, 2–9.
- Raab M, Lobinger B, Hoffmann S, Pizzera A, Laborde S. Performance Psychology: Perception, Action, Cognition, and Emotion. London: Academic Press; 2016.
- Molnar C. Interpretable Machine Learning: A Guide for Making Black Box Models Explainable. 2nd ed. Leanpub; 2022. Available from: https://christophm.github.io/interpretable-ml-book/.
- Balasubramanian A. End-to-end model lifecycle management: An MLOps framework for drift detection, root cause analysis, and continuous retraining. Int J Multidiscip Res Growth Eval. 2020, 1, 92–102.
- Lacson R, Carrodeguas E, Swanson W, et al. Machine Learning Model Drift: Predicting Diagnostic Imaging Follow-Up as a Case Example. J Am Coll Radiol. 2022, 19, 1162–9.
- Marcelino-Silva F, Bessa-Barbosa Á, Rebelo-Gonçalves R. Synthetic Data for Sharing and Exploration in High-Performance Sport: applicability of sequential tree-based algorithms in athlete monitoring data. Sports Med Open. 2025, 11, 10.
- Naughton M, Weaving D, Scott T, Compton H. Synthetic Data as a Strategy to Resolve Data Privacy and Confidentiality Concerns in the Sport Sciences: Practical Examples and an R Shiny Application. Int J Sports Physiol Perform. 2023, 18, 1213–8.
- Lin T, Chen Z, Beyer J, Wu Y, Pfister H, Yang Y. The Ball is in Our Court: Conducting Visualization Research with Sports Experts. IEEE Trans Vis Comput Graph. 2024, 30, 719–29.
- Davis J, Bransen L, et al. Methodology and evaluation in sports analytics: challenges, approaches, and lessons learned. Mach Learn. 2024, 113, 6977–7010.
- Bartolomeo J. New in Grafana 8.0: Streaming real-time events and data to dashboards. Grafana Labs Blog. 2021.
- Khattach O, Moussaoui O, Hassine M. End-to-End Architecture for Real-Time IoT Analytics and Predictive Maintenance Using Stream Processing and ML Pipelines. Sensors. 2025, 25, 2945.
- dos Santos NA, Almeida AR, et al. Artificial intelligence and Machine Learning approaches in sports. Braz J Phys Ther. Ahead of print. 2024.
- Juliano E, Thakkar C, Taber C, Raval MR, et al. A Dynamic Online Dashboard for Tracking the Performance of Division 1 Basketball Athletic Performance. In: Proc Int Sports Anal Conf & Exhibition (ISACE), Singapore; 2023.
- Grafana Labs. Grafana: The open observability platform.




| Sport | Synthetic Sample Size | Description |
|---|---|---|
| Football | 40 players | Full-season squad: 18 frequent starters + 22 rotating substitutes |
| Basketball | 15 players | NBA-style roster across 82 simulated games |
| Athletics | 12 sprinters | 12 simulated athletes with 100 m times from 10.00 s to 15.00 s |

| Use Case | Model | Input Features | Key Parameters | Purpose |
|---|---|---|---|---|
| Football: Injury Risk | LightGBM Classifier | ACDR, HODI, FSS (GPS, NIRS, self-reports) | max_depth=6, learning_rate=0.1, SHAP explanations | Binary risk classification |
| Basketball: Tactical Decisions | DQN over MDP | FAJP, APS, EVD (IMU + contextual states) | γ=0.95, ε=0.1, reward = win prob. differential | Real-time decision optimization |
| Athletics: Sprint Gains | Multivariate Linear Regression | HFA, KEV, ADIC (pose data) | OLS, standardized β coefficients | Biomechanical performance prediction |

| Model | Validation Strategy | Interpretability Tool | Statistical Output |
|---|---|---|---|
| Football (LightGBM) | Stratified 5-fold CV + SHAP | Global + local SHAP values | AUC = 0.87; Injury ↓ 12%, t = 2.78, p = 0.012 |
| Basketball (DQN) | Rolling window + EVD alert thresholds | Decision coefficient analysis | Decision ↑ 16%, turnovers ↓ 22%, p < 0.01 |
| Athletics (Linear) | Train/val/test split + early stopping | Regression coefficients | Sprint ↓ 8%, d = 0.94, p < 0.001 |

| Case Study | Objective | Method | Outcome |
|---|---|---|---|
| Football | Hamstring Injury Reduction | LightGBM + SHAP | ↓12% injury, ↓30% flagged players |
| Basketball | Decision-Making Accuracy Improvement | Logistic Regression | ↑16% accuracy, ↓22% turnovers |
| Athletics | Sprint Mechanics Optimization | Linear Regression | ↓8% sprint time |

| Case Study | Metric Compared | Δ (Change) | p-value | 95% CI | Effect Size (Cohen’s d) |
|---|---|---|---|---|---|
| Football | Injury Rate ↓ | –12% | 0.012 | [–20%, –3%] | 0.65 (Medium–Large) |
| Football | Flagged Injury ↓ | –30% | 0.007 | [–42%, –18%] | 0.72 (Large) |
| Basketball | Decision Making Accuracy ↑ | +16% | 0.008 | [+3%, +19%] | 0.71 (Large) |
| Basketball | Turnovers ↓ | –22% | 0.017 | [–34%, –10%] | 0.63 (Medium–Large) |
| Athletics | Sprint Time ↓ | –0.90 s | <0.001 | [–1.23, –0.57] s | 0.94 (Large) |

| Case Study | Primary Model | Performance Metric 1 | Performance Metric 2 | Interpretability Tools |
|---|---|---|---|---|
| Football Injury Risk | LightGBM Classifier | AUC-ROC 0.87 | Injury rate ↓12%, Flagged ↓30% | SHAP Values |
| Basketball Decision Making | Logistic Regression | Optimal Choice ↑16% | Turnovers ↓22% | Model Coefficients |
| Athletics Sprint Mechanics | Multivariate Linear Regression | 100 m Sprint Time Reduction ↓8% | Joint Angles Optimized | Regression Coefficients |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).