4. Discussion
In an era of rapid technological advances in load monitoring that generate large volumes of data, the primary aim of this study was to reduce the dimensionality of external load variables derived from an LPS and to explore observation-level external-load profiles using an unsupervised data-driven approach [16,23,27,28,36]. The findings suggest that highly interrelated external-load variables can be summarized into a smaller number of interpretable latent dimensions, which may support more efficient and transparent load monitoring in indoor team sports [27,28,29].
An important methodological implication of the present study is that dimensionality reduction, when applied carefully, can preserve the data's intrinsic structure rather than distort it [27,29]. This was demonstrated by directly comparing clustering solutions obtained in the full standardized feature space and the PCA-reduced space. The exceptionally high agreement between the two solutions (ARI = 0.981; NMI = 0.971), combined with the nearly identical cluster-size distributions and minimal reassignment of observations, indicates that the reduced representation retains the essential characteristics of the original dataset. This finding is relevant for both data analysts and practitioners, as it suggests that dimensionality reduction can simplify complex datasets while preserving the main structure of the analyzed data [28,29,37,38]. In this context, PCA should be considered an exploratory structure-reduction technique rather than a validation method for athlete profiling [29,38]. Consequently, the comparison between full-space and reduced-space clustering supports the internal consistency of the identified observation-level profiles within the present dataset, but external validation remains necessary.
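The full-space versus reduced-space comparison described above can be sketched as follows. This is a minimal illustration on synthetic data, not the study's pipeline: the function name `compare_full_vs_reduced`, the K-means settings, and the `make_blobs` stand-in dataset are all assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score
from sklearn.preprocessing import StandardScaler


def compare_full_vs_reduced(X, n_clusters=3, variance_kept=0.90, seed=0):
    """Cluster in the standardized space and in the PCA-reduced space, then compare labels."""
    X_std = StandardScaler().fit_transform(X)
    # A float n_components keeps the fewest components whose cumulative
    # explained variance exceeds that fraction.
    X_pca = PCA(n_components=variance_kept, svd_solver="full").fit_transform(X_std)

    labels_full = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(X_std)
    labels_pca = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(X_pca)

    return {
        "n_components": X_pca.shape[1],
        # Both indices are invariant to how the cluster labels are numbered.
        "ari": adjusted_rand_score(labels_full, labels_pca),
        "nmi": normalized_mutual_info_score(labels_full, labels_pca),
        # Sorted sizes, because cluster numbering is arbitrary in each solution.
        "sizes_full": np.sort(np.bincount(labels_full, minlength=n_clusters)),
        "sizes_pca": np.sort(np.bincount(labels_pca, minlength=n_clusters)),
    }


# Synthetic stand-in for an external-load matrix (rows = observations, columns = variables).
X_demo, _ = make_blobs(n_samples=300, n_features=20, centers=3, random_state=42)
result = compare_full_vs_reduced(X_demo)
print(result)
```

Agreement indices close to 1 together with matching sorted cluster sizes would indicate that the two partitions are nearly identical, which is the pattern reported here for the present dataset.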
The PCA results provided a clear and interpretable representation of the latent structure of external load [29,38]. Specifically, the first principal component, explaining 37.2% of the variance, was dominated by high-intensity variables such as anaerobic activity, high-speed running, sprint actions, and very high acceleration load. This component can be interpreted as a global high-intensity external load axis, reflecting the overall explosive and metabolically demanding aspects of performance. The second principal component, explaining an additional 16.5% of the variance, captured a distinct dimension related to neuromuscular load and movement dynamics, characterized by variables such as decelerations and jump-related metrics, in contrast to lower-intensity volume indicators. Together, these components accounted for more than half of the total variance (53.7%), indicating that external load in indoor team sports can be largely described by a combination of intensity-driven and neuromuscular factors [16,22,29,38].
The distinction between component retention and component interpretation is central to the methodological approach of this study. Twelve principal components were retained to exceed the 90% explained variance threshold and to ensure that the PCA-reduced clustering solution preserved the essential structure of the original dataset. However, the first five components explained approximately 78% of the total variance and captured the most coherent and practically interpretable external-load dimensions. The remaining components each contributed only a small additional percentage of variance and appeared to represent more specific or residual aspects of movement behavior. For this reason, they were included in the clustering procedure to avoid unnecessary information loss, but the practical interpretation was concentrated on the first five components.
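The retention rule described above (keep as many components as needed to pass a cumulative explained-variance threshold) can be illustrated with a short sketch. The helper name `components_for_threshold` and the synthetic correlated data are assumptions for the example; the numbers it produces are unrelated to the study's results.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def components_for_threshold(X, threshold=0.90):
    """Return the smallest number of components whose cumulative variance reaches the threshold."""
    X_std = StandardScaler().fit_transform(X)
    pca = PCA().fit(X_std)
    cumulative = np.cumsum(pca.explained_variance_ratio_)
    # searchsorted gives the first index where cumulative >= threshold;
    # the clip guards against floating-point shortfall at the last component.
    n_keep = min(int(np.searchsorted(cumulative, threshold)) + 1, len(cumulative))
    return n_keep, cumulative


# Synthetic correlated variables: 5 latent factors spread over 30 observed columns plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 5))
loadings = rng.normal(size=(5, 30))
X_demo = latent @ loadings + 0.5 * rng.normal(size=(200, 30))

n_keep, cumulative = components_for_threshold(X_demo, threshold=0.90)
print(n_keep, np.round(cumulative[:n_keep], 3))
```

Inspecting the `cumulative` array also shows how much variance the first few components carry on their own, which is the quantity that motivates interpreting fewer components than are retained for clustering.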
Importantly, the clustering structure was directly aligned with these principal components. The primary separation between clusters occurred along PC1, suggesting that differences in high-intensity load represent the dominant factor distinguishing performance profiles. Secondary differentiation along PC2 further separated observations based on neuromuscular demands and movement characteristics. This strong alignment between PCA and clustering suggests that the identified clusters reflect structured patterns within the dataset rather than purely random algorithmic partitioning. For data analysts, this highlights the importance of combining unsupervised learning techniques to move beyond purely descriptive metrics and toward interpretable latent structures [28,35,38].
The three-cluster solution may be interpreted as three exploratory external-load profiles rather than as arbitrary statistical groups. The smallest cluster (approximately 15% of observations) appears to represent the most demanding high-intensity profile, characterized by markedly elevated acceleration-load, anaerobic activity, and sprint-related variables. In contrast, the largest cluster was associated with more moderate and cumulative workload characteristics, including total distance and low-speed activity. The remaining cluster demonstrated a mixed neuromuscular profile with stronger contributions from jump-related and deceleration variables. This distribution suggests that the identified profiles may reflect distinct movement-demand patterns rather than simple differences in overall activity volume.
Specifically, based on the PCA structure, the variable separation analysis, and the spatial distribution of the observations, the first profile appears to represent observations characterized by higher high-intensity external load. This profile is mainly associated with greater values in variables related to anaerobic activity, high-speed distance, sprinting actions, and very high acceleration load, corresponding primarily to the positive direction of PC1. From a practical perspective, this profile may reflect athletes or match periods with greater explosive and metabolically demanding activity. A second profile appears to represent observations with comparatively lower high-intensity load but a greater contribution from overall volume and lower-intensity activity. This profile is consistent with the contribution of PC3 and variables such as total distance, low-speed distance, time on the playing field, and metabolic work. Although these observations may not be characterized by the highest explosive demands, they still represent meaningful cumulative workload and should not be interpreted as being of low importance from a monitoring perspective. The third profile appears to be distinguished more strongly by neuromuscular and mechanical load characteristics, including jump load, decelerations, and acceleration-related variables. This profile is particularly relevant in indoor team sports, where repeated braking, changes of direction, jumping, and short accelerative efforts are central to performance demands.
Therefore, the three profiles may be understood as representing different combinations of high-intensity output, cumulative workload, and neuromuscular-mechanical stress. These profiles were observed in both the full standardized feature space and the PCA-reduced space, with nearly identical cluster-size distributions and very high agreement indices. This supports the view that the profiles are not artifacts of dimensionality reduction, but consistent patterns within the analyzed dataset.
The variable separation analysis further reinforced this interpretation. Jump Load emerged as the most discriminative variable across both clustering approaches, followed by acceleration load, metabolic power, and anaerobic activity distance. These variables are all associated with high mechanical and metabolic stress, indicating that cluster differentiation is primarily driven by high-intensity and neuromuscular load characteristics [16,20,22,29]. The consistency of these findings across both analytical spaces supports the robustness of the results and suggests that a small subset of variables can effectively capture the dataset's structure [28,29,37]. The separation of the two axes and their contribution to explaining the data is consistent with the nature of indoor team sports. These sports are, by definition, intermittent, with periods of high intensity followed by periods of lower intensity [11,14,16]. They are characterized by changes of direction, accelerations and decelerations, and, overall, high-intensity actions across all three movement axes [16,21,39,40]. The sports also differ from one another; for example, basketball requires more actions in the vertical axis, expressed through parameters such as jump load, whereas futsal and handball place greater emphasis on the horizontal axis, where accelerations and decelerations over short time intervals are the decisive performance factors [16,20,22,29,39,40].
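One way to rank variables by how strongly they separate clusters, as in the variable separation analysis discussed in this paragraph, is a per-variable one-way ANOVA F-statistic across cluster labels. The text does not state which separation statistic was used, so the choice of `f_classif` below is an assumption, and the variable names and data are synthetic placeholders.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.feature_selection import f_classif
from sklearn.preprocessing import StandardScaler


def rank_variables_by_separation(X, labels, names):
    """Rank variables by the one-way ANOVA F-statistic across cluster labels."""
    f_values, _ = f_classif(X, labels)
    order = np.argsort(f_values)[::-1]  # largest F first
    return [(names[i], float(f_values[i])) for i in order]


# Synthetic stand-in data; the names are placeholders, not the study's variables.
X_raw, _ = make_blobs(n_samples=240, n_features=6, centers=3, random_state=1)
X_std = StandardScaler().fit_transform(X_raw)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_std)
names = [f"var_{i}" for i in range(X_std.shape[1])]

ranking = rank_variables_by_separation(X_std, labels, names)
for name, f_value in ranking:
    print(f"{name}: F = {f_value:.1f}")
```

Because the labels come from clustering the same variables, the F-values describe separation only and should not be read as inferential tests. Running the ranking on labels from both the full and the reduced space would show whether the same variables lead in each.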
The t-SNE projections provided an additional qualitative visualization of the clustering structure in a nonlinear low-dimensional space [41]. In both representations, the observations showed a visually apparent three-group pattern with relatively limited overlap. However, because t-SNE is primarily a visualization technique and may exaggerate apparent separation depending on hyperparameter selection, these projections should not be interpreted as independent validation of cluster validity [28,37].
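The dependence of t-SNE on hyperparameters can be checked by embedding the same data at several perplexity values and comparing the pictures. The sketch below assumes scikit-learn's `TSNE`; the perplexity values, the helper name `tsne_embeddings`, and the synthetic data are illustrative choices, not the settings used in the study.

```python
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler


def tsne_embeddings(X, perplexities=(5, 30), seed=0):
    """Embed the same standardized data in 2D at several perplexities."""
    X_std = StandardScaler().fit_transform(X)
    return {
        p: TSNE(n_components=2, perplexity=p, init="pca", random_state=seed).fit_transform(X_std)
        for p in perplexities
    }


# Synthetic stand-in data; perplexity must stay below the number of observations.
X_demo, _ = make_blobs(n_samples=150, n_features=10, centers=3, random_state=3)
embeddings = tsne_embeddings(X_demo)
for perplexity, coords in embeddings.items():
    print(perplexity, coords.shape)
```

If the apparent group separation changes markedly between perplexities, that is a sign the visual gap reflects the embedding settings as much as the data.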
From a research and applied perspective, many studies in recent years have focused on identifying the key parameters that determine athletic performance and, ultimately, match outcomes [16,23,35]. The parameters that are particularly recognized in futsal and handball are the ability to perform a high number of accelerations and decelerations within short time periods, as well as the ability to perform sprints [39,40]. However, it is also clear that indoor load monitoring systems show high validity for positional data and for variables related to the total number of accelerations, decelerations, and high-intensity actions, while they demonstrate lower accuracy in determining maximal accelerations, decelerations, and maximal speed [42,43,44,45]. The results of the present study suggest exploratory external-load profiles that also incorporate parameters related to athletes’ maximal capacities. The inclusion of athletes from different sports helps increase the sample's dispersion and supports the identification of exploratory external-load profiles. To our knowledge, few studies have explored observation-level external-load profiling in indoor team sports using LPS-derived data [42,43,44,45]. Nevertheless, although the results are in line with the nature of these sports, they should be interpreted with caution, as the derived axes include parameters with relatively low validity [42,43,44,45].
These findings may have practical implications for both performance analysts and coaching staff. Modern tracking systems generate numerous external load variables, many of which provide redundant information [28,29,37]. Monitoring all available metrics is not only impractical but may also obscure meaningful insights. The present study suggests that these variables can be summarized into a smaller number of interpretable dimensions, primarily high-intensity load, neuromuscular load, and overall workload volume, thereby allowing practitioners to focus on the most informative aspects of performance [28,29,37]. Load monitoring serves both to maximize performance and to reduce injury risk, and it also guides the return-to-play process for injured athletes [3,5,6,46]. Building on this, the development of hybrid parameters, calculated from multiple variables without losing the original information, might better explain the relationship between load and outcomes [27,28,38].
The parameters used so far in the literature relate to accelerations, decelerations, and high-intensity actions and stem mainly from studies in soccer and American football, and to a lesser extent in futsal and handball, while research in basketball is almost nonexistent [16,19,20,22,24,25,26,39,40]. Based on the present study's data, it appears that the identified latent dimensions group high-intensity actions within common external-load structures, creating specific profiles; however, variables of moderate or low intensity, which also contribute to the athletes’ overall load, likewise help explain the data. Longitudinal monitoring of load within each athlete profile may potentially highlight different parameters associated with both performance maximization and injury risk reduction [6,18,46]. The results of this preliminary study can contribute to the development of exploratory external-load profiles in indoor team sports. Nevertheless, for these profiles to be used prospectively, field studies are needed to demonstrate their practical utility. At this stage, it is recommended that they be monitored alongside the indicators already used by coaches, and that they complement current monitoring practice only if longitudinal studies confirm their importance.
Additionally, the identification of different external-load patterns suggests that observations may respond differently to similar movement demands based on their load characteristics. However, the present findings should not be interpreted as validated individualized monitoring or training-prescription models. Instead, they provide an exploratory framework that may support future hypothesis generation and longitudinal investigation of individualized responses to training and competition demands. From a data science perspective, this reinforces the value of unsupervised learning methods for discovering hidden patterns in complex performance data [28,29,35,38].
Despite these strengths, several limitations should be acknowledged. First, the dataset included observations from multiple indoor sports, which may introduce variability related to sport-specific movement demands [16,39,40]. While this broadens the ecological scope of the findings, future studies could investigate whether similar dimensional structures emerge within individual sports. The inclusion of athletes from different indoor sports was intentional, as the objective of the study was not to derive sport-specific profiles, but rather to investigate whether common latent external-load structures emerge across indoor team sports characterized by intermittent multidirectional activity. Therefore, the identified dimensions should be interpreted as generalized indoor-sport load constructs rather than sport-specific performance signatures. Second, the analysis was limited to external load variables and did not incorporate internal load measures such as heart rate, perceived exertion, or physiological responses. Integrating internal and external load data could provide a more comprehensive understanding of athlete stress and adaptation [3,6,12,13]. Third, the model used to explain the data and to create external-load profiles included variables whose validity is debatable. Parameters such as maximal acceleration, deceleration, and speed show lower validity when assessed with LPS [42,43,44,45]. Nevertheless, they were used because they represent commonly monitored variables in indoor team-sport load analysis, and the technology used to evaluate them is currently among the most commonly used and practically applicable technologies available for indoor monitoring.
Furthermore, the unsupervised nature of the analysis means that the identified clusters were not directly linked to performance outcomes or injury risk. Although the clusters may reflect interpretable external-load structures, their practical significance would be strengthened by examining their relationship with match performance, fatigue, or injury incidence [3,5,6,46]. Future research should therefore explore the predictive value of these profiles and assess their applicability in real-world performance and health monitoring contexts. Finally, while PCA and clustering effectively reduced dimensionality and improved interpretability, alternative machine learning approaches, such as nonlinear dimensionality reduction or model-based clustering, may provide additional insights into the structure of external load data [27,35,41]. Future work could compare different analytical techniques and evaluate their relative performance in identifying meaningful external-load profiles.
Additionally, the sample size of the present study was relatively modest compared with the dimensionality of the dataset, which may influence PCA stability, clustering reproducibility, and external generalizability. Nevertheless, the PCA adequacy metrics and the stability analyses across random seeds supported the internal consistency of the dimensionality-reduction and clustering procedures within the present dataset. Furthermore, no external validation cohort was available, and repeated-measures longitudinal modeling was not performed. Consequently, the temporal stability and individual responsiveness of the identified external-load profiles remain unknown and should be investigated in future longitudinal studies.
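A stability check across random seeds, of the kind mentioned above, can be sketched by refitting K-means with different initializations and measuring pairwise agreement. The text does not describe the exact procedure, so the single-initialization setting, the number of seeds, and the helper name `seed_stability` are assumptions for illustration.

```python
from itertools import combinations

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler


def seed_stability(X, n_clusters=3, seeds=range(10)):
    """Fit K-means once per seed and summarize pairwise ARI between the resulting partitions."""
    labelings = [
        # n_init=1 so that each fit depends on a single random initialization.
        KMeans(n_clusters=n_clusters, n_init=1, random_state=s).fit_predict(X)
        for s in seeds
    ]
    aris = [adjusted_rand_score(a, b) for a, b in combinations(labelings, 2)]
    return float(np.mean(aris)), float(np.min(aris))


# Synthetic stand-in data.
X_raw, _ = make_blobs(n_samples=200, n_features=12, centers=3, random_state=5)
X_std = StandardScaler().fit_transform(X_raw)
mean_ari, min_ari = seed_stability(X_std)
print(f"mean ARI = {mean_ari:.3f}, min ARI = {min_ari:.3f}")
```

A minimum pairwise ARI close to the mean suggests that no single seed produced an outlying partition. This kind of check speaks to internal consistency only and does not replace validation on an external cohort.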
Moreover, K-means clustering assumes compact and approximately spherical cluster structures and may be sensitive to initialization procedures, outliers, and covariance structure. In addition, contextual variables such as playing position, tactical role, match status, opponent characteristics, and situational game demands were not incorporated into the present analysis. Future research should therefore examine whether the identified latent dimensions and exploratory external-load profiles remain stable when these contextual and sport-specific factors are considered.
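To illustrate the sphericity assumption noted above, the sketch below compares K-means with a Gaussian mixture model that fits a full covariance matrix per cluster, which is one form of the model-based clustering suggested as future work. It is a hypothetical comparison on synthetic, deliberately skewed data, and the helper name `kmeans_vs_gmm` is an assumption.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler


def kmeans_vs_gmm(X, n_clusters=3, seed=0):
    """Return the ARI between K-means and full-covariance GMM labels, plus the GMM's BIC."""
    km_labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(X)
    gmm = GaussianMixture(
        n_components=n_clusters, covariance_type="full", n_init=5, random_state=seed
    ).fit(X)
    gmm_labels = gmm.predict(X)
    return adjusted_rand_score(km_labels, gmm_labels), float(gmm.bic(X))


# Synthetic blobs passed through a random linear map so the clusters are no longer spherical.
rng = np.random.default_rng(0)
X_raw, _ = make_blobs(n_samples=300, n_features=4, centers=3, random_state=7)
X_skewed = StandardScaler().fit_transform(X_raw @ rng.normal(size=(4, 4)))

agreement, bic = kmeans_vs_gmm(X_skewed)
print(f"ARI(K-means, GMM) = {agreement:.3f}, GMM BIC = {bic:.1f}")
```

Low agreement between the two partitions would suggest that cluster shape matters for the dataset at hand, while high agreement would indicate that the spherical assumption is not driving the solution.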