3.1. Regression model
The regression model developed from the variables detailed above explains 84.8% of the variance in the fees paid for the 8,389 transfers included in the sample. As indicated, all the variables selected are significant at the 95% confidence level or above. The absolute t-value shows the relative weight of each variable in the determination of players’ prices.
With a t-value of 45.58, the remaining contract duration is an extremely important variable in the formation of players’ prices on the transfer market. It is transformed on a logarithmic scale, which implies that the shorter the remaining contract period, the greater the impact of losing one more year. For example, all other things being equal, when a player has four years remaining on his contract, one year less implies a 13% drop in his value, whereas the decrease is 29% if the footballer has only two years remaining on his contract.
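The two percentages quoted above are what a single elasticity on the logarithm of remaining contract years would produce. The sketch below is an illustration only: it assumes that the fee is modelled on a log scale and that contract duration enters as `beta * ln(years)`. The value of `beta` is not reported in the text; it is back-derived here from the quoted 13% figure, and the function names are hypothetical.

```python
import math

# Assumption: log(fee) contains the term beta * ln(years_remaining).
# beta is NOT reported in the text; it is inferred from the quoted 13% drop.
def implied_elasticity(years_from, years_to, pct_drop):
    """Return beta such that (years_to / years_from) ** beta == 1 - pct_drop."""
    return math.log(1.0 - pct_drop) / math.log(years_to / years_from)

def pct_change(years_from, years_to, beta):
    """Relative change in value when the remaining contract shrinks or grows."""
    return (years_to / years_from) ** beta - 1.0

beta = implied_elasticity(4, 3, 0.13)   # four years -> three years, 13% drop
drop_2_to_1 = pct_change(2, 1, beta)    # two years -> one year

print(f"implied elasticity: {beta:.3f}")
print(f"change from 2 to 1 year: {drop_2_to_1:.1%}")
```

Under these assumptions, the same elasticity gives a drop of roughly 28 to 29% when going from two years to one, in line with the figure quoted in the text once rounding is taken into account.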
The t-value is also very high for age, in this case with a negative sign (-51.09). Given their longer-term potential, younger players are comparatively better valued than older ones. The effect is linear, which means that a difference of five years has the same impact (-44%) whether we compare, for example, two players with identical profiles aged 18 and 23, or 28 and 33.
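The constant percentage effect of age can be illustrated as follows. This is a sketch under the assumption that the logarithm of the fee is linear in age, so that a fixed age gap multiplies the fee by a fixed factor. The per-year coefficient is not given in the text; it is derived here from the quoted five-year effect, and `age_effect` is a hypothetical helper.

```python
import math

# Assumption: log(fee) is linear in age.
FIVE_YEAR_EFFECT = -0.44  # quoted in the text

# Per-year log coefficient implied by the five-year effect (not reported in the text).
age_coef = math.log(1.0 + FIVE_YEAR_EFFECT) / 5.0

def age_effect(age_a, age_b, coef=age_coef):
    """Relative value of a player aged age_b versus an identical player aged age_a."""
    return math.exp(coef * (age_b - age_a)) - 1.0

print(f"18 vs 23: {age_effect(18, 23):.0%}")
print(f"28 vs 33: {age_effect(28, 33):.0%}")
print(f"one additional year: {age_effect(25, 26):.1%}")
```

Because only the age gap enters the formula, the two five-year comparisons give the same result, and the implied effect of a single additional year is a decrease of about 11%.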
The variables that compare the position of players in relation to centre forwards all have negative coefficients, which indicates that centre forwards, all other things being equal, are the most valued. The most negative coefficient compared to centre forwards is recorded for full/wing backs, who have a value 16% lower than the reference category. On the other hand, the difference is very limited (not significant) for goalkeepers and attacking midfielders (on average around -3%).
Among the variables referring to players’ performance, the most important is the sporting level of matches played, followed by the number of minutes played in the league, the number of goals scored and the tendency to be included in the starting eleven. In comparison, the results of matches played and the fact of playing international games, whether for club or national teams, while also highly significant, are less important.
For every performance variable, the version covering the most recent period (the last 365 days) systematically has the highest t-value. This result indicates that the prices of players on the transfer market are formed essentially on the basis of their performances over the last year, with the previous year also playing a significant role, but to a much lesser extent. Tests that included earlier years did not yield significant results. International status, acquired once and for all, is another element that tends to increase the price paid for a footballer, by around a third, all other things being equal.
Finally, characteristics relating to the economic potential of both releasing and recruiting clubs form a last key group of factors. In this case, the variables referring to the destination club and league have a greater weight than those relating to the club of departure. This finding reflects the overriding importance of the recruiters’ buying power in determining prices during negotiations.
Figure 4.
Fitted and actual transfer fees.
Correlation levels are also very high when observations are segmented according to different criteria. By period (Table 6), the coefficients of determination tend to increase over time, suggesting a rationalisation of the market around the logic underlying the variables included in the statistical model. An improvement in the quality of the data gathered can also explain this development. The explanatory power of the model is relatively similar across all positions and age categories: between 83% and 88% (Table 7 and Table 8).
Table 6.
Coefficient of determination per period.
| Season | N | R² |
|---|---|---|
| 2014/15-2015/16 | 1264 | 80.0% |
| 2016/17-2017/18 | 1741 | 83.7% |
| 2018/19-2019/20 | 1771 | 84.8% |
| 2020/21-2021/22 | 1418 | 84.5% |
| 2022/23-2023/24 | 2195 | 87.6% |
Table 7.
Coefficient of determination per player position.
| Position | N | R² |
|---|---|---|
| Goalkeepers | 350 | 87.1% |
| Centre-backs | 1449 | 84.0% |
| Full-backs | 1014 | 85.2% |
| Defensive midfielders | 1719 | 85.2% |
| Attacking midfielders | 589 | 87.6% |
| Wingers | 1377 | 85.2% |
| Centre-forwards | 1891 | 83.1% |
Table 8.
Coefficient of determination per age category.
| Age category | N | R² |
|---|---|---|
| 21 years or less | 1599 | 83.4% |
| 22-25 years | 3587 | 85.6% |
| 26-29 years | 2419 | 84.2% |
| 30 years or more | 784 | 83.2% |
Cross-validation is another way of testing the quality and robustness of the model. To do this, the sample was randomly divided into five groups of (almost) equal size. Each time, a model was fitted on the transfers of four of these groups and its parameters were applied to predict the values of the fifth. The coefficients of determination obtained are very stable (between 83% and 85%), for both the training and the test samples.
Table 9.
5-fold cross-validation analysis for model to assess transfer fees.
| | Training sample: N | Training sample: R² adj | Test sample: N | Test sample: R² adj |
|---|---|---|---|---|
| Cross-validation 1 | 6712 | 84.70% | 1677 | 85.00% |
| Cross-validation 2 | 6711 | 84.81% | 1678 | 84.57% |
| Cross-validation 3 | 6711 | 84.71% | 1678 | 84.97% |
| Cross-validation 4 | 6711 | 85.05% | 1678 | 83.54% |
| Cross-validation 5 | 6711 | 84.59% | 1678 | 85.46% |
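The cross-validation procedure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the data are synthetic and unrelated to the transfer sample, the model has a single hypothetical predictor, and the plain rather than the adjusted coefficient of determination is computed. Only the sample size of 8,389 is taken from the text.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and split into k folds whose sizes differ by at most one."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(idx[start:start + size])
        start += size
    return folds

def fit_ols(xs, ys):
    """Least-squares intercept and slope for a single predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def r_squared(xs, ys, intercept, slope):
    """Plain (unadjusted) coefficient of determination of the line on (xs, ys)."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

# Synthetic stand-in data (hypothetical): one predictor plus Gaussian noise.
rng = random.Random(42)
N = 8389  # sample size quoted in the text
x = [rng.uniform(0, 10) for _ in range(N)]
y = [1.0 + 0.5 * xi + rng.gauss(0, 0.6) for xi in x]

folds = k_fold_indices(N, 5)
results = []
for test_fold in folds:
    held_out = set(test_fold)
    train = [i for i in range(N) if i not in held_out]
    a, b = fit_ols([x[i] for i in train], [y[i] for i in train])
    r2_train = r_squared([x[i] for i in train], [y[i] for i in train], a, b)
    r2_test = r_squared([x[i] for i in test_fold], [y[i] for i in test_fold], a, b)
    results.append((len(train), r2_train, len(test_fold), r2_test))

for n_train, r2_tr, n_test, r2_te in results:
    print(n_train, f"{r2_tr:.3f}", n_test, f"{r2_te:.3f}")
```

Since 8,389 is not divisible by five, four folds contain 1,678 transfers and one contains 1,677, which matches the sample sizes shown in Table 9.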
Overall, the scatter plot of the model’s estimates against the residuals does not show any particular shape, which rules out any flagrant problem of heteroscedasticity. However, statistical tests such as the White or Breusch-Pagan test do not support the hypothesis that the variance of the residuals is homoscedastic. Indeed, the model tends to slightly underestimate values at both extremes of the range, while the values for certain intermediate segments are slightly overestimated.
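To make the logic of such a test concrete, the sketch below implements Koenker's studentized variant of the Breusch-Pagan test for a model with a single regressor, using only the Python standard library. It is an illustration on synthetic data, not the procedure applied by the authors: the squared residuals are regressed on the regressor, and the statistic n·R² of this auxiliary regression is compared with a chi-square distribution with one degree of freedom. All names and data are hypothetical.

```python
import math
import random

def fit_ols(xs, ys):
    """Least-squares intercept and slope for a single predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def breusch_pagan_1d(xs, ys):
    """Studentized (Koenker) Breusch-Pagan LM statistic and p-value, one regressor."""
    n = len(xs)
    a, b = fit_ols(xs, ys)
    sq_resid = [(y - (a + b * x)) ** 2 for x, y in zip(xs, ys)]
    # Auxiliary regression: squared residuals on the same regressor.
    a2, b2 = fit_ols(xs, sq_resid)
    mean_sq = sum(sq_resid) / n
    ss_tot = sum((e - mean_sq) ** 2 for e in sq_resid)
    ss_res = sum((e - (a2 + b2 * x)) ** 2 for x, e in zip(xs, sq_resid))
    lm = max(n * (1.0 - ss_res / ss_tot), 0.0)
    # Chi-square survival function with 1 degree of freedom: erfc(sqrt(lm / 2)).
    p_value = math.erfc(math.sqrt(lm / 2.0))
    return lm, p_value

rng = random.Random(1)
x = [rng.uniform(0, 10) for _ in range(2000)]

# Synthetic data: constant noise versus noise that grows with x.
y_const = [1.0 + 0.5 * xi + rng.gauss(0, 0.6) for xi in x]
y_grow = [1.0 + 0.5 * xi + rng.gauss(0, 0.1 + 0.2 * xi) for xi in x]

lm_const, p_const = breusch_pagan_1d(x, y_const)
lm_grow, p_grow = breusch_pagan_1d(x, y_grow)
print(f"constant variance: LM={lm_const:.2f}, p={p_const:.3f}")
print(f"growing variance:  LM={lm_grow:.2f}, p={p_grow:.3g}")
```

With several thousand observations, a test of this kind has enough power to flag departures from homoscedasticity that are barely visible in a scatter plot, which is one plausible reading of the apparent tension between the visual inspection and the formal tests.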
Figure 5.
Estimates’ and residuals’ scatter plot.
Figure 6.
Average residuals, as per estimate in percentile.
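The diagnostic summarised in Figure 6 can be reproduced in outline as follows. This sketch assumes that residuals are defined as actual minus fitted values (the sign convention is not stated in the text) and uses synthetic data in which the true relationship is slightly convex, so that a straight-line fit underestimates at both extremes and overestimates in the middle. The helper names are hypothetical.

```python
import random

def fit_ols(xs, ys):
    """Least-squares intercept and slope for a single predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def mean_residual_by_percentile(fitted, residuals, n_bins=100):
    """Sort by fitted value, cut into n_bins near-equal groups, average the residuals."""
    order = sorted(range(len(fitted)), key=lambda i: fitted[i])
    base, extra = divmod(len(order), n_bins)
    means, start = [], 0
    for b in range(n_bins):
        size = base + (1 if b < extra else 0)
        group = order[start:start + size]
        means.append(sum(residuals[i] for i in group) / size)
        start += size
    return means

# Synthetic data (hypothetical): a mildly convex relationship fitted by a line.
rng = random.Random(7)
x = [rng.uniform(0, 10) for _ in range(8389)]
y = [0.05 * (xi - 5) ** 2 + 0.5 * xi + rng.gauss(0, 0.3) for xi in x]

a, b = fit_ols(x, y)
fitted = [a + b * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]  # actual minus fitted

means = mean_residual_by_percentile(fitted, residuals)
print(f"lowest percentile:  {means[0]:+.2f}")
print(f"middle percentile:  {means[50]:+.2f}")
print(f"highest percentile: {means[-1]:+.2f}")
```

Positive averages in the lowest and highest percentiles indicate underestimation at the extremes, and negative averages in the middle indicate overestimation, which is the qualitative pattern described for the model.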