2. Literature Review
The generalized linear model (GLM) generalizes the classical linear model by allowing the outcome to follow a distribution other than the normal, such as the binomial, Poisson, or Gamma distribution. This makes the GLM applicable to data with a variety of distributions and provides a robust basis for hypothesis testing in many contexts.
The Gamma regression model relates the independent variables to the mean of the dependent variable through a link function; the log link and the inverse link are the most frequently used. The link function transforms the mean so that it can be modeled as a linear combination of the predictors, which allows the model to accommodate positively skewed data [10].
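As a compact reference for the two link functions mentioned above, the model can be written as follows. The notation ($Y_i$, $\mu_i$, $\phi$, $\mathbf{x}_i$, $\boldsymbol{\beta}$) is assumed here for illustration and is not taken from the original text.

```latex
% Gamma GLM for a positive response Y_i with mean \mu_i and dispersion \phi
\begin{aligned}
\mathbb{E}[Y_i] &= \mu_i, \qquad \operatorname{Var}(Y_i) = \phi\,\mu_i^{2},\\
\eta_i &= \mathbf{x}_i^{\top}\boldsymbol{\beta},\\
\text{inverse link:}\quad \eta_i &= \frac{1}{\mu_i}
  \;\Longrightarrow\; \mu_i = \frac{1}{\mathbf{x}_i^{\top}\boldsymbol{\beta}},\\
\text{log link:}\quad \eta_i &= \log \mu_i
  \;\Longrightarrow\; \mu_i = \exp\!\left(\mathbf{x}_i^{\top}\boldsymbol{\beta}\right).
\end{aligned}
```

Under the inverse link, a positive coefficient increases $\eta_i$ and therefore lowers the fitted mean; under the log link, a coefficient acts multiplicatively on the mean through $\exp(\beta_j)$.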
The model has a wide range of applications and is especially common in insurance for the analysis of claims data. The Gamma distribution is appropriate for strictly positive continuous random variables, such as the cost of claims or of losses from damage [11]. The model shows how each factor, for example the vehicle's age or model, affects the cost of claims.
The characteristics and uses of the Gamma distribution in statistical modeling are introduced in [12].
The book in [13] introduces generalized linear models, including a discussion of link functions and of the Gamma distribution, and is an excellent resource on binary and other kinds of dependent variables.
The textbook in [14] covers generalized linear models, of which the Gamma regression model is an example, and combines theoretical developments with worked examples.
Although centered on Bayesian methods, the work in [15] covers the Gamma distribution, generalized linear models, and their real-world applications, including insurance and economics.
The study in [16] investigates generalized linear models for over-dispersed count data, providing a broader perspective on the application of GLMs in econometrics.
The foundational textbook in [17] covers the theory and application of generalized linear models, including Gamma regression, with a focus on model fitting, diagnostics, and interpretation of results.
The seminal work in [18] introduced the GLM framework, covering various link functions and distributions, including the Gamma distribution, which is integral to this research.
The textbook in [19] offers extensive coverage of econometric techniques, including generalized linear models for various data types, with applications to the analysis of insurance and economic data.
A further paper extends generalized linear models to include random effects, which may be relevant when modeling more complex data structures, including insurance claims.
Another book generalizes logistic regression to other types of data, including the use of the Gamma distribution in regression models, and provides examples of applications in insurance and risk management.
5. Application
A generalized linear model with a Gamma distribution is suitable for the analysis of positive data [32]. The data set relates to the cost of compensation for damage to insured vehicles. The dependent variable, the average compensation amount, is assumed to follow a Gamma distribution, and the inverse link function connects its expected value to a linear combination of the independent variables [33] (policyholder age, vehicle group, and vehicle age). Because the average claim amounts are calculated from different numbers of claims, the number of claims is specified as the scale weight. The Gamma regression model is built as follows [34,35,36,37,38]:
For parameter estimation, Pearson chi-square is selected as the scale parameter method; this is the method used by McCullagh and Nelder, so the same results are obtained. The remaining options are left at their default settings [39,40].
The predicted values of the linear predictor and the standardized deviance residuals are selected to be saved to the data window, which helps in checking for problems in the model fit [41]. Clicking OK then produces the following results:
Table 1. Model Information.
| Dependent Variable | Average amount of billing |
| Probability Distribution | Gamma |
| Link Function | Power (-1) |
| Scale Weight Variable | Number of Bills |
The model information table (Table 1) shows the name of the dependent variable, the Gamma probability distribution, the inverse (power -1) link function, and the scale weight variable, which is the number of claims [42,43,44].
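To make the roles of the inverse link and the scale weight concrete, the following is a minimal sketch of fitting such a model by iteratively reweighted least squares. It is not the SPSS procedure used in the paper: the function names, the single 0/1 predictor, and the toy data are all hypothetical, and the scale estimate follows the Pearson chi-square method described above.

```python
def fit_gamma_inverse_link(x, y, w, iterations=25):
    """Fit mu = 1 / (b0 + b1 * x) by iteratively reweighted least squares."""
    eta = [1.0 / yi for yi in y]  # start from mu = y
    b0 = b1 = 0.0
    for _ in range(iterations):
        mu = [1.0 / e for e in eta]
        # Working weights and responses for the Gamma family with inverse link.
        ww = [wi * m * m for wi, m in zip(w, mu)]
        z = [e - (yi - m) / (m * m) for e, yi, m in zip(eta, y, mu)]
        # Weighted least squares of z on (1, x), solved in closed form.
        s = sum(ww)
        sx = sum(a * xi for a, xi in zip(ww, x))
        sxx = sum(a * xi * xi for a, xi in zip(ww, x))
        sz = sum(a * zi for a, zi in zip(ww, z))
        sxz = sum(a * xi * zi for a, xi, zi in zip(ww, x, z))
        det = s * sxx - sx * sx
        b0 = (sxx * sz - sx * sxz) / det
        b1 = (s * sxz - sx * sz) / det
        eta = [b0 + b1 * xi for xi in x]
    return b0, b1


def pearson_scale(x, y, w, b0, b1):
    """Weighted Pearson chi-square divided by residual degrees of freedom (n - 2)."""
    mu = [1.0 / (b0 + b1 * xi) for xi in x]
    chi2 = sum(wi * ((yi - m) / m) ** 2 for wi, yi, m in zip(w, y, mu))
    return chi2 / (len(y) - 2)


# Hypothetical toy data: x is a 0/1 group indicator, y the average claim
# amount of a cell, and w the number of claims behind that average.
x = [0, 0, 1, 1]
y = [200.0, 220.0, 100.0, 120.0]
w = [10, 30, 20, 20]

b0, b1 = fit_gamma_inverse_link(x, y, w)
print("fitted group means:", 1.0 / b0, 1.0 / (b0 + b1))
print("Pearson scale estimate:", pearson_scale(x, y, w, b0, b1))
```

Because the inverse link is the canonical link of the Gamma family, the fitted means in this two-group toy example coincide with the claim-weighted group averages (215 and 110), which shows how cells based on more claims pull the fit more strongly.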
Table 2. Processing Summary of Cases.
| | N | Percent |
| Included | 123 | 96.1% |
| Excluded | 5 | 3.9% |
| Total | 128 | 100.0% |
The case processing summary (Table 2) shows that 123 observations are included in the model and 5 are excluded because of missing values, for a total of 128.
The categorical variable information table (Table 3) displays the frequency distribution of each category of the three factors included in the model, showing both the count and the percentage for each.
Table 3. Categorical Variable Information.
| | | | N | Percent |
| Factor | Policyholder age | 60+ | 16 | 13.0% |
| | | 50-59 | 16 | 13.0% |
| | | 40-49 | 16 | 13.0% |
| | | 35-39 | 15 | 12.2% |
| | | 30-34 | 16 | 13.0% |
| | | 25-29 | 16 | 13.0% |
| | | 21-24 | 15 | 12.2% |
| | | 17-20 | 13 | 10.6% |
| | | Total | 123 | 100.0% |
| | Vehicle group | D | 28 | 22.8% |
| | | C | 31 | 25.2% |
| | | B | 32 | 26.0% |
| | | A | 32 | 26.0% |
| | | Total | 123 | 100.0% |
| | Vehicle age | 10+ | 28 | 22.8% |
| | | 8-9 | 31 | 25.2% |
| | | 4-7 | 32 | 26.0% |
| | | 0-3 | 32 | 26.0% |
| | | Total | 123 | 100.0% |
Table 4. Information on Continuous Variables.
| | | N | Minimum | Maximum | Mean | Std. Deviation |
| Dependent Variable | Average amount of billing | 123 | 11 | 850 | 231.14 | 117.048 |
| Scale Weight | Number of bills | 123 | 1 | 434 | 72.70 | 92.598 |
The continuous variable information table (Table 4) shows that there are only two quantitative variables, the dependent variable (the average compensation value) and the scale weight (the number of claims), and provides descriptive statistics for them.
Table 5, the goodness-of-fit table, provides two tests of the null hypothesis that the model fits the data adequately. The deviance and Pearson chi-square statistics equal 124.783 and 131.786, respectively; both are less than the tabulated critical value of 134.085 at the 0.05 level with 109 degrees of freedom, which indicates that the estimated model is suitable for the data. The same holds for the scaled versions of the two statistics. For the information criteria, smaller values indicate a better model; they are used to compare this model with models estimated under other distributions.
Table 5. Goodness of Fit.
| | Value | df | Value/df |
| Deviance | 124.783 | 109 | 1.145 |
| Scaled Deviance | 103.207 | 109 | |
| Pearson Chi-Square | 131.786 | 109 | 1.209 |
| Scaled Pearson Chi-Square | 109.000 | 109 | |
| Log Likelihood | -622.754 | | |
| Adjusted Log Likelihood | -623.873 | | |
| Akaike Information Criterion (AIC) | 1273.509 | | |
| Finite Sample Corrected AIC (AICC) | 1277.398 | | |
| Bayesian Information Criterion (BIC) | 1312.879 | | |
| Consistent AIC (CAIC) | 1326.879 | | |
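The entries of the goodness-of-fit table are linked by simple arithmetic, which the following sketch reproduces from the reported log likelihood. It assumes k = 14 estimated regression parameters (the intercept plus 13 factor-level coefficients, with the scale parameter not counted) and n = 123 cases; these counts and the standard textbook formulas are assumptions that happen to match the reported values, not a statement of how the software computes them.

```python
import math


def information_criteria(log_likelihood, k, n):
    """Standard information criteria from a log likelihood, k parameters, n cases."""
    minus2ll = -2.0 * log_likelihood
    aic = minus2ll + 2 * k
    aicc = aic + 2 * k * (k + 1) / (n - k - 1)
    bic = minus2ll + k * math.log(n)
    caic = minus2ll + k * (math.log(n) + 1)
    return {"AIC": aic, "AICC": aicc, "BIC": bic, "CAIC": caic}


# Values reported in the goodness-of-fit table.
deviance, pearson, df = 124.783, 131.786, 109

scale = pearson / df                # Pearson chi-square scale estimate
scaled_deviance = deviance / scale  # deviance divided by the scale estimate
ic = information_criteria(-622.754, k=14, n=123)

print(f"scale = {scale:.3f}, scaled deviance = {scaled_deviance:.3f}")
for name, value in ic.items():
    print(f"{name} = {value:.3f}")
```

Small differences in the last decimal place relative to the table are expected, because the inputs are themselves rounded to three decimals.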
Table 6. Omnibus Test.
| Likelihood Ratio Chi-Square | df | Sig. |
| 434.299 | 13 | 0.000 |
Dependent Variable: Average amount of billing.
Model: (Intercept), Policyholder age, Vehicle group, Vehicle age.
The omnibus test (Table 6) is a likelihood ratio test of the joint contribution of the independent variables. The -2 log-likelihood of the reduced model (the intercept-only model, without the effects of these variables) is compared with that of the full model, and the p-value tests the difference between the two models, that is, the significance of the parameters of the variables as a whole. Because the p-value is reported as 0.000 (less than 0.05), the independent variables contribute significantly to the model [46].
Table 7. Tests of Model Effects.
| Source | Type III Wald Chi-Square | df | Sig. |
| (Intercept) | 1440.320 | 1 | 0.000 |
| Policyholder age | 57.613 | 7 | 0.000 |
| Vehicle group | 136.430 | 3 | 0.000 |
| Vehicle age | 136.016 | 3 | 0.000 |
Dependent Variable: Average amount of billing.
Model: (Intercept), Policyholder age, Vehicle group, Vehicle age.
Table 7 shows that the effects of all the independent variables in the model are significant: every p-value is reported as 0.000, below the 0.05 significance level, and every Wald chi-square statistic exceeds its tabulated critical value [47,48].
The omnibus test and the tests of model effects indicate that the overall model is superior to the null (intercept-only) model and that each of the main effects contributes to the model. The table of parameter estimates shows the same values obtained by McCullagh and Nelder for the factor levels and the scale parameter [49,50]. The estimated marginal means help to explain the relationships between factor levels.
The parameter estimates table (Table 8) summarizes the effect of each factor at each level (category). The Wald statistic equals the square of the ratio of the coefficient to its standard error. If the p-value of the Wald statistic is small (less than 0.05), the parameter differs significantly from zero, and that level of the factor has a significant effect.
Table 8. Parameter Estimates.
| | | | 95% Wald Confidence Interval | | Hypothesis Test | | |
| Parameter | B | Std. Error | Lower | Upper | Wald Chi-Square | df | Sig. |
| (Intercept) | 0.003 | 0.0004 | 0.003 | 0.004 | 66.593 | 1 | 0.000 |
| [Policyholder age=8] | 0.001 | 0.0004 | 0.000 | 0.002 | 4.898 | 1 | 0.027 |
| [Policyholder age=7] | 0.001 | 0.0004 | 0.000 | 0.002 | 5.046 | 1 | 0.025 |
| [Policyholder age=6] | 0.001 | 0.0004 | 0.000 | 0.002 | 5.740 | 1 | 0.017 |
| [Policyholder age=5] | 0.001 | 0.0004 | 0.001 | 0.002 | 10.682 | 1 | 0.001 |
| [Policyholder age=4] | 0.000 | 0.0004 | 0.000 | 0.001 | 1.268 | 1 | 0.260 |
| [Policyholder age=3] | 0.000 | 0.0004 | 0.000 | 0.001 | 0.720 | 1 | 0.396 |
| [Policyholder age=2] | 0.000 | 0.0004 | -0.001 | 0.001 | 0.054 | 1 | 0.816 |
| [Policyholder age=1] | 0a | | | | | | . |
| [Vehicle group=4] | -0.001 | 0.0002 | -0.002 | -0.001 | 61.883 | 1 | 0.000 |
| [Vehicle group=3] | -0.001 | 0.0002 | -0.001 | 0.000 | 13.039 | 1 | 0.000 |
| [Vehicle group=2] | 3.765E-5 | 0.0002 | 0.000 | 0.000 | 0.050 | 1 | 0.823 |
| [Vehicle group=1] | 0a | | | | | | . |
| [Vehicle age=4] | 0.004 | 0.0004 | 0.003 | 0.005 | 88.175 | 1 | 0.000 |
| [Vehicle age=3] | 0.002 | 0.0002 | 0.001 | 0.002 | 53.013 | 1 | 0.000 |
| [Vehicle age=2] | 0.000 | 0.0001 | 0.000 | 0.001 | 13.191 | 1 | 0.000 |
| [Vehicle age=1] | 0a | | | | | | . |
| Scale | 1.209b | | | | | | |
Dependent Variable: Average amount of billing.
Model: (Intercept), Policyholder age, Vehicle group, Vehicle age.
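The link between the Wald chi-square column and the Sig. column of Table 8 can be checked with a short sketch. It assumes each statistic is referred to a chi-square distribution with 1 degree of freedom, as the df column indicates; the function names are illustrative. Recomputing the Wald statistic itself from the printed B and Std. Error values is only approximate, because those columns are rounded to three or four decimals.

```python
import math


def wald_chi_square(estimate, std_error):
    """Wald statistic: squared ratio of a coefficient to its standard error."""
    return (estimate / std_error) ** 2


def chi_square_p_value_df1(statistic):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


# Wald chi-square values reported for three policyholder age levels in Table 8.
for label, wald in [("age=8", 4.898), ("age=4", 1.268), ("age=2", 0.054)]:
    print(f"{label}: Wald = {wald:.3f}, p = {chi_square_p_value_df1(wald):.3f}")
```

The identity used here is that a chi-square variable with 1 degree of freedom is the square of a standard normal variable, so its upper tail equals the two-sided normal tail.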
Table 9. Estimated Marginal Means 1: Policyholder age (Estimates).
| | | | 95% Wald Confidence Interval | |
| Policyholder age | Mean | Std. Error | Lower | Upper |
| 60+ | 186.08 | 6.084 | 174.87 | 198.82 |
| 50-59 | 186.20 | 5.551 | 175.92 | 197.75 |
| 40-49 | 184.38 | 5.155 | 174.80 | 195.07 |
| 35-39 | 171.70 | 5.516 | 161.53 | 183.24 |
| 30-34 | 203.40 | 6.930 | 190.67 | 217.95 |
| 25-29 | 208.15 | 7.482 | 194.46 | 223.93 |
| 21-24 | 219.52 | 10.824 | 200.17 | 243.00 |
| 17-20 | 224.51 | 20.690 | 190.16 | 274.00 |
Table 9 presents the model-estimated marginal means of the average claim cost (the dependent variable) at each level of policyholder age, along with their standard errors and confidence intervals. This information helps in examining how the average claim cost varies across age categories. Here, the marginal means range from 171.70 in the 35-39 age group to 224.51 in the 17-20 age group. To determine whether these differences are statistically meaningful or merely the result of random variation, one should refer to the associated test statistics [51,52].
Table 10. Individual Test Results.
| Policyholder age Repeated Contrast | Contrast Estimate | Std. Error | Wald Chi-Square | df | Sequential Sidak Sig. |
| Level 60+ vs. Level 50-59 | -0.13 | 6.093 | 0.000 | 1 | 0.984 |
| Level 50-59 vs. Level 40-49 | 1.82 | 5.068 | 0.129 | 1 | 0.978 |
| Level 40-49 vs. Level 35-39 | 12.68 | 5.458 | 5.397 | 1 | 0.115 |
| Level 35-39 vs. Level 30-34 | -31.70 | 6.705 | 22.351 | 1 | 0.000 |
| Level 30-34 vs. Level 25-29 | -4.75 | 7.359 | 0.417 | 1 | 0.946 |
| Level 25-29 vs. Level 21-24 | -11.36 | 10.709 | 1.125 | 1 | 0.818 |
| Level 21-24 vs. Level 17-20 | -5.00 | 21.812 | 0.052 | 1 | 0.978 |
Table 10 compares consecutive policyholder age groups and tests whether each observed difference is statistically significant or due to chance. The only significant difference is between the 30-34 and 35-39 age groups (p-value reported as 0.000). This suggests that these two groups are distinct, with the younger group showing higher compensation costs. There is also some indication, though not statistically significant, that the claim cost first decreases and then increases as policyholder age increases.
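The adjusted significance values in Table 10 can be approximately reproduced from the reported contrast estimates and standard errors. The sketch below assumes that "Sequential Sidak" denotes the usual step-down Sidak procedure (the smallest raw p-value is adjusted for all m comparisons, the next for m - 1, and so on, with monotonicity enforced); the function names are illustrative and the procedure is an assumption, not a description of the software's internals.

```python
import math


def p_value_df1(wald):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(wald / 2.0))


def sequential_sidak(p_values):
    """Step-down Sidak adjustment; returns adjusted p-values in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        candidate = 1.0 - (1.0 - p_values[i]) ** (m - rank)
        running_max = max(running_max, candidate)  # keep adjusted values monotone
        adjusted[i] = running_max
    return adjusted


# Contrast estimates and standard errors as reported in Table 10, top to bottom.
estimates = [-0.13, 1.82, 12.68, -31.70, -4.75, -11.36, -5.00]
std_errors = [6.093, 5.068, 5.458, 6.705, 7.359, 10.709, 21.812]

wald = [(e / s) ** 2 for e, s in zip(estimates, std_errors)]
raw_p = [p_value_df1(v) for v in wald]
adjusted = sequential_sidak(raw_p)

for v, p in zip(wald, adjusted):
    print(f"Wald = {v:7.3f}, adjusted p = {p:.3f}")
```

Because the inputs are rounded, the first row (where the estimate is close to zero) may differ from the table in the third decimal; the other rows agree with the reported values to within rounding.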
Table 11. Summary of Test Results.
| Wald Chi-Square | df | Sig. |
| 47.925 | 7 | 0.000 |
The overall test table (Table 11) presents the joint test of all the contrasts listed in the individual test results table. A p-value reported as 0.000 indicates a significant difference in insurance compensation costs across the levels of policyholder age. Note that this test is not the same as the model effects test for policyholder age: the latter is conducted on the transformed (link) scale, whereas the estimated marginal means are calculated on the original scale.
Table 12. Estimates.
| | | | 95% Wald Confidence Interval | |
| Vehicle group | Mean | Std. Error | Lower | Upper |
| D | 239.87 | 9.513 | 222.57 | 260.08 |
| C | 200.98 | 6.168 | 189.58 | 213.84 |
| B | 177.71 | 4.668 | 169.02 | 187.34 |
| A | 178.91 | 5.871 | 168.09 | 191.20 |
Table 12 shows the estimated marginal means of the average cost of insurance compensation at each level of the vehicle group factor, together with their standard errors and confidence intervals. This table is useful for exploring the differences between the levels of this factor. In this example, the marginal means range from 177.71 for group B vehicles to 239.87 for group D vehicles. To find out whether these values represent real differences or are likely due to chance, look at the test results.
Table 13. Pairwise Comparisons.
| | | | | | | 95% Wald Confidence Interval for Difference | |
| (I) Vehicle group | (J) Vehicle group | Mean Difference (I-J) | Std. Error | df | Sequential Sidak Sig. | Lower | Upper |
| D | C | 38.89a | 6.953 | 1 | 0.000 | 21.57 | 56.21 |
| D | B | 62.16a | 7.338 | 1 | 0.000 | 42.85 | 81.47 |
| D | A | 60.96a | 8.527 | 1 | 0.000 | 39.06 | 82.87 |
| C | D | -38.89a | 6.953 | 1 | 0.000 | -56.21 | -21.57 |
| C | B | 23.27a | 4.406 | 1 | 0.000 | 12.75 | 33.79 |
| C | A | 22.07a | 6.020 | 1 | 0.000 | 8.61 | 35.54 |
| B | D | -62.16a | 7.338 | 1 | 0.000 | -81.47 | -42.85 |
| B | C | -23.27a | 4.406 | 1 | 0.000 | -33.79 | -12.75 |
| B | A | -1.20 | 5.378 | 1 | 0.824 | -11.74 | 9.34 |
| A | D | -60.96a | 8.527 | 1 | 0.000 | -82.87 | -39.06 |
| A | C | -22.07a | 6.020 | 1 | 0.000 | -35.54 | -8.61 |
| A | B | 1.20 | 5.378 | 1 | 0.824 | -9.34 | 11.74 |
Pairwise comparisons of estimated marginal means based on the original scale of the dependent variable, Average amount of billing.
The pairwise comparison table (Table 13) shows the difference between each pair of vehicle groups and tests whether each difference could be due to chance. Only the comparison between groups A and B is not statistically significant (p = 0.824). These two groups are associated with the lowest claim costs, followed by group C, and then group D, which has the highest claim cost.
Table 14. Overall Test Results.
| Wald Chi-Square | df | Sig. |
| 78.468 | 3 | 0.000 |
The Wald chi-square tests the effect of the Vehicle group. This test is based on the linearly independent pairwise comparisons among the estimated marginal means.
Table 14 shows the overall test of all the contrasts in the pairwise comparisons table; a p-value reported as 0.000 indicates that the cost of insurance compensation differs significantly across the levels of the vehicle group. This test is not equivalent to the model effects test for the vehicle group, because the model effects test is performed on the transformed scale, while the estimated marginal means are calculated on the original scale.
Table 15. Estimates.
| | | | 95% Wald Confidence Interval | |
| Vehicle age | Mean | Std. Error | Lower | Upper |
| 10+ | 129.85 | 7.404 | 116.80 | 146.19 |
| 8-9 | 192.35 | 8.227 | 177.48 | 209.95 |
| 4-7 | 255.50 | 5.976 | 244.30 | 267.78 |
| 0-3 | 281.89 | 6.620 | 269.48 | 295.49 |
Table 15 displays the model-estimated marginal means, along with the associated standard errors and confidence intervals, for the average cost of insurance compensation across the levels of the vehicle age factor. It serves as a useful tool for examining variations among these levels. In this case, the estimated marginal means range from a low of 129.85 for vehicles aged 10 years or more to a high of 281.89 for vehicles aged between 0 and 3 years. To assess whether these observed differences are statistically significant or potentially due to random variation, one should consult the corresponding test results.
Table 16. Individual Test Results.
| Vehicle age Repeated Contrast | Contrast Estimate | Std. Error | Wald Chi-Square | df | Sequential Sidak Sig. |
| Level 10+ vs. Level 8-9 | -62.50 | 10.889 | 32.947 | 1 | 0.000 |
| Level 8-9 vs. Level 4-7 | -63.15 | 9.544 | 43.783 | 1 | 0.000 |
| Level 4-7 vs. Level 0-3 | -26.38 | 7.268 | 13.175 | 1 | 0.000 |
Table 16 shows the differences between successive vehicle age groups and tests whether each difference could be due to chance. All contrasts are statistically significant, with every p-value reported as 0.000, which is strong evidence that the cost of insurance compensation decreases as the vehicle ages.
Table 17. Overall Test Results.
| Wald Chi-Square | df | Sig. |
| 282.315 | 3 | 0.000 |
Table 17 shows the overall test of all the contrasts in the individual test results table; a p-value reported as 0.000 indicates that the cost of insurance compensation differs significantly across the levels of vehicle age. This test is not equivalent to the model effects test for vehicle age, because the model effects test is performed on the transformed scale, while the estimated marginal means are calculated on the original scale.
To verify the specification of the model, examine the plot of the standardized deviance residuals against the predicted values of the linear predictor. The scatter plot is shown in Figure 1:
Figure 1. Simple Scatter Standardized Residual.
A few points lie outside the main cluster and could be investigated further; otherwise, nothing in the plot looks suspicious.
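For readers who want to see what is being plotted, the following sketch computes a deviance residual for a Gamma model and divides it by the square root of the scale estimate. It is a simplified illustration with assumed function names: a fully standardized residual would also divide by the square root of one minus the leverage of the observation, which is omitted here, and the numbers in the example are hypothetical.

```python
import math


def gamma_deviance_residual(y, mu, weight=1.0):
    """Signed square root of the weighted Gamma unit deviance."""
    unit_deviance = 2.0 * ((y - mu) / mu - math.log(y / mu))
    unit_deviance = max(unit_deviance, 0.0)  # guard against tiny negative rounding
    return math.copysign(math.sqrt(weight * unit_deviance), y - mu)


def scaled_deviance_residual(y, mu, weight, scale):
    """Deviance residual divided by sqrt(scale); the leverage term is omitted."""
    return gamma_deviance_residual(y, mu, weight) / math.sqrt(scale)


# Hypothetical cell: observed average 250, fitted mean 200, 10 claims,
# with the scale estimate of 1.209 reported in the parameter estimates table.
print(scaled_deviance_residual(250.0, 200.0, weight=10, scale=1.209))
```

Residuals computed this way are positive when the observed average exceeds the fitted mean and negative otherwise, so a plot against the linear predictor should scatter around zero without a visible pattern if the model is well specified.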
7. Conclusions
The Gamma Regression Model is a powerful tool for analyzing data that follow a Gamma distribution. The logarithmic link function proved more effective than the inverse link function in modeling the cost of insurance claims. By applying this model, insurance companies can better understand the factors influencing claims costs, leading to more informed decisions and improved risk management strategies.
The Gamma Regression Model offers a robust and flexible framework for modeling positive continuous data, particularly when the data exhibit a skewed distribution. This model, part of the Generalized Linear Models (GLM) family, is especially useful in cases where traditional assumptions of normality do not hold, such as in the analysis of insurance claims, environmental damage costs, or other economic applications.
In this study, the application of the Gamma Regression Model using SPSS has demonstrated its effectiveness in analyzing the cost of insurance claims. By applying various link functions, specifically the logarithmic link function, the study found that it provides the best fit compared to the inverse link function, as evidenced by lower deviance statistics and Pearson Chi-Square values. This highlights the importance of selecting the appropriate link function for better model accuracy.
The results from this study also emphasize the significance of independent variables such as vehicle age and vehicle type in determining the cost of claims. The statistical significance of these variables provides valuable insights into the factors that drive insurance costs, allowing insurance companies to make more informed decisions in terms of risk management and pricing strategies.
Overall, the Gamma Regression Model proves to be an essential tool for analyzing positive, skewed data, and its flexibility makes it a highly valuable technique in multiple fields, including finance, insurance, and environmental economics.