Graphical Diagnostic for Mortality Data Modeling

The main contribution of this paper is to develop a graphical tool to evaluate the suitability of a candidate parametric model to fit a data set. The practical motivation is to check the adequacy of the so called SAINT model proposed in Jarner and Kryger (2011). This model has been implemented in practice by an important European pension fund concerned with setting annuity reserves for all current or former employees of Denmark. So, one particular relevant question is whether this mortality model is actually fitting old-age. Our graphical test can be described as follows. First we transform the data by means of the parametric model which is being evaluated. Should the model be correct, the transformed data will be in accordance with an Exponential distribution with rate 1. Then we construct a family of local linear hazard estimates based on the data on the transformed scale. Finally we use the statistical tool SiZer to assess the goodness-of-fit of the Exponential distribution to the data on the transformed scale. If the model is correct the SiZer map should not reveal any significant feature in the family of kernel estimates. Our method allow us to establish a diagnostic regarding the validity of the SAINT model when describing mortality patterns in four different countries.


Introduction
The main contribution of this paper is to develop a graphical tool to evaluate the suitability of a candidate parametric model to fit a data set.The practical motivation is to check the adequacy of the so called SAINT model proposed in Jarner and Kryger (2011).This model has been implemented in practice by an important European pension fund concerned with setting annuity reserves for all current or former employees of Denmark.So, one particular relevant question is whether this mortality model is actually fitting old-age.
The SAINT model has been evaluated previously in Spreeuw, Nielsen and Jarner (2013), where a visual inspection technique is introduced in order to detect the credibility of the parametric assumptions of the proposed model.The technique consists of first transforming the original mortality data according to the proposed parametric hazard model, and then non parametrically estimate the density based on transformed data.The density of the transformed data should be close to the uniform distribution when the model assumptions are correct.To estimate the density on the transformed axis the local linear density estimator based on filtered data and proposed in Nielsen, Tanggaard and Jones (2009) is used.
In this paper we focus on hazards estimation instead of the densities.In Section 1 we describe the parametric frailty hazard model (SAINT model) to be tested and describe the estimation procedure of the parameters involved.In Section 2 we present the local linear fitting of the hazard and its first derivative in the context of aggregated survival data.Section 3 is devoted to the description of the exploratory tool SiZer.In Section 4 we carry out a study with real mortality data where we evaluate the goodness-of-fit of the SAINT frailty model in different scenarios determined by mortality data concerning four different countries and two calendar years.Conclusions are given in Section 5.

Model specifications
The parametric model for mortality data suggested in Jarner and Kryger (2011), SAINT model is a multiplicative frailty model where the mortality rate of a individual is decomposed in two factors: one is a standard intensity function and the other is a non-negative random variable, the frailty.In Spreeuw et al. (2013) this model is evaluated.The authors discuss the adequacy of several candidates for the frailty variable, a Gamma model, an Inverse Gaussian model, and degenerate (i.e.no frailty) to properly adjust hazard rates based on mortality data.In their study the authors introduce a visual test to assess the validity of the mixed model to explain mortality rates based on data concerning female population of ages ranging from 40 onwards and coming from four different countries, i.e., United States, United Kingdom, Denmark and Iceland, during the year 2006.In their conclusions, the authors argue that the Gamma frailty model provides a more accurate fit to the data in all real cases considered, specially at the oldest old-ages for which the Gamma frailty model better accommodate to the presence of an old-age mortality plateau.The parametric mixed mortality model considered generalizes the classical Gompertz survival model by including more parameters and by including a multiplicative frailty component.
Specifically, the frailty model presented in Jarner and Kryger (2011) and Spreeuw et al.( 2013) introduces the individual effect for life's mortality multiplicatively.So, it is assumed that the individual effect for a particular subject is described by a random variable X, in the sense that the conditional mortality hazard at age t, given X = x, is expressed as where α 0 (t) represents the baseline mortality hazard at age t, that is, the hazard corresponding to an individual with frailty level x = 1.Also, it is assumed that, given a cohort of n individuals, X 1 , X 2 , . . ., X n are independent and identically distributed with mean 1.
We adopt here the specifications given in Jarner and Kryger (2011) for national and international mortality modeling, later analyzed in Spreeuw et al. (2013), which consist of defining the baseline mortality hazard as (2) Also, following the recommendation of Spreeuw et al. (2003) we consider for the frailty X a gamma distribution with mean 1 and variance σ 2 .From Equation (1), the relationship between the corresponding survival functions can be written as S x (t) = e −Λ x (t) = e −xΛ 0 (t) , where Λ x (t) = t 0 α x (s)ds is the cumulative hazard mortality for an individual for which X = x and Λ 0 (t) = t 0 α 0 (s)ds is the baseline cumulative hazard.Then the cohort survival function is given by where, L X (s) = E e −sX denotes the Laplace transform of the distribution of the frailty random variable, X.From here we can express the frailty model as Considering (2) and that X has a gamma distribution as stated above, L X (s , we have that the SAINT model specifies the cohort mortality hazard as

Maximum likelihood estimation of the parametric model
Consider a data set consisting of mortality statistics of n lives.More specifically, the information we register is related, on the one hand to the exposure process Y i , which takes value 1 when individual i is alive and under observation, and, on the other hand, N i , the counting process taking the value 1 when the individual i has died while under observation and 0, otherwise.We consider both Y i and N i depending on the age t, in such a way that we assume that N i is a one-dimensional counting process with respect to an increasing right continuous complete filtration F t , t > 0. Also we assume that the corresponding intensity function of N i satisfies the multiplicative model of Aalen (see for example Andersen et al., 1993) where, as in Spreeuw et al. (2013), we consider that {α θ , θ ∈ Θ} is a parametric family of hazard functions to which the true hazard function α = α θ belongs.In other words, the exact mortality is determined except for the value of the parameter vector θ which is estimated by maximum-likelihood methods, that is, θ = arg max θ∈Θ L(θ), with

Transformation of data
In this section we use the parametric fit explained in the previous section to transform the data in such a way that the underlying hazard would become constant in case the parametric fit indeed would have been the correct underlying model α.The parametric fit is in other words used as a kind of prior knowledge to simplify the non-parametric estimation problem.If the prior knowledge is of high quality and the parametric model has good approximation properties, then the resulting non-parametric problem is simplified in the second step compared to a fully non parametric treatment of the original dataset (not transformed).Next a local linear hazard estimator is applied to the transformed data.If the parametric specification α θ were true then after the time transformation x = Λ θ (t) with Λ θ (t) = t 0 α θ (s)ds, the functional form of the hazard on the transformed scale would be simply equal to the standard Exponential, i.e. the hazard would be equal to 1 on the transformed scale.
For a given θ ∈ Θ we put This transformed process follows Aalen's multiplicative hazard model with transformed stochastic hazard where , α is the true hazard and α θ is the assumed parametric value.We now carry out our non-parametric local linear smoothing technique on the transformed processes N θ i and Y θ i and obtain an estimate g θ of g θ on the transformed time axis.As in Spreeuw et al. (2013) we use a parametric mixed hazard model based on a gamma frailty mortality model and we estimate the parameters by a standard maximum likelihood procedure.Then we transform the survival data as just described.If the parametric model is correct, then the estimated hazard α θ should be very close to the true curve, and in consequence the transformed time scale data would produce a sample from an Exponential distribution with rate 1.In other words the estimated curve g θ should be very close to the line y = 1.

Local linear estimation with aggregated data
In this section we present a local linear estimation method for aggregated survival data which is based on a least squares minimization problem.We follow the general formulation proposed in Nielsen and Tanggaard (2001) and we obtain an estimator for the hazard function as well as for its first derivative.Also we derive an expression for the variance of the estimator of the derivative of the hazard function needed to develop the SiZer tool.The details are given in the following.We observe n individuals i = 1, . . ., n.Let N i count observed failures for the ith individual in the time interval [0, T].N i can take values 0 or 1.We assume that N i is a one-dimensional counting process with respect of an increasing, right continuous, complete filtration F t , t ∈ [0, T], i.e. it obeys les conditions habituelles.We assume Aalen's multiplicative model where the random intensity is written as with no restriction on the functional form of the hazard function α(•).Y i is a predictable process taking values in {0, 1}, indicating (by the value 1) when the ith individual is at risk.We assume that (N 1 , Y 1 ) , . . . ,(N n , Y n ) are i.i.d. for the n individuals.Note that this formulation contains the important particular case of a longitudinal study with left truncation and right censoring.In that case we observe triplets (L i , Z i , δ i ) (i = 1, .., n) where L i is the time an individual enters the study, Z i is the time he/she leaves the study and δ i is binary and equal to 1 if death is the reason for leaving the study (censoring indicator).Then, the process Y i above would be , where I(•) is the indicator function.Hereafter we will work in the general model.Expressions for particular sampling schemes such as left truncation and right censoring can be derived just by substituting the particular expressions of the processes.
The data that we will use are divided into discrete numbers of occurrences and exposures.We suppose that the following aggregated values of occurrences and exposures are available for r = 1, . . ., m.Here, x 1 , . . ., x m are some time points.For simplicity in the formulation of the problem we consider this points equidistant however we can easily generalize for the non equidistant case.With ∆ r = x r − x r−1 we can define Y r = E r /∆ r .This is the average number of individuals which are at risk in the interval [x r−1 , x r ) for r = 1, . . ., m and with x 0 = 0.
In this situation, we define an empirical (naive) estimator of α r as From theorem A.1 (i) and (ii) in Wang, Muller and Capra (1998) it can be deduced (see also Bagkavos, 2011) that Moreover, the pairs (x r , α r ), r = 1, 2, . . ., M can be considered approximately independent.Under a local linear approach we assume that, for each x r , the hazard function α can be locally approximated (in a neighborhood of x r ) by a linear function, this is, for all x ∈ (x r − h, x r + h), α(x) = θ 0 + θ 1 (x − x r ), for an appropriate bandwidth h.Thus, at the grid point x r (r = 1, . . ., M), an estimation of the intercept θ 0 would provide an estimation of α (x r ), whereas an estimation of the slope θ 1 would provide an estimation of the derivative α (x r ).
To this goal, we formulate the following least squares problem where α m = O m E m .The solution to the equations above is easy to obtain and can be expressed as • The first derivative of the hazard function, α where, as usual, we define the moments S j , for j = 0, 1, 2, and, for j = 0, 1, we denote , with K a symmetric kernel function.Using matrix notation, we can write the estimators as follows and where and Denoting A x = −e t 2 X t WX −1 X t W, then the variance of the estimator of α (x) can be expressed as where, using independence, we have that Expanding the expression in (9) we get where R = X t WVWX.
On the one hand, it can be checked that and, on the other hand, we have that R is a symmetric matrix with entries After easy algebra we obtain that Finally, we need to estimate the variance of α r , i.e.V r , for r = 1, . . ., M. To do it we use an appropriate estimator of the survival function and plug in the expression (6).So we propose

SiZer description
SiZer is a graphical device for the display of significant features in data with respect to a localization space and a scale space.It was proposed by Chaudhuri and Marron (1999) in the context of univariate density and regression.The authors described SiZer as a powerful exploratory tool to allow the visual assessment of significance of features in data.In real data analysis to discover new features in the data such as "bumps" or increasing/decreasing patterns might lead to important new scientific insights.In the context of goodness-of-fit and graphical residuals analysis, features and monotonic or shaped patterns in residuals lead to the rejection of postulated models.The exploratory power of the SiZer analysis has been applied recently for goodness-of-fit purposes.Some examples are Marron and Zhang (2005) 2011) for testing uniformity in homogeneous Poisson processes.All these works differ from standard nonparametric methods for solving tests, which make use of difficult choices of bandwidths and provide global information that does not allow adequate diagnostics.The key idea of the SiZer analysis for goodness-of-fit purposes is to visualize the inference problem locally and considering many different scales.Since useful information can be available at different levels of smoothing, the problem is analysed considering a wide range of smoothing levels that correspond to resolution levels.Besides possible local deviations from the postulated model can be located at specific areas in the localization space.
In this paper we are interested in estimating the hazard function, more specifically the hazard for data transformed under a parametric null hypothesis.The idea is to check the adequacy of a postulated model by assessing significance of features in the hazard function of the transformed data.If the model does fit the data then the hazard for the data transformed using the model should be constant, with no monotonic trend neither features.Features such as bumps are characterised by the significant zero crossing of first derivative of the target curve.At a bump there is a zero crossing of the derivative, curve first goes up (positive derivative) and then goes down (negative derivative).Chaudhuri and Marron (1999) described the SiZer analysis from two plots, the "family plot" that is the family of smoothers for the target function indexed by the bandwidth parameter, and a coloured map which displays the scale space inference about the first derivative of the function as follows: for each bandwidth (scale) and value in the support (localization), a confidence interval for the first derivative is calculated and the results about the sign are displayed in a two-dimensional space (map) using a colour code.Considering n h scales and n grid localizations, each pixel in the n grid × n h map is colour-coded as blue if zero is greater than the upper confidence limit (significantly increasing), red if zero is less than the lower confidence limit (significantly decreasing), purple if zero is within the confidence limits (not significantly increasing or decreasing), and gray indicating regions where the data are too sparse to make statements about significance (the effective sample size defined below is less than 5).From the map the features in data are characterized by changes in colour so a change from blue to red displays a significant bump.
Figure 1 shows the SiZer analysis for simulated data from a parametric hazard model.The details about how this SiZer has been constructed are given in the next section, now we consider this just as an example to facilitate the understanding of readers not familiarised with this graphical tool.The family plot shows the local linear hazard estimates with different choices of the bandwidth parameter.This gives an idea of the shape of the underlying hazard that the considered smoothing method is able to capture.The bigger bandwidths produce oversmoothed curves showing the general trend in the curve while small bandwidths provide a very detailed description of the hazard with many wiggles.The answer about which features are really there and whether any of these wiggles actually correspond to a significant feature is answered by looking at the colour map in Figure 1.This map shows the result of the inference about the first derivative of the hazard.In the vertical axis different smoothing levels in logarithmic scale are represented so drawing a horizontal line at a given value in the range we can see the inference results a that particular value.Following the spirit of the scale space inference real features in data will appear at some of these scales so the analyst should look at all of them to conclude about the significance.On the other hand the horizontal axis shows the values where the hazard (in this case the first derivative) is evaluated, so placing a vertical line in the map we can see the results for the inference at the corresponding point.With this in mind and the colour code described above we can see that the underlying hard is significantly increasing at the beginning (blue colour up to about 0.1), then there is a zero crossing (confidence intervals contain zero between 0.1 and 0.3) and the hazard goes down (red colour up to about 0.5).The change of colours indicates a significant bump around 0.2.Now, if we continue moving the localization up to the right then we can see that a second bump seems to occurs around the value 0.8.Note that the blue colour before 0.8 shows significant increasing interrupted by a zero crossing of the derivative (purple) and ending with significant decrease (red).The data used to construct these plots have been simulated from a distribution with hazard rate obtained as a mixture of Beta functions, that is α(t) = B(t; 6, 2) + B(t; 2, 6), where B(t; a, b) is the density function of a Beta distribution with parameters a, and b.Note that the inference with SiZer is able to detect the two features in the true hazard, and it has been possible considering many different bandwidth choices.

SiZer specifications
We now describe the main elements to construct SiZer for inference about a hazard function in the context of aggregated mortality data described in the previous sections.The first choice to define the SiZer analysis is about the family of smoothers to be considered.Our proposal is the local linear estimator of the hazard given in (7), which works from common aggregated mortality data.For the scale space inference about the first derivative we again propose the local linear estimator given in (8), and the estimate of its variance given in (9).With this estimator and its variance we can compute confidence intervals for the first derivative using the expression Here p is the significance level, and q 1−p/2 , q p/2 are appropriate quantiles.To calculate the quantiles q 1−p/2 , q p/2 we suggest to use the a Normal approximation.Note that kernel estimates are local averages so the pointwise asymptotic normality of these estimators can be used to construct pointwise confidence intervals, just by taking q 1−p/2 and q p/2 equal to the (1 − p/2) and p/2 quantiles of the standard Normal distribution, respectively.However SiZer Map is a simultaneous display of statistical inferences, so there is a multiple comparison issue which needs to be addressed.We consider simultaneous inference for SiZer by defining m independent confidence interval problems, where m reflects the number of independent blocks.Thus, the simultaneous Normal quantiles in SiZer are calculated as follows Here Φ −1 is the inverse of the standard Normal distribution function and m, which depends on the bandwidth h, m ≡ m(h), is the number of independents blocks of average size available from a dataset of size n.The estimation of m is based on an estimate of the effective sample size (ESS), which reflects the "number of data points in each kernel window".For the kernel estimators of the intensity the ESS can be computed as And the number of independent blocks for each h is estimated as where avg t D h denotes the average value on the set D h = {t : ESS(t, h) ≥ 5}.

A motivating example to illustrate the strength of SiZer
We now show with an example the versatility of SiZer when analysing mortality data.We consider the young male population from Finland and we are interested in evaluating how mortality rate corresponding to this section of the population has evolved along the years.Specifically we estimate mortality rate for male population aged 25 years old from calendar year 1906 to the year 2012.The results are shown in Figure 2. According to the plots we can deduce that the mortality rate exhibits a long-term decreasing trend.However two significant features are highlighted in the SiZer map.The colour change blue-purple-red in the first calendar years reveals the existence of a bump around the year 1918.Also another even more prominent bump is located around the years 1943-1945 (see again the colour change blue-purple-red around these years).These two bumps indicate an unusual increase of the mortality rate for the section of population considered here (male population 25 years old) compared to the decreasing evolutionary trend of the mortality along the whole period of time considered (calendar years 1906 to 2012).Notice that this local behavior of the mortality can be explained as a consequence of the two World Wars that took place during the periods 1914-1918 and 1939-1945,

Graphical inspection of a candidate mortality model based on SiZer analysis
In this section we illustrate the use of SiZer analysis to check the adequacy of a candidate mortality model.We consider an application to real data based on the previous work of Spreeuw et al.The first step to check the adequacy of the model is to estimate the parameters by maximum likelihood.Using the methods described in Section 1.2 we have derived these estimates for the data in 2006 and 2013, the results are shown in Table 1.Then, from the estimated parametric hazard α θ and the corresponding cumulative hazard Λ θ (t) = t 0 α θ (s)ds, the data are transformed as x = Λ θ (t) (see Section 1.3.Finally the inference is performed about the underlying hazard of the transformed data.If the parametric model is correct then we should expect a hazard estimate close to the standard exponential model, i.e., constant and equal to 1. Significant features in the underlying hazard or significant monotone patterns would lead to reject the candidate model.To detect possible violations from the assumed model at each country and year we have constructed the SiZer plots with the transformed data.Figures 3 to 6 show the results for the year 2006 and the updates for the 2013 are shown in Figures 7 to 10.The total absence of red/blue colour in the maps confirm that there are not significant deviations from the Gamma frailty model for any of the countries in 2006.Also looking at the family plots we can see that the family of local linear hazards from transformed data are nearly constant around 1, as expected if the model is correct.This is the same conclusion that Spreeuw et al. (2013) drew in their paper.However the results from data in 2013 are slightly different.For the smaller countries Denmark and Iceland, the red-colour pattern in the maps for some of the considered bandwidths reveals a significant deviation from the constant line, which again would be expected if the parametric model is correct.A similar issue can be observed for US and UK since blue (significant increasing) and red colour (significant decreasing) arise in the maps from the 2013 data.Even these deviations are not very strong they suggest that the models are not describing the mortality rates in 2013 as well as they did in 2006.

Conclusions
In this paper we have developed the graphical exploratory tool SiZer map for inference of mortality rates with aggregated data.Using local linear smoothers for the hazard rate and its first derivative the developed SiZer tool is able to detect and confirm the underlying features in the hazard rate such as monotone patterns or local bumps.Besides parametric mortality models can be checked using a transformation approach and afterwards performing the SiZer analysis of the transformed data.The strength of the proposed SiZer analysis has been demonstrated on real mortality data and relevant conclusions have been drawn about a commonly used SAINT frailty model for this kind of data.
for testing univariate parametric regression models and monotonicity; González-Manteiga et al. (2008) for testing additive regression models.Park and Kang (2008) and Park et al. (2014) for the comparison of regression curves; Park et al. (2004, 2009) and Rondonotti et al. (2007) for significance of trends in times series; Park et al. (

Figure 1 .
Figure 1.Illustration of SiZer with simulated data.
(2013)  andGamiz et al. (2013).As these authors we analyse mortality data of four countries differing significantly in terms of population size, namely United States (US), United Kingdom (UK), Denmark and Iceland.For each country, the data set consists of female period mortality data, obtained from the Human Mortality Database, and concerning the calendar years 2006 and 2013, the last year from which we have complete datasets for the four countries available at Human Mortality Database.Spreeuw et al. (2013) used these data to check whether the SAINT model that is being used by the major Danish Pension Fund ATP for purposes of asset-liability management is a valid for representing old age mortality.Including the most recent available year 2013 we aim to check whether their conclusions from data in 2006 are still valid in 2013.In the analysis, only the ages from 40 to 110 are included and, as recommended in Spreeuw et al. (2013), we consider a Gamma frailty model as the candidate model to represent the data in the four countries.A visual inspection of a local linear estimator of the density based on transformed data for a single bandwidth choice they showed that the model represented fine the mortality patterns of adult at old ages.With the SiZer analysis introduced in this paper we are now able to assess the significance of these conclusions, avoiding the problem of the non-trivial bandwidth choice, see Gamiz et al. (2013) for a discussion of this issue.

Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 18 July 2016 doi:10.20944/preprints201607.0047.v1Table 1 .
Maximum likelihood estimations of the parameters of the Gamma frailty hazard model for each country and calendar year.The results concerning the year 2006 have been obtained from Spreeuw et al. (2013).

www.preprints.org) | NOT PEER-REVIEWED | Posted: 18 July 2016 doi:10.20944/preprints201607.0047.v1
SiZer analysis from mortality data in UK in 2006, transformed under a Gamma frailty hazard model.SiZer analysis from mortality data in Denmark in 2006, transformed under a Gamma frailty hazard model.