Limit law of the local linear estimate of the conditional hazard function for functional data

In this work, we treat a prediction problem via the conditional hazard function of a scalar response variable Y given a functional random variable X, using the local linear technique. The main purpose of this paper is to establish the asymptotic normality of the nonparametric estimator of the conditional hazard function under some general conditions. A simulation study, conducted to assess finite-sample behavior, demonstrates the superiority of our method over the standard kernel method.


Introduction
Functional nonparametric statistics has been the subject of many studies over the last decade. It is devoted to the study of functional data, i.e., random variables which are curves or surfaces, which appear more and more frequently in many scientific fields (see, for instance, Müller et al. [31] on biological data and Chiou et al. [11] on demographic data). This field of modern statistics has attracted the attention of many statisticians. From a historical point of view, the first works in this area date back to Grenander [24] and Dauxois et al. [13], and the field was popularized by Ramsay and Silverman [34] and Bosq [8] for parametric models, and by Ferraty and Vieu [23] for functional nonparametric models.
In this paper, we focus on the nonparametric conditional hazard function model. Many authors have treated this model in their investigations; we refer to Watson and Leadbetter [37] for the first results on the nonparametric estimation of the hazard function by the kernel method. Ferraty et al. [21] studied the almost complete convergence of a kernel estimator of the conditional hazard function in the case where the observations are independent and identically distributed. Quintela-del Río [32] established the almost complete convergence and the asymptotic normality of the estimator presented by Ferraty et al. [21]. The asymptotic mean square error of the conditional hazard function estimate was studied by Rabhi et al. [33]. Recently, Merouan et al. [30] established the mean square error and the convergence rates of the estimator based on the local modeling approach of the conditional hazard function.
It should be noted that the local linear method is an alternative statistical approach to the kernel method. This technique has several advantages over the kernel method; in particular, it corrects the asymptotic bias, which is adversely affected at the boundaries (see Fan [18] and Fan and Gijbels [19]). Local linear smoothing in functional data analysis (FDA) has been discussed only recently. The first result in this direction was established by Baíllo and Grané [3], who obtained the L2 convergence rate of the local linear estimator of the regression function in the Hilbertian case. Barrientos-Marín et al. [4] investigated the almost complete convergence (with rate) of the local linear estimator of the regression function for independent data. After that, this technique was applied to the estimation of other conditional models (see Demongeot et al. [14] for the conditional density and Demongeot et al. [15] for the conditional distribution). Recently, the local linear estimation of the conditional hazard function for independent observations was obtained by Massim et al. [29], while the local linear estimation of the point at high risk for spatially dependent observations was studied by Abeidallah et al. [1]. For more results on the local linear approach, see, for instance, Benhenni et al. [5], Al-Awadhi et al. [1], Attouch et al. [2] and Chahad et al. [10].
Our paper establishes the asymptotic normality of the nonparametric local linear estimator proposed by Massim and Mechab [29] and Merouan et al. [30] for independent and identically distributed observations. The importance of this result lies in the construction of confidence intervals.
Our work is organized as follows. We present our model and estimator in Section 2. The notations and hypotheses are given in Section 3. We state our main results in Section 4. In Section 5, we present applications of the asymptotic normality, such as the determination of confidence intervals. A simulation study is presented in Section 6. Finally, Section 7 is dedicated to the proofs of the results.

Model and estimator
Consider n independent pairs of random variables (X_i, Y_i), i = 1, ..., n, assumed to be drawn from the pair (X, Y), which is valued in F × R, where F is a semi-metric space equipped with a semi-metric d. For a fixed x ∈ F, we denote the conditional probability distribution of Y given X = x by L^x(y) = P(Y ≤ y | X = x), for all y ∈ R.
This distribution is absolutely continuous with respect to the Lebesgue measure on R, with bounded density denoted by g^x. Recall that Demongeot et al. [15] proposed the functional local linear estimate of L^x(y) as the solution in a of the following minimization problem:

\min_{(a,b) \in \mathbb{R}^2} \sum_{i=1}^{n} \Big( T\big(h_T^{-1}(y - Y_i)\big) - a - b\,\beta(X_i, x) \Big)^2 K\big(h_K^{-1}\rho(x, X_i)\big),

where β(·,·) and ρ(·,·) are known functions from F × F into R, K is a kernel, T is a distribution function, and h_K := h_{K,n} and h_T := h_{T,n} are the bandwidth parameters. If the bi-functional operator β is such that β(z, z) = 0 for all z ∈ F, then, by some algebra, the estimator of L^x(y) can be rewritten explicitly as

\hat{L}^x(y) = \frac{\sum_{i,j=1}^{n} W_{ij}(x)\, T\big(h_T^{-1}(y - Y_j)\big)}{\sum_{i,j=1}^{n} W_{ij}(x)}, \quad W_{ij}(x) = \beta(X_i, x)\big(\beta(X_i, x) - \beta(X_j, x)\big) K\big(h_K^{-1}\rho(x, X_i)\big) K\big(h_K^{-1}\rho(x, X_j)\big). \qquad (1)

From this estimator, we deduce an estimator of the conditional density, denoted \hat{g}^x(y), defined by

\hat{g}^x(y) = \frac{\sum_{i,j=1}^{n} W_{ij}(x)\, T'\big(h_T^{-1}(y - Y_j)\big)}{h_T \sum_{i,j=1}^{n} W_{ij}(x)}, \qquad (2)

where T' denotes the derivative of T. We now consider the nonparametric estimation of the conditional hazard function defined, for all y ∈ R such that L^x(y) < 1, by

h^x(y) = \frac{g^x(y)}{1 - L^x(y)}. \qquad (3)

According to equations (2) and (3), the estimator of the conditional hazard function is \hat{h}^x(y), defined by

\hat{h}^x(y) = \frac{\hat{g}^x(y)}{1 - \hat{L}^x(y)}.
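The explicit form of the estimator lends itself to a direct implementation. The following Python sketch assumes discretized curves and makes illustrative choices not imposed by the paper: a quadratic kernel for K, the logistic distribution for T, and user-supplied callables for the locating operators; all names are hypothetical placeholders.

```python
import numpy as np

def ll_cond_hazard(x_new, X, Y, y, rho, beta, h_K, h_T):
    """Sketch of the functional local linear conditional hazard estimator.

    X is a list of discretized curves, Y the (n,) vector of responses;
    rho and beta are user-supplied locating operators (plain callables).
    The quadratic kernel K and logistic T are illustrative choices.
    """
    Y = np.asarray(Y, dtype=float)
    # K_i = K(rho(x, X_i) / h_K), quadratic kernel supported on [-1, 1]
    K = np.array([max(0.0, 0.75 * (1.0 - (rho(x_new, Xi) / h_K) ** 2)) for Xi in X])
    b = np.array([beta(Xi, x_new) for Xi in X])
    # Collapse the double sum over W_ij: sum_i W_ij = K_j * (S2 - beta_j * S1)
    S1, S2 = np.sum(b * K), np.sum(b ** 2 * K)
    w = K * (S2 - b * S1)
    u = (y - Y) / h_T
    T = 1.0 / (1.0 + np.exp(-u))    # logistic CDF plays the role of T
    Tp = T * (1.0 - T)              # its derivative T'
    sw = np.sum(w)
    L_hat = np.sum(w * T) / sw              # conditional distribution estimate
    g_hat = np.sum(w * Tp) / (h_T * sw)     # conditional density estimate
    return g_hat / (1.0 - L_hat)            # hazard estimate g / (1 - L)
```

The double sum over W_ij(x) is collapsed into a single weight per observation, which keeps the computation O(n) once the kernel values are formed.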

Hypotheses and notations
In what follows, x (resp. y) denotes a fixed point in F (resp. in R), N_x (resp. N_y) denotes a fixed neighborhood of x (resp. of y), and φ_x(r_1, r_2) = P(r_2 ≤ ρ(X, x) ≤ r_1).
(H1): On the distribution of the functional variable and the regularity of the model.
i) For any r > 0, φ_x(r) := φ_x(−r, r) > 0, and there exists a function Ψ_x(·) such that:
ii) For any l ∈ {0, 2}, we have
where the functions G_l(s) and Λ_l(s) are differentiable at s = 0.
These assumptions are standard in the FDA context. In particular, assumption (H1)(i) is the adaptation of condition (H1) in Ferraty et al. [20], obtained by replacing the semi-metric d with the bi-functional operator ρ. The first part of this assumption characterizes the concentration property of the probability measure of the functional variable X. More precisely, the small-ball probability does not admit an explicit expression in general, but for some processes it can be written as the product of two independent functions, one depending on x and the other on the smoothing parameter, i.e., φ_x(h) = g(x)C(h) (see Li and Shao [28] for the behavior of φ_x(h) in the neighborhood of 0). In finite dimension, for example X ∈ R, φ_x(h) takes the form g(x)h, where g(x) is the density of the random variable X. The second part controls the regularity of the functional space of our model and is needed to evaluate the bias term of the convergence rates.
(H2): On the locating functions ρ(·,·) and β(·,·).
i) where B(x, r) is the closed ball centered at x with radius r.
Assumption (H2) was introduced and commented on in Barrientos-Marín et al. [4]; it plays an important role in our methodology, particularly when we compute the exact constant terms involved in the asymptotic result.
(H3): On the kernels K and T.
(i) The kernel K is a positive function supported on [−1, 1] whose first derivative K′ satisfies:
(H4): On the bandwidths h_T and h_K.
(i) The bandwidths h_K are such that there exists a positive integer n_0 for which
Hypotheses (H3) and (H4) are imposed for the sake of brevity in the proofs of our results.
• Remark 1. Notice that in the case b = 0, the asymptotic normality of a conditional hazard function estimate has been studied by Laksaci et al. [27].
• Remark 2. The local constant estimator of h^x(y) can be expressed explicitly as the ratio of the kernel conditional density estimator to one minus the kernel conditional distribution estimator.
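As a companion to Remark 2, a minimal sketch of the local constant (kernel) estimator follows; the quadratic kernel and logistic T are illustrative choices, not the paper's prescribed ones, and all names are hypothetical.

```python
import numpy as np

def lc_cond_hazard(x_new, X, Y, y, rho, h_K, h_T):
    """Local constant counterpart of Remark 2 (sketch): the hazard estimate
    is the ratio of the kernel conditional density to one minus the kernel
    conditional distribution. rho is a user-supplied locating operator."""
    Y = np.asarray(Y, dtype=float)
    # Plain kernel weights K_i = K(rho(x, X_i) / h_K)
    K = np.array([max(0.0, 0.75 * (1.0 - (rho(x_new, Xi) / h_K) ** 2)) for Xi in X])
    u = (y - Y) / h_T
    T = 1.0 / (1.0 + np.exp(-u))    # logistic CDF in place of T
    Tp = T * (1.0 - T)              # its derivative T'
    L_hat = np.sum(K * T) / np.sum(K)
    g_hat = np.sum(K * Tp) / (h_T * np.sum(K))
    return g_hat / (1.0 - L_hat)
```

This is the (L.C) benchmark against which the local linear estimator is compared in the simulation section.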

Main Results
Before stating our results, we present the parameters appearing in the dominant bias and variance terms; these quantities are defined as follows: To simplify the proofs of our results, let us note
Theorem 1. Under assumptions (H1)-(H4), we have
• Remark 3. In practice, the small-ball probability φ_x(t) is usually unknown, so we replace φ_x(t) by the empirical estimator defined as

\hat{\varphi}_x(t) = \frac{\#\{i : |\rho(X_i, x)| \le t\}}{n},

where #(A) denotes the cardinality of the set A.
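The empirical estimator of Remark 3 is straightforward to implement; a minimal sketch, with `rho` an arbitrary user-chosen locating operator (names are illustrative):

```python
import numpy as np

def small_ball_hat(x_new, X, t, rho):
    """Empirical small-ball probability of Remark 3 (sketch):
    phi_hat_x(t) = #{i : |rho(X_i, x)| <= t} / n."""
    r = np.array([abs(rho(Xi, x_new)) for Xi in X])
    return float(np.mean(r <= t))
```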
In addition, if we want to remove the bias term, we need the following assumption (H5): Under this hypothesis, we obtain the asymptotic normality of our estimator without the bias term, as stated in the following corollary.
Corollary 1. When assumptions (H1)-(H5) hold, we have the following asymptotic result:
Proof of Theorem 1. We consider the following decomposition:
The rest of the proof of this theorem is based on the following lemmas, whose proofs are given in Appendix 7.
Lemma 1. (See Merouan et al. [30]) Under the assumptions of Theorem 1, we have
• Remark 4. It is clear that the results of Lemma 4 allow us to write:

Confidence bands
Theorem 1 can be used in practice to construct confidence intervals for the true value of h^x(y). The asymptotic variance involves the unknown quantities defined by
for j = 1, 2.
To estimate the quantities C_j appearing in the expression of the asymptotic variance, one can use the following empirical estimators:
A plug-in estimate of the asymptotic variance V_{HK}(x, y) is then obtained as:
From a statistical point of view, the choice of the semi-metric plays a determining role, in particular in improving the convergence rates by increasing the concentration of the probability measure of the functional explanatory variable. In the functional statistical approach, the performance of an estimator depends strongly on the locating functions β(·,·) and ρ(·,·). In theory, under assumption (H2)(ii) one can take β(·,·) = ρ(·,·), although the two operators do not play similar roles. In practice, the choice of the bi-functional operators β(·,·) and ρ(·,·) will depend on the nature of the data and the shape of the curves, for example on whether the functional data are smooth or rough. If the curves are smooth, one can use the following family of locating functions (see Barrientos-Marín et al. [4] for more discussion on this subject):
where θ(t) is a given function and x^(q) denotes the q-th derivative of the curve x. If the curves are observed at different points or are irregular, one can use the following family of locating functions:
where v_1, ..., v_q are the orthonormal eigenfunctions of the empirical covariance operator.
After choosing a kernel function K(·) satisfying condition (H3)(i), for instance the quadratic kernel, and selecting the bandwidths h_K and h_T by the local cross-validation procedure, the asymptotic confidence interval at level (1 − ξ), 0 < ξ < 1, for h^x(y) is given by

\hat{h}^x(y) \pm u_{1-\xi/2}\, \sqrt{\frac{\hat{V}_{HK}(x, y)}{n\, h_T\, \hat{\varphi}_x(h_K)}},

where u_{1−ξ/2} denotes the (1 − ξ/2) quantile of the standard normal distribution.
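Assembling such a plug-in confidence interval is a one-step computation. The sketch below assumes the normalization sqrt(n h_T φ_x(h_K)) appearing in the proofs, and takes the variance estimate and the empirical small-ball probability as precomputed inputs; all names are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def hazard_confidence_interval(h_hat, V_hat, n, h_T, phi_hat, xi=0.05):
    """Plug-in asymptotic (1 - xi) confidence interval for h^x(y) (sketch).

    h_hat   : point estimate of the conditional hazard,
    V_hat   : plug-in estimate of the asymptotic variance,
    phi_hat : empirical small-ball probability at h_K.
    """
    u = NormalDist().inv_cdf(1.0 - xi / 2.0)   # standard normal quantile u_{1-xi/2}
    half = u * np.sqrt(V_hat / (n * h_T * phi_hat))
    return h_hat - half, h_hat + half
```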

Simulation
The main purpose of this illustration is to show the usefulness of the conditional hazard function in a prediction context. More precisely, we illustrate the performance and superiority of our estimator through the mean square error (MSE) criterion. To this end, we compare the MSE of the local linear approach (L.L) studied here with that of the classical kernel method (L.C) when the data are functional. The two models are defined by the following formulas:
To this end, we generate the random variables (X_i, Y_i)_{1≤i≤100} from the regression model
where the ε_i are random variables (r.v.) normally distributed as N(0, 0.2) and independent of X, while the nonlinear operator R is given by:
The functional covariates X_i are generated (see Figure 1) by the following process:
where the η_i are independent and identically distributed N(0, 0.2) random variables, while the b_i are drawn from a uniform distribution on the interval [1, 3]. All the curves X_i are generated from 100 equidistant values in [0, 1].
6.1. Concerning the smoothing parameters (h_T and h_K).
The bandwidth parameters are crucial in nonparametric estimation, since they intervene in all the asymptotic properties of the case under study, in particular in the rate of convergence.
In this application we use the cross-validation (CV) method to select the bandwidths h_T and h_K, following the technique introduced by Rachdi et al. [14]. We consider the minimization of the quadratic error in the local linear estimation of the conditional density for functional data, defined by the following criterion:
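A leave-one-out version of this quadratic criterion can be sketched as follows. For brevity the weights are local constant rather than local linear, the kernels are illustrative choices (quadratic kernel, logistic density), and the integral of the squared density is approximated on a grid; all names are hypothetical.

```python
import numpy as np

def cv_criterion(D, Y, h_K, h_T, y_grid):
    """Leave-one-out quadratic CV criterion for a kernel conditional density
    (sketch). D is the (n, n) matrix of |rho(X_i, X_j)| values.
    CV = mean_i [ int g_{-i}^{X_i}(y)^2 dy - 2 g_{-i}^{X_i}(Y_i) ]."""
    Y = np.asarray(Y, dtype=float)
    K = np.where(np.abs(D) <= h_K, 0.75 * (1.0 - (D / h_K) ** 2), 0.0)
    np.fill_diagonal(K, 0.0)                          # leave one out
    logis = lambda u: np.exp(-np.abs(u)) / (1.0 + np.exp(-np.abs(u))) ** 2
    denom = K.sum(axis=1) * h_T                       # (n,)
    A = logis((y_grid[:, None] - Y[None, :]) / h_T)   # (m, n): grid x sample
    G = (K @ A.T) / denom[:, None]                    # g_{-i}(y) on the grid
    dy = y_grid[1] - y_grid[0]
    int_term = (G ** 2).sum(axis=1) * dy              # Riemann approximation
    C = logis((Y[:, None] - Y[None, :]) / h_T)        # pairwise density kernel
    g_at_Yi = (K * C).sum(axis=1) / denom
    return float(np.mean(int_term - 2.0 * g_at_Yi))

def select_bandwidths(D, Y, grid_hK, grid_hT, y_grid):
    """Grid search minimizing the CV criterion over (h_K, h_T)."""
    scores = {(a, b): cv_criterion(D, Y, a, b, y_grid)
              for a in grid_hK for b in grid_hT}
    return min(scores, key=scores.get)
```

The local linear weights of Section 2 can be substituted for the plain kernel weights without changing the structure of the search.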

6.2. On the choice of the locating functions (ρ and β).
The choice of the bi-functional operators ρ and β depends on the nature of the data and the shape of the curves. Regarding the shape of the curves X_i, it is clear that the following family of locating functions:
is well adapted to this kind of data, where x^(s)(t) denotes the s-th derivative of the curve x(t) and θ is the eigenfunction of the empirical covariance operator associated with the q-th greatest eigenvalue.
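These locating functions can be sketched numerically with finite-difference derivatives and a supplied direction θ; this is a hypothetical implementation for illustration, not the paper's code, and assumes curves observed on a common uniform grid.

```python
import numpy as np

def rho_deriv(x1, x2, t, s=1):
    """Semi-metric based on the s-th derivative of the curves (sketch):
    L2 distance between the s-th finite-difference derivatives of x1
    and x2 on the common uniform grid t."""
    d = np.asarray(x1, dtype=float) - np.asarray(x2, dtype=float)
    for _ in range(s):
        d = np.gradient(d, t)               # finite-difference derivative
    return float(np.sqrt(np.sum(d ** 2) * (t[1] - t[0])))

def beta_proj(x1, x2, theta, t):
    """Signed locating operator (sketch): projection of x1 - x2 on a
    direction theta, e.g. an eigenfunction of the empirical covariance
    operator: beta(x1, x2) = int theta(t) (x1(t) - x2(t)) dt."""
    d = np.asarray(x1, dtype=float) - np.asarray(x2, dtype=float)
    return float(np.sum(theta * d) * (t[1] - t[0]))
```

Note that `beta_proj` is signed and vanishes on the diagonal, as required of β, while `rho_deriv` is a nonnegative semi-metric.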
• (X_j, Y_j)_{j=81,...,100}: the test sample.
• Step 3. We compute the two estimators on the learning sample and obtain the local linear (L.L) and local constant (L.C) estimates of the conditional hazard function (ĥ_{L.L} and ĥ_{L.C}).
• Step 4. We present our results by plotting the boxplots of the prediction errors (Figure 5) and by computing the empirical mean square error:
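The empirical mean square error of Step 4 reduces to a one-line computation; a minimal sketch, where the reference values would come from the true hazard of the simulated model (names are illustrative):

```python
import numpy as np

def empirical_mse(h_ref, h_est):
    """Empirical mean square error over the test sample (sketch):
    the average squared gap between the reference hazard values and the
    estimates at the test points."""
    h_ref, h_est = np.asarray(h_ref, dtype=float), np.asarray(h_est, dtype=float)
    return float(np.mean((h_ref - h_est) ** 2))
```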

Conclusion
The mean quadratic error of the local linear estimator is clearly smaller than that of the classical kernel estimator; this improvement is confirmed by the MSE values.

Appendix
Proof of Lemma 1. Using the same decomposition idea as in the proof of Theorem 3.1 in Bouanani et al. [9], we obtain:
where
Claim 2:

Proof of claim 1
First, we need to evaluate the variance of R_{2,n}. We have
where
For the second term on the right-hand side of equation (13), applying Lemma 2 in Ezzahrioui and Ould-Saïd [16] yields (14).
For the first term of equation (13), a change of variables gives
Remark that g^{X_1}(y − t h_T) − g^x(y) = o(1) as n → ∞. Then, as n → ∞, we deduce that
On the other hand, using again the technical Lemma A.1 in Zhou and Lin [38], we obtain
Consequently,

\lim_{n \to +\infty} \mathrm{Var}(R_{2,n}) = \frac{C_2}{C_1^2}\, g^x(y) \int (T'(t))^2\, dt.
Finally, to prove the asymptotic normality of S_n, we apply Lindeberg's central limit theorem to the R_i, for which it suffices to show that
→ 0, for all ε > 0.
We have
Now, by application of Lemma A.1 in [38], we have
Following the same ideas as those used by Ezzahrioui et al. [16] in Lemma 2.6, we get
We put:
On the other hand, we have
Therefore, for all ε > 0 and n large enough, the set (15) becomes empty. Moreover, by assumption (H4), for n large enough the set (16) is also empty, because lim_{n→∞} n h_T φ_x(h_K) = ∞. Consequently, we obtain from (15) and (16) that the set {R_i² > ε² Var(R_n)} is empty. The proof of Claim 1 is therefore complete.
Concerning the proof of Claim 2, we use the same ideas as those used for equation (11); this completes the proof.
Proof of Lemma 2. To show the convergence in probability of B_n(x, y) to 0, it suffices to prove the following two statements:
Moreover, the asymptotic variance of (\hat{L}^x_D − \hat{L}^x_N(y)) given in Remark 4, together with assumption (H4), allows us to obtain:
which yields the proof of Lemma 3.