Preprint
Article

Nonparametric Partial Linear Estimation for Spatial Functional Data with Missing At-Random

Submitted: 05 December 2023
Posted: 07 December 2023

Abstract
The aim of this paper is to study a semi-functional partial linear regression model (SFPLR) for spatial data with responses missing at random (MAR). The estimators are constructed by the kernel method, and asymptotic properties, such as the probability convergence rate of the nonparametric component and the asymptotic distributions of the parametric and nonparametric components, are established under certain conditions. The performance and superiority of these estimators are then examined on simulated and real data, through a comparison between our semi-functional partially linear model estimator with MAR (SFPLRM), the semi-functional partially linear model estimator in the complete case (SFPLRC), and the nonparametric functional model estimator with MAR (FNPM). The results show that the proposed estimators outperform the existing ones as the number of randomly missing observations increases.
Subject: Computer Science and Mathematics - Probability and Statistics

1. Introduction

Nowadays, statistical techniques are necessary for the analysis of massive volumes of data with a spatial argument; this is the case in environmental science, climate, geography, econometrics, medicine, biology, and other applied fields. High-dimensional data of this kind become functional data, which are then processed and analyzed using functional data analysis (FDA) methods and are called functional data with spatial correlation. The theoretical and practical developments of this new branch of statistics have led to beneficial advances in areas with geographic dependence (for recent reviews on the subject, see [1]). Indeed, FDA has grown rapidly in recent years, moving from exploratory and descriptive data analysis to linear models and estimation techniques. This dynamic contributes to theoretical and methodological improvements, as well as to the diversification of fields of application (see, for example, [2–4], and various recent bibliographic discussions such as [5,6]). One of the major topics in this field is functional regression, which studies the influence of a functional random variable on a scalar variable. Among the typical questions arising on this subject are semi-parametric functional regression models, which retain the flexibility of parametric regression models while avoiding the sensitivity of nonparametric approaches to dimensional effects. For a complete discussion with a state of the art on semi-parametric functional regression modeling, we refer readers to the bibliographical surveys of [7].
In particular, one of the most important semi-parametric functional models is the partially linear regression model introduced by [8], called the semi-functional partial linear regression (SFPLR) model; many works have since been devoted to estimating and applying it. This model is expressed as
\[
Y = X^{T}\beta + m(Z) + \epsilon,
\]
where $Y$ is the scalar response variable, $X=(X_{1},X_{2},\dots,X_{p})^{T}$ is a $p$-vector of explanatory variables, $\beta$ is an unknown $p$-dimensional parameter vector, $Z$ is a functional explanatory variable, $m(\cdot)$ is an unknown smooth functional operator, and the errors $\epsilon$ are identically distributed random variables satisfying $E(\epsilon)=0$ with unknown variance $\sigma^{2}(\epsilon)<\infty$. One generally assumes the additional condition that the error $\epsilon$ is independent of the random vector $(X,Z)$. The authors use the classical kernel method (Nadaraya-Watson type weights) to prove the asymptotic normality of $\hat{\beta}$ and the convergence rate of $\hat{m}$. This model was extended to dependent data by [9]. Furthermore, [10] proposed a bootstrap procedure to approximate the distribution of these estimators, while [11] generalized the model to the case where the linear component is also functional. More recently, [12] studied different bootstrap procedures for this model under dependency structures. Additionally, a procedure for testing linearity in partially linear functional models is proposed in [13]. Other approaches have been proposed to estimate the SFPLR model parameters; we cite, for example, the local linear approach used by [14], the robust procedures considered by [15], the k-nearest-neighbors (kNN) procedure used by [16], and the Bayesian approaches proposed by [17]. For recent advances, see the bibliographic reviews in [6,18].
Furthermore, only a few research works have paid attention to estimation in the semi-functional partial linear regression model for spatially dependent observations. We cite the work of [19] on the spatial autoregressive (SAR) semi-functional linear regression model, which proposed an estimator based on quasi-maximum likelihood and local linear estimation, while [20] obtained the asymptotic normality of the parametric component as well as the convergence in probability, with rate, of the nonparametric component. All of the works listed above assume completely observed data. In many applications, unfortunately, this is not the case, and we are faced with missing data. There are several common causes of missing responses, including faulty equipment, sample contamination, manufacturing defects, clinical study dropouts, climatic circumstances, inaccurate data entry, and more. On this subject, the imputation of missing data and the uncertainty linked to such imputation have been studied in depth in the multivariate case (see, for example, [21,22]). However, little research on nonparametric functional regression with missing data has been done. The first work was carried out by [23] to estimate the mean of a scalar response using an i.i.d. functional sample, where the functional regressor is fully observed and some responses are missing at random (MAR); it generalizes the result obtained in [24] in the multivariate case. [25] considered the estimation of the regression function and established its asymptotic properties under stationarity and ergodicity with MAR responses, while [26] used the k-nearest-neighbors (kNN) method combined with the local linear method to estimate the regression function with a small number of responses missing at random.
For semi-parametric partially linear multivariate models with randomly missing responses, we can cite the work of [27,28], while [29] was the first to study the SFPLR model for i.i.d. data with MAR responses; their results generalize those obtained in [27].
However, the literature on missing data for spatial data is very limited. We cite the work of [30] in the multivariate case, which estimates the regression function with the kernel method and compares the performance of the resulting estimator with the k-nearest-neighbors method. [31] considers multiple spatial regression models with missing data, using regression-based imputation to predict missing values and adding a normally distributed residual term to each predicted value, which restores the loss of variability and the biases associated with regression imputation. For functional data, we cite [32], which studies the kernel estimation of the regression function when the data are spatially dependent, functional, and MAR, and obtains the asymptotic properties of this estimator, notably the convergence in probability (with rate) and the asymptotic normality of the estimator under certain weak conditions.
In this study, we propose to study SFPLR models for spatial data with missing responses. Our paper is organized as follows. Section 2 presents the semi-functional partial linear model for spatial data together with its estimators. Section 3 introduces the notations and hypotheses used in our work, and Section 4 is devoted to the asymptotic results, whose proofs are deferred to Section 5. In Sections 6 and 7, a computational study on simulated and real data is carried out to demonstrate that the proposed estimators provide a clear improvement over the usual global approach.

2. The Model and Its Estimators

Let $\mathbb{Z}^{N}$ denote the integer lattice points in the $N$-dimensional Euclidean space, and let $\Lambda_{\mathbf{i}}=\{(Y_{\mathbf{i}},X_{\mathbf{i}}^{(1)},X_{\mathbf{i}}^{(2)},\dots,X_{\mathbf{i}}^{(p)},Z_{\mathbf{i}})^{T},\ \mathbf{i}\in\mathbb{Z}^{N}\}$ be a measurable, strictly stationary spatial process defined on a probability space $(\Omega,\mathcal{A},P)$ and identically distributed as $(Y,X^{(1)},X^{(2)},\dots,X^{(p)},Z)^{T}$, where $Y$ is a response variable that depends linearly on the $p$-dimensional random vector $(X^{(1)},\dots,X^{(p)})$ but is related to another independent functional variable $Z$ in an unspecified form, through the SFPLR model
\[
Y_{\mathbf{i}}=\sum_{s=1}^{p}X_{\mathbf{i}}^{(s)}\beta_{s}+m(Z_{\mathbf{i}})+\epsilon_{\mathbf{i}}=X_{\mathbf{i}}^{T}\beta+m(Z_{\mathbf{i}})+\epsilon_{\mathbf{i}},\qquad \mathbf{i}\in\mathbb{Z}^{N},
\]
where $X_{\mathbf{i}}=(X_{\mathbf{i}}^{(1)},X_{\mathbf{i}}^{(2)},\dots,X_{\mathbf{i}}^{(p)})^{T}$, $\beta=(\beta_{1},\dots,\beta_{p})^{T}$, and $m(\cdot)$ and $\epsilon$ are defined as before, such that
\[
E\big(\epsilon_{\mathbf{i}}\mid X_{\mathbf{i}}^{(1)},\dots,X_{\mathbf{i}}^{(p)},Z_{\mathbf{i}}\big)=0\quad\text{and}\quad E\big(\epsilon_{\mathbf{i}}^{2}\mid X_{\mathbf{i}}^{(1)},\dots,X_{\mathbf{i}}^{(p)},Z_{\mathbf{i}}\big)<\infty.
\]
We assume that $Z$ takes values in a semi-metric space $\mathcal{F}$ whose semi-metric is denoted by $d(\cdot,\cdot)$, and we denote by $B(z,h)=\{z'\in\mathcal{F}:\ d(z,z')\le h\}$ the closed topological ball of center $z$ and radius $h$. We further assume that the process is observed on the rectangular region $I_{\mathbf{n}}=\{\mathbf{i}=(i_{1},\dots,i_{N})\in\mathbb{Z}^{N},\ 1\le i_{k}\le n_{k},\ k=1,\dots,N\}$, with sample size $\hat{\mathbf{n}}=n_{1}\times\dots\times n_{N}$, where $\mathbf{n}=(n_{1},\dots,n_{N})$. Suppose moreover that, for $l=1,\dots,N$, the $n_{l}$ approach infinity at the same rate: $C_{1}<n_{j}/n_{k}<C_{2}$ for some $0<C_{1}<C_{2}<\infty$, and write $\mathbf{n}\to\infty$ if $\min_{k=1,\dots,N}(n_{k})\to\infty$. Recall that the term site is used to designate a point $\mathbf{i}$.
The kernel estimators of $\beta$ and $m$ (see [20]) are defined by
\[
\hat{\beta}_{\mathbf{n}}=\Big(\sum_{\mathbf{i}\in I_{\mathbf{n}}}\breve{X}_{\mathbf{i}}\breve{X}_{\mathbf{i}}^{T}\Big)^{-1}\sum_{\mathbf{i}\in I_{\mathbf{n}}}\breve{X}_{\mathbf{i}}\breve{Y}_{\mathbf{i}}
\]
and
\[
\hat{m}_{\mathbf{n}}(z)=\sum_{\mathbf{i}\in I_{\mathbf{n}}}w_{\mathbf{n}}(z,Z_{\mathbf{i}})\big(Y_{\mathbf{i}}-X_{\mathbf{i}}^{T}\hat{\beta}_{\mathbf{n}}\big),
\]
where
\[
\breve{Y}_{\mathbf{i}}=Y_{\mathbf{i}}-\sum_{\mathbf{j}\in I_{\mathbf{n}}}w_{\mathbf{n}}(Z_{\mathbf{i}},Z_{\mathbf{j}})Y_{\mathbf{j}},\qquad \breve{X}_{\mathbf{i}}=X_{\mathbf{i}}-\sum_{\mathbf{j}\in I_{\mathbf{n}}}w_{\mathbf{n}}(Z_{\mathbf{i}},Z_{\mathbf{j}})X_{\mathbf{j}},
\]
with
\[
w_{\mathbf{n}}(Z_{\mathbf{i}},Z_{\mathbf{j}})=\frac{K\big(d(Z_{\mathbf{i}},Z_{\mathbf{j}})/b_{\mathbf{n}}\big)}{\sum_{\mathbf{j}\in I_{\mathbf{n}}}K\big(d(Z_{\mathbf{i}},Z_{\mathbf{j}})/b_{\mathbf{n}}\big)}.
\]
Here $K$ denotes a real-valued kernel function and $b_{\mathbf{n}}$ a decreasing sequence of bandwidths tending to zero as $\mathbf{n}$ tends to infinity.
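As an illustration (ours, not part of the original paper), the weights $w_{\mathbf{n}}$ can be computed directly once pairwise semi-metric distances between curves are available; we use here the quadratic kernel that appears later in the simulation section, and all variable names are hypothetical:

```python
import numpy as np

def nw_weights(d_row, b):
    """Nadaraya-Watson weights w_n(Z_i, .) for one site i:
    K(d(Z_i, Z_j)/b) normalized to sum to one over j."""
    u = d_row / b
    k = 1.5 * (1.0 - u ** 2) * ((u >= 0) & (u <= 1))   # quadratic kernel on [0, 1]
    s = k.sum()
    return k / s if s > 0 else np.zeros_like(k)

# Toy example: 5 "curves" on 100 design points, L2 semi-metric.
rng = np.random.default_rng(0)
Z = rng.normal(size=(5, 100))
D = np.sqrt(((Z[:, None, :] - Z[None, :, :]) ** 2).mean(axis=2))  # pairwise d(Z_i, Z_j)
w = nw_weights(D[0], b=D.max())
print(w.shape, round(float(w.sum()), 6))  # (5,) 1.0
```

Each row of weights sums to one over the sites, which is what makes the smoothers below weighted averages.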
In this article, we develop an estimation approach in which the values of the covariates ($X$ and $Z$) are fully observed, while some observations of the response variable (the $Y$ values) are missing. To model this, we consider a Bernoulli random variable $\delta$ such that $\delta_{\mathbf{i}}=1$ if the value $Y_{\mathbf{i}}$ is observed and $\delta_{\mathbf{i}}=0$ otherwise. Thus, the study is carried out on an incomplete sample of size $\hat{\mathbf{n}}$:
\[
\big(Y_{\mathbf{i}},X_{\mathbf{i}},Z_{\mathbf{i}},\delta_{\mathbf{i}}\big),\qquad \mathbf{i}\in I_{\mathbf{n}}.
\]
The missing-data mechanism is assumed to satisfy the missing-at-random (MAR) condition
\[
P\big(\delta_{\mathbf{i}}=1\mid Y_{\mathbf{i}}=y,X_{\mathbf{i}}=x,Z_{\mathbf{i}}=z\big)=P\big(\delta_{\mathbf{i}}=1\mid X_{\mathbf{i}}=x,Z_{\mathbf{i}}=z\big)=p(x,z).
\]
This conditional probability $p(x,z)$ is generally unknown.
First, we note that
\[
\delta_{\mathbf{i}}Y_{\mathbf{i}}=\delta_{\mathbf{i}}X_{\mathbf{i}}^{T}\beta+\delta_{\mathbf{i}}m(Z_{\mathbf{i}})+\delta_{\mathbf{i}}\epsilon_{\mathbf{i}},\qquad \mathbf{i}\in I_{\mathbf{n}}.
\]
From our assumption, conditioning on $Z_{\mathbf{i}}=z$, it follows that
\[
E\big(\delta_{\mathbf{i}}Y_{\mathbf{i}}\mid Z_{\mathbf{i}}=z\big)=E\big(\delta_{\mathbf{i}}X_{\mathbf{i}}\mid Z_{\mathbf{i}}=z\big)^{T}\beta+E\big(\delta_{\mathbf{i}}\mid Z_{\mathbf{i}}=z\big)\,m(z),\qquad \mathbf{i}\in I_{\mathbf{n}},
\]
whence
\[
m(z)=\frac{E\big(\delta_{\mathbf{i}}Y_{\mathbf{i}}\mid Z_{\mathbf{i}}=z\big)-E\big(\delta_{\mathbf{i}}X_{\mathbf{i}}\mid Z_{\mathbf{i}}=z\big)^{T}\beta}{E\big(\delta_{\mathbf{i}}\mid Z_{\mathbf{i}}=z\big)}.
\]
In the following, we define $m_{X}(z)=E(\delta X\mid Z=z)/E(\delta\mid Z=z)$, $m_{Y}(z)=E(\delta Y\mid Z=z)/E(\delta\mid Z=z)$, and we write $A^{\otimes 2}=AA^{T}$.
Then, by equations (5) and (6), we can write
\[
\delta_{\mathbf{i}}\big(Y_{\mathbf{i}}-m_{Y}(Z_{\mathbf{i}})\big)=\delta_{\mathbf{i}}\big(X_{\mathbf{i}}-m_{X}(Z_{\mathbf{i}})\big)^{T}\beta+\delta_{\mathbf{i}}\epsilon_{\mathbf{i}}.
\]
Thus, if the functions $m_{X}$ and $m_{Y}$ were known, the least squares estimator of $\beta$ would be
\[
\hat{\beta}_{\mathbf{n}}=\Big(\sum_{\mathbf{i}\in I_{\mathbf{n}}}\delta_{\mathbf{i}}\big(X_{\mathbf{i}}-m_{X}(Z_{\mathbf{i}})\big)^{\otimes 2}\Big)^{-1}\sum_{\mathbf{i}\in I_{\mathbf{n}}}\delta_{\mathbf{i}}\big(X_{\mathbf{i}}-m_{X}(Z_{\mathbf{i}})\big)\big(Y_{\mathbf{i}}-m_{Y}(Z_{\mathbf{i}})\big).
\]
However, $m_{X}(Z_{\mathbf{i}})$ and $m_{Y}(Z_{\mathbf{i}})$ are generally unknown and must be estimated before (8) can be applied. Assuming that $m_{X}$ and $m_{Y}$ are smooth functions of $Z_{\mathbf{i}}$, they can be estimated with the nonparametric Nadaraya-Watson kernel estimators, denoted $\hat{m}_{X,\mathbf{n}}(z)$ and $\hat{m}_{Y,\mathbf{n}}(z)$ and given by
\[
\hat{m}_{X,\mathbf{n}}(z)=\sum_{\mathbf{i}\in I_{\mathbf{n}}}\delta_{\mathbf{i}}\,w_{\mathbf{n}}(z,Z_{\mathbf{i}})X_{\mathbf{i}},\qquad
\hat{m}_{Y,\mathbf{n}}(z)=\sum_{\mathbf{i}\in I_{\mathbf{n}}}\delta_{\mathbf{i}}\,w_{\mathbf{n}}(z,Z_{\mathbf{i}})Y_{\mathbf{i}},
\]
where
\[
w_{\mathbf{n}}(z,Z_{\mathbf{i}})=\frac{K\big(d(z,Z_{\mathbf{i}})/h_{\mathbf{n}}\big)}{\sum_{\mathbf{j}\in I_{\mathbf{n}}}\delta_{\mathbf{j}}K\big(d(z,Z_{\mathbf{j}})/h_{\mathbf{n}}\big)},
\]
with $K$ a real-valued kernel function and $h_{\mathbf{n}}$ a decreasing sequence of bandwidths tending to zero as $\mathbf{n}$ tends to infinity.
Hence, an estimator of $\beta$ is given by
\[
\hat{\beta}_{\mathbf{n}}=\Big(\sum_{\mathbf{i}\in I_{\mathbf{n}}}\delta_{\mathbf{i}}\tilde{X}_{\mathbf{i}}\tilde{X}_{\mathbf{i}}^{T}\Big)^{-1}\sum_{\mathbf{i}\in I_{\mathbf{n}}}\delta_{\mathbf{i}}\tilde{X}_{\mathbf{i}}\tilde{Y}_{\mathbf{i}},
\]
where
\[
\tilde{Y}_{\mathbf{i}}=Y_{\mathbf{i}}-\sum_{\mathbf{j}\in I_{\mathbf{n}}}\delta_{\mathbf{j}}\,w_{\mathbf{n}}(Z_{\mathbf{i}},Z_{\mathbf{j}})Y_{\mathbf{j}}\quad\text{and}\quad
\tilde{X}_{\mathbf{i}}=X_{\mathbf{i}}-\sum_{\mathbf{j}\in I_{\mathbf{n}}}\delta_{\mathbf{j}}\,w_{\mathbf{n}}(Z_{\mathbf{i}},Z_{\mathbf{j}})X_{\mathbf{j}}.
\]
We then plug $\hat{\beta}_{\mathbf{n}}$ into equation (7) to obtain
\[
\hat{m}_{\mathbf{n}}(z)=\hat{m}_{Y,\mathbf{n}}(z)-\hat{m}_{X,\mathbf{n}}^{T}(z)\hat{\beta}_{\mathbf{n}}.
\]
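To make the construction concrete, the following sketch (our illustration, with hypothetical names; the quadratic kernel is borrowed from the simulation section) computes the MAR estimators $\hat{\beta}_{\mathbf{n}}$ and $\hat{m}_{\mathbf{n}}$ from a distance matrix, a missingness indicator, and the observed responses:

```python
import numpy as np

def quad_kernel(u):
    return 1.5 * (1.0 - u ** 2) * ((u >= 0) & (u <= 1))

def sfplr_mar_fit(X, Y, D, delta, h):
    """SFPLR estimators with responses missing at random.
    X: (n, p) covariates, Y: (n,) responses (arbitrary where delta == 0),
    D: (n, n) semi-metric distances d(Z_i, Z_j), delta: (n,) 0/1 indicator."""
    K = quad_kernel(D / h) * delta[None, :]              # only observed responses get weight
    W = K / np.maximum(K.sum(axis=1, keepdims=True), 1e-12)
    X_t = X - W @ X                                      # X tilde
    Y_t = Y - W @ Y                                      # Y tilde
    obs = delta == 1
    beta = np.linalg.solve(X_t[obs].T @ X_t[obs], X_t[obs].T @ Y_t[obs])
    m_hat = W @ Y - (W @ X) @ beta                       # m_hat(Z_i) = m_Y - m_X' beta
    return beta, m_hat

# Synthetic check: curves Z_i(t) = a_i * t, so m(Z_i) = a_i; true beta = (1, 2).
rng = np.random.default_rng(0)
n, t = 400, np.linspace(0.0, 1.0, 20)
a = rng.uniform(0.0, 1.0, n)
curves = a[:, None] * t[None, :]
D = np.sqrt(((curves[:, None, :] - curves[None, :, :]) ** 2).mean(axis=2))
X = rng.normal(size=(n, 2))
Y = X @ np.array([1.0, 2.0]) + a + 0.05 * rng.normal(size=n)
delta = rng.binomial(1, 0.8, n)
beta_hat, m_hat = sfplr_mar_fit(X, Y, D, delta, h=0.05)
print(np.round(beta_hat, 2))
```

On this toy design the parametric estimate recovers $(1,2)$ closely and $\hat{m}$ tracks the nonparametric component; this is a sketch under the stated assumptions, not the authors' implementation.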

3. Notations and Hypotheses

Our main objective is to obtain the asymptotic normality of our estimators under the condition that the process $\Lambda_{\mathbf{i}}$ is strictly stationary and satisfies the following $\alpha$-mixing condition (see [33]): there exists a real function $\varphi(t)$, tending to $0$ as $t\to\infty$, such that for all finite subsets $E,E'\subset\mathbb{Z}^{N}$ ($\mathrm{Card}(\cdot)<\infty$):
\[
\alpha\big(\mathcal{B}(E),\mathcal{B}(E')\big)=\sup_{(A,B)\in\mathcal{B}(E)\times\mathcal{B}(E')}\big|P(A\cap B)-P(A)P(B)\big|\le \varphi\big(d(E,E')\big)\,\psi\big(\mathrm{Card}(E),\mathrm{Card}(E')\big),
\]
where $d(E,E')$ denotes the Euclidean distance between $E$ and $E'$ in $\mathbb{Z}^{N}$, $\mathcal{B}(S)=\sigma(Z_{\mathbf{i}},\ \mathbf{i}\in S)$, for $S\subset\mathbb{Z}^{N}$, is the $\sigma$-field generated by the variables $Z_{\mathbf{i}}$, and $\psi:\mathbb{Z}^{2}\to\mathbb{R}_{+}$ is a symmetric positive function, nondecreasing in each variable, such that
\[
\forall (r,s)\in\mathbb{Z}^{2},\qquad \psi(r,s)\le C\min(r,s),\quad\text{for some } C>0.
\]
Moreover, as is often the case in spatial regression, we assume a further condition on the function $\varphi$:
\[
\sum_{i=1}^{\infty}i^{\gamma}\varphi(i)<\infty,\quad\text{for some }\gamma>0.
\]
In what follows, in order to simplify the notation, we set:
\[
\theta_{\mathbf{i}}^{(s)}(z)=X_{\mathbf{i}}^{(s)}-E\big(\delta_{\mathbf{i}}X_{\mathbf{i}}^{(s)}\mid Z_{\mathbf{i}}=z\big),\quad s=1,\dots,p,\qquad
\theta_{\mathbf{i}}^{(0)}(z)=Y_{\mathbf{i}}-E\big(Y_{\mathbf{i}}\mid Z_{\mathbf{i}}=z\big),
\]
\[
\theta_{\mathbf{i}}=\big(\theta_{\mathbf{i}}^{(1)},\dots,\theta_{\mathbf{i}}^{(p)}\big)^{T},\qquad
\Delta_{\mathbf{i}}^{(s)}(z)=E\big[\delta_{\mathbf{i}}X_{\mathbf{i}}^{(s)}\mid Z_{\mathbf{i}}=z\big]-\sum_{\mathbf{j}\in I_{\mathbf{n}}}\delta_{\mathbf{j}}\,w_{\mathbf{n}}(Z_{\mathbf{i}},Z_{\mathbf{j}})X_{\mathbf{j}}^{(s)},\quad s=1,\dots,p,
\]
\[
\Delta_{\mathbf{i}}^{(0)}(z)=E\big[\delta_{\mathbf{i}}Y_{\mathbf{i}}\mid Z_{\mathbf{i}}=z\big]-\sum_{\mathbf{j}\in I_{\mathbf{n}}}\delta_{\mathbf{j}}\,w_{\mathbf{n}}(Z_{\mathbf{i}},Z_{\mathbf{j}})Y_{\mathbf{j}},\qquad
\Delta_{\mathbf{i}}=\big(\Delta_{\mathbf{i}}^{(1)},\dots,\Delta_{\mathbf{i}}^{(p)}\big)^{T},
\]
and we need the following assumptions, which are required to establish our results.
(H1): Let $\nu_{\mathbf{i}\mathbf{j}}=\sup_{\mathbf{i}\neq\mathbf{j}}P\big((Z_{\mathbf{i}},Z_{\mathbf{j}})\in B(z,h)\times B(z,h)\big)$ denote the joint probability distribution of $Z_{\mathbf{i}}$ and $Z_{\mathbf{j}}$, and let $\phi_{z}(h_{\mathbf{n}})=P\big(Z\in B(z,h_{\mathbf{n}})\big)=\mu\big(B(z,h_{\mathbf{n}})\big)$ be the small-ball probability.
We suppose that, for all $\mathbf{i}\neq\mathbf{j}\in\mathbb{Z}^{N}$ and $z\in\mathcal{F}$, there exists $C>0$ such that,
\[
\text{for some } 1<a<\gamma N^{-1},\qquad \sup_{\mathbf{i}\neq\mathbf{j}}\nu_{\mathbf{i}\mathbf{j}}\le C\big(\phi_{z}(h)\big)^{\frac{a+1}{a}}.
\]
(H2): The kernel function $K:\mathbb{R}\to\mathbb{R}_{+}$ is differentiable with support in the interval $[0,1]$, and there exist two constants $C_{3}$ and $C_{4}$ with
\[
-\infty<C_{3}<K'(t)<C_{4}<0,\qquad \text{for } t\in[0,1].
\]
(H3): Any function $f\in\{m,\theta^{(1)}(\cdot),\dots,\theta^{(p)}(\cdot)\}$ is smooth, i.e., there exist $c>0$ and $\kappa>0$ such that
\[
|f(u)-f(v)|\le c\,d(u,v)^{\kappa},\qquad \forall u,v\in\mathcal{F}.
\]
(H4): There exist differentiable nonnegative functions $\tau$ and $g$ such that
\[
\phi_{z}(h)=g(h)\,\tau(z)+o\big(g(h)\big).
\]
(H5): For all $t\in[0,1]$,
\[
\lim_{h_{\mathbf{n}}\to 0}\frac{\phi_{z}(t\,h_{\mathbf{n}})}{\phi_{z}(h_{\mathbf{n}})}=\iota_{z}(t).
\]
(H6): Let $R_{\mathbf{i}}=\theta_{\mathbf{i}}\epsilon_{\mathbf{i}}$ and $\Sigma=E\big[p(X_{\mathbf{1}},Z_{\mathbf{1}})\,\theta_{\mathbf{1}}\theta_{\mathbf{1}}^{T}\big]$, where $\mathbf{1}$ is the site $(1,\dots,1)$ and $\mathbf{0}$ denotes the site $(0,\dots,0)$.
i) The matrix $B=\sum_{\mathbf{i}\in I_{\mathbf{n}}}E\big[p(X_{\mathbf{i}},Z_{\mathbf{i}})\,R_{\mathbf{0}}R_{\mathbf{i}}^{T}\big]$ is assumed positive definite.
ii) The matrix $\Sigma$ is invertible.
(H7): We suppose that:
i) $E|\epsilon_{\mathbf{1}}|^{\rho}+E|\theta_{\mathbf{1}}^{(1)}|^{\rho}+\dots+E|\theta_{\mathbf{1}}^{(p)}|^{\rho}<\infty$ for some $\rho\ge 3$.
ii) For all $\mathbf{i}\neq\mathbf{j}$, $E\big(|Y_{\mathbf{i}}Y_{\mathbf{j}}|\mid Z_{\mathbf{i}},Z_{\mathbf{j}}\big)<\infty$.
iii) For all $\mathbf{i}\neq\mathbf{j}$, $\max_{1\le s\le p}E\big(|X_{\mathbf{i}}^{(s)}X_{\mathbf{j}}^{(s)}|\mid Z_{\mathbf{i}},Z_{\mathbf{j}}\big)<\infty$.
(H8):
i) Let $V_{2}(z)=\mathrm{var}\big(Y_{\mathbf{i}}-X_{\mathbf{i}}^{T}\beta\mid Z_{\mathbf{i}}=z\big)$ and $V_{s}(z)=E\big(|Y_{\mathbf{i}}-X_{\mathbf{i}}^{T}\beta-m(z)|^{s}\mid Z_{\mathbf{i}}=z\big)$, with $s>2$. We suppose that the functions $V_{2}(\cdot)$ and $V_{s}(\cdot)$ are continuous near $z$, i.e., as $h$ tends to $0$,
\[
\sup_{z':\,d(z,z')\le h}\big|V_{k}(z')-V_{k}(z)\big|=o(1),\qquad k\ge 2.
\]
ii) Let $V(z',z'')=E\big[\big(Y_{\mathbf{i}}-X_{\mathbf{i}}^{T}\beta-m(z)\big)\big(Y_{\mathbf{j}}-X_{\mathbf{j}}^{T}\beta-m(z)\big)\mid Z_{\mathbf{i}}=z',Z_{\mathbf{j}}=z''\big]$, for $\mathbf{i}\neq\mathbf{j}$.
We suppose that the function $V$ is continuous in some neighborhood of $(z,z)$.
(H9): We also suppose that $p_{1}(z)=P(\delta=1\mid Z=z)$ is a continuous function near $z$, i.e.,
\[
\sup_{z':\,d(z,z')\le h}\big|p_{1}(z')-p_{1}(z)\big|=o(1),\qquad \text{as } h\to 0.
\]
(H10): There exists $\vartheta$ with $N/\gamma<\vartheta<1$ such that $\phi_{z}(h)\ge \hat{\mathbf{n}}^{\frac{\vartheta-1}{2N+1}}$.
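For intuition, a standard example satisfying (H4)-(H5) (our illustration, not taken from the paper) is a fractal-type process whose small-ball probability behaves like a power of $h$:

```latex
% If, for some \gamma_0 > 0, the small-ball probability satisfies
%   \phi_z(h) = \tau(z)\, h^{\gamma_0} + o(h^{\gamma_0}),
% then (H4) holds with g(h) = h^{\gamma_0}, and (H5) follows from
\lim_{h_{\mathbf{n}}\to 0}\frac{\phi_z(t\,h_{\mathbf{n}})}{\phi_z(h_{\mathbf{n}})}
  = \lim_{h_{\mathbf{n}}\to 0}\frac{\tau(z)\,(t\,h_{\mathbf{n}})^{\gamma_0}}
                                   {\tau(z)\,h_{\mathbf{n}}^{\gamma_0}}
  = t^{\gamma_0} =: \iota_z(t).
```

In this case the limit function $\iota_z(t)=t^{\gamma_0}$ is the one entering the constants $\iota_1,\iota_2$ of the asymptotic variance below.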
Comments on the assumptions
The asymptotic results obtained below rely on Theorem 4.1 of [32] (see the proof section). It is therefore natural to impose the same hypotheses as in that theorem, namely hypotheses (H1)-(H5) and (H7)-(H10) (for comments on these hypotheses, see [32]). The additional hypothesis (H6) is usual in the SFPLR context.

4. Theoretical Results

We are now in a position to state our asymptotic results. The first gives the asymptotic distribution of the estimator of the parametric component of the model, $\hat{\beta}_{\mathbf{n}}$.
Theorem 1.
Under assumptions (H1)-(H10) and conditions (11), (12), (13), if additionally the bandwidth parameter $h_{\mathbf{n}}$ and the function $\phi_{z}(h_{\mathbf{n}})$ satisfy $\hat{\mathbf{n}}\,\phi_{z}(h_{\mathbf{n}})/\log(\hat{\mathbf{n}})\to\infty$ and $\hat{\mathbf{n}}\,h_{\mathbf{n}}^{\kappa}\,\phi_{z}(h_{\mathbf{n}})/\log(\hat{\mathbf{n}})\to 0$ as $\mathbf{n}\to\infty$, then we have:
\[
\hat{\mathbf{n}}^{1/2}\big(\hat{\beta}_{\mathbf{n}}-\beta\big)\xrightarrow{\ \mathcal{D}\ }N\big(0,\ \Sigma^{-1}B(\Sigma^{-1})^{T}\big).
\]
The following results give the probability convergence and the asymptotic normality of the estimator of the non-parametric part.
Theorem 2.
Under the hypotheses of Theorem 1, we have:
\[
\hat{m}_{\mathbf{n}}(z)-m(z)\xrightarrow{\ p\ }0.
\]
Theorem 3.
Under the assumptions of Theorem 1, if in addition $\hat{\mathbf{n}}\,h_{\mathbf{n}}^{2\kappa}\,\phi_{z}(h_{\mathbf{n}})\to 0$ as $\mathbf{n}\to\infty$, we have
\[
\big(\hat{\mathbf{n}}\,\phi_{z}(h_{\mathbf{n}})\big)^{1/2}\big(\hat{m}_{\mathbf{n}}(z)-m(z)\big)\xrightarrow{\ \mathcal{D}\ }N\big(0,\sigma^{2}(z)\big),
\]
where $\sigma^{2}(z)=\dfrac{\iota_{2}}{\iota_{1}^{2}}\,\dfrac{V_{2}(z)}{p_{1}(z)\,\tau(z)}$, with $\iota_{k}=\int_{0}^{1}\iota_{z}(t)\,(K^{k})'(t)\,dt$ for $k=1,2$.
We note that these results extend those established in the complete-data case (see [20]).

5. Proof

Throughout the rest of this paper, we set $K_{\mathbf{i}}=K\big(h_{\mathbf{n}}^{-1}d(z,Z_{\mathbf{i}})\big)$, for all $\mathbf{i}\in I_{\mathbf{n}}$.
First, we recall the following result (Theorem 4.1 in [32]), which will be used in the proofs of our results.
Lemma 1.
Under hypotheses (H1)-(H10) and conditions (11), (12), (13), if additionally the bandwidth parameter $h_{\mathbf{n}}$ and the function $\phi_{z}(h_{\mathbf{n}})$ satisfy $\hat{\mathbf{n}}\,\phi_{z}(h_{\mathbf{n}})/\log(\hat{\mathbf{n}})\to\infty$ and $\hat{\mathbf{n}}\,h_{\mathbf{n}}^{\kappa}\,\phi_{z}(h_{\mathbf{n}})/\log(\hat{\mathbf{n}})\to 0$ as $\mathbf{n}\to\infty$, then
\[
\sqrt{\hat{\mathbf{n}}\,\phi_{z}(h_{\mathbf{n}})/\log(\hat{\mathbf{n}})}\ \big(\hat{m}_{X,\mathbf{n}}(z)-m_{X}(z)\big)\xrightarrow{\ p\ }0,
\]
\[
\sqrt{\hat{\mathbf{n}}\,\phi_{z}(h_{\mathbf{n}})/\log(\hat{\mathbf{n}})}\ \big(\hat{m}_{Y,\mathbf{n}}(z)-m_{Y}(z)\big)\xrightarrow{\ p\ }0.
\]
Second, we need to state some preliminary results. We denote
\[
\lambda_{\mathbf{i}}(z)=\frac{\sqrt{\phi_{z}(h)}}{E(K_{\mathbf{1}})}\,\delta_{\mathbf{i}}\big(Y_{\mathbf{i}}-X_{\mathbf{i}}^{T}\beta-m(z)\big)K_{\mathbf{i}},
\]
and define the random variable $W_{\mathbf{i}}(z)=\lambda_{\mathbf{i}}(z)-E\big(\lambda_{\mathbf{i}}(z)\big)$.
Lemma 2.
Under the hypotheses of Theorem 1, we have:
\[
\hat{\mathbf{n}}^{-1}\,\mathrm{Var}\Big(\sum_{\mathbf{i}\in I_{\mathbf{n}}}W_{\mathbf{i}}(z)\Big)\longrightarrow V(z)=\frac{\iota_{2}}{\iota_{1}^{2}}\,\frac{p_{1}(z)\,V_{2}(z)}{\tau(z)}\qquad\text{as }\mathbf{n}\to\infty.
\]
Proof:
Before going further, note that
\[
\mathrm{Var}\Big(\sum_{\mathbf{i}\in I_{\mathbf{n}}}W_{\mathbf{i}}(z)\Big)=\sum_{\mathbf{i}\in I_{\mathbf{n}}}\mathrm{Var}\big(W_{\mathbf{i}}(z)\big)+\sum_{\mathbf{i}\neq\mathbf{j}}\mathrm{Cov}\big(W_{\mathbf{i}}(z),W_{\mathbf{j}}(z)\big)=I_{\mathbf{n}}(z)+R_{\mathbf{n}}(z).
\]
First, we have $\mathrm{Var}\big(W_{\mathbf{i}}(z)\big)=E\big(\lambda_{\mathbf{i}}^{2}(z)\big)-E^{2}\big(\lambda_{\mathbf{i}}(z)\big)$.
Conditioning on $Z_{\mathbf{i}}$ and using the MAR assumption, we have
\[
E\big(\lambda_{\mathbf{i}}(z)\big)=\frac{\sqrt{\phi_{z}(h)}}{E(K_{\mathbf{1}})}\,E\big[p_{1}(Z_{\mathbf{i}})\big(m(Z_{\mathbf{i}})-m(z)\big)K_{\mathbf{i}}\big];
\]
then (H3) and (H9) imply
\[
\big|E\big(\lambda_{\mathbf{i}}(z)\big)\big|\le \sqrt{\phi_{z}(h)}\,h^{\kappa}\big(p_{1}(z)+o(1)\big)\,\frac{E(K_{\mathbf{i}})}{E(K_{\mathbf{1}})}.
\]
Similarly, conditioning on $Z_{\mathbf{i}}$ and using the MAR assumption, we have
\[
E\big(\lambda_{\mathbf{i}}^{2}(z)\big)=\phi_{z}(h)\,\frac{E\big[p_{1}(Z_{\mathbf{i}})\big(m(Z_{\mathbf{i}})-m(z)\big)^{2}K_{\mathbf{i}}^{2}\big]}{E^{2}(K_{\mathbf{1}})}+\phi_{z}(h)\,\frac{E\big[p_{1}(Z_{\mathbf{i}})V_{2}(Z_{\mathbf{i}})K_{\mathbf{i}}^{2}\big]}{E^{2}(K_{\mathbf{1}})}.
\]
It follows, by (H8)(i) and (H9), that
\[
\frac{E\big[p_{1}(Z_{\mathbf{i}})V_{2}(Z_{\mathbf{i}})K_{\mathbf{i}}^{2}\big]}{E^{2}(K_{\mathbf{1}})}=\big(p_{1}(z)+o(1)\big)\big(V_{2}(z)+o(1)\big)\frac{E(K_{\mathbf{i}}^{2})}{E^{2}(K_{\mathbf{1}})},
\]
and, by (H3) and (H9), that
\[
\frac{E\big[p_{1}(Z_{\mathbf{i}})\big(m(Z_{\mathbf{i}})-m(z)\big)^{2}K_{\mathbf{i}}^{2}\big]}{E^{2}(K_{\mathbf{1}})}\le \big(p_{1}(z)+o(1)\big)h^{2\kappa}\,\frac{E(K_{\mathbf{i}}^{2})}{E^{2}(K_{\mathbf{1}})}.
\]
Hence
\[
E\big(\lambda_{\mathbf{i}}^{2}(z)\big)\le \phi_{z}(h)\big(p_{1}(z)+o(1)\big)\big(V_{2}(z)+o(1)+h^{2\kappa}\big)\frac{E(K_{\mathbf{i}}^{2})}{E^{2}(K_{\mathbf{1}})},
\]
which implies that
\[
E\big(W_{\mathbf{i}}(z)^{2}\big)\le \phi_{z}(h)\big(p_{1}(z)+o(1)\big)\big(V_{2}(z)+o(1)\big)\frac{E(K_{\mathbf{i}}^{2})}{E^{2}(K_{\mathbf{1}})}+h^{2\kappa}\phi_{z}(h)\big(p_{1}(z)+o(1)\big)\frac{E(K_{\mathbf{i}}^{2})}{E^{2}(K_{\mathbf{1}})}+h^{2\kappa}\phi_{z}(h)\big(p_{1}(z)+o(1)\big)\frac{E^{2}(K_{\mathbf{i}})}{E^{2}(K_{\mathbf{1}})},
\]
and thus
\[
I_{\mathbf{n}}(z)\le \hat{\mathbf{n}}\,\phi_{z}(h)\big(p_{1}(z)+o(1)\big)\Big[\big(V_{2}(z)+o(1)\big)\frac{E(K_{\mathbf{i}}^{2})}{E^{2}(K_{\mathbf{1}})}+h^{2\kappa}\frac{E(K_{\mathbf{i}}^{2})}{E^{2}(K_{\mathbf{1}})}+h^{2\kappa}\frac{E^{2}(K_{\mathbf{i}})}{E^{2}(K_{\mathbf{1}})}\Big].
\]
On the other hand, using (H1)-(H2) and (H4)-(H5), we obtain
\[
\big(\tau(z)\,\phi_{z}(h)\big)^{-1}E\big(K_{\mathbf{i}}^{j}\big)\longrightarrow \iota_{j},\qquad j=1,2.
\]
Consequently, we have
\[
\frac{1}{\hat{\mathbf{n}}}\,I_{\mathbf{n}}(z)=\frac{1}{\hat{\mathbf{n}}}\sum_{\mathbf{i}\in I_{\mathbf{n}}}E\big(W_{\mathbf{i}}(z)^{2}\big)\longrightarrow V(z)=\frac{\iota_{2}}{\iota_{1}^{2}}\,\frac{p_{1}(z)\,V_{2}(z)}{\tau(z)}\qquad\text{as }\mathbf{n}\to\infty.
\]
For the covariance term $R_{\mathbf{n}}(z)$, taking into account that
\[
\sum_{\mathbf{i}\neq\mathbf{j}}\mathrm{Cov}\big(W_{\mathbf{i}}(z),W_{\mathbf{j}}(z)\big)=\sum_{\mathbf{i}\neq\mathbf{j}}E\big(W_{\mathbf{i}}(z)W_{\mathbf{j}}(z)\big),
\]
then, by the same arguments as above, we have
\[
\Big|\sum_{\mathbf{i}\neq\mathbf{j}}E\big(W_{\mathbf{i}}(z)W_{\mathbf{j}}(z)\big)\Big|\le \frac{h^{2\kappa}\,\phi_{z}(h)\big(p_{1}(z)+o(1)\big)^{2}}{E^{2}(K_{\mathbf{1}})}\sum_{\mathbf{i}\neq\mathbf{j}}\big|E(K_{\mathbf{i}}K_{\mathbf{j}})-E(K_{\mathbf{i}})E(K_{\mathbf{j}})\big|.
\]
In the following, we introduce the sets $E_{1}=\{(\mathbf{i},\mathbf{j})\in I_{\mathbf{n}}^{2}:\ 0<\|\mathbf{i}-\mathbf{j}\|\le c_{\mathbf{n}}\}$ and $E_{2}=\{(\mathbf{i},\mathbf{j})\in I_{\mathbf{n}}^{2}:\ \|\mathbf{i}-\mathbf{j}\|>c_{\mathbf{n}}\}$, where $c_{\mathbf{n}}$ is a real sequence tending to $+\infty$ as $\mathbf{n}\to\infty$ that will be specified later, and we set
\[
R_{\mathbf{n}}^{1}=\sum_{E_{1}}\big|E(K_{\mathbf{i}}K_{\mathbf{j}})-E(K_{\mathbf{i}})E(K_{\mathbf{j}})\big|\quad\text{and}\quad R_{\mathbf{n}}^{2}=\sum_{E_{2}}\big|E(K_{\mathbf{i}}K_{\mathbf{j}})-E(K_{\mathbf{i}})E(K_{\mathbf{j}})\big|.
\]
On one hand, by assumption (H1), we have
\[
R_{\mathbf{n}}^{1}\le C\,\hat{\mathbf{n}}\,c_{\mathbf{n}}^{N}\,\big(\phi_{z}(h)\big)^{1+\frac{1}{a}}.
\]
On the other hand, by Lemma 3.3 in [33] and the fact that the random variables $K_{\mathbf{i}}$ are bounded, we have
\[
\big|E(K_{\mathbf{i}}K_{\mathbf{j}})-E(K_{\mathbf{i}})E(K_{\mathbf{j}})\big|\le C\,\varphi\big(\|\mathbf{i}-\mathbf{j}\|\big).
\]
Consequently, we have
\[
R_{\mathbf{n}}^{2}\le C\,\hat{\mathbf{n}}\sum_{\mathbf{i}:\,\|\mathbf{i}\|>c_{\mathbf{n}}}\varphi\big(\|\mathbf{i}\|\big)\le C\,\hat{\mathbf{n}}\,c_{\mathbf{n}}^{-Na}\sum_{\mathbf{i}:\,\|\mathbf{i}\|>c_{\mathbf{n}}}\|\mathbf{i}\|^{Na}\,\varphi\big(\|\mathbf{i}\|\big).
\]
Then, under condition (13), choosing $c_{\mathbf{n}}=\big(\phi_{z}(h)\big)^{-\frac{1}{Na}}$, we obtain $R_{\mathbf{n}}^{1}+R_{\mathbf{n}}^{2}\le C\,\hat{\mathbf{n}}\,\phi_{z}(h)$, so that
\[
\big|R_{\mathbf{n}}(z)\big|\le \frac{C\,\hat{\mathbf{n}}\,h^{2\kappa}\,\phi_{z}^{2}(h)\big(p_{1}(z)+o(1)\big)^{2}}{E^{2}(K_{\mathbf{1}})}.
\]
So equations (21), (22) and (23) imply that
\[
\sum_{\mathbf{i}\neq\mathbf{j}}\mathrm{Cov}\big(W_{\mathbf{i}}(z),W_{\mathbf{j}}(z)\big)=o(\hat{\mathbf{n}}).
\]
The proof follows from (18), (20) and (25). ◾
Using Lemma 2 and the same reasoning as in the proof of Theorem 4.2 in [32], with the same notations, we obtain the following result.
Lemma 3.
Under hypotheses (H1)-(H10), if the bandwidth parameter $h_{\mathbf{n}}$ and the function $\phi_{z}(h_{\mathbf{n}})$ satisfy $\hat{\mathbf{n}}\,\phi_{z}(h_{\mathbf{n}})\to\infty$ and $\hat{\mathbf{n}}\,h_{\mathbf{n}}^{2\kappa}\,\phi_{z}(h_{\mathbf{n}})\to 0$ as $\mathbf{n}\to+\infty$, then we have
\[
\sqrt{\hat{\mathbf{n}}\,\phi_{z}(h_{\mathbf{n}})}\ \big(\hat{m}_{\mathbf{n}}(z)-m(z)\big)\xrightarrow{\ \mathcal{D}\ }N\big(0,\sigma^{2}(z)\big),
\]
with $\sigma^{2}(z)=\dfrac{\iota_{2}}{\iota_{1}^{2}}\,\dfrac{V_{2}(z)}{p_{1}(z)\,\tau(z)}$.
Lemma 4.
Under hypotheses (H1) through (H10), if additionally $\hat{\mathbf{n}}\,\phi_{z}(h_{\mathbf{n}})/\log(\hat{\mathbf{n}})\to\infty$ and $\hat{\mathbf{n}}\,h_{\mathbf{n}}^{\kappa}\,\phi_{z}(h_{\mathbf{n}})/\log(\hat{\mathbf{n}})\to 0$ as $\mathbf{n}\to\infty$, then we have:
\[
\frac{1}{\hat{\mathbf{n}}}\sum_{\mathbf{i}\in I_{\mathbf{n}}}\delta_{\mathbf{i}}\tilde{X}_{\mathbf{i}}\tilde{X}_{\mathbf{i}}^{T}\longrightarrow \Sigma\qquad\text{in probability}.
\]
Proof:
Taking into account that
\[
\Big(\frac{1}{\hat{\mathbf{n}}}\sum_{\mathbf{i}\in I_{\mathbf{n}}}\delta_{\mathbf{i}}\tilde{X}_{\mathbf{i}}\tilde{X}_{\mathbf{i}}^{T}\Big)_{rs}=\frac{1}{\hat{\mathbf{n}}}\sum_{\mathbf{i}\in I_{\mathbf{n}}}\delta_{\mathbf{i}}\theta_{\mathbf{i}}^{(r)}\theta_{\mathbf{i}}^{(s)}+\frac{1}{\hat{\mathbf{n}}}\sum_{\mathbf{i}\in I_{\mathbf{n}}}\delta_{\mathbf{i}}\Delta_{\mathbf{i}}^{(r)}\theta_{\mathbf{i}}^{(s)}+\frac{1}{\hat{\mathbf{n}}}\sum_{\mathbf{i}\in I_{\mathbf{n}}}\delta_{\mathbf{i}}\Delta_{\mathbf{i}}^{(s)}\theta_{\mathbf{i}}^{(r)}+\frac{1}{\hat{\mathbf{n}}}\sum_{\mathbf{i}\in I_{\mathbf{n}}}\delta_{\mathbf{i}}\Delta_{\mathbf{i}}^{(r)}\Delta_{\mathbf{i}}^{(s)},
\]
where
\[
\Delta_{\mathbf{i}}^{(s)}(z)=E\big[\delta_{\mathbf{i}}X_{\mathbf{i}}^{(s)}\mid Z_{\mathbf{i}}=z\big]-\sum_{\mathbf{j}\in I_{\mathbf{n}}}\delta_{\mathbf{j}}\,w_{\mathbf{n}}(Z_{\mathbf{i}},Z_{\mathbf{j}})X_{\mathbf{j}}^{(s)},\qquad s=1,\dots,p.
\]
Then, by the strong law of large numbers and the MAR assumption, we have
\[
\frac{1}{\hat{\mathbf{n}}}\sum_{\mathbf{i}\in I_{\mathbf{n}}}\delta_{\mathbf{i}}\theta_{\mathbf{i}}^{(r)}\theta_{\mathbf{i}}^{(s)}\longrightarrow E\big[p(X_{\mathbf{1}},Z_{\mathbf{1}})\,\theta_{\mathbf{1}}^{(r)}\theta_{\mathbf{1}}^{(s)}\big]=\Sigma_{rs}\qquad\text{in probability}.
\]
Second, by Lemma 1 and the Cauchy-Schwarz inequality, it is easy to check that
\[
\frac{1}{\hat{\mathbf{n}}}\sum_{\mathbf{i}\in I_{\mathbf{n}}}\delta_{\mathbf{i}}\Delta_{\mathbf{i}}^{(r)}\theta_{\mathbf{i}}^{(s)}\xrightarrow{\ p\ }0
\quad\text{and}\quad
\frac{1}{\hat{\mathbf{n}}}\sum_{\mathbf{i}\in I_{\mathbf{n}}}\delta_{\mathbf{i}}\Delta_{\mathbf{i}}^{(s)}\theta_{\mathbf{i}}^{(r)}\xrightarrow{\ p\ }0,
\]
and the same argument applies to the last term $\frac{1}{\hat{\mathbf{n}}}\sum_{\mathbf{i}\in I_{\mathbf{n}}}\delta_{\mathbf{i}}\Delta_{\mathbf{i}}^{(r)}\Delta_{\mathbf{i}}^{(s)}$.
We therefore conclude the proof of Lemma 4 using (29)-(31).
Proof of Theorem 1. Let us use the following decomposition:
\[
\hat{\mathbf{n}}^{1/2}\big(\hat{\beta}_{\mathbf{n}}-\beta\big)=\Big(\frac{1}{\hat{\mathbf{n}}}\sum_{\mathbf{i}\in I_{\mathbf{n}}}\delta_{\mathbf{i}}\tilde{X}_{\mathbf{i}}\tilde{X}_{\mathbf{i}}^{T}\Big)^{-1}\frac{1}{\sqrt{\hat{\mathbf{n}}}}\Big[\sum_{\mathbf{i}\in I_{\mathbf{n}}}\delta_{\mathbf{i}}R_{\mathbf{i}}+\sum_{\mathbf{i}\in I_{\mathbf{n}}}\delta_{\mathbf{i}}\theta_{\mathbf{i}}\big(\Delta_{\mathbf{i}}^{(0)}-\Delta_{\mathbf{i}}^{T}\beta\big)+\sum_{\mathbf{i}\in I_{\mathbf{n}}}\delta_{\mathbf{i}}\Delta_{\mathbf{i}}\epsilon_{\mathbf{i}}+\sum_{\mathbf{i}\in I_{\mathbf{n}}}\delta_{\mathbf{i}}\Delta_{\mathbf{i}}\big(\Delta_{\mathbf{i}}^{(0)}-\Delta_{\mathbf{i}}^{T}\beta\big)\Big],
\]
where
\[
\Delta_{\mathbf{i}}^{(0)}=E\big[\delta_{\mathbf{i}}Y_{\mathbf{i}}\mid Z_{\mathbf{i}}\big]-\sum_{\mathbf{j}\in I_{\mathbf{n}}}\delta_{\mathbf{j}}\,w_{\mathbf{n}}(Z_{\mathbf{i}},Z_{\mathbf{j}})Y_{\mathbf{j}}.
\]
So, by using (30)-(31) and the Cauchy-Schwarz inequality, we can see that
\[
\frac{1}{\sqrt{\hat{\mathbf{n}}}}\Big[\sum_{\mathbf{i}\in I_{\mathbf{n}}}\delta_{\mathbf{i}}\theta_{\mathbf{i}}\big(\Delta_{\mathbf{i}}^{(0)}-\Delta_{\mathbf{i}}^{T}\beta\big)+\sum_{\mathbf{i}\in I_{\mathbf{n}}}\delta_{\mathbf{i}}\Delta_{\mathbf{i}}\epsilon_{\mathbf{i}}+\sum_{\mathbf{i}\in I_{\mathbf{n}}}\delta_{\mathbf{i}}\Delta_{\mathbf{i}}\big(\Delta_{\mathbf{i}}^{(0)}-\Delta_{\mathbf{i}}^{T}\beta\big)\Big]=o_{P}(1),
\]
which implies, by Lemma 4, that
\[
\hat{\mathbf{n}}^{1/2}\big(\hat{\beta}_{\mathbf{n}}-\beta\big)=\big(\Sigma^{-1}+o_{P}(1)\big)\Big(\frac{1}{\sqrt{\hat{\mathbf{n}}}}\sum_{\mathbf{i}\in I_{\mathbf{n}}}\delta_{\mathbf{i}}R_{\mathbf{i}}+o_{P}(1)\Big).
\]
Then, according to condition (H6)(ii), applying Theorem 6.1.1 of [35] to $\{\delta_{\mathbf{i}}R_{\mathbf{i}},\ \mathbf{i}\in\mathbb{Z}^{N}\}$ gives
\[
\hat{\mathbf{n}}^{-1/2}\sum_{\mathbf{i}\in I_{\mathbf{n}}}\delta_{\mathbf{i}}R_{\mathbf{i}}\xrightarrow{\ \mathcal{D}\ }N(0,B).
\]
The proof follows from (30), (34), (35) and Lemma 4.
Proof of Theorem 2.
It suffices to note that
\[
\hat{m}_{\mathbf{n}}(z)-m(z)=\big(\hat{m}_{Y,\mathbf{n}}(z)-m_{Y}(z)\big)-\big(\hat{m}_{X,\mathbf{n}}^{T}(z)\hat{\beta}_{\mathbf{n}}-m_{X}^{T}(z)\beta\big)=S_{1}-S_{2}.
\]
Then we have
\[
\sqrt{\hat{\mathbf{n}}\,\phi_{z}(h_{\mathbf{n}})/\log(\hat{\mathbf{n}})}\ \big|\hat{m}_{\mathbf{n}}(z)-m(z)\big|\le \sqrt{\hat{\mathbf{n}}\,\phi_{z}(h_{\mathbf{n}})/\log(\hat{\mathbf{n}})}\ |S_{1}|+\sqrt{\hat{\mathbf{n}}\,\phi_{z}(h_{\mathbf{n}})/\log(\hat{\mathbf{n}})}\ |S_{2}|.
\]
By Lemma 1, we have
\[
\sqrt{\hat{\mathbf{n}}\,\phi_{z}(h_{\mathbf{n}})/\log(\hat{\mathbf{n}})}\ S_{1}\xrightarrow{\ p\ }0.
\]
On the other hand, we can write
\[
|S_{2}|\le \big\|\hat{\beta}_{\mathbf{n}}-\beta\big\|\,\big\|\hat{m}_{X,\mathbf{n}}(z)\big\|+\|\beta\|\,\big\|\hat{m}_{X,\mathbf{n}}(z)-m_{X}(z)\big\|.
\]
Then, by Theorem 1, $\hat{\beta}_{\mathbf{n}}-\beta$ converges in probability to $0$ at the rate $\hat{\mathbf{n}}^{-1/2}$; according to (H7), we have $\|E(\delta X\mid Z=z)\|<\infty$, so that $\|\hat{m}_{X,\mathbf{n}}(z)\|$ is bounded in probability by Lemma 1. Moreover, by Lemma 1,
\[
\sqrt{\hat{\mathbf{n}}\,\phi_{z}(h_{\mathbf{n}})/\log(\hat{\mathbf{n}})}\ \big\|\hat{m}_{X,\mathbf{n}}(z)-m_{X}(z)\big\|\xrightarrow{\ p\ }0.
\]
This implies
\[
\sqrt{\hat{\mathbf{n}}\,\phi_{z}(h_{\mathbf{n}})/\log(\hat{\mathbf{n}})}\ S_{2}\xrightarrow{\ p\ }0.
\]
Thus, from (37)-(39), the proof is complete.
Proof of Theorem 3.
From (36), we have
\[
\sqrt{\hat{\mathbf{n}}\,\phi_{z}(h_{\mathbf{n}})}\ \big(\hat{m}_{\mathbf{n}}(z)-m(z)\big)=\sqrt{\hat{\mathbf{n}}\,\phi_{z}(h_{\mathbf{n}})}\ \big(\hat{m}_{Y,\mathbf{n}}(z)-m_{Y}(z)\big)-\sqrt{\hat{\mathbf{n}}\,\phi_{z}(h_{\mathbf{n}})}\ \big(\hat{m}_{X,\mathbf{n}}^{T}(z)\hat{\beta}_{\mathbf{n}}-m_{X}^{T}(z)\beta\big)=S_{3}-S_{4}.
\]
Now, by Lemma 3, we have
\[
S_{3}\xrightarrow{\ \mathcal{D}\ }N\big(0,\sigma^{2}(z)\big),
\]
and, by Theorem 1, $S_{4}\xrightarrow{\ p\ }0$.
Then, from (40)-(41), the proof is complete.

6. Computational Study

In this section, we are interested in the behavior of the proposed estimators on finite-size samples, with particular attention to the influence of the spatial correlation and the effect of MAR responses on the efficiency of the estimators. To this end, we conducted simulations based on observations denoted $(X_{\mathbf{i}},Z_{\mathbf{i}},Y_{\mathbf{i}},\delta_{\mathbf{i}})$ with $\mathbf{i}=(i_{1},i_{2})$, $1\le i_{1}\le n_{1}$, $1\le i_{2}\le n_{2}$, and we generate the SFPLR model with MAR as follows:
\[
\delta_{\mathbf{i}}Y_{\mathbf{i}}=\delta_{\mathbf{i}}\,r(Z_{\mathbf{i}},X_{\mathbf{i}})+\delta_{\mathbf{i}}\epsilon_{\mathbf{i}}=\delta_{\mathbf{i}}X_{\mathbf{i}}^{T}\beta+\delta_{\mathbf{i}}m(Z_{\mathbf{i}})+\delta_{\mathbf{i}}\epsilon_{\mathbf{i}},
\]
where $\beta=(1,2)^{T}$, $X_{\mathbf{i}j}\sim\mathrm{Exponential}(0.5)$, $j=1,2$, and the nonparametric operator $m$ is taken as
\[
m(z)=\frac{1}{\int_{0}^{1}|1+z(t)|\,dt}.
\]
The curves $Z_{\mathbf{i}}(t)$ were generated as
\[
Z_{\mathbf{i}}(t)=B_{\mathbf{i}}\,t\,\sin\big(t\,A_{\mathbf{i}}\big)+\nu_{\mathbf{i}}(t),\qquad t\in[0,1].
\]
To simulate the curves $Z_{\mathbf{i}}$, we take $\nu_{\mathbf{i}}(t)\sim N(0,0.2)$, $A=D*\big(\frac{\sin(G)}{2}+0.5\big)$ with $G=GRF(0,5,3)$, $B=GRF(2.5,5,3)$ and $\epsilon=GRF(0,0.1,5)$, where $GRF(\mu,\sigma^{2},s)$ denotes a stationary Gaussian random field with mean $\mu$ and covariance function $C(l)=\sigma^{2}\exp\big(-(\|l\|/s)^{2}\big)$, $l\in\mathbb{R}^{2}$, $s>0$, and where the function $D$ is defined by
\[
D_{\mathbf{i}}=\frac{1}{n_{1}\times n_{2}}\sum_{\mathbf{j}}\exp\Big(-\frac{\|\mathbf{i}-\mathbf{j}\|}{a}\Big),\quad\text{i.e.}\quad D(i_{1},i_{2})=\frac{1}{n_{1}\times n_{2}}\sum_{1\le j_{1}\le n_{1}}\ \sum_{1\le j_{2}\le n_{2}}\exp\Big(-\frac{\|(i_{1},i_{2})-(j_{1},j_{2})\|}{a}\Big).
\]
The curves, for different values of $a$, are displayed in Figure 1. Note that we consider the same curves as in [34], where the function $D$ is used to control the spatial mixing condition. Therefore, these observations are a mixture of independent and dependent data points, as illustrated in Figure 2 (for more details, see [34]). To reduce the level of independence in the data, it suffices to decrease the value of $a$. For our analysis, we used $\sigma^{2}=5$ and $s=5$. Moreover, similarly to [29], we adopted the following missing-data mechanism:
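The simulation of the fields $G$ and $B$ above can be sketched as follows (our illustration; the grid, the formula for $A$, and the noiseless curves follow our reading of the garbled display and should be treated as assumptions). A Gaussian random field with covariance $C(l)=\sigma^{2}\exp(-(\|l\|/s)^{2})$ is drawn via an eigendecomposition of its covariance matrix on the grid of sites:

```python
import numpy as np

def grf(sites, mu, sigma2, s, rng):
    """Stationary Gaussian random field on the given sites with
    covariance C(l) = sigma2 * exp(-(||l|| / s)^2)."""
    d = np.linalg.norm(sites[:, None, :] - sites[None, :, :], axis=2)
    C = sigma2 * np.exp(-((d / s) ** 2))
    vals, vecs = np.linalg.eigh(C)                 # C is symmetric PSD
    vals = np.clip(vals, 0.0, None)                # guard tiny negative eigenvalues
    return mu + vecs @ (np.sqrt(vals) * rng.standard_normal(len(sites)))

rng = np.random.default_rng(1)
n1 = n2 = 10
sites = np.array([(i1, i2) for i1 in range(1, n1 + 1)
                           for i2 in range(1, n2 + 1)], dtype=float)
G = grf(sites, 0.0, 5.0, 3.0, rng)
B = grf(sites, 2.5, 5.0, 3.0, rng)
# D_i controls the spatial mixing through the parameter a.
a = 5.0
dist = np.linalg.norm(sites[:, None, :] - sites[None, :, :], axis=2)
D = np.exp(-dist / a).sum(axis=1) / (n1 * n2)
A = D * (np.sin(G) / 2 + 0.5)                      # assumed form of A
t = np.linspace(0.0, 1.0, 100)
Z = B[:, None] * t[None, :] * np.sin(t[None, :] * A[:, None])  # plus noise nu_i(t)
print(Z.shape)  # (100, 100)
```

The eigendecomposition route is preferred here over a plain Cholesky factorization because Gaussian-type covariance matrices are often numerically rank-deficient.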
\[
p(x_{1},x_{2},z)=P\big(\delta=1\mid X_{1}=x_{1},X_{2}=x_{2},Z=z\big)=\mathrm{expit}\Big(2\alpha\Big(\sum_{j=1}^{2}|x_{j}|+\int_{0}^{1}z^{2}(t)\,dt\Big)\Big),
\]
where $\mathrm{expit}(u)=e^{u}/(1+e^{u})$ for all $u\in\mathbb{R}$, and where $\alpha$ takes the values $\alpha=0.05,\ 0.5,\ 5$.
Recall that the degree of dependence between the functional variable $Z$ and the indicator $\delta$ is controlled by the parameter $\alpha$; to gauge the average observation rate, we compute $\bar{\delta}=\frac{1}{n_{1}\times n_{2}}\sum_{i_{1}=1}^{n_{1}}\sum_{i_{2}=1}^{n_{2}}\delta_{(i_{1},i_{2})}$.
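A sketch of this mechanism follows (our illustration; the grouping of terms inside the expit follows our reading of the display above, and the placeholder curves are purely illustrative):

```python
import numpy as np

def expit(u):
    return 1.0 / (1.0 + np.exp(-u))

def mar_indicator(X, Z, alpha, rng):
    """Draw delta_i ~ Bernoulli(p(x, z)) with
    p(x1, x2, z) = expit(2*alpha*(|x1| + |x2| + int_0^1 z(t)^2 dt))."""
    integral = (Z ** 2).mean(axis=1)               # Riemann approx of int z^2(t) dt
    p = expit(2.0 * alpha * (np.abs(X).sum(axis=1) + integral))
    return rng.binomial(1, p), p

rng = np.random.default_rng(2)
X = rng.exponential(scale=1 / 0.5, size=(200, 2))  # X_j ~ Exponential(0.5)
Z = rng.normal(size=(200, 50))                     # placeholder curves
for alpha in (0.05, 0.5, 5.0):
    delta, p = mar_indicator(X, Z, alpha, rng)
    print(alpha, round(float(delta.mean()), 2))    # delta_bar grows with alpha on average
```

Since the argument of the expit is nonnegative here, the observation probability is always at least one half, and large values of $\alpha$ drive $\bar{\delta}$ toward one.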
In order to compute our estimators, we use the class of semi-metrics based on functional principal component analysis (PCA), which is best suited to this type of data (discontinuous functional variable). Furthermore, we chose the standard quadratic kernel defined by $K(u)=\frac{3}{2}\big(1-u^{2}\big)\,\mathbf{1}_{[0,1]}(u)$.
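A minimal version of this PCA-based semi-metric and of the quadratic kernel is sketched below (our illustration; `q`, the number of retained components, is a tuning choice):

```python
import numpy as np

def pca_semimetric(curves, q=3):
    """Ferraty-Vieu style semi-metric: Euclidean distance between the
    projections of the curves on the first q empirical eigenfunctions."""
    centered = curves - curves.mean(axis=0)
    cov = centered.T @ centered / len(curves)
    _, vecs = np.linalg.eigh(cov)
    V = vecs[:, ::-1][:, :q]                       # leading q eigenvectors
    scores = curves @ V
    diff = scores[:, None, :] - scores[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def quad_kernel(u):
    """K(u) = (3/2)(1 - u^2) on [0, 1], zero elsewhere."""
    u = np.asarray(u, dtype=float)
    return 1.5 * (1.0 - u ** 2) * ((u >= 0) & (u <= 1))

rng = np.random.default_rng(3)
curves = rng.normal(size=(30, 80))
D = pca_semimetric(curves, q=3)
print(float(quad_kernel(0.0)), float(quad_kernel(1.0)))  # 1.5 0.0
```

The resulting distance matrix is symmetric with a zero diagonal and can be fed directly into the kernel smoothers defined in Section 2.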
The objective of this computational study is to compare our semi-functional partially linear model estimator with MAR (SFPLRM), the semi-functional partially linear model estimator $\tilde{r}_{\mathbf{n}}$ in the complete case (SFPLRC) (see [20]), and the nonparametric functional model estimator with MAR (FNPM).
Recall that the FNPM model is defined as
\[
Y=r(Z)+\epsilon,
\]
and the estimator of the regression operator $r$ (see [32]) is given by
\[
\hat{r}_{\mathbf{n}}(z)=\frac{\sum_{\mathbf{j}\in I_{\mathbf{n}}}\delta_{\mathbf{j}}\,Y_{\mathbf{j}}\,K\big(d(z,Z_{\mathbf{j}})/h_{\mathbf{n}}\big)}{\sum_{\mathbf{j}\in I_{\mathbf{n}}}\delta_{\mathbf{j}}\,K\big(d(z,Z_{\mathbf{j}})/h_{\mathbf{n}}\big)}.
\]
For that purpose, we randomly split our data $(X_{\mathbf{i}},Z_{\mathbf{i}},Y_{\mathbf{i}},\delta_{\mathbf{i}})_{\mathbf{i}}$ into two subsets: a test sample $(X_{\mathbf{i}},Z_{\mathbf{i}},Y_{\mathbf{i}})_{\mathbf{i}\in I}$ (consisting of data without missing values) and a training sample $(X_{\mathbf{i}},Z_{\mathbf{i}},Y_{\mathbf{i}},\delta_{\mathbf{i}})_{\mathbf{i}\notin I}$, which is used to select the optimal smoothing parameter $h_{opt}$ via cross-validation procedures (for more details on the selected tuning parameters, see [4]).
To evaluate the precision of the estimators of the three models (SFPLRM, SFPLRC, and FNPM), we use the mean squared error (MSE):
\[
\mathrm{MSE}_{FNPM}=\frac{1}{\#I}\sum_{\mathbf{i}\in I}\big(\hat{r}_{\mathbf{n}}(Z_{\mathbf{i}})-Y_{\mathbf{i}}\big)^{2},\qquad
\mathrm{MSE}_{SFPLRC}=\frac{1}{\#I}\sum_{\mathbf{i}\in I}\big(\tilde{r}_{\mathbf{n}}(Z_{\mathbf{i}},X_{\mathbf{i}})-Y_{\mathbf{i}}\big)^{2},\qquad
\mathrm{MSE}_{SFPLRM}=\frac{1}{\#I}\sum_{\mathbf{i}\in I}\big(\hat{r}_{\mathbf{n}}(Z_{\mathbf{i}},X_{\mathbf{i}})-Y_{\mathbf{i}}\big)^{2},
\]
where $\#I$ represents the size of the test sample $I$. The experiment was replicated $M=100$ times, which allows us to compute $M$ values of the MSE and display their distribution through a boxplot. The results obtained (the predicted values compared with the true values) are presented in Figure 3 for the three models with different values of $a$. Figure 4 displays boxplots constructed from the MSE obtained for the three models.
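The evaluation loop can be sketched as follows (our illustration, with hypothetical names): the FNPM prediction at a test curve is the delta-weighted kernel regressogram, and the MSE is averaged over the test sample.

```python
import numpy as np

def fnpm_predict(D_test_train, Y_train, delta_train, h):
    """FNPM prediction with MAR responses:
    r_hat(z) = sum_j delta_j Y_j K(d(z, Z_j)/h) / sum_j delta_j K(d(z, Z_j)/h)."""
    u = D_test_train / h
    K = 1.5 * (1.0 - u ** 2) * ((u >= 0) & (u <= 1))
    W = K * delta_train[None, :]
    return (W @ Y_train) / np.maximum(W.sum(axis=1), 1e-12)

def mse(pred, y):
    return float(np.mean((pred - y) ** 2))

# Sanity check: if all observed responses equal 2.0, every prediction is 2.0.
rng = np.random.default_rng(4)
D = rng.uniform(0.0, 1.0, size=(10, 50))           # distances test x train
Y_train = np.full(50, 2.0)
delta_train = rng.binomial(1, 0.8, 50)
pred = fnpm_predict(D, Y_train, delta_train, h=2.0)
print(mse(pred, np.full(10, 2.0)))                 # effectively 0
```

For the SFPLRM and SFPLRC predictions one would add the parametric part $X^{T}\hat{\beta}$ to the nonparametric estimate, as in the definition of the model.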
From Figure 4, we observe that the SFPLRM estimator shows better prediction performance than the FNPM estimator and is comparable to the SFPLRC estimator.
In Table 1 below, we present the mean squared error (MSE) results for the three methods, FNPM, SFPLRM, and SFPLRC, under various combinations of $n_{1}$, $n_{2}$, and the missingness parameter $\alpha$.
Table 1, which reports the MSE for FNPM, SFPLRM, and SFPLRC under various combinations of $n_{1}$, $n_{2}$, and $\alpha$, makes evident that the SFPLRM estimator consistently shows superior predictive accuracy compared with the FNPM estimator across a range of settings: the MSE values of the SFPLRM estimator remain lower for the majority of combinations of $n_{1}$, $n_{2}$, and $\alpha$. Furthermore, higher values of $\alpha$ (e.g., $\alpha=5$) tend to correspond with lower MSE values, indicating enhanced predictive performance for the SFPLRM estimator. It is also worth mentioning that, as the sample size ($n_{1}\times n_{2}$) increases, the results suggest that the missing-at-random (MAR) mechanism has little to no discernible effect on the prediction MSE. This observation implies that the SFPLRM estimator maintains consistent performance regardless of the presence of missing data, mainly when dealing with larger sample sizes.
Now, we examine the bias in the estimation of $\beta$ and its associated squared error (SE), defined as
\[
SE=\big\|\hat{\beta}_{\mathbf{n}}-\beta\big\|^{2}.
\]
The results are presented in Table 2.
The results in Table 2 suggest that the SFPLRM estimator is accurate in estimating the parameter vector $\beta$, with relatively small squared-error values. This implies that the estimator provides reasonably accurate estimates of the true parameters, even in the presence of missing data and under varying degrees of dependence. Such numerical outcomes are consistent with the theoretical conclusions of Theorems 1 and 2.

7. Real Data Application

The objective of this section is to compare, on a real data set consisting of particle pollution indices, the effectiveness of our SFPLR estimators when the data are MAR. The source of these data is the AriaWeb information system, managed by CSI Piemonte and Regione Piemonte, and our analysis is based on 34 monitoring sites using gravimetric instruments recorded during the winter season from October 2005 to March 2006 ($T=182$ daily measurements). The analysis of pollution levels reveals higher levels of pollution in the plains closer to urban centers, while lower concentrations of this index are observed near the Alps (for more detailed information about the data, we refer to the publication [36]).
To select the appropriate covariates, a preliminary regression analysis was carried out, and the following covariates were retained:
  • X 1 = H M I X : maximum daily mixing height (in meters);
  • X 2 = E M I ( s ) : daily primary aerosol emission rate (in g / s );
  • X 3 = P R E C ( s ) : total daily precipitation (in millimeters);
  • Z = T E M P : average daily temperature (in Kelvin).
To apply the theoretical results of the previous sections to real data, we analyze the performance of our estimators, built from MAR data, in a spatial functional prediction setting that takes the spatial locations into account. Specifically, we assume that the observations follow the SFPLR model (42), where the response Y = P M 10 ( s ) (in μ g / m 3 , for each s = 1 , . . . , 182 ) represents the pollution level; the functional predictor Z i ( t ) is the daily mean temperature curve Z = T E M P ( t ) , t = 1 , . . . , 182 , recorded at the i th station, whose location is given by the coordinates i = ( U T M X ; U T M Y ) ; and the parametric part is X = ( X 1 , X 2 , X 3 ) (over the 182 days). We set δ = 1 if P M 10 ( s ) is observed and δ = 0 otherwise. Note that the data contain missing values: 473 NaN values of P M 10 ( s ) over all days s = 1 , . . . , 182 and stations, i.e., about 7.64% missing data.
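The stated missingness rate follows directly from the station and day counts. In the sketch below, the counts (34 stations, 182 days, 473 NaN) come from the paper, while the PM10 values and NaN positions are simulated purely for illustration:

```python
import numpy as np

# Sanity check of the reported missing-data rate in the PM10 panel.
n_stations, n_days = 34, 182
total_obs = n_stations * n_days            # 6188 PM10 observations in all
n_missing = 473                            # NaN count reported in the text
missing_rate = n_missing / total_obs
print(f"{missing_rate:.2%}")               # 7.64%

# Build the MAR indicator delta: 1 where PM10 is observed, 0 where missing.
rng = np.random.default_rng(42)
pm10 = rng.gamma(shape=2.0, scale=20.0, size=(n_stations, n_days))
pm10.flat[rng.choice(total_obs, size=n_missing, replace=False)] = np.nan
delta = (~np.isnan(pm10)).astype(int)
```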
Figure 5 provides the curves of the functional variable Z i , and Figure 6 shows the spatial positions of the 34 monitoring stations in the Piemonte region (northern Italy).
Figure 5. Temperature curves Z.
Figure 6. Locations of the stations in Piemonte (northern Italy).
However, implementing this spatial modeling approach requires a preliminary data-preparation step to validate the stationarity assumption and to handle the spatial heterogeneity caused by variations in the effect of space on the sampled units. To this end, we use the "detrending step" introduced by [37], adapted here to the multivariate case of the three variables (response, functional and vectorial explanatory variables). This algorithm is based on the following regressions:
X ˜ i = m 1 ( i ) + X i , Z ˜ i = m 2 ( i ) + Z i and Y ˜ i = m 3 ( i ) + Y i .
Thus, instead of the initial observations ( X i , Z i , Y i , δ i ) i , we compute the SFPLRM estimator from the detrended statistics ( X ^ i , Z ^ i , Y ^ i , δ i ) i (see Figure 7). The latter are obtained by
X ^ i = X ˜ i m ^ 1 ( i ) , Z ^ i = Z ˜ i m ^ 2 ( i ) and Y ^ i = Y ˜ i m ^ 3 ( i ) ,
where m ^ 1 ( · ) , m ^ 2 ( · ) and m ^ 3 ( · ) are the kernel estimators of the regression functions m 1 ( · ) , m 2 ( · ) and m 3 ( · ) , given by
m ^ 1 ( i 0 ) = Σ_{i ∈ I n} δ i X i H 1 ( ( i 0 − i ) / h n 1 ) / Σ_{i ∈ I n} δ i H 1 ( ( i 0 − i ) / h n 1 ) ,  m ^ 2 ( i 0 ) = Σ_{i ∈ I n} δ i Z i H 2 ( ( i 0 − i ) / h n 2 ) / Σ_{i ∈ I n} δ i H 2 ( ( i 0 − i ) / h n 2 )
and m ^ 3 ( i 0 ) = Σ_{i ∈ I n} δ i Y i H 3 ( ( i 0 − i ) / h n 3 ) / Σ_{i ∈ I n} δ i H 3 ( ( i 0 − i ) / h n 3 ) ,
where H j , j = 1 , 2 , 3 , are kernel functions and h n j , j = 1 , 2 , 3 , are the associated bandwidth parameters. To illustrate the practical impact of this detrending step on our dataset, we compare the performance of the SFPLRM regression in two cases: with and without detrending.
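Although the paper's computations are carried out in R, the detrending estimators above are plain Nadaraya-Watson smoothers over the site coordinates. A minimal Python sketch, assuming (for illustration only) the quadratic kernel K(u) = (1 − u²)₊ and a single common bandwidth h:

```python
import numpy as np

def nw_trend(coords, values, delta, h):
    """Remove the spatial trend m_hat(i) from `values`, using only the
    observed units (delta == 1), as in the detrending step. The quadratic
    kernel and the single bandwidth h are illustrative choices; h must be
    large enough that every site has an observed neighbour within h."""
    coords = np.asarray(coords, dtype=float)    # (n, 2) site coordinates
    values = np.asarray(values, dtype=float)    # (n,) or (n, p) observations
    delta = np.asarray(delta, dtype=float)      # (n,) missingness indicator
    n = len(coords)
    weights = np.empty((n, n))
    for j in range(n):
        u = np.linalg.norm(coords - coords[j], axis=1) / h
        weights[j] = delta * np.maximum(1.0 - u**2, 0.0)
    weights /= weights.sum(axis=1, keepdims=True)  # Nadaraya-Watson weights
    trend = weights @ values                       # m_hat(i) at every site
    return values - trend                          # detrended observations
```

As a quick check of the design, a spatially constant field has trend equal to the field itself, so its detrended version is identically zero.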
To conduct this analysis, we employ the same methodology as in the simulation study for selecting the estimator’s parameters. Specifically, we use the quadratic kernel on the interval ( 0 , 1 ) combined with the PCA metric and the cross-validation (CV) criterion to select the smoothing parameter h n . For the regressions m 1 ( · ) , m 2 ( · ) and m 3 ( · ) , we use the npreg routine of the R package np, with K = H 1 = H 2 = H 3 .
To assess the feasibility of this approach, we randomly split the data sample 100 times into a learning sample of 24 observations and a test sample of 10 observations. We then evaluate the proposed detrending procedure through the Mean Squared Error ( M S E ), as in the simulation study; this allows us to quantify the practical impact of detrending on the performance of the estimators.
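The repeated-split protocol can be sketched as follows; `fit` and `predict` are placeholders for any regression estimator (the SFPLRM estimator in the paper), and the 24/10 split sizes match the text:

```python
import numpy as np

def split_mse(X, y, fit, predict, n_train=24, n_test=10, n_rep=100, seed=0):
    """Repeat the 24/10 random split n_rep times and return the test MSE
    of each repetition, mirroring the real-data evaluation protocol."""
    rng = np.random.default_rng(seed)
    mses = np.empty(n_rep)
    for r in range(n_rep):
        idx = rng.permutation(len(y))          # shuffle the 34 stations
        train = idx[:n_train]
        test = idx[n_train:n_train + n_test]
        model = fit(X[train], y[train])
        y_pred = predict(model, X[test])
        mses[r] = np.mean((y[test] - y_pred) ** 2)
    return mses
```

For instance, with a trivial mean predictor on a constant response the test MSE is zero in every repetition, which provides a simple sanity check of the loop.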
The results in Figure 8 reveal a significant improvement in model accuracy when the detrending step is included: the Mean Squared Error ( M S E ) values are substantially reduced, indicating a clear advantage of this preprocessing technique. As illustrated in Figure 9, the detrended model consistently outperforms the non-detrended one, capturing the underlying functional relationships more accurately. These results underscore the pivotal role of detrending in enhancing model performance and the advantages of the SFPLRM estimator for nonparametric spatial data analysis.

8. Conclusions

This paper addresses a semi-functional partial linear regression model for spatial data, assuming that the missing responses occur at random. The construction of the estimator covers both the linear and nonparametric components of the model. A key contribution is the proof of the asymptotic normality of the proposed estimators under mild conditions, together with the convergence in probability of the nonparametric component. Furthermore, the analyses of simulated and real data, through a comparison with a nonparametric estimator, illustrate the viability and adaptability of the model under investigation and of the estimators derived from it in predictive tasks. It is essential to highlight that the missing-at-random mechanism had received little attention in functional data statistics, which motivated the present study.
Future work could extend our framework in several directions. One natural extension is from missing at random (MAR) data to censored data, which would require more intricate mathematical techniques.

Author Contributions

The authors contributed approximately equally to this work. All authors have read and agreed to the final version of the manuscript. Formal analysis, Tawfik Benchikh; Validation, Omar Fetitah; Writing – review & editing, Ibrahim M. Almanjahie and Mohammad Kadi Attouch.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Acknowledgments

The authors thank and extend their appreciation to the funder of this work. This work was supported by the Deanship of Scientific Research at King Khalid University through the Large Research Groups Program under grant number R.G.P. 2/406/44.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Mateu, J.; Romano, E. Advances in spatial functional statistics. Stoch. Environ. Res. Risk Assess. 2017, 31, 1–6.
  2. Ramsay, J.; Silverman, B. Functional Data Analysis, 2nd ed.; Springer-Verlag: New York, 2005.
  3. Bosq, D.; Blanke, D. Inference and Prediction in Large Dimensions; Wiley Series in Probability and Statistics; Wiley: Chichester, 2007.
  4. Ferraty, F.; Vieu, P. Nonparametric Functional Data Analysis: Theory and Practice; Springer Series in Statistics; Springer: New York, 2006.
  5. Aneiros-Pérez, G.; Horová, I.; Hušková, M.; Vieu, P. Editorial for the Special Issue on Functional Data Analysis and Related Fields. J. Multivariate Anal. 2022, 189.
  6. Ling, N.; Vieu, P. Nonparametric modelling for functional data: selected survey and tracks for future. Statistics 2018, 52, 934–949.
  7. Greven, S.; Scheipl, F. A general framework for functional regression modelling. Statistical Modelling 2017, 17, 1–35.
  8. Aneiros-Pérez, G.; Vieu, P. Semi-functional partial linear regression. Stat. Probab. Lett. 2006, 76, 1102–1110.
  9. Aneiros-Pérez, G.; Vieu, P. Nonparametric time series prediction: a semi-functional partial linear modeling. J. Multivariate Anal. 2008, 99, 834–857.
  10. Aneiros-Pérez, G.; Vieu, P. Automatic estimation procedure in partial linear model with functional data. Stat. Pap. 2011, 52, 751–771.
  11. Lian, H. Functional partial linear model. J. Nonparametr. Stat. 2011, 23, 115–128.
  12. Aneiros-Pérez, G.; Raña, P.; Vieu, P.; Vilar, J. Bootstrap in semi-functional partial linear regression under dependence. Test 2018, 27, 659–679.
  13. Zhao, F.; Zhang, B. Testing linearity in functional partially linear models. Acta Math. Appl. Sin. Engl. Ser. 2022.
  14. Feng, S.; Xue, L. Partially functional linear varying coefficient model. Statistics 2016, 50, 717–732.
  15. Boente, G.; Vahnovan, A. Robust estimators in semi-functional partial linear regression models. J. Multivariate Anal. 2017, 154, 59–87.
  16. Ling, N.; Aneiros-Pérez, G.; Vieu, P. kNN estimation in functional partial linear modeling. Statist. Papers 2020, 61, 423–444.
  17. Shang, H. Bayesian bandwidth estimation for a semi-functional partial linear regression model with unknown error density. Comput. Stat. 2014, 29, 829–848.
  18. Ling, N.; Vieu, P. On semiparametric regression in functional data analysis. WIREs Comput. Stat. 2020, 12, 20–30.
  19. Li, Y.; Ying, C. Semi-functional partial linear spatial autoregressive model. Commun. Stat. Theory Methods 2021, 50, 5941–5954.
  20. Benallou, M.; Attouch, M.K.; Benchikh, T.; Fetitah, O. Asymptotic results of semi-functional partial linear regression estimate under functional spatial dependency. Commun. Stat. Theory Methods 2021, 51, 1–21.
  21. Graham, J.W. Missing Data: Analysis and Design; Springer: New York, 2020.
  22. Little, R.J.A.; Rubin, D.B. Statistical Analysis with Missing Data, 3rd ed.; Wiley Series in Probability and Statistics; Wiley, 2020.
  23. Ferraty, F.; Sued, F.; Vieu, P. Mean estimation with data missing at random for functional covariables. Statistics 2013, 47, 688–706.
  24. Cheng, P.E. Nonparametric estimation of mean functionals with data missing at random. J. Amer. Statist. Assoc. 1994, 89, 81–87.
  25. Ling, N.; Liang, L.; Vieu, P. Nonparametric regression estimation for functional stationary ergodic data with missing at random. J. Stat. Plan. Inference 2015, 162, 75–87.
  26. Rachdi, M.; Laksaci, A.; Kaid, Z.; Benchiha, A.; Fahimah, A. k-Nearest neighbors local linear regression for functional and missing data at random. Statistica Neerlandica 2021, 75, 42–65.
  27. Wang, Q.H.; Linton, O.; Härdle, W. Semiparametric regression analysis with missing response at random. J. Amer. Statist. Assoc. 2004, 99, 334–345.
  28. Wang, Q.H.; Sun, Z. Estimation in partially linear models with missing responses at random. J. Multivariate Anal. 2007, 98, 1470–1493.
  29. Ling, N.; Kan, R.; Vieu, P.; Meng, S. Semi-functional partially linear regression model with responses missing at random. Metrika 2019, 82, 39–70.
  30. Haworth, J.; Cheng, T. Non-parametric regression for space-time forecasting under missing data. Comput. Environ. Urban Syst. 2012, 36, 538–550.
  31. Puranik, A.; Binu, V.S.; Seena, B. Estimation of missing values in aggregate level spatial data. Clin. Epidemiol. Glob. Health 2021, 9, 304–309.
  32. Alshahrani, F.; Almanjahie, I.M.; Benchikh, T.; Fetitah, O.; Attouch, M.K. Asymptotic normality of nonparametric kernel regression estimation for missing at random functional spatial data. J. Math. (Hindawi) 2023.
  33. Carbon, M.; Tran, L.T.; Wu, B. Kernel density estimation for random fields. Stat. Probab. Lett. 1997, 36, 115–125.
  34. Dabo-Niang, S.; Rachdi, M.; Yao, A.F. Kernel regression estimation for spatial functional random variables. Far East J. Theor. Stat. 2011, 37, 77–113.
  35. Lin, Z.; Lu, C. Limit Theory for Mixing Dependent Random Variables; Kluwer: Dordrecht, 1996.
  36. Cameletti, M.; Ignaccolo, R.; Bande, S. Comparing spatio-temporal models for particulate matter in Piemonte. Environmetrics 2011, 22, 985–996.
  37. Hallin, M.; Lu, Z.; Yu, K. Local linear spatial quantile regression. Bernoulli 2009, 15, 659–686.
Figure 1. The curves Z i , t [ 0 , 1 ] for a = 5 , 20 , 50 .
Figure 2. Simulations of the random field were generated for different values of a, specifically a = 5 , 20 , 50 .
Figure 3. Predictions of the 2 models for a = 5 , 20 , 50 .
Figure 4. Boxplot MSE of the 2 models for α = 0.05 , 0.5 , 5 .
Figure 7. No detrending observations P M 10 ( s ) (Y) versus detrending observations P M 10 ( s ) ( Y ^ ).
Figure 8. Boxplot of M S E Values: Detrending vs. No Detrending
Figure 9. Prediction of the testing sample of the P M 10 for s = 150 , . . . , 182 in 10 stations.
Table 1. Mean squared error ( M S E ) for FNPM, SFPLRM and SFPLRC.
| n 1 | n 2 | FNPM ( α = 0.05 ) | SFPLRM ( α = 0.05 ) | FNPM ( α = 0.5 ) | SFPLRM ( α = 0.5 ) | FNPM ( α = 5 ) | SFPLRM ( α = 5 ) | SFPLRC (complete) |
|---|---|---|---|---|---|---|---|---|
| 10 | 10 | 22.29 | 6.562 | 21.64 | 3.782 | 21.13 | 2.611 | 2.504 |
| 10 | 20 | 26.29 | 5.869 | 24.88 | 3.329 | 25.06 | 1.232 | 1.228 |
| 10 | 50 | 22.62 | 5.132 | 21.89 | 2.929 | 21.71 | 1.105 | 1.098 |
| 20 | 10 | 22.98 | 4.865 | 21.62 | 3.079 | 21.37 | 1.282 | 1.253 |
| 20 | 20 | 20.00 | 4.539 | 19.22 | 2.845 | 18.82 | 0.991 | 0.987 |
| 20 | 50 | 21.63 | 4.518 | 21.18 | 2.428 | 21.03 | 0.763 | 0.755 |
| 50 | 10 | 20.14 | 4.578 | 20.19 | 2.973 | 19.51 | 1.642 | 1.624 |
| 50 | 20 | 22.27 | 3.817 | 21.83 | 2.285 | 21.93 | 0.631 | 0.630 |
| 50 | 50 | 20.64 | 3.892 | 20.37 | 2.246 | 20.29 | 0.491 | 0.488 |
Table 2. Square Error ( S E ) for β ^ .
| n 1 | n 2 | SFPLRM β ^ n 1 ( α = 0.05 ) | SFPLRM β ^ n 2 ( α = 0.05 ) | SFPLRM β ^ n 1 ( α = 0.5 ) | SFPLRM β ^ n 2 ( α = 0.5 ) | SFPLRM β ^ n 1 ( α = 5 ) | SFPLRM β ^ n 2 ( α = 5 ) | SFPLRC β ^ n 1 (complete) | SFPLRC β ^ n 2 (complete) |
|---|---|---|---|---|---|---|---|---|---|
| 10 | 10 | 0.504 | 1.148 | 0.477 | 0.807 | 0.574 | 0.475 | 0.463 | 0.456 |
| 10 | 20 | 0.479 | 1.309 | 0.471 | 1.025 | 0.280 | 0.491 | 0.331 | 0.475 |
| 10 | 50 | 0.485 | 1.415 | 0.472 | 1.144 | 0.337 | 0.492 | 0.287 | 0.479 |
| 20 | 10 | 0.485 | 1.488 | 0.464 | 1.086 | 0.348 | 0.510 | 0.320 | 0.471 |
| 20 | 20 | 0.488 | 1.597 | 0.477 | 1.163 | 0.311 | 0.488 | 0.275 | 0.478 |
| 20 | 50 | 0.469 | 1.432 | 0.463 | 1.029 | 0.317 | 0.488 | 0.258 | 0.482 |
| 50 | 10 | 0.491 | 1.534 | 0.475 | 1.048 | 0.274 | 0.516 | 0.322 | 0.498 |
| 50 | 20 | 0.489 | 1.426 | 0.479 | 1.096 | 0.237 | 0.481 | 0.271 | 0.472 |
| 50 | 50 | 0.477 | 1.463 | 0.468 | 1.106 | 0.290 | 0.487 | 0.246 | 0.480 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits the free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.