Preprint
Varying-Coefficient Additive Models with Density Responses and Functional Auto-Regressive Error Process


Submitted: 13 July 2025
Posted: 14 July 2025


Abstract
In many practical applications, data collected over time exhibit auto-correlation, which, if not addressed, can lead to incorrect statistical inferences. To address this, we propose a varying-coefficient additive model with density responses, incorporating a functional autoregressive (FAR) error process to account for serial dependency. We present a three-step spline-based estimation procedure for the varying-coefficient components after mapping densities into a linear space using the log-quantile density transformation. First, a B-spline series approximation provides initial estimates of the bivariate varying-coefficient functions. Second, spline estimation of the error process is obtained from the residuals. Lastly, improved estimates of the additive components are obtained by removing the estimated error process. Theoretical results, including convergence rates and asymptotic properties, are provided, and the practical performance of the method is demonstrated through simulations and real data analyses.

1. Introduction

Density, or more generally, distributional data appear increasingly often in various real-world research domains. Specific examples include the distributions of cross-sectional or intraday stock returns [9,15], mortality densities [13], and distributions of intrahub connectivity in neuroimaging [11,14]. In many applications, the density curves are observed consecutively over time, which we refer to in this article as a density time series. A motivating example is shown in Figure 1: panel (a) shows the density time series of the global mortality rate (‰) over an interval of 100 days between January 22, 2020 and April 15, 2021, while panel (b) offers an alternative view by plotting the densities on three different days. In this article, we consider a regression model in which a density time series, serving as the response, is coupled with scalar predictors.
Unlike conventional functional data, density data do not form a linear space due to inherent constraints, such as nonnegativity and the requirement to integrate to one. These characteristics present significant challenges when attempting to directly apply functional data analysis techniques to random densities. Several studies have addressed this issue and proposed various solutions, which can be broadly categorized into two types. One natural approach to overcoming the nonlinear limitations of density functions is to map them into a Hilbert space using a suitable transformation. In this direction, [12] proposed two continuous and invertible transformations, the log hazard transformation and the log quantile density (LQD) transformation, to map probability densities to an unrestricted space of square-integrable functions. [8] mapped densities to unrestricted square-integrable functions via the LQD transformation of [12], and modeled and fitted the responses with an additive functional-to-scalar regression. In order to forecast density functions derived from cross-sectional and intraday returns, [9] proposed two approaches based on compositional data analysis and a modified log quantile transformation, combining the functional principal component (FPC) representation with exponential smoothing forecasting methods. The second way involves choosing appropriate metrics and adopting a geometric approach. For instance, [17] adopted the infinite-dimensional version of the Aitchison geometry to construct a density-on-scalar linear regression model in Bayes–Hilbert spaces, while [13] discussed Fréchet regression in a general metric space with the Wasserstein metric. [5] utilized the geometry of tangent bundles of the Wasserstein space to propose distribution-on-distribution regression, and also studied an extension to autoregressive models of order one for distribution-valued time series. Using a different methodology, [19] also discussed an order-p Wasserstein autoregressive model.
Denote $\mathcal{F}$ as the space of density functions $f$ defined on a common support $\mathcal{U}$. Without loss of generality, we assume that $\mathcal{U} = [0, 1]$. Given a transformation $\Psi: \mathcal{F} \to L^2$, for any given $X \in \mathbb{R}^d$, the conditional Fréchet mean of the random density $f$ is defined as
$$\mu(\cdot \mid X) = \arg\min_{d \in \mathcal{F}} E\left( \|\Psi(f) - \Psi(d)\|_2^2 \mid X \right),$$
where the expectation $E$ is taken with respect to the joint distribution of $(X, f)$.
This is equivalent to the following equation:
$$\Psi(\mu(\cdot \mid X))(u) = E\left[ \Psi(f)(u) \mid X \right], \quad 0 \le u \le 1,$$
leading to the fact that
$$\mu(s \mid X) = \Psi^{-1}\left( E(\Psi(f)(u) \mid X) \right)(s), \quad 0 \le s \le 1.$$
Data that we consider in this article are the density time series $d_t$ coupled with scalar predictors $(X_t, Z_t)$. To deal with the density functions, we adopt the LQD transformation $\Psi: \mathcal{F} \to L^2$, where $\mathcal{F}$ is the class of density functions $d$ satisfying $\int_{\mathbb{R}} u^2 d(u)\, du < \infty$. For each $d_t \in \mathcal{F}$, let $Q_t(u)$ be the quantile function corresponding to the cumulative distribution function $F_t(y)$ of $d_t(y)$ with support on $[0, 1]$, and let $q_t(u)$ be the quantile density function, i.e., $q_t(u) = Q_t'(u) = \frac{d}{du} F_t^{-1}(u)$ for $u \in [0, 1]$. Then the LQD transformation of $d_t$ is defined as
$$\Psi(d_t)(u) = \log\left( \frac{d}{du} F_t^{-1}(u) \right), \quad u \in [0, 1].$$
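For intuition, the LQD transform can be computed numerically from a density on a grid by inverting its CDF, using the identity $q(u) = 1/d(Q(u))$ so that $\Psi(d)(u) = -\log d(Q(u))$. The following is a minimal sketch (in Python with numpy; the function name and grid choices are ours, not part of the proposed method):

```python
import numpy as np

def lqd_transform(density, y_grid, eps=1e-8):
    """Numerical LQD transform psi(u) = log q(u) of a strictly positive
    density on [0, 1], given its values `density` on the grid `y_grid`."""
    # CDF F(y) by trapezoidal integration, normalised so that F(1) = 1
    cdf = np.concatenate(([0.0], np.cumsum(
        0.5 * np.diff(y_grid) * (density[1:] + density[:-1]))))
    cdf /= cdf[-1]
    u = np.linspace(eps, 1.0 - eps, len(y_grid))
    Q = np.interp(u, cdf, y_grid)            # quantile function Q(u) = F^{-1}(u)
    d_at_Q = np.interp(Q, y_grid, density)   # d(Q(u))
    # q(u) = dQ/du = 1 / d(Q(u)), hence psi(u) = log q(u) = -log d(Q(u))
    return u, -np.log(np.maximum(d_at_Q, eps))
```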
In this article, a varying-coefficient additive model with a functional auto-regressive error process is proposed to estimate $E(\Psi(d_t) \mid X_t, Z_t)$. Then $d_t$ can be written as
$$d_t = \Psi^{-1}\left( E(\Psi(d_t) \mid X_t, Z_t) \right) + \delta_{t1},$$
where $\delta_{t1}$ is the error from the regression.
Besides $\delta_{t1}$, there is usually another source of error, which comes from the estimation of $d_t$, i.e., $\hat d_t = d_t + \delta_{t2}$. This is because, in most practical applications, only $n_t$ random samples generated from each $d_t$, rather than the density $d_t$ itself, can be observed; namely, $Y_{t1}, \ldots, Y_{tn_t} \sim d_t$. In this article, we assume that $n_t = n$. Following [12], $d_t$ is estimated by the modified kernel smoothing method, i.e.,
$$\hat d_t(y) = \sum_{i=1}^{n} K\left( \frac{y - Y_{ti}}{h} \right) w(y, h) \Big/ \sum_{i=1}^{n} \int_0^1 K\left( \frac{s - Y_{ti}}{h} \right) w(s, h)\, ds,$$
where the weight function $w(y, h)$ is defined so as to rectify the boundary effects as follows:
$$w(y, h) = \left( \int_{-y/h}^{1} K(u)\, du \right)^{-1} I_{\{y \in [0, h)\}} + \left( \int_{-1}^{(1-y)/h} K(u)\, du \right)^{-1} I_{\{y \in (1-h, 1]\}} + I_{\{y \in [h, 1-h]\}},$$
in which $K$ is a kernel of bounded variation, symmetric about 0, with bandwidth $h < 1/2$, such that $\int_0^1 K(u)\, du > 0$, and $\int_{\mathbb{R}} |u| K(u)\, du$, $\int_{\mathbb{R}} K^2(u)\, du$, and $\int_{\mathbb{R}} |u| K^2(u)\, du$ are finite. Thus, when we fit the regression model with $\hat d_t$ as the observation of $d_t$, we have
$$\hat d_t = \Psi^{-1}\left( E(\Psi(d_t) \mid X_t, Z_t) \right) + \delta_{t1} + \delta_{t2}.$$
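As an illustration of the modified kernel estimator above, the sketch below implements $\hat d_t$ with the Epanechnikov kernel, for which the boundary integrals have a closed form; the function names and the normalising grid size are our own choices:

```python
import numpy as np

def kde_boundary(samples, h, y_grid):
    """Boundary-corrected kernel density estimate on [0, 1] (a sketch of the
    modified estimator above, using the Epanechnikov kernel)."""
    def K(u):
        return 0.75 * np.maximum(1.0 - u ** 2, 0.0)

    def K_int(a, b):                        # closed-form \int_a^b K(u) du
        F = lambda v: 0.75 * (v - v ** 3 / 3.0)
        return np.maximum(F(np.clip(b, -1, 1)) - F(np.clip(a, -1, 1)), 0.0)

    def w(y):                               # boundary weight w(y, h)
        return np.where(y < h, 1.0 / K_int(-y / h, 1.0),
               np.where(y > 1.0 - h, 1.0 / K_int(-1.0, (1.0 - y) / h), 1.0))

    num = K((y_grid[:, None] - samples[None, :]) / h).sum(axis=1) * w(y_grid)
    s = np.linspace(0.0, 1.0, 512)          # normalising integral over [0, 1]
    den = np.trapz(K((s[:, None] - samples[None, :]) / h).sum(axis=1) * w(s), s)
    return num / den
```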
The contribution of this article lies in integrating density time series modeling with a functional autoregressive error process. A common assumption in regression modeling is the independence of the random errors. However, data collected over time often exhibit serial dependence, rendering this assumption inappropriate. By incorporating an autoregressive structure into the error process, this work accounts for the temporal correlation inherent in density time series, thereby improving the model's flexibility and accuracy. For example, daily price curves and liquidity demand-supply curves in economics and finance, or climatological curves describing natural phenomena [1,3,4,6], all change over time. In such cases, the daily fluctuations in the collected data are heavily influenced by previous values, resulting in what is known as serial dependence. Numerous empirical studies have demonstrated that neglecting the temporal dependencies between observations can lead to biased statistical results and flawed inferences. Consequently, it is crucial to incorporate autoregressive error structures that explicitly account for the autocorrelation present within the data, ensuring a more accurate representation of the underlying temporal dynamics in regression modeling. While functional autoregressive (FAR) models have been extensively studied [1,2,3,4,6,10,18], there is a notable gap in the literature regarding the use of FAR processes to model the random error component in functional regression models. In this article, to improve the accuracy of density time series predictions, we integrate the FAR error structure within a varying-coefficient additive regression model. This approach allows for better handling of temporal dependencies in the error process, ultimately enhancing the model's predictive performance.
The remainder of this article is organized as follows. Section 2 outlines the methodology for constructing a varying coefficient additive model with a density response, incorporating a functional autoregressive (FAR) error process. In Section 3, we introduce a three-step procedure for estimating the bivariate varying-coefficient components within the model. Section 4 provides the theoretical results and corresponding inferences derived from the model. Monte Carlo simulations are conducted in Section 5 to demonstrate the efficiency and robustness of the proposed approach. In Section 6, we showcase the practical applicability of the model through analyses of COVID-19 data and U.S. income data. Finally, a discussion of the findings is provided in Section 7, with detailed proofs of the theoretical results included in the Appendix.

2. Model Setup

In this article, we focus on the modeling of density responses. Due to the constraints on density functions, which are nonnegative and integrate to 1, we work with the functions after the LQD transformation.
In this article, the main goal is to estimate $E(\Psi(d_t)(u) \mid x)$ based on the transformed densities, via
$$E(\Psi(d_t)(u) \mid x) = \sum_{m=1}^{k} z_{t,m}\, g_m(u, x_{t,m}), \quad 0 \le u \le 1,$$
leading to our proposed varying-coefficient additive model with density responses and a functional auto-regressive $FAR(p)$ error process (DVCA-FAR):
$$f_t(u) = \Psi(d_t)(u) = \sum_{m=1}^{k} z_{t,m}\, g_m(u, x_{t,m}) + \varepsilon_t(u), \quad 0 \le u \le 1, \; 1 \le t \le T,$$
where $\varepsilon_t(u)$ is the $FAR(p)$ process
$$\varepsilon_t(u) = \int \gamma_1(s, u)\, \varepsilon_{t-1}(s)\, ds + \cdots + \int \gamma_p(s, u)\, \varepsilon_{t-p}(s)\, ds + e_t(u).$$
In the above model, the random density $d_t(\cdot) \in \mathcal{F}$ serves as the response, and $\Psi: \mathcal{F} \to L^2$ is the LQD transformation. Each density is coupled with the $k$-dimensional covariates $x_t = (x_{t,1}, \ldots, x_{t,k})^\tau$ and $z_t = (z_{t,1}, \ldots, z_{t,k})^\tau$ with supports $S_x$ and $S_z$, respectively. Without loss of generality, we take $S_x = S_z = [0, 1]$. In this article, the covariate $x_t$ can be $z_t$ or $t/T$.
The bivariate functions $g_m(\cdot, x_m)$ quantify the effect of $z$, and the $\gamma_l(\cdot, \cdot)$ are smooth functions satisfying $\int\!\int \gamma_l^2(s, u)\, du\, ds < \infty$. Moreover, the errors $e_t(u)$ form an independent and identically distributed random process satisfying $E(e_t(u)) = 0$ and $\mathrm{Cov}(e_t(u), e_t(s)) = \sigma_t^2(u, s)$.
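To make the FAR error structure concrete, the following sketch simulates an $FAR(1)$ process on a grid by discretizing the integral; the kernel $\gamma_1(s, u) = 0.2us$ and the innovation are illustrative choices borrowed from Case 1 of Section 5:

```python
import numpy as np

def simulate_far1(T, n_grid=101, seed=0):
    """Simulate eps_t(u) = \int gamma_1(s, u) eps_{t-1}(s) ds + e_t(u) on a
    grid, with gamma_1(s, u) = 0.2*u*s and the Case-1 innovation of Sec. 5."""
    rng = np.random.default_rng(seed)
    u = np.linspace(0.0, 1.0, n_grid)
    gamma = 0.2 * np.outer(u, u)             # gamma_1(s, u), rows indexed by s
    eps = np.zeros((T, n_grid))
    for t in range(1, T):
        e_t = (0.2 * rng.normal(0.0, 0.1) * np.sin(np.pi * u)
               + rng.normal(0.0, 0.05) * np.sin(2.0 * np.pi * u))
        # discretise the integral over s with the trapezoidal rule
        eps[t] = np.trapz(gamma * eps[t - 1][:, None], u, axis=0) + e_t
    return u, eps
```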
With the estimated densities $\hat d_t$, let $\hat f_t = \Psi(\hat d_t)$. Then the DVCA-FAR model can be written as
$$\hat f_t(u) = \sum_{m=1}^{k} z_{t,m}\, g_m(u, x_{t,m}) + \varepsilon_t(u) + \varepsilon_{ft}(u), \quad 0 \le u \le 1,$$
where $\varepsilon_{ft}(u)$ is the random error generated from the transformation of the estimated density.

3. Three-Step Estimation Methodology

We propose a three-step estimation procedure. First, the B-spline smoothing method is employed to obtain the initial estimate of the bivariate varying coefficients, disregarding the error structure. Second, using the initial estimate and the response, we estimate the error process. The functional autoregressive (FAR) process is then estimated using the sequential test proposed by [10]. Finally, after removing the estimated FAR error, the optimal estimate is derived using the spline method.

3.1. Initial Estimation of Bivariate Varying-Coefficient Function

First, disregarding the error structure, we adopt the B-spline series approximation method to obtain initial estimates of $g_m(u, x_m)$, $m = 1, \ldots, k$.
Let $\{B_0(u), \ldots, B_{N_0}(u)\}$ be the set of B-spline basis functions for $u$ of order $q$ with $L_0$ interior knots, where $N_0 + 1 = L_0 + q$. Similarly, for each $m = 1, \ldots, k$, let $\{B_{0,m}(x_m), \ldots, B_{N_m,m}(x_m)\}$ be the set of B-spline basis functions of order $q$ for $x_m$ with $L_m$ interior knots, where $N_m + 1 = L_m + q$. Denote $b_{j,m}^*(x_m)$ as the normalization of $B_{j,m}(x_m)$ and $b_r(u) = N_0^{1/2} B_r(u)$ as the scaled version of $B_r(u)$.
Using these basis functions, we define the tensor product of the B-spline basis as
$$b_{r,j,m}(u, x_m) = b_r(u)\, b_{j,m}^*(x_m), \quad 1 \le r \le N_0, \; 1 \le j \le N_m, \; 1 \le m \le k.$$
The spline approximation of $g_m(u, x_m)$ is given by
$$g_m(u, x_m) \approx \sum_{r=1}^{N_0} \sum_{j=1}^{N_m} \lambda_{r,j,m}\, b_{r,j,m}(u, x_m), \quad 1 \le m \le k.$$
With the least squares method, the estimate of $g_m(u, x_m)$ is
$$\tilde g_m(u, x_m) = \sum_{r=1}^{N_0} \sum_{j=1}^{N_m} \tilde\lambda_{r,j,m}\, b_{r,j,m}(u, x_m), \quad 1 \le m \le k,$$
where $\tilde\lambda = (\tilde\lambda_{1,1,1}, \ldots, \tilde\lambda_{N_0,N_k,k})^\tau$ is the $(N_0 \sum_{m=1}^{k} N_m)$-dimensional vector satisfying
$$\tilde\lambda = \arg\min_{\lambda} \sum_{t=1}^{T} \sum_{i=1}^{n} \left\{ \hat f_t(u_i) - \sum_{m=1}^{k} z_{t,m} \sum_{r=1}^{N_0} \sum_{j=1}^{N_m} \lambda_{r,j,m}\, b_{r,j,m}(u_i, x_{t,m}) \right\}^2.$$
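A minimal numpy/scipy sketch of this first step is given below; the basis sizes $N_0 = N_m = 6$ and the helper names are ours, and the normalization/scaling of the basis functions is omitted since it does not affect the least-squares fit:

```python
import numpy as np
from scipy.interpolate import BSpline

def bspline_basis(x, n_basis, order=4):
    """Open-uniform B-spline basis of the given order on [0, 1], so that
    (number of basis functions) = (interior knots) + order, as in the text."""
    inner = np.linspace(0.0, 1.0, n_basis - order + 2)[1:-1]
    knots = np.concatenate((np.zeros(order), inner, np.ones(order)))
    B = np.empty((len(x), n_basis))
    for j in range(n_basis):
        c = np.zeros(n_basis)
        c[j] = 1.0
        B[:, j] = BSpline(knots, c, order - 1, extrapolate=False)(x)
    return np.nan_to_num(B)

def fit_initial(f_hat, z, x, u_grid, N0=6, Nm=6):
    """Step 1 (sketch): least-squares fit of the tensor-product spline
    coefficients, ignoring the FAR error structure.

    f_hat : (T, n) transformed responses on u_grid; z, x : (T, k) covariates.
    """
    T, n = f_hat.shape
    k = z.shape[1]
    Bu = bspline_basis(u_grid, N0)                        # (n, N0)
    blocks = []
    for m in range(k):
        Bx = bspline_basis(x[:, m], Nm)                   # (T, Nm)
        # design column for (t, i): z_{t,m} * b_r(u_i) * b_{j,m}(x_{t,m})
        blk = (z[:, m, None, None, None] * Bu[None, :, :, None]
               * Bx[:, None, None, :]).reshape(T * n, N0 * Nm)
        blocks.append(blk)
    B = np.hstack(blocks)                                 # (T*n, k*N0*Nm)
    lam, *_ = np.linalg.lstsq(B, f_hat.reshape(T * n), rcond=None)
    return lam
```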
Theorem 1 shows that the initial estimate $\tilde g_m(u, x_m)$ is uniformly consistent.

3.2. Estimation of FAR Error Process

With the initial estimates $\tilde g_m(u, x_m)$, we now estimate the FAR error process. To this end, let
$$\tilde\varepsilon_t(u) = f_t(u) - \sum_{m=1}^{k} z_{t,m}\, \tilde g_m(u, x_{t,m}), \quad 1 \le t \le T.$$
Denote the additive function $\rho_t(u) = \sum_{l=1}^{p} \int \gamma_l(s, u)\, \varepsilon_{t-l}(s)\, ds$; thus, the $FAR(p)$ error process (3) can be written as $\varepsilon_t(u) = \rho_t(u) + e_t(u)$.
Let $\{B_0(u), B_1(u), \ldots, B_N(u)\}$ be the set of B-spline basis functions of order $q$ with $L$ interior knots, where $N + 1 = L + q$. Define the tensor product of the B-spline basis as
$$b_{r,j}(u, s) = B_r(u)\, B_j(s), \quad 1 \le r, j \le N.$$
Then, the spline approximation of the functions $\gamma_l(\cdot, \cdot)$ is given by
$$\gamma_l(s, u) \approx \sum_{r=1}^{N} \sum_{j=1}^{N} \mu_{r,j,l}\, b_{r,j}(u, s), \quad 1 \le l \le p.$$
The estimate of the $pN^2$-dimensional vector $\mu = (\mu_{1,1,1}, \ldots, \mu_{N,N,p})^\tau$ can be obtained by minimizing the following quadratic loss function:
$$\hat\mu = \arg\min_{\mu} \sum_{t=p+1}^{T} \sum_{i=1}^{n} \left\{ \tilde\varepsilon_t(u_i) - \sum_{l=1}^{p} \sum_{r=1}^{N} \sum_{j=1}^{N} \mu_{r,j,l} \int b_{r,j}(u_i, s)\, \tilde\varepsilon_{t-l}(s)\, ds \right\}^2.$$
Then, the corresponding estimates of $\gamma_l(\cdot, \cdot)$ and the additive function $\rho_t(u)$ are given by
$$\hat\gamma_l(s, u) = \sum_{r=1}^{N} \sum_{j=1}^{N} \hat\mu_{r,j,l}\, b_{r,j}(u, s), \quad 1 \le l \le p,$$
and
$$\hat\rho_t(u) = \sum_{l=1}^{p} \sum_{r=1}^{N} \sum_{j=1}^{N} \hat\mu_{r,j,l} \int b_{r,j}(u, s)\, \hat\varepsilon_{t-l}(s)\, ds, \quad 0 \le u \le 1, \; p+1 \le t \le T,$$
respectively.
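The following sketch carries out this second step; since $b_{r,j}(u_i, s) = B_r(u_i) B_j(s)$, each integral $\int b_{r,j}(u_i, s)\,\tilde\varepsilon_{t-l}(s)\,ds$ reduces to $B_r(u_i) \int B_j(s)\,\tilde\varepsilon_{t-l}(s)\,ds$, which makes the least-squares problem easy to assemble. It reuses `bspline_basis` from the Section 3.1 sketch, and the basis size $N = 5$ is an illustrative choice:

```python
import numpy as np

def fit_far(eps_tilde, u_grid, p=1, N=5):
    """Step 2 (sketch): least-squares estimate of the FAR(p) kernel
    coefficients mu_{r,j,l}, from residual curves eps_tilde of shape (T, n)."""
    T, n = eps_tilde.shape
    B = bspline_basis(u_grid, N)                          # (n, N)
    # I[t, j] = \int B_j(s) eps_tilde_t(s) ds, by the trapezoidal rule
    I = np.trapz(B[None, :, :] * eps_tilde[:, :, None], u_grid, axis=1)
    cols = []
    for l in range(1, p + 1):
        # regressor for lag l at (t, u_i): B_r(u_i) * I[t - l, j]
        X_l = (B[None, :, :, None] * I[p - l:T - l, None, None, :]
               ).reshape((T - p) * n, N * N)
        cols.append(X_l)
    X = np.hstack(cols)                                   # ((T-p)*n, p*N*N)
    mu, *_ = np.linalg.lstsq(X, eps_tilde[p:].reshape((T - p) * n), rcond=None)
    return mu.reshape(p, N, N)                            # mu[l-1, r, j]
```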
In practice, the order $p$ of the error process is unknown. To address this, we utilize the sequential test of [10] to identify the order $p$. The detailed identification procedure is given in Section 3.4.2.

3.3. Improved Estimation of Bivariate Varying-Coefficient Function

By removing the estimated error process (8) from the response function f t ( u ) and repeating the same procedure as in Section 3.1, we obtain an improved estimation of g m ( u , x m ) .
To do so, we first denote
$$f_t^c(u) = f_t(u) - \sum_{l=1}^{p} \int \gamma_l(s, u)\, \varepsilon_{t-l}(s)\, ds, \quad 0 \le u \le 1, \; p+1 \le t \le T,$$
and the estimate of $f_t^c(u)$ as
$$\hat f_t^c(u) = f_t(u) - \sum_{l=1}^{p} \int \hat\gamma_l(s, u)\, \hat\varepsilon_{t-l}(s)\, ds, \quad 0 \le u \le 1, \; p+1 \le t \le T.$$
On the other hand,
$$f_t^c(u) = \sum_{m=1}^{k} z_{t,m}\, g_m(u, x_{t,m}) + e_t(u), \quad 0 \le u \le 1.$$
Therefore, following the same procedure as in the first step, the improved spline approximation estimates are given by
$$\hat g_m(u, x_m) = \sum_{r=1}^{N_0} \sum_{j=1}^{N_m} \hat\lambda_{r,j,m}\, b_{r,j,m}(u, x_m), \quad 1 \le m \le k,$$
where $\hat\lambda = (\hat\lambda_{1,1,1}, \ldots, \hat\lambda_{N_0,N_k,k})^\tau$ is the $(N_0 \sum_{m=1}^{k} N_m)$-dimensional vector satisfying
$$\hat\lambda = \arg\min_{\lambda} \sum_{t=1}^{T} \sum_{i=1}^{n} \left\{ \hat f_t^c(u_i) - \sum_{m=1}^{k} z_{t,m} \sum_{r=1}^{N_0} \sum_{j=1}^{N_m} \lambda_{r,j,m}\, b_{r,j,m}(u_i, x_{t,m}) \right\}^2.$$
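A sketch of this third step, reusing the helpers above, might look as follows; it forms $\hat\rho_t(u)$ from the fitted FAR coefficients and refits the corrected responses:

```python
import numpy as np

def fit_improved(f_hat, z, x, u_grid, eps_tilde, gamma_hat):
    """Step 3 (sketch): subtract the fitted FAR error and refit the
    varying-coefficient surfaces; reuses bspline_basis/fit_initial/fit_far."""
    p, N, _ = gamma_hat.shape
    T, n = f_hat.shape
    B = bspline_basis(u_grid, N)
    I = np.trapz(B[None, :, :] * eps_tilde[:, :, None], u_grid, axis=1)
    rho = np.zeros_like(f_hat)
    for t in range(p, T):
        for l in range(1, p + 1):
            # rho_t(u) = sum_l \int gamma_l(s, u) eps_{t-l}(s) ds
            rho[t] += B @ gamma_hat[l - 1] @ I[t - l]
    # refit on the corrected responses f_t - rho_t, t = p+1, ..., T
    return fit_initial(f_hat[p:] - rho[p:], z[p:], x[p:], u_grid)
```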
Theorems 2 and 3 show the uniform convergence and asymptotic normality of the improved estimate $\hat g_m(u, x_m)$. Moreover, the simulation studies in Section 5 show that the improved estimate is more efficient than the initial estimate $\tilde g_m(u, x_m)$.

3.4. Implementation

3.4.1. Selection of Bandwidth

In empirical applications, prior to modeling, it is necessary to estimate the density, which involves selecting the appropriate bandwidth for the modified kernel estimation. In this section, we employ the leave-one-out cross-validation method to determine the optimal bandwidth. Specifically, the bandwidth h is chosen by minimizing the following mean squared error (MSE):
$$CV(h) = \frac{1}{nT} \sum_{t=1}^{T} \sum_{i=1}^{n} \left[ d_t(y_{ti}) - \hat d_t(y_{ti}) \right]^2,$$
where, for each $i = 1, \ldots, n$, $\hat d_t(y_{ti})$ is the estimate of $d_t(y_{ti})$ with bandwidth $h$ obtained by using the observations from the $t$-th section other than the $i$-th one.
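Note that the criterion above involves the true $d_t$, which is available in simulations but not in applications; a standard computable surrogate is the least-squares leave-one-out criterion, which we sketch below using the `kde_boundary` helper from Section 1 (this surrogate, and the function name, are ours):

```python
import numpy as np

def loo_cv_bandwidth(samples_by_t, h_grid):
    """Bandwidth selection sketch: score each h by the least-squares LOO
    criterion  \int \hat d^2 - (2/n) sum_i \hat d_{-i}(Y_i), summed over t."""
    s = np.linspace(0.0, 1.0, 256)
    scores = []
    for h in h_grid:
        score = 0.0
        for Y in samples_by_t:               # the n observations of section t
            dhat = kde_boundary(Y, h, s)
            loo = np.array([kde_boundary(np.delete(Y, i), h, Y[i:i + 1])[0]
                            for i in range(len(Y))])
            score += np.trapz(dhat ** 2, s) - 2.0 * loo.mean()
        scores.append(score)
    return h_grid[int(np.argmin(scores))]
```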

3.4.2. Identifying the Order of the FAR Process

In this section, we utilize the order determination procedure proposed by [10] to identify the order $p$ of the FAR error process. The main idea is to represent the FAR process as a fully functional linear model with dependent regressors and to construct a sequential test procedure.
Consider the sequence of hypotheses
$$H_{0,p}: \{\varepsilon_t\} \text{ is } FAR(p) \quad \text{vs.} \quad H_{a,p+1}: \{\varepsilon_t\} \text{ is } FAR(p+1), \quad p = 0, 1, 2, \ldots,$$
where $FAR(0)$ denotes an independent and identically distributed process. The sequential test begins with $p = 0$ and terminates as soon as $H_{0,p}$ is not rejected; the order of the process is then identified as $p$. See [10] for more details and further explanation.
To construct the test statistic, define $\eta_j(s) = \sum_{l=1}^{p} \tilde\varepsilon_{j-l}(sp - (l-1))\, I_l(s)$ and $\varphi(s, u) = p \sum_{l=1}^{p} \gamma_l(sp - (l-1), u)\, I_l(s)$, where $I_l$ is the indicator function of the interval $[(l-1)/p, l/p]$. Denote by $\{\hat x_j, 1 \le j \le T\}$ the orthonormal basis of $L^2$ obtained from the eigenfunctions of $\hat C_\eta(s, u) = \frac{1}{T} \sum_{j=1}^{T} (\eta_j(s) - \bar\eta(s))(\eta_j(u) - \bar\eta(u))$, with corresponding decreasingly ordered eigenvalues $\hat\lambda_j$, where $\bar\eta$ is the mean function of the $\eta_j$. In the proposed method, only the first $q_\eta$ eigenfunction/eigenvalue pairs are used. Moreover, define $\{\hat y_j, 1 \le j \le T\}$ and $q_\pi$ analogously to $\{\hat x_j\}$ and $q_\eta$ for the response functions, denoted $\pi_j$.
For the product space $L^2([0, 1] \times [0, 1])$, denote $\eta(j, k) = \langle \eta_j, \hat x_k \rangle$, $\pi(j, m) = \langle \pi_j, \hat y_m \rangle$, and $\psi(k, m) = \langle \varphi, \hat x_k \hat y_m \rangle$. Let $\boldsymbol{\pi} = [\pi(j, m)]_{T \times q_\pi}$, $\boldsymbol{\eta} = [\eta(j, k)]_{T \times q_\eta}$, and $\boldsymbol{\psi} = [\psi(k, m)]_{q_\eta \times q_\pi}$, $j = 1, \ldots, T$; $k = 1, \ldots, q_\eta$; $m = 1, \ldots, q_\pi$.
Construct the matrix $\hat A$ with entries $\hat A(k, k') = \langle \hat x_{k,p}, \hat x_{k',p} \rangle$, where $\hat x_{k,p}(s) = \hat x_k\left( \frac{s + p - 1}{p} \right)$, $0 \le s \le 1$. Define the orthonormal eigenvectors $\hat\beta_k$ with corresponding ordered eigenvalues $\hat\xi_1 \ge \cdots \ge \hat\xi_{q_\eta}$ by $\hat A \hat\beta_k = \hat\xi_k \hat\beta_k$, $1 \le k \le q_\eta$, and denote $\hat B = [\hat\beta_1, \ldots, \hat\beta_{q^*}]$, where $q^* = \max\{ k \in \{1, \ldots, q_\eta\} : \|\hat z_{k,p}\|^2 \ge 0.9p \}$ and $\hat z_{k,p}(s) = \sum_{i=1}^{q_\eta} \hat\beta_{k,i}\, \hat x_{i,p}(s)$.
Following [10], the test statistic is constructed as
$$\hat\tau_p = \frac{1}{T}\, \mathrm{vec}[\hat B^\tau \hat\psi]^\tau \left[ (I_{q_\pi} \otimes \hat B^\tau)(\hat C \otimes \hat\Lambda)(I_{q_\pi} \otimes \hat B) \right]^{-1} \mathrm{vec}[\hat B^\tau \hat\psi],$$
where $\hat\Lambda = \mathrm{diag}(\hat\lambda_1, \ldots, \hat\lambda_{q_\eta})$ and $\hat C = \frac{1}{T} (\boldsymbol{\pi} - \boldsymbol{\eta} \hat\psi)^\tau (\boldsymbol{\pi} - \boldsymbol{\eta} \hat\psi)$. Under $H_{0,p}$, the test statistic has an approximate chi-squared distribution with $q_\pi q^*$ degrees of freedom.
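The sequential stopping logic itself is simple; the sketch below shows it, with the statistic construction delegated to an assumed callable `test_stat` (not implemented here) that returns $\hat\tau_p$ and its degrees of freedom following [10]:

```python
from scipy.stats import chi2

def identify_far_order(residuals, test_stat, alpha=0.05, p_max=5):
    """Sequential order identification (sketch). `test_stat(residuals, p)` is
    an assumed callable returning (tau_hat, dof) for testing H_{0,p} vs
    H_{a,p+1}, following [10]."""
    for p in range(p_max + 1):
        tau, dof = test_stat(residuals, p)
        if tau <= chi2.ppf(1.0 - alpha, dof):   # fail to reject H_{0,p}: stop
            return p
    return p_max                                # cap if every H_{0,p} rejected
```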

4. Theoretical Results

In this section, we discuss the asymptotic properties of both the initial and improved estimates of $g_m(u, x_m)$. Moreover, the consistency of the estimated order $p$ is derived. All proofs are given in the Appendix.
Throughout the remainder of this article, for any fixed interval $[a, b]$, we denote the space of $l$-th order smooth functions as $C^{(l)}[a, b] = \{g \mid g^{(l)} \in C[a, b]\}$ and the class of Lipschitz-continuous functions for some fixed constant $C > 0$ as $\mathrm{Lip}([a, b], C) = \{g \mid |g(x) - g(x')| \le C|x - x'|, \; x, x' \in [a, b]\}$. Let $S_{x_m}$ and $S_{z_m}$ denote the supports of $x_m$ and $z_m$, respectively. Then, the supports of $x$ and $z$ are $S_x = \prod_{m=1}^{k} S_{x_m}$ and $S_z = \prod_{m=1}^{k} S_{z_m}$, respectively. The necessary assumptions for the asymptotic results are as follows.
(A1) For any $d \in \mathcal{F}$, $d$ is differentiable, and there exists a constant $M > 1$ such that $\|d\|_\infty$, $\|1/d\|_\infty$, and $\|d'\|_\infty$ are all bounded by $M$.
(A2) (a) The kernel density $K$ is Lipschitz-continuous, bounded, and symmetric about 0. Furthermore, $K \in \mathrm{Lip}([-1, 1], L_k)$ for some constant $L_k > 0$. (b) The kernel density $K$ satisfies $\int_0^1 K(u)\, du > 0$, and $\int_{\mathbb{R}} |u| K(u)\, du$, $\int_{\mathbb{R}} K^2(u)\, du$, and $\int_{\mathbb{R}} |u| K^2(u)\, du$ are finite.
(A3) The covariates $x_{t,m}$, $z_{t,m}$, $1 \le m \le k$, and the errors $\varepsilon_t(u)$ satisfy the following moment conditions: for some $s > 2$,
$$\max_{1 \le t \le T} \max_{1 \le m \le k} E(|x_{t,m}|^{2s}) < \infty, \quad \max_{1 \le t \le T} \max_{1 \le m \le k} E(|z_{t,m}|^{2s}) < \infty, \quad \max_{1 \le t \le T} \sup_u E(|\varepsilon_t(u)|^{2s}) < \infty.$$
For each $t = 1, \ldots, T$, the covariance function $\mathrm{Cov}(\varepsilon_t(s), \varepsilon_t(v)) = \Sigma_t(s, v)$ has finite eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots$, with maximal eigenvalue $\lambda_{\max}$, satisfying $\sum_j \lambda_j < \infty$.
(A4) The additive component functions $g_m(u, x_m)$, $1 \le m \le k$, are continuous on $[0, 1] \times [a_m, b_m]$ and twice continuously partially differentiable with respect to $u$ and $x_m$, where $[a_m, b_m]$ is a compact subset of $S_{x_m}$.
(A5) $N_0 \asymp (nT)^{1/6} \log nT$, $N_m \asymp (nT)^{1/6} \log nT$, $1 \le m \le k$, and $h \asymp n^{-1/3}$, as $n, T \to \infty$.
Remark 1.
Assumption (A1) is basic and essential for deriving the consistency of densities after transformation. The conditions in (A2) on the kernel function K ( · ) are mild and can be satisfied by commonly used kernel functions such as the uniform and Epanechnikov kernels. The moment conditions in (A3) are crucial for deriving the uniform convergence and other asymptotic properties based on the spline function. The smoothness conditions for the component functions in (A4) are greatly relaxed. The conditions in (A5) are commonly applied in spline smoothing to ensure optimal convergence rates.
We first derive the uniform consistency of the initial estimates of bivariate functions g m ( u , x m ) , as stated in Theorem 1.
Theorem 1.
Assume that assumptions (A1)–(A5) hold, and that $\tilde g_m(u, x_m)$, $m = 1, \ldots, k$, are the initial estimates of $g_m(u, x_m)$ defined by (5). Then, as $n \to \infty$ and $T \to \infty$, it holds that
$$\sup_{u, x_m \in [0, 1]} |\tilde g_m(u, x_m) - g_m(u, x_m)| = O_p\left( (nT)^{-1/3} \log(nT) + n^{-1/3} \right).$$
Theorem 2 characterizes the uniform convergence of the improved estimate of $g_m(u, x_m)$, and Theorem 3 describes the asymptotic properties of both the initial and improved estimates.
Theorem 2.
Assume that assumptions (A1)–(A5) hold, that $\hat g_m(u, x_m)$, $m = 1, \ldots, k$, are the improved estimates of $g_m(u, x_m)$ defined by (9), and that the order $p$ of the functional error process is known. Then, as $n \to \infty$ and $T \to \infty$, the following holds:
$$\sup_{u, x_m \in [0, 1]} |\hat g_m(u, x_m) - g_m(u, x_m)| = O_p\left( (nT)^{-1/3} (\log(nT))^2 + n^{-1/3} \right).$$
To present the asymptotic normality of the estimates, we introduce some notation. Denote $\mathbf{b}(u, x_{t,m}) = (b_{1,1,m}(u, x_{t,m}), \ldots, b_{N_0,N_m,m}(u, x_{t,m}))^\tau$, $\mathbf{b}_z(u, x_{t,m}) = z_{t,m}\, \mathbf{b}(u, x_{t,m})$, $\mathbf{Bz}_{t,m} = (\mathbf{b}_z(u_1, x_{t,m}), \ldots, \mathbf{b}_z(u_n, x_{t,m}))^\tau_{n \times N_0 N_m}$, $\mathbf{B}_m = (\mathbf{Bz}_{1,m}^\tau, \ldots, \mathbf{Bz}_{T,m}^\tau)^\tau$, and $\mathbf{B} = (\mathbf{B}_1, \ldots, \mathbf{B}_k)$.
Let $\mathbf{B}^* = \mathbf{B}/\sqrt{nT}$, and let $A_m = (0, \ldots, I, \ldots, 0)$ be a $1 \times k$ block matrix, with the $m$-th block being the $N_0 N_m \times N_0 N_m$ identity matrix and the $j$-th ($j \ne m$) block being the $N_0 N_m \times N_0 N_j$ zero matrix.
Theorem 3.
Assume that assumptions (A1)–(A5) hold, and that $\tilde g_m(u, x_m)$ and $\hat g_m(u, x_m)$, $m = 1, \ldots, k$, are the initial and improved estimates of $g_m(u, x_m)$ defined by (5) and (9), respectively. Then, as $nT \to \infty$, the following hold for all $u \in (0, 1)$ and $x_m \in [0, 1]$:
(i) The initial estimate $\tilde g_m(u, x_m)$ is asymptotically normally distributed, i.e.,
$$\sqrt{nT}\, (C_m \Sigma_\varepsilon C_m^\tau)^{-1/2} \left( \tilde g_m(u, x_m) - g_m(u, x_m) \right) \xrightarrow{D} N(0, 1),$$
where $C_m = \mathbf{b}^\tau(u, x_m)\, E\left( A_m (\mathbf{B}^{*\tau} \mathbf{B}^*)^{-1} \mathbf{B}^{*\tau} \right)$, and the covariance matrix $\Sigma_\varepsilon = (\Sigma_{t,s})_{1 \le t, s \le T}$, with $\Sigma_{t,s} = \mathrm{Cov}(\varepsilon_t, \varepsilon_s)$.
(ii) The improved estimate $\hat g_m(u, x_m)$ is asymptotically normally distributed, i.e.,
$$\sqrt{nT}\, (C_m \Xi_\varepsilon C_m^\tau)^{-1/2} \left( \hat g_m(u, x_m) - g_m(u, x_m) \right) \xrightarrow{D} N(0, 1),$$
where the covariance matrix $\Xi_\varepsilon = \mathrm{diag}(\Xi_{t,t})_{1 \le t \le T}$, with $\Xi_{t,t}(u, s) = \sigma_t^2(u, s)$.

5. Numerical Study

In this section, we conduct two simulation studies to demonstrate the performance of the proposed identification and estimation procedure for the additive model.

5.1. Case 1

With the assumption that the auto-regressive error process is known, this case is conducted to demonstrate the performance of the estimation procedure with finite $n$ and $T$. We consider the following DVCA-FAR(1) model:
$$f_t(u) = z_{t,1}\, g_1(u, x_{t,1}) + z_{t,2}\, g_2(u, x_{t,2}) + \varepsilon_t(u), \quad 0 \le u \le 1,$$
where the error function $\varepsilon_t(u)$ is
$$\varepsilon_t(u) = \int \gamma_1(s, u)\, \varepsilon_{t-1}(s)\, ds + e_t(u), \quad 2 \le t \le T.$$
Let the bivariate varying-coefficient functions be
$$g_1(u, x_{t,1}) = \sin(2\pi u)(2x_{t,1} - 1), \quad g_2(u, x_{t,2}) = \sin(2\pi u)\sin(2\pi x_{t,2}),$$
and the coefficient functions be
$$\gamma_1(s, u) = 0.2\, us, \quad e_t(u) = 0.2\, \eta_{t,1} \sin(\pi u) + \eta_{t,2} \sin(2\pi u),$$
where $\eta_{t,1} \sim N(0, 0.1^2)$, $\eta_{t,2} \sim N(0, 0.05^2)$, and $\eta_{t,1}$ is independent of $\eta_{t,2}$ for $u \in [0, 1]$.
The covariates $z_{t,1}, z_{t,2}$ are generated from $N(0, 1)$ and $N(0, 0.5^2)$, respectively, while $x_{t,1}, x_{t,2}$ are generated by $(x_{t,1}, x_{t,2})^\tau = (\Phi(v_{t,1}), \Phi(v_{t,2}))^\tau$, $1 \le t \le T$, where $\Phi$ is the cumulative distribution function of the standard normal distribution and $v_{t,1}, v_{t,2}$ are mutually independent standard normal variables.
To generate the response densities, for each given $Z = z$ and $X = x$, let $\alpha(u, x, z)$ be the additive surface given by $\alpha(u, x, z) = \sum_{m} z_m\, g_m(u, x_m)$. The conditional quantile function $Q(\cdot \mid x, z)$ with the error process $\varepsilon(u)$ satisfies $Q(u \mid x, z) = F^{-1}(u \mid x, z) = \theta(x, z)^{-1} \int_0^u \exp\{\alpha(v, x, z) + \varepsilon(v)\}\, dv$, where $\theta(x, z) = \int_0^1 \exp\{\alpha(v, x, z) + \varepsilon(v)\}\, dv$.
Applying the conditional quantile function to $\{U_{t,1}, \ldots, U_{t,n_t}\} \sim U(0, 1)$, which are independent of $X_t$ and $Z_t$, we obtain the random samples $Y_t = \{Y_{t,j} = Q(U_{t,j} \mid X_t, Z_t) : 1 \le j \le n_t\}$ for each $1 \le t \le T$, so that $Y_{t,1}, \ldots, Y_{t,n_t} \sim d_t$, where $d_t$ is the random response density. Denoting $f_t(u) = \Psi(d_t)(u)$, we obtain the transformed density expressed in model (2). Without loss of generality, we assume that $n_t = n$ independent and identically distributed observations are available for each response distribution.
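A sketch of this inverse-CDF sampling step, given the additive surface and an error curve on a common grid, is as follows (the function name is ours):

```python
import numpy as np

def sample_from_density(alpha_curve, eps_curve, u_grid, n, rng):
    """Draw Y_{t,1}, ..., Y_{t,n} ~ d_t by the inverse-CDF construction above:
    Q(u | x, z) = theta^{-1} \int_0^u exp{alpha(v, x, z) + eps(v)} dv."""
    g = np.exp(alpha_curve + eps_curve)      # integrand on u_grid
    Q = np.concatenate(([0.0], np.cumsum(
        0.5 * np.diff(u_grid) * (g[1:] + g[:-1]))))
    Q /= Q[-1]                               # division by theta(x, z)
    return np.interp(rng.uniform(size=n), u_grid, Q)
```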
With $T = 100$ and $n = 100$, the estimation is conducted over 200 Monte Carlo runs. Figure 2 displays the true curves of the FAR error process $\varepsilon(u)$ in panel (a) and the corresponding spline-based estimates in panel (b). Figure 3 presents a comprehensive view of the true surfaces of $g_m(u, x_m)$ alongside the average estimates obtained from the 200 Monte Carlo simulations. Specifically, the left panel displays the true surfaces, the middle panel shows the initial spline estimates derived without accounting for the $FAR(1)$ error process, and the right panel provides the improved estimates achieved through the same method after removing the estimated error process. To better illustrate the performance, the bivariate function estimates are presented from two different perspectives, offering a clearer understanding of the improvements made through the error correction process.
Figure 2 indicates that the estimation of ε ( u ) yields highly accurate results, which is further corroborated by the findings presented in Figure 3. Specifically, the right panel of Figure 3 illustrates that the improved estimation, after accounting for the FAR error process, significantly outperforms the initial estimation shown in the middle panel. This improvement highlights the effectiveness of the proposed methodology in refining the bivariate function estimates by appropriately addressing the error structure.
For further comparison, we conducted simulations with sample sizes $T = 50, 100$ and $n = 50, 100$ observations. We use the root mean squared error (RMSE) to measure the accuracy of the estimates, including the initial and improved estimates of $g_m(u, x_m)$. The RMSE is defined as
$$\mathrm{RMSE}(\dot g_m) = \frac{1}{T} \sum_{t=1}^{T} \left\{ \frac{1}{n} \sum_{i=1}^{n} \left\| \dot g_m(u_i, x_{t,m}) - g_m(u_i, x_{t,m}) \right\|_2^2 \right\}^{1/2},$$
where $\dot g_m$ is $\tilde g_m$ or $\hat g_m$.
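On a common evaluation grid, this criterion reduces to the short helper below (a sketch; the array names are ours):

```python
import numpy as np

def rmse(g_est, g_true):
    """RMSE above: g_est and g_true hold g(u_i, x_{t,m}) as (T, n) arrays."""
    return np.mean(np.sqrt(np.mean((g_est - g_true) ** 2, axis=1)))
```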
Based on the results from 200 Monte Carlo simulations, Table 1 presents the average root mean square errors (RMSEs) along with their standard deviations for both the initial and improved estimations of g m ( u , x m ) . The findings reveal that the RMSEs of the bivariate functions decrease as both the sample size T and the number of observations n increase. Notably, the RMSEs associated with the improved estimation are consistently smaller than those for the initial estimation. This is to be expected, as the initial estimates were derived without accounting for the error process, which, as demonstrated, has a significant impact on the accuracy of the results. By incorporating the error process and removing its estimated effects in the improved estimation, the model yields more accurate and refined results.
To provide a clearer explanation of the theoretical differences between the two estimates, we calculate and compare their biases and standard deviations, using the first simulation example as an illustration.
Table 2 presents the average bias and standard deviation (SD) of both the initial and improved estimations of g m ( u , x m ) . The results clearly indicate that, across all settings of the model sample size, the standard deviation of the improved estimates of the bivariate additive functions is substantially smaller than that of the initial estimates, with the bias also being correspondingly reduced. Moreover, as the sample size increases, both the standard deviation and bias of the estimates decrease, further reinforcing the reliability of the improved method. This finding numerically substantiates the claim that the improved estimation method results in a smaller asymptotic variance-covariance matrix compared to the initial estimation, thereby enhancing the precision and robustness of the estimates.

5.2. Case 2

Case 2 is conducted to demonstrate the efficiency of identifying the auto-regressive order of the functional error process. The densities are also generated from model (12), but with a $FAR(2)$ error function, where $\gamma_2(s, u) = \frac{1}{4} u s^2$. All other settings are the same as for Case 1.
Table 3 presents the empirical power of the testing algorithm used to determine the order of the FAR error process under various settings and significance levels. The results clearly demonstrate that the power of the test increases as the sample size T and the number of observations n grow. Specifically, the power approaches 1 as both T and n increase to 100 when testing the null hypothesis of an independent and identically distributed (i.i.d.) sample. This suggests that the test becomes increasingly reliable with larger sample sizes. However, the power is slightly lower when testing the null hypothesis of order 1 against order 2, which is expected given the complexity of distinguishing between these two orders. Furthermore, the size of the test remains low when testing order 2 against a higher-order process, confirming the accuracy and feasibility of the testing algorithm for determining the appropriate order of the functional error process. These findings validate the effectiveness of the proposed testing algorithm in practical applications, ensuring its robustness and precision in various settings.
Furthermore, to assess the efficiency of the auto-regressive order p on the overall estimation results, Table 4 presents the average RMSEs for the bivariate varying-coefficient functions. The observed pattern closely mirrors the results from Case 1, further reinforcing the effectiveness and reliability of the proposed model’s identification and estimation procedures. These findings provide strong empirical evidence that the auto-regressive order plays a crucial role in enhancing the accuracy of the estimation process. The consistency of the RMSE results across different settings underscores the robustness of the model, confirming its ability to effectively account for the error structure and yield precise estimates of the bivariate varying-coefficient functions.

6. Real Data Analysis

In this section, we demonstrate the feasibility and efficiency of the proposed estimation procedure through the analysis of two real-world datasets. By applying the developed methodology to empirical data, we aim to showcase the practical utility of the model in capturing the underlying patterns and dependencies inherent in the datasets. The analysis serves not only to validate the effectiveness of the proposed estimation approach but also to highlight its applicability across different domains, offering insights into the model’s versatility and robustness in handling complex, time-dependent, and non-Euclidean data structures.

6.1. COVID-19 Data

On March 11, 2020, the World Health Organization (WHO) declared COVID-19, an infectious disease caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), a global pandemic. The rapid and widespread transmission of the virus led to unprecedented global health challenges, with countries around the world instituting lockdowns and other measures to curb the spread of the disease. As of August 15, 2021, official statistics from the WHO reported a staggering 221,885,822 confirmed cases and 4,583,539 deaths across nearly all countries, reflecting the profound and far-reaching impact of the pandemic. Given the magnitude of this crisis, it is essential for international health organizations and research institutions to closely monitor the evolving global trends of COVID-19. Such monitoring facilitates timely and accurate analysis, enabling effective public health responses and the development of strategies for medical treatment, prevention, and control of future outbreaks. Understanding the dynamics of the epidemic through data-driven models is thus critical for informing policy decisions and improving global health outcomes in the face of such a devastating crisis.
To demonstrate this proposition, we select the mortality rate as an appropriate metric for measuring the global trend of the COVID-19 pandemic. The mortality rate is defined as the ratio of cumulative deaths per day to the total population of each country, serving as a key indicator of the disease’s lethality and spread. Notably, the data required to calculate the mortality rate are inherently temporally dependent, as they rely on the data from the previous day. This results in the mortality rate, and thus the global trend of the epidemic, exhibiting temporal auto-correlation.
The data on COVID-19-related deaths, which are critical for our analysis, can be accessed from the Johns Hopkins University repository. This repository includes a dynamic tracking map that provides a comprehensive view of the global trends related to the pandemic. The dataset, which is publicly available at https://www.jhu.edu/, covers the period from January 22, 2020, to April 15, 2021. Additionally, the most recent total population data for each country, necessary for calculating the mortality rate, can be obtained from the World Bank’s online platform, accessible at https://data.worldbank.org/indicator. These publicly available datasets serve as a rich resource for tracking the progression of the pandemic and for conducting rigorous statistical and epidemiological analyses aimed at understanding the disease dynamics across different regions.
Due to the varying outbreak times across different countries and regions, we define the origin of the time scale as the point at which the cumulative number of confirmed COVID-19 cases reached 100 in all countries. For this analysis, we focus on a dataset containing daily cumulative deaths from 189 countries, considering a 100-day period following this reference time. At each time point t, we estimate the density function of the mortality rate, denoted as d ^ t ( y ) , using the observations from the 189 countries. Figure 1 (a) presents the estimated densities of the global mortality rate (‰) over the 100-day interval, with data records from up to 189 countries for each time point. Figure 1 (b) offers an alternative perspective by displaying the estimated densities for three selected days. From the figures, it is evident that the mortality rate densities are well-defined across the observed period. Moreover, a temporal dependency among the distributions is clearly observable, which suggests the presence of an auto-regressive process in the data, potentially supporting the hypothesis of a FAR error structure.
The primary goal of this analysis, based on the COVID-19 data, is to identify the FAR process underlying the mortality rate and to estimate its component functions. For the sake of simplicity, we begin by considering a special case in which the covariate $z$ is constant ($= 1$) and $x$ represents the time scale $t/T$; namely, we consider the following model:
$$\hat f_t(u) = \Psi(\hat d_t)(u) = g_1(u, x_{t,1}) + \varepsilon_t(u), \quad 1 \le t \le 100,$$
where $\varepsilon_t(u) = \sum_{l=1}^{p} \int \gamma_l(u, s)\, \varepsilon_{t-l}(s)\, ds + e_t(u)$ and $x_{t,1}$ denotes the time scale $t/T$.
Based on the initial spline estimation of g 1 ( u , x 1 ) , the testing algorithm is employed to determine the order of the FAR process. Table 5 presents the test results, specifically the p-values under different hypotheses. The table reveals that the observed p-values indicate significant evidence of auto-correlation in the data. This suggests that the underlying process can indeed be effectively modeled as a first-order functional auto-regressive process, denoted as F A R ( 1 ) . Such findings provide strong empirical support for the presence of temporal dependencies in the COVID-19 mortality rate, further justifying the application of a FAR error structure in modeling the dynamics of the epidemic.
Figure 4 presents the heat map of the estimated bivariate function g 1 ( u , x 1 ) after accounting for the functional error process and identifying the auto-regressive order. The heat map reveals a relatively stable temporal pattern, with the function initially reaching a minimum value at lower values of u, gradually increasing, and eventually reaching a maximum at later time points. This pattern highlights the underlying dynamics of the COVID-19 mortality rate across successive days. The observed correlation between the data from consecutive days supports the notion that the global mortality rate exhibits significant temporal dependencies. This is consistent with the design of the mortality rate measure, which is derived from previous daily data, reflecting the evolving trend of the pandemic on a global scale. The results further validate the presence of auto-correlation in the mortality rate data and underscore the necessity of incorporating a functional auto-regressive process to capture these dependencies effectively.

6.2. USA Income Data

Personal income statistics play a crucial role in enabling governments to understand the dynamics between national income, spending, and saving, while also serving as an important tool for evaluating and comparing the economic well-being across different regions or nations. In this context, we focus on the density time series of per capita personal income, which is defined as the total personal income of an area divided by its population. This measure provides a more granular perspective on the economic conditions within a region, reflecting the distribution and trends in income on a per-person basis over time. By analyzing such time series, policymakers and researchers can gain insights into the long-term economic trajectory of a region, assess disparities in income distribution, and make informed decisions regarding fiscal policies, social welfare programs, and economic development strategies.
Income data for the USA are publicly available at the official website of the United States Bureau of Economic Analysis (http://www.bea.gov/). We consider the quarterly per capita personal income of the 50 states in the USA from the first quarter of 2010 to the fourth quarter of 2020, namely $t = 1, \ldots, 44$. For each $t$, we obtain the density function of per capita income, $\hat d_t(y)$, each based on 50 observations. As quarterly personal income is an economic measure based on national conditions, we choose two related covariates, `GDP' (quarterly gross domestic product of the USA) and `Population' (quarterly total population of the USA), which can also be obtained from http://www.bea.gov/.
The income curve, traditionally studied in economics as panel data, primarily reflects the relationship between consumers’ equilibrium points. As individuals’ income levels fluctuate, the connections between these equilibrium points form a trajectory, symbolizing not only an increase in income but also a corresponding rise in consumer satisfaction. This approach emphasizes the dynamic nature of income growth and its impact on individual well-being, providing valuable insights into consumer behavior over time.
In contrast, the income density curve, treated as functional data, is used to describe the distribution of income within a specific region or demographic group. It offers a graphical representation of the characteristics and trends in income across various income intervals, providing a more holistic view of the socio-economic landscape. By analyzing the income density curve, the degree of income inequality within a population can be effectively observed, revealing important patterns of wealth distribution. This type of curve is crucial for economic research as it facilitates a deeper understanding of consumption behavior, socio-economic conditions, and the formulation of societal policies.
Moreover, income density curves are instrumental in economic forecasting and analysis. By examining trends in income distribution, economists can combine insights into consumer preferences and consumption habits at different income levels. This integration enables predictions about future economic conditions and shifts in consumption patterns, making the income density curve a key tool for anticipating changes in both microeconomic and macroeconomic contexts. In this way, the income density curve plays an essential role in informing policy decisions, economic strategies, and the broader understanding of economic well-being.
Figure 5 (a) illustrates the density time series of quarterly personal income over a period of 44 quarters. The density curves reveal that, over the past decade, the overall distribution of per capita income across various states in the United States has exhibited a consistent pattern. Specifically, there are relatively few individuals in the high-income and middle-to-high-income brackets, a moderate number in the middle-income category, and a larger proportion in the middle-to-low-income segments.
To further elucidate the temporal evolution of income distribution, Figure 5 (b) presents the density curves for three distinct time points: the second quarter of 2015, the first quarter of 2017, and the third quarter of 2018. A clear trend emerges, showing a gradual shift towards higher income values over time, accompanied by a corresponding decrease in the peak of the density curve. This shift is unsurprising given the broader economic and technological advancements in modern society. As the economy progresses, the proportion of low-income individuals in the United States has been steadily declining, while the number of middle-to-high-income individuals has been rising. Consequently, the distribution of income is becoming more balanced, with a growing proportion of the population occupying the middle to high-income brackets. This pattern reflects the broader trends of economic development and income redistribution that have taken place in recent years.
We consider the following DVCA-FAR model:
$$\hat f_t(u) = \Psi(\hat d_t)(u) = g_0(u, x_{t,0}) + z_{t,1}\, g_1(u, x_{t,1}) + z_{t,2}\, g_2(u, x_{t,2}) + \varepsilon_t(u), \quad 1 \le t \le 44,$$
where $\varepsilon_t(u) = \sum_{l=1}^{p} \int \gamma_l(u, s)\, \varepsilon_{t-l}(s)\, ds + e_t(u)$; $z_{t,1}$ denotes the quarterly gross domestic product of the USA, $z_{t,2}$ denotes the quarterly total population of the USA, and $x_{t,0}, x_{t,1}, x_{t,2}$ denote the time scale $t/T$.
Similar to the previous estimation procedure, the testing algorithm is applied to determine the optimal order based on the initial spline estimates. Table 6 presents the p-values of the test under various hypotheses, which indicate that the $FAR(2)$ model is the most appropriate for modeling the error process in this context. This result suggests that a second-order functional autoregressive process effectively captures the autocorrelated structure of the error terms in the income data.
Using the three-step estimation procedure, estimates of the bivariate varying-coefficient functions can be obtained. Figure 6 displays the heat maps of the three bivariate functions, where $g_0(u, x_0)$ represents the common effect of the data over time, while $g_1(u, x_1)$ and $g_2(u, x_2)$ reflect the impact of quarterly GDP and quarterly total population, respectively. The heat map of $g_0$ reveals a clear pattern in which high and low values alternate over time, indicating that individuals with both higher and lower per capita income experience similar effects, in contrast to those in the middle-income bracket, who show an opposing trend. For $g_1$, the figure demonstrates a mode that is consistent for both small and large values of $u$ but varies over time: initially, the function reaches a maximum, then decreases to a minimum before increasing again towards the end of the time scale. In contrast, the effect of population, as shown in the heat map of $g_2$, exhibits a trend opposite to that of the common effect, and the population impact is relatively consistent across both higher and lower per capita income groups. Taken together, these results suggest a significant dependency of the quarterly personal income distributions in the United States on previous statistics, with the dynamics of income closely tied to both macroeconomic factors (such as GDP) and demographic factors (such as population).

7. Discussion

Data collected from sequential time points often exhibit autocorrelation, which must be addressed in the modeling process. Additionally, the analysis of non-Euclidean data has become increasingly common. To tackle these challenges, we propose a varying-coefficient additive model with density responses, incorporating a Functional Auto-Regressive (FAR) error process. Due to the complexity and constraints of the data, we first map density functions into a linear space using the transformation method by [12]. We then develop a three-step estimation procedure for the varying-coefficient components. First, B-spline series approximation is used to obtain initial estimates of the bivariate varying-coefficient components without considering the functional errors. Next, the FAR error process order is determined using the test statistic from [10] based on the residuals from the initial estimation. Finally, the FAR error process is removed, and improved spline estimators are constructed for the varying-coefficient components. Theoretical results, including convergence rates and asymptotic properties, are derived for both the initial and improved estimations. The performance of the proposed method is demonstrated through simulations and two real datasets, showing the importance of accounting for autocorrelation and validating the efficiency of the proposed approach.
Future research can explore a variety of related problems. In this study, we employed the varying-coefficient additive model to establish the relationship between density function responses and scalar predictors. With the advent of large and complex datasets, functional predictors have become increasingly valuable in analyzing practical data applications. Furthermore, to account for sequence dependence, we utilized the FAR process to model the correlation structures in the data. However, more intricate models representing autocorrelation could also be explored to further refine the representation of sequence dependence in functional errors. Future work will focus on extending these approaches and conducting additional studies in these areas.

Appendix A

In this appendix, we provide detailed proofs of the theoretical results.
Proof of Theorem 1 
As described in the article, the proposed varying-coefficient additive model with functional error process can be written as
$$f_t(u) = \sum_{m=1}^{k} z_{t,m}\, g_m(u, x_{t,m}) + \varepsilon_t(u), \quad 0 \le u \le 1.$$
Denote $\mathbf{b}(u, x_m) = (b_{1,1,m}(u, x_m), \ldots, b_{N_0,N_m,m}(u, x_m))^\tau$, $\lambda_m = (\lambda_{1,1,m}, \ldots, \lambda_{N_0,N_m,m})^\tau$, $\tilde\lambda_m = (\tilde\lambda_{1,1,m}, \ldots, \tilde\lambda_{N_0,N_m,m})^\tau$, $\lambda = (\lambda_1^\tau, \ldots, \lambda_k^\tau)^\tau$, and $\tilde\lambda = (\tilde\lambda_1^\tau, \ldots, \tilde\lambda_k^\tau)^\tau$, $m = 1, \ldots, k$, $t = 1, \ldots, T$.
By the spline approximation method, $g_m(u, x_m)$ can be written in matrix form as
$$g_m(u, x_m) \approx \sum_{r=1}^{N_0} \sum_{j=1}^{N_m} \lambda_{r,j,m}\, b_{r,j,m}(u, x_m) = \mathbf{b}^\tau(u, x_m)\, \lambda_m.$$
Ignoring the FAR error term, the initial estimator of the bivariate varying-coefficient functions $g_m(u, x_m)$ can be written as
$$\tilde g_m(u, x_m) = \sum_{r=1}^{N_0} \sum_{j=1}^{N_m} \tilde\lambda_{r,j,m}\, b_{r,j,m}(u, x_m) = \mathbf{b}^\tau(u, x_m)\, \tilde\lambda_m, \quad 1 \le m \le k,$$
where $\tilde\lambda = (\tilde\lambda_{1,1,1}, \ldots, \tilde\lambda_{N_0,N_k,k})^\tau$ is the $(N_0 \sum_{m=1}^{k} N_m)$-dimensional vector defined by
$$\tilde\lambda = \arg\min_{\lambda} \sum_{t=1}^{T} \sum_{i=1}^{n} \left\{ f_t(u_i) - \sum_{m=1}^{k} z_{t,m} \sum_{r=1}^{N_0} \sum_{j=1}^{N_m} \lambda_{r,j,m}\, b_{r,j,m}(u_i, x_{t,m}) \right\}^2.$$
We first introduce some notation. Denote $\mathbf{B} = (\mathbf{B}_1, \ldots, \mathbf{B}_k)$, $\mathbf{B}_m = (\mathbf{Bz}_{1,m}^\tau, \ldots, \mathbf{Bz}_{T,m}^\tau)^\tau$, $\mathbf{Bz}_{t,m} = (\mathbf{b}_z(u_1, x_{t,m}), \ldots, \mathbf{b}_z(u_n, x_{t,m}))^\tau$, $\mathbf{b}_z(u, x_{t,m}) = z_{t,m}\, \mathbf{b}(u, x_{t,m})$, $\mathbf{f}_t = (f_t(u_1), \ldots, f_t(u_n))^\tau$, $\mathbf{f} = (\mathbf{f}_1^\tau, \ldots, \mathbf{f}_T^\tau)^\tau$, $\boldsymbol{\varepsilon}_t = (\varepsilon_t(u_1), \ldots, \varepsilon_t(u_n))^\tau$, and $\boldsymbol{\varepsilon} = (\boldsymbol{\varepsilon}_1^\tau, \ldots, \boldsymbol{\varepsilon}_T^\tau)^\tau$.
Let $\hat f_t(u) = \Psi(\hat d_t)(u)$ be the estimate of $f_t(u)$ based on the observations $\{Y_t\}$. Denote $\hat{\mathbf{f}} = (\hat{\mathbf{f}}_1^\tau, \ldots, \hat{\mathbf{f}}_T^\tau)^\tau$, where $\hat{\mathbf{f}}_t = (\hat f_t(u_1), \ldots, \hat f_t(u_n))^\tau$. We then have $\hat{\mathbf{f}} = \mathbf{f} + \boldsymbol{\varepsilon}_f$, where $\boldsymbol{\varepsilon}_f = (\boldsymbol{\varepsilon}_{f1}^\tau, \ldots, \boldsymbol{\varepsilon}_{fT}^\tau)^\tau$ and $\boldsymbol{\varepsilon}_{ft} = \Psi(\hat d_t) - \Psi(d_t)$ is the error generated from the LQD transformation of the density estimate $\hat d_t$. For each $d_t$, we assume that the error from the transformation and kernel smoothing is independent and identically distributed. The estimate of $\lambda$ is given by $\tilde\lambda = (\mathbf{B}^\tau \mathbf{B})^{-1} \mathbf{B}^\tau \hat{\mathbf{f}}$.
For simplicity, denote $g_z(u, x_t) = \sum_{m=1}^{k} z_{t,m}\, g_m(u, x_{t,m})$, $\mathbf{g}_t = (g_z(u_1, x_t), \ldots, g_z(u_n, x_t))^\tau$, and $\mathbf{g} = (\mathbf{g}_1^\tau, \ldots, \mathbf{g}_T^\tau)^\tau$. Let $A_m = (0, \ldots, I, \ldots, 0)$ be an $N_0 N_m \times N_0 \sum_{i=1}^{k} N_i$ block matrix, with each block an $N_0 N_m \times N_0 N_i$ matrix, $i = 1, \ldots, k$, and the $m$-th block an identity matrix.
To prove the consistency of $\tilde g_m(u, x_m)$, we first decompose $\tilde g_m(u, x_m) - g_m(u, x_m)$ into three parts, as follows:
$$\begin{aligned}
\tilde g_m(u, x_m) - g_m(u, x_m) &= \mathbf{b}^\tau(u, x_m) A_m (\mathbf{B}^\tau \mathbf{B})^{-1} \mathbf{B}^\tau \hat{\mathbf{f}} - g_m(u, x_m) \\
&= \mathbf{b}^\tau(u, x_m) A_m (\mathbf{B}^\tau \mathbf{B})^{-1} \mathbf{B}^\tau \mathbf{f} + \mathbf{b}^\tau(u, x_m) A_m (\mathbf{B}^\tau \mathbf{B})^{-1} \mathbf{B}^\tau \boldsymbol{\varepsilon}_f - g_m(u, x_m) \\
&= \left[ \mathbf{b}^\tau(u, x_m) A_m (\mathbf{B}^\tau \mathbf{B})^{-1} \mathbf{B}^\tau \mathbf{g} - g_m(u, x_m) \right] + \mathbf{b}^\tau(u, x_m) A_m (\mathbf{B}^\tau \mathbf{B})^{-1} \mathbf{B}^\tau \boldsymbol{\varepsilon} + \mathbf{b}^\tau(u, x_m) A_m (\mathbf{B}^\tau \mathbf{B})^{-1} \mathbf{B}^\tau \boldsymbol{\varepsilon}_f \\
&=: g_B(u, x_m) + g_V(u, x_m) + g_e(u, x_m),
\end{aligned}$$
where
$$g_B(u, x_m) = \mathbf{b}^\tau(u, x_m) A_m (\mathbf{B}^\tau \mathbf{B})^{-1} \mathbf{B}^\tau \mathbf{g} - g_m(u, x_m), \qquad g_V(u, x_m) = \mathbf{b}^\tau(u, x_m) A_m (\mathbf{B}^\tau \mathbf{B})^{-1} \mathbf{B}^\tau \boldsymbol{\varepsilon}, \qquad g_e(u, x_m) = \mathbf{b}^\tau(u, x_m) A_m (\mathbf{B}^\tau \mathbf{B})^{-1} \mathbf{B}^\tau \boldsymbol{\varepsilon}_f.$$
For $g_B(u, x_m)$, we have
$$\begin{aligned}
g_B(u, x_m) &= \mathbf{b}^\tau(u, x_m) A_m (\mathbf{B}^\tau \mathbf{B})^{-1} \mathbf{B}^\tau \mathbf{g} - g_m(u, x_m) \\
&= \mathbf{b}^\tau(u, x_m) A_m (\mathbf{B}^\tau \mathbf{B})^{-1} \mathbf{B}^\tau (\mathbf{g} - \mathbf{B}\lambda) + \mathbf{b}^\tau(u, x_m) \lambda_m - g_m(u, x_m) \\
&= \mathbf{b}^\tau(u, x_m) A_m \left( \frac{1}{nT} \mathbf{B}^\tau \mathbf{B} \right)^{-1} \frac{\mathbf{B}^\tau}{nT} (\mathbf{g} - \mathbf{B}\lambda) + \mathbf{b}^\tau(u, x_m) \lambda_m - g_m(u, x_m).
\end{aligned}$$
It follows from [16] that the approximation order of the traditional bivariate spline estimator is $O_p(N_m^{-2})$, i.e., there exists a constant $C_0$ such that
$$\sup_{u, x_m} |g_m(u, x_m) - \mathbf{b}(u, x_m)^\tau \lambda_m| \le C_0 N_m^{-2}.$$
For simplicity, we assume that there exists a constant $N_1$ such that $N_0 = N_m = N_1$, $1 \le m \le k$. As a result, we obtain $\sup_{u,x} \frac{1}{nT} |\mathbf{g} - \mathbf{B}\lambda| \le C_1 N_1^{-2}$ for some constant $C_1$. Combined with the result that $\|(\frac{1}{nT} \mathbf{B}^\tau \mathbf{B})^{-1}\| = O_p(N_1^2)$, which can be derived from DeVore & Lorentz (1993), we obtain
$$\sup_{u, x_m \in [0,1]} |g_B(u, x_m)| = O_p(N_1^{-2}).$$
For $g_V(u, x_m) = \mathbf{b}^\tau(u, x_m) A_m (\mathbf{B}^\tau \mathbf{B})^{-1} \mathbf{B}^\tau \boldsymbol{\varepsilon}$, note that $\boldsymbol{\varepsilon}$ is a functional auto-regressive process, defined as $\varepsilon_t(u) = \sum_{l=1}^{p} \int \gamma_l(s, u)\, \varepsilon_{t-l}(s)\, ds + e_t(u)$. We use the assumption that $\{e_t\}_{t=1}^{T}$ is independent of $x_t, z_t$ and satisfies $E(e_t(u) \mid x_t, z_t) = 0$, and that the largest eigenvalue $\lambda_{\max}$ of the covariance function $\Sigma_\varepsilon(u)$ is finite. Therefore,
$$E(g_V(u, x_m)) = E[E(g_V(u, x_m) \mid x, z)] = E\left[ E\left( \mathbf{b}^\tau(u, x_m) A_m (\mathbf{B}^\tau \mathbf{B})^{-1} \mathbf{B}^\tau \boldsymbol{\varepsilon} \mid x, z \right) \right] = E\left[ \mathbf{b}^\tau(u, x_m) A_m (\mathbf{B}^\tau \mathbf{B})^{-1} \mathbf{B}^\tau E(\boldsymbol{\varepsilon} \mid x, z) \right] = 0,$$
and
$$\begin{aligned}
E(g_V(u, x_m))^2 &= E\left[ E\left( g_V(u, x_m)^2 \mid x, z \right) \right] \\
&= E\left[ E\left( \left( \mathbf{b}^\tau(u, x_m) A_m (\mathbf{B}^\tau \mathbf{B})^{-1} \mathbf{B}^\tau \boldsymbol{\varepsilon} \right)^2 \mid x, z \right) \right] \\
&= E\left[ E\left( \mathbf{b}^\tau(u, x_m) A_m (\mathbf{B}^\tau \mathbf{B})^{-1} \mathbf{B}^\tau \boldsymbol{\varepsilon} \boldsymbol{\varepsilon}^\tau \mathbf{B} (\mathbf{B}^\tau \mathbf{B})^{-1} A_m^\tau \mathbf{b}(u, x_m) \mid x, z \right) \right] \\
&= E\left[ \mathbf{b}^\tau(u, x_m) A_m (\mathbf{B}^\tau \mathbf{B})^{-1} \mathbf{B}^\tau E(\boldsymbol{\varepsilon} \boldsymbol{\varepsilon}^\tau \mid x, z) \mathbf{B} (\mathbf{B}^\tau \mathbf{B})^{-1} A_m^\tau \mathbf{b}(u, x_m) \right] \\
&= \mathbf{b}^\tau(u, x_m) A_m (\mathbf{B}^\tau \mathbf{B})^{-1} \mathbf{B}^\tau \Sigma_\varepsilon \mathbf{B} (\mathbf{B}^\tau \mathbf{B})^{-1} A_m^\tau \mathbf{b}(u, x_m) \\
&\le \frac{C \lambda_{\max}}{nT}\, \mathbf{b}^\tau(u, x_m) A_m \left( \frac{1}{nT} \mathbf{B}^\tau \mathbf{B} \right)^{-1} A_m^\tau \mathbf{b}(u, x_m) \\
&\le \frac{C \lambda_{\max}}{nT}\, \|\mathbf{b}(u, x_m)\|^2 \left\| \left( \frac{1}{nT} \mathbf{B}^\tau \mathbf{B} \right)^{-1} \right\|.
\end{aligned}$$
Subsequently, combining the above results, we have
$$\sup_{u, x_m \in [0,1]} |g_V(u, x_m)| = O_p\left( \sqrt{N_1/(nT)} \right) = O_p\left( N_1^{1/2} / \sqrt{nT} \right).$$
For $g_e(u, x_m) = \mathbf{b}^\tau(u, x_m) A_m (\mathbf{B}^\tau \mathbf{B})^{-1} \mathbf{B}^\tau \boldsymbol{\varepsilon}_f$, note that $f_t(u) = \Psi(d_t)(u)$ and the estimate obtained from the observations is $\hat f_t(u) = \Psi(\hat d_t)(u)$. From Petersen & Müller (2016), it can be obtained that $\sup_{d_t \in \mathcal{F}} |\hat d_t - d_t| = O_p(h + (nh)^{-1/2})$. Since $\boldsymbol{\varepsilon}_{ft} = \hat f_t - f_t$ is the error from the transformation of $\hat d_t(\cdot)$ and $d_t(\cdot)$, the consistency of the LQD transformation implies that $\sup_{d_t} |\Psi(\hat d_t) - \Psi(d_t)| = O_p(h + (nh)^{-1/2})$. Then, under the smoothness assumption, we obtain
$$\sup_{u, x_m \in [0,1]} |g_e(u, x_m)| = O_p\left( h + (nh)^{-1/2} \right).$$
Therefore, as $n, T \to \infty$, with $h \asymp n^{-1/3}$ and $N_0, N_m \asymp (nT)^{1/6} \log nT$ (so that $N_1 \asymp (nT)^{1/6} \log nT$), it is easy to see that
$$\begin{aligned}
\sup_{u, x_m \in [0,1]} |\tilde g_m(u, x_m) - g_m(u, x_m)| &\le \sup_{u, x_m \in [0,1]} |g_B(u, x_m)| + \sup_{u, x_m \in [0,1]} |g_V(u, x_m)| + \sup_{u, x_m \in [0,1]} |g_e(u, x_m)| \\
&= O_p(N_1^{-2}) + O_p\left( \sqrt{N_1/(nT)} \right) + O_p\left( h + (nh)^{-1/2} \right) \\
&= O_p\left( (nT)^{-1/3} \log(nT) + n^{-1/3} \right).
\end{aligned}$$
The proof of the theorem is completed. □
Proof of Theorem 2
The improved spline approximation of the error process $\varepsilon_t(u)$ is given by
$$\hat\varepsilon_t(u) = \sum_{l=1}^{p} \sum_{r=1}^{N} \sum_{j=1}^{N} \hat\mu_{r,j,l} \int b_{r,j}(u, s)\, \hat\varepsilon_{t-l}(s)\, ds, \quad 0 \le u \le 1,$$
where $\hat\mu = (\hat\mu_{1,1,1}, \ldots, \hat\mu_{N,N,p})^\tau$ is the $pN^2$-dimensional vector minimizing the quadratic loss, i.e.,
$$\hat\mu = \arg\min_{\mu} \sum_{t=p+1}^{T} \sum_{i=1}^{n} \left\{ \tilde\varepsilon_t(u_i) - \sum_{l=1}^{p} \sum_{r=1}^{N} \sum_{j=1}^{N} \mu_{r,j,l} \int b_{r,j}(u_i, s)\, \tilde\varepsilon_{t-l}(s)\, ds \right\}^2.$$
Let $f_t^c(u) = f_t(u) - \varepsilon_t(u)$ and $\hat f_t^c(u) = f_t(u) - \hat\varepsilon_t(u)$. Applying the spline algorithm again, the improved estimate of $g_m(u, x_m)$ is given by
$$\hat g_m(u, x_m) = \sum_{r=1}^{N_0} \sum_{j=1}^{N_m} \hat\lambda_{r,j,m}\, b_{r,j,m}(u, x_m), \quad 1 \le m \le k,$$
where $\hat\lambda = (\hat\lambda_{1,1,1}, \ldots, \hat\lambda_{N_0,N_k,k})^\tau$ is the $(N_0 \sum_{m=1}^{k} N_m)$-dimensional vector satisfying
$$\hat\lambda = \arg\min_{\lambda} \sum_{t=1}^{T} \sum_{i=1}^{n} \left\{ \hat f_t^c(u_i) - \sum_{m=1}^{k} z_{t,m} \sum_{r=1}^{N_0} \sum_{j=1}^{N_m} \lambda_{r,j,m}\, b_{r,j,m}(u_i, x_{t,m}) \right\}^2.$$
Denote $\mathbf{f}_t^c = (f_t^c(u_1), \ldots, f_t^c(u_n))^\tau$ and $\mathbf{f}^c = (\mathbf{f}_1^{c\tau}, \ldots, \mathbf{f}_T^{c\tau})^\tau$; $\hat{\mathbf{f}}_t^c = (\hat f_t^c(u_1), \ldots, \hat f_t^c(u_n))^\tau$ and $\hat{\mathbf{f}}^c = (\hat{\mathbf{f}}_1^{c\tau}, \ldots, \hat{\mathbf{f}}_T^{c\tau})^\tau$; and $\mathbf{e}_t = (e_t(u_1), \ldots, e_t(u_n))^\tau$ and $\mathbf{e} = (\mathbf{e}_1^\tau, \ldots, \mathbf{e}_T^\tau)^\tau$.
Based on the estimate of $\boldsymbol{\varepsilon}$, the estimate of $\lambda$ is given by $\hat\lambda = (\mathbf{B}^\tau \mathbf{B})^{-1} \mathbf{B}^\tau \hat{\mathbf{f}}^c$, where $\hat{\mathbf{f}}^c = \mathbf{f} - \hat{\boldsymbol{\varepsilon}}$. Since random error exists in the density estimation process, namely $\hat{\mathbf{f}} = \mathbf{f} + \boldsymbol{\varepsilon}_f$, where $\boldsymbol{\varepsilon}_f$ is defined as before, we denote $\tilde{\hat{\mathbf{f}}}^c = \hat{\mathbf{f}} - \hat{\boldsymbol{\varepsilon}}$; the estimate of $\lambda$ based on the observations is then $\hat\lambda = (\mathbf{B}^\tau \mathbf{B})^{-1} \mathbf{B}^\tau \tilde{\hat{\mathbf{f}}}^c$.
As in the proof of Theorem 1, we decompose $\hat g_m(u, x_m) - g_m(u, x_m)$ as
$$
\begin{aligned}
\hat g_m(u, x_m) - g_m(u, x_m) &= b^\tau(u, x_m) A_m (B^\tau B)^{-1} B^\tau \tilde{\hat f}^c - g_m(u, x_m) \\
&= b^\tau(u, x_m) A_m (B^\tau B)^{-1} B^\tau (f - \hat\varepsilon) + b^\tau(u, x_m) A_m (B^\tau B)^{-1} B^\tau \varepsilon_f - g_m(u, x_m) \\
&= \big[b^\tau(u, x_m) A_m (B^\tau B)^{-1} B^\tau g - g_m(u, x_m)\big] + b^\tau(u, x_m) A_m (B^\tau B)^{-1} B^\tau (\varepsilon - \hat\varepsilon) \\
&\quad + b^\tau(u, x_m) A_m (B^\tau B)^{-1} B^\tau \varepsilon_f \\
&= g_B(u, x_m) + g_V(u, x_m) + g_e(u, x_m),
\end{aligned}
$$
where
$$
\begin{aligned}
g_B(u, x_m) &= b^\tau(u, x_m) A_m (B^\tau B)^{-1} B^\tau g - g_m(u, x_m), \\
g_V(u, x_m) &= b^\tau(u, x_m) A_m (B^\tau B)^{-1} B^\tau (\varepsilon - \hat\varepsilon), \\
g_e(u, x_m) &= b^\tau(u, x_m) A_m (B^\tau B)^{-1} B^\tau \varepsilon_f.
\end{aligned}
$$
For $g_B(u, x_m)$,
$$
\begin{aligned}
g_B(u, x_m) &= b^\tau(u, x_m) A_m (B^\tau B)^{-1} B^\tau g - g_m(u, x_m) \\
&= b^\tau(u, x_m) A_m (B^\tau B)^{-1} B^\tau [g - B\lambda] + b^\tau(u, x_m)\lambda_m - g_m(u, x_m) \\
&= b^\tau(u, x_m) A_m \Big(\frac{1}{nT} B^\tau B\Big)^{-1} \frac{1}{nT} B^\tau (g - B\lambda) + b^\tau(u, x_m)\lambda_m - g_m(u, x_m).
\end{aligned}
$$
Similar to the discussion in the proof of Theorem 1, we have
$$\sup_{u, x_m \in [0,1]} |g_B(u, x_m)| = O_p(N_1^{-2}).$$
For $g_V(u, x_m) = b^\tau(u, x_m) A_m (B^\tau B)^{-1} B^\tau (\varepsilon - \hat\varepsilon)$, the functional error process can be written as
$$\varepsilon_t(u) = \sum_{l=1}^{p} \int \gamma_l(u, s)\, \varepsilon_{t-l}(s)\, ds + e_t(u) \approx \sum_{l=1}^{p} \sum_{r=1}^{N} \sum_{j=1}^{N} \mu_{r,j,l} \int b_{r,j}(u, s)\, \varepsilon_{t-l}(s)\, ds + e_t(u),$$
with the corresponding approximation
$$\hat\varepsilon_t(u) = \sum_{l=1}^{p} \sum_{r=1}^{N} \sum_{j=1}^{N} \hat\mu_{r,j,l} \int b_{r,j}(u, s)\, \tilde\varepsilon_{t-l}(s)\, ds, \quad 0 \le u \le 1.$$
Denote $b_t(u) = \big(\int b_{1,1}(u, s)\,\tilde\varepsilon_{t-1}(s)\,ds, \ldots, \int b_{N,N}(u, s)\,\tilde\varepsilon_{t-p}(s)\,ds\big)^\tau$, $b_t = (b_t(u_1), \ldots, b_t(u_n))^\tau$, and $B_\varepsilon = (b_{p+1}^\tau, \ldots, b_T^\tau)^\tau$; likewise $\varepsilon_t = (\varepsilon_t(u_1), \ldots, \varepsilon_t(u_n))^\tau$ and $\varepsilon = (\varepsilon_{p+1}^\tau, \ldots, \varepsilon_T^\tau)^\tau$.
The model can then be rewritten in matrix form as $\varepsilon \approx B_\varepsilon \mu + e$. Based on the initial estimates, the model becomes $\tilde\varepsilon \approx B_\varepsilon \mu + \tilde e$, where $\tilde e$ is stacked in the same way as $\varepsilon$, and the estimate of $\mu$ is given by $\hat\mu = (B_\varepsilon^\tau B_\varepsilon)^{-1} B_\varepsilon^\tau \tilde\varepsilon$.
Then,
$$
\begin{aligned}
\hat\varepsilon_t(u) - \varepsilon_t(u) &= \sum_{l=1}^{p} \sum_{r=1}^{N} \sum_{j=1}^{N} \hat\mu_{r,j,l} \int b_{r,j}(u, s)\, \tilde\varepsilon_{t-l}(s)\, ds - \varepsilon_t(u) \\
&= b_t^\tau(u) (B_\varepsilon^\tau B_\varepsilon)^{-1} B_\varepsilon^\tau \tilde\varepsilon - \varepsilon_t(u) \\
&= b_t^\tau(u) (B_\varepsilon^\tau B_\varepsilon)^{-1} B_\varepsilon^\tau (\tilde\varepsilon - B_\varepsilon \mu) + \big(b_t^\tau(u)\mu - \varepsilon_t(u)\big).
\end{aligned}
$$
There exists a constant $C_t$ such that $\sup_u |\varepsilon_t(u) - b_t^\tau(u)\mu| \le C_t N^{-2}$, and similarly $\sup_u |\tilde\varepsilon - B_\varepsilon \mu| \le C_p N^{-2}$ for another constant $C_p$. Hence, as the sample size tends to infinity, $\sup_u |\hat\varepsilon - \varepsilon| = O_p(N^{-2})$. Therefore,
$$\sup_{u, x_m \in [0,1]} |g_V(u, x_m)| = \sup_{u, x_m \in [0,1]} \big| b^\tau(u, x_m) A_m (B^\tau B)^{-1} B^\tau (\varepsilon - \hat\varepsilon) \big| = O_p(N^{-2}).$$
For $g_e(u, x_m) = b^\tau(u, x_m) A_m (B^\tau B)^{-1} B^\tau \varepsilon_f$, under the smoothness condition on $f_t$ and combining with the proof of Theorem 1, we obtain
$$\sup_{u, x_m \in [0,1]} |g_e(u, x_m)| = O_p\big(h + (nh)^{-1/2}\big).$$
Therefore, as $n, T \to \infty$, with $h \asymp n^{-1/3}$ and $N_0, N_m \asymp (nT)^{1/6} \log(nT)$, namely $N_1 \asymp (nT)^{1/6} \log(nT)$ and $N \asymp (nT)^{1/6} \log(nT)$, it follows that
$$
\begin{aligned}
\sup_{u, x_m \in [0,1]} |\hat g_m(u, x_m) - g_m(u, x_m)| &\le \sup_{u, x_m \in [0,1]} |g_B(u, x_m)| + \sup_{u, x_m \in [0,1]} |g_V(u, x_m)| + \sup_{u, x_m \in [0,1]} |g_e(u, x_m)| \\
&= O_p(N_1^{-2}) + O_p(N^{-2}) + O_p\big(h + (nh)^{-1/2}\big) \\
&= O_p\big((nT)^{-1/3} (\log nT)^{-2} + n^{-1/3}\big).
\end{aligned}
$$
The proof of the theorem is completed. □
Proof of Theorem 3
(i) We first prove the asymptotic normality of the initial estimator $\tilde g_m(u, x_m)$. With (1)–(4), we rewrite $\sqrt{nT}\,\big(\tilde g_m(u, x_m) - g_m(u, x_m)\big)$ as
$$
\begin{aligned}
\sqrt{nT}\,\big(\tilde g_m(u, x_m) - g_m(u, x_m)\big) &= \sqrt{nT}\,\big[b^\tau(u, x_m) A_m (B^\tau B)^{-1} B^\tau g - g_m(u, x_m)\big] \\
&\quad + \sqrt{nT}\,\big[b^\tau(u, x_m) A_m (B^\tau B)^{-1} B^\tau \varepsilon + b^\tau(u, x_m) A_m (B^\tau B)^{-1} B^\tau \varepsilon_f\big].
\end{aligned}
$$
Since the error process $\varepsilon_t$ is independent of the covariates $(x_t, z_t)$, we have $E(\varepsilon_t \mid x_t, z_t) = 0$. Combined with the result of Theorem 1, it follows that
$$
\begin{aligned}
E\big[\sqrt{nT}\,\big(\tilde g_m(u, x_m) - g_m(u, x_m)\big)\big] &= \sqrt{nT}\, E\big[b^\tau(u, x_m) A_m (B^\tau B)^{-1} B^\tau (g - B\lambda)\big] + \sqrt{nT}\, E\big[b^\tau(u, x_m)\lambda_m - g_m(u, x_m)\big] \\
&\quad + \sqrt{nT}\, E\big[b^\tau(u, x_m) A_m (B^\tau B)^{-1} B^\tau E(\varepsilon \mid x, z)\big] \\
&\quad + \sqrt{nT}\, E\big[b^\tau(u, x_m) A_m (B^\tau B)^{-1} B^\tau E(\varepsilon_f \mid x, z)\big] = 0.
\end{aligned}
$$
Meanwhile,
$$\mathrm{Var}\big[\sqrt{nT}\,\big(\tilde g_m(u, x_m) - g_m(u, x_m)\big)\big] = nT\, \mathrm{Var}\big[b^\tau(u, x_m) A_m (B^\tau B)^{-1} B^\tau \varepsilon + b^\tau(u, x_m) A_m (B^\tau B)^{-1} B^\tau \varepsilon_f\big].$$
Based on the results proved in Theorem 1, we obtain
$$
\begin{aligned}
nT\, \mathrm{Var}\big[b^\tau(u, x_m) A_m (B^\tau B)^{-1} B^\tau \varepsilon\big] &= nT\, E\Big[E\Big(\big(b^\tau(u, x_m) A_m (B^\tau B)^{-1} B^\tau \varepsilon\big)^2 \,\Big|\, x, z\Big)\Big] \\
&= nT\, E\Big[E\Big(b^\tau(u, x_m) A_m (B^\tau B)^{-1} B^\tau \varepsilon \varepsilon^\tau B (B^\tau B)^{-1} A_m^\tau b(u, x_m) \,\Big|\, x, z\Big)\Big].
\end{aligned}
$$
Denote $D_m = A_m (B^{*\tau} B^*)^{-1} B^{*\tau}$, where $B^* = B/\sqrt{nT}$; then
$$nT\, \mathrm{Var}\big[b^\tau(u, x_m) A_m (B^\tau B)^{-1} B^\tau \varepsilon\big] = E\big[b^\tau(u, x_m)\, E\big(D_m \varepsilon \varepsilon^\tau D_m^\tau \mid x, z\big)\, b(u, x_m)\big].$$
Denote $\Sigma_\varepsilon = E(\varepsilon \varepsilon^\tau \mid x, z)$. Since the error process is auto-correlated, this covariance matrix can be decomposed into two parts, $\Sigma_\varepsilon = \Sigma_1 + \Sigma_2$, where $\Sigma_1 = \mathrm{diag}(\Sigma_{t,t})_{1 \le t \le T}$ is block-diagonal with $t$-th diagonal block $\Sigma_{t,t} = \mathrm{Cov}(\varepsilon_t)$, and $\Sigma_2 = (\Sigma_{t,s})_{1 \le t \ne s \le T} = \big(\mathrm{Cov}(\varepsilon_t, \varepsilon_s)\big)_{1 \le t \ne s \le T}$ collects the off-diagonal blocks, representing the dependence between the autoregressive errors.
On the other hand,
$$
\begin{aligned}
nT\, \mathrm{Var}\big[b^\tau(u, x_m) A_m (B^\tau B)^{-1} B^\tau \varepsilon_f\big] &= nT\, E\Big[E\Big(b^\tau(u, x_m) A_m (B^\tau B)^{-1} B^\tau \varepsilon_f \varepsilon_f^\tau B (B^\tau B)^{-1} A_m^\tau b(u, x_m) \,\Big|\, x, z\Big)\Big] \\
&= nT\, \sigma_\epsilon^2\, E\Big[E\Big(b^\tau(u, x_m) A_m (B^\tau B)^{-1} A_m^\tau b(u, x_m) \,\Big|\, x, z\Big)\Big];
\end{aligned}
$$
under the smoothness assumption, as $n, T \to \infty$ the error generated by the density estimation step tends to zero, so the variance of this part vanishes.
Therefore, since the second-moment bounds established in Theorem 1 satisfy the conditions of the Lindeberg–Feller central limit theorem, under the assumption that $nT \to \infty$,
$$\sqrt{nT}\,\big(C_m \Sigma_\varepsilon C_m^\tau\big)^{-1/2}\,\big(\tilde g_m(u, x_m) - g_m(u, x_m)\big) \xrightarrow{D} N(0, 1),$$
where $C_m = b^\tau(u, x_m) E(D_m) = b^\tau(u, x_m) E\big(A_m (B^{*\tau} B^*)^{-1} B^{*\tau}\big)$ and $\Sigma_\varepsilon = (\Sigma_{t,s})_{1 \le t, s \le T}$ with $\Sigma_{t,s} = \mathrm{Cov}(\varepsilon_t, \varepsilon_s)$.
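In practice, this limit can be inverted to give pointwise inference. As a sketch, assuming a consistent plug-in estimate $\hat\Sigma_\varepsilon$ of $\Sigma_\varepsilon$ (for instance, built from the fitted residual curves) and negligible bias, an asymptotic $(1-\alpha)$ pointwise confidence interval for $g_m(u, x_m)$ is
$$\tilde g_m(u, x_m) \;\pm\; z_{1-\alpha/2}\, \sqrt{\frac{C_m \hat\Sigma_\varepsilon C_m^\tau}{nT}},$$
where $z_{1-\alpha/2}$ denotes the standard normal quantile.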
(ii) After obtaining the spline estimate $\hat\varepsilon$ of the error process, the improved estimator of the bivariate varying-coefficient functions $g_m(u, x_m)$ is obtained from the refined model $f_t^c(u) = f_t(u) - \varepsilon_t(u) = \sum_{m=1}^{k} z_{t,m}\, g_m(u, x_{t,m}) + e_t(u)$.
As in the proof of Theorem 2,
$$
\begin{aligned}
\sqrt{nT}\,\big(\hat g_m(u, x_m) - g_m(u, x_m)\big) &= \sqrt{nT}\,\big[b^\tau(u, x_m) A_m (B^\tau B)^{-1} B^\tau \tilde{\hat f}^c - g_m(u, x_m)\big] \\
&= \sqrt{nT}\,\big[b^\tau(u, x_m) A_m (B^\tau B)^{-1} B^\tau (f - \hat\varepsilon) + b^\tau(u, x_m) A_m (B^\tau B)^{-1} B^\tau \varepsilon_f - g_m(u, x_m)\big] \\
&= \sqrt{nT}\,\big[b^\tau(u, x_m) A_m (B^\tau B)^{-1} B^\tau g - g_m(u, x_m)\big] \\
&\quad + \sqrt{nT}\,\big[b^\tau(u, x_m) A_m (B^\tau B)^{-1} B^\tau (\varepsilon - \hat\varepsilon) + b^\tau(u, x_m) A_m (B^\tau B)^{-1} B^\tau \varepsilon_f\big].
\end{aligned}
$$
Since $\hat\varepsilon$ is an estimate of the error process $\varepsilon$, the convergence results in part (i) yield $E\big[\sqrt{nT}\,\big(\hat g_m(u, x_m) - g_m(u, x_m)\big)\big] = 0$.
Meanwhile,
$$
\begin{aligned}
\mathrm{Var}\big[\sqrt{nT}\,\big(\hat g_m(u, x_m) - g_m(u, x_m)\big)\big] &= nT\, \mathrm{Var}\big[b^\tau(u, x_m) A_m (B^\tau B)^{-1} B^\tau \big(g + (\varepsilon - \hat\varepsilon) + \varepsilon_f\big) - g_m(u, x_m)\big] \\
&= nT\, \mathrm{Var}\big[b^\tau(u, x_m) A_m (B^\tau B)^{-1} B^\tau (\varepsilon - \hat\varepsilon) + b^\tau(u, x_m) A_m (B^\tau B)^{-1} B^\tau \varepsilon_f\big].
\end{aligned}
$$
Based on the results proved in part (i) and the same notation, we obtain
$$
\begin{aligned}
nT\, \mathrm{Var}\big[b^\tau(u, x_m) A_m (B^\tau B)^{-1} B^\tau (\varepsilon - \hat\varepsilon)\big] &= nT\, E\Big[E\Big(\big(b^\tau(u, x_m) A_m (B^\tau B)^{-1} B^\tau (\varepsilon - \hat\varepsilon)\big)^2 \,\Big|\, x, z\Big)\Big] \\
&= nT\, E\Big[E\Big(b^\tau(u, x_m) A_m (B^\tau B)^{-1} B^\tau (\varepsilon - \hat\varepsilon)(\varepsilon - \hat\varepsilon)^\tau B (B^\tau B)^{-1} A_m^\tau b(u, x_m) \,\Big|\, x, z\Big)\Big] \\
&= E\big[b^\tau(u, x_m)\, D_m\, E\big((\varepsilon - \hat\varepsilon)(\varepsilon - \hat\varepsilon)^\tau \mid x, z\big)\, D_m^\tau\, b(u, x_m)\big].
\end{aligned}
$$
Denote $\Xi_\varepsilon = E\big((\varepsilon - \hat\varepsilon)(\varepsilon - \hat\varepsilon)^\tau \mid x, z\big)$; similarly, it can be decomposed into two parts, $\Xi_\varepsilon = \Xi_1 + \Xi_2$, where $\Xi_1 = \mathrm{diag}(\Xi_{t,t})_{1 \le t \le T}$ is block-diagonal with $t$-th diagonal block $\Xi_{t,t} = \mathrm{Cov}(\varepsilon_t - \hat\varepsilon_t)$, and $\Xi_2 = (\Xi_{t,s})_{1 \le t \ne s \le T} = \big(\mathrm{Cov}(\varepsilon_t - \hat\varepsilon_t,\, \varepsilon_s - \hat\varepsilon_s)\big)_{1 \le t \ne s \le T}$ collects the off-diagonal blocks.
Owing to the convergence of $\varepsilon - \hat\varepsilon$ established above, the covariance matrix reduces to $\Xi_\varepsilon = \Xi_1 = \mathrm{Cov}(e_t)$ as $n, T \to \infty$. Since $\mathrm{Cov}(e_t(u), e_t(s)) = \sigma_t^2(u, s)$, the covariance matrix can be written as $\Xi_\varepsilon = \mathrm{Cov}(e_t) = \mathrm{diag}(\Xi_{t,t})_{1 \le t \le T}$, where $\Xi_{t,t}(u, s) = \sigma_t^2(u, s)$.
Therefore,
$$\sqrt{nT}\,\big(C_m \Xi_\varepsilon C_m^\tau\big)^{-1/2}\,\big(\hat g_m(u, x_m) - g_m(u, x_m)\big) \xrightarrow{D} N(0, 1),$$
where $C_m = b^\tau(u, x_m) E\big(A_m (B^{*\tau} B^*)^{-1} B^{*\tau}\big)$ and $\Xi_\varepsilon = \mathrm{diag}(\Xi_{t,t})_{1 \le t \le T}$ with $\Xi_{t,t}(u, s) = \sigma_t^2(u, s)$.
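For completeness, a plug-in estimate of the limiting covariance surface $\sigma^2(u, s)$ can be pooled from the fitted FAR innovations. The sketch below assumes the innovations share a common covariance so that averaging over $t$ is sensible; `e_hat` (the $(T-p) \times n$ matrix of residual curves $\hat e_t = \tilde\varepsilon_t - \hat\varepsilon_t$ on the grid) and the name `innovation_cov` are illustrative, not the authors' construction.

```python
import numpy as np

def innovation_cov(e_hat):
    """Pooled estimate of sigma^2(u_i, u_j) = Cov(e_t(u_i), e_t(u_j))."""
    e_c = e_hat - e_hat.mean(axis=0, keepdims=True)   # centre each grid point over t
    return e_c.T @ e_c / e_hat.shape[0]               # n x n covariance surface on the grid
```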
The proof of the theorem is completed. □

Author Contributions

Conceptualization, Zixuan Han, Tao Li and Jinhong You; Data curation, Zixuan Han; Formal analysis, Zixuan Han; Funding acquisition, Tao Li and Jinhong You; Investigation, Zixuan Han; Methodology, Zixuan Han, Tao Li and Jinhong You; Project administration, Tao Li, Jinhong You and Narayanaswamy Balakrishnan; Resources, Zixuan Han; Software, Zixuan Han; Supervision, Tao Li, Jinhong You and Narayanaswamy Balakrishnan; Validation, Zixuan Han; Writing – original draft, Zixuan Han; Writing – review & editing, Zixuan Han, Tao Li, Jinhong You and Narayanaswamy Balakrishnan. All authors have read and agreed to the published version of the manuscript.

Funding

Tao Li's research is supported by the Humanities and Social Science Fund of the Ministry of Education of China (No. 21YJA910001). Jinhong You's research is supported by the National Natural Science Foundation of China (NSFC, No. 11971291).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original datasets employed in this study are publicly accessible from the official website of Johns Hopkins University at https://www.jhu.edu/ and the World Bank’s online platform at https://data.worldbank.org/.

Acknowledgments

Not applicable.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Berhoune, K.; Bensmain, N. Sieves estimator of functional autoregressive process. Statistics and Probability Letters 2018, 135, 60–69.
2. Bosq, D. Linear Processes in Function Spaces: Theory and Applications; Springer Science & Business Media: New York, 2000.
3. Chen, Y.; Chua, W.S.; Härdle, W. Forecasting limit order book liquidity supply–demand curves with functional autoregressive dynamics. Quantitative Finance 2019, 19(9), 1473–1489.
4. Chen, Y.; Li, B. An adaptive functional autoregressive forecast model to predict electricity price curves. Journal of Business and Economic Statistics 2017, 35(3), 371–388.
5. Chen, Y.; Lin, Z.; Müller, H. Wasserstein regression. Journal of the American Statistical Association 2023, 118(542), 869–882.
6. Kowal, D.R.; Matteson, D.S.; Ruppert, D. Functional autoregression for sparsely sampled data. Journal of Business and Economic Statistics 2019, 37(1), 97–109.
7. DeVore, R.; Lorentz, G. Constructive Approximation; Springer Science & Business Media, 1993; Volume 303.
8. Han, K.; Müller, H.; Park, B. Additive functional regression for densities as responses. Journal of the American Statistical Association 2020, 115(530), 997–1010.
9. Kokoszka, P.; Miao, H.; Petersen, A.; Shang, H.L. Forecasting of density functions with an application to cross-sectional and intraday returns. International Journal of Forecasting 2019, 35, 1304–1317.
10. Kokoszka, P.; Reimherr, M. Determining the order of the functional autoregressive model. Journal of Time Series Analysis 2013, 34, 116–129.
11. Petersen, A.; Chen, C.; Müller, H. Quantifying and visualizing intraregional connectivity in resting-state functional magnetic resonance imaging with correlation densities. Brain Connectivity 2019, 9(1), 37–47.
12. Petersen, A.; Müller, H. Functional data analysis for density functions by transformation to a Hilbert space. The Annals of Statistics 2016, 44(1), 183–218.
13. Petersen, A.; Müller, H. Fréchet regression for random objects with Euclidean predictors. The Annals of Statistics 2019, 47(2), 691–719.
14. Saha, A.; Banerjee, S.; Kurtek, S.; Narang, S.; Lee, J.; Rao, G.; Martinez, J.; Bharath, K.; Rao, A.; Baladandayuthapani, V. DEMARCATE: Density-based magnetic resonance image clustering for assessing tumor heterogeneity in cancer. NeuroImage: Clinical 2016, 12, 132–143.
15. Sen, R.; Ma, C. Forecasting density function: Application in finance. Journal of Mathematical Finance 2015, 5, 433–447.
16. Stone, C. The use of polynomial splines and their tensor products in multivariate function estimation. The Annals of Statistics 1994, 22(1), 118–171.
17. Talská, R.; Menafoglio, A.; Machalová, J.; Hron, K.; Fišerová, E. Compositional regression with functional response. Computational Statistics & Data Analysis 2018, 123(1), 66–85.
18. Xu, X.; Chen, Y.; Zhang, G.; Koch, T. Modeling functional time series and mixed-type predictors with partially functional autoregressions. Journal of Business and Economic Statistics 2022, 1–18.
19. Zhang, C.; Kokoszka, P.; Petersen, A. Wasserstein autoregressive models for density time series. Journal of Time Series Analysis 2022, 43, 30–52.
Figure 1. Densities of global mortality rate (‰) of COVID-19 over an interval of 100 days. (a): Three-dimensional trend of density time series during the whole period; (b) Density curves at three different selected days.
Figure 2. Average estimates of the FAR(1) error process $\varepsilon(u)$ obtained from 200 Monte Carlo runs with sample size $T = 100$ and $n = 100$ observations. (a): true curves; (b): spline estimates.
Figure 3. Average estimates of $g_m(u, x_m)$, $m = 1, 2$. Left panels: true densities; middle panels: initial estimates; right panels: improved estimates. The upper two panels display $g_1(u, x_1)$ from two angles; the lower panels display $g_2(u, x_2)$.
Figure 4. Heat map of the bivariate function $g_1(u, x_1)$ in the model based on the COVID-19 mortality rate (‰) data.
Figure 5. Densities of national quarterly personal income in the USA over an interval of 44 quarters. (a): Three-dimensional trend of the density time series during the whole period; (b): Density curves at three selected quarters.
Figure 6. Heat maps of the bivariate varying-coefficient functions $g_m(u, x_m)$, $m = 0, 1, 2$, in the USA income data model.
Table 1. Average RMSEs of both initial and improved estimates of $g_m(u, x_m)$.

| T | n | $g_1(u, x_1)$ Initial | $g_1(u, x_1)$ Improved | $g_2(u, x_2)$ Initial | $g_2(u, x_2)$ Improved |
|---|---|---|---|---|---|
| 50 | 50 | 0.2247 | 0.1848 | 0.2139 | 0.1785 |
| 50 | 100 | 0.1759 | 0.1325 | 0.1844 | 0.1521 |
| 100 | 50 | 0.1826 | 0.1471 | 0.1732 | 0.1354 |
| 100 | 100 | 0.1431 | 0.1164 | 0.1319 | 0.1057 |
Table 2. Average standard deviation (SD) and bias of both initial and improved estimates of $g_m(u, x_m)$.

| T | n | $g_1$ Init. SD | $g_1$ Init. Bias | $g_1$ Impr. SD | $g_1$ Impr. Bias | $g_2$ Init. SD | $g_2$ Init. Bias | $g_2$ Impr. SD | $g_2$ Impr. Bias |
|---|---|---|---|---|---|---|---|---|---|
| 50 | 50 | 0.205 | 0.147 | 0.168 | 0.104 | 0.219 | 0.137 | 0.183 | 0.117 |
| 50 | 100 | 0.179 | 0.122 | 0.142 | 0.093 | 0.196 | 0.128 | 0.164 | 0.095 |
| 100 | 50 | 0.174 | 0.136 | 0.151 | 0.082 | 0.187 | 0.131 | 0.158 | 0.086 |
| 100 | 100 | 0.133 | 0.099 | 0.112 | 0.057 | 0.153 | 0.111 | 0.129 | 0.061 |
Table 3. Empirical power of the testing algorithm for determining the order of the FAR error process under different significance levels.

| T | n | $p = 0$ vs $p \ge 1$ (0.05) | $p = 0$ vs $p \ge 1$ (0.1) | $p \le 1$ vs $p \ge 2$ (0.05) | $p \le 1$ vs $p \ge 2$ (0.1) | $p \le 2$ vs $p \ge 3$ (0.05) | $p \le 2$ vs $p \ge 3$ (0.1) |
|---|---|---|---|---|---|---|---|
| 50 | 50 | 0.893 | 0.962 | 0.787 | 0.846 | 0.082 | 0.134 |
| 50 | 100 | 0.931 | 0.985 | 0.824 | 0.893 | 0.073 | 0.125 |
| 100 | 50 | 0.942 | 0.972 | 0.821 | 0.881 | 0.071 | 0.121 |
| 100 | 100 | 0.985 | 1.000 | 0.889 | 0.935 | 0.064 | 0.113 |
Table 4. Average RMSEs of both initial and improved estimates of $g_m(u, x_m)$.

| T | n | $g_1(u, x_1)$ Initial | $g_1(u, x_1)$ Improved | $g_2(u, x_2)$ Initial | $g_2(u, x_2)$ Improved |
|---|---|---|---|---|---|
| 50 | 50 | 0.2739 | 0.2438 | 0.2691 | 0.2235 |
| 50 | 100 | 0.2264 | 0.1852 | 0.2157 | 0.1809 |
| 100 | 50 | 0.2136 | 0.1817 | 0.2232 | 0.1761 |
| 100 | 100 | 0.1729 | 0.1263 | 0.1816 | 0.1224 |
Table 5. P-values of the testing algorithm for identifying the order of the functional error process based on the COVID-19 mortality rate data.

| Null hypothesis | $p = 0$ | $p \le 1$ |
|---|---|---|
| Alternative hypothesis | $p \ge 1$ | $p \ge 2$ |
| P-value | 0.000 | 0.194 |
Table 6. P-values of the testing algorithm for identifying the order of the functional error process based on USA income data.

| Null hypothesis | $p = 0$ | $p \le 1$ | $p \le 2$ |
|---|---|---|---|
| Alternative hypothesis | $p \ge 1$ | $p \ge 2$ | $p \ge 3$ |
| P-value | 0.000 | 0.000 | 0.436 |