A Literature Review of Semi-Functional Partial Linear Regression Models

Background: In functional data analysis (FDA), hybrid or mixed data consist of both scalar and functional variables. The semi-functional partial linear regression (SFPLR) model is one of the first semiparametric models for a scalar response with hybrid covariates. Various extensions of this model are explored and summarized here. Methods: The two earliest research articles, "Semi-functional partial linear regression" and "Partial functional linear regression", have more than 300 citations on Google Scholar. After applying the inclusion criterion (publication in ISI-indexed journals) and the exclusion criteria (non-English articles; preprints, slides, and conference papers), 106 articles remained. We follow the PRISMA standard for systematic reviews. Results: The articles are categorized into the following main topics: estimation procedures, confidence regions, time series and panel data, Bayesian methods, spatial models, robust methods, testing, quantile regression, varying coefficient models, variable selection, single-index models, measurement error, multiple functions, missing values, rank methods, and others. Applications use datasets such as the Tecator dataset, air quality, electricity consumption, and neuroimaging, among others. Conclusions: SFPLR is one of the best-known regression methods for hybrid data and has many extensions.


Introduction
Regression models in statistics describe the relationship between a dependent variable (continuous, categorical, count, or time-to-event) and independent variables. Examples include linear regression models, generalized linear models (GLM), and generalized additive models (GAM), which model different types of relationships [1].
Recent theoretical and applied advances have given rise to functional data analysis (FDA), which provides methods to summarize and model high-dimensional variables with an underlying functional structure. When both functional and non-functional (e.g., scalar) variables are of interest, the data are called mixed or hybrid [2,3].
One of the first regression models relating a scalar response to hybrid covariates is the semi-functional partial linear regression model [4]. Since then, many extensions and applications of this model have been published, and this systematic review collects and summarizes the most important ones. It first introduces the model mathematically, then categorizes the extensions and applications, and finally offers some conclusions about computing and the usefulness of the model.

Systematic review
The research articles "Semi-Functional Partial Linear Regression" by German Aneiros-Perez and Philippe Vieu [4] and "Partial functional linear regression" by Hyejin Shin [5] were published in 2006 and 2009 in Statistics and Probability Letters and the Journal of Statistical Planning and Inference, respectively.
The main idea of these models is to use both functional and non-functional (that is, mixed or hybrid) covariates to predict a real-valued scalar response. The SFPLR model [4] treats the functional covariate nonparametrically, with weights from a functional version of the Nadaraya-Watson estimator, and the non-functional covariates parametrically, through a linear term. In the following years, the original authors and other researchers extended this model, and many further extensions have been published recently.
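To make the kernel-weighting idea concrete, here is a minimal Python/NumPy sketch of a partial-residual estimator for an SFPLR-type model written, in our own notation for this example, as $Y = X^{\top}\beta + m(T) + \varepsilon$, with scalar covariates $X$ and a functional covariate $T$. The L2 semi-metric on an equally spaced grid, the Gaussian kernel, the fixed bandwidth `h`, the simulated data, and the function names (`nw_weights`, `fit_sfplr`, `predict_sfplr`) are assumptions of this sketch, not the exact choices of [4].

```python
import numpy as np


def nw_weights(curves_eval, curves_train, dt, h):
    """Functional Nadaraya-Watson weights; each row sums to one.

    Assumed for this sketch: curves sampled on an equally spaced grid with
    spacing dt, an L2 semi-metric, and a Gaussian kernel with bandwidth h.
    """
    diff = curves_eval[:, None, :] - curves_train[None, :, :]
    dist = np.sqrt(np.sum(diff ** 2, axis=2) * dt)   # approximate L2 distance
    k = np.exp(-0.5 * (dist / h) ** 2)
    return k / k.sum(axis=1, keepdims=True)


def fit_sfplr(X, curves, y, dt, h):
    """Estimate beta by least squares on the partial residuals (I - W)X and (I - W)y."""
    W = nw_weights(curves, curves, dt, h)
    beta, *_ = np.linalg.lstsq(X - W @ X, y - W @ y, rcond=None)
    return beta


def predict_sfplr(beta, X_train, curves_train, y_train, X_new, curves_new, dt, h):
    """Prediction X_new beta + m_hat(T_new), where m_hat smooths y - X beta."""
    W = nw_weights(curves_new, curves_train, dt, h)
    m_hat = W @ (y_train - X_train @ beta)
    return X_new @ beta + m_hat


# Simulated example (hypothetical data, not from the reviewed articles).
rng = np.random.default_rng(0)
n, n_grid = 200, 50
t = (np.arange(n_grid) + 0.5) / n_grid
dt = 1.0 / n_grid
ab = rng.uniform(-1.0, 1.0, size=(n, 2))
curves = ab[:, :1] * np.sin(2 * np.pi * t) + ab[:, 1:] * np.cos(2 * np.pi * t)
m_true = np.sum(curves ** 2, axis=1) * dt          # m(T) = integral of T(t)^2
X = rng.normal(size=(n, 2))
beta_true = np.array([1.5, -2.0])
y = X @ beta_true + m_true + 0.1 * rng.normal(size=n)

beta_hat = fit_sfplr(X, curves, y, dt, h=0.1)
y_fit = predict_sfplr(beta_hat, X, curves, y, X, curves, dt, h=0.1)
print(beta_hat)   # expected to be close to [1.5, -2.0]
```

Because the partial residuals remove the part of X and y explained by the curves, β can be estimated by ordinary least squares, and the nonparametric part is then a kernel smooth of Y − Xᵀβ̂. The bandwidth is fixed by hand here; data-driven choices appear among the estimation procedures reviewed later.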
In this study, we searched Google Scholar for these extensions published from 2006 to 2021 and obtained at least 300 results. We then selected articles using the following inclusion criterion: publication in an ISI-indexed journal; and the following exclusion criteria: 1) written in a language other than English, and 2) published as a preprint, thesis, conference proceeding, or slides. Finally, 106 research articles remained (Figure 1) [6].

Models and their extensions

Partial functional linear regression
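In a more general model, "suppose the response $Y$ is a real-valued random variable on a probability space $(\Omega, \mathcal{B}, P)$, the non-functional covariate $Z$ is a $p$-dimensional vector of random variables with zero mean and finite second moments, and $\{X(t): t \in \mathcal{T}\}$ is a zero-mean, second-order stochastic process defined on $(\Omega, \mathcal{B}, P)$ with sample paths in $L^2(\mathcal{T})$, the set of all square-integrable functions on $\mathcal{T}$" [5]. The partial functional linear regression model is then

$$Y = Z^{\top}\beta + \int_{\mathcal{T}} \alpha(t)\,X(t)\,dt + \varepsilon, \qquad (5)$$

where $\beta \in \mathbb{R}^p$ is the vector of regression coefficients, $\alpha \in L^2(\mathcal{T})$ is the slope function, and $\varepsilon$ is a zero-mean random error independent of $(Z, X)$.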
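One common way to fit a partial functional linear model of this kind is to truncate the functional term to a few functional principal components (FPCA) and then regress $Y$ on $Z$ and the component scores. The NumPy sketch below does this under stated assumptions: curves observed on an equally spaced grid, a fixed number of components, an intercept added to absorb non-zero sample means, simulated data, and hypothetical function names `fpca` and `fit_pflr`. It illustrates the FPCA idea rather than reproducing the estimator of [5] exactly.

```python
import numpy as np


def fpca(curves, dt, n_components):
    """Functional PCA for curves on an equally spaced grid with spacing dt."""
    mean = curves.mean(axis=0)
    centered = curves - mean
    cov = centered.T @ centered / curves.shape[0]     # discretized C(s, t)
    evals, evecs = np.linalg.eigh(cov * dt)           # eigenproblem of the covariance operator
    order = np.argsort(evals)[::-1][:n_components]    # largest eigenvalues first
    phi = evecs[:, order] / np.sqrt(dt)               # eigenfunctions with unit L2 norm
    scores = centered @ phi * dt                      # xi_ik = integral of (X_i - mean) phi_k
    return mean, phi, scores


def fit_pflr(Z, curves, y, dt, n_components):
    """Least squares of y on [1, Z, FPC scores]; returns intercept, beta, alpha(t)."""
    _, phi, scores = fpca(curves, dt, n_components)
    design = np.column_stack([np.ones(len(y)), Z, scores])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    p = Z.shape[1]
    beta = coef[1:1 + p]
    alpha = phi @ coef[1 + p:]                        # slope function on the grid
    return coef[0], beta, alpha


# Simulated example (hypothetical data): X(t) = sum_k xi_k sqrt(2) sin(k pi t).
rng = np.random.default_rng(1)
n, n_grid = 300, 100
t = (np.arange(n_grid) + 0.5) / n_grid
dt = 1.0 / n_grid
basis = np.array([np.sqrt(2) * np.sin(k * np.pi * t) for k in (1, 2, 3)])
xi = rng.normal(size=(n, 3)) * np.sqrt([4.0, 1.0, 0.25])
curves = xi @ basis
Z = rng.normal(size=(n, 2))
beta_true = np.array([1.0, -1.0])
alpha_true = 2.0 * basis[0]
y = Z @ beta_true + curves @ alpha_true * dt + 0.1 * rng.normal(size=n)

intercept, beta_hat, alpha_hat = fit_pflr(Z, curves, y, dt, n_components=3)
```

The number of components plays the role of a smoothing parameter: too few components bias the slope function, too many inflate its variance. In practice it is usually chosen by cross-validation or an explained-variance threshold.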

Other extensions
There are many other extensions of models (1) and (5) ([4] and [5]), which we group into the following categories.
• Estimation procedures: several estimation procedures have been studied for the SFPLR model, such as a fully automatic, data-driven procedure that selects the bandwidth of the nonparametric component by cross-validation [7]; the asymptotic normality of the linear part [8]; settings in which the number of observations per subject is completely flexible, together with the convergence rate of the nonparametric part [9]; spline estimators of the nonparametric part and their convergence rates [10]; two estimation procedures, 1) functional principal components regression (FPCR) and 2) functional ridge regression (FRR) based on Tikhonov regularization [11][12][13]; new estimators of the parametric component called semiparametric least squares estimators (SLSE) [14]; approximation of the nonparametric component by a B-spline function [15] or a polynomial spline [16], with the slope function estimated on the functional principal component basis [15,16]; k-nearest-neighbours (kNN) estimators, whose local adaptivity can make them perform better in practice than kernel methods, together with some of their computational properties [17][18][19]; the Functional Semiparametric Additive Model via COmponent Selection and Smoothing Operator (FSAM-COSSO) in sparse settings [20]; sufficient dimension reduction methods such as sliced inverse regression (SIR) and sliced average variance estimation (SAVE) [21]; estimation in reproducing kernel Hilbert spaces (RKHS) [22]; frequentist and optimal model averaging [23]; latent group structures identified with K-means clustering [24]; a joint asymptotic framework called the joint Bahadur representation [25]; empirical likelihood estimation with high-dimensional non-functional covariates [26]; and sparse, penalized least-squares estimators [27]. Software for this analysis is also available [28].
• Confidence regions: several papers include sections on confidence regions, which we do not repeat here. We highlight the empirical likelihood ratio with a plug-in approach and its bias-corrected version [29] and confidence bands for partial functional linear regression [30].
• Time series and panel data: extensions related to time series and forecasting include semi-functional partial linear time series modeling for prediction [31], models with autoregressive errors [32,33], models with time-varying parameters for latent parameter regimes [34], and regularized forecasting via smooth-rough partitioning of the regression coefficients [35].
• Bayesian: Bayesian estimation appears in several papers, but here we mention only Bayesian bandwidth estimation and semi-metric selection for a functional partial linear model with unknown error density [36,37].
• Spatial: spatial variability is considered in many articles, such as the partial functional linear spatial autoregressive model with spatially dependent responses [38]; a two-stage estimator based on quasi-maximum likelihood estimation (QMLE) and local linear regression [39]; the asymptotic normality of the parametric component and the rate of convergence in probability of the nonparametric component [40]; B-spline approximation of the slope function with a residual-based approach to pointwise confidence intervals [41]; and a robust spatial autoregressive model with t-distributed errors fitted by an expectation-maximization algorithm [42].
• Robust: outliers in the data or departures from distributional assumptions motivate robust methods, such as the sieve M-estimator for the semi-functional linear model [43]; polynomial-spline approximation of the slope parameter with resistance to heavy-tailed errors or outliers in the response [44]; estimators such as M-estimators with the bisquare function, GM-estimators with the Huber function, and LMS and LTS estimators [45]; estimation based on the exponential squared loss and FPCA [46]; estimation based on the class of scale mixtures of normal (SMN) distributions for measurement errors in a Bayesian framework with an MCMC algorithm [47]; robust MM-estimators with B-spline approximation [48]; modal regression [49]; and a modified Huber function with a tail function and a data-driven procedure for selecting the tuning parameters [50].
• Testing: several hypotheses and test statistics have been developed, such as tests of the linear component [51,52], including with B-splines [53]; tests of the functional covariates [54]; four tests (Wald, score, likelihood ratio, and F) for densely and sparsely observed single and multiple functional covariates [55]; goodness-of-fit tests with wild bootstrap resampling, false discovery rate control, and independence tests based on generalized distance covariance or a new metric, the functional martingale difference divergence (FMDD) [56][57][58]; and a serial correlation test [59].
• Quantile regression: some extensions adopt quantile regression, such as the functional partially linear quantile regression model (FPLQRM), whose linear covariates may be categorical [60]; estimation of the slope function between a dependent variable and both vector and functional random variables with FPCA [61], with piecewise polynomials [62], and with a kNN quantile method [63]; functional composite quantile regression (CQR) with the simple partial quantile regression (SIMPQR) algorithm and a partial quantile regression (PQR) basis [64]; composite quantile estimation with errors from a strictly stationary process [65] and with polynomial splines [66]; the Hill estimator for extreme quantile estimation with heavy-tailed distributions [67]; a quantile rank score test for the parametric component of the model [68]; the varying-coefficient partially functional linear quantile regression model [69] with quantile estimation [70]; and responses missing at random (MAR) [71].
• Varying coefficient models: several papers extend varying coefficient models; we select only the following: partially varying coefficient models stratified by a functional covariate [72], the varying coefficient partially functional linear regression model (VCPFLM) [73], the partially functional linear varying coefficient model (PFLVCM) with a hypothesis test and bootstrap [74], and robust rank-based estimation [75].
• Variable selection: these papers address variable selection: nonconcave penalized least squares in a high-dimensional partial linear regression model and the penalized composite quantile regression method [76,77], simultaneous consideration of multiple functional and scalar predictors to identify the important features [78], and estimation and variable selection based on penalized regression estimators [79].
• Single-index models: SFPLR-type single-index models include the functional partial linear single-index model (FPLSIM) [80], its B-spline approximation [81], profile least-squares estimation (PLSE) of the slope [82], and partially linear generalized single-index models for functional data (PLGSIMF) [83]. A systematic review of the semiparametric single functional index regression (SFIR) model is also available [84].
• Measurement error: some extensions allow variables measured with error, such as the model with errors in the response and FPCA estimation [85], non-functional covariates measured with error, with a test and corrected profile least-squares estimation [86,87], and both scalar and functional covariates measured with additive error [88]. … for models with missing-at-random (MAR) responses, with their confidence intervals [91][92][93].
• Rank method: nonparametric estimation methods have also been extended, such as rank estimation for partial functional linear regression models with FPCA [94] and a hypothesis test for the parametric component based on the rank score function [95].
• Others: other important papers include the functional partial linear model, which combines the parametric and nonparametric approaches with functional regression [96]; error variance estimation and confidence region construction [97]; naïve and wild bootstrap procedures for kernel-based estimators [98]; and the partial functional linear model with skew-normal errors and a homogeneity test [99]. Generalized partial functional linear additive models (GPFLAM) are approximated by polynomial splines and FPCA, and the asymptotic normality of the estimator is established [100,101]. A time-to-event response subject to random right censoring is modeled through a synthetic response obtained with one of three types of transformation, (L), (KSV), and a more general class (FG), and is then fitted with a functional linear regression model [102]. The two-sample functional linear model with functional responses generalizes the partial functional linear model and has been studied recently [103]. An example of partial functional linear regression for reinforced risk prediction with electronic health records is examined by simulation [104].
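Several of the estimation procedures listed above, notably the fully automatic procedure of [7], choose the smoothing parameter of the nonparametric component from the data. The sketch below shows one simple leave-one-out cross-validation criterion for the bandwidth of a kernel SFPLR fit. The Gaussian kernel, the L2 semi-metric, the bandwidth grid, the simulated data, the function names, and the simplification of estimating β once on all the data are assumptions of this sketch; [7] may use a different criterion.

```python
import numpy as np


def kernel_matrix(curves, dt, h):
    """Gaussian kernel of approximate L2 distances between discretized curves."""
    diff = curves[:, None, :] - curves[None, :, :]
    dist = np.sqrt(np.sum(diff ** 2, axis=2) * dt)
    return np.exp(-0.5 * (dist / h) ** 2)


def loo_cv_score(X, curves, y, dt, h):
    """Leave-one-out CV error of the nonparametric part for bandwidth h.

    Simplification assumed here: beta is estimated once on all the data, and
    only the kernel smoother of the partial residuals is leave-one-out.
    """
    K = kernel_matrix(curves, dt, h)
    W = K / K.sum(axis=1, keepdims=True)
    beta, *_ = np.linalg.lstsq(X - W @ X, y - W @ y, rcond=None)
    K_loo = K.copy()
    np.fill_diagonal(K_loo, 0.0)                      # exclude observation i from its own fit
    denom = np.maximum(K_loo.sum(axis=1, keepdims=True), 1e-300)  # guard against underflow
    resid = y - X @ beta
    m_loo = (K_loo / denom) @ resid
    return np.mean((resid - m_loo) ** 2)


def select_bandwidth(X, curves, y, dt, grid):
    """Return the bandwidth in grid with the smallest CV score, plus all scores."""
    scores = [loo_cv_score(X, curves, y, dt, h) for h in grid]
    return grid[int(np.argmin(scores))], scores


# Hypothetical simulated data: two scalar covariates and sin/cos curves.
rng = np.random.default_rng(0)
n, n_grid = 200, 50
t = (np.arange(n_grid) + 0.5) / n_grid
dt = 1.0 / n_grid
ab = rng.uniform(-1.0, 1.0, size=(n, 2))
curves = ab[:, :1] * np.sin(2 * np.pi * t) + ab[:, 1:] * np.cos(2 * np.pi * t)
X = rng.normal(size=(n, 2))
y = X @ np.array([1.5, -2.0]) + np.sum(curves ** 2, axis=1) * dt + 0.1 * rng.normal(size=n)

grid = [0.02, 0.05, 0.1, 0.2, 0.5, 1.0, 2.0]
h_best, cv_scores = select_bandwidth(X, curves, y, dt, grid)
```

Very small bandwidths make the leave-one-out smoother rely on one or two neighbouring curves, while very large ones flatten the nonparametric component toward a constant; the cross-validation score trades off these two effects.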

Discussion
Hybrid data [2] are widespread in FDA problems, and three main scalar-on-function regression models have been introduced [4,5,96]; their extensions are the main focus of this article. However, the field is not limited to these methods, and we discuss some other frameworks and models with different structures.

Other Models
Among the various models, we select the following. Dimension reduction methods: principal component analysis for hybrid or mixed data was introduced by [2], with examples from Canadian weather stations in which vectors of averages are considered alongside the temperature curves. Other models include three-dimensional hybrid PCA with an application to electroencephalography [110] and PCA for hybrid functional and vector data (HFV-PCA) [111].
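Hybrid PCA can be sketched by giving each observation a combined inner product over its functional and vector parts. The example below is a minimal sketch, assuming an inner product of the form $\int X_1(t)X_2(t)\,dt + w^2\langle z_1, z_2\rangle$ with standardized scalars and a user-chosen weight $w$ (`scalar_weight`). The weighting, the grid, the simulated data, and the function name `hybrid_pca` are illustrative assumptions rather than the exact construction in [2].

```python
import numpy as np


def hybrid_pca(curves, scalars, dt, scalar_weight=1.0, n_components=2):
    """PCA of hybrid observations (curve X_i, scalar vector z_i).

    Assumed hybrid inner product: integral X1(t) X2(t) dt + w^2 <z1, z2>,
    with standardized scalars and w = scalar_weight. Each observation is
    mapped to [sqrt(dt) * X_i(grid), w * z_i], so an ordinary SVD does the rest.
    """
    Xc = curves - curves.mean(axis=0)
    Zc = scalars - scalars.mean(axis=0)
    Zc = Zc / Zc.std(axis=0)                          # common scale for the scalars
    H = np.hstack([np.sqrt(dt) * Xc, scalar_weight * Zc])
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    n_grid = curves.shape[1]
    func_part = Vt[:n_components, :n_grid] / np.sqrt(dt)   # functional part on the grid
    vec_part = Vt[:n_components, n_grid:] / scalar_weight  # scalar (vector) part
    scores = U[:, :n_components] * s[:n_components]
    explained = s[:n_components] ** 2 / np.sum(s ** 2)
    return func_part, vec_part, scores, explained


# Hypothetical example: 100 curves on 50 grid points plus 2 scalar summaries.
rng = np.random.default_rng(2)
n, n_grid = 100, 50
t = (np.arange(n_grid) + 0.5) / n_grid
dt = 1.0 / n_grid
amp = rng.normal(size=n)
curves = amp[:, None] * np.sin(np.pi * t) + 0.1 * rng.normal(size=(n, n_grid))
scalars = np.column_stack([amp + 0.1 * rng.normal(size=n), rng.normal(size=n)])
w = 0.5
func_part, vec_part, scores, explained = hybrid_pca(curves, scalars, dt, scalar_weight=w)
```

The weight w controls how strongly the scalar part shapes the components. As w shrinks toward zero the components approach ordinary functional PCA of the curves; because the code divides by w, use a small positive value rather than exactly zero.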
Regression models: scalar-on-function regression with FACE, penalized functional regression, and scalar-on-function additive models are examples of functional regression [1,112]. Other examples are the general additive regression model with variable selection for a scalar response and mixed (scalar, functional, directional,…) covariates [113], regression of a functional response on both functional and scalar covariates with signal compression [114], and the covariate-adjusted generalized functional linear model [115].

Final Remarks
Extensions of semi-functional partial linear regression models have been published in more than 40 ISI-indexed journals, mainly statistical journals. The four journals with the most articles are "Communications in Statistics-Theory and Methods", "Journal of Statistical Planning and Inference", "Journal of Multivariate Analysis", and "Metrika", with 9, 8, 7, and 6 published articles, respectively. The number of published articles increased from 2006 to 2021, reaching about 26 in 2020 and 20 in 2021 as of the end of September.

Conclusions
The semi-functional partial linear regression model has been extended to many methods and settings, for example time series, quantile regression, varying coefficient models, statistical testing, robust estimation, Bayesian estimation, multiple functional covariates, variable selection, confidence bands and prediction intervals, missing data, and errors in variables, among others, with different tests for both the parametric and nonparametric components of the model.
There are also diverse applications, such as spectroscopy, air pollution and related topics, child growth studies, neuroimaging, and electricity demand and prices. Most of these articles were published in statistical journals, but some appeared in neuroscience, energy, and mathematics journals. Other methods for mixed and hybrid data also exist, some of which we discussed above.
With the growth of big data and the availability of many data types, such as scalars, functions, time series, spatial points, missing data, directional data, survival data, and images, in fields such as genetics, pharmaceuticals, neuroimaging, movement, and mobile health (mHealth) monitoring, models that exploit as much of this information as possible are vital. Among them, semi-functional partial linear regression models have been developed and used widely.
Supplementary Materials: Some visualizations are available in a Shiny web application (https://mohammadfayaz.shinyapps.io/IWFOS_2021/). The article was checked for plagiarism with iThenticate (www.ithenticate.com) and for correctness, clarity, and writing with Grammarly Premium (www.grammarly.com). This paper was prepared for the special issue on functional data analysis (FDA) of Stats.