Detection of Influential Observations in Spatial Regression Model Based on Outliers and Bad Leverage Classification

Influential observations (IOs), which are outliers in the x direction, the y direction or both, remain a problem in classical regression model fitting. Spatial regression models have a peculiar kind of outlier because outliers there are local in nature. Spatial regression models are also not free from the effect of influential observations. Researchers have adapted some classical regression techniques to spatial models and obtained satisfactory results. However, masking and/or swamping remains a stumbling block for such methods. In this article, we obtain a measure of spatial Studentized prediction residuals that incorporates spatial information on the dependent variable and the residuals. We propose a robust spatial diagnostic plot to classify observations into regular observations, vertical outliers, and good and bad leverage points, using a classification based on spatial Studentized prediction residuals and spatial diagnostic potentials, which we refer to as ISRs−P_osi and ESRs−P_osi. Observations that fall into the vertical outlier and bad leverage point categories are referred to as IOs. Representations of some classical regression diagnostic measures in general spatial models are presented. The commonly used diagnostic measure in spatial diagnostics, the Cook's distance, is compared to the overall potential influence H²_si (using robust and non-robust measures) and our proposed ISRs−P_osi and ESRs−P_osi plots. Results of our simulation study and applications to real data showed that the Cook's distance, the non-robust H²_si1 and the robust H²_si2 were not very successful in detecting IOs. The H²_si1 suffered from the masking effect, and the robust H²_si2 suffered from swamping in general spatial models. Interestingly, the results showed that the proposed ESRs−P_osi plot, followed by the ISRs−P_osi plot, was very successful in classifying observations into the correct groups, hence correctly detecting the real IOs.


Introduction
Belsley et al. [1] defined an influential observation (IO) as one which, either individually or together with several other observations, has a demonstrably large impact on the calculated values of various estimates. An influential observation can be an outlier in the X-space (leverage point) or an outlier in the Y-space (vertical outlier). Leverage points can be classified into good (GLPs) and bad leverage points (BLPs). Unlike BLPs, GLPs follow the pattern of the majority of the data; hence, they are not considered IOs, as they have little or no influence on the calculated values of the various estimates [2,3]. In this connection, Rashid et al. [2] stated that IOs can be vertical outliers (VO) or BLPs. Thus, it is crucial to identify IOs, as they are responsible for misleading conclusions about fitted regression models and various other estimates. Once the IOs are identified, there is a need to study their impact on the model and subsequent analyses. There is a handful of studies on the diagnosis of IOs in linear regression; some examples are [1,[3][4][5][6][7][8][9][10][11][12]. Other articles in the literature deal with regressions with correlated residuals, e.g., [13][14][15][16][17]. However, only a few articles deal with the detection of IOs in spatial regression models; some examples include [18][19][20][21][22]. Some robust estimation methods in spatial regression are [23][24][25]. Christensen et al. [18] and Haining [19] adapted one of the diagnostic measures in [3] to detect influential observations in the spatial error autoregression model. They achieved this by defining correlated errors through the spatial weight matrix and the coefficient of spatial autocorrelation in the error term. They also presented the spatial Studentized prediction residuals and the spatial leverage terms that contain spatial information in the error terms.
The presence of high or low attribute value in the neighbourhood of a spatial location may result in the inability to detect the true spatial outlier, or the false identification of a good observation as an outlier [26]. Hadi [27] has also noted that spatial outlier detection methods inherit the problem of masking and swamping. Masking occurs when outlying observations are incorrectly declared as inliers. Swamping on the other hand, occurs when clean observations are incorrectly classified as outliers [28]. Aggarwal [29] observed that spatial outlier breaks the spatial autocorrelation and continuity of spatial locations. Spatial autocorrelation is a systematic pattern in attribute values that are recorded in several locations on a map. Attribute values in one location that are associated with values at neighbouring locations indicate the presence of autocorrelation. Positive autocorrelation indicates similar values that are clustered together. Negative autocorrelation indicates low attribute values in the neighbourhood of high attribute values and vice-versa [30].
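As a sketch of how the spatial autocorrelation described above is quantified, Moran's I can be computed directly from an attribute vector and a spatial weight matrix. The four-location chain below is a hypothetical example for illustration, not data from the paper:

```python
import numpy as np

def morans_i(y, W):
    """Moran's I for attribute vector y under a spatial weight matrix W.

    Positive values indicate that similar attribute values cluster together;
    negative values indicate dissimilar values in neighbouring locations.
    """
    y = np.asarray(y, dtype=float)
    z = y - y.mean()                  # deviations from the mean
    num = z @ W @ z                   # spatial cross-products of deviations
    den = z @ z
    n, s0 = len(y), W.sum()
    return (n / s0) * (num / den)

# Hypothetical 4-location chain 1-2-3-4, row-standardized contiguity weights
W = np.array([[0, 1, 0, 0],
              [.5, 0, .5, 0],
              [0, .5, 0, .5],
              [0, 0, 1, 0]], dtype=float)
clustered = morans_i([1.0, 1.2, 5.0, 5.1], W)    # similar values adjacent
alternating = morans_i([1.0, 5.0, 1.0, 5.0], W)  # high next to low
print(clustered > 0, alternating < 0)
```

The first pattern (low values next to low, high next to high) yields a positive index, while the strictly alternating pattern yields a negative one, matching the definitions of positive and negative autocorrelation above.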
Robust estimation methods mostly focus on estimators that are not influenced much by the effects of outliers. Anselin [23] extended bootstrap estimation to mixed regressive spatial autoregressive models, where pseudo error terms are generated by sampling from the vector of error terms. The spatial structure of the data is maintained through the generation of the error terms. Politis et al. [31] and Heagerty and Lumley [32] also adopted the bootstrap method on blocks of contiguous locations to generate replicates of the estimates of the asymptotic standard error of statistics. Cerioli and Riani [24] argued that a robust estimator of the spatial autocorrelation parameters based on the full dataset did not exist. They proposed a forward search algorithm based on blocks of contiguous spatial locations (BFS). The blocks in the BFS algorithm are drawn in such a way that they retain the spatial dependence structure of the original data. Yildirim [25] proposed a robust estimation method for the log-likelihood with an influence function in the spatial error model; the parameters are estimated iteratively using a scoring algorithm. Though these approaches succeed in obtaining robust estimates, identifying spatial outliers, which is vital in spatial statistics [26], was not achieved. Popular graphical techniques to detect spatial outliers are the scatterplot [33], Moran's scatterplot [30] and the pockets of nonstationarity [34]. Besides being prone to the problem of masking and swamping [26], they focus mainly on spatial outliers in the Y-space only.
Diagnostic works on models that have spatial autocorrelation in both the dependent variable and the residual terms are missing in the literature. The problem of masking and swamping is prevalent in spatial regression model diagnostics, which may be due to the presence of vertical outliers as well as leverage points, as in the case of linear regression [27]. This motivates us to represent the spatial Studentized prediction residuals and spatial leverage values in the general spatial model, and to adapt and extend some robust diagnostic measures for the detection of outliers and IOs in linear regression, such as Hadi's potential (p_oii), the Cook's distance (CD_i) [3], the overall potential influence (H²_i) [10], and the external (ESRs) and internal (ISRs) Studentized residuals [1,9,10], to spatial regression models in order to minimize the problem of masking and swamping in spatial models.
In this article, we propose a robust spatial diagnostic plot and adapt some diagnostic measures from the linear regression model. Representations of the diagnostic measures in the spatial regression model are obtained, with special emphasis on the general spatial regression model (GSM) that performs autoregression on both the dependent variable and the error terms. The main objective of this study is to propose a robust spatial diagnostic plot. Other objectives are: (1) to represent the leverage values of the hat matrix of the linear regression in the GSM model; (2) to extend the ISR of the linear regression to the GSM model; (3) to extend the ESR of the linear regression to the GSM model; (4) to extend the Cook's distance and the overall potential influence of the linear regression to the GSM model; (5) to develop a method of identification of the influential observations of the GSM model by proposing a procedure for classifying observations into regular observations, vertical outliers, good and bad leverage points, and hence IOs; (6) to evaluate the performance of the proposed methods by using simulation studies; (7) to apply the proposed methods to the gasoline price data for retail sites in Sheffield, UK, the COVID-19 data in Georgia, USA, and the life expectancy data from USA counties. The significance of this study is that it can contribute to the development of methods for the identification of influential observations in spatial regression models.

Identification of Influential Observations in a Linear Regression Model
Consider a k-variable regression model:

y = Xβ + ε,   (1)

where y is an n × 1 vector of observations of the dependent variable, X is an n × k matrix of independent variables, β is a k × 1 vector of unknown regression parameters, and ε is an n × 1 vector of random errors with identical normal distributions, that is, ε ∼ NID(0, σ²I). The ordinary least squares (OLS) estimates in Equation (1) are given by:

β̂ = (X^T X)^{-1} X^T y.

The vector of predicted values can be written as:

ŷ = Xβ̂ = Py,

where P = X(X^T X)^{-1} X^T is the hat/leverage matrix. The diagonal elements of the leverage matrix are called the hat values, denoted p_ii, and given by:

p_ii = x_i^T (X^T X)^{-1} x_i,

where x_i^T is the i-th row of X. The hat matrix is often used as a diagnostic to identify leverage points. Leverage is the amount of influence exerted by the observed response y_i on the predicted value ŷ_i. As a result, a large leverage value indicates that the observed response has a large effect on the predicted response.
Hoaglin and Welsch [3] suggested that an observation for which p_ii exceeds 2k/n, that is, twice the average value of p_ii, is considered a leverage point, while Velleman and Welsch suggested 3k/n as a cut-off point for leverage points. Huber [7] suggested that the ranges p_ii ≤ 0.2, 0.2 < p_ii ≤ 0.5 and p_ii > 0.5 are safe, risky and to be avoided, respectively, for leverage values.
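The hat values and the 2k/n and 3k/n cut-offs can be computed directly. The following is an illustration on simulated data (not from the paper), with one point deliberately placed far out in the X-space:

```python
import numpy as np

def hat_values(X):
    """Diagonal of the hat matrix P = X (X^T X)^{-1} X^T."""
    P = X @ np.linalg.solve(X.T @ X, X.T)
    return np.diag(P)

rng = np.random.default_rng(0)
n, k = 50, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
X[0, 1] = 10.0                 # one point far out in the X-space
p = hat_values(X)
flag_2k = p > 2 * k / n        # Hoaglin-Welsch rule
flag_3k = p > 3 * k / n        # Velleman-Welsch rule
print(flag_2k[0], flag_3k[0])  # the planted leverage point is flagged
```

Note that the hat values always sum to k (the trace of a rank-k projection), so the average hat value is k/n, which is where the 2k/n and 3k/n rules come from.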
Unfortunately, the hat matrix suffers from the masking effect. As a result, p_ii often fails to detect high leverage points. Hadi [10] suggested a single-case-deleted measure called potentials or Hadi's potentials. The diagonal element of the potential, denoted p_oii, is given by:

p_oii = x_i^T (X_(i)^T X_(i))^{-1} x_i,

where X_(i) is the matrix X with the i-th row deleted. We can rewrite p_oii as a function of p_ii as:

p_oii = p_ii / (1 − p_ii).

The vector of residuals, r, can be written as:

r = y − ŷ = (I − P)y.

The Studentized residuals (internally Studentized residuals), denoted ISRs, and the R-Student residuals (externally Studentized residuals), denoted ESRs, are widely used measures for the identification of outliers (see [7]). The ISR, denoted t_i, is defined as:

t_i = r_i / (σ̂ √(1 − p_ii)),

where σ̂ is the standard deviation of the residuals, and r_i and p_ii are the i-th residual and the i-th diagonal element of the matrix P, respectively (see [9] for details). Meanwhile, Chatterjee and Hadi [9] defined the ESR, denoted t*_i, as:

t*_i = r_i / (σ̂_(i) √(1 − p_ii)),

where σ̂_(i) is the residual mean square excluding the i-th case. The ESR follows a Student's t-distribution with (n − k − 1) degrees of freedom [9]. One of the most employed measures of influence in linear regression is the Cook's distance [3]. It measures the influence on the regression coefficient estimates or the predicted values. The Cook's distance is given by:

CD_i = (β̂ − β̂_(−i))^T X^T X (β̂ − β̂_(−i)) / (k σ̂²),

where β̂ is the vector of estimates of β using the full data, β̂_(−i) is the vector of estimates of β with the i-th observation of y_i and x_i omitted, k is the number of parameters and σ̂² is the estimate of the variance. Any i-th observation is declared an influential observation (IO) if CD_i > F[0.5; k, (n − k)]. Meloun [12] noted that any observation for which CD_i > 1 is considered an influential observation.
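The ISR, ESR and Cook's distance can be sketched in a few lines. This is an illustration of the textbook formulas above (`regression_diagnostics` is our own helper name, not from the paper); the leave-one-out variance uses the standard identity rather than refitting n regressions:

```python
import numpy as np

def regression_diagnostics(X, y):
    """Internally/externally Studentized residuals and Cook's distance for OLS."""
    n, k = X.shape
    P = X @ np.linalg.solve(X.T @ X, X.T)        # hat matrix
    p = np.diag(P)
    r = y - P @ y                                # residuals r = (I - P) y
    s2 = (r @ r) / (n - k)                       # sigma^2 hat
    isr = r / np.sqrt(s2 * (1 - p))              # t_i  (ISR)
    # leave-one-out variance via the standard identity (no refitting needed)
    s2_i = ((n - k) * s2 - r**2 / (1 - p)) / (n - k - 1)
    esr = r / np.sqrt(s2_i * (1 - p))            # t*_i (ESR)
    cook = isr**2 / k * p / (1 - p)              # CD_i in its simplified form
    return isr, esr, cook

# Example: a planted vertical outlier shows up in all three measures
rng = np.random.default_rng(1)
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + 0.5 * rng.normal(size=n)
y[5] += 10.0
isr, esr, cook = regression_diagnostics(X, y)
```

For the contaminated case, |ESR| exceeds |ISR|, since deleting the outlier shrinks the estimated residual variance; this is why the ESR is generally preferred for outlier detection.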
The Cook's distance can also be written as [8,9]:

CD_i = (ŷ − ŷ_(i))^T (ŷ − ŷ_(i)) / (k σ̂²).

Computing CD_i does not require fitting a regression equation for each deleted observation and the full model; instead, Equation (3) can be further simplified as [3,8,9]:

CD_i = (t_i² / k)(p_ii / q_ii),

where t_i is the ISR and p_ii/q_ii (with q_ii = 1 − p_ii) is referred to as the potential [7][8][9]. Interestingly, the Cook's distance is thus a measure of influence that is multiplicative in the residual and potential functions. Hadi [10] demonstrated the drawback of measures that are multiplicative in these functions, such as the Cook's distance [3], the Andrews–Pregibon statistic [5], the Cook and Weisberg statistic [8], etc. (see [10] for details), and proposed a measure that is additive in the two functions. Though both the multiplicative and additive methods are functions of residuals and leverage values, the former diminishes towards zero when either of the two functions (or both) is small, while the latter is large whenever either of the two functions is large. With k the number of parameters in the model, I = {i₁, i₂, · · · , i_m} a set of observation indices of length m, and P_I the leverage indexed by I, Hadi's overall potential influence is defined for the subset I; for m = 1 and I = {i}, Equation (7) simplifies to:

H_i² = p_ii/(1 − p_ii) + (k/(1 − p_ii)) · d_i²/(1 − d_i²),

where d_i² = e_i²/(e^T e) is the square of the i-th normalized residual. Hadi [10] suggested a cut-off point for Hadi's potential p_oii and H_i², denoted l₁, given as follows:

l₁ = mean(p_oi) + c · SD(p_oi),   c = 2, 3,

where p_oi is the vector of Hadi's potentials. Since both the mean and the standard deviation are easily affected by outliers, he suggested a confidence-bound type of cut-off point obtained by replacing the mean and the standard deviation by robust estimators, namely the median and the normalized median absolute deviation, respectively. The resulting cut-off point is denoted l₂:

l₂ = med(p_oi) + c · nMAD(p_oi).
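Hadi's potential, the additive measure H_i² and the robust cut-off l₂ can be sketched as follows. This is an illustration under our reading of the formulas above (`hadi_measures` is our own helper; the 1.4826 factor is the usual normalization of the MAD for Gaussian data):

```python
import numpy as np

def hadi_measures(X, y, c=3.0):
    """Hadi's potential p_oii = p_ii/(1-p_ii), additive influence H_i^2,
    and the robust cut-off l2 = median + c * nMAD of the potentials."""
    n, k = X.shape
    P = X @ np.linalg.solve(X.T @ X, X.T)
    p = np.diag(P)
    e = y - P @ y
    d2 = e**2 / (e @ e)                          # squared normalized residual
    pot = p / (1 - p)                            # Hadi's potential
    H2 = pot + (k / (1 - p)) * d2 / (1 - d2)     # additive influence measure
    nmad = 1.4826 * np.median(np.abs(pot - np.median(pot)))
    l2 = np.median(pot) + c * nmad               # robust confidence-bound cut-off
    return pot, H2, l2

# Example: a planted outlier dominates H_i^2 through its residual term
rng = np.random.default_rng(2)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([0.0, 1.0]) + 0.3 * rng.normal(size=n)
y[3] += 8.0
pot, H2, l2 = hadi_measures(X, y)
```

Because H_i² is additive, the contaminated case is flagged even when its leverage is unremarkable, which is exactly the failure mode of the multiplicative Cook's distance.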

Influential Observations in Spatial Regression Models
The general spatial autoregressive model (GSM) [21,35,36] includes both a spatial lag term and a spatially correlated error structure. The data generating process (DGP) of the general spatial model is given by:

y = ρW₁y + Xβ + ξ,
ξ = λW₂ξ + ε,   ε ∼ N(0, σ²Iₙ),   (9)

where y is an n × 1 vector of dependent variables, X is an n × k matrix of explanatory variables, W₁ and W₂ are n × n spatial weight matrices, Iₙ is the n × n identity matrix, ξ is the spatially correlated error term, and ε is the random error term. The parameter ρ is the coefficient of the spatially lagged dependent variable W₁y, and λ is the coefficient of the spatially correlated errors.
The general spatial autoregressive model in Equation (9) can be rewritten as:

Ay = Xβ + ξ,   Bξ = ε,   (10)

where A = Iₙ − ρW₁ and B = Iₙ − λW₂. Estimation of the parameters is achieved using the maximum likelihood estimation method. The log-likelihood function (L) is given by:

L = −(n/2) ln(2πσ²) + ln|A| + ln|B| − (1/(2σ²)) (Ay − Xβ)^T B^T B (Ay − Xβ).

Let ρ̂, λ̂, σ̂², β̂ be the maximum likelihood estimates (MLEs) of ρ, λ, σ², β, respectively. The MLEs are obtained iteratively using numerical methods in the maximum likelihood estimation. Anselin [35] and LeSage [36] discussed the maximum likelihood estimation procedure for the parameters.
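The GSM log-likelihood can be evaluated directly from the expression above. The sketch below uses dense matrices and `slogdet`, which is only practical for moderate n; sparse methods are used in practice for large lattices:

```python
import numpy as np

def gsm_loglik(params, y, X, W1, W2):
    """GSM log-likelihood; params = (rho, lambda, sigma2, beta...)."""
    rho, lam, sigma2 = params[:3]
    beta = np.asarray(params[3:])
    n = len(y)
    A = np.eye(n) - rho * W1            # A = I - rho W1
    B = np.eye(n) - lam * W2            # B = I - lambda W2
    u = B @ (A @ y - X @ beta)          # whitened residual epsilon = B xi
    _, logdet_a = np.linalg.slogdet(A)  # Jacobian terms ln|A| + ln|B|
    _, logdet_b = np.linalg.slogdet(B)
    return (-n / 2 * np.log(2 * np.pi * sigma2)
            + logdet_a + logdet_b
            - (u @ u) / (2 * sigma2))
```

A numerical optimizer (the iterative scoring mentioned above) would maximize this over (ρ, λ, σ², β); for fixed ρ and λ, the maximizing σ² is the mean squared whitened residual, which is the basis of the concentrated likelihood used by Anselin [35] and LeSage [36].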

Leverage in Spatial Regression Model
Denote the vector of parameters in Equation (11) as β_ay. The estimate of β_ay, β̂_ay, is given by:

β̂_ay = (X^T V^{-1} X)^{-1} X^T V^{-1} Ay.

The model (11) is viewed as fitting a general linear model, Ay on X, that has correlated error terms. Set z = Ay, where var(Ay) = σ²V. The hat matrix in this case is given by:

P_ay = X (X^T V^{-1} X)^{-1} X^T V^{-1}.

Let Q_ay = Iₙ − P_ay. Though P_ay and Q_ay satisfy the idempotence property and their diagonal elements sum to k and n − k, respectively, they are not symmetric. As a result, they are not positive semi-definite, and the diagonal elements of P_ay may take negative values. In short, the hat matrices P_ay and Q_ay are not symmetric, and their diagonal values need not lie between 0 and 1 (inclusive).
Martins [15] proposed a measure of leverage that is orthogonal, for models with correlated errors, whose diagonal values lie in the interval [0, 1]; we denote it by P*_ay. Let Q*_ay = Iₙ − P*_ay. P*_ay and Q*_ay are idempotent, symmetric and orthogonal with respect to V, i.e.,

1. P*_ay V P*_ay = P*_ay;
2. Q*_ay V Q*_ay = Q*_ay;
3. P*_ay V Q*_ay = 0.

Note that the diagonal elements of P*_ay and Q*_ay, the leverages, do not sum to k and n − k, respectively.
Again, consider a new set of dependent variables obtained by pre-multiplying Equation (11) by the matrix B (B as defined in Equation (10)), so that z* = BAy. Schall and Dunne [14] defined the matrix V^{-1} through a singular value decomposition, V^{-1} = B∆B^T, where B is of the same order as V^{-1} and ∆ is a diagonal matrix. The transformation z* is then a principal component score. Puterman [13] and Haining [19] defined it via canonical variates such that BX(X^T V^{-1} X)^{-1} X^T B^T is positive semi-definite. By setting z* = BAy, Equation (9) is rewritten in generalized least squares (GLS) form as:

z* = X*β + ε,   (13)

where X* = BX.
The estimate β̂_s of β_s is now given by:

β̂_s = (X*^T X*)^{-1} X*^T z* = (X^T B̂^T B̂ X)^{-1} X^T B̂^T B̂ Ây,

where Â = Iₙ − ρ̂W₁ and B̂ = Iₙ − λ̂W₂. Note that ŷ is deduced from Equation (13) as follows:

B̂Âŷ = B̂X (X^T B̂^T B̂ X)^{-1} X^T B̂^T B̂ Ây.

Denote the projection matrix in the transformed spatial regression model as P_s; then:

P_s = B̂X (X^T B̂^T B̂ X)^{-1} X^T B̂^T.

The properties of the leverage in the transformed spatial model in Equation (13) are as follows. Property I: idempotence and symmetry. Since P_s P_s = P_s, P_s is idempotent; since P_s^T = P_s, P_s is symmetric. Therefore, P_s in the transformation z* = B̂Ây is both idempotent and symmetric.
Property II: the sum of the diagonal terms of the projection matrix is k, the number of parameters including the constant term.
∑ᵢ p_s,ii = k, where p_s,ii is the i-th diagonal element of the leverage matrix P_s.
Property III: bounds on the spatial leverage. The bound on the leverage of the classical regression is 0 ≤ p_ii ≤ 1, due to the fact that the hat matrix P satisfies all the orthogonality properties, including symmetry; as such, it is positive semi-definite. In contrast, the spatial leverage P_ay is not symmetric, and a positive semi-definite matrix must be symmetric [37][38][39], so P_ay is not positive semi-definite. The transformation in Equation (11) yields the projection P_s that satisfies the symmetry condition.
From the idempotence of P_s, P_s = P_s P_s. Equating the diagonal terms of the LHS and RHS, we have:

p_s,ii = p_s,ii² + ∑_{j≠i} p_s,ij²,   (14)

where p_s,ij are the off-diagonal terms. Equation (14) implies that p_s,ii ≥ p_s,ii². Therefore, 0 ≤ p_s,ii ≤ 1. Note that P_s and Q_s are orthogonal: P_s Q_s = 0. The model in Equation (9) gives rise to different special spatial regression models under different restrictions. Such special spatial regression models are the spatial autoregressive-regressive model (SAR) and the spatial error model (SEM). While the former has spatial autoregression in the response variable, the latter has spatial autoregression in the model residuals; model (9) (GSM) combines both features.
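Properties I–III can be verified numerically. The sketch below uses a hypothetical 4-location chain with arbitrary ρ̂ and λ̂, chosen only for illustration:

```python
import numpy as np

# Hypothetical row-standardized contiguity weights for a 4-location chain
W = np.array([[0, 1, 0, 0],
              [.5, 0, .5, 0],
              [0, .5, 0, .5],
              [0, 0, 1, 0]])
rho_hat, lam_hat = 0.4, 0.5
n, k = 4, 2
A = np.eye(n) - rho_hat * W   # A-hat
B = np.eye(n) - lam_hat * W   # B-hat

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Xs = B @ X                    # X* = B X
# P_s = B X (X^T B^T B X)^{-1} X^T B^T
Ps = Xs @ np.linalg.solve(Xs.T @ Xs, Xs.T)

print(np.allclose(Ps @ Ps, Ps),   # Property I:  idempotent
      np.allclose(Ps, Ps.T),      # Property I:  symmetric
      np.isclose(np.trace(Ps), k))# Property II: trace = k
```

Because P_s is symmetric and idempotent, its diagonal elements automatically satisfy the Property III bounds 0 ≤ p_s,ii ≤ 1, in contrast to the untransformed P_ay.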
The spatial autoregressive-regressive model is obtained when the coefficient of the lagged spatial autoregression in the residuals of Equation (9) is zero, i.e., λ = 0. Thus, the SAR model is given by:

y = ρW₁y + Xβ + ε.

The P_s corresponding to the model in Equation (13) reduces to:

P_s = X (X^T X)^{-1} X^T,

with the transformation in Equation (11) simplifying to z* = Ây, since V = (B^T B)^{-1} and B = Iₙ when λ = 0. Clearly, the hat matrix in the SAR model preserves the features of the hat matrix in the classical regression model.
In the spatial error model (SEM), the coefficient of the spatial autoregression on the lagged dependent variable is zero, i.e., ρ = 0. This yields the model:

y = Xβ + ξ,   ξ = λW₂ξ + ε.

The transformation in Equation (11) simplifies to z* = B̂y, and the projection matrix remains:

P_s = B̂X (X^T B̂^T B̂ X)^{-1} X^T B̂^T.

It can be observed that the leverage measure in the spatial regression model is dominated by the autocorrelation in the residual term.
Works on spatial regression diagnostics in the literature mainly focus on the autocorrelation in the residuals, mostly using a time series analogy [13][14][15]. Some remarkable works on the spatial regression model can be found in [18,19,21].

Influential Observations in Spatial Regression Model
The leverages P_s and Q_s in Equation (11) satisfy all the properties of a projection matrix, including that the diagonal terms of P_s and Q_s sum to k and n − k, respectively. P_s also incorporates the autocorrelation in the dependent variable, Wy. Hence, it can be used as a diagnostic measure of leverage points in a spatial regression model.
By extending the results of linear regression to spatial regression with slight modification, the Cook's distance in the spatial regression of Equation (13), denoted CD_si, can be formulated as follows:

CD_si = (β̂_s − β̂_s(−i))^T X*^T X* (β̂_s − β̂_s(−i)) / (k σ̂²),

where V̂_(i,i) and Â_(i,i) denote V̂ and Â with the i-th row and the i-th column deleted, respectively (these enter through β̂_s(−i)). The spatial Cook distance CD_si is declared large if CD_si > 0.70 [19]. In its simplified form, the Cook's distance in spatial regression is written as:

CD_si = (t_si² / k)(p_si / q_si),

where t_si is the spatial Studentized prediction residual (also called the spatial internally Studentized residual), p_si is the spatial leverage, which is the i-th diagonal element of P_s, and q_si = 1 − p_si. Let r_si = y_i − ŷ_i; on the transformed scale, the i-th residual is the i-th element of B̂Â(y − ŷ), and:

t_si = [B̂Â(y − ŷ)]_i / (σ̂ √(1 − p_si)),

where b_i and a_i, the i-th rows of the matrices B and A, respectively, carry the neighbourhood weights applied to the residual and the dependent variable. The spatial Studentized residual has a cut-off point of 2 to declare a point large [19,40]. Similarly, the spatial externally Studentized residual (ESRs) is defined as:

t*_si = [B̂Â(y − ŷ)]_i / (σ̂_(i) √(1 − p_si)),

where σ̂_(i) is the residual mean square excluding the i-th case. The ESRs follow a Student's t-distribution with (n − k − 1) degrees of freedom. Thus, the spatial Studentized prediction residuals contain the neighbourhood information of both the dependent variable and the residual at each location, and the leverage P_s contains the residual autocorrelation effect. The spatial potential, which is analogous to the potential in [10], is defined in Equation (19) as:

p_osi = p_si / q_si,

where q_si = 1 − p_si. Let q_osi = 1 − p_osi. We define the spatial measure of overall potential influence as:

H_si² = p_si/(1 − p_si) + (k/(1 − p_si)) · d_si²/(1 − d_si²),

where d_si² is the square of the i-th normalized transformed residual. When measuring the influence of an observation in a linear regression model using the Cook's distance [3], the observation in question is deleted and the model is refitted. In a similar way, a group of suspected influential observations is usually deleted in linear regression and admitted back into the model only if proven clean (BACON [41,42], DGRP [11]). This is because IOs in linear regression are global in nature; in a spatial regression model, however, IOs are local.
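The spatial Studentized residuals, spatial leverages and spatial Cook's distance can be computed together from the transformed model. This is a sketch under our reading of the formulas above (`spatial_diagnostics` is our own helper name); when A = B = I it collapses to the ordinary OLS diagnostics:

```python
import numpy as np

def spatial_diagnostics(y, X, A, B):
    """Spatial ISRs t_si, leverages p_si and Cook's distance
    CD_si = (t_si^2 / k)(p_si / q_si) on the transformed model z* = B A y."""
    n, k = len(y), X.shape[1]
    z = B @ (A @ y)                        # z* = B A y
    Xs = B @ X                             # X* = B X
    Ps = Xs @ np.linalg.solve(Xs.T @ Xs, Xs.T)
    p = np.diag(Ps)                        # spatial leverages p_si
    r = z - Ps @ z                         # transformed-model residuals
    s2 = (r @ r) / (n - k)
    t = r / np.sqrt(s2 * (1 - p))          # spatial ISRs
    cd = t**2 / k * p / (1 - p)            # spatial Cook's distance
    return t, p, cd
```

The residual vector r here is (Iₙ − P_s)B̂Ây, so each t_si blends a location's own residual with those of its neighbours through the rows of B̂ and Â, which is what makes the measure spatial.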
Haining [20] noted that spatial outliers are local in nature; their attribute values are outliers if they are extreme relative to the set of values in their neighbourhood on the map. IOs in spatiotemporal statistics usually carry vital information in applications. Kou et al. [26] further pointed out that detecting spatial outliers can help in locating extreme meteorological events such as tornadoes and hurricanes, identify aberrant genes or tumour cells, discover highway traffic congestion points, pinpoint military targets in satellite images, determine possible locations of oil reservoirs and detect water pollution incidents. Thus, measuring the influence of multiple spatial locations requires a contiguous set of points to reveal the unusual features related to that neighbourhood.
Although methods that detect multiple outliers in spatial regression work well (see [21]), we refer here to methods that group observations as clean or suspect, irrespective of their positions (with reference to spatial data), and admit them into the model as clean observations according to some conditions. According to Hadi [10], examining each influence measure alone, such as p_si, ISRs, ESRs, CD_si or H²_si, might not succeed in indicating the IOs or the source of influence. Imon [43] and Mohammed [44] noted that one should consider both outliers and leverage points when identifying IOs. The easiest way to capture IOs is by using diagnostic plots. Following [43,45], we adopt their rules for the classification of observations into four categories, namely regular observations, vertical outliers, GLPs and BLPs. Once observations are classified accordingly, those that fall into the vertical outlier and BLP categories are referred to as IOs. However, due to the local nature of spatial IOs, we make some modifications to the classification scheme. In this paper, a new diagnostic plot is proposed by plotting the ISRs (or ESRs) on the Y-axis against the spatial potential, p_osi, on the X-axis. We consider the ISRs and ESRs because both measures contain spatial information. On the other hand, the potentials obtained from the transformed model in Equation (13) are considered in order to reflect spatial dependence. Hence, the proposed diagnostic plots are denoted as the ISRs−P_osi and ESRs−P_osi plots, and they are based on the following classification scheme:

i. The i-th observation is declared a regular observation (RO) if |ESRs| < t_{n−k−1} and p_osi < l₂;
ii. the i-th observation is a GLP if |ESRs| < t_{n−k−1} and p_osi > l₂;
iii. the i-th observation is an IO (a BLP) if |ESRs| > t_{n−k−1} and p_osi > l₂;
iv. the i-th observation is an IO (a vertical outlier) if |ESRs| ≥ t_{n−k−1} and p_osi ≤ l₂.
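The four classification rules can be written as a small routine. This is a sketch with our own helper name; the t-quantile cut-off and the robust bound l₂ are passed in as precomputed values:

```python
def classify(esr, pot, t_cut, l2):
    """Classify one location from |ESRs| and its spatial potential p_osi.

    Returns one of: "RO" (regular observation), "GLP" (good leverage point),
    "IO (vertical outlier)" or "IO (BLP)". Boundary cases |ESRs| == t_cut
    are treated as large, following rule iv.
    """
    large_resid = abs(esr) >= t_cut
    large_pot = pot > l2
    if not large_resid:
        return "GLP" if large_pot else "RO"
    return "IO (BLP)" if large_pot else "IO (vertical outlier)"

# Illustrative calls with hypothetical cut-offs t_cut = 2.0, l2 = 0.05
print(classify(0.5, 0.01, 2.0, 0.05),   # small residual, small potential
      classify(3.0, 0.10, 2.0, 0.05))   # large residual, large potential
```

Only the two IO categories (rules iii and iv) feed the final list of influential observations; GLPs are deliberately kept, since they follow the pattern of the majority of the data.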

Results and Discussions
In this section, the performance of all the proposed methods, i.e., the Cook's distance (CD_si), H²_si (the non-robust H²_si1 and the robust H²_si2), ISRs−P_osi and ESRs−P_osi, is evaluated using a simulation study, artificial data and real datasets: the gasoline price data from the southwest area of Sheffield, UK, the COVID-19 data for the counties of the State of Georgia, USA, and the life expectancy data for counties of the USA.

Simulated Data
We simulated the spatial regression model in Equation (9) on a square spatial grid with sample size n = 400, ρ = 0.4, λ = 0.5 and W₁ = W₂, using row-standardized Queen's contiguity spatial weights; x₀ = 1, x₁ ∼ N(0, 1), β₀ = 0, β₁ = 1 (boldface 0 and 1 refer to column vectors of zeros and ones, respectively). The contamination rate is two percent in each of the X and y directions. The contamination in the y direction is drawn from the Cauchy distribution because of its fat tails; the contamination in the X direction is drawn from a multivariate distribution. However, it is important to note that some of the contaminated locations may have attributes similar to those in their neighbourhood, as noted by Dowd [46], since spatial simulation is conditioned on a real dataset. Figure 3 shows the graph of the average attribute value in the neighbourhood of each location against its attribute value with the added contamination. It can be observed that some of the added contamination, shown as black dots, lies in the middle of the clean data points, while some stands out from the bulk of the data (i.e., away from the average values of their neighbourhoods), which clearly indicates outlyingness. In order to confirm the outlyingness of the locations classified as spatial IOs, the threshold of each outlier neighbourhood, given by med_i + 3 MAD_i, is computed for the Studentized residuals of the classified location and its immediate neighbourhood, where med_i is the median of the Studentized residuals and MAD_i is the median absolute deviation. The absolute value of the Studentized residual is compared to the neighbourhood threshold for confirmation as an outlier.
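The med + 3·MAD neighbourhood confirmation can be sketched as follows (our own helper name; `t_neigh` holds the Studentized residuals of the location's immediate neighbours):

```python
import numpy as np

def neighbourhood_outlier(t_i, t_neigh):
    """Confirm a flagged location against its neighbourhood: it is an outlier
    if |t_i| exceeds med + 3 * MAD of the absolute Studentized residuals of
    the location and its immediate neighbours."""
    vals = np.append(np.abs(np.asarray(t_neigh, dtype=float)), abs(t_i))
    med = np.median(vals)
    mad = np.median(np.abs(vals - med))
    return abs(t_i) > med + 3 * mad
```

A location whose Studentized residual is extreme relative to its neighbours is confirmed, while one that merely sits in a high-valued (or low-valued) neighbourhood is not, which is exactly the local notion of a spatial outlier discussed above.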
The CD_si detected location 201, which has large ISRs, ESRs and p_si. ISRs−P_osi and ESRs−P_osi classified locations 1, 4, 35, 51, 91, 201 and 265 as IOs. As noted in Figure 4, ISRs−P_osi and ESRs−P_osi classified locations 1, 4, 35, 91 and 265 as outliers in the y direction, and locations 51 and 201 as outliers in both the X and y directions. The cut-off limits of ESRs−P_osi are narrower than 2, since the 5% cut-off point of the Student's t-distribution is around 1.96 for large sample sizes. Table 2. Influential observations classification rate based on large prediction Studentized residuals and large potentials.

In a 1000-run of the simulation described above at different error variances of 0.01, 0.1, 0.2 and 0.3, as shown in Table 2, the CD_si consistently maintained a low classification rate of influential observations, with consistent swamping rates of 0%. The ISRs−P_osi demonstrated a high detection rate, to the tune of 98%, while ESRs−P_osi achieved 100% accurate classification of the IOs, both with swamping rates of 0%. H²_si1 had less than 40% accurate classification with a zero swamping rate, while H²_si2 had up to 99% accurate IO classification, but usually with very high swamping rates.

Example 1
The gasoline price data for 61 retail sites in the southwest area of Sheffield from [19] were used in Example 1. The analysis indicated the presence of spatial interaction in the error term, with a Moran's I of 0.239. The fitted SEM model is of the form given in Equation (21):

y_M = X_M β + ξ,   ξ = λWξ + ε,   (21)

where y_M and X_M are the March and February sales from the southwest Sheffield gasoline sale data, respectively, λ̂ = 0.15 is the estimated coefficient of correlation in the residuals, W is the standardized weight matrix and ξ is the vector of correlated residuals. Table 3 shows the results of the detected IOs in the SEM model for the gasoline data, listing all the sites detected by the methods. A "yes" under a method column indicates that the site has been detected by that method as an IO, and a "no" means otherwise. The values in bold in columns ISRs, ESRs and p_si indicate large Studentized residuals and potentials greater than 0.0335, respectively. Figure 5 shows the classification of observations by ISRs−P_osi and ESRs−P_osi.
The CD_si, ISRs−P_osi, ESRs−P_osi and H²_si1 coincidentally identified only site 25 as an IO. H²_si2 detected 11 more sites as IOs in addition to site 25. Haining [19] made an elaborate diagnostic analysis of the data, in which he emphasized the effect of site 25 as an IO. Our methods classified site 30, in addition to site 25, as an IO. Figure 5 shows the graph of the lagged residuals against the residuals; it is noticeable from the graph that site 30 has also been marked as an IO.
Though the H²_si2 detected all the suspected IOs, it is prone to swamping. The remaining high potentials are classified as GLPs by ISRs−P_osi and ESRs−P_osi, since their Studentized residuals are small. Figure 6 shows the classification graphs of ISRs−P_osi (a) and ESRs−P_osi (b), indicating in red dots the outliers, which are classified as outliers in both the X and y directions.

Example 2
The data for Example 2 were the COVID-19 data for the 159 counties of the State of Georgia, USA, as of 30 June 2020 (http://dph.georgia.gov/covid-19-daily-status-report; accessed on 30 June 2020) and the county health rankings (http://www.countyheathrankings.org; accessed on 30 June 2020). The COVID-19 case rate per 100,000 was the dependent variable. The independent variables were the Black population of the county (X1), the Asian population (X2), the Hispanic population (X3), the population aged 65 years and above (X4), the female population of the county (X5) and life expectancy (X6).
The data were fitted with the SAR model, which had the lowest Akaike information criterion (AIC) value of 2192. The SAR model is presented in Equation (22). Table 4 shows the locations detected by the various methods, with large ISRs, ESRs and high potentials in bold font.
The IOs identified by ISRs−Posi and ESRs−Posi have both large Studentized residuals and large potentials, as can be observed in Table 4. Figure 7 shows the outliers in the X, y and both X and y directions. The CDsi detected the observation with the largest Studentized residual and a high potential as an IO. The H²si1 identified two observations with large Studentized values and high potential values. The H²si2 detected all suspected IOs, but many of these have both small Studentized residuals and small potential values. While examining the outlyingness of the classified counties, we find that county 50 is clearly an IO, since it has both a large Studentized residual and a large potential value; it lies outside the threshold value of its neighbourhood.
Four of the counties classified by ISRs−Posi and ESRs−Posi (i.e., 26, 50, 70 and 128) are classified as vertical outliers, while counties 3, 49, 120, 135, 141 and 142 have large potential values and Studentized values greater than their threshold values and are therefore classified as BLPs, and hence IOs.
Apart from the counties classified by ISRs−Posi and ESRs−Posi, all the other counties detected by H²si2 have Studentized difference residuals below their neighbourhood thresholds. Though their potential values are mostly large, their prediction Studentized residuals are small in both ISRs and ESRs.
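The two-cutoff classification applied above can be sketched as follows. This is a hypothetical illustration: `c_resid` and `c_pot` stand for the Studentized-residual and potential cutoffs (in the paper these come from the neighbourhood thresholds; the 0.0335 below reuses the Example 1 potential cutoff purely as a placeholder), and the input values are invented.

```python
def classify(stud_resid, potential, c_resid, c_pot):
    """Classify each observation as a regular observation, vertical outlier,
    good leverage point (GLP) or bad leverage point (BLP).
    Vertical outliers and BLPs together constitute the IOs."""
    labels = []
    for r, p in zip(stud_resid, potential):
        big_r = abs(r) > c_resid   # large Studentized (prediction) residual
        big_p = p > c_pot          # large diagnostic potential
        if big_r and big_p:
            labels.append("BLP")              # outlying in both X and y
        elif big_r:
            labels.append("vertical outlier")  # outlying in y only
        elif big_p:
            labels.append("GLP")              # outlying in X only
        else:
            labels.append("regular")
    return labels

labels = classify([0.5, 3.2, 0.4, 4.1], [0.01, 0.02, 0.09, 0.08],
                  c_resid=2.5, c_pot=0.0335)
ios = [i for i, lab in enumerate(labels) if lab in ("vertical outlier", "BLP")]
print(labels, ios)  # → ['regular', 'vertical outlier', 'GLP', 'BLP'] [1, 3]
```

This makes explicit why a high potential alone (a GLP) is not flagged as an IO: only observations that also exceed the residual cutoff enter the vertical-outlier or BLP groups.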

Example 3
In Example 3, the life expectancy of US counties was modelled using population density (X1), fair/poor health status (X2), obesity (X3), rural population (X4), inactivity rate (X5), smoking population (X6), Black population (X7), Asian population (X8) and Hawaiian population (X9). The data were obtained from the Kaggle website (https://www.kaggle.com/johnjdavisiv/us-counties-covid19-weathersociohealth-data; accessed on 13 December 2020). The spatial error model (SEM) had the lowest AIC value and was fitted to the data. The model was significant at the 5% level, with a significant Moran's I of 0.2160. X1 and X4 were not significant at the 5% level; all the other estimates were.
The fitted model is given by: The ISRs−Posi classified 139 counties as IOs, while ESRs−Posi classified eight more, making a total of 147. H²si1 and H²si2 classified 24 and 324 counties as IOs, respectively, while CDsi classified no county as an IO.
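The SEM fits in these examples can be sketched, for a given λ, by spatially filtering the data and applying OLS, which is equivalent to GLS for this error structure. This is a simplified sketch under stated assumptions: in practice λ is itself estimated by maximum likelihood (skipped here), and the ring-contiguity weight matrix and data below are synthetic.

```python
import numpy as np

def fit_sem_given_lambda(y, X, W, lam):
    """Feasible GLS for the SEM  y = X*beta + xi,  xi = lam*W*xi + eps.
    Premultiplying by A = I - lam*W whitens the error, so OLS on (A@y, A@X) is GLS."""
    n = len(y)
    A = np.eye(n) - lam * W
    beta, *_ = np.linalg.lstsq(A @ X, A @ y, rcond=None)
    resid = y - X @ beta  # raw (spatially correlated) residuals
    return beta, resid

rng = np.random.default_rng(1)
n = 50
# Assumed weights: ring contiguity, row-standardized (each site has two neighbours).
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5
X = np.column_stack([np.ones(n), rng.normal(size=n)])
lam = 0.15                                      # mirrors Example 1's estimate
xi = np.linalg.solve(np.eye(n) - lam * W, rng.normal(size=n))
y = X @ np.array([2.0, 1.5]) + xi               # true beta = (2.0, 1.5)
beta, resid = fit_sem_given_lambda(y, X, W, lam)
print(beta)  # close to the true (2.0, 1.5)
```

The raw residuals returned here are the quantities that, after Studentization and incorporation of the spatial lag, feed the ISRs/ESRs diagnostics discussed above.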

Conclusions
In this article, we demonstrated the application of influential observation (IO) detection techniques from classical regression to the spatial regression model. Measures that incorporate spatial information from the spatial autoregression in the dependent variable and the residuals were derived. We also evaluated the performance of some methods employed in classical regression against their spatial counterparts. Though these methods work well in classical regression models, they are mostly prone to either masking or swamping in spatial applications, which is attributable to the local nature of spatial outliers. Hence, we proposed the new ISRs−Posi and ESRs−Posi plots to classify observations into four categories: regular observations, vertical outliers, good leverage points and bad leverage points, whereby IOs are those observations falling in the vertical outlier and bad leverage point categories. Interestingly, the proposed ESRs−Posi diagnostic plot was very successful in classifying observations into the correct categories, followed by the ISRs−Posi plot, as demonstrated by the results obtained from a simulation study and real data examples. Thus, the newly established ESRs−Posi plot can be a suitable alternative for identifying IOs in the spatial regression model.