Which Are the Effects of Hard and Soft Equality Constraints on the Iterative Outlier Elimination Procedure?

In this paper we evaluate the effects of hard and soft constraints on the Iterative Data Snooping (IDS), an iterative outlier elimination procedure. Here, the measurements of a levelling geodetic network were classified according to the local redundancy and maximum absolute correlation between the outlier test statistics, referred to as clusters. We highlight that the larger the relaxation of the constraints, the higher the sensitivity indicators MDB (Minimal Detectable Bias) and MIB (Minimal Identifiable Bias) for both the clustering of measurements and the clustering of constraints. There are circumstances that increase the family-wise error rate (FWE) of the test statistics, increase the performance of the IDS. Under a scenario of soft constraints, one should set out at least three soft constraints in order to identify an outlier in the constraints. In general, hard constraints should be used in the stage of pre-processing data for the purpose of identifying and removing possible outlying measurements. In that process, one should opt to set out the redundant hard constraints. After identifying and removing possible outliers, the soft constraints should be employed to propagate their uncertainties to the model parameters during the process of least-squares estimation.


Introduction
It is very common to build models (i.e. the equation systems) based on some initial knowledge about a given problem. In other words, models are often set up in a way that the model parameters need to fulfil certain constraints. Such constraints are a prior knowledge embedded into a model to avoid a trivial solution; to guarantee the stability of estimates; to improve the precision and accuracy of the results by reducing the number of unknown parameters or accordingly by increasing the redundancy of the system; and to mitigate (or even estimate) a possible systematic effect [1][2][3]. For example, [4] adopted constraints to determine the transponder coordinates in a problem of combining satellite positioning (GNSS) of a surface platform with acoustic ranging to seafloor transponders; [5][6][7][8][9][10] have used constraints to model the atmospheric effects on GNSS signals; and [11] have imposed the constraints of predicted satellite clocks to improve the precise orbit determination (POD) processing during maneuvers.
The models are usually formulated with minimal constraint or extra (redundant) constraints. In that case, we are referring to the so-called equality constraints which are usually incorporated into a system of equations in order to try to have a model well posed [12].
For the most part, minimal constraints are introduced to solve to the problem of rank deficiency in linear systems. The rank deficiency is often caused by the lack (or insufficient) information about a problem. In the field of geodesy, for example, minimal constraints are external information whose primary role is to specify the coordinate system to which the network station positions will be estimated by least-squares method (LS). This problem is known as datum definition (or also zero-order design or datum choice problem) [13][14][15][16][17][18]. Several works have been investigated in the sense of minimum-constrained adjustment and the datum choice problem in the geodetic literature, focusing on topics like free-adjustment and the role of inner constraints (see e.g. [19][20][21][22]). 135). Although here we restrict ourselves to the case of one outlier at a time, IDS can also be applied for the case of multiples (simultaneous) outliers [49]. For more details about multiples (simultaneous) outlier refers to [50][51][52].
There are chances of correct and false decisions of IDS, because these procedure is based on statistical hypothesis testing. Recently, Rofatto et al. [45] have provided an algorithm based on Monte Carlo to determine the probability levels associated with IDS. In that case, they described six classes of decisions for IDS, namely probability of correct identification (P CI ), probability of missed detection (P MD ), probability of wrong exclusion (P WE ), probability of over-identification positive (P over+ ), probability of over-identification negative (P over− ) and probability of statistical overlap (P ol ), as follows: • P CI : Probability of identifying and removing correctly an outlying measurement; • P MD : Probability of not detecting the outlier (i.e. Type II decision error for IDS); • P WE : Probability of identifying and removing a non-outlying measurement while the 'true' outlier remains in the dataset (i.e. Type III decision error [53] for IDS); • P over+ : Probability of identifying and removing correctly the outlying measurement and others; • P over− : Probability of identifying and removing more than one non-outlying measurement, whereas the 'true outlier' remains in the dataset; • P ol : occurs in cases where one alternative hypothesis has the same distribution as the another one. These hypotheses cannot be distinguished, because their test statistics are numerically the same, violating the IDS rule of one outlier at a time. In that case, they are nonseparable and an outlier cannot be identified. In other words, it corresponds to the probability of flagging simultaneously two (or more) measurements as outliers.
Based on the probability of correct detection (P CD ) and probability of correct identification (P CI ), the minimal biases, MDB (Minimal Detectable Bias) and MIB (Minimal Identifiable Bias), can be computed as sensitivity indicators for outlier detection and identification, respectively. "Outlier detection" only informs whether or not there might have been at least one outlier. However, the detection does not tell us which measurement is an outlier. The localization of the outlier is a problem of "outlier identification", i.e. "outlier identification" implies the execution of a search among the measurements for the most likely outlier [45]. Therefore, the smallest value of an outlier that can be detected given a certain P CD defines the MDB. On the other hand, the smallest value of an outlier that can be identified given a certain P CI defines the MIB.
In this paper, we investigate the effects of models subject to constraints (minimum, redundant, hard and soft) on the probability levels associated with IDS. It is important to emphasised that if a standard-deviation of a constraint (or a set of a constraint) is changed from zero to a nonzero value, it is called "relaxation" of the constraint [28]. Here, we also evaluate the effect of relaxing constraints on the MIB and MDB. This kind of assessment is a kind of sensitivity analysis. We also highlight that the task of clustering a set of geodetic measurements is applied for the first time in this study. We show that the clusters can be defined according to two deterministic parameters: local redundancy and correlation between the outlier test statistics. Moreover, critical values optimized by Monte Carlo method were used here [45,46] in order to compute the decision classes associated with IDS, i.e. P CI , P MD , P WE , P over+ , P over− and P ol .

Theoretical Framework
The IDS is an important case of multiple hypothesis testing. In this section, therefore, we briefly present the principal elements involved for the case of multiple testing.
First, the null hypothesis, denoted by H 0 , is formulated under the condition that random errors are normally distributed with expectation zero, i.e. in the absence of outliers. Thus, the null hypothesis H 0 of the standard Gauss-Markov model in the linear or linearised form is given by [33]: H 0 : E{y} = Ax + E{e} = Ax; D{y} = Q e (1) where E{.} is the expectation operator, D{.} is the dispersion operator, y ∈ R n×1 is the vector of measurements, A ∈ R n×u is the coefficient matrix, x ∈ R u×1 is the unknown parameter vector, e ∈ R n×1 is the unknown vector of measurement errors and Q e ∈ R n×n is the positive-definite covariance matrix of the measurements y. Under normal working conditions (i.e., H 0 ), the measurement error model is then given by e ∼ N(0, Q e ), First, we assume that the coefficient matrix A suffers from a rank deficiency, i.e. u − rank(A) > 0. In that case, minimal constraints are added to solve the problem of the rank-deficient system. An example of how to handle the problem of the rank-deficient is given in the section 3.1, where a minimal constraint in Equation 4 was added to the matrix A in 3. This means that the columns of the design matrix A in Equation 1 become linearly independent, i.e. the matrix A become full rank, such that u − rank(A) = 0.
The best linear unbiased estimator (BLUE) of e under H 0 is the well-known estimated least-squares residual vectorê ∈ R n×1 , which is given bŷ withx ∈ R u×1 being the BLUE of x under H 0 ; W ∈ R n×n is the known matrix of weights, taken as W = σ 0 2 Q −1 e , where σ 2 0 is the variance factor, I ∈ R n×n is the identity matrix and R ∈ R n×n is known as the redundancy matrix. The R matrix is an orthogonal projector that projects onto the orthogonal complement of the range space of A. The main diagonal elements of matrix R are known as local redundancy numbers (denoted by r i of the the system. The larger the number of local redundancy for a given measurement, the larger the degree of importance of that measurement for the model, i.e. the larger the absorption of a possible error of that measurement into their corresponding least-squares residual. The degrees of freedom r (i.e. the overall redundancy) of the model under H 0 (Equation (1)) is On the other hand, an alternative model is proposed when there are doubts about the reliability level of the model under H 0 . Here, we assume that the validity of the null hypothesis H 0 in Equation (1) can be violated if the dataset is contaminated by outliers. The model in an alternative hypothesis, denoted by H A , is to oppose Equation (1) by an extended model that includes the unknown vector ∇ ∈ R q×1 of deterministic bias parameters as follows ( [29,31]): where C ∈ R n×q is the matrix that relates bias parameters, i.e., the values of the outliers to observations. We restrict ourselves to the matrix (A C) having full column rank, such that One of the most used procedures based on hypothesis testing for outliers in linear (or linearised) models is the well-known data snooping method [29,54]. This procedure consists of screening each individual measurement for the presence of an outlier [49]. In that case, data snooping is based on a local model test, such that q = 1, and therefore, the n alternative hypothesis is expressed as Now, matrix C in Equation (6) is reduced to a canonical unit vector c i , which consists exclusively of elements with values of 0 and 1, where 1 means that the ith bias parameter of magnitude ∇ i affects the ith measurement, and 0 means otherwise. In that case, the rank of (A c i ) ∈ R n×(u+1) and the vector ∇ in Equation (6) reduces to a scalar ∇ i in Equation (42), i.e., c i = 0 0 0 · · · 1 i th 0 · · · 0 T . When q = n − u, an overall model test is performed. For more details about the overall model test, see, for example, [55,56].
Note that the alternative hypothesis H (i) A in Equation (42) is formulated under the condition that the outlier acts as a systematic effect by shifting the random error distribution under H 0 by its own value [44]. In other words, the presence of an outlier in a dataset can cause a shift of the expectation under H 0 to a nonzero value. Therefore, hypothesis testing is often employed to check whether the possible shifting of the random error distribution under H 0 by an outlier is, in fact, a systematic effect (bias) or merely a random effect. This hypothesis test-based approach is called the mean-shift model (see, e.g., [18,23,29,46,47,52,54,[57][58][59][60][61][62][63][64][65].
In the context of the mean-shift model, the test statistic involved in data snooping is given by the normalised least-squares residual, denoted by w i . This test statistic, also known as Baarda's w-test, is given as follows: The alternative hypothesis in Equation (42) is formulated in the sense that "There is at least one outlier in the vector of measurements y i " [46]. In that case, we are interested in knowing which of the alternative hypotheses may lead to the rejection of the null hypothesis with a certain probability. This means testing H 0 against H (1) A . This is known as multiple hypothesis testing (see, e.g., [47,48,53,55,57,58,[66][67][68][69][70]). In that case, the test statistic coming into effect is the maximum absolute Baarda's w-test value (denoted by max-w), which is computed as [47] max-w = max i∈{1,··· ,n} The decision rule for this case is given by The decision rule in Equation 45 says that if none of the n w-tests get rejected, then we accept the null hypothesis H 0 . If the null hypothesis H 0 is rejected in any of the n tests, then one can only assume that detection occurred. In other words, if the max-w is larger than some percentile of its probability distribution (i.e., some critical valuek), then there is evidence that there is an outlier in the dataset. Therefore, "outlier detection" only informs us whether the null hypothesis H 0 is accepted or not [45].
However, the detection does not tell us which alternative hypothesis H

(i)
A would have led to the rejection of the null hypothesis H 0 . The localisation of the alternative hypothesis, which would have rejected the null hypothesis, is a problem of "outlier identification". Outlier identification implies the execution of a search among the measurements for the most likely outlier. In other words, one seeks to find which of Baarda's w-test is the maximum absolute value max-w and if that max-w is greater than some critical valuek.
Therefore, the data snooping procedure of screening measurements for possible outliers is actually an important case of multiple hypothesis testing and not single hypothesis testing. Moreover, note that outlier identification only happens when outlier detection necessarily exists; i.e., "outlier identification" only occurs when the null hypothesis H 0 is rejected. However, correct detection does not necessarily imply correct identification [47,55,68].
In a special case of having only one single alternative hypothesis, one should decide between the null hypothesis H 0 and only one single alternative hypothesis H (i) A of Equation (42). In that case, the false decisions are restricted to Type I error and Type II error. The probability of a Type I Error α 0 is the probability of rejecting the null hypothesis H 0 when it is true, whereas the probability of a Type II error β 0 is the probability of failing to reject the null hypothesis H 0 when it is false (note: the index '0' represents the case in which a single hypothesis is tested). Instead of α 0 and β 0 , there is the confidence level CL = 1 − α 0 and the power of the test γ 0 = 1 − β 0 , respectively. The first deals with the probability of accepting a true null hypothesis H 0 ; the second addresses the probability of correctly accepting the alternative hypothesis H (i) A . In that case, given a probability of a Type I decision error α 0 , we find the critical value k 0 as follows: where Φ −1 denotes the inverse of the cumulative distribution function (cdf) of the two-tailed standard normal distribution N(0, 1). The normalised least-squares residual w i follows a standard normal distribution with the expectation that E{w i } = 0 if H 0 holds true (there is no outlier). On the other hand, if the system is contaminated with a single outlier at the ith location of the dataset (i.e., under H where λ 0 is the non-centrality parameter for q = 1. Note, therefore, that there is an outlier that causes the expectation of w i to become √ λ 0 . The square-root of the non-centrality parameter √ λ 0 in Equation (47) represents the expected mean shift of a specific w-test. In such a case, the term c i T Q −1 e QêQ −1 e c i in Equation (47) is a scalar, and therefore, it can be rewritten as follows [71]: where |∇ i | is the Minimal Detectable Bias (MDB 0 (i) ) for the case in which there is only one single alternative hypothesis, which can be computed for each individual alternative hypothesis according to Equation (42). For a single outlier, the variance of an estimated outlier, denoted by σ 2 Thus, the MDB can also be written as where is the standard deviation of estimated outlier ∇ i . The MDB in Equations (48) or (16) of an alternative hypothesis is the smallest-magnitude outlier that can lead to the rejection of the null hypothesis H 0 for a given α 0 and β 0 . Thus, for each model of the alternative hypothesis H (i) A , the corresponding MDB can be computed [47,72,73]. The limitation of this MDB is that it was initially developed for the binary hypothesis testing case. In that case, the MDB is a sensitivity indicator of Baarda's w-test when only one single alternative hypothesis is taken into account. In this article, we are confined to multiple alternative hypotheses. Therefore, both the MDB and MIB are computed by considering the case of multiple hypothesis testing.
For a scenario coinciding with the null hypothesis H 0 under multiple testing hypothesis, there is the probability of incorrectly identifying at least one alternative hypothesis. This type of wrong decision is known as the family-wise error rate (FWE). The FWE is defined as which is approximately where α 0 is the significance level for an individual test. The quantity in Equation (18) is just equal to the upper bound of the Bonferroni inequality, i.e., α ≤ nα [74]. For example, if the FWE level (α ) is 0.05 and one is running 5 tests, then each test will have an α 0 of 0.05/5 = 0.01. In other words, one uses a global Type I Error rate α that combines all tests under consideration instead of an individual error rate α 0 that only considers one test at a time time [70]. In that case, the critical value k bon f is computed as The Bonferroni in Equation (18) is a good approximation for the case in which alternative hypotheses are independent. In practice, however, the test results always depend on each other to some degree because we always have a correlation between w-tests. The correlation coefficient between any Baarda's w-test statistic (denoted by ρ w i ,w j ), such as w i and w j , is given by [57] The correlation coefficient ρ w i ,w j can assume values within the range [−1, 1].
Here, the extreme normalised residuals max-w (i.e., maximum absolute) in Equation (44) are treated directly as a test statistic. Note that when using Equation (44) as a test statistic, the decision rule is based on a one-sided test of the form max-w ≤k. However, the distributions of max-w cannot be derived from well-known test distributions (e.g., normal distribution). The procedure to compute the critical value of max-w is given step-by-step by Rofatto et al. [45].
The other side of the multiple testing problem is the situation in which there is an outlier in the dataset. In that case, apart from Type I and Type II errors, there is a third type of wrong decision associated with Baarda's w-test. Baarda's w-test can also flag a non-outlying observation while the 'true' outlier remains in the dataset. We are referring to the Type III error [53], also referred to as the probability of wrong identification (P W I ). The description of the Type III error involves a separability analysis between alternative hypotheses [57,66,68,69]. Therefore, we are now interested in the identification of the correct alternative hypothesis. In that case, the non-centrality parameter in Equation (47) is not only related to the sizes of Type I and Type II decision errors but also dependent on the correlation coefficient ρ w i ,w j given by Equation (20). 8

of 41
On the basis of the assumption that one outlier is in the ith position of the dataset (i.e., H (i) A is 'true'), the probability of a Type II error (also referenced as the probability of "missed detection", denoted by P MD ) for multiple testing is and the size of a Type III wrong decision (also called "misidentification", denoted by P W I ) is given by On the other hand, the probability of correct identification (denoted by P CI ) is with 1 − P CI = 1 − P CD + P W I = P MD + P W I (24) Note that the three probabilities of missed detection P MD , wrong identification P W I and correct identification P CI sum up to unity: i.e., P MD + P W I + P CI = 1.
The probability of correct detection P CD is the sum of the probability of correct identification P CI (selecting a correct alternative hypothesis) and the probability of misidentification P W I (selecting one of the n-1 other hypotheses), i.e., P CD = P CI + P W I The probability of wrong identification P W I is identically zero, P W I = 0, when the correlation coefficient is exactly zero, ρ w i ,w j = 0. In that case, we have The relationship given in Equation (26) would only happen if one neglected the nature of the dependence between alternative hypotheses. In other words, this relationship is valid for the special case of testing the null hypothesis H 0 against only one single alternative hypothesis H Since the critical region in multiple hypothesis testing is larger than that in single hypothesis testing, the Type II decision error (i.e., P MD ) for the multiple test becomes smaller [47]. This means that the correct detection in binary hypothesis testing (γ 0 ) is smaller than the correct detection P CD under multiple hypothesis testing, i.e., P CD > γ 0 (27) Detection is easier in the case of multiple hypothesis testing than single hypothesis testing. However, the probability of correct detection P CD under multiple testing is spread out over all alternative hypotheses, and therefore, identifying is harder than detecting. From Equation (25), it is also noted that detection does not depend on identification. However, outlier identification depends on correct outlier detection. Therefore, we have the following inequality: Note that the probability of correct identification P CI depends on the probability of missed detection P MD and wrong identification P W I for the case in which data snooping is run only once, i.e., a single round of estimation and testing. However, in this paper, we deal with data snooping in its iterative form (i.e., IDS), and therefore, the probability of correct identification P CI depends on other decision rules. 9 of 41 In contrast to the data snooping single run, the success rate of correct detection P CD for IDS depends on the sum of the probabilities of correct identification P CI , wrong exclusion (P WE ), over-identification cases (P over+ and P over− ), and statistical overlap (P ol ), i.e., It is important to mention that the probability of correct detection is the complement of the probability of missed detection. Note from Equation (29) that the probability of correct detection P CD is available even for cases in which the identification rate is null, P CI = 0. However, the probability of correct identification (P CI ) necessarily requires that the probability of correct detection P CD be greater than zero. For the same reasons given for the data snooping single run in the previous section, detecting is easier than identifying. In that case, we have the following relationship for the success rate of correct outlier identification P CI : It is important to mention that the wrong exclusion P WE describes the probability of identifying and removing a non-outlying measurement while the 'true' outlier remains in the dataset. In other words, P WE is the Type III decision error for IDS). The overall wrong exclusion P WE is the result of the sum of each individual contribution to P WE , i.e., On the basis of the probability levels of correct detection P CD and correct identification P CI , the sensitivity indicators of minimal biases-Minimal Detectable Bias (MDB) and Minimal Identifiable Bias (MIB)-for a given α can be computed as follows: Equation (33) gives the smallest outlier ∇ i that leads to its detection for a user-defined correct detection rateP CD , whereas (34) provides the smallest outlier ∇ i that leads to its identification for a user-defined correct identification rateP CI .
As a consequence of the inequality in (28), the MIB will be larger than MDB, i.e., MIB ≥ MDB. For the special case of having only one single alternative hypothesis, there is no difference between the MDB and MIB [55]. The computation of MDB 0 is easily performed by Equations (48) or (16), whereas the computation of the MDB in Equation (33) and the MIB in Equation (34) must be computed using Monte Carlo because the acceptance region (as well as the critical region) for the case of multiple alternative hypotheses is analytically intractable.

Case study of Levelling Geodetic Network
The analyse of the constraints effects on the probability levels of IDS are performed by an example of a levelling geodetic network.
A levelling geodetic network is a set of points located on the Earth's surface or near it and interconnected by height difference measurements. Some of these points are associated with a vertical height reference system (i.e. some points have known height), whereas others are parameters (unknown heights). The term "known" here means that those points with known heights are constraints necessary to ensure that parameters (unknown heights) are sufficiently estimable. In the sense of modern geodetic reference systems, realisation and unification of a vertical height system consists of a combination of GNSS (Global Navigation Satellite System) and spirit levelling with geoid models. We will not go into the details about height geodetic reference system and frame. (Readers who wish to have further details on that issue please see e.g. [75]). The mathematical model associated with a geodetic levelling network is linear (i.e. the levelling measurements bear a known linear relationship with the unknown heights). Geodetic networks usually has more measurements than parameters (i.e. n > u), i.e. we have a redundant measurement system. However, due to intrinsic random errors in a system, redundant measurements often lead to an inconsistent system of equations. To make the system consistent we have to introduce the information about the random measurement errors. The stochastical properties of the measurement errors are directly associated with the assumption of the probability distribution of these errors. In geodesy and many other scientific branches the well-known normal distribution is one of the most used as measurement error model [76]. Because of this, the model ceases to be purely mathematical and becomes a statistical model with functional and stochastic part.
In the absence of outliers, i.e. under null hypothesis H 0 in Equation (1), the model for levelling geodetic network can be written as follows: where ∆h i−j is the height difference measured from point i to j and e ∆h i−j is the random error associated with the levelling measurement. For a only one single levelling line, one of these points is the constraint (known height), from which the height of another point is determined. The point with known height is also referred to as control point or benchmark. In a geodetic network, on the other hand, we have a set of levelling lines that connect both from points of unknown height and from points of known height (constraints). Under normal working conditions (i.e. H 0 ), the measurement errors model is then given by Equation (2). In this case study, we demonstrate the application of the proposed algorithm by [45] subject to a different constraints scenarios for that levelling geodetic network.
Recent work has been focused on non-stochastic constraints (hard constraints). Here, we also consider cases where constraints are subject to a random errors (i.e. soft constraints). That soft constraints are non-deterministic, and therefore they are measured model elements that can be designated as a priori information. The approach of this example can be applied in the design stage of geodetic network analysis [62,65]. Furthermore, the experiments performed here can also be extended to geodetic deformation analysis when the deformation effects are unmodelled in the sense of deterministic form [28].

Problem description
To analyse the constraints effects on the IDS, an example is taken from a geodetic levelling network with 12 height differences between the points. The equipment used to measure the level difference can be an electronic digital level. In that case, the levelling measurement system comprises of a special bar-coded staff (also called barcode rod) and a digital level (instrument). A digital level is basically a telescope that enables a horizontal line of sight. Digital levels consist of additional electronic image processing components to automatically read and analyse digital (bar coded) levelling staffs, where the graduation is replaced by a manufacturer dependent code pattern. Generally, the result is automatically stored in the data collector of the digital level. An example of a "digital level -bar-code staff " system is displayed in the Figure 1. For more details about digital level see e.g. [77][78][79][80]. The standard-deviation of the uncorrelated measurements were the same and taken equal to σ = 1mm. The points are indicated as A to G. The eight network configuration are displayed in Figure  2(a, b, c, d, and e) and their details are given as follows: 1. Figure 2(a): Network with 1 hard constraint (i.e. network minimally constrained). Since the dimension of the network is 1D, the minimum information necessary to estimate the unknown heights is one. The height of G was fixed as control point (hard constraint), and 6 unknown heights (A,B,C,D,E,F) were minimally constrained. Therefore, the redundancy of the system (or overall degrees of freedom) was r = n − rank(A) = n − u = 12 − 6 = 6.
2. Figure 2(b): Network with one extra hard constraint (i.e. two hard constraints). The heights A and D were taken as hard constraints (i.e. heights A and D were fixed). The redundancy of the system in that case was r = 12 − 5 = 7 with 5 unknown heights (B,C,E,F,G) over-constrained.
3. Figure 2(c): Network with two extra hard constraints (i.e. three hard constraints).The heights A, D and G were taken as hard constraints. In that case, the redundancy of the system was r = 12 − 4 = 8.
4. Figure 2(d): Network with two soft constraints (A and D). In that case, a standard-deviation larger than zero was assigned to both constraints (i.e. σ c > 0. In other words, A and D were processed as being both observations and unknown parameters, i.e. A and D were pseudo-observations. Here, the both constraints were simultaneously relaxed by considering their uncertainties 10 times worse than measurements (i.e. σ c = 10 × σ = 10mm); 10 times better than measurements (i.e. σ c = 0.1mm); and their uncertainties equal to the measurements (σ c = 1mm). In that case, the redundancy of the system was r = 14 − 7 = 7.
5. Figure 2(e): Network processed with A, D and G as pseudo-observations. Those three constraints were simultaneously relaxed by considering their standard-deviations equal to σ c = 10mm (10 times worse than measurements); σ c = 0.1mm (10 times better than measurements); and σ c = 1mm (the same as the measurements). In that case, the redundancy of the system was r = 15 − 7 = 8.  Figure 2. Levelling Geodetic Network subject to different constraint scenarios.
The following system of equations for that problem is given by: The functional model (A) for the system of equations in 36 is given by: Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 31 March 2020 doi:10.20944/preprints202003.0467.v1 Note that the rank defect of the matrix A is u − rank(A) = 7 − 6 = 1. In that case, there is needed to add at least one constraint in order to avoid rank deficiency of the matrix A. This is guaranteed when we take one height as known. For example, from the network in Figure 2(a), we have added the height G as known (i.e. as a hard constraint). In that case, the constraint equation should be added into the system in 36, i.e.: noticing that because the standard deviation is zero, the observation is non-stochastic (hard constraint) and the residual e y 13 = 0. This can generate problems in the inversion of the covariance matrix of the observations Q e for the calculation of the weight matrix W, because the weight for that constraint would be 1 0 = ∞. In order to avoid that problem, we have eliminated the rank deficiency of matrix A by removing the seventh column of matrix A in 37 associated with the height G. Now, we have The constraint defines the geodetic datum, i.e. the S-system ( [81],p.41). Another approach to solving the system of equations in 2 could be based on generalised (pseudo) inverses (see e.g. [82]).
The location of the constraints can be chosen in some circumstances, for example during the design stage of a geodetic network. For the special case of having a minimally constrained system, the location of the constraint will not influence the w-test statistics and the sensitivity indicators (MIB and MDB) [18]. However, more constraints than the minimum necessary to have a solution (i.e. extra constraints or redundant constraints) can change the least-squares residuals and hence w-test statistics and the minimal biases.
From the network with one extra constraint (2 constraints) in Figure 2(b), for example, the both first (height A) and fourth column (height D) of matrix A in 37 were eliminated in the case of having the two heights as hard constraints. For the case where these two heights (A and D) were taken as soft constraints, however, two observation equations were added to Equation 36, i.e.: In the case of soft constraints in Equation 39, two lines were added in matrix A. In other words, A and D were taken as pseudo-observations. In that case, the rank deficiency was also null (i.e. u − rank(A) = 7 − 7 = 0), the redundancy of the system was r = n − rank(A) = n − u = 7 and the matrix A was given as follows: Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 31 March 2020 doi:10.20944/preprints202003.0467.v1 For this example of two soft constraints, and by considering the both soft constraints with standard-deviation σ c = 10mm, the symmetric and positive semi-definite covariance matrix of the observations (Q e ) was given as follows: The last two rows and columns of the matrix Q e in Equation 41 refer to the variances (σ c 2 = (10mm) 2 = 100mm 2 ) of the heights constraints A and D, respectively.
Similarly, matrices A and Q e were constructed for the other cases studied here.
Although other measurements are able to identify an outlier for the case of having only one single soft constraint, the pseudo-observation (constraint) is not. In that case, the defect configuration is associated with the additional parameter in the constraint (i.e. the presence of an outlier in the constraint). In other words, an additional parameter on the soft constraint will not estimable. For example, if the height point G was taken as a soft constraint, the presence of an outlier in pseudo-observation G would lead to rank deficiency of matrix A, i.e. u − rank(A) = 8 − 7 = 1. Therefore, the case of having only one single soft constraint was not considered here.

Result of the Hard Constraint Effects on the Iterative Outlier Elimination Procedure
In this section, we present the results of the effects of the hard constraint on the IDS. The scenarios in Figure 2(a) (network minimally constrained), Figure 2(b) (two hard constraints) and Figure 2(c) (three hard constraints) were considered for the analysis.
The correlation coefficient between w-test statistics (ρ w i ,w j ) were computed for the three scenarios, according to Equation (20). Table 1 provides that correlation for that network minimally constrained, Table 2 for that network over-constrained with two hard constraints and Table 3 for that network over-constrained with three hard constraints. Table 1. Correlation matrix of w-test statistics for the levelling network minimally constrained in Figure  2(a). w y 1 w y 2 w y 3 w y 4 w y 5 w y 6 w y 7 w y 8 w y 9 w y 10 w y 11 w y 12  Table 2. Correlation matrix of w-test statistics for the levelling network with two hard constraints in Figure 2(b).
w y 1 w y 2 w y 3 w y 4 w y 5 w y 6 w y 7 w y 8 w y 9 w y 10 w y 11 w y 12 1.0 -0.4 0.1 0.4 w y 10 1.0 0.4 0.1 w y 11 1.0 -0.1 w y 12 1.0 Table 3. Correlation matrix of w-test statistics for the levelling network with three hard constraints in Figure 2(c).
w y 1 w y 2 w y 3 w y 4 w y 5 w y 6 w y 7 w y 8 w y 9 w y 10 w y 11 w y 12 9 1.0 -0.1 0.1 0.3 w y 10 1.0 0.3 0.1 w y 11 1.0 -0.1 w y 12 1.0 Table 4 gives the local redundancy (r i ), the standard-deviation of the LS-estimated outlier σ ∇ i and the maximum absolute correlation (max ρ w i ,w j ) for each scenario of hard constraint set out of in this study, i.e. Figure 2(a,b and c). Table 4. Local redundancy (r i ), standard-deviation of the LS-estimated outlier σ ∇ i and the maximum absolute correlation (max ρ w i ,w j ) for each scenario of hard constraint.
1 hard constraint 2 hard constraints 3 hard constraints Measurement Here, the twelve levelling measurements were clustered into four clusters. The clustering was defined according to the local redundancy (r i ) and the maximum absolute correlation (max ρ w i ,w j ) in Table 4. This is the first time that a clustering technique based on the similarity of local redundancy and the maximum absolute correlation between w-test statistics is applied to a problem of geodetic networks. Similarly, this has been done in [45]. The four cluster were defined as follows: • Cluster 1: y 1 , y 3 , y 4 and y 6 . • Cluster 2: y 2 and y 5 .
The probability levels associated with IDS were averaged for each of these clusters. Here, we compute the probability levels of the IDS based on the procedure in Section (5) [45]. The critical values werê k = 3.89,k = 3.93 andk = 3.93 for 1 hard constraint, 2 hard constraints and 3 hard constraints, respectively. These critical values were found for α = 0.001 according to the procedure described in Appendix A. The probability of correct identification (P CI ) and correct detection (P CD = 1 − P MD ) are displayed in Figure 3 for each number of hard constraint (denoted by h.c.). The outlier magnitude were defined from |5σ| to |9σ|. The outlier of |5σ| was chosen because it is approximately the lowest MDB 0(i) of the network when a single hypothesis testing is in play, as can be seen in Equation (48) or Equation (16). That MDB 0(i) of |5σ| was computed for significance level of α = 0.001 and a power of the test γ 0 = 0.8. This strategy reduces the search space for a MIB (Minimal Identifiable Bias), because we will always have the following inequality MIB ≥ MDB 0(i) [47,55]. Remember that the IDS procedure is an example of multiple hypothesis testing (see 2).
The sensitivity indicators MDB and MIB for IDS were also computed according to Equation (33) and (34), respectively. The success rate for outlier detection and outlier identification were taken as beingP CD =P CI = 0.8, respectively. Table 5 provides the values of MDB and MIB for that case of hard constraints.  Figure 4 shows the probability of wrong exclusion (P WE ). The over-identification cases (P over+ and P over− ) were smaller than 0.001 (i.e. they were practically null). There were not statistical overlap (P ol ) for clusters 2, 3 and 4. We will discuss more about statistical overlap (P ol ) later.

Result of the Soft Constraint Effects on the Iterative Outlier Elimination Procedure
The both configuration in Figure 2(d) and Figure 2(e) were analysed in terms of soft constraints. In that case, the critical values werek = 3.95,k = 3.95 andk = 3.92 for two soft constraints with σ c = 0.1mm, σ c = 1mm and σ c = 10mm, respectively. In the case of three soft constraints, the critical values found werek = 3.99,k = 3.99 andk = 3.96 for σ c = 0.1mm, σ c = 1mm and σ c = 10mm, respectively. All these critical values were computed for α = 0.001.
The correlation coefficient between w-test statistics (ρ w i ,w j ) are displayed in Table 6, Table 7 and Table 8 for two soft constraints with standard-deviation 10 times larger than measurements (i.e. σ c = 10 × σ = 10mm); 10 times better than measurements (i.e. σ c = 0.1mm); and equal to the measurements (σ c = 1mm), respectively. Table 6. Correlation matrix of w-test statistics for the levelling network with two soft constraints with σ c = 10mm in Figure 2(d).
w y 1 w y 2 w y 3 w y 4 w y 5 w y 6 w y 7 w y 8 w y 9 w y 10 w y 11 w y 12 w y 13 w y 14  Table 7. Correlation matrix of w-test statistics for the levelling network with two soft constraints with σ c = 0.1mm in Figure 2(d).
w y 1 w y 2 w y 3 w y 4 w y 5 w y 6 w y 7 w y 8 w y 9 w y 10 w y 11 w y 12 w y 13 w y 14  Table 8. Correlation matrix of w-test statistics for the levelling network with two soft constraints with σ c = 1mm in Figure 2(d).
w y 1 w y 2 w y 3 w y 4 w y 5 w y 6 w y 7 w y 8 w y 9 w y 10 w y 11 w y 12 w y 13 w y 14  Table 9 gives the local redundancy (r i ), the standard-deviation of the LS-estimated outlier σ ∇ i and the maximum absolute correlation (max ρ w i ,w j ) for the scenarios of two constraints. Table 9. Local redundancy (r i ), standard-deviation of the LS-estimated outlier σ ∇ i (mm) and the maximum absolute correlation (max ρ w i ,w j ) for each scenario of two soft constraints. From Table 9, five clusters were defined for each case of two soft constraints, i.e. for the case where heights A and D were given as soft constraints in Figure 2(d), as follows: • Cluster 1: y 1 , y 3 , y 4 and y 6 . • Cluster 2: y 2 and y 5 .
The probabilities of correct identification (P CI ) and correct detection (P CD ) for the measurements (Cluster 1 to Cluster 4) subject to the scenarios of two soft constraints (heights A and D) are displayed in Figure 5. Note that the Cluster 5 is associated with the two soft constraints (i.e. y 13 and y 14 ). The probability of correct identification (P CI ) for these both soft constraints were null. However, the probability of correct detection were not. Figure 6 shows the probability of correct detection (P CD ) for these two soft constraints (i.e. heights A and D). The probability of wrong exclusion P WE for the measurements (Cluster 1 to Cluster 4) subject to the scenarios of two soft constraints (heights A and D) are displayed in Figure 7. Figure 8 gives the the probability of wrong exclusion P WE for that two constraints (i.e. heights A and D). The over-identification cases (P over+ and P over− ) and the statistical overlap (P ol ) were practically null for that case.
The sensitivity indicators (MDB and MIB) for each scenario of two soft constraints are displayed in Table 10. Table 10. MDB and MIB for the case of two soft constraints based on α = 0.001 andP CD =P CI = 0.8. As with the case of two soft constraints, Tables 11,12 and 13 show the correlations (ρ w i ,w j ) for the case where there were three soft constraints, i.e. for the network configuration in Figure 2(e). Table 11. Correlation matrix of w-test statistics for the levelling network with three soft constraints with σ c = 10mm in Figure 2(e). w y 1 w y 2 w y 3 w y 4 w y 5 w y 6 w y 7 w y 8 w y 9 w y 10 w y 11 w y 12 w y 13 w y 14 w y 15  Table 12. Correlation matrix of w-test statistics for the levelling network with three soft constraints with σ c = 0.1mm in Figure 2(e).
w y1 w y2 w y3 w y4 w y5 w y6 w y7 w y8 w y9 w y10 w y11 w y12 w y13 w y14 w y15 1.0 Table 13. Correlation matrix of w-test statistics for the levelling network with three soft constraints with σ c = 1mm in Figure 2(e).
w y1 w y2 w y3 w y4 w y5 w y6 w y7 w y8 w y9 w y10 w y11 w y12 w y13 w y14 w y15 w y1  Table 14. Local redundancy (r i ), standard-deviation of the LS-estimated outlier σ ∇ i (mm) and the maximum absolute correlation (max ρ w i ,w j ) for each scenario of three soft constraints. The probabilities of correct identification (P CI ) and correct detection P CD for the measurements (Cluster 1 to Cluster 4) subject to the scenarios of three soft constraints (heights A, D and G) are displayed in Figure 9. The probabilities of correct identification (P CI ) and correct detection (P CD ) in Figure 8 were computed for the clusters based on Table 14, as follows: Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 31 March 2020 doi:10.20944/preprints202003.0467.v1 • Cluster 1: y 1 , y 3 , y 4 and y 6 .
• Cluster 6: y 15 . Figure 10 shows the probabilities of correct identification P CI and correct detection P CD for the three soft constraints, i.e. for Cluster 5 (heights A and D) and Cluster 6 (height G) in Figure 2 The probability of wrong exclusion P WE for the measurements (Cluster 1 to Cluster 4) subject to the scenarios of three soft constraints (heights A, D and G) are displayed in Figure 11. Figure 12 gives the the probability of wrong exclusion P WE for that three constraints (i.e. heights A, D and G).  The over-identification cases (P over+ and P over− ) and the statistical overlap (P ol ) were also practically null for that case of three soft constraints. The sensitivity indicators (MDB and MIB) for each scenario of three soft constraints are displayed in Table 15. Table 15. MDB and MIB for the case of three soft constraints based on α = 0.001 andP CD =P CI = 0.8.

Discussion
We start from the scenario of one hard constraint in Figure 2(a). Table 4 reveal that the maximum correlation between w-test statistics for the measurements constituting Cluster 1 is exactly equal to 1.00 (i.e. max ρ w i ,w j = 1.00). This means that the measurements belonging to Cluster 1 are those that are connected with unknown heights whose connections are limited to only two. The both unknown heights A and D are tied only to two measurements (i.e. y 1 and y 6 linked to A, and y 3 and y 4 linked to D), therefore if an outlier occurred in one of these measurements, we would only be able to analyse the consistency between them, but we would not be able to distinguish which of them was contaminated by an outlier. This means that we would only be able to detect, because the w-test statistics could be larger than a critical valuek, however, in that case, the values of w-test statistics would be the same, and therefore we would not have only one unique maximum w-test statistics, but actually would have four maximum w-test statistics. In other words, the equation systems associated with the measurements of Cluster 1 are linearly dependent [83]. Therefore, there is no reliability in terms of outlier identification for the Cluster 1, as can be seen in Figure 3(a).
From Figure 3(b), we can note that there is reliability in terms of outlier detection for Cluster 1, and it is caused by overlapping w-test statistics. The probability of statistics overlap P ol for Cluster 1 in this scenario of minimally constrained network is displayed in Figure 13. The problem of not having more connections (i.e. more measurements) for the unknown heights A and D in the case of one hard constraint with G fixed is overcome when these heights (A and D) are taken as hard constraints in 2(b) or when the heights A, D and G are hard constraints in figure 2(c). Figure 3(a, b) show that the measurements of Cluster 1 are now able to identify an outlier when two hard constraints (A and D fixed) are in play. It is also verified to the case of three hard constraints (A, D and G fixes) in Figure 3(e, f), i.e. there is reliability in terms of both outlier detection and identification for these measurements in those conditions. Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 31 March 2020 doi:10.20944/preprints202003.0467.v1 One can argue that the more constraints, the larger the probability of correct identification (P CI ) and correct detection (P CD ). This claim is not entirely correct. This may be true in case where all points on the network have the same number of connections. However, in the example given, if both heights B and G were selected as hard constraints, the measurements of the Cluster 1 (i.e. y 1 , y 6 , y 3 and y 4 ) would still have their least-squares residuals fully correlated (i.e. ρ w y 1 ,w y 6 = 1.00 and ρ w y 3 ,w y 4 = 1.00), and therefore there would be no reliability in terms of outlier identification for these measurements.
From Table 5, we observe different behaviour for the clusters as follows: • Cluster 1: there was no MIB for the case of having only one single hard constraint, whereas MDB = MIB for the other cases. However, both MDB and MIB decrease significantly with the increase in the number of hard constraints. • Cluster 2: MDB slightly smaller than MIB. Both MDB and MIB were practically the same for the case of having two or three hard constraints. In terms of outlier detection and identification, therefore, Cluster 1 was more sensitive to constraints, Cluster 3 relatively sensitive to constraints, whereas the Cluster 4 completely insensitive to constraints and Cluster 2 relatively insensitive to constraints. This is also can be seen in Figure 3. The reason for this is that the local redundancy (r i ) of the Cluster 1 increased with the increase of the number of hard constraints, whereas the Cluster 4 have remained the same, as can be seen in Table 4. Leaving aside the cases of statistical overlap (P ol ), the network presents low least-squares residuals correlation (ρ w i ,w j < 0.5) and high local redundancy (r i > 0.5). Because of this, the probabilities of wrong exclusion (P WE ) were less than 1%, as can be seen in Figure 4. The over-identification cases (P over+ and P over− ) were practically null. Consequently, P CI ≈ P CD . Due of this fact, the family-wise error rate (α ) should be increased in order to have more success rate in the outlier detection and identification [45].
From Figure 14, we observe that increasing the α increases the both probabilities of correct identification (P CI ) and correct detection (P CD ) for outlier magnitude from 5σ to 6σ in the case of three hard constraints and from 5σ to 6.8σ in the case of two hard constraints. Although the rates of P over+ and P WE also increase, they are not significant when compared to the improvement of correct identification (P CI ) and detection (P CD ). This same analysis can be done for the other clusters.
In terms of soft constraints for the cases of two constraints in Figure 2(d), we observe from Table 9 that the larger the relaxation of the constraint (i.e. the larger the standard-deviation of the constraint σ c ), the larger the residuals correlation (ρ w i ,w j ) and the standard-deviation of the outlier σ ∇ i , and the smaller the local redundancy (r i ). Consequently, the probabilities of correct identification (P CI ) and detection (P CD ) get smaller and smaller with the relaxation of the constraints, whereas the probability of wrong exclusion gets larger (P WE ). This can be more clearly verified in Figure 5(a, b) and Figure  7(a) for Cluster 1, whose measurements are connected with the constraints A and D (i.e. y 13 and y 14 in Table 9, respectively).
Note from Figure 7(a) that the probability of wrong exclusion (P WE ) increases as the magnitude of the outlier (∇ i ) increases. However, this is true up to a certain limit of outlier magnitude. The effect of residuals correlation ρ w i ,w j on the rates of wrong exclusion (P WE ) and correct identification (P CI ) tends to decrease with the increase in the magnitude of the outlier ∇ i . This effect is more clearly verified for Cluster 1 in case where the precision of the constraints are ten times worse than the measurements σ c = 10σ = 10mm. Note from Figure 5 that identifying an outlier in the Cluster 1 (i.e. y 1 , y 3 , y 4 and y 6 ) when σ c = 10mm is more difficult than the other clusters. This is due to the fact the Cluster 1 has a higher residuals correlation ρ w i ,w j = 0.994 than other clusters. In general, therefore, we observe that the larger the relaxation of the constraints, the larger is the effect of the correlation ρ w i ,w j on the success rate of outlier identification (P CI ). Consequently, the higher the sensitivity indicator for outlier identification (MIB). Table 10 reveals that the ratio between MIB and MDB for the Cluster 1 and for the scenario where the standard-deviations of that two soft constraints are σ c = 10mm is MIB/MDB=25/7.5=3.3. On the other hand, the relationship between MIB and MDB is practically one (i.e. MIB/MDB=1.0) for the others scenarios. If the FWE rate (α ) were increased for the case where the two soft constraints of σ c = 10mm are in play, we would not have great advantages for Cluster 1, due to its high residuals correlation (ρ w i ,w j = 99.4%). From Figure 15, we can observe that the probability of correct identification (P CI ) for outlier magnitudes from 5σ to 8σ is effectively larger to a user-defined α = 0.1 than α = 0.001. However, the success rate is still less than 80%, i.e. P CI < 0.8. Note, for example, the correct identification rate is P CI = 56% for an outlier magnitude of ∇ i = 8σ and α = 0.1. For α = 0.1 the MIB = 33.5σ = 33.5mm, whereas for α = 0.001 is MIB = 25σ = 25mm. Therefore, in that case, the MIB forP CI = 0.8(80%) and α = 0.1 would be 34% larger than user-defined α = 0.001.
The soft constraints A and D were grouped in the Cluster 5 (i.e. A and D were treated as pseudo-observations in the model). There is no reliability in terms of outlier identification for the constraints, because the residual correlation between them is ρ w i ,w j = 100%, as can be seen in Table 9 for y 13 and y 14 . However, these soft constraints are able to detect an outlier. In that case, the probability of correct detection P CD in Figure 6 is mainly caused by the statistical overlap P ol , as can be seen in the example of the case of σ c = 10mm in Figure 16. From Table 10, we observe that the larger the relaxation of the constraints, the larger the MDB. Note that the values of MDB are given in σ, and therefore the MDB for σ c = 10mm is larger than σ c = 1mm and σ c = 0.1mm, i.e. we had the following inequality: MDB = 6.8σ c = 6.8 × 10mm = 68mm > MDB = 8.8σ c = 8.8 × 1mm = 8.8mm > MDB = 57σ c = 57 × 0.1mm = 5.7mm. In that case, if the FWE (α ) were increased, the rate of outlier detection by the Cluster 4 (i.e. by the soft constraints) would increase. In case of having three soft constraints in Figure 2(e), there is reliability in terms of outlier identification for the three pseudo-observations y 13 , y 14 and y 15 (i.e. for A,D and G), as can be seen in Figure 10 and Table 15. In that case, we also observe that the probabilities of correct detection P CD of the soft constraints A and D (i.e. Cluster 5) were approximately 13% for σ c = 10mm, 16% for σ c = 1mm and 24% for σ c = 0.1mm larger than the scenario of the network subject to two soft constraints. Table  15 reveals that the advantage of having three soft constraints instead of two constraints is that the constraints become identifiable in the presence of an outlier. The behaviour of the probabilities of correct detection (P CD ), correct identification (P CI ) and wrong exclusion (P WE ) was similar to the case of the two soft constraints.
Furthermore, the larger the relaxation of the constraints, the smaller the residuals correlation between the measurements and the soft constraints and the larger the residuals correlation among the measurements. This effect can be verified in Tables 6, 7 and 8 for two soft constraints as well as in  Tables 11, 12 and 13 for three soft constraints. In general, we observe that the case of two soft constraints for σ c = 0.1mm was comparable with two hard constraints (see e.g. Table 5 and Table 10) in terms of the probability levels associated with IDS for the measurements (i.e. clusters 1, 2, 3 and 4). In the same way for the case of two soft constraints with σ c = 1mm or σ c = 10mm, the probabilities levels were similar to the one hard constraint for that measurements, with the benefit of two soft constraints having reliability in terms of outlier identification for the Cluster 1. Finally, the three soft constraints with σ c = 1mm and σ c = 10mm were comparable to the two soft constraints for that scenario of constraints relaxation, wheres the three soft constraints for σ c = 0.1mm showed similar outcomes with three hard constraints for the measurements (see e.g. Tables 5 and 15). In that case, however, an advantage of the three soft constraints on the three hard constraints is the possibility of analysing the sensitivity of the constraints. It is important to emphasised that the stochastic models of the measurements and constraints were assumed well-known and defined for the analyzes performed here.

Material and Methods
Here, we use the procedure provided by Rofatto et al. [45] to compute the probability levels associated with IDS, as well as to estimate the minimal biases -Minimal Detectable Bias(MDB) and Minimal Identifiable Bias (MIB). The procedure is summarised in Figure 16. After finding the critical valuek by the process described in Appendix A, the procedure based on Monte Carlo is also applied to compute the probability levels of IDS when there is an outlier in the dataset. The overview of of the main elements involved with the IDS was detailed in Section 2. The steps displayed in Figure 17 are detailed as follows: First, random error vectors are synthetically generated on the basis of a multivariate normal distribution because the assumed stochastic model for random errors is based on the matrix covariance of the observations. Here, we use the Mersenne Twister algorithm [84] to generate a sequence of random numbers and Box-Muller [85] to transform it into a normal distribution.
The magnitude intervals of simulated outliers are user-defined. The magnitude intervals are based on the standard deviation of the observation (σ), e.g., |3σ| to |6σ|. Since the outlier can be positive or negative, the proposed algorithm randomly selects the signal of the outlier (for q = 1). Here, we use the discrete uniform distribution to select the signal of the outlier. Thus, the total error (ε) is a combination of random errors, and its corresponding outlier is as follows: In Equation (42), e is the random error generated from the normal distribution according to Equation (2), and the second part c i ∇ i is the additional parameter that describes the alternative model according to Equation (42). Next, we compute the least-squares residuals vector according to Equation (3), but now we use the total error (ε) as follows: For IDS, the hypothesis of (42) for q = 1 (one outlier) is assumed, and the corresponding test statistic is computed according to (43). Then, the maximum test statistic value is computed according to Equation (44). Now, the decision rule is based on the critical valuek computed by Monte Carlo (see the steps in Appendix A). After identifying the measurement suspected to be the most likely outlier, it is excluded from the model, and least-squares estimation (LSE) and data snooping are applied iteratively until there are no further outliers identified in the dataset. Every time that a measurement suspected to be the most likely outlier is removed from the model, we check whether the normal matrix A T W A is invertible or not. If the determinant of A T W A is 0, det|A T W A| = 0, then there is a necessary and sufficient condition for a square matrix A T W A to be non-invertible. In other words, the IDS is ended when det|A T W A| = 0. The IDS procedure is performed for m experiments of random error vectors for each experiment contaminated by an outlier in the ith measurement. Therefore, for each measurement contaminated by an outlier, there are υ = 1, . . . , m experiments.
The probability of correct identification (P CI ) is obtained by the ratio between the correct identification cases and possible cases. Thus, if m is the total number of Monte Carlo experiments (possible cases), then we count the number of times that the outlier is correctly identified (denoted as n CI ) and then approximate the probability of correct identification P CI as Similar to Equation (44), false decisions are computed as P MD = n MD m (45) P WE = n WE m (46) P over + = n over + m (47) P over − = n over − m (48) P ol = n ol m (49) where: • n MD is the number of experiments in which IDS does not detect the outlier (P MD corresponds to the rate of missed detection); • n WE is the number of experiments in which the IDS procedure flags and removes only one single non-outlying measurement while the 'true' outlier remains in the dataset (P WE is the wrong exclusion rate); • n over + is the number of experiments in which IDS correctly identifies and removes the outlying measurement and others, and P over + corresponds to its probability; • n over − represents the number of experiments in which IDS identifies and removes more than one non-outlying measurement, whereas the 'true outlier' remains in the dataset (P over − is the probability corresponding to this error probability class); and • n ol is the number of experiments in which the detector in Equation (44) flags two (or more) measurements simultaneously during a given iteration of IDS. Here, this is referred to as the number of statistical overlap n ol , and P ol corresponds to its probability.
In this paper, the probability levels associated with IDS were computed for each observation individually and for each outlier magnitude. However, they were grouped into clusters based on number of local redundancy (r i ) and maximum absolute correlation between the w-test statistics (ρ w i ,w j ). Furthermore, we take care to control the family-wise error rate.

Conclusions
In general, the probability of correct identification (P CI ) for the case of hard constraints is larger than soft constraints. It can be verified in Figure 3, Figure 5 and Figure 9. Therefore, hard constraints should be used in the stage of pre-processing data for the purpose of identifying and removing possible outlying measurements. In that process, one should opt to set out the redundant hard constraints at points in the network where the smallest connections exist. After identifying and removing possible outliers, the soft constraints should be employed to propagate the uncertainties of the constraints (pseudobservations) to the model parameters during the process of least-squares estimation. This recommendation is valid for outlier detection and identification purpose.
We highlight the main findings of this research as follows: • Under a system of a high local redundancy r i > 0.5 and low residuals correlation (ρ w i ,w j < 0.5), if one increase the family-wise error rate (FWE) of the test statistic, the performance of the procedure will be improved for both scenarios of hard constraints and soft constraints. • The larger the relaxation of the constraints, the larger is the effect of the residuals correlation (ρ w i ,w j ) on the success rate of outlier identification (P CI ). Consequently, the higher the sensitivity indicator for outlier identification (MIB), and therefore more difficult it becomes to identify an outlier. • Under a scenario of soft constraints, one should set out at least three soft constraints in order to identify an outlier in the constraints.
In other types of analysis, for example, deformation analysis of geodetic networks one should formulated the constraints as pseudo-observations with σ c > 0 in an adjustment model of system equations. In other words, the relaxation of constraints makes it possible to estimate deformation effects that are unmodelled in the deterministic model matrix, for example, deformations which have a spatial and temporal variations. In that case, the deformation can be modelled by covariance functions that determine the cofactors between constraints [28]. This should be applied to control points located in the structure, but not to control points located outside the structure. The control points located outside the structure should be hard constraints and have their stability verified by an independent process in order to analyse the deformation of the structure. So, one will be also able to analyse the sensitivity in terms of minimal detectable deformations and/or minimal identifiable deformations [23]. Future researches will be addressed to sensitivity analysis for deformation monitoring networks.  The sorted valuesw in Equation (A4) provide a discrete representation of the cumulative density function (cdf) of the maximum test statistic max-w. 5.
Determine the critical valuek as follows:k where [.] denotes rounding down to the next integer that indicates the position of the selected elements in the ascending order ofw. This position corresponds to a critical value for a stipulated overall false alarm probability α . This can be done for a sequence of values α in parallel.
It is important to mention that the probability of a Type I decision error for multiple testing α is larger than that of Type I for single testing α 0 . This is because the critical region in multiple testing is larger than that in single hypothesis testing.