Statistical Mirroring: A Good Alternative Estimator of Dispersion

The statistical properties of a good estimator include robustness, unbiasedness, efficiency, and consistency. However, the commonly used estimators of dispersion lack, or are weak in, one or more of these properties. In this paper, I propose statistical mirroring as a good alternative estimator of dispersion around defined location estimates or points. In the main part of the paper, attention is restricted to the Gaussian distribution, and only estimators of dispersion around the mean that are computed from all the observations of a dataset are considered. The reference estimators were compared with the proposed estimators in terms of alternativeness, robustness to scale and sample size, outlier biasedness, and efficiency. Monte Carlo simulation was used to generate artificial datasets for the application. The proposed estimators (of statistical meanic mirroring) turn out to be suitable alternative estimators of dispersion that are less biased by (more resistant to) contamination, robust to scale and sample size, and more efficient under random sampling than the standard deviation, variance, and coefficient of variation. However, statistical meanic mirroring is not suitable when the mean (of a normal distribution) is close to zero, or on a scale below the ratio level.


Introduction
A good estimator is statistically characterized as robust (invariant) to outliers and scale, unbiased, efficient under random sampling, and consistent across sample sizes. However, the commonly used estimators of dispersion (e.g., standard deviation, variance, coefficient of variation, etc.) lack one or more of these properties. Most of the robust estimators (e.g., interquartile range, quartile coefficient of dispersion, Qn and Sn by Rousseeuw-Croux, and other M-estimators) have low efficiency [1], [2], while most of the relatively efficient estimators (e.g., standard deviation, variance, coefficient of variation) are not robust [1]. In practice, inefficient and inconsistent estimators can lead to a strong bias and to erroneous statistical conclusions about a sample or population. Therefore, users have to be very cautious about which estimator is suitable and precise for their data; otherwise, wrong statistical inferences and conclusions will be drawn.
Under an asymmetric distribution with some outliers, the mean and standard deviation lack robustness and lead to a strong bias [1], and the same would be expected of their derivative functions, such as the coefficient of variation. The median and interquartile range are robust but inefficient estimators under contaminated and asymmetric distributions. In survey statistics, outliers are unavoidable, and their analysis must be non-parametric; otherwise, the data should be checked and corrected, or transformed to suit parametric designs [1]. Unfortunately, data transformation (intended to remove, smooth, or normalize existing outliers) is considered, in some cases, a dishonest and biased treatment [1]. Variance, standard deviation, dispersion index (or variance-to-mean ratio), interdecile range, median absolute deviation from median (MADM), mean absolute deviation from mean (MAD), interquartile range (IQR), quartile coefficient of dispersion (derived from IQR), Qn and Sn by Rousseeuw-Croux, and other M-estimators have failed. The coefficient of variation stands as one of the important scale-invariant (standardized, unitless, and dimensionless) estimators and has been used to compare datasets on different scales. The classic version of the coefficient of variation has been used in different areas such as biology [3], biochemistry [4], medical physics [5], neuroscience [6], [7], engineering [8], psychology [9], [10], sociology and economics [11].
Despite that, the coefficient of variation is not always a good measure of relative dispersion and has the following pitfalls and drawbacks: a) it has no clear bounds within a fixed range; b) it is inappropriate for asymmetric distributions; c) it is inappropriate for nominal, ordinal, and interval scales; d) it is very sensitive to outliers, especially when the mean of the distribution is close to zero; e) it works appropriately only with positive observations; and f) it is inappropriate for comparing groups with different sample sizes [11].
Scale statistics (measures of the dispersion, scale, and spread of data) are an important primary stage of inferential statistics. Therefore, the goodness and statistical quality of inferential statistics rely on the goodness of their scale estimators. The strict barriers between parametric and non-parametric statistics are, importantly, the normality assumption and the homoscedasticity condition, which depend on the distribution's shape and on scale invariance, respectively. Therefore, an estimator that is both robust (to outliers, scale, and sample size) and efficient breaks this barrier.
Scientists, statisticians, and data analysts currently struggle to choose estimators (when moving from descriptive to inferential methods) that resist outliers and the underlying distribution of the data while remaining robust, efficient, and consistent. The properties that users should look for in a good estimator of dispersion include robustness to scale and contamination, unbiasedness, efficiency, and consistency, under both symmetric and asymmetric distributions. In this paper, statistical mirroring is proposed as a good alternative estimator of dispersion that measures the coefficient of proximity, proximity, and deviation of all the data points of a variable about defined location estimates or points. I restricted attention to the use of Kabirian-based isomorphic optinalysis [12] to derive the concept of statistical mirroring. In application, attention is focused on the Gaussian distribution, and only estimators of dispersion around the mean that are computed from all the observations of a dataset are considered. Accordingly, two absolute measures of dispersion (the variance and standard deviation) and one relative measure of dispersion (the coefficient of variation) were used as the reference standard estimators of dispersion.

is a co-domain of ; is a mid-point or symmetrical line, and is the optical scale. The symbol ⇻ indicates a bijective mapping between the isoreflective pair around a midpoint and ↠ indicates a bijective remapping by the optical scale .
Definition V. In comparative optinalysis, a reflection (pairing) is said to be head-to-head if the lower order elements (observations) of the isoreflective pair (of two mathematical structures) are extreme away from the midpoint [12].
Definition VI. In comparative optinalysis, a reflection or pairing is said to be tail-to-tail if the lower order elements (observations) of the isoreflective pair (of two mathematical structures) are extreme towards the midpoint [12].
Suppose we have an optinalytic construction of isoreflective pair with an assigned optical scale ( ) as follows:  Where | ( )| and | ( )| are the absolute optical moment of and respectively about the central mid-point through a distance started from the centre. It is expressed by equations (2.1) and (2.2).

Proposition:
The dispersion (proximity and deviation) of data points from a defined location of a given distribution is the isoreflectivity of its data points to a defined statistical mirror (i.e., a defined and amplified location estimate of the distribution).

Properties of statistical mirroring
i. It is based on all observations of the dataset. Therefore, extreme minimum and maximum values are not discarded.
ii. It applies to all sets of real numbers (such as discrete or continuous variables containing negative values, positive values, or both).
iii. It may involve the use of the mean and other defined location estimates such as the median, maximum, minimum, and range.
iv. It varies with changes in a location parameter. Therefore, it is not suitable for comparing multiple datasets (measurements) below the ratio level of scale. where , , = ℝ
v. On the ratio level of scale, statistical mirroring is a scale-invariant (i.e., robust to scale, unitless, and dimensionless) estimator of dispersion (scale). This property corresponds to the invariance of isomorphic optinalysis under translation transformation [12]. Therefore, it is very suitable for measurements on a ratio scale. where , , = ℝ; ≠ 0.
vi. Statistical mirroring is bi-coefficient and translative (i.e., it supports forward and reverse translations). It gives two possible coefficients ( 1 . , 2 . ) due to its commutative property, but each coefficient translates into the same results ( . , and . ), which can be used to compute back the two coefficients. The two possible Kabirian bi-coefficients work on two different optinalytic scales.
vii. Statistical mirroring is commutative around a central mid-point of the two isoreflective pairs. This property corresponds to the invariance of isomorphic optinalysis under central rotation (alternate reflection) transformation [12].
viii. Statistical mirroring is population-independent and invariant to sample size. However, the sample size invariance holds for . and . , and not for . .
ix. Only statistical meanic mirroring, and none of the others, is invariant to sample size or to multiple repeats, in the same order, of all the observations of a dataset.

Application and Methods Comparison on Dispersion Measures
In this paper, attention is restricted to symmetric distributions, and only estimators that are computed from all the observations of the datasets are considered. I apply statistical mirroring (specifically, statistical meanic mirroring) to show its suitability as an alternative measure of dispersion around the mean. The proposed estimators (i.e., statistical meanic mirroring) were compared, on the desirable properties a good estimator should have, with the most widely used reference standard estimators of absolute and relative dispersion around the mean (i.e., standard deviation, variance, and coefficient of variation).

Artificial Datasets
Monte Carlo simulation was used to generate artificial datasets. A total of 1000 random datasets were generated from a normal distribution with the following parameters: μ = 10; σ = 1; n = 10, 50, 100, 200, and 500. This generation step was repeated with σ = 2, 3, …, 15. This made a total of 75,000 parametrized random datasets.
Before the outlier biasedness evaluation, Monte Carlo simulation was also used to generate 1000 random datasets from a normal distribution with μ = 10; σ = 2; n = 50. Then, single-point and 20% contaminations, with contaminant magnitudes of ±5, ±10, ±15, ±20, ±500, ±1000, ±5000, and ±10000, were added to the upper and the lower values of the sorted distribution. Table 1 presents how the average mean of the distribution changed with the added contaminations. As Table 1 shows, this design allows us to check the behavior of the estimators both close to and away from a zero mean of the contaminated normal distribution.
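The generation and contamination steps can be sketched in Python as follows. This is only an illustration and not the Excel workflow of the supplementary files: the function names, the seed, and the exact contamination rule (a positive magnitude added to the largest sorted values, a negative magnitude added to the smallest) are assumptions made for the example.

```python
import numpy as np

def simulate_datasets(mu, sigma, n, n_datasets=1000, seed=0):
    """Draw n_datasets samples of size n from N(mu, sigma^2), one per row."""
    rng = np.random.default_rng(seed)
    return rng.normal(loc=mu, scale=sigma, size=(n_datasets, n))

def contaminate(sample, magnitude, fraction=None):
    """Shift the extreme values of a sorted sample by `magnitude`.

    Assumption: a positive magnitude is added to the largest values and a
    negative magnitude to the smallest values. With fraction=None a single
    point is contaminated; otherwise round(fraction * n) points are.
    """
    x = np.sort(np.asarray(sample, dtype=float))
    k = 1 if fraction is None else max(1, int(round(fraction * x.size)))
    if magnitude >= 0:
        x[-k:] += magnitude  # contaminate the upper tail
    else:
        x[:k] += magnitude   # contaminate the lower tail
    return x

data = simulate_datasets(mu=10, sigma=2, n=50)
single = contaminate(data[0], magnitude=500)                 # single-point, +500
heavy = contaminate(data[0], magnitude=-500, fraction=0.20)  # 20%, -500
print(data.shape, single.mean(), heavy.mean())
```

Looping the two functions over the listed magnitudes and over all 1000 rows would give the contaminated datasets whose means Table 1 summarizes.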

Data Analysis
Microsoft Excel statistical functions (refer to supplementary files S1-S5) were used to estimate the Kabirian coefficient of meanic proximity, meanic proximity, meanic deviation, standard deviation, coefficient of variation, and variance of the generated random variables. The two possible Kabirian bi-coefficients ( 1 . ( , ) and 2 . ( , )) were evaluated and sorted on standardized optinalytic scales by a reverse translation, substituting the calculated meanic proximity ( . ) into equations (14) and (15).
Let = 1, 2, 3, … , 2 + 1, and is the sample size. That is, in this case, the identity of the optical scale equals the scale. Then we have:

The alternate reflection and estimate of the above argument become:

Let = 1, 2, 3, … , 2 + 1, and is the sample size. That is, in this case, the identity of the optical scale equals the scale. Then we have:

The other properties of the estimators, such as alternativeness, outlier biasedness, efficiency, and relative efficiency, were computed using the following statistics.

i. Alternativeness
Pearson correlation was used to correlate the averages of the estimates across the range of standard deviations (σ = 1, 2, 3, …, 15) for each sample size (n = 10, 50, 100, 200, and 500). A strong correlation between the proposed estimator and the gold standard indicates suitability as an alternative method. See the supplementary files (S1-S6).

ii. Robustness to contamination (Outlier biasedness)
The biasedness of the estimators under contaminations from a normal distribution was evaluated using equation (16). See the supplementary file (S7).
= A standardized estimate expectation of the estimator before contamination.
̂ = A standardized estimate expectation of the estimator after contamination.
Note: is the 1,000 estimates of the estimator.

iii. Relative absolute biasedness (RAB) of contaminations
Relative absolute biasedness of the estimator under contaminations is an evaluation that checks the equality of biasedness under positive and negative contaminations. It is expressed by the equation (17).

v. Efficiency
The efficiency and relative efficiency of the estimators from a normal distribution were evaluated using equations (18) and (19), respectively. See the supplementary files (S1-S6).
The standardized efficiency (standardized variance) now becomes:

The results in Table 2 show that the proposed estimators (KC1-MnProx., KC2-MnProx., MnProx., and MnDev.) are very strongly correlated and associated (|R| = 0.7674 to 0.9994) with the standard deviation, coefficient of variation, and variance. KC1-MnProx. and MnProx. were positively correlated with the standard deviation, coefficient of variation, and variance, while KC2-MnProx. and MnDev. were negatively correlated. Statistical meanic mirroring is therefore a suitable alternative estimator of dispersion.
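The alternativeness check can be illustrated with a minimal Python sketch. It covers only the three reference estimators, because the formulas of the mirroring coefficients are not reproduced in this text; the seed, the variable names, and the choice of n = 50 are assumptions for the example. Estimates of the proposed coefficients would enter the same correlation step as further lists.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed, for reproducibility only
mu, n, n_datasets = 10, 50, 1000
sigmas = np.arange(1, 16)       # sigma = 1, 2, ..., 15

avg_sd, avg_var, avg_cv = [], [], []
for sigma in sigmas:
    x = rng.normal(mu, sigma, size=(n_datasets, n))
    sd = x.std(axis=1, ddof=1)                     # one SD estimate per dataset
    avg_sd.append(sd.mean())
    avg_var.append((sd ** 2).mean())
    avg_cv.append((sd / x.mean(axis=1)).mean())

# Pearson correlation between the averaged estimates across sigma
r_sd_cv = np.corrcoef(avg_sd, avg_cv)[0, 1]
r_sd_var = np.corrcoef(avg_sd, avg_var)[0, 1]
print(r_sd_cv, r_sd_var)
```

Repeating the loop for each sample size would give one correlation per sample size, which is the layout the text describes for Table 2.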

Outlier biasedness and normality of the estimators' estimates under contaminations
Outlier biasedness of the Kabirian bi-coefficients of meanic proximity (KC1-MnProx. and KC2-MnProx.), the meanic proximity (MnProx.), and the meanic deviation (MnDev.) was compared with that of the standard deviation and coefficient of variation. A total of 1,000 artificial datasets from a normal distribution with μ = 10, σ = 2 were contaminated with contaminants of varying magnitude, and the biasedness of the estimators due to the contaminants was analyzed. To simplify the graphical presentation of the results, the absolute outlier biasedness of the estimates was log-transformed. Therefore, higher values of the log-transformed result represent low outlier biasedness, and vice versa.
Figures 1 and 2 show how sensitive the estimators are to the contaminations (outliers). The results show that statistical meanic mirroring (comprising KC1-MnProx., KC2-MnProx., MnProx., and MnDev.) is less biased (less sensitive and more resistant to contamination) than the standard deviation and coefficient of variation, at both low and extreme contaminations, applied at the top and the bottom points of the ordered random numbers. At lower magnitudes of contamination, negative contaminations lead to more bias than positive contaminations for the KC1-MnProx., KC2-MnProx., and MnProx. estimators, while the opposite holds for MnDev., StDev., and CV. At extreme magnitudes of contamination, positive and negative contaminations produce roughly the same outlier biasedness. In terms of the outlier biasedness of the two possible Kabirian bi-coefficients, KC1-MnProx. is superior (more resistant) with negative contaminations than KC2-MnProx. with positive contaminations. Statistical mirroring is more sensitive to contamination when the mean of the distribution is close to zero than when it is away from zero, but its sensitivity is much lower than that of the coefficient of variation.

Table B1 of Appendix B presents the normality of the estimators' estimates under contaminations (outliers). The results show that the estimates of statistical meanic mirroring (KC1-MnProx., KC2-MnProx., MnProx., MnDev.) passed the normality test under low and extreme outliers, except when the mean tends close to zero due to negative outliers. For StDev., the estimates failed the normality test in all the examined cases of contamination except one. For the coefficient of variation, the normality test gave inconsistent results: it failed with low contaminations and passed with higher contaminations.

Robustness (invariance) to scale
The scale invariance (scale robustness) of statistical mirroring has already been shown by its properties stated in this paper. Here, the focus is on comparing the scale robustness of the coefficient of variation and of statistical meanic mirroring in two cases: a) positive and negative scaling, and b) a zero mean.
From the properties stated in this paper, statistical mirroring is invariant to positive and negative scaling of a subset from natural numbers. It is therefore independent of whether the mean is positive or negative, and all results are the same with respect to scaling. where , , = ℝ; ≠ 0. This means it satisfies the following network of relationships:

The coefficient of variation (CV), however, is invariant to either positive or negative scaling of a subset from natural numbers, but not to both. Therefore, the invariance of the CV depends on whether the mean is positive or negative, and its estimates are not the same with respect to scaling.
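The sign dependence of the CV under scaling follows from a short derivation. As an assumption for this sketch, the CV is taken as the standard deviation s divided by the mean (with no absolute value), and k denotes a hypothetical non-zero scaling constant:

```latex
\mathrm{CV}(kx)
  = \frac{s_{kx}}{\bar{x}_{kx}}
  = \frac{|k|\, s_x}{k\, \bar{x}}
  = \operatorname{sgn}(k)\,\mathrm{CV}(x),
  \qquad k \neq 0,\ \bar{x} \neq 0 .
```

Under this definition, the magnitude of the CV is unchanged by any non-zero scaling, but its sign flips when k is negative.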
In the case of a zero mean from the set of integers, or of a constant zero value, the coefficient of variation (CV) functionally breaks down but statistical meanic mirroring does not. The functional breakdown of the CV results from a zero denominator (i.e., a zero mean), which never occurs with statistical mirroring except in the case of a uniform zero set of numbers. Even in this case, any small amount of optinalytic normalization can resolve it.
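The breakdown of the CV at a zero mean, and its instability near zero, can be seen in a few lines of Python. The function name and the example values are hypothetical, and returning NaN for a zero mean is a choice made for this sketch; statistical mirroring is not implemented here.

```python
import math
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean; NaN when the mean is zero."""
    mean = statistics.fmean(values)
    if mean == 0:
        return math.nan  # zero denominator: the ratio is undefined
    return statistics.stdev(values) / mean

zero_mean = [-2, -1, 0, 1, 2]        # mean is exactly zero
near_zero = [-2, -1, 0, 1, 2.01]     # mean is slightly above zero
away_from_zero = [8, 9, 10, 11, 12]  # same spread, mean of 10

print(coefficient_of_variation(zero_mean))       # undefined (NaN)
print(coefficient_of_variation(near_zero))       # very large
print(coefficient_of_variation(away_from_zero))  # moderate
```

The three lists have almost the same spread, so the differences in the printed values come entirely from the position of the mean.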

Robustness (invariance) to sample size
The sample size invariance (robustness to sample size) of statistical mirroring has already been shown by its properties stated in this paper. Here, an example numerical problem (i.e., a measurement of the relative diversity of a certain attribute) is provided (Table 3) to compare the impact of sample size on the estimators of dispersion. The results in Table 3 show that, despite an identical central tendency and score distribution, none of the reference estimators (variance, standard deviation, coefficient of variation) is robust to sample size, whereas statistical meanic mirroring (MnProx. and MnDev.) is.
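The effect of sample size on the reference estimators can be reproduced with a small Python sketch. The scores below are hypothetical and are not the values of Table 3, and the sample (n - 1) formulas are assumed. Repeating the same scores keeps the mean and the score distribution identical while changing only the sample size.

```python
import numpy as np

def dispersion_summary(values):
    """Sample-based (n - 1) variance, standard deviation and CV of a 1-D dataset."""
    x = np.asarray(values, dtype=float)
    sd = x.std(ddof=1)
    return {"n": x.size, "mean": x.mean(), "variance": sd ** 2,
            "sd": sd, "cv": sd / x.mean()}

scores = [2, 4, 4, 6, 9]  # hypothetical scores, not the values of Table 3
for repeats in (1, 2, 10):
    summary = dispersion_summary(np.tile(scores, repeats))
    print({key: round(float(value), 4) for key, value in summary.items()})
```

The mean stays the same in every row, while the variance, standard deviation, and CV change with the number of repeats.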
In sociology and economics, the use of the coefficient of variation to estimate a demographic diversity index has been a potential problem for the comparison of groups with different sample sizes [11]. To make groups of differing sample size comparable, [13] created a corrected version of the coefficient of variation that resists sample size variation. Anthur & Kevin [11] reported that 29 out of 36 articles on work group diversity published from 1984 to 1999 used the uncorrected coefficient of variation as an index of diversity. The simplest way to deal with this problem is to use statistical meanic mirroring, which is robust and unbiased with respect to sample size.

Efficiency and relative efficiency of the estimators
Efficiency and relative efficiency were used to evaluate the goodness of statistical meanic mirroring (KC1-MnProx., KC2-MnProx., MnProx., MnDev.) and of some gold standard estimators of dispersion around the mean. A total of 1000 artificial datasets from a normal distribution with μ = 10; σ = 1, 2, 3, …, 15; n = 10, 50, 100, 200, 500 were used. The standardized variance of the estimates over the 1000 random datasets expresses the efficiency. Tables 4 and 5 show how efficient the estimators are. The results show that the estimators of statistical meanic mirroring (KC1-MnProx., KC2-MnProx., MnProx., MnDev.) are more efficient than, or as efficient as, the standard deviation and coefficient of variation. Like the efficiency, the relative efficiency of all the estimators decreases with a higher spread of the normal distribution, and most of the estimators of statistical meanic mirroring remain the most efficient. However, this superior efficiency declines or is lost as the spread of the normal distribution gets wider. The loss is due to the inconsistent and increased variance of the mean as the spread gets larger (Figure 3 and Appendix C1). Thus, this loss of efficiency is not a weakness of the estimators per se, but a weakness of the simulator. That is why only the two relative estimators of dispersion are affected: they depend strongly on the mean of the distribution.
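The role of the mean in the efficiency loss of a relative estimator can be illustrated by simulation. In the Python sketch below, the standardization (variance of the estimates divided by their squared mean) is an assumption standing in for equations (18) and (19), which are not reproduced in this text; the seed and names are also hypothetical, and only the standard deviation and CV are compared.

```python
import numpy as np

def standardized_variance(estimates):
    """Variance of the estimates divided by their squared mean (assumed standardization)."""
    e = np.asarray(estimates, dtype=float)
    return e.var(ddof=1) / e.mean() ** 2

rng = np.random.default_rng(2)  # arbitrary seed, for reproducibility only
mu, n, n_datasets = 10, 50, 1000
results = {}
for sigma in (1, 5, 15):
    x = rng.normal(mu, sigma, size=(n_datasets, n))
    sd = x.std(axis=1, ddof=1)    # absolute estimator, one value per dataset
    cv = sd / x.mean(axis=1)      # relative estimator, depends on each sample mean
    results[sigma] = (standardized_variance(sd), standardized_variance(cv))
    print(sigma, results[sigma], results[sigma][0] / results[sigma][1])
```

In this sketch the standardized variance of the standard deviation stays roughly constant across σ, while that of the CV grows with σ because the sample mean in its denominator becomes more variable.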

Drawback and limitations of statistical meanic mirroring
The following are some of the identified drawbacks and limitations of statistical meanic mirroring:
i. Anonymity is not respected. The estimate depends on the ordering of the list of observations, and an incorrect ordering gives inaccurate results.
ii. Under the Gaussian distribution, it is more sensitive to contamination when the mean is close to zero than when it is away from zero.
iii. It is inappropriate below the ratio level of scale.

Conclusion
Statistical mirroring (specifically, statistical meanic mirroring) is, under the Gaussian distribution, a suitable alternative estimator of dispersion that is less biased by (more resistant to) contamination, robust to scale and sample size, and more efficient under random sampling than some reference standard estimators. However, the limitations of statistical mirroring include: a) reliance on the ordering of the list of observations, b) greater sensitivity to contamination as the mean (of a normal distribution) tends close to zero, and c) unsuitability below the ratio level of scale.

Recommendation
In this paper, not all the proposed estimators of statistical dispersion were compared for statistical goodness or evaluated for suitability and application. It is therefore recommended that other estimators of dispersion around defined location estimates or points (e.g., statistical medianic, maximalic, minimalic, modalic, and rangic mirroring) be explored for possible application and comparison.
The suitability and statistical goodness of statistical mirroring should also be checked for other distributions such as Poisson, uniform, binomial, chi-square, Bernoulli, patterned, and discrete distributions. In addition, the performance of the estimators with real datasets should be evaluated.

Supplementary material:
The attached supplementary files (S1-S7) are customized Excel sheets.

Conflict of interest:
The author declares no conflict of interest.
Funding:
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.