Preprint
Article

This version is not peer-reviewed.

Negative Indicators and Ordering Stability in Exploratory Factor Analysis: A Sign-Orientation Theory with Reproducible Simulation Evidence

Submitted: 02 June 2026

Posted: 08 June 2026


Abstract
Exploratory factor analysis (EFA) and first-component scoring are widely used to rank samples in multi-indicator evaluation. A recurring empirical observation is that untreated negative indicators can make rankings appear unchanged when they are rare, unstable when positive and negative indicators are balanced, and reversed when negative indicators dominate. This paper gives a mathematical explanation of this phenomenon and separates two mechanisms that are often conflated. First, in a one-factor model, re-extracting the leading factor after a coordinate sign flip is invariant up to the unavoidable global sign of the factor; therefore, the population eigenstructure itself is not the source of the phase transition. Second, when scores are compared in a fixed semantic direction, or when EFA regression scores are published after an anchor-based sign-orientation rule, the common factor signal is partially cancelled by untreated negative indicators. We derive closed-form correlations, Kendall ordering probabilities, weighted thresholds for heterogeneous loadings, and a finite-sample critical band of order $n^{-1/2}$. Reproducible simulations verify the theory: a 200-repetition fixed-direction simulation shows a transition from correlation near $+1$ to near $0$ and then to near $-1$; EFA regression scores oriented by an anchor statistic switch direction at the predicted information-balance threshold; and a dedicated local-alternative experiment with $\Delta_I=c/\sqrt{n}$ confirms the predicted flip-probability scaling. The results justify mandatory same-direction preprocessing of negative indicators and recommend reporting weighted sign-balance diagnostics before ranking.

1. Introduction

Multi-indicator evaluation aims to rank observational units by a latent level that is not directly observed, such as organizational performance, medical quality, regional development, consumption quality, risk exposure, or a psychological trait. Exploratory factor analysis (EFA), principal component analysis (PCA), and first-factor scoring are common tools in this setting because they reduce correlated indicators to a smaller set of latent dimensions and provide numerical scores for comparison [1,2,7,14,21]. Recent methodological reviews emphasize that EFA is not a mechanical dimension-reduction routine: researchers must justify factor extraction, factor retention, rotation, loading interpretation, and score reporting decisions, especially when factor scores are later used as substantive variables or ranking indices [8,12,13,15,23,25,27,28,30,31,32,33,39].
The practical setting studied here has an additional semantic layer. Some indicators are positive indicators, for which larger values imply a higher latent level, while others are negative indicators, for which larger values imply a lower latent level. Examples include cost, risk, mortality, error rate, delay, pollution, and negatively worded or reverse-keyed questionnaire items. Applied researchers usually transform negative indicators before EFA so that all variables share the same semantic direction. In Chinese multi-indicator evaluation this operation is often called same-tendency processing, while in psychometrics related operations are discussed under reverse coding, reverse keying, or wording effects [19,29,35,36]. The point is not merely presentational. A factor score can be algebraically well defined but semantically wrong if variables pointing in opposite substantive directions are added into the same scoring direction.
A persistent empirical pattern is observed when negative indicators are omitted from this harmonization step or are treated inconsistently. If negative indicators are rare, the ranking produced by a first-factor score is often almost unchanged. If the number or information content of negative indicators is close to that of positive indicators, the ranking becomes unstable or appears nearly random. If negative indicators dominate, the resulting ranking can be almost the reverse of the desired ranking. Related problems have been documented in research on negatively keyed items, mixed-format scales, acquiescence, careless responding, and wording method factors [10,11,16,18,20,22,24,26,41,43]. Recent studies further show that reverse-keyed items can distort factor structure through item polarity, item extremity, linguistic ambiguity, and respondent inattention [37,42,45,46]. In Chinese measurement contexts, wording effects have also been reported when re-examining the dimensionality of translated or culturally adapted scales [38].

1.1. Theoretical Background

The theoretical difficulty begins with the identification of factor-analysis models. In a one-factor covariance model, changing the sign of all loadings and the sign of the corresponding factor gives the same fitted covariance matrix. Thus, the sign of a factor is not identified by the covariance model alone; it must be supplied by a convention, a reference indicator, a theoretical anchor, or a post-estimation orientation rule [4,9,17,44]. This sign indeterminacy is harmless when a study only reports a covariance fit, but it becomes consequential when factor scores are used to rank samples, compare institutions, trigger warnings, or communicate a direction such as “better”, “higher quality”, or “greater risk”.
The second theoretical layer is the distinction between coordinate sign changes and semantic sign changes. A coordinate sign change multiplies selected observed variables by $-1$. At the covariance-matrix level this is an orthogonal similarity transformation: if $D$ is a diagonal sign matrix, then $D \Sigma D$ has the same eigenvalues as $\Sigma$. Therefore, a re-extracted leading factor in an ideal one-factor model is not intrinsically destabilized simply because half of the variables have opposite signs. The leading vector is transformed by $D$, and the score is preserved after consistent sign alignment. The semantic problem appears when the analyst keeps a fixed positive scoring direction, or when software returns an arbitrary factor sign and the analyst orients it by an anchor statistic that is itself affected by untreated negative indicators.
The third layer concerns score determinacy and ordering. Factor scores are estimates or projections, not directly observed latent variables. Their quality depends on the measurement model, communalities, residual variance, sample size, and the scoring method [4,17,28]. For ranking applications, however, the central question is not only whether a factor score is determinate, but also whether it preserves the pairwise ordering induced by the intended positive latent direction. A score with high absolute correlation can still publish a reversed ranking if its sign is oriented incorrectly. Conversely, a score near the sign-cancellation threshold can have low ordering information even when the underlying covariance matrix has a clear leading eigenspace.

1.2. Research Gap

Existing work provides strong practical warnings: EFA decisions must be reported carefully; reverse-keyed or negatively oriented items can generate method factors; careless responding and acquiescence can damage factor recovery; and loading signs require explicit conventions [25,31,40,44]. Nevertheless, three gaps remain. First, the literature often mixes together three workflows: re-extracting the factor after a coordinate sign transformation, scoring transformed data with a fixed semantic direction, and publishing EFA regression scores after an anchor-based sign-orientation rule. These workflows have different invariance properties and should not be interpreted as the same phenomenon. Second, the empirical terms “unstable”, “random”, or “chaotic” are rarely connected to explicit ranking probabilities. Third, for heterogeneous indicators the critical threshold is often described by a raw count of negative indicators, although high-loading or low-noise negative indicators should clearly carry more information than weak indicators.
This paper addresses these gaps by giving a sign-orientation theory for negative indicators in one-factor EFA/PCA-style scoring. The theory separates population eigenspace invariance from fixed-direction signal cancellation, converts score correlation into Kendall ordering consequences, and derives a weighted information threshold for heterogeneous indicators. It also analyzes finite-sample anchor orientation, showing that the practically unstable region has width of order $n^{-1/2}$ around the signed information-balance threshold.
The broader motivation is not limited to psychometrics. In applied data-intelligence settings, model outputs are often converted into operational warnings, recognition decisions, or design quantification indices, so feature direction and score interpretation must remain stable across preprocessing pipelines [34,47,48,49]. Financial early-warning models are a direct example: positive indicators of resilience and negative indicators of risk must be directionally harmonized before a composite warning score can be interpreted [47].
This paper provides a unified theory of the empirical three-regime behavior. The contribution is fourfold:
  • It proves that sign-flipped one-factor covariance and correlation matrices have invariant eigenvalues and transformed eigenvectors. Thus, the leading eigenspace is not intrinsically unstable merely because positive and negative indicators are balanced.
  • It derives closed-form correlations between the desired all-positive score and an unharmonized fixed-direction score. In the equal-loading case, the threshold is the negative-indicator fraction $k/p = 1/2$; in heterogeneous cases, the threshold is a signed loading-energy balance rather than a simple count.
  • It converts score correlation into ranking consequences using the bivariate normal formula for Kendall’s tau and pairwise disagreement, giving a formal interpretation of the empirical term “chaos” as approximately random pairwise ordering.
  • It analyzes EFA regression scores under a sign-orientation rule and shows that published ranking direction is governed by a weighted information difference $\Delta_I = I_+ - I_-$. A finite-sample normal approximation yields a critical band of width $O(n^{-1/2})$ around $\Delta_I = 0$.
The remainder of the paper is organized as follows. Section 2 defines the model, the sign-flip operator, score types, and ranking metrics. Section 3 gives the main theoretical results. Section 4 describes the reproducible simulation protocols. Section 5 reports numerical evidence. Section 6 discusses interpretation, limitations, and practice. Section 7 concludes.

2. Model and Methods

This section formalizes the three layers used throughout the paper: the measurement model, the coordinate sign operation, and the score-orientation rule. The notation deliberately distinguishes an algebraic sign flip from a substantive reversal of meaning. This distinction is necessary because factor analysis is invariant to certain sign transformations, whereas a published ranking is not invariant once a researcher declares that larger scores mean a better, safer, healthier, or otherwise more desirable state.

2.1. One-Factor Measurement Model

Let $x = (x_1, \ldots, x_p)^T$ denote $p$ standardized indicators. The baseline one-factor model is
$$x = a f + \varepsilon, \tag{1}$$
where $f \sim N(0, 1)$ is the latent evaluation factor, $a = (a_1, \ldots, a_p)^T$ is a loading vector with $a_i > 0$, and $\varepsilon \sim N(0, \psi I_p)$ is independent idiosyncratic noise. The covariance matrix is
$$\Sigma = a a^T + \psi I_p. \tag{2}$$
The leading population eigenvector is
$$v = \frac{a}{\|a\|}, \tag{3}$$
with leading eigenvalue $\|a\|^2 + \psi$ and remaining eigenvalues equal to $\psi$.
A sign-flip matrix is a diagonal matrix
$$D = \operatorname{diag}(d_1, \ldots, d_p), \quad d_i \in \{-1, +1\}. \tag{4}$$
The value $d_i = -1$ represents an indicator whose observed scale is opposite to the positive semantic direction. The unharmonized observation is
$$x^{(D)} = D x, \tag{5}$$
with covariance
$$\Sigma^{(D)} = D \Sigma D = (D a)(D a)^T + \psi I_p. \tag{6}$$
In the equal-loading case, $a_i \equiv 1$ and the standardized correlation matrix has the equicorrelation form
$$R = (1 - \rho) I_p + \rho J_p, \quad \rho = \frac{1}{1 + \psi}, \tag{7}$$
where $J_p$ is the all-one matrix.

2.2. Three Score Definitions

The first score is the desired all-positive semantic score,
$$s = v^T x. \tag{8}$$
The second score is the fixed-direction unharmonized score,
$$t^{(D)} = v^T D x. \tag{9}$$
This is the score obtained when the original positive loading direction $v$ is kept fixed but the variables contain untreated negative indicators.
The third score is a re-extracted score. Let $u^{(D)}$ be the leading eigenvector of $\Sigma^{(D)}$. Then
$$s_{\mathrm{re}}^{(D)} = (u^{(D)})^T x^{(D)}. \tag{10}$$
This score is included because it clarifies a common misconception: in the ideal one-factor population model, re-extraction cancels the coordinate sign flip up to the arbitrary global sign of the factor.

2.3. EFA Regression Scores and Sign Orientation

For a one-factor EFA model with loading vector $\lambda$, the regression or Thomson factor score has the form [3,17,21]
$$\hat{f}_{\mathrm{raw}} = \beta^T x, \quad \beta = \Sigma_{xx}^{-1} \lambda, \tag{11}$$
where $\Sigma_{xx} = \lambda \lambda^T + \Psi$. Since a one-factor loading vector is only identified up to a global sign, $\lambda$ and $-\lambda$ represent the same covariance model. Applied work therefore often uses a sign-orientation rule before publishing a score. We write such a rule as
$$T(\lambda) = w^T \lambda, \quad O(\lambda) = \operatorname{sign}\{T(\lambda)\}, \quad w_i > 0, \tag{12}$$
and publish
$$\hat{f} = O(\hat{\lambda}) \, \hat{f}_{\mathrm{raw}}. \tag{13}$$
The vector $w$ can represent equal weights, theoretically selected anchor items, or inverse-noise information weights.
To connect this rule to positive and negative indicators, let the external positive latent variable be $\theta$ and write
$$x_i = s_i a_i \theta + \varepsilon_i, \quad s_i \in \{+1, -1\}, \quad a_i > 0. \tag{14}$$
Let $P = \{i : s_i = +1\}$ and $N = \{i : s_i = -1\}$. Define
$$I_+ = \sum_{i \in P} w_i a_i, \quad I_- = \sum_{i \in N} w_i a_i, \quad \Delta_I = I_+ - I_-. \tag{15}$$
The quantity $\Delta_I$ is the signed information balance that determines the published direction in the EFA regression-score workflow.
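The orientation rule and the signed information balance can be illustrated numerically. The following is a minimal population-level sketch, assuming NumPy; the loadings, weights, and sign pattern are illustrative values rather than settings from the paper's experiments, and `information_balance` and `orient` are hypothetical helper names.

```python
import numpy as np

def information_balance(a, s, w):
    """Return (I_plus, I_minus, Delta_I) for loadings a > 0, signs s in {+1, -1}, weights w > 0."""
    a, s, w = (np.asarray(v, dtype=float) for v in (a, s, w))
    i_plus = float(np.sum(w[s > 0] * a[s > 0]))
    i_minus = float(np.sum(w[s < 0] * a[s < 0]))
    return i_plus, i_minus, i_plus - i_minus

def orient(lam, w):
    """Orientation rule O(lambda) = sign(w^T lambda); gives 0 only at exact balance."""
    return int(np.sign(np.dot(w, lam)))

# Illustrative values (assumptions): 6 indicators, the last 4 are untreated negative indicators.
a = np.full(6, 0.7)
s = np.array([+1, +1, -1, -1, -1, -1])
w = np.ones(6)

lam = s * a  # signed loadings lambda_i = s_i * a_i
i_plus, i_minus, delta_i = information_balance(a, s, w)

# Software may return lam or -lam; the rule maps both to the same published loading vector.
published_from_plus = orient(lam, w) * lam
published_from_minus = orient(-lam, w) * (-lam)
```

In this example the negative indicators carry more weighted loading than the positive ones, so `delta_i` is negative and the published direction is the reverse of the intended semantic direction, whichever sign the software happened to return.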

2.4. Ranking Metrics

For two score vectors $s_1, \ldots, s_N$ and $t_1, \ldots, t_N$, ranking disagreement is measured by
$$\hat{\Delta} = \frac{1}{\binom{N}{2}} \sum_{1 \le i < j \le N} \mathbf{1}\{(s_i - s_j)(t_i - t_j) < 0\}. \tag{16}$$
We also report Pearson correlation, Spearman rank correlation, and Kendall’s tau. For bivariate normal scores with Pearson correlation $r$, Kendall’s tau is $2 \pi^{-1} \arcsin(r)$ and the pairwise disagreement probability is $1/2 - \pi^{-1} \arcsin(r)$ [5].
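A minimal sketch of these ranking metrics, assuming NumPy. The function names are hypothetical, and the empirical disagreement uses an $O(N^2)$ pairwise comparison that is adequate for sample sizes of a few hundred.

```python
import numpy as np

def pairwise_disagreement(s, t):
    """Fraction of pairs i < j that s and t order in opposite directions (ties count as agreement)."""
    s = np.asarray(s, dtype=float)
    t = np.asarray(t, dtype=float)
    ds = s[:, None] - s[None, :]
    dt = t[:, None] - t[None, :]
    upper = np.triu_indices(len(s), k=1)  # each unordered pair once
    return float(np.mean((ds * dt)[upper] < 0))

def kendall_tau_from_r(r):
    """Kendall's tau implied by Pearson correlation r under bivariate normality."""
    return 2.0 / np.pi * np.arcsin(r)

def disagreement_from_r(r):
    """Pairwise disagreement probability implied by r under bivariate normality."""
    return 0.5 - np.arcsin(r) / np.pi
```

For example, `pairwise_disagreement([1, 2, 3, 4], [1, 3, 2, 4])` counts one discordant pair out of six. Without ties, the two formula-based quantities are linked by $\tau = 1 - 2\Delta$.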

3. Theoretical Results

The results below are organized to separate two mechanisms that are easily confused. The first mechanism is covariance-model invariance: a diagonal sign matrix only changes coordinates and therefore preserves the population spectrum. The second mechanism is semantic scoring: once a fixed positive direction or an anchor orientation is imposed, untreated negative indicators can cancel the common-factor signal. The same data transformation can therefore look harmless in an eigenspace analysis but harmful in a ranking analysis.

3.1. Re-Extraction Invariance

Theorem 1 (Population re-extraction invariance). Let $\Sigma = a a^T + \psi I_p$ with $a \neq 0$ and $\psi > 0$, and let $D$ be any diagonal sign matrix. Then $\Sigma^{(D)} = D \Sigma D$ has the same eigenvalues as $\Sigma$. If $v = a / \|a\|$ is the leading eigenvector of $\Sigma$, then $D v$ is a leading eigenvector of $\Sigma^{(D)}$. Consequently,
$$(D v)^T (D x) = v^T x = s. \tag{17}$$
Thus, if the leading direction is re-extracted and sign-aligned after the same coordinate sign transformation, the score is invariant at the population level up to the usual global factor-sign ambiguity.
This result is essential for correct interpretation. The empirical transition studied in this paper is not a claim that the ideal equicorrelation matrix $D R D$ has a smaller eigenvalue gap near $k = p/2$. It does not; $D R D$ is orthogonally similar to $R$. The transition occurs when scores are compared in a fixed semantic direction or when a sign-orientation rule maps an intrinsically sign-indeterminate EFA solution to a published direction.
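The invariance can be checked numerically. The sketch below assumes NumPy and uses arbitrary illustrative loadings and an arbitrary sign pattern; it works with the population covariance, so it verifies the algebra of Theorem 1 rather than any finite-sample estimation property.

```python
import numpy as np

rng = np.random.default_rng(0)
p, psi = 8, 0.5
a = rng.uniform(0.5, 1.5, size=p)             # positive loadings (illustrative)
d = np.array([1, -1, 1, -1, -1, 1, -1, -1])   # arbitrary sign pattern
D = np.diag(d)

Sigma = np.outer(a, a) + psi * np.eye(p)
Sigma_D = D @ Sigma @ D

# Eigenvalues are returned in ascending order for both matrices.
eig = np.linalg.eigvalsh(Sigma)
eig_D = np.linalg.eigvalsh(Sigma_D)

# Leading eigenvector of the flipped covariance, sign-aligned with D v.
v = a / np.linalg.norm(a)
u = np.linalg.eigh(Sigma_D)[1][:, -1]
u = u * np.sign(u @ (D @ v))

# Scores on simulated data (rows are samples).
n = 500
f = rng.standard_normal(n)
x = np.outer(f, a) + np.sqrt(psi) * rng.standard_normal((n, p))
s = x @ v            # desired all-positive score
s_re = (x @ D) @ u   # re-extracted score on the flipped data
t = (x @ D) @ v      # fixed-direction score on the flipped data
```

Here `eig` and `eig_D` agree and `s_re` reproduces `s` to numerical precision, whereas `t` generally differs from `s`; only the fixed-direction score is affected by the flip.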

3.2. Fixed-Direction Correlation

Theorem 2 (Fixed-direction score correlation). Assume the one-factor model in Equation (1). Let $s = v^T x$ and $t^{(D)} = v^T D x$, where $v = a / \|a\|$. Define
$$A = \|a\|^2, \quad M(D) = a^T D a = \sum_{i=1}^{p} d_i a_i^2. \tag{18}$$
Then
$$r(D) = \operatorname{Corr}\{s, t^{(D)}\} = \frac{M(D) \, (1 + \psi / A)}{\sqrt{(A + \psi)\,(M(D)^2 / A + \psi)}}. \tag{19}$$
The sign of $r(D)$ is the sign of $M(D)$. Therefore the critical region is not necessarily where the number of negative indicators equals the number of positive indicators. It is where the signed loading energy nearly cancels:
$$\sum_{i : d_i = +1} a_i^2 \approx \sum_{i : d_i = -1} a_i^2. \tag{20}$$
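Corollary 1 (Equal-loading threshold). If $a_i \equiv 1$, $D$ flips exactly $k$ coordinates, and $m = p - 2k$, then
$$r(D) = \frac{m \, (1 + \psi / p)}{\sqrt{(p + \psi)\,(m^2 / p + \psi)}}. \tag{21}$$
If $p \to \infty$, $k/p \to \alpha$, and $\psi$ remains bounded, then
$$r(D) \to \begin{cases} +1, & \alpha < 1/2, \\ 0, & \alpha = 1/2, \\ -1, & \alpha > 1/2. \end{cases} \tag{22}$$
The middle case is understood at exact balance $m = 0$ or, more generally, when $m = o(\sqrt{p})$ with $\psi$ bounded away from zero: for large $p$, Equation (21) behaves like $m / \sqrt{m^2 + p \psi}$, so an imbalance of order $\sqrt{p}$ leaves a nonvanishing correlation.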
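The closed-form correlation is straightforward to evaluate. The sketch below, assuming NumPy, implements the Theorem 2 expression and evaluates the equal-loading case on a small grid; `fixed_direction_corr` and `equal_loading_corr` are hypothetical names, and $p = 100$, $\psi = 1$ are illustrative choices.

```python
import numpy as np

def fixed_direction_corr(a, d, psi):
    """Closed-form Corr{s, t^(D)} for loadings a, signs d in {+1, -1}, noise variance psi."""
    a = np.asarray(a, dtype=float)
    d = np.asarray(d, dtype=float)
    A = np.sum(a**2)          # total loading energy
    M = np.sum(d * a**2)      # signed loading energy
    return M * (1 + psi / A) / np.sqrt((A + psi) * (M**2 / A + psi))

def equal_loading_corr(p, k, psi):
    """Equal-loading case: the first k of p unit-loading coordinates are flipped."""
    d = np.ones(p)
    d[:k] = -1
    return fixed_direction_corr(np.ones(p), d, psi)

p, psi = 100, 1.0
r_values = {k: equal_loading_corr(p, k, psi) for k in (0, 10, 45, 50, 55, 90, 100)}
```

With these settings the correlation is exactly $+1$, $0$, and $-1$ at $k = 0$, $50$, and $100$, and it is antisymmetric around $k = p/2$. At finite $p$ the transition is not a step: a modest imbalance such as $k = 45$ still yields a clearly positive correlation.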

3.3. Ranking Consequences

Theorem 3 (Pairwise ranking disagreement). Let $(s, t)$ be zero-mean bivariate normal scores with Pearson correlation $r$. For two independent samples $i$ and $j$,
$$\Delta(r) = \Pr\{(s_i - s_j)(t_i - t_j) < 0\} = \frac{1}{2} - \frac{1}{\pi} \arcsin(r). \tag{23}$$
Hence $r = 1$ gives identical rankings, $r = 0$ gives random pairwise ordering with disagreement $1/2$, and $r = -1$ gives complete reversal.
Combining Corollary 1 with Theorem 3 formalizes the observed three-regime pattern. Rare negative indicators imply $r \approx 1$ and $\Delta \approx 0$. Balanced positive and negative signal implies $r \approx 0$ and $\Delta \approx 1/2$. Dominant negative signal implies $r \approx -1$ and $\Delta \approx 1$.
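The arcsine formula of Theorem 3 can be checked by Monte Carlo. The sketch below assumes NumPy; it draws independent pairs of bivariate normal scores with a chosen correlation (the values $0.9$, $0$, and $-0.9$ are illustrative) and compares the discordance rate with the closed form.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_disagreement(r, n_pairs=200_000):
    """Monte Carlo estimate of Pr{(s_i - s_j)(t_i - t_j) < 0} for bivariate normal (s, t)."""
    z = rng.standard_normal((4, n_pairs))
    s_i, s_j = z[0], z[1]
    # Construct t with Corr(s, t) = r from independent standard normals.
    t_i = r * z[0] + np.sqrt(1 - r**2) * z[2]
    t_j = r * z[1] + np.sqrt(1 - r**2) * z[3]
    return float(np.mean((s_i - s_j) * (t_i - t_j) < 0))

def theoretical_disagreement(r):
    return 0.5 - np.arcsin(r) / np.pi

results = {r: (simulate_disagreement(r), theoretical_disagreement(r))
           for r in (0.9, 0.0, -0.9)}
```

Each entry of `results` pairs the simulated rate with the theoretical value; the three correlations correspond to the stable, near-random, and reversed regimes.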

3.4. EFA Regression Scores with Anchor-Based Sign Orientation

Theorem 4 (Regression kernel and published-score direction). Consider the semantic one-factor model in Equation (14). Given a fixed oriented loading vector $\lambda$, the raw regression factor score satisfies
$$E(\hat{f}_{\mathrm{raw}} \mid \theta) = \kappa \theta, \quad \kappa = \lambda^T \Sigma_{xx}^{-1} \lambda > 0. \tag{24}$$
Therefore the regression kernel itself has positive gain for the chosen factor direction. If a sign-orientation rule in Equation (12) is used and $\hat{\lambda} \to \pm \lambda$ with $\lambda_i = s_i a_i$, then the published score has asymptotic direction
$$E(\hat{f} \mid \theta) \propto \operatorname{sign}(\Delta_I) \, \theta, \tag{25}$$
where $\Delta_I$ is defined in Equation (15). Thus the published ordering is positive when $\Delta_I > 0$, critical when $\Delta_I = 0$, and reversed when $\Delta_I < 0$.
For homogeneous items with $a_i \equiv a$ and $w_i \equiv w$, Equation (15) gives
$$\Delta_I = w a (m_+ - m_-) = w a p (1 - 2\rho), \tag{26}$$
where $\rho = m_- / p$ is the negative-item fraction. The threshold is again $\rho = 1/2$. For heterogeneous loadings and variances, a natural information weight is $w_i \propto a_i / \psi_i$, giving
$$\Delta_I \propto \sum_{i \in P} \frac{a_i^2}{\psi_i} - \sum_{i \in N} \frac{a_i^2}{\psi_i}. \tag{27}$$
Define
$$\rho_w = \frac{I_-}{I_+ + I_-}. \tag{28}$$
Then $\rho_w < 1/2$, $\rho_w = 1/2$, and $\rho_w > 1/2$ correspond respectively to positive, critical, and reversed published directions. This explains why counting negative indicators alone can be misleading when their loadings or residual variances differ.
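The gap between the raw count and the weighted ratio can be made concrete. The sketch below assumes NumPy and the information weights $w_i = a_i / \psi_i$; the loadings and residual variances are hypothetical values chosen for illustration, not values from the paper's simulations.

```python
import numpy as np

def weighted_negative_ratio(a, psi, s):
    """rho_w = I_- / (I_+ + I_-) with information weights w_i = a_i / psi_i."""
    a, psi, s = (np.asarray(v, dtype=float) for v in (a, psi, s))
    info = a**2 / psi                 # w_i * a_i with w_i = a_i / psi_i
    i_plus = info[s > 0].sum()
    i_minus = info[s < 0].sum()
    return float(i_minus / (i_plus + i_minus))

# Hypothetical scale: 2 strong, low-noise negative items and 6 weak positive items.
a = np.array([0.9, 0.9, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4])
psi = np.array([0.19, 0.19, 0.84, 0.84, 0.84, 0.84, 0.84, 0.84])
s = np.array([-1, -1, +1, +1, +1, +1, +1, +1])

raw_fraction = float(np.mean(s < 0))   # count-based negative fraction
rho_w = weighted_negative_ratio(a, psi, s)
```

In this example only a quarter of the items are negative, yet `rho_w` is well above $1/2$, so the anchor rule would publish a reversed direction.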

3.5. Finite-Sample Critical Band

Proposition 1 (Critical band for sign orientation). Let $S_n = T(\hat{\lambda})$ be the empirical anchor statistic and suppose
$$\sqrt{n} \, \{S_n - \Delta_I\} \xrightarrow{d} N(0, \tau^2). \tag{29}$$
Then the sign-error probability is approximately
$$P_{\mathrm{flip}} = \Pr\{\operatorname{sign}(S_n) \neq \operatorname{sign}(\Delta_I)\} \approx \Phi\!\left(-\frac{\sqrt{n} \, |\Delta_I|}{\tau}\right), \tag{30}$$
with exponential tail behavior of the form
$$P_{\mathrm{flip}} \lesssim \exp\!\left(-\frac{n \Delta_I^2}{2 \tau^2}\right). \tag{31}$$
Consequently, the practically unstable region has width $|\Delta_I| = O(n^{-1/2})$.
If $p_0 > 1/2$ is the pairwise ordering accuracy when the orientation is correct, then a simple mixture representation gives
$$\pi_n = (1 - P_{\mathrm{flip}}) \, p_0 + P_{\mathrm{flip}} \, (1 - p_0) = \frac{1}{2} + (1 - 2 P_{\mathrm{flip}}) \left(p_0 - \frac{1}{2}\right).$$
When $P_{\mathrm{flip}}$ is non-negligible near the critical band, the observed ordering accuracy is pulled toward $1/2$.
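A short numerical sketch of the normal approximation and the mixture accuracy, using only the Python standard library. The inputs $\Delta_I = 0.05$, $\tau = 1$, and $p_0 = 0.9$ are illustrative assumptions, and the function names are hypothetical.

```python
import math

def norm_cdf(z):
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def flip_probability(n, delta_i, tau=1.0):
    """Normal approximation P_flip = Phi(-sqrt(n) * |Delta_I| / tau)."""
    return norm_cdf(-math.sqrt(n) * abs(delta_i) / tau)

def ordering_accuracy(p0, p_flip):
    """Mixture accuracy pi_n = (1 - P_flip) * p0 + P_flip * (1 - p0)."""
    return (1 - p_flip) * p0 + p_flip * (1 - p0)

# Illustrative inputs (assumptions): Delta_I = 0.05, tau = 1, p0 = 0.9.
table = [
    (n, flip_probability(n, 0.05), ordering_accuracy(0.9, flip_probability(n, 0.05)))
    for n in (100, 400, 1600)
]
```

With a fixed $\Delta_I$, quadrupling $n$ doubles the standardized distance $\sqrt{n}\,|\Delta_I|/\tau$, so the flip probability falls and the mixture accuracy rises toward $p_0$; at $\Delta_I = 0$ the flip probability is $1/2$ and the accuracy is $1/2$ for any $p_0$.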

4. Simulation Protocol

All simulations were run using reproducible materials accompanying this manuscript. Three complementary protocols were used.

4.1. Fixed-Direction Spiked-Covariance Simulation

The first protocol tests Theorems 2 and 3. Data were generated from Equation (1) with $p = 100$, $a = \mathbf{1}_p$, sample size $N = 200$, and 200 Monte Carlo repetitions. The pairwise indicator correlation was set to $\rho \in \{0.2, 0.5, 0.8\}$ through $\psi = (1 - \rho)/\rho$. The sign-flip count $k$ was swept over a coarse grid with a dense grid near $p/2$. For every replicate and every $k$, the study computed Pearson correlation, Spearman correlation, Kendall’s tau, and pairwise disagreement between $s = v^T x$ and $t^{(D)} = v^T D x$. An estimated-direction variant used the top eigenvector from the original sample covariance matrix and aligned its sign with the population direction. The reproducible materials include the simulation procedures, numerical summaries, and figure-generation workflow needed to reproduce these results.
A second fixed-direction experiment used heterogeneous positive loadings. The loading vector was generated from a lognormal distribution, scaled to $\|a\|^2 = p$, and coordinates were flipped in decreasing order of $a_i^2$. This tests the weighted threshold $M(D) = 0$ rather than the count threshold $k = p/2$.
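A reduced version of the equal-loading fixed-direction protocol can be sketched as follows, assuming NumPy. It is not the author's simulation code: it uses a single correlation setting, three values of $k$, Pearson correlation only, and fewer repetitions than the 200 reported above, purely to keep the example fast.

```python
import numpy as np

rng = np.random.default_rng(2)
p, N, rho = 100, 200, 0.5
psi = (1 - rho) / rho
a = np.ones(p)
v = a / np.linalg.norm(a)

def one_replicate(k):
    """Pearson correlation between s = v^T x and t = v^T D x with the first k coordinates flipped."""
    d = np.ones(p)
    d[:k] = -1
    f = rng.standard_normal(N)
    x = np.outer(f, a) + np.sqrt(psi) * rng.standard_normal((N, p))
    s = x @ v
    t = (x * d) @ v   # column-wise sign flip, then the fixed direction v
    return np.corrcoef(s, t)[0, 1]

def closed_form(k):
    """Equal-loading correlation from Corollary 1 with m = p - 2k."""
    m, A = p - 2 * k, float(p)
    return m * (1 + psi / A) / np.sqrt((A + psi) * (m**2 / A + psi))

reps = 50  # assumption: reduced from the paper's 200 repetitions
summary = {k: (float(np.mean([one_replicate(k) for _ in range(reps)])), closed_form(k))
           for k in (10, 50, 90)}
```

Each entry of `summary` pairs the Monte Carlo mean with the closed-form value, covering one point in each of the three regimes.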

4.2. EFA Regression-Score Simulation

The second protocol tests Theorem 4 and illustrates the near-threshold sensitivity described by Proposition 1 using the factor_analyzer implementation of one-factor maximum-likelihood EFA and regression scores. In the homogeneous setting, $p = 12$, $a_i = 0.65$, $\psi_i = 0.55$, and the negative-item fraction was scanned from approximately $0.08$ to $0.92$, including the balanced case. In the heterogeneous setting, $a_i$ and $\psi_i$ were sampled once, and configurations were chosen to scan the weighted negative information ratio $\rho_w$. Sample sizes were $n \in \{200, 400, 800\}$ with 60 repetitions per configuration. The EFA workflow was
$$\text{simulate data} \;\to\; \text{fit one-factor EFA} \;\to\; \text{compute regression scores} \;\to\; \text{orient by } T(\hat{\lambda}) = w^T \hat{\lambda}.$$
The reproducible materials include the implementation details and summary outputs for this EFA regression-score protocol.
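The four workflow steps can be sketched without the EFA package. The example below assumes NumPy only and deliberately swaps in principal-component loadings from the sample correlation matrix as a stand-in for the maximum-likelihood loadings used in the paper, so it illustrates the orientation step rather than reproducing the reported numbers. All function names are hypothetical, and equal anchor weights are an assumption.

```python
import numpy as np

rng = np.random.default_rng(3)

def published_score(x, w):
    """One-factor stand-in fit, regression-type scores, orientation by sign(w^T lambda_hat)."""
    z = (x - x.mean(axis=0)) / x.std(axis=0)
    R = np.corrcoef(z, rowvar=False)
    vals, vecs = np.linalg.eigh(R)
    lam_hat = np.sqrt(vals[-1]) * vecs[:, -1]   # principal-component loadings, arbitrary sign
    beta = np.linalg.solve(R, lam_hat)          # score weights R^{-1} lambda_hat
    raw = z @ beta
    return np.sign(w @ lam_hat) * raw

def kendall_tau(u, v):
    """Kendall's tau from pairwise sign agreement (O(n^2), no tie correction)."""
    du = np.sign(u[:, None] - u[None, :])
    dv = np.sign(v[:, None] - v[None, :])
    upper = np.triu_indices(len(u), k=1)
    return float(np.mean((du * dv)[upper]))

def run(n_neg, p=12, n=400, a=0.65, psi=0.55):
    """Simulate the homogeneous setting with n_neg untreated negative items; return tau vs theta."""
    s = np.ones(p)
    s[:n_neg] = -1
    theta = rng.standard_normal(n)
    x = np.outer(theta, s * a) + np.sqrt(psi) * rng.standard_normal((n, p))
    return kendall_tau(published_score(x, np.ones(p)), theta)

taus = {n_neg: run(n_neg) for n_neg in (1, 6, 11)}
```

With one negative item the published score is positively ordered against $\theta$, and with eleven it is reversed. At the balanced configuration a single replicate typically lands on one side or the other with large magnitude, and the sign varies across replicates.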

4.3. Calibrated Critical-Band Simulation

The third protocol directly tests Proposition 1. The anchor statistic was simulated under the local alternative
$$S_n = \Delta_I + \frac{\tau}{\sqrt{n}} Z, \quad Z \sim N(0, 1), \quad \Delta_I = \frac{c}{\sqrt{n}},$$
with $\tau = 1$, $n \in \{100, 200, 400, 800, 1600\}$, $c \in [0, 3]$, and 100,000 repetitions per grid point. Under this design,
$$P_{\mathrm{flip}} = \Pr(S_n < 0) = \Phi(-c),$$
and $n \Delta_I^2 = c^2$. The experiment is deliberately calibrated: it isolates the finite-sample sign-orientation mechanism and checks that flip probabilities collapse by $c = \sqrt{n} \, \Delta_I$ and decay on the exponential scale $n \Delta_I^2$.
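The calibrated design is simple enough to sketch directly, assuming NumPy. The grid below is a coarse subset of the values listed above, and `flip_rate` is a hypothetical name.

```python
import math
import numpy as np

rng = np.random.default_rng(4)

def norm_cdf(z):
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def flip_rate(n, c, tau=1.0, reps=100_000):
    """Monte Carlo Pr(S_n < 0) with S_n = Delta_I + tau / sqrt(n) * Z and Delta_I = c / sqrt(n)."""
    delta_i = c / math.sqrt(n)
    s_n = delta_i + tau / math.sqrt(n) * rng.standard_normal(reps)
    return float(np.mean(s_n < 0))

c_grid = (0.0, 1.0, 2.0)
grid = {(n, c): flip_rate(n, c) for n in (100, 400, 1600) for c in c_grid}
theory = {c: norm_cdf(-c) for c in c_grid}
```

For each $c$, the estimated rates in `grid` are the same across $n$ up to Monte Carlo error, because $n$ cancels once $\Delta_I$ is scaled as $c/\sqrt{n}$.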

5. Results

5.1. Population Geometry and Re-Extraction

Figure 1 illustrates Theorem 1. The sign-flipped correlation matrix contains negative between-block correlations, but it is exactly $D R D$. Its leading eigenvector is $D v$, not an unstable arbitrary vector. Therefore, if a workflow re-extracts the leading direction from the transformed data and aligns factor signs consistently, the score agrees with the original score. This finding is a diagnostic safeguard: if a study observes dramatic ranking changes after re-extracting a one-factor solution, the cause should be sought in sign conventions, rotations, orientation rules, additional factors, sampling variation, or departures from the ideal one-factor model.

5.2. Fixed-Direction Scores Show the Three-Regime Transition

Figure 2 reports the fixed-direction simulation. In all three correlation settings, the score correlation is close to $+1$ when $k/p$ is far below $1/2$, approaches $0$ at the balanced point, and approaches $-1$ when $k/p$ is far above $1/2$. The ranking-disagreement panel follows the predicted transformation in Equation (23): near the critical point the disagreement rate is approximately $1/2$, which means that pairwise rankings are close to random relative to the desired score.
Table 1 gives representative numerical anchors from the same experiment. At $k/p = 0.5$, Pearson correlations are essentially zero and pairwise disagreement is approximately $0.5$. At $k/p = 1$, all rankings are reversed.

5.3. Heterogeneous Loadings Shift the Threshold

Figure 3 shows that when loadings are heterogeneous, the transition is controlled by the signed loading-energy balance $\beta = M(D) / \|a\|^2$, not by the raw number of flipped indicators. The empirical Pearson and disagreement curves follow the closed-form predictions. Because high-loading coordinates were flipped first in this experiment, the transition occurs after relatively few high-energy variables are flipped; this is exactly the behavior predicted by Equation (20).

5.4. EFA Regression Scores Follow the Anchor Threshold

Figure 4 reports the EFA regression-score experiment. In the homogeneous case, Kendall’s tau between the published score and the external positive latent variable is positive when the negative fraction is below $1/2$, close to zero near the balanced configuration, and negative when the negative fraction exceeds $1/2$. Pairwise accuracy against the positive latent order is near $0.90$ in the positive stable region, near random at the balanced point, and near $0.10$ in the reversed stable region. In the heterogeneous case, the same switch is organized by $\rho_w$, the weighted negative information ratio.
Table 2 gives representative values at $n = 800$. The homogeneous simulation shows the expected positive, critical, and reversed regimes. The heterogeneous simulation shows that $\rho_w \approx 0.905$ produces a stable reversed published ranking even though the raw negative count is not the only relevant quantity.
The near-critical heterogeneous row in Table 2 is intentionally informative rather than idealized. It shows that a numerically tiny positive $\Delta_I$ can still be dominated by implementation-level sign orientation in a finite sample and produce a reversed published score. This is consistent with Proposition 1: near $\Delta_I = 0$, the orientation decision is highly sensitive, so the empirical direction can be unstable or implementation-dependent unless an external anchor is imposed.
The EFA experiment should therefore be read as practical evidence for the direction threshold and near-critical sensitivity. The calibrated tail-probability experiment below isolates the asymptotic sign-orientation statistic and directly tests the $c/\sqrt{n}$ critical-band scaling.

5.5. Critical-Band Simulation Supports the Flip-Probability Scaling

Figure 5 reports the dedicated local-alternative simulation with $\Delta_I = c/\sqrt{n}$. In panel (a), the estimated flip probabilities for $n = 100, 200, 400, 800, 1600$ collapse onto the same curve as a function of $c = \sqrt{n} \, \Delta_I$, matching the theoretical probability $\Phi(-c)$. This confirms the practical meaning of the $O(n^{-1/2})$ critical band: increasing $n$ alone does not reduce the flip probability if the signed information difference shrinks at exactly the same $1/\sqrt{n}$ rate. Panel (b) plots the same results against $n \Delta_I^2 = c^2$ on a log scale. The Monte Carlo curve follows $\log \Phi(-c)$ and is bounded on the exponential scale described by Equation (31).

6. Discussion

6.1. What the Three Regimes Mean

The common phrase “negative indicators cause chaotic factor-analysis rankings” is imprecise. The results show that the practical phenomenon has a precise sign-cancellation interpretation. If a fixed positive semantic direction is used, untreated negative indicators subtract common-factor signal from the score. When the signed signal is strongly positive, rankings remain stable. When it is near zero, the score is dominated by noise and pairwise comparisons become close to random. When it is strongly negative, the ranking is stable but reversed.
For EFA regression scores, the regression scoring kernel does not fail. Conditional on a chosen factor direction, it has positive gain. The published reversal arises because the factor direction is not intrinsically signed and because the post-estimation orientation rule can choose the direction favored by the aggregate sign of the loadings. Thus, the central object is not simply the count of negative indicators but the signed information balance $\Delta_I$.

6.2. Practical Implications

The theory gives several concrete recommendations.
  • Reverse or otherwise harmonize all negative indicators before EFA, PCA, or any fixed-weight composite scoring. This is not merely a cosmetic preprocessing step; it determines whether common-factor signal is added or cancelled.
  • Report the sign-orientation rule used for factor loadings and scores. A table of loadings is not reproducible unless the rule for choosing the global sign is stated.
  • In heterogeneous scales, report a weighted diagnostic such as $\rho_w$ or $\Delta_I$, not only the number of negative indicators. A small number of high-loading negative indicators can dominate many weak positive indicators.
  • Treat $\Delta_I \approx 0$ as a high-risk region. Near this critical band, small sampling fluctuations, software sign conventions, or anchor choices can change the published ranking direction.
  • Use external anchors when rankings carry substantive consequences. Examples include theoretically positive anchor indicators, independent validation variables, or pre-registered sign constraints.
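The first recommendation can be sketched in a few lines, assuming NumPy and indicators that are already centered, so that reversal is multiplication by $-1$ (a bounded raw scale would instead be reflected around its midpoint). The column indices, loadings, and equal-weight composite below are hypothetical illustrations.

```python
import numpy as np

def harmonize(x, negative_idx):
    """Return a copy of x with the listed negative-indicator columns multiplied by -1."""
    out = np.array(x, dtype=float, copy=True)
    out[:, list(negative_idx)] *= -1.0
    return out

rng = np.random.default_rng(5)
n, p = 300, 10
negative_idx = [6, 7, 8, 9]   # hypothetical negative indicators (e.g. cost, error rate)
signs = np.ones(p)
signs[negative_idx] = -1

theta = rng.standard_normal(n)
x = np.outer(theta, 0.7 * signs) + 0.6 * rng.standard_normal((n, p))

# Equal-weight composite before and after harmonization.
composite_raw = x.mean(axis=1)
composite_harmonized = harmonize(x, negative_idx).mean(axis=1)
corr_raw = np.corrcoef(composite_raw, theta)[0, 1]
corr_harmonized = np.corrcoef(composite_harmonized, theta)[0, 1]
```

With four of ten indicators left unharmonized, part of the common signal cancels and `corr_raw` is visibly lower than `corr_harmonized`. Applying `harmonize` twice returns the original data, which makes the step easy to audit.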

6.3. Limitations

The exact closed-form expressions are derived under a one-factor Gaussian model with isotropic residual variance for the fixed-direction theory. The weighted EFA result allows heterogeneous loadings and variances, but it still assumes a one-factor orientation problem. Multi-factor EFA adds rotations, factor permutations, cross-loadings, and factor-specific sign choices. Finite-sample eigenvector perturbation can be analyzed with standard tools such as the Davis–Kahan theorem [6], but that perturbation mechanism is separate from the population sign-cancellation threshold studied here. Negative indicators may also induce method factors or response-style effects, especially in psychological questionnaires [10,11,16,18]. Those effects are important but distinct from the sign-orientation mechanism analyzed here.
The simulation evidence is deliberately transparent and reproducible rather than exhaustive. The main fixed-direction experiment uses 200 repetitions and the calibrated critical-band experiment uses 100,000 repetitions per grid point, which is sufficient for the reported curves; however, real-data replication remains important. Future work can extend the experiments to ordinal indicators, non-Gaussian noise, oblique rotations, multi-factor structures, confirmatory factor analysis, richer loading-estimator simulations, and real benchmark datasets with known external criteria.

7. Conclusions

This paper explains why untreated negative indicators can produce stable same-direction rankings, apparently chaotic rankings, or stable reversed rankings in factor-analysis-based evaluation. The key mechanism is sign cancellation under fixed semantic scoring or anchor-based sign orientation, not an intrinsic loss of the leading population eigenspace. In the equal-loading case, the threshold is the negative-indicator fraction $1/2$. In heterogeneous cases, the threshold is a weighted information balance. In finite samples, a critical band of order $n^{-1/2}$ around zero information balance creates sensitivity to sampling fluctuation and software sign conventions.
For applied researchers, the conclusion is direct: negative indicators should be harmonized before factor analysis, the factor-score sign rule should be disclosed, and weighted sign-balance diagnostics should be checked before publishing sample rankings. These steps convert a fragile and sometimes reversed ranking workflow into a reproducible one.

Author Contributions

Conceptualization, Haitong Wei; methodology, Haitong Wei; software, Haitong Wei; validation, Haitong Wei; formal analysis, Haitong Wei; writing–original draft preparation, Haitong Wei; writing–review and editing, Haitong Wei. The author has read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The data supporting this study consist of reproducible simulation code, generated numerical summaries, and figure outputs corresponding to the fixed-direction, EFA regression-score, heterogeneous-loading, and critical-band experiments. These materials are available from the author upon reasonable request and should be provided as supplementary materials with the preprint. No individual-level human participant data are involved.

Acknowledgments

The author thanks the maintainers of the open-source scientific Python ecosystem used for the reproducible simulations.

Conflicts of Interest

The author declares no conflicts of interest.

Use of Artificial Intelligence

An AI-assisted writing tool was used to help organize the manuscript draft and improve English expression. The author reviewed, edited, and takes responsibility for all content, analysis, and conclusions.

Abbreviations

The following abbreviations are used in this manuscript:
EFA Exploratory factor analysis
PCA Principal component analysis
ML Maximum likelihood

Appendix A. Proof Details

Appendix A.1. Proof of Theorem 1

Proof. 
Since $D^{T} = D$ and $D^{2} = I_p$, the matrix $\Sigma(D) = D \Sigma D$ is orthogonally similar to $\Sigma$. Orthogonal similarity preserves eigenvalues. Moreover,
$$\Sigma(D)\,(D v) = D \Sigma D D v = D \Sigma v = D \lambda_1 v = \lambda_1 D v.$$
Thus $D v$ is a leading eigenvector of $\Sigma(D)$. The re-extracted score is
$$(D v)^{T} (D x) = v^{T} D^{T} D x = v^{T} x.$$
The only remaining ambiguity is the global sign of the eigenvector, which is unavoidable in EFA and PCA. □
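The invariance argument can be checked numerically. The following is a minimal sketch, not the simulation code used in the paper: it assumes a one-factor covariance $\Sigma = a a^{T} + \psi I_p$ (so that $v = a/\|a\|$ is the leading eigenvector with eigenvalue $\|a\|^2 + \psi$), and the loadings, $\psi$, the sign pattern, and the observation `x` are illustrative values chosen here.

```python
import math

def matvec(m, x):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in m]

def one_factor_cov(a, psi):
    """Sigma = a a^T + psi * I (assumed one-factor structure)."""
    p = len(a)
    return [[a[i] * a[j] + (psi if i == j else 0.0) for j in range(p)]
            for i in range(p)]

def sign_flip(m, d):
    """Return D m D for the diagonal sign matrix D = diag(d)."""
    p = len(d)
    return [[d[i] * m[i][j] * d[j] for j in range(p)] for i in range(p)]

# Illustrative values (assumptions, not taken from the paper).
a = [0.9, 0.7, 0.5, 0.8]
psi = 0.4
d = [1, -1, 1, -1]

sigma = one_factor_cov(a, psi)
sigma_d = sign_flip(sigma, d)

# Leading eigenpair of the one-factor Sigma: v = a / ||a||, lambda_1 = ||a||^2 + psi.
norm_a = math.sqrt(sum(ai * ai for ai in a))
v = [ai / norm_a for ai in a]
lam1 = norm_a ** 2 + psi
dv = [di * vi for di, vi in zip(d, v)]

# Check Sigma(D) (D v) = lambda_1 (D v).
lhs = matvec(sigma_d, dv)
eig_residual = max(abs(l - lam1 * x) for l, x in zip(lhs, dv))

# Check that the re-extracted score equals the original score.
x = [1.3, -0.2, 0.7, 2.1]
dx = [di * xi for di, xi in zip(d, x)]
score_original = sum(vi * xi for vi, xi in zip(v, x))
score_flipped = sum(dvi * dxi for dvi, dxi in zip(dv, dx))

print("eigen residual:", eig_residual)
print("scores:", score_original, score_flipped)
```

Both the eigen residual and the score difference should vanish up to floating-point rounding, whatever sign pattern `d` is chosen.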

Appendix A.2. Proof of Theorem 2

Proof. 
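Let $A = \|a\|^{2}$, $M = a^{T} D a$, and $v = a / \sqrt{A}$. Since $x$ is Gaussian, $(s, t(D))$ is bivariate Gaussian. Its second moments determine the correlation. First,
$$\operatorname{Var}(s) = v^{T} \Sigma v = v^{T} (a a^{T} + \psi I_p) v = A + \psi.$$
Second,
$$\begin{aligned}
\operatorname{Cov}\{s, t(D)\} &= v^{T} \Sigma D v \\
&= v^{T} a a^{T} D v + \psi\, v^{T} D v \\
&= M + \psi \frac{M}{A} = M \left(1 + \frac{\psi}{A}\right).
\end{aligned}$$
Third,
$$\begin{aligned}
\operatorname{Var}\{t(D)\} &= v^{T} D \Sigma D v \\
&= v^{T} D a a^{T} D v + \psi\, v^{T} v \\
&= \frac{M^{2}}{A} + \psi.
\end{aligned}$$
Combining these three expressions gives Equation (19). □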
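The three moments can be combined and cross-checked numerically. The sketch below is an illustration under stated assumptions rather than the paper's code: it forms the ratio $\operatorname{Cov}\{s, t(D)\} / \sqrt{\operatorname{Var}(s)\operatorname{Var}\{t(D)\}}$ from the closed-form moments above and compares it with a direct evaluation of the quadratic forms in $\Sigma = a a^{T} + \psi I_p$. The function names and the numerical values of `a`, `d`, and `psi` are hypothetical.

```python
import math

def closed_form_correlation(a, d, psi):
    """Correlation of s = v^T x and t(D) = v^T D x from the three moments."""
    A = sum(ai * ai for ai in a)                       # A = ||a||^2
    M = sum(di * ai * ai for di, ai in zip(d, a))      # M = a^T D a
    var_s = A + psi
    cov_st = M * (1.0 + psi / A)
    var_t = M * M / A + psi
    return cov_st / math.sqrt(var_s * var_t)

def direct_correlation(a, d, psi):
    """Same correlation, computed from quadratic forms in Sigma."""
    p = len(a)
    A = sum(ai * ai for ai in a)
    v = [ai / math.sqrt(A) for ai in a]
    dv = [di * vi for di, vi in zip(d, v)]
    sigma = [[a[i] * a[j] + (psi if i == j else 0.0) for j in range(p)]
             for i in range(p)]

    def quad(u, w):
        return sum(u[i] * sigma[i][j] * w[j]
                   for i in range(p) for j in range(p))

    return quad(v, dv) / math.sqrt(quad(v, v) * quad(dv, dv))

# Illustrative values (assumptions); psi must be positive.
a = [0.9, 0.7, 0.5, 0.8]
psi = 0.4
d = [1, -1, 1, -1]

r_closed = closed_form_correlation(a, d, psi)
r_direct = direct_correlation(a, d, psi)
print(r_closed, r_direct)
```

With all entries of `d` equal to $+1$ the ratio reduces to $1$, and with all entries equal to $-1$ it reduces to $-1$, which matches the two extreme regimes described in the text.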

Appendix A.3. Proof of Corollary 1

Proof. 
When $a_i \equiv 1$, $A = p$. If exactly $k$ entries of $D$ are $-1$, then
$$M(D) = \sum_{i=1}^{p} d_i = p - 2k = m.$$
Substitution into Equation (19) gives Equation (21). If $k/p \to \alpha$, then $m = p(1 - 2\alpha) + o(p)$. For $\alpha \neq 1/2$, the leading terms in numerator and denominator have the same magnitude and the sign of $1 - 2\alpha$, giving limits $+1$ and $-1$. For $\alpha = 1/2$, $m = o(p)$ and, at exact balance, $m = 0$, giving $r = 0$. □
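The limiting behaviour can be illustrated by substituting $A = p$ and $M = p - 2k$ into the moments from the proof of Theorem 2. This is a sketch under assumptions: the function name is hypothetical, and the values of `psi`, `p`, and `k` are chosen here for illustration and are not the paper's simulation settings.

```python
import math

def equal_loading_correlation(p, k, psi):
    """Correlation for unit loadings with k of p indicators sign-flipped.

    Uses A = p and M = m = p - 2k; psi must be positive so that the
    denominator stays nonzero at exact balance (m = 0).
    """
    m = p - 2 * k
    numerator = m * (1.0 + psi / p)
    denominator = math.sqrt((p + psi) * (m * m / p + psi))
    return numerator / denominator

psi = 1.0  # illustrative unique variance (assumption)

# Sweep the negative-indicator count at fixed p.
for k in (0, 10, 50, 90, 100):
    print(k / 100, equal_loading_correlation(100, k, psi))

# Fixed fraction alpha = 0.3: the correlation moves toward +1 as p grows.
r_small = equal_loading_correlation(100, 30, psi)
r_large = equal_loading_correlation(10_000, 3_000, psi)
print(r_small, r_large)
```

The sweep is antisymmetric about $k/p = 1/2$, and increasing $p$ at a fixed fraction sharpens the transition, in line with the limits stated in the corollary.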

Appendix A.4. Proof of Theorem 3

Proof. 
For two independent samples, define $\Delta s = s_i - s_j$ and $\Delta t = t_i - t_j$. The vector $(\Delta s, \Delta t)$ is bivariate normal with the same correlation $r$ as $(s, t)$. After standardization, let $(X, Y)$ be standard bivariate normal with correlation $r$. The probability of concordance is
$$\Pr(XY > 0) = \Pr(X > 0, Y > 0) + \Pr(X < 0, Y < 0).$$
The bivariate normal quadrant identity gives
$$\Pr(X > 0, Y > 0) = \frac{1}{4} + \frac{1}{2\pi} \arcsin(r).$$
By symmetry, the same probability holds for $(X < 0, Y < 0)$. Therefore
$$\Pr(XY > 0) = \frac{1}{2} + \frac{1}{\pi} \arcsin(r),$$
so the discordance probability is
$$\Delta(r) = 1 - \Pr(XY > 0) = \frac{1}{2} - \frac{1}{\pi} \arcsin(r).$$
□
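The discordance formula lends itself to a quick Monte Carlo check. The sketch below is illustrative and is not the paper's simulation code: it draws standard bivariate normal pairs with correlation $r$ and compares the empirical frequency of $XY < 0$ with $\tfrac{1}{2} - \tfrac{1}{\pi}\arcsin(r)$. The function names, the seed, the number of draws, and the grid of $r$ values are assumptions made here.

```python
import math
import random

def discordance_probability(r):
    """Theoretical probability that two samples are ordered oppositely."""
    return 0.5 - math.asin(r) / math.pi

def simulate_discordance(r, n_pairs=100_000, seed=0):
    """Empirical frequency of X * Y < 0 for correlated standard normals."""
    rng = random.Random(seed)
    scale = math.sqrt(1.0 - r * r)
    discordant = 0
    for _ in range(n_pairs):
        x = rng.gauss(0.0, 1.0)
        # Y = r X + sqrt(1 - r^2) Z has unit variance and correlation r with X.
        y = r * x + scale * rng.gauss(0.0, 1.0)
        if x * y < 0:
            discordant += 1
    return discordant / n_pairs

results = {}
for r in (-0.8, 0.0, 0.5, 0.95):
    results[r] = (discordance_probability(r), simulate_discordance(r))
    print(r, results[r])
```

Since Kendall's $\tau$ is the concordance probability minus the discordance probability, the same identity implies $\tau = 1 - 2\Delta(r) = \tfrac{2}{\pi}\arcsin(r)$ for bivariate normal scores.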

Appendix A.5. Proof of Theorem 4

Proof. 
For a fixed oriented loading vector $\lambda$, the regression score is $\hat{f}_{\mathrm{raw}} = \lambda^{T} \Sigma_{xx}^{-1} x$. Under $x = \lambda \theta + \varepsilon$ and $E(\varepsilon \mid \theta) = 0$,
$$E(\hat{f}_{\mathrm{raw}} \mid \theta) = \lambda^{T} \Sigma_{xx}^{-1} \lambda \, \theta.$$
Because $\Sigma_{xx}$ is positive definite and $\lambda \neq 0$, the quadratic form $\kappa = \lambda^{T} \Sigma_{xx}^{-1} \lambda$ is strictly positive.
Now let the population loading vector relative to the external positive trait be $\lambda_i = s_i a_i$. The anchor statistic is
$$T(\lambda) = \sum_{i} w_i s_i a_i = \sum_{i \in P} w_i a_i - \sum_{i \in N} w_i a_i = \Delta_I.$$
If $\hat{\lambda} \to \pm \lambda$, the sign-orientation rule chooses the sign determined by $\Delta_I$ in the population limit. Multiplying the positive-gain raw regression score by that orientation gives Equation (25). □
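A small numerical example may help to separate the two ingredients of the proof, the positive gain $\kappa$ and the sign of the anchor statistic. The sketch below rests on assumptions that go beyond the text: it takes $\Sigma_{xx} = \lambda\lambda^{T} + \psi I_p$ with unit factor variance, in which case the Sherman-Morrison identity gives $\Sigma_{xx}^{-1}\lambda = \lambda / (\psi + \|\lambda\|^{2})$, and it uses unit anchor weights $w_i$. All names and values are hypothetical.

```python
def anchor_balance(weights, signs, loadings):
    """Delta_I: weighted positive loadings minus weighted negative loadings."""
    positive = sum(w * a for w, s, a in zip(weights, signs, loadings) if s > 0)
    negative = sum(w * a for w, s, a in zip(weights, signs, loadings) if s < 0)
    return positive - negative

def regression_gain(lam, psi):
    """kappa = lam^T Sigma^{-1} lam for Sigma = lam lam^T + psi I (assumed)."""
    squared_norm = sum(l * l for l in lam)
    return squared_norm / (psi + squared_norm)

# Illustrative values (assumptions): two positive and two negative indicators.
a = [0.9, 0.7, 0.5, 0.8]        # loading magnitudes a_i > 0
s = [1, -1, 1, -1]              # semantic signs s_i
w = [1.0, 1.0, 1.0, 1.0]        # anchor weights w_i
psi = 0.4

lam = [si * ai for si, ai in zip(s, a)]
kappa = regression_gain(lam, psi)
delta_i = anchor_balance(w, s, a)
orientation = 1 if delta_i > 0 else -1

# Sanity check of the closed-form inverse: Sigma * (lam / (psi + ||lam||^2)) = lam.
p = len(lam)
sigma = [[lam[i] * lam[j] + (psi if i == j else 0.0) for j in range(p)]
         for i in range(p)]
candidate = [l / (psi + sum(x * x for x in lam)) for l in lam]
recovered = [sum(sigma[i][j] * candidate[j] for j in range(p)) for i in range(p)]
inverse_residual = max(abs(r - l) for r, l in zip(recovered, lam))

print("kappa:", kappa)
print("Delta_I:", delta_i, "orientation:", orientation)
print("oriented gain on theta:", orientation * kappa)
```

In this example the counts of positive and negative indicators are equal, yet $\Delta_I$ is negative because the negative indicators carry larger loadings, so the oriented score has negative gain on $\theta$ even though $\kappa > 0$.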

Appendix A.6. Justification of Proposition 1

Under Equation (29), the event that the anchor statistic has the wrong sign is a one-sided normal tail event. If $\Delta_I > 0$, then
$$P_{\mathrm{flip}} = \Pr(S_n < 0) \approx \Phi\!\left(-\frac{\sqrt{n}\,\Delta_I}{\tau}\right).$$
The case $\Delta_I < 0$ is identical after replacing $\Delta_I$ by $|\Delta_I|$. Gaussian and sub-Gaussian tail inequalities then yield the exponential form in Equation (31). The argument shows that orientation errors vanish rapidly when $\sqrt{n}\,|\Delta_I|$ is large, but they remain non-negligible when $|\Delta_I| = O(n^{-1/2})$.
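The scaling claim can be illustrated with the normal approximation $S_n = \Delta_I + \tau Z/\sqrt{n}$ used in the calibrated experiment. The sketch below is a minimal illustration, not the paper's code. The function names are hypothetical, and the bound $\Phi(-z) \le \tfrac{1}{2} e^{-z^{2}/2}$ for $z \ge 0$ is one standard Gaussian tail inequality, shown here only as an example of an exponential form; the constants in Equation (31) may differ.

```python
import math
from statistics import NormalDist

def flip_probability(n, delta_i, tau=1.0):
    """Normal-approximation probability that the anchor statistic has the wrong sign."""
    z = math.sqrt(n) * abs(delta_i) / tau
    return NormalDist().cdf(-z)

def gaussian_tail_bound(n, delta_i, tau=1.0):
    """Standard bound Phi(-z) <= 0.5 * exp(-z^2 / 2) with z^2 = n * delta_i^2 / tau^2."""
    return 0.5 * math.exp(-n * delta_i ** 2 / (2.0 * tau ** 2))

sample_sizes = (100, 1_000, 10_000)

# Local alternative Delta_I = c / sqrt(n): the flip probability does not depend on n.
c = 1.0
local = [flip_probability(n, c / math.sqrt(n)) for n in sample_sizes]

# Fixed Delta_I: the flip probability decays rapidly with n.
fixed = [flip_probability(n, 0.05) for n in sample_sizes]
bounds = [gaussian_tail_bound(n, 0.05) for n in sample_sizes]

print("local alternative:", local)
print("fixed balance:", fixed)
print("tail bound:", bounds)
```

Under the local alternative the flip probability stays at $\Phi(-c/\tau)$ for every $n$, which is the collapse by $c$ described for the critical-band experiment, whereas for a fixed nonzero $\Delta_I$ it decreases toward zero.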

References

  1. Spearman, C. General intelligence, objectively determined and measured. Am. J. Psychol. 1904, 15, 201–292. [Google Scholar] [CrossRef]
  2. Thurstone, L.L. The vectors of mind. Psychol. Rev. 1934, 41, 1–32. [Google Scholar] [CrossRef]
  3. Thomson, G.H. The Factorial Analysis of Human Ability; University of London Press: London, UK, 1939. [Google Scholar]
  4. Guttman, L. The determinacy of factor score matrices with implications for five other basic problems of common-factor theory. Br. J. Stat. Psychol. 1955, 8, 65–81. [Google Scholar] [CrossRef]
  5. Kendall, M.G. A new measure of rank correlation. Biometrika 1938, 30, 81–93. [Google Scholar] [CrossRef]
  6. Davis, C.; Kahan, W.M. The rotation of eigenvectors by a perturbation. III. SIAM J. Numer. Anal. 1970, 7, 1–46. [Google Scholar] [CrossRef]
  7. Lawley, D.N.; Maxwell, A.E. Factor Analysis as a Statistical Method, 2nd ed.; Butterworths: London, UK, 1971. [Google Scholar]
  8. Kaiser, H.F.; Rice, J. Little Jiffy, Mark IV. Educ. Psychol. Meas. 1974, 34, 111–117. [Google Scholar] [CrossRef]
  9. Green, B.F. On the factor score controversy. Psychometrika 1976, 41, 263–266. [Google Scholar] [CrossRef]
  10. Schmitt, N.; Stults, D.M. Factors defined by negatively keyed items: The result of careless respondents? Appl. Psychol. Meas. 1985, 9, 367–373. [Google Scholar] [CrossRef]
  11. Marsh, H.W. Negative item bias in ratings scales for preadolescent children: A cognitive-developmental phenomenon. Dev. Psychol. 1986, 22, 37–49. [Google Scholar] [CrossRef]
  12. Horn, J.L. A rationale and test for the number of factors in factor analysis. Psychometrika 1965, 30, 179–185. [Google Scholar] [CrossRef]
  13. Joreskog, K.G. Some contributions to maximum likelihood factor analysis. Psychometrika 1967, 32, 443–482. [Google Scholar] [CrossRef]
  14. Fabrigar, L.R.; Wegener, D.T.; MacCallum, R.C.; Strahan, E.J. Evaluating the use of exploratory factor analysis in psychological research. Psychol. Methods 1999, 4, 272–299. [Google Scholar] [CrossRef]
  15. MacCallum, R.C.; Widaman, K.F.; Zhang, S.; Hong, S. Sample size in factor analysis. Psychol. Methods 1999, 4, 84–99. [Google Scholar] [CrossRef]
  16. Billiet, J.B.; McClendon, M.J. Modeling acquiescence in measurement models for two balanced sets of items. Struct. Equ. Model. 2000, 7, 608–628. [Google Scholar] [CrossRef] [PubMed]
  17. Grice, J.W. Computing and evaluating factor scores. Psychol. Methods 2001, 6, 430–450. [Google Scholar] [CrossRef]
  18. Podsakoff, P.M.; MacKenzie, S.B.; Lee, J.-Y.; Podsakoff, N.P. Common method biases in behavioral research: A critical review of the literature and recommended remedies. J. Appl. Psychol. 2003, 88, 879–903. [Google Scholar] [CrossRef] [PubMed]
  19. Chen, J. Discussion of same-tendency processing methods in principal component and factor analysis. Stat. Inf. Forum 2005, 20, 19–22. (In Chinese) [Google Scholar]
  20. Woods, C.M. Careless responding to reverse-worded items: Implications for confirmatory factor analysis. J. Psychopathol. Behav. Assess. 2006, 28, 186–191. [Google Scholar] [CrossRef]
  21. DiStefano, C.; Zhu, M.; Mindrila, D. Understanding and using factor scores: Considerations for the applied researcher. Pract. Assess. Res. Eval. 2009, 14, 1–11. [Google Scholar] [CrossRef]
  22. Roszkowski, M.J.; Soven, M. Shifting gears: Consequences of including two negatively worded items in the middle of a positively worded questionnaire. Assess. Eval. High. Educ. 2010, 35, 113–130. [Google Scholar] [CrossRef]
  23. Brown, T.A. Confirmatory Factor Analysis for Applied Research, 2nd ed.; Guilford Press: New York, NY, USA, 2015. [Google Scholar]
  24. Weijters, B.; Baumgartner, H.; Schillewaert, N. Reversed item bias: An integrative model. Psychol. Methods 2013, 18, 320–334. [Google Scholar] [CrossRef]
  25. Howard, M.C. A review of exploratory factor analysis decisions and overview of current practices: What we are doing and how can we improve? Int. J. Hum.-Comput. Interact. 2016, 32, 51–62. [Google Scholar] [CrossRef]
  26. Zhang, X.; Noor, R.; Savalei, V. Examining the effect of reverse worded items on the factor structure of the Need for Cognition Scale. PLoS ONE 2016, 11, e0157795. [Google Scholar] [CrossRef]
  27. Flora, D.B.; Flake, J.K. The purpose and practice of exploratory and confirmatory factor analysis in psychological research: Decisions for scale development and validation. Can. J. Behav. Sci. 2017, 49, 78–88. [Google Scholar] [CrossRef]
  28. Ferrando, P.J.; Lorenzo-Seva, U. Assessing the quality and appropriateness of factor solutions and factor score estimates in exploratory item factor analysis. Educ. Psychol. Meas. 2018, 78, 762–780. [Google Scholar] [CrossRef] [PubMed]
  29. Suarez-Alvarez, J.; Pedrosa, I.; Lozano, L.M.; Garcia-Cueto, E.; Cuesta, M.; Muniz, J. Using reversed items in Likert scales: A questionable practice. Psicothema 2018, 30, 149–158. [Google Scholar] [CrossRef]
  30. Watkins, M.W. Exploratory factor analysis: A guide to best practice. J. Black Psychol. 2018, 44, 219–246. [Google Scholar] [CrossRef]
  31. Goretzko, D.; Pham, T.T.H.; Buhner, M. Exploratory factor analysis: Current use, methodological developments and recommendations for good practice. Curr. Psychol. 2021, 40, 3510–3521. [Google Scholar] [CrossRef]
  32. Knekta, E.; Runyon, C.; Eddy, S. One size doesn’t fit all: Using factor analysis to gather validity evidence when using surveys in your research. CBE Life Sci. Educ. 2019, 18, rm1. [Google Scholar] [CrossRef] [PubMed]
  33. Auerswald, M.; Moshagen, M. How to determine the number of factors to retain in exploratory factor analysis: A comparison of extraction methods under realistic conditions. Psychol. Methods 2019, 24, 468–491. [Google Scholar] [CrossRef]
  34. Kim, K.-C.; Wei, H.-T. Development of a face detection and recognition system using a RaspberryPi. J. Korea Inst. Electron. Commun. Sci. 2017, 12, 859–864. [Google Scholar]
  35. Dueber, D.M.; Toland, M.D.; Lingat, J.E.; Love, A.M.A.; Qiu, C.; Wu, R.; Brown, A.V. To reverse item orientation or not to reverse item orientation, that is the question. Assessment 2022, 29, 1229–1249. [Google Scholar] [CrossRef]
  36. Garcia-Fernandez, J.; Postigo, A.; Cuesta, M.; Gonzalez-Nuevo, C.; Menendez-Aller, A.; Garcia-Cueto, E. To be direct or not: Reversing Likert response format items. Span. J. Psychol. 2022, 25, e33. [Google Scholar] [CrossRef]
  37. Kam, C.C.S.; Meyer, J.P.; Sun, S. Why do people agree with both regular and reversed items? A logical response perspective. Assessment 2021, 28, 1374–1386. [Google Scholar] [CrossRef] [PubMed]
  38. Ou, X. Multidimensional structure or wording effect? Reexamination of the factor structure of the Chinese General Self-Efficacy Scale. J. Pers. Assess. 2022, 104, 64–73. [Google Scholar] [CrossRef] [PubMed]
  39. Revuelta, J.; Ximenez, M.C.; Olea, J. Overfactoring in rating scale data: A comparison between factor analysis and item response theory. Front. Psychol. 2022, 13, 982137. [Google Scholar] [CrossRef]
  40. D’Urso, E.D.; Tijmstra, J.; Vermunt, J.K.; De Roover, K. Awareness is bliss: How acquiescence affects exploratory factor analysis. Educ. Psychol. Meas. 2023, 83, 568–597. [Google Scholar] [CrossRef]
  41. Dodeen, H. The effects of changing negatively worded items to positively worded items on the reliability and the factor structure of psychological scales. J. Psychoeduc. Assess. 2023, 41, 733–745. [Google Scholar] [CrossRef]
  42. Kam, C.C.S. Why do regular and reversed items load on separate factors? Response difficulty vs. item extremity. Educ. Psychol. Meas. 2023, 83, 1085–1112. [Google Scholar] [CrossRef]
  43. Steger, D.; Jankowsky, K.; Schroeders, U.; Wilhelm, O. The road to hell is paved with good intentions: How common practices in scale construction hurt validity. Assessment 2023, 30, 1811–1824. [Google Scholar] [CrossRef]
  44. Tang, D.; Boker, S.M.; Tong, X. Are the signs of factor loadings arbitrary in confirmatory factor analysis? Problems and solutions. Struct. Equ. Model. 2025, 32, 142–154. [Google Scholar] [CrossRef] [PubMed]
  45. Elek, D.; Cigler, H.; Gruning, D.J.; Jezek, S. Advancing the psychometrics of reverse-keyed items: Enriching cognitive theory by a logical and linguistic perspective. Front. Psychol. 2025, 16, 1684612. [Google Scholar] [CrossRef] [PubMed]
  46. Ertuna, L.; Kaya-Uyanik, G.; Gencaslan, D. More than just noise: Careless responding and its systematic effects on reliability, validity, and measurement invariance. Front. Psychol. 2026, 17, 1815225. [Google Scholar] [CrossRef]
  47. Wei, H.; Wang, X. Financial risk management early-warning model for Chinese enterprises. J. Risk Financ. Manag. 2024, 17, 255. [Google Scholar] [CrossRef]
  48. Wei, H. DecorPGNet: Functional area division and layout algorithm model in living rooms of Chinese apartment-style family homes. Civ. Eng. Res. J. 2024, 15, 555902. [Google Scholar] [CrossRef]
  49. Wei, H. Exploring and practicing the quantification of interior design colors from an IKEA design perspective. J. Sens. Netw. Data Commun. 2024, 4, 01–12. [Google Scholar] [CrossRef]
Figure 1. Population geometry of a coordinate sign flip. Panels (a) and (b) show an equicorrelation matrix $R$ and its sign-flipped form $D R D$. Panels (c) and (d) show the corresponding leading eigenvectors. The transformed eigenvector is $D v$, so re-extracted scores are invariant up to a global sign.
Figure 2. Fixed-direction scoring results for $p = 100$, $N = 200$, 200 repetitions, and $\rho \in \{0.2, 0.5, 0.8\}$. Solid curves are empirical means with empirical bands; dashed curves are theory. The vertical dotted line marks $k/p = 1/2$.
Figure 3. Heterogeneous fixed-direction simulation. The horizontal axis is the signed loading-energy balance $\beta = M(D) / \|a\|^{2}$. The transition occurs at $\beta \approx 0$, demonstrating that heterogeneous negative indicators must be evaluated by information weight rather than by count alone.
Figure 4. EFA regression scores oriented by an anchor statistic. Panels (a) and (c) use homogeneous loadings and the raw negative-item fraction $\rho$. Panels (b) and (d) use heterogeneous loadings and the weighted negative information ratio $\rho_w$. The vertical dotted line marks the predicted threshold $1/2$; the horizontal dashed line in the accuracy panels marks random pairwise ordering.
Figure 5. Calibrated finite-sample critical-band experiment. The anchor statistic is simulated as $S_n = \Delta_I + \tau Z / \sqrt{n}$ with $\tau = 1$, $\Delta_I = c / \sqrt{n}$, and 100,000 repetitions per grid point. Panel (a) shows collapse by the local signal scale $c$. Panel (b) shows the exponential decay scale in $n \Delta_I^{2} = c^{2}$.
Table 1. Representative fixed-direction results at three negative-indicator fractions. Values are Monte Carlo means from the saved simulation outputs.

| $\rho$ | $k/p$ | Pearson | Spearman | Kendall $\tau$ | Disagreement |
|---|---|---|---|---|---|
| 0.2 | 0.0 | 1.000 | 1.000 | 1.000 | 0.000 |
| 0.2 | 0.5 | 0.006 | 0.008 | 0.005 | 0.497 |
| 0.2 | 1.0 | -1.000 | -1.000 | -1.000 | 1.000 |
| 0.5 | 0.0 | 1.000 | 1.000 | 1.000 | 0.000 |
| 0.5 | 0.5 | -0.006 | -0.008 | -0.005 | 0.503 |
| 0.5 | 1.0 | -1.000 | -1.000 | -1.000 | 1.000 |
| 0.8 | 0.0 | 1.000 | 1.000 | 1.000 | 0.000 |
| 0.8 | 0.5 | 0.005 | 0.007 | 0.005 | 0.498 |
| 0.8 | 1.0 | -1.000 | -1.000 | -1.000 | 1.000 |
Table 2. Representative EFA regression-score results at $n = 800$. Values are Monte Carlo means from the saved numerical summaries.

| Scenario | Ratio used | Ratio | $\Delta_I$ | Kendall $\tau$ | Pairwise accuracy |
|---|---|---|---|---|---|
| Homogeneous | $\rho$ | 0.083 | 6.500 | 0.797 | 0.899 |
| Homogeneous | $\rho$ | 0.500 | 0.000 | 0.054 | 0.527 |
| Homogeneous | $\rho$ | 0.917 | -6.500 | -0.797 | 0.101 |
| Heterogeneous | $\rho_w$ | 0.087 | 9.749 | 0.819 | 0.909 |
| Heterogeneous | $\rho_w$ | 0.500 | 0.010 | -0.820 | 0.090 |
| Heterogeneous | $\rho_w$ | 0.905 | -9.566 | -0.818 | 0.091 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.