Submitted:
30 May 2026
Posted:
02 June 2026
Abstract
Keywords:
Introduction
| Technique | Methodological description |
|---|---|
| Linear Regression | Models continuous outcome/target variables as the weighted sum of covariates/’predictor’ variables; generating coefficient estimates that minimize squared errors to accommodate the model’s adjusted variance, assuming independent, normally distributed residuals and linear relationships; and supports: causal inference of temporally consistent exposure-outcome relationships (subject to the mitigation of confounder/collider bias); and ‘predictive estimation’ of target variables through inter/extrapolation |
| Logistic Regression | Similar to Linear Regression except that the outcome/target variable is dichotomous and odds ratios are used to describe their relationships with covariates/’predictor’ variables; but inference is limited to strength/direction/precision of causal relationships/‘predictive estimates’, not to linear/ordinal trends |
| K-Nearest Neighbours | A non-parametric algorithm that classifies observations by averaging the k closest data points in feature space; capturing local distributional structures (albeit sensitive to scaling and the choice of k); and can provide either grouping of categorical variables or estimates of continuous feature variables |
| K-Means Clustering | Partitions data into k groups by minimizing within-cluster distances to centroids; iteratively updating cluster assignments and centroids; revealing unmeasured group structure/pattern similarity; but dependent on the pre-specification of k, and the existence of/assumption that clusters are spherical |
| Analysis of Covariance (ANCOVA) | Similar to combining Linear Regression with Analysis of Variance (ANOVA) to compare group means while adjusting for continuous/categorical covariates; thereby assessing if adjusting for covariates alters coefficient estimates; and supports causal inference (subject to temporal consistency and bias mitigation) and ‘predictive estimation’ (assuming independent measurements and linear adjustment) |
| Trend Plotting | Visualizes changes in the value of variables over time (or within a sequence), often using smoothing or fitted lines to: identify growth/decline, cycles, turning points, and periodicities; guide modelling, hypothesis formation and ‘predictive’ estimation; albeit with limited mechanistic causal insight |
| Auto-Regressive Integrated Moving Average (ARIMA) | A time-series model combining autocorrelation (AR), differencing (I), and moving-average smoothing (MA); capturing temporal dependencies, and using past measurements to (predictively) estimate future values; though, when data are stationary (or are made so), this technique can also support inference about persistence, cycles, and forecasting uncertainties |
| Bayesian Analysis | Combines prior beliefs with measured data using Bayes’ theorem to produce: a posteriori distributions that support probabilistic inference about parameters/hypotheses; with direct quantification of uncertainty; and dynamic learning as new data are added, are collected or become available |
| Monte Carlo Simulation | Uses repeated random sampling from specified distributions to estimate the range, probability, and sensitivity of outcomes in complex or uncertain systems; thereby providing probabilistic outcome distributions, risk assessments, and decision-reliability estimates |
| Cross-Validation | Assesses model generalisability by partitioning data into training and validation subsets; thereby evaluating estimative model performance on ‘unseen’ data; while being capable of detecting ‘overfitting’ and supporting model selection based on ‘out-of-sample’ accuracy measures |
| Bootstrapping | Resamples data with replacement to empirically (‘predictively’) estimate sampling distributions of finite datasets; providing standard errors/confidence intervals, and bias estimates even when strong parametric assumptions are absent; thereby assessing the level of uncertainty evident within (more modest, and) finite samples in the absence of more substantive sample sizes |
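The Bootstrapping row above describes resampling with replacement to estimate a sampling distribution, standard error and confidence interval. A minimal sketch of that idea follows, assuming a percentile interval for the mean; the function name `bootstrap_mean_interval` and the `observations` values are hypothetical and invented purely for illustration.

```python
import random
import statistics


def bootstrap_mean_interval(sample, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap interval and standard error for a sample mean."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    means = []
    for _ in range(n_resamples):
        # Draw n values from the sample, with replacement.
        resample = [sample[rng.randrange(n)] for _ in range(n)]
        means.append(statistics.fmean(resample))
    means.sort()
    # Cut the same number of resampled means from each tail.
    lower_index = int(n_resamples * (1.0 - level) / 2.0)
    upper_index = n_resamples - 1 - lower_index
    std_error = statistics.stdev(means)
    return means[lower_index], means[upper_index], std_error


# Hypothetical measurements (not taken from any real dataset).
observations = [4.1, 5.3, 3.8, 6.0, 5.1, 4.7, 5.9, 4.4, 5.0, 4.9]
low, high, se = bootstrap_mean_interval(observations)
print(f"mean={statistics.fmean(observations):.2f}, "
      f"interval=({low:.2f}, {high:.2f}), standard error={se:.3f}")
```

The interval describes the precision of the estimate given this sample; it does not, on its own, indicate whether the sample is representative.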
Methods
1. The conscious and deliberate, generic (technique-agnostic) and particular (technique-specific), choices and decisions analysts must (be competent to) make when using each of the 11 techniques – and the potential analytical and inferential consequences were any of these choices/decisions to be misjudged, or overlooked/made passively (‘by default’);
2. The generic (technique-agnostic) and particular (technique-specific) parametric and non-parametric assumptions that need apply when using each of the 11 techniques – and the potential analytical and inferential consequences were any of these assumptions to be violated;
3. The generic (technique-agnostic) and particular (technique-specific) sources of non-systematic error (imprecision) and systematic error (bias) that can affect the reliability, validity and interpretation of findings generated using each of the 11 techniques – and the analytical and inferential consequences were steps not to have been taken so as to avoid, attenuate, mitigate, or acknowledge (and accommodate) these sources of imprecision/bias (and their effects);
4. The diagnostic assurance measures that producers of ‘statistical analyses’1 – i.e. those involved in identifying/selecting/collating/generating, processing and analysing quantitative/enumerable datasets, such as those working within the UK’s intelligence collection “disciplines” (MOD, 2023) – can, and should take to ensure, evidence and reassure that: the most appropriate analytical choices/decisions have been made (see 1, above); all necessary assumptions apply (see 2, above); and all potential sources of imprecision and bias have been: avoided or attenuated (where possible); their effects mitigated (again, where possible); or any residual bias acknowledged (as a potential influence on the findings obtained) and accommodated (in the level of uncertainty and confidence applied when interpreting these findings; see 3, above) – these being diagnostic measures that should have been implemented within the reported design and implementation of the analyses concerned (or may need to be retrospectively applied during translation), so that consumers of statistical analyses (such as those working within the intelligence analysis “specialisms”; MOD, 2023) are aware of any potential gaps, weaknesses, flaws and residual uncertainties in the findings generated by these analyses;
5. Potential misrepresentations evident in the “typical analytical insights” Duffield (2026) provided for each of these 11 techniques (see Table 1, above), and potential misinterpretation of the inferences these “insights” might support – subject to: the proficiency with which each of the 11 techniques has been applied (see 1 and 2, above); the extent to which any sources of imprecision and bias have been addressed or acknowledged (and accommodated; see 3, above); and the diagnostic measures that producers of statistical analyses took to ensure, assure (and evidence) that their analyses were both rigorous and robust (see 4, above); and
6. Whether the knowledge and skills required to accommodate each of the considerations summarised in 1 through 5 (above) might plausibly be accessible to analysts operating from what Duffield (2026) acknowledged was “a relatively low mathematical baseline”.
Results
1. What Choices/Decisions Must Analysts Make When Using the 11 Techniques?
- the number of ‘cases’ (i.e. ‘population’ members) on which measurements are available is sufficient to both: support the analytical techniques applied; and reduce the risk of the findings being subject to chance effects (i.e. non-systematic error);
- measurements are available on all cases for all variables pertinent to (and required for) the sample- and variable-dependent analyses intended (and that any under-/over-represented cases can be dealt with, where necessary/appropriate, using robust weighting variables; and any missing values can be accurately imputed – and not least when missing cases/values are not missing [completely] at random, and therefore risk undermining the representativeness of the dataset); and
- these variables include not only the specified ‘target’ variable (for ‘predictive estimation’ and optimisation objectives) or ‘outcome’ variable (for description and causal inference objectives), but also a sufficiently varied array of covariates/predictor variables so as to optimise the diversity of statistical information required to: strengthen the accuracy and precision of the estimated dataset features (for ‘predictive estimation’ and optimisation objectives); and adjust for potential confounders (for causal inference purposes).
2. What Parametric/Non-Parametric Assumptions Need Apply When Using the 11 Techniques?
- the representativeness/external validity of the dataset used;
- the reliability and internal validity of the measurements taken/available on each of the variables therein;
- the appropriateness of the data type, scaling and transformation(s) adopted for/applied to these variables – particularly as required for the statistical technique concerned;
- the independence of measurements/observations made for each of the dataset’s/sample’s individual ‘cases’/’population’ members;
- the absence (or appropriate treatment of) extreme (‘outlier’) measurements/observations;
- the appropriateness of the size of the dataset/sample, and any necessary randomness assumptions (where relevant to re-/sub-sampling procedures germane to the statistical technique applied); and
- a sound theoretical understanding of the temporal sequence of variables – particularly for modelling time-dependent relationships/associations (such as causal pathways).
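The assumption about data type, scaling and transformation listed above can be illustrated with K-Nearest Neighbours, whose distance calculations depend directly on the units of each feature. The following is a minimal sketch, assuming Euclidean distance and majority voting; the helper names (`knn_classify`, `standardise`) and all data values are hypothetical.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """Majority vote among the k training points closest to `query`."""
    ranked = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def standardise(rows):
    """Z-score each feature; also return a function to rescale new points."""
    columns = list(zip(*rows))
    means = [sum(c) / len(c) for c in columns]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c))
           for c, m in zip(columns, means)]

    def rescale(row):
        return tuple((v - m) / s for v, m, s in zip(row, means, sds))

    return [rescale(r) for r in rows], rescale


# Hypothetical cases: (feature in large units, feature in small units), label.
features = [(1000, 9), (1200, 8), (1100, 10), (3000, 1), (3200, 0), (3100, 2)]
labels = ["high", "high", "high", "low", "low", "low"]
query = (2600, 9)

# Raw units: the first feature dominates every distance.
raw_label = knn_classify(list(zip(features, labels)), query)

# Standardised units: both features contribute comparably.
scaled_features, rescale = standardise(features)
scaled_label = knn_classify(list(zip(scaled_features, labels)), rescale(query))

print(raw_label, scaled_label)  # low high
```

Neither classification is inherently the correct one; the sketch only shows that the scaling decision changes the result, so it needs to be made deliberately and reported.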
3. What Are the Potential Sources of Imprecision and Bias When Using the 11 Techniques?
- First, that many ‘standard’ techniques may prove less applicable to the data and datasets that are available, and of most interest, to intelligence practitioners and their customers; and
- Second, that these data/datasets are likely to pose novel statistical challenges to many of the more commonly used statistical analytical techniques.
4. What Diagnostic Measures Should Producers of Statistical Analyses Use to Safeguard Their Methods?
- made appropriate choices and decisions when specifying/selecting their analytical objectives, statistical analytical designs, sampling frames/secondary datasets, measurement/observation protocols, analytical models and inferential interpretations – as summarised in Table A1.1, Table A1.2 and Table A1.3, and described in Section 1 (above);
- designed and conducted their analyses using analytical objectives, datasets, analytical models and inferences that comply with the assumptions required of all (or applicable principally to) specific statistical techniques – as summarised in Table A2.1 and Table A2.2, and described in Section 2, above; and
- applied great care to each of the (design-, sampling/selection-, measurement/observation-, analysis- and interpretation-related) steps involved so as to avoid, attenuate, mitigate, or acknowledge (and accommodate) all relevant sources of imprecision/bias (and thereby their consequences/effects) – as summarised in Table A3.1 and Table A3.2, and described in Section 3 (above).
5. What Alternative Interpretations/Inferences Might Each of Duffield’s 11 “Typical Analytical Insights” Support?
“Since all [statistical] models are wrong the scientist [statistician, or analyst] must be alert to what is importantly wrong. It is inappropriate to be concerned about mice when there are tigers abroad.”
- misrepresent what each technique involves or what findings it can provide; and thereby lead to the
- misinterpretation of what might properly be inferred from a more cautious summary of their findings.
| “Technique” | “Theme” | “Typical Analytical Insight” | Potential methodological misrepresentation | Potential inferential misinterpretation | Alternative description |
|---|---|---|---|---|---|
| Linear Regression | Regression | “After controlling for other variables, increases in X are consistently associated with proportional increases in Y, suggesting X is a meaningful driver rather than background noise.” | | | “X and Y are positively associated with one another after adjusting for all/all preceding* measured covariates.” *The inferential interpretability/interpretation of the association depends upon which covariates were adjusted for, hence this warrants clarification. |
| Logistic Regression | Regression | “The probability of event A rises sharply once factors X and Y are present, indicating a threshold effect that distinguishes high-risk cases from the baseline.” | | | “The odds that A occurs are positively associated with the presence of (both)* X and Y.” *Subject to clarification of the “typical analytical insight” as originally drafted. |
| K-Nearest Neighbours | Classification | “This actor’s recent behaviour most closely resembles that of a small subset of previously observed cases, which historically went on to exhibit outcome B.” | | | “(On the basis of the recent behavioural features measured)* This actor is most closely associated with past cases that later exhibited outcome B; but this actor’s risk of outcome B will depend upon the continuing presence of any other necessary factors/features.” *Subject to the limited scope of the behavioural features on which K-Nearest Neighbours classification was performed. |
| K-Means Clustering | Classification | “Observed entities naturally separate into four distinct behavioural groupings, each with internally consistent patterns and materially different risk profiles.” | | | “(On the basis of the behavioural features measured)* Four distinct and internally consistent clusters were identified (and separate analyses indicate these clusters have different risk profiles for X [to be specified])” *Subject to the limited scope of the behavioural features on which K-means classification was performed. |
| ANCOVA (Analysis of Covariance) | Causal Inference | “Once baseline capability and environment are accounted for, the apparent gap between actors narrows significantly, indicating that much of the observed difference reflects starting conditions rather than divergent behaviour.” | | | “The extent of the difference observed between different types of actors is substantively reduced following adjustment for baseline capability and environment, suggesting that much of the difference observed is due to their different starting conditions rather than subsequent developments”* *This is somewhat vague without information on what “differences” between actors were examined. |
| Trend Plotting | Time Series | “The underlying trajectory shows a sustained upward movement over multiple periods, with short-term volatility masking a longer term structural change.” | | | “Shorter-term fluctuations in the measurements of this feature mask a sustained, longer-term upward trajectory indicative of a possible change in the prevailing conditions or underlying mechanisms involved.”* *This is necessarily abstract in the absence of information on the feature, time-frames and context concerned. |
| ARIMA (Auto-Regressive Integrated Moving Average) | Time Series | “Assuming current dynamics persist, activity levels are likely to remain within a bounded range over the next three periods, with a non-trivial risk of a sharp deviation thereafter.” | | | “Conditional on the fitted ARIMA dynamics, forecasts of measurable activity levels are likely to lie within X%* intervals for three periods, but less likely to remain within these intervals thereafter.” *To better define “a bounded range”. |
| Bayesian Analysis | Probabilistic Modelling | “Given the new reporting, confidence in hypothesis A has increased substantially, while alternative explanations now carry materially lower probability.” | | | “The a posteriori probability for hypothesis A increased following the availability of this new evidence, with the probabilities of mutually exclusive alternative hypotheses being correspondingly reduced.” |
| Monte Carlo Simulation | Probabilistic Modelling | “Over thousands of simulations, four plausible scenarios emerged, including a sharp deterioration scenario, which occurs when both X and Y occur at a similar time.” | | | “(Under the assumed input distributions, at least)* four scenarios are evident, one of which involves a sharp deterioration that is associated with the co-occurrence of X and Y” *The original “typical analytical insight” would benefit from adding this disclaimer and clarifying how many scenarios are evident. |
| Cross-Validation | Cross-Validation | “The model’s predictive performance remains stable across unseen data, indicating that the identified patterns are likely to generalise rather than reflect overfitting.” | | | “The performance of the model across unseen datasets is consistent across folds, suggesting that the patterns identified are generalizable to these datasets.” |
| Bootstrapping | Bootstrapping | “Across thousands of resampled datasets, the key estimate remains tightly clustered, suggesting the conclusion is robust and not driven by a small number of observations.” | | | “Bootstrap resampling yielded a narrow sampling distribution for the estimate – indicating that, despite the small number of observations, the key estimate generated from the sample had substantial statistical precision.” |
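The alternative description offered for Bayesian Analysis in the table above rests on a simple property of Bayes' theorem: when hypotheses are mutually exclusive and exhaustive, any increase in the posterior probability of one is matched by a corresponding reduction across the others. The sketch below illustrates this under assumed values; the priors, likelihoods and the function name `update_posteriors` are hypothetical.

```python
def update_posteriors(priors, likelihoods):
    """Apply Bayes' theorem across mutually exclusive, exhaustive hypotheses."""
    # Numerator of Bayes' theorem for each hypothesis: P(H) * P(evidence | H).
    unnormalised = {h: priors[h] * likelihoods[h] for h in priors}
    # Denominator: total probability of the evidence across all hypotheses.
    evidence = sum(unnormalised.values())
    return {h: value / evidence for h, value in unnormalised.items()}


# Hypothetical prior probabilities for three competing hypotheses.
priors = {"A": 0.40, "B": 0.35, "C": 0.25}
# Hypothetical probability of observing the new reporting under each one.
likelihoods = {"A": 0.60, "B": 0.20, "C": 0.10}

posteriors = update_posteriors(priors, likelihoods)
for name in priors:
    print(f"{name}: prior={priors[name]:.3f} posterior={posteriors[name]:.3f}")
# A: prior=0.400 posterior=0.716
# B: prior=0.350 posterior=0.209
# C: prior=0.250 posterior=0.075
```

The size of the shift depends entirely on the assumed priors and likelihoods, which is why reporting those inputs matters more than describing the change as "substantial" or "material".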
5.1. Do These “Typical Analytical Insights” Methodologically Misrepresent the Techniques Involved?
- the intended ‘analytical objective’ of the “typical” example/application used as a basis for drafting each technique’s “insight” (i.e. whether the objective be for descriptive, ‘predictive estimation’, optimisation and/or causal inference purposes) – an issue affecting all of the examples/applications and “insights” drafted for these 11 techniques; and a particular issue for those techniques that can be modelled in different ways to inform more than one analytical objective;
- which covariates were adjusted for/conditioned upon – as for: the “other variables” that were “controll[ed] for” in the “insight” drafted for Linear Regression; and variables relevant to “baseline capability and environment” that were “accounted for” in the “insight” drafted for ANCOVA;
- what other additional/subsidiary/ancillary techniques were involved to generate additional findings that would not have been generated by the principal technique concerned – as for: the “threshold effect” in the “insight” drafted for Logistic Regression; the “natural” separation of “distinct groupings” (and the “material” nature of these groupings’ “risk profiles”) in the “insight” drafted for K-Means Clustering; the “[statistical] significance” of the “narrow[ing]” of differences between “[different groups of] actors” in the “insight” drafted for ANCOVA; the “sharp[ness]” of the “deviation” in the “insight” drafted for ARIMA; and the “structural” nature of the “longer term… change” in the “insight” drafted for Trend Plotting;
- whether any extrapolated estimates of future phenomena/trends might be subject to the assumption of ceteris paribus (all other things being equal/remaining unchanged) – as would be the case for the risk-associated inference of “actor” classification inferred in the “insight” drafted for K-Nearest Neighbours; and
- the meaning of value-laden claims (and, where possible, the quantification of such claims) – as for the “bounded range” and “non-trivial[ity]” of trends in “activity levels” in the “insight” drafted for ARIMA; the “substantially… increased… confidence” and “materially lower probability” in the “insight” drafted for Bayesian Analysis; and the “sharp deterioration” in the “insight” for Monte Carlo Simulation.
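The point above about which covariates were adjusted for can be illustrated with simulated data. In the minimal sketch below, a variable Z is constructed (by assumption) to influence both X and Y, while X has no effect on Y; the unadjusted slope of Y on X is nonetheless clearly positive, and only disappears once Z is adjusted for. The adjustment uses the Frisch-Waugh-Lovell residual approach, which yields the same coefficient for X as a regression of Y on X and Z together. All names and values are hypothetical.

```python
import random


def slope(x, y):
    """Least-squares slope of y on x (simple linear regression)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residuals(x, y):
    """Residuals of y after removing its least-squares fit on x."""
    b = slope(x, y)
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return [yi - (my + b * (xi - mx)) for xi, yi in zip(x, y)]


rng = random.Random(1)  # fixed seed so the sketch is reproducible
n = 2000
z = [rng.gauss(0, 1) for _ in range(n)]      # simulated confounder
x = [zi + rng.gauss(0, 1) for zi in z]       # X depends on Z only
y = [2 * zi + rng.gauss(0, 1) for zi in z]   # Y depends on Z only, not on X

unadjusted = slope(x, y)
# Adjust for Z by regressing the Z-residuals of Y on the Z-residuals of X.
adjusted = slope(residuals(z, x), residuals(z, y))

print(f"unadjusted slope={unadjusted:.2f}, adjusted slope={adjusted:.2f}")
```

In real observational data the true structure is unknown, so an "insight" that reports an adjusted association without naming the adjustment set cannot be assessed in this way.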
5.2. Do These “Typical Analytical Insights” Lead to the Inferential Misinterpretation of the Techniques Involved?
- Linear Regression – This analytical technique can be used to support causal inference based on bivariate associations evident within observational datasets (in this instance that: “X is a… driver [for Y]”) but only under very specific conditions (these being: temporal-consistency; dataset representativeness; and the availability, and selection, of an appropriate covariate adjustment set; see ANCOVA, below for more details). Even when these conditions hold, this technique is not able to determine definitively whether any such association is “meaningful”, since the specified cause/”driver” (in this instance: “X”) may only be incidentally/indirectly associated with a direct cause of Y (since such an association can also occur through chance, unacknowledged collider bias, or as a result of unadjusted/unmeasured/residual confounding). The “insight” as drafted therefore warrants revision to: temper these explicit inferential claims (and their narrow, and potentially unfounded/unsound, interpretations); or better detail the methodological dependencies upon which these rely (including each of the specific conditions – temporal-consistency, dataset representativeness, and appropriate adjustment set composition – necessary for generating probabilistic causal inference from observational datasets; and any additional techniques applied to assess the “meaningful”-ness of the causal link inferred between “X” and “Y”).
- Logistic Regression – This analytical technique can be used to determine whether the ‘odds’ of a binary outcome (in this instance: the “probability of event A” occurring vs. not occurring) are higher/lower, stronger/weaker or more/less precise in the presence/absence of other “factors”. However, this technique neither provides nor supports inferential assessment as to whether a “threshold effect” is/is not present, or what “factors” might be associated with (or responsible for) such an effect’s ‘inflection point’, unless the model(s) used is(are) designed so as to support such assessments. It is also at risk of inferential misinterpretation were “factors X and Y” to have occurred/crystallised after “event A” happened (i.e. as potentially direct/indirect consequences of “A”); before “event A” (i.e. as potentially direct/indirect causes of “A”); or, indeed, were these “factors” to be ‘mathematically coupled’ to, or indivisible components/features of, “event A”. The apparent relationship between (the presence/absence of) “X” and “Y” and the odds of “event A” will likewise only be consistent with a causal relationship under very specific conditions (these being: temporal-consistency; dataset representativeness; and the availability, and selection, of an appropriate covariate adjustment set; see ANCOVA, below for more details). The “insight” as drafted therefore warrants revision to: temper these inferential claims (and their narrow, and potentially unfounded/unsound, interpretations); or better detail the methodological dependencies upon which these rely (including any additional modelling/analytical techniques applied to assess the: nature of the relationship between “X” and “Y” and “event A”; and presence of a potential “threshold effect”).
- K-Nearest Neighbours – This analytical technique can be used to classify/group entities, events, processes or characteristics thereof with regard to similarities in related features (in this instance: “recent behaviour”). However, it does not necessarily provide or support inferential assessment as to whether any group member so classified necessarily shares the subsequent risk profile of all/most of the other group members (in this instance: “a small subset of previously observed cases”), unless: such inference involved risk-relevant outcomes (in this instance: “outcome B” or causes/determinants thereof) that were components of, mathematically coupled to, or (manifestly/latently, and directly/indirectly) causally associated with, the variables used for classification/grouping; and all necessary and sufficient factors/circumstances required for the “historical… outcome[s]” concerned remain in place. The “insight” as drafted therefore warrants revision to: temper these inferential claims (and their potentially unfounded/unsound interpretations); or better detail the methodological dependencies upon which these rely (including the inclusion of risk-relevant/related components/correlates/latent features within the classification models; and the temporal stability of any factors/circumstances on which “outcome B” depends).
- K-Means Clustering – This analytical technique can be used to classify/group entities, events, processes or characteristics thereof with regard to similarities in related features (in this instance: behavioural features). However, it does not necessarily provide or support inferential assessment as to: why the groupings are evident or how they arise (in this instance: whether “naturally” vs. artefactually); or the nature/extent of any group-specific meta-features (in this instance: “internally consistent patterns”) or group-related differences in risk profiles (in this instance: whether “material” vs. intangible/insubstantial), unless: such inference involved risk profile-relevant/related features that were components of, mathematically coupled to, or (manifestly/latently, and directly/indirectly) causally associated with, the variables used for classification/grouping; and all necessary and sufficient factors/circumstances required for the “risk profiles” concerned remain in place. The “insight” as drafted therefore warrants revision to: temper these inferential claims (and their potentially unfounded/unsound interpretations); or better detail the methodological dependencies upon which these rely (including: what techniques/measures were used to determine whether the groupings “separate… naturally”, and the “internally consistent patterns” and “materially different risk profiles” described; and the inclusion of risk-relevant/related components/correlates/latent features within the classification models).
- ANCOVA (Analysis of Covariance) – As its “thematic classification” suggests, this analytical technique can be used to inform (probabilistic) causal inference (as can other, predominantly associational/correlational analytical techniques, such as Linear and Logistic Regression; see above) but only – as mentioned earlier – under very specific conditions. These are: temporal-consistency (such that: the specified cause precedes its specified consequence; and any covariates controlled/adjusted for/conditioned upon precede both the specified cause and consequence); dataset representativeness (to mitigate the risk of a form of collider bias, known as ‘endogenous selection bias’; Elwert and Winship, 2014); and the availability, and selection, of an appropriate, and appropriately diverse, covariate adjustment set (so as to optimally mitigate the risk of measured/measurable confounding, while avoiding any additional risk of collider bias resulting from inappropriate adjustment for variables that occur after the specified cause and/or consequence; Ellison, 2023). In the “insight” originally drafted for this technique, the risk of inferential misinterpretation is modest, despite the technique being thematically misclassified as (predominantly/solely) “causal inference”. Nonetheless, the allusion to a narrowing of an “apparent gap” between (presumably) different groups of “actors” after “accounting for… baseline capability and environment” does imply that the latter were considered potential sources of confounding that, once “accounted for” (presumably through inclusion of relevant variables in the model’s covariate adjustment set), revealed a much narrower gap between the actors – suggesting that (groups of) different “actors” were more similar in terms of their (more recent) “behaviour” than was “apparent” prior to adjustment.
This constitutes causal inference in the sense that the (confounding-adjusted) differences between “actors” were found to be small, and the unadjusted differences were therefore predominantly caused by preceding differences in “baseline capability and environment” (i.e. factors acting as bona fide ‘confounders’) rather than contemporaneous divergence in behaviour. Meanwhile, the use of the phrase “narrows significantly” implies that an additional statistical technique will have been used to assess the statistical significance of the reduction observed following adjustment – an assessment that ANCOVA does not ordinarily or directly provide (unless specifically modelled and configured to support this). For these reasons, the “insight” as drafted warrants revision to: better qualify these inferential claims (not least with more detail concerning the nature of the “apparent gap between actors”) so as to facilitate greater understanding of what role “divergent behaviour” might play, whether as a specified consequence of “actor” identity, or as a mediator on the causal path between “actor” identity and a separately specified consequence; and better detail the methodological dependencies upon which these rely (including: what techniques/measures were used to assess how the “gap between actors narrow[ed] significantly” following adjustment for “baseline capability and environment”).
- Trend Plotting – While often mistaken for a purely descriptive, univariable technique, Trend Plotting is essentially a bivariable analysis that helps reveal any sequential variation in a specified target/outcome variable in relation to the changing value of a second ordinal or continuous variable (such as a ranked feature or, more commonly, time). It is capable of elucidating and mapping past/current patterns in this variation and thereby offering a retrospective/historical basis upon which past variation might be better characterised (and – potentially – the mechanisms responsible better understood), and applied to generate ‘foresight’ of possible future trends and the ‘predictive estimation’ of dataset features through extrapolation. In the drafted “insight” provided, the inferential claims appear modest since they simply describe the presence of an historical “sustained… longer term… upward movement” that is otherwise, to some degree, obscured by “short-term volatility” – claims that are based solely on extant empirical information. However, where this “insight” does stretch the inferential capabilities of the technique concerned is in suggesting that such (historical) patterns can “show… [evidence of] structural change”. While the specific example and context to which this “insight” refers might potentially clarify/justify such a claim, the omission of this information from the text of the “insight” implies a level of mechanistic/structural understanding that goes beyond what the technique itself can support.
The “insight” as drafted therefore warrants revision to temper these inferential claims (and their potentially unfounded/unsound interpretations); or better detail the contextual/situational and methodological dependencies upon which these rely (including the inclusion of: why the “sustained upward movement” is felt to reflect a longer term “structural change” as opposed, for example, to a regular and sustained series of inputs [of some form or another] that might plausibly have elicited such a trend; and so on).
- ARIMA (Auto-Regressive Integrated Moving Average) – In common with Trend Plotting (see above), this is a bivariable technique that can help reveal any sequential variation in a specified target/outcome variable in relation to the changing value of a second ordinal or continuous variable (such as a ranked feature or, more commonly, time). It is similarly capable of elucidating and mapping past/current patterns in such variation and thereby offering a retrospective/historical basis upon which past variation might be better characterised (and – potentially – the mechanisms responsible, and any possible future trends, speculatively inferred, understood or estimated, respectively). Such future trends are estimated through mathematical/statistical extrapolation (commonly, if unhelpfully, called ‘prediction’), albeit under the assumption of ceteris paribus (all other things being equal/remaining unchanged). Indeed, the drafted “insight” provided for this technique begins with a disclaimer to this effect (“assuming current dynamics persist”) before characterising the estimated future trend in “activity levels… over the next three periods [of time]”. In this regard the inference implied appears both clear and defensible, though the absence of any qualification as to what is meant by: the future trend “remain[ing] within a bounded range” (the meaning of which is entirely predicated on the size/scale of the “range” concerned); or a “non-trivial risk” and a “sharp deviation” – substantively detracts from the import of any inference implied. The “insight” as drafted therefore warrants revision to better clarify and qualify these inferential claims (and their potentially unfounded/unsound interpretations), by offering more detail on any quantifiable measures of the “bounded range”, “non-trivial risk” and “sharp deviation” to which it refers.
- Bayesian Modelling – This (broad) group of techniques, in which ‘prior’ hypotheses concerning the distributional and correlational properties of datasets are tested against new information/data (as this becomes available), offers an attractive alternative to more commonplace ‘frequentist’ techniques – particularly for those analysts who are keen to test their prior understanding (whether speculative or confident) of the contexts and mechanisms in which, and through which, their analytical objectives/problem sets arise. In the “insight” drafted for this technique, the inference is that “new reporting” has confirmed the hypothesised prior(s) and thereby increased the “[analytical] confidence” therein (while “lower[ing the] probability [or likelihood that]… alternative explanations” [are sound]). Whether the technique (as applied in this instance) supports such inference will depend in no small part on the basis upon which the ‘prior’ was derived, and whether the “new reporting” constituted a contribution that was: independent of whatever empirical/theoretical/speculative evidence led to the analyst’s ‘prior’; and capable of challenging/falsifying/qualifying their initial hypothesis. Provided both of these conditions hold, the “insight” as drafted should not lead to inferential misinterpretation – although the use of the term “confidence” to reflect the finding that the ‘prior’/hypothesis was consistent with “new reporting” is unfortunate given “confidence” is more commonly used (in statistical parlance) with reference to precision, and its use here might therefore prompt misinterpretation. Likewise, the use of “materially” to describe (and thereby substantiate) the “lower probability [of]… alternative explanations” would benefit from more detailed (and, where possible/appropriate, quantitative) qualification, alongside further detail on any additional/subsidiary/ancillary techniques used to assess this.
The “insight” as drafted therefore warrants revision to better clarify and qualify these inferential claims (and their potentially unfounded/unsound interpretations) by offering more detail on: how the ‘prior’ hypothesis was developed, and in what sense the “new reporting” might constitute an appropriately rigorous/robust test thereof; and what techniques/measures were used to determine “confidence in hypothesis A”, and the “material[ity]” of the “lower probability” observed for the “alternative explanations”.
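One way in which the quantitative qualification called for above might be reported is to state the prior and posterior probability of each competing hypothesis explicitly. The following is a minimal sketch in Python, assuming three mutually exclusive and exhaustive hypotheses; the labels, prior probabilities and likelihoods are hypothetical values chosen purely for illustration (none are taken from the drafted “insight”), and the function name `posterior` is likewise an assumption.

```python
# Illustrative sketch only: the hypotheses, priors and likelihoods below are
# hypothetical values, not taken from the drafted "insight".

def posterior(priors, likelihoods):
    """Apply Bayes' rule to mutually exclusive, exhaustive hypotheses.

    priors:      dict mapping hypothesis -> prior probability (must sum to 1)
    likelihoods: dict mapping hypothesis -> P(new reporting | hypothesis)
    """
    if abs(sum(priors.values()) - 1.0) > 1e-9:
        raise ValueError("prior probabilities must sum to 1")
    # Unnormalised posterior: prior multiplied by likelihood for each hypothesis.
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    evidence = sum(joint.values())  # overall probability of the new reporting
    if evidence == 0:
        raise ValueError("the new reporting is impossible under every hypothesis")
    return {h: joint[h] / evidence for h in joint}


priors = {"A": 0.5, "B": 0.3, "C": 0.2}        # assumed analyst priors
likelihoods = {"A": 0.8, "B": 0.2, "C": 0.1}   # assumed P(reporting | hypothesis)

post = posterior(priors, likelihoods)
for h in priors:
    print(f"{h}: prior {priors[h]:.2f} -> posterior {post[h]:.2f}")
```

Under these assumed inputs the shift from prior to posterior can be stated numerically rather than as “materially lower”. The sketch also illustrates the dependency noted above: if the “new reporting” is equally probable under every hypothesis (that is, incapable of discriminating between them), the posterior probabilities are identical to the priors and no change in “confidence” is warranted.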
- Monte-Carlo Simulation – Though lacking in methodological detail (and notwithstanding the unnecessary reference to “thousands of simulations”, given these are intrinsic to this technique), the drafted “insight” in this instance explicitly invites inference regarding the four “plausible” scenarios supported by the distributional and correlational properties of the (resampled) dataset available. Assuming, once more, that the resampling and algorithmic procedures were appropriately applied, such inference is nonetheless predicated upon: precisely how (and with what criteria) “emerg[ing]… scenarios” were considered “plausible”; why only one of these scenarios was deemed necessary/relevant to describe in any (additional) detail; and how both a “sharp” change and a “deterioration” (as in “sharp deterioration”) were defined, identified and classified as such. The “insight” as drafted therefore warrants revision to better clarify and qualify these inferential claims (and their potentially unfounded/unsound interpretations) by offering more detail on: the methodological procedures employed; all four of the “plausible” scenarios these procedures identified; how the “plausib[ility]” of “emerg[ing]… scenarios” was determined (particularly if any statistical tests/parameters informed their determination); and how both “sharp” and “deterioration” (and the possible converse, be these: absent or modest; and stability or improvement, respectively) were defined and determined (including, as before, detail of any additional/subsidiary/ancillary techniques, statistical tests and parameters used to make such assessments).
- Cross-Validation – This technique’s “insight” – assuming, once more, that the choices/decisions made were appropriate to the dataset, context and analytical objective/problem set concerned – explicitly infers that the original model can be considered appropriate to the dataset examined simply because its predictive performance “remains stable” when tested in subsidiary (hence “unseen”) datasets other than that/those used to ‘train’ the initial model(s)/algorithm(s). While such findings are consistent (and therefore reasonable) evidence that the original model was “generalis[able]” and unlikely to have been “overfitted” to the original ‘training’ dataset, such inference substantively depends upon: the complexity of the dataset; and the inherent variability of the distributional and correlational properties of its constituent variables – and whether both the dataset- and data-generating mechanisms are inherently stable (i.e. structurally and statistically conserved, regardless of context – as can be the case with artefacts and contexts governed by tight ‘rules’ or design-related constraints). In such instances, this technique may prove less insightful than the inference implied might suggest – unless, that is, prior knowledge/understanding of either/both of these mechanisms was absent, incomplete, imprecise, inaccurate, unreliable or invalid (which may often be the case in intelligence analysis). The “insight” as drafted therefore warrants revision to better clarify and qualify these inferential claims (and their potentially unfounded/unsound interpretations) by offering (or referring back to) more of the detail on: the dataset(s), dataset variables and context(s) examined; and adding further evidence regarding the “patterns… identified” so as to support further inferential interpretation/speculation thereon.
- Bootstrapping – Setting aside the unnecessary reference to “thousands of resampled datasets” – which (like the “thousands of simulations” within the “insight” drafted for Monte Carlo Simulation; see above) is intrinsic to this technique – the “insight” drafted suggests it did not achieve the principal benefit for which Bootstrapping is most commonly applied – this being to generate more precise (i.e. “[more] tightly clustered”) “key estimate[s]” than would otherwise be possible in datasets containing a relatively “small number of observations” (given the higher risk of chance sampling/measurement imprecision therein). Since, in this instance, the “key estimate remains tightly clustered” (italicised emphasis added), the self-evident inference on this occasion is that the original level of “cluster[ing]” (or precision) observed across the whole of the dataset – given this had a relatively “small number of observations” – does not improve despite the application of bootstrapping. Alternatively, one might assume that the analytical objective behind the use of this technique (on this occasion) was to challenge whether a more “tightly clustered… key estimate” than was initially considered possible/likely might have been a chance phenomenon given the “small number of observations” from which this “key estimate” was derived. Yet bootstrapping is unlikely to offer a robust assessment of the potential for ‘Type 1’ errors of this sort, since “tightly clustered… key estimate[s]” generated initially using the entire dataset are unlikely to prove less “tightly clustered” (i.e. less precise) following bootstrapping simply because, by definition, the scope for imprecision is clearly already limited. The “insight” as drafted therefore warrants revision to better clarify and qualify these inferential claims (and their potentially unfounded/unsound interpretations) by including additional information relevant to the analytical objective the technique was deployed to address.
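To make the Bootstrapping discussion above more concrete, the following is a minimal sketch (in Python, using only the standard library) of a percentile bootstrap interval for the mean of a small sample. The twelve observations, the number of resamples, the fixed seed and the function name `bootstrap_mean_interval` are all hypothetical choices made for illustration; none are drawn from the drafted “insight”, and other bootstrap variants (e.g. parametric or block designs) would be coded differently.

```python
import random
import statistics


def bootstrap_mean_interval(sample, n_resamples=5000, level=0.95, seed=0):
    """Percentile bootstrap interval for the mean of `sample`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    means = []
    for _ in range(n_resamples):
        # Draw n observations, with replacement, from the original sample.
        resample = [rng.choice(sample) for _ in range(n)]
        means.append(statistics.fmean(resample))
    means.sort()
    alpha = 1.0 - level
    # Take the empirical alpha/2 and 1 - alpha/2 quantiles of the resampled means.
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return statistics.fmean(sample), lower, upper


# Hypothetical small sample (12 observations), for illustration only.
observations = [4.1, 3.8, 4.4, 4.0, 3.9, 4.3, 4.2, 3.7, 4.1, 4.0, 4.5, 3.9]
estimate, lower, upper = bootstrap_mean_interval(observations)
print(f"mean = {estimate:.3f}, 95% percentile interval = ({lower:.3f}, {upper:.3f})")
```

Because every resample is drawn from the same small set of observations, the width of the resulting interval reflects the spread already present in that sample. A narrow interval therefore describes the precision of the original estimate; it does not show that the sample was representative, which is the limitation the paragraph above describes.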
6. Are Duffield’s 11 Techniques Suitable for Intelligence Analysts with “a Relatively Low Mathematical Baseline”?
| Section | Basic foundation content | Additional foundation content | Higher content only |
|---|---|---|---|
| P1 | Record, describe and analyse the frequency of outcomes of probability experiments using tables and frequency trees. | [Blank] | [Blank] |
| P2 | Apply ideas of randomness, fairness, and equally likely events to calculate expected outcomes of multiple future experiments. | [Blank] | [Blank] |
| P3 | Relate relative expected frequencies to theoretical probability, using appropriate language and the 0 to 1 probability scale. | [Blank] | [Blank] |
| P4 | Apply the property that the probabilities of an exhaustive set of outcomes sum to 1. | [Blank] | [Blank] |
| P5 | [Blank] | Understand that empirical unbiased samples tend towards theoretical probability distributions, with increasing sample size. | [Blank] |
| P6 | Enumerate sets and combinations of sets systematically, using tables, grids, and Venn diagrams. | [As for Basic, but also] including using tree diagrams. | [Blank] |
| P7 | Construct theoretical possibility spaces for single and combined experiments with equally likely outcomes; and use these to calculate theoretical probabilities. | [Blank] | [Blank] |
| P8 | Calculate the probability of independent and dependent combined events, including using tree diagrams and other representations; and know the underlying assumptions. | [Blank] | [Blank] |
| P9 | [Blank] | [Blank] | Calculate and interpret conditional probabilities through representation using expected frequencies with two-way tables, tree diagrams and Venn diagrams. |
| Section | Basic foundation content | Additional foundation content | Higher content only |
|---|---|---|---|
| S1 | [Blank] | Infer properties of populations or distributions from a sample, whilst knowing the limitations of sampling. | [Blank] |
| S2c | Interpret and construct tables, charts and diagrams, including: frequency tables, bar charts, pie charts and pictograms for categorical data; vertical line charts for ungrouped discrete numerical data; and know their appropriate use. | [As for Basic, but also] including tables and line graphs for time series data. | [Blank] |
| S3 | [Blank] | [Blank] | Construct and interpret diagrams for grouped discrete data and continuous data – i.e. histograms with equal and unequal class intervals, and cumulative frequency graphs; and know their appropriate use. |
| S4d | Interpret, analyse and compare the distributions of data sets from univariate empirical distributions through: | [Blank] | [As for Basic, but also:] |
| S5 | Apply statistics to describe a population. | [Blank] | [Blank] |
| S6e | | [As for Basic, but also:] | [Blank] |
Conclusion
- first, because it is far from clear that these techniques are required to address (or would succeed in addressing) the “total information overload” he believes they face due to the accelerating “volume, variety and velocity” of so-called ‘Big Data’ (Ellison, 2026); and
- second, because the scale of the training, supervision and support required to upskill intelligence analysts in the 11 “statistical analytical techniques” appears impracticable and implausible without substantial additional time and resource, and a concomitant recalibration of current intelligence analysis doctrine and practice.
Endnotes
1. Statistical analysis is, at heart, simply an attempt to extract useful insight from the central tendencies, patterns and trends present within (medium- to large-scale) quantitative datasets, while simultaneously mitigating any non-systematic error (i.e. imprecision or ‘noise’) resulting from ‘measurement error’, and addressing the most important sources of systematic error (i.e. bias).4
2. That is, to support the estimation/optimisation of hypothesised or (as yet) unmeasured dataset features (whether past, present or future) using univariable or multivariable interpolation, extrapolation and latent class/variable techniques.
3. Such as how comprehensively a ‘predictive estimation’ algorithm captures the distributional statistical information available, or how sensitive the analyses might be to modest levels of non-systematic and systematic error in the measurement of key variables (and particularly so for the target/outcome variable; see Section 4).
4. The reliance of statistical analysis on substantive samples of data (comprising multiple measurements/observations or measurements/observations from multiple cases) is a consequence of the way this works by: exploiting and interrogating the distributional and correlational properties of multiple measurements/observations; in order to identify central tendencies, patterns and trends from amongst the ‘noise’ of ‘measurement error’.
Appendices
Appendix A: A non-technical critique of Duffield’s (2026) proposition
- ‘Big Data’ seems unlikely to be the most pressing cause of analyst “overload” – given analytic tradecraft routinely necessitates extended periods of intensive exposure to computer-based digital workspaces, characterised by a relentless stream of multi-sensory/multi-tasking demands, each capable of overwhelming their cognitive processing capacity;5
- only analysts working within specialist “collection disciplines” or “analytical specialisms”4 that require the (statistical) analysis of quantitative datasets will necessarily benefit from enhanced statistical competencies – and those that do may benefit little from the “basic toolkit”2 proposed;
- burdening all analysts with additional responsibilities to develop and apply additional statistical skills may exacerbate any “overload” they experience – unless these responsibilities were offset by a commensurate uplift in analytical capacity/efficiency, or a reduction in analytical output/productivity; and
- generating robust, statistically derived insights from quantitative data requires substantive understanding of the many sources of non-systematic and systematic error (bias) – both analytical and inferential – that arise at each and every step in the application of even the most “basic statistical analytical techniques”.2
- Adopting cognitive hygiene practices as part of its standard operating procedures, to reduce the “overload” attributable to ‘technostress’ (in which ‘Big Data’ may play a part);5,10
- Establishing a dedicated specialist sub-discipline with the expertise required to generate robust insights from the statistical analysis of large quantitative datasets (and ‘Big Data’);
- Ensuring that all analysts understand the inferential pitfalls that often accompany error and bias in statistical analysis (whenever these are not avoided, attenuated, mitigated, or acknowledged and accommodated);9 and
- Discouraging analysts from attempting specialist analytical techniques – such as statistical analysis – without the training, expertise and competencies required to avoid unnecessary mistakes.
Appendix B: Tables
| Analyst-determined choices and decisions | Consequences of misjudged choices and decisions |
|---|---|
| Selection and definition of the analytical objective | Misalignment between method and purpose; and irrelevant or misleading conclusions |
| Selection and definition of critical variables and features | Omitted-variable bias; spurious associations; distorted relationships; and invalid inference |
| How to handle/address missing data | Biased estimates; reduced power; invalid missingness assumptions; and incorrect conclusions |
| The minimum sample size commensurate with the analytical insights desired | Imprecise and biased estimates; limited capacity for trustworthy sub-group analyses; and enhanced risk of Type 1 errors |
| Necessary data cleaning, preparation and pre-processing steps | Distorted signals; biased parameters; unstable estimates; and incorrect trends or similarity structures |
| Which evaluation metrics to use | Optimising for irrelevant criteria; misleading performance comparisons; and selection of inferior models |
| Which validation/verification approach to use | Inflated or deflated performance; information leakage; and unreliable generalisability |
| Which computational settings to use | Numerical instability; model non-convergence; irreproducible results; and inaccurate estimates |
| Which uncertainty quantification technique(s) to use | Under- or over-stated uncertainty; misleading confidence or credible/credibility intervals; and incorrect risk characterisation |
| Which diagnostics and robustness checks to use | Undetected misfitted model(s); unchecked and invalid assumptions; and fragile/unstable inferences |
| What interpretation strategy/ies to adopt | Misinterpretation of: coefficients; probabilities; forecasts; trends; and/or clusters |
| Analyst-determined choices and decisions | Consequences of misjudged choices and decisions |
|---|---|
| Selecting tuning/hyperparameters | Underfitting or overfitting; poor predictive performance; distorted clusters; unstable estimates; and excessive variance or bias |
| Managing iterative algorithms | Failure to converge; unstable parameter estimates; sensitivity to initial conditions; and irreproducible outputs |
| Designing re-sampling or simulation structure(s) | Invalid interval coverage; biased uncertainty estimates; incorrect risk quantification; and unstable simulation outputs |
| Handling temporal or structural dependencies | Spurious autocorrelation; non-stationary residuals; poor quality forecasts; structural misspecification; and invalid validation |
| Specifying probabilistic modelling components | Prior-dominated or mis-specified posterior; incorrect uncertainty estimates; and misleading inference |
| Technique | Analyst-determined choices and decisions | Consequences of misjudged choices and decisions |
|---|---|---|
| Linear Regression | Functional form; handling of heteroscedasticity; identification of influential data; variance estimators; and multicollinearity treatment | Biased coefficients; invalid standard errors; poorly fitted models; and misleading inference |
| Logistic Regression | Link function; separation handling; correction for any class imbalance; classification threshold; and regularisation | Poor calibration; infinite or unstable estimates; skewed predicted probabilities; and misclassification |
| K-Nearest Neighbours | Distance metric; feature scaling; choice of K; and weighting scheme | Distorted neighbour structure; elevated bias or variance; poor discrimination; and misleading predictions |
| K-Means Clustering | Number of clusters; initialisation method; distance metric; algorithm variant; and stopping criteria | Wrong or unstable clusters; centroid drift; and false segmentation or grouping |
| ANCOVA (Analysis of Covariance) | Covariate choice; interaction specification; homogeneity-of-slopes testing; contrast coding; and error structure | Confounding; biased adjusted means; and invalid group comparisons |
| Trend Plotting | Smoothing method; window/span; detrending; anomaly handling; treatment of temporality and seasonality; and decomposition | Over- or under-smoothing; false patterns; and suppressed or exaggerated trends |
| ARIMA (Autoregressive Integrated Moving Average) | Order selection for non-seasonal (p,d,q) and seasonal (P,D,Q) parameters; differencing choices; estimation method; structural break handling; and ARIMAX specification | Non-stationarity; over-differencing; poor forecasts; and spurious autocorrelation |
| Bayesian Analysis | Choice of prior; hyperparameters; sampling algorithm; convergence diagnostics; and posterior predictive checks | Prior domination; non-convergence; misleading a posteriori inference; and incorrect uncertainty quantification |
| Monte Carlo Simulation | Input distributions; correlation structure and dependencies; scenario construction; and number of iterations | Unrealistic simulations; incorrect uncertainty estimates; and unstable results |
| Cross-Validation | Number of folds; fold construction (whether stratified, blocked or rolling); and tuning of inside/outside folds | Inflated or deflated performance; invalidation for time series; and misleading model comparisons |
| Bootstrapping | Bootstrap type; re-sampling unit; number of bootstrap samples; and interval type | Wrong coverage; unreliable intervals; invalid dependence structure; and incorrect uncertainty assessment |
| Parametric/non-parametric assumptions | Consequences were the assumptions not to hold |
|---|---|
| Data representativeness: sample reflects the population or phenomenon being analysed | Biased estimates; invalid generalisability; and systematic distortion of inference |
| Measurement validity: variables accurately capture the constructs of interest | Misinterpretation; spurious associations; and misleading parameter estimates |
| Measurement reliability: repeated measurements and observations provide consistent values | Increased variance; attenuation of effects; loss of power; and noise-dominated inference |
| Correct data type usage: continuous, ordinal, categorical vs. time-indexed, where appropriate | Model mis-specification; invalid distances/similarities; and incorrect likelihood forms |
| Independence of measurements/observations (unless explicitly modelled otherwise) | Inflated significance; underestimated uncertainty; and misleading confidence/credible intervals |
| Appropriate scale and transformation of variables whenever scale-sensitive methods are used | Distorted distances; unreliable predictions; and dominance of high-variance features |
| Absence of severe outliers unless method is robust or outliers are explicitly modelled | Skewed parameters; cluster distortion; unstable fits; and misleading trends |
| Appropriate sample size for the chosen analytical technique (in the absence of a formal power calculation) | Unstable estimates; wide uncertainty; high variance; and unreliable re-sampling or simulation |
| Correct temporal ordering for time-dependent (and causal inference) analyses | Spurious autocorrelation; invalid forecasting; and causality misinterpretation |
| Appropriate randomness assumptions when either re-sampling or simulation is used | Invalid bootstrap intervals; biased Monte Carlo outputs; and incorrect uncertainty estimates |
| Correct specification of the objective function or loss criterion | Model optimises the wrong behaviour; invalid conclusions; and poor predictive performance |
| Technique | Parametric/non-parametric assumptions | Consequences were the assumptions not to hold |
|---|---|---|
| Linear Regression | Linearity of relationships; additive effects; homoscedastic residuals; normally distributed residuals for exact inference; no multicollinearity; and independent errors | Biased coefficients; invalid standard errors; inflated Type I errors; unstable estimates; and misleading inference |
| Logistic Regression | Correct link function (logit typically being appropriate); linearity of log-odds in predictors; absence of complete separation; independent errors; and correct distributional form (i.e. Bernoulli responses) | Infinite coefficients; non-convergence; mis-calibrated predicted probabilities; and biased odds ratios |
| K-Nearest Neighbours | Meaningful distance metric; local smoothness (similarity of nearby points); absence of irrelevant or dominating features; and balanced class structure for classification | Distorted neighbour sets; high variance or high bias; poor classification/regression performance; and meaningless similarity structure |
| K-Means Clustering | Cluster shapes approximately spherical or convex; clusters separable via Euclidean distance; similar cluster variance; and meaningful centroid representation | Mis-clustering; merged or fragmented clusters; unstable solutions; and misleading group interpretations |
| ANCOVA (Analysis of Covariance) | Homogeneity of regression slopes; linear relationship between covariate and outcome; correct specification of covariates; independent residuals; and homoscedasticity | Biased adjusted means; invalid comparisons across groups; and incorrect significance tests |
| Trend Plotting | Underlying process is smooth enough for chosen smoother technique; independence of, or correctly modelled, dependence; appropriateness of smoothing window; and absence of structural breaks (unless modelled) | False trend detection; noise mistaken for signal; significant structure masking; and misleading visual inference |
| ARIMA (Autoregressive Integrated Moving Average) | Stationarity (or stationarity achieved through differencing); invertibility; correct autoregressive and moving-average order; residual independence; homoscedastic residuals; and absence of unmodelled structural breaks | Poor forecasts; spurious autocorrelation; biased parameter estimates; and unstable time-series behaviour |
| Bayesian Analysis | Correct likelihood form; coherent specification of prior; compatibility of prior and likelihood; sufficient MCMC (Markov Chain Monte Carlo) convergence; and a posteriori integrability | Domination of/by prior; misleading a posteriori findings/inferences; invalid credible intervals; non-converged chains; and incorrect uncertainty estimation |
| Monte Carlo Simulation | Correct input distributions; valid dependence and correlation structure; sufficient simulation size; and integrity of RNG (random number generator) | Inaccurate risk/uncertainty estimates; biased outputs; unstable results; and simulation artefacts |
| Cross-Validation | Independence across folds; correct fold structure (e.g., temporal blocking for time series); representativeness of training and testing sets; and a consistent evaluation metric | Inflated or deflated performance estimates; invalid model selection; leakage; and misleading generalisability |
| Bootstrapping | Sample approximates population distribution; independence or correct block design for time series; sufficient re-sample size; and correct bootstrap type (i.e. parametric, non-parametric or block) | Misleading confidence intervals; incorrect/wrong coverage; underestimated variance; and invalid inference for dependent data |
| Sources of imprecision and bias | Consequences were imprecision and bias not addressed |
|---|---|
| Sampling error (random fluctuations with respect to a finite sample) | Low precision; wide intervals; unstable estimates; and misinterpretation of random variation as if it reflected meaningful structure |
| Sampling bias (non-representative samples/selection bias) | Systematic mis-estimation; invalid generalisability; and biased predictions and inference |
| Measurement error (random noise in presentation of variable[s] and/or their measurement/observation) | Attenuated relationships; reduced power; increased variance; and masked effects |
| Measurement bias (systematic under/over recording or reporting, and misclassification) | Systematic distortion of parameters; biased coefficients; and incorrect inference |
| Confounding (unmeasured variables causing both exposure/cause and outcome, or focal predictor and target) | Spurious associations; misleading causal interpretation; and biased parameter estimates |
| Model mis-specification (wrong functional form; omitted interactions; and/or incorrect structural assumptions) | Bias; residual structure; invalid inference; and misleading conclusions |
| Data preprocessing bias (improper scaling; transformations; and/or filtering) | Distorted distances; incorrect trend patterns; and unreliable model behaviour |
| Outlier influence (extreme values not accounted for) | Parameter instability; misleading clusters; distorted regression lines; and invalid forecasts |
| Dependence structure mismanagement (unmodelled autocorrelation, clustering, or grouping) | Underestimated uncertainty; inflated Type I errors; and false significance |
| Information leakage (test data contaminating the training procedure) | Severely inflated performance estimates; and invalid generalisability |
| Algorithmic instability (sensitivity to random initialisation, or to small perturbations) | Low reliability; non-reproducible results; and unstable inferences |
| Hyperparameter/tuning bias (whether tuned on test set or over-optimised) | Overfitting; unrealistic performance; and degraded real-world accuracy |
| Over-smoothing or under-smoothing (for time series or trend analyses) | Masked structure or exaggerated noise; and incorrect conclusions about trends or cycles |
| Simulation or re-sampling randomness (Monte Carlo or bootstrap variability) | Unstable uncertainty estimates; and misleading intervals (if insufficient samples drawn) |
| Interpretation bias (misreading/misinterpretation of statistical or probabilistic outputs) | Incorrect substantive conclusions; and miscommunication of risk, effect sizes, or uncertainty |
| Technique | Sources of imprecision and bias | Consequences were imprecision and bias not addressed |
|---|---|---|
| Linear Regression | Heteroscedastic residuals; multicollinearity; influential outliers; and omitted nonlinear terms | Biased coefficients; incorrect standard errors; unstable estimates; and misleading inference |
| Logistic Regression | Complete/quasi separation; rare events bias; mis-calibrated class imbalance; and mis-specified link | Infinite or unstable coefficients; biased odds ratios; and poor probability calibration |
| K-Nearest Neighbours | ‘Curse of dimensionality’; unscaled variables; irrelevant features; and class imbalance | Distorted neighbour sets; poor classification; and either high variance or high bias |
| K-Means Clustering | Poor centroid initialisation; non-spherical clusters; sensitivity to scaling; and empty clusters | Incorrect cluster assignment; unstable clustering; and misleading segmentation |
| ANCOVA (Analysis of Covariance) | Violation of slope homogeneity; mis-specified covariates; and imbalance across groups | Biased adjusted means; invalid comparisons; and inflated Type I errors |
| Trend Plotting | Incorrect smoothing span; unremoved seasonality; and failure to handle structural breaks | False trends; masked patterns; and misinterpretation of time-dependent behaviour |
| ARIMA (Autoregressive Integrated Moving Average) | Incorrect differencing order; unmodelled seasonality; residual autocorrelation; and parameter non-invertibility | Spurious autocorrelation; biased forecasts; instability; and misleading time-series structure |
| Bayesian Analysis | Poorly chosen priors; conflict between prior and likelihood; non-converged MCMC (Markov Chain Monte Carlo); and autocorrelated chains | Prior domination; incorrect posterior; invalid credible intervals; and unreliable inference |
| Monte Carlo Simulation | Incorrect input distributions; mis-specified correlations; insufficient iterations; and random number generator deficiencies | Invalid uncertainty quantification; biased risk estimates; and simulation artefacts |
| Cross-Validation | Improper fold assignment; temporal leakage; stratification failure; and inconsistent evaluation metric | Inflated/deflated performance; wrong model selection; and invalid generalisability |
| Bootstrapping | Incorrect variance estimates; wrong interval coverage; and misleading uncertainty assessments | Incorrect variance estimates; wrong interval coverage; and misleading uncertainty assessments |
| Diagnostic assurance measures available | Consequences should diagnostic assurance measures not be applied (and reported) |
|---|---|
| Explicitly defining the target population and the estimand (the estimate sought/desired)1,2 | Misaligned analyses; results that answer the ‘wrong’ question; and non-transportable conclusions |
| Pre-registering the analysis plan or ‘locking’ a protocol1,3 | Researcher-adjusted degrees of freedom; p-hacking; inflated Type I error rate; and overfitted narratives |
| Auditing: data provenance, missingness, and measurement error4,5 | Biased estimates; spurious associations; unknown and unacknowledged uncertainty; and selective omission |
| Conducting train/validation/test separation and leakage checks6 | Overstated performance; failures of generalization; and misleading model comparisons |
| Conducting exploratory data analysis (to assess distributions, outliers, and scale)4,7 | Model misfit with leverage points dominating; and/or invalid standard errors |
| Undertaking technique-relevant assumption mapping (e.g. regarding: linearity; independence; and stationarity)7,8 | Hidden violations that degrade validity; and biased/inefficient or inconsistent estimates |
| Ensuring analytical features are engineered with appropriate scaling or encoding7 | Distance-based/generalised linear model fits that are distorted; non-convergence; and unstable coefficients |
| Developing and applying a confounding control strategy (through design or using a DAG-based adjustment set)4,9 | Invalid causal claims; ‘omitted-variable bias’; and collider bias (if colliders are incorrectly conditioned on) |
| Adopting a single, pre-specified model cross-validation or selection/evaluation rule to avoid post hoc cherry-picking6 | Overfitting/underfitting; cherry-picked metrics; and irreproducible choices |
| Conducting diagnostic residual checks and influence analysis4,7 | Undetected heteroscedasticity, autocorrelation, and/or nonlinearity; with results driven by a few data points |
| Ensuring standard errors are robust; or conducting variance modelling when needed7 | Invalid confidence intervals and p-values; leading to inflated false positives and/or false negatives |
| Controlling for multiple testing/error-rate (e.g. Family-Wise Error Rate or False Discovery Rate)7 | Excessive numbers of false discoveries; and an unreliable portfolio of ‘insights’ |
| Conducting sensitivity analyses (to evaluate specifications, priors, and bandwidths)4,9,10 | Fragile conclusions; and undisclosed model dependencies |
| Undertaking uncertainty quantification (e.g. by generating confidence, credible and predictive intervals)1,8 | Overconfident claims; and unassessed risks |
| Conducting external validation or out-of-time validation6 | Non-transportable models; and enhanced risk of ‘surprise failure’ following model deployment |
| Evaluating reproducibility (of code, seeds, and versions)3 | Irreplicable findings; and undiagnosable discrepancies |
| Applying transparent reporting (including of: analysis limitations, assumptions and data bounds)2 | Misinterpretation by customers/consumers; and potential misuse/misapplication in decision-making |
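
As an illustration of the multiple-testing control listed above, the sketch below contrasts a Family-Wise Error Rate adjustment (Bonferroni) with a False Discovery Rate adjustment (the Benjamini–Hochberg step-up procedure). It is a minimal Python sketch rather than a definitive implementation: the function names and the example p-values are hypothetical, chosen only to show how the two rules can reach different decisions on the same set of tests.

```python
def bonferroni(p_values, alpha=0.05):
    """Family-Wise Error Rate control: reject where p <= alpha / number of tests."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, q=0.05):
    """False Discovery Rate control using the Benjamini-Hochberg step-up rule."""
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest p-value
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank k whose ordered p-value satisfies p_(k) <= (k / m) * q
        if p_values[idx] <= rank / m * q:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical p-values from ten tests (illustrative only)
    example = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print("Bonferroni rejections:        ", sum(bonferroni(example)))
    print("Benjamini-Hochberg rejections:", sum(benjamini_hochberg(example)))
```

Whichever rule is adopted, it should be specified before the results are inspected, so that the choice of adjustment does not itself become a source of post hoc flexibility.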

| Technique | Diagnostic assurance measures available | Consequences should diagnostic assurance measures not be applied (or reported) |
|---|---|---|
| Linear Regression | Residual plots for linearity/normality; heteroscedasticity tests (Breusch–Pagan); robust/heteroscedasticity-consistent standard errors; multicollinearity (variance inflation factor [VIF]); ‘influence’ (Cook’s D); specification tests; and interaction checks1,2 | Biased/inefficient estimates; invalid standard errors/confidence intervals; spurious significance from nonlinearity; and results driven by just a few points |
| Logistic Regression | Linearity in the logit (Box–Tidwell test); separation checks and penalization (Firth bias-reduced logistic regression or Ridge/L2-penalised logistic regression); calibration (reliability plots, Brier scores, and/or the Hosmer–Lemeshow goodness-of-fit test); class imbalance handling; multicollinearity checks; and threshold-independent metrics (area under the receiver operating characteristic curve [ROC AUC] and/or the precision-recall curve [PRC])3,4,5 | Poor calibration; unstable/biased coefficients; misleading accuracy as a result of imbalance; and odds ratios misinterpreted as risk ratios |
| K-Nearest Neighbours | Feature scaling; ‘k’ chosen via cross-validation; selection of appropriate distance metric; class imbalance handling; leakage prevention (temporal splits if needed); and dimensionality reduction where data are high-dimensional6,7 | Noisy, unstable predictions; optimism from leakage; and distance that is meaningless in high dimensions |
| K-Means Clustering | Scale features; multiple random initializations; choice of ‘k’ (silhouette, gap statistic); assess cluster stability; handle outliers; and interpret centroids8,9 | Arbitrary or non-replicable clusters; misleading “natural groups”; and spurious downstream profiles |
| ANCOVA (Analysis of Covariance) | Verify homogeneity of regression slopes (parallel slopes); residual diagnostics; covariate reliability; multiple-comparison adjustments; and balance/randomization checks2,10 | Invalid adjusted means; biased group effects; inflated Type I error rate; and misattribution to baseline |
| Trend Plotting | Choose smoothing (locally estimated scatterplot smoothing [LOESS]; or moving average [MA]) with rationale; preserve time order; show uncertainty bands; adjust or model temporal periodicity (if relevant); and test for change points11,12 | Over-reading noise as if it were a trend; seasonality overlooked/omitted; and false claims of structural change |
| ARIMA (Autoregressive Integrated Moving Average) | Check stationarity (augmented Dickey–Fuller test/Kwiatkowski–Phillips–Schmidt–Shin test); identify ARIMA orders ([p] autoregressive, [d] differencing, and [q] moving-average) via the autocorrelation function (ACF), partial autocorrelation function (PACF) and information criteria (IC); ensure residual whiteness/white noise; include seasonal ARIMA (SARIMA) when appropriate; run stability checks; and evaluate out-of-sample forecasts/intervals13,14 | Mis-specified dynamics; autocorrelated residuals; biased forecasts; and underestimated intervals |
| Bayesian Analysis | Specify/justify priors; diagnose Markov chain Monte Carlo (MCMC) convergence (R-hat, effective sample size [ESS], and trace plots); posterior predictive checks; prior/posterior sensitivity; and identifiability review15,16 | Non-converged or prior-dominated posteriors; misleading credible intervals; and undetected misfit |
| Monte Carlo Simulation | Validate input distributions and dependencies; justify scenario design; sufficient runs to stabilize estimates; consider variance reduction; and sanity-check vs. historical data17,18 | Unreliable scenario probabilities; spurious tail risks (or missed extremes); and unstable decision metrics |
| Cross-Validation | Proper fold construction (stratified, group, or time-series cross-validation [CV]); prevent leakage; fix evaluation metric; report mean/variance across folds; and repeated CV for stability19,20 | Optimistic performance; non-generalisable models; and unstable model selection |
| Bootstrapping | Resample with replacement; choose appropriate bootstrap for dependence (block/cluster); use sufficient replicates; use percentile/bias-corrected and accelerated (BCa) intervals; and check interval stability21,22 | Under/overestimated uncertainty; invalid confidence intervals with dependencies; and overconfident claims of robustness |
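
The 'proper fold construction' listed above for time-ordered data can be sketched as an expanding-window split, in which every training observation precedes every test observation so that no future information leaks into model fitting. The function name `expanding_window_folds` and its parameters are hypothetical, and the sketch assumes that observations are already sorted in time order and indexed from zero.

```python
def expanding_window_folds(n_obs, n_folds, min_train):
    """Return a list of (train_indices, test_indices) pairs for time-ordered data.

    Each training window starts at index 0 and ends immediately before its
    test window, so that no test observation precedes a training observation.
    """
    if n_folds < 1 or min_train < 1:
        raise ValueError("n_folds and min_train must be positive")
    test_size = (n_obs - min_train) // n_folds
    if test_size < 1:
        raise ValueError("not enough observations for the requested folds")
    folds = []
    for fold in range(n_folds):
        train_end = min_train + fold * test_size
        test_end = train_end + test_size
        # Any remainder observations at the end of the series are left unused
        folds.append((list(range(train_end)), list(range(train_end, test_end))))
    return folds


if __name__ == "__main__":
    for train, test in expanding_window_folds(n_obs=10, n_folds=3, min_train=4):
        print("train:", train, "test:", test)
```

Random shuffling of the same ten observations would place later observations in the training set and earlier ones in the test set, which is the temporal leakage that this construction is intended to prevent.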

References
- Aggarwal CC, Hinneburg A, Keim DA. On the surprising behavior of distance metrics in high dimensional space. In: Van den Bussche J, Vianu V (eds.) Lecture Notes in Computer Science 2001; Berlin, Heidelberg (Germany): Springer. pp. 420-34. [CrossRef]
- Agresti A. Foundations of Linear and Generalized Linear Models, 1st Edition. Hoboken (NJ, USA): Wiley; 2015. 472pp. ISBN: 9781118730034.
- Bergmeir C, Hyndman RJ, Koo B. A note on the validity of cross-validation for evaluating autoregressive time series prediction. Computational Statistics and Data Analysis 2018; 120: 70-83. [CrossRef]
- Bishop CM. Pattern Recognition and Machine Learning, 1st Edition. New York (NY, USA): Springer; 2006. 738pp. ISBN: 9780387310732.
- Box GEP, Jenkins GM, Reinsel GC, Ljung GM. Time Series Analysis: Forecasting and Control, 5th Edition. Hoboken (NJ, USA): Wiley; 2015. 712pp. ISBN: 9781118675021.
- Box GEP. Science and statistics. Journal of the American Statistical Association 1976; 71: 791-799. [CrossRef]
- Box GEP. Robustness in the strategy of scientific model building. In: Launer RL, Wilkinson GN (eds.) Robustness in Statistics. New York (NY, USA): Academic Press; 1979. pp. 201–236. [CrossRef]
- Breusch TS, Pagan AR. A simple test for heteroscedasticity and random coefficient variation. Econometrica 1979; 47:1287–94. [CrossRef]
- British Army. Operator Military Intelligence. British Army Find a Role Website. Archived here on 30MAY26.
- Casella G, Berger RL. Statistical Inference, 2nd Edition. Pacific Grove (CA, USA): Duxbury Press; 2002. 660pp. ISBN: 9780534243128.
- Christensen R, Ranstam J, Overgaard S, Wagner P. Guidelines for a structured manuscript: Statistical methods and reporting in biomedical research journals. Acta Orthopaedica 2023; 94: 243-9. [CrossRef]
- Cleveland WS. Robust locally weighted regression and smoothing scatterplots. Journal of the American Statistical Association 1979; 74: 829-36. [CrossRef]
- Corera G. How Israel builds its hi-tech start-ups. Technology, BBC News website 2016; 14 Oct. Archived here on 14OCT16.
- Cover T, Hart P. Nearest neighbor pattern classification. IEEE Transactions on Information Theory 1967; 13: 21-7. [CrossRef]
- Davison AC, Hinkley DV. Bootstrap Methods and Their Application. Cambridge: Cambridge University Press; 1997. 582pp. ISBN: 9780511802843.
- Duffield J. Statistical analytical techniques for intelligence analysis. RUSI Journal 2026; 171: 1-12. [CrossRef]
- Efron B, Hastie T. Computer Age Statistical Inference, 1st Edition. Cambridge (UK): Cambridge University Press; 2016. 495pp. ISBN: 9781107149899.
- Efron B, Tibshirani RJ. An Introduction to the Bootstrap, 1st Edition. New York (NY): Chapman and Hall/CRC; 1993. 456pp. ISBN: 978-0412042317.
- Ellison GTH. Using directed acyclic graphs (DAGs) to represent the data generating mechanisms of disease and healthcare pathways: A guide for educators, students, practitioners and researchers. In: Farnell DJJ, Medeiros Mirra R (eds). Teaching Biostatistics in Medicine and Allied Health Sciences. Cham (Switzerland): Springer; 2023. pp. 61-101. [CrossRef]
- Ellison GTH. Statistics in Intelligence Analysis: “a little learning is a dang'rous thing”. RUSI Journal 2026; DOI: 10.1080/03071847.2026.2683229.
- Elwert F, Winship C. Endogenous selection bias: The problem of conditioning on a collider variable. Annual Review of Sociology 2014; 40: 31-53. [CrossRef]
- Firth D. Bias reduction of maximum likelihood estimates. Biometrika 1993; 80: 27-38. [CrossRef]
- Fox J. Applied Regression Analysis and Generalized Linear Models, 4th Edition. Los Angeles (CA): SAGE Publications; 2015. 824pp. ISBN: 978-1452205663.
- Gelman A, Vehtari A, Simpson D, Margossian CC, Carpenter B, Yao Y, Kennedy L, Gabry J, Bürkner P, Modrák M. Bayesian workflow. arXiv 2020; 03 Nov: 1-77. [CrossRef]
- Glasserman P. Monte Carlo Methods in Financial Engineering. New York (NY): Springer; 2003. 596pp. ISBN: 978-0387004518.
- Gelman A, Carlin JB, Stern HS, Dunson DB, Vehtari A, Rubin DB. Bayesian Data Analysis, 3rd Edition. Boca Raton (FL, USA): CRC Press; 2014. 662pp. ISBN: 9781439840955.
- Goodfellow I, Bengio Y, Courville A. Deep Learning, 1st Edition. Cambridge (MA, USA): MIT Press; 2016. 800pp. ISBN: 9780262035613.
- Hardwicke TE, Salholz-Hillel M, Malički M, Szűcs D, Bendixen T, Ioannidis JPA. Statistical guidance to authors at top ranked journals across scientific disciplines. American Statistician 2023; 77: 239-47. [CrossRef]
- Harrell FE. Regression Modeling Strategies, 2nd Edition. Cham (Switzerland): Springer; 2015. 582pp. ISBN: 9783319194240.
- Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning, 2nd Edition. New York (NY, USA): Springer; 2009. 745pp. ISBN: 9780387848570.
- Helton JC, Davis FJ. Latin hypercube sampling and the propagation of uncertainty in analyses of complex systems. Reliability Engineering and System Safety 2003; 81: 23-69. [CrossRef]
- Hosmer DW, Lemeshow S, Sturdivant RX. Applied Logistic Regression, 3rd Edition. Hoboken (NJ): Wiley; 2013. 510pp. ISBN: 9780470582473.
- Hyndman RJ, Athanasopoulos G. Forecasting: Principles and Practice, 3rd Edition. Melbourne, Australia: OTexts; 2021. 442pp. ISBN: 9780987507136.
- Ioannidis JPA. Statistical biases in science communication: What we know about them and how they can be addressed. In: Jamieson KH, Kahan DM, Scheufele DA (eds). The Oxford Handbook of the Science of Science Communication (Online Edition). Oxford (UK): Oxford Academic, Oxford Library of Psychology; 2017; 06 Jun. [CrossRef]
- Ioannidis JPA. What have we (not) learnt from millions of scientific papers with p values? American Statistician 2019; 73 (Suppl 1): 20-5. [CrossRef]
- James G, Witten D, Hastie T, Tibshirani R. An Introduction to Statistical Learning, 2nd Edition. New York (NY, USA): Springer; 2021. 612pp. ISBN: 9781071614174.
- Kaufman L, Rousseeuw PJ. Finding Groups in Data: An Introduction to Cluster Analysis. Hoboken (NJ): Wiley; 1990. 342pp. ISBN: 9780471878766.
- Killick R, Fearnhead P, Eckley IA. Optimal detection of changepoints with a linear computational cost. Journal of the American Statistical Association 2012; 107: 1590-8. [CrossRef]
- Kim J, Kim DH, Kwak SG. Comprehensive guidelines for appropriate statistical analysis methods in research. Korean Journal of Anesthesiology 2024; 77: 503-17. [CrossRef]
- Kutner MH, Nachtsheim CJ, Neter J, Li W. Applied Linear Statistical Models, 5th Edition. Boston (MA, USA): McGraw-Hill; 2005. 1396pp. ISBN: 9780073108742.
- Lang TA, Altman DG. Basic statistical reporting for articles published in biomedical journals: The "Statistical Analyses and Methods in the Published Literature" or the SAMPL Guidelines. International Journal of Nursing Studies 2015; 52: 5-9. [CrossRef]
- Le Carré J. Tinker, Tailor, Soldier, Spy. London (UK): Hodder and Stoughton; 1974. 355pp. ISBN: 0-340-18879-0.
- Little RJA, Rubin DB. Statistical Analysis with Missing Data, 3rd Edition. Hoboken (NJ): Wiley; 2019. 462pp. ISBN: 978-0-470-52679-8.
- Maxwell SE, Delaney HD. Designing Experiments and Analyzing Data: A Model Comparison Perspective, 2nd Edition. New York (NY): Routledge; 2003. 920pp. ISBN: 9781410609243.
- Montgomery C, Engelmann L. Epidemiological publics? On the domestication of modelling in the era of COVID-19. Somatosphere 2020; 10 Apr. Archived here on 02MAR24.
- McElreath R. Statistical Rethinking, 2nd Edition. Boca Raton (FL, USA): CRC Press; 2020. 562pp. ISBN: 9780367139919.
- MOD (Ministry of Defence). Intelligence, Counter-intelligence and Security Support to Joint Operations. Joint Doctrine Publication JDP 2-00, 4th Edition – Development, Concepts and Doctrine Centre, Ministry of Defence, UK; 2023: 181pp. Archived here on 20NOV23.
- MOD (Ministry of Defence). Cyber and Electromagnetic Activities. Joint Doctrine Note JDN 1/18, 1st Edition – Development, Concepts and Doctrine Centre, Ministry of Defence, UK; 2018: 54pp. Archived here on 27JUN24.
- Montgomery DC, Peck EA, Vining GG. Introduction to Linear Regression Analysis, 6th Edition. Hoboken (NJ, USA): Wiley; 2021. 656pp. ISBN: 9781119722106.
- Murphy KP. Machine Learning: A Probabilistic Perspective, 1st Edition. Cambridge (MA, USA): MIT Press; 2012. 1067pp. ISBN: 9780262018029.
- Nance N, Petersen ML, van der Laan M, Balzer LB. The causal roadmap and simulations to improve the rigor and reproducibility of real-data applications. Epidemiology 2024; 35: 791-800. [CrossRef]
- Nosek BA, Ebersole CR, DeHaven AC, Mellor DT. The preregistration revolution. Proceedings of the National Academy of Sciences of the USA. 2018; 115: 2600–6. [CrossRef]
- OxfordAQA (Oxford Assessments and Qualifications Alliance). GCSE Mathematics 8300 – Specification. OxfordAQA Website; 2026. Archived here on 07OCT24.
- Rubin DB. Multiple Imputation for Nonresponse in Surveys, 1st Edition. New York (NY, USA): Wiley; 1987. 258pp. ISBN: 9780471081934.
- RAF (Royal Air Force). Intelligence. Royal Air Force Recruitment Website. Archived here on 01DEC25.
- RN (Royal Navy). 2026a. Warfare Rating. Royal Navy Careers Website. Archived here on 25FEB26.
- RN (Royal Navy). 2026b. Defence Aptitude Assessment. Royal Navy Careers Website 2026; Archived here on 28MAY26.
- Shalizi CR. Advanced Data Analysis from an Elementary Point of View, 1st Edition. Cambridge (UK): Cambridge University Press; 2021. 736pp. ISBN: 9781107190204.
- Sterne JA, Hernán MA, Reeves BC, Savović J, Berkman ND, Viswanathan M, Henry D, Altman DG, Ansari MT, Boutron I, Carpenter JR. ROBINS-I: A tool for assessing risk of bias in non-randomised studies of interventions. British Medical Journal 2016; 355: i4919. [CrossRef]
- Steyerberg EW, Vickers AJ, Cook NR, Gerds T, Gonen M, Obuchowski N, Pencina MJ, Kattan MW. Assessing the performance of prediction models: A framework for traditional and novel measures. Epidemiology 2010; 21: 128-38. [CrossRef]
- Thomas Z, Kruppa M. This Israeli Army Unit Has Become an Incubator for Tech Startups. Tech News Briefing, Wall Street Journal: New York (NY) 2024; 02 Sep. Archived here on 03SEP24.
- Textor J, van der Zander B, Gilthorpe MS, Liskiewicz M, Ellison GTH. Robust causal inference using directed acyclic graphs: The R package 'dagitty'. International Journal of Epidemiology 2016; 45: 1887-1894. [CrossRef]
- Tibshirani R, Walther G, Hastie T. Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society B 2001; 63: 411–23. [CrossRef]
- VanderWeele TJ, Ding P. Sensitivity analysis in observational research: Introducing the E-value. Annals of Internal Medicine 2017; 167: 268–74. [CrossRef]
- Varma S, Simon R. Bias in error estimation when using cross-validation for model selection. BMC Bioinformatics 2006; 7: 91. [CrossRef]
- Vehtari A, Gelman A, Simpson D, Carpenter B, Bürkner P-C. Rank-normalization, folding, and localization: An improved R-hat for assessing convergence of MCMC. Bayesian Analysis 2021;16: 667–718. [CrossRef]
- Yu X, Zoh RS, Fluharty DA, Mestre LM, Valdez D, Tekwe CD, Vorland CJ, Jamshidi-Naeini Y, Chiou SH, Lartey ST, Allison DB. Misstatements, misperceptions, and mistakes in controlling for covariates in observational research. eLife 2024; 13: e82268. [CrossRef]
- Abraham Y. ‘Lavender’: The AI machine directing Israel’s bombing spree in Gaza. +972 Magazine 2024; 03 Apr. Archived here on 03APR24.
- Wasserman L. All of Statistics, 1st Edition. New York (NY, USA): Springer; 2004. 440pp. ISBN: 9780387402727.
- Wasserman L. All of Nonparametric Statistics, 1st Edition. New York (NY, USA): Springer; 2006. 268pp. ISBN: 9780387251455.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).