A set of “stylized facts” about state-backed influence operations now circulates across journalism, policy, and the peer-reviewed literature: that they are monolithic troll armies; that they win by weaponizing moral-emotional language; that they manufacture their own virality; that they learn and optimize against feedback; and that they have become indistinguishable from ordinary users. Most rest on single-campaign studies, uncontrolled comparisons, and large-sample significance reported without baselines or multiple-comparison control, and are rarely re-tested. We assemble complete, government-attributed archives of seven state campaigns (25,076,853 tweets from 9,071 accounts) with a matched organic-user baseline for five of them, and re-test all five claims under one protocol: pre-registration, Benjamini–Hochberg false-discovery control, permutation nulls, a future-reception placebo, and a takedown-snapshot decomposition. Scoped to these campaigns, every claim weakens or reverses. The operations are narratively segregated, thinly staffed production desks, not a unified army; an organic moral-contagion law fails to replicate in any of them, and a meaningless placebo predicts engagement as well; internal amplification supplies only 0.10%–5.31% of top-percentile reach, the rest captured from an external audience the operation does not control; behavior is scripted, with rare apparent feedback mean-reverting toward baseline; and while their per-account language has drifted off the 2016 “troll” fingerprint, they still coordinate 7–70× more tightly than matched real users—a regularity a frozen re-test reproduces, with the language-drift and segregation patterns, across twelve further country-groups. The corrected picture is coherent: an industrial content factory siloed in production but coordinated in execution, whose reach it does not internally manufacture (its external composition, organic versus undetected coordination, left unresolved). The detectable signature has migrated from language to coordination: per-account content fingerprints age out, while cross-account coordination remains the durable, cross-national marker. Because these operations run scripts rather than optimize against feedback, an adversary that genuinely optimized—now feasible with large language models—would look measurably different from the operations studied here: a forward warning, not a present finding. The recurring lesson is methodological: on corpora of confirmed manipulation, baseline-free significance reconstructs the analyst’s expectations, and platform and policy decisions rest on those beliefs.