Intervention-Based Time Series Causal Discovery via Simulator-Generated Interventional Distributions

Tsuyoshi Okita

doi:10.20944/preprints202605.0617.v1

Submitted: 08 May 2026

Posted: 11 May 2026


Abstract

In many scientific domains, physics-based simulators (programs that compute system behaviour from governing equations, such as density functional theory for materials or fluid dynamics solvers) encode causal mechanisms and can predict system behaviour under hypothetical interventions. Machine learning extracts patterns from observational time series at scale, but those patterns reflect statistical associations ($P(Y \mid X)$), not causal effects ($P(Y \mid \mathrm{do}(X))$): in the presence of latent confounders, the structural VAR is provably non-identifiable from observational data alone (Fact 3.3), and no amount of statistical sophistication can substitute for genuine interventional data. Attempts to bridge these two traditions have so far used simulators only for prediction; no existing framework uses them for causal structure discovery in time series. We propose SVAR-FM (Structural VAR with Flow Matching), a framework that treats a physical simulator as a mechanical realization of Pearl's $\mathrm{do}(\cdot)$ operator. Clamping a variable in the simulator physically severs confounding paths, producing interventional data by construction rather than by statistical argument. Conditional Flow Matching then parameterizes the interventional conditionals, enabling nonlinear mechanism learning. This yields four results. (1) The full structural VAR, with contemporaneous and lagged edges treated jointly, becomes identifiable under a coverage condition on the simulator-clampable variables, and this condition can be verified a priori from domain knowledge alone (Theorem 4.1). The argument is intrinsic to the time series setting and has no i.i.d. counterpart. (2) An end-to-end error bound $|\hat{e}_{i\to j} - e^{*}_{i\to j}| \le O(M^{-1/2}) + O(\delta_{\mathcal{S}}) + O(\varepsilon_{\mathrm{FM}})$ (Theorem 5.2) cleanly separates the contributions of Monte Carlo sampling with $M$ samples, simulator fidelity $\delta_{\mathcal{S}}$, and Flow Matching approximation $\varepsilon_{\mathrm{FM}}$.
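The core mechanism, clamping a variable in a simulator to obtain $P(Y \mid \mathrm{do}(X))$ instead of $P(Y \mid X)$, can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not SVAR-FM itself: it assumes a hypothetical static linear model with a single latent confounder $U$ (no lags, no Flow Matching), and the names `simulate`, `TRUE_EFFECT`, and `CONFOUNDING` are illustrative. Regressing $Y$ on observed $X$ gives a sign-reversed slope, while regressing $Y$ on clamped values of $X$ recovers the true positive effect.

```python
import random

# Hypothetical toy parameters (illustrative only, not taken from the paper).
TRUE_EFFECT = 0.5   # direct causal effect of X on Y
CONFOUNDING = 2.0   # effect of the latent confounder U on Y (U also drives X)


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def simulate(rng, x_clamp=None):
    """Draw one (x, y) pair from a toy simulator.

    If x_clamp is None, X is generated from the latent confounder U
    (observational regime). Otherwise X is set externally to x_clamp,
    mimicking do(X = x_clamp): the U -> X path is cut, U -> Y remains.
    """
    u = rng.gauss(0.0, 1.0)  # latent confounder, never observed
    if x_clamp is None:
        x = u + rng.gauss(0.0, 0.5)
    else:
        x = x_clamp
    y = TRUE_EFFECT * x - CONFOUNDING * u + rng.gauss(0.0, 0.1)
    return x, y


def observational_slope(n, seed=0):
    """Slope of Y on X when X is merely observed."""
    rng = random.Random(seed)
    pairs = [simulate(rng) for _ in range(n)]
    return ols_slope([x for x, _ in pairs], [y for _, y in pairs])


def interventional_slope(n, seed=1):
    """Slope of Y on X when the experimenter clamps X to chosen values."""
    rng = random.Random(seed)
    xs = [rng.uniform(-2.0, 2.0) for _ in range(n)]
    ys = [simulate(rng, x_clamp=x)[1] for x in xs]
    return ols_slope(xs, ys)


if __name__ == "__main__":
    print(f"observational slope:  {observational_slope(20_000):+.2f}")   # close to -1.1
    print(f"interventional slope: {interventional_slope(20_000):+.2f}")  # close to +0.5
```

Because $U$ raises $X$ but lowers $Y$ in this toy model, the observational slope is $0.5 - 2 \cdot 1/1.25 = -1.1$ in expectation; clamping $X$ removes its dependence on $U$, so only the direct effect of $0.5$ survives.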
A sharp consequence of the bound in Theorem 5.2 is a sign-flip regime (Corollary 5.5): when $\delta_{\mathcal{S}}$ exceeds a threshold set by the signal magnitude, the estimated causal effect reverses sign, a prediction that the prevailing forward-prediction view of simulators cannot make. (3) The CausalSim benchmark confirms that SVAR-FM recovers the correct causal sign across four scientific domains (macroeconomics, diabetes, cosmic ray physics, and battery degradation) in which observational methods produce sign-reversed estimates due to confounding. (4) A case study in ultrafast laser physics tests the sign-flip prediction by physically varying $\delta_{\mathcal{S}}$ through the accuracy level of a first-principles quantum solver: the low-accuracy setting produces a sign-reversed estimate, whereas the high-accuracy setting recovers the correct positive slope ($R^2 = 0.983$, zero bias relative to ground truth), providing the first experimental demonstration of a failure mode in causal discovery that is dominated by simulator fidelity.
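Why a fidelity threshold tied to the signal magnitude should exist can be motivated directly from the error bound of Theorem 5.2. The derivation below is a sketch, not the paper's proof of Corollary 5.5: it assumes the bound holds with positive constants $C_1, C_2, C_3$ (hypothetical names for the constants hidden in the $O(\cdot)$ terms), and the exact threshold in the corollary may differ.

```latex
\begin{aligned}
&|\hat{e}_{i\to j} - e^{*}_{i\to j}| \le \varepsilon_{\mathrm{tot}}
  := C_1 M^{-1/2} + C_2\,\delta_{\mathcal{S}} + C_3\,\varepsilon_{\mathrm{FM}}, \\
&\varepsilon_{\mathrm{tot}} < |e^{*}_{i\to j}|
  \;\Longrightarrow\; |\hat{e}_{i\to j}| \ge |e^{*}_{i\to j}| - \varepsilon_{\mathrm{tot}} > 0
  \;\text{ and }\; \operatorname{sign}(\hat{e}_{i\to j}) = \operatorname{sign}(e^{*}_{i\to j}), \\
&\varepsilon_{\mathrm{tot}} < |e^{*}_{i\to j}|
  \iff \delta_{\mathcal{S}} < \frac{|e^{*}_{i\to j}| - C_1 M^{-1/2} - C_3\,\varepsilon_{\mathrm{FM}}}{C_2}
  \le \frac{|e^{*}_{i\to j}|}{C_2}.
\end{aligned}
```

Sign recovery is therefore guaranteed only while $\delta_{\mathcal{S}}$ stays below a value proportional to $|e^{*}_{i\to j}|$. Increasing $M$ or improving the Flow Matching fit can raise this value at most to $|e^{*}_{i\to j}|/C_2$, so beyond that point no amount of sampling or model capacity compensates for simulator error, and the bound no longer rules out an estimate on the wrong side of zero.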

Keywords: machine learning; time-series causal inference; intervention; physics simulation

Subject: Computer Science and Mathematics - Artificial Intelligence and Machine Learning

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
