Precise semantic matching between natural language queries and unconstrained videos remains a fundamental, unresolved challenge in multimedia retrieval. Although recent transformer-based dual encoders and CLIP-style contrastive frameworks have improved global text–video alignment, they still struggle in complex scenes where (i) spatiotemporal cues from objects, motion patterns, and background context are highly entangled, and (ii) cross-modal interactions are easily biased by spurious correlations. As a result, retrieval performance is brittle under compositional or ambiguous language. To overcome these limitations, we propose a unified framework that enhances text–video correspondence through three closely coupled components: Query-adaptive Semantic Routing (QSR), Counterfactual Bi-directional Alignment (CBA), and Temporal Causal Regularization (TCR). QSR introduces a query-conditioned routing mechanism that decomposes video representations into multiple semantic experts and dynamically assigns token-level relevance, allowing the model to selectively emphasize appearance, motion, and contextual cues according to the textual query. Operating on the routed representations, CBA performs reciprocal attention in both the text-to-video and video-to-text directions and adds a counterfactual alignment branch that suppresses background-driven shortcuts, encouraging matching based on causal evidence rather than incidental correlations. Finally, TCR enforces temporal causality-aware consistency by penalizing alignment instability under lightweight temporal perturbations, improving motion sensitivity without requiring dense frame sampling. For scalable deployment, we further incorporate parameter sharing across experts and quantization-friendly projections, achieving a favorable accuracy–latency trade-off.
Experiments on MSR-VTT, MSVD, and VATEX demonstrate consistent improvements over strong baselines, achieving Recall@1 scores of 55.0%, 60.3%, and 68.5%, respectively, while maintaining high inference efficiency.
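
For readers unfamiliar with the metric, Recall@K can be sketched as follows. This is a minimal illustration and not the evaluation code used in our experiments: it assumes a text-to-video setting in which query i has exactly one ground-truth video, also indexed i, and the function name `recall_at_k` and the toy similarity matrix are hypothetical.

```python
def recall_at_k(sim, k=1):
    """Fraction of queries whose ground-truth video ranks in the top k.

    sim[i][j] is the similarity between query i and video j.
    Assumption (for illustration only): the correct video for query i
    is video i, i.e. ground truth lies on the diagonal.
    """
    hits = 0
    for i, row in enumerate(sim):
        # Rank = number of videos scoring strictly higher than the correct
        # one, so ties are resolved in favor of the correct video.
        rank = sum(1 for score in row if score > row[i])
        if rank < k:
            hits += 1
    return hits / len(sim)


# Toy 3x3 similarity matrix (made-up values, not experimental data).
toy_sim = [
    [0.9, 0.1, 0.3],  # query 0: correct video ranked first
    [0.2, 0.4, 0.8],  # query 1: video 2 outranks the correct video 1
    [0.1, 0.2, 0.7],  # query 2: correct video ranked first
]

r1 = recall_at_k(toy_sim, k=1)
r2 = recall_at_k(toy_sim, k=2)
```

In this toy case two of the three queries retrieve their video at rank one, so Recall@1 is 2/3, while all three succeed within the top two. Reported benchmark numbers additionally depend on the test split and on how multiple captions per video are handled, which this sketch does not model.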