Medical Visual Question Answering (MedVQA) aims to answer medical questions based on clinical images. However, current models often rely on spurious language shortcuts rather than visual evidence, which compromises clinical reliability. To address this, we propose a causal dual-interventional framework that mitigates language shortcuts in MedVQA. Our method comprises two components: a textual de-confounding module and a counterfactual visual verifier. The textual de-confounding module disrupts linguistic shortcut biases by applying concept-agnostic perturbations that block backdoor paths. It also aligns clinical terms with anatomical regions, compelling the model to establish genuine visual dependencies. The counterfactual visual verifier assesses visual reliance by masking key regions and measuring the drop in prediction confidence under occlusion, thereby reducing language-driven artifacts. Extensive experiments on two public datasets demonstrate that our method significantly outperforms existing baselines.
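
To make the occlusion test behind the counterfactual visual verifier concrete, the following is a minimal sketch, not the implementation used in this work. It assumes a model exposed as a callable that returns answer logits for an image and a question. The names `confidence_drop`, `toy_model`, `key_mask`, and `irrelevant_mask`, the zero fill value, and the toy 4x4 image are all hypothetical and chosen only for illustration.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()


def confidence_drop(predict_logits, image, question, region_mask, fill_value=0.0):
    """Return the predicted answer index and how much its probability
    falls when the pixels selected by `region_mask` are occluded.

    `predict_logits` is an assumed interface: (image, question) -> logits.
    """
    probs = softmax(predict_logits(image, question))
    answer = int(np.argmax(probs))

    # Occlude the selected region by overwriting it with a constant value.
    occluded = np.where(region_mask, fill_value, image)
    probs_occluded = softmax(predict_logits(occluded, question))

    return answer, float(probs[answer] - probs_occluded[answer])


def toy_model(image, question):
    """Hypothetical stand-in for a MedVQA model: the 'yes' logit mixes
    visual evidence (top-left patch intensity) with a language prior."""
    evidence = image[:2, :2].mean()
    prior = 1.0 if "opacity" in question else 0.0
    return np.array([4.0 * evidence + prior, 1.0])


# Toy image whose only informative pixels sit in the top-left patch.
image = np.zeros((4, 4))
image[:2, :2] = 1.0

key_mask = np.zeros((4, 4), dtype=bool)
key_mask[:2, :2] = True          # covers the informative patch

irrelevant_mask = np.zeros((4, 4), dtype=bool)
irrelevant_mask[2:, 2:] = True   # covers background only

question = "is there opacity?"
answer, drop_key = confidence_drop(toy_model, image, question, key_mask)
_, drop_irrelevant = confidence_drop(toy_model, image, question, irrelevant_mask)
```

In this toy setting, masking the informative patch lowers the confidence of the predicted answer, while masking background leaves it unchanged. A model driven purely by the language prior would show little or no drop for either mask, which is the kind of behavior such a verifier is meant to expose.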