Large Language Models (LLMs) serving as automatic evaluators (LLM-as-a-Judge) have become essential for assessing Retrieval-Augmented Generation (RAG) systems. In multilingual settings, however, these judges exhibit significant calibration drift across languages, producing scores that are neither comparable across languages nor aligned with human judgments. We present CalibJudge, a post-hoc calibration framework that addresses this challenge through three components: (1) language-specific temperature scaling, (2) uncertainty quantification, and (3) selective abstention. We evaluate CalibJudge on the MEMERAG benchmark, which covers five languages. Our experiments demonstrate that CalibJudge improves correlation with human annotations by up to 21.3% (relative) in Kendall's τ, reduces cross-lingual fairness gaps by 42%, and achieves 88% balanced accuracy at 70% coverage.
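To make the three components concrete, the following is a minimal sketch of how they could fit together: judge logits are divided by a per-language temperature, predictive entropy serves as the uncertainty estimate, and the judge abstains when entropy exceeds a threshold. The temperature values, language codes, and threshold below are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Hypothetical per-language temperatures (illustrative only; in practice
# these would be fit on held-out human-annotated data per language).
LANG_TEMPERATURE = {"en": 1.0, "de": 1.4, "es": 1.2, "fr": 1.3, "hi": 1.6}

def softmax(logits):
    z = np.exp(logits - np.max(logits))  # subtract max for numerical stability
    return z / z.sum()

def calibrated_judgment(logits, lang, entropy_threshold=0.9):
    """Scale judge logits by a language-specific temperature, then abstain
    when predictive entropy (the uncertainty estimate) exceeds a threshold."""
    t = LANG_TEMPERATURE.get(lang, 1.0)
    probs = softmax(np.asarray(logits, dtype=float) / t)
    entropy = -np.sum(probs * np.log(probs + 1e-12))
    if entropy > entropy_threshold:
        return None, probs  # abstain: defer to a human or a stronger judge
    return int(np.argmax(probs)), probs

# Example: three-way judge scores for a German sample.
label, probs = calibrated_judgment([2.0, 0.5, -1.0], lang="de")
```

Higher temperatures flatten the probability distribution, so languages where the judge is systematically overconfident receive more abstentions at a fixed threshold; trading coverage for accuracy in this way is what a coverage-accuracy figure such as "88% balanced accuracy at 70% coverage" summarizes.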