This paper argues that evaluating AI–IoT climate adaptation in water systems cannot rely solely on performance metrics; it requires legitimacy stress-testing grounded in contextual validity and incident-based assessment. Artificial intelligence (AI), machine learning, and Internet of Things (IoT) technologies are transforming water management by enhancing forecasting, monitoring, and decision-making for floods, droughts, and agricultural use, yet current evaluations remain largely model-centric, prioritising predictive accuracy over real-world viability. As a result, even technically robust systems can fail in practice, with failures taking the form of missed events, false-alarm fatigue, delayed escalation, exclusion of vulnerable groups, and weak accountability, especially under climate variability and institutional constraints. The paper introduces a Legitimacy Stress-Test, a structured protocol for evaluating AI–IoT water systems as socio-technical infrastructures. Anchored in the Contextual Research Validity Index (CRVI), the framework comprises eight dimensions: data reliability, sensor performance, institutional readiness, governance of decision rights, equity, contestability, redress, and auditability. It links weaknesses across these dimensions to specific incident pathways, enabling proactive identification of governance risks and mitigation priorities. An illustrative flood early-warning case shows how strong predictive performance can fail to deliver resilience when contextual and governance conditions are misaligned. The proposed stress-test complements, rather than replaces, hydrological validation by clarifying when and why model performance breaks down. It offers a practical evaluation tool for agencies, donors, and regulators scaling AI–IoT climate adaptation systems.