Agentic AI systems plan over time, call tools, write and retrieve memory, and coordinate across modules and services. Some of the most consequential failures in such systems are structural before they are behavioral: hidden coordination, trace-mediated lock-in, seam bottlenecks, and the silent erosion of meaningful override. This position paper argues that agentic AI systems should be evaluated and governed for \textbf{structural governability}, not only output alignment. By structural governability, we mean whether consequential coordination remains observable, attributable, interruptible, and steerable at the seams between components before irreversible commitments occur. Output-only evaluation does not capture this property. In place of any single master metric, we propose an evidence ladder for structural risk: architecture-time priors over system structure, runtime coupling signals on telemetry graphs, and deeper state-regime diagnostics for high-stakes cases. We then sketch a research agenda covering benchmarks that stress structure rather than terminal task success, reporting standards that disclose control geometry, and seam-level interventions including approval gates, permission freezes, trace decay, rollback, and subsystem isolation. The wider stake is cognitive integrity: once agentic systems mediate what users retrieve, remember, delegate, and act upon, alignment depends on preserving the conditions under which users and operators can still understand, contest, redirect, and refuse those processes.
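As a minimal illustration of what a seam-level intervention could look like in practice, the Python sketch below wraps cross-component tool calls in an approval gate with a permission-freeze list and a telemetry log. The \texttt{ToolCall} and \texttt{SeamGate} names, their fields, and the approval callback are hypothetical assumptions for exposition, not mechanisms specified in this paper.

\begin{verbatim}
# Hypothetical sketch of a seam-level approval gate; ToolCall, SeamGate,
# and the approve callback are illustrative names, not from the paper.
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class ToolCall:
    agent: str          # which component is acting (attributability)
    tool: str           # which capability is invoked
    args: dict          # call arguments
    irreversible: bool  # does this call commit external state?

@dataclass
class SeamGate:
    """Mediates every cross-component call so coordination stays
    observable, attributable, and interruptible at the seam."""
    approve: Callable[[ToolCall], bool]       # e.g., a human reviewer
    frozen: set = field(default_factory=set)  # permission-freeze list
    log: list = field(default_factory=list)   # telemetry for coupling analysis

    def execute(self, call: ToolCall, run: Callable[[ToolCall], Any]) -> Any:
        self.log.append(call)                 # observable before any effect
        if call.tool in self.frozen:
            raise PermissionError(f"{call.tool} is frozen at the seam")
        if call.irreversible and not self.approve(call):
            raise PermissionError(f"{call.tool} blocked pending approval")
        return run(call)

# Usage: gate = SeamGate(approve=lambda c: input(f"allow {c.tool}? ") == "y")
\end{verbatim}

The design point is that the gate sits at the seam rather than inside any component: every call is logged before it executes, so coordination remains observable and attributable, and irreversible commitments can be interrupted or refused before they occur.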