Artificial intelligence weather models achieve forecast skill comparable to numerical weather prediction at far lower computational cost, yet their reliability for high-impact extremes remains largely uncharacterized. We evaluate Aurora, a state-of-the-art deterministic AI model, using an event-based framework spanning tropical cyclones (TCs), freezes, heatwaves, atmospheric rivers, and extreme precipitation at lead times from 1 to 21 days. Aurora demonstrates strong short-range (1–7 day) skill: mean TC track errors of 20–60 km at 1–3 day leads, high spatial agreement for temperature extremes (intersection over union, IoU ≥ 0.78), and accurate reproduction of atmospheric river structure. Beyond 7–10 days, forecast amplitude collapses as surface fields regress toward climatology, consistent with theoretical Lorenz predictability limits, while large-scale circulation patterns remain moderately skillful (pattern correlations of 0.57–0.85 at 14–21 days for temperature extremes). This pattern–amplitude divergence, in which synoptic-scale structure persists but threshold-based extremes collapse, is the central finding. Event-specific failures include catastrophic TC recurvature errors, systematic intensity underestimation, and a pronounced degradation of precipitation skill from in-sample to out-of-sample events. Aurora provides reliable deterministic guidance within 7–10 days, positioning it as a computational anchor for hybrid probabilistic forecasting systems rather than a standalone operational replacement.
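The pattern–amplitude divergence can be illustrated with a minimal sketch of the two metrics quoted above. The abstract does not specify how IoU or pattern correlation are computed, so the definitions below are the generic textbook ones and are assumptions: IoU on boolean threshold-exceedance masks, and an unweighted centered correlation (a real evaluation on a latitude-longitude grid would normally apply area weights). The function names, the Gaussian anomaly field, the damping factor, and the 3 K threshold are all hypothetical and are not Aurora output.

```python
import numpy as np


def iou(forecast_mask, observed_mask):
    """Intersection over union of two boolean exceedance masks."""
    f = np.asarray(forecast_mask, dtype=bool)
    o = np.asarray(observed_mask, dtype=bool)
    union = np.logical_or(f, o).sum()
    if union == 0:
        return float("nan")  # no event in either field: IoU is undefined
    return float(np.logical_and(f, o).sum() / union)


def pattern_correlation(forecast, observed):
    """Centered, unweighted spatial correlation (assumption: no area weights)."""
    f = np.asarray(forecast, dtype=float).ravel()
    o = np.asarray(observed, dtype=float).ravel()
    f = f - f.mean()
    o = o - o.mean()
    denom = np.sqrt((f ** 2).sum() * (o ** 2).sum())
    return float((f * o).sum() / denom) if denom > 0 else float("nan")


# Synthetic "observed" temperature anomaly: a warm blob peaking at 5 K.
y, x = np.mgrid[-3:3:61j, -3:3:61j]
observed = 5.0 * np.exp(-(x ** 2 + y ** 2) / 2.0)

# Hypothetical long-lead forecast: same spatial pattern, amplitude damped
# toward climatology (zero anomaly) by an arbitrary factor of 0.4.
damped = 0.4 * observed

threshold = 3.0  # hypothetical extreme-event threshold in K
obs_mask = observed >= threshold
fc_mask = damped >= threshold

print("pattern correlation:", pattern_correlation(damped, observed))
print("IoU of threshold exceedance:", iou(fc_mask, obs_mask))
```

In this toy case the damped field keeps the observed spatial pattern exactly, so the pattern correlation stays near one, while the forecast never crosses the threshold and the IoU falls to zero. It shows only that a pattern score and a threshold-based score can diverge under amplitude damping; it says nothing about the magnitude of the effect in Aurora.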