With the rise of Industry 4.0, Augmented Reality (AR) has become pivotal for human-robot collaboration. However, most industrial AR systems still rely on predefined tracked images or fiducial markers, which limits their adaptability in unmodeled or dynamic environments. This paper proposes a novel Interactive Semantic-Augmented Reality (ISAR) framework that combines Edge AI with cloud-based Vision-Language Models (VLMs). To ensure real-time performance, we implement a Dual-Thread Asynchronous Architecture on the robotic edge, decoupling video streaming from AI inference. We introduce a Confidence-Based Triggering Mechanism, in which the cloud VLM is invoked only when edge detection confidence falls below a predefined threshold. Instead of traditional image cropping, we employ a Visual Prompting strategy, overlaying bounding boxes on full-frame images to preserve the spatial context required for accurate VLM semantic analysis. Finally, the generated insights are anchored to the physical world via Screen-to-World Raycasting, without fiducial markers. This framework realizes a semantic-aware 'Intelligent Agent' that enhances Human-in-the-Loop (HITL) decision-making in complex industrial settings.
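
A minimal sketch of the Dual-Thread Asynchronous Architecture described above, assuming an OpenCV camera source on the edge device; `run_edge_detector` is a hypothetical stand-in for the edge model. The streaming thread keeps overwriting a shared slot with the newest frame, so inference latency never stalls the video feed:

```python
import threading

import cv2  # assumption: OpenCV handles capture on the edge device

latest_frame = None
frame_lock = threading.Lock()
stop_event = threading.Event()

def run_edge_detector(frame):
    """Hypothetical stub for the on-device detector; returns a list of
    {"box": (x1, y1, x2, y2), "score": float} dicts."""
    return []

def streaming_thread(camera_index=0):
    """Capture and stream frames at full rate; never blocks on inference."""
    global latest_frame
    cap = cv2.VideoCapture(camera_index)
    while not stop_event.is_set():
        ok, frame = cap.read()
        if not ok:
            break
        with frame_lock:
            latest_frame = frame  # overwrite: inference always sees the newest frame
        # ... stream `frame` to the AR client here ...
    cap.release()

def inference_thread():
    """Run edge detection on the most recent frame, decoupled from streaming FPS."""
    while not stop_event.is_set():
        with frame_lock:
            frame = None if latest_frame is None else latest_frame.copy()
        if frame is not None:
            detections = run_edge_detector(frame)
            # ... hand `detections` to the triggering logic (next sketch) ...
        else:
            stop_event.wait(0.01)  # no frame yet; avoid a busy spin

threading.Thread(target=streaming_thread, daemon=True).start()
threading.Thread(target=inference_thread, daemon=True).start()
```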
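
The Confidence-Based Triggering Mechanism reduces to a small filter over the edge detections; the 0.6 value below is purely illustrative, since the abstract only states that a predefined threshold is used:

```python
CONF_THRESHOLD = 0.6  # illustrative value; the paper only specifies "a predefined threshold"

def needs_cloud_vlm(detections, threshold=CONF_THRESHOLD):
    """Return the detections whose edge confidence is too low to trust locally.
    An empty result means the edge answer is kept and no cloud round-trip occurs."""
    return [d for d in detections if d["score"] < threshold]

# Usage inside the inference loop:
#   uncertain = needs_cloud_vlm(detections)
#   if uncertain:
#       send the annotated full frame to the cloud VLM (see the visual-prompting sketch)
```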
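
A sketch of the Visual Prompting step, again assuming OpenCV: rather than cropping each uncertain region, the full frame is annotated with numbered boxes so the VLM sees the object together with its surroundings. Box color, labels, and the example prompt text are illustrative choices, not details from the paper:

```python
import cv2

def draw_visual_prompt(frame, uncertain_detections):
    """Overlay numbered boxes on the FULL frame so the VLM keeps spatial
    context, instead of cropping each region out of its surroundings."""
    annotated = frame.copy()
    for i, det in enumerate(uncertain_detections):
        x1, y1, x2, y2 = det["box"]
        cv2.rectangle(annotated, (x1, y1), (x2, y2), (0, 0, 255), 2)
        cv2.putText(annotated, f"[{i}]", (x1, max(y1 - 8, 12)),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 0, 255), 2)
    return annotated

# The annotated full frame is then encoded (e.g. as JPEG) and sent to the cloud
# VLM with a text prompt such as "Describe the object in box [0]".
```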
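
Screen-to-World Raycasting without fiducial markers can be illustrated as a pixel unprojection followed by a ray-plane intersection. This sketch assumes known camera intrinsics `K`, a camera-to-world pose, and a detected support plane; none of these specifics come from the abstract:

```python
import numpy as np

def screen_to_world(u, v, K, cam_pose, plane_point, plane_normal):
    """Cast a ray from screen pixel (u, v) into the scene and intersect it with
    a known plane (e.g. a detected work surface) to obtain a 3D anchor point.
    K: 3x3 intrinsics; cam_pose: 4x4 camera-to-world transform;
    plane_point, plane_normal: length-3 arrays defining the plane."""
    # Pixel -> ray direction in camera coordinates
    ray_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])
    # Rotate into world coordinates; the camera origin is the pose translation
    R, t = cam_pose[:3, :3], cam_pose[:3, 3]
    ray_world = R @ ray_cam
    ray_world /= np.linalg.norm(ray_world)
    # Ray-plane intersection: s = n . (p0 - o) / (n . d)
    denom = plane_normal @ ray_world
    if abs(denom) < 1e-6:
        return None  # ray parallel to the plane; no stable anchor
    s = plane_normal @ (plane_point - t) / denom
    return t + s * ray_world if s > 0 else None

# Example: anchor a VLM insight at the pixel the user tapped
#   anchor = screen_to_world(640, 360, K, cam_pose, plane_point, plane_normal)
```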