Weed infestation significantly threatens crop productivity and quality, highlighting the need for accurate and scalable monitoring approaches. Recent advances in unmanned aerial vehicle (UAV) remote sensing and deep learning provide promising tools for field-scale weed detection. This study evaluates and compares two state-of-the-art instance segmentation models, Mask R-CNN and YOLOv8, for species-level weed detection in wheat fields under Mongolian agro-ecological conditions. The experiment was conducted in a 4 ha wheat field in Tuv Province, Mongolia, using high-resolution RGB imagery acquired from UAV flights in July 2025. Three dominant weed species were annotated and analyzed. Model performance was evaluated using mAP@0.5:0.95, Precision, Recall, F1-score, and mask IoU. Both models showed moderate detection performance that varied among weed species: at an IoU threshold of 0.25, Precision ranged from 0.49 to 0.76, Recall from 0.20 to 0.77, and F1-score from 0.32 to 0.75; at an IoU threshold of 0.5, Precision ranged from 0.42 to 0.67, Recall from 0.18 to 0.75, and F1-score from 0.28 to 0.69. Mask R-CNN achieved higher Recall and more precise boundary delineation, improving weed coverage estimation, whereas YOLOv8 achieved higher Precision and faster inference (≈11 ms per image, ≈90 FPS), making it more suitable for large-area and near-real-time monitoring. These findings demonstrate the potential of UAV-based instance segmentation for weed detection in Mongolia and provide practical guidance for model selection in precision agriculture applications.
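Why the reported Precision, Recall and F1-score depend on the IoU threshold can be illustrated with a minimal sketch. It assumes greedy one-to-one matching of predicted masks (taken in confidence order) to annotated masks, with each mask stored as a set of pixel coordinates; the matching rule and the helper names (`mask_iou`, `prf_at_iou`, `fps_from_latency`, `square`) are hypothetical illustrations, not the study's evaluation code. The last helper shows the latency-to-throughput arithmetic: 1000 / 11 ms ≈ 91 images per second, consistent with the ≈90 FPS figure.

```python
from typing import List, Optional, Set, Tuple

Pixel = Tuple[int, int]
Mask = Set[Pixel]  # binary instance mask stored as the set of its foreground pixels


def mask_iou(a: Mask, b: Mask) -> float:
    """Intersection over union of two binary masks."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def prf_at_iou(preds: List[Mask], gts: List[Mask], thr: float) -> Tuple[float, float, float]:
    """Precision, Recall and F1 after greedy one-to-one matching at an IoU threshold.

    Assumption: preds are already sorted by confidence. Each prediction is paired
    with the unmatched ground-truth mask of highest IoU, and the pair counts as a
    true positive only if that IoU is at least thr.
    """
    matched: Set[int] = set()
    tp = 0
    for p in preds:
        best_iou, best_j = 0.0, None  # type: float, Optional[int]
        for j, g in enumerate(gts):
            if j in matched:
                continue
            iou = mask_iou(p, g)
            if iou > best_iou:
                best_iou, best_j = iou, j
        if best_j is not None and best_iou >= thr:
            matched.add(best_j)
            tp += 1
    fp = len(preds) - tp  # unmatched predictions
    fn = len(gts) - tp    # missed ground-truth weeds
    precision = tp / (tp + fp) if preds else 0.0
    recall = tp / (tp + fn) if gts else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def fps_from_latency(ms_per_image: float) -> float:
    """Throughput implied by a per-image latency in milliseconds."""
    return 1000.0 / ms_per_image


def square(r: int, c: int, size: int) -> Mask:
    """Hypothetical toy mask: a size x size square with top-left corner (r, c)."""
    return {(i, j) for i in range(r, r + size) for j in range(c, c + size)}


if __name__ == "__main__":
    gts = [square(0, 0, 10), square(20, 20, 10)]
    preds = [square(0, 5, 10), square(50, 50, 5)]  # first overlaps gts[0] with IoU = 1/3
    print(prf_at_iou(preds, gts, 0.25))  # counted as a match at the looser threshold
    print(prf_at_iou(preds, gts, 0.5))   # rejected at the stricter threshold
    print(round(fps_from_latency(11), 1))
```

In this toy case the same partially overlapping prediction is a true positive at IoU = 0.25 but a false positive at IoU = 0.5, which is why scores at the stricter threshold are generally lower.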