MobiFlow: Real-World Mobile Agent Benchmarking through Trajectory Fusion

Yunfei Feng; Xi Zhao; Cheng Zhang; Dahu Feng; Daolin Cheng; Jianqi Yu; Yubin Xia; Erhu Feng

doi:10.20944/preprints202603.1313.v1

Submitted: 16 March 2026

Posted: 17 March 2026


Abstract

Mobile agents can autonomously complete user-assigned tasks through GUI interactions. Existing mainstream evaluation benchmarks, such as AndroidWorld, connect to a system-level Android emulator and derive evaluation signals from the state of system resources. In real-world mobile-agent scenarios, however, many third-party applications do not expose system-level APIs for determining whether a task has succeeded. This creates a mismatch between benchmarks and real-world usage and makes it difficult to evaluate model performance accurately. To address these issues, we propose MobiFlow, an evaluation framework built on tasks drawn from arbitrary third-party applications. Using an efficient graph-construction algorithm based on multi-trajectory fusion, MobiFlow compresses the state space, supports dynamic interaction, and aligns better with real-world third-party application scenarios. MobiFlow covers 20 widely used third-party applications and comprises 240 diverse real-world tasks, with enriched evaluation metrics. Compared with AndroidWorld, MobiFlow's evaluation results show higher alignment with human assessments and can guide the training of future GUI-based models under real workloads.
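The abstract does not spell out the graph-construction algorithm, so the following is only an illustrative sketch of the general idea of trajectory fusion, not MobiFlow's actual method. It assumes each recorded trajectory is a list of hashable state identifiers (hypothetical here, for example a hash of the screen's UI tree) and that identical identifiers can be merged into one graph node. The function names `fuse_trajectories` and `follows_graph` are invented for this example.

```python
from collections import defaultdict


def fuse_trajectories(trajectories):
    """Merge several recorded trajectories into one directed graph.

    Assumption: each trajectory is a list of hashable state identifiers.
    States with the same identifier collapse into a single node, which is
    how the fused graph ends up smaller than the raw trajectories.
    """
    graph = defaultdict(set)
    for traj in trajectories:
        # Add one edge per consecutive pair of states.
        for src, dst in zip(traj, traj[1:]):
            graph[src].add(dst)
        # Make sure the final state exists as a node even with no out-edges.
        if traj:
            graph.setdefault(traj[-1], set())
    return dict(graph)


def follows_graph(graph, agent_states, goal):
    """Return True if the agent ends at `goal` using only known edges."""
    if not agent_states or agent_states[-1] != goal:
        return False
    return all(
        dst in graph.get(src, ())
        for src, dst in zip(agent_states, agent_states[1:])
    )


# Two hypothetical reference trajectories for the same task.
demo = [
    ["home", "search", "results", "cart"],
    ["home", "category", "results", "cart"],
]
fused = fuse_trajectories(demo)
```

In this toy case the two trajectories contain eight state visits in total but fuse into five nodes, and an agent run through either `search` or `category` is accepted while a run that skips straight from `home` to `results` is not. A real evaluator would also need a rule for deciding when two screens count as the same state, which this sketch leaves out.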

Keywords: GUI Agent; VLM; evaluation

Subject: Computer Science and Mathematics - Artificial Intelligence and Machine Learning

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
