From Question Answering to Task Completion: A Survey on Agent System and Harness Design

Jianyuan Guo; Zhiwei Hao; Chengcheng Wang; Cheng Fan; Tingzhang Luo; Hongguang Li; Ying Gao; Hefei Mei; Jiankun Peng; Rongjian Xu; Minjing Dong; Han Wu; Mengyu Zheng; Kai Han; Shiqi Wang; Chang Xu; Yunhe Wang

doi:10.20944/preprints202606.1312.v1

Submitted: 15 June 2026

Posted: 17 June 2026

Abstract

LLM-based agents mark a shift from passive question answering to active task completion: they perceive environments, invoke tools, maintain state, and act over extended horizons. As agent systems have evolved from prompt engineering, through workflows and context engineering and then harness engineering, to agent-native training with co-evolution, a central question has become increasingly important: where does the bottleneck in agent performance reside? Is it in the foundation model, in the execution harness, or in the coupling between them? This survey examines LLM-based agents through a model–harness lens. We first clarify the functional definition of agents and the implementation view of an LLM-based agent as a foundation model coupled with an execution harness. We then analyze the limits of model-centric scaling, trace four paradigms of agent engineering, and decompose the execution harness into six coupled runtime responsibilities: observation, context, control, action, state, and verification/governance. Using this decomposition, we map task properties and domain pressures to harness configurations, review benchmark and evaluation practices, and synthesize model–harness evidence on how runtime design affects long-horizon task completion, efficiency, and reliability. Finally, we identify open challenges in value-aware evaluation, safety, harness generalization, and model–harness co-evolution. Rather than treating agents as models with auxiliary tools, this survey argues that agent quality (including success, efficiency, safety, and generalization) emerges from the interaction between model capability, runtime infrastructure, task structure, and evaluation design. A collection of papers discussed in this survey is available at https://github.com/ggjy/Awesome-Agent-Engineering.

Keywords: LLM-based agents; harness engineering; prompt engineering; model-harness co-evolution; evaluation benchmarks

Subject: Computer Science and Mathematics - Artificial Intelligence and Machine Learning

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
