Dynamic Task-Chain Reconfiguration for Cooperative Counter-UAV Defense: A Multi-Agent Large Language Model Framework for Automated Heuristic Design

Yihao Zhong; Changsheng Yin; Ruopeng Yang; Yuantao Yang; Yiwei Lu; Yongqi Wen; Yongqi Shi; Bo Huang; Yu Tao; Jinyin Bai

doi:10.20944/preprints202606.1111.v1

Submitted: 12 June 2026

Posted: 17 June 2026


Abstract

The growing affordability, autonomy, and swarming of small unmanned aerial vehicles (UAVs) turn low-altitude defense from single-shot interception into a multi-node cooperative decision problem, in which the loss of sensing, coordination, or engagement nodes breaks the closed loops linking them. This study formulates their recovery as the dynamic reconfiguration of cooperative counter-UAV task chains. Given a pre-disturbance plan and a set of failed defending nodes, reconfiguration is modeled as a constrained bi-objective optimization balancing recovered engagement effectiveness against the change to the baseline plan, and is solved by Multi-Agent Heuristic Evolution (MAHE), an automated heuristic design framework whose evolution, coordinator, repair, and reflection agents—driven by a large language model—evolve scoring heuristics for a fixed reconfiguration solver. Across instances of varying scale and under light-to-heavy node loss, MAHE outperforms both a single-agent heuristic-design counterpart and a range of hand-crafted solvers: the latter lose most of their solution quality as the problem grows, whereas MAHE preserves it and sustains high recovery at a nearly constant reconfiguration cost; an ablation confirms that its agents contribute complementary gains. These results indicate that automatically generated, reconfiguration-specific heuristics provide a scalable route to dynamic, heterogeneous, and constraint-intensive counter-UAV task planning.

Keywords: counter-UAV defense; task-chain reconfiguration; dynamic task allocation; multi-objective optimization; automated heuristic design; large language model; multi-agent system

Subject: Computer Science and Mathematics - Artificial Intelligence and Machine Learning

1. Introduction

Small unmanned aerial vehicles (UAVs) are increasingly low-cost, mobile, and able to operate in swarms, and the resulting low-altitude threats can no longer be neutralized by detecting a single target and intercepting it once. Effective defense is instead a continuous process that links detection, identification, tracking, threat assessment, task assignment, and engagement. Surveys of counter-UAV and cooperative counter-unmanned aircraft system defense consistently conclude that no single sensor or engagement resource can close this loop alone, and that a realistic defense couples detection, tracking, classification, command and control, and engagement across several platforms [1,2]. Counter-UAV defense is therefore best understood as the goal-oriented organization of cooperative task chains—ordered combinations of sensing, coordination, and engagement actions that together carry a single target from detection to neutralization—rather than as an isolated choice of one technique for one stage.

In such a scenario UAVs appear in two distinct roles: an intruding UAV is a target to be detected, tracked, and engaged, whereas a defending UAV takes on a sensing, coordination, or engagement role according to its payload. A target is handled only when its task chain forms an executable closed loop that joins at least one sensing node to one engagement node, optionally through a coordination node. Studies of swarm resilience and dynamic task allocation show that limited communication, node failure, changing tasks, platform state, and real-time constraints all erode this cooperative capability [3,4]. For a counter-UAV task chain the most immediate risk is structural: when a defending node is lost, whether to hostile fire, jamming, or power depletion, the chains that pass through it are severed, so that targets remain in the area yet can no longer be engaged by any executable loop.

How to repair such damage has been studied mostly under the banner of dynamic task allocation, where mission reward, resource consumption, chain structure, and the mix of node types are shown to jointly determine effectiveness in a changing environment [5,6,7]. The prevailing response to a change of state, however, is to re-solve the allocation from scratch, and comparatively little attention is paid to how surviving chains should be inherited after a disturbance, how targets should be prioritized for recovery, and how collateral change to the still-intact chains should be contained. Yet in a contested low-altitude setting it is precisely these factors—inheritance, selective recovery, and stability—that decide whether executable task chains can be restored quickly and without destabilizing the parts of the plan that the disturbance left untouched.

This paper accordingly studies the dynamic reconfiguration of cooperative counter-UAV task chains. Given a plan that already exists before a disturbance and the set of defending nodes that fail during the current period, the system must produce a new plan that uses none of the failed nodes. Reconfiguration is deliberately not treated as a one-off re-solving of a static problem: the broken chains should recover as much engagement effectiveness as possible, while the links of the existing plan should be changed no more than necessary. Because sensing and engagement nodes have limited capacity, and a coordination node raises effectiveness only at the cost of added delay and capacity, the task becomes a multi-objective combinatorial optimization problem jointly constrained by capacity, reachability, chain integrity, and time windows—a formulation that makes the trade-off between recovery and disturbance explicit and quantifiable.

Searching this space efficiently usually relies on heuristic rules, but rules crafted by hand struggle to adapt at once to the many node-loss patterns, threat levels, and chain delay structures that a disturbance can produce. Recent work on automated heuristic design (AHD) driven by large language models (LLMs) shows that a language model can take part in constructing heuristics through program search, evolutionary generation, and reflective feedback, and can return executable, quantitatively evaluable algorithms rather than prose advice [8,9,10]. Two hypotheses follow and are tested in this paper: first, that heuristics generated automatically and specialized to reconfiguration actions preserve solution quality as the problem scales, where fixed hand-crafted rules do not; and second, that distributing the heuristic-design loop across cooperating, specialized LLM agents yields more reliable heuristics than a single-agent loop. We therefore apply AHD to task-chain reconfiguration, so that the object it generates shifts from a generic allocation score toward recovery heuristics oriented to chain keeping, node replacement, chain-type switching, and reconfiguration-cost control.

The main contributions of this paper are fourfold. First, we formulate the dynamic reconfiguration of counter-UAV task chains as a problem in its own right: a baseline plan is given, some of the defending nodes have failed, and a new plan must be rebuilt from the surviving links. Unlike full reallocation, this formulation makes the inheritance of the historical plan, the recovery of capability after a disturbance, and the stability of the unaffected chains first-class concerns. Second, we build a constrained bi-objective model that couples recovery with disturbance: the total weighted engagement effectiveness of the reconstructed plan, $F_{A}$, measures recovery, while the link-change distance to the baseline plan, $F_{D}$, measures the disturbance imposed on the existing plan, which makes the conflict between high recovery and low disturbance explicit. Third, we propose Multi-Agent Heuristic Evolution (MAHE), an automated heuristic design framework in which evolution, coordinator, repair, and reflection agents driven by an LLM evolve scoring heuristics confined to task-chain reconfiguration, producing logic for chain keeping, insertion, node replacement, chain-type switching, and chain release rather than a static node-target matching score. Fourth, we instantiate the problem in a low-altitude counter-UAV setting and evaluate MAHE extensively against hand-crafted multi-objective solvers and a single-agent heuristic-design baseline; the experiments show that MAHE maintains a consistently high solution quality as the numbers of nodes and targets grow, while the hand-crafted solvers degrade sharply, and an ablation confirms that its reflection, repair, and coordinator agents contribute complementary and increasingly decisive gains at larger scales.

The remainder of this paper is organized as follows. Section 2 reviews cooperative counter-UAV defense and task-chain modeling, dynamic multi-UAV task allocation, resilient reconfiguration under disturbance, and LLM-based automated heuristic design. Section 3 states the problem and the bi-objective model of cooperative task-chain dynamic reconfiguration. Section 4 details the MAHE framework. Section 5 reports the experimental design, the evaluation metrics, and the results. Section 6 concludes the paper, and Section 7 discusses future research directions.

2. Related Work

2.1. Task-Chain Modeling for Cooperative Counter-UAV Defense

The swarming of small UAVs has introduced a class of low-altitude security threats that no single platform can resolve in isolation [11], and counter-UAV defense is consequently described as a continuous process composed of detection, tracking, classification, and engagement. Ghazlane et al. [12] organize counter-UAV systems along this same process and conclude that no single sensing or engagement means can close the response loop alone; at the level of an individual stage, perception techniques such as YOLOv3-based counter-UAV tracking [13] provide only the input at the front of the loop. Castrillo et al. [2] examine cooperative defensive teams of drones and argue that cooperation across sensing, command and control, communication, and engagement compensates for the limited coverage, response speed, and engagement capacity of any single platform, while work on the protection of airports and other critical facilities integrates detection, cybersecurity risk, and risk management within a single framework [14]. Since different engagement means trade resource consumption against response time and mission effectiveness [15], the central design question is not which technique to select for a single stage but how to compose heterogeneous capabilities into a coherent whole. Counter-UAV defense is therefore best regarded as a multi-stage cooperative process that spans sensing, information transfer, decision coordination, and engagement, which is precisely the structure that a task-chain abstraction is intended to capture.

Task-chain modeling formalizes this cooperation by organizing distributed node capabilities into a goal-oriented closed-loop structure. Zhong et al. [6] formulate engagement-chain optimization for unmanned combat systems, characterizing system resilience through the observe-orient-decide-act loop and assessing link quality in terms of mission reward and post-interruption performance fluctuation. Liu et al. [7] construct engagement webs for heterogeneous UAV swarms, organizing nodes with diverse capabilities into several parallel engagement chains through a dynamic consensus-based coalition algorithm, and they emphasize that the way nodes are combined determines whether a mission remains closed in a contested and time-varying environment. Zheng and Yuan [16] jointly optimize sensor configuration and path planning, showing that the perception layout at the front of the chain governs the execution quality of the subsequent links. This line of work, however, treats a chain as stable once it is formed and gives little attention to how an executable closed loop can be re-established after defending nodes fail; taking an initial chain as given, this paper instead studies its reconfiguration under such failures.

2.2. Dynamic Multi-UAV Task Allocation

Building a task chain ultimately depends on matching resources to targets, a problem that counter-UAV studies typically formulate as resource-target allocation or dynamic weapon-target assignment. Representative methods address this matching through multi-agent reinforcement learning [17], improved grey wolf optimization [18], multi-objective particle swarm optimization [19], and Gaussian-mutation beetle swarm optimization [20], with each emphasizing a different balance between solution scale and dynamic adaptability. Although effective at the matching itself, these methods primarily optimize the binary relation of which resource engages which target, and they give limited attention to the cross-stage dependence among sensing, coordination, and engagement, as well as to the chain structure that this dependence induces.

At the broader level of multi-UAV task allocation, the central difficulty is the dynamic variation of tasks, platforms, and environmental states. The survey of Alqefari and Menai [4] identifies real-time requirements, task changes, platform state, communication constraints, and multi-objective optimization as the defining issues of the field. Xu et al. [5] address uncertain environments with a bi-objective dynamic cooperative assignment that uses a modified multi-objective evolutionary algorithm based on decomposition (MOEA/D) with heuristic initialization to jointly optimize mission reward and resource consumption; a centralized-versus-distributed comparison reveals the trade-off among global optimality, communication overhead, and response speed [21], and a hybrid scheme that combines clustering, optimization, and heuristic search improves dynamic adaptability [22]. To handle tasks that change during execution, hierarchical replanning revises the current plan while a formation is still executing it [23], and learning-based and search-based methods such as deep reinforcement learning, the wolf pack algorithm, Monte Carlo tree search, and generative allocation further extend the range of methods available for dynamic task allocation [24,25,26,27]. Although this line of work is methodologically rich, the optimization object remains the platform-task matching relation, and a change of state is typically handled by re-solving the allocation from scratch, without explicitly considering the inheritance of surviving links after a disturbance or the control of collateral changes to unaffected targets. It is precisely this requirement of inheritance and stability that distinguishes task-chain reconfiguration from general dynamic allocation.

2.3. Resilient Task-Chain Reconfiguration under Disturbance

The work most directly related to recovery after a disturbance is resilient task allocation, which concerns the ability of a system to sustain and restore task execution under node failure or capability degradation. Zeng et al. [28] combine task decomposition with resource allocation in a bilevel particle swarm optimization–integer linear programming (PSO-ILP) formulation that strengthens execution resilience for UAV swarms in complex and uncertain environments. Mayya et al. [29] introduce a capability model for heterogeneous multi-robot systems that characterizes how robots adapt to tasks under abnormal or degraded conditions. Neville et al. [30] interleave allocation, scheduling, and motion planning in D-ITAGS so that the plan can be adjusted online as task requirements and robot states evolve, while trait-based allocation and scheduling [31] and adaptive allocation under unknown and evolving capabilities [32] show that the capability model itself can change over time and must be kept feasible through continual updates.

Across these works, a common conclusion is that post-disturbance allocation should account for capability change, recovery cost, and the risk of interruption rather than be re-solved from scratch. Two aspects, however, remain under-addressed in counter-UAV settings: the closed-loop chain structure, in which a target requires both a sensing link and an engagement link while an optional coordination node introduces an effectiveness-delay trade-off, and the collateral change imposed on the still-intact links of unaffected targets, which recovery-only objectives rarely measure. Both aspects are addressed in this paper through a single reconfiguration model and a bi-objective formulation; because such reconfiguration is difficult to steer with hand-crafted rules across diverse node-loss patterns and chain types, the automated heuristic design described in the following subsection is adopted.

2.4. Large Language Model-Based Automated Heuristic Design

The goal of automated heuristic design is to reduce the reliance on manually designed rules in complex optimization tasks. Romera-Paredes et al. [10] propose FunSearch, which combines a large language model with a systematic evaluator to search for executable programs within a function space and discovers new heuristic structures for problems such as online bin packing. Liu et al. [8] propose Evolution of Heuristics, which represents a heuristic idea in natural language, translates it into code, and improves both the idea and the program through evolutionary search. Ye et al. [9] propose ReEvo, which introduces a reflection mechanism into heuristic evolution so that past search experience can influence later candidate generation. Research on heuristic evolution for multi-objective optimization further demonstrates that heuristics generated by a language model can handle multi-objective trade-offs [33].

Within more complex automated optimization frameworks, LLMOA uses a large language model as a high-level hyper-heuristic component to construct sequences of optimization operators [34], while autonomous multi-objective optimization uses a language model to design the algorithmic structure for solving multi-objective problems [35]. Chen et al. [36] propose AgentAD, a multi-agent language model framework that assigns generation, verification, and feedback roles to different agents for the automated design of algorithms in Earth observation satellite scheduling. LLaMEA further demonstrates that a language model can automatically generate black-box optimization metaheuristics through an evolutionary process [37]. In addition, studies on the importance of evolutionary search, on exploration guided by Monte Carlo tree search, and on neighborhood search driven by a language model improve the pipeline of automated heuristic design with respect to search pressure, space coverage, and the efficiency of neighborhood search [38,39,40]. These studies demonstrate that a language model can progress from generating answers to generating executable algorithmic structures. Existing automated heuristic design, however, primarily targets standard combinatorial or continuous optimization benchmarks and typically produces generic construction rules, local search operators, or metaheuristic pipelines. This paper differs by embedding automated heuristic design into cooperative counter-UAV task-chain reconfiguration, so that the generated object directly serves chain keeping, node replacement, chain-type switching, and reconfiguration-cost control after a disturbance.

3. Problem Definition

3.1. Task-Chain Representation and Baseline Plan

This section formalizes cooperative counter-UAV task-chain reconfiguration over a single decision period following a disturbance. The defense system comprises sensing, coordination, and engagement nodes, which are already organized into a baseline task-chain plan before the disturbance. When some of these nodes fail, the solver must rebuild the surviving plan subject to constraints on failed-node prohibition, reachability, capacity, chain integrity, time windows, and a minimum kill probability (Section 3.4). The reconstruction balances two competing objectives: maximizing the engagement effectiveness recovered by the new plan and minimizing its deviation from the baseline.

Let $S$, $C$, $I$, and $U$ denote the sets of sensing nodes, coordination nodes, engagement nodes, and intruding-UAV targets, respectively:

$$S = \{1, \dots, m\}, \quad C = \{1, \dots, h\}, \quad I = \{1, \dots, n\}, \quad U = \{1, \dots, l\}. \tag{1}$$

The spatial layout of the nodes, including position, coverage radius, and effective range, is summarized by the binary reachability parameters $a_{s,u}^{S}$, $a_{c,u}^{C}$, and $a_{i,u}^{I}$, where a value of one indicates that the corresponding link is physically reachable. On a reachable link, $q_{s,u}$ denotes the effective sensing probability of sensing node $s$ for target $u$ and $p_{i,u}$ the effective engagement probability of engagement node $i$ for target $u$; both are set to zero on unreachable links. The response delays of the sensing, coordination, and engagement stages are denoted $\tau_{s}^{S}$, $\tau_{c,u}^{C}$, and $\tau_{i,u}^{I}$, respectively.

A candidate reconstructed plan is described by three classes of binary assignment variables:

$$x_{s,u},\ z_{c,u},\ y_{i,u} \in \{0, 1\}, \qquad X = (x, z, y), \tag{2}$$

where $x_{s,u} = 1$, $z_{c,u} = 1$, and $y_{i,u} = 1$ indicate that sensing node $s$, coordination node $c$, and engagement node $i$, respectively, are assigned to target $u$. Target $u$ is regarded as activated once it receives at least one assignment from any node class:

$$r_{u}(X) = \mathbb{1}\Big[\sum_{s \in S} x_{s,u} + \sum_{c \in C} z_{c,u} + \sum_{i \in I} y_{i,u} > 0\Big]. \tag{3}$$
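For readers implementing the model, one convenient encoding (an assumption of this sketch, not something the paper prescribes) stores each binary matrix of Equation (2) as the set of its nonzero (node, target) pairs, so a plan becomes a tuple of three link sets. The helper names `to_links` and `activated` are hypothetical; node and target ids are 1-based to match Equation (1).

```python
def to_links(matrix):
    """Convert a dense 0/1 matrix (rows = nodes, columns = targets) to a set of
    1-based (node, target) pairs whose entry equals 1."""
    return {(r + 1, c + 1) for r, row in enumerate(matrix) for c, v in enumerate(row) if v}


def activated(plan, u):
    """r_u(X) of Eq. (3): 1 if target u has at least one sensing, coordination,
    or engagement link in the plan (x, z, y), else 0."""
    return int(any(target == u for links in plan for _, target in links))


# Illustrative instance: two sensing nodes, one coordination node,
# two engagement nodes, and two targets.
x = to_links([[1, 0], [0, 0]])   # sensing node 1 -> target 1
z = to_links([[1, 0]])           # coordination node 1 -> target 1
y = to_links([[0, 0], [1, 0]])   # engagement node 2 -> target 1
plan = (x, z, y)
print(sorted(x), [activated(plan, u) for u in (1, 2)])  # [(1, 1)] [1, 0]
```

The sparse form keeps later quantities such as the link-change distance cheap to compute, because only assigned links are stored.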

Intruding UAVs differ in threat value and in how much delay they tolerate, so the model distinguishes two executable task chains. The sensing-engagement chain relays target information from a sensing node directly to an engagement node; its end-to-end delay is short, which makes it suitable for targets with tight time windows. The sensing-coordination-engagement chain inserts a coordination node that performs situation fusion and fire scheduling, introducing a target-level success-rate gain at the price of additional delay, and is therefore reserved for targets whose time windows can absorb that delay. The two chains differ only in whether target u is connected to a coordination node, a choice that the solver makes according to each target’s time window rather than one fixed in advance by target class. In either case, an activated target must contain at least one sensing node and one engagement node so that its task chain forms a closed loop.

The success-rate gain introduced by the coordination stage is

$$\gamma_{u}(z) = 1 + (g - 1)\,\mathbb{1}\Big[\sum_{c \in C} z_{c,u} \geq 1\Big], \qquad g > 1, \tag{4}$$

where $g$ is the coordination gain coefficient, so that $\gamma_{u} = g$ when target $u$ connects to a coordination node and $\gamma_{u} = 1$ otherwise.

The engagement effectiveness of a target is synthesized from the per-link probabilities along its task chains. On a reachable link, the kill probability of a single sensing-engagement chain $(s, i)$ against target $u$ is

$$\theta_{s,i,u}(z) = \min\{q_{s,u}\, p_{i,u}\, \gamma_{u}(z),\ 1\}. \tag{5}$$

When several chains act on target $u$ simultaneously, and assuming the chains fail independently, the combined engagement success probability of target $u$ is one minus the product of the individual chain failure probabilities:

$$P_{u}(X) = 1 - \prod_{s \in S} \prod_{i \in I} \big(1 - \theta_{s,i,u}(z)\big)^{x_{s,u} y_{i,u}}. \tag{6}$$
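Equations (4) to (6) can be evaluated per target as follows. This is a minimal sketch assuming per-target dictionaries of link probabilities; the function name and argument layout are illustrative, and unreachable links are simply omitted rather than stored as zeros.

```python
def success_probability(sensors, engagers, q, p, coordinated, g):
    """P_u(X) of Eq. (6) for a single target u.

    sensors / engagers: ids of nodes with x_{s,u} = 1 / y_{i,u} = 1.
    q[s], p[i]: sensing and engagement probabilities of those nodes for u.
    coordinated: whether u is linked to a coordination node; g > 1 is the gain.
    Every (s, i) pair is one sensing-engagement chain, assumed to fail independently.
    """
    gamma = g if coordinated else 1.0                 # Eq. (4)
    miss = 1.0                                        # probability that every chain fails
    for s in sensors:
        for i in engagers:
            theta = min(q[s] * p[i] * gamma, 1.0)     # Eq. (5), capped at 1
            miss *= 1.0 - theta
    return 1.0 - miss                                 # Eq. (6); 0.0 if no closed loop


q, p = {1: 0.9}, {1: 0.8, 2: 0.5}                     # illustrative probabilities
print(round(success_probability({1}, {1, 2}, q, p, False, g=1.2), 3))  # 0.846
print(round(success_probability({1}, {1, 2}, q, p, True, g=1.2), 3))   # 0.937
```

With one sensing node and two engagement nodes there are two chains, so adding coordination raises both chain probabilities before they are combined.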

The end-to-end response time of target $u$ aggregates the delays of its assigned nodes:

$$T_{u}(X) = \sum_{s \in S} \tau_{s}^{S} x_{s,u} + \sum_{c \in C} \tau_{c,u}^{C} z_{c,u} + \sum_{i \in I} \tau_{i,u}^{I} y_{i,u}. \tag{7}$$
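The chain-type choice described earlier reduces to a window check on Equation (7): coordination is affordable only if the added delay still fits the target's time window. A minimal sketch for a single target, with hypothetical function names and illustrative delay values:

```python
def response_time(sensors, coords, engagers, tau_s, tau_c, tau_i):
    """T_u(X) of Eq. (7) for one target u: the sum of the delays of its assigned nodes.

    tau_s[s] is target-independent; tau_c[c] and tau_i[i] hold the delays for this u.
    """
    return (sum(tau_s[s] for s in sensors)
            + sum(tau_c[c] for c in coords)
            + sum(tau_i[i] for i in engagers))


def fits_with_coordination(sensors, engagers, c, tau_s, tau_c, tau_i, window):
    """True if switching to a sensing-coordination-engagement chain through
    coordination node c keeps the response time within the window W_u."""
    return response_time(sensors, {c}, engagers, tau_s, tau_c, tau_i) <= window


tau_s, tau_c, tau_i = {1: 2.0}, {1: 4.0}, {1: 3.0}    # illustrative delays for one target
print(response_time({1}, set(), {1}, tau_s, tau_c, tau_i))             # 5.0
print(fits_with_coordination({1}, {1}, 1, tau_s, tau_c, tau_i, 8.0))   # False
print(fits_with_coordination({1}, {1}, 1, tau_s, tau_c, tau_i, 10.0))  # True
```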

The baseline plan that precedes the disturbance is denoted

$$X^{0} = (x^{0}, z^{0}, y^{0}). \tag{8}$$

The baseline plan is an exogenous input to the reconfiguration problem and is not optimized jointly with the reconfiguration method. It is produced by a deterministic greedy task-chain constructor that respects all reachability, capacity, chain-integrity, and time-window constraints and assigns at least one sensing node and one engagement node to as many targets as possible. A target with a loose time window additionally connects to a coordination node to obtain the success-rate gain, whereas a target with a tight window retains a low-delay sensing-engagement chain. In the experiments that follow, sensing and engagement nodes have unit capacity, $\kappa_{s}^{S} = \kappa_{i}^{I} = 1$, and each coordination node has multi-target capacity $\kappa_{c}^{C} = \lceil l / h \rceil > 1$, so that the baseline plan can supply one executable chain per target wherever the resources allow.
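The greedy constructor is not specified in detail here, so the sketch below is only one plausible reading, with its assumptions labeled in the code: targets are served in descending value, each receives the free sensing-engagement pair with the largest product q·p whose delay fits its window, and a coordination node is added only when its delay still fits and its capacity ⌈l/h⌉ is not exhausted. The function name and data layout are hypothetical.

```python
import math


def greedy_baseline(targets, value, window, q, p, tau_s, tau_i, tau_c):
    """Hypothetical greedy constructor for a baseline plan X^0 (not the authors' code).

    q[s][u], p[i][u]: link probabilities; a missing or zero entry means unreachable.
    tau_s[s], tau_i[i][u], tau_c[c][u]: delays of Eq. (7); coordination node c is
    treated as reachable from target u iff u appears in tau_c[c] (an assumption).
    """
    coords = list(tau_c)
    cap_c = math.ceil(len(targets) / len(coords)) if coords else 0  # kappa^C = ceil(l/h)
    used_s, used_i = set(), set()                                   # unit capacity
    load_c = {c: 0 for c in coords}
    x, z, y = set(), set(), set()
    # Assumption: serve high-value targets first; ties broken by target id.
    for u in sorted(targets, key=lambda t: (-value[t], t)):
        best = None
        for s, q_s in q.items():
            if s in used_s or q_s.get(u, 0.0) <= 0.0:
                continue
            for i, p_i in p.items():
                if i in used_i or p_i.get(u, 0.0) <= 0.0:
                    continue
                delay = tau_s[s] + tau_i[i][u]
                if delay > window[u]:
                    continue  # even the sensing-engagement chain misses the window
                score = q_s[u] * p_i[u]
                if best is None or score > best[0]:
                    best = (score, s, i, delay)
        if best is None:
            continue  # no closed loop is possible: the target stays inactive
        _, s, i, delay = best
        used_s.add(s)
        used_i.add(i)
        x.add((s, u))
        y.add((i, u))
        # Loose window: add the fastest coordination node that has spare capacity.
        options = [c for c in coords
                   if load_c[c] < cap_c and u in tau_c[c] and delay + tau_c[c][u] <= window[u]]
        if options:
            c = min(options, key=lambda c: (tau_c[c][u], c))
            load_c[c] += 1
            z.add((c, u))
    return x, z, y


# Illustrative instance with two targets; target 2 has the higher value.
q = {1: {1: 0.9, 2: 0.8}, 2: {1: 0.7, 2: 0.6}}
p = {1: {1: 0.8, 2: 0.7}, 2: {1: 0.6, 2: 0.9}}
tau_s = {1: 1.0, 2: 1.0}
tau_i = {1: {1: 2.0, 2: 2.0}, 2: {1: 2.0, 2: 2.0}}
tau_c = {1: {1: 3.0, 2: 3.0}}
x, z, y = greedy_baseline([1, 2], {1: 0.5, 2: 0.9}, {1: 10.0, 2: 6.0},
                          q, p, tau_s, tau_i, tau_c)
```

Tightening target 2's window to 5.0 in this instance leaves it on a sensing-engagement chain, because the coordination delay no longer fits.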

3.2. Disturbance Event and Reconfiguration State

A disturbance event specifies the defending nodes that leave service during the current period. In a low-altitude counter-UAV setting, a defending UAV may drop out of the cooperative loop through destruction by hostile fire, loss of communication under jamming, or depletion of onboard power; each of these is treated uniformly as a node failure. The event is written as

$$\Delta = (D^{S}, D^{C}, D^{I}), \tag{9}$$

where $D^{S}$, $D^{C}$, and $D^{I}$ are the sets of failed sensing, coordination, and engagement nodes. Removing every baseline link that passes through a failed node leaves the still-usable surviving plan:

$$x_{s,u}^{\mathrm{surv}} = x_{s,u}^{0}\, \mathbb{1}[s \notin D^{S}], \tag{10}$$

$$z_{c,u}^{\mathrm{surv}} = z_{c,u}^{0}\, \mathbb{1}[c \notin D^{C}], \tag{11}$$

$$y_{i,u}^{\mathrm{surv}} = y_{i,u}^{0}\, \mathbb{1}[i \notin D^{I}]. \tag{12}$$

The targets whose baseline chain passes through a failed node form the affected set $U^{\mathrm{aff}}$. Their closed loops are broken and must be rebuilt without the failed nodes, whereas the surviving links of the remaining targets are kept whenever possible. Both the recovery objective $F_{A}$ and the reconfiguration-cost objective $F_{D}$ are evaluated over the entire plan, as defined in Section 3.3; the set $U^{\mathrm{aff}}$ serves only to delineate the scope of the disturbance and to steer the search toward the damaged links.
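Equations (10) to (12) and the affected set amount to filtering the baseline links by the failed-node sets. A minimal sketch, assuming plans are stored as (sensing, coordination, engagement) tuples of (node, target) link sets; the function names are hypothetical. In the example, engagement node 2 fails, so target 2 keeps its sensing and coordination links but loses its engagement link.

```python
def surviving_plan(baseline, failed):
    """Eqs. (10)-(12): drop every baseline link that passes through a failed node.

    baseline and failed are 3-tuples ordered (sensing, coordination, engagement);
    each baseline entry is a set of (node, target) links, each failed entry a set of ids.
    """
    return tuple({(n, u) for n, u in links if n not in dead}
                 for links, dead in zip(baseline, failed))


def affected_targets(baseline, failed):
    """U^aff: targets whose baseline chain uses at least one failed node."""
    return {u for links, dead in zip(baseline, failed) for n, u in links if n in dead}


x0 = {(1, 1), (2, 2)}
z0 = {(1, 1), (1, 2)}
y0 = {(1, 1), (2, 2)}
failed = (set(), set(), {2})                     # engagement node 2 is lost
x_surv, z_surv, y_surv = surviving_plan((x0, z0, y0), failed)
print(sorted(y_surv), affected_targets((x0, z0, y0), failed))  # [(1, 1)] {2}
```

Target 2 is still activated in the sense of Equation (3) but has no engagement node, so it violates the closed-loop requirement until it is rebuilt or released; the surviving plan is therefore only the starting point of the search.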

The reconstructed plan $X$ is obtained from the surviving plan $X^{\mathrm{surv}}$ through five link-level reconfiguration actions, which form the basic operations of the solver and are listed in Table 1. Affected targets are rebuilt mainly by insertion, replacement, switching, and release, while the remaining targets are kept; every entry of $X$ that departs from the baseline plan $X^{0}$ contributes to the reconfiguration cost $F_{D}$.

Table 2 summarizes the main symbols of the model.

3.3. Objective Functions

The reconfiguration problem optimizes two conflicting objectives; throughout, $v_{u}$ denotes the value (threat weight) of target $u$ and $P_{u}(X)$ the engagement success probability of Equation (6).

The first objective is the recovery effectiveness $F_{A}$, defined as the value-weighted engagement effectiveness of the reconstructed plan over all targets, which drives the solver to restore overall engagement capability after node loss:

$$F_{A}(X) = \sum_{u \in U} v_{u} P_{u}(X). \tag{13}$$

Because the links on failed nodes are forced to zero, the broken closed loops must be rebuilt from surviving nodes. Retaining only the surviving links gives a feasible lower bound on effectiveness, while the pre-disturbance value $F_{A}(X^{0})$ marks the upper bound attained by full recovery; maximizing $F_{A}$ pushes the solution toward that upper bound.
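The two reference values above follow directly from Equation (13). A minimal sketch with illustrative threat weights and probabilities (not data from the paper); the function name is hypothetical, and a target missing from `prob` is treated as having $P_{u} = 0$, which is what Equation (6) gives for a broken loop.

```python
def recovery_effectiveness(value, prob):
    """F_A(X) of Eq. (13): sum over targets of v_u * P_u(X); a missing P_u counts as 0."""
    return sum(v * prob.get(u, 0.0) for u, v in value.items())


value = {1: 1.0, 2: 2.0}                                     # illustrative threat weights v_u
f_baseline = recovery_effectiveness(value, {1: 0.8, 2: 0.9})  # F_A(X^0)
f_surviving = recovery_effectiveness(value, {1: 0.8})         # target 2's loop is broken
print(round(f_surviving, 3), round(f_baseline, 3))  # 0.8 2.6
```

Any reconstructed plan is then judged by where its $F_{A}$ falls between these two values.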

The second objective is the reconfiguration cost $F_{D}$, the link-change distance between the reconstructed plan and the pre-disturbance baseline plan $X^{0}$. It discourages secondary changes to the existing plan and equals the number of entries in which the three assignment matrices differ from the baseline:

$$F_{D}(X) = \sum_{s \in S} \sum_{u \in U} |x_{s,u} - x_{s,u}^{0}| + \sum_{c \in C} \sum_{u \in U} |z_{c,u} - z_{c,u}^{0}| + \sum_{i \in I} \sum_{u \in U} |y_{i,u} - y_{i,u}^{0}|. \tag{14}$$

Counted at unit cost, $F_{D}$ covers both the links forced off on failed nodes and the links newly built or replaced during recovery, and therefore measures the collateral change that reconfiguration imposes on the existing plan. The reconfiguration problem is finally cast as a constrained bi-objective minimization:

$$\min_{X} F(X) = \big(-F_{A}(X),\ F_{D}(X)\big). \tag{15}$$
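With a link-set encoding (an assumption of this sketch), Equation (14) becomes a symmetric-difference count, and comparing candidate plans under Equation (15) uses ordinary Pareto dominance. The example assumes engagement node 2 has failed and a hypothetical idle engagement node 3 can reach target 2; the $F_{A}$ values used for the dominance check are invented for illustration, and the function names are hypothetical.

```python
def reconfiguration_cost(plan, baseline):
    """F_D(X) of Eq. (14) for plans stored as (x, z, y) sets of (node, target) links.

    For 0/1 entries, summing |x - x0| over all entries equals the size of the
    symmetric difference between the two link sets.
    """
    return sum(len(cur ^ ref) for cur, ref in zip(plan, baseline))


def dominates(f1, f2):
    """True if objective vector f1 Pareto-dominates f2 under minimization (Eq. 15)."""
    return all(a <= b for a, b in zip(f1, f2)) and any(a < b for a, b in zip(f1, f2))


baseline = ({(1, 1), (2, 2)}, {(1, 1), (1, 2)}, {(1, 1), (2, 2)})
replace = ({(1, 1), (2, 2)}, {(1, 1), (1, 2)}, {(1, 1), (3, 2)})  # engagement 2 -> 3 for target 2
release = ({(1, 1)}, {(1, 1)}, {(1, 1)})                          # drop target 2 entirely
print(reconfiguration_cost(replace, baseline), reconfiguration_cost(release, baseline))  # 2 3
# Illustrative recovery values F_A = 2.5 (replace) and 0.8 (release):
print(dominates((-2.5, 2), (-0.8, 3)))  # True
```

Replacement costs two link changes (one forced removal and one new link), whereas releasing target 2 costs three, which shows how $F_{D}$ counts both forced and newly built links.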

3.4. Constraints

A feasible reconstructed plan must satisfy several classes of constraints. First, all assignments on failed nodes are forced to zero:

$$x_{s,u} = 0, \quad \forall s \in D^{S},\ \forall u \in U, \tag{16}$$

$$z_{c,u} = 0, \quad \forall c \in D^{C},\ \forall u \in U, \tag{17}$$

$$y_{i,u} = 0, \quad \forall i \in D^{I},\ \forall u \in U. \tag{18}$$

Second, every node-target assignment respects physical reachability:

$$x_{s,u} \leq a_{s,u}^{S}, \quad \forall s, u, \tag{19}$$

$$z_{c,u} \leq a_{c,u}^{C}, \quad \forall c, u, \tag{20}$$

$$y_{i,u} \leq a_{i,u}^{I}, \quad \forall i, u. \tag{21}$$

Third, the number of targets served by each node is bounded by its capacity:

$$\sum_{u \in U} x_{s,u} \leq \kappa_{s}^{S}, \quad \forall s \in S, \tag{22}$$

$$\sum_{u \in U} z_{c,u} \leq \kappa_{c}^{C}, \quad \forall c \in C, \tag{23}$$

$$\sum_{u \in U} y_{i,u} \leq \kappa_{i}^{I}, \quad \forall i \in I. \tag{24}$$

Fourth, each target connects to at most one coordination node, which is what separates the sensing-engagement chain from the sensing-coordination-engagement chain:

$$\sum_{c \in C} z_{c,u} \leq 1, \quad \forall u \in U. \tag{25}$$

Fifth, the reconfiguration follows a partial-recovery strategy: a target may remain inactive, but once it is activated it must include at least one sensing node and one engagement node so as to form an executable closed loop:

$$\sum_{s \in S} x_{s,u} \geq r_{u}(X), \quad \forall u \in U, \tag{26}$$

$$\sum_{i \in I} y_{i,u} \geq r_{u}(X), \quad \forall u \in U. \tag{27}$$

Sixth, the end-to-end response time of every activated target must lie within its time window $W_{u}$:

$$T_{u}(X) \leq W_{u}, \quad \forall u \in U \text{ such that } r_{u}(X) = 1. \tag{28}$$

Finally, an activated target must be engaged with sufficient certainty: its combined engagement success probability must reach the target’s kill threshold $P_{u}^{\min}$ (specified in Section 5):

$$P_{u}(X) \geq P_{u}^{\min}, \quad \forall u \in U \text{ such that } r_{u}(X) = 1. \tag{29}$$

Taken together, the objectives and constraints above define a constrained bi-objective nonlinear integer program. The nonlinearity originates from the probabilistic effectiveness expression in Equation (6), while the combinatorial difficulty arises from the binary assignment of three heterogeneous node classes under the coupled chain-integrity and time-window requirements.
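Because every constraint is local to a node or a target, feasibility can be checked mechanically. The sketch below is an illustrative feasibility test, not the paper's constraint handler; it assumes the link-set encoding, a hypothetical dictionary layout for reachability, capacities, and delays, and it returns at the first violated constraint.

```python
def is_feasible(plan, failed, reach, cap, window, p_min, tau, q, p, g):
    """Check Eqs. (16)-(29) for a plan stored as three sets of (node, target) links.

    plan, failed, reach, cap, tau are 3-tuples ordered (sensing, coordination, engagement):
      failed[k]: failed node ids; reach[k]: reachable (node, target) pairs;
      cap[k][node]: capacity; tau[0][s], tau[1][(c, u)], tau[2][(i, u)]: delays of Eq. (7).
    q[(s, u)], p[(i, u)]: link probabilities; g: coordination gain of Eq. (4).
    """
    x, z, y = plan
    for k, links in enumerate(plan):
        if any(node in failed[k] for node, _ in links):         # Eqs. (16)-(18)
            return False
        if not links <= reach[k]:                                # Eqs. (19)-(21)
            return False
        load = {}
        for node, _ in links:
            load[node] = load.get(node, 0) + 1
        if any(n > cap[k][node] for node, n in load.items()):    # Eqs. (22)-(24)
            return False
    activated = {u for links in plan for _, u in links}          # r_u(X) = 1, Eq. (3)
    for u in activated:
        sens = [s for s, v in x if v == u]
        coord = [c for c, v in z if v == u]
        eng = [i for i, v in y if v == u]
        if len(coord) > 1:                                       # Eq. (25)
            return False
        if not sens or not eng:                                  # Eqs. (26)-(27)
            return False
        t = (sum(tau[0][s] for s in sens) + sum(tau[1][(c, u)] for c in coord)
             + sum(tau[2][(i, u)] for i in eng))
        if t > window[u]:                                        # Eq. (28)
            return False
        gamma = g if coord else 1.0
        miss = 1.0
        for s in sens:
            for i in eng:
                miss *= 1.0 - min(q[(s, u)] * p[(i, u)] * gamma, 1.0)
        if 1.0 - miss < p_min[u]:                                # Eq. (29)
            return False
    return True


plan = ({(1, 1)}, {(1, 1)}, {(1, 1)})
no_fail = (set(), set(), set())
reach = ({(1, 1), (2, 1)}, {(1, 1)}, {(1, 1)})
cap = ({1: 1, 2: 1}, {1: 2}, {1: 1})
tau = ({1: 1.0, 2: 1.0}, {(1, 1): 3.0}, {(1, 1): 2.0})
q = {(1, 1): 0.9, (2, 1): 0.8}
p = {(1, 1): 0.8}
print(is_feasible(plan, no_fail, reach, cap, {1: 10.0}, {1: 0.7}, tau, q, p, g=1.2))  # True
```

An empty plan passes trivially, which reflects the partial-recovery strategy: leaving a target inactive is feasible, only a half-built chain is not.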

4. Method

4.1. Overview and Design Rationale

MAHE has a bilevel structure (Figure 1). The outer level is an LLM-driven evolutionary layer that maintains a population $P$ of $N$ heuristics; each individual is a heuristic package consisting of an executable scoring program $\psi$, a one-sentence natural-language design note, and meta-information (parent origin, generating operator, and measured fitness). The inner level is the multi-objective reconfiguration solver of Section 4.7, which, steered by the scores of $\psi$, searches the surviving plan and returns an approximate Pareto front in the $(-F_{A}, F_{D})$ space. The outer level never edits a reconfiguration plan directly; it evolves the scoring logic that guides the inner search. Because the inner solver, its constraint handling, and its evaluation rule are held fixed for every candidate, two heuristics are always compared in the same environment, so any difference in fitness is attributable to the heuristic alone.
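Fronts in the $(-F_{A}, F_{D})$ space are compared by hypervolume, which Algorithm 1 also uses as the operator reward. The reference point and the normalization behind the paper's "mean normalized hypervolume" are not given in this section, so this minimal sketch computes only the raw dominated area of a 2-D minimization front for a caller-chosen reference point; the function name and values are illustrative.

```python
def hypervolume_2d(front, ref):
    """Area dominated by a 2-D minimization front and bounded above by `ref`.

    front: iterable of (f1, f2) objective vectors, e.g. (-F_A, F_D).
    Points that do not strictly dominate `ref` contribute nothing.
    """
    pts = sorted({pt for pt in front if pt[0] < ref[0] and pt[1] < ref[1]})
    area, prev_f2 = 0.0, ref[1]
    for f1, f2 in pts:               # sweep in increasing f1
        if f2 < prev_f2:             # dominated points add no new area
            area += (ref[0] - f1) * (prev_f2 - f2)
            prev_f2 = f2
    return area


front = [(-2.5, 2.0), (-0.8, 1.0), (-0.5, 3.0)]   # the last point is dominated
print(round(hypervolume_2d(front, ref=(0.0, 4.0)), 3))  # 5.8
```

The sweep adds one rectangular slab per non-dominated point, so the cost is dominated by the sort.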

A single LLM loop that both proposes and judges heuristics tends to drift: it repeats similar score formulas, wastes inner evaluations on code that does not run, and cannot tell whether a stall comes from a poor operator choice or an exhausted idea. MAHE therefore decomposes the outer level into four agents with disjoint, mutually constraining contracts, summarized in Table 3. The evolution agent proposes a candidate by applying one variation operator to selected parents; the coordinator agent decides which operator to apply, learning online which operators are currently productive; the repair agent fixes candidates that fail the static validator, so that only executable, interface-conformant code reaches the inner solver, and accumulates the failures it observes into a reusable memory; and the reflection agent diagnoses why the search has stalled and injects a corrective hint into later generation. Together they form a generate–validate–evaluate–reflect closed loop in which generation is adaptive rather than driven by a fixed operator rotation.
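Algorithm 1 describes the coordinator as an operator bandit whose weights start at $1/|O|$ and are updated with the realized reward, but this section does not give the update rule. The sketch below substitutes a standard multiplicative-weights rule mixed with uniform exploration as one plausible choice; the class name, the operator names, and the parameters `eta` and `explore` are all hypothetical.

```python
import math
import random


class OperatorBandit:
    """Hypothetical coordinator: multiplicative-weights operator selection.

    Weights start uniform (omega_o = 1/|O|). After an operator is tried, its weight
    is multiplied by exp(eta * reward) and all weights are renormalized, so
    productive operators gain probability mass, while a zero reward leaves the
    chosen weight unchanged. Selection mixes the weights with a uniform share
    `explore` so that no operator is starved.
    """

    def __init__(self, operators, eta=5.0, explore=0.1, seed=0):
        self.w = {o: 1.0 / len(operators) for o in operators}
        self.eta, self.explore = eta, explore
        self.rng = random.Random(seed)

    def probabilities(self):
        k = len(self.w)
        return {o: (1 - self.explore) * w + self.explore / k for o, w in self.w.items()}

    def select(self):
        probs = self.probabilities()
        ops = list(probs)
        return self.rng.choices(ops, weights=[probs[o] for o in ops], k=1)[0]

    def update(self, op, reward):
        self.w[op] *= math.exp(self.eta * reward)
        total = sum(self.w.values())
        self.w = {o: w / total for o, w in self.w.items()}


bandit = OperatorBandit(["mutate", "crossover", "simplify"])  # hypothetical operator names
bandit.update("crossover", 0.2)                               # e.g. a hypervolume gain of 0.2
print({o: round(pr, 3) for o, pr in bandit.probabilities().items()})
# {'mutate': 0.224, 'crossover': 0.552, 'simplify': 0.224}
```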

Algorithm 1 states the outer loop. After the population is seeded, each iteration draws an operator from the coordinator, lets the evolution agent generate a candidate from sampled parents, and passes it to the repair agent, which either makes it executable or rejects it. A surviving candidate is scored by the inner solver on the training scenarios; its realized improvement over the incumbent best updates the coordinator's operator weights, and the population keeps the best $N$ packages. A stall counter tracks consecutive candidates that are rejected or fail to improve the incumbent, and when it reaches the patience $T_{p}$ the reflection agent fires and its diagnosis conditions the next generation. The loop runs until the evaluation budget $B$ is spent, after which the best heuristic $ψ^{★}$ is deployed on the test scenarios at the full inner-solver budget. In the experiments, $N = 10$ and $B = 100$.

Algorithm 1: MAHE outer loop (LLM-driven heuristic evolution).

Require: training scenarios $T$; seed heuristic $ψ_{0}$; population size $N$; budget $B$; operator set $O$; patience $T_{p}$
Ensure: best heuristic $ψ^{★}$
1: $P \leftarrow \mathrm{InitPopulation}(ψ_{0}, N)$, each scored on $T$ by the inner solver ▹ seed package + LLM-initialized packages
2: $ω_{o} \leftarrow 1/|O|\ \forall o \in O$; $M \leftarrow \emptyset$; $ξ \leftarrow \mathrm{none}$; $c_{stall} \leftarrow 0$; $b \leftarrow |P|$
3: while $b < B$ do
4: $o \leftarrow \mathrm{SelectOperator}(\{ω_{o}\})$ ▹ coordinator agent (operator bandit)
5: $ψ \leftarrow \mathrm{Generate}(o, \mathrm{Parents}(P), ξ, M)$ ▹ evolution agent
6: $(ψ, \mathit{ok}) \leftarrow \mathrm{Repair}(ψ, M)$ ▹ repair agent, triggered by the validator; may grow $M$
7: if $\neg \mathit{ok}$ then
8: $\mathrm{UpdateWeights}(o, 0)$; $c_{stall} \leftarrow c_{stall} + 1$; continue
9: end if
10: score $ψ$ on $T$ with the inner solver; $b \leftarrow b + 1$
11: $r \leftarrow$ gain of $ψ$ in mean normalized hypervolume over the incumbent best ▹ operator reward
12: $\mathrm{UpdateWeights}(o, r)$ ▹ coordinator learns
13: if $ψ$ improves the incumbent best then
14: record the before/after packages; $c_{stall} \leftarrow 0$; $ξ \leftarrow \mathrm{none}$
15: else
16: $c_{stall} \leftarrow c_{stall} + 1$; append $ψ$ to the stalled set
17: end if
18: $P \leftarrow \mathrm{KeepBest}(P \cup \{ψ\}, N)$ ▹ rank by fitness; drop duplicates
19: if $c_{stall} \geq T_{p}$ then
20: $ξ \leftarrow \mathrm{Reflect}(\mathit{before}, \mathit{after}, \mathit{stalled})$; $c_{stall} \leftarrow 0$ ▹ reflection agent
21: end if
22: end while
23: return the fittest $ψ^{★}$ in $P$
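The control flow of Algorithm 1 can be mirrored by a short Python skeleton, sketched here under stated assumptions. The agents and the inner solver are passed in as plain callables (`init`, `generate`, `repair`, `evaluate`, `reflect`) together with a `bandit` object exposing `select` and `update`; these names, the `Package` record, and the `max_attempts` guard are hypothetical. The guard is not part of Algorithm 1: it is added only so that an unbroken run of rejected candidates, which consume no budget, cannot loop forever. Fitness is the mean normalized hypervolume, so higher is better in this sketch.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Package:
    code: str                        # executable scoring program (source text)
    note: str                        # one-sentence natural-language design note
    fitness: float = float("-inf")   # mean normalized HV on the training scenarios


def keep_best(pop: List[Package], n: int) -> List[Package]:
    """KeepBest: rank by fitness (descending) and drop packages with duplicate code."""
    seen, out = set(), []
    for p in sorted(pop, key=lambda p: p.fitness, reverse=True):
        if p.code not in seen:
            seen.add(p.code)
            out.append(p)
    return out[:n]


def mahe_outer_loop(seed: Package, n: int, budget: int, patience: int, bandit,
                    init: Callable, generate: Callable, repair: Callable,
                    evaluate: Callable, reflect: Callable,
                    max_attempts: int = 10_000) -> Package:
    # All callables and the bandit interface are hypothetical stand-ins for the agents.
    pop = init(seed, n)
    for p in pop:                               # score the initial population
        p.fitness = evaluate(p)
    used = len(pop)                             # b <- |P|
    memory: List[str] = []                      # error memory M (grown by repair)
    hint: Optional[str] = None                  # reflection diagnosis xi
    stall, stalled = 0, []
    best = max(pop, key=lambda p: p.fitness)
    before = after = best
    attempts = 0
    while used < budget and attempts < max_attempts:  # guard is an added safeguard
        attempts += 1
        op = bandit.select()                    # coordinator agent
        cand = generate(op, pop, hint, memory)  # evolution agent
        cand, ok = repair(cand, memory)         # repair agent
        if not ok:
            bandit.update(op, 0.0)              # rejected: zero reward, no budget spent
            stall += 1
            continue
        cand.fitness = evaluate(cand)           # inner solver on the training scenarios
        used += 1
        bandit.update(op, cand.fitness - best.fitness)  # raw gain; clipping left to bandit
        if cand.fitness > best.fitness:
            before, after, best = best, cand, cand
            stall, hint = 0, None
        else:
            stall += 1
            stalled.append(cand)
        pop = keep_best(pop + [cand], n)
        if stall >= patience:
            hint = reflect(before, after, stalled[-patience:])
            stall = 0
    return max(pop, key=lambda p: p.fitness)
```

Keeping the agents behind callables makes each one individually replaceable, which is also how the ablations of Section 5.4 switch a single mechanism off.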

4.2. Heuristic Package and Reconfiguration Interface

Genotype. Each individual is a heuristic package: a one-sentence design note in natural language together with a Python program that implements a single scoring function $ψ$. The note is never executed; it carries the design intent into later generation prompts and reflections, whereas only the program couples to the inner solver. This separation lets the LLM reason about why a heuristic should work while the search rewards only what its code measurably achieves.

Phenotype interface. The program exposes a fixed signature, which is the sole coupling point between the outer evolution and the inner solver:

$$ψ(Ξ, Δ, X^{0}) \to \{S^{S} \in \mathbb{R}^{m \times l},\ S^{C} \in \mathbb{R}^{h \times l},\ S^{I} \in \mathbb{R}^{n \times l},\ w \in \mathbb{R}^{3}\}. \tag{30}$$

Here $Ξ$ gathers the instance parameters (node sets, reachability $a^{S}, a^{C}, a^{I}$, effective probabilities $q, p$, stage delays, capacities, target values, and time windows); the disturbance $Δ$ enters as the per-class survival masks $D^{S}, D^{C}, D^{I}$ that flag the failed nodes; and $X^{0}$ is the pre-disturbance baseline plan. The entries $S_{s,u}^{S}$, $S_{c,u}^{C}$, and $S_{i,u}^{I}$ are priority scores for connecting a sensing, coordination, or engagement node to target $u$ (higher means more preferred). The surviving plan $X^{surv}$ and the affected set $U^{aff}$ of Section 3.2 are not passed separately, since both are recoverable inside $ψ$ from $X^{0}$ and the survival masks. The auxiliary vector $w$ is a three-dimensional objective-bias term retained from the underlying solver interface; because the reconfiguration search is steered entirely by the three score matrices, $w$ is validated for shape but not consumed.

Genotype-to-phenotype mapping. Unlike the static node-target score of generic task allocation, this interface makes the disturbance first-class. The survival masks push the scores of failed nodes out of contention; the baseline plan $X^{0}$ lets the heuristic distinguish a surviving link from a newly built one, and thus trade recovery against disturbance; and the discrepancy between $X^{0}$ and the survival masks reveals which targets have lost their closed loop and should be rebuilt first. A competent heuristic therefore raises the scores of surviving, reachable nodes on broken chains and damps unnecessary changes elsewhere, balancing a high recovery effectiveness $F_{A}$ against a low reconfiguration distance $F_{D}$. The inner solver converts these scores into the five link-level actions of Table 1: keep, insert, replace, switch, and release. The search is seeded with one hand-written heuristic that scores each node by value-weighted effectiveness per unit cost and delay, adds a bonus for links already present in $X^{0}$, and masks out failed or unreachable nodes; every other initial package is written from scratch by the evolution agent. The signature in Equation (30) is frozen for the whole run: evolution rewrites only the body of $ψ$, never the problem, the constraints, or the evaluation function.
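The seed heuristic described above can be sketched with NumPy as follows. The key names inside `xi`, `delta`, and `x0`, the convention that a survival mask holds 1 for a surviving node, the bonus size `keep_bonus`, the finite mask value, and the use of unit effectiveness for coordination nodes are all hypothetical choices. The sketch only illustrates the three ingredients named in the text (value-weighted effectiveness per unit cost and delay, a bonus for links in $X^{0}$, and masking of failed or unreachable nodes) and the return shapes of Equation (30).

```python
import numpy as np

MASKED = -1e9  # hypothetical finite score that pushes a link out of contention


def seed_heuristic(xi: dict, delta: dict, x0: dict, keep_bonus: float = 0.5) -> dict:
    """Sketch of a seed psi(Xi, Delta, X^0); all dict keys are hypothetical."""
    v = np.asarray(xi["value"], dtype=float)              # target values, shape (l,)

    def score(eff, reach, cost, delay, alive, baseline):
        eff, reach, baseline = (np.asarray(a, dtype=float) for a in (eff, reach, baseline))
        unit = np.asarray(cost, dtype=float) * np.asarray(delay, dtype=float)  # per node
        s = eff * v[None, :] / unit[:, None]              # value-weighted effectiveness per cost and delay
        s = s / max(float(s.max()), 1e-12)                # scale into [0, 1]
        s = s + keep_bonus * baseline                     # bonus for links already in X^0
        ok = (reach > 0) & (np.asarray(alive)[:, None] > 0)  # surviving and reachable
        return np.where(ok, s, MASKED)

    S_S = score(xi["q"], xi["aS"], xi["cost_S"], xi["delay_S"], delta["DS"], x0["XS"])
    # Assumption: coordination nodes get unit effectiveness in this sketch.
    S_C = score(np.ones_like(np.asarray(xi["aC"], dtype=float)), xi["aC"],
                xi["cost_C"], xi["delay_C"], delta["DC"], x0["XC"])
    S_I = score(xi["p"], xi["aI"], xi["cost_I"], xi["delay_I"], delta["DI"], x0["XI"])
    return {"S_S": S_S, "S_C": S_C, "S_I": S_I, "w": np.full(3, 1.0 / 3.0)}
```

The masked value is kept finite on purpose, since the validator of Section 4.5 rejects non-finite scores.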

4.3. Evolution Agent

The evolution agent realizes variation. Given an operator $o$ chosen by the coordinator and one or two parents sampled from $P$, it assembles a prompt (the reconfiguration task description, the function specification of Equation (30), the parent package(s), the active reflection $ξ$, and the accumulated pitfalls $M$), queries the LLM, and parses the reply into a new package: a brace-delimited design note followed by a fenced code block. Candidates whose code duplicates an existing package (by hash) are dropped. The operator pool $O$ holds seven operators in three families, listed in Table 4: exploration ($E_{1}, E_{2}$) proposes structurally new heuristics for diversity; modification ($M_{1}, M_{2}, M_{3}$) refines a single parent; and ruin-and-recreate ($R_{1}, R_{2}$) deletes part of a parent and rebuilds it, escaping the parent's local structure without discarding it wholesale. A separate initialization operator $I_{1}$ writes a heuristic from scratch and is used only to seed $P$. The five exploration and modification operators $E_{1}, E_{2}, M_{1}, M_{2}, M_{3}$ adopt the operator style of Evolution of Heuristics (EoH) [8], whereas the two ruin-and-recreate operators $R_{1}, R_{2}$ absorb the idea of LLM-driven neighborhood search [40], which repeatedly ruins and rebuilds part of a heuristic to explore its structural neighborhood. The full prompt templates (the shared task description and function specification, the initialization operator $I_{1}$, and the seven variation operators $E_{1}, E_{2}, M_{1}, M_{2}, M_{3}, R_{1}, R_{2}$) are reproduced in Appendix A (Figures A1–A6).

Parents are sampled with probability proportional to $1/(r_{i} + 1 + N)$, where $r_{i}$ is the fitness rank of $ψ_{i}$ in $P$ (rank 0 is best), which favors stronger heuristics without starving the tail; $E_{1}$ and $E_{2}$ draw two parents, the remaining operators one.
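The rank-based parent selection can be sketched in a few lines of Python. The function names are hypothetical, and drawing the two parents of $E_{1}$ and $E_{2}$ without replacement is an assumption the text does not state.

```python
import random
from typing import List, Sequence


def parent_probabilities(n: int) -> List[float]:
    """Selection probability of rank r (0 = best) is proportional to 1 / (r + 1 + N)."""
    raw = [1.0 / (r + 1 + n) for r in range(n)]
    total = sum(raw)
    return [x / total for x in raw]


def sample_parents(ranked: Sequence, k: int, rng: random.Random) -> list:
    """Draw k parents from a population sorted best-first (distinct parents: assumption)."""
    probs = parent_probabilities(len(ranked))
    pool = list(range(len(ranked)))
    chosen = []
    for _ in range(k):
        i = rng.choices(pool, weights=[probs[j] for j in pool], k=1)[0]
        pool.remove(i)
        chosen.append(ranked[i])
    return chosen
```

With $N = 10$, the best heuristic is $20/11 \approx 1.8$ times as likely to be drawn as the worst, which is the sense in which the tail is not starved.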

4.4. Coordinator Agent: An Operator Bandit

The coordinator decides which operator to apply, treating operator choice as a multi-armed bandit that replaces the fixed operator schedule of single-agent heuristic design. Each operator $o \in O$ carries a weight $ω_{o}$, initialized uniformly to $1/|O|$. Operators are sampled from a distribution that mixes the normalized weights with a uniform exploration floor $ϵ_{0}$, in the style of Exp3:

$$π_{o} = ϵ_{0} + (1 - |O|\,ϵ_{0}) \frac{ω_{o}}{\sum_{o' \in O} ω_{o'}}. \tag{31}$$

Once the candidate produced by $o$ has been scored, the coordinator forms a reward from the realized gain of that candidate over the current incumbent best heuristic, clipped to a bounded range,

$$r = \mathrm{clip}\big(\overline{HV}(ψ) - \overline{HV}^{best}, -c_{r}, c_{r}\big), \tag{32}$$

and updates the operator weight multiplicatively, after which the weights are renormalized:

$$ω_{o} \leftarrow ω_{o} \exp(η r). \tag{33}$$

Here $\overline{HV}(ψ)$ is the mean normalized hypervolume ($HV$) of $ψ$ over the training scenarios, $η$ is the learning rate, $c_{r}$ the reward clip, and $ϵ_{0}$ the exploration floor; the experiments use $η = 2$, $c_{r} = 0.5$, and $ϵ_{0} = 0.02$. Because the reward is measured against the incumbent best, a scored candidate that does not improve it yields $r \leq 0$ and shrinks its operator's weight, whereas only improving candidates raise it. Operators that have recently yielded genuine gains are therefore sampled more often, while the floor keeps every operator reachable, so the schedule tracks the shifting usefulness of operators over a run with no phase labels or diversity gates; apart from the reward clip and the exploration floor, this deliberately transparent controller is governed by a single learning rate. Disabling this agent collapses operator choice to a deterministic round-robin over $O$, which is the w/o coordinator ablation of Section 5.4.
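Equations (31)–(33) can be sketched as a small Exp3-style class using the reported defaults $η = 2$, $c_{r} = 0.5$, and $ϵ_{0} = 0.02$. The class and method names are hypothetical. Like the text, the sketch applies the update to the observed (clipped) reward directly rather than to the importance-weighted reward estimate of classical Exp3.

```python
import math
import random
from typing import Dict, Sequence


class OperatorBandit:
    """Exp3-style operator selector following Equations (31)-(33); names are hypothetical."""

    def __init__(self, ops: Sequence[str], eta: float = 2.0, clip: float = 0.5,
                 floor: float = 0.02, seed: int = 0):
        assert len(ops) * floor < 1.0, "the exploration floor must leave mass for the weights"
        self.ops = list(ops)
        self.w: Dict[str, float] = {o: 1.0 / len(self.ops) for o in self.ops}
        self.eta, self.clip, self.floor = eta, clip, floor
        self.rng = random.Random(seed)

    def probabilities(self) -> Dict[str, float]:
        total = sum(self.w.values())
        k = len(self.ops)
        # Equation (31): uniform floor plus normalized weights
        return {o: self.floor + (1.0 - k * self.floor) * self.w[o] / total for o in self.ops}

    def select(self) -> str:
        p = self.probabilities()
        return self.rng.choices(self.ops, weights=[p[o] for o in self.ops], k=1)[0]

    def update(self, op: str, gain: float) -> None:
        r = max(-self.clip, min(self.clip, gain))   # Equation (32): clipped reward
        self.w[op] *= math.exp(self.eta * r)        # Equation (33): multiplicative update
        total = sum(self.w.values())
        for o in self.ops:                          # renormalize the weights
            self.w[o] /= total
```

With seven operators the floor reserves $7 \times 0.02 = 0.14$ of the probability mass, so no operator's selection probability ever falls below 2%.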

4.5. Repair Agent: Accumulating Error Memory

Before a candidate can consume an inner evaluation, the repair agent guarantees that it is runnable and interface-conformant. A static validator compiles the code, executes $ψ$ on a probe instance, and checks that the result is a dictionary whose four arrays have exactly the shapes $m \times l$, $h \times l$, $n \times l$, and $3$ and contain only finite values. If the validator reports an error, the agent enters a repair loop of at most $L_{r}$ attempts: at each attempt it asks the LLM for a minimal edit that corrects only the syntactic or interface fault while preserving the heuristic's scoring logic, then re-validates; a candidate still invalid after $L_{r}$ attempts is rejected and counts as a zero-reward outcome for its operator. Crucially, every distinct failure is distilled by the LLM into a one-sentence root-cause lesson and accumulated into a persistent error memory $M$. This memory is injected both into subsequent repair prompts and, as a known-pitfalls note (Appendix A), into the generation prompts of the evolution agent, so a recurring fault is progressively designed out at its source rather than only patched after the fact. Repair thus both rescues otherwise-discarded high-quality heuristics and steers future generation away from systematic interface mistakes. The experiments use $L_{r} = 3$; disabling the agent, so that candidates failing validation are discarded outright, gives the w/o repair ablation.
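The validator and the bounded repair loop can be sketched as follows. The output keys (`S_S`, `S_C`, `S_I`, `w`), the `run` callable that compiles and executes a candidate on the probe instance, and the two LLM callables `llm_fix` and `llm_lesson` are hypothetical stand-ins. The sketch fixes only the control flow: at most $L_{r}$ attempts, with each distinct lesson appended once to the memory $M$.

```python
from typing import Callable, List, Optional, Tuple

import numpy as np


def validate(out, m: int, h: int, n: int, l: int) -> Optional[str]:
    """Return None if `out` matches the interface of Equation (30), else an error message.
    The key names are hypothetical."""
    if not isinstance(out, dict):
        return "result is not a dict"
    for key, shape in {"S_S": (m, l), "S_C": (h, l), "S_I": (n, l), "w": (3,)}.items():
        if key not in out:
            return f"missing key {key}"
        try:
            arr = np.asarray(out[key], dtype=float)
        except (TypeError, ValueError):
            return f"{key} is not numeric"
        if arr.shape != shape:
            return f"{key} has shape {arr.shape}, expected {shape}"
        if not np.isfinite(arr).all():
            return f"{key} contains non-finite values"
    return None


def repair(code: str, run: Callable[[str], object], check: Callable[[object], Optional[str]],
           llm_fix: Callable[[str, str, List[str]], str], llm_lesson: Callable[[str], str],
           memory: List[str], max_attempts: int = 3) -> Tuple[str, bool]:
    """At most L_r = max_attempts minimal-edit attempts; distinct lessons grow the memory M."""
    def error_of(src: str) -> Optional[str]:
        try:
            return check(run(src))          # compile + execute on a probe, then validate
        except Exception as exc:            # syntax or runtime fault
            return f"{type(exc).__name__}: {exc}"

    err = error_of(code)
    for _ in range(max_attempts):
        if err is None:
            break
        lesson = llm_lesson(err)            # one-sentence root-cause lesson
        if lesson not in memory:
            memory.append(lesson)
        code = llm_fix(code, err, memory)   # minimal edit that keeps the scoring logic
        err = error_of(code)
    return code, err is None
```

A candidate that is valid on arrival costs no LLM call, so the repair overhead is paid only for faulty code.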

4.6. Reflection Agent: Patience-Triggered Diagnosis

The reflection agent supplies semantic, cross-generation guidance and, unlike a critic invoked at every step, fires only when the search is demonstrably stuck. A stall counter increments on each candidate that is rejected or fails to improve the incumbent best, and resets on any improvement and after each reflection; when it reaches the patience $T_{p}$, reflection is triggered. The agent is shown the two heuristics that straddle the last effective improvement (the package before and the package after, each with its mean normalized hypervolume) together with the recent non-improving packages, and is asked to diagnose, in free text, why progress halted (for instance, structural redundancy, neglected survival or baseline-plan signals, or poor objective balancing) and what structural change might break the stall. The resulting diagnosis $ξ$ is prepended to the next generation prompt, biasing the LLM toward the indicated change, and is cleared as soon as a new improvement appears. Because reflection is gated by patience rather than issued every step, its extra LLM cost is paid only when deep diagnosis is actually warranted, and its effect is observable as a redirection of the operators that follow. The experiments use $T_{p} = 5$; removing the agent yields the w/o reflection ablation. This division of labor draws on reflective heuristic evolution and multi-agent automated algorithm design [9,33,36,37], but here each agent's generated object is confined to the scoring and repair logic of task-chain reconfiguration.
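The patience gate can be sketched as a small stateful helper. The class name, the `reflect` callable, and the choice to pass only the last $T_{p}$ non-improving packages are assumptions. As described above, the counter resets on improvement and after each reflection, and the diagnosis persists until the next improvement clears it.

```python
from typing import Callable, List, Optional


class ReflectionTrigger:
    """Patience-gated reflection (hypothetical helper): diagnose only after T_p stalls."""

    def __init__(self, patience: int, reflect: Callable[[object, object, list], str]):
        self.patience, self.reflect = patience, reflect
        self.stall = 0
        self.best = float("-inf")
        self.before = self.after = None       # packages straddling the last improvement
        self.recent: List[object] = []        # non-improving packages since that improvement
        self.hint: Optional[str] = None       # current diagnosis xi
        self.calls = 0

    def observe(self, package, hv: float) -> Optional[str]:
        if hv > self.best:
            self.before, self.after, self.best = self.after, package, hv
            self.stall, self.hint, self.recent = 0, None, []   # improvement clears xi
        else:
            self.stall += 1
            self.recent.append(package)
            if self.stall >= self.patience:
                self.hint = self.reflect(self.before, self.after, self.recent[-self.patience:])
                self.calls += 1
                self.stall = 0                                  # reset after reflecting
        return self.hint
```

With $T_{p} = 5$, the reflection LLM is called at most once per five consecutive stalled candidates, which is where the cost saving relative to per-step critique comes from.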

4.7. Inner Reconfiguration Solver

The inner solver is a constrained search in the $(-F_{A}, F_{D})$ space in the style of the non-dominated sorting genetic algorithm II (NSGA-II) (Algorithm 2). It takes a disturbed scenario (the survival masks, the baseline plan $X^{0}$, and the instance parameters) together with the score matrices produced by $ψ$, and returns an approximate Pareto set. The same solver scores candidates during evolution and deploys the final heuristic; only its budget differs, being reduced while scoring candidates for speed and run at full size at deployment (Section 5.2.3).

Initialization. The population of $K$ individuals is built half by heuristic guidance and half at random. A score-guided greedy constructor processes targets in descending order of value plus best attainable score and, for each target, draws a surviving, reachable, capacity-available sensing and engagement node by a softmax over $S^{S}$ and $S^{I}$, then attaches a coordination node when $S^{C}$ is competitive and the time window permits; small Gaussian perturbations of the scores diversify these individuals. The remaining individuals are random feasible solutions biased toward $X^{0}$. Coverage is never forced: a target that cannot be feasibly closed is left inactive, so partial recovery is available from the first generation.
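The softmax draw of the greedy constructor can be sketched as follows. The sketch assumes (hypothetically) that eligibility, meaning surviving, reachable, and capacity-available, has already been computed as a boolean mask over nodes; the temperature and the Gaussian noise scale `sigma` are free parameters the text does not specify, and the function name is illustrative.

```python
from typing import Optional

import numpy as np


def softmax_pick(scores: np.ndarray, target: int, eligible: np.ndarray,
                 rng: np.random.Generator, temperature: float = 1.0,
                 sigma: float = 0.05) -> Optional[int]:
    """Draw one node for `target` by a softmax over that target's column of scores.
    `eligible` is a boolean mask over nodes; returns None when no node is eligible,
    so the target is simply left inactive (coverage is never forced)."""
    idx = np.flatnonzero(eligible)
    if idx.size == 0:
        return None
    z = scores[idx, target] + rng.normal(0.0, sigma, size=idx.size)  # Gaussian diversification
    z = z / temperature
    z = z - z.max()                      # shift for numerical stability
    prob = np.exp(z) / np.exp(z).sum()
    return int(rng.choice(idx, p=prob))
```

Returning `None` rather than forcing a node mirrors the partial-recovery behavior: an unclosable target simply stays inactive.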

Score-guided offspring. Each generation selects parents by binary tournament on non-domination rank, crowding distance, and constraint violation, applies uniform crossover to the three assignment matrices, and then perturbs the child with a score-guided bit flip: an inactive entry is switched on with a probability that rises with its normalized score, and an active entry is switched off with a probability that rises with one minus its score (perturbation strength $β \in [0.25, 0.65]$). The heuristic thereby shapes both the initial solutions and the direction of search. Should $ψ$ raise an error at runtime or return invalid scores, the solver falls back to plain uniform crossover, so an occasional faulty heuristic degrades quality gracefully rather than crashing the evaluation.
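The score-guided bit flip can be sketched as below. The text states only that the switch-on probability rises with the normalized score and the switch-off probability with one minus it; the linear forms $β\hat{s}$ and $β(1 - \hat{s})$, the min-max normalization, and the function name are assumptions of this sketch.

```python
import numpy as np


def score_guided_flip(x: np.ndarray, scores: np.ndarray, beta: float,
                      rng: np.random.Generator) -> np.ndarray:
    """x: 0/1 assignment matrix (nodes x targets); scores: matching score matrix."""
    lo, hi = float(scores.min()), float(scores.max())
    s = (scores - lo) / (hi - lo) if hi > lo else np.full(scores.shape, 0.5)  # normalized score
    p_on = beta * s              # inactive entry: switch-on probability rises with the score
    p_off = beta * (1.0 - s)     # active entry: switch-off probability rises with 1 - score
    u = rng.random(x.shape)
    flip = np.where(x == 1, u < p_off, u < p_on)
    return np.where(flip, 1 - x, x)
```

Under this linear form every flip probability is capped at $β$, so the reported range $[0.25, 0.65]$ bounds how far one perturbation can move a child.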

Feasibility repair. Every child is then projected onto the feasible region of Section 3.4: links on failed or unreachable nodes are cleared; per-node assignments exceeding capacity are trimmed by score; each target is reduced to at most one coordination node; targets left with only a sensing or only an engagement link are deactivated; and links that push the response time past the window are removed, coordination first. At the link level these projections are the five reconfiguration actions of Table 1—keeping a surviving link, inserting a new one, replacing a failed or weak node, switching the chain type, and releasing an unrecoverable target—so the abstract action set and its executable realization coincide.
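Most of the feasibility projection can be sketched with NumPy as below. The dictionary layout (keys `'S'`, `'C'`, `'I'` for the sensing, coordination, and engagement matrices) and the function name are hypothetical, and the final time-window step (removing links that push the response time past the window, coordination first) is omitted because it depends on the response-time model of Section 3.4.

```python
import numpy as np


def project_feasible(X, A, alive, cap, S):
    """X, A, S: dicts of node-by-target arrays keyed 'S', 'C', 'I' (hypothetical layout);
    alive, cap: dicts of per-node arrays. Time-window checks are omitted in this sketch."""
    X = {k: np.asarray(X[k], dtype=int) * (np.asarray(A[k]) > 0)
            * (np.asarray(alive[k])[:, None] > 0)
         for k in ("S", "C", "I")}                          # clear failed or unreachable links
    for k in ("S", "C", "I"):                                # trim per-node capacity by score
        for i in range(X[k].shape[0]):
            on = np.flatnonzero(X[k][i])
            excess = on.size - int(cap[k][i])
            if excess > 0:
                X[k][i, on[np.argsort(S[k][i, on])[:excess]]] = 0
    for u in range(X["C"].shape[1]):                         # at most one coordination node
        on = np.flatnonzero(X["C"][:, u])
        if on.size > 1:
            keep = on[np.argmax(S["C"][on, u])]
            X["C"][:, u] = 0
            X["C"][keep, u] = 1
    inactive = (X["S"].sum(axis=0) == 0) | (X["I"].sum(axis=0) == 0)  # no closed loop left
    for k in ("S", "C", "I"):
        X[k][:, inactive] = 0                                # release the half-built target
    return X
```

Trimming by score means the heuristic also decides which links survive a capacity conflict, consistent with the solver's use of $ψ$ only for guidance.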

Selection and archive. Survivors are chosen by fast non-dominated sorting with crowding-distance truncation, and an external archive accumulates the running non-dominated set, truncated to $A_{max}$ by crowding distance on overflow. When the search ends, the first front of the archive is the heuristic's Pareto set on that scenario. The evaluation rule, constraint handling, and archive policy are identical for every $ψ$; the heuristic influences only construction and offspring perturbation. This is precisely what makes the outer comparison fair: a difference in the returned front reflects the scoring logic, not the solver.

Algorithm 2: Heuristic-guided inner reconfiguration solver $\mathrm{InnerSolver}(scn, ψ)$ on one scenario.

Require: scenario $scn = (Δ, X^{0}, Ξ)$ with survival masks $Δ$; heuristic $ψ$; population $K$; generations $G$; archive size $A_{max}$
Ensure: approximate Pareto set in the $(-F_{A}, F_{D})$ space
1: $(S^{S}, S^{C}, S^{I}, w) \leftarrow ψ(Ξ, Δ, X^{0})$
2: build $\lfloor K/2 \rfloor$ individuals by score-guided greedy construction and the rest at random; repair all; evaluate $(-F_{A}, F_{D})$ and constraint violation
3: $A \leftarrow$ non-dominated individuals of the population
4: for $j = 1$ to $G$ do
5: $Π \leftarrow$ binary-tournament parents on (rank, crowding distance, violation)
6: for each parent pair in $Π$ do
7: child ← uniform crossover, then score-guided bit flip with $S^{S}, S^{C}, S^{I}$ ▹ fallback: plain crossover if $ψ$ is invalid
8: child ← feasibility repair ▹ keep / insert / replace / switch / release, Table 1
9: end for
10: population ← non-dominated sort with crowding truncation of $(\mathit{population} \cup \mathit{offspring})$ to $K$
11: $A \leftarrow$ crowding truncation of $\mathrm{nondominated}(A \cup \mathit{offspring})$ to $A_{max}$
12: end for
13: return first non-dominated front of $A$
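The archive update of line 11 in Algorithm 2 can be sketched in plain Python for the two minimized objectives $(-F_{A}, F_{D})$. The function names are hypothetical, and removing the least-crowded point one at a time (recomputing distances after each removal) is one common truncation variant rather than necessarily the one used in the paper.

```python
from typing import List, Tuple

Point = Tuple[float, float]   # (-F_A, F_D); both components are minimized


def dominates(a: Point, b: Point) -> bool:
    return a[0] <= b[0] and a[1] <= b[1] and a != b


def nondominated(points: List[Point]) -> List[Point]:
    uniq = list(dict.fromkeys(points))                      # drop exact duplicates, keep order
    return [p for p in uniq if not any(dominates(q, p) for q in uniq)]


def crowding(front: List[Point]) -> List[float]:
    n = len(front)
    d = [0.0] * n
    for k in range(2):
        order = sorted(range(n), key=lambda i: front[i][k])
        d[order[0]] = d[order[-1]] = float("inf")           # boundary points are always kept
        span = front[order[-1]][k] - front[order[0]][k]
        if span == 0:
            continue
        for j in range(1, n - 1):
            d[order[j]] += (front[order[j + 1]][k] - front[order[j - 1]][k]) / span
    return d


def update_archive(archive: List[Point], offspring: List[Point], a_max: int) -> List[Point]:
    front = nondominated(list(archive) + list(offspring))
    while len(front) > a_max:                               # drop the most crowded point
        d = crowding(front)
        front.pop(d.index(min(d)))
    return front
```

Because boundary points receive infinite crowding distance, truncation never discards the extreme recovery or extreme low-disturbance solutions.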

4.8. Outer-Level Fitness and Heuristic Selection

A candidate is scored by its average solution-set quality over the training scenarios $T$, namely the four training instances under three disturbance ratios, so $|T| = 12$. On each scenario $t$, the inner front is mapped into the unit square against a fixed reference point precomputed from the baseline solvers (Section 5), and its normalized hypervolume $\widehat{HV}_{t}(ψ) \in [0, 1]$ is measured; the per-heuristic quality is the mean

$$\overline{HV}(ψ) = \frac{1}{|T|} \sum_{t \in T} \widehat{HV}_{t}(ψ). \tag{34}$$

Let $ρ_{feas}(ψ)$ denote the fraction of training scenarios on which $ψ$ returns a non-empty front. The outer objective, which the population minimizes, couples quality with a hard feasibility gate:

$$J(ψ) = -\overline{HV}(ψ) + λ\, \mathbb{1}[ρ_{feas}(ψ) < ρ^{★}]\,(1 - ρ_{feas}(ψ)), \tag{35}$$

where $ρ^{★}$ is the minimum acceptable feasibility and $λ$ a large penalty; the experiments use $ρ^{★} = 0.5$ and $λ = 10^{6}$. The indicator makes the penalty dominate $\overline{HV}$ whenever feasibility falls below $ρ^{★}$, so a heuristic that solves too few scenarios can never displace a feasible one however large its hypervolume on the rest, while at or above the gate the ranking is governed purely by solution-set quality. Scoring by hypervolume, rather than by a weighted sum of $F_{A}$ and $F_{D}$, keeps the recovery–disturbance trade-off explicit and prevents a single composite number from concealing it; the same $\overline{HV}(ψ)$ supplies the coordinator's reward in Equation (32). When the budget $B$ is spent, the incumbent best $ψ^{★} = \arg\min_{ψ} J(ψ)$ is deployed: it drives the inner solver, run at full population and generations, to report $HV$, $F_{A}$, and $F_{D}$ on each test scenario.
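Equations (34) and (35) translate directly into a few lines of Python. The function name is hypothetical, and the sketch assumes that a scenario with an empty front contributes $\widehat{HV}_{t} = 0$ to the mean.

```python
from typing import Sequence


def outer_objective(hv_per_scenario: Sequence[float], nonempty: Sequence[bool],
                    rho_star: float = 0.5, lam: float = 1e6) -> float:
    """J(psi) of Equation (35); the population minimizes it.
    Assumption: a scenario with an empty front enters hv_per_scenario as 0."""
    hv_bar = sum(hv_per_scenario) / len(hv_per_scenario)      # Equation (34)
    rho = sum(bool(f) for f in nonempty) / len(nonempty)       # rho_feas
    gate = 1.0 if rho < rho_star else 0.0                      # indicator 1[rho < rho*]
    return -hv_bar + lam * gate * (1.0 - rho)
```

With $ρ^{★} = 0.5$ and $λ = 10^{6}$, any heuristic below the gate has $1 - ρ_{feas} > 0.5$ and hence $J > 5 \times 10^{5}$, while every heuristic at or above the gate has $J \in [-1, 0]$, so the two groups can never swap order.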

5. Experiments

5.1. Experimental Scenarios

The experiments address cooperative counter-UAV task-chain reconfiguration in a low-altitude area-protection setting. Each instance contains sensing, coordination, and engagement nodes together with a set of intruding-UAV targets, laid out so that coordination nodes occupy the rear, engagement nodes the middle, sensing nodes the front, and targets the side facing the intrusion. Sensing and engagement nodes have unit capacity, whereas a coordination node may serve several targets at the cost of an added delay. To assess generalization across scales, four training instances (T1–T4) were used for heuristic evolution and ten disjoint test instances (I1–I10), ranging from 8 to 320 targets, for the final comparison. The instance scales are listed in Table 5 and the attribute-generation rules in Table 6.

All attributes were drawn from the scale-independent distributions in Table 6, so instances differ only in size. For each instance, the baseline plan $X^{0}$ was built once by a deterministic greedy constructor: targets are processed in descending value order and assigned the best reachable, capacity-available sensing and engagement nodes, with a coordination node attached when the time window permits; full coverage is not enforced. A disturbance scenario then removed $\mathrm{round}(α \times N_{class})$ nodes (at least one), where $N_{class}$ is the number of nodes in the class, from each of the sensing, coordination, and engagement classes independently and uniformly at random, while the targets stayed fixed; the disturbance ratio $α \in \{0.15, 0.35, 0.50\}$ models light, moderate, and heavy node loss. Disturbance scenarios were sampled with five random seeds (2021–2025): for each seed, one scenario was drawn per instance and ratio. The ten test instances under the three ratios therefore yield $10 \times 3 \times 5 = 150$ test scenarios for the comparison, while the four training instances supply the scenarios for heuristic evolution; the representative fronts and reconstruction reported below use seed 2021.
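The disturbance sampling can be sketched with Python's `random` module. The class names and the function names are hypothetical, and so is the half-up rounding: Python's built-in `round` rounds halves to the nearest even integer, so the sketch uses an explicit floor instead. For instance I1 at $α = 0.35$, the counts reproduce the four, one, and four failed nodes reported for the Figure 3 scenario.

```python
import math
import random
from typing import Dict, List


def n_removed(alpha: float, n_class: int) -> int:
    """round(alpha * N_class), at least one; half-up rounding is an assumption."""
    return max(1, math.floor(alpha * n_class + 0.5))


def sample_disturbance(sizes: Dict[str, int], alpha: float, seed: int) -> Dict[str, List[int]]:
    """Fail nodes independently and uniformly at random in each class; targets stay fixed."""
    rng = random.Random(seed)
    failed = {}
    for cls in ("sensing", "coordination", "engagement"):
        failed[cls] = sorted(rng.sample(range(sizes[cls]), n_removed(alpha, sizes[cls])))
    return failed
```

Seeding one generator per scenario, as here, makes each (instance, ratio, seed) triple reproducible on its own.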

5.2. Experimental Setups

5.2.1. Experimental Platform Configuration

All experiments were conducted on a unified computing platform equipped with an Intel® Core™ Ultra 9 275HX CPU at 2.70 GHz and 32 GB of RAM, and all methods were implemented in Python 3.12. Each method was run once on every sampled scenario; because five independently seeded scenarios were drawn per instance and ratio, the reported values are the mean and standard deviation over the five seeds, after averaging each seed's value over the three disturbance ratios.

5.2.2. Evaluation Metrics

Performance is measured by the normalized hypervolume $HV$ of the recovered solution set in the two-dimensional objective space $(-F_{A}, F_{D})$, in which both objectives are minimized. To make the metric comparable across methods and across scenarios of very different scale, the hypervolume is measured against a common, data-driven reference rebuilt for each scenario rather than a hand-set point. For a given scenario, let $Q$ be the union of the Pareto-front objective vectors produced by all compared methods, and let

$$z_{k}^{id} = \min_{p \in Q} p_{k}, \qquad z_{k}^{nad} = \max_{p \in Q} p_{k}, \qquad k \in \{1, 2\}, \tag{36}$$

be the component-wise ideal and nadir points, with $p = (-F_{A}, F_{D})$. The reference point extends the nadir by a relative margin $ζ = 0.1$, and each objective vector is mapped into the unit box:

$$z_{k}^{ref} = z_{k}^{nad} + ζ\,(z_{k}^{nad} - z_{k}^{id}), \qquad \hat{p}_{k} = \max\Big(0, \min\Big(1, \frac{p_{k} - z_{k}^{id}}{z_{k}^{ref} - z_{k}^{id}}\Big)\Big), \tag{37}$$

so that the normalized reference point is $\hat{z}^{ref} = (1, 1)$. Denoting by $\widehat{PF}$ the non-dominated subset of a method's normalized vectors, its normalized hypervolume is the area dominated within the unit square,

$$HV = Λ\Big(\bigcup_{\hat{p} \in \widehat{PF}} [\hat{p}_{1}, 1] \times [\hat{p}_{2}, 1]\Big) \in [0, 1], \tag{38}$$

where $Λ$ is the Lebesgue measure (area). $HV \to 1$ as the front approaches the ideal corner, and $HV = 0$ only for an empty front: because the reference lies beyond the nadir of $Q$ (assuming $z_{k}^{nad} > z_{k}^{id}$), every normalized coordinate of a point in $Q$ is at most $1/(1 + ζ)$, so any non-empty front encloses a positive area. A larger $HV$ therefore reflects better convergence and spread at once. Statistical significance between MAHE and each competing method is assessed with a two-sided Wilcoxon signed-rank test on the per-scenario $HV$ of Equation (38), with Holm–Bonferroni correction across the comparisons.
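Equations (36)–(38) can be computed exactly in two dimensions by a sweep over the normalized points sorted by the first objective. The sketch below is a hypothetical helper (names assumed) that returns each method's normalized $HV$ from its front of $(-F_{A}, F_{D})$ vectors; mapping a degenerate axis with $z_{k}^{ref} = z_{k}^{id}$ to 0 is an extra assumption.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # p = (-F_A, F_D); both components are minimized


def normalized_hv(fronts: Dict[str, List[Point]], zeta: float = 0.1) -> Dict[str, float]:
    """Normalized HV of each method's front against a reference rebuilt from all fronts."""
    q = [p for front in fronts.values() for p in front]        # union Q of all methods
    if not q:
        return {name: 0.0 for name in fronts}
    z_id = [min(p[k] for p in q) for k in range(2)]            # ideal point, Eq. (36)
    z_nad = [max(p[k] for p in q) for k in range(2)]           # nadir point, Eq. (36)
    z_ref = [z_nad[k] + zeta * (z_nad[k] - z_id[k]) for k in range(2)]  # Eq. (37)

    def norm(p: Point) -> Point:
        out = []
        for k in range(2):
            span = z_ref[k] - z_id[k]
            # a degenerate axis (all points tie) is mapped to 0: an assumption of this sketch
            out.append(0.0 if span <= 0 else max(0.0, min(1.0, (p[k] - z_id[k]) / span)))
        return (out[0], out[1])

    hv = {}
    for name, front in fronts.items():
        area, y_best = 0.0, 1.0
        for x, y in sorted({norm(p) for p in front}):  # ascending in the first objective
            if y < y_best:                             # new strip of the union of [x,1] x [y,1]
                area += (1.0 - x) * (y_best - y)
                y_best = y
        hv[name] = area                                # Lebesgue measure, Eq. (38)
    return hv
```

For example, with `{"A": [(-10, 2), (-5, 1)], "B": [(-8, 3)]}` the helper gives $71/121 \approx 0.587$ for A and $7/121 \approx 0.058$ for B; dominated points are skipped automatically by the sweep.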

5.2.3. Baseline Algorithms and Models

To evaluate the proposed method, MAHE is compared with baselines from distinct methodological categories, ranging from hand-crafted multi-objective solvers to recent language-model-based heuristic design, so that the comparison spans both numerical Pareto search and semantic heuristic generation:

NSGA-II [41] and AGE-MOEA [42]: general-purpose multi-objective evolutionary algorithms that serve as basic references for population-based Pareto search. NSGA-II relies on fast non-dominated sorting and crowding distance, whereas AGE-MOEA replaces the crowding distance with a geometry-based survival score.
MOEA/D-iAM2M [43] and adaptive large neighborhood search (ALNS) [44]: solvers more closely related to constrained resource-target assignment. MOEA/D-iAM2M is a decomposition-based method with dynamic subproblem assignment, and ALNS is a trajectory-based multi-objective variant that maintains an external non-dominated archive.
EoH [8]: a representative LLM-based AHD method that evolves scoring heuristics with a single language-model agent, without the multi-agent coordination, repair, and reflection of MAHE. It serves as the direct AHD counterpart for isolating the benefit of the multi-agent design.

For a fair comparison of the language-model component, both AHD-based methods (EoH and MAHE) used the same large language model backbone, DeepSeek-V4-Flash, as the heuristic generator under identical decoding settings, so that any performance difference reflects the framework design rather than the underlying model. To isolate the effect of the candidate-generation strategy, all methods shared the same baseline plan $X^{0}$ and a comparable evaluation budget of about $10^{4}$ candidate evaluations: the population-based solvers ran 100 individuals for 100 generations, while ALNS performed 1665 iterations, each generating six candidate solutions that were evaluated once (9990 evaluations in total). EoH and MAHE additionally ran an outer heuristic-evolution loop with a heuristic population of 10 under a budget of 100 heuristic evaluations; their main configurations, together with the shared inner-solver settings used at deployment, are listed in Table 7.

5.3. Comparative Experiments

This subsection evaluates MAHE against the five baselines along two axes: the quality of the recovered solution set and its robustness to increasing node loss. For every test instance, the mean and standard deviation are reported over five independent test-scenario seeds (each value first averaged over the three disturbance ratios), with the best entry in each row shown in bold. The analysis proceeds from the aggregate hypervolume across all ten instances, whose differences are then confirmed by a pairwise statistical test, through the recovery–disturbance trade-off fronts at increasing node-loss ratios, to a single reconstruction instance that makes the recovered plan interpretable.

Table 8 reports the normalized $HV$ on the ten test instances, where a larger value indicates a better approximation of the $(-F_{A}, F_{D})$ front in both convergence and spread; because the reference point is rebuilt for each scenario, values are comparable within a row but not across instances. Two patterns dominate. First, the two AHD-based methods outperform the four hand-crafted solvers by a wide margin, and the gap widens monotonically with scale. The hand-crafted solvers degrade steadily as the instance grows (NSGA-II, for example, falls from $0.856$ on the eight-target instance I1 to $0.287$ on the 320-target instance I10), and the group as a whole collapses from the $0.40$–$0.86$ band on I1 to the $0.26$–$0.36$ band on I10, whereas EoH and MAHE never drop below approximately $0.83$ at any scale. The two families are at near-parity on the smallest instance (MAHE is within $0.007$ of the best solver on I1) but diverge sharply with size: on I10, MAHE attains $0.964$ against the hand-crafted solvers' $0.26$–$0.36$ band, a $2.7$- to $3.7$-fold gap. This indicates that a fixed operator set cannot keep the search inside the high-quality feasible region as the numbers of nodes and targets grow, whereas a generated, disturbance-specific heuristic preserves solution quality across scales. Among the hand-crafted solvers, MOEA/D-iAM2M is the weakest at every scale (mean $0.296$), suggesting that pure decomposition copes poorly with the tightly constrained, partial-recovery geometry of the reconfiguration front.

Second, within the AHD family the multi-agent design of MAHE consistently surpasses the single-agent EoH: MAHE attains the best mean on nine of the ten instances and the highest overall mean ($0.949$ versus $0.906$). Its advantage is largest on the biggest instances (for example, $0.964 \pm 0.010$ versus $0.864 \pm 0.054$ on I10) and, crucially, is accompanied there by a markedly smaller dispersion (standard deviation $0.010$ versus $0.054$), so MAHE is not only better on average but also more consistent across scenario seeds. We attribute this to the division of labor that EoH lacks: repair intercepts failed candidates before they consume inner evaluations, reflection re-targets generation when the search stalls, and adaptive scheduling keeps the operator mix matched to the search state; together these stabilize heuristic generation precisely where the single-agent loop becomes brittle. The sole exception is the smallest instance I1, on which NSGA-II already reaches a near-optimal front ($0.856$) and narrowly leads MAHE ($0.849$); when the feasible region is this small, a well-tuned population method suffices and the benefit of a generated heuristic has yet to emerge, yet even here MAHE stays within $0.007$ of the best.

All pairwise comparisons against MAHE were assessed with a two-sided Wilcoxon signed-rank test, and the results are reported directly in Table 8. Over the pooled 150-scenario test set, MAHE is significantly better than every baseline ($p < 0.001$, Holm–Bonferroni corrected), and the per-instance marks show that this advantage holds on essentially all medium and large instances; only on the smallest instances does the gap to the strongest competitors (notably NSGA-II and EoH) narrow to statistical parity, while the weaker solvers remain significantly behind throughout.
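The Holm–Bonferroni step can be sketched as below. The raw per-baseline p-values would come from a paired two-sided test such as SciPy's `scipy.stats.wilcoxon(x, y)` (two-sided is its default alternative), which is not reproduced here; the helper name is hypothetical, and `statsmodels.stats.multitest.multipletests(pvals, method="holm")` offers the same adjustment.

```python
from typing import List


def holm_adjust(pvals: List[float]) -> List[float]:
    """Holm-Bonferroni step-down adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])    # smallest raw p-value first
    adjusted = [0.0] * m
    running = 0.0
    for rank, i in enumerate(order):
        # multiply by the number of hypotheses still in play, cap at 1, enforce monotonicity
        running = max(running, min(1.0, (m - rank) * pvals[i]))
        adjusted[i] = running
    return adjusted
```

With five baselines ($m = 5$), the smallest raw p-value is multiplied by 5, the next by 4, and so on, and a comparison is declared significant when its adjusted value falls below the chosen level.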

To expose the trade-off behind these hypervolumes, the recovery effectiveness $F_{A}$ and the reconfiguration distance $F_{D}$ are examined directly at the three disturbance ratios. Figure 2 overlays the fronts of all methods on instance I10 for a single representative run, with $F_{A}$ on the horizontal axis (larger is better) and $F_{D}$ on the vertical axis (smaller is better); a desirable front therefore lies toward the lower right, combining high recovery with a small change to the baseline plan. At light loss ($α = 0.15$), the hand-crafted solvers concentrate at low $F_{A}$ and elevated $F_{D}$, whereas MAHE simultaneously extends the front to the highest $F_{A}$ and presses $F_{D}$ to the lowest values, so that its front dominates those of the other methods; EoH also reaches high $F_{A}$, but its $F_{D}$ floor sits well above MAHE's and its front remains dominated. As the loss grows to $α = 0.35$ and $α = 0.50$, the fronts of the classical solvers and ALNS steepen into a nearly vertical wall, so that additional recovery can be bought only at a sharply rising reconfiguration cost; MAHE instead keeps a low, wide, and almost flat front, sustaining high $F_{A}$ at a nearly constant $F_{D}$ and reaching recovery levels that the other methods attain only by paying a much larger $F_{D}$, if at all. EoH follows the same low-$F_{D}$ regime and clearly improves on the hand-crafted solvers, but its front is shorter and stays dominated by that of MAHE. The advantage of MAHE in the high-$F_{A}$, low-$F_{D}$ region is thus visible at every ratio and becomes more pronounced as node loss increases, mirroring the hypervolume trend in Table 8.

To make the reconfiguration behavior concrete, Figure 3 contrasts the baseline plan with the plan reconstructed by MAHE on instance I1 at $\alpha = 0.35$ (representative seed 2021). In this scenario the disturbance disables four of the twelve sensing nodes, one of the three coordination nodes, and four of the twelve engagement nodes; the failed coordination node is a heavily used coordination hub. Every baseline chain routed through a failed node is severed, so the affected targets lose their closed loops. MAHE repairs the plan by reusing the surviving links wherever feasible. It rebuilds the broken chains through the five reconfiguration actions of Table 1: keep, insert, replace, switch, and release. The targets that depended on the failed coordination hub are rerouted onto the two surviving coordination nodes or switched to alternative sensing-engagement chains, while the unaffected targets are left untouched. Targets that cannot be feasibly re-closed within their time windows are released (set inactive), which realizes partial recovery. The reconstructed plan thus restores executable closed loops for the affected targets while altering only a small portion of the baseline plan. It achieves a recovery effectiveness of $F_{A} = 47.24$ at a reconfiguration distance of $F_{D} = 12$, a single-scenario realization of the high-recovery, low-disturbance balance that the learned scoring heuristic is designed to strike.
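Table 2 describes the reconfiguration distance $F_{D}$ as the link-change distance to the baseline plan $X^{0}$. The minimal sketch below assumes that this is the Hamming distance between the binary assignment vectors, so every variable $x_{s,u}$, $z_{c,u}$, $y_{i,u}$ that differs counts once. Each plan is represented as the set of its active links. The tuple layout, the helper names, and the toy scenario are hypothetical.

```python
from typing import Set, Tuple

# Hypothetical link encoding: (node_class, node_id, target_id), node_class in
# {"S", "C", "I"}; node ids are assumed unique across classes. A plan is the
# set of links whose binary assignment variable equals 1.
Link = Tuple[str, str, str]


def surviving_plan(baseline: Set[Link], failed_nodes: Set[str]) -> Set[Link]:
    """X^surv: the baseline plan with every link to a failed node removed."""
    return {link for link in baseline if link[1] not in failed_nodes}


def reconfiguration_distance(baseline: Set[Link], reconstructed: Set[Link]) -> int:
    """Assumed F_D: number of assignment variables that differ from X^0."""
    return len(baseline ^ reconstructed)


# Toy scenario: coordination node c1 fails; u1 is rerouted through c2, u3 is released.
X0 = {("S", "s1", "u1"), ("C", "c1", "u1"), ("I", "e1", "u1"),
      ("S", "s2", "u2"), ("I", "e2", "u2"),
      ("S", "s3", "u3"), ("C", "c1", "u3"), ("I", "e3", "u3")}
X_surv = surviving_plan(X0, {"c1"})
X = {link for link in X_surv if link[2] != "u3"} | {("C", "c2", "u1")}
print(len(X_surv), reconfiguration_distance(X0, X))  # 6 5
```

Under this assumption a replacement costs two units, one change from 1 to 0 and one from 0 to 1, matching the effect column of Table 1. A released target contributes one unit per link it loses.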

5.4. Ablation Studies

To quantify the contribution of each agent, MAHE is compared with three variants that each disable one mechanism:

- w/o coordinator: the improvement-driven operator weighting is replaced by non-adaptive, round-robin selection.
- w/o reflection: there is no cross-generation feedback.
- w/o repair: invalid candidates are discarded instead of repaired.

All variants share the full method’s budget, operator pool, training instances, and test-scenario seeds. They are scored by the same normalized $HV$ under the same six-method reference as the main comparison, so that Table 9 isolates the effect of coordination, reflection, and repair.

Removing any single mechanism lowers the mean normalized $HV$ below that of the full method ($0.949$), confirming that the three agents are complementary rather than redundant.

Reflection is the dominant contributor. It accounts for the largest mean drop (to $0.869$, $-0.080$), and its effect is strongly scale-dependent. On the small instances the effect is negligible to moderate, with I2–I4 remaining at or above $0.92$. On the large instances it is severe: the variant falls into the $0.76$–$0.88$ band, and on I10 the normalized $HV$ drops from $0.964$ to $0.765$. Cross-generation feedback is therefore what sustains progress as the reconfiguration space expands and a single generation can no longer reach good heuristics by chance.

Repair is the second contributor (mean $0.919$, $-0.030$). Its largest drop occurs on I10 ($0.964$ versus $0.895$), the largest and most tightly constrained instance. It is also pronounced on I2 ($0.940$ versus $0.877$), so its effect is less tied to scale than that of reflection. Correcting infeasible candidates instead of discarding them retains high-quality heuristics that the validation filter would otherwise throw away, which matters most where feasible heuristics are scarce.

The coordinator contributes the least on average (mean $0.928$, $-0.021$), but its benefit is strongly scale-dependent. On the small instances a fixed round-robin operator schedule matches or marginally exceeds the full method (I1 and I4). On the large instances the gap widens sharply: $0.964$ versus $0.871$ on I10, and $0.968$ versus $0.920$ on I9. Because the seven-operator pool is rich and complementary, non-adaptive selection suffices when the search space is small. Improvement-driven operator weighting becomes increasingly important as the reconfiguration problem grows, mirroring the scale dependence observed for reflection.

Over all 150 paired scenarios, the full method significantly outperforms each variant (two-sided Wilcoxon signed-rank test, $p < 0.001$, Holm–Bonferroni corrected; reported directly in Table 9). The per-instance tests give a more differentiated picture:

- Coordinator: not significant on I1–I5, significant on I6–I10.
- Reflection: strongly significant ($p < 0.001$) on I5–I10, at most weakly significant on I1–I4.
- Repair: significant on a mix of small and large instances (I2–I4, I6, I9, and I10).
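The Holm–Bonferroni correction applied to the Mean-row tests is a step-down procedure. The sketch below adjusts a list of raw p-values, for instance the three Wilcoxon p-values of the ablation variants. The raw p-values themselves would come from a signed-rank test such as SciPy's `scipy.stats.wilcoxon`. They are not reproduced here, and the inputs in the example are made up.

```python
from typing import List


def holm_bonferroni(p_values: List[float]) -> List[float]:
    """Holm step-down adjusted p-values, returned in the input order.

    The k-th smallest raw p-value (k = 0, 1, ...) is multiplied by (m - k);
    a running maximum keeps the adjusted values monotone, and they are capped at 1.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for k, i in enumerate(order):
        running_max = max(running_max, (m - k) * p_values[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted


raw = [0.01, 0.04, 0.03]  # made-up raw p-values for three comparisons
print(holm_bonferroni(raw))
```

Here the adjusted values are approximately `[0.03, 0.06, 0.06]`. The smallest p-value is multiplied by 3 and the next by 2. The running maximum then carries 0.06 over to the largest raw value, which would otherwise only be multiplied by 1.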

Figure 4 plots the best-so-far outer-level fitness against the number of heuristic evaluations and reveals how each mechanism shapes the search trajectory.

The full method climbs in a steady, stepwise fashion and is still improving when the budget is exhausted. Its late surge over the final evaluations shows that the search has not yet saturated and that the complete framework keeps the heuristic evolving throughout training.

The w/o-reflection variant, by contrast, advances in long plateaus and remains the lowest trajectory for almost the entire run. This visually confirms that cross-generation feedback is what breaks stagnation and sustains progress.

The w/o-repair variant exhibits the clearest signature of overfitting. Within the first handful of evaluations it surges to the highest training fitness of any variant (final $0.928$, even above the full method’s $0.924$). It then stays essentially flat for the remainder of the budget, with no further gains. Crucially, this early training advantage does not carry over to unseen scenarios: at test time the variant falls behind the full method (cf. Table 9). This indicates that, without repair, the search locks onto a small set of heuristics that fit the few training scenarios but generalize poorly.

Taken together, these trajectories show that the complete framework keeps the heuristic evolving across the whole training budget while remaining generalizable. Reflection and repair are decisive for both convergence and generalization. The coordinator accelerates and stabilizes the search; the w/o-coordinator curve reaches $0.901$ in training only via a single late jump. The coordinator's benefit on the final solution is concentrated on the larger instances (Table 9).
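Two quantities underlie this discussion. The convergence curves in Figure 4 plot a running maximum of the outer-level fitness. The reflection agent is triggered after $T_{p}$ consecutive non-improving evaluations (Table 3). A minimal sketch of both follows. Resetting the stall counter after each trigger is an assumption, since the text does not say whether a long plateau fires reflection once or repeatedly.

```python
from typing import List


def best_so_far(fitness: List[float]) -> List[float]:
    """Running maximum of the outer-level fitness, as in a convergence curve."""
    curve, best = [], float("-inf")
    for f in fitness:
        best = max(best, f)
        curve.append(best)
    return curve


def reflection_triggers(fitness: List[float], patience: int) -> List[int]:
    """0-based evaluation indices at which reflection would fire.

    A stall counter counts consecutive evaluations that do not strictly beat
    the best fitness so far; it resets on improvement and (by assumption)
    after each trigger.
    """
    triggers, best, stall = [], float("-inf"), 0
    for k, f in enumerate(fitness):
        if f > best:
            best, stall = f, 0
        else:
            stall += 1
            if stall == patience:
                triggers.append(k)
                stall = 0
    return triggers


history = [0.5, 0.6, 0.55, 0.58, 0.6, 0.7, 0.65, 0.66, 0.69, 0.68]
print(reflection_triggers(history, patience=3))  # [4, 8]
```

In this made-up history, the evaluation at index 4 only ties the best value of 0.6, so it counts as non-improving. It is the third stalled evaluation in a row and therefore fires the first trigger.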

6. Conclusions

This paper addressed the dynamic reconfiguration of cooperative counter-UAV task chains, the problem of restoring executable sensing-coordination-engagement loops after defending nodes fail. By taking intruding UAVs as target nodes and defending platforms as fixed-role functional nodes, and by inheriting the surviving plan rather than re-solving from scratch, the formulation places the task-chain structure, the inheritance of the historical plan, and the recovery–disturbance trade-off at its center, which distinguishes it from conventional task allocation. The problem was cast as a constrained bi-objective program over the recovered effectiveness and the reconfiguration cost, and solved by MAHE, an LLM-driven multi-agent framework that evolves reconfiguration-specific scoring heuristics for a fixed multi-objective solver.

The experiments support three conclusions.

First, generated, disturbance-conditioned heuristics scale where fixed operator sets do not. Across instances from 8 to 320 targets, MAHE maintained a consistently high solution quality, whereas the hand-crafted multi-objective solvers lost most of theirs as the problem grew, and the gap broadly widened with scale.

Second, the multi-agent decomposition is what delivers this behavior. Against an otherwise comparable single-agent design, MAHE was both more effective and markedly more stable on the largest instances. The ablation attributes the improvement to three complementary roles:

- reflection breaks stagnation;
- repair retains feasible, high-quality heuristics;
- adaptive operator scheduling tracks the search state.

The contributions of reflection and operator scheduling grow with problem size.

Third, the recovered plans realize the intended balance, sustaining high recovery at a nearly constant change to the baseline plan even under heavy node loss.

Together, these results indicate that automated, reconfiguration-oriented heuristic design is a scalable and reliable route to dynamic, heterogeneous, and constraint-intensive counter-UAV task planning.

7. Future Work

Several directions remain open. First, the present model reconfigures over a single decision period; extending it to multi-stage rolling reconfiguration would capture continuous node losses and successive waves of intruding targets, and would let the recovery–disturbance trade-off be optimized over a horizon rather than a single step. Second, the defending UAVs are assigned fixed roles; allowing their sensing, relay, and engagement functions to switch adaptively with payload and energy state would enlarge the reconfiguration action space and better reflect mobile platforms. Third, incorporating trajectory prediction and explicit uncertainty modeling would let reconfiguration anticipate the motion of intruding UAVs and act before, rather than after, a chain is broken. Fourth, under appropriate safety and compliance constraints, validation through hardware-in-the-loop simulation or real flight data would test the model’s applicability in operational low-altitude security settings. Pursuing these directions would extend the framework from single-shot recovery toward continuous, predictive, and field-validated counter-UAV task planning.

Author Contributions

Conceptualization, Y.Z. and C.Y.; methodology, Y.Z.; software, Y.Z.; validation, Y.Z., Y.Y. and Y.S.; formal analysis, Y.Z.; investigation, Y.Z. and Y.L.; resources, C.Y. and R.Y.; data curation, Y.W. and Y.T.; writing—original draft preparation, Y.Z.; writing—review and editing, C.Y., R.Y. and B.H.; visualization, Y.Z. and J.B.; supervision, C.Y. and R.Y.; project administration, C.Y.; funding acquisition, C.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Social Science Foundation of China (Grant No. 2025-SKJJ-B-027).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

DURC Statement

This research is limited to the defensive counter-unmanned aerial vehicle (counter-UAV) domain, namely the dynamic reconfiguration of cooperative defensive task chains for restoring low-altitude airspace security. It is beneficial for protecting critical infrastructure, public venues, and regulated airspace against unauthorized or malicious drone intrusions, and it does not pose a threat to public health or national security. The authors acknowledge the dual-use potential of research involving heterogeneous multi-platform task allocation and automated heuristic design, and confirm that all necessary precautions have been taken to prevent potential misuse. As an ethical responsibility, the authors strictly adhere to relevant national and international laws on DURC. The authors advocate responsible deployment, ethical considerations, regulatory compliance, and transparent reporting to mitigate misuse risks and foster beneficial outcomes.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Prompt Templates

The following are the prompt templates used by MAHE, reproduced from the deployed system.

Appendix A.1. Task Description and Function Specification

Figure A1. Task description (_TASK_DESCRIPTION).

Figure A2. Function specification (_FUNC_SPEC).

Appendix A.2. Initialization and Variation Operators

Figure A3. Initialization operator $I_{1}$.


Figure A4. Exploration operators $E_{1}$ and $E_{2}$.


Figure A5. Modification operators $M_{1}$, $M_{2}$, and $M_{3}$.


Figure A6. Ruin-and-recreate operators $R_{1}$ and $R_{2}$.


References

Kang, H.; Joung, J.; Kim, J.; Kang, J.; Cho, Y.S. Protect Your Sky: A Survey of Counter Unmanned Aerial Vehicle Systems. IEEE Access 2020, 8, 168671–168710.
Castrillo, V.U.; Manco, A.; Pascarella, D.; Gigante, G. A Review of Counter-UAS Technologies for Cooperative Defensive Teams of Drones. Drones 2022, 6, 65.
Phadke, A.; Medrano, F.A. Towards Resilient UAV Swarms—a Breakdown of Resiliency Requirements in UAV Swarms. Drones 2022, 6, 340.
Alqefari, S.; Menai, M.E.B. Multi-UAV Task Assignment in Dynamic Environments: Current Trends and Future Directions. Drones 2025, 9, 75.
Xu, W.; Chen, C.; Ding, S.; Pardalos, P.M. A Bi-Objective Dynamic Collaborative Task Assignment under Uncertainty Using Modified MOEA/D with Heuristic Initialization. Expert Syst. Appl. 2020, 140, 112844.
Zhong, Y.; Li, H.; Sun, Q.; Huang, Z.; Zhang, Y. A Kill Chain Optimization Method for Improving the Resilience of Unmanned Combat System-of-Systems. Chaos Solitons Fractals 2024, 181, 114685.
Liu, W.; Pan, Z.; Han, W.; Su, X.; Yu, D.; Wan, B. Construction of Kill Webs with Heterogeneous UAV Swarms in Dynamic Contested Environments. Complex Intell. Syst. 2024, 11, 8.
Liu, F.; Tong, X.; Yuan, M.; Lin, X.; Luo, F.; Wang, Z.; Lu, Z.; Zhang, Q. Evolution of Heuristics: Towards Efficient Automatic Algorithm Design Using Large Language Model. In Proceedings of the International Conference on Machine Learning (ICML), 2024.
Ye, H.; Wang, J.; Cao, Z.; Berto, F.; Hua, C.; Kim, H.; Park, J.; Song, G. ReEvo: Large Language Models as Hyper-Heuristics with Reflective Evolution. Adv. Neural Inf. Process. Syst. 2024, 37, 43571–43608.
Romera-Paredes, B.; Barekatain, M.; Novikov, A.; Balog, M.; Kumar, M.P.; Dupont, E.; Ruiz, F.J.R.; Ellenberg, J.S.; Wang, P.; Fawzi, O.; et al. Mathematical Discoveries from Program Search with Large Language Models. Nature 2024, 625, 468–475.
Javed, S.; Hassan, A.; Ahmad, R.; Ahmed, W.; Ahmed, R.; Saadat, A.; Guizani, M. State-of-the-Art and Future Research Challenges in UAV Swarms. IEEE Internet Things J. 2024, 11, 19023–19045.
Ghazlane, Y.; Gmira, M.; Medromi, H. Survey on Current Anti-Drone Systems: Process, Technologies, and Algorithms. Int. J. Syst. Syst. Eng. 2022, 12, 235–270.
Kim, K.; Kim, J.; Lee, H.G.; Choi, J.; Fan, J.; Joung, J. UAV Chasing Based on Yolov3 and Object Tracker for Counter UAV Systems. IEEE Access 2023, 11, 34659–34673.
Lykou, G.; Moustakas, D.; Gritzalis, D. Defending Airports from UAS: A Survey on Cyber-Attacks and Counter-Drone Sensing Technologies. Sensors 2020, 20, 3537.
Lee, C.H.; Thiessen, C.; Van Bossuyt, D.L.; Hale, B. A Systems Analysis of Energy Usage and Effectiveness of a Counter-Unmanned Aerial System Using a Cyber-Attack Approach. Drones 2022, 6, 198.
Zheng, H.; Yuan, J. An Integrated Mission Planning Framework for Sensor Allocation and Path Planning of Heterogeneous Multi-UAV Systems. Sensors 2021, 21, 3557.
Liu, W.; Zhang, L.; Wang, W.; Fang, H.; Zhang, J.; Zhang, B. Dynamic Resource Target Assignment Problem for Laser Systems’ Defense against Malicious UAV Swarms Based on MADDPG-IA. Aerospace 2025, 12, 729.
Nan, M.; Zhu, Y.; Kang, L.; Wang, T.; Zhou, X. A Modified RL-IGWO Algorithm for Dynamic Weapon-Target Assignment in Frigate Defensing UAV Swarms. Electronics 2022, 11, 1796.
Kong, L.; Wang, J.; Zhao, P. Solving the Dynamic Weapon Target Assignment Problem by an Improved Multiobjective Particle Swarm Optimization Algorithm. Appl. Sci. 2021, 11, 9254.
Xu, H.; Zhang, A.; Bi, W.; Xu, S. Dynamic Gaussian Mutation Beetle Swarm Optimization Method for Large-Scale Weapon Target Assignment Problems. Appl. Soft Comput. 2024, 162, 111798.
Song, Y.; Ma, Z.; Chen, N.; Zhou, S.; Srigrarom, S. Comparative Analysis of Centralized and Distributed Multi-UAV Task Allocation Algorithms: A Unified Evaluation Framework. Drones 2025, 9, 530.
Alqefari, S.; Menai, M.E.B. A Hybrid Method to Solve the Multi-UAV Dynamic Task Assignment Problem. Sensors 2025, 25, 2502.
Wu, Y.; Gou, J.; Ji, H.; Deng, J. Hierarchical Mission Replanning for Multiple UAV Formations Performing Tasks in Dynamic Situation. Comput. Commun. 2023, 200, 132–148.
Liu, B.; Wang, S.; Li, Q.; Zhao, X.; Pan, Y.; Wang, C. Task Assignment of UAV Swarms Based on Deep Reinforcement Learning. Drones 2023, 7, 297.
Lu, Y.; Ma, Y.; Wang, J.; Han, L. Task Assignment of UAV Swarm Based on Wolf Pack Algorithm. Appl. Sci. 2020, 10, 8335.
Ma, Z.; Chen, J. Multi-UAV Urban Logistics Task Allocation Method Based on MCTS. Drones 2023, 7, 679.
Ai, L.; Ma, B.; Zhang, J.; Ai, Y.; Hao, Z.; Li, J.; Yu, Z.; Cheng, J. A Generative Task Allocation Method for Heterogeneous UAV Swarms Empowered by Heterogeneous Toolchains. Drones 2026, 10, 289.
Zeng, Y.; Wu, L.; Li, J.; Zhuang, X.; Wu, C. Resilient Task Allocation for UAV Swarms: A Bilevel PSO-ILP Optimization Approach. Drones 2025, 9, 623.
Mayya, S.; D’Antonio, D.S.; Saldaña, D.; Kumar, V. Resilient Task Allocation in Heterogeneous Multi-Robot Systems. IEEE Robot. Autom. Lett. 2021, 6, 1327–1334.
Neville, G.; Chernova, S.; Ravichandar, H. D-ITAGS: A Dynamic Interleaved Approach to Resilient Task Allocation, Scheduling, and Motion Planning. arXiv 2022, arXiv:2209.13092.
Neville, G.; Messing, A.; Ravichandar, H.; Hutchinson, S.; Chernova, S. An Interleaved Approach to Trait-Based Task Allocation and Scheduling. arXiv 2021, arXiv:2108.02773.
Emam, Y.; Mayya, S.; Notomista, G.; Bohannon, A.; Egerstedt, M. Adaptive Task Allocation for Heterogeneous Multi-Robot Teams with Evolving and Unknown Robot Capabilities. arXiv 2020, arXiv:2003.03344.
Yao, S.; Liu, F.; Lin, X.; Lu, Z.; Wang, Z.; Zhang, Q. Multi-Objective Evolution of Heuristic Using Large Language Model. Proc. AAAI Conf. Artif. Intell. 2025, 39, 27144–27152.
Zhong, R.; Hussien, A.G.; Yu, J.; Munetomo, M. LLMOA: A Novel Large Language Model Assisted Hyper-Heuristic Optimization Algorithm. Adv. Eng. Inform. 2025, 64, 103042.
Huang, Y.; Wu, S.; Zhang, W.; Wu, J.; Feng, L.; Tan, K.C. Autonomous Multiobjective Optimization Using Large Language Model. IEEE Trans. Evol. Comput. 2026, 30, 594–608.
Chen, J.; Chen, Y.; Pham, D.T.; Song, Y.; Wu, J.; Xing, L.; Chen, Y. A Large Language Model-based Multi-Agent Framework to Autonomously Design Algorithms for Earth Observation Satellite Scheduling Problem. Engineering 2025.
van Stein, N.; Bäck, T. LLaMEA: A Large Language Model Evolutionary Algorithm for Automatically Generating Metaheuristics. IEEE Trans. Evol. Comput. 2025, 29, 331–345.
Zhang, R.; Liu, F.; Lin, X.; Wang, Z.; Lu, Z.; Zhang, Q. Understanding the Importance of Evolutionary Search in Automated Heuristic Design with Large Language Models. In Parallel Problem Solving from Nature – PPSN XVIII; Affenzeller, M., Winkler, S.M., Kononova, A.V., Trautmann, H., Tušar, T., Machado, P., Bäck, T., Eds.; Springer: Cham, Switzerland, 2024; Volume 15149, pp. 185–202.
Zheng, Z.; Xie, Z.; Wang, Z.; Hooi, B. Monte Carlo Tree Search for Comprehensive Exploration in LLM-Based Automatic Heuristic Design. In Proceedings of the Forty-Second International Conference on Machine Learning (ICML), 2025; arXiv:2501.08603.
Xie, Z.; Liu, F.; Wang, Z.; Zhang, Q. LLM-Driven Neighborhood Search for Efficient Heuristic Design. In Proceedings of the 2025 IEEE Congress on Evolutionary Computation (CEC), 2025; pp. 1–8.
Deb, K.; Pratap, A.; Agarwal, S.; Meyarivan, T. A Fast and Elitist Multiobjective Genetic Algorithm: NSGA-II. IEEE Trans. Evol. Comput. 2002, 6, 182–197.
Panichella, A. An Adaptive Evolutionary Algorithm Based on Non-Euclidean Geometry for Many-Objective Optimization. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO ’19); ACM: New York, NY, USA, 2019; pp. 595–603.
Yi, X.; Yu, H.; Xu, T. Solving Multi-Objective Weapon-Target Assignment Considering Reliability by Improved MOEA/D-AM2M. Neurocomputing 2024, 563, 126906.
Wang, Y.; Wang, J.; Hao, J.K.; Feng, J. Efficient Adaptive Large Neighborhood Search for Sensor–Weapon–Target Assignment. IEEE Trans. Syst. Man Cybern. Syst. 2024, 54, 6397–6409.

Figure 1. Architecture of the Multi-Agent Heuristic Evolution (MAHE) framework.

Figure 2. Pareto fronts of the six methods on instance I10 under light, moderate, and heavy node loss ($\alpha \in \{0.15, 0.35, 0.50\}$): recovery effectiveness $F_{A}$ (higher is better) versus reconfiguration distance $F_{D}$ (lower is better), so fronts toward the lower right are preferable. Single representative run, seed 2021.


Figure 3. Task-chain reconstruction on instance I1 under moderate node loss ($\alpha = 0.35$, representative seed 2021): (a) baseline plan; (b) plan reconstructed by MAHE.


Figure 4. Convergence of the outer heuristic evolution: best-so-far outer-level fitness (training normalized $HV$) versus the number of heuristic evaluations, for the full method and the three ablated variants on a single representative run (seed 2021).


Table 1. Five reconfiguration actions of the reconstructed plan relative to the surviving plan.

Action	Meaning	Effect on assignment variables
Keep	Reuse a surviving link	The variable stays equal to $X^{surv}$
Insert	Connect a new available node to a target	One assignment variable changes from 0 to 1
Replace	Replace a failed or inefficient node with an available node	One entry changes from 1 to 0 and another from 0 to 1
Switch	Change the chain type between sensing-engagement and sensing-coordination-engagement	Change whether $\sum_{c} z_{c, u} \geq 1$ holds
Release	Set a target that cannot be feasibly recovered to inactive	All assignment variables of the target become 0 ($r_{u} = 0$)
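The effect column of Table 1 can be read as a per-target comparison between the surviving plan $X^{surv}$ and the reconstructed plan $X$. The sketch below labels a target's change under that reading. The link representation and the function name are hypothetical. Labels may co-occur: adding a coordination node, for example, is both an insertion and a switch of chain type.

```python
from typing import Set, Tuple

# Hypothetical encoding of ONE target's links: (node_class, node_id),
# node_class in {"S", "C", "I"} for sensing, coordination, engagement.
TargetLinks = Set[Tuple[str, str]]


def classify_actions(surv: TargetLinks, new: TargetLinks) -> Set[str]:
    """Label how a target's links changed from X^surv to the reconstructed plan X."""
    if not new:
        return {"release"}          # every assignment variable of the target is 0
    labels = set()
    if surv & new:
        labels.add("keep")          # at least one surviving link is reused
    for cls in ("S", "C", "I"):
        before = {node for c, node in surv if c == cls}
        after = {node for c, node in new if c == cls}
        added, removed = after - before, before - after
        if added and removed:
            labels.add("replace")   # a 1 -> 0 change paired with a 0 -> 1 change
        elif added:
            labels.add("insert")    # a new 0 -> 1 link
    had_coordination = any(c == "C" for c, _ in surv)
    has_coordination = any(c == "C" for c, _ in new)
    if had_coordination != has_coordination:
        labels.add("switch")        # sensing-engagement <-> sensing-coordination-engagement
    return labels


# A target whose coordination node failed and that is rerouted through c2:
print(sorted(classify_actions({("S", "s1"), ("I", "e1")},
                              {("S", "s1"), ("C", "c2"), ("I", "e1")})))
# ['insert', 'keep', 'switch']
```
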

Table 2. Main symbols of the reconfiguration model.

Symbol	Meaning
$S, C, I, U$	Sets of sensing nodes, coordination nodes, engagement nodes, and targets.
$x_{s, u}, z_{c, u}, y_{i, u}$	Binary assignment variables from the three node classes to a target.
$r_{u} (X)$	Indicator of whether target $u$ is activated.
$a_{s, u}^{S}, a_{c, u}^{C}, a_{i, u}^{I}$	Reachability parameters of the three node-target link classes.
$q_{s, u}, p_{i, u}$	Effective sensing probability and effective engagement probability.
$\gamma_{u}, g$	Coordination success-rate gain and gain coefficient.
$v_{u}, W_{u}$	Target value and response time window.
$\kappa_{s}^{S}, \kappa_{c}^{C}, \kappa_{i}^{I}$	Capacities of sensing, coordination, and engagement nodes.
$D^{S}, D^{C}, D^{I}$	Sets of failed sensing, coordination, and engagement nodes.
$X^{0}, X^{surv}, X$	Baseline plan, surviving plan, and reconstructed plan.
$U^{aff}$	Affected target set whose baseline chain contains a failed node.
$F_{A}, F_{D}$	Total weighted effectiveness after reconfiguration and the link-change distance relative to the baseline plan $X^{0}$.

Table 3. Contracts of the four agents of Multi-Agent Heuristic Evolution (MAHE).

Agent	Trigger	Input	Output / action
Evolution	every iteration	operator $o$, parent package(s), active reflection $\xi$, pitfalls $M$	a new heuristic package (design note + scoring code)
Coordinator	every iteration	operator weights $\omega_{o}$, realized hypervolume gain $r$	the selected operator and the updated operator weights
Repair	a candidate fails the validator	broken code and its error; error memory $M$	a repaired, executable package or a rejection; a distilled lesson appended to $M$
Reflection	$T_{p}$ consecutive non-improving evaluations	the packages before/after the last gain and the recent stalled packages	a free-text diagnosis $\xi$ prepended to the next generation prompt
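Table 3 specifies only the coordinator's contract. The coordinator reads the operator weights $\omega_{o}$ and the realized hypervolume gain $r$, selects an operator, and updates the weights. One way to realize such improvement-driven weighting is a multiplicative-weights rule with proportional sampling, sketched below. The exponential update, the learning rate `eta`, and the class name are assumptions swapped in for illustration. They are not the bandit rule defined in Section 4.4.

```python
import math
import random
from typing import Dict, Iterable


class OperatorBandit:
    """Improvement-driven operator weighting, sketched as a multiplicative-weights rule.

    The learning rate `eta` and the exponential update are illustrative
    assumptions, not the update rule specified by the authors.
    """

    def __init__(self, operators: Iterable[str], eta: float = 5.0, seed: int = 0):
        ops = list(operators)
        self.weights: Dict[str, float] = {op: 1.0 / len(ops) for op in ops}
        self.eta = eta
        self.rng = random.Random(seed)

    def select(self) -> str:
        """Sample an operator with probability proportional to its weight."""
        ops = list(self.weights)
        return self.rng.choices(ops, weights=[self.weights[o] for o in ops], k=1)[0]

    def update(self, op: str, hv_gain: float) -> None:
        """Reward the chosen operator by its realized hypervolume gain, then renormalize."""
        self.weights[op] *= math.exp(self.eta * max(hv_gain, 0.0))
        total = sum(self.weights.values())
        for o in self.weights:
            self.weights[o] /= total


bandit = OperatorBandit(["E1", "E2", "M1", "M2", "M3", "R1", "R2"])
op = bandit.select()
bandit.update(op, hv_gain=0.02)  # hypothetical gain reported by the fitness evaluation
```

Replacing `select` with a fixed cycle over the seven operators gives a non-adaptive, round-robin schedule of the kind used by the w/o-coordinator ablation.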

Table 4. The seven variation operators of the evolution agent.

Family	Op.	Parents	Instruction given to the large language model (LLM)
Exploration	$E_{1}$	2	Produce a heuristic with a totally different structure from the given parents.
	$E_{2}$	2	Extract the parents’ shared backbone idea, then produce a differently structured heuristic motivated by it.
Modification	$M_{1}$	1	Keep the parent’s core idea but change its implementation details.
	$M_{2}$	1	Keep the structure but re-tune the parent’s parameters (weights, coefficients, thresholds).
	$M_{3}$	1	Simplify components prone to overfitting to improve generalization to unseen instances.
Ruin-and-recreate	$R_{1}$	1	Rewrite a random $\sim 40\%$ of the parent’s lines into a complete, improved heuristic.
	$R_{2}$	1	Remove one functional module (a score block or $w$) and redesign it from scratch.

Table 5. Training and test instance scales.

Instance	Targets l	Sensing nodes m	Coordination nodes h	Engagement nodes n
T1	12	18	3	18
T2	32	48	5	48
T3	80	120	12	120
T4	160	240	24	240
I1	8	12	3	12
I2	16	24	3	24
I3	24	36	4	36
I4	40	60	6	60
I5	64	96	10	96
I6	100	150	15	150
I7	130	195	20	195
I8	180	270	27	270
I9	250	375	38	375
I10	320	480	48	480
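The sizes in Table 5 follow a regular pattern that the text does not state explicitly: $m = n = 1.5\,l$ and $h = \max(3, \operatorname{round}(0.15\,l))$. The check below reproduces every row. The rule is an inference from the table, not the authors' generator, and the names are hypothetical.

```python
# (l, m, h, n) = targets, sensing, coordination, engagement nodes, from Table 5.
TABLE_5 = {
    "T1": (12, 18, 3, 18), "T2": (32, 48, 5, 48),
    "T3": (80, 120, 12, 120), "T4": (160, 240, 24, 240),
    "I1": (8, 12, 3, 12), "I2": (16, 24, 3, 24), "I3": (24, 36, 4, 36),
    "I4": (40, 60, 6, 60), "I5": (64, 96, 10, 96), "I6": (100, 150, 15, 150),
    "I7": (130, 195, 20, 195), "I8": (180, 270, 27, 270),
    "I9": (250, 375, 38, 375), "I10": (320, 480, 48, 480),
}


def inferred_sizes(l: int) -> tuple:
    """Inferred rule: m = n = 1.5 l and h = max(3, 0.15 l rounded half up)."""
    m = n = 3 * l // 2
    h = max(3, (15 * l + 50) // 100)  # integer arithmetic avoids float rounding
    return m, h, n


mismatches = [name for name, (l, m, h, n) in TABLE_5.items()
              if inferred_sizes(l) != (m, h, n)]
print(mismatches)  # []
```
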

Table 6. Instance-generation parameters.

Parameter	Distribution / value
Effective sensing probability $q_{s, u}$	$U (0.70, 0.95)$
Effective engagement probability $p_{i, u}$	$U (0.50, 0.80)$
Coordination gain coefficient $g$	$1.3$
Sensing / engagement capacity $\kappa^{S}, \kappa^{I}$	1
Coordination capacity $\kappa^{C}$	$\lceil l / h \rceil$
Coverage radius $R_{s} / R_{c} / R_{i}$	$[30, 50] / [80, 100] / [55, 75]$
Sensing delay $\tau^{S}$	$U (1, 3)$
Engagement base delay	$U (1, 3)$
Coordination base delay	$U (1, 2)$
Flight speed	$U (5, 10)$
Transmission rate	50
Target time window $W_{u}$	$U (15, 30)$
Kill threshold $P_{u}^{min}$	$[0.55, 0.75]$
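The following is a sketch of how the non-geometric parameters of Table 6 could be sampled for one instance. Several choices are assumptions: treating the kill-threshold interval $[0.55, 0.75]$ as a uniform distribution, the seed, and the dataclass and function names. Node positions, coverage radii, delays, and speeds are omitted because the text does not give the spatial layout.

```python
import math
import random
from dataclasses import dataclass
from typing import List


@dataclass
class InstanceParams:
    q: List[List[float]]         # q[s][u], effective sensing probability
    p: List[List[float]]         # p[i][u], effective engagement probability
    window: List[float]          # W_u, response time window
    kill_threshold: List[float]  # P_u^min (uniform sampling is an assumption)
    cap_coordination: int        # kappa^C = ceil(l / h)
    cap_sensing: int = 1         # kappa^S
    cap_engagement: int = 1      # kappa^I
    gain_coeff: float = 1.3      # coordination gain coefficient g


def sample_params(l: int, m: int, h: int, n: int, seed: int = 0) -> InstanceParams:
    """Draw the non-geometric parameters of Table 6 for one instance."""
    rng = random.Random(seed)
    return InstanceParams(
        q=[[rng.uniform(0.70, 0.95) for _ in range(l)] for _ in range(m)],
        p=[[rng.uniform(0.50, 0.80) for _ in range(l)] for _ in range(n)],
        window=[rng.uniform(15, 30) for _ in range(l)],
        kill_threshold=[rng.uniform(0.55, 0.75) for _ in range(l)],
        cap_coordination=math.ceil(l / h),
    )


params = sample_params(l=320, m=480, h=48, n=480)  # sizes of I10 in Table 5
print(params.cap_coordination)  # 7
```
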

Table 7. Main parameter configurations of the compared algorithms.

Algorithm	Population	Generations / Iterations	Crossover	Mutation
NSGA-II	100	100	0.9	$p_{m}$
AGE-MOEA	100	100	0.9	$p_{m}$
MOEA/D-iAM2M	100	100	0.9	$p_{m}$
ALNS	—	1665	—	—
EoH	100	100	0.9	$p_{m}$
MAHE	100	100	0.9	$p_{m}$

Note: All population-based methods share an external non-dominated archive of size 100. $p_{m} = 1 / ((m + h + n)\, l)$ is the scale-adaptive bit-flip mutation probability, and “—” marks a parameter that does not apply. For EoH and MAHE, the row reports the shared inner-solver configuration used at deployment; the outer heuristic-evolution loop is described in the text. Algorithm abbreviations are defined at their first occurrence in the text (Section 4.7 and Section 5.2.3).
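The mutation probability in the note sets the expected number of flipped bits per solution to one. This assumes that a solution is the binary vector of all $(m + h + n)\,l$ assignment variables $x_{s,u}$, $z_{c,u}$, $y_{i,u}$. The sketch below evaluates it for the smallest and largest test instances, with sizes taken from Table 5.

```python
def mutation_probability(l: int, m: int, h: int, n: int) -> float:
    """Scale-adaptive bit-flip probability p_m = 1 / ((m + h + n) * l)."""
    return 1.0 / ((m + h + n) * l)


# Expected flips per solution: (m + h + n) * l variables times p_m equals 1.
for name, (l, m, h, n) in {"I1": (8, 12, 3, 12), "I10": (320, 480, 48, 480)}.items():
    n_vars = (m + h + n) * l
    print(name, n_vars, mutation_probability(l, m, h, n))
```

$p_{m}$ therefore drops from about $4.6 \times 10^{-3}$ on I1 (216 variables) to about $3.1 \times 10^{-6}$ on I10 (322,560 variables).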

Table 8. Normalized hypervolume ($HV$) on the ten test instances (mean±std over five test-scenario seeds, each averaged over three disturbance ratios); best mean per row in bold.


Instance	NSGA-II	AGE-MOEA	MOEA/D-iAM2M	ALNS	EoH	MAHE
I1	0.856±0.044^ns	0.785±0.061^ns	0.408±0.052^***	0.549±0.047^***	0.829±0.081^ns	0.849±0.078
I2	0.704±0.062^***	0.602±0.095^***	0.328±0.055^***	0.571±0.110^***	0.915±0.028^*	0.940±0.037
I3	0.598±0.057^***	0.572±0.062^***	0.331±0.038^***	0.553±0.099^***	0.934±0.051^ns	0.954±0.029
I4	0.459±0.036^***	0.424±0.028^***	0.285±0.042^***	0.439±0.057^***	0.959±0.009^ns	0.971±0.015
I5	0.362±0.009^***	0.348±0.019^***	0.266±0.017^***	0.442±0.016^***	0.925±0.014^**	0.954±0.015
I6	0.341±0.023^***	0.343±0.019^***	0.269±0.030^***	0.370±0.018^***	0.909±0.016^***	0.963±0.011
I7	0.308±0.018^***	0.305±0.013^***	0.259±0.012^***	0.377±0.007^***	0.902±0.011^***	0.962±0.006
I8	0.322±0.018^***	0.316±0.016^***	0.284±0.017^***	0.389±0.020^***	0.916±0.018^***	0.966±0.006
I9	0.301±0.013^***	0.297±0.015^***	0.275±0.013^***	0.369±0.016^***	0.909±0.018^***	0.968±0.003
I10	0.287±0.020^***	0.286±0.016^***	0.261±0.024^***	0.353±0.021^***	0.864±0.054^***	0.964±0.010
Mean	0.454±0.012^***	0.428±0.009^***	0.296±0.011^***	0.441±0.021^***	0.906±0.015^***	0.949±0.014

$^{*}p < 0.05$, $^{**}p < 0.01$, $^{***}p < 0.001$, $^{ns}$ not significant: two-sided Wilcoxon signed-rank test against MAHE on the normalized $HV$ (all significant differences favor MAHE). Per-instance rows use the 15 paired scenarios of that instance (three disturbance ratios × five seeds) and are reported without correction; the Mean row uses all 150 scenarios, with p-values Holm–Bonferroni corrected across the five comparisons.

Table 9. Ablation of MAHE components (normalized $HV$, mean±std over five test-scenario seeds, each averaged over three disturbance ratios); best per row in bold.


Instance	MAHE	w/o coordinator	w/o reflection	w/o repair
I1	0.849±0.078	0.855±0.062^ns	0.829±0.077^ns	0.813±0.080^ns
I2	0.940±0.037	0.934±0.033^ns	0.920±0.043^*	0.877±0.042^***
I3	0.954±0.029	0.949±0.032^ns	0.935±0.042^*	0.922±0.037^***
I4	0.971±0.015	0.973±0.013^ns	0.955±0.019^ns	0.938±0.019^*
I5	0.954±0.015	0.949±0.014^ns	0.863±0.022^***	0.938±0.015^ns
I6	0.963±0.011	0.951±0.012^*	0.860±0.032^***	0.954±0.011^*
I7	0.962±0.006	0.939±0.009^**	0.843±0.027^***	0.959±0.005^ns
I8	0.966±0.006	0.937±0.022^***	0.872±0.021^***	0.960±0.013^ns
I9	0.968±0.003	0.920±0.011^***	0.851±0.032^***	0.935±0.009^***
I10	0.964±0.010	0.871±0.052^***	0.765±0.088^***	0.895±0.041^***
Mean	0.949±0.014	0.928±0.014^***	0.869±0.017^***	0.919±0.013^***

$^{*}p < 0.05$, $^{**}p < 0.01$, $^{***}p < 0.001$, $^{ns}$ not significant: two-sided Wilcoxon signed-rank test of the full method against each variant on the normalized $HV$ (all significant differences favor the full method). Per-instance rows use the 15 paired scenarios of that instance (three disturbance ratios × five seeds) and are reported without correction; the Mean row uses all 150 scenarios, with p-values Holm–Bonferroni corrected across the three comparisons.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
