Physical AI as Seen from Nature: A Reflex-Policy Layered Architecture for Energy-Proportional Embodied Intelligence

Pietro Perlo; Marco Dalmasso; Davide Penserini; Sergio Pozzato

doi:10.20944/preprints202606.0573.v1

Submitted: 05 June 2026

Posted: 08 June 2026


Abstract

Physical AI requires machines to sense, decide and act under tight constraints of energy, latency, safety and robustness. A fly escaping an approaching hand captures the core principle: a fast sensor-action reflex acts before full deliberation, while higher neural resources remain available for richer behavior. This Perspective proposes a layered Reflex-Policy architecture in which reflex layers execute fast, local, ADC-light actions near sensors and actuators, while policy layers perform slower learning, planning, optimization and rule updates. The two are mutually protective collaborators: policy layers define safe operating envelopes, while reflex layers shield policy processors from high-frequency events and avoidable data floods. We position binary spintronic MTJ crossbars as a plausible technology path for low reflex layers, while distinguishing today's MRAM/eMRAM technologies from the proposed reflex-layer crossbar module, which remains an architectural research target. The contribution is a system-integration framework: intelligence is distributed along the sensing, energy, storage and action chain, and each event is assigned to the lowest sufficient layer. We formalize Energy Returned on Invested Energy for Embodied Intelligence (EROIE), compare the framework with related architectures, and state its limitations and validation needs.

Keywords: Physical AI; embodied intelligence; reflex-policy architecture; energy-proportional inference; EROIE; spintronics; MTJ crossbar; MRAM; edge AI; energy harvesting

Subject: Physical Sciences - Applied Physics

1. Introduction and Positioning

1.1. Physical AI as Embodied, Energy-Constrained Intelligence

Artificial intelligence is moving from disembodied perception and language tasks into machines that physically intervene in the world. Physical AI and Physical Intelligence emphasize that the body, the environment and the material substrate are not passive containers for computation: they shape what can be sensed, computed and safely acted upon [1,2]. Vision-Language-Action models, world models and continuous-time neural models have accelerated the policy side of this transition, giving robots richer perception, instruction following and adaptation capabilities [3,4,5,6].

However, embodiment changes the engineering problem. A robot joint cannot wait for cloud inference before limiting a dangerous torque spike; a PV substring under fast partial shading cannot wait for a slow aggregate MPPT search before local energy is lost; and a harvesting-powered IoT node cannot afford to digitize and upload every raw signal. A simple biological image makes the point early: a fly escaping a hand does not first solve a full scene-understanding problem; its sensing suite activates an escape action before slow deliberation is useful. Physical AI therefore needs an architecture in which not every event receives the same computation. The central design question becomes: what is the lowest sufficient layer that can safely act on a given physical event?

1.2. Why a Reflex-Policy Perspective Is Needed

Hierarchical control is not new. Brooks’ subsumption architecture, hierarchical models of behavior and layered robotic systems have long shown that complex behavior can emerge from multiple interacting levels [7,8,9,10]. Recent safe reinforcement-learning work also reintroduces the biological idea that safety-related reflexes may operate faster than task-level policies [11]. What remains underdeveloped is a hardware-aware and energy-aware partition in which reflex actions are explicitly ADC-light, physically local and protected by higher policy layers.

The proposed Reflex-Policy framework is therefore not a new theory of all intelligence. It is a perspective on system integration for embodied systems. Its contribution is threefold: it proposes a Reflex-Policy vocabulary for distributing embodied decisions; it introduces EROIE as a device-scale metric for the useful energy returned after sensing and control overhead; and it identifies binary non-volatile rule fabrics, including MTJ-based technologies, as a promising Layer 1 research path. The novelty lies in the combination of four elements: explicit ADC-light versus ADC-heavy partitioning, energy-proportional assignment to the lowest sufficient layer, bidirectional rule transfer between policy and reflex layers, and a candidate spintronic implementation path for binary local reflex modules.

The strength of the contribution is not a single benchmark but a clearer way to ask design questions. Should a temperature transient be sampled by a 12-bit ADC and sent to an NPU, or should it trigger a local threshold? Should a PV substring fault be inferred from aggregate power oscillations, or should it be handled at the cell or substring level? Should a robot joint stream high-rate current waveforms upward, or should it report only a compressed event after local protection has acted? These questions are often treated as implementation details. Here they are elevated to the architectural level.

The hardware motivation is equally strong. The memory wall, ADC cost, data movement and scaling limits of monolithic inference motivate energy-proportional decision allocation [12,13,14,15]. Neuromorphic, spiking and physical neural systems already show why event-driven and substrate-embedded computation matter [16,17,18,19,20].

1.3. Contribution and Limits of This Perspective

The quantitative values in the layer tables should be read as design envelopes and research targets, not as measured system results: where numerical energy or latency values are given, they indicate the order of magnitude that a layer should target. This explicit positioning is important because the paper’s purpose is to define an architectural research direction and a comparative vocabulary that can guide prototypes, simulations and EROIE measurements.

The contribution is organized around three claims. First, Physical AI should distribute intelligence along the physical energy and action chain, rather than routing all decisions through a centralized policy processor. Second, reflex and policy layers should collaborate: policy defines rules that keep reflexes safe, while reflexes prevent policy from being overloaded. Third, binary spintronic devices are not yet a reflex product, but the existence of commercial MRAM/eMRAM and foundry routes makes binary MTJ technology a realistic substrate to explore, not a purely speculative material idea.

2. Reflex and Policy Layers: Definitions, Cooperation and EROIE

2.1. Reflex Layers and Policy Layers

A reflex layer is a decision pathway in which sensing maps directly, or with only minimal analog preprocessing, onto action. It uses thresholds, binary rules, hysteresis, low-bit weighting or deterministic state machines. It should be close to the physical event, use no high-resolution ADC in the urgent path, and remain operational even when higher layers are unavailable. A policy layer performs higher-inference tasks: state estimation, prediction, optimization, learning, planning and fleet coordination. It can afford ADCs, NPUs, GPUs or cloud resources because it is invoked less frequently and is not responsible for the first protective action.

The mirror metaphor is useful but limited. A mirror reflects without thinking because the material geometry performs the transformation. Similarly, a binary crossbar can map local conditions to local actions without executing a software loop in the critical path. Unlike a mirror, however, a reflex module must be bounded by policy: its thresholds, permissions and rule maps must be configured, audited and periodically updated by higher layers.

2.2. Mutual Protection Between Reflex and Policy

The architecture should not be understood as a simple command hierarchy. It is a mutual-protection loop. Policy layers protect reflex layers by defining permitted actions, thresholds, gain schedules, update rules and safe envelopes. Reflex layers protect policy layers by converting continuous high-rate physical perturbations into sparse events and immediate local actions. The policy layer therefore does less real-time work; the reflex layer remains safe because its rule map is constrained and maintained by policy.

This cooperation also clarifies why reflex layers are not merely smaller policies. A policy estimates state and selects among alternatives. A reflex filters the world before the policy sees it: suppressing unsafe states, routing energy, waking dormant electronics, inhibiting a channel, or generating a fault event while higher layers remain asleep, busy or disconnected.

The fly escape response illustrates both the power and the boundary of reflex action. When a hand approaches, visual looming cues and mechanosensory signals couple to fast leg and wing actions before deliberative object recognition is useful [37,38]. The adult Drosophila brain contains about 139,255 neurons and roughly 50 million chemical synapses [39], but escape depends on the right sensor-action pathway being triggered at the right time. If a tool or disturbance does not excite the expected sensing channel, the reflex can fail. The engineering lesson is direct: a reflex layer must be co-designed with the sensing suite, actuator suite and expected event class.

2.3. Layer Spectrum and Minimal Sufficient Inference

Inference complexity is a continuum. The correct architecture is not the one that always uses the most advanced AI, but the one that assigns each event to the least costly layer that can act safely and correctly. The energy and latency values in the following table are proposed design envelopes for a perspective paper, not measured system results.

Level	Inference type	Typical example	Implementation	Energy / latency envelope
0	Direct physical coupling	Mechanical fuse, passive thermostat	Material or mechanical structure	Passive / zero-power
1	Binary threshold	Overcurrent, PV bypass, collision flag	Comparator or binary crossbar	pJ-nJ, ns-µs
2	Analog coordination	Tilt correction, hysteresis, timers	Op-amps, analog logic	nJ, µs
3	Deterministic digital reflex	PID, sequence, feedforward table	MCU, CPLD, FPGA	µJ, µs-ms
4	Local policy	Adaptive MPPT, terrain selection	Edge NPU, DSP	mJ, ms-s
5	Global policy	Fleet learning, large VLA/world model	Cloud, GPU cluster	J+, seconds-minutes

2.4. Formalizing EROIE

For scaled-down systems, energy proportionality must account not only for computation but also for the energy cost of sensing, conversion, communication, and control.

We define Energy Returned on Invested Energy for Embodied Intelligence (EROIE) as:

The ratio of useful energy preserved, delivered, or made operationally available by an embodied control architecture, divided by the total energy invested to sense, decide, communicate, switch, and maintain that control action.

This is expressed as:

EROIE = \frac{E_{useful, out}}{E_{invested, control}}

where the denominator includes the full energy overhead of intelligent control:

\begin{aligned} E_{invested, control} = {} & E_{sense} + E_{condition} + E_{ADC} + E_{compute} + E_{memory} + E_{comm} \\ & + E_{wake} + E_{switch} + E_{balance, loss} + E_{leak} + E_{update} \end{aligned}

Incremental Form for Comparing Architectures

When comparing a reflex-policy architecture (RP) against a baseline (B), the incremental EROIE is often more informative:

\Delta EROIE_{RP/B} = \frac{E_{useful, out, RP} - E_{useful, out, B}}{E_{invested, control, RP} - E_{invested, control, B}}

This form captures the net change in useful energy output relative to the net change in control energy invested when moving from baseline to reflex-based architecture.
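As a minimal sketch of how the incremental form could be evaluated, the Python snippet below computes the ratio from two energy accounts taken over the same system boundary. The function name `delta_eroie`, its argument names and all numbers are hypothetical illustrations, not measured values. One property of the formula is worth noting: the ratio is undefined when both architectures invest the same control energy, and it becomes negative when the reflex-policy design returns more useful energy while investing less, so the sign of each difference should be reported alongside the ratio.

```python
def delta_eroie(useful_rp, invested_rp, useful_b, invested_b):
    """Incremental EROIE of a reflex-policy design (RP) against a baseline (B).

    All arguments are energies in joules, accounted over the same system boundary.
    """
    d_useful = useful_rp - useful_b        # change in useful energy out
    d_invested = invested_rp - invested_b  # change in control energy invested
    if d_invested == 0:
        raise ValueError("incremental EROIE is undefined for equal invested energy")
    return d_useful / d_invested


# Hypothetical case: RP preserves 0.2 J more and spends 0.5 mJ more on control.
ratio = delta_eroie(useful_rp=1.2, invested_rp=1.5e-3, useful_b=1.0, invested_b=1.0e-3)
print(f"incremental EROIE is about {ratio:.0f}")
```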

The system boundary must be stated explicitly. In PV and battery systems, E_{useful, out} may be joules delivered to storage or load after bypass, conversion and balancing. In robotics, it may be mechanical work preserved, damage avoided or mission time extended. In IoT, it may be successful sensing-transmission cycles enabled by harvested energy. The metric is intentionally related to EROI/EROEI in energy systems, but shifted to the device and architecture scale [61,62]. A reflex module is justified only if it increases useful returned energy after its own overhead is counted.

A simple illustrative accounting shows why this matters. Suppose a local PV reflex preserves 1 J otherwise lost during a shading transient. If local sensing, comparator decision and switch command cost 10 microjoules, the event-level EROIE is 100,000. If a centralized loop preserves the same joule after ADC sampling, MCU wake-up, communication and search steps costing 1 millijoule, EROIE is 1,000. These values are illustrative; the point is that small-scale control overhead can decide whether an intervention is energy-positive.
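The accounting above can be sketched in a few lines of Python. The `ControlEnergy` class mirrors the terms of the E_{invested, control} sum; the class name, the field names and the way the 10 microjoule and 1 millijoule totals are split across terms are assumptions made only for illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class ControlEnergy:
    """Energy invested in one control action, in joules, one field per term."""
    sense: float = 0.0
    condition: float = 0.0
    adc: float = 0.0
    compute: float = 0.0
    memory: float = 0.0
    comm: float = 0.0
    wake: float = 0.0
    switch: float = 0.0
    balance_loss: float = 0.0
    leak: float = 0.0
    update: float = 0.0

    def total(self) -> float:
        # Sum every term of the denominator.
        return sum(getattr(self, f.name) for f in fields(self))


def eroie(useful_out_j: float, invested: ControlEnergy) -> float:
    """Event-level EROIE: useful energy out divided by total control energy."""
    return useful_out_j / invested.total()


# Hypothetical splits that add up to 10 uJ (local reflex) and 1 mJ (central loop).
local = ControlEnergy(sense=2e-6, compute=3e-6, switch=5e-6)
central = ControlEnergy(sense=2e-6, adc=100e-6, compute=300e-6,
                        comm=400e-6, wake=193e-6, switch=5e-6)

print(f"local reflex EROIE is about {eroie(1.0, local):,.0f}")
print(f"central loop EROIE is about {eroie(1.0, central):,.0f}")
```

Keeping one field per term makes it harder to leave an overhead such as wake-up or leakage out of the denominator by accident.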

3. Biological Grounding

Biology motivates the architecture but does not prove engineering optimality. The relevant lesson from nervous systems is that response pathways are differentiated by locality, speed, function and plasticity. A spinal withdrawal reflex, for example, can couple a harmful stimulus to muscle activation before cortical interpretation, while higher centers later modulate thresholds and context. Spinal reflexes, brainstem circuits, cerebellar coordination, basal-ganglia action selection and cortical planning occupy different time scales and computational roles [21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36].

The vertebrate analogy is useful because it separates urgency, coordination, learned execution, action selection and planning into different mechanisms. It should not be overread. Engineered systems have wires, MOSFETs, converters, software schedulers and certification constraints that biology does not have. The paper therefore uses biology as a source of design heuristics: locality before centrality, sparse events before raw streams, learned modulation before constant deliberation, and graceful degradation before catastrophic dependence on a single processor.

Layer	Biological analogy	Main function	Engineering analogy
1 Physical Reflex	Spinal reflex arc	Immediate protective action	Comparator / binary crossbar near sensor-actuator path
2 Analog Reflex	Brainstem, vestibular reflexes, CPGs	Multi-signal coordination	Analog logic, timers, hysteresis
3 Digital Reflex	Cerebellar feedforward control	Learned deterministic routines	MCU/CPLD/FPGA with fixed-point routines
4 Local Policy	Basal ganglia	Action selection, habits, local adaptation	NPU/DSP/edge optimizer
5 Global Policy	Cortex	Planning, abstraction, long-term learning	Cloud/GPU/large AI model

Figure 1. Spinal reflex arc and Layer 1 binary MTJ crossbar analogy. The point of the analogy is locality and urgency, not biological mimicry.

Figure 2. Brainstem reflex integration and CPG analogy for Layer 2 analog coordination.

Figure 3. Cerebellar learning analogy for Layer 3 deterministic digital reflexes trained or tuned by higher layers.

Figure 4. Basal-ganglia action-selection analogy for Layer 4 local policy.

Three engineering lessons follow. First, lower layers operate concurrently with higher layers; they do not wait for deliberation. Second, lower layers should degrade gracefully when higher layers are unavailable. Third, learning does not have to happen at the layer that executes the fastest action: higher layers can learn parameters and transfer them downward, while lower layers execute efficiently.

4. A Five-Layer Reflex-Policy Architecture

4.1. Layer 1 - Physical Reflex

Layer 1 handles urgent, local, stereotyped events such as overcurrent, PV substring bypass, thermal shutdown, collision flags and local interlocks. It should use comparator-level readout or binary rule maps in the critical path. The preferred implementation is not a general AI accelerator but a local rule module: sensors generate low-resolution flags; the rule map is non-volatile; outputs command switches, wake signals, inhibit signals or safe-state actions.

The strongest Layer 1 candidates have three properties. They are frequent or safety-critical, they have a low-dimensional signature, and their first useful response is binary or low-resolution. Examples include opening a bypass path, inhibiting a gate driver, waking a supervisor, blocking an unsafe injection channel or entering a predefined safe state. If the decision requires scene interpretation, long-horizon prediction or semantic reasoning, it does not belong in Layer 1.
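A binary rule map of this kind can be modeled in software as a small matrix that connects input flags to output commands. The sketch below is a behavioral model only, not a device model: the flag names, command names, matrix entries and the OR-style readout (a command is asserted if any active flag is connected to it) are assumptions chosen for illustration.

```python
# Hypothetical input flags and output commands for a Layer 1 rule map.
FLAGS = ("overcurrent", "overtemp", "collision")
COMMANDS = ("inhibit_gate", "open_bypass", "wake_supervisor")

# RULE_MAP[i][j] == 1 means that flag i asserts command j.
RULE_MAP = (
    (1, 0, 1),  # overcurrent -> inhibit gate, wake supervisor
    (1, 1, 1),  # overtemp    -> inhibit gate, open bypass, wake supervisor
    (1, 0, 1),  # collision   -> inhibit gate, wake supervisor
)


def reflex(flags):
    """Return one 0/1 command bit per column for a tuple of 0/1 input flags."""
    return tuple(
        int(any(f and row[j] for f, row in zip(flags, RULE_MAP)))
        for j in range(len(COMMANDS))
    )


print(dict(zip(COMMANDS, reflex((1, 0, 0)))))
```

Because the map is a fixed table, the decision requires no software loop over a model and no high-resolution conversion, which is the property the text asks of Layer 1.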

Figure 5. Binary MTJ crossbar for Layer 1 physical reflex processing. The crossbar belongs to the decision path; power electronics or actuators carry the main energy path.

4.2. Layer 2 - Analog Reflex

Layer 2 coordinates a small number of analog signals without high-resolution digitization. Examples include accelerometer-gyroscope tilt correction, light-temperature hysteresis, timing windows, and local multi-sensor interlocks. The implementation can be op-amp summation, hysteresis comparators, timers or compact mixed-signal logic. It expands the reflex concept from single threshold to local coordination.

Figure 6. Layer 2 analog reflex using analog summation, hysteresis and actuator drive without high-resolution digitization.

Layer 2 is included to avoid a false dichotomy between pure binary hardware and full digital control. Many physical events are not well represented by one threshold, but they still do not require high-level inference. A vestibular-like correction in a robot, for example, may combine acceleration, angular velocity and contact flags with analog weighting and hysteresis. The result is still a reflex: it is close to the body, local in scope, and designed to act before a policy layer completes a richer interpretation.
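The behavior of such a stage can be illustrated with a short simulation of weighted summation followed by a hysteresis comparator. This is a digital model of what would be an analog circuit; the class name, weights and thresholds are hypothetical.

```python
class HysteresisComparator:
    """Two-threshold comparator: switches on above `high`, off below `low`."""

    def __init__(self, low, high):
        if low >= high:
            raise ValueError("low threshold must be below high threshold")
        self.low = low
        self.high = high
        self.state = False

    def update(self, x):
        if x >= self.high:
            self.state = True
        elif x <= self.low:
            self.state = False
        # Between the thresholds the previous state is held.
        return self.state


def tilt_signal(accel, gyro, w_accel=0.7, w_gyro=0.3):
    """Weighted sum of two normalized sensor signals (weights are assumptions)."""
    return w_accel * accel + w_gyro * gyro


comp = HysteresisComparator(low=0.2, high=0.5)
trace = [comp.update(x) for x in (0.1, 0.4, 0.6, 0.4, 0.1)]
print(trace)  # [False, False, True, True, False]
```

The held state between the two thresholds is what prevents chattering when the summed signal hovers near a single trip point.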

4.3. Layer 3 - Digital Reflex

Layer 3 implements deterministic learned routines with bounded timing: PID, state machines, feedforward tables, safe recovery profiles and fixed-point controllers. It may use moderate-resolution ADCs but does not require a deep model in the urgent loop. Higher policy layers can train or optimize parameters offline and deploy them to Layer 3, which executes them with predictable latency.

Layer 3 is where learning becomes executable without remaining computationally expensive. A policy layer may train a neural controller, but the deployed reflex may be a lookup table, a fixed-point approximation, a finite-state machine or a compact feedforward controller. This is important for certification and debugging. A low-level controller with bounded execution time, inspectable parameters and rollback capability is easier to validate than a large opaque model executing in the urgent loop.
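One way to picture this transfer is to sample a trained controller offline and deploy only an integer lookup table. In the sketch below, `trained_controller` is a stand-in for whatever a policy layer might produce; the table size, input range and scale factor are likewise assumptions.

```python
import math


def trained_controller(error):
    """Stand-in for a policy-layer controller (hypothetical): saturating gain."""
    return math.tanh(2.0 * error)


def build_lut(fn, lo, hi, size, scale):
    """Sample fn on [lo, hi] and store outputs as integers (value * scale)."""
    step = (hi - lo) / (size - 1)
    return [round(fn(lo + i * step) * scale) for i in range(size)]


def lut_lookup(lut, x, lo, hi):
    """Bounded-time evaluation: clamp the input, pick the nearest entry."""
    x = min(max(x, lo), hi)
    idx = round((x - lo) / (hi - lo) * (len(lut) - 1))
    return lut[idx]


LO, HI, SCALE = -1.0, 1.0, 127
LUT = build_lut(trained_controller, LO, HI, size=65, scale=SCALE)

approx = lut_lookup(LUT, 0.5, LO, HI) / SCALE
print(f"table: {approx:.3f}, reference: {trained_controller(0.5):.3f}")
```

The deployed artifact is a list of 65 integers that can be inspected, versioned and rolled back, while the execution path is a clamp and one index operation.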

Figure 7. Learning transfer from policy layers to reflex layers: slow training or optimization above, fast deterministic execution below.

4.4. Layers 4 and 5 - Local and Global Policy

Layer 4 performs local optimization, context-dependent action selection and short-term adaptation. Examples include adaptive MPPT under repeatable shading, gait selection in robotics and local anomaly classification. Layer 5 performs long-term learning, fleet-level analytics, large model reasoning and strategic planning. These layers are essential, but they should configure rather than replace the first local protective response.

The distinction between local and global policy is primarily about context and update frequency. Layer 4 adapts to the device and its immediate environment: local shading patterns, terrain class, battery aging state, vibration signature or repeated fault pattern. Layer 5 aggregates many devices or long histories and returns slower improvements. This prevents cloud or fleet intelligence from being confused with real-time control. Fleet learning can improve reflex rules, but it should not be in the first protective path.

Figure 8. Energy-proportional hierarchy: inference complexity, latency and energy increase upward. Frequent events should stay in lower layers when safe.

Figure 9. Five-layer architecture with upward event compression and downward configuration. The architecture is a mutual-protection loop, not a one-way command chain.

4.5. Cross-Layer Interfaces and Failure Behavior

The key interface is bidirectional. Upward communication should be compressed: events, status flags, fault codes, counters and summaries. Downward communication should be configurational: thresholds, gains, permissions, rule maps, safety envelopes and update schedules. This prevents local physics from becoming an expensive global data stream.

A practical implementation should therefore specify two contracts. The upward contract defines what a lower layer reports: event type, timestamp, confidence, local state summary and whether the reflex action succeeded. The downward contract defines what a higher layer may change: thresholds, permissions, rule-map entries, timing windows and safe defaults. These contracts make the architecture more than a metaphor and create a natural place for formal verification, logging and certification evidence.
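The two contracts could be written down as plain data types plus a validation step. The sketch below is one possible shape under stated assumptions: all type names, field names and limits are hypothetical, and rule-map entries are omitted for brevity.

```python
from dataclasses import dataclass
from enum import Enum


class EventType(Enum):
    OVERCURRENT = 1
    OVERTEMP = 2
    COLLISION = 3


@dataclass(frozen=True)
class UpwardEvent:
    """What a lower layer reports (upward contract)."""
    event_type: EventType
    timestamp_us: int
    confidence: float
    state_summary: int       # packed local status bits
    action_succeeded: bool


@dataclass(frozen=True)
class DownwardConfig:
    """What a higher layer may change (downward contract)."""
    threshold: float
    permissions: frozenset   # names of actions the reflex may take
    timing_window_us: int
    safe_default: str


@dataclass(frozen=True)
class Envelope:
    """Fixed limits that every downward update must respect."""
    threshold_min: float
    threshold_max: float
    allowed_actions: frozenset


def is_valid(cfg: DownwardConfig, env: Envelope) -> bool:
    return (
        env.threshold_min <= cfg.threshold <= env.threshold_max
        and cfg.permissions <= env.allowed_actions
        and cfg.safe_default in env.allowed_actions
        and cfg.timing_window_us > 0
    )


def apply_update(current: DownwardConfig, proposed: DownwardConfig,
                 env: Envelope) -> DownwardConfig:
    """Accept the proposed config only if valid; otherwise keep the current one."""
    return proposed if is_valid(proposed, env) else current


env = Envelope(1.0, 10.0, frozenset({"inhibit", "bypass", "safe_state"}))
current = DownwardConfig(5.0, frozenset({"inhibit"}), 100, "safe_state")
unsafe = DownwardConfig(50.0, frozenset({"inhibit"}), 100, "safe_state")
print(apply_update(current, unsafe, env) is current)  # True: update rejected
```

Rejecting an out-of-envelope update and keeping the previous configuration is a simple form of the rollback behavior that the contracts are meant to support.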

Failure condition	Expected behavior	Minimum safe layer
Layer 5 unavailable	Layer 4 continues local policy; Layers 1-3 remain active	Layer 1/2 for safety
Layer 4 overloaded	Layer 3 deterministic routines continue; urgent events remain local	Layer 1/2
Layer 3 firmware fault	Analog and physical reflexes hold safe states or inhibit actuation	Layer 1/2
Sensor mismatch	Policy detects repeated false negatives/positives and updates sensing or thresholds	Layer 4/5 for diagnosis
Rule-map update error	Formal constraints and rollback prevent unsafe reflex permissions	Layer 1 safe default

Algorithmic design rule: assign a task to Layer k only if no lower layer (1 to k-1) can meet the required combination of latency, safety, precision, adaptability and EROIE. This is a design heuristic, not a theorem. Formal verification of cross-layer updates remains an open research problem.
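The design rule amounts to a search from the lowest layer upward. The sketch below reduces the criteria to two, a task class and a latency deadline, purely to keep the example short; safety, precision, adaptability and EROIE would need their own checks. The latency bounds and capability sets are assumptions loosely based on the layer table, not measured values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    level: int
    worst_latency_s: float   # assumed upper bound of the latency envelope
    capabilities: frozenset  # assumed task classes the layer can handle


# Illustrative envelopes, ordered from lowest to highest layer.
LAYERS = (
    Layer(1, 1e-6, frozenset({"threshold"})),
    Layer(2, 1e-5, frozenset({"threshold", "coordination"})),
    Layer(3, 1e-3, frozenset({"threshold", "coordination", "sequence"})),
    Layer(4, 1.0, frozenset({"threshold", "coordination", "sequence",
                             "adaptation"})),
    Layer(5, 600.0, frozenset({"adaptation", "fleet_learning"})),
)


def lowest_sufficient_layer(task_class, deadline_s):
    """Return the lowest layer level that handles the task within the deadline."""
    for layer in LAYERS:
        if (task_class in layer.capabilities
                and layer.worst_latency_s <= deadline_s):
            return layer.level
    return None  # no layer is sufficient: the requirement must be revisited


print(lowest_sufficient_layer("threshold", 1e-4))   # 1
print(lowest_sufficient_layer("adaptation", 5.0))   # 4
```

A `None` result is informative in its own right: it signals a requirement that no layer in the current design can satisfy.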

5. Spintronics as a Candidate Reflex Substrate

5.1. Binary Availability and the Architecture Gap

The spintronic argument should be stated carefully. Binary magnetic memory technology is not imaginary: discrete MRAM, embedded MRAM and foundry routes are described as commercial or foundry offerings by vendors and foundries such as Everspin, GlobalFoundries, Samsung Foundry and Avalanche/Renesas [40,41,42,43]. These products demonstrate the industrial relevance of non-volatile binary MTJ-based technology, but they do not yet provide the Physical AI reflex-layer crossbar module proposed here.

This distinction corrects a common TRL misunderstanding. It would be too conservative to call the entire spintronic basis TRL 2-3, because MRAM and eMRAM are commercial or foundry-supported in specific memory roles. It would also be too optimistic to assign the proposed reflex architecture the same maturity, because a certified, sensor-coupled binary crossbar reflex module still has to be designed and demonstrated.

The two-level conclusion is that MTJ/MRAM cells and macros are relatively mature in memory applications, while the reflex-layer crossbar architecture is lower TRL because it requires sensing interfaces, rule-map programming, comparator readout, safety constraints and certification. Device availability makes the architecture plausible and worth prototyping; it does not make the module already available.

Technology element	Current maturity	Relevance to this paper	Caution
Discrete MRAM / STT-MRAM	Commercial products	Shows non-volatile binary magnetic memory availability	Memory product, not reflex module
Embedded MRAM / foundry macros	Foundry-level offering in selected nodes	Supports integration with CMOS control logic	Access depends on node and design rules
Binary MTJ crossbar with comparator readout	Research/prototype target	Candidate Layer 1 rule fabric	Needs reflex-specific validation
Safety-certified reflex module	Future system product	Potential robotics/energy/IoT component	Requires verification and qualification

5.2. Decision-Power Path Isolation

The most important spintronic system principle is Decision-Power Path Isolation. The crossbar should not carry the main power flow. It receives sensor flags and emits small-signal commands. The power path remains in MOSFETs, converters, batteries, motors, bypass switches or injection stages. This is analogous to a pilot valve or railway switch: a small signal selects a route; the main energy flows elsewhere. This preserves the low-energy value of the reflex layer and makes the architecture compatible with existing MPPT, BMS and motor-driver supervisors.

Figure 10. Detailed binary MTJ crossbar for Layer 1. The crossbar stores rules and produces comparator-level decisions; the main power path is outside the array.

This principle is crucial for PV, BMS and robotics examples. In a PV string, the crossbar should not conduct the harvested current; it selects bypass or wake-up commands. In a battery pack, it should not become the balancing converter; it selects safe energy-routing channels under BMS permission. In a robot, it should not drive the motor phase current; it commands a gate, interlock or safe-state response. This separation keeps the reflex module small, fast and low energy while allowing conventional power electronics to do what they already do well.

5.3. Why Binary, Not Primarily Multilevel AI Acceleration

Many spintronic and resistive crossbar papers emphasize analog or multilevel in-memory computing for neural-network acceleration [44,45,46,47,48,49,50,51,52,53,54]. That is important, but it is not the main target here. Reflex-layer decisions are often naturally binary or low-resolution: allow/block, bypass/connect, wake/sleep, inhibit/enable, safe/unsafe. Binary operation reduces ADC burden, simplifies verification, improves interpretability of rule maps and aligns better with safety interlocks.

Binary operation is also attractive from a safety standpoint. A binary rule can be inspected as a permission matrix or interlock table. It can be tested exhaustively for a bounded number of input flags. It can default to a safe state if a contradiction or unavailable sensor is detected. Multilevel analog inference may be powerful for policy acceleration, but safety reflexes often benefit from simpler observability and clearer fault semantics.
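Exhaustive testing is practical because n binary flags give only 2^n input combinations. The sketch below checks a safety property over every combination of a three-flag interlock; the rule, the flag names and the property are hypothetical examples.

```python
from itertools import product


def interlock(overcurrent, overtemp, sensor_ok):
    """Hypothetical rule: enable the gate only with valid sensors and no fault."""
    if not sensor_ok:
        return False  # safe default when a sensor is unavailable
    return not (overcurrent or overtemp)


def never_enabled_on_fault(bits, enabled):
    """Safety property: the gate is never enabled under a fault or bad sensor."""
    overcurrent, overtemp, sensor_ok = bits
    return not (enabled and (overcurrent or overtemp or not sensor_ok))


def verify_exhaustively(rule, n_flags, safety_property):
    """Evaluate the rule on all 2**n_flags inputs and return the violations."""
    return [
        bits
        for bits in product((False, True), repeat=n_flags)
        if not safety_property(bits, rule(*bits))
    ]


violations = verify_exhaustively(interlock, 3, never_enabled_on_fault)
print(violations)  # [] means the property holds for all 8 input combinations
```

The same check applied to a faulty rule returns the exact input combinations that break the property, which is the kind of evidence a certification argument can cite.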

5.4. Alternatives and Honest Comparison

Binary MTJ crossbars are not the only possible substrate. CMOS comparators are mature and should remain the baseline for many Layer 1 tasks. RRAM, FeFET, SRAM LUTs, FPGAs and analog ASICs may be superior in different trade spaces. The case for spintronics is strongest when non-volatility, instant-on behavior, low standby power, radiation or temperature tolerance, and compact rule storage matter.

The correct substrate varies by application. A CMOS comparator may be sufficient in cost-sensitive devices; non-volatility and magnetic robustness may matter more in space, harvesting, wearable or high-reliability systems. The perspective is therefore substrate-aware, not substrate-exclusive: binary spintronics is a strong candidate where its properties match the reflex function.

Substrate	Strengths	Weaknesses	Best role in this framework
CMOS comparators	Mature, cheap, certifiable	Limited rule density; volatile configuration unless supported	Simple Layer 1 thresholds
SRAM / LUT / FPGA	Fast, programmable	Volatile, leakage, configuration overhead	Layer 3 deterministic reflexes
RRAM / memristor crossbar	Dense, in-memory compute	Variability, endurance and analog readout challenges	Research alternative for Layers 1-4
FeFET arrays	CMOS-friendly, non-volatile	Endurance and process maturity vary	Alternative non-volatile rule memory
Binary MTJ / MRAM crossbar	Non-volatile, fast read, high endurance potential, magnetic robustness	Reflex crossbar not yet productized; design/safety validation needed	Layer 1 rule maps and configuration storage

6. Application Windows

6.1. PV Harvesting and Battery Balancing

PV harvesting is a useful example because the source is distributed, dynamic and locally perturbed. Classical MPPT remains essential, but it often observes aggregate voltage and current after a local substring, shadow or hot-spot condition has already changed the source. A PV-side reflex module could react to local voltage, current, temperature or shading flags and command bypass, reconnection, inhibition or MPPT wake-up before the aggregate controller completes a search. The module does not replace MPPT; it improves the source that MPPT sees [54,55,56,57].

The same argument applies on the battery side. Classical BMS logic is indispensable for safety and must remain the authority for charge, discharge and fault permissions. The proposed reflex module does not bypass this authority. It acts inside a permission envelope, selecting a local route only when the BMS has allowed it and local flags confirm that voltage, temperature and fault conditions are safe. This is the mutual-protection principle in energy form: the BMS protects the reflex from unsafe routing; the reflex protects the BMS from high-rate local routing decisions.

Cell balancing shows where such a module could add value. Passive balancing is robust but dissipates useful energy as heat; active balancing reduces heat but adds switches, converters and control complexity [58,63]. A reflex module can act as a route selector under BMS permission: available PV or auxiliary energy is directed to the weakest safe cluster, unsafe channels are blocked, and only sparse events are escalated. The EROIE question is whether the additional sensing and decision overhead is lower than the extra useful energy captured or preserved.
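The route-selection logic can be sketched as a filter followed by a minimum. In this example "weakest" is interpreted as lowest voltage, which is an assumption, as are the field names, the limits and the cluster data; in hardware the local checks would come from comparator flags, not from numeric comparisons.

```python
def select_route(clusters, bms_permitted, v_min, v_max, t_max_c):
    """Return the id of the lowest-voltage cluster that is permitted and safe."""
    safe = [
        c for c in clusters
        if c["id"] in bms_permitted           # BMS permission envelope
        and not c["fault"]                    # local fault flag
        and v_min <= c["voltage"] <= v_max    # local voltage window
        and c["temp_c"] <= t_max_c            # local temperature limit
    ]
    if not safe:
        return None  # block all routes and leave the decision to the BMS
    return min(safe, key=lambda c: c["voltage"])["id"]


clusters = [
    {"id": "A", "voltage": 3.60, "temp_c": 30.0, "fault": False},
    {"id": "B", "voltage": 3.40, "temp_c": 58.0, "fault": False},  # too hot
    {"id": "C", "voltage": 3.50, "temp_c": 31.0, "fault": False},
    {"id": "D", "voltage": 3.30, "temp_c": 29.0, "fault": False},  # not permitted
]

route = select_route(clusters, bms_permitted={"A", "B", "C"},
                     v_min=3.0, v_max=4.1, t_max_c=45.0)
print(route)  # C
```

Cluster D has the lowest voltage but is never selected, because the reflex acts only inside the set the BMS has allowed.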

6.2. Humanoid and Mobile Robotics

In robotics, Layer 1 may handle overcurrent, impact and joint-limit protection; Layer 2 may handle analog tilt or contact reflexes; Layer 3 may execute deterministic balance and recovery profiles; Layer 4 selects gait or action mode; Layer 5 updates policies from fleet experience. The value is not that lower layers are intelligent in the human sense, but that they keep high-rate safety events out of expensive policy loops. Compared with a monolithic controller, the perspective predicts lower average decision energy and stronger temporal isolation, but this must be validated experimentally.

A useful test case would be a mobile robot joint exposed to repeated impacts. A monolithic controller might digitize high-rate current and torque signals, process them through a shared real-time loop and then command a response. A reflex-policy controller would let Layer 1 clamp or inhibit immediately, let Layer 3 execute a recovery profile, and let Layer 4/5 analyze repeated patterns afterward. The expected gains would be lower worst-case latency, reduced upstream bandwidth and fewer policy interrupts. Those gains should be measured, not assumed.

6.3. Energy-Harvesting IoT

In harvesting-powered IoT, the strongest EROIE constraint appears when sensing, wake-up and communication cost are comparable to harvested energy. A Layer 1 voltage comparator can wake a radio only when an ultracapacitor has enough energy. Layer 2 can alter duty cycle based on vibration amplitude or light level. Layer 3 packages data. Layer 4 classifies anomalies only when needed. Layer 5 aggregates long-term health information. Here the reflex-policy architecture is mainly a strategy for avoiding wasted wake cycles and unnecessary data movement.
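The Layer 1 wake decision can be tied to the energy stored in the capacitor, E = C V^2 / 2. The sketch below derives a comparator threshold from the energy one sensing-transmission cycle is assumed to need; the capacitance, cutoff voltage, cycle energy and safety margin are hypothetical values.

```python
import math


def usable_energy_j(c_farads, v_now, v_cutoff):
    """Energy available above the cutoff voltage of the downstream electronics."""
    if v_now <= v_cutoff:
        return 0.0
    return 0.5 * c_farads * (v_now ** 2 - v_cutoff ** 2)


def wake_threshold_v(c_farads, v_cutoff, e_cycle_j, margin=1.5):
    """Voltage at which usable energy equals margin * e_cycle_j."""
    return math.sqrt(v_cutoff ** 2 + 2.0 * margin * e_cycle_j / c_farads)


def should_wake(v_now, v_threshold):
    """Layer 1 decision: a single comparison, no ADC or MCU involved."""
    return v_now >= v_threshold


# Hypothetical node: 0.1 F capacitor, 1.8 V cutoff, 50 mJ per radio cycle.
threshold = wake_threshold_v(c_farads=0.1, v_cutoff=1.8, e_cycle_j=0.05)
print(f"wake threshold is about {threshold:.2f} V")
print(should_wake(2.0, threshold), should_wake(2.2, threshold))  # False True
```

The threshold is computed once by a higher layer and then held as a fixed reference, so the comparison itself costs almost nothing at run time.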

The IoT example also shows why EROIE is not identical to low-power electronics. A node may contain a very efficient microcontroller and still be system-negative if it wakes too often, samples too much, transmits too much, or spends harvested energy discovering that no useful event occurred. A reflex layer can improve net value by deciding when not to wake the rest of the system. In that sense, inhibition is as important as activation.

7. Comparison with Existing Architectures

The proposed framework overlaps with prior architectures but emphasizes a different target: energy and ADC partitioning along the physical chain. The table below is qualitative, not a benchmark. It compares locality of actuation, explicit energy/ADC partitioning, interface structure, hardware-reflex suitability and validation maturity.

The comparison should be read cautiously. Brooks's subsumption architecture was not designed as an energy-harvesting or ADC-budget framework; ROS 2 is middleware, not a physical reflex fabric [59]; and mixed-criticality systems provide scheduling and assurance concepts rather than an energy-path allocation rule [60]. These approaches are complementary precedents.

Architecture	What it already provides	What is missing for this paper’s goal	Relation to proposed framework
Brooks subsumption	Layered behavior and robust local control [7,8]	Not explicitly ADC/energy/EROIE driven	Historical precedent for layered competence
Hierarchical RL	Temporal abstraction and sub-policy learning [9]	Often compute-centric and not hardware-local	Useful for policy-to-reflex training
ROS 2 / micro-ROS	Modular middleware and embedded integration [59]	Middleware does not define physical reflex rule fabrics	Complementary system software
Mixed-criticality systems	Temporal isolation and criticality-aware scheduling [60]	Mostly software/scheduling boundary, not energy-return metric	Useful verification precedent
Neuromorphic / SNN	Event-driven low-energy computation [16,17,18]	Does not by itself assign events to physical energy paths	Candidate implementation style
Physical neural networks	Computation in physical substrates [19,20]	Usually focuses on learned inference, not safety reflex rules	Adjacent hardware paradigm
Proposed reflex-policy	ADC-light/ADC-heavy partition, mutual protection, EROIE	Needs prototype and formal validation	Perspective and research roadmap

7.1. Tightened Novelty Claims

The novelty should be bounded in four tiers, which are distinct from the five architectural layers. Known: hierarchical control, biological reflex analogies, edge/cloud decomposition, neuromorphic efficiency and mixed-criticality scheduling are established. Synthesized here: an explicit Reflex-Policy vocabulary tied to ADC-light versus ADC-heavy partitioning. Proposed as an engineering principle: assigning each physical event to the lowest sufficient layer along the sensing-energy-action chain, with upward event compression and downward rule configuration. Hypothesized but not yet proven: binary MTJ crossbars may be especially suitable as non-volatile Layer 1 rule fabrics for selected Physical AI systems.
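The "lowest sufficient layer" principle can be read as a simple search over layers ordered from the bottom up. The sketch below assumes that sufficiency means meeting the event deadline and offering the required capabilities; this criterion, the latency figures and the capability labels are illustrative assumptions, not values taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    index: int
    latency_s: float          # assumed worst-case response time
    capabilities: frozenset   # assumed capability labels


@dataclass(frozen=True)
class Event:
    name: str
    deadline_s: float
    needs: frozenset


def lowest_sufficient_layer(event, layers):
    """Return the index of the lowest layer that meets the deadline and the needs, else None."""
    for layer in sorted(layers, key=lambda l: l.index):
        if layer.latency_s <= event.deadline_s and event.needs <= layer.capabilities:
            return layer.index
    return None


BASE = frozenset({"threshold"})
LAYERS = [
    Layer(1, 1e-6, BASE),
    Layer(2, 1e-4, BASE | {"analog_compare"}),
    Layer(3, 1e-3, BASE | {"analog_compare", "sequence"}),
    Layer(4, 5e-2, BASE | {"analog_compare", "sequence", "classify"}),
    Layer(5, 10.0, BASE | {"analog_compare", "sequence", "classify", "learn"}),
]

if __name__ == "__main__":
    overcurrent = Event("overcurrent", 1e-5, frozenset({"threshold"}))
    gait_change = Event("gait_change", 0.1, frozenset({"classify"}))
    print(lowest_sufficient_layer(overcurrent, LAYERS),
          lowest_sufficient_layer(gait_change, LAYERS))
```

An event that no layer can serve within its deadline returns `None`, which in a real design would indicate a sensing or architecture gap rather than a scheduling problem.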

This bounded novelty statement is intentionally more modest than saying that the framework is the first Physical AI architecture. It is not. The sharper claim is that Physical AI needs a named architectural layer for low-resolution physical reflex modules, a measurable energy-return metric for scale-down, and a technology roadmap that treats binary non-volatile rule fabrics differently from high-precision AI accelerators. This is a narrower claim, but it is more defensible.

7.2. Validation Roadmap

A credible research program should now move from perspective to evidence. Priority experiments are: (i) a Layer 1 reflex demonstrator comparing CMOS comparator, SRAM/LUT and binary MTJ-like rule maps; (ii) a PV or BMS simulation with EROIE accounting; (iii) a robotic safety-loop benchmark measuring latency, energy, event compression and failure behavior; and (iv) formal constraints for rule-map updates so that policy cannot accidentally disable safety reflexes.
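Item (iv) can be illustrated with a very small update guard. This is a sketch of one possible constraint, namely that protected safety entries must survive every policy update unchanged; it is not a formal verification method, and the rule names, actions and function names are hypothetical.

```python
def validate_rule_update(current, proposed, protected_keys):
    """Return a list of violations; an empty list means the update may be applied."""
    violations = []
    for key in sorted(protected_keys):
        if key not in proposed:
            violations.append(f"{key}: protected rule removed")
        elif proposed[key] != current.get(key):
            violations.append(f"{key}: protected rule changed")
    return violations


def apply_rule_update(current, proposed, protected_keys):
    """Apply the update only if no protected rule is touched; otherwise keep the old map."""
    violations = validate_rule_update(current, proposed, protected_keys)
    if violations:
        return current, violations
    return dict(proposed), []


if __name__ == "__main__":
    rules = {"overcurrent": "clamp", "overtemp": "inhibit", "low_light": "sleep"}
    protected = {"overcurrent", "overtemp"}

    ok_update = {**rules, "low_light": "duty_50"}            # tunes a non-safety rule
    bad_update = {"overtemp": "inhibit", "low_light": "sleep"}  # drops the overcurrent reflex

    print(apply_rule_update(rules, ok_update, protected))
    print(apply_rule_update(rules, bad_update, protected))
```

A rejected update leaves the existing rule map in place, so the reflex layer keeps its safety behavior even when the policy layer proposes an invalid configuration.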

The first validation should not attempt to prove the entire five-layer hierarchy at once. The recommended flagship path is a constrained demonstrator such as PV substring bypass, because it naturally exposes local sensing, event latency, bypass/routing decisions and EROIE accounting. A parallel robotic overcurrent demonstrator would be equally valuable for safety timing. In either case, three variants should be compared: centralized software, CMOS reflex and non-volatile crossbar-like reflex. The measured outputs should include event latency, energy per decision, useful energy recovered, number of upward events, false positives, false negatives and fail-safe behavior after higher-layer loss.
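To keep the three variants comparable, the measured outputs listed above could be reduced to a common summary record. The following is a minimal sketch; the field names, the derived ratios and the example numbers are assumptions for illustration and do not represent measured results.

```python
from dataclasses import dataclass


@dataclass
class TrialLog:
    variant: str          # e.g. "centralized", "cmos_reflex", "crossbar_reflex"
    latencies_s: list     # per-event response latencies
    energy_j: float       # total decision energy over the trial
    decisions: int        # number of local decisions taken
    upward_events: int    # events forwarded to higher layers
    true_pos: int
    false_pos: int
    false_neg: int


def summarize(log: TrialLog) -> dict:
    """Reduce one trial to the comparison quantities, guarding against empty denominators."""
    detected = log.true_pos + log.false_pos
    actual = log.true_pos + log.false_neg
    return {
        "variant": log.variant,
        "worst_latency_s": max(log.latencies_s) if log.latencies_s else None,
        "energy_per_decision_j": log.energy_j / log.decisions if log.decisions else None,
        "decisions_per_upward_event": (
            log.decisions / log.upward_events if log.upward_events else None
        ),
        "false_alarm_fraction": log.false_pos / detected if detected else None,
        "missed_event_fraction": log.false_neg / actual if actual else None,
    }


if __name__ == "__main__":
    # Placeholder numbers, not experimental data.
    log = TrialLog("cmos_reflex", [1e-6, 3e-6, 2e-6], 2e-6, 1000, 10, 40, 2, 1)
    print(summarize(log))
```

Fail-safe behavior after higher-layer loss is not captured by such a record and would need a separate fault-injection test.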

7.3. Limitations

The framework has important limitations. The current paper does not prove optimal layer allocation. It does not present measured MTJ reflex hardware. Biological analogy motivates but does not validate the engineering design. EROIE depends strongly on system boundaries and must be reported transparently. Cross-layer verification may also become combinatorially difficult as sensors, rule-map entries and update permissions grow. Finally, safety certification may favor simpler CMOS or mechanical reflexes over spintronics in some applications. These limitations do not weaken the perspective; they define the next validation steps.

Another limitation concerns sensor adequacy. As the fly example suggests, a reflex cannot respond to what it cannot sense. A poorly placed sensor, a slow transducer, a missing airflow cue, or a misleading aggregate measurement can make a local reflex ineffective regardless of how elegant the rule fabric is. Therefore, the architecture must be evaluated as a sensing-decision-actuation module, not only as a computing array.

8. Conclusion

Physical AI should not only compute policies for bodies. It should distribute intelligence along the physical energy and action chain. Reflex layers and policy layers collaborate: policy learns and constrains; reflex executes and protects. The practical value lies in avoiding unnecessary ADC conversion, communication, wake-ups, heat dissipated in balancing and centralized decision load.

Binary spintronic crossbars are positioned as a promising Layer 1 substrate because MTJ/MRAM technology exists in commercial and foundry contexts, while non-volatile local rule maps match many reflex decisions. The paper does not claim that a reflex crossbar is already mature; it argues that the gap between available binary spintronic memory and application-specific reflex modules is a meaningful research opportunity.

The proposed EROIE formalization extends energy proportionality into embodied system integration. For scaled-down energy-harvesting robots, PV-BMS nodes and IoT devices, intelligence is useful only if recovered, protected or operational energy exceeds the energy spent sensing, deciding, communicating and switching. That is the central perspective of this manuscript.

Acknowledgments

This work was supported by the HORIZON-JU-Chips-2025-1-IA project NeAIxt (grant no. 101194172).

Conflicts of Interest

The authors are employed by the company IFEVS and declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acronym Table

Acronym	Meaning	Use in this paper
ADC	Analog-to-Digital Converter	Conversion avoided or minimized in low reflex layers
AI	Artificial Intelligence	General field
ANN	Artificial Neural Network	Conventional neural-network model
BEOL	Back-End-of-Line	CMOS integration level relevant to MTJs
BMS	Battery Management System	Battery supervision and safety
CMOS	Complementary Metal-Oxide-Semiconductor	Mainstream semiconductor technology
CPG	Central Pattern Generator	Biological rhythmic-control circuit
CPLD	Complex Programmable Logic Device	Deterministic digital reflex implementation
DSP	Digital Signal Processor	Embedded signal processing
EROIE	Energy Returned on Invested Energy for Embodied Intelligence	Scale-down metric proposed in this paper
FPGA	Field-Programmable Gate Array	Deterministic programmable control
HRL	Hierarchical Reinforcement Learning	Policy decomposition baseline
IoT	Internet of Things	Distributed sensing domain
MCU	Microcontroller Unit	Embedded controller
MPPT	Maximum Power Point Tracking	PV energy-harvesting control
MRAM	Magnetoresistive Random-Access Memory	Non-volatile spintronic memory
MTJ	Magnetic Tunnel Junction	Core binary spintronic device
NPU	Neural Processing Unit	Edge AI accelerator
PID	Proportional-Integral-Derivative	Classical control method
PNN	Physical Neural Network	Computation in physical substrates
PV	Photovoltaic	Solar energy harvesting
RL	Reinforcement Learning	Policy optimization
SNN	Spiking Neural Network	Event-driven neuromorphic model
TRL	Technology Readiness Level	Maturity framing for devices and systems
VLA	Vision-Language-Action	Embodied AI model family

References

[1] Sitti, M. Physical intelligence as a new paradigm. Extreme Mechanics Letters 46, 101340 (2021).
[2] Ray, P. P. Physical AI: Bridging the sim-to-real divide toward embodied, ethical, and autonomous intelligence. Machine Learning for Computational Science and Engineering 2, 1 (2026).
[3] Brohan, A. et al. RT-2: Vision-language-action models transfer web knowledge to robotic control. arXiv:2307.15818 (2023).
[4] Kim, M. J. et al. OpenVLA: An open-source vision-language-action model. arXiv:2406.09246 (2024).
[5] Ha, D. and Schmidhuber, J. World models. arXiv:1803.10122 (2018).
[6] Hasani, R., Lechner, M., Amini, A., Rus, D. and Grosu, R. Liquid time-constant networks. Proceedings of the AAAI Conference on Artificial Intelligence 35(9), 7657-7666 (2021).
[7] Brooks, R. A. A robust layered control system for a mobile robot. IEEE Journal on Robotics and Automation 2(1), 14-23 (1986).
[8] Brooks, R. A. Intelligence without representation. Artificial Intelligence 47(1-3), 139-159 (1991).
[9] Botvinick, M. M. Hierarchical models of behavior and prefrontal function. Trends in Cognitive Sciences 12(5), 201-208 (2008).
[10] Prescott, T. J. Forced moves or good tricks in design space? Adaptive Behavior 15(1), 9-31 (2007).
[11] Zhang, H., Solak, G. and Ajoudani, A. Bresa: Bio-inspired reflexive safe reinforcement learning for contact-rich robotic tasks. arXiv:2503.21989 (2025).
[12] Wulf, W. A. and McKee, S. A. Hitting the memory wall: Implications of the obvious. ACM SIGARCH Computer Architecture News 23(1), 20-24 (1995).
[13] Barroso, L. A. and Hoelzle, U. The case for energy-proportional computing. Computer 40(12), 33-37 (2007).
[14] Sze, V., Chen, Y.-H., Yang, T.-J. and Emer, J. S. Efficient processing of deep neural networks: A tutorial and survey. Proceedings of the IEEE 105(12), 2295-2329 (2017).
[15] Horowitz, M. Computing’s energy problem (and what we can do about it). IEEE International Solid-State Circuits Conference Digest, 10-14 (2014).
[16] Mead, C. Neuromorphic electronic systems. Proceedings of the IEEE 78(10), 1629-1636 (1990).
[17] Maass, W. Networks of spiking neurons: The third generation of neural network models. Neural Networks 10(9), 1659-1671 (1997).
[18] Roy, K., Jaiswal, A. and Panda, P. Towards spike-based machine intelligence with neuromorphic computing. Nature 575, 607-617 (2019).
[19] Momeni, A. et al. Training of physical neural networks. Nature 645, 53-61 (2025).
[20] Wright, L. G. et al. Deep physical neural networks trained with backpropagation. Nature 601, 549-555 (2022).
[21] Kandel, E. R. et al. Principles of Neural Science, 6th ed. McGraw-Hill (2021).
[22] Purves, D. et al. Neuroscience, 6th ed. Oxford University Press (2018).
[23] Sherrington, C. S. The Integrative Action of the Nervous System. Yale University Press (1906).
[24] Jackson, J. H. The Croonian lectures on evolution and dissolution of the nervous system. British Medical Journal (1884).
[25] York, G. K. and Steinberg, D. A. Hughlings Jackson’s neurological ideas. Brain 134(10), 3106-3113 (2011).
[26] Grillner, S. Biological pattern generation: The cellular and computational logic of networks in motion. Neuron 52(5), 751-766 (2006).
[27] Grillner, S. and Robertson, B. The basal ganglia over 500 million years. Current Biology 26(20), R1088-R1100 (2016).
[28] Marder, E. and Bucher, D. Central pattern generators and the control of rhythmic movements. Current Biology 11(23), R986-R996 (2001).
[29] Angelaki, D. E. and Cullen, K. E. Vestibular system: The many facets of a multimodal sense. Annual Review of Neuroscience 31, 125-150 (2008).
[30] Cullen, K. E. The vestibular system: Multimodal integration and encoding of self-motion for motor control. Journal of Neurophysiology 107(3), 727-738 (2012).
[31] Ito, M. Cerebellar circuitry as a neuronal machine. Progress in Neurobiology 78(3-5), 272-303 (2006).
[32] Wolpert, D. M., Miall, R. C. and Kawato, M. Internal models in the cerebellum. Trends in Cognitive Sciences 2(9), 338-347 (1998).
[33] Albus, J. S. A theory of cerebellar function. Mathematical Biosciences 10(1-2), 25-61 (1971).
[34] Marr, D. A theory of cerebellar cortex. Journal of Physiology 202(2), 437-470 (1969).
[35] Doya, K. Complementary roles of basal ganglia and cerebellum in learning and motor control. Current Opinion in Neurobiology 10(6), 732-739 (2000).
[36] Miller, E. K. and Cohen, J. D. An integrative theory of prefrontal cortex function. Annual Review of Neuroscience 24, 167-202 (2001).
[37] Card, G. and Dickinson, M. H. Visually mediated motor planning in the escape response of Drosophila. Current Biology 18(17), 1300-1307 (2008).
[38] Morimoto, M. M. et al. Spatial readout of visual looming in the central brain of Drosophila. eLife 9, e57685 (2020).
[39] Dorkenwald, S. et al. Neuronal wiring diagram of an adult brain. Nature 634, 124-138 (2024).
[40] Everspin Technologies. MRAM product information. Retrieved May 2026 from everspin.com.
[41] GlobalFoundries. Embedded memory and MRAM technology information. Retrieved May 2026 from gf.com.
[42] Samsung Foundry. Specialty technology: eMRAM. Retrieved May 2026 from semiconductor.samsung.com.
[43] Renesas Electronics. IDT offers Avalanche Technology’s MRAM devices. News release (2019). Retrieved May 2026 from renesas.com.
[44] Apalkov, D., Dieny, B. and Slaughter, J. M. Magnetoresistive random access memory. Proceedings of the IEEE 104(10), 1796-1830 (2016).
[45] Bhatti, S. et al. Spintronics based random access memory: A review. Materials Today 20(9), 530-548 (2017).
[46] Grollier, J. et al. Neuromorphic spintronics. Nature Electronics 3, 360-370 (2020).
[47] Torrejon, J. et al. Neuromorphic computing with nanoscale spintronic oscillators. Nature 547, 428-431 (2017).
[48] Borders, W. A. et al. Integer factorization using stochastic magnetic tunnel junctions. Nature 573, 390-393 (2019).
[49] Manipatruni, S. et al. Scalable energy-efficient magnetoelectric spin-orbit logic. Nature 565, 35-42 (2019).
[50] Sebastian, A., Le Gallo, M., Khaddam-Aljameh, R. and Eleftheriou, E. Memory devices and applications for in-memory computing. Nature Nanotechnology 15, 529-544 (2020).
[51] Ielmini, D. and Wong, H.-S. P. In-memory computing with resistive switching devices. Nature Electronics 1, 333-343 (2018).
[52] Burr, G. W. et al. Neuromorphic computing using non-volatile memory. Advances in Physics: X 2(1), 89-124 (2017).
[53] Gokmen, T. and Vlasov, Y. Acceleration of deep neural network training with resistive cross-point devices. Frontiers in Neuroscience 10, 333 (2016).
[54] Esram, T. and Chapman, P. L. Comparison of photovoltaic array maximum power point tracking techniques. IEEE Transactions on Energy Conversion 22(2), 439-449 (2007).
[55] Patel, H. and Agarwal, V. Maximum power point tracking scheme for PV systems operating under partially shaded conditions. IEEE Transactions on Industrial Electronics 55(4), 1689-1698 (2008).
[56] Silvestre, S., Boronat, A. and Chouder, A. Study of bypass diodes configuration on PV modules. Applied Energy 86(9), 1632-1640 (2009).
[57] Daliento, S., Mele, L. and Spirito, P. Analysis and modeling of hot spot phenomena in photovoltaic modules. IEEE Transactions on Electron Devices 59(3), 727-734 (2012).
[58] Cao, J., Schofield, N. and Emadi, A. Battery balancing methods: A comprehensive review. IEEE Vehicle Power and Propulsion Conference (2008).
[59] Macenski, S., Foote, T., Gerkey, B., Lalancette, C. and Woodall, W. Robot Operating System 2: Design, architecture, and uses in the wild. Science Robotics 7(66), eabm6074 (2022).
[60] Burns, A. and Davis, R. I. A survey of research into mixed criticality systems. ACM Computing Surveys 50(6), Article 82 (2017).
[61] Murphy, D. J. and Hall, C. A. S. Year in review - EROI or energy return on energy invested. Annals of the New York Academy of Sciences 1185, 102-118 (2010).
[62] Hall, C. A. S., Lambert, J. G. and Balogh, S. B. EROI of different fuels and the implications for society. Energy Policy 64, 141-152 (2014).
[63] Safari, A., Sorouri, H., Oshnoei, A. and Blaabjerg, F. A state-of-the-art review on battery cell balancing strategies. Discover Energy 5, 31 (2025).

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.