Preprint
Article

This version is not peer-reviewed.

Physical AI as Seen from Nature: A Reflex-Policy Layered Architecture for Energy-Proportional Embodied Intelligence

Submitted: 05 June 2026
Posted: 08 June 2026


Abstract
Physical AI requires machines to sense, decide and act under tight constraints of energy, latency, safety and robustness. A fly escaping an approaching hand captures the core principle: a fast sensor-action reflex acts before full deliberation, while higher neural resources remain available for richer behavior. This Perspective proposes a layered Reflex-Policy architecture in which reflex layers execute fast, local, ADC-light actions near sensors and actuators, while policy layers perform slower learning, planning, optimization and rule updates. The two are mutually protective collaborators: policy layers define safe operating envelopes, while reflex layers shield policy processors from high-frequency events and avoidable data floods. We position binary spintronic MTJ crossbars as a plausible technology path for the lowest reflex layers, while distinguishing today's MRAM/eMRAM technologies from the proposed reflex-layer crossbar module, which remains an architectural research target. The contribution is a system-integration framework: intelligence is distributed along the sensing, energy, storage and action chain, and each event is assigned to the lowest sufficient layer. We formalize Energy Returned on Invested Energy for Embodied Intelligence (EROIE), compare the framework with related architectures, and state its limitations and validation needs.

1. Introduction and Positioning

1.1. Physical AI as Embodied, Energy-Constrained Intelligence

Artificial intelligence is moving from disembodied perception and language tasks into machines that physically intervene in the world. Physical AI and Physical Intelligence emphasize that the body, the environment and the material substrate are not passive containers for computation: they shape what can be sensed, computed and safely acted upon [1,2]. Vision-Language-Action models, world models and continuous-time neural models have accelerated the policy side of this transition, giving robots richer perception, instruction following and adaptation capabilities [3,4,5,6].
However, embodiment changes the engineering problem. A robot joint cannot wait for cloud inference before limiting a dangerous torque spike; a PV substring under fast partial shading cannot wait for a slow aggregate MPPT search before local energy is lost; and a harvesting-powered IoT node cannot afford to digitize and upload every raw signal. A simple biological image makes the point early: a fly escaping a hand does not first solve a full scene-understanding problem; its sensing suite activates an escape action before slow deliberation is useful. Physical AI therefore needs an architecture in which not every event receives the same computation. The central design question becomes: what is the lowest sufficient layer that can safely act on a given physical event?

1.2. Why a Reflex-Policy Perspective Is Needed

Hierarchical control is not new. Brooks’ subsumption architecture, hierarchical models of behavior and layered robotic systems have long shown that complex behavior can emerge from multiple interacting levels [7,8,9,10]. Recent safe reinforcement-learning work also reintroduces the biological idea that safety-related reflexes may operate faster than task-level policies [11]. What remains underdeveloped is a hardware-aware and energy-aware partition in which reflex actions are explicitly ADC-light, physically local and protected by higher policy layers.
The proposed Reflex-Policy framework is therefore not a new theory of all intelligence. It is a perspective on system integration for embodied systems, with three contributions: it proposes a Reflex-Policy vocabulary for distributing embodied decisions; it introduces EROIE as a device-scale metric for the useful energy returned after sensing and control overhead; and it identifies binary non-volatile rule fabrics, including MTJ-based technologies, as a promising Layer 1 research path. The novelty lies in combining four elements: explicit ADC-light versus ADC-heavy partitioning, energy-proportional assignment of each event to the lowest sufficient layer, bidirectional rule transfer between policy and reflex layers, and a candidate spintronic implementation path for binary local reflex modules.
The strength of the contribution is not a single benchmark but a clearer way to ask design questions. Should a temperature transient be sampled by a 12-bit ADC and sent to an NPU, or should it trigger a local threshold? Should a PV substring fault be inferred from aggregate power oscillations, or should it be handled at the cell or substring level? Should a robot joint stream high-rate current waveforms upward, or should it report only a compressed event after local protection has acted? These questions are often treated as implementation details. Here they are elevated to the architectural level.
The hardware motivation is equally strong. The memory wall, ADC cost, data movement and scaling limits of monolithic inference motivate energy-proportional decision allocation [12,13,14,15]. Neuromorphic, spiking and physical neural systems already show why event-driven and substrate-embedded computation matter [16,17,18,19,20].

1.3. Contribution and Limits of This Perspective

The quantitative values in the layer tables should be read as design envelopes and research targets, not as measured system results: where numerical energy or latency values are given, they indicate the order of magnitude that a layer should target. This explicit positioning is important because the paper’s purpose is to define an architectural research direction and a comparative vocabulary that can guide prototypes, simulations and EROIE measurements.
The contribution is organized around three claims. First, Physical AI should distribute intelligence along the physical energy and action chain, rather than routing all decisions through a centralized policy processor. Second, reflex and policy layers should collaborate: policy defines rules that keep reflexes safe, while reflexes prevent policy from being overloaded. Third, binary spintronic devices are not yet a reflex product, but the existence of commercial MRAM/eMRAM and foundry routes makes binary MTJ technology a realistic substrate to explore, not a purely speculative material idea.

2. Reflex and Policy Layers: Definitions, Cooperation and EROIE

2.1. Reflex Layers and Policy Layers

A reflex layer is a decision pathway in which sensing maps directly, or with only minimal analog preprocessing, onto action. It uses thresholds, binary rules, hysteresis, low-bit weighting or deterministic state machines. It should be close to the physical event, use no high-resolution ADC in the urgent path, and remain operational even when higher layers are unavailable. A policy layer performs higher-inference tasks: state estimation, prediction, optimization, learning, planning and fleet coordination. It can afford ADCs, NPUs, GPUs or cloud resources because it is invoked less frequently and is not responsible for the first protective action.
The mirror metaphor is useful but limited. A mirror reflects without thinking because the material geometry performs the transformation. Similarly, a binary crossbar can map local conditions to local actions without executing a software loop in the critical path. Unlike a mirror, however, a reflex module must be bounded by policy: its thresholds, permissions and rule maps must be configured, audited and periodically updated by higher layers.

2.2. Mutual Protection Between Reflex and Policy

The architecture should not be understood as a simple command hierarchy. It is a mutual-protection loop. Policy layers protect reflex layers by defining permitted actions, thresholds, gain schedules, update rules and safe envelopes. Reflex layers protect policy layers by converting continuous high-rate physical perturbations into sparse events and immediate local actions. The policy layer therefore does less real-time work; the reflex layer remains safe because its rule map is constrained and maintained by policy.
This cooperation also clarifies why reflex layers are not merely smaller policies. A policy estimates state and selects among alternatives. A reflex filters the world before the policy sees it: suppressing unsafe states, routing energy, waking dormant electronics, inhibiting a channel, or generating a fault event while higher layers remain asleep, busy or disconnected.
The fly escape response illustrates both the power and the boundary of reflex action. When a hand approaches, visual looming cues and mechanosensory signals couple to fast leg and wing actions before deliberative object recognition is useful [37,38]. The adult Drosophila brain contains 139,255 neurons and roughly 50 million chemical synapses [39], yet escape depends on the right sensor-action pathway being triggered at the right time. If a tool or disturbance does not excite the expected sensing channel, the reflex can fail. The engineering lesson is direct: a reflex layer must be co-designed with the sensing suite, actuator suite and expected event class.

2.3. Layer Spectrum and Minimal Sufficient Inference

Inference complexity is a continuum. The correct architecture is not the one that always uses the most advanced AI, but the one that assigns each event to the least costly layer that can act safely and correctly. The energy and latency values in the following table are proposed design envelopes for a perspective paper, not measured system results.
| Level | Inference type | Typical example | Implementation | Energy / latency envelope |
|---|---|---|---|---|
| 0 | Direct physical coupling | Mechanical fuse, passive thermostat | Material or mechanical structure | Passive / zero-power |
| 1 | Binary threshold | Overcurrent, PV bypass, collision flag | Comparator or binary crossbar | pJ-nJ, ns-µs |
| 2 | Analog coordination | Tilt correction, hysteresis, timers | Op-amps, analog logic | nJ, µs |
| 3 | Deterministic digital reflex | PID, sequence, feedforward table | MCU, CPLD, FPGA | µJ, µs-ms |
| 4 | Local policy | Adaptive MPPT, terrain selection | Edge NPU, DSP | mJ, ms-s |
| 5 | Global policy | Fleet learning, large VLA/world model | Cloud, GPU cluster | J+, seconds-minutes |

2.4. Formalizing EROIE

For scaled-down systems, energy proportionality must account not only for computation but also for the energy cost of sensing, conversion, communication, and control.
We define Energy Returned on Invested Energy for Embodied Intelligence (EROIE) as:
The ratio of useful energy preserved, delivered, or made operationally available by an embodied control architecture, divided by the total energy invested to sense, decide, communicate, switch, and maintain that control action.
This is expressed as:
$$\mathrm{EROIE} = \frac{E_{\mathrm{useful,out}}}{E_{\mathrm{invested,control}}}$$
where the denominator includes the full energy overhead of intelligent control:
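$$E_{\mathrm{invested,control}} = E_{\mathrm{sense}} + E_{\mathrm{condition}} + E_{\mathrm{ADC}} + E_{\mathrm{compute}} + E_{\mathrm{memory}} + E_{\mathrm{comm}} + E_{\mathrm{wake}} + E_{\mathrm{switch}} + E_{\mathrm{balance,loss}} + E_{\mathrm{leak}} + E_{\mathrm{update}}$$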

Incremental Form for Comparing Architectures

When comparing a reflex-policy architecture (RP) against a baseline (B), the incremental EROIE is often more informative:
$$\Delta\mathrm{EROIE}_{\mathrm{RP/B}} = \frac{E_{\mathrm{useful,out,RP}} - E_{\mathrm{useful,out,B}}}{E_{\mathrm{invested,control,RP}} - E_{\mathrm{invested,control,B}}}$$
This form captures the net change in useful energy output relative to the net change in control energy invested when moving from baseline to reflex-based architecture.
The system boundary must be stated explicitly. In PV and battery systems, $E_{\mathrm{useful,out}}$ may be joules delivered to storage or load after bypass, conversion and balancing. In robotics, it may be mechanical work preserved, damage avoided or mission time extended. In IoT, it may be successful sensing-transmission cycles enabled by harvested energy. The metric is intentionally related to EROI/EROEI in energy systems, but shifted to the device and architecture scale [61,62]. A reflex module is justified only if it increases useful returned energy after its own overhead is counted.
A simple illustrative accounting shows why this matters. Suppose a local PV reflex preserves 1 J otherwise lost during a shading transient. If local sensing, comparator decision and switch command cost 10 microjoules, the event-level EROIE is 100,000. If a centralized loop preserves the same joule after ADC sampling, MCU wake-up, communication and search steps costing 1 millijoule, EROIE is 1,000. These values are illustrative; the point is that small-scale control overhead can decide whether an intervention is energy-positive.
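The accounting above can be sketched in a few lines of Python. This is a minimal illustration, not a measurement tool: the class name `ControlEnergy`, the function `eroie` and, in particular, the way the 10 µJ and 1 mJ budgets are split across individual terms are assumptions made here for illustration; only the two totals and the 1 J of preserved energy come from the text.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ControlEnergy:
    """Terms of E_invested,control, each in joules (unused terms stay at zero)."""
    sense: float = 0.0
    condition: float = 0.0
    adc: float = 0.0
    compute: float = 0.0
    memory: float = 0.0
    comm: float = 0.0
    wake: float = 0.0
    switch: float = 0.0
    balance_loss: float = 0.0
    leak: float = 0.0
    update: float = 0.0

    def total(self) -> float:
        """Sum every overhead term to obtain E_invested,control."""
        return sum(getattr(self, f.name) for f in fields(self))


def eroie(e_useful_out: float, invested: ControlEnergy) -> float:
    """Event-level EROIE = E_useful,out / E_invested,control."""
    total = invested.total()
    if total <= 0.0:
        raise ValueError("invested control energy must be positive")
    return e_useful_out / total


# Hypothetical split of the 10 uJ local-reflex budget quoted in the text.
local_reflex = ControlEnergy(sense=2e-6, compute=3e-6, switch=5e-6)
# Hypothetical split of the 1 mJ centralized-loop budget quoted in the text.
central_loop = ControlEnergy(adc=0.2e-3, wake=0.3e-3, comm=0.4e-3, compute=0.1e-3)

print(f"local reflex EROIE: {eroie(1.0, local_reflex):.3g}")
print(f"central loop EROIE: {eroie(1.0, central_loop):.3g}")
```

Up to floating-point rounding, the two ratios reproduce the illustrative 100,000 and 1,000 figures. Keeping every overhead term as a named field makes the system boundary explicit, which is the point of the metric.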

3. Biological Grounding

Biology motivates the architecture but does not prove engineering optimality. The relevant lesson from nervous systems is that response pathways are differentiated by locality, speed, function and plasticity. A spinal withdrawal reflex, for example, can couple a harmful stimulus to muscle activation before cortical interpretation, while higher centers later modulate thresholds and context. Spinal reflexes, brainstem circuits, cerebellar coordination, basal-ganglia action selection and cortical planning occupy different time scales and computational roles [21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36].
The vertebrate analogy is useful because it separates urgency, coordination, learned execution, action selection and planning into different mechanisms. It should not be overread. Engineered systems have wires, MOSFETs, converters, software schedulers and certification constraints that biology does not have. The paper therefore uses biology as a source of design heuristics: locality before centrality, sparse events before raw streams, learned modulation before constant deliberation, and graceful degradation before catastrophic dependence on a single processor.
| Layer | Biological analogy | Main function | Engineering analogy |
|---|---|---|---|
| 1 Physical Reflex | Spinal reflex arc | Immediate protective action | Comparator / binary crossbar near sensor-actuator path |
| 2 Analog Reflex | Brainstem, vestibular reflexes, CPGs | Multi-signal coordination | Analog logic, timers, hysteresis |
| 3 Digital Reflex | Cerebellar feedforward control | Learned deterministic routines | MCU/CPLD/FPGA with fixed-point routines |
| 4 Local Policy | Basal ganglia | Action selection, habits, local adaptation | NPU/DSP/edge optimizer |
| 5 Global Policy | Cortex | Planning, abstraction, long-term learning | Cloud/GPU/large AI model |
Figure 1. Spinal reflex arc and Layer 1 binary MTJ crossbar analogy. The point of the analogy is locality and urgency, not biological mimicry.
Figure 2. Brainstem reflex integration and CPG analogy for Layer 2 analog coordination.
Figure 3. Cerebellar learning analogy for Layer 3 deterministic digital reflexes trained or tuned by higher layers.
Figure 4. Basal-ganglia action-selection analogy for Layer 4 local policy.
Three engineering lessons follow. First, lower layers operate concurrently with higher layers; they do not wait for deliberation. Second, lower layers should degrade gracefully when higher layers are unavailable. Third, learning does not have to happen at the layer that executes the fastest action: higher layers can learn parameters and transfer them downward, while lower layers execute efficiently.

4. A Five-Layer Reflex-Policy Architecture

4.1. Layer 1 - Physical Reflex

Layer 1 handles urgent, local, stereotyped events such as overcurrent, PV substring bypass, thermal shutdown, collision flags and local interlocks. It should use comparator-level readout or binary rule maps in the critical path. The preferred implementation is not a general AI accelerator but a local rule module: sensors generate low-resolution flags; the rule map is non-volatile; outputs command switches, wake signals, inhibit signals or safe-state actions.
The strongest Layer 1 candidates have three properties. They are frequent or safety-critical, they have a low-dimensional signature, and their first useful response is binary or low-resolution. Examples include opening a bypass path, inhibiting a gate driver, waking a supervisor, blocking an unsafe injection channel or entering a predefined safe state. If the decision requires scene interpretation, long-horizon prediction or semantic reasoning, it does not belong in Layer 1.
Figure 5. Binary MTJ crossbar for Layer 1 physical reflex processing. The crossbar belongs to the decision path; power electronics or actuators carry the main energy path.

4.2. Layer 2 - Analog Reflex

Layer 2 coordinates a small number of analog signals without high-resolution digitization. Examples include accelerometer-gyroscope tilt correction, light-temperature hysteresis, timing windows, and local multi-sensor interlocks. The implementation can be op-amp summation, hysteresis comparators, timers or compact mixed-signal logic. It expands the reflex concept from single threshold to local coordination.
Figure 6. Layer 2 analog reflex using analog summation, hysteresis and actuator drive without high-resolution digitization.
Layer 2 is included to avoid a false dichotomy between pure binary hardware and full digital control. Many physical events are not well represented by one threshold, but they still do not require high-level inference. A vestibular-like correction in a robot, for example, may combine acceleration, angular velocity and contact flags with analog weighting and hysteresis. The result is still a reflex: it is close to the body, local in scope, and designed to act before a policy layer completes a richer interpretation.
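A behavioral sketch of such a Layer 2 reflex is given below, assuming a weighted sum of acceleration and angular-velocity magnitudes followed by a two-threshold hysteresis element. It models the signal flow only, not an analog circuit. The class name `TiltReflex`, the weights, both thresholds and the choice to let the contact flag gate the correction are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TiltReflex:
    """Weighted analog sum plus hysteresis; all default values are illustrative."""
    w_accel: float = 0.5
    w_gyro: float = 0.5
    on_threshold: float = 1.0   # level at which the correction engages
    off_threshold: float = 0.5  # lower level at which it releases
    active: bool = False

    def __post_init__(self) -> None:
        if self.off_threshold >= self.on_threshold:
            raise ValueError("off_threshold must be below on_threshold")

    def step(self, accel: float, gyro: float, contact: bool) -> bool:
        """Return True while the corrective action is engaged."""
        if not contact:
            # Assumption: without ground contact the correction is released.
            self.active = False
            return self.active
        level = self.w_accel * abs(accel) + self.w_gyro * abs(gyro)
        if level >= self.on_threshold:
            self.active = True
        elif level <= self.off_threshold:
            self.active = False
        # Between the two thresholds the previous state is held (hysteresis).
        return self.active


reflex = TiltReflex()
samples = [(0.4, 0.4, True), (1.5, 1.0, True), (0.8, 0.8, True), (0.4, 0.4, True)]
trace = [reflex.step(a, g, c) for a, g, c in samples]
print(trace)
```

The third sample lies between the two thresholds, so the reflex stays engaged until the combined level falls below the release threshold. This is what prevents chattering around a single threshold.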

4.3. Layer 3 - Digital Reflex

Layer 3 implements deterministic learned routines with bounded timing: PID, state machines, feedforward tables, safe recovery profiles and fixed-point controllers. It may use moderate-resolution ADCs but does not require a deep model in the urgent loop. Higher policy layers can train or optimize parameters offline and deploy them to Layer 3, which executes them with predictable latency.
Layer 3 is where learning becomes executable without remaining computationally expensive. A policy layer may train a neural controller, but the deployed reflex may be a lookup table, a fixed-point approximation, a finite-state machine or a compact feedforward controller. This is important for certification and debugging. A low-level controller with bounded execution time, inspectable parameters and rollback capability is easier to validate than a large opaque model executing in the urgent loop.
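The transfer from a trained policy to a deployed table can be sketched as follows. The stand-in function `policy_controller` (a `tanh` curve), the Q8 fixed-point format, the 17-entry table and the input range of -1 to 1 are assumptions chosen for illustration; a real deployment would take these from the controller being approximated.

```python
import math

FRAC_BITS = 8               # Q8 fixed point (assumed format)
SCALE = 1 << FRAC_BITS      # 256 counts per unit
X_MIN_Q = -SCALE            # table covers x in [-1.0, 1.0]
STEP_Q = 32                 # 0.125 per table step in Q8
TABLE_SIZE = 17


def policy_controller(x: float) -> float:
    """Stand-in for an expensive trained controller (hypothetical)."""
    return math.tanh(2.0 * x)


def build_table(fn) -> list:
    """Offline step in the policy layer: sample fn into Q8 integers."""
    return [
        round(fn((X_MIN_Q + i * STEP_Q) / SCALE) * SCALE)
        for i in range(TABLE_SIZE)
    ]


def reflex_lookup(table: list, x_q: int) -> int:
    """Online step in Layer 3: integer-only nearest-entry lookup."""
    idx = (x_q - X_MIN_Q + STEP_Q // 2) // STEP_Q
    idx = max(0, min(len(table) - 1, idx))  # clamp out-of-range inputs
    return table[idx]


table = build_table(policy_controller)
for x in (-1.0, 0.0, 0.3, 1.0):
    y_q = reflex_lookup(table, round(x * SCALE))
    print(f"x={x:+.2f}  reflex={y_q / SCALE:+.3f}  policy={policy_controller(x):+.3f}")
```

The deployed routine uses only integer arithmetic and a bounded number of operations, and the table itself can be inspected, versioned and rolled back. Accuracy is traded for predictability; a finer table or interpolation would narrow the gap at higher memory cost.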
Figure 7. Learning transfer from policy layers to reflex layers: slow training or optimization above, fast deterministic execution below.

4.4. Layers 4 and 5 - Local and Global Policy

Layer 4 performs local optimization, context-dependent action selection and short-term adaptation. Examples include adaptive MPPT under repeatable shading, gait selection in robotics and local anomaly classification. Layer 5 performs long-term learning, fleet-level analytics, large model reasoning and strategic planning. These layers are essential, but they should configure rather than replace the first local protective response.
The distinction between local and global policy is primarily about context and update frequency. Layer 4 adapts to the device and its immediate environment: local shading patterns, terrain class, battery aging state, vibration signature or repeated fault pattern. Layer 5 aggregates many devices or long histories and returns slower improvements. This prevents cloud or fleet intelligence from being confused with real-time control. Fleet learning can improve reflex rules, but it should not be in the first protective path.
Figure 8. Energy-proportional hierarchy: inference complexity, latency and energy increase upward. Frequent events should stay in lower layers when safe.
Figure 9. Five-layer architecture with upward event compression and downward configuration. The architecture is a mutual-protection loop, not a one-way command chain.

4.5. Cross-Layer Interfaces and Failure Behavior

The key interface is bidirectional. Upward communication should be compressed: events, status flags, fault codes, counters and summaries. Downward communication should be configurational: thresholds, gains, permissions, rule maps, safety envelopes and update schedules. This prevents local physics from becoming an expensive global data stream.
A practical implementation should therefore specify two contracts. The upward contract defines what a lower layer reports: event type, timestamp, confidence, local state summary and whether the reflex action succeeded. The downward contract defines what a higher layer may change: thresholds, permissions, rule-map entries, timing windows and safe defaults. These contracts make the architecture more than a metaphor and create a natural place for formal verification, logging and certification evidence.
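One possible shape for the two contracts is sketched below. The field lists follow the paragraph above, but every type name (`UpwardEvent`, `ReflexConfig`, `Envelope`, `ReflexModule`), the event label, the fixed confidence value and the idea of checking updates against a fixed envelope object are assumptions made for illustration, not a specification from this paper.

```python
from dataclasses import dataclass
from typing import FrozenSet, Mapping, Optional, Tuple


@dataclass(frozen=True)
class UpwardEvent:
    """Upward contract: what a lower layer reports."""
    event_type: str
    timestamp_us: int
    confidence: float
    state_summary: Mapping[str, float]
    action_succeeded: bool


@dataclass(frozen=True)
class ReflexConfig:
    """Downward contract: what a higher layer may change."""
    threshold: float
    timing_window_us: int
    permitted_actions: FrozenSet[str]
    safe_default: str


@dataclass(frozen=True)
class Envelope:
    """Hypothetical fixed safety envelope that no update may leave."""
    threshold_range: Tuple[float, float]
    max_window_us: int
    allowed_actions: FrozenSet[str]

    def admits(self, cfg: ReflexConfig) -> bool:
        lo, hi = self.threshold_range
        return (
            lo <= cfg.threshold <= hi
            and 0 < cfg.timing_window_us <= self.max_window_us
            and cfg.permitted_actions <= self.allowed_actions
            and cfg.safe_default in cfg.permitted_actions
        )


class ReflexModule:
    def __init__(self, envelope: Envelope, initial: ReflexConfig) -> None:
        if not envelope.admits(initial):
            raise ValueError("initial configuration violates the envelope")
        self.envelope = envelope
        self.config = initial

    def apply_update(self, new: ReflexConfig) -> bool:
        """Accept an update only if it stays inside the envelope."""
        if not self.envelope.admits(new):
            return False  # the previous configuration is retained
        self.config = new
        return True

    def on_sample(self, value: float, timestamp_us: int) -> Optional[UpwardEvent]:
        """Report a sparse event only when the threshold is crossed."""
        if value <= self.config.threshold:
            return None
        return UpwardEvent(
            event_type="threshold_exceeded",
            timestamp_us=timestamp_us,
            confidence=1.0,  # assumption: a binary comparator reports full confidence
            state_summary={"value": value, "threshold": self.config.threshold},
            action_succeeded=True,  # assumption: the local action is taken as successful
        )


envelope = Envelope((0.0, 10.0), 1000, frozenset({"inhibit", "bypass", "wake"}))
module = ReflexModule(envelope, ReflexConfig(5.0, 100, frozenset({"inhibit"}), "inhibit"))
accepted = module.apply_update(ReflexConfig(8.0, 200, frozenset({"inhibit", "wake"}), "inhibit"))
rejected = module.apply_update(ReflexConfig(50.0, 200, frozenset({"inhibit"}), "inhibit"))
print(accepted, rejected, module.config.threshold)
```

In this sketch a rejected update leaves the last accepted configuration in force, which is one simple way to express rollback. The immutable dataclasses also give logging and verification tools a fixed record format to work with.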
| Failure condition | Expected behavior | Minimum safe layer |
|---|---|---|
| Layer 5 unavailable | Layer 4 continues local policy; Layers 1-3 remain active | Layer 1/2 for safety |
| Layer 4 overloaded | Layer 3 deterministic routines continue; urgent events remain local | Layer 1/2 |
| Layer 3 firmware fault | Analog and physical reflexes hold safe states or inhibit actuation | Layer 1/2 |
| Sensor mismatch | Policy detects repeated false negatives/positives and updates sensing or thresholds | Layer 4/5 for diagnosis |
| Rule-map update error | Formal constraints and rollback prevent unsafe reflex permissions | Layer 1 safe default |
Algorithmic design rule: assign a task to Layer k only if no lower layer (1 to k-1) can meet the required combination of latency, safety, precision, adaptability and EROIE. This is a design heuristic, not a theorem. Formal verification of cross-layer updates remains an open research problem.
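The design rule can be sketched as a search from the lowest layer upward. The numbers below are illustrative picks inside the order-of-magnitude envelopes of the layer table, not measured values. The capability labels, the `Layer` and `Task` types and the reduction of "safety, precision, adaptability" to a set-inclusion check are simplifying assumptions.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Sequence

# Assumed cumulative capability labels, from Layer 1 up to Layer 5.
_CAPS = ("threshold", "multi_signal", "routine", "adaptation", "fleet_learning")


@dataclass(frozen=True)
class Layer:
    level: int
    worst_latency_s: float
    energy_per_decision_j: float
    capabilities: FrozenSet[str]


@dataclass(frozen=True)
class Task:
    name: str
    deadline_s: float
    required: FrozenSet[str]
    useful_energy_j: float
    min_eroie: float = 1.0


# Illustrative latency and energy picks for Layers 1 to 5.
_ENVELOPES = ((1e-6, 1e-10), (5e-6, 1e-9), (1e-3, 1e-6), (1.0, 1e-3), (60.0, 1.0))
LAYERS = tuple(
    Layer(k, lat, energy, frozenset(_CAPS[:k]))
    for k, (lat, energy) in enumerate(_ENVELOPES, start=1)
)


def is_sufficient(layer: Layer, task: Task) -> bool:
    """Check latency, capability and event-level EROIE for one layer."""
    return (
        layer.worst_latency_s <= task.deadline_s
        and task.required <= layer.capabilities
        and task.useful_energy_j / layer.energy_per_decision_j >= task.min_eroie
    )


def lowest_sufficient_layer(task: Task, layers: Sequence[Layer] = LAYERS) -> Optional[int]:
    """Return the lowest layer that can handle the task, or None if none can."""
    for layer in sorted(layers, key=lambda item: item.level):
        if is_sufficient(layer, task):
            return layer.level
    return None


tasks = [
    Task("overcurrent", 2e-6, frozenset({"threshold"}), 1e-3),
    Task("tilt correction", 1e-4, frozenset({"multi_signal"}), 1e-3),
    Task("adaptive MPPT", 2.0, frozenset({"adaptation"}), 1.0),
    Task("adaptive but urgent", 1e-6, frozenset({"adaptation"}), 1.0),
]
for task in tasks:
    print(task.name, "->", lowest_sufficient_layer(task))
```

A result of `None` is informative in its own right: it signals that no layer meets the stated deadline and capability together, so the task must be split (for example a Layer 1 protective action plus a Layer 4 adaptation) or its requirements revisited.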

5. Spintronics as a Candidate Reflex Substrate

5.1. Binary Availability and the Architecture Gap

The spintronic argument should be stated carefully. Binary magnetic memory technology is not imaginary: discrete MRAM, embedded MRAM and foundry routes are commercially described by vendors and foundries such as Everspin, GlobalFoundries, Samsung Foundry and Avalanche/Renesas [40,41,42,43]. These products demonstrate the industrial relevance of non-volatile binary MTJ-based technology, but they do not yet provide the Physical AI reflex-layer crossbar module proposed here.
This distinction corrects a common TRL misunderstanding. It would be too conservative to call the entire spintronic basis TRL 2-3, because MRAM and eMRAM are commercial or foundry-supported in specific memory roles. It would also be too optimistic to assign the proposed reflex architecture the same maturity, because a certified, sensor-coupled binary crossbar reflex module still has to be designed and demonstrated.
The two-level conclusion is that MTJ/MRAM cells and macros are relatively mature in memory applications, while the reflex-layer crossbar architecture is lower TRL because it requires sensing interfaces, rule-map programming, comparator readout, safety constraints and certification. Device availability makes the architecture plausible and worth prototyping; it does not make the module already available.
| Technology element | Current maturity | Relevance to this paper | Caution |
|---|---|---|---|
| Discrete MRAM / STT-MRAM | Commercial products | Shows non-volatile binary magnetic memory availability | Memory product, not reflex module |
| Embedded MRAM / foundry macros | Foundry-level offering in selected nodes | Supports integration with CMOS control logic | Access depends on node and design rules |
| Binary MTJ crossbar with comparator readout | Research/prototype target | Candidate Layer 1 rule fabric | Needs reflex-specific validation |
| Safety-certified reflex module | Future system product | Potential robotics/energy/IoT component | Requires verification and qualification |

5.2. Decision-Power Path Isolation

The most important spintronic system principle is Decision-Power Path Isolation. The crossbar should not carry the main power flow. It receives sensor flags and emits small-signal commands. The power path remains in MOSFETs, converters, batteries, motors, bypass switches or injection stages. This is analogous to a pilot valve or railway switch: a small signal selects a route; the main energy flows elsewhere. This preserves the low-energy value of the reflex layer and makes the architecture compatible with existing MPPT, BMS and motor-driver supervisors.
Figure 10. Detailed binary MTJ crossbar for Layer 1. The crossbar stores rules and produces comparator-level decisions; the main power path is outside the array.
This principle is crucial for PV, BMS and robotics examples. In a PV string, the crossbar should not conduct the harvested current; it selects bypass or wake-up commands. In a battery pack, it should not become the balancing converter; it selects safe energy-routing channels under BMS permission. In a robot, it should not drive the motor phase current; it commands a gate, interlock or safe-state response. This separation keeps the reflex module small, fast and low energy while allowing conventional power electronics to do what they already do well.

5.3. Why Binary, Not Primarily Multilevel AI Acceleration

Many spintronic and resistive crossbar papers emphasize analog or multilevel in-memory computing for neural-network acceleration [44,45,46,47,48,49,50,51,52,53,54]. That is important, but it is not the main target here. Reflex-layer decisions are often naturally binary or low-resolution: allow/block, bypass/connect, wake/sleep, inhibit/enable, safe/unsafe. Binary operation reduces ADC burden, simplifies verification, improves interpretability of rule maps and aligns better with safety interlocks.
Binary operation is also attractive from a safety argument. A binary rule can be inspected as a permission matrix or interlock table. It can be tested exhaustively for a bounded number of input flags. It can default to a safe state if a contradiction or unavailable sensor is detected. Multilevel analog inference may be powerful for policy acceleration, but safety reflexes often benefit from simpler observability and clearer fault semantics.
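The inspectability argument can be made concrete with a small software model of a binary rule map. The three flag names, the single "enable" entry and the "inhibit" safe default are hypothetical; the point is only that a bounded set of binary inputs allows every combination to be enumerated and checked.

```python
from itertools import product

# Hypothetical input flags of a Layer 1 rule map.
FLAGS = ("overcurrent", "overtemp", "bms_permit")
SAFE_ACTION = "inhibit"

# Permission table: any combination not listed falls back to the safe action.
RULE_MAP = {
    # (overcurrent, overtemp, bms_permit) -> action
    (False, False, True): "enable",
}


def reflex_action(flags: dict) -> str:
    """Map flag values (True, False, or None if unavailable) to an action."""
    key = tuple(flags.get(name) for name in FLAGS)
    if any(value is None for value in key):
        return SAFE_ACTION  # unavailable sensor: default to the safe state
    return RULE_MAP.get(key, SAFE_ACTION)


def unsafe_enables() -> list:
    """Enumerate all 2**n combinations and collect hazardous 'enable' outputs."""
    violations = []
    for combo in product((False, True), repeat=len(FLAGS)):
        overcurrent, overtemp, bms_permit = combo
        hazard = overcurrent or overtemp or not bms_permit
        if reflex_action(dict(zip(FLAGS, combo))) == "enable" and hazard:
            violations.append(combo)
    return violations


print("combinations checked:", 2 ** len(FLAGS))
print("unsafe enables:", unsafe_enables())
print("missing sensor ->", reflex_action({"overcurrent": False, "bms_permit": True}))
```

With n binary flags the check covers 2^n cases, which stays tractable for the small flag counts expected at Layer 1. The same enumeration could be rerun against every proposed rule-map update before it is accepted.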

5.4. Alternatives and Honest Comparison

Binary MTJ crossbars are not the only possible substrate. CMOS comparators are mature and should remain the baseline for many Layer 1 tasks. RRAM, FeFET, SRAM LUTs, FPGAs and analog ASICs may be superior in different trade spaces. The case for spintronics is strongest when non-volatility, instant-on behavior, low standby power, radiation or temperature tolerance, and compact rule storage matter.
The correct substrate varies by application. A CMOS comparator may be sufficient in cost-sensitive devices; non-volatility and magnetic robustness may matter more in space, harvesting, wearable or high-reliability systems. The perspective is therefore substrate-aware, not substrate-exclusive: binary spintronics is a strong candidate where its properties match the reflex function.
| Substrate | Strengths | Weaknesses | Best role in this framework |
|---|---|---|---|
| CMOS comparators | Mature, cheap, certifiable | Limited rule density; volatile configuration unless supported | Simple Layer 1 thresholds |
| SRAM / LUT / FPGA | Fast, programmable | Volatile, leakage, configuration overhead | Layer 3 deterministic reflexes |
| RRAM / memristor crossbar | Dense, in-memory compute | Variability, endurance and analog readout challenges | Research alternative for Layers 1-4 |
| FeFET arrays | CMOS-friendly, non-volatile | Endurance and process maturity vary | Alternative non-volatile rule memory |
| Binary MTJ / MRAM crossbar | Non-volatile, fast read, high endurance potential, magnetic robustness | Reflex crossbar not yet productized; design/safety validation needed | Layer 1 rule maps and configuration storage |

6. Application Windows

6.1. PV Harvesting and Battery Balancing

PV harvesting is a useful example because the source is distributed, dynamic and locally perturbed. Classical MPPT remains essential, but it often observes aggregate voltage and current after a local substring, shadow or hot-spot condition has already changed the source. A PV-side reflex module could react to local voltage, current, temperature or shading flags and command bypass, reconnection, inhibition or MPPT wake-up before the aggregate controller completes a search. The module does not replace MPPT; it improves the source that MPPT sees [54,55,56,57].
The same argument applies on the battery side. Classical BMS logic is indispensable for safety and must remain the authority for charge, discharge and fault permissions. The proposed reflex module does not bypass this authority. It acts inside a permission envelope, selecting a local route only when the BMS has allowed it and local flags confirm that voltage, temperature and fault conditions are safe. This is the mutual-protection principle in energy form: the BMS protects the reflex from unsafe routing; the reflex protects the BMS from high-rate local routing decisions.
Balancing shows the trade-off concretely. Passive balancing is robust but dissipates useful energy as heat; active balancing reduces heat but adds switches, converters and control complexity [58,63]. A reflex module can act as a route selector under BMS permission: available PV or auxiliary energy is directed to the weakest safe cluster, unsafe channels are blocked, and only sparse events are escalated. The EROIE question is whether the additional sensing and decision overhead is lower than the extra useful energy captured or preserved.
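The route-selection behavior under BMS permission can be sketched as below. This is a behavioral model only: the `ClusterFlags` type, the use of cluster voltage as the measure of "weakest", and the 4.1 V ceiling are assumptions for illustration. In hardware the voltage comparison would be replaced by comparator flags.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ClusterFlags:
    """Local status of one battery cluster (hypothetical fields)."""
    cluster_id: int
    voltage_v: float
    temp_ok: bool
    fault: bool
    bms_permit: bool  # permission granted by the BMS, never by the reflex


def select_route(clusters: Sequence[ClusterFlags], v_max: float) -> Optional[int]:
    """Return the weakest cluster that is both permitted and locally safe."""
    safe = [
        c for c in clusters
        if c.bms_permit and c.temp_ok and not c.fault and c.voltage_v < v_max
    ]
    if not safe:
        return None  # every channel blocked: route nothing, escalate an event
    return min(safe, key=lambda c: c.voltage_v).cluster_id


pack = [
    ClusterFlags(0, 3.60, temp_ok=True, fault=False, bms_permit=True),
    ClusterFlags(1, 3.45, temp_ok=False, fault=False, bms_permit=True),   # too hot
    ClusterFlags(2, 3.50, temp_ok=True, fault=False, bms_permit=True),
    ClusterFlags(3, 3.40, temp_ok=True, fault=False, bms_permit=False),   # not permitted
]
print(select_route(pack, v_max=4.1))
```

Clusters 1 and 3 have lower voltages than cluster 2, but one fails the local temperature flag and the other lacks BMS permission, so the selector picks cluster 2. The reflex never overrides the BMS; it only chooses among routes the BMS has already allowed.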

6.2. Humanoid and Mobile Robotics

In robotics, Layer 1 may handle overcurrent, impact and joint-limit protection; Layer 2 may handle analog tilt or contact reflexes; Layer 3 may execute deterministic balance and recovery profiles; Layer 4 selects gait or action mode; Layer 5 updates policies from fleet experience. The value is not that lower layers are intelligent in the human sense, but that they keep high-rate safety events out of expensive policy loops. Compared with a monolithic controller, the perspective predicts lower average decision energy and stronger temporal isolation, but this must be validated experimentally.
A useful test case would be a mobile robot joint exposed to repeated impacts. A monolithic controller might digitize high-rate current and torque signals, process them through a shared real-time loop and then command a response. A reflex-policy controller would let Layer 1 clamp or inhibit immediately, let Layer 3 execute a recovery profile, and let Layer 4/5 analyze repeated patterns afterward. The expected gains would be lower worst-case latency, reduced upstream bandwidth and fewer policy interrupts. Those gains should be measured, not assumed.

6.3. Energy-Harvesting IoT

In harvesting-powered IoT, the strongest EROIE constraint appears when sensing, wake-up and communication cost are comparable to harvested energy. A Layer 1 voltage comparator can wake a radio only when an ultracapacitor has enough energy. Layer 2 can alter duty cycle based on vibration amplitude or light level. Layer 3 packages data. Layer 4 classifies anomalies only when needed. Layer 5 aggregates long-term health information. Here the reflex-policy architecture is mainly a strategy for avoiding wasted wake cycles and unnecessary data movement.
The IoT example also shows why EROIE is not identical to low-power electronics. A node may contain a very efficient microcontroller and still be system-negative if it wakes too often, samples too much, transmits too much, or spends harvested energy discovering that no useful event occurred. A reflex layer can improve net value by deciding when not to wake the rest of the system. In that sense, inhibition is as important as activation.
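The role of inhibition can be made concrete with a simple wake-gating sketch. All thresholds, costs and sample values below are hypothetical, and the ungated baseline is simplified: it assumes every wake succeeds, whereas a real node with too little stored energy would brown out.

```python
from typing import Sequence, Tuple

WAKE_COST_MJ = 5.0   # hypothetical cost of one wake + sample + transmit cycle
MIN_STORED_MJ = 8.0  # hypothetical Layer 1 comparator threshold on stored energy
MIN_AMPLITUDE = 0.5  # hypothetical Layer 2 activity threshold

Sample = Tuple[float, float]  # (stored energy in mJ, normalized signal amplitude)


def should_wake(stored_mj: float, amplitude: float) -> bool:
    """Wake only if there is enough energy AND something worth reporting."""
    return stored_mj >= MIN_STORED_MJ and amplitude >= MIN_AMPLITUDE


def energy_spent_mj(samples: Sequence[Sample], gated: bool) -> float:
    """Total wake energy with the reflex gate enabled or disabled."""
    wakes = sum(
        1 for stored, amp in samples
        if not gated or should_wake(stored, amp)
    )
    return wakes * WAKE_COST_MJ


samples = [(10.0, 0.1), (9.0, 0.9), (4.0, 0.8), (12.0, 0.0), (11.0, 0.7)]
print(energy_spent_mj(samples, gated=False))  # prints 25.0
print(energy_spent_mj(samples, gated=True))   # prints 10.0
```

Three of the five wake cycles are suppressed in this toy sequence: two because nothing useful was happening and one because the stored energy was insufficient. Whether such savings outweigh the comparator's own standing cost is exactly the EROIE question.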

7. Comparison with Existing Architectures

The proposed framework overlaps with prior architectures but emphasizes a different target: energy and ADC partitioning along the physical chain. The table below is qualitative, not a benchmark. It summarizes what each architecture already provides, what it lacks for this paper’s goal and how it relates to the proposed framework, with attention to locality of actuation, explicit energy/ADC partitioning, interface structure, hardware-reflex suitability and validation maturity.
The comparison should be read cautiously. Brooks subsumption was not designed as an energy-harvesting or ADC-budget framework; ROS 2 is middleware, not a physical reflex fabric [59]; and mixed-criticality systems provide scheduling and assurance concepts rather than an energy-path allocation rule [60]. These approaches are complementary precedents.
Architecture | What it already provides | What is missing for this paper’s goal | Relation to proposed framework
--- | --- | --- | ---
Brooks subsumption | Layered behavior and robust local control [7,8] | Not explicitly ADC/energy/EROIE driven | Historical precedent for layered competence
Hierarchical RL | Temporal abstraction and sub-policy learning [9] | Often compute-centric and not hardware-local | Useful for policy-to-reflex training
ROS 2 / micro-ROS | Modular middleware and embedded integration [59] | Middleware does not define physical reflex rule fabrics | Complementary system software
Mixed-criticality systems | Temporal isolation and criticality-aware scheduling [60] | Mostly software/scheduling boundary, not energy-return metric | Useful verification precedent
Neuromorphic / SNN | Event-driven low-energy computation [16,17,18] | Does not by itself assign events to physical energy paths | Candidate implementation style
Physical neural networks | Computation in physical substrates [19,20] | Usually focuses on learned inference, not safety reflex rules | Adjacent hardware paradigm
Proposed reflex-policy | ADC-light/ADC-heavy partition, mutual protection, EROIE | Needs prototype and formal validation | Perspective and research roadmap

7.1. Tightened Novelty Claims

The novelty should be bounded at four levels, which are distinct from the five architectural layers. Known: hierarchical control, biological reflex analogies, edge/cloud decomposition, neuromorphic efficiency and mixed-criticality scheduling are established. Synthesized here: an explicit Reflex-Policy vocabulary tied to ADC-light versus ADC-heavy partitioning. Proposed as an engineering principle: assigning each physical event to the lowest sufficient layer along the sensing-energy-action chain, with upward event compression and downward rule configuration. Hypothesized but not yet proven: binary MTJ crossbars may be especially suitable as non-volatile Layer 1 rule fabrics for selected Physical AI systems.
This bounded novelty statement is intentionally more modest than saying that the framework is the first Physical AI architecture. It is not. The sharper claim is that Physical AI needs a named architectural layer for low-resolution physical reflex modules, a measurable energy-return metric for scale-down, and a technology roadmap that treats binary non-volatile rule fabrics differently from high-precision AI accelerators. This is a narrower claim, but it is more defensible.

7.2. Validation Roadmap

A credible research program should now move from perspective to evidence. Priority experiments are: (i) a Layer 1 reflex demonstrator comparing CMOS comparator, SRAM/LUT and binary MTJ-like rule maps; (ii) a PV or BMS simulation with EROIE accounting; (iii) a robotic safety-loop benchmark measuring latency, energy, event compression and failure behavior; and (iv) formal constraints for rule-map updates so that policy cannot accidentally disable safety reflexes.
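Item (iv) can be illustrated with a minimal update guard. This is a sketch under stated assumptions, not a formal verification method: the rule names, the `PROTECTED` set and the `apply_update` function are hypothetical, and a real design would enforce the constraint in hardware or with verified tooling.

```python
from types import MappingProxyType
from typing import Dict, Mapping

# Hypothetical safety reflexes that no policy update may alter or remove.
# MappingProxyType gives a read-only view, so the set cannot be edited at runtime.
PROTECTED = MappingProxyType({
    "overcurrent": "inhibit",
    "overtemperature": "inhibit",
})


def apply_update(rule_map: Mapping[str, str], update: Mapping[str, str]) -> Dict[str, str]:
    """Return a new rule map, or raise if the update touches a protected reflex."""
    for key, action in update.items():
        if key in PROTECTED and action != PROTECTED[key]:
            raise PermissionError(f"policy may not change protected reflex {key!r}")
    merged = {**rule_map, **update}
    merged.update(PROTECTED)  # re-assert protected entries so none can be missing
    return merged


rules = {"overcurrent": "inhibit", "low_light": "idle"}
rules = apply_update(rules, {"low_light": "sleep"})  # allowed: non-safety rule

try:
    apply_update(rules, {"overcurrent": "allow"})    # rejected: safety reflex
    rejected = False
except PermissionError:
    rejected = True
print(rules["low_light"], rejected)  # prints "sleep True"
```

The guard returns a new map instead of mutating the old one, so a rejected update leaves the active rule map untouched, and protected entries are restored even if the incoming map omitted them.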
The first validation should not attempt to prove the entire five-layer hierarchy at once. The recommended flagship path is a constrained demonstrator such as PV substring bypass, because it naturally exposes local sensing, event latency, bypass/routing decisions and EROIE accounting. A parallel robotic overcurrent demonstrator would be equally valuable for safety timing. In either case, three variants should be compared: centralized software, CMOS reflex and non-volatile crossbar-like reflex. The measured outputs should include event latency, energy per decision, useful energy recovered, number of upward events, false positives, false negatives and fail-safe behavior after higher-layer loss.
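One way to keep the three variants comparable is to record every trial in the same structure and derive the listed outputs from it. The sketch below assumes a hypothetical `Trial` record and uses synthetic placeholder values only to exercise the code; they are not measurements of any variant.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, Optional, Sequence


@dataclass(frozen=True)
class Trial:
    """One stimulus presented to a demonstrator (field names are illustrative)."""
    event_present: bool        # ground truth: did a real event occur?
    reflex_fired: bool         # did the reflex act?
    latency_us: float          # event-to-action latency, meaningful only if fired
    decision_energy_nj: float  # energy spent on this decision
    escalated: bool            # was an upward event sent to higher layers?


def summarize(trials: Sequence[Trial]) -> Dict[str, Optional[float]]:
    """Aggregate the outputs named in the text for one variant."""
    fired = [t for t in trials if t.reflex_fired]
    return {
        "false_positives": sum(t.reflex_fired and not t.event_present for t in trials),
        "false_negatives": sum(t.event_present and not t.reflex_fired for t in trials),
        "upward_events": sum(t.escalated for t in trials),
        "mean_latency_us": mean(t.latency_us for t in fired) if fired else None,
        "energy_per_decision_nj": mean(t.decision_energy_nj for t in trials),
    }


# Synthetic placeholder trials, not experimental data.
trials = [
    Trial(True, True, 2.0, 1.0, True),
    Trial(True, False, 0.0, 1.0, False),   # missed event: false negative
    Trial(False, True, 3.0, 1.0, False),   # spurious action: false positive
    Trial(False, False, 0.0, 1.0, False),
]
summary = summarize(trials)
print(summary["false_positives"], summary["false_negatives"], summary["mean_latency_us"])
```

Useful energy recovered and fail-safe behavior after higher-layer loss are not captured by this record; they would need separate system-level measurements with explicitly reported boundaries.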

7.3. Limitations

The framework has important limitations. The current paper does not prove optimal layer allocation. It does not present measured MTJ reflex hardware. Biological analogy motivates but does not validate the engineering design. EROIE depends strongly on system boundaries and must be reported transparently. Cross-layer verification may also become combinatorially difficult as sensors, rule-map entries and update permissions grow. Finally, safety certification may favor simpler CMOS or mechanical reflexes over spintronics in some applications. These limitations do not weaken the perspective; they define the next validation steps.
Another limitation concerns sensor adequacy. As the fly example suggests, a reflex cannot respond to what it cannot sense. A poorly placed sensor, a slow transducer, a missing airflow cue, or a misleading aggregate measurement can make a local reflex ineffective regardless of how elegant the rule fabric is. Therefore, the architecture must be evaluated as a sensing-decision-actuation module, not only as a computing array.

8. Conclusion

Physical AI should not only compute policies for bodies. It should distribute intelligence along the physical energy and action chain. Reflex layers and policy layers collaborate: policy learns and constrains; reflex executes and protects. The practical value is avoiding unnecessary ADC conversion, communication, wake-up, balancing heat and centralized decision load.
Binary spintronic crossbars are positioned as a promising Layer 1 substrate because MTJ/MRAM technology exists in commercial and foundry contexts, while non-volatile local rule maps match many reflex decisions. The paper does not claim that a reflex crossbar is already mature; it argues that the gap between available binary spintronic memory and application-specific reflex modules is a meaningful research opportunity.
The proposed EROIE formalization extends energy proportionality into embodied system integration. For scaled-down energy-harvesting robots, PV-BMS nodes and IoT devices, intelligence is useful only if recovered, protected or operational energy exceeds the energy spent sensing, deciding, communicating and switching. That is the central perspective of this manuscript.

Acknowledgments

This work was supported by the HORIZON-JU-Chips-2025-1-IA project NeAIxt (grant no. 101194172).

Conflicts of Interest

The authors are employed by the company IFEVS and declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acronym Table

Acronym | Meaning | Use in this paper
--- | --- | ---
ADC | Analog-to-Digital Converter | Conversion avoided or minimized in low reflex layers
AI | Artificial Intelligence | General field
ANN | Artificial Neural Network | Conventional neural-network model
BEOL | Back-End-of-Line | CMOS integration level relevant to MTJs
BMS | Battery Management System | Battery supervision and safety
CMOS | Complementary Metal-Oxide-Semiconductor | Mainstream semiconductor technology
CPG | Central Pattern Generator | Biological rhythmic-control circuit
CPLD | Complex Programmable Logic Device | Deterministic digital reflex implementation
DSP | Digital Signal Processor | Embedded signal processing
EROIE | Energy Returned on Invested Energy for Embodied Intelligence | Scale-down metric proposed in this paper
FPGA | Field-Programmable Gate Array | Deterministic programmable control
HRL | Hierarchical Reinforcement Learning | Policy decomposition baseline
IoT | Internet of Things | Distributed sensing domain
MCU | Microcontroller Unit | Embedded controller
MPPT | Maximum Power Point Tracking | PV energy-harvesting control
MRAM | Magnetoresistive Random-Access Memory | Non-volatile spintronic memory
MTJ | Magnetic Tunnel Junction | Core binary spintronic device
NPU | Neural Processing Unit | Edge AI accelerator
PID | Proportional-Integral-Derivative | Classical control method
PNN | Physical Neural Network | Computation in physical substrates
PV | Photovoltaic | Solar energy harvesting
RL | Reinforcement Learning | Policy optimization
SNN | Spiking Neural Network | Event-driven neuromorphic model
TRL | Technology Readiness Level | Maturity framing for devices and systems
VLA | Vision-Language-Action | Embodied AI model family

References

  1. Sitti, M. Physical intelligence as a new paradigm. Extreme Mechanics Letters 46, 101340 (2021).
  2. Ray, P. P. Physical AI: Bridging the sim-to-real divide toward embodied, ethical, and autonomous intelligence. Machine Learning for Computational Science and Engineering 2, 1 (2026).
  3. Brohan, A. et al. RT-2: Vision-language-action models transfer web knowledge to robotic control. arXiv:2307.15818 (2023).
  4. Kim, M. J. et al. OpenVLA: An open-source vision-language-action model. arXiv:2406.09246 (2024).
  5. Ha, D. and Schmidhuber, J. World models. arXiv:1803.10122 (2018).
  6. Hasani, R., Lechner, M., Amini, A., Rus, D. and Grosu, R. Liquid time-constant networks. Proceedings of the AAAI Conference on Artificial Intelligence 35(9), 7657-7666 (2021).
  7. Brooks, R. A. A robust layered control system for a mobile robot. IEEE Journal on Robotics and Automation 2(1), 14-23 (1986).
  8. Brooks, R. A. Intelligence without representation. Artificial Intelligence 47(1-3), 139-159 (1991).
  9. Botvinick, M. M. Hierarchical models of behavior and prefrontal function. Trends in Cognitive Sciences 12(5), 201-208 (2008).
  10. Prescott, T. J. Forced moves or good tricks in design space? Adaptive Behavior 15(1), 9-31 (2007).
  11. Zhang, H., Solak, G. and Ajoudani, A. Bresa: Bio-inspired reflexive safe reinforcement learning for contact-rich robotic tasks. arXiv:2503.21989 (2025).
  12. Wulf, W. A. and McKee, S. A. Hitting the memory wall: Implications of the obvious. ACM SIGARCH Computer Architecture News 23(1), 20-24 (1995).
  13. Barroso, L. A. and Hoelzle, U. The case for energy-proportional computing. Computer 40(12), 33-37 (2007).
  14. Sze, V., Chen, Y.-H., Yang, T.-J. and Emer, J. S. Efficient processing of deep neural networks: A tutorial and survey. Proceedings of the IEEE 105(12), 2295-2329 (2017).
  15. Horowitz, M. Computing’s energy problem (and what we can do about it). IEEE International Solid-State Circuits Conference Digest, 10-14 (2014).
  16. Mead, C. Neuromorphic electronic systems. Proceedings of the IEEE 78(10), 1629-1636 (1990).
  17. Maass, W. Networks of spiking neurons: The third generation of neural network models. Neural Networks 10(9), 1659-1671 (1997).
  18. Roy, K., Jaiswal, A. and Panda, P. Towards spike-based machine intelligence with neuromorphic computing. Nature 575, 607-617 (2019).
  19. Momeni, A. et al. Training of physical neural networks. Nature 645, 53-61 (2025).
  20. Wright, L. G. et al. Deep physical neural networks trained with backpropagation. Nature 601, 549-555 (2022).
  21. Kandel, E. R. et al. Principles of Neural Science, 6th ed. McGraw-Hill (2021).
  22. Purves, D. et al. Neuroscience, 6th ed. Oxford University Press (2018).
  23. Sherrington, C. S. The Integrative Action of the Nervous System. Yale University Press (1906).
  24. Jackson, J. H. The Croonian lectures on evolution and dissolution of the nervous system. British Medical Journal (1884).
  25. York, G. K. and Steinberg, D. A. Hughlings Jackson’s neurological ideas. Brain 134(10), 3106-3113 (2011).
  26. Grillner, S. Biological pattern generation: The cellular and computational logic of networks in motion. Neuron 52(5), 751-766 (2006).
  27. Grillner, S. and Robertson, B. The basal ganglia over 500 million years. Current Biology 26(20), R1088-R1100 (2016).
  28. Marder, E. and Bucher, D. Central pattern generators and the control of rhythmic movements. Current Biology 11(23), R986-R996 (2001).
  29. Angelaki, D. E. and Cullen, K. E. Vestibular system: The many facets of a multimodal sense. Annual Review of Neuroscience 31, 125-150 (2008).
  30. Cullen, K. E. The vestibular system: Multimodal integration and encoding of self-motion for motor control. Journal of Neurophysiology 107(3), 727-738 (2012).
  31. Ito, M. Cerebellar circuitry as a neuronal machine. Progress in Neurobiology 78(3-5), 272-303 (2006).
  32. Wolpert, D. M., Miall, R. C. and Kawato, M. Internal models in the cerebellum. Trends in Cognitive Sciences 2(9), 338-347 (1998).
  33. Albus, J. S. A theory of cerebellar function. Mathematical Biosciences 10(1-2), 25-61 (1971).
  34. Marr, D. A theory of cerebellar cortex. Journal of Physiology 202(2), 437-470 (1969).
  35. Doya, K. Complementary roles of basal ganglia and cerebellum in learning and motor control. Current Opinion in Neurobiology 10(6), 732-739 (2000).
  36. Miller, E. K. and Cohen, J. D. An integrative theory of prefrontal cortex function. Annual Review of Neuroscience 24, 167-202 (2001).
  37. Card, G. and Dickinson, M. H. Visually mediated motor planning in the escape response of Drosophila. Current Biology 18(17), 1300-1307 (2008).
  38. Morimoto, M. M. et al. Spatial readout of visual looming in the central brain of Drosophila. eLife 9, e57685 (2020).
  39. Dorkenwald, S. et al. Neuronal wiring diagram of an adult brain. Nature 634, 124-138 (2024).
  40. Everspin Technologies. MRAM product information. Retrieved May 2026 from everspin.com.
  41. GlobalFoundries. Embedded memory and MRAM technology information. Retrieved May 2026 from gf.com.
  42. Samsung Foundry. Specialty technology: eMRAM. Retrieved May 2026 from semiconductor.samsung.com.
  43. Renesas Electronics. IDT offers Avalanche Technology’s MRAM devices. News release (2019). Retrieved May 2026 from renesas.com.
  44. Apalkov, D., Dieny, B. and Slaughter, J. M. Magnetoresistive random access memory. Proceedings of the IEEE 104(10), 1796-1830 (2016).
  45. Bhatti, S. et al. Spintronics based random access memory: A review. Materials Today 20(9), 530-548 (2017).
  46. Grollier, J. et al. Neuromorphic spintronics. Nature Electronics 3, 360-370 (2020).
  47. Torrejon, J. et al. Neuromorphic computing with nanoscale spintronic oscillators. Nature 547, 428-431 (2017).
  48. Borders, W. A. et al. Integer factorization using stochastic magnetic tunnel junctions. Nature 573, 390-393 (2019).
  49. Manipatruni, S. et al. Scalable energy-efficient magnetoelectric spin-orbit logic. Nature 565, 35-42 (2019).
  50. Sebastian, A., Le Gallo, M., Khaddam-Aljameh, R. and Eleftheriou, E. Memory devices and applications for in-memory computing. Nature Nanotechnology 15, 529-544 (2020).
  51. Ielmini, D. and Wong, H.-S. P. In-memory computing with resistive switching devices. Nature Electronics 1, 333-343 (2018).
  52. Burr, G. W. et al. Neuromorphic computing using non-volatile memory. Advances in Physics: X 2(1), 89-124 (2017).
  53. Gokmen, T. and Vlasov, Y. Acceleration of deep neural network training with resistive cross-point devices. Frontiers in Neuroscience 10, 333 (2016).
  54. Esram, T. and Chapman, P. L. Comparison of photovoltaic array maximum power point tracking techniques. IEEE Transactions on Energy Conversion 22(2), 439-449 (2007).
  55. Patel, H. and Agarwal, V. Maximum power point tracking scheme for PV systems operating under partially shaded conditions. IEEE Transactions on Industrial Electronics 55(4), 1689-1698 (2008).
  56. Silvestre, S., Boronat, A. and Chouder, A. Study of bypass diodes configuration on PV modules. Applied Energy 86(9), 1632-1640 (2009).
  57. Daliento, S., Mele, L. and Spirito, P. Analysis and modeling of hot spot phenomena in photovoltaic modules. IEEE Transactions on Electron Devices 59(3), 727-734 (2012).
  58. Cao, J., Schofield, N. and Emadi, A. Battery balancing methods: A comprehensive review. IEEE Vehicle Power and Propulsion Conference (2008).
  59. Macenski, S., Foote, T., Gerkey, B., Lalancette, C. and Woodall, W. Robot Operating System 2: Design, architecture, and uses in the wild. Science Robotics 7(66), eabm6074 (2022).
  60. Burns, A. and Davis, R. I. A survey of research into mixed criticality systems. ACM Computing Surveys 50(6), Article 82 (2017).
  61. Murphy, D. J. and Hall, C. A. S. Year in review - EROI or energy return on energy invested. Annals of the New York Academy of Sciences 1185, 102-118 (2010).
  62. Hall, C. A. S., Lambert, J. G. and Balogh, S. B. EROI of different fuels and the implications for society. Energy Policy 64, 141-152 (2014).
  63. Safari, A., Sorouri, H., Oshnoei, A. and Blaabjerg, F. A state-of-the-art review on battery cell balancing strategies. Discover Energy 5, 31 (2025).
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.