Preprint · Review · This version is not peer-reviewed.

Neural Reward Processing in Digital Addiction: A Dynamical Systems Theory of Reward Instability

Submitted: 24 April 2026 · Posted: 28 April 2026


Abstract
Behavioral addiction in digital environments is an increasingly relevant neurobehavioral phenomenon characterized by persistent engagement with high-frequency, algorithmically optimized reward stimuli. Although neural correlates of addictive behaviors have been widely studied, current models only partly explain how modern reinforcement environments reorganize behavior at the systems level. This review introduces Reward Instability Theory, a conceptual dynamical systems framework proposing that behavioral addiction may emerge as an attractor-like state within distorted reward landscapes shaped by high-density and high-variance reinforcement signals. The model shifts focus from static behavioral descriptions toward a systems account of motivation involving reinforcement learning, salience attribution, executive control, and environmental reward structure. We propose that digital environments may increase reinforcement density and reward variance, promoting dominant reward peaks and reducing behavioral diversity. To formalize these dynamics, we outline the Behavioral Reward Instability Index (BRII) as a heuristic systems construct integrating individual reward sensitivity, environmental reinforcement structure, and behavioral variability. The framework also situates established addiction models—including incentive sensitization, habit formation, and allostatic regulation—within a shared dynamical architecture. In addition, digital phenotyping is discussed as a potential empirical strategy for testing reward instability, while acknowledging limitations related to signal noise, ecological validity, bias, and privacy.

1. Introduction

Digital environments have created an emerging and highly influential context of human–reward interaction. Social media platforms, online gaming systems, and algorithmically curated content streams deliver reinforcement signals at exceptional levels of temporal density, variability, and scale. Rather than simply increasing exposure to rewarding stimuli, these systems may alter the statistical structure of reward distributions encountered by the brain [1,2,3].
Problematic engagement with digital platforms has become an increasingly relevant mental health concern, particularly among adolescents and young adults. Excessive or dysregulated digital engagement has been associated with emotional distress, sleep disruption, attentional difficulties, functional impairment, and compulsive behavioral patterns [3,4].
Converging evidence suggests that such environments strongly engage core neurobiological systems involved in motivation and reward processing. High-frequency and variable reinforcement signals may repeatedly recruit dopaminergic learning mechanisms, enhance salience attribution processes, and challenge executive control systems responsible for behavioral regulation [4,5,6,7]. These interactions have been associated with heightened reward responsivity alongside reduced regulatory control in individuals exhibiting problematic digital engagement, supporting the view that some forms of excessive digital behavior may share features with behavioral addiction [8,9].
However, existing theoretical frameworks have largely focused on specific mechanisms, including dopaminergic sensitization, impaired executive control, and maladaptive habit formation. Although these models provide important insights into neural processes underlying addictive behavior, they only partly explain how environmental reward structures may reorganize behavior at the level of system dynamics [10,11,12,13].
In particular, current accounts do not fully explain why sustained exposure to high-density digital reinforcement may progressively narrow behavioral repertoires toward a limited set of highly rewarding activities, or why such patterns may become increasingly stable and resistant to change over time. From a systems perspective, this limitation reflects the relative lack of models capable of capturing non-linear transitions, loss of behavioral flexibility, and attractor-like dynamics in behavior [14,15,16].
This gap highlights a broader conceptual limitation: while existing models effectively describe individual components of addictive behavior, they provide limited insight into the emergent, system-level dynamics arising from the interaction between neurobiological mechanisms and engineered reward environments.
To address this gap, we adopt a dynamical systems perspective in which behavior is conceptualized as trajectories evolving within a structured reward landscape. Within this framework, behavioral selection emerges from probabilistic competition among actions, shaped by reinforcement learning processes and constrained by both neurobiological parameters and environmental inputs [17,18,19].
In this review, we propose that behavioral addiction in digital environments may be conceptualized as an attractor-like state emerging within distorted reward landscapes shaped by high-density reinforcement signals. At the same time, high engagement with digital environments should not be equated with pathology, as adaptive, recreational, and socially meaningful forms of use may arise within the same platforms.
This perspective reframes addiction not as a discrete pathological condition, but as a system-level phenomenon influenced by reward landscape dynamics rather than isolated behavioral or neural dysfunctions. Digital environments may alter reward landscapes by increasing reinforcement density and amplifying reward variance, thereby promoting dominant reward peaks and the progressive reduction of behavioral diversity. Under such conditions, behavioral trajectories may become biased toward a limited set of highly reinforced states, resulting in relatively stable attractor-like configurations.
Importantly, this framework is not intended to replace existing models of addiction. Rather, it offers an integrative architecture that links established mechanisms—such as incentive sensitization, habit formation, and allostatic adaptation—within a unified dynamical perspective [20,21,22].
To formalize these processes, we introduce the Behavioral Reward Instability Index (BRII) as a conceptual construct capturing interactions between biological reward sensitivity, environmental reinforcement structure, and behavioral variability. Unlike linear behavioral metrics, the BRII is framed as a non-linear system variable reflecting relative proximity to instability and attractor-like dominance [14,15,16].
Finally, although the present work is primarily theoretical, we discuss digital phenotyping as a potential empirical approach for examining reward-driven behavioral dynamics in real-world settings, while acknowledging important limitations related to signal noise, ecological validity, and data bias [23,24].

2. Neural Mechanisms of Reward Processing in Digital Addiction

Understanding behavioral addiction in digital environments requires moving beyond isolated neural mechanisms toward a systems-level account of interacting neurocomputational processes. Human motivation arises from the dynamic interplay between reinforcement learning systems, salience attribution networks, and executive control mechanisms, all operating in continuous interaction with environmental inputs [25,26,27].
Importantly, these processes are further shaped by individual differences in reward sensitivity, including genetic, neurobiological, and trait-related factors that may influence how strongly reinforcement signals affect behavior [21,22]. Such variability helps define the effective parameter space within which behavioral dynamics unfold, contributing to differences in susceptibility to maladaptive reward-driven patterns.
The interaction between core neurobiological systems and structured environmental reinforcement is illustrated schematically in Figure 1. In this model, dopaminergic reinforcement learning, salience attribution, and executive control processes interact with high-density and high-variance digital reinforcement signals. Together, these influences may shape the topology of the reward landscape by modulating reward gradients, attentional weighting, and behavioral flexibility. Under sustained reinforcement, the system may become progressively biased, increasing the likelihood that behavioral trajectories converge toward dominant attractor-like states.
Digital environments may exert a disproportionate influence on behavior because they can simultaneously and repeatedly engage all three systems under conditions of rapid, stochastic, and persistent reinforcement [4,5,6,7,23,24]. This creates a regime in which learning processes are continuously updated, attentional systems are repeatedly captured, and regulatory control mechanisms are persistently challenged. From a dynamical perspective, such interactions may promote sustained perturbation of system equilibrium and progressively bias behavior toward highly reinforced reward states.

2.1. Dopaminergic Reinforcement Learning Under High-Density Stimulation

At the core of motivational regulation lies the dopaminergic reinforcement learning system, in which midbrain neurons encode reward prediction errors that update expected value representations of actions, cues, and stimuli [1,6]. These processes are central to reward learning and adaptive decision-making, particularly within mesolimbic pathways linking the ventral tegmental area and ventral striatum [1,6].
Digital environments may substantially modify this reward-learning regime. Variable reinforcement schedules—implemented through notifications, social feedback, intermittent rewards, and algorithmically curated content—can generate persistent and stochastic prediction-error signals that prolong dopaminergic responsivity over extended periods [17,18].
As a consequence, the system may operate under conditions of continuous micro-reinforcement, promoting amplification of reward representations associated with frequently reinforced behaviors, particularly within the ventral striatum and related corticostriatal circuits [3,21]. Repeated exposure to such contingencies may also bias valuation processes toward immediately accessible rewards [21,22].
Within the reward landscape framework, these processes may be conceptualized as a steepening of reward gradients, thereby increasing the probability that behavioral trajectories converge toward highly rewarded regions of the landscape.
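The prediction-error mechanism described above can be made concrete with a minimal TD(0) sketch. The learning rule, learning rate, and reward probabilities below are illustrative assumptions, not values proposed in this review; the point is only that denser stochastic reinforcement yields a higher learned value, which in landscape terms corresponds to a steeper gradient toward that behavior.

```python
import random

def td_update(value, reward, alpha=0.1):
    """One temporal-difference step: move the value estimate toward
    the observed reward in proportion to the prediction error."""
    prediction_error = reward - value
    return value + alpha * prediction_error

def learn_value(reward_prob, reward_size=1.0, steps=500, seed=0):
    """Estimate the value of a single action under a stochastic,
    variable-ratio-like reward schedule (Bernoulli rewards)."""
    rng = random.Random(seed)
    value = 0.0
    for _ in range(steps):
        reward = reward_size if rng.random() < reward_prob else 0.0
        value = td_update(value, reward)
    return value

# A densely reinforced action vs. a sparsely reinforced alternative:
# the learned values diverge, widening the gap between behavioral options.
dense = learn_value(reward_prob=0.6)
sparse = learn_value(reward_prob=0.05)
print(dense, sparse)
```

Because the estimate tracks an exponentially weighted average of recent rewards, it converges near each schedule's reinforcement density, so the densely reinforced option ends up with a persistently higher value.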

2.2. Salience Attribution and Attentional Capture

Reinforcement learning operates in conjunction with neural systems responsible for assigning attentional priority to stimuli. The salience network—centered on the anterior insula and anterior cingulate cortex—plays a critical role in detecting behaviorally relevant signals, monitoring internal and external demands, and allocating cognitive resources accordingly [5,7].
Digital environments are often designed to strongly engage this system [13]. Notifications, visual cues, novelty signals, and social validation cues may function as salience amplifiers and predictive stimuli, thereby enhancing attentional capture and reinforcing reward-learning processes [13].
Repeated exposure to such stimuli may recalibrate salience attribution processes, contributing to persistent attentional biases toward digital cues and increasing susceptibility to context-dependent triggers. As a result, users may become more likely to re-engage with previously reinforced behaviors even in the absence of strong intrinsic motivation [4,7,13].
Within the reward landscape framework, these processes may be conceptualized as a reweighting of perceptual and motivational inputs, increasing the prominence, accessibility, and behavioral pull of specific reward peaks.

2.3. Executive Control and System Imbalance

Adaptive behavior depends on executive control systems capable of regulating impulsive responses, maintaining goals, and supporting long-term decision-making. These systems, primarily associated with prefrontal cortical regions—including the dorsolateral prefrontal cortex, orbitofrontal cortex, and anterior cingulate cortex—provide top-down modulation of reward-driven processes [5,10].
In addiction-related conditions, a characteristic imbalance may emerge between bottom-up reward mechanisms and top-down regulatory control. Empirical studies suggest reduced functional connectivity and less efficient coordination between prefrontal control regions and reward-processing systems in individuals with problematic digital engagement [14,15].
Such dysregulation may impair inhibitory control, increase cue-reactivity, and reduce the capacity to prioritize delayed or non-digital rewards over immediately available reinforcement. From a behavioral perspective, this may contribute to repetitive engagement despite awareness of negative consequences [5,10].
Within the present framework, this imbalance can be interpreted as a shift in the relative weighting of forces within the reward landscape, whereby dominant reward peaks progressively reduce the effective accessibility of alternative behavioral trajectories.

2.4. Individual Differences in Reward System Dynamics

Beyond circuit-level mechanisms, individual differences in susceptibility to reward-driven behavior are also influenced by genetic, neurobiological, and trait-related variability affecting dopaminergic signaling, prefrontal regulation, impulsivity, and reward sensitivity [21,22]. Such factors may help explain why similar digital environments produce markedly different behavioral outcomes across individuals.
Rather than determining behavior directly, these influences are better understood as modulators of core system parameters, including reward sensitivity, responsiveness to reinforcement variability, delay discounting tendencies, and the efficiency of executive control. This perspective aligns with contemporary approaches in computational psychiatry, in which biological variability is conceptualized as influencing system parameters rather than exerting deterministic effects [25,26,27].
The impact of such variability may become particularly relevant in modern environments characterized by high-density and high-variance reinforcement. Under these conditions, even subtle differences in dopaminergic responsivity, inhibitory control, or sensitivity to immediate reward may contribute to disproportionate divergence in behavioral trajectories over time [25,26,27].
Accordingly, vulnerability to maladaptive digital engagement may reflect the interaction between environmental reinforcement structures and pre-existing differences in reward-system dynamics, rather than any single biological determinant. Key factors potentially influencing reward sensitivity and system organization are summarized in Table 1.
Within the present framework, genetic and neurobiological influences are conceptualized as modulators of system-level parameters governing reward sensitivity, reinforcement learning gain, and executive regulation. Rather than directly determining behavior, they may shape the topology of the reward landscape and alter the probability and trajectory of convergence toward attractor-like states.

2.5. Integration with Core Theories of Addiction

Reward Instability Theory is intended as an integrative framework that situates established models of addiction within a shared dynamical structure rather than replacing them. From this perspective, influential theories may be viewed as describing complementary processes operating at different levels of reward-system organization.
Incentive sensitization theory emphasizes the progressive amplification of motivational salience associated with specific stimuli and cues [4]. Within the present framework, this process may be interpreted as a local steepening of reward gradients, increasing the behavioral pull of selected reward peaks.
Habit formation models describe the transition from goal-directed action to increasingly automatic and repetitive behavior through repeated reinforcement [8]. In dynamical terms, this may correspond to the stabilization and deepening of attractor-like states, making previously reinforced trajectories more likely to recur.
Allostatic models emphasize long-term shifts in baseline reward processing, often accompanied by reduced sensitivity to alternative rewards and compensatory behavioral seeking [10]. Within the reward landscape framework, such changes may be conceptualized as a broader deformation of the landscape that further biases behavior toward dominant reward regions.
Dual-process perspectives, which distinguish impulsive reward-driven responding from reflective regulatory control, may also be incorporated into this framework as changes in the balance between bottom-up attraction forces and top-down behavioral regulation [28,29].
Taken together, these mechanisms are better understood as complementary rather than competing accounts, each describing distinct yet interacting components of a unified dynamical system underlying addictive behavior.

2.6. Toward a Unified Neurocomputational Perspective

Taken together, these mechanisms may be understood as components of a coupled neurocomputational system in which dopaminergic learning amplifies reward signals, salience networks prioritize selected inputs, executive systems regulate behavior, and individual variability modulates overall system sensitivity—all within environments that continuously reshape reinforcement structure.
Within this framework, behavior emerges as trajectories evolving across a dynamically changing reward landscape. Under conditions of high-density and high-variance reinforcement, the landscape may become progressively distorted, biasing behavior toward a limited set of highly reinforced states while reducing behavioral flexibility.
This perspective shifts the explanatory focus from isolated dysfunctions toward emergent system dynamics, thereby providing a bridge between neurobiological mechanisms and large-scale behavioral organization. It also aligns with contemporary approaches in computational psychiatry, which increasingly conceptualize mental disorders as disturbances of interacting system-level processes rather than solely as localized neural deficits [25,26,27].

3. Reward Landscape Distortion: A Dynamical Systems Perspective on Behavioral Addiction

3.1. Behavioral Systems as Reward Landscapes

To better explain the emergence of behavioral addiction in digital environments, it may be useful to move beyond linear descriptions of behavior toward a state-space formulation. Within this perspective, behavior is not viewed as a sequence of isolated choices, but as a continuous trajectory evolving within a high-dimensional space of competing action possibilities [14,15,16,30].
We conceptualize this space as a reward landscape, in which each behavioral configuration corresponds to a state associated with an expected reward value. Behavioral dynamics can therefore be understood as probabilistic trajectories moving across this landscape, shaped by reinforcement learning processes and constrained by both internal neurobiological parameters and external environmental inputs [17,18,19].
Importantly, the reward landscape is used here as a formal dynamical construct describing probabilistic state-space organization rather than as a purely descriptive metaphor. It is intended as a heuristic framework for integrating reward valuation, behavioral flexibility, and environmental structure within a common systems perspective.
This formulation can be related conceptually to energy landscapes used in statistical physics and biophysics, where system dynamics are described as transitions toward local minima of an energy function [19,25,31]. By analogy, if reward is treated as an inverse potential, behavioral systems may be viewed as tending toward local maxima of expected reward value, which function as attractor-like states within the landscape.
Under ecologically typical conditions, reward landscapes may exhibit relatively distributed topographies, with multiple comparable peaks corresponding to diverse behavioral domains such as social interaction, learning, physical activity, and rest. Such a distributed structure may support behavioral flexibility and relatively high entropy in action selection, understood here as diversity of behavioral allocation rather than a strict thermodynamic measure [30,31,32].
Crucially, however, this balance is not assumed to be intrinsic to the organism alone, but may emerge from the statistical structure of environmental reinforcement and its interaction with individual neurobiological sensitivity.
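As a toy illustration of the landscape formulation, the sketch below defines a one-dimensional reward surface with two peaks and simulates noisy hill-climbing trajectories. The surface, step size, and noise level are arbitrary assumptions chosen only to show the qualitative point made above: trajectories settle into local reward maxima, the attractor-like states of the framework, rather than into a single global optimum.

```python
import math
import random

def reward(x):
    """Toy 1-D reward landscape with two peaks of unequal height,
    standing in for competing behavioral domains."""
    return math.exp(-(x - 2.0) ** 2) + 1.8 * math.exp(-(x + 2.0) ** 2)

def trajectory(x0, steps=200, step_size=0.05, noise=0.02, seed=0):
    """Noisy local ascent on the reward surface: the state drifts up
    the local reward gradient, so it converges to the nearest peak,
    not necessarily the globally highest one."""
    rng = random.Random(seed)
    x = x0
    for _ in range(steps):
        grad = (reward(x + 1e-4) - reward(x - 1e-4)) / 2e-4
        x += step_size * grad + rng.gauss(0.0, noise)
    return x

# Trajectories starting on opposite sides settle into different peaks:
# the landscape topology, not a global comparison, shapes the outcome.
left = trajectory(-3.0)
right = trajectory(3.0)
print(left, right)
```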

3.2. Distortion Through Reinforcement Density and Variance

Digital environments may perturb the geometry of the reward landscape by altering two fundamental statistical properties of reinforcement: density and variance. We operationalize reinforcement density as the rate of reward events per unit time and reward variance as the unpredictability of reward timing and magnitude. Within the present framework, these variables are proposed as key environmental drivers of reward-system reorganization. They are emphasized because they jointly determine how often value estimates are updated and how uncertain future rewards remain.
First, digital systems often increase reinforcement density. In contrast to many offline environments—where rewards may be delayed, effort-dependent, or temporally sparse—digital platforms compress reinforcement into dense temporal sequences. Notifications, continuous content streams, rapid social feedback, and low-friction access reduce the temporal distance between reward events, thereby shortening the behavioral path required for reward acquisition. This may accelerate convergence dynamics and bias behavior toward immediately accessible reward states [21,22].
Second, digital environments may amplify reward variance through stochastic and unpredictable reinforcement schedules. Variable reward structures, well established in reinforcement learning theory, can sustain dopaminergic prediction-error signaling by limiting rapid value saturation. While moderate variability may initially promote exploration [17,18,19], persistent high variance may ultimately favor selective amplification of behaviors repeatedly associated with salient or frequent rewards [20,21].
Taken together, these processes may reshape the reward landscape. Reinforcement becomes both frequent and unpredictably distributed, sustaining continuous learning while disproportionately strengthening selected behavioral pathways. As reinforcement density and variance increase, the system may shift from a distributed reward regime toward a more asymmetric configuration dominated by a limited number of behavioral states.
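The joint effect of these two drivers can be illustrated with a small softmax bandit simulation. The reward probabilities, learning rate, and temperature below are illustrative assumptions rather than parameters proposed in this review; Bernoulli rewards supply delivery variance, and raising one option's reinforcement density concentrates choice on that option.

```python
import math
import random

def run_environment(reward_probs, steps=2000, alpha=0.05, tau=0.2, seed=1):
    """Softmax bandit: learn a value for each option via prediction
    errors and track how often each option is chosen. reward_probs
    sets each option's reinforcement density; Bernoulli delivery
    makes reinforcement stochastic (variable)."""
    rng = random.Random(seed)
    n = len(reward_probs)
    values = [0.0] * n
    counts = [0] * n
    for _ in range(steps):
        # Softmax action selection over current value estimates.
        exps = [math.exp(v / tau) for v in values]
        total = sum(exps)
        r, pick, acc = rng.random(), n - 1, 0.0
        for i, e in enumerate(exps):
            acc += e / total
            if r < acc:
                pick = i
                break
        counts[pick] += 1
        reward = 1.0 if rng.random() < reward_probs[pick] else 0.0
        values[pick] += alpha * (reward - values[pick])
    return [c / steps for c in counts]

# Distributed regime: comparable reinforcement across options.
balanced = run_environment([0.30, 0.28, 0.32])
# Distorted regime: one option with much higher reinforcement density.
skewed = run_environment([0.30, 0.28, 0.80])
print(max(balanced), max(skewed))
```

Under comparable reinforcement the choice distribution stays relatively spread out; once one option's density is raised, its learned value dominates the softmax and behavior concentrates on it, mirroring the shift from a distributed to an asymmetric reward regime.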
This transformation may be understood as a reorganization of the behavioral state space, creating conditions favorable to attractor formation and progressive reduction of behavioral diversity. Figure 2 illustrates how increasing reinforcement density and variance may reshape the reward landscape from a distributed configuration toward a more skewed structure dominated by a limited number of reward peaks.
Figure 2. Schematic representation of how environmental reinforcement structure may reshape the topology of the reward landscape. Under low-density and low-variance conditions, the landscape exhibits a distributed configuration with multiple comparable reward peaks, supporting behavioral diversity and flexible exploration. As reinforcement density and variance increase, reward gradients may become steeper and more asymmetric, promoting the emergence of dominant attractor regions.
Such structural distortion may have direct consequences for behavioral dynamics. As the landscape becomes increasingly asymmetric, behavioral trajectories may become progressively constrained, converging toward dominant regions of the landscape. This consequence is illustrated schematically in Figure 3, which depicts the transition from distributed behavioral exploration to attractor-dominated dynamics.
Figure 3. Schematic representation of behavioral trajectories under increasing reinforcement density and variance. (A) In a distributed regime, the reward landscape contains multiple comparable peaks, supporting diverse behavioral trajectories and high variability. (B) As reinforcement density and variance increase, the landscape becomes progressively distorted, leading to partial convergence of trajectories toward emerging high-reward regions. (C) In an attractor-dominated regime, a single dominant reward peak captures behavioral trajectories, resulting in reduced variability, increased persistence of behavior, and diminished disengagement capacity. Arrows indicate the directionality of behavioral trajectories within the evolving reward landscape.

3.3. Emergence of Dominant Reward Peaks

Within the proposed framework, the interaction between reinforcement density and reinforcement variance may contribute to the emergence of dominant reward peaks—regions of the landscape associated with disproportionately high expected reward values relative to competing alternatives.
Such peaks may arise through the convergence of several reinforcing factors, including high temporal frequency of reward delivery, strong attentional salience mediated by predictive cues, and low energetic or cognitive cost of engagement. When these factors co-occur, the expected reward associated with specific behaviors may become substantially elevated compared with alternative activities that require greater effort, delay, or sustained commitment [21,22].
This growing asymmetry may produce a steepening of reward gradients, such that even small deviations toward high-reward behaviors increase the probability of rapid convergence into their corresponding basins of attraction. In practical terms, behaviors offering immediate, salient, and repeatedly accessible rewards may become progressively easier to re-enter and more difficult to disengage from.
As a consequence, the reward landscape may become increasingly biased, channeling behavioral trajectories toward a narrower subset of highly reinforced states while reducing engagement with more distributed or delayed reward sources. Such dynamics may help explain how repetitive digital behaviors become progressively dominant over time.

3.4. Collapse of Behavioral Entropy

A potential consequence of increasing reward asymmetry is the progressive reduction of behavioral entropy, defined here as the diversity and dispersion of action selection across available behavioral options. In the present context, entropy is used in an information-theoretic or behavioral diversity sense rather than as a strict thermodynamic measure.
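In the information-theoretic sense intended here, behavioral entropy is simply the Shannon entropy of the allocation of behavior across available activities. The allocation vectors below are illustrative examples, not empirical data:

```python
import math

def behavioral_entropy(allocation):
    """Shannon entropy (in bits) of a behavioral allocation, i.e. the
    fraction of time or choices devoted to each activity. Higher values
    indicate a more diverse behavioral repertoire."""
    return -sum(p * math.log2(p) for p in allocation if p > 0)

# Distributed repertoire: behavior spread evenly across four activities.
diverse = behavioral_entropy([0.25, 0.25, 0.25, 0.25])
# Attractor-dominated repertoire: one activity absorbs most behavior.
narrowed = behavioral_entropy([0.85, 0.05, 0.05, 0.05])
print(diverse, narrowed)  # 2.0 bits vs. roughly 0.85 bits
```

The uniform allocation attains the maximum of 2 bits for four options; concentrating behavior on a single dominant activity collapses the measure toward zero, which is the "entropy collapse" this section describes.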
Behavioral entropy has been linked to cognitive flexibility and adaptive system dynamics, with lower entropy associated with more constrained behavioral repertoires and diminished exploratory capacity in both neural and behavioral systems [30,31,32].
In relatively distributed reward landscapes, behavioral entropy may remain comparatively high, supporting flexible transitions between activities and broader allocation of attention and effort. However, as dominant reward peaks emerge, action selection may become increasingly concentrated around a narrower set of rewarding behaviors.
This transition can be interpreted as a shift from exploration toward exploitation. Behavioral variability may function as a stabilizing property of the system; its erosion can reduce the capacity to explore alternative regions of the landscape and adapt to changing contingencies.
As behavioral entropy declines, the system may become increasingly susceptible to attractor formation, repetitive behavioral patterns, and reduced disengagement flexibility.

3.5. Attractor Formation and Phase Transition Dynamics

Within a dynamical systems framework, the processes described above may culminate in the emergence of relatively stable attractor-like states toward which behavioral trajectories converge and from which disengagement becomes progressively less likely. In this context, repeated behavioral selection may increase the likelihood of future re-selection, thereby strengthening persistence over time.
We propose that behavioral addiction may, in some cases, resemble a phase-transition-like reorganization in the structure of the reward landscape. Below a critical threshold, the system may remain in a multi-stable regime, with multiple competing attractors supporting behavioral flexibility and shifting patterns of engagement [14,15,30]. Above this threshold, the system may undergo a qualitative reorganization toward a dominant attractor characterized by:
  • high entry probability,
  • reduced exit probability,
  • diminished sensitivity to alternative rewards.
Such a transition is hypothesized to be shaped by the interaction between intrinsic reward sensitivity, environmental reinforcement density and variance, and behavioral variability. Importantly, these dynamics are expected to be non-linear, such that relatively small parameter changes may produce disproportionately large behavioral effects. This interpretation is broadly consistent with contemporary models of dynamical systems, tipping-point behavior, and neural state transitions [10,24].
Although presented here as a conceptual framework rather than a proven empirical law, this perspective may help explain why some individuals show abrupt shifts from flexible engagement to persistent, difficult-to-reverse behavioral patterns under sustained digital reinforcement conditions. These proposed dynamics remain hypothetical and require empirical testing using longitudinal behavioral, neurocognitive, and digital phenotyping data capable of capturing transitions in system stability over time.
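The hypothesized threshold behavior can be illustrated with the standard bistable normal form dx/dt = x - x^3 + h, where h stands in, purely illustratively, for reinforcement bias; this system and its critical value are textbook dynamical-systems material, not a model proposed by the review. Below the critical bias two attractors coexist; a small further increase eliminates the weaker one.

```python
def stable_states(h, grid=20001, lo=-3.0, hi=3.0):
    """Count stable equilibria of the flow dx/dt = x - x**3 + h.
    A stable equilibrium is a downward zero-crossing of the flow
    (positive flow to its left, negative to its right)."""
    def flow(x):
        return x - x ** 3 + h

    states = []
    step = (hi - lo) / (grid - 1)
    prev_x, prev_f = lo, flow(lo)
    for i in range(1, grid):
        x = lo + i * step
        f = flow(x)
        if prev_f > 0 >= f:  # downward crossing: stable fixed point
            states.append((prev_x + x) / 2)
        prev_x, prev_f = x, f
    return states

# Below the critical bias (2 / (3 * sqrt(3)) ~= 0.385) the system is
# bistable; a modest further increase removes the weaker attractor.
print(len(stable_states(0.3)), len(stable_states(0.5)))
```

The qualitative jump from two attractors to one under a small parameter change is exactly the kind of non-linear, disproportionate effect the phase-transition analogy is meant to capture.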

3.6. Irreversibility and Path Dependence

Once established, attractor-like states may exhibit substantial path dependence, meaning that prior behavioral trajectories can influence subsequent system evolution [14,19,30]. Repeated reinforcement may progressively deepen the attractor basin, increasing persistence and reducing responsiveness to competing alternatives or external perturbations.
This perspective offers a potential mechanistic explanation for a common clinical observation: individuals may continue maladaptive behavioral patterns despite explicit awareness of negative consequences [4,10,28,29]. Within the present framework, such persistence is interpreted not solely as a failure of volitional control, but as a property of system organization in which previously reinforced trajectories become increasingly dominant over time.
As this process unfolds, alternative behavioral pathways may remain available in principle, yet become progressively less accessible in practice because they carry lower immediate reward value, require greater effort, or lack sufficient reinforcement history. This formulation may help explain why behavioral change often requires sustained restructuring of environmental contingencies rather than simple intention alone.

3.7. Synthesis: Addiction as an Emergent Property of Distorted Landscapes

The framework developed here reconceptualizes behavioral addiction as an emergent property of a coupled brain–environment system whose reward landscape may become progressively distorted over time. Rather than arising from a single cause, maladaptive behavioral persistence is viewed as the product of interacting neurobiological, behavioral, and environmental processes.
Within this perspective, neurobiological mechanisms govern learning, valuation, and plasticity, while environmental structures shape the density, variance, and accessibility of reinforcement. Under sustained high-density and high-variance reinforcement, the system may become increasingly biased toward dominant reward peaks, thereby increasing the likelihood of stable attractor-like states.
From a systems perspective, this transition may also be interpreted as a reduction in behavioral entropy, reflecting constrained exploration and reduced diversity of action selection within the available state space [30,31,32].
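The reduction in behavioral entropy invoked here can be made concrete as Shannon entropy over a distribution of engagement across activities. The following minimal Python sketch is purely illustrative (the function name and the time-share values are hypothetical, not part of the framework's formal specification):

```python
import math

def behavioral_entropy(time_shares):
    """Shannon entropy (bits) of an engagement distribution across
    activities; lower values indicate convergence toward a dominant
    behavioral state."""
    total = sum(time_shares)
    probs = [t / total for t in time_shares if t > 0]
    return -sum(p * math.log2(p) for p in probs)

# Hypothetical weekly time shares across five activity domains
distributed = behavioral_entropy([0.2, 0.2, 0.2, 0.2, 0.2])   # log2(5) ≈ 2.32 bits
converged = behavioral_entropy([0.9, 0.04, 0.03, 0.02, 0.01])  # substantially lower
```

On this reading, the transition toward attractor dominance corresponds to a downward drift in such an entropy measure over repeated observation windows.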
Accordingly, addiction is not reduced to a single mechanism or localized dysfunction, but conceptualized as a system-level reorganization of motivational dynamics driven by the interaction between neural learning processes and altered environmental reward structures.

4. Behavioral Reward Instability Index (BRII): A Heuristic Framework for Motivational Instability

4.1. Conceptual Rationale

The reward landscape framework developed above provides a conceptual account of how behavioral systems may become progressively biased under conditions of high-density and high-variance reinforcement. To move from qualitative description toward a more testable systems formulation, it is useful to introduce a higher-order variable capable of summarizing the evolving state of the system within this landscape [11,12,13,25,26,27]. In this context, the BRII is not proposed as a de novo psychometric score, but as a heuristic systems construct inspired by existing approaches in computational psychiatry, control theory, and dynamical systems modeling.
We therefore propose the Behavioral Reward Instability Index (BRII) as a heuristic system-level construct describing the degree to which a behavioral system may approach a regime of instability characterized by attractor dominance and reduced behavioral flexibility. Such formulations are broadly consistent with contemporary approaches in computational psychiatry and dynamical systems neuroscience, which often emphasize interacting state variables rather than isolated behavioral measures [25,26,27].
Importantly, the BRII is not intended as a diagnostic score, clinical label, or directly validated scalar measure. Rather, it is conceptualized as a latent systems variable reflecting the interaction between individual reward sensitivity, environmental reinforcement structure, and emergent behavioral organization [11,12,13].
In this sense, the BRII serves as a compact representation of system dynamics, capturing interactions that would otherwise require multidimensional behavioral description. Comparable approaches have been used in models of large-scale neural and behavioral dynamics, where complex processes are approximated through lower-dimensional control parameters linked to regime shifts or changes in stability [30,31,32].
Within the present framework, the BRII is proposed as an indicator of proximity to transitions between behavioral regimes, ranging from relatively distributed reward engagement to patterns increasingly dominated by a limited number of highly reinforced attractor-like states. Terms such as attractor-like states are used in an operational systems sense to denote relatively persistent behavioral configurations, without asserting that strict mathematical attractors have been demonstrated empirically. At present, the BRII should be regarded as a hypothesis-generating framework rather than a validated measurement instrument.

4.2. Core Dimensions

Within the present framework, the BRII is conceptualized as emerging from the interaction of three coupled dimensions operating at distinct but interrelated levels of analysis: individual reward sensitivity, digital reward exposure, and behavioral variability. These three dimensions are intended as parsimonious core domains rather than an exhaustive ontology of motivational determinants; additional modulators such as stress, affective state, sleep disruption, or social context may be incorporated in future model extensions.
Individual Reward Sensitivity (IRS).
This dimension reflects variability in responsiveness to rewarding stimuli, shaped by genetic, neurobiological, and trait-related factors such as dopaminergic responsivity, impulsivity, delay discounting tendencies, and executive regulation. Such factors may modulate the effective gain of reinforcement learning processes and influence how strongly reward differences affect behavior. Variability in reward sensitivity has been associated with differences in reward learning, impulsivity, and addiction vulnerability [6,21,22,28,29]. IRS should be interpreted as a higher-order latent sensitivity construct rather than a single biological trait, potentially decomposable into separable subcomponents in future empirical work.
Digital Reward Exposure (DRE).
This dimension captures the density, immediacy, and stochastic structure of reinforcement signals in digital environments, including the frequency, timing, and unpredictability of reward delivery. Reinforcement learning research suggests that both reward density and variance can substantially influence learning dynamics, cue salience, and value updating [17,18,19].
Behavioral Variability (BV).
This dimension reflects the diversity and dispersion of behavioral engagement across activities. It is conceptualized as a stabilizing influence that may counteract convergence toward a single dominant behavioral state. Behavioral entropy and variability have been linked to cognitive flexibility and adaptive system dynamics in both neural and behavioral domains [25,26,30,31,32].
Together, these components define a coupled system in which individual reward sensitivity modulates responsiveness to reinforcement, digital reward exposure shapes environmental input intensity and variability, and behavioral variability influences resilience through distributed engagement. Crucially, instability is hypothesized to emerge from the interaction of these components rather than from any single factor in isolation. This interaction-based perspective is consistent with contemporary models of addiction and decision-making that emphasize non-linear interplay between biological predispositions and environmental inputs [8,9,10].

4.3. Heuristic Non-Linear Formulation

A defining property of reward instability is its proposed non-linear nature. Behavioral systems exposed to high-density reinforcement may not change proportionally; instead, relatively small parameter shifts may produce disproportionately large behavioral effects. Comparable amplification dynamics have been described in reinforcement learning, neural adaptation, and other complex systems [14,19,24,30].
To represent this concept heuristically, the BRII may be expressed in simplified form as:
BRII ∝ (IRS × DRE) / BV
where all components are assumed to be normalized for conceptual clarity. This expression is intended as an illustrative systems heuristic chosen to express directional relationships (amplification vs stabilization) rather than to imply a uniquely correct mathematical specification. The multiplicative form is therefore best understood as a provisional conceptual scaffold for future empirical calibration and computational implementation.
The proposed formulation reflects three core principles. First, multiplicative amplification: higher individual reward sensitivity (IRS) may increase responsiveness to environmental reinforcement (DRE), thereby enhancing reward-driven learning. Second, environmental driving force: reinforcement density and variance may continuously bias the system toward convergence on salient reward states. Third, stabilizing variability: behavioral variability (BV) distributes engagement across multiple domains, potentially counteracting excessive convergence and supporting flexibility.
The multiplicative structure further implies that identical environmental inputs may produce different outcomes depending on system configuration. For example, the impact of digital reward exposure may increase under higher reward sensitivity, whereas lower behavioral variability may amplify vulnerability to reinforcement-driven convergence. Such interaction effects are consistent with contemporary models of addiction in which risk emerges from coupled biological and environmental influences rather than from any single factor alone [8,9,10].
Accordingly, instability is hypothesized to arise when amplification processes outweigh stabilizing influences, increasing the likelihood of progressive convergence toward dominant attractor-like states. Alternative formulations—including threshold, sigmoid, or exponential functions—may ultimately prove more accurate and would require empirical calibration. At present, the BRII is best regarded as a latent systems construct rather than a directly measurable quantity. Any future operational version would require explicit normalization procedures, scaling conventions, and domain-specific calibration before numerical interpretation becomes meaningful.
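As a purely illustrative sketch, the heuristic above can be rendered in Python alongside one of the alternative sigmoid formulations just mentioned. All parameter values (the steepness k, the soft threshold theta, and the example inputs) are arbitrary placeholders awaiting empirical calibration, not proposed defaults:

```python
import math

def brii_multiplicative(irs, dre, bv, eps=1e-6):
    """Heuristic BRII: amplification (IRS x DRE) divided by the
    stabilizing influence of behavioral variability (BV). Inputs are
    assumed normalized to (0, 1]; eps guards against division by zero."""
    return (irs * dre) / max(bv, eps)

def brii_sigmoid(irs, dre, bv, k=6.0, theta=1.0):
    """Alternative formulation: the same ratio passed through a
    logistic function, yielding a bounded index with a soft
    threshold at theta (k controls steepness)."""
    x = brii_multiplicative(irs, dre, bv)
    return 1.0 / (1.0 + math.exp(-k * (x - theta)))

# Identical environmental exposure, different system configurations:
low_risk = brii_multiplicative(irs=0.3, dre=0.6, bv=0.8)   # 0.225
high_risk = brii_multiplicative(irs=0.8, dre=0.6, bv=0.2)  # 2.4
```

The example makes the interaction principle explicit: holding DRE constant, higher sensitivity combined with lower variability yields a disproportionately larger index, consistent with the multiplicative amplification described above.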

4.4. Operationalization Using Digital Phenotyping

While the BRII is introduced as a conceptual systems variable, provisional operational approximations may be developed for exploratory modeling purposes, particularly using multimodal data derived from digital phenotyping [23,24]. In this context, the BRII is not assumed to be directly observable; rather, its component processes may be indirectly approximated through behavioral and contextual proxies. Potential examples are summarized in Table 2.
The BRII is conceptualized as a latent system-level construct emerging from the interaction between individual reward sensitivity, environmental reinforcement structure, and behavioral variability. Accordingly, its dimensions may be approximated using passive and active longitudinal data streams generated by smartphones, wearable devices, and digital behavioral tasks [23,24]. Such approaches may enable preliminary modeling of reward-system dynamics in real-world settings.
Examples of candidate indicators include impulsivity indices, delay discounting performance, and neurocognitive task measures as proxies of individual reward sensitivity; screen time, notification frequency, short-form content exposure, and app engagement patterns as indicators of digital reward exposure; and behavioral entropy, activity diversity, sleep regularity, mobility patterns, or app-use diversity as markers of behavioral variability. Repeated measures may also permit estimation of temporal dynamics, including recovery from perturbation, increasing variance, or slowing return to baseline following behavioral disruption.
These data sources offer a scalable—although indirect—empirical substrate for testing the present framework. However, important limitations must be acknowledged. Digital phenotyping signals are inherently noisy, context-dependent, and sensitive to device usage patterns, platform design, and missing data. Common behavioral proxies may capture only part of the underlying motivational processes, and measurement bias may differ across populations and operating systems [23,24].
Consequently, any proxy-based operationalization of the BRII should be interpreted as a structured approximation of system dynamics rather than a direct measurement of latent constructs; proxy indicators should therefore be interpreted probabilistically and at the aggregate trajectory level rather than as precise readouts of latent reward-system states. Robust validation will require longitudinal studies combining behavioral data, neurocognitive assessment, and external clinical outcomes. Any future operational BRII scores should not be used for individual-level clinical decision-making without robust prospective validation.

4.5. Limits and Future Validation

The BRII is not intended to diagnose addiction, replace clinical assessment, or define pathological thresholds in its current form. Rather, it is intended as a conceptual bridge between the theoretical framework of reward landscape dynamics and empirical observation. By representing instability as a system-level construct emerging from interacting components, it may help formalize processes that are otherwise described qualitatively [25,26,27].
The framework also naturally extends to a temporal perspective, in which instability may evolve dynamically over time (BRII(t)). From this viewpoint, repeated longitudinal assessment could examine whether early warning signals precede regime shifts, whether destabilization and recovery follow asymmetric trajectories, and whether some individuals exhibit hysteresis-like persistence after environmental conditions improve. Such patterns would be broadly consistent with critical slowing down and tipping-point dynamics described in other complex systems [14,19,24].
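The early warning signals mentioned here are commonly estimated as rolling-window variance and lag-1 autocorrelation, both of which tend to rise under critical slowing down. A minimal sketch, assuming a regularly sampled behavioral time series (the 14-sample window is an arbitrary illustrative choice):

```python
def rolling_ews(series, window=14):
    """Early-warning indicators over a sliding window: variance and
    lag-1 autocorrelation. Rising values of both are classic
    signatures of critical slowing down."""
    out = []
    for i in range(window, len(series) + 1):
        w = series[i - window:i]
        mean = sum(w) / window
        ss = sum((x - mean) ** 2 for x in w)  # sum of squared deviations
        var = ss / window
        num = sum((w[t] - mean) * (w[t + 1] - mean) for t in range(window - 1))
        ac1 = num / ss if ss > 0 else 0.0
        out.append((var, ac1))
    return out
```

In a BRII(t) context, a sustained upward trend in both indicators, rather than any single value, would constitute the provisional warning pattern; absence of such trends in longitudinal data would count against the predicted instability dynamics.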
At present, however, the BRII remains a hypothesis-generating framework rather than a validated measurement instrument. The framework should also be considered falsifiable, insofar as longitudinal data may fail to support predicted instability dynamics or incremental predictive utility. Its components require empirical calibration, operational definitions may vary across settings, and causal interpretations should be made cautiously. Future research should test whether BRII-informed models improve prediction of behavioral persistence, relapse risk, or recovery trajectories beyond conventional single-metric indicators [11,12,13,23,24].
Validation will likely require prospective longitudinal studies integrating passive digital sensing, neurocognitive measures, ecological momentary assessment, and clinically meaningful outcomes.
If supported empirically, the BRII may offer a useful translational framework for identifying periods of elevated motivational instability and informing adaptive intervention strategies. Future comparisons should explicitly evaluate whether BRII-informed models add incremental explanatory or predictive value beyond conventional indicators such as screen time, impulsivity scores, or symptom counts. Although inspired by digital environments, the present framework may require substantial modification before extension to substance-related or non-digital behavioral addictions.
A proof-of-concept translational extension illustrating how BRII states may inform structured behavioral interpretation and intervention logic is provided in Supplementary Material S1.

5. Toward Operationalization: Digital Phenotyping and Reward Instability

5.1. Digital Phenotyping as an Empirical Substrate

The Reward Instability framework conceptualizes behavioral addiction as a system-level phenomenon emerging from interactions between individual reward sensitivity, environmental reinforcement structure, and behavioral dynamics. Although the BRII is proposed as a conceptual systems construct, its dimensions may be indirectly approximated through observable behavioral proxies.
Digital phenotyping—defined as the continuous quantification of behavior using passive and active data streams from personal digital devices—offers a potential empirical substrate for examining such dynamics [23,24]. Smartphones and wearable technologies can generate longitudinal data reflecting engagement patterns, attentional allocation, mobility, sleep regularity, and activity distribution across time [23,24].
Candidate proxies for core BRII dimensions are summarized in Table 2. Repeated measurement may additionally support assessment of temporal instability, persistence, and recovery dynamics.
Importantly, these indicators do not directly measure latent motivational constructs. They should therefore be interpreted as indirect signals approximating underlying system dynamics rather than as direct measurements.

5.2. Measurement Challenges: Noise, Validity, Bias, and Governance

Despite its promise, digital phenotyping also introduces substantial methodological constraints that must be considered when evaluating the BRII framework.
First, signal noise represents a fundamental limitation. Behavioral data collected from digital devices are often incomplete, context-dependent, and influenced by measurement artifacts. Variability in device usage, operating systems, platform design, and data resolution may obscure the underlying dynamics of interest.
Second, construct and ecological validity remain limited. Many commonly used indicators—such as screen time or app frequency—serve only as coarse proxies for complex motivational processes. High engagement does not necessarily imply reward dominance, and low engagement does not guarantee behavioral stability. The same observable behavior may reflect boredom, work demands, social connection, or maladaptive reinforcement, depending on context.
Third, sampling and systemic bias may arise from uneven data representation. Digital phenotyping datasets are often skewed toward particular demographic groups, socioeconomic strata, device ecosystems, and cultural settings. In addition, observed behavior is partly shaped by algorithmic curation and platform design, complicating causal interpretation [23,24,33,35].
Fourth, privacy, consent, and governance considerations are central. Continuous behavioral monitoring raises important ethical questions regarding informed consent, data security, transparency, and acceptable downstream use of inferred risk states.
Taken together, these limitations imply that any empirical implementation of the BRII should be interpreted as a structured approximation of system dynamics rather than a direct measurement of latent processes. Similar concerns have been highlighted in recent work on digital phenotyping and computational psychiatry [23,24].

5.3. Non-Linearity and Calibration Requirements

A key implication of the Reward Instability framework is that the relationship between observable indicators and underlying system dynamics may be inherently non-linear. Small changes in reinforcement density, environmental cues, or behavioral variability may produce disproportionately large effects depending on baseline system configuration. Consequently, simple linear aggregation of behavioral metrics may fail to detect critical transitions in motivational state. Such non-linear effects are well described in reinforcement learning and dynamical systems models of behavior [19,24,34].
Future empirical work should therefore prioritize:
  • non-linear modeling approaches (e.g., multiplicative, threshold-based, or sigmoid formulations),
  • dynamic time-series analyses capturing trajectory evolution over time,
  • person-specific baselines and within-subject change detection [23,24],
  • identification of thresholds or early warning signals associated with attractor formation.
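The person-specific baseline strategy listed above can be sketched as a within-subject z-score computed against an individual's own recent history, so that change is flagged relative to that person rather than to a population norm (the 28-day window is an arbitrary illustrative choice):

```python
def within_subject_z(history, current, baseline_days=28):
    """Deviation of the current value from a person-specific baseline,
    expressed as a z-score over that individual's recent history."""
    base = history[-baseline_days:]
    mean = sum(base) / len(base)
    sd = (sum((x - mean) ** 2 for x in base) / len(base)) ** 0.5
    return (current - mean) / sd if sd > 0 else 0.0
```

Such person-referenced scores could feed the non-linear and time-series analyses listed above without requiring cross-individual comparability of raw sensor metrics.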
Meaningful calibration of the BRII would likely require longitudinal datasets capable of capturing transitions between relatively stable and unstable behavioral regimes.

5.4. Scope and Limits of Operationalization

It is important to emphasize that the primary contribution of the present work is theoretical rather than operational. The proposed framework is not intended as a finalized measurement system, validated clinical tool, or instrument for immediate decision-making. Rather, it is offered as a conceptual structure for understanding reward-driven behavioral dynamics.
Accordingly, empirical approximation through digital phenotyping should be viewed as a secondary and exploratory extension intended to test, refine, and potentially falsify the theoretical model. At present, the framework is best regarded as hypothesis-generating rather than decision-guiding.

5.5. Synthesis: Measurement as Model Refinement

Digital phenotyping offers a potential pathway for linking theoretical constructs with empirical observation. Its primary value, however, may lie less in precise measurement than in enabling iterative refinement of systems-level models.
Within the present framework, empirical data may be used to:
  • test whether non-linear transitions occur in real-world behavioral dynamics,
  • identify conditions under which reward landscapes become progressively distorted,
  • refine the structure, calibration, and predictive utility of the BRII framework.
By maintaining a clear distinction between conceptual modeling and proxy-based approximation, the present approach avoids reducing complex motivational dynamics to simplified single-metric interpretations.

6. Discussion

6.1. From Mechanisms to System Dynamics

The present framework proposes a shift from mechanism-centered accounts of behavioral addiction toward a systems-level understanding of motivational dynamics. Rather than attributing maladaptive behavior to isolated neural dysfunctions, Reward Instability Theory conceptualizes addiction as an emergent property of a coupled brain–environment system in which reward landscapes may become progressively distorted over time.
Within this perspective, outcomes are defined less by any single behavior and more by the dynamical state of the system—namely, whether behavioral trajectories remain relatively distributed and flexible or increasingly converge toward dominant attractor-like patterns. This distinction may help explain why superficially similar behaviors can differ substantially in persistence, reversibility, and functional impact.
Accordingly, the present work shifts the unit of analysis from discrete behavioral acts to system dynamics, broadly aligning with contemporary directions in computational psychiatry that emphasize interacting state variables over isolated symptoms or static diagnostic categories [11,12,13].

6.2. Theoretical Contribution and Testable Predictions

A central contribution of the present model lies in its explicit treatment of behavioral addiction as a dynamic process unfolding over time rather than a static condition defined solely by behavioral endpoints. This perspective shifts explanation from descriptive accounts of maladaptive behavior toward a process-oriented framework grounded in system dynamics.
Although dynamical approaches have been applied in related domains, their integration with reward landscape distortion under conditions of sustained digital reinforcement remains comparatively limited. In particular, existing models have only partially addressed how environmental reinforcement structure may interact with neurobiological processes to shape the temporal evolution of behavioral trajectories [14,19].
By incorporating both reinforcement density and variance into a unified reward landscape formulation, the present framework offers a structured account of how dominant behavioral patterns may emerge through progressive reorganization of the behavioral state space. Within this perspective, addiction is viewed less as a discrete endpoint and more as a trajectory-level phenomenon reflecting convergence within a dynamically evolving system.
Importantly, the framework also generates empirically testable hypotheses. Under sustained high-density and high-variance reinforcement, behavioral systems may exhibit: (i) non-linear changes in behavioral variability rather than gradual linear change [19]; (ii) increasing persistence of dominant behavioral states under repeated reinforcement [4,5]; and (iii) early warning signals of instability, such as rising variance or slower recovery following perturbation, prior to behavioral consolidation [14,24]. These predictions may be examined using longitudinal behavioral datasets, person-specific monitoring, time-series modeling approaches, and reinforcement-learning-based computational frameworks [17,18,19,34]. Such hypotheses are also directly testable using longitudinal digital phenotyping data combined with computational modeling of behavioral trajectories (e.g., reinforcement learning agents or dynamical systems simulations).
At the same time, real-world behavioral systems are likely to be noisy, heterogeneous, and context-dependent. Accordingly, attractor-like dynamics may emerge in graded, partial, or transient forms rather than as fully stable states [14,15].

6.3. Integration Across Levels of Theory and Early Warning Signals

Reward Instability Theory also provides a common dynamical architecture through which established models of addiction may be interpreted as describing different transformations of the reward landscape. Within this perspective, incentive sensitization may correspond to local amplification of reward gradients [4], habit formation may reflect stabilization of behavioral trajectories within attractor-like regions [8], and allostatic processes may correspond to broader shifts in baseline reward valuation consistent with deformation of the overall landscape [10]. These mechanisms are therefore viewed as complementary rather than mutually exclusive components of a coupled system. Throughout the manuscript, attractor-like terminology is used as a systems heuristic describing relative persistence and recurrence of behavioral configurations rather than as proof of formally verified mathematical attractors.
An additional implication of this framework is that observable changes in behavioral dynamics may precede overt consolidation of maladaptive patterns. Rather than assuming abrupt or binary transitions, systems may exhibit gradual shifts in variability, persistence, and responsiveness as they approach regimes characterized by reduced flexibility.
In this context, increased fluctuations in behavior, slower recovery following perturbation, or progressive narrowing of behavioral diversity may be broadly consistent with dynamics described in other complex systems, including critical slowing down and path dependence [14,19]. However, such patterns should not be interpreted as definitive markers of impending transition, but as provisional indicators of changing system stability. Their reliability and specificity are likely to depend on context, measurement resolution, and individual differences.

6.4. Digital Environments and Measurement Implications

A key implication of the present framework is the reconceptualization of digital environments as active reward-shaping systems rather than passive contexts of behavior [17,18,19,35,36]. Algorithmically mediated platforms may continuously modulate reinforcement density, variance, salience, and accessibility, thereby influencing the evolving structure of the reward landscape over time [35,36]. As a result, behavioral outcomes may not be fully understood independently of the environments in which they emerge. Importantly, not all highly engaging digital behavior should be interpreted as pathological; adaptive, recreational, educational, and socially meaningful engagement may occur within the same environments.
Within this perspective, the BRII offers a conceptual pathway for translating reward instability into empirical research. However, any operationalization remains indirect. Observable behavioral metrics capture only partial projections of underlying system dynamics and may fail to reflect non-linear processes when interpreted in isolation. Future work may therefore benefit from approaches that model behavioral trajectories over time, examine potential non-linear relationships between reinforcement exposure and outcomes, and use longitudinal designs capable of detecting gradual shifts in system organization. If supported empirically, such approaches may help identify periods of elevated vulnerability and guide timing-sensitive behavioral interventions.

6.5. Limitations

Several limitations should be acknowledged. First, the framework is primarily conceptual and does not yet constitute a fully specified quantitative model. Second, constructs such as reward landscapes, attractor states, and instability indices are abstractions that simplify substantial biological and behavioral complexity. Third, digital phenotyping approaches introduce methodological constraints, including measurement noise, limited ecological validity, sampling bias, and privacy concerns [23,24]. Fourth, the present formulation simplifies broader psychosocial, developmental, and cultural determinants that may substantially shape digital behavior and addiction risk. Finally, the model has been developed with particular emphasis on high-density digital environments and may not generalize directly to other behavioral domains without modification.

6.6. Future Directions

The present framework nevertheless suggests several directions for future research. Empirical studies may examine whether real-world behavioral systems exhibit patterns broadly consistent with non-linear dynamics and attractor-like organization, particularly using high-resolution longitudinal data. From a modeling perspective, integrating reinforcement learning with dynamical systems approaches may enable more formal representations of how reward landscapes evolve over time [17,18,19,34]. More broadly, the framework highlights a potential multi-scale temporal structure in which fast fluctuations in reinforcement signals interact with slower processes of learning, adaptation, and self-regulation [14,19,30,31,32]. Understanding these cross-timescale interactions may help explain why some behavioral patterns become resistant to change despite conscious awareness.

7. Conclusions

Behavioral addiction in digital environments may reflect not only excessive engagement with specific activities, but a broader reorganization of motivational dynamics shaped by interactions between neurobiological processes and engineered reinforcement environments. In the present framework, maladaptive persistence is conceptualized as an emergent property of a coupled brain–environment system whose reward landscape may become progressively distorted under sustained high-density and high-variance reinforcement.
Reward Instability Theory extends existing accounts of addiction by shifting the unit of analysis from isolated behaviors or localized mechanisms toward system dynamics. Within this perspective, persistent behavioral patterns may arise through convergence toward attractor-like states shaped by reinforcement learning, salience processes, executive regulation, and environmental reward structure. The proposed BRII is intended as a heuristic systems construct for describing relative instability rather than as a validated diagnostic instrument.
Although primarily theoretical, the framework generates empirically testable predictions regarding non-linear change, declining behavioral variability, persistence dynamics, and early warning signals of reduced flexibility. Evaluating these predictions will require longitudinal, multimodal, and person-centered approaches capable of capturing behavior as a dynamic process unfolding over time.
More broadly, the present work suggests that the effects of digital environments on human behavior may not be fully understood without considering how platform-driven reinforcement structures shape motivational systems. Integrating neurobiological, behavioral, and environmental perspectives within a common dynamical framework may help advance future research in addiction science, digital mental health, and computational psychiatry. While developed with particular emphasis on digital environments, the underlying systems logic may also prove informative for other forms of behavioral addiction in which reinforcement density, salience, and reduced behavioral flexibility interact over time.

Author Contributions

Conceptualization, A.M. and R.R.; methodology, A.M. and R.R.; validation, A.M. and R.R.; investigation, A.M., R.R. and J.C.; resources, A.M. and R.R.; writing—original draft preparation, A.M., R.R., Ł.J. and E.G.; writing—review and editing, A.M., R.R., Ł.J., M.K.W. and E.G.; visualization, A.M.; supervision, R.R. and J.C.; project administration, A.M. and J.C.; funding acquisition, J.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable. This manuscript is a narrative review and does not report original research involving humans or animals.

Data Availability Statement

Not applicable. No new data were created or analyzed in this study.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Deng, Y.; Song, D.; Ni, J.; Qing, H.; Quan, Z. Reward prediction error in learning-related behaviors. Front. Neurosci. 2023, 17, 1171612.
  2. Gershman, S.J.; Assad, J.A.; Datta, S.R.; Linderman, S.W.; Sabatini, B.L.; Uchida, N.; Wilbrecht, L. Explaining dopamine through prediction errors and beyond. Nat. Neurosci. 2024, 27, 1645–1655.
  3. Volkow, N.D.; Blanco, C. Substance use disorders: A comprehensive update of classification, epidemiology, neurobiology, clinical aspects, treatment and prevention. World Psychiatry 2023, 22, 203–229.
  4. Robbins, T.W.; Banca, P.; Belin, D. From compulsivity to compulsion: The neural basis of compulsive disorders. Nat. Rev. Neurosci. 2024, 25, 313–333.
  5. Robinson, T.E.; Berridge, K.C. The incentive-sensitization theory of addiction 30 years on. Annu. Rev. Psychol. 2025, 76, 29–58.
  6. Amo, R. Prediction error in dopamine neurons during associative learning. Neurosci. Res. 2024, 199, 12–20.
  7. Pezzulo, G.; Parr, T.; Friston, K.J. Active inference as a theory of sentient behavior. Biol. Psychol. 2024, 186, 108741.
  8. Wilkinson, C.S.; Luján, M.Á.; Hales, C.; Costa, K.M.; Fiore, V.G.; Knackstedt, L.A.; Kober, H. Listening to the data: Computational approaches to addiction and learning. J. Neurosci. 2023, 43, 7547–7553.
  9. Konova, A.B.; Ceceli, A.O.; Horga, G.; Moeller, S.J.; Alia-Klein, N.; Goldstein, R.Z. Reduced neural encoding of utility prediction errors in cocaine addiction. Neuron 2023, 111, 4058–4070.e6.
  10. Koob, G.F.; Schulkin, J. Addiction and stress: An allostatic view. Neurosci. Biobehav. Rev. 2019, 106, 245–262.
  11. Friston, K.J. Computational psychiatry: From synapses to sentience. Mol. Psychiatry 2023, 28, 256–268.
  12. Gordon, J.A.; Dzirasa, K.; Petzschner, F.H. The neuroscience of mental illness: Building toward the future. Cell 2024, 187, 5858–5870.
  13. Akiki, T.J.; Williams, L.M.; Wolfers, T.; Yang, Y.; Stahl, D.; Gillan, C.M. Transforming psychiatry with computational and brain-based methods. Nat. Comput. Sci. 2025, 5, 844–847.
  14. Shine, J.M.; Breakspear, M.; Bell, P.T.; Ehgoetz Martens, K.A.; Shine, R.; Koyejo, O.; Sporns, O.; Poldrack, R.A. Human cognition involves the dynamic integration of neural activity and neuromodulatory systems. Nat. Neurosci. 2019, 22, 289–296.
  15. Seguin, C.; Sporns, O.; Zalesky, A. Brain network communication: Concepts, models and applications. Nat. Rev. Neurosci. 2023, 24, 557–574.
  16. Ashwin, P.; Fadera, M.; Postlethwaite, C. Network attractors and nonlinear dynamics of neural computation. Curr. Opin. Neurobiol. 2024, 84, 102818.
  17. Klein-Flügge, M.C.; Bongioanni, A.; Rushworth, M.F.S. Medial and orbital frontal cortex in decision making and flexible behavior. Neuron 2022, 110, 2743–2770.
  18. Dabney, W.; Kurth-Nelson, Z.; Uchida, N.; Starkweather, C.K.; Hassabis, D.; Munos, R.; Botvinick, M. A distributional code for value in dopamine-based reinforcement learning. Nature 2020, 577, 671–675.
  19. Dakos, V.; Boulton, C.A.; Buxton, J.E.; Abrams, J.F.; Arellano-Nava, B.; Armstrong McKay, D.I.; Bathiany, S.; Blaschke, L.; Boers, N.; Dylewsky, D.; et al. Tipping point detection and early warnings in climate, ecological, and human systems. Earth Syst. Dynam. 2024, 15, 1117–1135.
  20. Kato, A.; Shimomura, K.; Ognibene, D.; Parvaz, M.A.; Berner, L.A.; Morita, K.; Fiore, V.G. Computational models of behavioral addictions: State of the art and future directions. Addict. Behav. 2023, 140, 107595.
  21. Ceceli, A.O.; Huang, Y.; Kronberg, G.; McClain, N.; King, S.G.; Butelman, E.R.; Alia-Klein, N.; Goldstein, R.Z. The impaired response inhibition and salience attribution model of drug addiction: Recent neuroimaging evidence and future directions. Annu. Rev. Psychol. 2026, 77, 81–108.
  22. Shourkeshti, A.; Abbaszadeh, M.; Marrocco, G.; Jurewicz, K.; Moore, T.; Ebitz, R.B. Pupil size predicts exploration through critical slowing in prefrontal dynamics. Commun. Biol. 2026, 9, 103.
  23. Bufano, P.; Laurino, M.; Said, S.; Tognetti, A.; Menicucci, D. Digital phenotyping for monitoring mental disorders: Systematic review. J. Med. Internet Res. 2023, 25, e46778.
  24. Akre, S.; Seok, D.; Douglas, C.; Aguilera, A.; Carini, S.; Dunn, J.; Hotopf, M.; Mohr, D.C.; Bui, A.A.T.; Freimer, N.B.; et al. Advancing digital sensing in mental health research. Npj Digit. Med. 2024, 7, 362.
  25. Vasilchenko, K.F.; Chumakov, E.M. Current status, challenges and future prospects in computational psychiatry: A narrative review. Consort. Psychiatr. 2023, 4, 33–42.
  26. Badcock, P.B.; Davey, C.G. Active inference in psychology and psychiatry: Progress to date? Entropy 2024, 26, 833.
  27. Shafiei, A.; Jesawada, H.; Friston, K.J.; Russo, G. Distributionally robust free energy principle for decision-making. Nat. Commun. 2026, 17, 707.
  28. Boot, J.; van den Ende, M.W.J.; Wiers, R.W.; Lees, M.H.; van der Maas, H.L.J. Integrating dual-process decision making and social dynamics: A formal modeling framework for addiction. Psychol. Rev. 2025, advance online publication.
  29. Strack, F.; Deutsch, R. Reflective and impulsive determinants of social behavior. Pers. Soc. Psychol. Rev. 2004, 8, 220–247.
  30. Sani, O.G.; Pesaran, B.; Shanechi, M.M. Dissociative and prioritized modeling of behaviorally relevant neural dynamics using recurrent neural networks. Nat. Neurosci. 2024, 27, 2033–2045.
  31. Findling, C.; Romand-Monnier, M.; Skvortsova, V.; Koechlin, E. Neural variability in the medial prefrontal cortex contributes to efficient adaptive behavior. Nat. Commun. 2025, 16, 11356.
  32. Camargo, A.; Del Mauro, G.; Wang, Z. Task-induced changes in brain entropy. J. Neurosci. Res. 2024, 102, e25310.
  33. Britton, G.B.; Huang, L.K.; Villarreal, A.E.; Levey, A.; Philippakis, A.; Hu, C.J.; Yang, C.C.; Mushi, D.; Oviedo, D.C.; Rangel, G.; et al. Digital phenotyping: An equal opportunity approach to reducing disparities in Alzheimer’s disease and related dementia research. Alzheimers Dement. (Amst.) 2023, 15, e12495.
  34. Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction, 2nd ed.; MIT Press: Cambridge, MA, USA, 2018.
  35. Chan, J.; Choi, F.; Saha, K.; Chandrasekharan, E. Examining algorithmic curation on social media: An empirical audit of Reddit’s r/popular feed. arXiv 2025, arXiv:2502.20491.
  36. Bakshy, E.; Messing, S.; Adamic, L.A. Exposure to ideologically diverse news and opinion on Facebook. Science 2015, 348, 1130–1132.
Figure 1. Neurocomputational architecture of reward-driven behavior in digital environments.
Figure 2. Distortion of the reward landscape under increasing reinforcement density and variance.
Figure 3. Emergence of attractor dynamics in distorted reward landscapes.
Table 1. Neurobiological and individual factors influencing reward system dynamics in the Reward Instability framework.

| Factor | Neurobiological Mechanism | System-Level Effects | Reward Landscape Impact |
|---|---|---|---|
| Dopaminergic signaling (e.g., DRD2, SLC6A3) | Modulation of reward prediction error encoding and synaptic plasticity | Modulates reinforcement learning gain and sensitivity to reward gradients | Steepens reward gradients, increasing convergence toward high-reward states |
| Prefrontal regulation (e.g., COMT) | Regulation of executive control and top-down modulation of behavior | Modulates capacity for behavioral inhibition and goal-directed control | Expands or constrains accessibility of alternative behavioral trajectories |
| Impulsivity and delay discounting traits | Reduced delay discounting thresholds and increased sensitivity to immediate rewards | Biases decision-making toward short-term reinforcement | Shifts system toward shallow but rapidly accessible reward peaks |
| Stress and allostatic load | Dysregulation of baseline reward processing and stress-related neuroadaptation | Alters baseline reward sensitivity and increases reliance on habitual responding | Globally deforms the reward landscape, reducing salience of alternative rewards |
| Salience attribution networks (dopaminergic–insula interactions) | Enhanced cue-triggered motivational salience | Increases attentional capture and cue-driven behavior | Amplifies prominence of specific reward peaks, reinforcing attractor formation |

Examples are illustrative and not exhaustive.
Table 2. Candidate proxies for operationalizing the Behavioral Reward Instability Index (BRII).

| BRII Dimension | Candidate Proxies | Data Sources | System Role | Expected Dynamic Effect |
|---|---|---|---|---|
| Individual Reward Sensitivity (IRS) | Impulsivity indices, delay discounting, neurocognitive performance | Behavioral tasks, cognitive testing apps | Modulates sensitivity to reward signals and amplification of reward gradients | Higher IRS may amplify responsiveness to reinforcement under high DRE |
| Digital Reward Exposure (DRE) | Screen time, notification frequency, short-form content exposure | Smartphone logs, app usage analytics | Shapes density and variability of environmental reinforcement | Higher DRE may accelerate convergence toward dominant reward states |
| Behavioral Variability (BV) | Behavioral entropy, activity diversity, sleep regularity | Wearables, GPS, app diversity metrics | Maintains distributed engagement and counteracts attractor formation | Lower BV may reduce resilience and favor convergence |
| Temporal Dynamics (BRII(t)) | Fluctuations in activity patterns, recovery from perturbation, variance shifts | Longitudinal behavioral data | Captures time-dependent evolution of instability | Early warning signals may include increased variance and critical slowing down |

Examples are illustrative and not exhaustive. Proxies require empirical validation.
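The behavioral entropy proxy for the BV dimension can be made concrete with a simple Shannon entropy over activity categories. The sketch below is illustrative only: the function names and category labels are hypothetical choices, not validated BRII proxies, and real digital-phenotyping data would require the preprocessing and validation steps discussed in the text.

```python
import math

def behavioral_entropy(counts):
    """Shannon entropy (bits) of a categorical behavior distribution.

    `counts` maps behavior categories (e.g., app-usage types) to event
    counts. Higher entropy indicates more evenly distributed engagement.
    """
    total = sum(counts.values())
    probs = [c / total for c in counts.values() if c > 0]
    return -sum(p * math.log2(p) for p in probs)

def normalized_entropy(counts):
    """Entropy scaled to [0, 1] by the maximum for the observed categories."""
    k = sum(1 for c in counts.values() if c > 0)
    if k <= 1:
        return 0.0
    return behavioral_entropy(counts) / math.log2(k)

# Distributed engagement across five hypothetical activity categories...
diverse = {"social": 20, "reading": 18, "exercise": 22, "music": 20, "games": 20}
# ...versus engagement dominated by a single high-reward category, the
# pattern the framework associates with attractor formation.
concentrated = {"social": 92, "reading": 2, "exercise": 2, "music": 2, "games": 2}
```

Under the framework, a sustained decline in such an entropy measure would count as falling behavioral variability and hence rising instability risk.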
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.