Shaping Adaptive and Sustainable Smart Urban Ecosystems

Maurizio Giacobbe; Salvatore Distifano

doi:10.20944/preprints202604.1385.v1

Submitted:

19 April 2026

Posted:

21 April 2026

You are already at the latest version

Abstract

The transition from smart to intelligent cities allows for the deployment and management of information and communication technologies in the urban context to be driven by holistic sustainability requirements rather than technical ones such as feasibility and fragmented, siloed operational patterns. This work proposes a multi-dimensional decision-making framework to manage a smart-intelligent city as an urban Cyber-Physical System across environmental, economic, and social sustainability pillars, metrics and their tradeoffs. A methodology based on Deep Reinforcement Learning and reward-shaping mechanisms is proposed to represent and assess sustainability pillar dependencies and their interplay. A case study on a Low-Power Wide-Area Network planning, deployment and management in a Sicilian municipality has been developed to demonstrate the effectiveness of the proposed approach in dealing with the dynamics and the non-linear dependencies of the sustainability pillars. The results thus obtained provide a blueprint for urban planners to develop sustainable, resilient, cost-effective, and environmentally friendly smart-intelligent city frameworks.

Keywords:

smart cities

;

sustainability

;

Deep Reinforcement Learning (DRL)

;

Deep Q-Network (DQN)

;

Cyber Physical Systems (CPS)

Subject:

Computer Science and Mathematics - Computer Science

1. Introduction

The modern urban environment is conceptualized as a dynamic, evolving, increasingly complex urban ecosystem where the convergence of physical and computational elements is essential to keep high the quality of life [1,2]. In this way, municipalities face significant open issues and challenges, often conflicting such as infrastructure degradation, inadequate service provision, growing demand, cost, and environmental impact. These challenges are central to the 11th United Nations Sustainable Development Goal (SDG 11), which mandates the creation of sustainable cities and communities that are inclusive, safe, and resilient, as illustrated in Figure 1 [19].

Despite this potential, even the sustainability of the smart city Information and Communication Technology (ICT) infrastructure is a critical concern. Current models often suffer from a lack of systematic capacity planning, resilience, and long-term sustainability strategies. Many existing infrastructures follow siloed patterns and a static deployment bias, where sensors, devices and networks are specifically designed, deployed and configured for a given vertical application, with fixed parameters regardless of the fluctuating urban context. This absence of resource reuse and modularity leads to low adaptivity, rendering such systems unsustainable in the face of rapid environmental or social shifts.

A viable solution to these limitations is the integration can be Artificial Intelligence (AI), moving towards intelligent cities [48]. This evolution represents a fundamental paradigm shift toward holistic urban ICT sustainability, where AI serves as the neural blueprint of the urban entity [3,4]. Such adaptive urbanism allows the city to function like a living organism, mutating its behavior to optimize its response to bio-social demands while minimizing its environmental footprint [10].

Related work identifies a distinction between classical computational methods, which lack the real-time adaptivity required for complex urban scenarios, and more advanced machine learning approaches. Although supervised learning is highly effective for predictive tasks, it remains limited by its reliance on extensive labeled datasets, that are rarely available for dynamic urban control [20,21,22,23]. Consequently, Reinforcement Learning (RL) has emerged as a dominant methodology, enabling autonomous agents to derive optimal strategies through direct interaction with the urban environment [24,27]. Furthermore, breakthroughs in Multi-Agent Reinforcement Learning (MARL) [25,31] and Large Language Models (LLM) [26,28,29] have introduced advanced reasoning capabilities, although balancing the computational overhead of AI with the energy constraints of edge devices remains a persistent open problem in the design of sustainable systems [30].

The main contribution of this work is the proposal of a holistic approach to multi-dimensional (environmental, economic, and social) urban ICT infrastructure sustainability, characterized by its three main pillars: environmental (

P_{e}

), economic (

P_{c}

), and social (

P_{s}

). This framework moves beyond traditional siloed patterns by prioritizing resource reuse, proper capacity planning, and multi-objective optimization. The proposed methodology is framed within the Cyber-Physical Systems (CPS) paradigm, where smart-intelligent cities are modeled as interconnected urban CPS. A real-time, adaptive AI-driven model based on Deep Reinforcement Learning (DRL) and Deep Q-Networks (DQN) [32,34,35,36] is adopted to optimize and trade-off among environmental, economic, and social sustainability metrics.

The proposed framework represents a strategic solution to address the multi-faceted nature of SDG 11. By leveraging the DQN-based CPS, the integration of the three pillars (

P_{e}, P_{c}, P_{s}

), and the related metrics, allows for a direct contribution to the operational targets of SDG 11, as summarized in Table 1. The effectiveness of this solution is demonstrated through a case study on Low-Power Wide-Area Network (LPWAN) [16] technologies, specifically the LoRaWAN specification [17].

The remainder of this work is organized as follows: Section 2 discusses the background and related work; Section 3 presents the proposed approach and sustainability metrics; Section 4 details the experimental case study; Section 5 analyzes the results of our investigation; and Section 6 concludes the work with future perspectives.

2. Background and Related Work

The smart governance of the city in the Society 5.0 era, put humans at the center of decision processes, supported by CPS [11] as digital bridge between people (policymakers, citizens) and digital services, thus creating an adaptive urban environment. The smart governance must be supported by adequate infrastructure, institutional quality, and inclusive policies [9]. In such a context, social sustainability (

P_{s}

) is crucial to realizing livable communities that are focused on achieving human well-being and enhanced quality of life [12,13]. Our work addresses this social/democracy challenge [38] in the governance of future cities, characterized by a deep integration among the environment, technology, and planet inhabitants.

Moreover, designing and maintaining sustainable CPS to effectively control fleets of smart devices, is a challenge mainly asking for a policy-technology balance. For example, while scientists and technical officers optimize for latency and matching efficiency, policymakers (i.e., municipalities) focus on safety, equity, and environmental protection (i.e., the SDG). Because service demand is inherently unpredictable and undergoes stochastic changes, dynamic fleet control represents significant operational hurdles in high-dynamic (i.e., high-frequent) services. The consequence is the priority to identify the best-fit and context-aware approach capable of managing the non-linear trade-offs between environmental, economic, and social metrics within the urban context.

Traditional smart city models often suffer from a static deployment bias, where the underlying ICT infrastructure is deployed on purpose and configured with fixed parameters regardless of the fluctuating urban context. The evolution of urban infrastructures has recently moved towards the integration of cognitive CPS and Artificial Intelligence of Things (AIoT) [10,15].

Recent literature suggests that the next frontier is the transformation of these networks into fully autonomous CPS [6]. Scheduling is the most effective solution to address production-friendly hybrids that combine AI with heuristics. A fundamental contribution to the evolution of smart city management is provided by [10], which explores the synergy between AIoT and Digital Twin (DT) as Smart City Brain. This work emphasizes how predictive planning and real-time management can pioneer environmental synergies, moving beyond simple data collection towards a cognitive urban management system. In particular, the integration of AI within the urban context allows for predictive planning, which is essential to achieve long-term sustainability goals. DT serves as the “virtual mirror” of the physical city, enabling simulations that optimize resource allocation before their physical implementation. However, while [10] focuses on high-level environmental synergies and the overarching architecture of the city brain, there is a recognized need to bridge the gap between macro-scale planning and micro-scale operational efficiency at the device level.

The literature emphasizes how high-dimensional datasets [14] are essential for training robust autonomous agents, particularly when dealing with the stochastic nature of modern Internet of Things (IoT) deployments. Without sufficient data granularity, the convergence of optimization algorithms can be compromised, leading to bias projections (i.e., inaccurate estimations). Furthermore, large datasets facilitate the validation of comparative studies across different communication protocols and low-power technologies, providing a statistical basis to evaluate the long-term sustainability of large-scale IoT deployments [16]. As a consequence, transition from theoretical models to viable urban CPS calls for a data-centric perspective to minimize uncertainty in operational costs and maximize the efficiency of shared resource management. To such a purpose, a long-term testbed collecting data from sensors deployed at different locations across a city, after conducting intelligent radio coverage planning [17], has been implemented. This systematic approach led to the generation of structured datasets tailored to this paper investigations.

Economic sustainability (

P_{c}

) is crucial in cities suffering from poor infrastructure, inadequate service provision, and environmental degradation. In such a context, for example, the concept of smart cities has recently gained traction as a potential solution to the challenges facing African cities [39]. In this light, city governance is truly smart if, beyond updating and enhancing the accessibility and usability of provided services, it actively preserves the environment. This implies safeguarding the natural and landscape features that define the environmental sustainability (

P_{e}

) of the city and its technological and infrastructural components.

In [40], the authors discuss this challenge, and propose a deep learning multivariate architecture to predict wind speed and global horizontal irradiance. Although they use only timestamp-based features, enabling reliable prediction of renewable energy without sensing infrastructure and achieving strong predictive accuracy, they assert their proposed framework is validated using a single dataset and may exhibit reduced performance under highly variable cloudy conditions. We adopt a traditional IoT-based approach for the monitoring, by collecting data about weather conditions and quality of the air, thereby incorporating the seasonal patterns of the data.

De-carbonization is part of the environmental sustainability challenge, as demonstrated by several studies in literature [41,42]. However, often these studies suffer from unavailable real testbeds and are limited to simulations.

A study oriented to the quantitative assessment of urban ecosystems is represented by the work of [43], in which the authors developed a comprehensive integrated model to evaluate the environmental and economic sustainability of smart cities. Their approach is characterized by the use of distinct indicators to calculate an overall Sustainability Index, providing a mathematical framework to balance carbon emissions and energy efficiency with economic viability. However, the challenge remains in the dynamic nature of these indicators: while their model provides a diagnostic tool for urban planning, it primarily operates on aggregated data. Our research extends this integrated vision by embedding such sustainability metrics directly into a CPS architecture. Using the real-time sensing capabilities discussed in the AIoT literature [10], we transform the integrated model of [43] into a dynamic optimization problem. Specifically, we adopt their logic of multi-dimensional indicators from a “top-down” planning model to a “bottom-up” adaptive control loop. However, although the authors in [10] provide the architectural blueprint for environmental synergies, our work focuses on the trade-offs between the three pillars of sustainability, which are often in conflict. Therefore, a trade-off analysis is crucial to determine the optimal configurations of CPS in urban ecosystems.

In this context, various computational approaches have been proposed to address the inherent volatility of smart city environments, offering distinct methodological perspectives to model and control the complexity of urban dynamics. A classical distinction in literature is among three primary paradigms: Classical Methods (CM), comprising Linear Programming and Heuristics, Supervised Learning (SL), and Reinforcement Learning (RL). The

C M

group is traditionally used for static resource allocation; although these methods are computationally efficient, they lack the adaptability required to respond to contextual shifts in real-time. Furthermore, they often require a precise closed-form mathematical model of the entire environment, which is notoriously difficult to obtain in complex smart city scenarios.

The

S L

methods [20,21,22,23] are highly effective for predictive tasks, such as forecasting traffic flow or energy demand; however, they require extensive labeled datasets representing optimal decisions, which are rarely available for dynamic urban control. In the context of this framework, the optimal balance between the pillars of sustainability is not pre-defined but must be discovered dynamically through exploration.

Unlike the aforementioned approaches,

R L

algorithms have emerged as a dominant methodology, allowing agents to derive optimal strategies through direct interaction with the urban environment. While foundational theories established the mechanics of agent-environment interaction [24], modern implementations now leverage Deep RL to discover state-of-the-art algorithms that outperform human-engineered heuristics [27]. In the context of smart cities, the challenge of resource allocation is increasingly addressed through MARL, providing a robust framework for optimizing distributed infrastructures [25,31].

Recent breakthroughs in LLM and RL from human feedback have further introduced advanced reasoning capabilities [26,28,29]. Although these models are often associated with natural language processing, their underlying logic of “learning to reason” and “scaling through feedback” is directly applicable to CPS. By deciphering the nuances of feedback loops and long-term rewards [30], it is possible to design urban controllers that address sustainable trade-offs between energy saving and service availability.

A comparison between the computational approaches

C M

,

S L

, and

R L

is reported in Table 2, evaluating them based on heterogeneity of the data, portability, environment interaction, and integration of sustainability.

Data heterogeneity refers to ingest and normalize different data streams. While

C M

requires rigid rules that struggle with high-dimensional IoT data, and

S L

focuses on finding patterns in historical datasets,

R L

is goal-oriented: it processes heterogeneous inputs to optimize a sustainability score, making it more resilient to sensor variability.

Portability assesses the feasibility of running the engine on resource-constrained hardware.

C M

methods are computationally light but inflexible;

S L

models can be memory-intensive; RL enables an asymmetrical workload: intensive training is performed offline, while the resulting policy is a lightweight inference engine runnable at-the-edge.

Environment interaction represents the feedback loop.

C M

is essentially an open-loop paradigm;

S L

remains a passive observer of historical data. In contrast,

R L

is a closed-loop system. This is crucial for sustainability: the agent observes how a power-saving action at time t affects reliability at

t + 1

, adapting its strategy autonomously based on the observed outcome.

Sustainability integration evaluates how the model incorporates environmental and economic pillars. Sustainability is not a fixed threshold (as in

C M

) nor a historical replication (as in

S L

), but a context-aware balance. While

C M

uses static weights that fail when the urban context changes, and

S L

requires non-existent “perfectly labeled” trade-off datasets,

R L

provides dynamic Pareto balancing, shifting prioritization based on real-time urgency.

Based on the comparative assessment above,

R L

, and in particular

D R L

, was identified as the most suitable technique [32]. Raw data are directly fed into a multi-layered deep neural network that progressively extracts higher-level features [33]. As part of the DRL family,

D Q N

has emerged as a pivotal technology in smart city research [34,35,36], primarily due to its ability to dynamically adapt to fluctuating urban conditions and orchestrate complex data from interconnected devices [37]. The rationale for this choice is mainly due to the ability of DQN to handle the “curse of dimensionality” inherent in the multi-dimensional context-aware approach. In the smart city context, using Deep Neural Networks (DNN), a DQN agent adapts its behavior to rapid weather-induced hardware stress and fluctuating traffic density, without relying on pre-defined or historical “perfect” datasets.

Recent advances have further integrated DNN and federated learning [5,7] to anticipate future demand patterns, thereby enhancing system-wide sustainability [8]. This technological convergence promises to shape the future of adaptive urban ecosystems.

3. Context-Aware Optimization

To evaluate the systemic impact of the ICT infrastructure in urban environments, this study defines a multi-dimensional framework taking into account environmental, economic, and social sustainability metrics. They are quantified through recognized scientific metrics, providing a standardized approach to evaluate how a Smart City infrastructure interacts with the urban ecosystem and its governance.

As illustrated in Figure 2, our approach introduces a conceptual framework integrating multiple CPS into a cohesive urban ecosystem. It is characterized by the AIoT as a “cognitive” engine that transforms the old concept of static infrastructure in an adaptive urban ecosystem. AIoT acts as a stochastic controller that processes real-time telemetry to optimize system responses.

The architecture of the proposed adaptive urban ecosystem is illustrated in Figure 3. The framework conceptualizes the transition from a static sustainability model to a dynamic, context-aware system. As shown in this diagram, the system is driven by two primary inputs: the Sustainability, which defines the theoretical SDG11, and the Urban Context, providing real-time environmental and operational data. These inputs converge on the Preliminary Setup Layer (PSL), highlighted by the dashed red boundary. Within this layer, the sustainability pillars

P_{e} (t)

,

P_{c} (t)

, and

P_{s} (t)

are not treated as static metrics but as dynamic functions of the urban context.

The interaction among these pillars is governed by a Multi-Objective Optimization engine. This component processes the input parameters (i.e., from the PSL) to resolve inherent trade-offs, also ensuring that the infrastructure remains efficient and reliable under varying urban conditions. Finally, an Adaptive Feedback loop returns the optimized state to the Urban Context, enabling a continuous self-healing and self-configuring cycle.

3.1. Smart Urban Context

The core of the proposed framework lies in the formalization of the smart urban context as a dynamic CPS ecosystem governed by a state-space representation. Thereby, a dynamic urban context

C (t)

can be defined as the n-tuple of Equation (1)

C (t) = (c_{1} (t), c_{2} (t), \dots, c_{n} (t)) \in C \subseteq R^{n}

(1)

where each component

c_{i} (t)

represents a specific environmental or systemic metric, such as traffic congestion levels, network availability, or sudden urban emergencies, and where

C

is the context state space.

To tune the model to urban-wide areas, usually organized into geographical districts (e.g., urban districts, industrial zones, or residential quarters), a context matrix

S (t)

is formally defined by eq. (2):

S (t) = (\begin{matrix} C_{1, 1} (t) & \dots & C_{1, q} (t) \\ ⋮ & ⋱ & ⋮ \\ C_{p, 1} (t) & \dots & C_{p, q} (t) \end{matrix}) \in C^{p, q} \subseteq R^{n, p, q}

(2)

mapping

p * q > 0

districts of the urban area into the corresponding

C_{i, j} (t)

context representing the

i, j

-th geographical district at time t. This formulation ensures that the context matrix

S (t)

captures the dynamics of the city and its districts, preserving geographic and topological distance through the matrix adjacency. This allows the municipality to identify spatial correlations and proximity-based dependencies, which are essential for risk prevention (e.g., hydro-geological monitoring) and urban optimization (e.g., commercial promotion in high-footfall areas). Therefore, it is possible to identify complex spatial-temporal correlations that would be lost in a lower-dimensional or aggregated representation, finally identifying optimal trade-offs among the three pillars of sustainability. This representation enables a comprehensive overview of the city dynamics by aggregating localized district-level data and identifying potential spatial correlations between adjacent districts. Using this structured information, the framework can detect criticalities and vulnerabilities associated with specific districts based on the interplay of their context metrics, such as traffic congestion, weather patterns, and crowd density.

The practical implications of such granular monitoring are manifold: high-resolution telemetry is crucial for hydro-geological risk prevention and for optimizing urban traffic planning through proactive interventions; the ability to analyze real-time visitor and tourist flows enables more effective promotion of events and commercial activities in areas with higher crowding; the high-dimensional observation space is transformed into an actionable decision-support tool, allowing the municipality to identify both systemic risks and strategic opportunities by capturing the inter-dependencies between neighboring districts. This representation does not merely describe the environment but actively shapes the weighting of the sustainability pillars within the proposed multi-objective optimization process, ensuring that the resulting urban trade-offs are both resilient and context-aware. The brain of the optimization process determines the criticality of the current situation. For example, in high-critical scenarios, the controller may prioritize immediate social and network performance by extending the duration of high-power operational states. Conversely, in routine contexts, the optimization logic shifts toward energy saving and cost reduction. As a result of this analysis, the context impacts the sustainability of the smart urban ecosystem and the selected main metrics that characterize its pillars, as described below. This holistic representation emphasizes that the transition toward a smart city is not merely technological but is intrinsically linked to measurable sustainability goals, where CPS bridge the gap between physical infrastructure and socio-economic outcomes.

3.2. Sustainability Pillars and Metrics

Environmental sustainability

P_{e}

is traditionally quantified by the ecological cost of the digital infrastructure, focusing primarily on metrics such as energy consumption and carbon footprint. However, in the context of a Smart City, this perspective is inherently limited. True environmental sustainability must be redefined through the proactive role of CPS, acting as responsive sentinels. Beyond mere device-level energy savings, these systems monitor both local and neighboring contexts to dynamically manage urban dynamics. By anticipating traffic peaks, high-density events, and congestion, the DQN policy prioritizes the systemic prevention of pollution surges, using the infrastructure as an active tool for ecological mitigation rather than a passive energy consumer. In such a scenario, energy consumption is not a static value but a baseline strictly coupled with the operational context

S (t)

. It serves as a direct measure of the CPS impact on environmental and economic costs, as the power consumption scales dynamically across different modes, from low-power stand-by or idle states to high-performance active monitoring.

Let

d_{i, j}

be the number of devices deployed across the

i, j

-th urban district, the total energy consumptionE over an observation interval

[t_{0}, t]

is formally defined by Equation (3):

E (s, t) = \int_{t_{0}}^{t} P (S (τ), τ), d τ [kWh] .

(3)

where the lowercase s represents the specific realization of the state at the current time t, i.e.

s = S (t)

. The total power P is the result of the sum of the instantaneous power measured at time t for each device in the district, formally defined by Equation (4):

P (s, t) = \sum_{i = 1}^{p} \sum_{j = 1}^{q} \sum_{k = 1}^{d_{i, j}} P_{i, j, k} (C_{i, j} (t), t) [W] .

(4)

The carbon footprint (

C F

) represents the Global Warming Potential (GWP) [44] associated with the system existence and operation in the observation interval

[t_{0}, t]

. It accounts for the embodied carbon of each device, provided as a manufacturer specification, and the emissions derived from its contextual energy consumption. Leveraging the energy formulation defined in Equation (3), the total

C F

for the infrastructure across the n districts is formalized by Equation (5):

C F (s, t) = \sum_{i = 1}^{p} \sum_{j = 1}^{q} \sum_{k = 1}^{d_{i, j}} G W P_{i, j, k} + ϵ_{g e o} \cdot E (s, t) [{kgCO}_{2} eq]

(5)

where

G W P_{i, j, k}

is the aggregate cradle-to-gate carbon footprint of the k-th device located in the

i, j

-th urban district (including its enclosure, electronics, and battery where applicable). This term represents the embodied impact linked to the physical existence of the device. The term

ϵ_{g e o} \cdot E (s, t)

represents the operational impact, where

ϵ_{g e o}

denotes the grid emission intensity factor [45] and

E (s, t)

is the context-dependent energy consumption derived from the instantaneous power

P (s, t)

as defined by Equation (3). This formulation allows the framework to evaluate how the selection of specific hardware models and their operational duty cycles, driven by the urban dynamics captured in

S (t)

, jointly influence the total environmental sustainability of the infrastructure.

The economic sustainability (

P_{c}

) ensures the financial viability and scalability of the infrastructure within municipal budget constraints. It is modeled through the Total Cost of Ownership (TCO), which balances the initial investment with long-term maintenance. The TCO represents the comprehensive financial metric for evaluating the (

P_{c}

) of the CPS infrastructure over its entire operational life. It is defined by Equation (6) as the sum of the initial investment and the accumulated operating costs:

T C O (s, t) = C A P E X (t_{0}) + O P E X (s, t)) [Currency]

(6)

The term

C A P E X (t_{0})

represents the Capital Expenditure incurred at the initial time

t_{0}

, acting as a static and context-independent value. It includes the procurement of devices for all the districts, including hardware costs, licenses, and physical installation. By defining it at

t_{0}

, the model treats the initial investment as a fixed boundary condition for the optimization problem, as formally defined by Equation (7):

C A P E X (t_{0}) = \sum_{i = 1}^{p} \sum_{j = 1}^{q} \sum_{k = 1}^{d_{i, j}} (H_{i, j, k} + I_{i, j, k} + L_{i, j, k}) |_{t_{0}} [Currency]

(7)

where

H_{i, j, k}

,

I_{i, j, k}

, and

L_{i, j, k}

denote the hardware, installation, and licensing costs for the k-th device in the

i, j

-th geographical district, respectively.

Conversely,

O P E X (s, t)

represents the dynamic and context-aware Operational Expenditure. It is formally defined by Equation (8):

O P E X (s, t) = \frac{(t - t_{0})}{T} \cdot K_{T} + O P E X_{V} (s, t) [Currency]

(8)

where the

K_{T}

term represents the time-invariant operational costs (e.g., cloud subscription fees or fixed infrastructure leasing) in the T period. The variable operational expenditure

O P E X_{V}

during the observation interval

[t_{0}, t]

is formally defined by Equation (9) as a direct function of the geographical urban context, capturing how urban dynamics drive resource allocation and energy demand.

O P E X_{V} (s, t) = \int_{t_{0}}^{t} (\sum_{i = 1}^{p} \sum_{j = 1}^{q} \sum_{k = 1}^{d_{i, j}} [V_{i, j, k} (τ) \cdot P_{i, j, k} (C_{i, j} (τ), τ) + (M_{i, j, k} (τ) + δ_{k} \cdot B_{i, j, k} (τ)) \cdot O_{p r e d} (C_{i, j} (τ), τ)]) d τ

(9)

In this formulation, the first term of the sum captures the monetary cost of the energy, where

V_{i, j, k}

is the unit price (currency-per-kW), and

P_{i, j, k}

is the power of Equation (4). The integral term also captures the localized maintenance and hardware preservation costs across the districts over the observation interval

[t_{0}, t]

. Here,

M_{i, j, k}

denotes the routine maintenance for the k-th device in the

i, j

-th district, encompassing inspections and standard operational overhead. The second part of the sum addresses the critical management of power-source longevity. The binary parameter

δ_{k}

indicates the presence of a battery for the k-th device model, as specified by Equation (10).

δ_{k} = \{\begin{matrix} 1 & if the k - th device is battery - powered \\ 0 & if the k - th device is mains - powered . \end{matrix}

(10)

The term

B_{i, j, k} (t)

represents the replacement cost for the battery (including procurement and field logistics). These factors are modulated by the predictive operator

O_{p r e d}

, which estimates the degradation rate and the “state-of-the-health” of the device. By incorporating this predictive logic, the model acknowledges that operational stress directly accelerates battery chemical aging. The above formulation is useful for long-term planning, as battery-operated devices (

δ_{k} = 1

) can incur periodic replacement costs and labor, whereas mains-powered devices (

δ_{k} = 0

) contribute more significantly to the energy cost.

The social sustainability (

P_{s}

) is evaluated through the system ability to provide a reliable and timely service to the community. Information systems and IT services are increasingly performing a wide variety of organizational functions and personal activities. Therefore, high-quality information systems and IT services are essential to provide value and avoid possible negative consequences for their stakeholders. According to the ISO / IEC System Quality and Software Quality Requirements and Evaluation (SQuaRE) family of International Standards [47], the social value of a CPS is quantified through objective quality of service (QoS) parameters. In this work, we redefine these metrics to be context-aware, ensuring they respond dynamically to the urban context. A QoS metric is the Packet Delivery Ratio (PDR) measuring the communication integrity and the reliability of the data flow from the devices to the application. It is formally defined by Equation (11):

P D R (s, t) = \frac{\int_{t_{0}}^{t} \sum_{i = 1}^{p} \sum_{j = 1}^{q} \sum_{k = 1}^{d_{i, j}} Ψ_{R X, I, j, k} (S (τ), τ) d τ}{\int_{t_{0}}^{t} \sum_{i = 1}^{p} \sum_{j = 1}^{q} \sum_{k = 1}^{d_{i, j}} Ψ_{T X, I, j, k} (S (τ), τ) d τ}, t - t_{0} ≫ τ_{m a x}

(11)

where the

P D R

is defined as the ratio of cumulative successfully received packets (

Ψ_{R X}

) to total transmitted packets (

Ψ_{T X}

) across the urban scenario. By modeling these variables as time-varying processes, the integral structure captures the cumulative throughput over the observation window

[t_{0}, t]

, effectively filtering out instantaneous stochastic noise. To ensure statistical consistency, the interval

[t_{0}, t]

is assumed to be significantly larger than the maximum network propagation delay. Furthermore, the explicit dependence of both integrals on the urban context

S (t)

formalizes how urban stressors, such as localized fading or congestion, modulate this ratio.

The

τ_{m a x}

is defined by Equation (12):

τ_{m a x} = max_{i, j, k} τ_{P r o p, i, j, k} (t)

(12)

where the term

τ_{P r o p, i, j, k} (t)

denotes the instantaneous latency experienced by a packet transmitted by the k-th device within the

i, j

-th district. This delay accounts for the physical distance between nodes, the signal propagation speed in the urban medium, and potential context-driven retransmissions or multi-hop overheads.

The Service Timeliness, denoted as

σ (s, t)

, quantifies the average end-to-end latency over the observation window

[t_{0}, t]

, ensuring the system meets specific real-time requirements. It is formally defined by Equation (13) as the mean temporal gap between data generation at the sensing layer and its final reception at the application level:

σ (s, t) = E {[t_{D, i, j, k} - t_{S, i, j, k}]}_{(t - t_{0})} [s]

(13)

where

t_{S}

and

t_{D}

represent the timestamps of data sensing and delivery for each packet, respectively. To keep the operational integrity of safety-critical services, we impose a strict latency constraint:

σ (s, t) \leq τ_{r e q}

(14)

where

τ_{r e q}

represents the application-specific deadline. While

τ_{m a x}

of Equation (12) defines the physical upper bound of the network propagation delay, the timeliness

τ

accounts for the total system latency, including processing times and queuing delays. The condition

σ \leq τ_{r e q}

ensures that the response of the system to the urban environment

S (t)

is based on timely rather than outdated information to preserve the effectiveness of the decision-making process under dynamic conditions.

The Service Availability (

A_{s}

) [47] represents the fraction of time the infrastructure is fully operational and capable of fulfilling its functional requirements. Although the service availability at an individual device level is sensitive to localized stressors, the

A_{s}

is a critical system-level Key Performance Indicator (KPI). The proposed model leverages the aggregation of individual availability metrics to derive a systemic indicator that reflects the capacity of the smart city infrastructure to provide continuous services, despite possible failures of single nodes.

Defining the availability of a device

a_{k}

by Equation (15):

a_{k} (t) = \frac{U p t i m e_{k} (t, t_{0})}{D o w n t i m e_{k} (t, t_{0}) + U p t i m e_{k} (t, t_{0})}

(15)

where

U p t i m e_{k} (t, t 0)

denotes the cumulative duration within the time interval

(t_{0}, t)

in which the k-th device is fully operational and the service is active. Conversely,

D o w n t i m e_{k} (t_{0}, t)

accounts for periods of inactivity, and the service is inactive. Thereby, the system-level availability is formally specified by Equation (16):

A_{s} (s, t) = \sum_{i = 1}^{p} \sum_{j = 1}^{q} \frac{1}{d_{i, j}} \sum_{k = 1}^{d_{i, j}} a_{i, j, k} (t)

(16)

where

d_{i, j}

denotes the number of devices deployed across the

i, j

-th urban district. Unlike static reliability models, this formulation accounts for environmental and operational stressors across the full urban context

S (t)

. A high

A_{s}

value is mandatory for mission-critical urban services, ensuring 24/7 availability and resilience against localized disruptions.

3.3. DQN-Based Optimization

As discussed in Section 2, a DQN-based optimization strategy has been adopted to achieve context-aware governance of urban resources, thereby facilitating the transition toward more adaptive and sustainable smart city ecosystems. A general schema of the proposed method is depicted in Figure 4. This architectural framework transitions from raw environmental data to a high-level decision-making state through a three-level hierarchical abstraction. At the most granular level (System Metrics), the system captures systemic metrics represented as

c_{n} (t)

. Such metrics are geographically aggregated into

p \cdot q > 0

district contexts (

C_{i, j} (t) = (c_{1} (t), c_{2} (t), \dots, c_{n} (t))

) encapsulating the n multi-dimensional status of the

i, j

-th district at time t.

The integration of this state representation into the DQN pipeline follows a specific information flow. The Environment block, acting as the CPS, outputs the context matrix

S (t)

. Therefore, the matrix

S (t)

is passed to the DQN agent, which processes the data based on its internal structure. It uses convolutional layers to keep the grid shape intact. This allows the agent to recognize spatial patterns, such as how a problem in one district might affect its neighbors. By preserving this grid structure, the agent better understands the layout of the city, leading to more effective urban management decisions.

More specifically, at each time step t, the agent observes the global state of the city through the context matrix

S (t) \in C^{p, q}

. This representation captures the status of all functional districts

(i, j)

, characterized by systemic metrics whose raw data, collected from IoT devices, are normalized before being processed by the DNN to ensure numerical stability and effective feature extraction. Once the agent detects the optimal policy for the devices of each district

(i, j)

, it transforms this policy in an operational action

a_{i, j} \in A^{p, q}

to set devices to a specific operational condition. By discretizing the operational conditions into a finite set of

h \in N

operational modes as defined by Equation (17):

O = {o_{1}, . . ., o_{h}}, o_{h} \in N

(17)

the

a_{i, j}

action can be defined as a pair of eq (18):

a_{i, j} = (o_{s}, o_{f}) \in A^{p, q} \subseteq O^{2}

(18)

with

o_{s}, o_{f} \in O

as starting and final state of an action. It is important to remark that, in general, the operational conditions are usually orthogonal to the sustainability pillars, and

o_{s}

may be the same as

o_{f}

, i.e., no changes are enforced by

o_{s} = o_{f}

actions.

The Sustainability Gain

G^{π} (s, a)

, is formally defined by Equation (19), for a specific state-action pair

(s, a) \in S x A

. It is the expected cumulative discounted return, which the DQN agent aims to maximize to identify the optimal policy (

π

). Specifically,

G^{π} (s, a)

corresponds to the action-value function, satisfying the Bellman optimality criterion [49].

G^{π} (s, a) = E [\sum_{y = 0}^{\infty} γ^{y} \sum_{i = 1}^{p} \sum_{j = 1}^{q} R_{i, j} (S (t + y), A (t + y))], γ \in (0, 1)

(19)

This formulation encapsulates the multi-level complexity of urban infrastructure through several key components. The expectation operator

E [\cdot]

accounts for the intrinsic stochasticity of the urban environment. State transitions and the resulting rewards are influenced by unpredictable events, such as traffic spikes or sudden emergencies, which characterize the described dynamics. The temporal index t denotes the current decision epoch, while

y \in N

represents the discrete look-ahead horizon. The term

t + y

indicates that the agent current return is not merely a function of immediate rewards, but a discounted accumulation of expected future states

S (t + y)

and actions

A (t + y)

. The integration of an infinite horizon (

y \to \infty

), moderated by the discount factor

γ \in (0, 1)

, ensures the strategic sustainability of the policy: for

γ \to 0^{+}

, the objective function collapses into a single-step optimization, where the agent considers only the immediate reward. Conversely, as

γ \to 1^{-}

, the agent equally weights present and future rewards. However, in this case, the agent loses the ability to rank different strategies, as any policy providing a positive reward would result in the same mathematical value (∞). By keeping

γ < 1

, the series converges to a finite number, allowing the DQN to mathematically determine which policy is superior by comparing finite values.

The sustainability gain

G^{π} (s, a)

aggregates the local rewards

R_{i, j}

at the

i, j

-th district level to ensure computational scalability within large-scale smart city deployments. As the number of urban IoT devices may grow into the thousands, a device-level reward structure would introduce high-frequency noise and a dimensional explosion in the feedback signal, severely impacting the DQN convergence. By a district-level reward

R_{i, j}

, the system considers each geographical area as a functional domain. This ensures that the learning complexity remains tied to the resolution of the urban (

p \times q

) grid rather than the fluctuating density of the deployed hardware. It is formally defined by Equation (20), aggregating the environmental, economic, and social sustainability metrics of all devices within that area:

\begin{matrix} R_{i, j} (s, a) = 1.0 - [λ_{1} L_{e} (η_{i, j}, a_{i, j}) + λ_{2} L_{c} (η_{i, j}, a_{i, j}) + λ_{3} L_{s} (η_{i, j}, a_{i, j})] \end{matrix}

(20)

where

λ_{k} \in [0, 1] \subset R

are the sustainability factors balancing the impact of normalized losses

L

on the system, such that

\sum_{k = 1}^{3} λ_{k} = 1

. The reward function

R_{i, j}

formalizes the operational efficiency of the

(i, j)

-th district by coupling the local action

a_{i, j}

with the expanded neighborhood [50] state

(η_{i, j})

. This formulation adopts a penalty-from-unity approach: starting from an ideal value of

1.0

, the reward is decreased by a weighted sum of these components. As a consequence, maximization of the expected return

G^{π} (s, a)

is mathematically equivalent to minimizing the long-term cumulative loss. Thereby, a policy

π

is a parametric configuration tha can be identified by the 3-tuple of Equation (21):

π = (λ_{1}, λ_{2}, λ_{3}) .

(21)

The spatial dependency is modeled upon the Moore neighborhood convention, which is widely adopted in networked RL to capture local inter-dependencies and interference patterns in distributed sensing infrastructures [50]. It is formally defined by Equation (22):

η (i, j) = {C_{u, v} \in S : | u - i | \leq r, | v - j | \leq r}

(22)

where

η (i, j)

is composed of the context subset (sub-matrix) of the global state space

S

centered on the district

(i, j)

with radius r.

By conditioning environmental (

L_{e}

), economic (

L_{c}

), and social (

L_{s}

) losses on the context of adjacent districts, the system prevents the emergence of selfish policies that might optimize local parameters at the expense of neighboring areas. Each loss component

L_{x}

of Equation (20) is modeled as a weighted aggregation of a specific subset of normalized metrics as specified by Equation (23)

L_{x} = \sum_{k = 1}^{n} ω_{k} \cdot {\hat{c}}_{k}; {\hat{c}}_{k} = \frac{c_{k}}{c_{k}^{m a x}}

(23)

where

ω_{k}

represents the relative weight of each normalized metric

{\hat{c}}_{k}

within that specific loss, such that

\sum_{k = 1}^{n} ω_{k} = 1

, and

c_{k}^{m a x}

is the maximum value threshold allowed to preserve the operation of the equipment. The hierarchical weighting structure allows the DQN agent to internalize a complex multi-dimensional state while maintaining a clear balance between high-level sustainability goals and specific physical constraints.

4. Case Study

To validate the proposed approach, a real-world urban deployment (i.e. the historical center of a Sicilian city) consisting of five neighboring districts has been investigated. The effectiveness of this solution is demonstrated through a case study on LoRaWAN technologies [17] as the smart city ICT infrastructure. LoRaWAN is a LPWAN specification designed to wirelessly connect battery operated things to the internet in regional, national or global networks, and targets key IoT requirements such as bi-directional communication, end-to-end security, mobility and localization services [18].

As shown in Figure 5, a 3x3 map defines the urban domain where districts A, B, C, D, and E represent the active operational area covered by a central LoRaWAN gateway located in district C (0,0). Non-covered peripheral areas are masked to focus the DQN agent policy on the core interconnected districts. Each active district is equipped with one installation point (pole) including several IoT devices to capture the multi-faceted dynamics of the district, as reported in Table 3.

Each cell corresponds to a surface area of

0.04 k m^{2}

, due to the urban morphology of the historic center, and its impact on the LoRaWAN communication. To ensure the replicability of the proposed approach, the cell size can be appropriately scaled based on the topology and geo-morphological features of the new monitored environment.

More specifically, Pole A acts as an integrated sensing cluster hosting one environmental station, one traffic sensor, and one parking node, while Pole B provides a dual-source setup for environmental and traffic monitoring. A similar configuration is replicated on Pole C, which hosts an additional environmental station, a traffic sensor, a parking node, and a standalone LoRaWAN gateway providing the backbone for the LoRaWAN communication layer. Mobility dynamics are further captured by Pole D, which serves as a high-density hub with one traffic sensor and two parking nodes, and by Pole E, conceived for localized parking monitoring with a single sensing unit.

Thereby, the experimental testbed comprises 12 heterogeneous edge devices and a central gateway

G W

, leveraging hardware and firmware technology provided by the innovative company SmartMe.io1. The gateway acts as the primary orchestrator, ensuring seamless connectivity between the edge nodes and a dedicated cloud-based smart mobility platform for real-time data processing and analytics. This technological infrastructure is categorized into three functional typologies of devices, as shown in Table 4.

All poles are permanently mains-powered, and devices operate in a continuous DC active state. As a consequence, the

O P E X_{V}

formulation (Equation (9)) is simplified (

δ_{k} = 0

). Table 5 reports the baseline energy metrics under standard operating conditions, assuming a static duty-cycle without any dynamic policy intervention. These values represent the reference consumption against which the agent energy-saving capabilities are subsequently measured.

This baseline has been established during the start-up phase of the project, where measurements have been calibrated and validated using certified instrumentation. Therefore, this baseline is designed to be periodically updated and refined, provided that such updates remain compatible with existing network constraints and infrastructure overhead (e.g., the 1% duty-cycle in LoRaWAN communication).

During runtime, the system collects a set of heterogeneous physical measurements from the urban environment, formally defined as a real-valued 10-dimensional tuple by Equation (24):

C (t) = (c_{1} (t), c_{2} (t), \dots, c_{10} (t)]) \in C \subseteq R^{10}

(24)

in compliance with Equation (1).

Table 6 reports the metrics considered for the examined scenario, including features and related impacts on the city dynamics. The radio parameters (

c_{9}, c_{10}

) and urban demand indicators (

c_{4}, c_{5}, c_{6}

) specify the transmission power and frequency. Physical stressors (

c_{1}, c_{2}

) derived from the weather dataset, alongside the Ultraviolet (UV) index (

c_{3}

), influence the probability of hardware fatigue and maintenance frequency. In our

T C O

model, these environmental factors directly impact the amortization

O P E X

and

C A P E X

by modulating the expected life-cycle of the mains-powered devices. Furthermore, the performance metrics of the LoRaWAN network (

c_{7}, c_{8}

), combined with the activity density (

c_{6}

), define the overall

Q o S

.

The Table 6 metrics, selected from weather, traffic, parking and LoRaWAN datasets, serve as the primary inputs for the optimization process. To prevent bias during the DQN training phase, these raw observations are normalized into a normalized demand vector

\hat{C}

, as defined by Equation (23). Through the normalization process, the DQN agent executes the operational phases of its architecture to maximize the cumulative reward

R

defined by Equation (20). The training process integrates experience replay and target network synchronization to ensure the stability of the Q-value estimation. To effectively navigate the trade-off between investigating new strategies, a trade-off mechanism allows the agent to explore the state-action space before converging toward the optimal policy

π

maximizing the sustainability gain

G^{π} (s, a)

.

A distinctive feature of this implementation is the spatial modeling of the state, which extends beyond localized observations. The logic implemented integrates a context with the dynamics detected in adjacent districts

(A, B, D, E)

, defined as the radius

r = 1

Moore neighborhood set

η (0, 0)

of Equation (22). Therefore, it is not just a quick reaction to local changes, but it is possible to understand how different districts affect each other. For example, how traffic moves or how environmental changes spread from the outer districts to the center. The agent can anticipate incoming data traffic from neighboring areas and adjust settings in advance to stay within the constraints.

The experimental setup implements three (

h = 3

) discrete operational conditions for the edge infrastructure, as defined by Equation (25) specializing Equation (17):

O = {o_{ECO}, o_{STD}, o_{HPM}}

(25)

where each mode

o_{E C O}

,

o_{S T D}

, and

o_{H P M}

defines a specific power profile. The

o_{E C O}

mode identifies the lower operational bound for the edge device, characterized by a power consumption of

\approx 2

W. From a hardware perspective, this configuration corresponds to an idle state or a low-power duty cycle where GPU-intensive tasks are suspended and sensing frequencies are minimized. In the

o_{S T D}

mode, the device operates at its nominal power profile of

\approx 4

W. This setup ensures compliance with standard regulatory limits while providing sufficient data granularity for routine urban monitoring. The

o_{H P M}

mode is characterized by maximum computational and transmission effort to ensure high system responsiveness during urban emergencies or peak demand. In this configuration, the device enables full real-time edge-inference, resulting in a power surge of

\approx 7

W. This value aligns with the 10 W Thermal Design Power (TDP) profile of the Jetson Nano, accounting for sustained GPU utilization during complex neural inference.

In applying DRL techniques to urban scenarios, overfitting is a critical challenge: it occurs when the neural network accurately learns the specific noise or outliers of the training dataset but fails to generalize its policy to unseen conditions. In our context, an over-fitted DQN agent might trigger wrong actions in response to sensor anomalies.

5. Results

To validate the robustness of the proposed approach, a comprehensive analysis has been conducted on the urban infrastructures described in Table 3. The C district has been selected as the primary

(0, 0)

reference due to its full-fledged configuration, as shown in Table 4. Such a multi-modal data environment provides a challenging and representative scenario. Three policies, characterizing the agent behavioral attitude toward urban dynamics, emerge as trade-offs from the multi-objective optimization. We define these policies as lazy, balanced, and responsive.

The lazy policy

π_{l} = (0.6, 0.3, 0.1)

identifies an energy-saving and cost-saving oriented profile. In this configuration, the DQN agent keeps the devices in a low-power state to minimize energy consumption and reduce the associated economic expenditure. It is simultaneously focused on the device lifespan: by operating primarily in less demanding and less stressful conditions, it preserves the hardware electronic integrity. Thereby, the system ensures long-term operational resilience by avoiding the thermal and computational stress typical of higher-performance modes. Although this policy safeguards hardware longevity and related costs, it introduces a significant operational risk to the urban scenario: by failing to capture micro-scale traffic fluctuations and peak congestion events, the system underestimates the actual urban stress. As a consequence, it can represent a systemic risk for the community, as decision-makers are provided with insufficient data that masks pollution hotspots and traffic bottlenecks. In summary, the lazy policy achieves device-level resilience impacting on the city observability and consequently on its sustainability.

The responsive policy

π_{r} = (0.1, 0.3, 0.6)

identifies a social-oriented profile. This configuration is highly sensitive to fluctuations in urban traffic demand; while it accepts a higher energy and economic cost at the device level, it minimizes the information latency. By operating at peak performance, the system avoids underestimating mobility dynamics, which is crucial to prevent flawed or weak decision-making. In this case study, responsiveness translates in substantial environmental and social benefits: by providing high-resolution monitoring of traffic congestion and parking occupancy, the system enables more effective urban flow management and a reduction in traffic congestion. Consequently, the ability to detect and react to micro-scale mobility fluctuations allows for the targeted mitigation of pollution hotspots. In this configuration, local energy E consumption and capital

C A P E X

expenditure could be a strategic investment to achieve a systemic reduction of the overall environmental footprint.

The balanced policy

π_{b} = (0.3, 0.4, 0.3)

identifies a nominal baseline profile designed to ensure that urban monitoring remains reliable and inclusive without reaching the energy peaks of the responsive mode while avoiding the information latency of the lazy approach.

To account for seasonal variations in traffic-parking patterns and environmental conditions, four representative months have been selected to sample the annual operational cycle from August 2025 to March 2026: August (summer), October (autumn), January (winter), and March (spring). This seasonal sampling ensures that the DQN policy is not over-fitted to specific temporal conditions but remains effective and generalized across different climatic and social contexts. The following analysis details the behavioral patterns learned by the DQN agent across the four representative months, illustrating how the lazy, balanced, and responsive policies map the normalized traffic demand to specific operational conditions

O = {o_{E C O}, o_{S T D}, o_{H P M}}

.

In the August scenario depicted in Figure 6, the agent faces distinct summer traffic peaks where the policies exhibit highly differentiated behaviors. The lazy policy is in the

o_{E C O}

state for the majority of the day, showing high tolerance for traffic increases and transitioning to

o_{S T D}

only during the mid-day and evening peaks. The balanced policy serves as a stable baseline, remaining in

o_{S T D}

for almost the entire 24-hour cycle to ensure constant monitoring. Finally, the responsive policy acts as a vigilant sentinel, proactively switching to

o_{H P M}

during the two main traffic surges.

In the October scenario depicted in Figure 7, the lazy policy switches from the

o_{E C O}

mode to the

o_{S T D}

mode only during the two highest peaks of the day, maintaining its low-power profile like in August. The balanced policy remains in

o_{S T D}

throughout the active city hours (07:00–21:00) and reverting to

o_{E C O}

only during deep night. The responsive policy switches into

o_{H P M}

at each traffic peak, specifically targeting the early morning rush, the mid-day plateau, and the evening return, thus maximizing the permanence in the high-performance

o_{H P M}

condition.

By observing the January scenario depicted in Figure 8, the analysis reveals how the agent adapts to winter traffic demand. The lazy policy confirms the extreme cost-oriented attitude observed in August and in October. Devices remain in

o_{E C O}

even during significant traffic demand, with only a brief transition to

o_{S T D}

during the late morning. Although the balanced policy is not high-responsive to the midday peaks (0.5–0.7) in traffic demand, it maintains devices in

o_{S T D}

during the (07:00–21:00) daily period when the city center is expected to be persistently affected by vehicular traffic flows. Devices are switched to

o_{E C O}

during the early morning hours (normalized traffic demand < 0.1), and around the 22:00. The responsive policy, instead, promptly escalates to

o_{H P M}

to cover the broad mid-day traffic plateau and a spike around the 16:00.

Observing the March scenario depicted in Figure 9, the analysis confirms the characteristic behavior of the three policies. The lazy policy maintains devices in the

o_{E C O}

, activating the

o_{S T D}

condition only during the peaks of demand (around hours 11:00 and 17:00). The balanced policy maintains its cost-oriented objective. Finally, the responsive policy confirms its high sensitivity.

The cross-seasonal analysis confirms that the DQN agent has successfully synthesized three distinct optimization policies. The policy convergence observed across the four representative months suggests a high degree of robustness: regardless of the seasonal baseline, the lazy policy consistently acts as a lower bound for energy expenditure, while the responsive policy serves as a high-fidelity upper bound, with a direct benefit in terms of social sustainability.

The balanced policy exhibits an intermediate and stabilizing behavior. It effectively filters out minor traffic fluctuations to maintain devices in a steady

o_{S T D}

monitoring state, proving to be a compromise profile between social, economic, and environmental sustainability, without incurring the economic penalties of the responsive mode. However, the balanced policy is weaker than responsive to address environmental sustainability.

The economic sustainability of the examined CPS is evaluated through a

T C O

analysis over a 10-year operational horizon. The initial

C A P E X

is established as the baseline, and comprises the acquisition of the equipment shown in Table 4, alongside a centralized gateway, software orchestration platforms, and professional installation costs. The growth in

T C O

is driven by the

O P E X

, and it is primarily conditioned by energy consumption costs, assuming a baseline rate of

0.20 € / kWh

, and system maintenance costs.

The results of Figure 10 allow an investigation of the policy sensitivity: the responsive policy exhibits higher growth, reaching an OPEX-to-CAPEX ratio of approximately

5.6

% after 10 years. The DQN agent consistently prioritizes the

o_{H P M}

mode to ensure near-zero latency and high information fidelity during peak traffic demand, which inherently maximizes the power draw across the NVIDIA Jetson Nano rails.

The lazy policy achieves the highest economic sustainability, limiting the 10-year

O P E X

increase to less than 4%. All policies show a subtle transition to a linear trend after the initial five-year phase, reflecting their stabilization and leading to a predictable marginal cost per year.

Finally, the balanced policy shows a mid-range trade-off. These savings are achieved with 95% confidence intervals, ensuring the reliability of the economic projections.

The training process of the DQN agent has been monitored through three key metrics, as illustrated in Figure 11. The Epsilon curve represents the exploration-exploitation trade-off. It follows a planned decay from

1.0

to a minimum threshold of

0.1

around episode 700. This transition ensures that the agent sufficiently explores the state-action space in the early stages before shifting toward the exploitation of the learned optimal policy in the final phases of training. The high stability of the reward and loss trends in the last 200 episodes indicates that the agent has reached a reliable and converged operational state. The Mean Squared Bellman Error (MSBE) represents a fundamental metric [49] for evaluating approximation accuracy, formulating learning objectives, and investigating the theoretical properties of RL algorithms. The

M S B E

trend shows a consistent downward trajectory, starting from approximately

0.55

and approaching zero as the training progresses. This reduction indicates that the model has accurately captured the underlying dynamics of the context. The Cumulative Reward trend exhibits a strong upward slope, stabilizing at a high plateau (approximately at

0.9

) after 800 episodes. This trend confirms that the agent is successfully learning a control policy that maximizes the objective function, balancing energy efficiency and service quality according to the defined reward structure.

Figure 12 shows a radar of the multi-objective optimization policies resulting from our analysis, mapped on a set of UN SDG 11 targets. This map is obtained considering the set of heterogeneous metrics of the dynamic urban context

C (t)

defined in Equation (24).

The responsive policy demonstrates a clear prioritization of Target 11.2 (Sustainable Transport) and Target 11.5 (Disaster Resilience and Safety). By proactively escalating to the high-performance state

o_{H P M}

during traffic peaks, the agent ensures the information fidelity required to mitigate congestion and reduce emergency response times. This “citizen-centric” behavior acknowledges that a localized increase in energy consumption

E (s, t)

is a strategic investment to achieve systemic environmental benefits and public safety, directly supporting Target 11.6 (reducing the environmental impact of cities).

Conversely, the lazy policy aligns primarily with Target 11.b (Resource Efficiency). By maintaining the infrastructure in a low-power

o_{E C O}

state for the majority of the operational cycle, it minimizes the carbon footprint

C F (s, t)

and the operational expenditure

O P E X

. While this approach maximizes the physical and financial longevity of the CPS, it results in lower performance regarding real-time mobility management.

Finally, the balanced policy identifies an equilibrium point for Target 11.7 (Inclusive and Reliable Monitoring). By acting as a stable baseline in

o_{S T D}

, it provides continuous and reliable data flows without the energy surges of the responsive mode or the data scarcity of the lazy profile.

The analysis confirms that the DQN-based framework does not merely optimize a technical trade-off but offers an optimal set of strategies. This allows municipal decision-makers to dynamically tune the infrastructure behavior to meet specific sustainability priorities, from strict resource conservation to high-fidelity urban resilience.

6. Conclusions and Future Directions

The proposed multi-dimensional decision-making framework successfully transitions urban management from static, siloed ICT deployments to an integrated, adaptive urban Cyber-Physical System. By leveraging Deep Reinforcement Learning, specifically Deep Q-Networks, the model effectively navigates the non-linear trade-offs inherent in the environmental, economic, and social pillars of sustainability. The empirical results from the LoRaWAN case study demonstrate that reward-shaping mechanisms allow for the precise modulation of node duty-cycles, balancing energy conservation with service availability. This approach mitigates the inefficiencies of traditional high-performance configurations while preserving the resilience required for modern urban infrastructures. Moreover, this methodology provides a scalable blueprint for achieving carbon-neutral, cost-effective, and socially equitable smart-intelligent cities. This research confirms that the intersection of autonomous AI agents and granular urban telemetry is essential for fulfilling the complex requirements of UN Sustainable Development Goal 11.

Future iterations of this work will explore the integration of decentralized federated learning to further enhance data privacy and system-wide scalability in heterogeneous urban environments. To address long-term challenges, aging-/reliability-aware framework are envisioned to be integrated in the DQN controller. A feasible way can be by introducing a cumulative stress counter as a state variable, the agent could learn to proactively mitigate hardware aging.

Author Contributions

Conceptualization, M.G. and S.D.; methodology, M.G. and S.D.; software, M.G.; validation, M.G. and S.D.; formal analysis, M.G. and S.D.; investigation, M.G.; resources, M.G. and S.D.; data curation, M.G.; writing—original draft preparation, M.G. and S.D.; writing—review and editing, M.G. and S.D.; visualization, M.G.; supervision, S.D.; project administration, S.D.; funding acquisition, S.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research has been funded by the Italian Ministero delle Imprese e del Made in Italy (MIMIT) under the project SMART•E - piattaforma per l’IoT Maintenance, il Facility e l’Asset Management dell’industria 4.0, grant number FTE0000382 (CUP: B47H22004430008, COR: 22573728).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets presented in this article are not readily available. Data availability is restricted by a confidentiality agreement between the technology provider and the local public administration. Due to the integration of the testbed into the municipal infrastructure, the raw datasets contain sensitive administrative information that cannot be disclosed to ensure compliance with security protocols and institutional privacy. Requests to access the datasets should be directed to SmartMe.io.

Acknowledgments

The authors would like to express their gratitude to SmartMe.io for the technical support throughout the experimental phase of this study.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AI	Artificial Intelligence
AIoT	Artificial Intelligence of Things
CAPEX	Capital Expenditure
CF	Carbon Footprint
CM	Classical Methods
CR	Coding Rate
CPS	Cyber-Physical System
DC	Direct Current
DNN	Deep Neural Networks
DRL	Deep Reinforcement Learning
DQN	Deep Q-Networks
DT	Digital Twin
EU	European Union
GW	Gateway
GWP	Global Warming Potential
ICT	Information and Communication Technology
IEC	International Electrotechnical Commission
IoT	Internet of Things
ISO	International Organization for Standardization
KPI	Key Performance Indicator
LLM	Large Language Models
LPWAN	Low-Power Wide-Area Network
MARL	Multi-Agent Reinforcement Learning
OPEX	Operational Expenditure
PDR	Packet Delivery Ratio
PSL	Preliminary Setup Layer
QoS	Quality of Service
RL	Reinforcement Learning
SDG	Sustainable Development Goal
SF	Spreading Factor
SL	Supervised Learning
SQuaRE	Systems and Software Quality Requirements and Evaluation
TCO	Total Cost of Ownership
UV	Ultraviolet

Notes

1	https://smartme.io/

References

Alahi, M.E.E.; Sukkuea, A.; Tina, F.W.; Nag, A.; Kurdthongmee, W.; Suwannarat, K.; Mukhopadhyay, S.C. Integration of IoT-Enabled Technologies and Artificial Intelligence (AI) for Smart City Scenario: Recent Advancements and Future Trends. Sensors 2023, 23, 5206. [Google Scholar] [CrossRef]
Bellini, P.; Nesi, P.; Pantaleo, G. IoT-Enabled Smart Cities: A Review of Concepts, Frameworks and Key Technologies. Appl. Sci. 2022, 12, 1607. [Google Scholar] [CrossRef]
Terven, J. Deep Reinforcement Learning: A Chronological Overview and Methods. AI 2025, 6, 46. [Google Scholar] [CrossRef]
Plaat, A. Deep Reinforcement Learning; Springer Nature Singapore: Singapore, 2022; ISBN 978-981-19-0638-1. [Google Scholar] [CrossRef]
Ficili, I.; Giacobbe, M.; Tricomi, G.; Puliafito, A. From Sensors to Data Intelligence: Leveraging IoT, Cloud, and Edge Computing with AI. Sensors 2025, 25, 1763. [Google Scholar] [CrossRef] [PubMed]
Pochelu, P.; Cartiaux, H.; Schleich, J. What artificial intelligence can do for high-performance computing systems? Engineering Applications of Artificial Intelligence 2026, 164, 113248. [Google Scholar] [CrossRef]
De Vita, F.; Bruneo, D. Leveraging Stack4Things for Federated Learning in Intelligent Cyber Physical Systems. J. Sens. Actuator Netw. 2020, 9, 59. [Google Scholar] [CrossRef]
Beguni, C.; Căilean, A.-M.; Zadobrischi, E.; Avătămăniței, S.-A.; Lavric, A.; Stoian, F.-M. The Convergence of Artificial Intelligence and Public Policy in Shaping the Future of Ride-Hailing: A Review. Smart Cities 2026, 9, 40. [Google Scholar] [CrossRef]
Zghidi, N.; Trabelsi, R. Impact of Digitalization on Sustainable Development: A Comparative Analysis of Developed and Developing Economies. J. Risk Financial Manag. 2025, 18, 359. [Google Scholar] [CrossRef]
Bibri, S.E.; Huang, J. Artificial intelligence of things for sustainable smart city brain and digital twin systems: Pioneering environmental synergies between real-time management and predictive planning. Environ. Sci. Ecotechnol. 2025, 26, 100591. [Google Scholar] [CrossRef]
Rojek, I.; Mikołajewski, D.; Galas, K.; Piszcz, A. Advanced Deep Learning Algorithms for Energy Optimization of Smart Cities. Energies 2025, 18, 407. [Google Scholar] [CrossRef]
Wang, K.; Ke, Y. Social sustainability of communities: A systematic literature review. Sustainable Production and Consumption 2024, 47, 585–597. [Google Scholar] [CrossRef]
Capecchi, S.; Corduas, M.; Piccolo, D. Social Sustainability and Subjective Well-Being: A Study on Italian Inner Areas. Sustainability 2025, 17, 2078. [Google Scholar] [CrossRef]
Dresp-Langley, B.; Ekseth, O.K.; Fesl, J.; Gohshi, S.; Kurz, M.; Sehring, H.-W. Occam Razor for Big Data? On Detecting Quality in Large Unstructured Datasets. Appl. Sci. 2019, 9, 3065. [Google Scholar] [CrossRef]
Seng, K.P.; Ang, L.M.; Ngharamike, E. Artificial intelligence Internet of Things: A new paradigm of distributed sensor networks. Int. J. Distrib. Sens. Netw. 2022, 18, 15501477211062835. [Google Scholar] [CrossRef]
Mekki, K.; Bajic, E.; Chaxel, F.; Meyer, F. A comparative study of LPWAN technologies for large-scale IoT deployment. ICT Express 2019, 5, 1–7. [Google Scholar] [CrossRef]
Giacobbe, M.; Zanafi, S.; Zaia, A.; Puliafito, A. Empirical Validation of LoRaWAN Coverage Models for Smart Urban Connectivity. In Proceedings of the European Council for Modelling and Simulation, ECMS; 2025; pp. 493–499. [Google Scholar] [CrossRef]
LoRa Alliance, What is LoRaWAN? 2026. Available online: https://lora-alliance.org/about-lorawan/ (accessed on 13 April 2026).
United Nations. Sustainable Development Goals. Department of Economic and Social Affairs. Available online: https://https://sdgs.un.org/goals (accessed on 30 March 2026).
Kim, S.-K.; Chan, I.C. Novel Machine Learning-Based Smart City Pedestrian Road Crossing Alerts. Smart Cities 2025, 8, 114. [Google Scholar] [CrossRef]
Palley, B.; Poças Martins, J.; Bernardo, H.; Rossetti, R. Integrating Machine Learning and Digital Twins for Enhanced Smart Building Operation and Energy Management: A Systematic Review. Urban Sci. 2025, 9, 202. [Google Scholar] [CrossRef]
Fatorachian, H.; Kazemi, H.; Pawar, K. Digital Technologies in Food Supply Chain Waste Management: A Case Study on Sustainable Practices in Smart Cities. Sustainability 2025, 17, 1996. [Google Scholar] [CrossRef]
Fatorachian, H.; Kazemi, H.; Pawar, K. Enhancing Smart City Logistics Through IoT-Enabled Predictive Analytics: A Digital Twin and Cybernetic Feedback Approach. Smart Cities 2025, 8, 56. [Google Scholar] [CrossRef]
Sutton, R.S.; Barto, A.G. Reinforcement learning. Journal of Cognitive Neuroscience 1999, 11, 126–134. [Google Scholar]
Liang, J.; Miao, H.; Li, K.; Tan, J.; Wang, X.; Luo, R.; Jiang, Y. A Review of Multi-Agent Reinforcement Learning Algorithms. Electronics 2025, 14, 820. [Google Scholar] [CrossRef]
Chen, M.; Sun, L.; Li, T.; Sun, H.; Zhou, Y.; Zhu, C.; Wang, H.; Pan, J.Z.; Zhang, W.; Chen, H.; et al. ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning. arXiv 2025, arXiv:2503.19470. [Google Scholar] [CrossRef]
Oh, J.; Farquhar, G.; Kemaev, I.; et al. Discovering state-of-the-art reinforcement learning algorithms. Nature 2025, 648, 312–319. [Google Scholar] [CrossRef]
Yu, Q.; Zhang, Z.; Zhu, R.; Yuan, Y.; Zuo, X.; Yue, Y.; Wang, M. Dapo: An open-source llm reinforcement learning system at scale. arXiv 2025, arXiv:2503.14476. [Google Scholar] [CrossRef]
Hubert, T.; Mehta, R.; Sartran, L.; et al. Olympiad-level formal mathematical reasoning with reinforcement learning. Nature 2026, 651, 607–613. [Google Scholar] [CrossRef]
Chaudhari, S.; Aggarwal, P.; Murahari, V.; Rajpurohit, T.; Kalyan, A.; Narasimhan, K.; Deshpande, A.; da Silva, B.C. RLHF Deciphered: A Critical Analysis of Reinforcement Learning from Human Feedback for LLMs. ACM Comput. Surv. 2025, 58, 53. [Google Scholar] [CrossRef]
Hady, M.A.; Hu, S.; Pratama, M.; et al. Multi-agent reinforcement learning for resources allocation optimization: a survey. Artif. Intell. Rev. 2025, 58, 354. [Google Scholar] [CrossRef]
De Vita, F.; Nardini, G.; Virdis, A.; Bruneo, D.; Puliafito, A.; Stea, G. Using Deep Reinforcement Learning for Application Relocation in Multi-Access Edge Computing. IEEE Commun. Stand. Mag. 2019, 3, 71–78. [Google Scholar] [CrossRef]
Liu, R.; Nageotte, F.; Zanne, P.; de Mathelin, M.; Dresp-Langley, B. Deep Reinforcement Learning for the Control of Robotic Manipulation: A Focussed Mini-Review. Robotics 2021, 10, 22. [Google Scholar] [CrossRef]
Kim, Y.; Jung, B.C.; Song, Y. Online Learning for Joint Energy Harvesting and Information Decoding Optimization in IoT-Enabled Smart City. IEEE Internet Things J. 2023, 10, 10675–10686. [Google Scholar] [CrossRef]
Zhang, B.; Liu, C.H.; Tang, J.; Xu, Z.; Ma, J.; Wang, W. Learning-Based Energy-Efficient Data Collection by Unmanned Vehicles in Smart Cities. IEEE Trans. Ind. Inform. 2018, 14, 1666–1676. [Google Scholar] [CrossRef]
Park, J.; Baek, J.; Song, Y. Optimizing smart city planning: A deep reinforcement learning framework. ICT Express 2025, 11, 129–134. [Google Scholar] [CrossRef]
Remmache, M.I.; Boudouh, S.S.; Bendouma, T.; Abdelhafidi, Z. Balancing Energy and Latency in Multi-Edge MEC Systems Using DQN-Based Task Offloading. In Proceedings of the 2025 7th International Conference on Pattern Analysis and Intelligent Systems (PAIS), Laghouat, Algeria, 2025; pp. 1–5. [Google Scholar] [CrossRef]
Minskere, L.; Kalnina, D.; Salkovska, J.; Batraga, A. Urban Communication in Smart Cities: Stakeholder Participation Motivators. Smart Cities 2026, 9, 58. [Google Scholar] [CrossRef]
Das, D.K.; Aiyetan, A.O.; Mostafa, M.M.H. Advancing Smart Cities in Africa: Barriers, Potentials, and Strategic Pathways for Sustainable Urban Transformation. Smart Cities 2026, 9, 38. [Google Scholar] [CrossRef]
Shafiullah, M.; Katranji, A.R.; Hassan, M.; Rahman, M.M.; Shezan, S.A. Advanced Multivariate Deep Learning Methodology for Forecasting Wind Speed and Solar Irradiation. Smart Cities 2026, 9, 59. [Google Scholar] [CrossRef]
Bisegna, F.; Vespasiano, F.; Pompei, L.; Burattini, C.; Belli, E.; Bellucci, A.M.; Di Vittorio, F.; Blaso, L. Towards the Decarbonization of Urban Communities: Evaluation of Smart and Green Strategies to Reduce Gas Carbon Emissions. Smart Cities 2026, 9, 26. [Google Scholar] [CrossRef]
Santos, O.; Ribeiro, F.; Metrôlho, J.; Dionísio, R. Using Smart Traffic Lights to Reduce CO2 Emissions and Improve Traffic Flow at Intersections: Simulation of an Intersection in a Small Portuguese City. Appl. Syst. Innov. 2024, 7, 3. [Google Scholar] [CrossRef]
Abu-Rayash, A.; Dincer, I. Development of an integrated model for environmentally and economically sustainable and smart cities. Sustain. Energy Technol. Assess. 2025, 73, 104096. [Google Scholar] [CrossRef]
Eurostat, Smart cities - statistics on urban institutions. Statistics Explained. 2023. Available online: https://ec.europa.eu/eurostat/statistics-explained/SEPDF/cache/10719.pdf (accessed on 24 March 2026).
International Energy Agency (IEA). World Energy Outlook 2024; IEA Publications: Paris, France, 2024; Available online: https://www.iea.org/reports/world-energy-outlook-2024 (accessed on 24 March 2026).
National Institute of Standards and Technology (NIST). Trustworthiness (System). Computer Security Resource Center (CSRC) Glossary. Available online: https://csrc.nist.gov/glossary/term/trustworthiness_system (accessed on 24 March 2026).
ISO/IEC Standard No. 25002:2024; Systems and software engineering — Software product Quality Requirements and Evaluation (SQuaRE) — Managed quality model. ISO: Geneva, Switzerland, 2024.
Bittencourt, J. C. N.; Jesus, T. C.; Peixoto, J. P. J.; Costa, D. G. The Road to Intelligent Cities. Smart Cities 2025, 8, 77. [Google Scholar] [CrossRef]
Patterson, A.; Liao, V.; White, M. Robust Losses for Learning Value Functions. IEEE Transactions on Pattern Analysis and Machine Intelligence 2023, 45, 6157–6167. [Google Scholar] [CrossRef] [PubMed]
Zaitsev, D. A.; Shmeleva, T. R.; Ghaffari, P. Modeling Multidimensional Communication Lattices with Moore Neighborhood by Infinite Petri Nets. 2021 International Conference on Information and Digital Technologies (IDT), Zilina, Slovakia, 2021; pp. 171–181. [Google Scholar] [CrossRef]

Figure 1. United Nations Sustainable Cities and Communities Goal (SDG11), and its Targets.

Figure 2. Conceptual Framework of Adaptive Urban Ecosystem composed by multiple AI-driven CPS.

Figure 3. Overall Architecture of the proposed multi-objective optimization framework leading to a context-aware optimized urban trade-offs.

Figure 4. DQN-based optimization schema, highlithing the cyber-to-physical and physical-to-cyber transitions in the smart urban ecosystem.

Figure 5. Spatial 3 X 3 grid representation of the urban district forming the monitored urban center. The map is centered on the Moore-modeled district C(0,0), with radius

r = 1

.

Figure 5. Spatial 3 X 3 grid representation of the urban district forming the monitored urban center. The map is centered on the Moore-modeled district C(0,0), with radius

r = 1

.

Figure 6. DQN Policy Sensitivity Analysis for the C district in August. The plot illustrates the trade-offs resulting from the DQN-based optimization relative to normalized traffic demand (shaded area).

Figure 7. DQN Policy Sensitivity Analysis for the C district in October.

Figure 8. DQN Policy Sensitivity Analysis for the C district in January.

Figure 9. DQN Policy Sensitivity Analysis for the C district in March.

Figure 10. Projected 10-year TCO growth as a percentage of initial investment across different operational trade-offs. Results are reported with a 95% confidence interval.

Figure 11. DQN training metrics: MSBE loss, cumulative reward, and epsilon decay trends over 1000 episodes, showing model stability and convergence.

Figure 12. DQN-based trade-offs alignment with UN SDG 11 targets. The radar plot illustrates the alignment between the agent behavioral profiles and the global indicators for sustainable and resilient cities.

Table 1. Comprehensive Mapping of the proposed framework into SDG 11 Targets and Implementation Strategies.

Sustainability Pillar	SDG 11 Targets	Framework Contribution
Environmental ( $P_{e}$ )	11.6 Environment	Minimizes the urban carbon footprint by mitigating traffic-related emissions. It dynamically regulates district resources to counter pollution peaks, ensuring a synergy between energy conservation and the overall environmental quality of the city area.
Economic ( $P_{c}$ )	11.4, 11.5 Heritage, Resilience	Minimizes economic impact by deploying resilient monitoring networks for urban heritage and early-warning systems. It reduces financial losses and recovery costs from environmental disasters through proactive infrastructure management and efficient resource allocation.
Social ( $P_{s}$ )	11.1, 11.2, 11.7 Housing, Transport, Public Spaces	Guarantees the necessary operational performance for smart mobility services, preventing service disruptions in urban transport infrastructures. It ensures that public transit and pedestrian spaces remain safe, accessible, and inclusive by maintaining reliable connectivity and real-time data flow.
Integrated ( $P_{e}, P_{c}, P_{s}$ )	11.3 Urbanization	Orchestrates participatory settlement planning by adapting to dynamic “bio-social” demands through multi-objective optimization.
	11.a Regional Planning	Supports strong links between urban and rural areas overcoming siloes patterns.
	11.b Integrated Policies	Internalizes multi-dimensional rewards to promote resource efficiency, climate change mitigation, and adaptive urban resilience.
	11.c Sustainable Building	Provides Municipalities with a dynamic instrument for “adaptive” city planning, focusing on sustainable infrastructure and resilient buildings.

Table 2. Comparative Assessment of Classical Methods (CM), Supervised Learning (SL), and Reinforcement Learning (RL) in the reference context.

Feature	CM (LP/Heuristics)	SL	RL
Data Heterogeneity	Rigid Pre-defined Rules	Historical Pattern Matching	Goal-Oriented Optimization
Portability	Lightweight but Inflexible	Memory Intensive	Asymmetrical Workload
Env. Interaction	Open-loop Execution	Passive Observation	Active Closed-loop Feedback
Sustainability	Fixed Thresholds	Pattern Replication	Autonomous Exploration
Trade-off Logic	Static Weighting	Perfectly Labeled Data	Dynamic Pareto Balancing
Scientific Ref.	Classical Theory	[7,20,21,22,23]	[24,25,26,27,28,29,30,31,32,33,36,40]

Table 3. Spatial distribution and functional roles of the urban sensing infrastructure across the five neighboring districts and the centroidal gateway

(G W)

.

Table 3. Spatial distribution and functional roles of the urban sensing infrastructure across the five neighboring districts and the centroidal gateway

(G W)

.

District/Pole	Coord. $(i, j)$	Deployed Devices	Strategic Role
A	$(0, 1)$	1 Env., 1 Traffic, 1 Parking	Integrated Urban Sensing
B	$(- 1, 1)$	1 Env., 1 Traffic	Telemetry & Flow Analysis
C	$(0, 0)$	1 Env., 1 Traffic, 1 Parking, 1 GW	Network Backbone & Sensing
D	$(- 1, 0)$	1 Traffic, 2 Parking	Mobility & Occupancy Hub
E	$(0, - 1)$	1 Parking	Localized Spot Monitoring

Table 4. Technical specifications and functional classification of the urban IoT testbed.

Typology	Hardware Platform	Num	Primary Function
Environmental (weather station)	Raspberry Pi 4 (12V supply)	3	Ambient telemetry
Parking (optical sensor)	Nvidia Jetson Nano (5V rail¹)	5	Optical occupancy sensing
Traffic (optical sensor)	Nvidia Jetson Nano (5V rail¹)	4	Vehicle flow analysis
Network & Context Parameters
Protocol	LoRaWAN Class C (SF7, 125 kHz, CR 4/5)²
Duty Cycle	$1 %$ (EU 868 MHz)
¹ Powered via DC–DC converter from a 24V source; ² ChirpStack, open source LoRaWAN Network Server.

Table 5. Baseline Power and daily (24h) Energy Consumptions in standard operating mode, without DQN optimization.

Typology	Num	$P_{device}$ (W)	$E_{d e v i c e}$ (kWh)	Num · $E_{d e v i c e}$ (kWh)
Environmental (weather station)	3	2.364	0.0567	0.1702
Parking (optical sensor)	5	3.395	0.0815	0.4074
Traffic (optical sensor)	4	4.460	0.1070	0.4281
LoRaWAN Gateway	1	3.583	0.0860	0.0860
Overall benchmark	13	-	0.3312¹	1.0917
¹ It represents the benchmark for the district C (0,0) in our case study.

Table 6. Set of metrics considered for the examined scenario, with related measures and impacts on the dynamics of the city.

Dataset	Metric	Measure	Impact
Weather	$c_{1}$	Ambient Temperature (°C)	Hardware thermal stress and operational efficiency
	$c_{2}$	Relative Humidity (%)	Signal attenuation and link propagation quality
	$c_{3}$	Ultraviolet (UV) Index	Seasonal fluctuations and weather-induced variability
Traffic	$c_{4}$	Flow Index	Real-time service urgency and prioritization
	$c_{5}$	Total Vehicle Count	Demand intensity and infrastructure load
Parking	$c_{6}$	Lot Occupancy Rate	Local urban activity density and utility demand
LoRaWAN	$c_{7}$	Packet Delivery Ratio (PDR)	Network reliability and communication quality
	$c_{8}$	Latency ( $τ$ )	Transmission delay and real-time responsiveness
	$c_{9}$	RSSI (dBm)	Signal strength and transmission power efficiency
	$c_{10}$	SNR (dB)	Link robustness and noise interference levels

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permit the free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.

Shaping Adaptive and Sustainable Smart Urban Ecosystems

Abstract

Keywords:

Subject:

1. Introduction

2. Background and Related Work

3. Context-Aware Optimization

3.1. Smart Urban Context

3.2. Sustainability Pillars and Metrics

3.3. DQN-Based Optimization

4. Case Study

5. Results

6. Conclusions and Future Directions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Acknowledgments

Conflicts of Interest

Abbreviations

Notes

References

MDPI Initiatives

Important Links

Subscribe