1. Introduction
The power grid is undergoing a significant transformation toward a smart grid paradigm. Since protection, control, and monitoring functions in a smart grid are predominantly executed within substations, achieving overall grid reliability fundamentally depends on the reliability of these substations. As the transition toward a net-zero energy system accelerates, digital substations (DS), serving as critical nodes within this evolving infrastructure, are advancing to leverage modern developments in communication and computing technologies [1]. The secondary system of a digital substation consists of a Substation Automation System (SAS) and a Protection, Automation, and Control (PAC) system. The SAS typically manages higher-level functions within the substation, including communication with Supervisory Control and Data Acquisition (SCADA) systems, interfacing with the Human Machine Interface (HMI), and handling alarms. In contrast, the PAC system mainly provides protection for primary assets as well as real-time control and monitoring at the lower level. This complete secondary system in an automated substation is often referred to as DSAS in the literature. PAC functions, which were originally distributed (i.e., different Bay Control Units (BCUs) and Protection IEDs in a substation) [2], are transitioning towards a centralized and virtualized PAC system. This transformation, which borrows the enabling technology of virtualization from the field of computer science, has proved promising in early performance investigations [3]. Traditionally, physical PAC devices comprised tightly integrated hardware and software layers, each designed and optimized for a specific set of protection and control functions. Because these devices were delivered as purpose-built units, their performance characteristics were predetermined and guaranteed by the manufacturer. With the introduction of virtualization, these layers are now decoupled, enabling the hardware platform to be selected independently of the PAC application. In this model, the vendor provides the PAC functionality as software, while utilities can deploy it on robust hardware computing platforms that meet their operational requirements and can be adapted as the grid evolves. This architectural separation offers utilities significant flexibility, as it allows them to mix and match components from different suppliers to construct a solution tailored to their needs. However, this increased freedom also introduces a critical challenge: ensuring that the independently sourced hardware and software components operate seamlessly and reliably together, an essential requirement given the stringent dependability expectations of power system infrastructure.
Aging of the software layer in the virtualized architecture poses a critical performance [5] and reliability challenge for long-running systems, where prolonged execution leads to gradual performance degradation and increased failure likelihood. This degradation typically results from aging-related bugs that accumulate over time through mechanisms such as memory leaks, resource exhaustion, and corrupted internal states. These effects progress along the fault-error-failure chain and are often reflected in observable indicators, including rising resource consumption and reduced responsiveness [6]. To counter these effects, software rejuvenation provides a proactive recovery strategy that restores system health through controlled restarts or state refresh operations [7]. Rather than removing underlying defects, rejuvenation aims to release consumed resources and reset deteriorated system states, thereby postponing failures associated with cumulative errors. This technique is especially effective in continuously operating environments where planned rejuvenation can prevent unscheduled outages. Its effectiveness depends on determining an appropriate rejuvenation schedule that balances system availability with maintenance overhead, making timing decisions a central challenge [8]. Considering the pivotal role of dependability and performance aspects during the design and operation stages of the system life cycle, an exploration of model choices and system failure mechanisms during the pilot deployment stage [4] can build confidence prior to real-world deployments.
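As a simple, illustrative formulation of this trade-off (not the model developed later in this paper), suppose rejuvenation is scheduled every $T$ hours, the time to an aging-induced failure follows a distribution $F(t)$, a planned rejuvenation takes $d_r$ hours, an unplanned repair takes $d_f$ hours (with $d_r \ll d_f$), and rejuvenation restores the software to an as-good-as-new state. A renewal-reward argument then gives the long-run availability as
\[
A(T) = \frac{\int_0^T \bigl(1 - F(t)\bigr)\,\mathrm{d}t}{\int_0^T \bigl(1 - F(t)\bigr)\,\mathrm{d}t + \bigl(1 - F(T)\bigr)\, d_r + F(T)\, d_f},
\]
so that a very short interval wastes uptime on frequent planned restarts, while a very long interval exposes the system to lengthy unplanned repairs; the schedule is chosen to maximize $A(T)$.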
To the author’s knowledge, no work to date addresses the Software Aging and Rejuvenation (SAR) modeling aspect and its impact on substation reliability indices; this may be due to the infancy of virtualization technology in digital substations. The phenomenon, however, is well understood and has been a topic of research in the field of computer science. For this reason, we present here some important SAR work pertinent to virtualization technology from that field. This background serves both as a relevant literature review and as a brief introduction to the field for power system researchers.
In virtualized environments, software aging appears as performance degradation and resource exhaustion, exacerbated by the additional abstraction layers of virtualization. Aging analysis seeks to estimate the likely time to failure caused by these effects and enable timely rejuvenation. Existing techniques fall into three main categories: model-based, measurement-based, and hybrid approaches [8]. Model-based techniques use analytical and stochastic models to derive optimal rejuvenation schedules. Common formalisms include the continuous-time Markov chain (CTMC) [9], semi-Markov process (SMP) [10], Markov regenerative process (MRGP) [11], and Petri-net-based models such as the stochastic Petri net (SPN) [12] and stochastic reward net (SRN) [13], applied across configurations involving single or multiple hosts and various Virtual Machine (VM) states (cold, warm, migrated, failover). These approaches enable systematic evaluation of alternative rejuvenation policies but often rely on simplifying assumptions that reduce accuracy in dynamic virtualized environments [8].
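As a minimal sketch of the model-based style, the following Python snippet solves a four-state CTMC (healthy, degraded, rejuvenating, failed) for a single aging software component and shows how steady-state availability responds to the rejuvenation trigger rate. All rates are purely illustrative assumptions, not the model or parameters used later in this paper.

```python
# Hypothetical CTMC sketch of software aging and rejuvenation; all rates are
# illustrative assumptions (units of 1/hour), not measured values.
# States: 0 = healthy, 1 = degraded (aging), 2 = rejuvenating (down), 3 = failed (down).
import numpy as np

def steady_state_availability(lam_age, lam_fail, rej_rate, rej_finish, repair):
    # Infinitesimal generator Q; off-diagonal entries are transition rates,
    # and the diagonal makes each row sum to zero.
    Q = np.array([
        [-lam_age,                 lam_age,         0.0,       0.0],
        [0.0,      -(lam_fail + rej_rate),     rej_rate,  lam_fail],
        [rej_finish,                   0.0, -rej_finish,       0.0],
        [repair,                       0.0,         0.0,   -repair],
    ])
    # Stationary distribution: solve pi Q = 0 with sum(pi) = 1 by appending
    # the normalization condition and taking a least-squares solution.
    A = np.vstack([Q.T, np.ones(4)])
    b = np.array([0.0, 0.0, 0.0, 0.0, 1.0])
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi[0] + pi[1]  # the system is up while healthy or merely degraded

if __name__ == "__main__":
    # Sweep the rejuvenation trigger rate (inverse of the scheduled interval)
    # to expose the availability vs. maintenance-overhead trade-off.
    for rej_rate in (0.0, 1 / 720, 1 / 168, 1 / 24):
        avail = steady_state_availability(
            lam_age=1 / 2000,     # healthy -> degraded
            lam_fail=1 / 200,     # degraded -> failed
            rej_rate=rej_rate,    # degraded -> rejuvenating
            rej_finish=1 / 0.05,  # planned rejuvenation lasts ~3 minutes
            repair=1 / 8)         # unplanned repair lasts ~8 hours
        print(f"rejuvenation rate {rej_rate:.5f}/h -> availability {avail:.6f}")
```

Under these assumed rates, moderate rejuvenation frequencies raise availability by trading short planned outages for reduced exposure to long unplanned repairs; the cited model-based studies formalize exactly this kind of scheduling decision.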
Measurement-based approaches rely on observable aging indicators at both the Virtual Machine Monitor (VMM) and VM layers. System-level metrics capture CPU, memory, storage, and network usage, while VM-level indicators include latency, throughput, and Service Level Agreement (SLA) compliance. Collected data can be analyzed using time-series forecasting [14], machine learning [15], or threshold-based methods [16], with dimensionality reduction or feature selection applied when indicators are correlated [17]. Although effective in revealing unanticipated aging patterns, these methods require extensive monitoring effort and often lack generalizability across systems.
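As a rough sketch of the measurement-based style, a linear trend fitted to a monitored memory-usage series can be extrapolated to estimate when an exhaustion threshold would be crossed, a common basis for triggering rejuvenation. The data, threshold, and planning horizon below are invented for illustration and are not taken from any cited study.

```python
# Hypothetical measurement-based sketch: fit a linear trend to a memory-usage
# series and estimate the time remaining until an exhaustion threshold is hit.
import numpy as np

def time_to_exhaustion(hours, used_mb, threshold_mb):
    """Estimated hours until the fitted trend crosses threshold_mb,
    or None if no upward trend (i.e., no visible aging) is detected."""
    slope, intercept = np.polyfit(hours, used_mb, 1)  # least-squares line
    if slope <= 0:
        return None
    return (threshold_mb - (intercept + slope * hours[-1])) / slope

if __name__ == "__main__":
    # Synthetic monitoring data: a slow leak of ~10 MB/h plus noise.
    rng = np.random.default_rng(0)
    hours = np.arange(0.0, 240.0, 1.0)
    used_mb = 1500.0 + 10.0 * hours + rng.normal(0.0, 20.0, hours.size)

    remaining = time_to_exhaustion(hours, used_mb, threshold_mb=4096.0)
    if remaining is not None and remaining < 72.0:  # assumed planning horizon
        print(f"~{remaining:.0f} h to exhaustion: schedule rejuvenation")
    else:
        print("no imminent exhaustion predicted")
```

Trend- and threshold-based triggers of this kind are simple to deploy, but, as noted above, their usefulness depends heavily on which indicators are monitored and how well they generalize across systems.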
Hybrid strategies combine the strengths of both approaches by parameterizing analytical models with empirical data, thereby improving model fidelity and adaptability to real workloads [18,19].
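A hybrid workflow can be sketched in a few lines (again with invented numbers): field observations are reduced to rate estimates, which then parameterize a stochastic model such as the CTMC sketched earlier.

```python
# Hypothetical hybrid sketch: estimate model parameters from (synthetic)
# observations and hand them to an analytical aging/rejuvenation model.
import numpy as np

# Assumed field observations (hours): times from restart to onset of
# degradation, and durations of the unplanned repairs that followed failures.
times_to_degradation = np.array([1850.0, 2210.0, 1990.0, 2400.0, 1760.0])
repair_durations = np.array([6.5, 9.0, 7.5])

# Maximum-likelihood rates under an exponential assumption: rate = 1 / mean.
params = {
    "lam_age": 1.0 / times_to_degradation.mean(),  # healthy -> degraded
    "repair": 1.0 / repair_durations.mean(),       # failed -> healthy
}
print({k: round(v, 5) for k, v in params.items()})
# These estimates would replace the assumed rates in a model such as the
# earlier CTMC sketch, so the rejuvenation schedule reflects observed behavior.
```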
Rejuvenation mechanisms operate at multiple layers of the virtualization stack. At the VM layer, techniques include cold-VM restarts and failover [16]. At the VMM layer, options include cold- and warm-VM rejuvenation [20], suspend/resume or quick reboot [21], various VM migration [22] strategies (stop-and-copy, pre-copy, return-back, stay-on), and micro-reboots [23] of virtual infrastructure components. However, virtualization can introduce additional aging challenges, such as increased memory fragmentation, that accelerate degradation and complicate rejuvenation planning [24]. Given the diversity of these techniques, their operational trade-offs, and the absence of mission-critical virtualized PAC systems in utility infrastructures, proposing a standard model at this stage is challenging. The challenge is further exacerbated by the unavailability of field data, the lack of standardization at the VMM (hypervisor) and application levels, and the range of technology choices at the server hardware and network layers. This leaves the choice of model and analysis methodology entirely open for a VPAC-based DSAS. This work is therefore exploratory in nature; it is intended to draw electrical power system researchers’ attention to the modeling possibilities and to motivate further investigation in this emerging area. The work contributes by:
Extending an existing SAR model and incorporating it, through a hierarchical modeling framework, into the availability model of a DSAS.
Analyzing and evaluating the reliability indices of the primary and secondary systems of the substation, and then deriving the combined substation indices.
The paper is organized as follows: Section 2 presents the primary system and the secondary architecture chosen for the study, along with the proposed modeling methodology. Section 3 presents the analysis of the primary and secondary systems together with the combined system evaluation model. Reliability indices of the primary, secondary, and combined systems are derived and discussed in Section 4. Finally, the paper concludes with Section 5.