1. Introduction
The rapid integration of electronics into daily applications, such as the automotive industry and the Internet of Things (IoT), underscores more than ever the need for research and innovation in design-for-reliability paradigms. In the realm of Very Large Scale Integration (VLSI) reliability, overly conservative approaches, which involve large reliability margins and safety factors, restrict the design space, thereby affecting chip performance and power consumption. Given the anticipated computing energy demands associated with the rise and prevalence of artificial intelligence (AI) [
1], along with the complexity implications of materials and manufacturing processes, these overly pessimistic reliability paradigms are at odds with global sustainability requirements and the sustainability obligations of the semiconductor sector.
One of the key VLSI reliability challenges concerning metal interconnections is electromigration (EM). EM is a phenomenon in which metal atoms are displaced due to momentum transfer from conducting electrons [
2,
3]. This displacement can lead to the formation of voids in interconnects, causing circuit failures. Failure proceeds by void nucleation and growth at the cathode end of the interconnect, which increases interconnect resistance and impairs circuit operation; for instance, it can induce timing errors and eventually lead to open-circuit failures. The impact of EM on the reliability of VLSI circuits has been a topic of extensive research as a central interconnect reliability concern [
4,
5].
In standard EM tests, interconnects are characterized by determining their time-to-failure (TTF) under constant direct current (DC) stress, with flux divergence points at their two ends. An interconnect is considered to have failed when its resistance increases beyond a target value, often a 10% resistance shift (R-Shift). Tests are conducted under accelerated conditions of elevated current density and temperature. Multiple interconnects are tested under different temperature and current conditions, and the mean times to failure (MTTF) at these conditions are used to determine the activation energy,
Ea, and the current density exponent, n, of Black’s equation [
2,
6,
7,
8]. TTFs follow a lognormal distribution, and the lognormal and Black’s equation parameters are used to derive an EM current density limit,
jmax, for a target EM failure probability and lifetime [
9,
10]. The current density limit,
jmax, is usually reported in the process design kits (PDKs) provided by the foundry and is employed by the designer for EM compliance checks in the Place and Route (P&R) phase.
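To make this workflow concrete, the following Python sketch derives a jmax from Black's-equation parameters and a lognormal TTF distribution: the median TTF required to meet a target lifetime at a given failure quantile is computed first, and Black's equation is then inverted for the current density. All parameter names and values (A, n, Ea, sigma, etc.) are illustrative placeholders, not values from any PDK.

```python
from math import exp
from statistics import NormalDist

K_B = 8.617e-5  # Boltzmann constant in eV/K

def black_mttf(j, A, n, Ea, T):
    """Median TTF from Black's equation: MTTF = A * j**(-n) * exp(Ea / (kB*T))."""
    return A * j ** (-n) * exp(Ea / (K_B * T))

def jmax(A, n, Ea, T_use, sigma, t_target, p_fail):
    """Current density limit such that the p_fail quantile of the lognormal
    TTF distribution reaches the target lifetime at the use temperature."""
    z = NormalDist().inv_cdf(p_fail)          # standard-normal quantile (negative for small p_fail)
    t50_required = t_target / exp(z * sigma)  # median TTF needed to satisfy the quantile
    # Invert Black's equation, MTTF = A * j**(-n) * exp(Ea/(kB*T)), for j.
    return (A * exp(Ea / (K_B * T_use)) / t50_required) ** (1.0 / n)
```

A tighter lognormal spread (smaller sigma) yields a higher permissible jmax for the same target lifetime and failure probability, which is one reason the redundancy-induced sigma reduction discussed later in this paper matters for current limits.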
Fundamentally, EM is correlated with the cohesive energy of metals and therefore also with their melting point. Many metals widely used in CMOS technology, such as Al and Cu, suffer from EM. In contrast, for metals with high melting temperatures, such as Ru, Mo, and W, EM is not a major reliability concern [
11,
12]. Many material- and process-related factors are implicated in EM, including interfacial properties, e.g., adhesion [
13], Cu microstructure (i.e., grain size distribution) [
14,
15,
16,
17,
18,
19], mechanical and fracture properties of the metal, confining materials, and dielectrics, as well as residual stresses [
20,
21,
22,
23,
24,
25,
26]. Accordingly, technological solutions involve material and process innovations, such as dopants (e.g., Al and Mn) that segregate into Cu grain boundaries, and the implementation of liners and metal capping (e.g., a Co cap) [
27,
28,
29], each with their pros and cons in terms of resistivity and cost [
30,
31].
With the increase of current densities and the drastic decline of EM robustness due to interconnect miniaturization [
16,
32,
33,
34], EM is considered a significant reliability challenge for ongoing scaling [
35]. As scaling progresses, delay shifts attributed to EM are anticipated to overshadow other aging mechanisms such as hot carrier injection (HCI) and bias temperature instability (BTI) [
36]. Despite the advent of EM-robust alternative metals such as Mo and Ru, projected for use in angstrom nodes [
11,
12], resistivity considerations restrict their application to wires with linewidths narrower than ~12 nm [
37]. Thus, the back-end-of-line (BEOL) will continue to display a hierarchical architecture in which the wider metal levels remain Cu-based, given the cost and resistivity benefits of Cu [
37]. Moreover, the rise of back-side power delivery networks (BS-PDNs), where the PDN is processed on the opposite side of the wafer [
38,
39], justifies the use of Cu in back-side power delivery applications, given the available spacing for more relaxed linewidths. In this scenario, transistor heat generation and interconnect Joule heating, in the absence of effective chip cooling options [
40], will raise Cu interconnect temperatures and temperature gradients, thereby intensifying EM aging [
41,
42]. Evidently, EM will remain a reliability concern as we venture into the angstrom technology era.
The standard design-for-reliability approaches widely employed for EM by the industry are: (i) the limit-based approach (LBA), which predicts system failure when the average DC current density,
j, exceeds
jmax, for any interconnect, ignoring the statistical nature of EM [
43,
44], and (ii) statistical EM budgeting (SEB), which employs weakest-link statistics considering the distribution of the
j/jmax ratio of all single interconnects [
43,
44]. In these methods, failure is thus predicted when
jmax is violated in any single interconnect, or the failure probability is based on a first-to-fail interconnect criterion, respectively. This assumption is plausible for circuits where interconnects are connected in series. However, the power delivery network (PDN), which is most prone to EM because of the high average magnitudes of its unipolar currents, has a grid-like architecture with many parallel interconnect paths. Thus, if one interconnect suffers from EM, the current is redirected to redundant parallel paths. Clearly, applying the conventional EM compliance evaluation methods to the PDN may be overly pessimistic, as formation of the first void alone does not necessarily cause a system failure [
45].
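The contrast between weakest-link (series) statistics and fully redundant parallel paths can be illustrated with a minimal Python sketch, assuming independent element failure probabilities and ignoring the current redistribution that in reality accelerates the aging of surviving paths:

```python
from math import prod

def series_fail_prob(element_probs):
    """Weakest-link (first-to-fail) criterion: the system fails if any element fails."""
    return 1.0 - prod(1.0 - p for p in element_probs)

def parallel_fail_prob(element_probs):
    """Fully redundant paths: the system fails only once every path has failed."""
    return prod(element_probs)
```

For ten elements each with a 1% failure probability, the weakest-link estimate is 1 - 0.99**10 ≈ 9.6%, whereas the fully redundant bound is 0.01**10 = 1e-20; the true grid-level probability lies between these idealized extremes.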
To this end, many studies have been dedicated to the investigation of EM in grid-like networks using both experimental and simulative approaches. Zhou et al. (2018) used a test chip to study EM effects in PDNs and observed mechanical-stress-dependent failure locations in grids as well as self-healing due to redundant current paths [
46]. Using a similar on-chip approach, Pande et al. (2019) captured several EM effects, ranging from abrupt and/or progressive failures to temporary healing effects and circuit-interconnect interplay, which may not be observed in single-interconnect characterization [
47]. Lin et al. conducted systematic experiments on the impact of redundancy using test structures with different numbers of parallel interconnects and proposed a statistical model to predict the TTF of parallel interconnect networks based on the TTF of the last-failing interconnect [
48]. As the latter model was purely statistical, the physical dynamics, e.g., current redirection and accelerated aging of the late-failing interconnects, could not be captured. Yet the approach explained many of the key EM characteristics of parallel systems, such as the decrease of the lognormal sigma, σ, with increasing redundancy [
48]. With increasing complexity of the grid architecture, however, their model predictions diverged from experimental findings, possibly due to neglecting the time-dependent cascade of physical phenomena such as current and stress redistribution within the grids. In this context, understanding the distribution of EM-induced hydrostatic stress within the grid has been shown to be a prerequisite for determining the EM failure locations [
49]. EM voids occur at locations of high tensile stress, which do not strictly coincide with the locations of peak current within a grid, and their prediction requires grid-level stress analysis [
49]. To this end, physics-based modeling of EM has seen significant advances in recent years, with transient simulation of all stages of EM aging and models that have become more technology- and microstructure-aware [
50,
51,
52,
53,
54,
55,
56,
57,
58,
59,
60,
61]. However, applying such exhaustive models to grids with billions of segments entails significant computational costs, mainly because simulating the post-voiding cascade of events requires transient, coupled electrical-EM analysis to capture current redistribution within the grid. To reduce these costs, model-order reduction together with filtering algorithms that confine the analysis to critical interconnect segments has been adopted in the literature [
62,
63,
64,
65]. Furthermore, due to computational expense, model complexity, and parametric uncertainty, the statistical aspect of EM is frequently overlooked, so models often fail to deliver the crucial chip-level failure probability. In addition to the need for electronic design automation (EDA) software packages capable of resolving the transient mechanical stress distribution across the entire chip before and after void nucleation, the adoption of stress-based approaches necessitates the provision of mechanical stress limits (i.e., critical stress) from the foundry. These limits can only be indirectly inferred, for instance, through model-based approaches. Therefore, despite their inherent limitations, current-based EM compliance evaluation methods continue to be the industrial benchmark.
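A purely statistical last-to-fail model of the kind discussed above can be sketched as a small Monte-Carlo experiment. The sketch below (illustrative t50, sigma, and trial counts; independent interconnects; no current redistribution) reproduces the reported narrowing of the lognormal spread with redundancy:

```python
import random
import statistics
from math import log

def last_fail_log_sigma(n_parallel, t50=1.0, sigma=0.4, trials=20000, seed=1):
    """Monte-Carlo sketch: draw lognormal TTFs for n_parallel interconnects,
    take the last (maximum) failure time as the network TTF, and return the
    standard deviation of log(TTF) over all trials."""
    rng = random.Random(seed)
    log_ttfs = []
    for _ in range(trials):
        ttfs = [rng.lognormvariate(log(t50), sigma) for _ in range(n_parallel)]
        log_ttfs.append(log(max(ttfs)))  # network fails with its last interconnect
    return statistics.stdev(log_ttfs)
```

Running this for n_parallel = 1, 4, 16 yields a monotonically decreasing log-TTF sigma, consistent with the experimental trend; as noted above, however, such a model cannot capture the accelerated aging of late-failing interconnects caused by current redistribution.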
To overcome the described practical constraints and account for grid redundancy in chip-level reliability predictions, we recently introduced the concept of a PDN-tile-based EM compliance check, see
Figure 1. This method derives current limits for the unit cells (or tiles) of the PDN. The approach is practical because PDNs are architecturally composed of repeating grid unit cells, each consisting of parallel interconnect paths. Consequently, the impact of redundancy is inherently captured in the current limits determined during tile-level characterization. Furthermore, tile-based SEB is applied, in which PDN tiles, rather than single interconnects, are treated as the fundamental elements of the weakest-link statistics, see
Figure 1. This allows for a scalable EM assessment of PDNs that accounts for the impact of PDN redundancy [
66].
In this paper, the proposed tile-based concept is expanded by outlining a physics-based simulation framework for PDN unit cells that predicts their EM current limits using a Monte-Carlo approach. We present a calibration methodology for the model based on standard EM test data, considering uncertainty propagation in the pre-nucleation and post-nucleation phases. A stress-based analysis predicts the EM failure locations within the PDN tile, taking into account the stochastic nature of void nucleation. The fundamental differences between EM in the negative-supply-voltage (Vss) grid and the positive-supply-voltage (Vdd) grid are investigated. In addition to pre-nucleation stress evaluation and void nucleation, the post-voiding EM aging phase and the variability induced by the stochasticity of void dynamics are efficiently simulated. We achieve this by using an order-reduced void-impact model that can be calibrated directly from standard EM experimental data. The developed simulation framework is validated against experimental data on the impact of redundancy. Subsequently, the modeling framework is used to predict the EM behavior of a realistic PDN unit cell, determining the impact of redundancy on EM failure probability and tile-based current limits.