Submitted:
24 March 2025
Posted:
25 March 2025
Abstract
Keywords:
Introduction
- 1.
- Optimizations on High-Level Synthesis (HLS)
- A.
- Design Space Exploration: Power can be simulated early across different architectural options (e.g., loop unrolling, pipelining) before committing to RTL. Power estimation can inform trade-offs between performance and switching activity [1,2]. Through resource sharing and time-multiplexing, HLS can also map multiple operations onto the same functional unit. This adds latency in exchange for lower capacitance and toggling [3]. Schedulers can also apply power-aware heuristics (e.g., avoiding scheduling high-activity operations together), which minimizes peak switching activity and balances power consumption [4].
- B.
- Glitch Reduction: Glitch power is the power consumed by spurious transitions in combinational logic caused by unbalanced path delays. Some HLS tools reorder operations to produce stable intermediate signals [5].
- C.
- Clock Gating Insertion: HLS tools can automatically insert clock gates around FSMs and data paths, which prevents unused modules from toggling without requiring designer input [4].
- 2.
- Power-Aware RTL Design Methodologies
- Power Intent Definition (UPF/CPF): These standards specify power domains, retention logic, level shifters, and shutdown conditions at RTL. They enable functional simulation of power behaviors such as sleep, retention, and isolation [8]. Finite state machines (FSMs) often use Gray or one-hot encoding to decrease the number of transitions compared to binary encoding, particularly where only one state bit changes at a time [9]. RTL can also gate infrequently used data paths with enable signals. For example, a multiply unit used only during initialization can be gated with conditional execution so that it does not toggle unnecessarily [10].
- Block Gating at RTL: Clock gating at this level groups the flip-flops that share the same enable and drives them from a gated clock. It operates at an abstraction level that synthesis tools can identify and implement using standard library cells [11].
- Reducing activity in unused logic: Unused combinational logic must be tied to a constant (0 or 1) or defined as "don't care" (X) to avoid unpredictable switching [12].
- Signal balance: Unequal delay paths can cause glitches.
- RTL pipelining: Paths are balanced so that transitions happen simultaneously to reduce intermediate power spikes [13].
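The effect of state encoding on register switching can be checked with a small count. The sketch below is illustrative only: it assumes an 8-state FSM that steps through its states in order and wraps around, and it counts state-register bit flips per full cycle for binary, Gray, and one-hot codes. The function names are hypothetical.

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def gray(n: int) -> int:
    """Reflected-binary Gray code of n."""
    return n ^ (n >> 1)


def cycle_toggles(codes: list[int]) -> int:
    """Total state-register bit flips over one full pass through the cycle."""
    return sum(
        hamming(codes[i], codes[(i + 1) % len(codes)])
        for i in range(len(codes))
    )


NUM_STATES = 8  # assumed FSM size, chosen only for illustration
binary_codes = list(range(NUM_STATES))
gray_codes = [gray(i) for i in range(NUM_STATES)]
one_hot_codes = [1 << i for i in range(NUM_STATES)]

binary_toggles = cycle_toggles(binary_codes)    # 14 bit flips
gray_toggles = cycle_toggles(gray_codes)        # 8 bit flips, one per transition
one_hot_toggles = cycle_toggles(one_hot_codes)  # 16 bit flips, two per transition
```

For this sequential walk, Gray coding flips exactly one bit per transition. One-hot always flips exactly two bits regardless of which states are involved, so its benefit depends on the transition pattern and on the simpler next-state logic rather than on the register toggle count alone.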
- 3.
- Switching & Glitch Power Reduction through Logic Restructuring
- Operand Isolation: Combinational blocks (e.g., multipliers) produce results that are not always used downstream. Isolation gates are placed at their inputs to prevent them from toggling unless the result is needed [14].
- Input Ordering in CMOS Gates: The dynamic power of a CMOS gate depends on the order in which its inputs toggle, and signal toggle rates influence the switching energy [15].
- Duplication of Logic: High-fanout nets can be split among duplicated drivers, decreasing the capacitive load per driver and thus switching power [3].
- Gate Merging: Multiple gates are combined into a single, more complex gate to minimize the number of internal transitions. This simplifies the logic and removes toggling internal paths [15].
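Operand isolation can be illustrated by counting input bit toggles with and without it. This is a minimal behavioral sketch, not a gate-level model: it assumes latch-style isolation that holds the last value while the enable is low, and the operand and enable traces are made-up example data.

```python
def bit_toggles(stream: list[int]) -> int:
    """Total number of bit flips between consecutive values in the stream."""
    return sum(bin(prev ^ cur).count("1") for prev, cur in zip(stream, stream[1:]))


def isolate(operands: list[int], enables: list[bool], initial: int = 0) -> list[int]:
    """Latch-style isolation: pass the operand when enabled, else hold the last value."""
    held = initial
    out = []
    for value, enabled in zip(operands, enables):
        if enabled:
            held = value
        out.append(held)
    return out


# Hypothetical 8-bit operand bus and the cycles in which the multiplier result is used.
operands = [0x00, 0xFF, 0x0F, 0xF0, 0xAA, 0x55, 0xFF, 0x00]
enables = [True, False, False, True, False, False, False, True]

raw_toggles = bit_toggles(operands)                         # 44 flips reach the multiplier
isolated_toggles = bit_toggles(isolate(operands, enables))  # 8 flips reach the multiplier
```

The isolation logic itself costs area and some power, so the technique pays off only when the block is idle often enough, which is why it is usually applied to large units such as multipliers.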
- 4.
- Functional Unit-Specific Low-Power Optimizations
- MAC Units: Use gated accumulation registers, isolate operands, and apply clock gating. Partial reconfiguration can also unload unused MACs in FPGA implementations [7].
- Linear Feedback Shift Registers (LFSRs): Optimizing the tap selection and the XOR/XNOR functions to reduce switching plays an essential role in minimizing LFSR power [16].
- Phase Frequency Detectors (PFDs): Low-power PFDs use LECTOR or stacking-based logic to reduce leakage during the idle states of PLL feedback loops [6].
- 5.
- Early Power Modeling and Predictive Estimation
- Activity-Based Estimators: These compute toggle rates from simulation or probabilistic models.
- Analytical Models: These models use the input statistics and signal probabilities to obtain power estimates without simulating the full RTL [17].
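A probabilistic estimate of this kind can be sketched in a few lines. The example below assumes spatially and temporally independent signals, so a node with signal probability p makes a power-consuming 0→1 transition with probability p(1 − p) per cycle, and it uses the standard dynamic power expression P = α·C·V²·f. The load capacitance, supply voltage, and clock frequency are placeholder values.

```python
def and_prob(p_a: float, p_b: float) -> float:
    """Probability that a 2-input AND output is 1, assuming independent inputs."""
    return p_a * p_b


def or_prob(p_a: float, p_b: float) -> float:
    """Probability that a 2-input OR output is 1, assuming independent inputs."""
    return p_a + p_b - p_a * p_b


def rising_activity(p_one: float) -> float:
    """Probability of a 0->1 transition per cycle for a temporally independent signal."""
    return p_one * (1.0 - p_one)


def dynamic_power(p_one: float, cap_farads: float, vdd: float, freq_hz: float) -> float:
    """Dynamic power P = alpha * C * V^2 * f, with alpha the 0->1 transition probability."""
    return rising_activity(p_one) * cap_farads * vdd ** 2 * freq_hz


# Placeholder operating point: 10 fF load, 1.0 V supply, 1 GHz clock.
p_y = and_prob(0.5, 0.5)                          # output is 1 a quarter of the time
power_w = dynamic_power(p_y, 10e-15, 1.0, 1e9)    # about 1.9 microwatts
```

Real designs have correlated signals and glitches, so estimates like this are best treated as early guidance rather than sign-off numbers.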
- 6.
- Clock Gating
- Levels of Gating: Fine-grained: Individual registers or small data paths are gated by their enable signals.
- Coarse-grained: Entire functional blocks, such as multipliers, memory controllers, or peripherals, are disabled when not needed [2].
- Synthesis-Aided Gating: Most RTL synthesis tools can now infer clock gating automatically and optimize it for timing and area when standard enable signals are found, which relieves designers of some of the workload [3].
- Design Challenges: Poorly balanced gating introduces clock skew, and gated clocks complicate static timing analysis. The gating control logic needs careful design, especially in vector operations or pipelined units [5].
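The grouping of flip-flops by shared enable, and the resulting reduction in clock activity, can be sketched as follows. This is a simplified count of clock events delivered to flip-flop clock pins, assuming one clock-gating cell per enable group; the flop names, enable names, and traces are hypothetical.

```python
from collections import defaultdict


def group_by_enable(flop_enables: dict[str, str]) -> dict[str, list[str]]:
    """Group flip-flops that share an enable so each group can share one clock gate."""
    groups = defaultdict(list)
    for flop, enable in flop_enables.items():
        groups[enable].append(flop)
    return dict(groups)


def clock_pin_events(groups, enable_traces, num_cycles: int, gated: bool) -> int:
    """Count clock pulses reaching flop clock pins over the simulated window."""
    total = 0
    for enable, flops in groups.items():
        # A gated group only sees the clock in cycles where its enable is high.
        active_cycles = sum(enable_traces[enable]) if gated else num_cycles
        total += active_cycles * len(flops)
    return total


# Hypothetical design: three accumulator flops share mac_en, one config flop uses cfg_en.
flop_enables = {"acc0": "mac_en", "acc1": "mac_en", "acc2": "mac_en", "cfg0": "cfg_en"}
enable_traces = {
    "mac_en": [1, 1, 0, 0, 0, 0, 1, 0],
    "cfg_en": [1, 0, 0, 0, 0, 0, 0, 0],
}
NUM_CYCLES = 8

groups = group_by_enable(flop_enables)
ungated_events = clock_pin_events(groups, enable_traces, NUM_CYCLES, gated=False)  # 32
gated_events = clock_pin_events(groups, enable_traces, NUM_CYCLES, gated=True)     # 10
```

The count ignores the power of the clock-gating cells themselves, which is one reason tools usually apply a minimum group size before inserting a gate.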
- 7.
- Power Gating
- Implementation Styles: In fine-grained power gating, small blocks each have their own sleep transistor, which gives high savings at the cost of increased area. In coarse-grained power gating, complete subsystems such as CPU cores, DMA engines, or DSPs are switched off together using common sleep transistors [8].
- State Retention & Isolation: State retention flip-flops maintain the logic state across power-down cycles. Isolation cells block floating or undefined values from propagating to always-on logic [9].
- Control Strategy: The power-management units coordinate the "save → power-off → restore" cycle and often stage the wake-up process to limit inrush currents that can corrupt adjacent logic [10].
- In ASICs vs. FPGAs: ASICs use physical sleep transistors for fine-grained power gating. FPGAs, in contrast, provide only limited power gating at bank/block granularity or through partial reconfiguration [11]. Power gating can reduce leakage power by 90–95% for blocks in an inactive state, which is advantageous for standby and mobile applications [12].
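The "save → power-off → restore" cycle with a staged wake-up can be sketched as a small controller model. The ordering of isolation relative to save and restore (isolate first on the way down, release last on the way up) is an assumption based on common practice, and the class, step names, and stage count are hypothetical.

```python
from enum import Enum, auto


class DomainState(Enum):
    ON = auto()
    OFF = auto()


class PowerDomain:
    """Behavioral model of a power-gated domain's control sequence."""

    def __init__(self, name: str, num_switch_stages: int = 3):
        self.name = name
        self.state = DomainState.ON
        self.num_switch_stages = num_switch_stages  # assumed staged-wake-up depth
        self.log: list[str] = []

    def power_down(self) -> None:
        if self.state is not DomainState.ON:
            raise RuntimeError(f"{self.name} is already off")
        self.log.append("isolate_outputs")  # clamp outputs before they float
        self.log.append("save_state")       # retention flops capture state
        self.log.append("switches_off")     # open the sleep transistors
        self.state = DomainState.OFF

    def power_up(self) -> None:
        if self.state is not DomainState.OFF:
            raise RuntimeError(f"{self.name} is already on")
        # Turn the sleep switches on in stages to limit inrush current.
        for stage in range(self.num_switch_stages):
            self.log.append(f"switch_stage_{stage}_on")
        self.log.append("restore_state")
        self.log.append("release_isolation")
        self.state = DomainState.ON


dsp = PowerDomain("dsp")
dsp.power_down()
dsp.power_up()
```

After the two calls, `dsp.log` holds the full sequence in order, which makes it easy to check that isolation brackets the whole power-off interval.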
- 8.
- Techniques for Voltage Scaling (Multi-VDD and DVFS)
- Multi-VDD: The multi-VDD design approach divides the chip into voltage islands, with each island powered at a different constant voltage according to performance requirements [4].
- Example design: 1.0 V CPU core, 0.8 V peripherals, 0.6 V for always-on blocks.
- Power Savings: Dynamic power scales quadratically with supply voltage (P ∝ V²), so relatively small reductions in voltage achieve significant savings [13].
- Level Shifters: These are needed for signals crossing from a lower-voltage domain to a higher one. They are usually located at domain boundaries and inserted by synthesis or floorplanning tools [14].
- Design Complexity: Level shifters add area and power overhead. Multi-VDD design also requires multi-rail power delivery networks and careful domain isolation [15].
- SoC Usage: Commonly used in mobile and multimedia SoCs whose blocks have different performance profiles.
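The quadratic dependence can be made concrete with the island voltages from the example design. The sketch below assumes switched capacitance, activity, and frequency are held constant, so only the V² term changes; in practice a lower-voltage island usually also runs slower.

```python
def relative_dynamic_power(v_new: float, v_ref: float) -> float:
    """Dynamic power at v_new relative to v_ref, using P proportional to V^2."""
    return (v_new / v_ref) ** 2


# Island voltages taken from the example design above, referenced to the 1.0 V core.
islands = {"cpu_core": 1.0, "peripherals": 0.8, "always_on": 0.6}
relative = {name: relative_dynamic_power(v, 1.0) for name, v in islands.items()}
# peripherals -> about 0.64, always_on -> about 0.36 of the 1.0 V dynamic power

# A 10% voltage reduction alone gives roughly a 19% dynamic power reduction.
saving_at_0v9 = 1.0 - relative_dynamic_power(0.9, 1.0)
```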
- 9.
- Dynamic Voltage and Frequency Scaling (DVFS)
- P-States: DVFS employs performance states (P-states), such as "Turbo," "Nominal," and "Eco" modes, which are predefined voltage-frequency pairs.
- Implementation Needs: On-chip regulators or external PMICs that can scale the supply based on control signals; programmable clock generators (e.g., PLLs or DPLLs) that adjust frequency in step with voltage; and control firmware that tracks the workload and performs the transitions [17].
- Voltage Settling: To avoid timing errors, the voltage must stabilize before increasing frequency. Ramp times and response latency limit the speed of DVFS adaptation.
- FPGA-specific: Most FPGAs do not natively support DVFS for the logic fabric, though they may support it for SoC processing subsystems or I/O blocks. Some FPGAs use adaptive body biasing or power islands to approximate DVFS behavior [18]. In variable-load systems such as CPUs, smartphones, and edge AI accelerators, DVFS provides 20–40% average power reduction.
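The voltage-settling rule implies an ordering for P-state transitions, sketched below. The step-up order (raise voltage, wait, then raise frequency) follows the text; the step-down order (lower frequency first, then voltage) is the usual mirror image and is an assumption here. The voltage and frequency values and the step strings are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PState:
    name: str
    voltage_v: float
    freq_mhz: int


# Hypothetical voltage-frequency pairs for the three named modes.
P_STATES = {
    "Eco": PState("Eco", 0.6, 400),
    "Nominal": PState("Nominal", 0.8, 1000),
    "Turbo": PState("Turbo", 1.0, 1600),
}


def transition_steps(current: PState, target: PState) -> list[str]:
    """Return the ordered control steps for moving between two P-states."""
    set_v = f"set_voltage {target.voltage_v:.2f} V"
    set_f = f"set_frequency {target.freq_mhz} MHz"
    if target.freq_mhz > current.freq_mhz:
        # Speeding up: the supply must be stable before the clock gets faster.
        return [set_v, "wait_for_voltage_settle", set_f]
    if target.freq_mhz < current.freq_mhz:
        # Slowing down (assumed order): drop the clock first, then the supply.
        return [set_f, set_v, "wait_for_voltage_settle"]
    return []


step_up = transition_steps(P_STATES["Eco"], P_STATES["Turbo"])
step_down = transition_steps(P_STATES["Turbo"], P_STATES["Eco"])
```

The wait step is where regulator ramp time shows up as transition latency, which is what limits how quickly DVFS can follow the workload.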
- Placement and Routing for Low Power: In the physical design flow, placement and routing (P&R) directly affect dynamic and leakage power at the post-layout phase. Essential techniques include reducing interconnect length, minimizing switching capacitance, and optimizing clock distribution.
- Activity-Driven Placement: Modern placement tools use toggle-rate-aware heuristics to minimize the capacitance of high-activity nets. Shortening the wires that switch most often reduces the switched interconnect capacitance and therefore dynamic power [1]. For instance, a placer can prioritize hot nets and accept increased wire length on low-activity nets [2].
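One way to express this trade-off is a wirelength cost weighted by toggle rate. The sketch below uses half-perimeter wirelength (HPWL) as the length estimate, which is a common placement proxy but an assumption here; the nets, toggle rates, and coordinates are invented for illustration.

```python
def hpwl(pins: list[tuple[float, float]]) -> float:
    """Half-perimeter wirelength of the bounding box around a net's pins."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def total_wirelength(nets, placement) -> float:
    """Plain wirelength: every net counts equally."""
    return sum(hpwl([placement[c] for c in net["cells"]]) for net in nets)


def activity_weighted_wirelength(nets, placement) -> float:
    """Wirelength weighted by toggle rate, a proxy for switched capacitance."""
    return sum(
        net["toggle_rate"] * hpwl([placement[c] for c in net["cells"]])
        for net in nets
    )


# Hypothetical netlist: one hot net and one cold net.
nets = [
    {"name": "hot", "cells": ["a", "b"], "toggle_rate": 0.4},
    {"name": "cold", "cells": ["c", "d"], "toggle_rate": 0.02},
]
placement_1 = {"a": (0, 0), "b": (10, 0), "c": (0, 5), "d": (2, 5)}   # hot net is long
placement_2 = {"a": (0, 0), "b": (2, 0), "c": (0, 5), "d": (12, 5)}   # cold net is long

cost_1 = activity_weighted_wirelength(nets, placement_1)
cost_2 = activity_weighted_wirelength(nets, placement_2)
```

Here `placement_2` has the larger plain wirelength (14 versus 12) but the lower activity-weighted cost, which is the situation the text describes: the cold net is allowed to grow so the hot net can shrink.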
- Multi-Bit Flip-Flop (MBFF) Merging: MBFFs pack multiple flops into the same physical cell and share the clock driver across them. This reduces the clock buffer count, cuts clock buffering and routing and therefore clock net capacitance, and saves power and area without affecting timing. MBFF-based optimizations have shown up to 20–30% power reduction on the clock tree [3]. MBFF insertion is typically performed post-placement based on the physical proximity of single-bit flops.
- Placement-Aware Gate Sizing and Vt Swapping: Tools opportunistically downsize over-provisioned gates and swap non-critical ones to higher-threshold-voltage (HVT) variants for lower leakage.
- Clustering Based on IR Drop and Activity: High-activity cells draw larger switching currents and, if far from power straps, can create localized IR drops. To avoid droop-induced delay and the power wasted on extra timing margin or retries [5], these cells can be placed close to robust power taps or decoupling capacitors.
- Retiming and Pipeline Insertion: Inserting pipeline registers shortens combinational paths and localizes switching on long interconnects, in both ASICs and FPGAs. This helps not only to cut glitch power but also to lower capacitive coupling across long wires [6].
- Crosstalk and Coupling-Aware Routing: Routing engines add shield wires, widen routing tracks, or use guard bands between high-speed nets to avoid induced toggling on victim nets. While this is primarily a signal-integrity measure, it also reduces the dynamic power caused by crosstalk [7].
- Clock Routing Optimization: Clock trees and meshes are designed with early clock splitting, minimal skew buffering, and distributed gating (clock-gate cells placed next to their loads), which yields power-efficient clock distribution. FPGA toolchains emulate these gains with regional clock routing and power-aware packing [8].
- Multi-Threshold-Voltage (Multi-Vt) Optimization: Multi-Vt design (combining cells with different threshold voltages) helps to balance leakage against performance. LVT (low-Vt) cells are fast but leaky, HVT (high-Vt) cells are slow but low-leakage, and SVT (standard-Vt) cells sit at the mid-point.
- MTCMOS Strategy: Tools typically keep LVT cells on critical paths and, in a final leakage recovery pass, switch non-timing-critical cells to SVT or HVT types [9].
- Advanced Process Support: Multiple Vt options are available at 7nm and below through metal-gate work-function engineering, allowing fine-grained leakage control [12].
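A leakage recovery pass of the kind described can be sketched as a per-cell choice of the lowest-leakage Vt flavor that still fits in the available slack. This is a deliberately simplified model: it treats each cell's slack as independent, whereas real tools re-time the affected paths after every swap. The delay and leakage figures are made-up relative numbers, not library data.

```python
# Hypothetical relative delay and leakage per Vt flavor (LVT delay = 1.0).
VT_LIBRARY = {
    "LVT": {"delay": 1.0, "leakage": 10.0},
    "SVT": {"delay": 1.2, "leakage": 3.0},
    "HVT": {"delay": 1.5, "leakage": 1.0},
}


def leakage_recovery(cell_slacks: dict[str, float]) -> dict[str, str]:
    """Assign each cell the lowest-leakage Vt whose added delay fits in its slack.

    cell_slacks maps cell name -> timing slack, in units of LVT cell delay,
    measured with every cell initially implemented as LVT.
    """
    assignment = {}
    for name, slack in cell_slacks.items():
        choice = "LVT"  # keep the fast cell if nothing slower fits
        for vt in ("HVT", "SVT"):  # try the lowest-leakage option first
            added_delay = VT_LIBRARY[vt]["delay"] - VT_LIBRARY["LVT"]["delay"]
            if added_delay <= slack:
                choice = vt
                break
        assignment[name] = choice
    return assignment


def total_leakage(assignment: dict[str, str]) -> float:
    return sum(VT_LIBRARY[vt]["leakage"] for vt in assignment.values())


cell_slacks = {"crit": 0.0, "near_crit": 0.3, "relaxed": 0.8}
assignment = leakage_recovery(cell_slacks)
leakage_before = total_leakage({name: "LVT" for name in cell_slacks})
leakage_after = total_leakage(assignment)
```

With these numbers the critical cell stays LVT, the near-critical cell moves to SVT, and the relaxed cell moves to HVT, cutting the modeled leakage from 30 to 14 units without consuming more than the available slack.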
- IR Drop and Power Efficiency: Voltage drops in the power grid slow the logic, which demands more timing margin and increases dynamic power; they increase short-circuit power during transitions; and they can trigger AVS to raise the supply voltage, increasing global power [13]. The VDD drop is typically expected to stay below 5%. Designers mitigate IR drop with wider metal straps, strategic C4 bump placement, and distributed decoupling capacitors [14].
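The 5% guideline can be checked with a first-order calculation. The sketch below assumes a lumped model in which a block's current flows through a single effective grid resistance (V = I·R); real sign-off uses a distributed grid analysis. The current and resistance values are placeholders.

```python
def ir_drop_fraction(current_a: float, grid_resistance_ohm: float, vdd: float) -> float:
    """Static IR drop as a fraction of VDD for a lumped grid resistance."""
    return current_a * grid_resistance_ohm / vdd


def within_budget(current_a: float, grid_resistance_ohm: float, vdd: float,
                  budget: float = 0.05) -> bool:
    """True if the drop stays within the budget (5% of VDD by default)."""
    return ir_drop_fraction(current_a, grid_resistance_ohm, vdd) <= budget


# Placeholder numbers: 20 milliohm effective grid resistance on a 1.0 V rail.
ok_at_2a = within_budget(2.0, 0.02, 1.0)   # 40 mV drop, inside the 5% budget
ok_at_3a = within_budget(3.0, 0.02, 1.0)   # 60 mV drop, outside the 5% budget
```

Wider straps and additional bumps lower the effective resistance in this model, while decoupling capacitors mainly help with the transient component that a static calculation like this does not capture.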
- EM-Aware Routing: High current densities cause electromigration (EM). Handling inrush currents from power-gated blocks, for example, requires power grids routed with redundant vias and wide wires, along with staged power-ups [15].
- Current-Aware Floorplanning: High-switching units (e.g., GPU cores) are positioned nearest to the voltage sources. Power-gated blocks are distributed spatially to avoid clustered power-ups [16].
- Multi-Vt and power gating are not the only static power suppression techniques.
- Reverse Body Bias (RBB): Both FDSOI and triple-well bulk processes offer RBB, which raises Vt dynamically during idle modes to minimize leakage. Its effectiveness is process-dependent and is reduced in FinFETs [17].
- Transistor Stacking: Because of intermediate node biasing, two or more off transistors in series leak less than one alone. Libraries use this in gate-level topologies (e.g., NAND/NOR) or dedicated sleep cells [18].
- SRAM Light-Sleep and Retention: Memory is a leakage hotspot. Solutions include wordline/bitline (WL/BL) biasing in light-sleep mode, low-power state-retention SRAMs, and power-down control for unused memory macros [18].
- Leakage Sensitivity: Leakage depends exponentially on Vt, so Vt variation across the die produces large variations in leakage. Guard banding is applied at statistical corners such as "FF slow-leaky" [17].
- Adaptive Compensation: To minimize worst-case overheads, Adaptive Voltage Scaling (AVS) adjusts the supply voltage at per-chip granularity, and post-silicon tuning methods [15] adjust bias or clock dynamically.
- 1.
- Software-Controlled Power Management: Software-managed power control is becoming the norm for modern SoCs, especially mobile and data center SoCs. It relies on runtime architectures in which the OS, firmware, or hypervisor dynamically handles DVFS, clock/power gating, and per-core states. With UPF and CPF gaining broad adoption in design flows, formal verification, power-aware simulation, and emulation are used to verify complex power-management interactions [6,15]. Architectures feature more independent power domains, so architects, RTL designers, and software developers must work together closely.
The introduction of FinFETs (16nm/7nm) and now GAA transistors (3nm) [10,12] has improved electrostatic control and dramatically reduced leakage. But FinFETs came with higher gate capacitance and less voltage scaling headroom. Because of device variability, the low-voltage operating region (optimal 0.4–0.5V) is not exploited well. Consequently, foundries offer multiple Vt flavors and standard cell track heights to trade speed against power. With GAA nanosheets, designers have yet another option for leakage vs. performance control. These device innovations coincide with increasing power density, and thus thermal-aware design, dark silicon, and fine-grained throttling are becoming more relevant than ever [14,22].
- 2.
- Heterogeneous Integration and Chiplets: As Moore's Law slows, the industry is moving to chiplets packaged in 2.5D (interposer) or 3D. These allow mixing process technologies (e.g., pairing a low-power IO die with a performance core die). 3D stacking lowers interconnect power but makes dissipating heat and delivering power vertically more complex [16]. Integrated regulators now enable per-chiplet DVFS, reducing conversion losses and improving response time. Future systems may also include power-management chiplets to control the supply and monitor it locally.
- 3.
- Machine Learning in Electronic Design Automation: New AI-enhanced design tools are reaching the market to search vast parameter spaces for power reduction. ML algorithms now aid in identifying the best DVFS partitions and in tuning multi-Vt thresholds and placement settings [21]. One proposed approach uses reinforcement learning to optimize the creation of voltage islands. Neural networks also assist with power-aware clustering during placement legalization. Although new, ML-powered CAD can help reach a low-power design more quickly. This is a more general-purpose technique that, if done appropriately, is widely applicable.
- 4.
- Domain-Specific Drivers of Innovation: Sensor and IoT designs center on sub-threshold operation and energy harvesting. Analog/mixed-signal blocks are duty-cycled for very low-power standby. AI/ML accelerators target energy per inference and push it further with techniques such as approximate computing (e.g., low-precision MACs), power gating of unused units, and custom data paths [8,18]. Microcontrollers now widely support near-threshold execution for background tasks and can quickly transition to nominal mode for performance bursts.
In summary, low-power design is progressing well beyond architectural and transistor-level techniques: software, machine learning, packaging, and emerging devices are increasingly part of the solution. Newer systems will be adaptive and context-aware, dynamically observing workload, temperature, and silicon characteristics to tune power usage on the fly. The need for energy-efficient digital design will only escalate as data centers grow, the Internet of Things flourishes, and edge computing expands. The next wave of design automation needs intelligent, self-optimizing flows that combine algorithmic, architectural, and physical-design expertise to reduce energy per operation without impacting performance.
Conclusions
References
- Vaithianathan, M., Patil, M., Ng, S. F., & Udkar, S. (2024). Low-power FPGA design techniques for next-generation mobile devices. ESP International Journal of Advancements in Computational Technology, 2(2), 82–93.
- Dai, S., & Campbell, K. (2015). High-level synthesis for low-power design. IPSJ Transactions on System LSI Design Methodology, 8, 12–25.
- Mansuri, N., Vakhare, V., & Shah, K. (2019, June 26). Low power implementation techniques for ASIC physical design. EDN Network. https://www.edn.com/low-power-implementation-techniques-for-asic-physical-design/.
- Synopsys, Inc. (n.d.). What is glitch power? Synopsys Low Power Glossary. Retrieved January 2023, from https://www.synopsys.com/glossary/what-is-glitch-power.html.
- Dillinger, T. (2020, June 26). Multi-Vt device offerings for advanced process nodes. SemiWiki. https://semiwiki.com/semiconductor-services/ic-implementation/285912-multi-vt-device-offerings-for-advanced-process-nodes/.
- Foster, H. (2021, January 27). Part 11: The 2020 Wilson Research Group functional verification study – Low power trends. Verification Horizons Blog (Siemens EDA). https://blogs.sw.siemens.com/verificationhorizons/2021/01/27/part-11-the-2020-wilson-research-group-functional-verification-study-low-power-trends/.
- Chen, W., Wang, Y., Xu, J., & Pedram, M. (2020). Performance comparisons between 7-nm FinFET and conventional bulk CMOS standard cell libraries. IEEE Transactions on Circuits and Systems II: Express Briefs, 67(12), 3562–3566.
- Maxfield, M. (2006, September 30). Reducing power with an advanced multi-Vdd methodology. EE Times. https://www.eetimes.com/reducing-power-with-an-advanced-multi-vdd-methodology/.
- Semiconductor Engineering. (2020). Dynamic voltage and frequency scaling (DVFS). Semiconductor Engineering Knowledge Center. https://semiengineering.com/knowledge_centers/processes/power/dynamic-voltage-and-frequency-scaling-dvfs/.
- Chatti, K., Rima, A., & Lahiani, F. (2022). Dynamic voltage and frequency scaling and duty-cycling for ultra low-power wireless sensor nodes. Electronics, 11(24), 4071.
- Thakur, G., Jain, S., & Sohal, H. (2022). Current issues and emerging techniques for VLSI testing – A review. Measurement: Sensors, 24, 100497.
- Premananda, B. S., & Sreedhar, S. (2022). Low-power phase frequency detector using hybrid AVLS and LECTOR techniques for low-power PLL. Advances in Electrical and Electronic Engineering, 20(3), 294–301.
- Srinivas, L., Chandrakala, & Kumar, G. K. (2021). A review on low power area-efficient architecture for linear feedback shift registers. Dogo Rangsang Research Journal, 11(1), 328–330.
- Haq, S. U., & Sharma, V. K. (n.d.). Challenges in low power VLSI design: A review. Proceedings of the 5th International Conference on Electronics, Communication and Aerospace Technology (ICECA 2021). IEEE.
- Varadharajan, S. K., & Nallasamy, V. (2017, March). Low power VLSI circuits design strategies and methodologies: A literature review. Proceedings of the IEEE Conference on Emerging Devices and Smart Systems (ICEDSS).
- Raut, K. J., Chitre, A. V., Deshmukh, M. S., & Magar, K. (2021). Low power VLSI design techniques: A review. Journal of University of Shanghai for Science and Technology, 23(11), 172–180.
- Malipatil, S. (2017). Review and analysis of glitch reduction for low power VLSI circuits. International Journal for Research in Applied Science & Engineering Technology (IJRASET), 5(11), 1386–1389.
- Sahu, A. K., Samanth, R., Mendez, T., & Nayak, G. S. (2021, July). VLSI design techniques for low power MAC unit: A review. AIP Conference Proceedings, 2358, 050013.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
