Submitted:
27 November 2025
Posted:
28 November 2025
Abstract
Keywords:
1. Introduction
1.1. Motivation and Research Gap
1.2. Contributions
- A pipelined Keccak-f[1600] core with balanced stage segmentation, loop unrolling, and logic reuse to enable high throughput with low resource overhead.
- An energy-aware design using dynamic clock gating to reduce idle switching activity while supporting high-speed operation.
- Demonstrated energy efficiency of 1.44 Gbps/W, surpassing many recent FPGA-based SHA-3 implementations.
- Full validation using NIST test vectors for functional correctness, followed by post-synthesis performance verification.
2. Related Work
3. Methodology and Design Approach
- Functional Analysis of Keccak-f[1600]: This stage analyses the 24-round permutation that forms the computational core of SHA-3. A clear understanding of its structure supports an efficient hardware mapping.
- Architectural Modelling: A modular architecture is designed in which each block (input formatter, Keccak core, and output logic) is an independent, reusable component. This modularity simplifies scaling the number of processing units and embedding the design within a larger cryptographic system.
- Design Optimisation: Loop unrolling (including partial unrolling), balanced segmentation of pipeline stages, and resource sharing, each tailored to the target design metrics, are applied to improve throughput, reduce switching activity, and improve energy efficiency.
- VHDL Implementation: The design is described in VHDL and implemented with the Xilinx Vivado 2024.2.2 toolchain. During synthesis and place-and-route, particular care is taken to meet timing constraints, manage clocks, and allocate resources.
- Validation and Testing: Official NIST SHA-3 test vectors are used to validate functional correctness. Timing closure and performance metrics are verified to ensure real-world deployability.
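The functional analysis and validation steps can be illustrated with a small software golden model. The sketch below is not the authors' VHDL or testbench; it is a plain Python rendering of the Keccak-f[1600] permutation and the SHA3-256 sponge as specified in FIPS 202, checked against Python's `hashlib`. The function names (`keccak_f1600`, `permute_bytes`, `sha3_256_model`) are hypothetical and chosen only for this example. A model of this kind is one common way to generate expected digests for hardware simulation.

```python
import hashlib

MASK = (1 << 64) - 1
RATE = 136  # bytes; SHA3-256 uses rate r = 1088 bits and capacity c = 512 bits


def rol(x, n):
    """Rotate a 64-bit lane left by n bits."""
    n %= 64
    return ((x << n) | (x >> (64 - n))) & MASK


def keccak_f1600(lanes):
    """Apply the 24 rounds to a 5x5 array of 64-bit lanes, indexed lanes[x][y]."""
    rc = 1  # LFSR state from which the iota round constants are derived
    for _ in range(24):
        # theta: column parities mixed into every lane
        c = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4]
             for x in range(5)]
        d = [c[(x - 1) % 5] ^ rol(c[(x + 1) % 5], 1) for x in range(5)]
        lanes = [[lanes[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho and pi: rotate each lane and move it to its new position
        x, y = 1, 0
        cur = lanes[x][y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            cur, lanes[x][y] = lanes[x][y], rol(cur, (t + 1) * (t + 2) // 2)
        # chi: non-linear step along each row
        for y in range(5):
            row = [lanes[x][y] for x in range(5)]
            for x in range(5):
                lanes[x][y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        # iota: inject the round constant into lane (0, 0)
        for j in range(7):
            rc = ((rc << 1) ^ ((rc >> 7) * 0x71)) % 256
            if rc & 2:
                lanes[0][0] ^= 1 << ((1 << j) - 1)
    return lanes


def permute_bytes(state):
    """Run Keccak-f[1600] on a 200-byte state (little-endian lanes)."""
    lanes = [[int.from_bytes(state[8 * (x + 5 * y):8 * (x + 5 * y) + 8], "little")
              for y in range(5)] for x in range(5)]
    lanes = keccak_f1600(lanes)
    out = bytearray(200)
    for x in range(5):
        for y in range(5):
            out[8 * (x + 5 * y):8 * (x + 5 * y) + 8] = lanes[x][y].to_bytes(8, "little")
    return out


def sha3_256_model(msg):
    """SHA3-256 via the sponge construction: pad, absorb rate-sized blocks, squeeze."""
    pad_len = RATE - (len(msg) % RATE)
    padded = bytearray(msg) + bytearray(pad_len)
    padded[len(msg)] ^= 0x06   # SHA-3 domain bits followed by the first pad bit
    padded[-1] ^= 0x80         # final pad bit
    state = bytearray(200)
    for off in range(0, len(padded), RATE):
        for i in range(RATE):
            state[i] ^= padded[off + i]
        state = permute_bytes(state)
    return bytes(state[:32])


if __name__ == "__main__":
    for msg in (b"", b"abc", b"a" * 136, b"a" * 200):
        assert sha3_256_model(msg) == hashlib.sha3_256(msg).digest()
    print("model matches hashlib for all sample messages")
```

The sample messages deliberately include lengths at and beyond one rate block (136 bytes), since block-boundary padding is a frequent source of mismatches between a hardware core and its reference.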
3.1. Proposed Design Methodology
3.2. System-Level Architecture Overview
3.2.1. Pre-Processing Unit
3.2.2. Keccak Processing Core
3.2.3. Post-Processing Unit
3.2.4. Optimisation Techniques for High Performance and Efficiency
- Resource Sharing: To reduce LUT usage with no performance overhead, logic reuse techniques were implemented.
- Manual Floorplanning: Timing-critical paths were manually floorplanned in Vivado 2024.2.2 to improve placement, routing, and overall timing closure [26].
3.2.5. Synthesis and Verification
3.2.6. Performance Metrics
3.2.7. Comparative Analysis with Existing Work
4. System Architecture and Implementation Results
4.1. FPGA Implementation Setup
- Input Interface: Defines the alignment and buffering of input data.
- Message Scheduler: Slices incoming data into chunks and prepares them for processing.
- Keccak Core (f[1600]): 24-stage pipelined permutation engine.
- Output Interface: Finalises and outputs the 256-bit SHA-3 digest.
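To give a feel for what a 24-stage pipelined core implies for latency, the following sketch compares cycle counts for a pipelined and an iterative core. It is an idealised model and not a description of the reported design: it assumes one Keccak round per pipeline stage, one block issued per cycle, no stalls, and blocks that are independent of each other (for example, blocks from different messages, since consecutive blocks of one message are chained through the sponge state). The function names are hypothetical.

```python
def pipeline_cycles(num_blocks, stages=24):
    """Cycles to pass num_blocks independent blocks through a stages-deep pipeline.

    The first result appears after `stages` cycles; each further block
    completes one cycle later (ideal case, no stalls).
    """
    if num_blocks <= 0:
        return 0
    return stages + num_blocks - 1


def iterative_cycles(num_blocks, rounds=24):
    """Cycles for a one-round-per-cycle iterative core handling one block at a time."""
    return max(num_blocks, 0) * rounds


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, pipeline_cycles(n), iterative_cycles(n))
```

Under these assumptions the pipeline fill cost of 24 cycles is amortised as the number of independent blocks grows, which is why pipelined cores are usually evaluated at steady state.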

4.2. Hardware Resource Analysis

| Parameter | Value | Description |
| FPGA Device | Xilinx Artix-7 XC7A100T | Target hardware |
| Operating Frequency | 210 MHz | Achieved post-place-and-route |
| SHA-3 Mode | SHA-3-256 | Output digest length |
| Throughput | 1.35 Gbps | Post-pipeline steady-state throughput |
| Total Power | 0.94 W | Total estimated power consumption |
| LUT Utilisation | 24% | Logic resource utilisation |
| Flip-Flop Utilisation | 18% | Includes all pipelining registers |
| BRAM Usage | 20% | For input buffering and round constants |
| Efficiency | 1.44 Gbps/W | Throughput per watt |
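The efficiency figure in the table follows directly from the throughput and power rows. As a small worked check, the sketch below recomputes it and also derives the equivalent energy per hashed bit, a quantity the table does not list but which follows from the same two values. The function names are hypothetical.

```python
def efficiency_gbps_per_w(throughput_gbps, power_w):
    """Throughput per watt, as reported in the Efficiency row."""
    return throughput_gbps / power_w


def energy_per_bit_nj(throughput_gbps, power_w):
    """Energy per processed bit; W divided by Gbit/s gives nJ/bit."""
    return power_w / throughput_gbps


if __name__ == "__main__":
    # Values taken from the table: 1.35 Gbps at 0.94 W
    print(round(efficiency_gbps_per_w(1.35, 0.94), 2))  # 1.44
    print(round(energy_per_bit_nj(1.35, 0.94), 3))      # 0.696
```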
4.3. Power and Energy Efficiency Evaluation
| Component | Dynamic (mW) | Static (mW) | Total (mW) | Percentage (%) |
| Keccak Core | 520 | 120 | 640 | 68% |
| I/O Buffers | 140 | 30 | 170 | 18% |
| Control Logic | 90 | 25 | 115 | 12% |
| Clock Network | 60 | 25 | 85 | 9% |
| Total | 810 | 200 | 1010 | 100% |
4.4. Comparative Evaluation of FPGA Implementations
| Reference | FPGA Device | Frequency (MHz) | Throughput (Gbps) | Power (W) | Efficiency (Gbps/W) | Notes |
| Leonardi et al. [11] | Zynq-7000 | 180 | 1.10 | 1.30 | 0.85 | HW–SW integration with basic cryptographic accelerators |
| Baird et al. [12] | Artix-7 | 200 | 1.20 | 1.15 | 1.04 | Energy profiling for KangarooTwelve |
| Magyari & Chen [22] | Kintex-7 | 250 | 2.00 | 2.50 | 0.80 | General review of FPGA for IoT, focusing on adaptability |
| Potestad-Ordóñez et al. [18] | Artix-7 | 190 | 1.00 | 1.10 | 0.91 | Fault-resilient SHA-3 with ADC protection |
| Korona et al. [24] | Artix-7 | 210 | 1.30 | 1.10 | 1.18 | Hash functions for traffic acquisition using probes |
| This Work | Artix-7 | 210 | 1.35 | 0.94 | 1.44 | Compact design with gated pipelining and DSP-aware layout |
| Algorithm | FPGA Device | Frequency (MHz) | Throughput (Gbps) | Power (W) | Efficiency (Gbps/W) |
| SHA-2 | Artix-7 | 200 | 1.10 | 0.95 | 1.15 |
| BLAKE2 | Artix-7 | 210 | 1.25 | 1.02 | 1.22 |
| SHA-3 (This Work) | Artix-7 | 210 | 1.35 | 0.94 | 1.44 |
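The margins implied by the two tables can be made explicit with a short calculation. The sketch below uses only the efficiency values listed above and reports how far this work's 1.44 Gbps/W sits above the best prior SHA-3 entry and above the SHA-2 and BLAKE2 rows. The helper name `margin_pct` is hypothetical.

```python
def margin_pct(value, reference):
    """Relative advantage of `value` over `reference`, in percent."""
    return (value / reference - 1.0) * 100.0


# Efficiency (Gbps/W) as listed in the comparison tables
prior_sha3 = {
    "Leonardi et al. [11]": 0.85,
    "Baird et al. [12]": 1.04,
    "Magyari & Chen [22]": 0.80,
    "Potestad-Ordóñez et al. [18]": 0.91,
    "Korona et al. [24]": 1.18,
}
other_hashes = {"SHA-2": 1.15, "BLAKE2": 1.22}
THIS_WORK = 1.44

if __name__ == "__main__":
    best = max(prior_sha3, key=prior_sha3.get)
    print(best, round(margin_pct(THIS_WORK, prior_sha3[best]), 1))
    for name, eff in other_hashes.items():
        print(name, round(margin_pct(THIS_WORK, eff), 1))
```

With the listed values, the margin over the most efficient prior entry is roughly 22%, and the margins over the SHA-2 and BLAKE2 rows are roughly 25% and 18%.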
4.5. Discussion of Findings
5. Discussion and Interpretation
5.1. Effect of Pipeline Balancing on Throughput
| Design Variant | Frequency (MHz) | Throughput (Gbps) | Power (W) | Efficiency (Gbps/W) |
| Baseline (No Pipelining) | 140 | 0.78 | 0.80 | 0.98 |
| + Deep Pipelining | 210 | 1.35 | 0.94 | 1.44 |
| + Clock Gating | 210 | 1.35 | 0.89 | 1.52 |
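The contribution of each optimisation step can be read off the table as step-to-step changes. The sketch below computes, for each variant relative to the row before it, the percentage change in throughput, power, and efficiency, using only the tabulated values. The names `pct_change` and `step_deltas` are hypothetical.

```python
# (variant, frequency MHz, throughput Gbps, power W) as listed in the table
variants = [
    ("Baseline (No Pipelining)", 140, 0.78, 0.80),
    ("+ Deep Pipelining", 210, 1.35, 0.94),
    ("+ Clock Gating", 210, 1.35, 0.89),
]


def pct_change(new, old):
    """Percentage change from old to new."""
    return (new - old) / old * 100.0


def step_deltas(rows):
    """Change of each row relative to the previous one: throughput, power, efficiency."""
    deltas = []
    for (_, _, t0, p0), (name, _, t1, p1) in zip(rows, rows[1:]):
        deltas.append((
            name,
            pct_change(t1, t0),            # throughput change (%)
            pct_change(p1, p0),            # power change (%)
            pct_change(t1 / p1, t0 / p0),  # efficiency change (%)
        ))
    return deltas


if __name__ == "__main__":
    for name, dt, dp, de in step_deltas(variants):
        print(f"{name}: throughput {dt:+.1f}%, power {dp:+.1f}%, efficiency {de:+.1f}%")
```

On these figures, deep pipelining raises throughput by about 73% for about 17.5% more power, while clock gating leaves throughput unchanged and lowers power by about 5%.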
5.2. Impact of Clock Gating on Power Efficiency
5.3. Resource and Performance Trade-Offs
5.4. Comparison with Contemporary Architectures
5.5. Practical Implications and Application Context
5.6. Side-Channel Resilience Evaluation
6. Conclusions
7. Future Work
- Scalable Parallel Architectures: Future designs could exploit multiple parallel SHA-3 cores operating under a single I/O management system. This approach would significantly boost throughput, particularly for applications that require high-speed data processing, such as real-time network traffic analysis or large-scale data integrity checks.
- Processor Integration: If the accelerator is integrated on-chip with a soft-core processor, it can support high-level cryptographic protocols as well as adaptive security mechanisms. This hybrid solution balances the programmability of software with the speed of hardware acceleration, delivering both flexibility and performance in a single platform.
- Dynamic Power Management: Techniques such as voltage and frequency scaling or context-aware clock gating can be explored to optimise energy efficiency. These strategies help reduce power consumption during low-activity periods, making them well suited to battery-powered or portable devices where energy conservation is essential.
- Side-Channel Attack Resistance: Future improvements should focus on enhancing side-channel resistance. Hardware-level protections such as logic masking, randomised data paths, and secure clock distribution can mitigate risks from electromagnetic leakage and power analysis, which are critical concerns in modern cryptographic systems.
- Support for XOF Variants: Adding support for eXtendable Output Functions (XOFs) like SHAKE128 and SHAKE256 would increase design flexibility. These variants are essential for evolving cryptographic protocols, including digital signatures, hashing, and post-quantum cryptography, and would expand the range of secure applications the accelerator could support.
- Integrated Security Subsystems: The architecture could be extended by integrating crypto engines, verification modules, and anomaly detection components. This would create a self-contained, robust security platform, particularly useful for edge devices and embedded systems that require comprehensive hardware-layer protection.
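The dynamic power management item can be related to the standard first-order CMOS switching-power estimate, P = α·C·V²·f, where α is the switching activity, C the effective switched capacitance, V the supply voltage, and f the clock frequency. The sketch below is purely illustrative: all parameter values are hypothetical and are not measurements from this work. It shows why lowering voltage and frequency together, or lowering activity through clock gating, reduces dynamic power.

```python
def dynamic_power_w(alpha, c_eff_farads, vdd_volts, freq_hz):
    """First-order CMOS switching power: P = alpha * C * V^2 * f."""
    return alpha * c_eff_farads * vdd_volts ** 2 * freq_hz


if __name__ == "__main__":
    # Hypothetical operating points, chosen only to illustrate the scaling
    nominal = dynamic_power_w(alpha=0.2, c_eff_farads=1e-9, vdd_volts=1.0, freq_hz=210e6)
    scaled = dynamic_power_w(alpha=0.2, c_eff_farads=1e-9, vdd_volts=0.9, freq_hz=105e6)
    gated = dynamic_power_w(alpha=0.1, c_eff_farads=1e-9, vdd_volts=1.0, freq_hz=210e6)
    print(f"voltage/frequency scaling keeps {scaled / nominal:.3f} of nominal dynamic power")
    print(f"halving switching activity keeps {gated / nominal:.3f} of nominal dynamic power")
```

The model ignores static power and any throughput loss at the lower clock, so it indicates a trend rather than an achievable saving on a specific device.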
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Sideris, T. Sanida, and M. Dasygenis, “High Throughput Implementation of the Keccak Hash Function Using the Nios-II Processor,” Technologies, vol. 8, no. 1, pp. 2020; 14.
- J. Kelsey, S. Chang, and R. Perlner, “NIST Special Publication 800-185 - SHA-3 derived functions: cSHAKE, KMAC, TupleHash and ParallelHash,” NIST Spec. Publ., 2016, [Online]. Available: https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-185.
- National Institute of Standards and Technology, “FIPS PUB 202: SHA-3 Standard: Permutation-Based Hash and Extendable-Output Functions,” NIST Fed. Inf. Process. Stand., Aug. 2015.
- S. El Moumni, M. Fettach, and A. Tragha, “High throughput implementation of SHA3 hash algorithm on field programmable gate array (FPGA),” Microelectronics J., vol. 93, no. October 2018, 2019.
- R. Journal and O. F. Information, “On the Interpretation of Results from the NIST Statistical Test Suite,” vol. 18, no. 1, pp. 18–32, 2015.
- F. Kahri, H. Mestiri, B. Bouallegue, and M. Machhout, “High Speed FPGA Implementation of Cryptographic KECCAK Hash Function Crypto-Processor,” J. Circuits, Syst. Comput., vol. 25, no. 04, p. 1650026, Apr. 2016.
- K. Gaj, J. P. Kaps, V. Amirineni, M. Rogawski, E. Homsirikamol, and B. Y. Brewster, “ATHENa - Automated tool for hardware evaluation: Toward fair and comprehensive benchmarking of cryptographic hardware using FPGAs,” Proc. - 2010 Int. Conf. F. Program. Log. Appl. FPL 2010, no. 60, 2010.
- F. Assad, M. Fettach, F. El Otmani, and A. Tragha, “High-performance FPGA implementation of the secure hash algorithm 3 for single and multi-message processing,” Int. J. Electr. Comput. Eng., vol. 12, no. 2, pp. 1324–1333, 2022.
- P. Torino, R. Prof, G. Masera, P. M. Martina, and I. A. Dolmeta, “Integration and optimisation of a RISC-V based Keccak accelerator,” 2023.
- L. Ioannou, H. E. Michail, and A. G. Voyiatzis, “High-performance pipelined FPGA implementation of the SHA-3 hash algorithm,” Proc. - 2015 4th Mediterr. Conf. Embed. Comput. MECO 2015 - Incl. ECyPS 2015, BioEMIS 2015, BioICT 2015, MECO-Student Chall. 2015, pp. 2015; 71.
- L. Leonardi, G. Lettieri, and P. Perazzo, “On the Hardware–Software Integration in Cryptographic Accelerators for Industrial IoT,” Applied Sciences, 2022.
- Baird, I. Wadhaj, B. Ghaleb, C. Thomson, and G. Russell, “Evaluating the Energy Costs of SHA-256 and SHA-3 (KangarooTwelve) in Resource-Constrained IoT Devices,” vol. 3, 2025.
- Sideris, T. Sanida, and M. Dasygenis, “A Novel Hardware Architecture for Enhancing the Keccak Hash Function in FPGA Devices,” Inf., vol. 14, no. 9, 2023.
- H. E. Michail, L. Ioannou, and A. G. Voyiatzis, “Pipelined SHA-3 Implementations on FPGA,” pp. 2015; 18.
- Dolmeta, M. Martina, and G. Masera, “Comparative Study of Keccak SHA-3 Implementations,” Cryptography, vol. 7, no. 4, 2023.
- S. Chauhan and R. Shrestha, “Reconfigurable and Hardware-Efficient KECCAK Architecture with SHAKE Integration and Dynamic Input Processing for Post Quantum Cryptography,” 2025 Int. VLSI Symp. Technol. Syst. Appl. VLSI TSA 2025 - Proc. Tech. Pap., pp. 2025; 4.
- Sideris, T. Sanida, and M. Dasygenis, “Hardware acceleration design of the SHA-3 for high throughput and low area on FPGA,” J. Cryptogr. Eng., vol. 14, no. 2, 2024.
- F. E. Potestad-Ordóñez and A. Casado-Galán, “Protecting FPGA-Based Cryptohardware Implementations from Fault Attacks Using ADCs,” pp. 1–15, 2024.
- X. Zhang et al., “Design and Analysis of Area and Energy Efficient Reconfigurable Cryptographic Accelerator for Securing IoT Devices,” Sensors, vol. 22, no. 23, 2022.
- Jungk and M. Stöttinger, “Serialised lightweight SHA-3 FPGA implementations,” Microprocess. Microsyst., vol. 71, 2019.
- S. Xiong et al., “A Lightweight Folded Keccak-Based SHA-3 for Resource-Constrained Embedded Security,” Proc. - 2024 IEEE 17th Int. Symp. Embed. Multicore/Many-core Syst. MCSoC 2024, 2024.
- Magyari and Y. Chen, “Review of state-of-the-art FPGA applications in IoT Networks,” Sensors, vol. 22, no. 19, p. 7496, 2022.
- S. Stoyanov and N. Kakanakov, “FPGA Prototyping of Heterogeneous Security Architecture for Educational Purposes,” pp. 1–7, 2025.
- M. Korona and M. Rawski, “Comparison of Hash Functions for Network Traffic Acquisition Using a Hardware-Accelerated Probe,” 2022.
- Torres-Alvarado, L. A. Morales-Rosales, I. Algredo-Badillo, F. López-Huerta, M. Lobato-Báez, and J. C. López-Pimentel, “An SHA-3 Hardware Architecture against Failures Based on Hamming Codes and Triple Modular Redundancy,” Sensors, vol. 22, no. 8, pp. 2022; 29.
- Sideris and M. Dasygenis, “Enhancing the Hardware Pipelining Optimisation Technique of the SHA-3 via FPGA,” Computation, vol. 11, no. 8, 2023.
- Y. Akiya et al., “SHA-3-LPHP: Hardware Acceleration of SHA-3 for Low-Power High-Performance Systems,” Proc. - 2021 IEEE Int. Symp. Softw. Reliab. Eng. Work. ISSREW 2021, vol. 3, 2021.
- E. K. O. Putra, O. Natan, and J. E. Istiyanto, “Optimizing FPGA Resource Allocation for SHA-3 Using DSP48 and Pipelining Techniques,” IIUM Eng. J., vol. 26, no. 1, 2025.





| Parameter | Symbol / Unit | Measured Value | Description |
| Maximum Operating Frequency | f<sub>max</sub> (MHz) | 210 | Achieved clock frequency after place-and-route |
| Throughput | T<sub>p</sub> (Gbps) | 1.35 | Effective data rate for SHA3-256 mode |
| Total Power Consumption | P<sub>total</sub> (W) | 0.94 | Measured dynamic and static power under load |
| Performance per Watt | η (Gbps/W) | 1.44 | Ratio of throughput to total power consumption |
| LUT Utilisation | (%) | 24 | Percentage of available look-up tables used |
| Flip-Flop Utilisation | (%) | 18 | Percentage of total registers occupied |
| Block RAM (BRAM) Utilisation | (%) | 12 | Memory blocks used for intermediate storage |
| DSP Slice Utilisation | (%) | 5 | DSP resources used for arithmetic operations |
| Reference | FPGA Platform | Throughput (Gbps) | Power (W) | Performance/Watt (Gbps/W) | Notes |
| Leonardi et al. [11] | Xilinx Zynq-7000 | 1.10 | 1.20 | 0.92 | Hardware–software integration; general-purpose accelerator |
| Baird et al. [12] | Xilinx Artix-7 | 1.20 | 1.10 | 1.09 | Focused on KangarooTwelve variant; moderate power consumption |
| Magyari & Chen [22] | Xilinx Spartan-6 | 0.95 | 1.00 | 0.95 | General FPGA usage in IoT; not optimised for hashing |
| Potestad-Ordóñez et al. [18] | Xilinx Kintex-7 | 1.50 | 1.70 | 0.88 | Fault detection supported via ADCs; power overhead |
| Stoyanov et al. [23] | Xilinx Artix-7 | 1.30 | 1.20 | 1.08 | Modular security framework for educational prototyping |
| This Work | Xilinx Artix-7 | 1.35 | 0.94 | 1.44 | Energy-efficient, compact SHA-3 cores with gated pipelining |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).