Submitted: 20 May 2026
Posted: 21 May 2026
Abstract
Keywords:
1. Introduction
- An ML-KEM polynomial ring accelerator based on Open IP and open hardware;
- Integration into a Chipyard-based RISC-V SoC via an MMIO interface;
- System-level implementation and evaluation on an FPGA-based platform;
- Validation of open hardware platforms for reproducible PQC system research.
2. Background and Related Works
2.1. ML-KEM and NTT-based Polynomial Arithmetic
2.2. ML-KEM Hardware Accelerators Survey
2.3. Open Hardware and RISC-V SoC Platforms
3. Polynomial Ring Multiplication Accelerator for ML-KEM in RISC-V SoC
3.1. System Architecture of the RISC-V SoC
3.2. Design of the ML-KEM Polynomial Ring Accelerator
1. Local Buffer
2. Butterfly Array
3. Multiplier Array
4. Adder Array
5. Scratchpad Memory
6. Controller
3.3. Matrix–Vector Multiplication over Polynomial Rings for ML-KEM
4. Implementation Results and Discussion
4.1. Hardware Resource Utilization
4.2. Performance Analysis of Polynomial Ring Computation
- Data load requires 2967 cycles;
- NTT computation requires 280 cycles;
- Pointwise multiplication requires 122 cycles;
- INTT computation requires 150 cycles;
- Data store requires 1964 cycles.
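The per-phase cycle counts above can be tallied to see where the time goes. The following Python sketch sums them and converts the total to latency. It assumes the 100 MHz clock listed for the proposed SoC in the comparison table; the variable names are illustrative only.

```python
# Per-phase cycle counts for one polynomial ring computation (from the list above).
CYCLES = {
    "data_load": 2967,
    "ntt": 280,
    "pointwise_mul": 122,
    "intt": 150,
    "data_store": 1964,
}
CLOCK_HZ = 100_000_000  # assumption: 100 MHz SoC clock

total_cycles = sum(CYCLES.values())
latency_us = total_cycles * 1e6 / CLOCK_HZ

# Split the total into data movement and arithmetic.
transfer_cycles = CYCLES["data_load"] + CYCLES["data_store"]
compute_cycles = total_cycles - transfer_cycles
transfer_share = transfer_cycles / total_cycles

print(f"total: {total_cycles} cycles, {latency_us:.2f} us")
print(f"data transfer: {transfer_cycles} cycles ({transfer_share:.1%})")
print(f"compute: {compute_cycles} cycles")
```

With these figures, loading and storing operands account for roughly 90% of the 5483 cycles, while the NTT, pointwise multiplication, and INTT stages together take 552 cycles.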
4.3. SoC-Level Performance Evaluation of ML-KEM
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| BRAM | Block RAM |
| CPU | Central Processing Unit |
| DDR | Double Data Rate |
| DSP | Digital Signal Processing |
| FF | Flip-Flop |
| FIPS | Federal Information Processing Standards |
| FPGA | Field-Programmable Gate Array |
| INTT | Inverse Number Theoretic Transform |
| IP | Intellectual Property Core |
| ISA | Instruction Set Architecture |
| LUT | Look-Up Table |
| ML-DSA | Module-Lattice-Based Digital Signature Algorithm |
| ML-KEM | Module-Lattice-Based Key Encapsulation Mechanism |
| MMIO | Memory-Mapped I/O |
| MPRA | ML-KEM Polynomial Ring Accelerator |
| NTT | Number Theoretic Transform |
| OS | Operating System |
| OTBN | OpenTitan Big Number |
| PQC | Post-Quantum Cryptography |
| RISC-V | Open-standard RISC instruction set architecture |
| RSA | Rivest–Shamir–Adleman |
| RTL | Register-Transfer Level |
| RV64GC | RISC-V 64-bit General-purpose ISA with Compressed extension |
| SHAKE | Secure Hash Algorithm Keccak |
| SoC | System-on-Chip |
| XOF | Extendable Output Function |
Appendix A
| Security-level-dependent dimension parameter in ML-KEM. | |
| Polynomial vector or matrix operand in matrix–vector multiplication. | |
| Polynomial vector operand in matrix–vector multiplication. | |
| Output polynomial or matrix–vector multiplication result. | |
| in the ML-KEM-512 case. | |
| in the ML-KEM-512 case. | |
| Intermediate result of the first pointwise multiplication in the NTT domain. | |
| Intermediate result of the second pointwise multiplication in the NTT domain. | |
| Accumulated intermediate result in the NTT domain before INTT. | |
| Number Theoretic Transform. | |
| Inverse Number Theoretic Transform. | |
| Pointwise multiplication in the NTT domain. | |
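The dataflow summarized by these entries (forward NTT, pointwise multiplications, accumulation in the NTT domain, then a single INTT) can be modelled in software. The Python sketch below is a reference model only, not the accelerator datapath. It assumes the NTT, inverse NTT, and NTT-domain multiplication as specified in FIPS 203 (q = 3329, ζ = 17), and it transforms both operands for simplicity, whereas ML-KEM normally samples the matrix directly in the NTT domain. All function names are hypothetical.

```python
import random

Q, N, ZETA = 3329, 256, 17  # ML-KEM modulus, degree, 256th root of unity


def bitrev7(i):
    """Reverse the 7-bit binary representation of i."""
    return int(format(i, "07b")[::-1], 2)


ZETAS = [pow(ZETA, bitrev7(i), Q) for i in range(128)]
GAMMAS = [pow(ZETA, 2 * bitrev7(i) + 1, Q) for i in range(128)]


def ntt(f):
    """Forward NTT: seven layers of Cooley-Tukey butterflies."""
    f, i, length = list(f), 1, 128
    while length >= 2:
        for start in range(0, N, 2 * length):
            z = ZETAS[i]
            i += 1
            for j in range(start, start + length):
                t = z * f[j + length] % Q
                f[j + length] = (f[j] - t) % Q
                f[j] = (f[j] + t) % Q
        length //= 2
    return f


def intt(f):
    """Inverse NTT: Gentleman-Sande butterflies, then scaling by 128^-1."""
    f, i, length = list(f), 127, 2
    while length <= 128:
        for start in range(0, N, 2 * length):
            z = ZETAS[i]
            i -= 1
            for j in range(start, start + length):
                t = f[j]
                f[j] = (t + f[j + length]) % Q
                f[j + length] = z * (f[j + length] - t) % Q
        length *= 2
    return [x * 3303 % Q for x in f]  # 3303 = 128^-1 mod 3329


def pointwise_mul(a, b):
    """Multiply two NTT-domain polynomials as 128 degree-1 products."""
    h = [0] * N
    for i in range(128):
        a0, a1, b0, b1 = a[2 * i], a[2 * i + 1], b[2 * i], b[2 * i + 1]
        h[2 * i] = (a0 * b0 + a1 * b1 * GAMMAS[i]) % Q
        h[2 * i + 1] = (a0 * b1 + a1 * b0) % Q
    return h


def matvec_row(a_row, s_vec):
    """One output polynomial: INTT(sum_j NTT(a_j) o NTT(s_j))."""
    acc = [0] * N
    for a, s in zip(a_row, s_vec):
        prod = pointwise_mul(ntt(a), ntt(s))
        acc = [(x + y) % Q for x, y in zip(acc, prod)]
    return intt(acc)


def schoolbook(a, b):
    """Reference product in Z_q[X]/(X^256 + 1) for checking."""
    c = [0] * N
    for i in range(N):
        for j in range(N):
            k, p = i + j, a[i] * b[j]
            if k < N:
                c[k] = (c[k] + p) % Q
            else:
                c[k - N] = (c[k - N] - p) % Q
    return c
```

For the ML-KEM-512 case, `matvec_row` is called with two-element lists, which yields two pointwise products that are accumulated before a single INTT. Comparing its result against the sum of two `schoolbook` products is a simple way to check the model.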
References
1. Shor, P.W. Algorithms for quantum computation: Discrete logarithms and factoring. In Proceedings of the 35th Annual Symposium on Foundations of Computer Science (FOCS), Santa Fe, NM, USA, 20–22 November 1994; pp. 124–134.
2. National Institute of Standards and Technology (NIST). Migration to Post-Quantum Cryptography; NIST Interagency Report (IR) 8547 (Initial Public Draft); NIST: Gaithersburg, MD, USA, 2024.
3. National Institute of Standards and Technology (NIST). Module-Lattice-Based Key-Encapsulation Mechanism (ML-KEM); FIPS 203; NIST: Gaithersburg, MD, USA, 2024.
4. Tan, W.; Lao, Y.; Parhi, K.K. KyberMat: Efficient accelerator for matrix–vector polynomial multiplication in CRYSTALS-Kyber. In Proceedings of the IEEE/ACM International Conference on Computer-Aided Design (ICCAD), San Francisco, CA, USA, 29 October–2 November 2023.
5. Bos, J.W.; Ducas, L.; Kiltz, E.; Lepoint, T.; Lyubashevsky, V.; Schanck, J.M.; Schwabe, P.; Seiler, G.; Stehlé, D. CRYSTALS-Kyber: A CCA-secure module-lattice-based KEM. In Proceedings of the IEEE European Symposium on Security and Privacy (EuroS&P), London, UK, 24–26 April 2018; pp. 353–367.
6. Waris, A.; Aziz, A.; Khan, B.M. Area-time efficient pipelined number theoretic transform architecture for CRYSTALS-Kyber. IEEE Access 2021, 9, 109424–109438.
7. Yaman, F.; Mert, A.C.; Öztürk, E.; Savaş, E. A hardware accelerator for polynomial multiplication operation of CRYSTALS-Kyber PQC scheme. In Proceedings of the Design, Automation and Test in Europe Conference (DATE), Grenoble, France, 1–5 February 2021.
8. Wang, T.; Zhang, C.; Zhang, X.; Gu, D.; Cao, P. Hardware–software co-design for Kyber and Dilithium on RISC-V SoC FPGA. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2024, 3, 99–135.
9. Amid, A.; Biancolin, D.; Lee, A.; et al. Chipyard: An integrated design framework for custom SoCs. IEEE Micro 2020, 40, 10–21.
10. Pramstaller, N.; Zaruba, F.; Benini, L.; et al. Ibex: A small, efficient RISC-V processor core. OpenTitan Project Documentation. 2020. Available online: https://opentitan.org (accessed on 2 April 2026).
11. OpenTitan Project. OpenTitan: Open source silicon root of trust. Available online: https://opentitan.org (accessed on 2 April 2026).
12. Liu, S.-H.; Kuo, C.-Y.; Mo, Y.-N.; Su, T. An area-efficient, conflict-free, and configurable architecture for accelerating NTT/INTT. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 2024, 32, 519–529.
13. Kim, H.; Jung, H.; Satriawan, A.; Lee, H. A configurable ML-KEM/Kyber hardware accelerator. IEEE Trans. Circuits Syst. II 2024, 71, 4678–4682.
14. Ni, Z.; Khalid, A.; Liu, W.; O’Neill, M. A highly hardware-efficient ML-KEM accelerator. ACM Trans. Embed. Comput. Syst. 2025, 24, 1–24.
15. Cui, Y.; Chen, J.; Ni, Z.; Zhang, Z.; Wang, C.; Liu, W. Instruction-based hardware controller of CRYSTALS-Kyber. IEEE Trans. Circuits Syst. I 2025, 72, 2394–2407.
16. Dolmeta, A.; Valpreda, E.; Martina, M.; Masera, G. Integration of NTT/INTT accelerator on RISC-V. In Proceedings of the ACM Computing Frontiers Conference (CF), Ischia, Italy, 7–9 May 2024; pp. 59–62.
17. Abdulrahman, A.; Oberhansl, F.; Pham, H.N.; Philipoom, J.; Schwabe, P.; Stelzer, T.; Zankl, A. Towards ML-KEM and ML-DSA on OpenTitan. In Proceedings of the IEEE Symposium on Security and Privacy, San Francisco, CA, USA, 2025.
18. OpenHW Group. CORE-V CV32E40P RISC-V processor core user manual. Available online: https://docs.openhwgroup.org/projects/cv32e40p-user-manual (accessed on 2 April 2026).
19. Dam, D.-T.; Nguyen, T.-H.; et al. RISC-V SoC with NTT-Blackbox. In Proceedings of the ICDV, 2024; pp. 49–54.
20. Dam, D.-T.; Nguyen, K.-D.; Le, D.-H.; Pham, C.-K. High-efficiency NTT for ML-KEM on RISC-V. Electronics 2026, 15, 100.
21. Rumelili Köksal, C.I.; Örs Yalçın, S.B. Efficient modeling and usage of scratchpad memory. Electronics 2025, 14, 1032.
22. Huang, Y.; Zhao, Y.; Chen, Z.; Li, X. High-Speed NTT-Based Polynomial Multiplication Accelerator for CRYSTALS-Kyber Post-Quantum Cryptography. IEEE Access 2020, 8, 203000–203012.
23. Chen, Z.; Ma, Y.; Chen, T.; Lin, J.; Jing, J. Towards efficient Kyber on FPGAs: A processor for vector of polynomials. In Proceedings of the Asia and South Pacific Design Automation Conference (ASP-DAC), Beijing, China, 13–16 January 2020; pp. 247–252.
24. Bisheh-Niasar, M.; Azarderakhsh, R.; Mozaffari Kermani, M. High-Speed NTT-Based Polynomial Multiplication Accelerator for Post-Quantum Cryptography. In Proceedings of the 28th IEEE Symposium on Computer Arithmetic (ARITH), Virtual Conference, 14–16 June 2021; pp. 94–101.
25. Zhang, X.; Liu, D.; Chen, Z.; Jing, J. Towards Efficient Hardware Implementation of NTT for Kyber on FPGAs. In Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS), Daegu, Republic of Korea, 22–28 May 2021; pp. 1–5.
26. Karabulut, A.; et al. RANTT: A RISC-V architecture extension for the number theoretic transform. In Proceedings of the International Conference on Field-Programmable Logic and Applications (FPL), Gothenburg, Sweden, 31 August–4 September 2020; pp. 26–32.
27. Celik, A.; Yilmaz, F.; Korkmaz, M.A.; Ors, B. Implementation of CRYSTALS-Kyber post-quantum algorithm using RISC-V processor. In Proceedings of the IEEE International Conference on Electronics, Circuits and Systems (ICECS), Istanbul, Turkey, 4–7 December 2023.
28. Avanzi, R.; Bos, J.; Ducas, L.; Kiltz, E.; Lepoint, T.; Lyubashevsky, V.; Schanck, J.M.; Schwabe, P.; Seiler, G.; Stehlé, D. CRYSTALS-Kyber algorithm specifications and supporting documentation; Third-round submission to the NIST post-quantum cryptography standardization process, 2020. Available online: https://csrc.nist.gov/projects/post-quantum-cryptography (accessed on 2 April 2026).








| Work | Year | Algorithm | Architecture | Integration | Impl. | Open IP |
| Chen et al. [23] | 2020 | Kyber | Polynomial vector processor | Standalone accelerator | FPGA | No |
| Huang et al. [22] | 2020 | Kyber | NTT-based polynomial multiplication | Standalone accelerator | FPGA | No |
| Karabulut et al. [26] | 2020 | NTT | RISC-V ISA extension for NTT | CPU-integrated (ISA extension) | ASIC | No |
| Waris et al. [6] | 2021 | Kyber | NTT/INTT-based polynomial multiplier | Standalone accelerator | FPGA | No |
| Yaman et al. [7] | 2021 | Kyber | Polynomial multiplication accelerator | Standalone accelerator | FPGA | Yes |
| Bisheh-Niasar et al. [24] | 2021 | Kyber | NTT-based polynomial multiplier | Standalone accelerator | FPGA | No |
| Zhang et al. [25] | 2021 | Kyber | Efficient NTT architecture | Standalone accelerator | FPGA | No |
| Celik et al. [27] | 2023 | Kyber | Keccak hardware acceleration | RISC-V CPU (Ibex, MMIO/interrupt-based) | FPGA | No |
| Liu et al. [12] | 2024 | NTT/INTT | Configurable NTT/INTT accelerator | Standalone accelerator | ASIC | No |
| Kim et al. [13] | 2024 | ML-KEM | Configurable full KEM accelerator | Standalone full accelerator | ASIC | No |
| Wang et al. [8] | 2024 | Kyber / Dilithium | HW/SW co-design with polynomial accelerators | RISC-V SoC (HW/SW co-design) | FPGA | No |
| Dolmeta et al. [16] | 2024 | Kyber | Memory-mapped NTT/INTT accelerator | RISC-V SoC (MMIO-based) | FPGA | No |
| Dam et al. (ICDV) [19] | 2024 | Kyber | NTT black-box accelerator | Chipyard RISC-V SoC (MMIO / peripheral) | FPGA/ASIC | Partial |
| Ni et al. [14] | 2025 | ML-KEM | Full ML-KEM accelerator | Standalone full accelerator | ASIC | No |
| Cui et al. [15] | 2025 | Kyber | Instruction-based hardware controller | CPU–accelerator (instruction-based) | ASIC | No |
| Abdulrahman et al. [17] | 2025 | ML-KEM / ML-DSA | OpenTitan OTBN extension | OpenTitan SoC (OTBN-based) | ASIC | Partial |
| Dam et al. (Electronics) [20] | 2026 | ML-KEM | Tightly integrated NTT accelerator with custom instructions | Chipyard RISC-V SoC (RoCC tightly-coupled) | ASIC | Partial |
| Proposed Work | 2026 | ML-KEM | ML-KEM Polynomial ring accelerator | Chipyard RISC-V SoC (MMIO / peripheral) | FPGA | Yes |
| Work | Algorithm | NTT Accelerator | Full ML-KEM | RISC-V SoC | Open Hardware |
| Wang et al. (TCHES) [8] | Kyber | ✓ | Partial | ✓ | ✗ |
| Dolmeta et al. [16] | Kyber | ✓ | ✗ | ✓ | ✗ |
| Abdulrahman et al. [17] | ML-KEM | ✓ | Partial | ✓ | Partial |
| Dam et al. (ICDV) [19] | Kyber | ✓ | ✗ | ✓ | Partial |
| Dam et al. (Electronics) [20] | ML-KEM | ✓ | Partial | ✓ | Partial |
| Proposed Work | ML-KEM | ✓ (Integrated NTT datapath) | ✓ (Full matrix–vector) | ✓ (Chipyard SoC + OS support) | ✓ (Open IP + reproducible) |
| Implementation | LUT | FF | BRAM | DSP |
| Proposed SoC without MPRA | 67356/203800 (33.05%) | 42918/407600 (10.53%) | 143/445 (32.13%) | 15/840 (1.79%) |
| Proposed SoC with MPRA | 78514/203800 (38.52%) | 52613/407600 (12.91%) | 177/445 (39.77%) | 31/840 (3.69%) |
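The incremental cost of adding the MPRA follows from the difference between the two rows above. The Python sketch below computes that difference and its share of the device. It assumes that the whole difference is attributable to the MPRA and its interface logic; the dictionary names are illustrative.

```python
# Resource counts reported for the two SoC builds and the device totals.
AVAILABLE = {"LUT": 203800, "FF": 407600, "BRAM": 445, "DSP": 840}
WITHOUT_MPRA = {"LUT": 67356, "FF": 42918, "BRAM": 143, "DSP": 15}
WITH_MPRA = {"LUT": 78514, "FF": 52613, "BRAM": 177, "DSP": 31}

# Additional resources used once the MPRA is included.
overhead = {r: WITH_MPRA[r] - WITHOUT_MPRA[r] for r in AVAILABLE}
# The same overhead expressed as a percentage of the device capacity.
overhead_pct = {r: 100 * overhead[r] / AVAILABLE[r] for r in AVAILABLE}

for r in AVAILABLE:
    print(f"{r}: +{overhead[r]} ({overhead_pct[r]:.2f}% of device)")
```

Under this assumption, the MPRA adds 11158 LUTs, 9695 FFs, 34 BRAMs, and 16 DSPs to the baseline SoC.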
| Work | Target Operation | NTT Accelerator | Polynomial Multiplication | Modular Addition | Matrix–Vector Multiplication | Hash Accelerator | SoC-Level Evaluation |
| Yaman et al. [7] | Polynomial Multiplication | ✓ | ✓ | ✗ | ✗ | ✗ | Partial |
| Dam et al. [19] | Polynomial Multiplication | ✓ | ✓ | ✗ | ✗ | ✗ | Partial |
| Karabulut et al. [26] | Polynomial Multiplication | ✓ | ✓ | ✗ | ✗ | ✗ | Partial |
| Celik et al. [27] | Keccak Acceleration | ✗ | ✗ | ✗ | ✗ | ✓ | Full |
| Proposed Work | Matrix–Vector Multiplication over Polynomial Rings | ✓ | ✓ | ✓ | ✓ | ✗ | Full |
| Implementations | Steps | Sub-steps | Clocks | Total Clocks | Latency (μs) |
| Proposed work | NTT (2 NTTs) | Data Load Time | 2967 | 5483 | 54.83 |
| | | NTT Core for NTT | 280 | | |
| | Pointwise Multiplication | | 122 | | |
| | INTT (1 INTT) | NTT Core for INTT | 150 | | |
| | | Data Store Time | 1964 | | |
| Dam et al. [19] | NTT (1 NTT) | Data Load Time | 2084 | 9842 | ✗ |
| | | NTT Core for NTT | 5682 | | |
| | | Data Store Time | 2076 | | |
| Karabulut et al. [26] | NTT (1 NTT) | Data Load Time | 43756 | 43756 | ✗ |
| | | NTT Core for NTT | | | |
| | | Data Store Time | | | |
| Kyber C code [30] | NTT | | 66394 | 143196 | 1431.96 |
| | Pointwise Multiplication | | 18686 | | |
| | INTT | | 56098 | | |
| Implementations | Clocks | Latency (μs) | Speed up |
| Kyber C code [30] | 296485 | 2964.85 | 1 |
| Proposed HW Accelerator with Pointwise Multiplication Only | 12037 | 120.37 | 24.6 |
| Proposed HW Accelerator with Pointwise Multiplication and Modular Addition Support | 7372 | 73.72 | 40.2 |
| Algorithm | Operation | Kyber C code [30] | Proposed work | Speed-up |
| ML-KEM 512 | Encaps | 929391 | 554479 | 1.67 |
| | Decaps | 1037658 | 492533 | 2.10 |
| ML-KEM 768 | Encaps | 1447649 | 877912 | 1.64 |
| | Decaps | 1604701 | 788022 | 2.03 |
| ML-KEM 1024 | Encaps | 2120848 | 1295500 | 1.63 |
| | Decaps | 2317043 | 1193890 | 1.94 |
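The speed-up column is the ratio of software cycles to cycles with the proposed accelerator. The Python sketch below recomputes it from the cycle counts in the table. The listed values appear consistent with truncation to two decimals, which is an observation from the numbers and not a stated convention; the helper name is hypothetical.

```python
# (software cycles, proposed-work cycles) per parameter set and operation.
CYCLES = {
    ("ML-KEM 512", "Encaps"): (929391, 554479),
    ("ML-KEM 512", "Decaps"): (1037658, 492533),
    ("ML-KEM 768", "Encaps"): (1447649, 877912),
    ("ML-KEM 768", "Decaps"): (1604701, 788022),
    ("ML-KEM 1024", "Encaps"): (2120848, 1295500),
    ("ML-KEM 1024", "Decaps"): (2317043, 1193890),
}


def speedup_truncated(sw, hw):
    """Speed-up sw/hw truncated (not rounded) to two decimal places."""
    return (sw * 100 // hw) / 100


speedups = {key: speedup_truncated(sw, hw) for key, (sw, hw) in CYCLES.items()}
for (param_set, op), s in speedups.items():
    print(f"{param_set} {op}: {s:.2f}x")
```

In every parameter set, decapsulation gains more than encapsulation (about 1.9x to 2.1x versus about 1.6x to 1.7x).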
| Work | CPU | ISA | Clock Rate | Execution Time | Encaps (SW) | Encaps (SW/HW) | Decaps (SW) | Decaps (SW/HW) |
| Celik et al. [27] | Ibex Core | RV32IMC | 50 MHz | Cycles | 5886277 | 3115537 | 6222787 | 3957289 |
| | | | | Latency (μs) | 117725.54 | 62310.74 | 124455.74 | 79145.78 |
| Proposed Work | Rocket Core | RV64GC | 100 MHz | Cycles | 1447649 | 877912 | 1604701 | 788022 |
| | | | | Latency (μs) | 14476.49 | 8779.12 | 16047.01 | 7880.22 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).