Submitted: 08 May 2024
Posted: 22 May 2024
Abstract
Keywords:
1. Introduction
2. Related Work
1. We propose a new data access pattern for the NTT algorithm, together with an appropriate reordering, that replaces the BRAM with just the LUTRAM of the FPGA architecture, which can support the shallow-depth, long-width memory that unrolling requires.
2. We also propose two novel butterfly units that are DSP-free and have low resource utilization. The most expensive operation of the NTT butterfly unit, in terms of both resources and time, is the modular multiplication of a coefficient by a root of unity. The first approach uses the fact that, for the chosen parameter, the coefficient has a bounded length, so the product can be split into a sum of precomputed products with the roots of unity. This eliminates the full multiplication as well as the need for dedicated storage for the roots of unity.
3. The second butterfly unit uses quarter-square multiplication to perform the modular multiplication. It relies on the fact that a look-up table for a multiplication is typically far deeper than one for a squaring. By replacing the multiplication with squarings and some additional processing, the quarter squares fit in a single dual-port ROM that maps neatly onto one BRAM. This approach also avoids computing and reducing the full product.
4. We propose a detailed pipelined butterfly unit based on the proposed approaches; its short critical path allows it to operate at a higher frequency.
5. Finally, we synthesize, place and route (PnR), and verify the functionality of the design on a Xilinx Artix-7 FPGA, where it runs at up to 446 MHz.
3. Background Knowledge
Algorithm 1: Cooley-Tukey forward NTT.

Algorithm 2: Gentleman-Sande inverse NTT.
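As a software reference for the two algorithms named above, the following is a minimal sketch of an in-place negacyclic NTT with Cooley-Tukey butterflies and its inverse with Gentleman-Sande butterflies, in the formulation commonly used for lattice-based cryptography (twiddle factors stored in bit-reversed order). The function names `bit_reverse`, `psi_table`, `ntt_ct` and `intt_gs`, as well as the toy parameters q = 17, n = 8, psi = 3, are assumptions made for illustration and are not taken from the paper; the hardware architecture may order its accesses differently.

```python
def bit_reverse(x, bits):
    """Reverse the lowest `bits` bits of x."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (x & 1)
        x >>= 1
    return r


def psi_table(n, q, psi):
    """Powers of psi modulo q, stored in bit-reversed index order."""
    bits = n.bit_length() - 1
    return [pow(psi, bit_reverse(i, bits), q) for i in range(n)]


def ntt_ct(a, q, psi):
    """Forward negacyclic NTT (Cooley-Tukey butterflies).

    Input in natural order, output in bit-reversed order.
    psi must be a primitive 2n-th root of unity modulo q.
    """
    a = list(a)
    n = len(a)
    tw = psi_table(n, q, psi)
    t, m = n, 1
    while m < n:
        t //= 2
        for i in range(m):
            w = tw[m + i]
            start = 2 * i * t
            for j in range(start, start + t):
                u = a[j]
                v = a[j + t] * w % q       # modular multiply by the twiddle
                a[j] = (u + v) % q
                a[j + t] = (u - v) % q
        m *= 2
    return a


def intt_gs(a, q, psi):
    """Inverse negacyclic NTT (Gentleman-Sande butterflies).

    Input in bit-reversed order, output in natural order.
    """
    a = list(a)
    n = len(a)
    tw = psi_table(n, q, pow(psi, -1, q))  # inverse twiddles (Python 3.8+)
    t, m = 1, n
    while m > 1:
        h = m // 2
        start = 0
        for i in range(h):
            w = tw[h + i]
            for j in range(start, start + t):
                u, v = a[j], a[j + t]
                a[j] = (u + v) % q
                a[j + t] = (u - v) * w % q
            start += 2 * t
        t *= 2
        m = h
    n_inv = pow(n, -1, q)                  # final scaling by 1/n
    return [x * n_inv % q for x in a]


# Toy parameters (assumed for illustration): q = 17, n = 8, psi = 3, psi^8 = -1 mod 17.
coeffs = [1, 2, 3, 4, 5, 6, 7, 8]
assert intt_gs(ntt_ct(coeffs, 17, 3), 17, 3) == coeffs
```

Because the forward transform leaves its output in bit-reversed order and the inverse consumes that order, no explicit bit-reversal pass is needed between them; multiplying two transformed polynomials pointwise and applying `intt_gs` yields their product modulo x^n + 1.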
4. Proposed NTT Architecture
4.1. Butterfly Unit
4.1.1. Multiplication-Less Method
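The contribution list describes this method as splitting the product of a coefficient and a root of unity into a sum of precomputed products. One way this idea can be realized is sketched below; it is not necessarily the decomposition used in the paper. The modulus `Q = 3329` and the 12-bit coefficient width are assumed from CRYSTALS-Kyber (the scheme targeted by the cited works), and the 4-bit chunk size and the names `build_tables` and `mul_by_twiddle` are hypothetical choices made for illustration.

```python
Q = 3329          # assumed modulus (CRYSTALS-Kyber), not stated in the text above
COEFF_BITS = 12   # assumed coefficient width for this modulus
CHUNK_BITS = 4    # hypothetical chunk size, chosen only for illustration


def build_tables(w):
    """Precompute T[i][d] = d * 2^(i*CHUNK_BITS) * w mod Q for one fixed twiddle w.

    This is done offline; in hardware the tables would be constants,
    so the twiddle itself never needs to be stored or fetched.
    """
    n_chunks = -(-COEFF_BITS // CHUNK_BITS)  # ceiling division
    return [
        [(d << (i * CHUNK_BITS)) * w % Q for d in range(1 << CHUNK_BITS)]
        for i in range(n_chunks)
    ]


def mul_by_twiddle(a, tables):
    """Return a * w mod Q using only look-ups, additions and conditional subtractions."""
    mask = (1 << CHUNK_BITS) - 1
    acc = 0
    for i, table in enumerate(tables):
        acc += table[(a >> (i * CHUNK_BITS)) & mask]
        if acc >= Q:          # acc < 2Q here, so one subtraction suffices
            acc -= Q
    return acc


tables_17 = build_tables(17)
assert mul_by_twiddle(1234, tables_17) == 1234 * 17 % Q
```

Each table entry is already reduced modulo `Q`, so the run-time path contains no multiplier and no wide reduction step, only small look-ups and a short adder chain.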
4.1.2. Quarter Square Multiplication Method
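Quarter-square multiplication rests on the identity $ab = \frac{(a+b)^2 - (a-b)^2}{4}$, which replaces one multiplication by two table look-ups of (scaled) squares. A minimal modular sketch follows. It assumes the modulus `Q = 3329` (CRYSTALS-Kyber, not stated in the text above) and stores $s^2 \cdot 4^{-1} \bmod Q$ directly, so that table entries are already reduced; the names `QS_ROM` and `qs_mulmod` are hypothetical, and the "additional processing" used in the paper to fit the table in one BRAM may differ from what is shown here.

```python
Q = 3329                      # assumed modulus, not stated in the text above
INV4 = pow(4, -1, Q)          # modular inverse of 4 (Python 3.8+)

# One entry per residue s: s^2 / 4 mod Q. A full product table for two
# 12-bit operands would need 2^24 entries; this table needs only Q.
QS_ROM = [s * s * INV4 % Q for s in range(Q)]


def qs_mulmod(a, b):
    """Return a * b mod Q for 0 <= a, b < Q using two look-ups in QS_ROM."""
    s = a + b
    if s >= Q:                # reduce the sum into [0, Q)
        s -= Q
    d = a - b
    if d < 0:                 # reduce the difference into [0, Q)
        d += Q
    r = QS_ROM[s] - QS_ROM[d] # both reads could be served by a dual-port ROM
    if r < 0:
        r += Q
    return r


assert qs_mulmod(1234, 17) == 1234 * 17 % Q
```

The two reads address the same table with different indices, which is why a dual-port memory suits this scheme, and the result is already a residue modulo `Q`, so no full-width product is formed or reduced.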
4.2. Realign Buffer
5. Experimental Results
5.1. Experimental Setup
5.2. Results and Comparison
6. Conclusions
Acknowledgments
References
1. Shor, P.W. Polynomial-Time Algorithms for Prime Factorization and Discrete Logarithms on a Quantum Computer. SIAM Journal on Computing 1997, 26, 1484–1509.
2. Grover, L.K. A Fast Quantum Mechanical Algorithm for Database Search. Annual ACM Symp. on Theory of Comp. 1996, 212–219.
3. Rivest, R.L.; Shamir, A.; Adleman, L. A Method for Obtaining Digital Signatures and Public-key Cryptosystems. Commun. ACM 1978, 21, 120–126.
4. Miller, V.S. Use of Elliptic Curves in Cryptography. Advances in Cryptology (CRYPTO) 1985, 417–426.
5. Bos, J.; Ducas, L.; Kiltz, E.; Lepoint, T.; Lyubashevsky, V.; Schanck, J.M.; Schwabe, P.; Seiler, G.; Stehlé, D. CRYSTALS - Kyber: A CCA-Secure Module-Lattice-Based KEM. European Symp. on Secu. and Privacy (EuroS&P) 2018, 353–367.
6. Lyubashevsky, V.; Peikert, C.; Regev, O. On Ideal Lattices and Learning with Errors Over Rings. Advances in Cryptology (EUROCRYPT) 2010, 1–23.
7. Lindner, R.; Peikert, C. Better Key Sizes (and Attacks) for LWE-Based Encryption. Topics in Cryptology (CT-RSA) 2011, 319–339.
8. Das, M.; Jajodia, B.B. Hardware Design of Optimized Large Integer Schoolbook Polynomial Multiplications on FPGA. Int. SoC Design Conf. (ISOCC) 2022, 65–66.
9. Zhang, Y.; Cui, Y.; Ni, Z.; Kundi, D.; Liu, D.; Liu, W. A Lightweight and Efficient Schoolbook Polynomial Multiplier for Saber. IEEE Int. Symp. on Circ. and Syst. (ISCAS) 2022, 2251–2255.
10. Birgani, Y.A.; Timarchi, S.; Khalid, A. Area-Time-Efficient Scalable Schoolbook Polynomial Multiplier for Lattice-Based Cryptography. IEEE Trans. on Circ. and Syst. II: Express Briefs (TCAS-II) 2022, 69, 5079–5083.
11. Yang, S.; Liu, D.; Hu, A.; Li, A.; Zhang, J.; Li, X.; Lu, J.; Mo, C. An Instruction-configurable Post-quantum Cryptographic Processor Towards NTRU. Asian Hardware Oriented Secu. and Trust Symp. (AsianHOST) 2022, 1–6.
12. Wong, Z.Y.; Wong, D.C.K.; Lee, W.K.; Mok, K.M.; Yap, W.S.; Khalid, A. KaratSaber: New Speed Records for Saber Polynomial Multiplication Using Efficient Karatsuba FPGA Architecture. IEEE Trans. on Computers 2023, 72, 1830–1842.
13. Ghosh, A.; Mera, J.M.B.; Karmakar, A.; Das, D.; Ghosh, S.; Verbauwhede, I.; Sen, S. A 334 μW 0.158 mm2 ASIC for Post-Quantum Key-Encapsulation Mechanism Saber With Low-Latency Striding Toom–Cook Multiplication. IEEE Journal of Solid-State Circ. 2023, 58, 2383–2398.
14. Wang, J.; Yang, C.; Zhang, F.; Meng, Y.; Su, Y. TCPM: A Reconfigurable and Efficient Toom-Cook-Based Polynomial Multiplier Over Rings Using a Novel Compressed Postprocessing Algorithm. IEEE Trans. on Very Large Scale Integration (VLSI) Syst. 2023, 31, 1153–1166.
15. Ye, Z.; Cheung, R.; Huang, K. PipeNTT: A Pipelined Number Theoretic Transform Architecture. IEEE Trans. on Circ. and Syst. II: Express Briefs (TCAS-II) 2022, 69, 4068–4072.
16. Xin, M.; Xu, C.; Huang, K.; Yu, H.; Yao, H.; Jiang, X.; Liu, D. Implementation of Number Theoretic Transform Unit for Polynomial Multiplication of Lattice-based Cryptography. Int. Conf. on Consumer Elec. and Comp. Engi. (ICCECE) 2022, 323–327.
17. Nguyen, T.H.; Binh, K.D.N.; Pham, C.K.; Hoang, T.T. High-Speed NTT Accelerator for CRYSTAL-Kyber and CRYSTAL-Dilithium. IEEE Access 2024, 12, 34918–34930.
18. Gauss, C.F. Nachlass: Theoria Interpolationis Methodo Nova Tractata. Carl Friedrich Gauss 1866, 3, 265–303.
19. Pollard, J.M. The Fast Fourier Transform in a Finite Field. Mathematics of Computation 1971, 25, 365–374.
20. Cooley, J.W.; Tukey, J.W. An Algorithm for the Machine Calculation of Complex Fourier Series. Mathematics of Computation 1965, 19, 297–301.
21. Zhang, C.; Liu, D.; Liu, X.; Zou, X.; Niu, G.; Liu, B.; Jiang, Q. Towards Efficient Hardware Implementation of NTT for Kyber on FPGAs. IEEE Conference Publication 2021.
22. Ni, Z.; Khalid, A.; Liu, W.; O’Neill, M. Towards a Lightweight CRYSTALS-Kyber in FPGAs: an Ultra-lightweight BRAM-free NTT Core. IEEE Conference Publication 2023.
23. Longa, P.; Naehrig, M. Speeding Up the Number Theoretic Transform for Faster Ideal Lattice-Based Cryptography. Cryptology ePrint Archive, Paper 2016/504, 2016. https://eprint.iacr.org/2016/504.
24. Bisheh-Niasar, M.; Azarderakhsh, R.; Mozaffari-Kermani, M. Instruction-Set Accelerated Implementation of CRYSTALS-Kyber. IEEE Trans. on Circ. and Syst. I: Regular Papers (TCAS-I) 2021, 68, 4648–4659.
25. Bisheh-Niasar, M.; Azarderakhsh, R.; Mozaffari-Kermani, M. High-Speed NTT-based Polynomial Multiplication Accelerator for Post-Quantum Cryptography. IEEE Conference Publication 2021.
26. Bisheh-Niasar, M.; Azarderakhsh, R.; Mozaffari-Kermani, M. Instruction-Set Accelerated Implementation of CRYSTALS-Kyber. IEEE Journals & Magazine 2021, 68.
27. Yaman, F.; Mert, A.C.; Öztürk, E.; Savas, E. A Hardware Accelerator for Polynomial Multiplication Operation of CRYSTALS-KYBER PQC Scheme. IEEE Conference Publication 2021.
28. Xing, Y.; Li, S. A Compact Hardware Implementation of CCA-Secure Key Exchange Mechanism CRYSTALS-KYBER on FPGA. IACR Transactions on Cryptographic Hardware and Embedded Systems 2021.






| Design | Mode | LUTs | FFs | DSPs | BRAMs | Freq (MHz) | Cycles | Time (µs) | Platform | LUT ATP | FF ATP |
|---|---|---|---|---|---|---|---|---|---|---|---|
| This work ( x2) | ◐ | 429 | 538 | 0 | 4 | 446 | 459 | 1.03 | Artix-7 | 441.87 | 554.14 |
| This work ( x1) | ◐ | 379 | 414 | 0 | 1 | 435 | 910 | 2.09 | Artix-7 | 792.11 | 865.26 |
| (quarter squares) | | | | | | | | | | | |
| This work ( x2) | ● | 541 | 680 | 0 | 4 | 417 | 461 | 1.10 | Artix-7 | 595.10 | 748.00 |
| [25] | ● | 810 | 717 | 4 | 2 | 222 | 324 | 1.46 | Artix-7 | 1182.60 | 1046.82 |
| [21] | ● | 609 | 640 | 2 | 4 | 257 | 490 | 1.91 | Artix-7 | 1163.19 | 1222.40 |
| [22] | ● | 1154 | 1031 | 2 | 0 | 300 | 456 | 1.52 | Ultrascale+ | 1754.08 | 1567.12 |
| [26] x1 | ● | 360 | 145 | 3 | 2 | 115 | 940 | 8.17 | Artix-7 | 2941.20 | 1184.65 |
| [26] x2 | ● | 737 | 290 | 6 | 4 | 115 | 474 | 4.12 | Artix-7 | 3036.44 | 1194.80 |
| [27] (1) | ● | 2543 | 792 | 4 | 9 | 182 | 232 | 1.27 | Artix-7 | 3229.61 | 1005.84 |
| [27] (2) | ● | 9508 | 2684 | 16 | 35 | 172 | 69 | 0.40 | Artix-7 | 3803.20 | 1073.60 |
| [27] (3) | ● | 948 | 325 | 1 | 2.5 | 190 | 904 | 4.76 | Artix-7 | 4512.48 | 1547.00 |
| [28] | ● | 1737 | 1167 | 2 | 3 | 161 | 512 | 3.18 | Artix-7 | 5523.66 | 3711.06 |

◐ Only NTT; ● Both NTT and INTT.
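The derived columns of the table can be reproduced from the raw ones: the execution time is the cycle count divided by the clock frequency, and each area-time product (ATP) is the resource count multiplied by that time. The short sketch below recomputes the first row under the assumption, inferred from the reported numbers, that frequency is in MHz and time in µs rounded to two decimals; the helper names `time_us` and `atp` are hypothetical.

```python
def time_us(cycles, freq_mhz):
    """Execution time in microseconds: cycles / (frequency in MHz)."""
    return cycles / freq_mhz


def atp(resources, t_us):
    """Area-time product: resource count (LUTs or FFs) times time in microseconds."""
    return resources * t_us


# First row of the table: 429 LUTs, 538 FFs, 446 MHz, 459 cycles.
t = round(time_us(459, 446), 2)   # time is rounded before the ATP is formed
lut_atp = round(atp(429, t), 2)
ff_atp = round(atp(538, t), 2)
print(t, lut_atp, ff_atp)
```

A lower ATP indicates a better trade-off between resource usage and latency, which is the basis on which the rows are compared.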
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

