Preprint Article, Version 1. Preserved in Portico. This version is not peer-reviewed.

Compact and Low-latency FPGA-based NTT Architecture for CRYSTALS Kyber Post-Quantum Cryptography Scheme

Version 1: Received: 8 May 2024 / Approved: 22 May 2024 / Online: 22 May 2024 (15:09:05 CEST)

How to cite: Kieu-Do-Nguyen, B.; The Binh, N.; Pham-Quoc, C.; Huynh, P. N.; Tran, N.-T.; Hoang, T.-T.; Pham, C.-K. Compact and Low-latency FPGA-based NTT Architecture for CRYSTALS Kyber Post-Quantum Cryptography Scheme. Preprints 2024, 2024051452. https://doi.org/10.20944/preprints202405.1452.v1

Abstract

In the era of the post-quantum Internet of Things (IoT), deploying quantum-resistant cryptographic algorithms on IoT terminals can defend against prospective attacks by quantum computers. Lattice-based cryptography is designed to withstand quantum computing attacks, making it a viable replacement for the classical public-key cryptographic techniques in widespread use today. However, the high computational complexity of these algorithms places a substantial burden on the edge computing chips in IoT terminals. Polynomial multiplication is the most demanding operation in lattice-based cryptographic algorithms, so efficient methods for computing it are highly important. The Number Theoretic Transform (NTT) is a widely used technique for accelerating polynomial multiplication. This study presents an efficient hardware implementation of the NTT. The design uses a multi-stage pipeline architecture to perform calculations in parallel and is implemented on a low-end Artix-7 XC7A100T FPGA. The performance evaluation shows that our implementation significantly improves performance and reduces resource usage compared with other existing proposals on the same platform. Owing to its small size and low latency, the proposed NTT core is suitable for integration into edge computing chips to increase computational speed. The experimental results show that the proposed design, which supports both the NTT and the inverse NTT, achieves a maximum frequency of 417 MHz and consumes only 541 LUTs on the Artix-7 XC7A100T.
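The abstract's central technical point is that the NTT turns polynomial multiplication in Kyber's ring into cheap per-coefficient work, and that the core computes both the forward and the inverse transform. The following is a minimal software sketch of that arithmetic, written against the NTT and inverse-NTT algorithms published in NIST FIPS 203 (ML-KEM, the standardized form of CRYSTALS Kyber) with q = 3329, n = 256 and ζ = 17. It is not the authors' pipelined hardware architecture, and the function names (`ntt`, `intt`, `multiply_ntts`, `schoolbook_negacyclic`) are hypothetical. A model like this could serve as a golden reference when checking an NTT core in simulation, assuming the hardware uses the same coefficient ordering as FIPS 203.

```python
# Hypothetical software reference model of the Kyber NTT / inverse NTT,
# following the algorithms in NIST FIPS 203. Not the paper's hardware design.

Q = 3329   # Kyber modulus
N = 256    # ring R_q = Z_q[X] / (X^256 + 1)
ZETA = 17  # primitive 256th root of unity modulo Q


def bitrev7(x: int) -> int:
    """Reverse the 7 low-order bits of x."""
    return int(f"{x:07b}"[::-1], 2)


# Twiddle factors ZETA^BitRev7(i) mod Q for i = 0..127.
ZETAS = [pow(ZETA, bitrev7(i), Q) for i in range(128)]


def ntt(f: list[int]) -> list[int]:
    """Forward 7-layer (incomplete) NTT with Cooley-Tukey butterflies."""
    a = list(f)
    k = 1
    length = 128
    while length >= 2:
        for start in range(0, N, 2 * length):
            zeta = ZETAS[k]
            k += 1
            for j in range(start, start + length):
                t = zeta * a[j + length] % Q
                a[j + length] = (a[j] - t) % Q
                a[j] = (a[j] + t) % Q
        length //= 2
    return a


def intt(f_hat: list[int]) -> list[int]:
    """Inverse NTT with Gentleman-Sande butterflies, then scaling by 128^-1."""
    a = list(f_hat)
    k = 127
    length = 2
    while length <= 128:
        for start in range(0, N, 2 * length):
            zeta = ZETAS[k]
            k -= 1
            for j in range(start, start + length):
                t = a[j]
                a[j] = (t + a[j + length]) % Q
                a[j + length] = zeta * (a[j + length] - t) % Q
        length *= 2
    return [x * 3303 % Q for x in a]  # 3303 = 128^-1 mod Q


def multiply_ntts(f_hat: list[int], g_hat: list[int]) -> list[int]:
    """NTT-domain product: 128 degree-1 products modulo (X^2 - gamma_i),
    where gamma_i = ZETA^(2*BitRev7(i) + 1)."""
    h = [0] * N
    for i in range(128):
        gamma = pow(ZETA, 2 * bitrev7(i) + 1, Q)
        a0, a1 = f_hat[2 * i], f_hat[2 * i + 1]
        b0, b1 = g_hat[2 * i], g_hat[2 * i + 1]
        h[2 * i] = (a0 * b0 + a1 * b1 * gamma) % Q
        h[2 * i + 1] = (a0 * b1 + a1 * b0) % Q
    return h


def schoolbook_negacyclic(f: list[int], g: list[int]) -> list[int]:
    """Reference O(n^2) product in Z_q[X]/(X^256 + 1), using X^256 = -1."""
    h = [0] * N
    for i in range(N):
        for j in range(N):
            if i + j < N:
                h[i + j] = (h[i + j] + f[i] * g[j]) % Q
            else:
                h[i + j - N] = (h[i + j - N] - f[i] * g[j]) % Q
    return h


if __name__ == "__main__":
    import random

    f = [random.randrange(Q) for _ in range(N)]
    g = [random.randrange(Q) for _ in range(N)]
    assert intt(ntt(f)) == f  # inverse NTT undoes the forward NTT
    product = intt(multiply_ntts(ntt(f), ntt(g)))
    assert product == schoolbook_negacyclic(f, g)
    print("NTT-based product matches the schoolbook product")
```

The two assertions in the main block mirror the two functions the abstract attributes to the core: the inverse transform must recover the input exactly, and transforming, multiplying in the NTT domain, and transforming back must agree with the O(n²) negacyclic product. Because Kyber's NTT stops at degree-1 factors, the NTT-domain step is a set of small 2×2 products rather than a purely pointwise multiplication.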

Keywords

Field-Programmable Gate Array; Post-Quantum Cryptography; Number Theoretic Transform; CRYSTALS Kyber; Lightweight design

Subject

Computer Science and Mathematics, Hardware and Architecture
