Liang, Y.; Tan, J.; Xie, Z.; Chen, Z.; Lin, D.; Yang, Z. Research on Convolutional Neural Network Inference Acceleration and Performance Optimization for Edge Intelligence. Sensors 2024, 24, 240.
Abstract
In recent years, Edge Intelligence (EI) has emerged, combining edge computing with AI, specifically deep learning, to run AI algorithms directly on edge devices. In practical applications, EI faces constraints on computational power, power consumption, size, and cost, with the primary challenge being the trade-off between computational power and power consumption. This trade-off has rendered traditional computing platforms unsustainable, making heterogeneous parallel computing platforms a crucial pathway for implementing EI. In our research, we used the Xilinx Zynq 7000 heterogeneous computing platform and High-Level Synthesis (HLS) to design and implement two different accelerators for LeNet-5 using loop unrolling and pipelining optimization techniques, referred to as the UNROLL and PIPELINE accelerators. The experimental results show that, at a clock frequency of 100 MHz, the PIPELINE accelerator consumes 8.09% more power than the UNROLL accelerator but achieves a speedup of 14.972 times, making it the better-performing design. Compared to the CPU, the PIPELINE accelerator reduces power consumption by 91.37% and achieves a speedup of 70.387 times; compared to the GPU, it reduces power consumption by 93.35%. This study provides two optimization schemes for edge intelligence applications and demonstrates the impact of different quantization methods on FPGA resource consumption. The results offer a reference hardware acceleration scheme for practical edge intelligence applications.
Keywords
FPGA; HLS; Edge Intelligence; deep learning; heterogeneous computing
Subject
Computer Science and Mathematics, Artificial Intelligence and Machine Learning
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.