ARTICLE | doi:10.20944/preprints202311.1420.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: Image Classification; Complex-valued Neural Network; FPGA Implementation; CVNN on FPGA
Online: 22 November 2023 (15:14:18 CET)
This research explores a novel approach to image classification by deploying a complex-valued neural network (CVNN) on a field-programmable gate array (FPGA), specifically for classifying 2D images transformed into polar form. The aim is to address the limitations of existing neural network models in terms of energy and resource efficiency by exploring the potential of FPGA-based hardware acceleration in conjunction with advanced neural network architectures such as CVNNs. The methodological innovation of this research lies in the Cartesian-to-polar transformation of 2D images, which effectively reduces the input data volume required for neural network processing. Subsequent efforts focused on constructing a CVNN model optimized for FPGA implementation, emphasizing the enhancement of computational efficiency and overall performance. The experimental findings provide empirical evidence supporting the efficacy of the image classification system developed in this study. One of the developed models, CVNN_128, achieves an accuracy of 88.3% with an inference time of just 1.6 ms and a power consumption of 4.66 mW for the classification of the MNIST test dataset, which consists of 10,000 frames. While there is a slight concession in accuracy compared to recent FPGA implementations that achieve 94.43%, our model significantly excels in classification speed and power efficiency, surpassing existing models by more than a factor of 100. In conclusion, the paper demonstrates the substantial advantages of FPGA implementation of CVNNs for image classification tasks, particularly in scenarios where speed, resource usage, and power consumption are critical. The study’s reproducible results and corresponding code are available on GitHub at the following link: https://github.com/mahmad2005/CVNNonFPGA
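The Cartesian-to-polar step mentioned in the abstract can be illustrated with a minimal sketch. It assumes nearest-neighbour sampling along rays from the image centre; the function name `to_polar` and the grid sizes are hypothetical and chosen only for illustration. The authors' actual transformation is in their repository and may differ.

```python
import math

def to_polar(image, n_radii, n_angles):
    """Resample a square 2D image (list of rows) onto an (r, theta) grid.

    Nearest-neighbour sampling around the image centre. This is an
    illustrative assumption, not the method used in the paper.
    """
    size = len(image)
    centre = (size - 1) / 2.0
    max_r = centre  # stay inside the inscribed circle
    polar = []
    for i in range(n_radii):
        r = max_r * i / max(n_radii - 1, 1)
        row = []
        for j in range(n_angles):
            theta = 2.0 * math.pi * j / n_angles
            x = int(round(centre + r * math.cos(theta)))
            y = int(round(centre + r * math.sin(theta)))
            row.append(image[y][x])
        polar.append(row)
    return polar

# A synthetic 28x28 frame (the MNIST frame size) resampled to 10 radii x 20 angles.
frame = [[(x + y) % 256 for x in range(28)] for y in range(28)]
polar = to_polar(frame, n_radii=10, n_angles=20)
print(len(frame) * len(frame[0]), "->", len(polar) * len(polar[0]))
```

With these arbitrary grid sizes the input shrinks from 784 to 200 values, which shows how a coarse polar grid can reduce the data volume fed to the network.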
ARTICLE | doi:10.20944/preprints202301.0328.v3
Online: 20 January 2023 (02:07:12 CET)
FPGA-based cards for data concentration and readout are often used in data acquisition (DAQ) systems for high-energy physics experiments. The DMA engines implemented in FPGA enable efficient data transfer to the processing system’s memory. This paper presents a versatile DMA engine. It may be used in systems with FPGA-equipped PCIe boards hosted in a server and MPSoC-based systems with programmable logic connected directly to the AXI system bus. The core part of the engine is implemented in HLS to simplify further development and modifications. The design is modular and may be easily integrated with the user’s DAQ logic, assuming it delivers the data via a standard AXI-Stream interface. The engine and accompanying software are designed with flexibility in mind. They offer a simple single-packet mode for debugging and a high-performance multi-packet mode fully utilizing the computational power of the processing system. The number of used DAQ cards and the amount of memory used for the DMA buffer may be modified in runtime without rebooting the system. That is particularly useful in the development and test setups. The paper also presents the development and testing methodology. The whole design is open-source and available in public repositories.
ARTICLE | doi:10.20944/preprints202107.0403.v1
Online: 19 July 2021 (10:51:37 CEST)
This paper describes a new optimization methodology for reducing the test vector sets used in testing soft-processor cores and their individual blocks. Deterministic test vectors, both for the whole core and for its individual blocks, are investigated; they significantly reduce the testing time and the amount of test data that needs to be stored in the tester memory. The processor executes an assembler program which, together with the determined test vectors, exercises its functionality. A new BIST methodology applicable to industrial testing of processor cores, diagnostics, and dynamic reconfiguration of FPGAs is proposed. This novel methodology, combined with dynamic reconfiguration of FPGAs, can be profitably applied in mission-critical applications, e.g., FPGAs operating in space or in other harsh conditions where they are exposed to radiation. Experimental results demonstrate that the proposed approach reduces testing time many times over.
ARTICLE | doi:10.20944/preprints202302.0211.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: FPGA; DAQ; Data concentration; Beneš network
Online: 13 February 2023 (09:10:55 CET)
The concentration of data from multiple links to a single output is an essential task performed by High-Energy Physics (HEP) Data Acquisition Systems (DAQs). At high and varying data rates combined with the large width of the concentrator’s output interface, this task is non-trivial. This paper presents a concentrator based on the Beneš network, which provides efficient concentration without using a high-frequency clock internally. It guarantees that empty data are eliminated and does not disturb the data time-ordering if the data rates significantly differ between inputs. Additionally, it is well suited to FPGA implementation: it is based on simple data-routing primitives and may be fully pipelined.
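As a rough illustration of why a Beneš network maps well onto pipelined logic, the sketch below computes its size for a power-of-two number of inputs (a Beneš network of N inputs has 2·log2(N) − 1 stages of N/2 two-by-two switches) and gives a purely behavioural model of what concentration should produce. The function names are hypothetical, and the model does not reproduce the routing logic of the paper.

```python
def benes_size(n_inputs):
    """Return (stages, switches_per_stage, total_switches) for a Benes network.

    Assumes n_inputs is a power of two and 2x2 switching elements.
    """
    if n_inputs < 2 or n_inputs & (n_inputs - 1):
        raise ValueError("n_inputs must be a power of two >= 2")
    k = n_inputs.bit_length() - 1      # log2(n_inputs)
    stages = 2 * k - 1
    per_stage = n_inputs // 2
    return stages, per_stage, stages * per_stage

def concentrate(words):
    """Behavioural model of concentration: drop empty slots, keep the order."""
    return [w for w in words if w is not None]

print(benes_size(8))
print(concentrate([None, "a", None, "b", "c", None, None, "d"]))
```

Each stage can be registered, so the latency grows only logarithmically with the number of inputs.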
ARTICLE | doi:10.20944/preprints201609.0020.v1
Subject: Physical Sciences, Optics And Photonics Keywords: FPGA; photoacoustic spectroscopy; diode laser; methane
Online: 6 September 2016 (11:51:45 CEST)
A portable laser photoacoustic sensor based on a Field-Programmable Gate Array (FPGA) is reported for methane detection. A tunable DFB diode laser in the 1654 nm wavelength range is used as the excitation source. The photoacoustic signal processing is implemented on an FPGA device, and a small resonant photoacoustic cell is designed. A minimum detection limit (1σ) of 10 ppm for methane (CH4) is demonstrated.
ARTICLE | doi:10.20944/preprints202311.0956.v1
Subject: Computer Science And Mathematics, Hardware And Architecture Keywords: embedded system; FPGA; Z80; ZX Spectrum+
Online: 15 November 2023 (10:35:30 CET)
The ZX Spectrum was a popular 8-bit home computer by Sinclair Research in the 1980s. Even though some of these computers may still work, the audio tapes, the TV with an analog tuner, and the micro-switch joystick used with the original ZX Spectrum are nowadays outdated and hard to find in good working order or to replicate. Since many other old closed systems are also very difficult to update to support modern peripherals, there is a need for a methodology to adapt such systems to new peripherals while remaining compatible with existing software. The work proposed in this paper focuses on recreating a ZX Spectrum+/48k computer on an FPGA and interfacing it with modern peripherals. This is accomplished by adding a co-processor to assist with the control of the new peripherals, which would otherwise either require overly complex architectural changes to the original system or perform poorly due to the low performance of the Z80 CPU. This work differs from previous ones on emulating a ZX Spectrum in that it focuses on the use of different upgraded peripherals and on the use of a NIOS II soft-processor as a co-processor to manage the SD card accesses. The proposed modernized architecture was demonstrated by successfully running a diagnostics ROM and playing original ZX Spectrum games loaded from an SD card, using a PS/2 keyboard and a pair of joysticks.
ARTICLE | doi:10.20944/preprints202003.0397.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: New-CFAR; WS-CFAR; FPGA; Radar detection
Online: 26 March 2020 (15:38:32 CET)
In a radar system, detection is a basic and important stage on the receiver side. The detection process is based on a thresholding criterion, for which there are two philosophies: a constant threshold and an adaptive threshold. The constant threshold is simple in design, but it suffers from missed detections and does not control the false alarm rate. The adaptive threshold is powerful in target detection and gives better control of the false alarm rate, which is why it is called Constant False Alarm Rate (CFAR). There has been a lot of research on CFAR design, but the gap in previous works is that no CFAR algorithm can work with all or most environments and all or most target situations. In this paper, a CFAR that can work with most environments and most target situations is presented, together with the design and implementation of the new practical CFAR processor. The new CFAR combines the properties of three different CFAR algorithms (CA, OSGO, and OSSO) from two different families, averaging and statistical. Its performance is 97.25% in simulation and 96.25% for the implementable version across different target situations. The simulation analysis is carried out in Matlab 2015, while the implementation is done on a Xilinx Spartan 700 3a.
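To make the adaptive-threshold idea concrete, here is a minimal sketch of the textbook cell-averaging variant (CA-CFAR), one of the three algorithms the abstract says are combined. It assumes a square-law detector with exponentially distributed noise power, which gives the standard threshold multiplier alpha = N(Pfa^(-1/N) − 1). The function name, window sizes, and test data are hypothetical, and the OSGO/OSSO parts and the authors' combination logic are not shown.

```python
def ca_cfar(power, n_train, n_guard, pfa):
    """Cell-averaging CFAR over a list of square-law detector outputs.

    n_train: training cells on EACH side of the cell under test.
    n_guard: guard cells on each side (excluded from the noise estimate).
    Returns the indices of cells declared as detections.
    """
    n_total = 2 * n_train
    # Threshold multiplier for exponentially distributed noise power.
    alpha = n_total * (pfa ** (-1.0 / n_total) - 1.0)
    half = n_train + n_guard
    detections = []
    for i in range(half, len(power) - half):
        lead = power[i - half : i - n_guard]
        lag = power[i + n_guard + 1 : i + half + 1]
        noise = (sum(lead) + sum(lag)) / n_total
        if power[i] > alpha * noise:
            detections.append(i)
    return detections

# Flat noise floor with one strong target at index 20.
cells = [1.0] * 40
cells[20] = 100.0
print(ca_cfar(cells, n_train=8, n_guard=2, pfa=1e-3))
```

Because the threshold scales with the local noise estimate, the false alarm rate stays fixed when the noise level changes, which a constant threshold cannot achieve.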
ARTICLE | doi:10.20944/preprints202307.1705.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: Convolution; Multiplier; Look-up table; Carry chain; FPGA.
Online: 26 July 2023 (11:22:49 CEST)
Convolution is one of the most essential operations in FPGA-based hardware accelerators. However, existing designs often neglect the inherent architecture of the FPGA, which poses a severe challenge in terms of hardware resource requirements. Even though some previous works have proposed approximate multipliers or convolution acceleration algorithms to deal with this issue, the inevitable accuracy loss and resource occupation easily lead to performance degradation. To address this, we first propose two kinds of resource-efficient, optimized accurate multipliers based on LUTs or carry chains. Then, targeting FPGA-based platforms, a generic multiply-accumulate structure is constructed by directly accumulating the partial products produced by our proposed optimized radix-4 Booth multipliers, without intermediate multiplication and addition results. Experimental results demonstrate that our proposed multiplier achieves a maximum 51% look-up-table (LUT) reduction compared to the Vivado area-optimized multiplier IP. Furthermore, the convolutional process unit using the proposed structure achieves a 36% LUT reduction compared to existing methods. As case studies, the proposed method is applied to a DCT transformer and LeNet to achieve hardware resource savings without loss of accuracy.
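The radix-4 Booth recoding that underlies these multipliers can be sketched in software. The example below recodes a two's-complement multiplier into digits from {−2, −1, 0, 1, 2} and sums the resulting partial products; it illustrates the general technique only, with hypothetical function names, and says nothing about the LUT or carry-chain mapping proposed in the paper.

```python
def booth_radix4_digits(y, bits):
    """Recode a signed `bits`-wide integer into radix-4 Booth digits (LSB first)."""
    if bits % 2 or not -(1 << (bits - 1)) <= y < (1 << (bits - 1)):
        raise ValueError("bits must be even and y must fit in two's complement")
    u = y & ((1 << bits) - 1)          # two's-complement bit pattern
    digits = []
    prev = 0                           # implicit bit to the right of the LSB
    for i in range(0, bits, 2):
        b0 = (u >> i) & 1
        b1 = (u >> (i + 1)) & 1
        digits.append(prev + b0 - 2 * b1)
        prev = b1
    return digits

def booth_multiply(x, y, bits=8):
    """Multiply by accumulating one partial product per Booth digit."""
    partial_products = [(d * x) * (4 ** i)
                        for i, d in enumerate(booth_radix4_digits(y, bits))]
    return sum(partial_products)

print(booth_radix4_digits(7, 8))
print(booth_multiply(-13, 7))
```

Recoding halves the number of partial products compared with a bit-by-bit multiplier, and accumulating them directly is the idea behind the multiply-accumulate structure described above.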
ARTICLE | doi:10.20944/preprints201804.0101.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: EMI; Luo-converter; chaotic PWM technique; FPGA; RCFMFD
Online: 9 April 2018 (09:00:19 CEST)
Chaotic switching is a newly evolved randomization method that can suppress the conducted electromagnetic interference generated within a DC-DC converter. Using a spread-spectrum technique, it effectively suppresses the spectral peaks present in the frequency band by spreading them over a wide frequency range, which implies EMI suppression. In this paper, a chaotic PWM technique based on the RCFMFD scheme is generated with a field-programmable gate array (FPGA) to suppress the conducted electromagnetic interference (EMI) generated within the Luo converter. A hardware prototype of the Luo converter was developed in order to analyze the EMI reduction through FFT analysis by comparing traditional periodic PWM switching with chaotic PWM switching. The results obtained from the hardware setup show a significant reduction of EMI with chaotic switching compared to traditional PWM switching for both boost and buck operation of the Luo converter.
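A chaotic carrier can be sketched as follows, assuming that RCFMFD denotes carrier-frequency randomization with a fixed duty ratio. The logistic map, the function name, and every parameter value are illustrative assumptions; the abstract does not state which chaotic map or frequency spread the authors used.

```python
def chaotic_pwm_periods(n_cycles, f_nominal_hz, spread, duty, x0=0.3, r=3.99):
    """Return a list of (period_s, on_time_s), one entry per switching cycle.

    The carrier frequency is dithered by a logistic-map sequence while the
    duty ratio stays fixed. Map choice and parameters are assumptions.
    """
    x = x0
    cycles = []
    for _ in range(n_cycles):
        x = r * x * (1.0 - x)                      # chaotic state in (0, 1)
        f = f_nominal_hz * (1.0 + spread * (2.0 * x - 1.0))
        period = 1.0 / f
        cycles.append((period, duty * period))
    return cycles

pwm = chaotic_pwm_periods(1000, f_nominal_hz=100e3, spread=0.1, duty=0.4)
print(min(1.0 / p for p, _ in pwm), max(1.0 / p for p, _ in pwm))
```

Since the switching period changes from cycle to cycle, the harmonic energy is spread over a band around the nominal frequency rather than concentrated in narrow peaks, while the fixed duty ratio keeps the average output voltage unchanged.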
ARTICLE | doi:10.20944/preprints202309.0884.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: network security; packet sniffer; packet classification; FPGA; embedded systems
Online: 13 September 2023 (11:51:05 CEST)
In recent years, web applications and online business transactions have grown manyfold. Consequently, cyberattacks have also increased and represent a serious threat to the pervasive digital services upon which our society relies. To mitigate cyberattacks, many countermeasures are deployed on computing nodes (e.g., anti-malware software) as well as on network devices to detect and possibly block malicious packets in transit; these monitoring devices broadly go under the name of firewalls. Firewalls are designed according to two main architectural approaches: software running on a standard or embedded computer, or purpose-designed hardware, e.g., ASICs. Software-based solutions have the advantage of high flexibility and can be ported to easily upgradable hardware. However, hardware implementation represents the only viable solution for high data rates. Very fast devices of the latter kind are available on the market, but their cost is typically very high, especially considering that their ultra-optimized design makes updating them very difficult, with the consequence of a rather short lifespan. As a more balanced alternative, we investigated the use of an FPGA architecture, which is significantly easier to update than custom-built chips and offers low latency and high throughput at the same time, making it preferable to other programmable systems based on GPUs or microcontrollers. In this paper, a packet sniffer designed on an FPGA with a 1 Gbit/s data transfer rate is presented. The system is implemented on the Xilinx KC705 FPGA development board; it analyzes Ethernet frames, checks the frame fields against a set of rules defined by the user, and calculates statistics of the received Ethernet frames over time. The designed packet sniffer has been successfully tested both with Ethernet frames generated ad hoc using a packet generator and with real web traffic by connecting the packet sniffer to the internet.
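The field checking described above can be mimicked in software with a short sketch. It parses the 14-byte Ethernet II header, matches the fields against a rule, and keeps per-EtherType counts. The rule format (a dictionary of required field values) and all function names are hypothetical; the FPGA design performs this in hardware and its rule encoding is not given in the abstract.

```python
import struct
from collections import Counter

def mac_to_str(raw):
    """Format 6 raw bytes as a colon-separated MAC address."""
    return ":".join(f"{b:02x}" for b in raw)

def parse_ethernet(frame):
    """Split a raw Ethernet II frame into header fields and payload."""
    if len(frame) < 14:
        raise ValueError("frame shorter than the 14-byte Ethernet header")
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    return {"dst": mac_to_str(dst), "src": mac_to_str(src),
            "ethertype": ethertype, "payload": frame[14:]}

def matches(fields, rule):
    """A rule matches when every field it names has the required value."""
    return all(fields.get(key) == value for key, value in rule.items())

# Example: one broadcast IPv4 frame checked against a hypothetical rule.
frame = bytes.fromhex("ffffffffffff" "001122334455" "0800") + b"payload"
fields = parse_ethernet(frame)
stats = Counter([fields["ethertype"]])
print(fields["dst"], hex(fields["ethertype"]), matches(fields, {"ethertype": 0x0800}))
```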
ARTICLE | doi:10.20944/preprints202308.0643.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: VIO; V-SLAM; FPGA; histogram equalization; FAST; Pyramid processing
Online: 9 August 2023 (02:36:25 CEST)
Owing to its low latency, low power consumption, and high flexibility, FPGA-based acceleration technology has been increasingly studied and applied in the field of computer vision in recent years. An FPGA-based feature extraction and tracking accelerator for real-time visual odometry (VO) and visual simultaneous localization and mapping (V-SLAM) is proposed, which accelerates the complete image front-end and directly outputs the feature point IDs and coordinates to the backend. The accelerator consists of image preprocessing, pyramid processing, optical flow processing, and feature extraction and tracking modules. For the first time, it implements a hardware solution that combines features from accelerated segment test (FAST) corners with Gunnar Farneback (GF) dense optical flow, to achieve better feature tracking performance and provide more flexible technical route selection. To address the lack of scale and rotation invariance of FAST features, an efficient pyramid module with a five-layer thumbnail structure is designed and implemented. The accelerator is implemented on a modern Xilinx Zynq FPGA. The evaluation results show that the accelerator achieves stable tracking of features in violently shaking images and is consistent with the results of MATLAB code running on a PC. When operating at 100 MHz, the accelerator can process 108 frames per second for 720P images and 48 frames per second for 1080P images. Compared to PC CPUs, which take seconds, the processing latency is reduced to the order of milliseconds, making GF dense optical flow an efficient and practical technical solution on the edge side.
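The two reported frame rates can be cross-checked with a short calculation. Assuming 720P means 1280×720 and 1080P means 1920×1080 pixels, both operating points correspond to the same pixel rate, just under the 100 MHz clock. The reading that the pipeline handles about one pixel per clock cycle is our inference and is not stated in the abstract.

```python
CLOCK_HZ = 100_000_000  # operating frequency from the abstract

def pixel_rate(width, height, fps):
    """Pixels per second that must be processed at a given frame rate."""
    return width * height * fps

for name, width, height, fps in [("720P", 1280, 720, 108),
                                 ("1080P", 1920, 1080, 48)]:
    rate = pixel_rate(width, height, fps)
    print(name, rate, round(rate / CLOCK_HZ, 4))
```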
ARTICLE | doi:10.20944/preprints202305.1980.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: DC-DC Converter; High static gain; Control strategy; FPGA.
Online: 29 May 2023 (05:41:35 CEST)
This paper presents a comprehensive study and analysis of a non-isolated DC-DC converter topology named the Double Quadratic Boost converter with high static gain. The main advantages of this converter are its high static gain and the low voltage stress on its switches. The article first presents the theoretical analysis of the converter operating in open loop. The objective of the work is the mathematical modeling and the control strategy of the converter, as well as their validation through closed-loop experimental results. In addition, we present practical test results to demonstrate the operation of the converter, such as the experimental static gain curve, the measured efficiency of the converter, the control of the output voltage, and the capacitor voltage balance control. The authors designed a 1 kW prototype with a switching frequency of fs = 50 kHz and FPGA-based command and modulation.
COMMUNICATION | doi:10.20944/preprints202301.0516.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: FPGA; PPI 8255A; Oscillator circuit 4GHz; Ultrasonic PWM; VSI
Online: 28 January 2023 (04:50:35 CET)
To achieve a stable power factor and high output accuracy, active and reactive power should be balanced; in addition, ultrasonic PWM with frequencies of 20–500 kHz was used. A three-phase dynamic load was used at the system output and connected to the compound splicing (CS) system, which is a novelty. An oscillator circuit with 4 GHz as a microcontroller is a novelty, and a designed ADC is utilized to increase accuracy and to help reduce the losses caused by the characteristics of voltage source inverters (VSI), such as dead time determined by the excess voltage or voltage drop in the inverter, abnormal load-current conditions such as short-circuit current in the production phase, and so forth. The innovation relates to the inverter's load sensing circuit, current smoothing during operation, reaction, spontaneous power factor enhancement of the inverter, and correction of reactive power of passive devices. This invention contributes to the advancement of DC-to-AC power converters by achieving extremely low losses, excellent precision, and lightweight construction. Additionally, the accuracy ranges from 99%, and the total harmonic distortion (THD) of the voltage and current is 0.1%–0.8%. Power MOSFETs are used, along with an FPGA to improve control over the generation of ultrasonic PWM signals and the programmable peripheral interface (PPI) 8255A for regulatory work. The marks serve as proof of this.
ARTICLE | doi:10.20944/preprints202110.0098.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: FPGA; Wishbone; Control interface; VHDL; System management; System diagnostics
Online: 6 October 2021 (09:48:08 CEST)
FPGA-based data acquisition and processing systems play an important role in modern high-speed, multichannel measurement systems, especially in High-Energy and Plasma Physics. Such FPGA-based systems require an extended control and diagnostics part corresponding to the complexity of the controlled system. Managing the complex structure of registers while keeping the tight coupling between hardware and software is a tedious and potentially error-prone process. Various existing solutions aimed at helping that task do not perfectly match all specific requirements of that application area. The paper presents a new solution based on the XML system description, facilitating the automated generation of the control system’s HDL code and software components and enabling easy integration with the control software. The emphasis is put on reusability, ease of maintenance in case of system modification, easy detection of mistakes, and the possibility of use in modern FPGAs. The presented system has been successfully used in data acquisition and preprocessing projects in High-Energy Physics experiments. It enables easy creation and modification of the control system definition and convenient access to the control and diagnostic blocks. The presented system is an open-source solution and may be adopted by the user for particular needs.
Subject: Physical Sciences, Optics And Photonics Keywords: magnetic fusion devices; ir interferometry; fpga; phase detection; dsp
Online: 20 September 2019 (10:32:16 CEST)
Interferometry is used in magnetic fusion devices to measure the line-averaged electron density. It is based on detecting changes in the refractive index experienced by electromagnetic waves traveling through a plasma. The appropriate frequency of these electromagnetic waves depends on several limitations. First, the maximum expected peak electron density must be lower than the cutoff density for that frequency, so that the waves can still propagate through the plasma when the maximum density is reached. In this sense, interferometers operating in the infrared region are the most suitable. With such short wavelengths, mechanical vibrations become an important issue, and a complementary interferometer must be used to cancel them. These arrangements are called two-color interferometers. In this paper, previously unpublished measurements obtained from the TJ-II two-color IR FPGA-based processing system are shown and analyzed. The line-averaged electron density is computed in real time (100 μs).
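Both constraints in the abstract can be worked through numerically. The sketch below computes the cutoff density for a probing wavelength and combines two phase measurements so that a common path-length change cancels. It assumes the usual model in which each phase is the sum of a plasma term r_e·λ·∫n dl and a vibration term 2π·Δl/λ (sign conventions vary between setups). The wavelengths in the example are illustrative assumptions and are not taken from the abstract.

```python
import math

EPS0 = 8.8541878128e-12   # vacuum permittivity, F/m
M_E = 9.1093837015e-31    # electron mass, kg
Q_E = 1.602176634e-19     # elementary charge, C
C = 299_792_458.0         # speed of light, m/s
R_E = Q_E**2 / (4 * math.pi * EPS0 * M_E * C**2)  # classical electron radius, m

def cutoff_density(wavelength_m):
    """Electron density (m^-3) above which the wave cannot propagate."""
    omega = 2 * math.pi * C / wavelength_m
    return EPS0 * M_E * omega**2 / Q_E**2

def line_integrated_density(phi1, lam1, phi2, lam2):
    """Line-integrated density (m^-2) from two phases; vibration term cancels."""
    return (lam1 * phi1 - lam2 * phi2) / (R_E * (lam1**2 - lam2**2))

# Synthetic check with assumed wavelengths and a 1 micrometre path change.
lam1, lam2 = 10.6e-6, 1.064e-6
true_nl, delta_l = 1.0e19, 1.0e-6
phi1 = R_E * lam1 * true_nl + 2 * math.pi * delta_l / lam1
phi2 = R_E * lam2 * true_nl + 2 * math.pi * delta_l / lam2
print(f"{cutoff_density(lam1):.3e}", f"{line_integrated_density(phi1, lam1, phi2, lam2):.3e}")
```

Multiplying each phase by its wavelength makes the vibration contribution identical in both channels, so the subtraction leaves only the plasma term.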
ARTICLE | doi:10.20944/preprints202311.0309.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: Spiking Neural Networks; Neural Encoding; Complex Exponential Neuron; FPGA Implementation
Online: 6 November 2023 (10:36:27 CET)
This research investigates the implementation of complex exponential-based neurons on FPGAs, which can pave the way for implementing bio-inspired spiking neural networks to compensate for the computational constraints of conventional artificial neural networks. The increasing use of extensive neural networks and the complexity of models in handling big data lead to higher power consumption and delays. Hence, finding solutions to reduce computational complexity is crucial for addressing power consumption challenges. The complex exponential form effectively encodes oscillating features such as frequency, amplitude, and phase shift, streamlining the demanding calculations typical of conventional artificial neurons by leveraging simple phase addition of complex exponential functions. The article implements a two-neuron and a multi-neuron neural model using Xilinx System Generator and the Vivado Design Suite, employing 8-bit, 16-bit, and 32-bit fixed-point data representations. The study evaluates the accuracy of the proposed neuron model across different FPGA implementations while also providing a detailed analysis of operating frequency, power consumption, and resource usage for the hardware implementations. BRAM-based Vivado designs outperformed Simulink regarding speed, power, and resource efficiency. Specifically, the Vivado BRAM-based approach supported up to 128 neurons, showcasing optimal LUT and FF resource utilization. Such outcomes help in choosing the optimal design procedure for implementing spiking neural networks on FPGAs.
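The "simple phase addition" property follows from e^{jφ1}·e^{jφ2} = e^{j(φ1+φ2)}: multiplying two unit-magnitude complex exponentials reduces to adding their phases. The sketch below shows this with a fixed-point phase word, where wrap-around modulo one turn comes for free. Storing the phase as a fraction of a turn, and all function names, are assumptions made for illustration rather than details from the article.

```python
import cmath
import math

def quantize_phase(phi, bits):
    """Map a phase in radians to an unsigned fixed-point word (fraction of a turn)."""
    return int(round(phi / (2 * math.pi) * (1 << bits))) % (1 << bits)

def add_phases(p1, p2, bits):
    """Combine two phase words; the modulo models natural register overflow."""
    return (p1 + p2) % (1 << bits)

def to_complex(p, bits):
    """Unit-magnitude complex exponential represented by a phase word."""
    return cmath.exp(2j * math.pi * p / (1 << bits))

a, b = 0.7, 2.9
exact = cmath.exp(1j * a) * cmath.exp(1j * b)
for bits in (8, 16):
    word = add_phases(quantize_phase(a, bits), quantize_phase(b, bits), bits)
    print(bits, abs(to_complex(word, bits) - exact))
```

The error shrinks as the word gets wider, which is the kind of accuracy versus resource trade-off the 8-, 16-, and 32-bit implementations explore.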
ARTICLE | doi:10.20944/preprints202308.1327.v1
Subject: Computer Science And Mathematics, Hardware And Architecture Keywords: SHA-3 accelerator; Edge computing; Padding; Round function; SoC FPGA
Online: 18 August 2023 (07:17:07 CEST)
Edge computing has emerged as a significant computing trend alongside the rapid expansion of the Internet of Things (IoT). Computing operations at the network’s edge offer a solution to the high latency and service overload challenges often associated with cloud computing. Additionally, the Secure Hash Algorithm-3 (SHA-3) plays a crucial role in ensuring data integrity and is implemented for numerous applications in the IoT field. Therefore, integrating the SHA-3 accelerator into edge computing is essential. Moreover, recent studies about SHA-3 have primarily focused on achieving high performance and optimizing resource utilization for SHA-3. However, these studies have overlooked the crucial aspect of data transfer between the external memory and the hash function block, as transfer time plays a significant role. This paper proposes an efficient SHA-3 architecture designed for System-on-Chip (SoC) Field Programmable Gate Array (FPGA) to address the aforementioned challenges and be suitable for real edge computing. The architecture contains three key techniques. First, the harmony of padding and Direct Memory Access (DMA) for managing the data transfer process and enhancing performance efficiency based on Serial Input to Parallel Output (SIPO) and a barrel shifter. Secondly, implementing internal pipelining within the Round Function (RF) to reduce critical path delays and optimize resource utilization. Finally, designing four modes (SHA3-224, SHA3-256, SHA3-384, and SHA3-512) to cater to various applications. Our architecture is implemented and tested on the DE10-Standard Development Kit (Cyclone V SX SoC-5CSXFC6D6F31C6N), which is integrated into the FPGA and is controlled by a Dual-Core ARM Cortex-A9 processor. The result is up to 38.34 Gbps in throughput and 5.61 Mbps/ALM in efficiency for the RF computation, 28.02 Gbps in throughput, and 3.63 Mbps/ALM for the full proposed architecture.
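For reference, the four modes differ in their rate (the block size absorbed per permutation call), and the padding step must extend every message to a multiple of that rate. The sketch below computes the rate as 1600 − 2·d bits for digest length d and applies the byte-aligned SHA-3 padding (domain suffix plus pad10*1, i.e. a 0x06 byte, zero bytes, and a final 0x80, merged into 0x86 when only one byte is needed). It uses Python's standard `hashlib` only to confirm digest sizes; the helper names are hypothetical and the code does not model the SIPO and barrel-shifter hardware of the paper.

```python
import hashlib

MODES = {"SHA3-224": 224, "SHA3-256": 256, "SHA3-384": 384, "SHA3-512": 512}

def rate_bytes(digest_bits):
    """Block size in bytes: the 1600-bit state minus a capacity of 2*d bits."""
    return (1600 - 2 * digest_bits) // 8

def sha3_pad(message, digest_bits):
    """Append SHA-3 padding so the length is a multiple of the rate."""
    rate = rate_bytes(digest_bits)
    pad_len = rate - (len(message) % rate)   # always between 1 and rate
    if pad_len == 1:
        return message + b"\x86"
    return message + b"\x06" + b"\x00" * (pad_len - 2) + b"\x80"

for name, bits in MODES.items():
    digest = getattr(hashlib, name.lower().replace("-", "_"))(b"abc").digest()
    print(name, rate_bytes(bits), len(sha3_pad(b"abc", bits)), len(digest) * 8)
```

Because the rate changes with the mode, the amount of data fetched per block changes too, which is one reason padding and DMA transfers need to be coordinated.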
ARTICLE | doi:10.20944/preprints202304.0922.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: data acquisition; Orbitrap mass spectrometry; spectrum analysis; FPGA; FFT analysis
Online: 25 April 2023 (10:29:05 CEST)
Orbitrap mass spectrometers have been widely used in environmental component analysis. This paper presents an enhanced data acquisition system for in-situ detection with Orbitrap mass spectrometry, which enables in-situ real-time signal processing and analysis for atmospheric molecules with small mass numbers. During previous space atmospheric explorations, quadrupole mass spectrometry (QMS) was usually utilized for analyzing isotopic compositions and complicated compounds in a mixture. However, the inevitable ion scattering and drift during mass transfer, as well as the QMS attenuator loss, result in a relatively low signal-to-noise ratio of the produced mass spectrum. In recent years, a novel mass spectrometer, the Orbitrap mass spectrometer, has been gradually accepted and used because of its non-destructive detection ability, high mass resolution, and accuracy. Nevertheless, Fourier transform ion cyclotron resonance mass spectrometry (FT-ICR MS) is preferred over Orbitrap mass spectrometry for ground applications, especially for detecting low-mass components (e.g., below 50 atomic mass units), due to the former’s superior resolution. As the mass number decreases, the detection system demands a higher sampling and processing rate. Additionally, in ground-based scenarios, mass spectrometry analysis software on computers is often employed to analyze the signals. Therefore, in order to achieve in-situ detection of the space atmospheric environment using the Orbitrap mass spectrometer, this paper proposes a real-time signal acquisition and processing system for mass spectrometry analysis. The system comprises signal conditioning circuits, analog-to-digital conversion circuits, programmable logic circuits, and related software. These components perform spectrum analysis and process the signal in real time in hardware, allowing for high-speed acquisition and analysis of the signals produced by the Orbitrap mass spectrometer.
ARTICLE | doi:10.20944/preprints201806.0393.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: reconfigurable architecture; CORDIC; Field Programmable Gate Array(FPGA); SAR imaging
Online: 25 June 2018 (14:42:01 CEST)
This paper presents a unified reconfigurable coordinate rotation digital computer (CORDIC) processor for floating-point arithmetic. It can be configured to operate in multiple modes to perform a variety of operations, and it replaces multiple single-mode CORDIC processors. A reconfigurable pipeline-parallel mixed architecture is proposed to adapt to different operations, which maximizes the sharing of common hardware circuits and achieves area-delay efficiency. Compared with previous unified floating-point CORDIC processors, the consumption of hardware resources is greatly reduced. As a proof of concept, we apply it to a 16384 × 16384-point target Synthetic Aperture Radar (SAR) imaging system, which is implemented on a Xilinx XC7VX690T FPGA platform. The maximum relative error of each phase function between hardware and software computation and the corresponding SAR imaging result meet the accuracy index requirements.
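For readers unfamiliar with CORDIC, the sketch below shows the basic circular rotation mode, which computes sine and cosine using only shifts, additions, and a small table of arctangent constants. It is a plain software illustration using Python floats with a hypothetical function name. It covers one mode only and does not reflect the reconfigurable floating-point architecture of the paper.

```python
import math

def cordic_rotate(theta, iterations=24):
    """Return (cos(theta), sin(theta)) via rotation-mode circular CORDIC.

    Converges for |theta| up to about 1.74 rad; larger angles need a
    quadrant reduction step first, which is omitted here.
    """
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = 1.0 / gain, 0.0, theta   # pre-scale to cancel the CORDIC gain
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0    # rotate towards zero residual angle
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    return x, y

c, s = cordic_rotate(0.5)
print(c, s)
```

In hardware, the multiplications by 2^-i become wire shifts, so each iteration costs only adders, and the same datapath can serve other modes by changing the direction rule and the constant table.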
ARTICLE | doi:10.20944/preprints202307.0374.v1
Subject: Computer Science And Mathematics, Hardware And Architecture Keywords: Floating point, hybrid floating point, FPGA, numerical precision, accuracy, mixed-precision
Online: 6 July 2023 (07:54:48 CEST)
Nowadays, there are devices designed to perform massive computations while saving resources and reducing the latency of arithmetic operations. These devices are usually GPUs, FPGAs, and other specialised devices such as "Coral". Neural networks, digital filters, and numerical simulators take advantage of the massively parallel operations of such devices. One way to reduce the amount of resources used is to limit the size of the registers that store data. This has led to the proliferation of numeric formats with a length of less than 32 bits, known as short floating point or SFP. We have developed several SFPs for use in our neural network accelerator design, allowing for different levels of accuracy. We use a 16-bit format for data transfer, and different formats can be used simultaneously for internal operations. The internal operations can be performed in 16-bit, 20-bit, and 24-bit formats. The use of registers larger than 16 bits allows the preservation of fractional information while increasing precision. By leveraging some of the FPGA’s arithmetic resources, our design outperforms designs implemented from scratch and is competitive with specialized arithmetic circuits already implemented in the FPGA.
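The effect of a shorter significand can be seen with a small experiment. The sketch below rounds a value to a chosen number of significand bits and prints the resulting error. The bit widths are hypothetical and are not the field layouts of the authors' 16-, 20-, and 24-bit formats, which the abstract does not specify; exponent range limits are also ignored.

```python
import math

def quantize_significand(value, significand_bits):
    """Round `value` to the given number of significand bits (round to nearest).

    Only the precision is modelled; exponent range and special values
    are not. The bit widths used below are illustrative assumptions.
    """
    if value == 0.0:
        return 0.0
    mantissa, exponent = math.frexp(value)   # value = mantissa * 2**exponent
    scale = 1 << significand_bits
    return math.ldexp(round(mantissa * scale) / scale, exponent)

for bits in (8, 12, 16):
    approx = quantize_significand(math.pi, bits)
    print(bits, approx, abs(approx - math.pi))
```

Keeping a few extra bits in internal registers reduces the rounding error carried through accumulations, while the narrower transfer format saves bandwidth and storage.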
ARTICLE | doi:10.20944/preprints201907.0011.v1
Subject: Engineering, Telecommunications Keywords: LTE, LTE-A, 4G, PRACH, NCO, time-domain frequency shift, FPGA
Online: 1 July 2019 (11:52:58 CEST)
The Physical Random Access Channel (PRACH) plays an important role in LTE and LTE-A systems. It is through the PRACH channel that the user equipment (UE), based on the eNodeB's timing estimates, aligns its uplink transmissions to the eNodeB's uplink and gains access to the network. One of the initial operations executed by the PRACH receiver at the eNodeB side is the translation of the PRACH signal back to baseband, i.e., centering the PRACH signal around DC. This operation is a necessary step for preamble detection and can be carried out through a time-domain frequency shift operation. Therefore, in this paper we present the hardware architecture and implementation details of a configurable and optimized FPGA-based time-domain frequency shifter. It is a hardware-efficient and accurate architecture for converting the relevant received PRACH signal to baseband before further signal processing. The architecture is mainly based on a customized Numerically Controlled Oscillator (NCO), which is used for generating complex exponentials employing only adders, a Look-Up Table (LUT), and plain logic resources. The main advantage of the proposed hardware architecture is that it completely eliminates the need for storing a large number of long complex exponential sequences by employing a single LUT and exploiting the quarter-wave symmetry of the basis waveform. Our simulation results show that the proposed customized NCO architecture provides signals with high Spurious Free Dynamic Range (SFDR) using a minimal amount of FPGA resources. Moreover, the proposed architecture exhibits spur suppression ranging from 62.13 to 153.58 dB without using Taylor series correction.
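The NCO idea can be sketched in software: a phase accumulator is advanced by a frequency control word every sample, and its top bits address a table that stores only a quarter of a sine period, with the other three quadrants obtained by mirroring and negation. The accumulator width, the table size, and the half-sample table offset are assumptions chosen for this illustration; they are not the parameters of the proposed architecture.

```python
import math

ACC_BITS = 32          # phase accumulator width (assumed)
LUT_BITS = 10          # quarter-wave table address width (assumed)
N = 1 << LUT_BITS
# Quarter-wave sine table with a half-sample offset so the mirror is exact.
QUARTER_SINE = [math.sin((i + 0.5) * math.pi / (2 * N)) for i in range(N)]

def lut_sin(phase_index):
    """Sine for a (LUT_BITS + 2)-bit phase index using quarter-wave symmetry."""
    quadrant = (phase_index >> LUT_BITS) & 3
    i = phase_index & (N - 1)
    if quadrant & 1:               # quadrants 1 and 3 read the table backwards
        i = N - 1 - i
    value = QUARTER_SINE[i]
    return -value if quadrant & 2 else value

def nco(freq_hz, sample_rate_hz, n_samples):
    """Return n_samples approximating exp(-j*2*pi*freq*n/sample_rate)."""
    mask = (1 << ACC_BITS) - 1
    fcw = int(round(freq_hz / sample_rate_hz * (1 << ACC_BITS))) & mask
    acc = 0
    out = []
    for _ in range(n_samples):
        idx = acc >> (ACC_BITS - LUT_BITS - 2)
        cos_v = lut_sin((idx + N) & (4 * N - 1))   # cosine: a quarter turn ahead
        out.append(complex(cos_v, -lut_sin(idx)))
        acc = (acc + fcw) & mask
    return out

def shift_to_baseband(samples, freq_hz, sample_rate_hz):
    """Multiply the input by the NCO output to move freq_hz down to DC."""
    return [s * w for s, w in zip(samples, nco(freq_hz, sample_rate_hz, len(samples)))]
```

A single table of 1024 entries covers any shift frequency, because the frequency is set by the control word rather than by stored sequences.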
ARTICLE | doi:10.20944/preprints202311.1169.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: Architecture design, FPGA, IEC 61131-3, low-latency, PLC, System-on-Chip.
Online: 21 November 2023 (03:53:06 CET)
This study presents the design and implementation of a PLC microprocessor adhering to the IEC-61131-3 standard, executed on a Cyclone-V FPGA using a DE10-NANO development board. Our microprocessor optimizes the central processing unit by streamlining the data path, achieving a simulated response time of approximately 60 ns, equivalent to three clock cycles at a 50 MHz frequency, for Boolean operations. To substantiate our approach, we conducted practical experiments utilizing a FESTO conveyor station, employing relays as actuators, and incorporating optical and inductive sensors. The results underscore the feasibility of our proposed approach and serve as practical validation of its efficacy. This work introduces a promising avenue for the development of cost-effective PLCs employing SoC FPGA variants. Additionally, a thorough comparison of execution times with other previously reported architectures is provided. Our microprocessor outperforms even well-established PLCs like the S7-312, with substantial reductions in execution times of 94.54% for floating-point operations, 71.42% to 93.33% for word operations, and up to 78.57% for bit operations.
COMMUNICATION | doi:10.20944/preprints202305.0746.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: field programmable gate array (FPGA); hardware implementation; real-time system; action clustering
Online: 10 May 2023 (11:07:29 CEST)
Video behavior recognition often needs to focus on object motion processes. In this work, a self-organizing computational system oriented to behavioral clustering recognition is proposed, which extracts motion change patterns by binary encoding and summarizes motion patterns using a similarity comparison algorithm. To handle unknown behavioral video data, a self-organizing structure with layer-by-layer accuracy progression summarizes motion laws using a multi-layer agent design approach. Finally, the real-time feasibility is verified in the prototype system using real scenes, providing a new feasible solution for unsupervised behavior recognition and space-time scenes.
ARTICLE | doi:10.20944/preprints201804.0129.v1
Subject: Engineering, Control And Systems Engineering Keywords: Topological-entropy; Chaos; Fractional-order; Computer-assisted proof; Topological Horseshoe Analysis; FPGA
Online: 10 April 2018 (15:41:39 CEST)
This paper first discusses a fractional-order Liu system of order as low as 2.7 and shows its chaotic characteristics by carrying out numerical simulations such as Lyapunov exponents, bifurcation diagrams and phase portraits. Then, by using the topological horseshoe theory and computer-assisted proof, the existence of chaos in the system is verified theoretically. Finally, the fractional-order system is implemented on a Field Programmable Gate Array (FPGA) and the results obtained show that the fractional-order Liu system is indeed chaotic.
ARTICLE | doi:10.20944/preprints202305.1810.v1
Subject: Computer Science And Mathematics, Hardware And Architecture Keywords: RISC-V; PULPino; NDIR CO2 sensors; FPGA; energy efficiency; signal demodulation; power consumption
Online: 25 May 2023 (11:42:42 CEST)
In the field of embedded systems, energy efficiency is a critical requirement, particularly for battery-powered devices. RISC-V processors have gained popularity due to their flexibility and open-source nature, making them an attractive choice for embedded applications. However, not all RISC-V processors are equally energy-efficient, and it is important to evaluate their performance in specific use cases. This paper evaluates the energy consumption and resource utilization of a new RISC-V processor, RisCO2, and four existing processors - Zero-riscy, Micro-riscy, Ri5cy, and CV32E40P - in a signal demodulation application for NDIR CO2 sensors. The processors were implemented in the PULPino SoC and synthesized using Vivado IDE. The processor named RisCO2 is based on the RV32E_Zfinx instruction set and was designed from scratch by the authors specifically for low-power signal processing applications such as signal demodulation in CO2 NDIR sensors. The other processors are Ri5cy, Micro-riscy, and Zero-riscy, developed by the PULP Platform team, and CV32E40P (derived from Ri5cy) from the OpenHW Group, all of them widely used in the RISC-V community. Our experiments showed that RisCO2 had the lowest energy consumption among the five processors, with a 53.5% reduction in energy consumption compared to CV32E40P and a 94.8% reduction compared to Micro-riscy. Additionally, RisCO2 had the lowest FPGA resource utilization compared to the best-performing processors, CV32E40P and Ri5cy, with a 46.1% and a 59% reduction in LUTs, respectively. Our findings suggest that RisCO2 is a highly energy-efficient RISC-V processor for NDIR CO2 sensors that require signal demodulation. The results also highlight the importance of evaluating processors in specific use cases to identify the most energy-efficient option. This paper provides valuable insights for designers of energy-efficient embedded systems using RISC-V processors.
CONCEPT PAPER | doi:10.20944/preprints202204.0129.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Digital Design; Digital Architecture; Image Processing; Machine learning; FPGA; Dedicated Design; Image Processor
Online: 14 April 2022 (05:09:47 CEST)
Many dedicated designs for real-time operations provide functionality on fixed-sized operators, but where speed, scalability, and flexibility are required, extensive research is demanded. Dedicated designs can provide real-time processing for many applications. This paper presents an FPGA-based design of a general image processor. The proposed design is based on a fixed-point representation of binary numbers. It provides a mechanism to manage matrices on-chip along with matrix arithmetic. The matrices are represented with simple identifiers and microinstructions that assist in the computation of many operations which are useful for solving complex problems. The design was successfully implemented and tested using VHDL. The proposed design is an efficient architecture as a standalone processor with all the embedded computational resources necessary for an embedded image processing application.
ARTICLE | doi:10.20944/preprints202101.0202.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: Stochastic Logic; Chaotic Systems; Approximate Computing; Shimizu-Morioka System; Chaotic Circuits; FPGA Implementation
Online: 11 January 2021 (14:56:15 CET)
An exploding demand for processing capabilities related to the emergence of the IoT, AI and big data, has led to the quest for increasingly efficient ways to expeditiously process the rapidly increasing amount of data. These ways include different approaches like improved devices capable of going further in the more Moore path, but also new devices and architectures capable of going beyond Moore and getting more than Moore. Among the solutions being proposed, Stochastic Computing has positioned itself as a very reasonable alternative for low-power, low-area, low-speed, and adjustable precision calculations; four key-points beneficial to edge computing. On the other hand, chaotic circuits and systems appear to be an attractive solution for (low-power, green) secure data transmission in the frame of edge computing and IoT in general. Classical implementations of this class of circuits require intensive and precise calculations. This paper discusses the use of the SC framework for the implementation of nonlinear systems, showing that it can provide results comparable to those of classical integration, with much simpler hardware, paving the way for relevant applications.
ARTICLE | doi:10.20944/preprints202008.0603.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: secure boot; cyber-physical system security; embedded systems; FPGA; hardware primitives; IoT security
Online: 27 August 2020 (08:49:02 CEST)
Reconfigurable computing is becoming ubiquitous in the form of consumer-based Internet of Things (IoT) devices. Reconfigurable computing architectures have found their place in safety-critical infrastructures such as the automotive industry. As the target architecture evolves, it also needs to be updated remotely on the target platform. This process is susceptible to remote hijacking, where the attacker can maliciously update the reconfigurable hardware target with a tainted hardware configuration. This paper proposes an architecture for establishing a Root of Trust at the hardware level using cryptographic co-processors and Trusted Platform Modules (TPMs) and for enabling over-the-air updates. The proposed framework implements a secure boot protocol on Xilinx-based FPGAs. The project demonstrates the configuration of the bitstream, boot process integration with the TPM and secure over-the-air updates for the hardware reconfiguration.
ARTICLE | doi:10.20944/preprints201807.0444.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: staircase waveform; harmonics control; field programmable gate array (FPGA); estimation; iterative technique; VHDL
Online: 24 July 2018 (06:07:17 CEST)
A small number of switching transitions is desirable in high-power, medium-voltage applications of power converters. Selective harmonics elimination (SHE) pulse width modulation offers a better quality waveform with fewer switching transitions and hence lower switching losses. SHE is a pre-programmed modulation technique in which a certain number of lower-order harmonics are removed and the fundamental voltage is controlled. Fourier analysis of the output waveform yields a set of nonlinear transcendental equations that exhibits multiple, unique or no solutions in different ranges of the modulation index (MI). In this paper, an iterative method based on the Jacobian estimate is proposed to solve a highly non-linear set of SHE equations. The proposed technique is easy to implement and can solve a large number of such equations, as the Jacobian matrix in each subsequent iteration is estimated from the previous values. Moreover, the proposed method also removes the singularity problem, especially for large sets of SHE equations. High accuracy in the initial guess is not essential for this method, which can converge to the solution from any random initial guess. Computational and simulation results are given to validate the concept. A hardware result is provided to confirm the computational and simulation results.
ARTICLE | doi:10.20944/preprints202201.0399.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: Shared Autonomous Vehicle (SAV); Field-Programmable Gate Array (FPGA); Microphone Array; Sound Source Localization
Online: 26 January 2022 (13:07:39 CET)
With the current technological transformation in the automotive industry, autonomous vehicles are getting closer to the Society of Automotive Engineers (SAE) automation level 5. This level corresponds to full vehicle automation, where the driving system autonomously monitors and navigates the environment. With SAE level 5, the concept of a Shared Autonomous Vehicle (SAV) will soon become a reality and mainstream. The main purpose of an SAV is to allow unrelated passengers to share an autonomous vehicle without a driver/moderator inside the shared space. However, to ensure their safety and well-being until they reach their final destination, active monitoring of all passengers is required. In this context, this article presents a microphone-based sensor system that is able to localize sound events inside an SAV. The solution is composed of a Micro-Electro-Mechanical System (MEMS) microphone array with a circular geometry connected to an embedded processing platform that resorts to Field-Programmable Gate Array (FPGA) technology to process the sound localization algorithms in hardware.
ARTICLE | doi:10.20944/preprints202101.0250.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: elliptic curves cryptography (ECC); high speed implementation; unified; Montgomery multiplication; field-programmable gate array (FPGA)
Online: 13 January 2021 (13:03:15 CET)
In this paper, we present a high-speed, unified elliptic curve cryptography (ECC) processor for arbitrary Weierstrass curves over GF(p), which, to the best of our knowledge, outperforms other similar works in terms of execution time. Our approach employs a combination of the schoolbook long and Karatsuba multiplication algorithms for the elliptic curve point multiplication (ECPM) to achieve better parallelization while retaining low complexity. In the hardware implementation, the substantial gain in speed is also contributed by our n-bit pipelined Montgomery Modular Multiplier (pMMM), which is constructed from our n-bit pipelined multiplier-accumulators that utilize DSP primitives as digit multipliers. Additionally, we introduce our unified, pipelined modular adder/subtractor (pMAS) for the underlying field arithmetic, and leverage a more efficient yet compact scheduling of the Montgomery ladder algorithm. The implementation on the 7-series FPGAs Virtex-7, Kintex-7, and XC7Z020 yields 0.139, 0.138, and 0.206 ms of execution time, respectively. Furthermore, since our pMMM module is generic for any curve in Weierstrass form, we support multi-curve parameters, resulting in a unified ECC architecture. Lastly, our method also works in constant time, making it suitable for applications requiring high speed and SCA-resistant characteristics.
ARTICLE | doi:10.20944/preprints201809.0550.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: synthetic aperture radar (SAR); real-time processing; single FPGA node imaging processing; multi-node parallel accelerating technique
Online: 27 September 2018 (15:14:49 CEST)
With the development of satellite load technology and very-large-scale integrated (VLSI) circuit technology, on-board real-time synthetic aperture radar (SAR) imaging systems have facilitated rapid response to disasters. Limited by severe size, weight, and power consumption constraints, a key challenge of on-board SAR imaging system design is to achieve high real-time processing performance. In addition, with the rise of multi-mode SAR applications, the reconfiguration of the on-board processing system is beginning to receive widespread attention. This paper presents a multi-mode SAR imaging chip with an SoC architecture based on reconfigurable double-operation engines and a multilayer switching network. We decompose the commonly used extend chirp scaling (CS) SAR imaging algorithm into 8 types of double-operation engines according to the computing orders, and design a three-level switching network to connect these engines for data transition. The CPU is responsible for engine scheduling based on data flow driven with instructions to implement each part of the CS algorithm. Thus, multi-mode floating-point SAR imaging processing can be integrated into a single Application-Specific Integrated Circuit (ASIC) chip instead of relying on distributed technologies. As a proof of concept, a prototype measurement system with a chip-included board is implemented, and the performance of the proposed design is demonstrated on Chinese Gaofen-3 stripmap continuous imaging. To process the SAR raw data, a chip requires 9.2 s, 50.6 s and 7.4 s for a stripmap with 16,384×16,384 granularity, a multi-channel stripmap with 65,536×8192 granularity and a multi-channel scan mode with 32,768×4096 granularity, respectively, and the system hardware consumes 6.9 W.
ARTICLE | doi:10.20944/preprints202308.2137.v1
Subject: Computer Science And Mathematics, Security Systems Keywords: elliptic curve cryptography; affine, projective, and Jacobian coordinates; modular multiplication; hardware security module; Verilog HDL; FPGA; cost/performance evaluation
Online: 31 August 2023 (09:35:06 CEST)
Elliptic curve cryptography (ECC) over prime fields relies on scalar point multiplication realized by point addition and point doubling. Point addition and point doubling operations consist of many modular multiplications of large operands (256 bits for example), especially in projective and Jacobian coordinates which eliminate the modular inversion required in affine coordinates for every point addition or point doubling operation. Accelerating modular multiplication is therefore important for high-performance ECC. This paper presents the hardware implementations of modular multiplication algorithms, including 1) Interleaved modular multiplication (IMM), 2) Montgomery modular multiplication (MMM), 3) Shift-sub modular multiplication (SSMM), 4) SSMM with advance preparation (SSMMPRE), and 5) SSMM with CSAs and sign detection (SSMMCSA) algorithms, and evaluates their execution time (the number of clock cycles and clock frequency) and required hardware resources (ALMs and registers). Experimental results show that SSMM is 1.76 times faster than IMM, and SSMMCSA is 3.21 times faster than IMM. We also present the ECC hardware implementations based on the Secp256k1 protocol in affine, projective, and Jacobian coordinates using the IMM, SSMM, SSMMPRE, and SSMMCSA algorithms, and investigate their cost and performance. Our ECC implementations can be applied to the design of hardware security module systems.
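As an illustration of the first algorithm listed in this abstract, the following is a minimal software sketch of interleaved modular multiplication as it is commonly described: the multiplier is scanned bit by bit from the most significant end, and the running result is doubled, conditionally incremented and reduced in every step. The function name `interleaved_modmul` and the `bits` parameter are assumptions for illustration, and the sketch does not reproduce the paper's hardware architecture or its shift-sub variants.

```python
def interleaved_modmul(a: int, b: int, m: int, bits: int = 256) -> int:
    """Compute (a * b) mod m by interleaving shift, add and reduce steps.

    Assumes 0 <= a, b < m and a < 2**bits.
    """
    r = 0
    for i in reversed(range(bits)):   # scan a from the most significant bit
        r <<= 1                       # r = 2r, so r < 2m
        if (a >> i) & 1:
            r += b                    # r < 3m after the conditional add
        if r >= m:                    # at most two subtractions bring r below m
            r -= m
        if r >= m:
            r -= m
    return r


# Usage with the secp256k1 field prime p = 2^256 - 2^32 - 977.
P = 2**256 - 2**32 - 977
x = 0x1234567890ABCDEF1234567890ABCDEF1234567890ABCDEF1234567890ABCDEF
y = P - 5
product = interleaved_modmul(x, y, P)
```

Each loop iteration uses only a shift, an addition and two compare-and-subtract steps, which is why the cycle count scales with the operand width.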
ARTICLE | doi:10.20944/preprints202310.1472.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: 5G NR standard, Error correcting codes, Low-density parity check codes (LDPC), Stochastic decoding (SD), Field-Programmable Gate Array (FPGA).
Online: 24 October 2023 (08:12:11 CEST)
Iterative stochastic decoding is an alternative to standard fixed-point decoding of low-density parity check (LDPC) codes that reduces inter-node routing. In this paper, we propose a flexible Field-Programmable Gate Array (FPGA)-based stochastic decoding (SD) hardware architecture for LDPC codes in the Fifth-Generation (5G) New Radio (NR) standard that supports decoding of a set of various code rates. This decoder's runtime flexibility is desirable for switching automatically to a better performing code rate based on the channel conditions, without the extra time needed for reprogramming the FPGA. An offline design method is implemented to generate the hardware description language (HDL) code description of the decoder for the required code-rate set, which is further synthesized and integrated into the Xilinx Kintex-7 series FPGA board to determine the hardware resource utilisation and processing throughput. The Synopsys design tools were employed during both the simulation and synthesis stages, in combination with TSMC 65-nm CMOS standard cell technology, to facilitate comparative analysis. Compared with state-of-the-art designs, the proposed architecture reduces hardware utilization by up to 26% and improves energy efficiency by 52%.
ARTICLE | doi:10.20944/preprints202308.0825.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: max-pooling; convolutional neural network (CNN); FPGA; rank tracking based max-pooling (RTB-MAXP); cascaded maximum based max-pooling (CMB-MAXP)
Online: 10 August 2023 (05:48:50 CEST)
This paper proposes two max-pooling engines, named the RTB-MAXP engine and the CMB-MAXP engine, with a scalable window size parameter for FPGA-based convolutional neural network (CNN) implementation. The max-pooling operation for the CNN can be decomposed into two stages, i.e., a horizontal axis max-pooling operation and a vertical axis max-pooling operation. These two one-dimensional max-pooling operations are performed by tracking the rank of the values within the window in the RTB-MAXP engine and by cascading the maximum operations of the values in the CMB-MAXP engine. Both the RTB-MAXP engine and the CMB-MAXP engine were implemented using VHSIC Hardware Description Language (VHDL) and verified by simulations. They have been employed and tested in our CNN accelerator targeting the CNN model YOLOv4-CSP-S-Leaky for object detection.
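The two-stage decomposition described in this abstract can be sketched in software as follows. This only illustrates the separability of max-pooling (a horizontal 1D pass followed by a vertical 1D pass); it does not model the rank-tracking or cascaded-maximum engines themselves, and the function names `max_pool_1d` and `max_pool_2d_separable` are assumptions for illustration.

```python
def max_pool_1d(values, window, stride):
    """Slide a window along one axis and keep the maximum of each position."""
    return [max(values[i:i + window])
            for i in range(0, len(values) - window + 1, stride)]


def max_pool_2d_separable(image, window, stride):
    """2D max-pooling done as a horizontal pass followed by a vertical pass."""
    # Stage 1: horizontal axis max-pooling, applied to every row.
    horizontal = [max_pool_1d(row, window, stride) for row in image]
    # Stage 2: vertical axis max-pooling, applied to every column of stage 1.
    pooled_columns = [max_pool_1d(list(col), window, stride)
                      for col in zip(*horizontal)]
    # Transpose back to row-major order.
    return [list(row) for row in zip(*pooled_columns)]


# Usage: 2x2 pooling with stride 2 on a 4x4 image.
image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
pooled = max_pool_2d_separable(image, window=2, stride=2)
# pooled == [[6, 8], [14, 16]]
```

Because the maximum over a rectangular window equals the maximum of the per-row maxima, the two 1D passes give the same result as a direct 2D window scan.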
ARTICLE | doi:10.20944/preprints201702.0050.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: Digital Lock-in Amplifier (DLIA); Field Programmable Gate Array (FPGA); Near Infrared Spectroscopy (NIRS); Hardware Description Language (HDL); Light Emitting Diode (LED); Silicon Photomultiplier (SiPM); Microprocessors
Online: 14 February 2017 (09:11:38 CET)
Functional Near Infrared Spectroscopy (fNIRS) systems for e-health applications usually suffer from poor signal detection, mainly due to a low end-to-end signal-to-noise ratio of the electronics chain. Lock-In Amplifiers (LIA) historically represent a powerful technique for improving performance in such circumstances. In this work, a digital LIA system based on a Zynq® Field Programmable Gate Array (FPGA) has been designed and implemented to explore whether this technique might improve fNIRS system performance. More broadly, the flexibility of the FPGA-based solution has been investigated, with particular emphasis on the digital filter parameters needed in the digital LIA, and its impact on the final signal detection and noise rejection capability has been evaluated. The realized architecture is a mixed solution combining VHDL hardware modules and software modules running within a softcore microprocessor. Experimental results demonstrate the quality of the proposed solutions, and comparative details among the different implementations are given. Finally, a key aspect taken into account throughout the design was its modularity, allowing an easy increase in the number of input channels while avoiding growth in the design cost of the electronics system.
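The general principle of a digital lock-in amplifier can be sketched as below. This is a textbook-style illustration, not the Zynq design from the abstract: the function name `lock_in`, the sample rate, the reference frequency and the use of a plain average as the low-pass filter are all assumptions. The input is multiplied by in-phase and quadrature references at the modulation frequency, and the products are low-pass filtered to recover the amplitude and phase of the component at that frequency.

```python
import math


def lock_in(samples, f_ref, f_s):
    """Estimate amplitude and phase of the f_ref component of samples.

    Uses a plain average as the low-pass filter, which is exact when the
    record spans an integer number of reference periods.
    """
    i_sum = 0.0
    q_sum = 0.0
    for k, x in enumerate(samples):
        theta = 2 * math.pi * f_ref * k / f_s
        i_sum += x * math.sin(theta)   # in-phase mixer
        q_sum += x * math.cos(theta)   # quadrature mixer
    i_avg = i_sum / len(samples)       # low-pass filter: mean of the products
    q_avg = q_sum / len(samples)
    amplitude = 2 * math.hypot(i_avg, q_avg)
    phase = math.atan2(q_avg, i_avg)
    return amplitude, phase


# Usage: a weak 50 Hz component buried under a DC offset and a 120 Hz interferer.
F_S, F_REF, NUM = 1000.0, 50.0, 1000
signal = [
    0.3 * math.sin(2 * math.pi * F_REF * k / F_S + 0.7)   # component of interest
    + 2.0 * math.sin(2 * math.pi * 120.0 * k / F_S)       # interferer
    + 1.0                                                  # DC offset
    for k in range(NUM)
]
amplitude, phase = lock_in(signal, F_REF, F_S)
```

In a real design the averaging step would be replaced by a configurable digital filter, whose bandwidth trades settling time against noise rejection.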