Plücker Conoid-Inspired Geometry for Wave-Based Computing Systems
Arturo Tozzi
Posted: 18 April 2025
Adaptive NVM Word Compression Based on Cache Line Dynamics on Micro-Architecture
Jialin Wang,
Zhen Yang,
Zhenghao Yin,
Yajuan Du
Posted: 15 April 2025
A Survey on Advancements in Scheduling Techniques for Efficient Deep Learning Computations on GPUs
Rupinder Kaur,
Arghavan Asad,
Seham Al Abdul Wahid,
Farah Mohammadi
Posted: 20 February 2025
Benchmarking Hyper-Breakpoints for Efficient Virtual Machine Introspection
Lukas Beierlieb,
Alexander Schmitz,
Christian Dietrich,
Raphael Springer,
Lukas Iffländer
Posted: 03 January 2025
Object Detection Post-Processing Accelerator Based on Co-Design of Hardware and Software
Dengtian Yang,
Lan Chen,
Xiaoran Hao,
Mao Ni,
Ming Chen,
Yiheng Zhang
Deep learning has significantly advanced object detection. Post-processing, a critical component of the detection pipeline, selects valid bounding boxes to represent true targets during inference and assigns boxes and labels to these objects during training to optimize the loss function. However, post-processing constitutes a substantial portion of the total processing time for a single image. This inefficiency primarily arises from the extensive Intersection over Union (IoU) calculations required between numerous redundant bounding boxes in post-processing algorithms. To reduce these redundant IoU calculations, we introduce a classification prioritization strategy in both the training and inference post-processing stages. Additionally, post-processing involves sorting operations that contribute to inefficiency. To minimize unnecessary comparisons in Top-K sorting, we improve the bitonic sorter by developing a hybrid bitonic algorithm. These improvements effectively accelerate post-processing. Given the similarities between training and inference post-processing, we unify four typical post-processing algorithms and design a hardware accelerator based on this unified framework. Our accelerator achieves at least 7.55 times the speed of recent accelerators in inference post-processing. Compared to an RTX 2080 Ti system, the proposed accelerator is at least 21.93 times faster for training post-processing and 19.89 times faster for inference post-processing, thereby significantly enhancing the efficiency of loss function minimization.
Posted: 05 December 2024
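The abstract above describes two optimizations: pruning candidate boxes by classification score before any pairwise IoU work, and a Top-K selection stage ahead of suppression. The Python sketch below illustrates one plausible reading of that flow; the function names (iou, nms_class_first), the thresholds, and the plain sort standing in for the hybrid bitonic sorter are illustrative assumptions, not the authors' implementation.

# Minimal sketch of classification-prioritized NMS, assuming "classification
# prioritization" means ranking boxes by class score and discarding
# low-confidence candidates before any pairwise IoU computation.
# Illustrative interpretation only, not the paper's exact algorithm.

def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms_class_first(boxes, scores, score_thr=0.3, iou_thr=0.5, top_k=100):
    """Score-first NMS: prune by classification score, keep Top-K, then run IoU."""
    # 1) Classification prioritization: drop boxes below the score threshold.
    cand = [(s, b) for s, b in zip(scores, boxes) if s >= score_thr]
    # 2) Top-K by score (a hybrid bitonic sorter would do this in hardware;
    #    a plain sort stands in for it here).
    cand.sort(key=lambda sb: sb[0], reverse=True)
    cand = cand[:top_k]
    # 3) Greedy suppression: IoU is only computed among the surviving boxes.
    kept = []
    for s, b in cand:
        if all(iou(b, kb) < iou_thr for _, kb in kept):
            kept.append((s, b))
    return kept

if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.2]
    print(nms_class_first(boxes, scores))  # two boxes survive

In this reading, the score threshold and Top-K cut shrink the candidate set before the quadratic suppression loop, which is where the redundant IoU calculations targeted by the abstract would otherwise accumulate.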
Dynamic Key Replacement Mechanism for Lightweight IoT Microcontrollers to Resist Side-channel Attacks
Chung-Wei Kuo,
Wei Wei,
Chun-Chang Lin,
Yu-Yi Hong,
Jia-Ruei Liu,
Kuo-Yu Tsai
Posted: 29 November 2024
Redfish API And vSphere Hypervisor API: A Unified Framework For Policy-Based Server Monitoring
Vedran Dakić,
Karlo Bertina,
Jasmin Redžepagić,
Damir Regvart
Posted: 17 November 2024
Challenges of the QWERTY Keyboard for Quechua Speakers in the Puno Region in Perú
Henry Juarez Vargas,
Roger Mijael Mansilla Huanacuni,
Fred Torres Cruz
Posted: 09 October 2024
Container Based Electronic Control Unit Virtualisation: A Paradigm Shift Towards a Centralised Automotive E/E Architecture
Nicholas Ayres,
Lipika Deka,
Daniel Paluszczyszyn
Posted: 21 August 2024
Designing a Scalable and Area-Efficient Hardware Accelerator Supporting Multiple PQC Schemes
Heonhui Jung,
Hyunyoung Oh
Posted: 01 July 2024
A Comprehensive Review on Processing-in-Memory Architectures for Deep Neural Networks
Rupinder Kaur,
Arghavan Asad,
Farahnaz Mohammadi
Posted: 21 June 2024
Building an Analog Circuit Synapse for Deep Learning Neuromorphic Processing
Alejandro Juarez-Lora,
Victor H. Ponce-Ponce,
Humberto Sossa-Azuela,
Osvaldo Espinosa-Sosa,
Elsa Rubio-Espino
Posted: 28 May 2024
Memristors in the Context of Security: A Brief Meta-Review of the State of the Art
Alexander Tekles,
Nico Mexis,
Stefan Katzenbeisser
Posted: 23 May 2024
Compact and Low-latency FPGA-based NTT Architecture for CRYSTALS Kyber Post-Quantum Cryptography Scheme
Binh Kieu-Do-Nguyen,
Nguyen The Binh,
Cuong Pham-Quoc,
Phuc Nghi Huynh,
Ngoc-Thinh Tran,
Trong-Thuc Hoang,
Cong-Kha Pham
Posted: 22 May 2024
FPGA-based Accelerator Method for Edge Computing
Peter Schulz,
Grigore Sleahtitchi
Posted: 10 May 2024
ECHO: Energy-Efficient Computation Harnessing Online Arithmetic – a MSDF-Based Accelerator for DNN Inference
Muhammad Sohail Ibrahim,
Muhammad Usman,
Jeong-A Lee
Posted: 08 April 2024
Enabling Efficient On-Edge Spiking Neural Network Acceleration with Highly Flexible FPGA Architectures
Samuel López-Asunción,
Pablo Ituero
Posted: 20 February 2024
A Hardware Implementation of the PID Algorithm Using Floating-Point Arithmetic
Józef Kulisz,
Filip Jokiel
Posted: 24 January 2024
PFA-TP3M: Permanent Fault Aware Two-Phase Peak-Power Management in Fault-Tolerant Multi-Core Systems
Pargol Hatefi,
Mohammad Salehi
Posted: 03 January 2024
A Hardware Realization Framework for Fuzzy Inference System Optimization
Saeid Gorgin,
Mohammad Sina Karvandi,
Somaye Moghari,
Mohammad K Fallah,
Jeong-A Lee
Posted: 20 December 2023