AgroNova: An Autonomous IoT Platform for Greenhouse Climate Control
Borislav Toskov, Asya Toskova
Posted: 13 February 2026
Transformer Algorithmics: A Tutorial on Efficient Implementation of Transformers on Hardware
Christoforos Kachris
Posted: 11 February 2026
Prompt Sensitivity and Bias Amplification in Aligned Video Diffusion Models
Marco Rossi, Giulia Bianchi, Alessandro Conti
Posted: 27 January 2026
Cross-Modal Bias Transfer in Aligned Video Diffusion Models
Yuki Nakamura, Kenji Sato, Ayaka Suzuki, Hiroshi Tanaka
Posted: 27 January 2026
Reliability–Latency Co-Optimization in Parallel Register Array Frameworks Under Fault Injection
Jun Wei, Li Ming, Wei Zhang
Posted: 26 January 2026
The Spike Processing Unit (SPU): An IIR Filter Approach to Hardware-Efficient Spiking Neurons
Hugo Puertas de Araújo
Posted: 14 January 2026
A Review of Floating-Point Arithmetic Algorithms Using Taylor Series Expansion and Mantissa Region Division Techniques
Jianglin Wei, Haruo Kobayashi
Posted: 05 January 2026
A Unified GF(4)–Symplectic Framework for Quantum Error Correction: A Constructive, Pedagogical Derivation of the Steane [[7,1,3]] Code
Amir Hameed Mir
Posted: 04 December 2025
ESDM–SMTJ: An Entropic Semantic Dynamics Model for Classical Probabilistic Hardware with Superparamagnetic Tunnel Junctions
Ezequiel Lapilover
Posted: 02 December 2025
Design of an Energy-Efficient SHA-3 Accelerator on Artix-7 FPGA for Secure Network Applications
Abdulmunem A. Abdulsamad, Sándor R. Répás
Posted: 28 November 2025
Large Pages, Large Leaks? Hugepage-Induced Side-Channels vs. Performance Improvements in Cryptographic Computations
Xinyao Li, Akhilesh Tyagi
Posted: 20 November 2025
Cooling, Placement, and Virtualization for Sustainability
Pedro Ramos Brandao
Posted: 18 August 2025
An Open Chisel-Based Framework for Hardware Acceleration on High-Performance FPGA Cards
Robin Gay, Tarek Ould-Bachir
Posted: 13 August 2025
Near-Optimal Multirun March Memory Tests for Neighborhood Pattern-Sensitive Faults in Random-Access Memories
Petru Cascaval, Doina Cascaval
Posted: 09 July 2025
Extending a Moldable Computer Architecture to Accelerate DL Inference on FPGA
Mirko Mariotti, Giulio Bianchini, Igor Neri, Daniele Spiga, Diego Ciangottini, Loriano Storchi
Posted: 27 May 2025
Plücker Conoid-Inspired Geometry for Wave-Based Computing Systems
Arturo Tozzi
Posted: 18 April 2025
Adaptive NVM Word Compression Based on Cache Line Dynamics on Micro-Architecture
Jialin Wang, Zhen Yang, Zhenghao Yin, Yajuan Du
Posted: 15 April 2025
A Survey on Advancements in Scheduling Techniques for Efficient Deep Learning Computations on GPUs
Rupinder Kaur, Arghavan Asad, Seham Al Abdul Wahid, Farah Mohammadi
Posted: 20 February 2025
Benchmarking Hyper-Breakpoints for Efficient Virtual Machine Introspection
Lukas Beierlieb, Alexander Schmitz, Christian Dietrich, Raphael Springer, Lukas Iffländer
Posted: 03 January 2025
Object Detection Post-Processing Accelerator Based on Co-Design of Hardware and Software
Dengtian Yang, Lan Chen, Xiaoran Hao, Mao Ni, Ming Chen, Yiheng Zhang
Deep learning has significantly advanced object detection. Post-processing, a critical component of the detection pipeline, selects valid bounding boxes to represent true targets during inference and assigns boxes and labels to those targets during training to optimize the loss function. However, post-processing accounts for a substantial portion of the total processing time for a single image. This inefficiency arises primarily from the extensive Intersection over Union (IoU) calculations required between the many redundant bounding boxes in post-processing algorithms. To reduce redundant IoU calculations, we introduce a classification prioritization strategy in both the training and inference post-processing stages. Post-processing also involves sorting operations that contribute to the inefficiency; to minimize unnecessary comparisons in Top-K sorting, we improve the bitonic sorter by developing a hybrid bitonic algorithm. Together, these improvements effectively accelerate post-processing. Given the similarities between training and inference post-processing, we unify four typical post-processing algorithms and design a hardware accelerator based on this unified framework. Our accelerator achieves at least 7.55 times the speed of recent accelerators for inference post-processing. Compared with an RTX 2080 Ti system, the proposed accelerator offers at least 21.93 times the speed for training post-processing and 19.89 times for inference post-processing, thereby significantly enhancing the efficiency of loss function minimization.
Posted: 05 December 2024
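The abstract above attributes most post-processing cost to IoU computations between redundant boxes, and proposes prioritizing classification so comparisons happen only within a class. A minimal Python sketch of that idea follows; the function names, box format `(x1, y1, x2, y2)`, and the 0.5 threshold are illustrative assumptions, not details taken from the paper, and this software version omits the hybrid bitonic Top-K sorter used in the hardware design.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def classify_first_nms(boxes, scores, labels, iou_thresh=0.5):
    """Classification-prioritized suppression sketch: partition boxes by
    predicted class first, so IoU is computed only within each class,
    avoiding cross-class comparisons entirely."""
    keep = []
    for cls in set(labels):
        # Indices of this class, highest score first.
        idxs = [i for i, lab in enumerate(labels) if lab == cls]
        idxs.sort(key=lambda i: scores[i], reverse=True)
        while idxs:
            best = idxs.pop(0)
            keep.append(best)
            # Drop boxes of the same class that overlap the kept one.
            idxs = [i for i in idxs if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep
```

Partitioning by class shrinks the candidate set each IoU pass sees, which is the same comparison-count reduction the accelerator exploits in hardware.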