Preprint
Article

This version is not peer-reviewed.

Low-Light Video Enhancement via Fast–Slow Dual Branches and Flow-Guided Attention

Submitted:

17 April 2026

Posted:

17 April 2026


Abstract
Low-light video enhancement aims to restore clear, color-faithful, and temporally consistent visual content from video sequences captured under extremely low signal-to-noise ratios and high dynamic range constraints. Existing multi-frame enhancement methods typically adopt uniform spatio-temporal sampling and feature extraction strategies for all frames, making it challenging to simultaneously achieve long-range temporal denoising and accurate fast-motion modeling. To address this trade-off, we propose a low-light video enhancement framework based on a Fast–Slow dual-branch architecture. The video signal is decomposed into two complementary feature streams: a Slow branch with sparse temporal sampling and high spatial resolution, built on a Vision Transformer backbone, which focuses on long-range temporal denoising and high-frequency texture restoration for static and slow-moving regions; and a Fast branch with dense temporal sampling and low spatial resolution, built on a ViT-Tiny backbone, which efficiently captures large-scale motion and rapid illumination changes. To mitigate the discrepancy in sampling rates and spatial resolutions between the two branches, we further introduce a flow branch based on a pre-trained StreamFlow model and design a Flow-Guided Cross-Attention (FGCA) module. FGCA first uses optical flow to geometrically modulate and progressively align Fast-branch features, and then injects the flow-enhanced Fast features into the Slow branch at each space-time location via lightweight pixel-wise cross-attention. This mechanism achieves a cascade of coarse geometric alignment and fine semantic fusion. 
Experiments on two real-world low-light video datasets, SDSD-indoor and SDSD-outdoor, demonstrate that our method consistently outperforms several representative approaches in terms of PSNR, SSIM, AB(Var), and MABD, while effectively suppressing motion blur and ghosting artifacts in dynamic night scenes, yielding temporally stable and perceptually pleasing results.
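As a rough illustration of the FGCA idea described above, the sketch below warps each densely sampled Fast-branch feature toward the Slow frame with optical flow (coarse geometric alignment), then fuses the warped features into the Slow feature via pixel-wise cross-attention with a residual connection (fine semantic fusion). This is a minimal NumPy sketch under stated assumptions, not the paper's implementation: all function names, shapes, and the single-head dot-product form are illustrative.

```python
import numpy as np

def flow_warp(feat, flow):
    """Backward-warp a feature map (C, H, W) with a flow field (2, H, W).

    flow[0]/flow[1] hold per-pixel x/y displacements; bilinear sampling
    with border clamping. This models the coarse geometric alignment step.
    """
    C, H, W = feat.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    x = np.clip(xs + flow[0], 0, W - 1)
    y = np.clip(ys + flow[1], 0, H - 1)
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    x1, y1 = np.clip(x0 + 1, 0, W - 1), np.clip(y0 + 1, 0, H - 1)
    wx, wy = x - x0, y - y0
    return ((1 - wy) * (1 - wx) * feat[:, y0, x0]
            + (1 - wy) * wx * feat[:, y0, x1]
            + wy * (1 - wx) * feat[:, y1, x0]
            + wy * wx * feat[:, y1, x1])

def fgca(slow, fast_stack, flows):
    """Flow-Guided Cross-Attention sketch.

    slow       : (C, H, W)    Slow-branch feature (query).
    fast_stack : (T, C, H, W) densely sampled Fast-branch features.
    flows      : (T, 2, H, W) flow from each fast frame to the slow frame.

    Each fast feature is first flow-warped, then a per-pixel softmax over
    the T warped features weights their contribution (keys/values); a
    residual connection preserves the Slow-branch content.
    """
    C, H, W = slow.shape
    warped = np.stack([flow_warp(f, fl) for f, fl in zip(fast_stack, flows)])
    logits = np.einsum("chw,tchw->thw", slow, warped) / np.sqrt(C)
    logits -= logits.max(axis=0, keepdims=True)      # numerical stability
    attn = np.exp(logits)
    attn /= attn.sum(axis=0, keepdims=True)          # softmax over T fast frames
    fused = np.einsum("thw,tchw->chw", attn, warped)
    return slow + fused
```

In a real model the queries, keys, and values would come from learned projections and the warp would operate on multi-scale features; the sketch keeps only the two-stage structure (warp, then pixel-wise attention) that the abstract describes.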
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.

Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.
