Accurate prediction of significant wave height (SWH) is essential for fisheries management, coastal socio-economic activities, and marine ecological conservation. In recent years, deep learning-based bias correction has shown considerable potential for improving numerical wave forecasts. However, many existing approaches are constrained by limited receptive fields and struggle to capture long-range spatiotemporal dependencies in wave forecast errors. To address this limitation, we adapt and extend a video prediction framework, the Vision Mamba Recurrent Neural Network (VMRNN), to model and correct the spatiotemporal patterns of SWH forecast biases. Comprehensive evaluations show that the proposed multi-channel VMRNN achieves consistently high accuracy across forecast lead times and sea-state conditions. Validated against reanalysis data, the model reduces the root mean square error (RMSE) of WAVEWATCH III forecasts by 28.2%, 26.1%, and 24.7% at lead times of 24, 48, and 72 hours, respectively. It also preserves the spatial structure of the SWH fields, with the structural similarity index (SSIM) remaining as high as 0.945 even at the 72-hour lead time. Regional assessments over high-wave areas further indicate that VMRNN effectively reduces both the mean error and the systematic overestimation common in numerical wave models. Additional validation against in-situ buoy observations confirms that the model robustly corrects systematic positive biases, particularly for wave heights between 0.5 m and 2 m. Taken together, these results indicate that VMRNN has strong spatiotemporal modeling capability and is a promising post-processing framework for improving operational physics-based wave forecasting systems.
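The percentages above can be read as relative RMSE reductions. The sketch below assumes the usual definition, 100 × (RMSE_raw − RMSE_corrected) / RMSE_raw, with both RMSEs computed against the same reference field. It does not reproduce the study's exact evaluation code. The helper names `rmse` and `rmse_reduction_pct` are hypothetical. The toy 2×2 fields are illustrative values, not data from the study, and the NaN cell stands in for a masked land point.

```python
import numpy as np


def rmse(pred: np.ndarray, ref: np.ndarray) -> float:
    """Root mean square error over all grid points; NaN cells (e.g. land) are ignored."""
    diff = pred - ref
    return float(np.sqrt(np.nanmean(diff ** 2)))


def rmse_reduction_pct(raw: np.ndarray, corrected: np.ndarray, ref: np.ndarray) -> float:
    """Relative RMSE reduction (%) of a bias-corrected forecast versus the raw forecast.

    Assumed definition: 100 * (RMSE_raw - RMSE_corrected) / RMSE_raw.
    """
    rmse_raw = rmse(raw, ref)
    rmse_cor = rmse(corrected, ref)
    return 100.0 * (rmse_raw - rmse_cor) / rmse_raw


# Hypothetical toy SWH fields in metres (not data from the study).
ref = np.array([[1.0, 2.0], [1.5, np.nan]])  # reference (e.g. reanalysis) field
raw = ref + 0.4        # raw forecast with a uniform +0.4 m overestimation
corrected = ref + 0.1  # corrected forecast with a residual +0.1 m bias

print(f"{rmse_reduction_pct(raw, corrected, ref):.1f}%")  # 75.0%
```

Reporting the reduction relative to the raw forecast's own RMSE is one convention among several. It makes scores comparable across lead times even when the baseline error grows with lead time.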