Video synthetic aperture radar (SAR) enables observation of moving targets by leveraging temporal information across successive frames. In particular, dynamic shadows in video SAR image sequences provide critical cues for detecting moving objects whose energy is smeared or Doppler-shifted. To achieve high-resolution imaging at a frame rate high enough for effective dynamic scene monitoring, video SAR systems typically operate at extremely high frequencies, or even in the terahertz band, rather than in the microwave band. However, terahertz video SAR suffers from significant signal attenuation due to atmospheric absorption. To address this limitation, we present a deep learning framework for high-frame-rate, high-resolution imaging with a microwave video SAR system. In this framework, microwave video SAR imaging is formulated as an image super-resolution reconstruction task applied to the low-resolution yet high-frame-rate image sequences that a microwave video SAR produces. We develop a simple yet effective super-resolution reconstruction network built entirely on convolutional neural networks. The network takes a low-resolution image sequence and the corresponding high-resolution image with blurred shadows as input, and produces a high-resolution image sequence in which the shadows are clearly visible. Furthermore, the network is trained in a self-supervised manner and thus does not require the desired high-resolution image sequences as ground truth, which is appealing for practical applications. Results on real data from two different video SAR systems demonstrate good performance of the proposed approach and its ability to generalize across systems.
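As an illustration of how training without high-resolution ground truth could be set up, the following is a minimal sketch of a consistency-style loss. It is an assumption made for exposition and not the objective used in this work: it supposes that each predicted high-resolution frame, once reduced in resolution, should match the corresponding low-resolution input frame, and that the temporal average of the predicted frames should match the high-resolution input image with blurred shadows. The names `box_downsample`, `temporal_mean`, `self_supervised_loss`, and the parameters `factor` and `weight` are all hypothetical, and block averaging is only a stand-in for the actual resolution loss of a short aperture.

```python
from typing import List

Image = List[List[float]]  # row-major 2-D intensity image


def box_downsample(img: Image, factor: int) -> Image:
    """Average non-overlapping factor x factor blocks (assumed stand-in for resolution loss)."""
    rows, cols = len(img) // factor, len(img[0]) // factor
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            block = [img[r * factor + i][c * factor + j]
                     for i in range(factor) for j in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def temporal_mean(frames: List[Image]) -> Image:
    """Pixel-wise average over all frames (assumed model of the blurred-shadow image)."""
    n = len(frames)
    height, width = len(frames[0]), len(frames[0][0])
    return [[sum(f[r][c] for f in frames) / n for c in range(width)]
            for r in range(height)]


def mse(a: Image, b: Image) -> float:
    """Mean squared error between two images of equal size."""
    diffs = [(x - y) ** 2 for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


def self_supervised_loss(pred_hr: List[Image], lr_frames: List[Image],
                         hr_blurred: Image, factor: int, weight: float = 1.0) -> float:
    """Hypothetical loss that uses only the network inputs as supervision."""
    # Term 1: each predicted frame, downsampled, should match its low-resolution input.
    frame_term = sum(mse(box_downsample(p, factor), l)
                     for p, l in zip(pred_hr, lr_frames)) / len(pred_hr)
    # Term 2: the average of the predicted frames should match the blurred high-resolution input.
    fusion_term = mse(temporal_mean(pred_hr), hr_blurred)
    return frame_term + weight * fusion_term
```

Under these assumptions, both terms are computed from the network inputs alone, so no ground-truth high-resolution sequence enters the loss. In practice the same terms would be expressed with differentiable tensor operations so that they can drive the training of a convolutional network.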