Water body extraction from remote sensing imagery is crucial for ecological environment monitoring and water resource management. However, the irregular and complex shapes of water bodies make it difficult for current approaches to capture fine-grained detail and preserve structural consistency. To overcome the fixed receptive fields and sampling schemes of traditional convolutional networks, this paper proposes UNet-LSCNet, an enhanced architecture based on U-Net. The proposed model integrates dynamic snake convolution (DSConv), the convolutional block attention module (CBAM), and a lightweight Vision Transformer (LaViT) to enable locally adaptive geometric modeling and contextually enriched semantic representation. Experimental results show that the proposed approach outperforms several widely used models. Specifically, UNet-LSCNet achieves an mIoU of 95.67% and an F1-score of 96.32% while maintaining a competitive inference speed of 4.18 frames per second (FPS). Furthermore, it is more stable in highly complex scenarios, such as slender meandering rivers and fragmented small-scale water bodies. Ablation experiments confirm that the modules contribute synergistically, showing that the proposed model improves segmentation accuracy and morphological robustness without compromising inference efficiency.
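For readers unfamiliar with the two reported metrics, the following minimal sketch shows one common way to compute mIoU and F1-score for binary water/background masks. It rests on assumptions: the abstract does not state the exact protocol, so this version averages IoU over the two classes, computes F1 for the water class only, and pools pixels from a single pair of masks. The function name `segmentation_metrics` and the toy masks are hypothetical and used only for illustration.

```python
from typing import Dict, Sequence


def segmentation_metrics(
    pred: Sequence[Sequence[int]], target: Sequence[Sequence[int]]
) -> Dict[str, float]:
    """Compute mIoU and F1 for binary masks (1 = water, 0 = background).

    Assumption: mIoU is the mean of the water IoU and the background IoU,
    and F1 is reported for the water class. A ratio with a zero
    denominator is returned as 0.0, which is an arbitrary convention.
    """
    tp = fp = fn = tn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == 1 and t == 1:
                tp += 1  # water predicted as water
            elif p == 1 and t == 0:
                fp += 1  # background predicted as water
            elif p == 0 and t == 1:
                fn += 1  # water predicted as background
            else:
                tn += 1  # background predicted as background

    def ratio(num: float, den: float) -> float:
        return num / den if den else 0.0

    iou_water = ratio(tp, tp + fp + fn)
    iou_background = ratio(tn, tn + fp + fn)
    miou = (iou_water + iou_background) / 2
    f1 = ratio(2 * tp, 2 * tp + fp + fn)
    return {"miou": miou, "f1": f1}


# Toy 2x3 masks, purely illustrative (not data from the paper).
pred_mask = [[1, 1, 0], [0, 1, 0]]
true_mask = [[1, 0, 0], [0, 1, 1]]
metrics = segmentation_metrics(pred_mask, true_mask)
print(f"mIoU = {metrics['miou']:.3f}, F1 = {metrics['f1']:.3f}")
# prints: mIoU = 0.500, F1 = 0.667
```

In the toy example there are 2 true positives, 1 false positive, 1 false negative, and 2 true negatives, so both class IoUs equal 2/4 and F1 equals 4/6. A dataset-level evaluation would typically accumulate these counts over all test images before taking the ratios.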