Unmanned aerial vehicle (UAV) photogrammetry allows the generation of orthophoto and digital surface model (DSM) rasters of terrain. However, DSMs of water bodies mapped with this technique often exhibit distortions of the water surface, impeding accurate sampling of water surface elevation (WSE) from the DSM. This study investigates the capability of deep neural networks to accommodate such perturbations and effectively estimate WSE from photogrammetric rasters. Convolutional neural networks (CNNs) were employed for this purpose. Two CNN-based regression approaches were explored: direct regression using an encoder, and a solution in which an encoder-decoder architecture predicts a weight mask that is subsequently used to sample values from the photogrammetric DSM. The dataset comprises data from five case studies of small lowland streams in Poland and Denmark, consisting of 322 DSM and orthophoto raster samples. Each sample corresponds to a 10 m by 10 m area of the stream channel and adjacent land. A grid search was employed to identify the optimal combination of encoder, mask-generation architecture, and batch size among multiple candidates. Solutions were evaluated using two cross-validation schemes: stratified k-fold cross-validation, in which the validation subsets maintain the same proportion of samples from all case studies, and leave-one-case-out cross-validation, in which the validation set originates entirely from a single case study and the training set consists of samples from the remaining case studies. The proposed solution was compared with existing UAV-based methods for measuring WSE in small streams. The results indicate that it outperforms previous photogrammetry-based methods and is second only to the radar-based method, which is considered the most accurate available. The solution also exhibits a high degree of explainability and generalization.
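
To make the weight-mask readout concrete, the sketch below illustrates how a mask predicted by an encoder-decoder could be used to sample a single WSE value from a DSM tile as a weighted average of its elevations. This is a minimal illustration, not the study's actual architecture: the toy `MaskNet` model, the four-channel orthophoto-plus-DSM input, the softmax normalization of the mask, and the weighted-mean readout are all assumptions made for the example.

```python
# Hypothetical sketch of a mask-weighted WSE readout (assumed interpretation of the
# encoder-decoder approach described above); architecture and shapes are illustrative only.
import torch
import torch.nn as nn


class MaskNet(nn.Module):
    """Toy encoder-decoder mapping an orthophoto+DSM stack to a per-pixel weight mask."""

    def __init__(self, in_channels: int = 4):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 2, stride=2),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        logits = self.decoder(self.encoder(x))
        # Normalize so the mask weights of each sample sum to one.
        return torch.softmax(logits.flatten(1), dim=1).view_as(logits)


def sample_wse(mask: torch.Tensor, dsm: torch.Tensor) -> torch.Tensor:
    """Weighted average of DSM elevations under the predicted mask (assumed readout)."""
    return (mask * dsm).flatten(1).sum(dim=1)


# Example: a batch of 2 samples, 3-band orthophoto + 1-band DSM on a 64x64 grid.
ortho = torch.rand(2, 3, 64, 64)
dsm = torch.rand(2, 1, 64, 64) * 2.0 + 50.0   # elevations roughly 50-52 m
model = MaskNet(in_channels=4)
mask = model(torch.cat([ortho, dsm], dim=1))
wse = sample_wse(mask, dsm)                    # one WSE estimate per sample
print(wse.shape)                               # torch.Size([2])
```

Because the estimate is a weighted combination of observed DSM elevations, the predicted mask can be visualized over the tile, which is one way such a readout supports the explainability mentioned above.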