Ear biometrics has emerged as a complementary modality for biometric recognition, particularly in unconstrained environments where traditional approaches such as face recognition degrade under pose, illumination, or occlusion. Accurate ear segmentation plays a critical role in such systems by isolating the region of interest and reducing background interference, yet reliable segmentation remains challenging under real-world conditions due to occlusions, accessories, and variations in image quality. In this work, we investigate an encoder-enhanced U-Net architecture for pixel-wise ear segmentation, incorporating a ResNet-50 backbone to improve feature representation through transfer learning. The proposed approach is evaluated on the Annotated Web Ears (AWE) dataset and the EarSegDB-25 dataset under standard experimental settings. On AWE, the model achieves a mean Intersection over Union (IoU) of 77.1% and a pixel-wise accuracy of 99.7%, outperforming previously reported encoder–decoder baselines. On EarSegDB-25, the method attains a test IoU of 94.76%, demonstrating strong segmentation performance on a dataset with diverse real-world variations. We further analyze the relationship between pixel-wise accuracy and IoU, highlighting the limitations of accuracy as a metric in background-dominated segmentation tasks. While the architectural modification is incremental, the results indicate that a pretrained residual encoder provides consistent improvements in segmentation quality under challenging conditions. These findings support encoder-enhanced U-Net models as a practical choice for ear segmentation in biometric pipelines.
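The gap between the near-saturated pixel-wise accuracy and the lower IoU noted above can be made concrete with a small synthetic example. The sketch below (illustrative only; the masks and sizes are assumptions, not data from the paper) shows how, when the ear occupies a tiny fraction of the frame, a clearly misaligned prediction still scores very high pixel accuracy while its IoU stays low:

```python
# Illustrative sketch (assumed synthetic masks, not from the evaluated datasets):
# why pixel accuracy overstates quality when the background dominates.
import numpy as np

def pixel_accuracy(pred, gt):
    # Fraction of all pixels (foreground + background) labeled correctly.
    return (pred == gt).mean()

def iou(pred, gt):
    # Intersection over Union of the foreground (ear) region only.
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union

# Synthetic 100x100 ground-truth mask: a 10x10 "ear" (1% of pixels).
gt = np.zeros((100, 100), dtype=bool)
gt[45:55, 45:55] = True

# Prediction shifted right by 5 px: only half of the ear is recovered.
pred = np.zeros((100, 100), dtype=bool)
pred[45:55, 50:60] = True

print(pixel_accuracy(pred, gt))  # 0.99 — errors drowned out by background
print(iou(pred, gt))             # 0.333… — intersection 50, union 150
```

Because true negatives on the background dominate the accuracy term, IoU is the more informative of the two metrics for this task, which is why the abstract reports both.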