Automatic speech recognition has advanced rapidly for high-resource languages, yet performance remains limited for low-resource languages such as Bangla, particularly in telehealth settings. Most systems rely on a standardized 16 kHz sampling rate, a design choice despite evidence that Bangla contains sibilant fricatives and other phonetic cues with substantial high-frequency energy that may be suppressed under bandwidth and latency constraints. This study evaluates audio sampling rate as a controllable signal-level parameter for Bangla telehealth ASR to identify an empirically grounded operating range balancing transcription accuracy, execution time, and network bandwidth. Twenty real-world Bangla doctor–patient consultations recorded at 32 kHz were deterministically resampled to 55 configurations between 8 kHz and 32 kHz and transcribed using a fixed cloud-based ASR system. Session-level Word Error Rate, execution latency, payload bandwidth, and high-frequency phonetic content were analyzed using a composite sibilant-likelihood score. WER decreased from 0.338 at 8 kHz to a local minimum of 0.232 at 18.75 kHz, with gains plateauing beyond this range despite substantial bandwidth increases. Elbow-point, Pareto frontier, weighted scoring, and Minimum Acceptable Trade-off analyses converged on an optimal region between 17.25 and 18.75 kHz, demonstrating that sampling-rate optimization improves ASR accuracy without proportional resource costs in telehealth settings.