Investigating Sibilant Fricative Representation in Bangla Telemedicine Speech: A Cost-Aware Sampling Rate Optimization Study

Prajat Paul; Mohamed Mehfoud Bouh; Manan Vinod Shah; Forhad Hossain; Ashir Ahmed

doi:10.20944/preprints202603.1320.v1

Submitted: 17 March 2026

Posted: 17 March 2026


Abstract

Automatic speech recognition (ASR) has advanced rapidly for high-resource languages, yet performance remains limited for low-resource languages such as Bangla, particularly in telehealth settings. Most systems rely on a standardized 16 kHz sampling rate, a design choice that persists despite evidence that Bangla contains sibilant fricatives and other phonetic cues with substantial high-frequency energy that may be suppressed under bandwidth and latency constraints. This study evaluates audio sampling rate as a controllable signal-level parameter for Bangla telehealth ASR, with the aim of identifying an empirically grounded operating range that balances transcription accuracy, execution time, and network bandwidth. Twenty real-world Bangla doctor–patient consultations recorded at 32 kHz were deterministically resampled to 55 configurations between 8 kHz and 32 kHz and transcribed using a fixed cloud-based ASR system. Session-level Word Error Rate (WER), execution latency, and payload bandwidth were analyzed, along with high-frequency phonetic content quantified by a composite sibilant-likelihood score. WER decreased from 0.338 at 8 kHz to a local minimum of 0.232 at 18.75 kHz, with gains plateauing beyond this range despite substantial bandwidth increases. Elbow-point, Pareto frontier, weighted scoring, and Minimum Acceptable Trade-off analyses converged on an optimal region between 17.25 and 18.75 kHz, demonstrating that sampling-rate optimization can improve ASR accuracy without proportional resource costs in telehealth settings.
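For readers unfamiliar with the accuracy metric reported above, the following is a minimal sketch of how Word Error Rate is conventionally computed: the word-level edit distance between a reference transcript and an ASR hypothesis, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are illustrative assumptions; the abstract does not describe the authors' scoring pipeline or any text normalization applied before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly;
    no normalization (punctuation, numerals) is applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words to match an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition a WER of 0.232 corresponds to roughly 23 word-level errors per 100 reference words. Because insertions are counted, the value can exceed 1.0 when the hypothesis is much longer than the reference.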

Keywords: automatic speech recognition (ASR); Bangla language; sampling rate; telehealth; low-resource language (LRL); sibilant fricatives; word error rate (WER); bandwidth optimization; speech signal processing

Subject: Engineering - Other

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
