Preprint
Article

This version is not peer-reviewed.

Streaming Transformer Networks: Unified Hearing-to-Speech Recognition and Intelligent Text Generation Systems

Submitted: 03 March 2026

Posted: 04 March 2026

Abstract
This work introduces Streaming Transformer Networks, an architecture that processes real-time audio streams to produce both synthesized speech and contextually appropriate text, addressing limitations of existing multimodal AI systems. Traditional speech recognition models often operate offline: they require the full audio sequence before generating results, which hinders interactive applications. The proposed transformer-based framework unifies hearing-to-speech translation, which converts input audio directly into natural-sounding speech, with text generation, enabling dual-mode responses in conversational agents. Transformers are adapted for streaming through causal attention and triggered mechanisms, so the system achieves low latency while preserving prosody and semantic coherence. Key components include shared encoder layers for efficiency, hybrid decoding paths for modality-specific outputs, and joint optimization across objectives such as word error rate minimization and perceptual quality enhancement. Evaluations on standard benchmarks report latency under 200 ms and error rates rivalling non-streaming baselines, supporting deployment in voice assistants, live captioning, and real-time dialogue systems. The unified approach reduces model complexity and advances end-to-end learning for audio-to-multimodal generation tasks.
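The abstract does not specify how causal attention is implemented, so the following is only a minimal, generic sketch of the idea in plain Python, not the authors' method. Each audio frame attends to itself and to earlier frames only, which is what allows output to be emitted before the utterance ends. The function name `causal_attention` and the `left_context` parameter are hypothetical, and the "triggered mechanisms" mentioned in the abstract are not modeled here.

```python
import math

def causal_attention(queries, keys, values, left_context=None):
    """Scaled dot-product attention where frame t sees only frames <= t.

    queries, keys, values: lists of float vectors, one per frame.
    left_context: hypothetical cap on how many past frames are visible,
    which bounds per-step cost in a streaming setting (None = unbounded).
    """
    dim = len(queries[0])
    outputs = []
    for t, q in enumerate(queries):
        start = 0 if left_context is None else max(0, t - left_context)
        visible = range(start, t + 1)  # future frames are never read
        scores = [sum(a * b for a, b in zip(q, keys[s])) / math.sqrt(dim)
                  for s in visible]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(x - peak) for x in scores]
        total = sum(weights)
        out = [0.0] * len(values[0])
        for w, s in zip(weights, visible):
            for i, v in enumerate(values[s]):
                out[i] += (w / total) * v
        outputs.append(out)
    return outputs

# Toy input: three 2-dimensional frames used as queries, keys, and values.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
full = causal_attention(frames, frames, frames)
prefix = causal_attention(frames[:2], frames[:2], frames[:2])
```

In this sketch the outputs for the first two frames are identical whether or not the third frame has arrived, which is the property a streaming model relies on. Limiting `left_context` trades some accuracy for a fixed per-frame cost.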
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.

© 2026 MDPI (Basel, Switzerland) unless otherwise stated