Preprint
Article

This version is not peer-reviewed.

Transformer-Based Pipeline for Speech-to-Text Transcription and Automated Text Synthesis

Submitted: 02 March 2026

Posted: 04 March 2026


Abstract
This paper introduces a transformer-driven pipeline that integrates audio capture, automated speech transcription, and text synthesis into a unified end-to-end framework. Beginning with raw acoustic input captured via microphones, the pipeline preprocesses audio signals into spectrogram representations and applies stacked transformer encoders with multi-head self-attention to extract contextualized phonetic and prosodic features. These features feed a sequence-to-sequence transcription module in which cross-attention mechanisms align auditory patterns with linguistic tokens, achieving robust speech-to-text conversion even in noisy environments or with diverse accents. Beyond transcription, the system employs a generative decoder to synthesize structured written outputs, such as summaries, reports, or formatted notes, refining transcripts through autoregressive language modelling while preserving the semantic fidelity and stylistic nuances of the original speech. Experimental validation on benchmark datasets such as LibriSpeech and Common Voice demonstrates strong performance, with word error rates reduced by up to 25% relative to RNN baselines and improved fluency on synthesis metrics such as BLEU. The pipeline's parallelizable design supports real-time operation, making it well suited to assistive technologies, live captioning, and automated documentation. This work highlights the transformer architecture's versatility in bridging auditory perception and textual production, paving the way for scalable multimodal AI systems.
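The self- and cross-attention mechanisms the abstract refers to both reduce to the same core operation, scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V. As a minimal illustrative sketch (not the authors' implementation; the function names and toy matrices below are hypothetical), the operation can be written in plain Python:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.

    Q, K, V are lists of row vectors. In self-attention all three come
    from the same sequence; in cross-attention Q comes from the decoder
    while K and V come from the encoder's acoustic features.
    """
    d_k = len(K[0])
    # Similarity of each query to each key, scaled by sqrt(d_k).
    scores = [[sum(q[i] * k[i] for i in range(d_k)) / math.sqrt(d_k)
               for k in K] for q in Q]
    weights = [softmax(row) for row in scores]
    d_v = len(V[0])
    # Each output row is a weight-averaged mixture of the value vectors.
    return [[sum(w[j] * V[j][i] for j in range(len(V)))
             for i in range(d_v)] for w in weights]

# Toy example: one query attending over two key/value positions.
out = attention(Q=[[1.0, 0.0]],
                K=[[1.0, 0.0], [0.0, 1.0]],
                V=[[1.0, 0.0], [0.0, 1.0]])
```

Because each output row is a convex combination of the value rows, the attention weights for any query sum to one; in the toy example the query is more similar to the first key, so the first value dominates the mixture.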
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.

Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.

