Submitted:
04 May 2026
Posted:
06 May 2026
You are already at the latest version
Abstract
Keywords:
1. Introduction
- A novel PDF decomposition scheme exploiting the fixed page-layout structure of the source storybooks;
- A prompt engineering methodology for child-oriented, voiceover-ready translation via a local LLM;
- A cross-lingual TTS approach combining smart chunking and zero-shot voice cloning; and
- An automated HTML5 flipbook generation module that synchronises page-turn events with audio playback.
2. Source Material and Motivation
2.1. The Stories of Türkiye Corpus
2.2. Target Language Selection
3. Methods
3.1. Stage 1: Structured PDF Decomposition (pdf2text.py)
- Rule 1: Pages 1–3 are always saved as PNG images (cover, logo, colophon).
- Rule 2: The final page (page 22) is always saved as a PNG image (back cover).
- Rule 3: Intermediate pages alternate between text (even pages →.txt via get_text()) and image (odd pages → .png via get_pixmap()).
3.2. Stage 2: LLM-Based Translation (translate.py)
- (a)
- Age-appropriate vocabulary: The output must be written for children aged 5–10, using simple and engaging language faithful to the original story.
- (b)
- PDF artifact repair: Broken lines and incomplete sentences resulting from the PDF extraction step must be rejoined into fluent, complete sentences.
- (c)
- Voiceover formatting: Detected headings are to be followed by a period and a newline, creating natural pause points for TTS software.
- (d)
- Proper noun preservation: Turkish proper nouns (e.g., Karamel, Türkiye, Anadolu) must not be translated and must appear verbatim in all target languages.
3.3. Stage 3: Multilingual TTS Synthesis (text2speech.py)
3.4. Stage 4: HTML5 Flipbook Generation (webbooks.py)
4. Results
4.1. Code and Data Availability
4.2. Coverage
4.3. Translation Quality Observations
4.4. TTS Audio Quality
5. Discussion
5.1. Scalability
5.2. Privacy and Cost
5.3. Limitations
6. Conclusion
- (i)
- expanding TTS coverage to the remaining 19 text-only languages;
- (ii)
- conducting formal human evaluation of translation and audio quality;
- (iii)
- integrating accessibility standards (WCAG 2.1); and
- (iv)
- exploring fine-tuning of translation models on the specific domain of Turkish cultural children’s literature to further improve output fidelity.
Acknowledgments
Conflicts of Interest
References
- Korat, O.; Shamir, A. Do Hebrew electronic books differ from Dutch electronic books? A replication of a Dutch content analysis. J. Comput. Assist. Learn. 2004, 20, 257–268. [Google Scholar] [CrossRef]
- Kılıçlıoğlu, A.; Acar, E.; Doğan, C.; Bişgen, E.; Karasakal, C.; Konukseven, H.; Yirmibeş, S.K.; Ballı, N.; Begenjov, S. Stories of Türkiye. 2026. Available online: https://www.storiesofturkiye.com/.
- Team, L.S. LM Studio. 2024. Available online: https://github.com/lmstudio-ai.
- Team, G. Gemma 3 Technical Report. arXiv 2025, arXiv:2503.19786. [Google Scholar] [CrossRef]
- Da Costa-Luis, C. tqdm: A Fast, Extensible Progress Meter for Python and CLI. J. Open Source Softw. 2019, 4, 1277. [Google Scholar] [CrossRef]
- AI, R. Chatterbox: Open Source Text-to-Speech Model. 2025. Available online: https://huggingface.co/ResembleAI/chatterbox (accessed on 2026-05-02).
- Harris, C.R.; Millman, K.J. Array programming with NumPy. Nature 2020, 585, 357–362. [Google Scholar] [CrossRef] [PubMed]
- Robert, J. Pydub. 2018. [Google Scholar]
- Nodlik. StPageFlip - Simple library for creating realistic page turning. 2021. Available online: https://nodlik.github.io/StPageFlip/ (accessed on 02.05.2026).
- Papineni, K.; Roukos, S.; Ward, T.; Zhu, W.J. Bleu: a method for automatic evaluation of machine translation. In Proceedings of the Proceedings of the 40th annual meeting of the Association for Computational Linguistics, 2002; pp. 311–318. [Google Scholar]
- Rei, R.; Stewart, C.; Farinha, A.C.; Lavie, A. COMET: A neural framework for MT evaluation. In Proceedings of the Proceedings of the 2020 conference on empirical methods in natural language processing (emnlp), 2020; pp. 2685–2702. [Google Scholar]

| Parameter | Value |
|---|---|
| Total Books | 53 |
| Pages per Book | 22 |
| Total Source Pages | 1,166 |
| Text Pages per Book | 9 (p. 4, 6, 8, 10, 12, 14, 16, 18, 20) |
| Image Pages per Book | 13 |
| Source Language | English |
| Target Languages | 34 |
| Total Book-Language Combinations | 1,802 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).