Preprint Article, Version 1. Preserved in Portico. This version is not peer-reviewed.

TWIENG: A Multi-Domain Twi-English Parallel Corpus for Machine Translation of Twi, a Low-Resource African Language

Version 1: Received: 21 March 2022 / Approved: 23 March 2022 / Online: 23 March 2022 (02:28:21 CET)

How to cite: Afram, G.K.; Weyori, B.A.; Adekoya, F.A. TWIENG: A Multi-Domain Twi-English Parallel Corpus for Machine Translation of Twi, a Low-Resource African Language. Preprints 2022, 2022030303. https://doi.org/10.20944/preprints202203.0303.v1

Abstract

A Twi-English parallel corpus is an important resource for machine translation of Twi (ISO 639-3), a Low-Resource African Language (LRAL) spoken mainly in Ghana and Ivory Coast. A large-scale multi-domain Twi-English parallel corpus is currently unavailable, partly because of the difficulty and effort required to design one. In this paper, we present TWIENG: a large-scale multi-domain Twi-English parallel corpus. We collected English sentences with web crawlers from Ghanaian indigenous electronic news portals, Ghanaian Parliamentary Hansards and the Twi Bible, and by crowdsourcing via Google Forms. Professional translators and linguists translated the sentences, which were then aligned, tokenized and compiled into the corpus. The corpus was curated using Sketch Engine, a corpus manager and analysis software developed by Lexical Computing Limited, and was manually evaluated by professional Twi linguists. The corpus has 5,419 parallel sentences.
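The align, tokenize and compile steps named in the abstract could look roughly like the following Python sketch. It is an illustration only, not the authors' pipeline: the function names (`tokenize`, `align`, `compile_corpus`), the regex tokenizer, the position-based alignment and the tab-separated output format are all assumptions, and the sample sentence pair is illustrative rather than taken from TWIENG.

```python
import re
from typing import Iterable, List, Tuple

# Assumed tokenizer: words (optionally joined by apostrophes or hyphens)
# or single punctuation marks. \w is Unicode-aware in Python 3, so Twi
# letters such as "ɛ" and "ɔ" are treated as word characters.
TOKEN_RE = re.compile(r"\w+(?:['’-]\w+)*|[^\w\s]")


def tokenize(sentence: str) -> List[str]:
    """Split a sentence into word and punctuation tokens."""
    return TOKEN_RE.findall(sentence)


def align(english: Iterable[str], twi: Iterable[str]) -> List[Tuple[str, str]]:
    """Pair sentences by position (hypothetical 1:1 alignment).

    Pairs in which either side is empty are dropped.
    """
    en, tw = list(english), list(twi)
    if len(en) != len(tw):
        raise ValueError("both sides must contain the same number of sentences")
    return [
        (e.strip(), t.strip())
        for e, t in zip(en, tw)
        if e.strip() and t.strip()
    ]


def compile_corpus(pairs: Iterable[Tuple[str, str]]) -> List[str]:
    """Render each pair as one line: tokenized English, a tab, tokenized Twi."""
    return [
        " ".join(tokenize(e)) + "\t" + " ".join(tokenize(t))
        for e, t in pairs
    ]


if __name__ == "__main__":
    # Illustrative data only; the second pair is dropped because it is empty.
    english = ["Thank you.", ""]
    twi = ["Medaase.", ""]
    for line in compile_corpus(align(english, twi)):
        print(line)
```

Position-based pairing is the simplest choice when each English sentence is translated individually; a corpus built from independently segmented documents would need a real sentence aligner instead.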

Keywords

Twi; Parallel corpus; Tokens; Sketch Engine; Word sketch; Parallel concordance; Machine Translation; Low-Resource Language

Subject

Computer Science and Mathematics, Artificial Intelligence and Machine Learning
