Preprint Article, Version 1. Preserved in Portico. This version is not peer-reviewed.

Guidance to Pre-tokenization for SacreBLEU: Meta-Evaluation in Korean

Version 1 : Received: 2 January 2022 / Approved: 4 January 2022 / Online: 4 January 2022 (20:24:43 CET)

How to cite: Kim, A.; Kim, J. Guidance to Pre-tokenization for SacreBLEU: Meta-Evaluation in Korean. Preprints 2022, 2022010018. https://doi.org/10.20944/preprints202201.0018.v1

Abstract

SacreBLEU, by incorporating a text-normalizing step in its pipeline, has been well received as an automatic evaluation metric in recent years. With agglutinative languages such as Korean, however, the metric cannot provide a meaningful result without customized pre-tokenization. This paper therefore examines the influence of different pre-tokenization schemes (word, morpheme, character, and subword) on the metric by performing a meta-evaluation with manually constructed into-Korean human evaluation data. Our empirical study demonstrates that the correlation of SacreBLEU with human judgment varies consistently with the token type. Some tokenization schemes even degrade the reliability of the metric, and MeCab is no exception. Offering guidance on the proper use of a tokenizer for each metric, we stress the significance of the character level and the insignificance of the Jamo level in MT evaluation.
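To make the token levels named in the abstract concrete, the following is a minimal sketch of word-, character-, and Jamo-level pre-tokenization for Korean text. It is not the authors' implementation: the function names are hypothetical, and the Jamo split assumes that Unicode NFD decomposition is an acceptable stand-in for whatever Jamo segmentation the paper uses. Morpheme and subword levels are omitted because they require an external analyzer or a trained subword model.

```python
import unicodedata
from typing import List


def word_tokens(sentence: str) -> List[str]:
    """Word level: split on whitespace (Korean eojeol units)."""
    return sentence.split()


def char_tokens(sentence: str) -> List[str]:
    """Character level: one token per syllable block, whitespace dropped."""
    composed = unicodedata.normalize("NFC", sentence)
    return [ch for ch in composed if not ch.isspace()]


def jamo_tokens(sentence: str) -> List[str]:
    """Jamo level (assumption): NFD decomposes each precomposed Hangul
    syllable into its conjoining consonant and vowel letters."""
    decomposed = unicodedata.normalize("NFD", sentence)
    return [ch for ch in decomposed if not ch.isspace()]


def pretokenize(sentence: str, level: str) -> str:
    """Return a space-joined token string for the requested level."""
    tokenizers = {"word": word_tokens, "char": char_tokens, "jamo": jamo_tokens}
    if level not in tokenizers:
        raise ValueError(f"unsupported level: {level}")
    return " ".join(tokenizers[level](sentence))


if __name__ == "__main__":
    example = "나는 학교에 간다"
    for level in ("word", "char", "jamo"):
        print(level, len(pretokenize(example, level).split()))
```

Strings pre-tokenized this way would typically be scored with the metric's own tokenizer switched off, so that the token boundaries chosen here are the ones the n-gram matching actually sees.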

Supplementary and Associated Material

http://github.com/ko_sacrebleu: GitHub repository for the code

Keywords

NMT Evaluation, Meta-Evaluation, SacreBLEU, Korean

Subject

Computer Science and Mathematics, Computer Science
