Preprint Article, Version 1 (this version is not peer reviewed)

The Entropy of Words—Learnability and Expressivity across More Than 1000 Languages

Version 1 : Received: 27 April 2017 / Approved: 27 April 2017 / Online: 27 April 2017 (15:54:13 CEST)

A peer-reviewed article of this Preprint also exists.

Bentz, C.; Alikaniotis, D.; Cysouw, M.; Ferrer-i-Cancho, R. The Entropy of Words—Learnability and Expressivity across More than 1000 Languages. Entropy 2017, 19, 275.

Journal reference: Entropy 2017, 19, 275
DOI: 10.3390/e19060275

Abstract

The choice associated with words is a fundamental property of natural languages. It lies at the heart of quantitative linguistics, computational linguistics, and the language sciences more generally. Information theory provides tools to measure precisely the average amount of choice associated with words: the word entropy. Here we use three parallel corpora, encompassing ca. 450 million words in 1916 texts and 1259 languages, to tackle some of the major conceptual and practical problems of word entropy estimation: dependence on text size, register, style, and estimation method, as well as the non-independence of words in co-text. We present three main results: (1) a text size of 50K tokens is sufficient for word entropies to stabilize throughout the text; (2) across the languages of the world, word entropies display a unimodal distribution that is skewed to the right, suggesting a trade-off between the learnability and expressivity of words; (3) there is a strong linear relationship between unigram entropies and entropy rates, suggesting that they are inherently linked. We discuss the implications of these results for studying the diversity and evolution of languages from an information-theoretic point of view.
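To make the central quantity concrete: the simplest estimator of unigram word entropy is the maximum-likelihood ("plug-in") estimate computed from token frequencies. The sketch below is illustrative only; the function name is hypothetical, and the paper compares this naive estimator against more sophisticated estimation methods that correct for its bias on small texts.

```python
from collections import Counter
import math

def unigram_entropy(tokens):
    """Plug-in (maximum-likelihood) estimate of unigram word entropy in bits.

    Treats each word token as an independent draw from the word distribution,
    which is exactly the 'non-independence of words in co-text' simplification
    the abstract mentions.
    """
    counts = Counter(tokens)
    n = len(tokens)
    # H = -sum over word types of p(w) * log2 p(w), with p(w) = count(w) / n
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Toy example: a uniform distribution over 4 word types gives exactly 2 bits.
print(unigram_entropy("red blue green gold".split()))  # → 2.0
```

Note that this estimator systematically underestimates the true entropy for small samples, which is one reason the question of how large a text must be (the 50K-token threshold reported above) matters in practice.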

Subject Areas

natural language, unigram entropy, entropy rate, learnability, expressivity

