Preprint Article Version 1 Preserved in Portico This version is not peer-reviewed

Italian Throughout Seven Centuries of Literature: Deep Language Statistics And Their Relationship With Miller’s 7∓2 Law and Short−Term Memory

Version 1 : Received: 18 November 2018 / Approved: 20 November 2018 / Online: 20 November 2018 (15:32:11 CET)

How to cite: Matricciani, E. Italian Throughout Seven Centuries of Literature: Deep Language Statistics And Their Relationship With Miller’s 7∓2 Law and Short−Term Memory. Preprints 2018, 2018110505. https://doi.org/10.20944/preprints201811.0505.v1

Abstract

Statistics of languages are calculated by counting characters, words, sentences, and word rankings. Some of these random variables are also the main “ingredients” of classical readability formulae. Revisiting the readability formula for Italian, known as GULPEASE, shows that, of the two terms that determine the readability index G (the semantic index G_C, proportional to the number of characters per word, and the syntactic index G_F, proportional to the reciprocal of the number of words per sentence), G_F is dominant because G_C is, in practice, constant for any author throughout seven centuries of Italian literature. Each author can vary the length of sentences more freely than the length of words, and does so in a way that differs from author to author. For any author, any pair of text variables can be modelled by a linear relationship y=mx, with a slope m that differs from author to author, except for the relationship between characters and words, which is the same for all authors. The most important relationship found in the paper is, in the author’s opinion, that between the short−term memory capacity, described by Miller’s “7∓2 law”, and the word interval, a new random variable defined as the average number of words between two successive punctuation marks. The word interval can be converted into a time interval through the average reading speed. The word interval is spread over the same range as Miller’s law, and the time interval is spread over the same range as short−term memory response times. The connection between the word interval (and time interval) and short−term memory appears, at least empirically, justified and natural, and should be investigated further. Technical and scientific writings (papers, essays, etc.) ask more of their readers. A preliminary investigation of these texts shows clear differences: words are on average longer, the readability index G is lower, and word and time intervals are longer. Future work on ancient languages, such as Greek or Latin, could give us a flavor of the short−term memory features of their ancient readers.
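
The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the author's code. The constants 89, 10 and 300 are those of the standard GULPEASE formula, G = 89 − 10·(characters/words) + 300·(sentences/words), and do not appear in the abstract itself. The tokenization rules (a word is a run of letters, a sentence ends at ".", "!" or "?", and the punctuation marks counted are ". , ; : ! ?") and the function names are assumptions made for illustration, and the reading speed used in the demo is an arbitrary placeholder, not a value from the paper.

```python
import re


def text_counts(text):
    """Count characters, words, sentences and punctuation marks.

    Assumed rules (not from the paper): a word is a run of letters,
    characters are the letters inside words, a sentence ends at a run
    of . ! ?, and a punctuation mark is a run of . , ; : ! ?
    """
    words = re.findall(r"[^\W\d_]+", text)
    return {
        "characters": sum(len(w) for w in words),
        "words": len(words),
        "sentences": len(re.findall(r"[.!?]+", text)),
        "punctuation_marks": len(re.findall(r"[.,;:!?]+", text)),
    }


def gulpease(characters, words, sentences):
    """Return (G, G_C, G_F) using the standard GULPEASE constants."""
    if words == 0:
        raise ValueError("text contains no words")
    g_c = 10.0 * characters / words   # semantic index: characters per word
    g_f = 300.0 * sentences / words   # syntactic index: 1 / words per sentence
    return 89.0 - g_c + g_f, g_c, g_f


def word_interval(words, punctuation_marks):
    """Average number of words between two successive punctuation marks."""
    if punctuation_marks == 0:
        raise ValueError("text contains no punctuation marks")
    return words / punctuation_marks


def time_interval(interval_in_words, words_per_second):
    """Convert a word interval to seconds for a given reading speed."""
    return interval_in_words / words_per_second


if __name__ == "__main__":
    sample = "Il gatto dorme, il cane no."
    counts = text_counts(sample)
    g, g_c, g_f = gulpease(
        counts["characters"], counts["words"], counts["sentences"]
    )
    interval = word_interval(counts["words"], counts["punctuation_marks"])
    # 3 words per second is an illustrative assumption, not a measured value.
    seconds = time_interval(interval, words_per_second=3.0)
    print(counts)
    print(f"G={g:.2f} G_C={g_c:.2f} G_F={g_f:.2f}")
    print(f"word interval={interval:.2f} words, time interval={seconds:.2f} s")
```

On a single short sentence the index G is not meaningful, and it can exceed 100. The sketch only shows how the two terms G_C and G_F enter G with opposite signs and how the word interval follows from the word and punctuation counts.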

Keywords

Italian, readability, GULPEASE, literature, statistics, characters, words, sentences, punctuation marks, short−term memory, word interval, time interval

Subject

Social Sciences, Language and Linguistics
