Subject: Mathematics & Computer Science, Algebra & Number Theory Keywords: Semantic Complexity; Semantics; Text Complexity; Readability Formulae
Online: 6 September 2021 (13:33:34 CEST)
Simple measures often cannot capture deep complexity. Where semantic complexity is concerned, conventional readability formulas share a common style, a common set of achievements and a common set of limitations: they lack a semantics-aware approach and, as a result, cannot measure semantic complexity precisely. In this paper, we introduce DASTEX, a novel semantics-aware measure of the semantic complexity of text. DASTEX opens a new layer of complexity analysis for NLP, cognitive and computational tasks. The measure rests on an intuitionistic formal model that treats semantics as a lattice of intuitions, which yields a well-defined notion of the semantics of a text and of its complexity. DASTEX is a practical analysis method built on this formal model. Together, the idea, the model and the method form a complete suite that results in a simple yet deep measure of the semantic complexity of text. The proposed approach is evaluated in four experiments, whose results show that DASTEX is capable of measuring the semantic complexity of text in six application tasks.
ARTICLE | doi:10.20944/preprints201811.0505.v1
Subject: Arts & Humanities, Linguistics Keywords: Italian, readability, GULPEASE, literature, statistics, characters, words, sentences, punctuation marks, short-term memory, word interval, time interval
Online: 20 November 2018 (15:32:11 CET)
Statistics of languages are calculated by counting characters, words, sentences and word rankings. Some of these random variables are also the main “ingredients” of classical readability formulae. Revisiting the readability formula of Italian known as GULPEASE shows that, of the two terms that determine the readability index G (the semantic index G_C, proportional to the number of characters per word, and the syntactic index G_F, proportional to the reciprocal of the number of words per sentence), G_F is dominant, because G_C is, in practice, constant for any author throughout seven centuries of Italian literature. Each author can modulate the length of sentences more freely than the length of words, and does so in ways that differ from author to author. For any author, any pair of text variables can be modelled by a linear relationship y = mx, with a slope m that differs from author to author, except for the relationship between characters and words, which is the same for all. In the author’s opinion, the most important relationship found in the paper is the one between short-term memory capacity, described by Miller’s “7 ± 2 law”, and the word interval, a new random variable defined as the average number of words between two successive punctuation marks. The word interval can be converted into a time interval through the average reading speed. The word interval spans the same range as Miller’s law, and the time interval spans the same range as short-term memory response times. The connection between the word interval (and time interval) and short-term memory appears, at least empirically, justified and natural, and should be investigated further. Technical and scientific writings (papers, essays, etc.) demand more of their readers. A preliminary investigation of these texts shows clear differences: words are on average longer, the readability index G is lower, and word and time intervals are longer.
Future work on ancient languages, such as Greek or Latin, could give us a flavor of the short-term memory features of their ancient readers.
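The quantities named in this abstract can be computed in a few lines. The sketch below is an illustration under stated assumptions, not the paper’s procedure. It uses the standard GULPEASE formula G = 89 + (300 · sentences − 10 · letters) / words and assumes the split G_C = 10 · letters/words and G_F = 300 · sentences/words, so that G = 89 − G_C + G_F. The tokenization rules (what counts as a word, a sentence end and a punctuation mark), the helper names `text_stats`, `gulpease`, `word_interval`, `time_interval` and `slope_through_origin`, and the reading speed passed to `time_interval` are hypothetical choices. The slope of y = mx is fitted here by least squares through the origin, m = Σxy / Σx², which is one common estimator; the abstract does not state which fitting method the paper uses.

```python
import re
from dataclasses import dataclass


@dataclass
class TextStats:
    characters: int   # letters only, summed over all words
    words: int
    sentences: int
    punctuation: int  # punctuation marks that bound word intervals


def text_stats(text: str) -> TextStats:
    # Assumption: a "word" is a maximal run of Unicode letters, so elided
    # forms such as "l'amico" count as two words.
    words = re.findall(r"[^\W\d_]+", text)
    # Assumption: a sentence ends at a run of '.', '!' or '?'; segments
    # without any letter are ignored.
    sentences = [s for s in re.split(r"[.!?]+", text) if re.search(r"[^\W\d_]", s)]
    # Assumption: the marks delimiting a word interval are , ; : . ! ?
    punctuation = re.findall(r"[,;:.!?]+", text)
    return TextStats(
        characters=sum(len(w) for w in words),
        words=len(words),
        sentences=len(sentences),
        punctuation=len(punctuation),
    )


def gulpease(s: TextStats) -> tuple[float, float, float]:
    """Return (G, G_C, G_F), using the assumed split G = 89 - G_C + G_F."""
    g_c = 10 * s.characters / s.words  # proportional to characters per word
    g_f = 300 * s.sentences / s.words  # proportional to 1 / (words per sentence)
    return 89 - g_c + g_f, g_c, g_f


def word_interval(s: TextStats) -> float:
    # Average number of words between two successive punctuation marks.
    return s.words / s.punctuation


def time_interval(i_p: float, words_per_minute: float) -> float:
    # Convert a word interval into seconds, given an assumed reading speed.
    return 60.0 * i_p / words_per_minute


def slope_through_origin(xs: list[float], ys: list[float]) -> float:
    # Least-squares slope of y = m x with no intercept: m = sum(xy) / sum(x^2).
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
```

For example, `text_stats("Il gatto dorme. Il cane, invece, corre!")` counts 7 words, 29 letters, 2 sentences and 4 punctuation marks, giving a word interval of 1.75; at an assumed reading speed of 210 words per minute, `time_interval(1.75, 210)` is 0.5 seconds.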
ARTICLE | doi:10.20944/preprints201811.0149.v1
Subject: Arts & Humanities, Religious Studies Keywords: Confidence tests, dictations, Jesus Christ, Maria Valtorta, mystics, punctuation marks, readability index, sentences, semantic index, syntactic index, text characters, Virgin Mary, visions, words, word interval.
Online: 7 November 2018 (09:06:01 CET)
We have studied the very large body of literary works written by the Italian mystic Maria Valtorta to assess similarities and differences in her writings, because she claims that most of them derive from mystical visions. We have used mathematical and statistical tools developed specifically for studying deep linguistic aspects of texts. The general trend indicates that the literary works explicitly attributable to Maria Valtorta differ significantly from her other literary works, which she claims are attributable to the alleged characters Jesus and Mary. Mathematically, they seem to have been written by different authors. The comparison with Italian literature is very striking: a single author, Maria Valtorta, seems able to write texts so diverse as to cover the entire mathematical range (suitably defined) of seven centuries of Italian literature.