3.1. Quantitative characterization of the symmetry of alphabets; Shannon measures of symmetry and diversity of alphabets.
The genetic tree of the alphabets rooted in the Phoenician one is shown in Figure 1 [18]. This tree is disputable, and we will address, at least partially, the problems related to its structure.
The symmetries of the symbols were analyzed and characterized with the Schoenflies notation, as shown in Table A1 and the Supplementary Materials. Let us illustrate the entire procedure with the Phoenician alphabet (for the symbols inscribed into a square) taken as an example. The symbols fall into five classes: i) symbols with the identity transformation ($C_{1}$) only; ii) one symbol with the identity transformation ($C_{1}$) and the horizontal mirror axis ($S_{1}$) only; iii) three symbols with the identity transformation ($C_{1}$) and the vertical mirror axis ($S_{2}$) only; iv) four symbols with the identity transformation ($C_{1}$), the horizontal and vertical mirror axes ($S_{1}, S_{2}$) and the rotation by 180° ($C_{2}$) only; v) two symbols with the identity transformation ($C_{1}$), the horizontal, vertical and diagonal mirror axes ($S_{1}, S_{2}, S_{3}, S_{4}$) and 4-fold rotational symmetry ($C_{4}, C_{2}, C_{4}^{3}$).
Afterwards, two different Shannon-like symmetry measures were calculated for the addressed alphabets. The first is the Shannon (informational) measure of symmetry of the alphabet, $H_{SYM}(G)$ (abbreviated IMS), defined in a Shannon-like form as follows:
$H_{SYM}(G)=-\sum_{i=1}^{k}P_{i}(G_{i})\ln P_{i}(G_{i})$ (1)
$P_{i}(G_{i})=\frac{m(G_{i})}{N_{G}}$ (2)
where $P_{i}(G_{i})$ is the probability of appearance of the symmetry operation $G_{i}$ within the alphabet, $N_{G}=\sum_{i=1}^{k}m(G_{i})$ is the total number of symmetry elements (operations) appearing in the alphabet, and $m(G_{i})$ is the number of occurrences of the symmetry element/operation $G_{i}$, calculated for a given set of symbols/alphabet. The normalization condition given by Equation (3) holds:
$\sum_{i=1}^{k}P_{i}(G_{i})=1$ (3)
Table 2 summarizes $m(G_{i})$ as established for the Phoenician script; the total number of symmetry elements established for the Phoenician script is $N_{G}=52$. Substitution of the data from Table 2 into Eq. 1 yields $H_{SYM}(\mathrm{Phoenician})=1.688$. The aforementioned procedure was repeated for the Western Greek, Euclidean Greek, Etruscan from Marsiliana, Archaic Etruscan, Neo-Etruscan, Proto-Hebrew, Hebrew, Archaic and Classical Latin and modern English scripts. The calculation was carried out for the symbols inscribed in a square (
s-scripts) and also for the symbols inscribed into a rectangle (
r-scripts). The symbol
was always considered with rectangle symmetry.
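The Phoenician value can be retraced numerically. The following is a minimal sketch, assuming natural logarithms and assuming per-operation counts $m(G_{i})$ inferred from the symmetry classes enumerated above for the 22 square-inscribed symbols (one symbol with $S_{1}$ only, three with $S_{2}$ only, four with both axes plus $C_{2}$, two with the full square symmetry). Table 2 itself is not reproduced here, so the dictionary `phoenician_counts` and the function name `shannon_symmetry_measure` are illustrative, not the authors' own.

```python
import math


def shannon_symmetry_measure(counts):
    """Return H_SYM = -sum(P_i * ln P_i), with P_i = m(G_i) / N_G."""
    total = sum(counts.values())  # N_G: total number of symmetry operations
    return -sum(
        (m / total) * math.log(m / total) for m in counts.values() if m > 0
    )


# Assumed counts m(G_i), inferred from the class sizes listed in the text:
# every symbol has the identity; the remaining operations are summed over
# the classes that contain them.
phoenician_counts = {
    "C1": 22,          # identity, present for all 22 symbols
    "S1": 1 + 4 + 2,   # horizontal mirror axis
    "S2": 3 + 4 + 2,   # vertical mirror axis
    "C2": 4 + 2,       # rotation by 180 degrees
    "S3": 2,           # diagonal mirror axis
    "S4": 2,           # second diagonal mirror axis
    "C4": 2,           # rotation by 90 degrees
    "C4^3": 2,         # rotation by 270 degrees
}

n_g = sum(phoenician_counts.values())
h_sym = shannon_symmetry_measure(phoenician_counts)
print(n_g, round(h_sym, 3))
```

Under these assumed counts the sketch gives $N_{G}=52$ and $H_{SYM}\approx 1.688$, in agreement with the values quoted in the text.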
The second Shannon-like measure calculated for the addressed alphabets is known as the Shannon diversity index (abbreviated SDI) and is denoted $D_{SYM}$. For its calculation, we divide the total set of symbols/letters constituting the alphabet into subsets of symbols characterized by the same symmetry group (the same set of symmetry operations). The Shannon diversity index $D_{SYM}(\tilde{G})$ is calculated as follows:
$D_{SYM}(\tilde{G})=-\sum_{i}P_{i}(\tilde{G}_{i})\ln P_{i}(\tilde{G}_{i})$ (4)
$P_{i}(\tilde{G}_{i})=\frac{\tilde{m}(\tilde{G}_{i})}{\tilde{N}_{\tilde{G}}}$ (5)
where $P_{i}(\tilde{G}_{i})$ is the probability of finding a symbol belonging to the subset with the symmetry group $\tilde{G}_{i}$, $\tilde{m}(\tilde{G}_{i})$ is the number of letters possessing the symmetry group $\tilde{G}_{i}$, and $\tilde{N}_{\tilde{G}}$ is the total number of symbols over all subsets, which coincides with the number of letters in the given alphabet. Again, the normalization condition given by Equation (6) holds:
$\sum_{i}P_{i}(\tilde{G}_{i})=1$ (6)
To calculate the Shannon diversity index, it is necessary to consider all subsets of symbols appearing in the alphabet characterized by the same set of symmetry elements.
Table 3 shows these subsets with the probability of their appearance in the Phoenician alphabet (letters are inscribed in a square,
${\stackrel{~}{\mathit{N}}}_{\stackrel{~}{\mathit{G}}}=22$). Substitution of these data into Eq. 4 yields:
${\mathit{D}}_{\mathit{S}\mathit{Y}\mathit{M}}\left(\stackrel{~}{\mathit{G}}\right)=1.271.$
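The diversity index can be checked in the same way. The sketch below assumes natural logarithms and assumes subset sizes inferred from the symmetry classes listed earlier for the square-inscribed Phoenician symbols; the size of the identity-only class (12) is obtained here as 22 minus the other four classes. Table 3 is not reproduced, so `phoenician_classes` and `shannon_diversity_index` are hypothetical names.

```python
import math


def shannon_diversity_index(class_sizes):
    """Return D_SYM = -sum(P_i * ln P_i), with P_i = m~(G~_i) / N~."""
    sizes = list(class_sizes)
    total = sum(sizes)  # N~: number of letters in the alphabet
    return -sum((m / total) * math.log(m / total) for m in sizes if m > 0)


# Assumed number of letters per symmetry group (inferred, not from Table 3).
phoenician_classes = {
    "C1 only": 12,
    "C1, S1": 1,
    "C1, S2": 3,
    "C1, S1, S2, C2": 4,
    "full symmetry of the square": 2,
}

n_letters = sum(phoenician_classes.values())
d_sym = shannon_diversity_index(phoenician_classes.values())
print(n_letters, round(d_sym, 3))
```

With these assumed subset sizes the sketch gives 22 letters and $D_{SYM}\approx 1.271$, matching the value quoted in the text.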
Let us address the graphs depicted in Figure 2, which represent $H_{SYM}$ calculated for the studied scripts and plotted vs. the date at which each script was first attested [16,18,19,20,22]. For example, the first Etruscan text is ascribed to ca. 700 BCE [16]. Archaic Latin is ascribed to ca. 750 BCE [19,20]. Classical Latin was formed in approximately the 1st century BCE [19,20]. The graphs established for the scripts inscribed into the square (represented with grey circles and abbreviated
s-scripts) and the graphs established for the letters/graphemes inscribed into rectangles (abbreviated
r-scripts) are depicted. It is recognized from the graphs presented in
Figure 2 that
$H_{SYM}$ calculated for the s-scripts decreases with time, whereas $H_{SYM}$ calculated for the r-scripts is only slightly time-sensitive. Let us explain this result: the average uncertainty of finding an element of symmetry within the symbols of a given alphabet (averaged over the entire alphabet) decreases with time for the s-scripts, and it is nearly constant for the r-scripts. The interpretation of this conclusion needs some care; indeed, $H_{SYM}(G)=-\sum_{i=1}^{k}P_{i}(G_{i})\ln P_{i}(G_{i})$ is not a monotonic function of $P_{i}$: the contribution $-P_{i}\ln P_{i}$ vanishes when $P_{i}=0$ and also when $P_{i}=1$ [21]. In particular, if every letter possesses the identity only, a single operation appears with probability one and $H_{SYM}(G)=0$. A low value of $H_{SYM}(G)$ may therefore evidence the absence of symmetry in the letters of the alphabet, and this is the case with the Hebrew alphabet, for which $H_{SYM}$ calculated for both the s- and r-Hebrew scripts is very low and falls out of the overall picture. $H_{SYM}$ calculated for modern English letters and supplied for comparison in Figure 2 is close to the values established for the Phoenician-rooted scripts.
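The non-monotonic behaviour of a single term of the Shannon sum can be seen from a short numerical sketch. The function name `entropy_term` is illustrative, and the value at $P=0$ is taken as the limit $\lim_{P\to 0}(-P\ln P)=0$.

```python
import math


def entropy_term(p):
    """Contribution -p*ln(p) of one operation; the p -> 0 limit is 0."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("a probability must lie in [0, 1]")
    # p * ln(1/p) equals -p * ln(p) and avoids a negative zero at p = 1.
    return 0.0 if p == 0 else p * math.log(1.0 / p)


for p in (0.0, 0.1, 1 / math.e, 0.9, 1.0):
    print(f"P = {p:.3f}  ->  -P ln P = {entropy_term(p):.3f}")
```

The term vanishes at both ends of the interval and reaches its maximum, $1/e$, at $P=1/e$, which is why a low $H_{SYM}$ can correspond either to very rare or to nearly certain symmetry operations.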
Least squares regression straight lines emerging from the data plotted in
Figure 2 are supplied by Eq. 7 and Eq. 8.
where
${\mathit{R}}^{2}$ is the squared correlation coefficient, calculated for all of the represented scripts except English and Hebrew/Ashurit, which lie far from the regression trend line. The low values of the correlation coefficient indicate that the straight lines are supplied for eye guidance only.
It is recognized from Eq. 7 and Eq. 8 that the modulus of the slope of the $H_{sym}^{rect}(t)$ regression line for the r-scripts is much lower than that established for the s-scripts.
It is noteworthy that both $H_{sym}^{square}(t)$ and $H_{sym}^{rect}(t)$ are restricted within a narrow range of values, namely $1.291<H_{sym}^{square}<1.757$ and $1.226<H_{sym}^{rect}<1.314$. The only exception is Hebrew, for which $H_{sym}^{square}\cong H_{sym}^{rect}\cong 1$. This observation will be discussed later.
Now we address the Shannon diversity index (SDI), denoted $D_{SYM}(\tilde{G})$, calculated for the studied alphabets with Eq. 4 and illustrated in Figure 3. Somewhat surprisingly, SDI increases monotonically with time for both the s- and r-scripts. This means that the diversity of symmetry groups inherent in the alphabets emerging from the Phoenician one grows with time. Again, the Hebrew script, demonstrating a markedly low value of $D_{SYM}$, is an exception (see Figure 3). $D_{SYM}$ calculated for the modern English alphabet and supplied for comparison in Figure 3 is close to the values established for the ancient Phoenician-rooted scripts.
Figure 3. Shannon diversity index (SDI), denoted $D_{SYM}(\tilde{G})$, calculated for the studied alphabets rooted in the Phoenician one.
Least squares regression straight lines ($D_{sym}^{square}(t)=\alpha t+\beta$), emerging from the data plotted in Figure 3, are supplied by Eq. 9 and Eq. 10:
The squared correlation coefficient $R^{2}$ is calculated for all of the represented scripts except English and Hebrew/Ashurit, which lie far from the trend line. Again, the low values of the correlation coefficient indicate that the straight lines are supplied for eye guidance only.
Let us take a closer look at Eq. 9 and Eq. 10. As already mentioned, both dependencies $D_{sym}^{square}(t)$ and $D_{sym}^{rect}(t)$ grow with time; moreover, the slopes of both dependencies are of the same order of magnitude: $\alpha_{sym}^{square}\cong 0.022$; $\alpha_{sym}^{rect}\cong 0.034$. Let us calculate the points of intersection of the regression lines with the time axis: $D_{sym}^{square}(t)=0.0221\times\tau_{sym}^{*sq}+1.6615=0$ and $D_{sym}^{rect}(t)=0.0337\times\tau_{sym}^{*rect}+1.6066=0$. We calculate: $\tau_{sym}^{*sq}=-75.2$ centuries; $\tau_{sym}^{*rect}=-47.7$ centuries. The value $\tau_{sym}^{*rect}=-47.7$ centuries (ca. 4770 BCE) catches the eye because it falls within the period of the Vinča culture, or Vinča-Turdaș culture, a Neolithic archaeological culture of Southeast Europe dated to the period 5400–4500 BC [24,25,26]. The Vinča culture is a later Neolithic/early Chalcolithic phenomenon which lasted for 700 years in the largest part of the northern and central Balkans, spreading across an area which includes present-day Serbia, the Romanian Banat, parts of Romanian Oltenia, western Bulgaria, northern Macedonia and eastern parts of Slavonia and Bosnia [24,25,26].
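The intersection points follow from solving $\alpha\tau^{*}+\beta=0$, i.e. $\tau^{*}=-\beta/\alpha$. A minimal sketch of this arithmetic is given here, using the slopes and intercepts quoted in the text and assuming that time is measured in centuries with negative values denoting BCE; the function name `time_axis_intercept` is illustrative.

```python
def time_axis_intercept(slope, intercept):
    """Solve slope * t + intercept = 0 for t."""
    if slope == 0:
        raise ValueError("a horizontal line never crosses the time axis")
    return -intercept / slope


# Coefficients quoted in the text (time assumed to be in centuries).
tau_square = time_axis_intercept(0.0221, 1.6615)
tau_rect = time_axis_intercept(0.0337, 1.6066)
print(round(tau_square, 1), round(tau_rect, 1))

# Dating of the Vinča culture quoted in the text: 5400-4500 BC.
falls_in_vinca_period = -54.0 <= tau_rect <= -45.0
print(falls_in_vinca_period)
```

The sketch returns approximately $-75.2$ and $-47.7$ centuries, and only the r-script intercept lies inside the quoted Vinča interval.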
The famous Vinča symbols (also called the Vinča script) are attributed to precisely this period [27]. The Vinča symbols, shown in Figure 4, are a set of untranslated symbols found on Neolithic-era artifacts from the Vinča culture [27]. Whether this is one of the earliest writing systems or simply symbols of some sort is disputed [27]. Scholars have tried to answer two main questions about the nature of the signs: first, do they form a system, and second, if so, could such a system be interpreted as an original prehistoric script? It has been demonstrated that the signs and sign groups of the Vinča script are uniform, just as in organized writing [27]. It is reasonable to suggest that such a complex notation system could have been a form of written communication throughout the Vinča society. We plan to study the symmetry of the Vinča symbols in our future investigations.
Thus, if we speculate that the diversity of alphabets constituting scripts, quantified by $D_{sym}^{rect}$ and calculated with Eq. 4, evolved in time in a continuous way, as shown in Figure 3, the regression line is expected to cross the time axis at the point to which the origin of scripts is related. To the best of our knowledge of archeology, this point in time coincides with the existence of the Vinča culture [24,25,26,27]. We are well aware that at this stage of investigation this is a bold hypothesis, which calls for further investigation. The low value of the correlation coefficient of the linear regression given by Eq. 10 obliges us to consider the aforementioned reasoning cum grano salis.
One more observation is noteworthy: the values of SDI are restricted to a narrow range for both the s- and r-scripts, $1.024<SDI<1.686$, with the only exception of the Hebrew script, for which $SDI\cong 0.752$.
3.2. Symmetry factor; its definition and calculation for alphabets
Now we introduce one more notion, enabling quantification of the symmetry of the symbols constituting the scripts. We adopt the plausible hypothesis that the amount of graphical information necessary for storing a symbol is proportional to the area of the rectangle in which the symbol may be inscribed. Mirror axes of symmetry separate the rectangle into sub-areas, as shown in Figure 5. Consider the symbol qoph of the Phoenician script. This symbol has a vertical mirror axis of symmetry, denoted $S_{2}$, as depicted in Figure 5A. Thus, the entire symbol may be obtained by reflection of half of the symbol in the axis $S_{2}$, as shown in Figure 5B. If we have the full list of instructions describing the drawing of half of the symbol, the mirror reflection enables inscribing of the entire symbol. Thus, symmetry enables parsimony of the information necessary for drawing/inscribing the symbols. Now consider the Phoenician letter teth, depicted in Figure 5C. This symbol has four mirror symmetry axes, namely $(S_{1},S_{2},S_{3},S_{4})$, which separate the symbol into eight sub-segments. Following the aforementioned reasoning, the axes $(S_{1},S_{2},S_{3},S_{4})$ provide eight-fold parsimony of the graphical information necessary for drawing/inscribing the symbol.
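Counting the symmetry operations of a symbol inscribed into a square can be sketched computationally. The example below is only an illustration under stated assumptions: the symbols are represented as small square bitmaps (`teth_like` and `qoph_like` are crude hypothetical stand-ins, not the actual glyphs), the eight operations of the square are applied, and those that leave the bitmap unchanged are counted. The assignment of the labels $S_{3}$ and $S_{4}$ to the two diagonals is likewise an assumption.

```python
def rot90(g):
    """Rotate a square bitmap by 90 degrees clockwise."""
    return tuple("".join(col) for col in zip(*g[::-1]))


def mirror_horizontal(g):
    """Reflect in the horizontal axis (S1): top and bottom swap."""
    return tuple(g[::-1])


def mirror_vertical(g):
    """Reflect in the vertical axis (S2): left and right swap."""
    return tuple(row[::-1] for row in g)


def mirror_diagonal(g):
    """Reflect in the main diagonal (labelled S3 here by assumption)."""
    return tuple("".join(col) for col in zip(*g))


def symmetry_operations(g):
    """Return the names of the square's operations that leave g unchanged."""
    r1 = rot90(g)
    r2 = rot90(r1)
    r3 = rot90(r2)
    images = {
        "C1": g, "C4": r1, "C2": r2, "C4^3": r3,
        "S1": mirror_horizontal(g),
        "S2": mirror_vertical(g),
        "S3": mirror_diagonal(g),
        "S4": rot90(rot90(mirror_diagonal(g))),  # the other diagonal
    }
    return [name for name, image in images.items() if image == g]


# Hypothetical 5x5 bitmaps: a crossed circle and a circle on a stem.
teth_like = (".###.", "#.#.#", "#####", "#.#.#", ".###.")
qoph_like = (".###.", "#.#.#", ".###.", "..#..", "..#..")

print(len(symmetry_operations(teth_like)), symmetry_operations(qoph_like))
```

For the crossed circle all eight operations survive, whereas the circle on a stem keeps only the identity and the vertical mirror axis, mirroring the eightfold and twofold cases discussed in the text.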
This eightfold parsimony of information may also be demonstrated with the Cayley table of the symmetry of symbols [28,29]. It should be mentioned that for the teth symbol we recognize four additional elements of symmetry, namely rotations about the geometrical center of the symbol by the angles $\phi_{i}\,(i=1\dots 4)=\left(0;\frac{\pi}{2};\pi;\frac{3\pi}{2}\right)$. Thus, the symmetry group of the symbol contains eight elements, namely four mirror reflections and four distinguishable rotations, the rotation by $\phi=0$ being the identity [28,29]. Assume that the letters are created with software. Eight elements of symmetry provide an eightfold decrease in the graphical information necessary for drawing/inscribing the symbol. The same reasoning works for the Phoenician symbol qoph depicted in Figure 5A. The total symmetry group of this symbol contains the mirror axis $S_{2}$ and the identity element, which is the rotation by $\phi=0$; thus, the total number of symmetry operations is two. Hence, the symmetry provides twofold parsimony of the graphical information necessary for drawing the symbol. It should be emphasized that the aforementioned reasoning does not depend on the specific way of drawing the symbol. Now let us quantify the aforementioned parsimony. We denote by $m_{i}(G)$ the total number of elements of symmetry related to the i-th letter of the given alphabet, known in group theory as the order of the group G [30]. Now we introduce the symmetry factor of the alphabet, denoted $\mu$ and defined with Eq. 11:
where n is the number of symbols in the alphabet. The symmetry factor $\mu$ quantifies the averaged level of symmetrization of the alphabet on the one hand and, on the other hand, the possible parsimony of the graphical information necessary for drawing the entire set of letters constituting the alphabet.
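How such a factor could be evaluated is sketched here under an explicit assumption: Eq. 11 is not reproduced in this text, so the sketch takes $\mu$ to be the mean of the inverse group orders, $\mu=\frac{1}{n}\sum_{i=1}^{n}\frac{1}{m_{i}(G)}$. This is one reading consistent with the statement that an $m$-fold symmetry yields an $m$-fold decrease in graphical information, but it is an assumption, not the authors' confirmed definition. The group orders used for the Phoenician s-script are inferred from the symmetry classes listed in Section 3.1, and all names are hypothetical.

```python
def symmetry_factor(group_orders):
    """Mean inverse group order (ASSUMED form of the symmetry factor mu)."""
    orders = list(group_orders)
    if not orders:
        raise ValueError("the alphabet must contain at least one symbol")
    return sum(1.0 / m for m in orders) / len(orders)


# Assumed group orders m_i(G) for the 22 square-inscribed Phoenician symbols:
# 12 with the identity only, 1 + 3 with one mirror axis (order 2),
# 4 with two axes plus C2 (order 4), 2 with the full square group (order 8).
phoenician_orders = [1] * 12 + [2] * 4 + [4] * 4 + [8] * 2

mu = symmetry_factor(phoenician_orders)
print(len(phoenician_orders), round(mu, 3))
```

Under this assumed definition, an alphabet of completely asymmetric letters gives $\mu=1$, and $\mu$ falls as more letters acquire higher-order symmetry groups.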
Figure 6 depicts the time dependence of the symmetry factor $\mu$ calculated for various alphabets of the Phoenician group.
The regression line describing the time evolution of the symmetry factor $\mu(t)$ is given by Eq. 12:
The regression line crosses the time axis at the point $\tau^{*}=36.6$ centuries. Let us take a close look at the plot presented in Figure 6. We come to the following conclusions: i) the points representing rectangular and square scripts are located very close to one another for all of the studied scripts emerging from the Phoenician alphabet; ii) the symmetry factor $\mu$ decreases with time. This means that the averaged level of symmetrization of the studied alphabets increases with time, and the parsimony of the graphical information necessary for writing increases with time. Again, the high value of the symmetry factor established for Hebrew presents an obvious exception.