Figure 1.
The 64-Kalā grammar wheel. The 64-Kalā feature space depicted as a wheel of 64 arcs organised into four quadrants. Arc opacity encodes feature importance (Kruskal-Wallis H-statistic across Dosha groups): fully opaque arcs (H > 80) indicate strong discriminatory power; near-transparent arcs (H < 5) negligible discrimination. Quadrant colours: green (STHANA—G-quadruplex density and run length, CpG density and O/E ratio, thermodynamic stability ΔG, GC content, sequence complexity, repeat content); purple (SANDHI—thermodynamic gradient sharpness, palindrome density and length, hairpin and cruciform potential, junction asymmetry); orange (AVAKĀŚA—CTCF density, nucleosome exclusion scores, NFR depth and width, accessibility architecture); blue (DOSHA-composition—Alu/SINE, LINE-1, DNA transposon, and HERV-LTR density and regulatory activity scores). Inset: TE mini-wheel showing each transposable element class separately, arc opacity encoding regulatory competence scores.
Figure 2.
Grammar discriminates organ-system Dosha identity. (A) Balanced accuracy of a Random Forest classifier trained on 64 grammar features from 9,299 gene promoters (5-fold cross-validation). Grammar alone achieves 45.1% versus the 33.3% random expectation (p < 10⁻⁴⁸). (B) Radar plot of normalised mean feature values for the top 10 discriminating features, stratified by Dosha: Kapha (neural/cardiac, green), Pitta (metabolic/hepatic, red), Vāta (dermal/pulmonary, blue). Kapha genes are enriched for G4 density and CpG content; Pitta genes for thermodynamic gradient sharpness and CTCF density; Vāta genes for repeat and Alu element content. (C) Feature importance (Kruskal-Wallis H-statistic), top 15 features. G4 density (H = 98.2) and CTCF density (H = 95.6) are the two strongest discriminating features.
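Balanced accuracy in panel A is the mean of per-class recall, which is why a three-class predictor that ignores the features is expected to score 33.3%. A minimal Python sketch with toy labels follows; the function name and the labels are illustrative and are not taken from the authors' pipeline.

```python
from collections import defaultdict

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally regardless of size."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)

# Toy labels (hypothetical, not study data): an unbalanced three-class set.
y_true = ["Kapha"] * 4 + ["Pitta"] * 4 + ["Vata"] * 2
# A classifier that always answers "Kapha" gets recall 1, 0, 0 on the three classes.
always_kapha = ["Kapha"] * len(y_true)
chance_level = balanced_accuracy(y_true, always_kapha)
print(chance_level)
```

The constant predictor lands on one third even though the classes are unbalanced, which is the baseline the 45.1% figure is compared against.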
Figure 3.
Grammatical fragility landscapes across three Dosha-representative genes. Per-position fragility score (y-axis) across the 2,000 bp promoter window (x-axis, relative to TSS) for three genes: CLU (clusterin, Kapha—neural/cardiac, green), CYP3A4 (cytochrome P450 3A4, Pitta—metabolic/hepatic, red), and KRT14 (keratin 14, Vāta—structural/dermal, blue). The fragility score sums five grammar elements at each position: G-quadruplex run membership (+4), CpG dinucleotide participation (+3), thermodynamic gradient sharpness (+1–2), palindromic core proximity (+2), and TATA box adjacency (+5). Annotations indicate highest-scoring grammar elements per promoter. Background shading: distal zone −2,000 to −1,500 bp (light blue); medial enhancer zone −1,500 to −500 bp (white); core promoter −500 to TSS (light green). The medial zone consistently harbours the majority of fragile positions across all three Doshas.
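The additive score described in this caption can be sketched as follows. The weights and zone boundaries come from the caption; the function names, the argument names, the way the gradient term is passed in as 0, 1, or 2, and the handling of the exact zone boundaries are assumptions made for illustration.

```python
# Weights as stated in the caption; the key names are hypothetical.
WEIGHTS = {"g4_run": 4, "cpg": 3, "palindrome_near": 2, "tata_adjacent": 5}

def fragility_score(g4_run=False, cpg=False, gradient=0,
                    palindrome_near=False, tata_adjacent=False):
    """Sum the five grammar contributions at one position.

    gradient is the thermodynamic-gradient term: 0 (absent), 1, or 2.
    How sharpness maps to 1 versus 2 is not specified in the caption,
    so the caller supplies it directly (assumption).
    """
    if gradient not in (0, 1, 2):
        raise ValueError("gradient contribution must be 0, 1, or 2")
    score = gradient
    score += WEIGHTS["g4_run"] if g4_run else 0
    score += WEIGHTS["cpg"] if cpg else 0
    score += WEIGHTS["palindrome_near"] if palindrome_near else 0
    score += WEIGHTS["tata_adjacent"] if tata_adjacent else 0
    return score

def promoter_zone(pos):
    """Zone for a position relative to the TSS (negative = upstream).

    Which zone owns the boundary positions -1500 and -500 is an assumption.
    """
    if -2000 <= pos < -1500:
        return "distal"
    if -1500 <= pos < -500:
        return "medial"
    if -500 <= pos <= 0:
        return "core"
    raise ValueError("position outside the 2,000 bp promoter window")

# A position inside a G4 run that is also part of a CpG dinucleotide.
print(fragility_score(g4_run=True, cpg=True), promoter_zone(-618))
```

Under these weights the maximum possible score at a single position is 16 (4 + 3 + 2 + 2 + 5).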
Figure 4.
GWAS concordance gradient across fragility thresholds. (A) Dosha concordance (fraction of variants where gene Dosha matches GWAS trait Dosha) as a function of grammatical fragility threshold, for 6,977 GWAS-NFR variant pairs (95% Wilson confidence intervals). Dashed line: 33.3% random expectation. At fragility ≥ 5 (n = 437), concordance reaches 78.5% (p = 2.17 × 10⁻⁸³). (B) 3×3 Dosha confusion matrix at fragility ≥ 2 (n = 1,025). Diagonal cells (concordant) shown in Dosha colour. Pitta diagonal: 80%; Kapha: 44%; Vāta: 12%; χ² = 137.3, p = 1.1 × 10⁻²⁹. (C) Concordance by sequence zone. Proximal zone (0–500 bp from TSS): 23.5%, below chance, consistent with purifying selection eliminating Dosha-specific variation at the core promoter. Medial enhancer zone (500–1,500 bp): 60.2% (p < 0.0001). Distal zone: 42.7%.
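For readers who want to reproduce the interval and test type used in panel A, here is a minimal Python sketch of a 95% Wilson interval and an exact one-sided binomial tail against the 1/3 null. The count of 343 concordant variants is an assumption back-calculated from 78.5% of 437, and the exact test variant used by the authors is not stated for this figure.

```python
from fractions import Fraction
from math import comb, sqrt

def wilson_interval(k, n, z=1.96):
    """Wilson score interval for k successes out of n trials."""
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

def binomial_upper_tail(k, n, p0=Fraction(1, 3)):
    """Exact P(X >= k) for X ~ Binomial(n, p0), using rational arithmetic
    so that very small tail probabilities do not underflow mid-sum."""
    tail = sum(comb(n, i) * p0**i * (1 - p0)**(n - i) for i in range(k, n + 1))
    return float(tail)

# Assumed count: 343 of 437 is roughly the reported 78.5% at fragility >= 5.
low, high = wilson_interval(343, 437)
p_value = binomial_upper_tail(343, 437)
print(f"95% Wilson CI: {low:.3f} to {high:.3f}; one-sided p = {p_value:.2e}")
```

The Wilson interval is preferred over the simple normal approximation when proportions are near 0 or 1 or when bins are small, as in the high-fragility thresholds.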
Figure 5.
ClinVar independent validation confirms zone architecture. (A) Dosha concordance by sequence zone in 23,544 ClinVar regulatory variants. Proximal zone: 32.9% (p = 0.62, not significant), consistent with purifying selection. Medial zone: 72.8% (p = 1.26 × 10⁻¹⁷⁸, n = 1,248), independently replicating the GWAS finding. Distal zone: 67.7% (p = 10⁻¹⁰⁰). (B) 3×3 Dosha confusion matrix for medial zone ClinVar variants (n = 1,248). Kapha diagonal: 81%; Vāta: 86%; Pitta: 45%; χ² = 946.4, p = 1.5 × 10⁻²⁰³. (C) Pathogenicity rate (CLNSIG ≥ 4) for splice donor/acceptor variants (81.0%, n = 49,574) versus regulatory NFR baseline (3.0%, n = 23,544). OR = 138.7, p < 10⁻³⁰⁰ (Fisher exact). Complete disruption of the GT/AG splice Sandhi grammar yields near-deterministic clinical pathogenicity.
Figure 6.
Evolutionary depletion of common variants at fragile positions. 148 genes (48 Kapha, 50 Pitta, 50 Vāta) with at least 50 gnomAD variants in the 2 kb promoter window, comprising 28,976 total variants (23,543 non-CpG). (A) Percentage of variants with allele frequency ≥ 1% (common variants) binned by grammatical fragility score. Fragile positions (score ≥ 3) show 5.2% common variants versus 7.3% at tolerant positions (score = 0). Fisher exact OR = 0.698, p = 2.97 × 10⁻⁵. Spearman ρ = −0.45 between fragility and proportion common (non-CpG positions). (B) Depletion stratified by Dosha group. The depletion signal is consistent across all three Dosha groups, confirming that the evolutionary constraint is a property of the grammar rather than a feature of any specific gene set.
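As a quick arithmetic check, the odds ratio in panel A can be approximated from the two rounded percentages alone. This Python sketch uses only the 5.2% and 7.3% figures from the caption; the function names are illustrative, and the small gap to the reported 0.698 is presumably due to rounding of the percentages.

```python
def odds(p):
    """Convert a proportion into odds."""
    return p / (1 - p)

def odds_ratio(p_group, p_reference):
    """Odds ratio of group versus reference, from two proportions."""
    return odds(p_group) / odds(p_reference)

# Fraction of common variants: fragile positions (score >= 3) vs tolerant (score = 0).
or_fragile_vs_tolerant = odds_ratio(0.052, 0.073)
print(round(or_fragile_vs_tolerant, 3))
```

An odds ratio below 1 here means common variants are under-represented at fragile positions relative to tolerant ones.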
Figure 7.
Grammar errors produce directed Dosha shift consequences. 6,975 GWAS-NFR variants analysed for grammar preservation and Dosha shift direction. (A) Grammar preservation rate (fraction of variants maintaining reference Dosha grammar) stratified by fragility threshold. Preservation rises from 36.1% at fragility ≥ 0 to 67.8% at fragility ≥ 5, confirming that high-fragility positions are selectively disrupted when variants occur. (B) Dosha shift error matrix: rows = gene Dosha, columns = direction of grammar shift for variants that do change Dosha. Kapha-to-Pitta shift: 22% of Kapha gene variants; Pitta-to-Kapha: 31%. (C) Shift-concordance: among variants that produce a grammar shift, 38.9% show concordance between shift direction and GWAS trait Dosha (p = 2.0 × 10⁻⁵, n = 6,975; versus 33.3% random). Kapha-to-Pitta shifts predict Pitta-class disease in 71% of classifiable cases. (D) APOE schematic: G4/CpG anchor at −618 bp (Channel 1, Kapha programme) suppresses the dominant Pitta grammar (Channel 2) in brain tissue. G→A variant disrupts Channel 1; Channel 2 is released; neuroinflammatory Alzheimer’s disease follows.

Figure 8.
Grammar fragility operates across the full cis-regulatory landscape. (A) Dosha-disease concordance as a function of grammar fragility threshold, stratified by regulatory element type. Intronic GWAS variants (n = 4,045 total) show the strongest fragility-concordance gradient: 52.3% concordance at all fragility thresholds (p = 3.69 × 10⁻³¹), rising to 94.0% at score ≥ 5 (n = 301, p = 2.52 × 10⁻¹¹⁰). Promoter/UTR and distal intergenic variants show weaker signals. The grammar encodes Dosha identity at every active regulatory element in the gene body, not only the proximal promoter. (B) Concordance stratified by TAD (topologically associating domain) context, using Rao et al. 2014 GM12878 Hi-C TAD calls [4] (9,274 domains). TADs were classified as pure-Dosha (all classified genes share one Dosha; n = 314) or mixed-Dosha (genes from multiple Doshas cohabit the same domain; n = 710). Variants in mixed-Dosha TADs show higher concordance than those in pure-Dosha TADs at all fragility thresholds (fragility ≥ 0: 46.4% vs 39.5%, p = 2.11 × 10⁻¹⁰ vs 1.25 × 10⁻⁵). Grammar errors in mixed-Dosha TADs are more directed because the competing regulatory programme is physically proximate in the same chromatin domain.
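The pure-versus-mixed TAD labelling in panel B can be sketched in a few lines of Python. The TAD identifiers and the unclassified gene are hypothetical; the three gene-to-Dosha assignments follow Figure 3. Skipping TADs with no classified gene, and counting a TAD with a single classified gene as pure, are assumptions, since the caption does not say how those cases were handled.

```python
def classify_tads(tad_to_genes, gene_to_dosha):
    """Label each TAD 'pure' (one Dosha among its classified genes) or 'mixed'."""
    labels = {}
    for tad, genes in tad_to_genes.items():
        doshas = {gene_to_dosha[g] for g in genes if g in gene_to_dosha}
        if not doshas:
            continue  # assumption: TADs without any classified gene are left out
        labels[tad] = "pure" if len(doshas) == 1 else "mixed"
    return labels

# Gene assignments as in Figure 3; TAD names and GENE_X are made up for illustration.
gene_to_dosha = {"CLU": "Kapha", "CYP3A4": "Pitta", "KRT14": "Vata"}
tad_to_genes = {
    "tad_001": ["CLU", "GENE_X"],      # one classified Dosha
    "tad_002": ["CYP3A4", "KRT14"],    # two Doshas share the domain
    "tad_003": ["GENE_X"],             # nothing classified
}
tad_labels = classify_tads(tad_to_genes, gene_to_dosha)
print(tad_labels)
```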

Figure 9.
Grammatical fragility predicts organ-system disease identity independently of variant deleteriousness. (A) CADD deleteriousness score shows little relationship with Dosha concordance. Across 700 GWAS-NFR variants binned into CADD quintiles, concordance rate (fraction where gene Dosha matches GWAS trait Dosha) is shown with 95% Wilson confidence intervals. Spearman ρ = 0.087, p = 0.021: a weak correlation reflecting general regulatory enrichment, not organ specificity. A dotted horizontal line marks the mean concordance rate across quintiles, emphasising the absence of a clear gradient. CADD was designed to predict whether a variant is damaging; it carries no information about which organ system the damage affects. (B) Grammatical fragility shows a strong and significant gradient with Dosha concordance across 1,092 GWAS-NFR variants with classifiable Dosha. Concordance is 39.3% at tolerant positions (φ̃ = 0), 34.8% at low fragility (φ̃ = 1–2), 39.0% at moderate fragility (φ̃ = 3–4), and 90.7% at high-fragility positions (φ̃ ≥ 5; n = 248, p = 6.7 × 10⁻⁸⁰). After partial correlation controlling for CADD score, the fragility-concordance relationship remains significant (r = 0.130, p = 0.005), confirming orthogonality with deleteriousness. Error bars: 95% Wilson confidence intervals. Dashed line: 33.3% random expectation. Stars: one-sided binomial test against 33.3% (***p < 0.001). Together, the two panels demonstrate that CADD and grammatical fragility measure orthogonal biological properties: deleteriousness (whether a variant damages function) and organ-system specificity (where the damage manifests).
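The partial correlation in panel B removes the part of the fragility-concordance association that is explained by CADD. A minimal Python sketch of the first-order formula follows, assuming a Pearson-type partial correlation (the caption does not say whether Pearson or rank-based coefficients were used). The toy vectors are invented so that the control variable is uncorrelated with the other two.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def partial_correlation(x, y, z):
    """Correlation of x and y after controlling for z (first-order formula)."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Toy data (hypothetical): z is constructed to be uncorrelated with x and y,
# so controlling for it leaves the x-y correlation unchanged.
fragility = [1, 2, 3, 4]
concordance = [2, 1, 4, 3]
cadd = [1, -1, -1, 1]
r_partial = partial_correlation(fragility, concordance, cadd)
print(round(r_partial, 3))
```

When the control variable does correlate with both measures, the numerator shrinks accordingly, which is the sense in which a surviving partial correlation supports independence from CADD.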

Figure 10.
Grammatical fragility is under evolutionary constraint universally across human populations. Depletion of common variants (AF > 1%) at grammatically fragile positions (φ̃ ≥ 3) versus tolerant positions (φ̃ = 0), computed across 24,339 gnomAD v2.1.1 variants in 2 kb promoter windows of 124 genes, stratified by super-population. Analysis uses the same hg19 coordinates and fragility scores as Figure 6. The reference result (OR = 0.698, p = 2.97 × 10⁻⁵) from the full combined dataset is shown as a diamond. Population-stratified ORs are shown for four super-populations with sufficient data. All four independently show OR < 1: AFR OR = 0.844 (p = 0.010); AMR OR = 0.758 (p = 0.001); EAS OR = 0.645 (p = 6.5 × 10⁻⁶); EUR OR = 0.799 (p = 0.008). The depletion signal in the African (AFR) super-population, which carries the deepest human genetic diversity and represents lineages that predate the out-of-Africa migration, confirms that the grammatical constraint on fragile positions was established at least 100,000 years before present. The grammar is not a European population artefact. It is a universal property of the human regulatory genome. Error bars: 95% Woolf confidence intervals. SAS (South Asian) had insufficient data in this gene panel and is not shown.
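The Woolf (log-based) confidence interval used for the error bars can be sketched in Python as below. The 2×2 counts in the example are hypothetical, since the caption reports only the odds ratios and p-values and not the underlying tables.

```python
from math import exp, log, sqrt

def woolf_odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (log-scale) confidence interval.

    2x2 layout assumed here:
        a = fragile & common,  b = fragile & rare,
        c = tolerant & common, d = tolerant & rare.
    All four cells must be non-zero.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts, chosen only to show the calculation.
or_value, ci_low, ci_high = woolf_odds_ratio(10, 90, 20, 80)
print(f"OR = {or_value:.3f}, 95% CI {ci_low:.3f} to {ci_high:.3f}")
```

Because the interval is symmetric on the log scale, it is asymmetric around the odds ratio itself, which is why the error bars in such plots are usually drawn on a logarithmic axis.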
