1. Introduction
The increasing worldwide risk of cancer, which is particularly high in European countries and the US (e.g. in the UK, it is currently estimated to be over 50% [
1]), is alarming. Furthermore, men are more likely to both get and die from malignant tumours [
2]. Male fertility was also shown to be decreasing with acceleration over the last 80 years of monitoring worldwide [
3]. Are both tendencies causally linked? Cancer-testis-associated (CTA) protein-coding genes (expressed nearly selectively in the normal testis), which originated with the development of mammals and expanded in placentals and hominids, turned out to be oncogenic drivers, responsible for poor prognosis in cancer patients of many solid tumour types [
4,
5,
6]. In this review, we attempt to analyse this fatal link's evolutionary root and contemporary drive.
In our previous phylostratigraphic analysis of the human genome, embracing the entire ~4-billion-year evolutionary timeline of life on Earth, we investigated a list of 1474 gametogenic (germ cell, meiotic and CTA origin) genes, or GG, and revealed several peaks of evolutionary reproductive attractors [
7] (
Figure 1A). Besides the peaks in Unicellular (UC) eukaryotes (Phylostratum 2), early multicellulars (MC) (strata 4+5), and the peak in stratum 8 (the Cambrian explosion of animal variety), we paid attention to two splashes of late GG that had evolved in Eutherians and Old World monkeys (strata 12-14). Those included mostly the CTA group of late origin [
7]. With the origin of the X-chromosome, dated about 170 Mya, the CTA genes, initially appearing in the mammalian autosomes, transited their exponentially expanding families onto the X-chromosome (
Figure 1B). The evolution of X-linked CTAs was further hastened and is likely still going on in humans [
8].
The following questions arise: (1) Why did these testis-associated genes evolve in hominids so late when sexual reproduction was already established? (2) And why are they associated with cancer? As the main X-located CT-MAGE genes code for antigenic proteins, this feature also needs an explanation of the link to fertility and cancer.
According to the popular evolutionary theory of cancer based on phylostratigraphic data of cancer patient transcriptome databases, as well as that of driver oncogenes, the origin of cancer is dated very early, to the transition between unicellular (UC) and early multicellular (MC) organisms, (~ 2-1 Bya) [
9,
10,
11,
12,
13,
14,
15,
16] creating in human cancers the rewired UC-MC gene regulatory network (GRN) [
17], though it is considered a stepwise process [
16,
18]. Moreover, the genome rewiring in cancer, particularly in association with polyploidy and
c-myc-related activation of bivalent developmental genes, was found to facilitate the coordinated expression of reproduction genes and proteins and favour the female meiotic pathway in solid TCGA (The Cancer Genome Atlas) tumours [
7,
15,
19,
20], which is potentially parthenogenetic, also in males [
21]. Bruggeman et al. [
22], who observed a massive expression of germ cell-specific genes in cancer, came to the conclusion that it is a cancer hallmark; moreover, they showed, using TCGA lung adenocarcinoma as an example, that a higher germ cell gene expression signature is associated with poorer survival of the patients. All these data provide support for the old embryonal/parthenogenetic cancer theory [
23,
24,
25]. In its current variant, the embryonal theory of cancer considers polyploid giant cancer cells (PGCCs) in somatic tumours to be exploiting a program of early embryogenesis [
26,
27] and sexually undetermined primordial germ cells [
23,
24,
25]. For the historical arrow diagram describing advancements in cancer polyploidy research (including embryogenesis-like feature identification) from the 19th century onwards, see [
28]. In further support of this theory, microscopic observations from various laboratories during the last two decades have shown PGCCs, which are more numerous in genotoxically challenged cancers, to be similar to an early embryo (typically reaching the 8–16 cell (32C) stage), exhibiting meiotic and embryonal pluripotency markers [
7,
26,
27,
29,
30,
31,
32,
33] and being able to initiate tumours upon xenotransplantation of a single PGCC [
34]. At the same time, PGCCs are capable of releasing their cellularized mobile offspring by asymmetric division or bursting. This process, termed ‘neosis’ [
35], has the mixed features of multicellular embryogenesis and unicellular amoebal sporogenesis, and, not infrequently, literally recapitulates the latter in the process of developing drug resistance [
11,
36,
37,
38]. Some typical images of this process are compiled in
Figure 2, with a Volvox 8-cell-bridged embryo image enclosed for comparison.
However, STRING PPI network analysis of gametogenesis (GG) genes in breast and many other common cancers from the TCGA database reveals that the giant component of the network, enriched for meiotic modules (including those of female meiosis/oogenesis), is bridged to the subnetwork of the CTA/MAGE group by the nuclear receptor transcriptional regulator PRAME antigen (
Figure 3A). In
Figure 3B, the GG phylostratigraphy distribution for polyploid BRCA samples, including expression of the evolutionarily late genes, is presented as a typical example for many common cancers [
7].
Here, we try to answer the questions relating to CTA genes formulated above, using data from the literature and further in silico functional analysis of separate phylostratigraphic layers of the GG gene modules.
But first, we should start with a brief insight into the evolution of the human genome and CTA genes.
2. Evolution of the human genome by segmental duplications, adaptation by transposition and fragility, CTA origin, reactivation of the X-chromosome in spermatogenesis, and X-doubling in male cancer
The human lineage started ~3.9 Bya with the origin of life (Eubacteria), developed through multicellular eukaryotes, and underwent two rounds of whole-genome doubling at the base of vertebrate origin, before the Cambrian explosion of animal variety at the second atmospheric oxygenation ~500 Mya; mammals originated ~250-200 Mya and primates ~90-57 Mya [
18]. Compared to other mammals, the genomes of primates and particularly humans are enriched with large, interspersed segmental duplications (SDs), repeated in two or more genomic locations, with high levels of sequence identity. A strong association between SDs, genomic instability, and large-scale chromosomal rearrangements has been shown. The findings suggest that SDs have not only created novel primate gene families but might have also influenced current human genetic and phenotypic variation on a previously unappreciated scale [
40]. Transposable elements (TEs) make up 45% of the human genome: non-LTR (long terminal repeat) retrotransposons, namely short and long interspersed nuclear elements (SINEs and LINEs, respectively); LTR retrotransposons (endogenous retroviruses); and DNA transposons. TEs are mostly epigenetically silenced [
41]. Detailed sequence analyses of pairwise SD alignments have revealed that Alu, the most abundant retrotransposon (RT) class of mobile elements, is significantly enriched at the boundaries of SD pairs and restricted to younger subfamilies (AluY and AluS). The pairwise SD boundaries were shown to be fragile and to be preferential sites of double-strand breakage. The fragile sites of the human genome assume a left-handed, zigzag-like Z-DNA form of high energy tension and represent sites of high mutation and deletion rates [
40,
42]. Thus, SDs and Alu repeats appear to be the main origin of genome instability in primates and humans [
40,
43]. Updated analysis of the common fragile sites indicates their activation to be associated with replication stress and heterochromatin under-replication, which correlate well with chromosomal rearrangement and copy number variation and are likely causally linked to carcinogenesis [
44,
45]. Notably, the genome fragile sites are attractive for the meiotic recombination endonuclease SPO11 [
46]. Vertically inherited endogenous retroviruses (ERVs) possessing long terminal repeats (LTRs) have also contributed to CTA evolution by producing tissue-specific variants (testis, brain, placenta), creating alternative gene promoters [
47].
The evolution of CTAs in humans is tightly associated with the most recent RT history, on the one hand, and with the phylogenetic and ontogenetic history of the X chromosome, where most CTAs are located, on the other. The X chromosome is enriched 2-fold for the autonomous RT LINE-1 (L1), which may also serve as a DNA signal to propagate X-chromosome inactivation (through lncRNA) along the chromosome [
48], while its smaller active fraction also transposes the Alu elements. A restricted subset of L1 elements underwent a Eutherian burst [
49] which could favour the Eutherian splash of CTA genes. Alu intruded the Primate genomes with more than one million elements 60-35 Mya [
49,
50]. As shown in
Figure 1B, CTA genes transited to and exponentially expanded on the X-chromosome. Given the high level of diversifying selection, it was suggested that CTA genes are primarily responsible for the observed rapid evolution of protein-coding genes on the X chromosome [
8] that involves the ongoing evolution of Alu repeats [
51]. Alu sequences also contain transcription factor binding sites, including those for nuclear transcription factors, in particular the steroid hormone receptors: the progesterone and androgen receptors (PR and AR, respectively) [
52]; they are associated with somatic sex determination regulating spermatogenesis, folliculogenesis, and placentation.
The early human embryo undergoes full genome activation at the 8-cell stage; the later-formed primordial germ cells (PGCs) maintain the paternally and maternally inherited imprinting patterns. This DNA methylation pattern is again rapidly erased when PGCs begin migrating towards the developing gonads and undergo reprogramming, starting the transition from mitotic to meiotic division during spermatogenesis there [
53]. Many CT genes located on the X-chromosome are involved in this reprogramming (see below). LINE-1 activation is essential for preimplantation development [
54]; LINE-1 elements are expressed in round spermatids [
55], and the DNase-hypersensitive nucleo-histone fraction of mouse and human sperm is enriched in retrotransposon DNA [
56,
57]. The Xq26-28 fragile site, a region on the X chromosome prone to breakage, has been linked to LINE-1 retrotransposition [
58]. This region, particularly Xq27.3, includes the MAGE-A family of CTA genes. The youngest L1 subfamily, Ta, which is the only one still transposing, amplified in the last 2 million years [
59]. Non-autonomous Alu, which alters DNA methylation [
60], and the autonomous, human-specific transposing L1 subfamily, which generally favours methylation, are often clustered together in the fragile sites [
61]. It remains to add that CTA genes are hyperactivated in cancers by demethylation [
62].
From the above, it is seen that the complex regulation of male CTA gene expression on the X-chromosome is RT-linked, associated with fragility, and also highly dependent on the DNA secondary structure and epigenetic modifications. The latter, as well as X-gene dosage, also depend on X-chromosome ontogeny, different for males (XY) and females (XX).
In the female karyotype, one of two X-chromosomes is inactivated (XCI). XCI in placental mammals is a dosage compensation mechanism that transcriptionally silences the majority of genes on one of the X chromosomes in females. Because males have a single X chromosome, this ensures dosage equivalence between males and females. Male cells reactivate their only X chromosome during spermatogenesis [
63]. Recent reports have shown that reactivation of the inactive X-chromosome, or a loss of the inactive and doubling of the active X-chromosome [
64], a unique phenomenon that exists in many high-risk tumours in women, can transform the expression of many X-linked genes from monoallelic to biallelic. Therefore, Liu et al. (2018) [
65] speculated that X-chromosome reactivation can inappropriately augment CTA expression in cancer. It is therefore of great interest that our studies also revealed a high proportion of male tumour types with extra X-chromosome acquisition in the Mitelman tumour karyotype database, which presumably doubles the gene dosage of X-linked CTA genes [
21]. Moreover, the studies on male breast cancers with an extra X-chromosome revealed the hypomethylation of the AR gene together with the CTA
MAGEA family members, the coregulators of AR, both mapped on the X-chromosome’s q-arm [
66]. The authors suggested that this cis-hypomethylation may lead to CTA and AR hyperactivation. Moreover, Talon et al. [
63] proposed that genes encoded on the sex chromosomes act on autosomal genes to generate a differential regulatory and epigenetic landscape upon which later factors, such as hormones, act to counter or compound sex biases.
As suggested [
67] and generally accepted [
68], CTAs emerged in evolution to protect male reproduction in mammals and particularly hominids (who possess large brains) from stress. The useful information on the functions of the most important CTAs/MAGE members related to stress is briefly compiled below.
3. The MAGE protein oncogene family functions in gametogenesis and the adaptive stress response
MAGEs (melanoma-associated antigen genes, first found in melanoma) represent the most important group of CTA genes associated with cancer.
MAGE genes are conserved in all eukaryotes and have expanded from a single gene in lower eukaryotes to ∼40 genes in humans and mice. The type I MAGEs include the
MAGE-A,
-B, primate-specific -C, and mouse-specific Mage-a–like subfamily members. Type I MAGEs are called cancer-testis antigens (CTAs) because they are primarily expressed in the testis and are normally silent in other tissues. However, they are often aberrantly reactivated during oncogenic transformation and code for antigens recognized by cytotoxic T lymphocytes. They are also involved in diseases other than cancer, including neurological disorders [
69]. They are mostly located on the X-chromosome.
In contrast, the type II MAGEs, consisting of the
MAGE-D, -E, -F, -G, -H, -L, and
NECDIN genes, are more ubiquitously expressed in humans, particularly in the brain. They are typically not associated with human cancer and can be located on autosomes. In
Figure 4A, the MAGE group I and II genes mapping to the human X-chromosome are shown. Notably, the MAGE-A and -C subfamilies, most associated with cancer, are nested in the syntenic regions of the X-chromosome, the fragile locus Xq27.3 and locus Xq28, where the testis-associated genes are overrepresented. Some MAGE-B genes from the short arm of the X-chromosome are also involved in cancer.
Figure 4B, also borrowed from the review [
69], shows a heatmap of the percentage of various tumours that express each type of MAGE. It is interesting to compare it with the Table in
Figure 4C, which showcases the male tumour karyotype cohorts showing the highest percentage of extra X chromosome gain, presented by the same tumour types, with the upper row being seminoma in both figures.
It was supposed that the stress tolerance conferred by MAGEs might explain why many cancers capable of surviving anticancer treatments aberrantly express them [
69,
70]. At the molecular level, MAGEs are regulators of transcription factors, but many also bind to E3 RING ubiquitin ligases and, thus, regulate their substrate specificity, ligase activity, and subcellular localization. Moreover, a majority of CTA/MAGE are intrinsically disordered proteins (IDPs), toggling promiscuous links with various substrates, in a dosage-sensitive manner [
71,
72]. In general, the IDPs enable discrete cell transitions from one state to another and can change cell fate [
72,
73].
The MAGE-A group also suppresses the p53 function [
74]. In turn, suppression of p53 can induce abnormal gametogenesis and parthenogenetic development in tumours [
75]. MAGE (and CTAs in general) protect spermatogenesis under stress conditions (e.g. famine)
in vivo [
70].
CTA genes, including MAGEs, have been recently excellently reviewed [
5,
6,
69]. Therefore below, we only briefly list the functions of selected CTA genes which provide keys for our further analysis of GG STRING PPI networks.
-
MAGE-A1, NY-ESO-1, and GAGE are expressed in foetal and neonatal sexually differentiating germ cells. In oocytes,
MAGE-A1 expression terminates around birth, whereas NY-ESO-1 expression persists through the neonatal stage and GAGE expression is maintained until adulthood. The population of GAGE-expressing male and female germ cells is partially OCT4-positive [
76].
-
MAGE-A2 restrains cellular senescence by targeting the function of PML-involved pathways [
77].
-
MAGE-A3/6 downregulates autophagy and apoptosis in response to cellular starvation [
78] and also supports genome stability by degradation of retrotransposon RNA [
79].
-
MAGE-A11 forms complexes with
MAGE-A3/6 and regulates AR function in spermatogenesis and somatic sex determination [
80], and also interacts with AR and PR to favour embryo implantation [
69].
-
MAGE-B2 (locus Xp21.2) coincides with the position of
NR0B1, whose doubling or deletion leads to dosage-sensitive sex reversal [
81].
-
MAGE-C1/C2 - involved in p53 suppression, cancer invasion [
69]
-
SPANX-N and (A-D) (Xq261-27.3) - (A-D) is the human family derived from the rodent
SPANX-N, they are responsible for the sperm motility (
SPANX-N) and sperm head packaging (A-D) [
76,
82], and were found to be controlled by the nuclear lamina in melanoma [
83].
-
SYCP1 and SYCP2, genes of the meiotic synaptonemal complex, are also attributed to CTAs by their antigenicity outside the blood-testis barrier [
84].
- The antigenic
PRAME is a very important master gene connecting the meiotic giant component of the cancer cell genome network with the MAGE-A cluster (
Figure 3A). As a nuclear receptor transcriptional regulator, it activates the embryonal stemness (through
OCT4A) and PGC program (through
SOX17) [
85] and it also downregulates cell differentiation as a dominant retinoic acid receptor signalling gene [
86]. So, PRAME can be crucial for cell fate change and soma-germ transition. PRAME activates
MAGE-A1 and
MAGE-A3 and is overexpressed not only in many solid tumours but also in myeloid leukaemia [
87].
More than 300 human-specific genes and 1,000 primate-specific novel genes appear to be implicated in brain development and male reproduction [
88] and often share the chromosome location and functions. A large MAGE-A group of spermatogenic genes is located in the X-fragility Xq27.3 locus, which is associated with the FMR1-mental disability syndrome [
89] coupled with a low sperm count in men [
90] and mild ovarian failure in affected women [
91]. Fragile X-Associated Tremor/Ataxia Syndrome (FXTAS) [
92] is another pathology involving this site. A more detailed CTA-related brain-associated pathology review and discussion are out of the scope of this article.
From our review of CTA gene and protein functions, one can see how complex, dose-sensitive, and flexibly adaptive they are. While dominating in the spermatogenesis regulation, they are also expressed in the trophoblast, some participate in the female gametogenesis, activation of the embryonal stemness, PGC pathway and somatic sex determination. These literature insights will help in interpreting the gametogenic gene STRING networks which we further analysed.
4. STRING network analysis of GG genes in the human genome along the evolutionary phylostratigraphic axis
The list of 1474 gametogenic (GG) genes is compiled from CTDatabase [
93] cancer-testis genes, the germ cell-specific genes from the work of Bruggeman et al. [
22] and the MeiosisOnline meiotic gene database extended with a manually curated gene list (
SYCP1, SYCP2, SYCP3, SYCE1, SYCE2, HORMAD2, MAEL, MEIKIN, MEIOB, MEIOC, SYCE1L, TEX11, MAJIN, FAM9C, FAM9B, FAM9A, REC114, TEX19, BRME1, TEX14, MSH4, TEX15). For the purposes of this work, we have extended it further, adding genes of early embryogenesis (
POU5F1, ZP4) and genes from the pregnancy/placentation functional modules recently identified by us to be upregulated in MDA-MB-231 cells on day 5 after doxorubicin treatment [
19]. The phylostratigraphic distribution of these genes was determined using the gene phylostratigraphy data from the work of Trigos et al. [
10]. This gene list, now designated GG+, is presented in
Table S1 alongside their respective phylostratigraphic groups (phylostrata). Their STRING protein-protein interaction (PPI) networks are displayed in
Figure 5,
Figure 6,
Figure 7 and Figure 9. These STRING PPI networks were constructed separately for phylostrata 4-5, 8, and 10-16 at medium confidence, while the phylostrata 1-2 network is republished from [
7]. The STRING database’s [
94] web interface was used to prototype the networks and identify their giant components, after which the STRING network tables were downloaded and the final design of the networks was constructed using ggraph [
95] and ggplot2 [
96] in R.
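The workflow described above (splitting the gene list by phylostratum, filtering interactions at medium confidence, and extracting the giant component) can be sketched as follows. This is a minimal illustration only, not the authors' pipeline, which used the STRING web interface and R. It assumes a gene-to-phylostratum mapping and an edge table with a combined score on the 0-1 scale, where 0.4 is STRING's medium-confidence cut-off (400 on the 0-1000 scale used in some STRING downloads). The gene names, strata, and scores below are invented placeholders, and the function names are hypothetical.

```python
from collections import defaultdict, deque

# Strata groupings mirroring the text: 1-2, 4-5, 8, and 10-16.
STRATA_BINS = {
    "1-2": range(1, 3),
    "4-5": range(4, 6),
    "8": range(8, 9),
    "10-16": range(10, 17),
}
MEDIUM_CONFIDENCE = 0.4  # STRING "medium confidence" combined-score cut-off (0-1 scale)


def split_by_strata(gene_to_stratum):
    """Assign each gene to a strata bin; genes outside all bins are skipped."""
    bins = {name: set() for name in STRATA_BINS}
    for gene, stratum in gene_to_stratum.items():
        for name, strata in STRATA_BINS.items():
            if stratum in strata:
                bins[name].add(gene)
    return bins


def giant_component(genes, edges, min_score=MEDIUM_CONFIDENCE):
    """Return the largest connected component among `genes`,
    keeping only edges whose score is at least `min_score`."""
    adjacency = defaultdict(set)
    for a, b, score in edges:
        if score >= min_score and a in genes and b in genes:
            adjacency[a].add(b)
            adjacency[b].add(a)
    seen, best = set(), set()
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search collects one connected component.
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in component:
                    component.add(neighbour)
                    queue.append(neighbour)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


# Invented placeholder data (not taken from Table S1 or from STRING).
toy_strata = {"GENE_A": 4, "GENE_B": 5, "GENE_C": 4, "GENE_D": 5, "GENE_E": 12}
toy_edges = [
    ("GENE_A", "GENE_B", 0.90),
    ("GENE_B", "GENE_C", 0.45),
    ("GENE_C", "GENE_D", 0.20),  # below the cut-off, so dropped
    ("GENE_A", "GENE_E", 0.80),  # GENE_E belongs to another strata bin
]

bins = split_by_strata(toy_strata)
print(sorted(giant_component(bins["4-5"], toy_edges)))
# prints ['GENE_A', 'GENE_B', 'GENE_C']
```

Building each network only from genes of one strata bin, as here, means that interactions crossing bins are ignored; whether that matches the published figures is an assumption of this sketch.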
Stratum 1+2 (
Figure 5): The modules of the cell cycle, meiotic cell cycle, DNA repair and recombination, and gamete generation were revealed, indicating that meiosis and sex had already emerged in UC eukaryotes. Notably, a loose subnetwork of gamete generation is not integrated into the dense core component of the network, which includes meiotic recombination and DNA repair genes. This may mean that meiosis was present here in one of its evolutionarily earlier forms, endomitotic or zygotic [
97,
98,
99].
Stratum 4+5 (
Figure 6): Two interacting network clusters are presented. A central subnetwork is a cluster of meiosis networked with the generation of both gametes (gametic meiosis). The meiotic cluster includes PRDM9, MRE11, TEX11 and other genes essential for homologous pairing and programmed repair of DNA DSBs, such as MCMDC2 [
100]. This cluster also includes NANOS1 (light-blue), which downregulates mitosis during female germline development [
101] and possesses the features of oncofetal oncogenes [
102], with a direct link to DAZL. DAZL (triple-coloured) is a hub of early embryogenesis (ESCs) and the development of primordial bisexual germ cells (PGCs) [
103]. Furthermore, three other key genes of the ESC and PGC determination, PRDM14, BMP4 and WNT3, are present [
104]. Another big cluster highlights the mito-meiotic cell cycle transition. The genes of this cluster indicate replication stress (ATAD5), DNA replication and S/M and G2/M checkpoints (TICRR, CLSPN), DNA double strand-break repair by homologous recombination (FAM175A; RAD51AP1), delay in G2/M phase progression (GTSE1), inactivation of the anaphase-promoting complex, metaphase-anaphase transition in meiosis I (CCNB3), microtubule motor (KIF14) and remodelling of MTOC during oocyte maturation (FBXO5, CEP152). The cluster also includes the meiotic crossover junction endonuclease EME1.
In summary, we see the role of Strata 4+5 in the mito-meiotic transition induced by replication stress and DNA double-strand breaks (likely transiting the mitotic G2/M with its DNA damage checkpoint into meiotic prophase [
105,
106]), gametogenesis networked with meiosis (gametic meiosis), converging to the establishment of the Metazoan preimplantation embryo and germline lineage (PGC), oocyte maturation with the oncofetal potential, also supported by the literature data and linking Stratum 5 with oncogenic driver genes [
9,
107].
Stratum 8 (
Figure 7): This GRN part embraces mostly “reproductive processes” (GO:0022414) at the level of individual organisms. A gene central to the network, FOS, represented by its AP-1 dimer (with JUNB), highlights a general stress response. In the reproductive context, FOS is critical for the upregulated expression of key ovulatory genes in human granulosa cells, mediated through progesterone receptor (PGR) and EGF signalling [
108]. At the same time, FOS and JUNB are members of the “female pregnancy” GO module (GO:0007565). Thus, FOS here unites two somatic modules of reproductive processes: the cluster of endocrine somatic sex determination, on the right, and the immunity/placental cluster, on the left. Most genes of the GO “female pregnancy” module (GO:0007565), IL-1β, VEGFA, THBD, AREG, PGF, PTHLH, AGT, FOS, and JUNB, are interlaced with the immunity network of cell communication and angiogenesis knotted by cytokines IL-10 and IL-1β. In addition, we find a cluster related to conventional meiosis I including the centromeric cohesin REC8 and its stabilisers (SGO1,2) and the central elements of SC (SYCP1, SYCE1). This subnetwork is connected to the ZP3/ZP4
zona pellucida proteins (the vertebrate egg-coating glycoprotein interacting with ECM) enclosing the matured oocyte and early embryo [
109]. Another link from the somatic regulation of reproduction to generative embryonic stem cells is seen in the inclusion of POU5F1, a key to embryonic pluripotency and PGC development [
110].
The cluster of somatic sex determination (“the determination of sex and sexual phenotypes in an organism's soma and involving endocrine regulation”) reveals several key genes. DMRT1 (the double-sex-related transcription factor) is involved in sex determination and gonadal development (stimulation of Sertoli cells and ovarian follicles [
111,
112]). Its expression in PGCs is not sexually distinctive; moreover, DMRT1 is haploinsufficient for testicular development, and its haploinsufficiency can cause male-to-female sex reversal in the embryo. In humans, DMRT1 is critically required for the development of the testis during the foetal period. In the adult testis, DMRT1 is predominantly expressed in Sertoli cells and is also required in spermatogonia, enabling restoration of their pool after sperm depletion. Another gene of epigenetic sex determination, CYP19A1, participates in the androgen-to-oestrogen receptor conversion (AR-ER) by aromatase P450. Its activity in males is restricted by methylation, while the haploinsufficient-for-males DMRT1 is more methylated in females [
113], both thus epigenetically preventing male-female sex reversal in the embryo. The actions of steroid androgens such as testosterone and dihydrotestosterone are mediated via the X-linked androgen receptor (AR). It becomes hyperactive (undermethylated) in the breast- (regardless of patient sex) [
114,
115] and castration-resistant prostate cancers [
116]. AR is also involved in the brain and some other tissue functions [
117]. Another gene of this cluster, NR0B1 (DAX1), is related to sex determination and reversal through its link with glucocorticoid receptors (adrenal gland), while SOX3 is involved in sex determination and brain development. Balanced sex determination by this system provides the normal male/female sex ratio at birth (approximately 1). More about the hypothalamus-gonadal endocrine sex regulation can be found below, in the section devoted to male infertility and in Figure 13.
The aromatase CYP19 for the AR-ER transition coded by its respective sex determination gene is expressed in the reproductive organs and the brain of most mammals. Notably, in primates, an LTR-ERV-promoted transcriptional variant of this gene confers the alternative expression to the placenta [
118].
In summary, we see the organism-level regulation of reproduction by cytokine-cytokine receptors and growth factors for cell communication in immunity, fused with the placental regulators (mostly for angiogenesis) and linked by the general early stress response gene FOS to the establishment of the vertebrate endocrine somatic sex determination. This somatic regulation also has links to the established meiosis and to the development of the mature oocyte and early embryo in vertebrates. It appears that the whole set, if altered or not fine-tuned for males, is inclined by default towards embryo sex reversal and female germline development.
The placenta module, standing out in phylostratum 8, much before the development of Eutherians, seems confusing. Therefore, in the next section, we inserted a mini-review devoted to placenta evolution and its cancer-related issues.
6. The mammalian placenta evolution, immunity, and the enrichment of the “female pregnancy” GO module in genotoxically challenged PGCCs
Before mammals, in the teleosts (Euteleostomi, Phylostratum 8), a “follicular placenta”, lined with microvilli and surrounded by highly vascularized tissue to facilitate maternal-foetal exchange, arose several times, making it a model for the evolution of placentation [
119]. However, it is only in mammals that the placenta has developed from a trophoblast lineage specified in an embryo from the morula stage [
120], which in humans is represented by its invasive variant [
121,
122]. It is noteworthy that the resulting placental structures of mammals and their associated trophoblast cell populations appear not to be governed by particular master genes but rather depend on widely expressed transcription factors embedded into the intercellular communication network of immunity cytokines and growth factors (which also evolved in Euteleostomi [
10]) and operating in a combinatorial manner [
121]. There are few predominantly placenta-specific genes, e.g.,
GCM1, a transcription factor that plays a role in controlling the formation of syncytial trophoblast (STB) [
123]. Intriguingly, it was recently found that ancestral retroviral infections have provided a source of novel protein-coding genes that have played a role in Eutherian evolution. In many species, the placenta expresses a range of endogenous retroviruses (ERVs) that participate by integrating part of their DNA (
env) into the regulatory part of placental genes to produce cell-fusing syncytins of the syncytiotrophoblast, the most specific structural component of the placenta [
121,
124,
125,
126]. The transmembrane fraction of syncytins, called the immunosuppressive domain (ISD), can induce severe immunosuppression of host cells and is therefore potentially oncogenic. The proviral activity of the retrogenes is controlled by the innate immune response to viral and cytosolic DNA fragments (the cGAS-STING pathway [
127]). However, in senescent and cancer cells, this pathway may be unleashed, particularly by anticancer treatment, resulting in the activation of the GO module of ‘female pregnancy’ in the PGCCs undergoing repeated rounds of mitotic slippage [
19]. As reported, the induced change of gene networks in the doxorubicin-treated triple-negative breast cancer MDA-MB-231 cells (exemplified and illustrated in
Figure 8A at day 5 post-treatment) is mostly highlighted by the differentially expressed gene phylostratigraphic distributions markedly peaking at the 8th phylostratum (
Figure 8B), which is related to innate immunity and hubbed, in particular, by IL-1β [
19]. The highly upregulated IL-1β shares the enriched ‘female pregnancy’ module (
Figure 8A and C) with modules related to innate immunity, which originated in Euteleostomi (see above in
Figure 7).
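The notion of an "enriched" GO module used above can be made concrete with an over-representation test. The following is a minimal sketch, assuming a one-sided hypergeometric test, which is a common choice for GO over-representation; the source does not state which statistic was applied, and the function name and all counts are invented for illustration. In practice the resulting p-values would also be corrected for multiple testing.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, hits):
    """One-sided hypergeometric p-value, P(X >= hits).

    population: number of background genes
    annotated:  background genes belonging to the GO module
    sample:     number of genes in the list being tested
    hits:       genes in the list that belong to the GO module
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Invented toy counts: 20 background genes, 5 of them in the module,
# and a tested list of 4 genes containing 2 module members.
p_value = hypergeom_enrichment_p(population=20, annotated=5, sample=4, hits=2)
print(round(p_value, 4))  # prints 0.2487
```

With these toy counts the overlap is not unusual; a module would be called enriched only when the overlap is much larger than expected by chance from the background.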
7. STRING network analysis (continuation) Strata 10-16 (Figure 9)
This gametogenic subnetwork of the human genome is of late evolutionary origin (mammalian-human). It is mostly represented by X-linked CTA protein interactions highlighting spermatogenesis, and it is enriched with the MAGE group and related reproduction processes. Notably, all these genes are also potential oncogenes [
4,
5,
6]. The densely intertwined CTA-MAGE core of the network (predominantly from the long arm Xq27-28; see details in
Figure 4) drives spermatogenesis from the proliferation of spermatogonial cells, including the FATE1 (Xq28) gene. This important gene, which is strongly active in embryonic and adult spermatogenesis, is a key factor in decreasing the sensing of stress [
128] and is harnessed by cancer cells to escape apoptotic death and resist the action of chemotherapeutic drugs [
128,
129].
Figure 9.
The STRING network of GG+ genes (Suppl. 1) corresponding to the 10th-16th evolutionary phylostrata. Genes (nodes) belonging to enriched functional modules of interest are displayed in the form of pie charts. The giant component of the network consists mostly of MAGE-associated genes of spermatogenesis; it is connected to the subnetwork of embryonic stem cell (ESC) and primordial germ cell (PGC) development. The link of MAGE-A11 to the androgen receptor (AR) is indicated by a red dashed line. The two genes sharing spermatogenic functions with brain development are marked in yellow.
Other notable genes include TEX15 (8p12), which is required for DNA double-strand break (DSB) repair, chromosome synapsis, and meiotic recombination in spermatocytes, and the SPANX-N and SPANX (A-D) families (Xq26.2-27.3), which are strictly associated with spermiogenesis (sperm motility and head packaging [
130]) and strongly involved in melanoma genesis [
131]. Furthermore, the gene FMR1NB (Xq27.3) has both spermatogenic [
132] and oncogenic functions [
133]. It is the closest neighbour of the gene FMR1 (associated with fragile X syndrome, which involves intellectual disability and reduced fertility). The aligned looser subnetwork of CTA genes includes the SPANX-N family and the MAGE B1-B6 group, neighbours of the NR0B1 gene of dosage-sensitive sex determination [
81] on the short arm of the X-chromosome (for localisation details see
Figure 3). The CTA45A gene family comprises testis-restricted cancer genes, whose tumorigenic, invasive (EMT-promoting) capacity is enhanced by growth factors, as examined in breast cancer. The genes of the CTA47A family, which are testis-restricted, form a compact group at the Xq24 locus and also interact with CTA genes at the Xq28 (MAGEA1) and Xp22.2 loci [
134]. The function of the CTA 44-47 families is evidently related to both spermatogenesis and oncogenesis, as seen in GeneCards [
135] (as manually labelled).
A more remote subnetwork, which includes the DPPA group (developmental pluripotency-associated genes), belongs to the ESC and PGC module. Importantly, it includes the ESC master gene NANOG, which is strongly expressed in foetal gonocytes and
in situ germ cell cancer [
134]. This subnetwork also includes DNMT3L, inherited through maternal imprinting, which promotes neural tube, placenta and ovary development and inactivates retrotransposons (RT) in the male germline, safeguarding it from mutations. The ESC/PGC subnetwork converges on DPPA4, the developmental pluripotency chromatin modifier, which links it (through the multidrug-resistance gene MDR1 of the blood-brain barrier) to the FATE1-mediator gene bridge connecting the large clusters of spermatogenesis and CTA/MAGE-A. In turn, the CTA/MAGE-A subnetwork converges on the PRAME gene. The PRAME gene (chromosome 22) represses endoderm differentiation, activates the
POU5F1 promoter for induction of ESC and modulates
SOX17 to function as a master PGC gene [
85]. In addition, PRAME is activated by IFN-γ, which is closely associated with cGAS-STING (sensing cytosolic DNA) and IL-1β signalling [
136].
In summary, the reproductive CTA genes of Strata 10-16, which originated in mammals and expanded in primate evolution and, even more, in humans, are largely X-linked or maternally imprinted, and generally serve to support male germline development, from spermatogonia to sperm maturation. They provide stress protection for the male reproductive system; paradoxically, however, they have also acquired the functions of antigens and oncogenes. The CTA module of these strata is capable of activating the modules of ESCs and sexually undifferentiated PGCs, owing to the included ESC master gene
NANOG. Activated
NANOG, in turn, is well-known for its cooperation with the pluripotency transcription factors
POU5F1 (
OCT4) and
SOX2, which are the “pioneers” of development [
137].
It seems important to further clarify the potential carcinogenic link between the late spermatogenic phylostrata of mammals and humans and Phylostratum 8, which evolved much earlier.
8. The causal link between GG genes of the 8th phylogenetic stratum and CTA-enriched strata 12 and 14 in spermatogenesis and in cancer progression
In our previous bioinformatic study on the distribution of polyploidy (WGD)-upregulated GG genes in TCGA tumours (using Quinton et al.’s data [
138] on differential expression between diploid and polyploid tumours), we characterised the GG gene distribution histograms of 17 primary tumour types from the TCGA database [
7]. The typical phylostratigraphy profile for common solid cancers with the dominating Stratum 2 is presented for BRCA in
Figure 4B. However, among the 17 tumour types, two distinct and opposite patterns stood out: those of testicular germ cell tumour (TGCT) and head and neck squamous carcinoma (HNSC). In TGCT, stratum 8 and later strata were absent, while in HNSC, on the contrary, they were highly expressed (
Figure 10 A and B).
To understand the situation with TGCT, it is useful to consider its development in ontogenesis [
139]. TGCT develops in adulthood from dormant PGCs, arrested during the foetal period, that do not even start the gonocyte/spermatogenesis pathway. Accordingly, TGCT does not express CT antigens [
140]. It follows that the regulation of spermatogenesis encoded by CTAs in mammals needs the expression of the proteins encoded in stratum 8 (certainly, at least, those of sex determination). TGCT is very different from seminoma, which develops from spermatogonia [
139] and expresses CTAs (
Figure 3B).
For HNSC, which strongly expresses Str. 8 and CTA genes associated with polyploidy (see
Figure 3B), we analysed here the distribution of the differentially expressed GG+ genes (compared to 44 normal samples) across the phylostratum peaks depending on cancer stage and polyploidy, as verified on a large number of TCGA-HNSC samples (Stage I: 25; Stage II: 74; Stage III: 74; Stage IV: 259 samples). Polyploid samples were filtered with a threshold of >3.5 using GG+ (Suppl. 1). As can be seen in
Figure 10C, the number of GG genes available for analysis is generally the highest in Str. 2, and at Stage I the response also peaks predominantly there, along with an increase in Str. 4-5 and 8, whose GG numbers are relatively smaller, while the response of Str. 12 is weak and that of Str. 14 practically absent. However, with cancer progression, and particularly at Stage IV of disseminated metastatic cancer, the profile shifts, favouring a more significant GG+ contribution from Str. 8 and later strata. The difference in fold-increase of the involved GG+ gene count in each stratum of interest, comparing Stage IV with Stage I, as presented in
Figure 10D, clearly shows this tendency and the particularly large response of the hominid CTA MAGE-rich Str. 14 (10-fold). It points to a crucial role of CTA genes in the polyploidy-related metastases of this cancer type. Although HNSC has a good prognosis for patient survival at the initial stages, recurrent or metastatic HNSC is largely incurable [
5].
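The stage comparison described above (counting differentially expressed GG+ genes per phylostratum and taking the Stage IV to Stage I ratio) can be sketched in a few lines. This is only a minimal illustration, not the pipeline used for Figure 10: the gene names, counts, the `ploidy` field and all function names are hypothetical, and only the >3.5 threshold is taken from the text.

```python
from collections import Counter

# Illustrative toy data only: (gene, phylostratum) pairs for differentially
# expressed GG+ genes per stage. Gene names and counts are hypothetical and
# are NOT the TCGA-HNSC results discussed in the text.
DE_GENES = {
    "Stage I": [("g1", 2), ("g2", 2), ("g3", 2), ("g4", 8), ("g5", 14)],
    "Stage IV": [
        ("g1", 2), ("g2", 2), ("g3", 2), ("g6", 2),
        ("g4", 8), ("g7", 8), ("g8", 8),
        ("g5", 14), ("g9", 14), ("g10", 14),
    ],
}

PLOIDY_THRESHOLD = 3.5  # threshold from the text; the "ploidy" field is assumed


def select_polyploid(samples, threshold=PLOIDY_THRESHOLD):
    """Keep samples whose (assumed) ploidy estimate exceeds the threshold."""
    return [s for s in samples if s["ploidy"] > threshold]


def count_per_stratum(genes):
    """Count differentially expressed genes in each phylostratum."""
    return Counter(stratum for _gene, stratum in genes)


def fold_increase(early, late, strata):
    """Late/early gene-count ratio per stratum (None if the early count is 0)."""
    return {s: (late[s] / early[s] if early[s] else None) for s in strata}


early = count_per_stratum(DE_GENES["Stage I"])
late = count_per_stratum(DE_GENES["Stage IV"])
folds = fold_increase(early, late, strata=(2, 8, 12, 14))
print(folds)
```

Returning `None` for strata with no genes at the early stage avoids a division by zero; how such strata should be treated in a real analysis (for example, with a pseudocount) is a separate design choice that this sketch does not settle.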
The genome instability of mammals and endocrine disruption enhanced by the current environmental pollution and climate change synergistically potentiate male infertility, embryonic sex reversal probability, and increase cancer risk. These factors are also linked to developmental disorder risk. The following sections will briefly outline these aspects.
9. The global decline of male fertility, link to cancer risk, and the consequence of endocrine disruption
The first evidence of a global impairment of male reproductive health was published in 1992 [
141]. The authors reported a significant (almost two-fold) decrease in mean sperm counts, from 113 million/ml in 1940 to 66 million/ml in 1990, in the United States and many European, South American, African and Middle Eastern countries (
Figure 11A). They also reported the concomitant increase of the pathologies and morbidities of the male reproductive tract such as testicular cancer, cryptorchidism (undescended testicles in newborns), and hypospadias.
Many researchers considered the data from this publication a slowly ticking time bomb threatening mankind, and many studies were initiated to either confirm or overturn them. The subsequent publications were contradictory: some of the studies did not confirm the decline in semen quality over time [
143,
144], whilst the data from the others supported such a decline [
145,
146,
147,
148]. However, there were important limitations connected to all these studies: 1) data were poor or highly variable; 2) the validity of the statistical methods was questionable; 3) different study populations were investigated; 4) confounding factors such as age and abstinence time (time between sample collection and last ejaculation) were not taken into account in all studies [
142].
The first systematic review and meta-regression analysis of temporal trends in sperm counts was published in 2017 [
142]. It reported a significant overall decline in semen quality in men from North America, Europe, Australia and New Zealand, analysed between 1973 and 2011. Declines were most pronounced among men unselected by fertility – they showed a decrease of 59% (−1.6% per year) over the study period (
Figure 11B). Declining slopes remained unchanged after controlling for multiple covariates: age, abstinence time, method of semen collection, method of counting sperm, selection of population and study exclusion criteria, number of samples per man and completeness of data. Thus, these data provided a robust indication of a decline in male reproductive health in North America, Europe, Australia and New Zealand over the 4 decades.
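The two figures quoted above for unselected men (a 59% total decline and −1.6% per year between 1973 and 2011) can be related by simple arithmetic. The sketch below assumes, which the text does not state, that the per-year value is a linear slope expressed as a percentage of the starting level; the function names are illustrative. Under that assumption, 1.6% per year over 38 years gives about 61%, close to the reported total, whereas a constant compounding rate reaching a 59% decline would have to be higher.

```python
def linear_total_decline(pct_per_year, start_year, end_year):
    """Total decline (% of baseline) for a constant linear slope."""
    return pct_per_year * (end_year - start_year)


def compound_rate_for_total(total_decline_pct, start_year, end_year):
    """Constant yearly % loss (compounding) that yields the given total decline."""
    years = end_year - start_year
    remaining = 1 - total_decline_pct / 100
    return (1 - remaining ** (1 / years)) * 100


# Values taken from the text: -1.6% per year, 59% overall, 1973-2011.
linear_total = linear_total_decline(1.6, 1973, 2011)
compound_rate = compound_rate_for_total(59, 1973, 2011)
print(f"linear total decline: {linear_total:.1f}%")
print(f"equivalent compounding rate: {compound_rate:.2f}% per year")
```

The comparison only shows how the two reported numbers fit together; it is not a reconstruction of the meta-regression itself.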
The most recent systematic review and meta-regression analysis of semen samples collected globally in the 20th and 21st centuries confirmed a 50-60% decline in sperm counts among unselected men from all continents, including South and Central America, Asia and Africa [
3]. The 21st century is hallmarked by the acceleration of sperm count decline (
Figure 12).
Also, the level of testosterone in adolescent and young adult men, as monitored in the USA from 1999 to 2016, decreased by 1% per year [
149]. Sperm counts and other semen parameters have been plausibly associated with multiple environmental influences, including endocrine-disrupting chemicals [
150,
151], pesticides [
152], heat [
153], lifestyle and diet [
154,
155], stress [
156,
157], smoking [
158], and elevated body mass index [
159,
160]. Therefore, sperm count may sensitively reflect the impacts of the modern environment on male health throughout the life course [
161], while severe infertility is a marker of genome instability as such [
162].
In addition to the global decline of semen quality, an alarming relationship has been observed between decreased semen quality and infertility on the one hand, and increased morbidity and mortality on the other. It has been reported that men from infertile couples have approximately twofold more morbidities than their fertile counterparts [
163]. Infertile men are at a higher risk of having diabetes, cardiovascular diseases, auto-immune diseases, rheumatoid arthritis, and multiple sclerosis [
164,
165,
166]. Infertile men also have a 1.5-fold higher risk of cancer, and a 2-fold higher testicular cancer risk in particular [
167]. Moreover, infertile men with the most severe phenotype – azoospermia (a complete absence of spermatozoa in ejaculate) exhibit a 3-fold higher risk of any type of cancer, as compared to men without fertility problems [
164].
Also, it has been shown that decreased sperm quality has a significant correlation with increased mortality [
168,
169]. More than 20 years ago a Danish group led by Skakkebaek introduced “testicular dysgenesis syndrome (TDS)” – an increasingly common developmental disorder with environmental aspects. They suggested that declining semen quality, increasing testicular cancer, undescended testis and hypospadias share a common pathogenesis and are the features of the same TDS [
170]. Experimental and epidemiological studies showed that TDS results from the disruption of embryonal programming and gonadal development during foetal life by various endocrine disruptors that have polluted our modern environment and to which humans are massively exposed through food, water, cosmetics, construction materials, etc. [
150,
151]. A scheme of the endocrine disruption of male fertility based on somatic sex determination, adapted from [
171] with the female regulation added, is given in
Figure 13.
10. Conclusions
The human genome gene network is composed of two evolutionary parts. The first is the UC part, encompassing the most essential daily functions, which gave rise to the cell cycle, the DNA damage response, meiotic DNA recombinational repair, and gametes. The second is the MC part. The early MC genes (Strata 3-5) are partially ambivalent. Through activation of c-Myc, polyploidy and bivalency, they can relatively easily be united with the dense network of the UC core, providing a basic framework for gametogenesis by cancer pseudo-embryogenetic (parthenogenetic) PGCCs or amoeboid sporogenesis. The main cancer drivers enabling this, by mutations or epigenetically, are pre-programmed as the normal regulators of the reproductive process that evolved in that same period. Therefore, cancer had already appeared in the
Hydra. The evolution of UCs and early MCs was very long (~3 billion years) and relatively gradual. The 2R genome doublings driving the Cambrian explosion in vertebrates allowed a huge variety of newly emerged genes and animals, but the evolution of mammals started only ~250-200 Mya, and that of primates even later, and it proceeded at a different rate, accelerated by retrotransposon bursts and their DNA insertions into the genomes. This created the incredible complexity of the mammalian and human brain. Complex biological systems are intrinsically unstable and are thus able to adapt to the environment and change cell fates through exploration and learning [
14,
172,
173]. In this review, we gathered evidence on the features of genome instability and its use in the evolution of mammals and hominids, including segmental genome duplications, increasing retroviral domestication and ongoing retroviral activity, and genome fragility as a source and consequence of both adaptations and cancer. These features also created CTA genes that support male reproduction, counteracting environmental stress through high adaptability (using ERVs to construct alternative promoters for tissues other than the testis, and intrinsically disordered domains for post-translational switching of cell fate and tissue functions). The reason why male reproduction in particular needs protection becomes clear from our analysis of the STRING networks in the evolution of reproductive systems: the reproductive system was created by evolution as female by default, to ensure the continuation of life in the embryo. We showed how CTAs also interact with the somatic reproductive tools that evolved from Euteleostomi onwards: the somatic sex determination system, which is shaky for males, and the accompanying cellular communication represented by the immunity and vascularisation/invasion (“female pregnancy”) systems in cancer aggression. The instability of the human genome and the adaptive plasticity in the regulation of CTA genes, which aim to stabilise spermatogenesis against environmental stress, still appear to be too fragile an instrument in the face of the challenges of environmental pollution, increasing social stress, and particularly endocrine disruption. Both the Scylla of the preferentially female gametogenesis-linked carcinogenesis, formed at the beginning of UC-MC evolution, and the Charybdis of the embryonic male-to-female sex reversal, hidden in the sex determination system and established in vertebrate evolution, make the journey of human male germ differentiation, sailed by the CTAs, unsafe, increasing the risk of infertility and of gametogenesis-linked cancer, “female” by its origin.
The acquisition of the second active X-chromosome by malignant tumours aggravates the story. This story does not contradict but is likely complementary to the hypothesis of Lavia et al. [
174] who proposed a model based on Waddington's cell differentiation landscape, whereby LINE-1 expression in adult cells triggers chromatin remodelling and reactivates embryonic circuits, ultimately leading to cancer malignancy. The bridge between the two evolutionary branches of human cancer, early and late, “female” and “male”, joined by stress-induced events, which favour suppression of male fertility, while increasing the drive towards cancer development (as revealed in our analysis) is schematised in
Figure 14:
Without a doubt, many means of combinatorial, individualised cancer treatment exist and can be further developed, in particular by modulating immunity, which can prolong the life of patients with metastatic cancers. However, the most urgent need of mankind for survival is to stop environmental pollution by organic materials, slow atmospheric warming, and shift to a healthier lifestyle.
Author Contributions
Conceptualization, Je.E., R.K., N.M.V., and Ju.E.; methodology, N.M.V., P.Z. and Je.E.; software, N.M.V. and P.Z.; investigation, N.M.V., K.S., F.R., I.I., and P.Z.; data curation, D.P.; writing—original draft preparation, Je.E., M.L., N.M.V., Ju.E.; writing—review and editing, Je.E., Ju.E., N.M.V., R.K., D.P., and P.Z.; visualisation, K.S., N.M.V., and P.Z.; supervision, Je.E. All authors have read and agreed to the published version of the manuscript.