Repeat-Induced Point mutations (RIP) drives the formation of distinct sub-genomic compartments in Fusarium circinatum

: Repeat-Induced Point mutations (RIP) serves as a genome defence mechanism that impedes the deleterious consequences of repeated motifs such as transposable elements in fungi. Genomic regions with RIP are biased for adenosine and thymine transitions and the cumulative influence of RIP is thought to have a considerable impact on genome composition. We investigated the impact of RIP on localized genomic regions and whole-genome sequences for representatives of the pine pathogen, Fusarium circinatum . We set out to determine the intraspecific variation in acquired RIP and the role of RIP in the development of diverse F. circinatum sub-genomic compartments. The results of the study show that the AT-enriched sub-genomic compartment accounts for ca . 97% of the calculated RIP and was further prominent in both core and accessory genomic regions. However, more extensive RIP was observed in the accessory sub-compartment and more variable regions of the genome. Regions with RIP indicated increased intrinsic curvature of the DNA which may influence DNA-protein interactions and may promote constitutive heterochromatin formation. The results show that RIP is an important source of functional novelty and genome variation. RIP contributes to the evolution of the genetic landscape and differentiation of diverse sub-genomic compartments of this important fungal pathogen.


Introduction
The genome architecture of many fungi in the subphylum Pezizomycotina is characterized by local GC content biases that have important implications for genome evolution [1,2]. In these fungi, telomeric regions, dispensable chromosomes, and accessory sub-genomic compartments often also display greater variability in genetic composition compared to that of the remainder of the genome [3,4]. These more variable regions usually represent fast-evolving reservoirs of functional novelty, the 3 allow important conclusions to be drawn regarding the way in which RIP drives genomic variation and contributes to the evolution of the genetic landscape and differentiation of sub-genomic compartments of this and other fungal pathogens.

Genome sequences and gene annotations
The genome sequences for five isolates of F. circinatum were used in this study (Table 1). Of these, the sequence of FSP34 were available from a previous study [29] and is available from the National Centre of Biotechnological Information (NCBI). The genome sequences for isolates CMWF567, UG27, FSOR, and FFRA were sequenced in the current study, after their species identity was confirmed using phylogenetic analyses (see Figure S1 and associated references). High quality genomic DNA was extracted from seven-day old cultures grown on half-strength PDA as described previously [33]. Genome sequencing for isolates FSOR and FFRA was performed using paired-end (350 bp median insert size) and mate-pair (5 kilobase pairs [Kb] median insert size) libraries on the Illumina HiSeq XTen and Hiseq2000 platforms, respectively, at Macrogen (Seoul, Korea). Those of CMWF567 and UG27 were performed using PacBio long read sequencing (10 Kb and 20 Kb libraries), and Illumina HiSeq2000 250 bp paired-end sequencing.
Following quality filtering and trimming of raw read sequences using CLC Genomics Workbench v. 8.0.1 (CLCBio, Aarhus, Denmark), de novo genome assembly was performed using ABySS v. 1.3.7 [34]. Contigs were scaffolded using SSPACE v.2.0 [35] and gaps filled using GapFiller v. 1.11 [36] to yield chromosome-sized scaffolds. These were then assembled into pseudochromosomes (Table S1). This involved ordering and orienting the assembled scaffolds of the newly sequenced genomes by making use of information generated from LASTZ 1.02.00 [37] alignments implemented in Geneious R8 [38] using the FSP34 genome as reference. See Table 1 for DDBJ/EMBL/GenBank accession numbers for the genome sequence data for the five isolates.  [39] This study, [39] [ 29] This study, [40] This study, [41] DDBJ/EMBL/GenBank accession number The level of completeness was evaluated for the five F. circinatum assemblies using the software BUSCO (Benchmarking Universal Single-Copy Orthologs; v2) with the Sordariomyceta data set [42,43]. All genomes were annotated using CLC software. The annotations were predicted by WebAUGUSTUS v.2.5.5 [44] using gene models for F. graminearum (http://bioinf.unigreifswald.de/augustus) together with mRNA sequence data generated for F. circinatum FSP34 [29]. 4 To study the presence and distribution of genes associated with the functioning of RIP, DNA methylation and RIP directed heterochromatic silencing (based on reports from N. crassa) [10,[45][46][47][48][49] , BLASTp and tBLASTn searches were performed on the genomes investigated in this study. Fungal nucleic acid and protein sequences with similarity (Expect-value < 1x10 -5 ) to those described in N. crassa were identified using CLC software. Although the exact genomic mechanism underlying RIP is still under investigation, the query sequences considered in this study were the two 5-cytosine methyltransferases RID (RIP deficient) and DIM-2 (defective in methylation). In these analyses, we also included the six additional cofactors of the DIM-2 mediated DNA methylation, HP-1 (heterochromatic protein), DIM-5, -7, DNA Damage binding protein 1 (DDB-1) and Cul4 (Cullin-4 protein) [47][48][49]. Further, a set of 12 effector genes, previously reported in F. circinatum isolate FSP34 [50], were identified using nucleotide BLAST searches as described above for the four remaining genome assemblies. These genes were expressed in planta and significantly upregulated in the fungus during pathogenesis on pine [50]. Functional annotations, predicted functional domains, protein family (PFAM) membership and Gene Ontology (GO) were evaluated using InterProScan v. 5 [51].

Genome-wide RIP analysis
For downstream RIP analyses, the five genomes examined here were artificially segmented to generate six types of data sets. The first of these included only the coding content for the five genomes, where the coding sequences predicted by AUGUSTUS was included in separate data sets for each isolate. Similarly, the second data set type included only the sequences for the putative 12 effector genes identified for these fungi.
The fourth type of data set contained sequences for the accessory genomic compartment and for the core compartment. To generate these data sets, core and the accessory regions were predicted using NUCmer alignments, implemented in Spine (v0.3.1) [52]. Sequences that were shared among all five genomes with 95% nucleotide identity across 10 000 bp were considered as core genomic regions, while regions unique to each isolate were recorded as part of the accessory sub-genomic compartment. Core and accessory data sets were accordingly generated for each of the five F. circinatum genome assemblies.
To generate the fifth type of data set, genome sequences were separated based on chromosome identity. For this purpose, homologous chromosomes in the various genome assemblies were identified using LASTZ alignments as described before. Accordingly, the data sets contained the respective core chromosome (chromosome 1 to 11) and the dispensable chromosome (chromosome 12) sequences. Because the RIP pathway is induced by repeated sequences, the data for the dispensable chromosomes also included information regarding the distribution of TEs. The latter was identified in the dispensable chromosome assemblies for the five isolates with REPET [53,54]. Where relevant, REPET annotations determined previously for the FSP34 strain [10] were included in these analyses.
The sixth type of data set included the sequences for putative centromeric loci and telomeric regions. This was done as described previously [55] by using 100% similarity to the known "TTAGGG" telomeric repeat sequence described for F. circinatum and other fungi [55,56]. The putative positions of centromeres were determined by identifying homologous, GC-depleted and extensively RIP affected regions that were common to the five isolates and that occurred in regions homologous to those of the centromeres of the closely related fungi, F. fujikuroi and F. verticillioides [57]. To confirm the putative locations of F. circinatum centromeric regions, the genomic positions of genes flanking the identified 5 centromeric regions of F. fujikuroi and F. verticillioides [57] were compared to those occurring adjacent to the putative centromeric regions in F. circinatum by using tBLASTn searches implemented using CLC software (Expect-value < 1x10 -5 ).
The whole genome assembly for each isolate, as well as the various data sets containing particular types of sequence information (described above) were subjected to analysis with The RIPper v1.0 [58]. The widely used RIP indices [20,[58][59][60], were calculated for all sequences using 1000 bp sliding windows with 500 bp steps, except for the data sets containing only the coding sequences and the effector gene sequences that utilized 100 bp windows and 50 bp step sizes. For the RIP analyses performed in this study, stringent parameters were applied (i.e., RIP product index >1.1, RIP substrate index  0.75 and RIP composite index > 0) to limit false positives (i.e., regions erroneously suggested to contain RIP), as suggested previously for the Ascomycota [58]. Large RIP Affected Regions (LRARs; i.e., genomic regions spanning at least 4 000 bp or seven consecutive RIP-affected windows) were identified using the "Calculate LRAR" tool of The RIPper by applying default parameters.

Loss of synteny and collinearity
In order to evaluate whether the RIP pathway can affect synteny and collinearity among homologous genomic regions, ten LRARs from FSP34 were randomly chosen using Randomizer (www.randomizer.org). The nucleic acid sequences, together with their predicted annotations of the selected LRARs, were then used as query sequences in nucleotide BLAST searches and comparisons using CLC software (Expect-value < 1x10 -5 ) to identify corresponding regions in the genomes of the other four fungi. Five Kb regions up and downstream of each of the LRARs were included in these analyses. The results were visualized with Kablamo v. 1.0 [61] using the XML file output retrieved from NCBI alignment analyses.

DNA curvature
In order to explore the potential role of RIP on intrinsic DNA curvature, genomic regions with and without RIP were investigated. Using the information generated above, a set of ten 1000 bp-windows with RIP and ten such windows without RIP were randomly selected from the data generated for FSP34, by making use of Randomizer. In the same way, we also selected ten LRARs in the FSP34 genome. The three data sets were then subjected to DNA curvature analyses using Bend.it [62]. In this analysis, the default window size of 31 bp were used together with bendability parameters originally derived from nucleosome binding and DNaseI digestion data [62]. In addition, these analyses of DNA curvature were also performed on the data sets containing sequences for the GC-light and GC-heavy isochores of FSP34 using the information generated above.

Genome sequences and annotations
The genome assemblies of the five F. circinatum isolates had comparable genome statistics ( Table  1). The assembled draft genome for the newly sequenced isolates ranged from 45.58 million bases (Mb) (FSOR) to 46.83 Mb (FFRA), with an average GC content ranging of 46.8-47.1%, which was comparable to that of the FSP34 strain of F. circinatum ( Table 1). The respective assemblies consisted of 24 to 57 contigs. Based on the LASTZ alignments, the four new assemblies contained 12 chromosome-sized scaffolds corresponding to the expected 11 core and 12 th dispensable chromosomes (Table S1) [63].
The BUSCO results showed that the assemblies were between 96.1% (UG27) to 99.0% (FFRA) complete (Table 1 and Table S2), suggesting that they included most of the expected gene content in these fungi [53,54]. Based on the WebAUGUSTUS results, these strains encode between 14 868 (CMWF567) and 16 509 (FSOR) genes (Table 1). Of these, candidate homologous sequences for each of the query genes associated with the RIP process and DNA methylation (described for N. crassa; Table  S3) were identified for the five F. circinatum assemblies and were comparable to RIP associated genes identified previously for FFSC species and that of FSP34 [10].
Additionally, putative homologs for the 12 effector genes in FSP34, which were previously shown to be upregulated during pathogenesis on pine [50], were identified in the four new genomes examined (Table S4). Gene sequences sharing high nucleotide similarity in comparable genome locations were identified for the 12 effector genes in each of the isolates (Table S4). Notably, the effector genes, FCIRG_06237 and FCIRG_08174 identified in FSP34 [50] likely originate from a tandem gene duplication event, as they shared similar protein functional domains and were located adjacent to one another (FCIRG_08174 was absent from UG27 and CMWF567). Overall, the InterProScan results showed that most of these genes had similar functional annotations and contained the requisite functional domains as those identified for the FSP34 data set (Table S5). Some differences were observed in that genes contained domains that differed compared to those described in FSP34, which included four genes in FFRA, two in FSOR, two in UG27 and two genes in CMWF567 (see Table S5 for detail).

Genome-wide RIP analysis
The overall summary of RIP statistics was evaluated for the entire set of genome assemblies for F. circinatum (Table 2). These analyses showed that between 5.96% (FSOR) and 7.08% (UG27) of the respective genomes were constituted by RIP mutations. Based on the criteria described previously [64], all F. circinatum isolates were classified as 'Moderately RIP Affected', with more than 5% of their genomes constituting RIP mutations.
For the data sets containing coding sequences, our analysis showed that RIP mutations constituted an average of 2.08% of this sub-section of the genome ( Table 2, Table S6). However, more than a third (average 34.35%) of all F. circinatum genes contained at least one window indicating RIP. In such cases, RIP was often recorded in single windows (50 bp), with rare instances of consecutive RIP-affected windows (Table S6). Furthermore, the GC content of the data sets with coding sequences was higher (average of 51.46%) compared to that of the remainder of the genome (about 47%). Analysis of the data sets containing sequences for 12 putative effector genes of FSP34, showed that each had at least one window indicating RIP ( Figure S2), although they were all located in close proximity (within 30 Kb) to regions with RIP or were in regions neighbouring LRARs ( Figure S3). For the remaining effector gene data sets in the other fungi, our analyses showed that some genes did not contain evidence of RIP affected (i.e., one in FFRA, and one in CMWF567) (see Figure S2 for details).
As expected for RIP-capable fungi, the frequency and distribution of GC content across the F. circinatum genomes was bimodal, with a major peak around 50% GC and a minor peak around 20% GC ( Figure S4). In order to investigate the occurrence and extent of RIP in the AT-enriched genomic compartments, the F. circinatum genomes were separated into data sets representing GC-light isochores (0-20% and 20-40% GC content) and GC-heavy isochores (40-50% and 50-100%). Regions defined as "isochore" are genomic regions that are alike in their DNA base compositions bias (either GC-or ATrich), which are maintained across the length of the isochore [8]. On average, about 3.9% and 3% of the five genomes were characterized as part of the GC-light isochore, corresponding to genomic regions with 0-20% and 20-40% GC content, respectively ( Figure S5). By contrast, 83.2% and 10.2% of the respective F. circinatum genomes constituted the GC-heavy isochore, corresponding to genomic regions 7 with 40-50% and 50-100% GC content, respectively. For the GC-heavy isochore, di-and trinucleotide frequencies remained similar ( Figure S5). In comparison, the di-and trinucleotide sequences in GClight isochore were enriched for AT-rich combinations, comparable to that of other FFSC fungi [10]. Further analyses of the GC-light isochores showed that the majority of these sequences had RIP mutations (Table 2; Figure 1; Table S7). Between 97.66 and 99.16% of the 0-20% GC content data set were constituted by RIP mutations, as compared to 81.22% to 86.66 of the 20-40% GC content data set. Together, the total proportion of RIP in GC-light isochore represented the greater majority (ca. 97%) of all recorded RIP mutations. By contrast, for the data sets containing sequences representing the GCheavy isochore, RIP mutations only constituted 0.31%, and 3.21% of the overall observed RIP recorded for F. circinatum. Further, the GC-light isochores contained LRARs, while LRARs were completely absent from the GC-heavy isochore sequences.
To evaluate the occurrence and extent of RIP in the core and accessory genomic compartments of F. circinatum, alignment-based analyses were performed to distinguish genomic regions present in all five isolates (core compartment) and those genomic regions that are unique to each isolate (i.e., regions that are part of the accessory compartment) (Figure 1). The size genome unique to an isolate ranged from 3.0 Mb in FSP34 to 3.9 Mb in CMWF567, while the average size of the core genomic compartment was ca. 39.91 Mb (Table S8). Both the core genomic compartment and the parts of the genomes unique to an isolate contained RIP mutations ( Table 2; Table S8; Figure 1). RIP mutations constituted an average of 2.17% of the overall core compartment, which ranged from 2.09% in FSP34 though to 2.27% in UG27 Genes 2020, 11, x; doi: FOR PEER REVIEW www.mdpi.com/journal/genes and CMWF567. However, in the regions unique to a specific isolate, the proportion constituting RIP mutations ranged from 31.23% in FFRA to 48.48% in UG27.
Homologous sets of F. circinatum chromosomes shared high degrees of sequence similarity (Table S1) and were also very similar in their RIP mutation content (Figure 1). RIP constituted an average of 6.45% of the core chromosomes and varied from 2.48% for chromosome 1 of FFRA to 11.99% for chromosome 10 of the CMWF567 strain. The dispensable chromosome of the isolates examined (i.e., chromosome 12) were comparatively much richer in RIP mutations (Figure 1), where these mutations constituted between 11.85% (FSOR) and 25.96% (UG27) of the respective assemblies (Table 3). However, as reported previously [10], the RIP affected proportion of the 12 th chromosome of FSP34 was much lower (4.85%). The total proportion of the dispensable chromosomes that constituted RIP also showed a strong positive correlation to both chromosome size and TE density (correlation coefficient [r]=0.9 and 0.9 respectively; Figure 1; Table S9). The ends of almost all chromosome scaffolds of the five F. circinatum genome assemblies investigated were characterized by reduced GC content (ca. 28.21%) and high frequencies of the telomeric repeat sequence (Table 4 and Table S10, Figure 2 and Figure S6). The only exceptions were the left end of Chromosome 2 of FSOR and FFRA ( Figure S6). The average number of telomeric repeats (per isolate) at the chromosome ends varied from 62 (FFRA) to 104 (UG27). These putative telomeric regions were also rich in RIP mutations (Table 4; Table S10; Figure S6), where the average size of such RIP-affected regions was ca. 33.98 Kb. Surprisingly, however, telomeric repeats were also enriched for in many of the LRARs in the F. circinatum chromosomes investigated (Figure 2; Figure  S6).
The putative locations of all 11 centromeric regions of the F. circinatum core chromosomes sequences were identified (Table S11). These genomic positions corresponded to those previously characterized in the closely related species, F. fujikuroi and F. verticillioides [57]. The data showed a high level of conservation in the sequences surrounding the centromeric regions in F. fujikuroi, F. verticillioides and F. circinatum (Table S12). However, as expected [65], the reciprocal chromosomal translocation between the distal ends of chromosome 8 and 11, reported previously [65], were the only exception recorded that deviated from the observed synteny and collinearity between these species. In F. circinatum, the putative centromeric regions were on average ca. 36 Kb in size, ranging   (Table S11, Figure  S6). In all cases, these regions were extensively affected by RIP and had reduced GC content at an average of 17.1% (Table S11). The results of this study showed that the F. circinatum isolates had many instances of consecutive RIP-positive windows, and that these extensively RIP affected regions were similar in size and mostly had comparable genomic locations for each homologous set of chromosomes (Table S13). For example, the LRARs of homologous sets of chromosomes 5 had similar distribution patterns, occurred in comparable genomic locations and generally had similar sizes. Regional biases amongst the position of LRARs in specific chromosomes were also observed (Table S13). For example, in chromosome 5 of FSP34, seven predicted LRARs were recorded and the same chromosome in UG27 had 14 LRARs. Overall, between 156 (FSOR and UG27) to 165 (FFRA) LRARs were identified among the five genomes examined (Table 2), where the average size varied between ca. 15 177 bp (FSOR) and 16 056 bp (CMWF567). These LRARs further constituted large proportions of the individual genome assemblies ( Table 2). For all isolates, the LRARs displayed reduced GC content compared to that of the remainder of the genome sequences ( Table 2). Numerous (6-11) LRARs were also recorded in each of the assemblies for the 12 th chromosome ( Table 3), although that of FSP34 contained only one, as reported previously [10].

Loss of synteny and collinearity
Homologous regions associated with ten randomly chosen LRARs were investigated in the five genomes to evaluate whether RIP affected synteny and collinearity in these regions. Except for one of the LRARs that were absent from the CMWF567 genome, homologous locations and regions for all 10 LRARs were identified in the five genomes (Table S14). Overall, the homologous loci were characterized by high levels of nucleotide similarity (91%-100%), but such similarity could stretch across as little as 10% of the compared regions or could span the entire sequence. In other words, the overall query coverage ranged from 10% (e.g., the LRAR on chromosome 6 of UG27) to 100% (e.g., the LRAR located on chromosome 11 of UG27). Two of the ten LRARs (i.e., "LRAR 2" on chromosome 2 and LRAR 3 on chromosome 4; Figure S7) had overall high levels of nucleotide similarity and had no changes in synteny and collinearity in the LRAR and neighbouring genomic regions (e.g. Figure 3). By contrast, the remaining eight LRARs all showed substantial changes in synteny in the regions with RIP, as well as the regions neighbouring them (e.g. Figure 3). The genomic regions neighbouring the LRARs that showed changes in synteny included LRARs numbered 1, and 5-8, that were located respectively on, chromosomes 1, 7, 6, 8 and 10 of the FSP34 assembly (Table S14). The neighbouring regions of LRARs numbered 2 and 3 on chromosomes 2 and 4, respectively, showed no changes in synteny and gene order. Figure 3. Illustration of changes in RIP, GC content, gene content and nucleotide similarity in two Large RIP affected genomic regions (LRARs) and neighbouring genomic regions of FSP34 relative to homologous regions in other isolates of F. circinatum. An LRAR on the 2nd chromosome is indicated on the left, and an LRAR on the 7th chromosome is shown on the right. The top two panels illustrate changes in RIP composite index values (values above 0 indicate RIP), followed by changes in GC content (%) across the particular genomic region. Below them are shown the synteny plots, based on NCBI pairwise alignments (visualized using Kablamo software), for homologous regions for each isolate. The results for all 10 LRAR are illustrated in Figure S and a full list of alignment statistics are listed in Table S13.

DNA curvature
In order to evaluate the influence of RIP on sequence-dependent DNA curvature, an analysis of the intrinsic curvature distribution was carried out on F. circinatum (FSP34) sequences with and without RIP (Table S15, Figure 4). Sequences without RIP had an average intrinsic curvature value of 3.5 degrees per helical turn and varied from 2.7 to 3.95 degrees per helical turn. The regions with RIP had higher average intrinsic curvature values, with an average of 4.55 degrees per helical turn that varied from 3.47 to 4.62 degrees per helical turn. Overall, these sequences showed a non-symmetrical frequency distribution, which is consistent with AT-enriched regions in other organisms [19,66]. Remarkably, a clear shift towards higher intrinsic curvature values were recorded in F. circinatum sequences rich in RIP (Figure 4). The intrinsic curvature values recorded for the ten LRARs of FSP34 were similar to those recorded for regions affected by RIP (Table S16). Detailed analyses of the change in intrinsic curvature performed on the LRAR located at 1 035 000-1 042 500 bp on chromosome 1 of FSP34 and the regions flanking it, also revealed that high RIP and AT content were associated with increased DNA curvature ( Figure S8). By contrast, the regions outside of the LRAR (with higher GC content) had lower curvature values. The latter was also consistent with results for the DNA curvature analysis of the GC-light isochore components and GC-heavy isochore components of FSP34 (Figure 4). The GClight isochore had an average intrinsic curvature value of 6.26 degrees per helical turn that varied from 0.006 to 24.31 degrees per helical turn. The GC-heavy isochore component had an average intrinsic curvature value of 4.06 degrees per helical turn and varied from 0.002 to 19.26 degrees per helical turn. The GC-light isochore components had an overall greater frequency of sequences with higher curvature values per helical turn, compared to those of GC-heavy isochore components.

Discussion
Intraspecific, genome-wide RIP analyses showed that the five F. circinatum genomes investigated harboured extensive evidence of RIP. More specifically, we obtained RIP index values indicative of RIP [10,20,59,60] for the five genomes, and in each genome GC content was also bimodally distributed [2,10]. The five genome further encoded for the gene thought to underly the RIP pathway, including the two 5-cytoseine methyltransferases, DIM-2 (and associated co-factors) and RID [10,45,48,67]. RIP mutations were also more prevalent in the GC-light isochores and accessory sub-genomic compartment of F. circinatum. Comparable to other fungi with RIP, the dispensable chromosome, centromeres and telomeres of F. circinatum were rich in RIP [9,10,64]. Based on RIP mutation content of the five isolates examined, F. circinatum belongs to the 'Moderately RIP Affected' class of RIPcapable Ascomycetes [64], which also includes other Sordariomycetes such as the closely related Fusarium pininemorale and Fusarium temperatum [10,68], and more distantly related fungi such as Verticillium dahlia and Trichoderma reesei [64].
In this study, we show that RIP can accumulate in both the core and dispensable chromosomes of genetically diverse individuals of F. circinatum. Previous studies indicated that RIP extensively affected the dispensable chromosomes of other FFSC fungi, but not that of the F. circinatum isolate examined at the time (i.e., FSP34) [10]. The current study showed that this reduced level of RIP might be linked to the dearth of TEs on the 12 th chromosome of FSP34, and that RIP is in fact common in this dispensable chromosome of F. circinatum. Indeed, the combined actions of TEs and the RIP pathway probably contributed to the isolate-specific differentiation and reduced nucleotide similarity observed for these chromosomes from different isolates of this fungus, although other evolutionary processes such as horizontal gene transfer cannot be excluded. RIP is common in the dispensable chromosomes of fungi, because their repetitive nature provides prime substrates that induce the RIP pathway [4]. For example, the highly repetitive dispensable chromosomes of the Dothideomycetes species, Zymoseptoria tritici, is also rich in RIP [9]. As in F. circinatum, the dispensable chromosomes of Z. tritici are also characterised by many instances of large-scale, isolatespecific sequence differentiation due to the combined actions of TE integration and RIP [9].
The findings presented here demonstrate that RIP affects the core and accessory sub-genomic compartments of F. circinatum to a strikingly different extent. The RIP pathway is widely credited for enhancing regional and chromosomal variation [3,4,10,13,69], but no previous study has set out to quantitively investigate the extent of RIP beyond gene-level analyses in the accessory sub-genomic compartment of a fungus. Our findings show that large proportions of this sub-genomic compartment of F. circinatum were constituted by RIP mutations. The RIP pathway likely represents an important driver of functional novelty in the accessory sub-genomic compartments in isolates of F. circinatum. This is because the accessory genomic compartments, dispensable chromosomes, and more variable regions of chromosomes, typically harbour genes contributing to various aspects of fungal lifestyle, including host range and pathogenicity [3,14,67].
RIP mutations resulting from RIP leakage into regions outside TEs/repeats (i.e., RIP mutations acquired beyond the bounds of the intended targets), most likely contributed to the changes in synteny and collinearity observed in regions neighbouring LRARs in F. circinatum. These changes could also reflect the actions of TE integration, which in turn induced RIP. Also, many of the effector genes previously found to be upregulated during pathogenesis on pine [50] were located on genomic regions in close proximity to LRARs and/or within regions with extensive RIP. These putative effector genes varied in their predicted protein functional domain, and this variation was often isolatespecific. This situation is thus reminiscent from what was observed in quantitative RIP analyses in N. crassa, which showed that RIP can leak up to 40 Kb from its intended target [7]. RIP leakage is also known from other RIP capable fungi and is thought to contribute to the diversification of genes underlying host-pathogen interactions [8,69]. The results presented here are also consistent with studies showing that RIP is acquired in an isolate-specific manner [8,69,70]. A recent study of Fusarium xylarioides (a close relative of F. circinatum) reported RIP-driven diversification were more prevalent amongst putative effector genes than for any other type of gene classification [71]. Together these data suggest that RIP is an important source of variation in gene regions underlying important biological traits, pathogen development and host-pathogen interactions in F. circinatum and other fungi.
Apart from driving variation in specific gene regions, RIP may also be an important source of genome-wide variation in populations of F. circinatum. Large chromosomal regions rich in RIP (i.e., LRARs) occurred across chromosomes where they had patchy genomic distributions and were frequently characterized by reduced nucleotide similarity among isolates, as well as changes in gene order and synteny. The latter often also extended into regions neighbouring such RIP-dense areas of the F. circinatum genome. However, many of these long stretches of RIP affected regions occurred in homologous positions, which suggests that these regions, and LRARs specifically, were maintained over evolutionary time in different F. circinatum strains. Similar RIP distribution patterns have been reported in Dothideomycetes such as L. maculans [8] and Z. tritici [17,72] and in Sordariomycetes such as Neurospora tetrasperma [73,74] and V. dahlia [4]. In L. maculans, extensive RIP was shown to drive isolate-specific diversification [8,70]. Similarly, species-specific RIP directed sequence diversification was recorded in the FFSC and in different isolates of N. tetrasperma [74,75]. In the same way, the cumulative influence of RIP thus may drive the independent divergence of F. circinatum sequences, thereby contributing to genetic variation of different strains or populations of F. circinatum.
Most of the RIP in the F. circinatum genome is accommodated in its GC-light isochore. The current study provides the first quantitative examination of RIP in this isochore of any fungus, and the findings support the view that that the RIP drives the development of the GC-light isochore in RIP-capable fungi [2]. These results presented are in full agreement with reports where surrogate methods were employed (i.e., those analysing changes in genome-wide GC content distribution) for investigating the cumulative influence of RIP [2]. Taken together, our findings suggest that RIP drives localized GC depletion but that it also has a quantifiable influence on the overall genome composition of a fungus in that it underpins the formation and potentially the maintenance of distinct subgenomic environments within F. circinatum.
RIP is particularly prominent in genomic regions that correspond to the telomeres and centromeres of the F. circinatum chromosomes. Detailed investigation of LRARs showed that the location of these regions coincided with a consistent increase in frequency of telomeric repeat sequences in the predicted telomeres and centromeres of the fungus. These analyses also revealed the presence of interstitial telomeric motifs in the F. circinatum chromosomes that were also enriched for RIP. Interstitial telomeric motifs are regions enriched for telomeric repeats occurring in locations other than the telomeres and centromeres of chromosomes and are typically found in genomic regions that have undergone double stranded DNA breaks [56]. Such breaks can occur due to aberrant recombination between TEs [76,77], a process in which telomeric repeat sequences are thought to be important signatures for directing DNA repair mechanisms [56]. This is consistent with the fact that RIP was prevalent in regions of the F. circinatum genome containing high densities of TEs [10]. In other words, regions with high TE density may be more vulnerable to double stranded DNA breaks, causing increased incorporation of telomeric repeat sequences by the telomerase enzyme, whilst the TEs induce RIP.
The results of this study suggest that the RIP pathway may have substantial influence on the genomic composition of F. circinatum and may bring about a range of functionally important genetic changes, thereby influencing chromosomal architecture, epigenetic regulation and higher order chromatin organization. Other fungi that have GC-light isochores enriched for RIP show unique 5cytosine methylation patterns [10,57,78]. These distribution patterns, which correspond with the RIP driven AT-enriched genomic regions, have been suggested to reflect the "mosaic pattern" of DNA methylation, which differ from that observed in most plant and animal lineages [78]. Additionally, RIP-associated methylation of histones and cytosines are important processes for directing heterochromatin formation [20,46], which guides heterochromatic silencing and suppressed recombination of RIP targeted regions. In this way, the RIP pathway may have key roles in the evolution of genome architecture and the epigenetic landscape, to ultimately contribute to higher order chromatin structure of F. circinatum chromosomes.
Finally, our study showed that the RIP pathway may influence the physical properties of the targeted DNA. The F. circinatum sequences with RIP displayed an overall shift to higher curvature values compared to those of regions lacking these mutations, which was also true for its extensively RIP affected GC-light isochore. Such increased curvature was also observed across the length of LRARs. Changes in DNA curvature often correlates with GC depleted regions in some organisms [66,[79][80][81]. Sequence-dependent DNA curvature changes, particularly in regions biased towards AT nucleotides, are thought to increase the flexibility of the DNA, thereby making it more amendable to certain DNA-protein interactions [79]. For example, the increased curvature of the AT-rich centromeric regions of Saccharomyces cerevisiae is essential for facilitating centromere complex formation and assembly in this fungus [79]. Additionally, DNA curvature and AT-rich DNAs play key roles in chromatin organisation and higher order chromatin structure, because the intrinsic curvature and flexibility of DNA determine the ease that DNA would wrap around the nucleosome octamers and the cellular energy needed for the process [22]. Although other DNA-protein interactions can also alter the spatial properties of genomic regions [22,24], our findings suggest that sequence-dependent curvature and sequence composition may contribute to chromatin structure and three-dimensional organisation of the F. circinatum genome.

Conclusions
Intraspecific evaluation of the genome-wide occurrence of RIP has provided valuable insights on the manner that RIP influences the genetic composition and genome architecture of F. circinatum. RIP mutations in F. circinatum were likely caused by induction of the RIP pathway due to TE integration and intragenomic gene duplication, as well as the leakage of the pathway's effects to regions outside of repeated areas. RIP further leads to the formation and possibly maintenance of the GC-light isochore of F. circinatum, and its cumulative effects on nucleotide base composition likely affect the physical properties of the targeted regions. Despite being a prominent feature of all the F. circinatum genomes examined, the higher occurrence and extent of RIP in the various accessory genomic compartments, together with the concomitant changes in synteny and collinearity, highlights the role of RIP in driving intraspecific genome variation in this fungus. The fact that the impact of this pathway also occurred in genes underlying host-pathogen interactions further suggest RIP may be a key driver of pathogen development of this economically important fungus.
Supplementary Materials: The following are available online at www.mdpi.com/xxx/, Figure S1: Phylogenetic relationships of F. circinatum strain investigated in this study; Figure S2: RIP analyses of effector gene data set; Figure S3: Genomic location of effector genes investigated in this study in relation to the genetic features of the chromosome assemblies of F. circinatum (FSP34); Figure S4: Frequency distribution of the GC content of the F. circinatum strains investigated in this study; Figure S5: Changes in nucleotide frequencies per GC content range; FigureS6: Graphical representation of the genetic features of the core 11 Fusarium circinatum chromosome assemblies for each isolate investigated in this study; Figure S7: Illustration of changes in RIP, GC content, gene content and nucleotide similarity in FSP34 Large RIP affected genomic regions (LRARs) and neighbouring genomic regions in comparison to that of homologous regions in other isolates of F. circinatum; Figure S8: Summary of physical features of the LRAR at locus 1 035 000-1 042 500 bp on chromosome 1 of FSP34, relative to its RIP mutation and GC content. Table S1: Summary of alignment statistics and the sizes of F. circinatum chromosome assemblies investigated in this study; Table S2: Summary of the BUSCO analyses of F. circinatum isolates investigated in this study; Table S3: Table S3: The genomic location of the F. circinatum RIP and RIP associated DNA methylation associated genes; Table S4: Name and genomic locations of F. circinatum sequences with similarity to that of the FSP34 query set of effector genes; Table S5: InterProscan results of the genes sharing nucleotide similarity to that of the effector gene data set of FSP34; Table S6: RIP statistics of the coding genomic compartment of the F. circinatum isolates investigated; Table S7: RIP statistics of the GC-light and GC heavy isochore components of the sequenced isolates of F. circinatum isolates investigated in this study; Table S8: RIP statistics of the core and accessory genomic compartments of the F. circinatum strain investigated in this study; Table S9: Correlation coefficients calculated for percentage RIP, count of Transposable elements (TEs) and size (bp) for the dispensable chromosome assemblies of F. circinatum strains investigated in this study; Table S10: List of genomic regions corresponding to the telomeric regions in F. circinatum isolates investigated in this study; Table S11: Summary of the RIP statistics for the putative genomic locations of the centromeric regions of F. circinatum isolates investigated in this study; Table S12: Nearest genes to F. circinatum centromeres and synteny with Fusarium fujikuroi and Fusarium verticillioides; Table S13: List of Large RIP Affected Regions (LRARs) calculated for each genome assembly of F. circinatum strains investigated in this study; Table S14: List of NCBI nucleotide alignment statistics using ten FSP34 Large RIP affected Regions (LRARs) as query sequence; Table  S15: List of RIP statistics and predicted sequence-dependent curvature of genomic regions of FSP34 that had RIP or were not affected by RIP; Table S16: RIP statistics and predicted sequence-dependent curvature of Large RIP Affected Regions (LRARs) investigated in this study.