Whole-Exome Analysis Reveals X-linked Recessive Mutations underlying Premature Ovarian Failure using Galaxy Platform

An in-silico whole-exome sequencing (WES) approach on the Galaxy platform was adopted in the current study to predict the genetic basis of Premature Ovarian Failure (POF) in a Saudi Arabian family of seven, in which three affected patients were found to carry X-linked recessive mutations. The analysis discovered 518,054 variants using the FreeBayes variant caller, and SnpEff predicted 1,461,864 effects of these variants at variable sites in the genome. Candidate causal mutations were filtered and annotated with the ClinVar database using the GEMINI tool, which retained 369 pathogenic mutations across 130 genes. Of these, 268 variants on 69 genes are shared by all three affected individuals, 61 variants on 23 genes are shared by any two of the affected individuals, and 40 variants on 38 genes are present in only one of the affected samples. Two mutations in POF1B, a gene already associated with POF, were also observed in two affected individuals, IV-1-C and IV-6: (i) g.84563135T>A; p.M349L and (ii) g.84563194C>T; p.R329Q. This gene consists of 17 exons that span a region of >100 kb. It putatively regulates the actin cytoskeleton, owing to its homology with the myosin tail, and maintains the number of oocytes during fetal ovary development. In a nutshell, this all-in-one Galaxy pipeline pinpoints not only the known pathogenic gene mutations for this disorder but also a few novel genetic variants, whose gene-disease association may be validated by further experimental studies.

comprehensive genetic studies on patients with POF disorder. Early cessation of ovarian function before age 40 in women, referred to as POF, is accompanied by a rise in serum follicle-stimulating hormone (FSH) levels to greater than 40 IU/L and amenorrhea of 4-6 months (2). The occurrence of POF is about 1 in 100 in women by age 40, while it is less frequent, 1 in 1,000, in women below age 30 (3). In idiopathic POF cases, the likelihood of genetic involvement is high, at 50 to 90%. In 10-30% of these cases a first-degree relative is also affected, and a woman whose mother is affected has a six-fold higher chance of developing POF (4). The genetic architecture of POF remains to be elucidated; however, a few clinical exome-based diagnostic analyses have reported key mutations in the POF1B gene and their association with POF. This gene spans 100 kb of genomic DNA, consists of 17 exons, and lies in a critical region of the X chromosome that functions in normal early ovarian growth (5). Expression analysis of this gene in mouse suggested a role in early ovary growth, and the gene also escapes X-chromosome inactivation, which further suggests a contribution to ovary development (6).
Here, we implement a WES pipeline on the Galaxy platform to unveil the causal variants responsible for POF. For this purpose, we uploaded paired-end exome sequencing reads of seven family members, retrieved from the ENA database, into a Galaxy history. The data came from a consanguineous Saudi Arabian family in which clinical investigation had revealed three sisters, IV-1-C, IV-6, and SAPOF, with POF. Candidate gene mutations were then annotated with ClinVar entries, disease names and their phenotypes. In-silico analysis on the Galaxy platform holds a valuable place in clinical studies for diagnosing the genetic causes of heterogeneous diseases.

WES family data retrieval and quality assessment
Whole-exome paired-end FASTQ datasets of a Saudi Arabian family with the POF phenotype were retrieved from the European Nucleotide Archive (ENA) under projectID = PRJNA260607 (7). These WES datasets, which cover seven members of a family whose parents are immediate cousins, were uploaded into a Galaxy history. The three patients, aged 19, 24 and 35 years, suffer from primary amenorrhea, hypothyroidism and hypergonadotropic hypogonadism, while two daughters remain unaffected by the disorder (4). The quality of the sequencing reads was assessed using FastQC (Galaxy v. 0.72+galaxy1) (8). A customized report of all the FastQC results was generated using MultiQC (Galaxy v. 1.7) (9).
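As a hedged illustration of the retrieval step, the Python sketch below builds a query against the ENA Portal API file-report endpoint for the project accession given above and parses a tab-separated response into per-run FASTQ links. It is not the procedure used in this study, where reads were uploaded through the Galaxy interface. The helper names and the choice of the fields run_accession and fastq_ftp are assumptions made for illustration, and no network request is issued.

```python
import csv
import io
from urllib.parse import urlencode

# Assumed endpoint of the ENA Portal API file report.
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def build_filereport_url(project_accession):
    """Return a file-report URL listing the FASTQ files of every run in a project."""
    params = {
        "accession": project_accession,
        "result": "read_run",
        "fields": "run_accession,fastq_ftp",
        "format": "tsv",
    }
    return ENA_FILEREPORT + "?" + urlencode(params)


def parse_filereport(tsv_text):
    """Map each run accession to its FASTQ links (two links for a paired-end run)."""
    runs = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        # Links of one run are separated by semicolons in the fastq_ftp column.
        runs[row["run_accession"]] = [x for x in row["fastq_ftp"].split(";") if x]
    return runs


url = build_filereport_url("PRJNA260607")
```

The resulting URL can be fetched with any HTTP client, and the returned text passed to parse_filereport to obtain the links that would then be imported into a Galaxy history.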

Predicting the putative pathogenic gene variants
To narrow down our search for deleterious variants that explain the patients' POF phenotype, we first loaded the annotated variant information into the GEMINI database framework using the GEMINI load tool (Galaxy v. 0.20.1) (14). This GEMINI-specific database, along with the pedigree file, was passed to the GEMINI inheritance pattern tool (Galaxy v. 0.20.1), which applied the X-linked recessive mode of inheritance as a constraint on the identified variants. This tool integrated additional annotations from the ClinVar database and returned a handful of possible pathogenic mutations.
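The logic of an X-linked recessive constraint can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions and not GEMINI's implementation. It assumes genotypes are coded as alternate-allele counts, that affected females must be homozygous for the alternate allele, that unaffected females may be carriers, and that unaffected males must be homozygous reference. The Sample class, the function name and the toy pedigree are hypothetical and do not describe the family studied here.

```python
from dataclasses import dataclass

HOM_REF, HET, HOM_ALT = 0, 1, 2  # genotype codes: number of alternate alleles


@dataclass(frozen=True)
class Sample:
    name: str
    sex: str        # "male" or "female"
    affected: bool


def fits_x_linked_recessive(chrom, genotypes, pedigree):
    """Return True if the genotypes of one variant fit X-linked recessive inheritance."""
    if chrom not in ("X", "chrX"):
        return False
    for sample in pedigree:
        gt = genotypes.get(sample.name)
        if gt is None:
            return False  # a missing call cannot support the pattern
        if sample.sex == "female":
            # Affected females carry two alternate alleles; unaffected may be carriers.
            ok = (gt == HOM_ALT) if sample.affected else (gt in (HOM_REF, HET))
        else:
            # Males have one X; diploid callers may report them as HET or HOM_ALT.
            ok = (gt != HOM_REF) if sample.affected else (gt == HOM_REF)
        if not ok:
            return False
    return True


# Hypothetical toy pedigree: carrier mother, affected son, unaffected daughter.
toy_pedigree = [
    Sample("mother", "female", False),
    Sample("son", "male", True),
    Sample("daughter", "female", False),
]
toy_call = {"mother": HET, "son": HOM_ALT, "daughter": HOM_REF}
```

Applying such a predicate to every annotated variant leaves only X-chromosomal sites whose genotypes separate affected from unaffected family members, which is the kind of subset that is then annotated with ClinVar.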

Alignment and variant annotation of WES POF datasets
The quality statistics of the paired-end POF family WES datasets were computed to mitigate any sequencing errors prior to mapping. The statistics indicated good overall quality, with an average GC content of 50% and an average sequence length of 100 bp (Figure 1). A significant fraction of the variants fell in intronic and exonic regions, as shown in Figure 2.
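The two summary statistics reported above, mean read length and GC content, can be reproduced on any FASTQ file with a short Python sketch. It is a minimal illustration that assumes well-formed four-line FASTQ records; it is not a substitute for FastQC, and the function name and the two example reads are hypothetical.

```python
import io


def fastq_summary(handle):
    """Return (mean read length, GC percentage) for four-line FASTQ records."""
    n_reads = total_bases = gc_bases = 0
    while True:
        header = handle.readline()
        if not header.strip():
            break  # end of file or trailing blank line
        seq = handle.readline().strip().upper()
        handle.readline()  # '+' separator line
        handle.readline()  # quality string, not needed for these statistics
        n_reads += 1
        total_bases += len(seq)
        gc_bases += seq.count("G") + seq.count("C")
    if total_bases == 0:
        raise ValueError("no reads found")
    return total_bases / n_reads, 100.0 * gc_bases / total_bases


# Two made-up reads used only to exercise the function.
example = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nGGCCAT\n+\nIIIIII\n")
mean_length, gc_percent = fastq_summary(example)
```

For compressed files, the same function can be given a handle opened with gzip.open(path, "rt") from the standard library.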

Detection of pathogenic and novel mutations associated with POF
To determine the mutational effect of the variants on the patients, we screened for genes harboring variants that fit an X-linked recessive inheritance pattern rather than an autosomal recessive one. This candidate

Discussion
In the current computational investigation, we analyzed, on the Galaxy platform, WES datasets of a Saudi Arabian family in which three sisters, IV-1-C, IV-6, and SAPOF, were clinically diagnosed with POF. They suffered from idiopathic hypergonadotropic primary amenorrhea with hypothyroidism and atrophic ovaries, and had a normal 46,XX karyotype (4). A prior SNP analysis and functional study of this family (4) identified an autosomal recessive mutation in the MCM8 gene, c.446C>G; p.P149R, that manifests as POF, endocrine dysfunction and chromosomal instability. We therefore searched for causative pathogenic mutations that met the X-linked recessive inheritance filter criteria by running the Galaxy WES pipeline. Genetic predispositions for POF often follow an X-chromosomal inheritance pattern, and such families usually show an early onset of the disorder, before age 31 (15). Our WES framework revealed two known homozygous mutations in the POF1B gene: (i) g.84563135T>A; p.M349L and (ii) g.84563194C>T; p.R329Q. No published studies clearly describe the pathogenicity of the former mutation; however, analyses describing the role, function and alteration of the latter are available. Moreover, among the 268 variants shared by all patients, we found some novel candidate mutations that have not been reported in ClinVar but might play a role in POF because they occur in all patients.
A mutational study of the POF1B gene in a Lebanese family with 5 affected sisters identified the homozygous mutation R329Q by whole-genome SNP typing and homozygosity-by-descent mapping (16). The authors hypothesized that POF1B shares homology with the myosin tail and thus functions in actin-filament interaction. In vitro examination of the mutant and wild-type proteins showed that the interaction of the mutant with actin was hindered fourfold relative to wild-type POF1B. They speculated that the loss of function of the mutant is probably due to a lack of phosphorylation at the serine-leucine-arginine site (17

Conclusion
Detecting genetic variants in omics data, predicting their pathogenicity, and associating them causally with a disorder is an expedient in-silico approach. We retrieved WES data of one Saudi Arabian family affected by POF and first developed a Galaxy pipeline, using freely available tools and software, to scan for candidate gene variants in this disorder. Second, two known candidate mutations already reported for this disorder were found in the coding region of the POF1B gene, along with some novel variants. We found this bioinformatics pipeline practically robust; it offers a theoretical basis for the initial detection of genetic variants from WES data. The association of these detected variants with the disorder can be further validated by wet-lab experimentation for better understanding and greater confidence.

Acknowledgement
The authors are thankful to the Magee-Womens Research Institute & Foundation, Pittsburgh, PA, USA, for making these data publicly available so that researchers can explore and understand the genetic basis of this disorder, for the awareness of the affected community and of humanity in general.