Detection of whole genome selection signatures of Pakistani Teddy goat

Natural and artificial selection tend to cause variability that contributes to shape the genome of livestock in a way that differentiates them among the animal kingdom. The particular aim here is to identify positive selection signatures with whole genome pooled-sequence data of Pakistani Teddy goat. Paired-end alignment of 635,357,043 reads of Teddy goat with (ARS1) reference genome assembly was carried out. Pooled-Heterozygosity (Hp) and Tajima’s D (TD) are applied for validation and getting better hits of selection signals, while pairwise FST statistics is conducted on Teddy vs. Bezoar (wild goat ancestor) for genomic differentiation, moreover annotation of regions under positive selection was also performed. Hp score with − ZHp > 5 detected six windows having highest hits on Chr. 29, 9, 25, 15 and 14 that harbor HRASLS5, LACE1 and AXIN1 genes which are candidate for embryonic development, lactation and body height. Secondly, − ZTD value of > 3.3 showed 4 windows with very strong hits on Chr.5 & 9 which harbor STIM1 and ADM genes related to body mass and weight. Lastly, − ZFST < − 5 generated four strong signals on Chr.5 & 12 harbor LOC102183233 gene. Other significant selection signatures encompass genes associated with wool production, prolificacy and coat colors traits in this breed. In brief, this study identified the genes under selection in Pakistani Teddy goat that will be helpful to refining the marker-assisted breeding policies and converging required production traits within and across other goat breeds and to explore full genetic potential of this valued species of livestock.


Introduction
Capra hircus are small ruminants whose domestication dates back to ~ 10,000 years ago [1]. Goats are mainly reared for meat, milk and wool production [2]. Classical domestication and breeding practices allowed geneticists and animal breeders to explore the inheritance of the economical traits in this species. Meanwhile, industrial revolution and commercial needs triggered the masses to develop genomic technologies along with refining husbandry techniques to get maximum outputs by adaptation of this species to diverse environments and make them specialized for valued products. Estimated goat population in Pakistan is 76.1 million which produces ~ 940 thousand tonnes of milk and ~ 344,000 tonnes of meat annually [3]. Spontaneous phenotypic mutants have been studied and selected by artificial selection methods which are hallmarks of today's goat breeds like coat colors, meat and milk production, fecundity and adaptation traits. Teddy goat in Punjab, Pakistan is characterized by its short stature, weighing approximately 23-34 kg, quality meat production and weather tolerant characters [4]. In our previous attempts for searching selective sweeps, we identified coat colour structural variants, SNPs and CNVs in 20 domesticated Pakistani/Swiss goat breeds [5], and particularly analyzed 8 Pakistani breeds for putative variants responsible for large body size traits [6].
Aim of the current study is to report complete breed defining hotspots with reduced heterozygosity called "selective sweeps" in this Teddy goat breed from Pakistan, which 1 3 is primarily raised for meat purposes in this region. Ultimate goal of this endeavor is to aid in selecting particular characteristics of this breed through artificial selection which could also be helpful to explore the full genetic potential of this breed in its successive generations and to conserve the genetic resources of this species.

Sample collection and whole genome pooled sequencing
Whole blood samples of Teddy goat (n = 12) from home tract of this breed and Bezoar wild ancestor (n = 8) were collected from Punjab/Pakistan and Switzerland respectively. Genomic DNA of Pakistani breeds were extracted through standard protocol using TIANGEN biotech (Beijing) CO., LTD [7], while Bezoar DNA extraction was performed at Institute of Genetics, University of Bern, Switzerland. Extracted DNA were mixed into a single pool/breed in equimolar ratios. High-throughput sequencing was conducted using Illumina HiSeq3000 platform which generated 150 bp paired-end ~ 300mio reads. These sequences were further submitted to European Nucleotide Archive (ENA) under Project ID: PRJEB23815 with sample accession numbers ERS2037817 for Teddy and ERS2037806 for Bezoar. Characteristics & representative animals of Teddy (breed), Bezoar (wild ancestor) and San Clemente (reference genome) are shown in Table 1; Fig. 1.

Genome wide selection scanning
Three statistical tests were applied for the detection of genomics selection imprints left in this Teddy breed genome and generated the SNVs. Resulting values were then Z-transformed by applying Z values = -(valueμ/σ) such that if any − ZHp value > 5, −ZTD > 3.3 and − ZF ST < − 5 as a best hit for considering that window under positive selection.

Detecting selective sweeps using Pooled-Heterozygosity (Hp)
First of all we calculated Hp score for both pools by using an in-house Ruby script which applies Hp = 2Σn MAJ Σn MIN / (Σn MAJ + Σn MIN ) 2 , where (n MAJ ) and (n MIN ) are major and minor allele counts with window-size of 150 kb.

Goat reference genome
ARS1 goat reference genome accession number GCF_001704415.1 was obtained from NCBI and used for annotations.

SNP data visualization
R software was used for the construction of SNP density graph of -ZHp, −ZTD and − ZF ST scores using CMplot package [15]. For −ZHp,−ZTD and − ZF ST scores, manhattan plots were constructed by qqman package on R. Horizontal threshold lines were drawn on manhattan plots which signifies the chosen cutoff values ( −ZHp > 5,−ZTD > 3.3 and − ZF ST < − 5).

Quantile-quantile plots and histogram
The −ZHp,−ZTD and − ZF ST values were plotted against expected values as normal Q-Q plots using the function "qqnorm" on R software [16]. Standard normal distribution diagonal line which represents the expected values was drawn with "qqline" function on R. To check the distribution of −ZHp,−ZTD and − ZF ST values across all autosomes, histograms were constructed using the function "hist".  Fig. S2).

Selective sweeps and harbor genes
The windows under positive selection were obtained after setting thresholds on the basis of previously published data and the rationale observation of current data. Further finemapping and annotations were performed which revealed that body weight/mass, reproduction, milk production, litter size, wool production and coat color related genes harbor in these selective sweeps (Table 2).

Selection footprints by Pooled-Heterozygosity analysis
Hp is applied using sliding window approach. Based on the generated scores (Fig. 2) Table S1). In addition, theoretical distribution of millions of SNPs on x-axis are mapped with observed − ZHp scores on y-axis ( Supplementary Fig. S3a), where black line shows deviated SNPs from the tail at both ends as compared to red line of theoretical distribution. Likewise, histogram of − ZHp scores across all autosomes on x-axis vs. its frequencies on y-axis reveal that only handful of SNPs have either very low or very high − ZHp values (Supplementary Fig. S3b).

Genetic hitchhiking by Tajima's D statistics
By applying TD statistics, 41 top hit windows of 150 kb with the cutoff threshold of > 3.3 harbor strong selective sweeps (Fig. 3).  Table S2).
To evaluate the significance of − ZF ST scores, the Q-Q plot is also constructed which shows that the observed (sample) quantiles of the − ZF ST values detected against the expected (theoretical) quantiles has some outliers at tail (Supplementary Fig. S5a). These deviated SNPs whose values are far from zero are expected to be responsible for the genetic differentiation between the two populations. Moreover, the frequencies of fixation index profiles are also

Common selective sweeps observed by Hp, TD and F ST statistics
Several significant windows were found common among at least two of the three applied statistics and proposed as the best selection hits. Total 18 common positive signatures harbor 12 genes associated with meat, milk, body height and reproduction traits (Table 3).

Discussion
Selective sweep regions in the Teddy breed which either under natural or artificial selection are detected by three statistics e.g. Hp , TD and F ST . The main trait of Teddy goat breed is its short stature and tender meat production whose consumption in Pakistan is preferred among other mutton breeds/species due to its soft and leaner properties on various religious occasions. Also improved reproductive performance and better fertility rate are other valued attribute of this goat breed. Recently, this meat producing Teddy breed is artificially selected for milk yield, as we find six genes affecting lactation. Similarly, strong selection for wool production and coat color in Teddy goats due to economic interest might have led to the detected selective sweeps.
Our study found 100 promising genes in this breed from which 32 genes influence meat production. Examples include LRRIQ3, TGFBR3 and FGGY that also controls the number, diameter of muscle fiber which affects meat quality, tenderness, fat deposition and muscle growth in swine and average daily gain in Nellore cattle [26][27][28][29]. Nine genes putatively associated with embryonic development, high fecundity rate that additionally control litter size are also observed e.g. BIRC6, RTEL1, SYNRG etc. [20,25]. Similarly, we identified FOXO3, LACE1, TTC27, RTEL1, TDRD3 and VPS13B related to lactation [22,23,30]. The genetic architecture underlying body height trait in Teddy presents 8 candidate genes e.g. SCAPER and AXIN1 associated with body height trait in humans [24]. Other selection hits include genes PKIA and KIT which functions for hair fleece development enhancing wool production and in defining coat color phenotypes respectively [5,25,31].
In conclusion, we found several strong signals around distinguished genomic regions especially harboring Chr. 5, 9, 12, 15, 14, 25 and 29. A set of common significant windows were also revealed by more than one aforementioned statistical methods which are located on Chr. 3, 4, 5, 9, 11, 12, 13, 15, 17, 21. Further fine-mapping is still needed for more comprehensive understanding of selective sweeps in this breed. This research provides genome wide maps of selection footprints in this Pakistani goat, that will help in better understanding of genomic architecture of this breed and for further improvements of its existing production traits.