Why Pashmina Goat Produces Long Hair-fiber and Barbari doesn’t: A Differential Gene Expression Study

The Pashmina and Barbari are two famous goat breeds found in the wide areas of the Indo-Pak region. Pashmina is famous for its long hair-fiber (Cashmere) production while Barbari is not-selected for this trait. So, the mRNA expression profiling in the skin samples of both breeds would be an attractive and judicious approach for detecting putative genes involved in this valued trait. Here, we performed differential gene expression analysis on publicly available RNA-Seq data from both breeds. Out of 44,617,994 filtered reads of Pashmina and 55,995,999 of Barbari which are 76.48% and 73.69% mapped to the ARS1 reference transcriptome assembly respectively. A Pairwise comparison of both breeds resulted in 47,159 normalized expressed transcripts while 8,414 transcripts are differentially expressed above the significant threshold. Among these, 4,788 are upregulated in Pashmina while 3,626 transcripts are upregulated in Barbari. Fifty-nine transcripts harbor 57 genes including 32 LOC genes and 24 are annotated genes which were selected on the basis of TMM counts > 500. Genes with ectopic expressions other than uncharacterized and LOC symbol genes are Keratins (KRT) and Keratin Associated Proteins (KRTAPs), CystatinA&6, TCHH, SPRR4, PPIA, SLC25A4, S100A11, DMKN, LOR, ANXA2, PRR9 and SFN. All of these genes are likely to be involved in keratinocyte differentiation, sulfur matrix proteins, dermal papilla cells, hair follicles proliferation, hair curvature, wool fiber diameter, hair transition, hair shaft differentiation and its keratinization. These differentially expressed reported genes are critically valuable for enhancing the quality and quantity of the pashmina fiber and overall breed improvement. This study will also provide important information on hair follicle differentiation for further enrichment analyses and introducing this valued trait to other goat breeds as well.


Introduction
Hair follicles (HFs) are composed of integumental tissues which consist of outer dermal and inner epidermal layer cells. Their interaction initiates the hair cycle composed of anagen (proliferation), telogen (resting), catagen (regression), exogen (shedding) and kinogen (new growth) phases that ultimately results in keratinized end products including hairs, skin, horns etc. [1]. It is well recognized that the contribution of animal's hair in thermal insulation and visual display affect the human commerce for which they harvest the fiber [2]. Commercially valuable fiber produced are of goats that usually consist of double coat, the outer long hairs made up of primary hair follicles (PHFs) while the secondary hair follicles (SHFs) make up the inner wool coat (the textile fiber cashmere). The under wool coat is < 19 µm in diameter that is obtained mainly from Pashmina goat breed [3]. The fitness of pashmina fiber depends on number and small diameter of SHFs. It is of prime significance to the country's economy as it is globally valued for its warmth, fine, soft and thick fiber. In the past for centuries, this Pashmina goat have roamed in western zones of Tibetan plateau and now it habitats the high altitude of Himalayas in Pakistan, Nepal and Northern India [4]. As a functional mini-organ, the molecular mechanisms controlling HFs in indigenous livestock are now addressed widely. RNA-Seq technology due to its cost-effectiveness provides such platform to elucidate the novel transcripts, transcripts with low or high abundance and differential gene expression among various breeds. This including other high throughput sequencing technologies can also provide a detailed and accurate molecular insight of the organism under consideration for which the genomic resources are limited.
Herein, we focused to analyze the publically available datasets of Pashmina and Barbari goat's transcriptome skin sample. In contrast to Pashmina goat breed, Barbari is small, short-haired, meat and milk yielding breed and is indigenous to Pakistan, Nepal, India, Mauritius and Vietnam [5]. We compared the digital transcript expression profiles from both breeds to look for highly expressed genes related to HFs.

Sample source and quality analysis
Whole transcriptome skin sample of Pashmina and Barbari goats were selected to analyze the differential expression of wool/fiber producing genes. Both breeds are shown in Figure 1. Sequencing reads were taken from ENA source under the experiment accession: SRX3414248 and SRX6655258 respectively. Prior to mapping and quantification of paired end fastq sequences, files were first subjected to quality checks using FastQC (v0.11.8) software [8]. Only the clean reads were then retained by filtering low quality bases by Trimmomatic (v0.36) using sliding window approach [9].

Normalization of quantified expression levels
We calculated number of mapped reads to each transcript directly from filtered reads by Salmon tool [10] in mapping based mode providing Capra hircus transcriptome reference fasta file to perform mapping as an intermediate step.
The normalization of raw counts using Trinity's tool script [11], abundance_extimates_to_matrix.pl reported TMM (Trimmed Mean of M-values) and TPM (Transcripts Per Million) values which are effective to eliminate sequencing and gene length biases.

Gene expression profiling and visualization
The differential expression in transcripts of Pashmina and Barbari goat's skin sample was analyzed to figure out the expression of key genes related to wool production. The number of expressed transcripts in TPM normalized matrix was counted using count_matrix_features_given_MIN_TPM_threshold.pl script of Trinity [12] and then used plot function on R for visual inference. Pairwise comparison between the two samples using counts matrix identified DE transcripts by running script run_DE_analysis.pl. The parameter method was set to edgeR and dispersion value 0.1. This generated MA and volcano plots. From TMM matrix, differentially expressed transcripts (DETs) were extracted and clustered using analyze_diff_exp.pl script. Transcripts were considered significant that have p value at most 0.001 and log2 fold-change (log2FC) of 2. Heatmap and correlation graph were also generated [13].

Quality checks and mapping
Trimming of 47,830,768 reads of Pashmina breed resulted in survival of 44,617,994 (93.28%) pair of reads while from a total of 55,995,999 reads of Barbari 55,502,832 (99.12%) pair of reads retained after cleaning of low quality bases. Overall GC% of both breeds is 52%. Quality scores graphs is shown in (Figure 2). In addition, 76.48% (34126799 reads) of Pashmina breed and 73.69% (40902778 reads) of Barbari mapped successfully to the reference ARS1transcriptome.

Identification and screening of expressed transcripts
To detect the key genes related to cashmere production in Pashmina and Barbari, the pairwise comparison of both breeds was carried out. On the basis of TPM matrix maximally, 47,159 transcripts are found to be expressed. ( Figure  3) displays the distribution of these expressed transcripts as MA and Volcano plots. While calculating the DETs at a significant threshold (p-value cutoff for FDR < 0.001 and log2FC > 2), we found 8,414 transcripts that appeared to be differentially expressed between Pashmina and Barbari (Table S1). Among the DETs, 4,788 are upregulated in Pashmina (Table S2) while 3,626 transcripts are upregulated in Barbari breed (Table S3).

High differential expression of HF genes
Two-dimensional hierarchical clustering of differentially expressed transcripts was performed to group the similar ones based on the expression patterns which classified the differentially expressed features on a heatmap as shown in (Figure 4). The transcripts with the highest expression in Pashmina and Barbari were separately analyzed to look for the genes involved in wool production. A high expression level having the average normalized counts > 500 was selected to single out the most expressed genes related to HFs that divulged 59 transcripts consisting of 57 genes including 32 LOC symbol genes and 24 known genes. Table 1 detailed the known genes that have role in keratinization pathway, hair development or morphogenesis, fiber diameter etc.  Table 3 detailed the identified LOC symbol genes that are differentially expressed. These are involved in keratinization pathway, keratinocyte differentiation, follicle transition from catagen to telogen phase while rest of LOC symbol genes are uncharacterized and thus are not mentioned and counted here.

Discussion
To investigate the genes that are responsible for wool/cashmere production, we performed the differential expression analysis on the transcriptome profile of skin samples from two goat breeds, Pashmina and Barbari. Pashmina goat provides protection against the cold harsh environment via its luxurious fiber while Barbari goat breed inhabits the hot arid environment and are valued for meat and milk yield [4,5]. The top most DETs were then functionally annotated to get molecular insight of its corresponding genes. Gene ontology terms of highly expressed HF transcripts in Pashmina and Barbari goat breed revealed that most of the genes are related to keratin differentiation and are involved in keratinization pathways. Several scientific evidence is available and many studies report the genes concurrent to our results. It is previously suggested that keratins (KRT) and Keratin associated proteins (KRTAPs) are the main structural component of hair fibers and sheath [15]. The interaction of fibrous KRTs and matrix KRTAPs are the foundation of cornified skin appendages such as hairs, horns. An analysis on Chinese Tan sheep's skin transcriptome revealed the potential genes for curly fleece and hair/wool formation including KRT27, PRR9 and CST6 that ranged from 3,353 to 7,813 FPKM reads [16]. Another study reported the high expression of SFN gene in Changthangi goat [15]. This Stratifin gene regulates the epithelial keratinization by directly regulating the cell cycle. It has been best selected for human dermal papilla cells (DPCs) as well. Similarly, they analyzed CSTA and KRT77 gene having fold change values of 4.63 and 3.46 respectively that helps in keratinocyte differentiation. A recent analysis on differential expression in Canine anagen, telogen HFs and interfollicular epidermis (IFE) [13] presented that hair follicle associated structural genes (TCHH, KRT25) have high expression levels 3,997.42 and 10,456.27 during anagen phase. Similarly, KRT1, KRT5, KRT10, KRT14 and LOR are epidermal structure-associated genes and have high TPM counts (21,712.07, 12,733.06, 33,847.68, 29,472.32 and 7229.10) in IFE samples. RNA-seq analyses on Ovis aries revealed S10011A gene association with hair/wool development in them [17]. Small proline-rich protein 4 that has TMM count > 500 in our Pashmina breed is a class of cornified envelope and are found around 300 kb of epidermal differentiation complex. An analysis on sheep found its expression in HFs, epidermis and capillaries [18]. Importantly, the LOC symbol genes were also targeted in this study with known functions. A little is known about them but they have been characterized and are previously reported. A study elucidated the role of these LOC symbol genes in transition of HFs from catagen to telogen phase/telogen induction and their differential expression in either phase with FPKM, log2(FC) and P-values [19]. Other uncharacterized proteins such as LOC108635855 encoded by XR_001917982.1, LOC108635656 (XR_001917911.1) and many others were also divulged in Pashmina breed with high expression counts but they are not included here.
In conclusion, Pashmina and Barbari breed's skin transcriptome sample were subjected to differential expression analysis to search out potential genes related to HFs. Differential expression patterns above a significant normalized count in the two breeds under consideration suggested the involvement of those genes in keratinization pathways, keratinocyte differentiation, sulfur matrix proteins, dermal papilla cells, HFs proliferation, hair curvature, wool fiber diameter, hair transition, hair shaft differentiation and hair follicle keratinization. All DETs identified in this study can be biologically evaluated in future for pashmina/cashmere fiber evolvement in animals and provides some possible clues for further investigations on gene enrichment analysis of hair development genes.