Mutational hotspots and protein interactome analyses of collagen-specific chaperone HSP47

Heat shock protein 47 kDa (HSP47) serves as a client-specific chaperone that is essential for collagen biosynthesis, folding and structural assembly. To date, there is no comprehensive study of the mutational hotspots and protein network of human HSP47. Using five different human mutational databases, we deduced a comprehensive list of human HSP47 mutations and found 24, 67, 50, 43 and 2 deleterious mutations in the 1000 Genomes data, gnomAD, COSMICv86, cBioPortal and CanVar, respectively. We identified thirteen top-ranked missense mutations of HSP47 with the stringent cut-off of CADD score (>25) and Grantham score (≥151): Ser76Trp, Arg103Cys, Arg116Cys, Ser159Phe, Arg167Cys, Arg280Cys, Trp293Cys, Gly323Trp, Arg339Cys, Arg373Cys, Arg377Cys, Ser399Phe and Arg405Cys, with the arginine-to-cysteine change as the predominant mutation. We also found that HSP47 is up-regulated in 11 and down-regulated in 4 cancer types. Upon constructing the protein interactome map of human HSP47, we found that a set of molecular chaperones are interaction partners of HSP47, including two copies each of CREB binding proteins, HSP27, HSP40, HSP70, HSP90 and ubiquitin proteins, and one copy each of cartilage associated protein (CRTAP), HSPH1, HSBP1, FK506-binding protein B (FKBP), Krüppel-like factor 13 (KLF13), peptidyl-prolyl isomerase B (PPIB) and prolyl 4-hydroxylase beta subunit (P4HB). This suggests that a cocktail of different chaperones interacts with HSP47. These findings will assist in evaluating the roles of HSP47 in human diseases, including different types of cancer.


Introduction
Heat shock protein 47 kDa (HSP47) is an endoplasmic reticulum (ER)-resident, collagen-specific chaperone with a pivotal role in collagen biosynthesis and structural assembly [1,2]. HSP47 is the product of the human SERPINH1 gene, which belongs to group V6 in the indel-based group-wise classification of vertebrate serpins [3]. Structurally, HSP47 possesses a typical serpin domain (Pfam ID: PF00079; InterPro ID: IPR000215) composed of three β-sheets (s), sA-sC, and nine α-helices (h), hA-hI [4]. HSP47 lacks inhibitory function because of mutations in its reactive center loop (RCL) [1]. Over the past decades, HSP47 has been extensively characterized by biochemical and biophysical methods to establish its roles in collagen biosynthesis, and we recently characterized its evolutionary history [1]. Human HSP47 is associated with several diseases, including the familial connective tissue disorder osteogenesis imperfecta (OI) [5], rheumatoid arthritis [6] and different cancer types [7]. A missense mutation in HSP47 causes a lethal form of OI, triggered by improper regulation of type I collagen, which results in bone fragility and deformities, short stature, a shortened lifespan and a higher risk of bone fractures [8]. To understand the roles of HSP47 further, its mutational profile, its expression patterns in human diseases and its protein interaction partners need to be evaluated. To date, no single study has addressed these issues; hence, an investigation focusing on these aspects of HSP47 is warranted. Herein, we describe mutational hotspots of HSP47 from a disease perspective using population genomics-based mutation resources, 1000 Genomes [9] and gnomAD [10], and cancer genomics-based mutation resources, COSMIC version 86 [11], cBioPortal [12] and CanVar [13]. We also examined the expression pattern of HSP47 in different cancer types. Finally, we constructed interaction maps of HSP47 and identified its top 20 interaction partners, most of which are heat shock proteins.
We identified a total of 82 missense variants, which cause 75 amino acid substitutions in the human HSP47 protein sequence (Table 1). We assessed the deleterious nature of these missense variants using SIFT [14] and PolyPhen V2 [15]: 24 variants are predicted deleterious by both SIFT (score <0.06) and PolyPhen V2 (score >0.45), 21 by PolyPhen V2 only and 3 by SIFT only (Table 1). We considered variants predicted deleterious by a single tool as partly deleterious; there are 24 such variants in total.
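The two-tool consensus rule above can be expressed as a small classifier. This is a minimal sketch that assumes SIFT and PolyPhen V2 scores have already been computed for each variant; only the cut-offs (SIFT <0.06, PolyPhen V2 >0.45) come from the text, while the function names and example scores are hypothetical.

```python
# Cut-offs as stated in the text; note that many studies use SIFT <= 0.05 instead.
SIFT_CUTOFF = 0.06       # SIFT: lower scores are more damaging
POLYPHEN_CUTOFF = 0.45   # PolyPhen-2: higher scores are more damaging


def prediction_support(sift: float, polyphen: float) -> str:
    """Report which tool(s) call a variant deleterious."""
    sift_hit = sift < SIFT_CUTOFF
    polyphen_hit = polyphen > POLYPHEN_CUTOFF
    if sift_hit and polyphen_hit:
        return "both"
    if polyphen_hit:
        return "polyphen_only"
    if sift_hit:
        return "sift_only"
    return "neither"


def classify_variant(sift: float, polyphen: float) -> str:
    """Map tool agreement onto the labels used in the text."""
    support = prediction_support(sift, polyphen)
    if support == "both":
        return "deleterious"
    if support in ("polyphen_only", "sift_only"):
        return "partly deleterious"
    return "not predicted deleterious"


if __name__ == "__main__":
    # Hypothetical (SIFT, PolyPhen-2) score pairs, not values taken from Table 1.
    for sift, polyphen in [(0.01, 0.99), (0.30, 0.80), (0.02, 0.10), (0.40, 0.10)]:
        print(sift, polyphen, classify_variant(sift, polyphen))
```

Tallying `prediction_support` over all variants would give the "both", "PolyPhen V2 only" and "SIFT only" counts reported for Table 1.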

Summary of HSP47 mutations in human cancer
We evaluated HSP47 mutations in different cancers using three cancer mutation resources: COSMICv86, cBioPortal and CanVar. COSMICv86 has 119 mutations in HSP47 spanning various cancer types (Figure 3 and Table S3). These mutations are found in 121 cancer patients and comprise 88 missense mutations, 30 synonymous mutations, 2 nonsense mutations and 1 in-frame deletion (Figure 3A). The major nucleotide changes are G>A (48 mutations, 40.34%), C>T (35, 29.41%), G>T (10, 8.40%) and C>A (9, 7.56%) (Figure 3B). The missense mutations occur in 18 different cancer types, the top five being cancers of the large intestine (23 mutations), liver (10), stomach (9), lung (8) and oesophagus (7) (Figure 3C). Assessing these COSMIC missense variants with CADD score (>20) and Grantham score (>50), we found 50 deleterious missense mutations at various locations of the HSP47 protein in various cancers (Figure 3D). Using cBioPortal, we identified 168 cancer mutations of HSP47 (Table S4), which included 163 missense mutations (Table 2), five nonsense mutations, three fusion mutations and one each of frameshift deletion and in-frame deletion (Table S4). These mutations were found in 45 different cancer types (Tables 2 and S4); the cancer types with the most HSP47 mutations are uterine endometrioid carcinoma (26 mutations), cutaneous melanoma and bladder urothelial carcinoma (15 each), stomach adenocarcinoma (14) and colorectal adenocarcinoma (12) (Tables 2 and S4). There are 91 different missense mutations, of which 43 are predicted to be deleterious (Tables 2 and S4). Upon applying the strict cut-off of CADD score >25 and Grantham score ≥151 (radical), we found only six top-ranked missense variants: Arg167Cys, Arg280Cys, Trp293Cys, Gly323Trp, Arg373Cys and Arg377Cys (Figure 3D). Additionally, there are three nonsense mutations, Glu251*, Gln368* and Glu375*, spanning five different cancer types (Table S4). CanVar is a database specific to colorectal cancer samples, and it contains only two deleterious missense variants of HSP47 (Table 3).
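The two score cut-offs used in this section (CADD >20 with Grantham >50 for "deleterious", and CADD >25 with Grantham ≥151 for "top-ranked") can be sketched as simple filters. This sketch assumes CADD PHRED scores and Grantham distances are already attached to each variant; the `Variant` type and function names are hypothetical, and the four-class Grantham binning is the commonly used scheme consistent with the text's "≥151 (radical)" label, not necessarily the authors' exact code. The Gly323Trp and Arg339Cys scores are the values reported later in this section; the third variant is made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    name: str        # three-letter protein change, e.g. "Gly323Trp"
    cadd: float      # CADD PHRED-scaled score
    grantham: int    # Grantham distance between reference and alternate residue


def grantham_class(distance: int) -> str:
    """Bin a Grantham distance into the four commonly used classes."""
    if distance <= 50:
        return "conservative"
    if distance <= 100:
        return "moderately conservative"
    if distance <= 150:
        return "moderately radical"
    return "radical"


def is_deleterious(v: Variant) -> bool:
    """Cut-off applied to the COSMIC missense variants: CADD > 20 and Grantham > 50."""
    return v.cadd > 20 and v.grantham > 50


def is_top_ranked(v: Variant) -> bool:
    """Stringent cut-off: CADD > 25 and a radical change (Grantham >= 151)."""
    return v.cadd > 25 and v.grantham >= 151


if __name__ == "__main__":
    variants = [
        Variant("Gly323Trp", 34, 182),        # scores as reported for helix hI
        Variant("Arg339Cys", 32, 180),        # scores as reported for the hI1-s5A loop
        Variant("hypothetical_X", 22.0, 80),  # made-up values for illustration
    ]
    for v in variants:
        print(v.name, grantham_class(v.grantham), is_deleterious(v), is_top_ranked(v))
```

Keeping the two thresholds as separate predicates makes it easy to see that every top-ranked variant is also deleterious, but not the reverse.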

Overview of deleterious mutations of human HSP47 deduced from various resources
We identified a comprehensive list of deleterious or pathogenic mutations using five different resources. Figure 5 depicts the sharing pattern of deleterious missense mutations of HSP47 in these five resources: the population genomics databases (1000 Genomes [9] and gnomAD [10]) and the cancer databases (COSMIC version 86 [11], cBioPortal [12] and CanVar [13]).
Only three deleterious missense variants were shared by four databases (1000 Genomes, COSMIC, cBioPortal and gnomAD), while one, three and one variants were shared by a combination of three …
Upon computing top-ranked deleterious variants from all these resources using the stringent cut-off of CADD score >25 and Grantham score ≥151, we found thirteen top-ranked deleterious missense mutations: Ser76Trp, Arg103Cys, Arg116Cys, Ser159Phe, Arg167Cys, Arg280Cys, Trp293Cys, Gly323Trp, Arg339Cys, Arg373Cys, Arg377Cys, Ser399Phe and Arg405Cys. Eight of these thirteen are arginine-to-cysteine substitutions. We performed structural comparisons of these 13 deleterious mutations using homology models of wild-type and mutant human HSP47, built with SWISS-MODEL [17] using canine HSP47 (PDB ID: 3ZHA) as a template and superimposed and visualized in YASARA [18]; the structural changes are depicted in Figure 5.
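The statement that eight of the thirteen top-ranked variants are arginine-to-cysteine changes can be checked by parsing the three-letter protein notation and tallying reference>alternate pairs. This is a minimal sketch; the regular expression and helper names are hypothetical, and only the list of thirteen variants comes from the text.

```python
import re
from collections import Counter

# The thirteen top-ranked variants, exactly as listed in the text.
TOP_RANKED = (
    "Ser76Trp, Arg103Cys, Arg116Cys, Ser159Phe, Arg167Cys, Arg280Cys, Trp293Cys, "
    "Gly323Trp, Arg339Cys, Arg373Cys, Arg377Cys, Ser399Phe, Arg405Cys"
)

# Reference residue (three-letter code), position, alternate residue, e.g. "Arg103Cys".
MISSENSE = re.compile(r"([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})")


def parse_change(change: str) -> tuple[str, int, str]:
    """Split a change such as 'Arg103Cys' into ('Arg', 103, 'Cys')."""
    match = MISSENSE.fullmatch(change.strip())
    if match is None:
        raise ValueError(f"not a three-letter missense change: {change!r}")
    ref, pos, alt = match.groups()
    return ref, int(pos), alt


def substitution_counts(changes: str) -> Counter:
    """Count reference>alternate residue pairs in a comma-separated list."""
    parsed = (parse_change(c) for c in changes.split(","))
    return Counter(f"{ref}>{alt}" for ref, _, alt in parsed)


if __name__ == "__main__":
    counts = substitution_counts(TOP_RANKED)
    print(counts.most_common())
```

Running it reports `Arg>Cys` as the most frequent pair, with a count of 8 out of 13.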
Herein, we summarize and describe the implications of the deleterious mutations of human HSP47. In the N-terminal segment, only Leu6Pro is potentially deleterious in the 1000 Genomes data, while the gnomAD dataset has six deleterious mutations: Arg2His, Ala9Thr, Ala19Thr, Lys22Asn, Ala40Val and Thr42Met (Figure 6). The COSMIC database has three deleterious mutations, Arg2Cys, Ala19Thr and Ala40Val, each mutated in one sample of stomach, pancreatic and large intestine cancer, respectively (Figure 3). cBioPortal shares two deleterious mutations, Lys22Asn and Ala40Val, each mutated in one sample of breast invasive ductal carcinoma and colorectal adenocarcinoma, respectively (Table 2).
Helix hA has no deleterious variant in the 1000 Genomes dataset (Table 1), but three variants are deleterious in gnomAD, Ser47Gly, Leu50Gln and Leu54Phe (Figure 3), while COSMIC has one deleterious mutation, Gly49Asp, in one large intestine cancer sample (Figure 3), and cBioPortal has one deleterious mutation, Ala44Thr, in three samples of esophageal squamous cell carcinoma (Table 2). Sheet s6B has a single deleterious mutation in one pancreatic cancer sample from the COSMIC dataset (Figure 3), but none in 1000 Genomes (Table 1), gnomAD (Figure 2) or cBioPortal (Table 2).
The loop between sheet s6B and helix hB possesses two pathogenic mutations, Ser70Pro and Pro71Ala, in the gnomAD data (Figure 2), but none in the other datasets. Helix hB has two deleterious mutations, Ser76Leu and Leu78Pro, in the 1000 Genomes data (Table 1).
The Leu78Pro mutation is expected to destabilize helix hB, because proline is generally a helix-breaking residue. Its implications are severe: Leu78Pro causes proteasomal degradation of ER-resident HSP47 and leads to OI [8]. The gnomAD data also contain three missense variants, Ser76Leu and Ser76Trp at the same position and Ser77Leu (Figure 2). The COSMIC data have two deleterious mutations at amino acid position 75: Ala75Ser in one liver cancer sample and Ala75Thr in two thyroid cancer samples (Figure 3). The cBioPortal data also contain Ser76Leu, in one sample each of cutaneous melanoma and uterine endometrioid carcinoma, and two further deleterious variants: Ser77Leu in one sample of diffuse large B-cell lymphoma (NOS) and Gly79Val in one sample of cutaneous melanoma (Table 2). Ser76Trp is a top-ranked deleterious variant (CADD score >25 and Grantham score ≥151): the change of serine to tryptophan increases the conformational space occupied at this position of HSP47, as depicted by the surface area (Figure 5A). These helix hB mutations are adjacent to the known OI-causing Leu78Pro mutation; hence, they should be prioritized in future studies.
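The helix-breaker argument for Leu78Pro suggests a simple screen: flag every substitution that introduces proline into an α-helix. The sketch below is a heuristic only (it does not model structure or predict destabilization), and the variant/element pairs are a hand-transcribed subset of the proline substitutions described in this section; the helper names and the element-label pattern are hypothetical.

```python
import re

# (variant, secondary-structure element) pairs transcribed from this section (a subset).
VARIANTS = [
    ("Ser70Pro", "loop s6B-hB"),
    ("Leu78Pro", "hB"),
    ("Leu118Pro", "hD"),
    ("Arg124Pro", "hD"),
    ("Leu134Pro", "s2A"),
    ("Leu321Pro", "hI"),
    ("Leu324Pro", "hI"),
]

MISSENSE = re.compile(r"([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})")
HELIX = re.compile(r"h[A-I]1?")  # hA-hI plus the short helices hF1 and hI1


def helix_proline_substitutions(variants):
    """Return the variants that introduce proline into an alpha-helix."""
    flagged = []
    for change, element in variants:
        ref, _pos, alt = MISSENSE.fullmatch(change).groups()
        if alt == "Pro" and ref != "Pro" and HELIX.fullmatch(element):
            flagged.append(change)
    return flagged


if __name__ == "__main__":
    print(helix_proline_substitutions(VARIANTS))
```

Loop and sheet positions (Ser70Pro, Leu134Pro) are deliberately left out, because the helix-breaker rationale applies only to helices.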
The loop between helices hB and hC harbors a pathogenic variant, Gly85Asp, in one sample of uterine endometrioid carcinoma from the cBioPortal data (Table 2). Helix hC has three variants in total, including two deleterious mutations, Ala90Thr and Ser91Leu, in the 1000 Genomes data (Table 1 and Figure 6). Ala90Thr is likely to have the greater impact, because it alters the middle of helix hC, whereas Ser91Leu is a comparatively small amino acid change. The Ser91Leu mutation is also present in the gnomAD data (Figures 2 and 6).
The deleterious mutation Arg103His is located in the loop connecting helices hC and hD, also known as the CD-loop (Figure 6 and Table 1). At the same position, another deleterious mutation, Arg103Cys, is found in the gnomAD data (Figures 2 and 6) and in two samples of stomach adenocarcinoma in cBioPortal (Table 2). Arg103Cys is a top-ranked deleterious mutation (CADD score >25 and Grantham score ≥151), and the change of arginine into cysteine shrinks the total surface area (Figure 5B).
This CD-loop also harbors three pathogenic variants in the COSMIC dataset (Figure 3): Ala99Thr and Glu100Lys, each in one sample of large intestine and endometrium cancer, respectively, and Asp104Gly, mutated in two cancer types, haematopoietic and lymphoid cancer and pancreatic cancer.
Helix hD has only one partly deleterious mutation (predicted by SIFT only), Ser117Leu. This helix has five mutations at four positions in the gnomAD data, Leu115Gln, Arg116Cys, Leu118Pro and Arg124Leu/Arg124Pro (Figures 2 and 6). Of these missense variants, Arg116Cys is a top-ranked deleterious mutation (CADD score >25 and Grantham score ≥151) that shrinks the total surface area in helix hD (Figure 5C).
Sheet s2A has a single deleterious mutation, Tyr135His, at a position occupied by aromatic amino acids (Figure 6 and Table 1). We found two deleterious mutations, Leu134Pro and Gly136Arg, in the gnomAD data (Figures 2 and 6); these also occur in the COSMIC data, in one sample each of oesophagus and liver cancer (Figure 3). One more critical mutation, Arg133Gln, is mutated in uterine endometrioid carcinoma according to the cBioPortal data (Table 2).
The loop connecting sheet s2A and helix hE possesses two deleterious mutations, Ser139Leu and Phe142Leu, in the gnomAD data (Figures 2 and 6), and at position 139 there is another critical mutation, Ser139Pro, found in one sample of hepatocellular carcinoma in the cBioPortal data (Table 2). Helix hE has three critical mutations at two positions, Asp144Gly and His153Asn/His153Gln, in the gnomAD data (Figures 2 and 6), whereas the cBioPortal data have four different pathogenic variants, Val147Glu, Arg148His, Ser150Arg and His153Tyr, with one sample each from tubular stomach adenocarcinoma, colon adenocarcinoma, hepatocellular adenoma and mucinous adenocarcinoma of the colon and rectum (Table 2).
The loop connecting helix hE and sheet s1A has a single deleterious variant, Cys156Trp, in the gnomAD data (Figures 2 and 6). Sheet s1A has a single deleterious mutation, Ser159Phe, in the gnomAD data (Figures 2 and 6); this missense mutation is among the top 13 deleterious variants (CADD score >25 and Grantham score ≥151), and the change from serine to phenylalanine noticeably increases the total surface area, leading to conformational changes in sheet s1A (Figure 5D).
This sheet harbors another mutation at the same position, Ser159Cys, in one sample of large intestine cancer (Figure 3) and one sample of colorectal adenocarcinoma (Table 2). Another deleterious variant, Ile161Leu, comes from the CanVar dataset (Table 3). The loop between sheet s1A and helix hF has one deleterious mutation, Asn162Lys, in the gnomAD data (Figures 2 and 6).
Helix hF has a single deleterious mutation, Asn174Lys, in the gnomAD data (Figures 2 and 6).
This helix also has six deleterious variants found in different cancers in the COSMIC dataset (Figure 3): Arg167Cys (haematopoietic and lymphoid cancer), Ala169Thr (large intestine cancer), Glu175Lys (large intestine cancer), Ala178Ser (large intestine cancer), Ala178Val (urinary tract cancer) and Asp182Asn (large intestine and lung cancer). Two pathogenic mutations, Arg167His and Ile173Val, are also found in the cBioPortal data (Table 2), each in one sample of colorectal and esophagogastric adenocarcinoma, respectively. Arg167Cys is a top-ranked deleterious mutation (CADD score >25 and Grantham score ≥151), and this mutation reduces the total surface area (Figure 5E).
The loop connecting helix hF and sheet s3A has a single deleterious missense variant, Gly183Ser (Table 1 and Figure 6). The gnomAD data have two mutations at position 183, Gly183Ser and Gly183Arg (Figures 2 and 6). The second deleterious variant, Gly183Arg, is also present in the CanVar dataset (Table 3).
The COSMIC data have a single mutation, Gly183Ser, in one sample each of oesophagus and stomach cancer (Figure 3). The cBioPortal data have three mutations: Gly183Ser, mutated in stomach adenocarcinoma, and Lys184Gln and Glu187Lys, both mutated in breast invasive ductal carcinoma (Table 2).
Sheet s3A of HSP47 has three pathogenic mutations, Val192Ala, Asn202Lys and Phe205Val, in the gnomAD data (Figures 2 and 6). This sheet has one critical mutation, Asp191His, in two samples of lung adenocarcinoma from cBioPortal (Table 2), while the COSMIC data have four pathogenic mutations (Figure 3): Arg194Ser, in one sample each of large intestine, breast, and haematopoietic and lymphoid cancer; Thr195Met, in one sample of central nervous system cancer; Ala198Thr, in one sample each of large intestine and thyroid cancer; and Leu200Pro, in one sample of biliary tract cancer.
The loop connecting sheet s3A and helix hF1 has two variants, including one deleterious variant, His215Tyr, in a highly conserved region (Figure 6). The small helix hF1 has a single deleterious mutation with a high CADD score (32), Val219Met, from the gnomAD dataset (Figures 2 and 6). The loop between helix hF1 and sheet s4C has one critical missense variant (Asn221Tyr) with a CADD score of 28.2 (Figures 2 and 6). Sheet s4C possesses one pathogenic mutation, Arg222His (CADD score 35).
The loop connecting sheets s4C and s3C harbours four critical variants: Val226Met from the gnomAD dataset (Figures 2 and 6); Arg228Trp and Ser229Phe from the COSMIC data, each in one sample of skin cancer (Figure 3); and Arg228Trp and Ser229Pro from the cBioPortal data, each in one sample of hepatocellular carcinoma and cutaneous squamous cell carcinoma, respectively (Table 2).
Sheet s3C has four pathogenic mutations, Val232Gly, Met237Ile, Arg239Trp and Gly241Val, in the gnomAD dataset (Figures 2 and 6). The COSMIC data have two deleterious mutations (CADD score >20): Thr231Ile, in one sample each of biliary tract and skin cancer, and Met235Thr, in one sample of liver cancer and two samples of urinary tract cancer (Figure 3). Thr231Ile is also mutated in cutaneous melanoma (Table 2). Sheet s1B has two potentially critical variants, Glu249Lys and Glu251Lys, found in samples of large intestine and biliary tract cancer in the COSMIC data (Figure 3).
The loop connecting sheets s1B and s2B has two critical variants: Lys252Asn (CADD score of 22.5), mutated in a sample of upper aerodigestive tract cancer in the COSMIC dataset (Figure 3), and Leu253Pro (CADD score of 28.4), found in the gnomAD dataset (Figures 2 and 6).
Sheet s2B has one deleterious variant, Leu260Met, at a highly conserved position (>90% conservation; Figure 6) in the 1000 Genomes data, and there are three variants at two positions, Pro259Thr/Pro259Leu and Leu260Val. Three mutations in this sheet are known from cancer samples: Val256Met and Leu260Met, mutated in breast invasive ductal carcinoma and stomach adenocarcinoma, from cBioPortal (Table 2), and Ala261Thr, mutated in large intestine cancer, from the COSMIC data (Figure 3). Sheet s3B has one deleterious mutation, Ser266Asn, at a highly conserved residue (>70% conservation) at the start of this sheet (Figure 6); this variant is also found in the cBioPortal data, mutated in colorectal adenocarcinoma (Table 2). The loop connecting sheet s3B and helix hG has two critical variants at the highly conserved position 276, Glu276Ala and Glu276Lys; the first is found in the 1000 Genomes data (Figure 6), while both are present in the gnomAD dataset (Figures 2 and 6). Helix hG has two deleterious variants at the highly conserved position 280, Arg280Cys and Arg280His, in the 1000 Genomes data (Table 2), while the gnomAD dataset has three critical variants at the same position, Arg280Cys, Arg280Ser and Arg280His (Figures 2 and 6). The COSMIC data have four deleterious mutations in helix hG in total: Glu279Lys, mutated in one sample of large intestine cancer; Arg280Cys, found in two samples of pancreatic cancer; and Trp293Leu and Trp293Cys, each in one sample of liver cancer (Figure 3). The cBioPortal data also have Arg280Cys as a deleterious variant, in three samples of head and neck squamous cell carcinoma and one sample of ampullary carcinoma (Table 2).
Both Arg280Cys and Trp293Cys are among the 13 top-ranked deleterious HSP47 mutations. Structurally, these two mutations shrink the surface area at their respective positions (Figures 5F-G). Sheet s2C has two pathogenic mutations from the COSMIC dataset (Figure 3): Ala303Thr, mutated in one sample of breast cancer, and Ser305Tyr, found in three samples of oesophagus cancer. At position 305, Ser305Pro is a partly deleterious mutation (predicted by PolyPhen V2 only) in the 1000 Genomes data (Table 2). The loop between helix hH and sheet s2C has a single variant, Met297Ile, which is partly deleterious (predicted by SIFT only; Table 2). The loop between sheets s2C and s6A has one critical variant, Lys308Glu, mutated in one sample of hepatocellular carcinoma in the cBioPortal data (Table 2). Sheet s6A has a single critical mutation, Thr314Ile, from cBioPortal (Table 2).
Helix hI has two deleterious mutations at the highly conserved positions 321 and 323, Leu321Pro and Gly323Trp, in the 1000 Genomes data (Figure 4 and Table 1). Both match mutations in the COSMIC dataset, found in one sample each of endometrium and lung cancer (Figure 3), and they are also found in three samples of uterine endometrioid carcinoma and small cell lung cancer (Table 2). Another pathogenic mutation, Leu324Pro, occurs in the COSMIC dataset in large intestine and urinary tract cancer (Figure 3); this mutation is already known to cause OI in dogs [8]. Of these helix hI missense variants, Gly323Trp is a top-ranked deleterious mutation (CADD score 34, Grantham score 182) that increases the total surface area at position 323 (Figure 5H). This mutation is critical because it lies immediately before the known OI-causing Leu324Pro.
The loop between helices hI and hI1 has one deleterious mutation, Leu326Val, in the gnomAD dataset (Figures 2 and 6); this variant is also found in the cBioPortal data, in two samples of uterine carcinosarcoma/uterine malignant mixed Müllerian tumor (Table 2). Helix hI1 has one deleterious mutation, Glu328Lys, in the gnomAD dataset (Figures 2 and 6). The loop between helix hI1 and sheet s5A has two deleterious variants, Arg339Cys and Ala349Asp, in the gnomAD dataset (Figures 2 and 6). cBioPortal has a deleterious mutation, Arg329Thr, in one sample of uterine endometrioid carcinoma (Table 2). Sheet s5A has two deleterious variants, Phe352Tyr and His353Arg, in the gnomAD dataset (Figures 2 and 6). The loop between helix hI1 and sheet s5A has two partly deleterious mutations, Lys332Asn and Arg339Leu, from the 1000 Genomes data (predicted by PolyPhen V2 only; Table 2). This loop also has a highly ranked mutation, Arg339Cys (CADD score 32, Grantham score 180), from the gnomAD dataset (Figures 2 and 6); this mutation reduces the total surface area at position 339 (Figure 5I).
This loop also harbors another deleterious mutation, Lys343Thr, in one skin cancer sample in the COSMIC dataset (Figure 3) and in two cutaneous melanoma samples in the cBioPortal dataset (Table 2).
Sheet s5A has five variants in total, of which two, His353Asn and Glu358Gln, are deleterious mutations at highly conserved positions in the 1000 Genomes data (Figure 6 and Table 1). These two mutations are also found in one sample of colon adenocarcinoma and four samples of bladder urothelial carcinoma, respectively, in the cBioPortal data (Table 2), whereas the COSMIC data have the first mutation in one sample of large intestine cancer, together with another critical mutation, Ala356Thr, in two samples of stomach cancer (Figure 3).
Sheet s4A (within the RCL) harbors six mutation sites: the first four (Pro365Leu, Gly372Arg, Arg373Pro and Glu375Val) are partly deleterious, supported only by PolyPhen V2 prediction, and two, Arg377Cys and Pro379Leu, are deleterious. The COSMIC dataset has three mutations: Arg373Cys, in one sample each of lung and pancreatic cancer; Arg373Pro, in one sample of endometrium cancer; and Arg377Cys, in two samples of large intestine cancer (Figure 3). The deleterious mutation Arg377Cys is also in the gnomAD dataset (Figures 2 and 6) and in the cBioPortal dataset, in one sample each of colorectal adenocarcinoma and B-lymphoblastic leukemia/lymphoma (Table 2). The two arginine-to-cysteine mutations (Arg373Cys and Arg377Cys) are highly deleterious (with very high CADD and Grantham scores) and reduce the surface area at their respective positions in sheet s4A (within the RCL; Figures 5J-K). Sheet s1C has a single deleterious mutation, Phe382Leu, at a highly conserved position just after the RCL in the 1000 Genomes data (Table 1); this variant is also present in the cBioPortal dataset, in two and three samples of bladder urothelial and uterine endometrioid carcinoma, respectively (Table 2). Similarly, the loop joining sheets s1C and s4B harbors a single deleterious variant, Ala384Thr, in the 1000 Genomes (Table 1) and gnomAD datasets (Figures 2 and 6). This loop also harbors a pathogenic mutation (Pro387Ser) in one sample of cutaneous melanoma from the cBioPortal data (Table 2). Sheet s5B harbors two deleterious mutations (Ser399Phe and Leu401Val) in the 1000 Genomes data (Table 1) and five deleterious mutations at four positions (Ser399Phe/Ser399Tyr, Leu400Gln, Arg405Cys and Arg408Trp) in gnomAD (Figures 2 and 6). The pathogenic variant Ser399Phe was found in one sample each of cutaneous squamous cell carcinoma and cutaneous melanoma in the cBioPortal data (Table 2). Two missense variants of sheet s5B, Ser399Phe and Arg405Cys, are in the list of the 13 most deleterious variants, with very high CADD (>25) and Grantham (≥151) scores; these mutations increase and decrease the surface area at positions 399 and 405, respectively (Figures 5L-M). At the C-terminal end, there are five deleterious variants, Asp412Asn, Arg415Gly, Arg415Gln, Asp416Glu and Glu417Lys, in gnomAD (Figures 2 and 6), while at position 411 there are two pathogenic variants, Gly411Val and Gly411Cys, each found in one sample of liver cancer and cutaneous melanoma in COSMIC (Figure 3) and cBioPortal (Table 2), respectively.

Expression of HSP47 in cancer tissues and normal tissues
The mutational profile of HSP47 suggests roles in different cancers, so it is also important to know the expression pattern of HSP47 in different cancer types. To evaluate these expression patterns, we extracted data from dbDEPC 3.0, the database of differentially expressed proteins in human cancer [19]. HSP47 is up-regulated in eleven types of cancer; ranked by the number of experiments, the top four are meningioma, colorectal cancer, hepatocellular carcinoma and breast cancer (Figure 7 and Suppl. Table S5). HSP47 is down-regulated in chordoma, lung adenocarcinoma and urinary bladder neoplasms (Figure 7 and Suppl. Table S5).
This raises the question of how HSP47 is expressed in normal human tissues. To evaluate this, we scanned three large resources of human gene expression data: the Human Protein Atlas (HPA, https://www.proteinatlas.org/), the Genotype-Tissue Expression project (GTEx, https://gtexportal.org) and the FANTOM5 project (http://fantom.gsc.riken.jp/5/).
Upon evaluating protein-level expression of HSP47 using the HPA resource, we found that human HSP47 protein is highly expressed in normal tissues of the lung, kidney, breast, endometrium, ovary and placenta (Figure 7B), whereas HSP47 expression is at a medium level in tonsil, smooth muscle, oral mucosa, esophagus, testis, vagina, cervix (uterine), soft tissue and skin (Figure 7B). A low level of HSP47 expression was found in the adrenal gland, bronchus, cerebral cortex and colon (Figure 7B). We examined RNA-Seq data for HSP47 from the HPA resource: placenta tissues have the highest expression level, with 329.1 transcripts per million (TPM), whereas other normal … Using the FANTOM5 dataset, we found that HSP47 is highly expressed (>100 tags per million) in 7 normal tissues originating from the vagina, placenta, cervix (uterine), ovary, breast, thyroid gland and urinary bladder (Figure 7E). We also found medium (>100 tags per million) and lower levels of HSP47 expression in 19 and 10 tissue types, respectively (Figure 7E). Overall, we characterized HSP47 expression patterns in several normal tissues using three different publicly available datasets.

A cocktail of different chaperones interacts with HSP47
To evaluate the protein interaction partners of HSP47, we constructed the interactome map of human HSP47 with STRING 10 [20]. Several of its interaction partners (confidence score > 0.9) are different types of molecular chaperones (Figure 8A), as is clear from their names, which denote heat shock proteins of a specific molecular mass (kDa) and member number (Figure 8B and Table 4).
Figure 8. Protein interactome network of human HSP47, revealing that several molecular chaperones are interaction partners of HSP47. This network was produced with STRING 10 [20] using a confidence score > 0.9.
A. Interactome of human HSP47 protein.
B. Details of top protein-protein interaction partners of heat shock protein 47 (HSP47) with their confidence scores.
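A comparable partner list can be retrieved programmatically and filtered at the confidence cut-off used for Figure 8. This is a sketch under stated assumptions: it uses the STRING REST API `interaction_partners` endpoint (parameters `identifiers`, `species`, `required_score` on a 0-1000 scale, and `limit`), which serves the current STRING release rather than STRING 10, so results may differ from the figure. The helper names and the sample rows are hypothetical, and the sample uses a reduced column set; the strict `> 0.9` filter is applied locally to match the caption.

```python
import csv
import io
from urllib.parse import urlencode

STRING_API = "https://string-db.org/api/tsv/interaction_partners"

# Hypothetical TSV rows with a reduced column set, for offline illustration only.
SAMPLE_TSV = (
    "stringId_A\tstringId_B\tpreferredName_A\tpreferredName_B\tscore\n"
    "9606.A\t9606.B\tSERPINH1\tPARTNER_1\t0.95\n"
    "9606.A\t9606.C\tSERPINH1\tPARTNER_2\t0.85\n"
    "9606.A\t9606.D\tSERPINH1\tPARTNER_3\t0.91\n"
)


def partners_url(gene: str = "SERPINH1", species: int = 9606,
                 min_score: float = 0.9, limit: int = 20) -> str:
    """Build a query URL; STRING expects required_score on a 0-1000 scale."""
    params = {"identifiers": gene, "species": species,
              "required_score": int(round(min_score * 1000)), "limit": limit}
    return f"{STRING_API}?{urlencode(params)}"


def filter_partners(tsv_text: str, min_score: float = 0.9) -> list[tuple[str, float]]:
    """Return (partner, combined score) pairs with score > min_score, best first."""
    rows = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    hits = [(r["preferredName_B"], float(r["score"]))
            for r in rows if float(r["score"]) > min_score]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    print(partners_url())
    print(filter_partners(SAMPLE_TSV))
```

In practice, the TSV text would come from fetching `partners_url()` (for example with `urllib.request.urlopen`) before passing it to `filter_partners`.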
This suggests that a cocktail of different molecular chaperones is essential for the physiology of HSP47 in the endoplasmic reticulum (ER). These HSP47 interaction partners include two paralogs involved in histone acetylation, CREB binding protein (CREBBP) and E1A binding protein p300 (EP300) [21]. The two proteins are of closely related size, 2442 (CREBBP) and 2414 (EP300) residues long, and both possess multiple Pfam domains: Zf-TAZ (Pfam ID: PF02135.15), KIX (PF02172.15), Bromo (PF00439.24), an unknown domain (PF06001.12), HAT_KAT11 (PF08214.10), ZZ (PF00569.16) and another unknown domain (PF09030.9) (Figure 9A-B). These two proteins work as histone acetyltransferases and regulate transcription and/or cell cycle progression by modulating chromatin structure [21]. These two prominent chromatin remodelers operate as scaffolds that stabilize other protein partners within the transcription complex, and they are involved in crucial physiological processes such as development, growth and homeostasis [21]. The CREBBP and EP300 genes are located on human chromosomes 16 (cytogenetic band 16p13.3) and 22 (22q13.2), respectively (Table 4). Mutations in these genes cause a rare neurodevelopmental syndrome known as Rubinstein-Taybi syndrome (RSTS, OMIM #180849, #613684), which is characterized by distinctive facial features, skeletal and dysmorphic abnormalities, microcephaly, broad thumbs and first toes, intellectual disability and impaired postnatal growth [22].
Cartilage associated protein (CRTAP) is 401 amino acids long and has no known protein domain (Figure 9C). It is encoded by the CRTAP gene, located on human chromosome 3 (cytogenetic band 3p22.3; Table 4). CRTAP forms the collagen prolyl 3-hydroxylation complex with P3H1 and cyclophilin B (CyPB) in the ER, which 3-hydroxylates the Pro986 residue of α1(I) and α1(II) collagen chains [23]. CRTAP is associated with a small percentage (5-7%) of patients with severe to lethal OI type VII (OMIM #610682), and five known mutations in the CRTAP gene either prevent or reduce the production of cartilage associated protein. Such irregularities in cartilage associated protein production disrupt collagen formation, which ultimately results in the severe form of OI [23].
There are two Hsp40 proteins in this list of HSP47 interaction partners, DnaJ (Hsp40) homolog  4). The J-domain is highly conserved among Hsp40 proteins and is associated with protein folding and protein disaggregation in partnership with Hsp70 [24,25]. These two proteins are associated with human diseases caused by errors in protein folding [26,27].
There are two heat-shock protein 27 (HSP27) homologs, heat shock protein beta-1 (HSPB1) and beta-2 (HSPB2), which are 205 and 182 amino acids long and carry an HSP20 domain (PF00011.20) in the regions 88-183 and 70-162, respectively (Figure 9J-K). The HSPB1 gene is located on human chromosome 7 (7q11.23), while HSPB2 is mapped to the 11q23.1 region of chromosome 11 (Table 4). HSPB1 encodes a member of the heat shock protein family. Under environmental stress, HSPB1 translocates from the cytoplasm to the nucleus and helps other proteins fold correctly. It also functions in the differentiation of a wide range of cell types. Mutations in this gene cause Charcot-Marie-Tooth disease, axonal, type 2F, and distal hereditary motor neuropathy, type IIB.
HSPB1 is involved in many cellular processes such as apoptosis, thermotolerance, protein disaggregation, and cell differentiation and development. HSPB2 has a crucial role in binding and activating myotonic dystrophy protein kinase (DMPK); hence it is also called myotonic dystrophy kinase binding protein (MKBP). HSPB2/MKBP is a major player in the maintenance of muscle structure and function [30]. Hsp27 has a highly conserved α-crystallin domain that is enriched in β-sheet structures. The sHsps bind to aggregated proteins in an ATP-independent manner, and these aggregates are subsequently disaggregated by either the Hsp70 system (Hsp70 plus Hsp40) or the Hsp70/Hsp104 bichaperone system [31]. Disaggregated proteins are either refolded into their native state or degraded by autophagy and/or the proteasomal system. In addition, Hsp27 was recently shown to be involved in cancer-related retinopathy, suggesting its relevance for developing cancer therapeutics [32].
The HSBP1 gene, located at 16q23.3 on chromosome 16 (Table 4), encodes the HSBP1 protein, which is 76 amino acids long with an HSBP1 domain (PF06825.12) in the region 10-60 (Figure 9L). HSBP1 is a member of the small heat shock protein (sHSP) family and prevents the aggregation of denatured and stress-induced misfolded proteins [33].
There are two HSP90 homologs among the protein-protein interaction partners, HSP90AA1 (Hsp90α) and HSP90AB1 (Hsp90β), which belong to the HSP90 family, a well-characterized, conserved and critical eukaryotic chaperone family [34]. HSP90AA1 and HSP90AB1 are mapped to human chromosomes 14 (14q32.31) and 6 (6p21.1), respectively (Table 4). Both proteins have two types of protein domains, HATPase_C (PF02518.25) and HSP90 (PF00183.17), in the N-terminal and C-terminal regions, respectively (Figure 9M-N). HSP90 proteins are required for the proper function of other chaperones. They are essential for the maturation, structural maintenance and folding of client proteins, for intracellular trafficking, and for other signal transduction events [34,35]. HSP90AB1 has been shown to be overexpressed in cancer, where it prevents the misfolding and degradation of both mutated (for example, Ras and p53) and over-expressed oncoproteins (for example, p53 and Her2) [36].
The leucine proline-enriched proteoglycan 1 (LEPRE1, leprecan) gene is located on human chromosome 1 (cytogenetic band 1p34.2; Table 4). LEPRE1 encodes prolyl 3-hydroxylase 1 (P3H1), a 736-residue member of the collagen prolyl hydroxylase family that possesses a single domain of 96 residues from the 2OG-Fe(II) oxygenase superfamily (2OG-FeII_Oxy_3, PF13640.5) in the region 584-661 (Figure 9O). P3H1 forms the collagen prolyl 3-hydroxylation complex with CRTAP and PPIB/CyPB in the ER, and this activity is required for proper collagen synthesis and assembly [23]. Mutations in this gene are associated with OI type VIII.
Kruppel-like factor 13 (KLF13) is encoded by the KLF13 gene on human chromosome 15 (Table 4); the protein is 288 amino acids long, with three copies of the Zf-C2H2 (PF00096.25) domain extending from the middle to the C-terminal end (Figure 9P). It is a member of the Kruppel-like factor (KLF) family of Cys2-His2 (C2H2) zinc-finger transcription factors and plays a myriad of physiological roles during cell differentiation and development [37].
The P4HB gene is located on human chromosome 17 (cytogenetic band 17q25.3) and encodes the prolyl 4-hydroxylase beta subunit (P4HB), a 508-residue protein with three protein domains: two thioredoxin domains (PF00085.19) at the N-terminal (residues 25-131) and C-terminal (residues 368-472) ends, and one thioredoxin_6 domain (PF13848.5) in the middle (residues 161-345) (Figure 9Q). This protein is a member of the disulfide isomerase family and is also called protein disulfide isomerase (PDI). P4HB/PDI is a ubiquitously expressed protein that helps correct disulfide bridges in nascent polypeptide chains [38]. Hence, P4HB/PDI plays an instrumental role in protein folding, and its cellular concentration is critical for protein aggregation/disaggregation [38]. Mutations in P4HB cause a new form of OI-like disorder known as Cole-Carpenter syndrome [38].
The peptidyl-prolyl isomerase B (PPIB) gene is located on human chromosome 15 (cytogenetic band 15q22.31) and encodes peptidyl-prolyl isomerase B (PPIB), also known as cyclophilin B (CyPB), a 216-residue protein with a pro_isomerase (PF00160) domain in the region 47-204 (Figure 9R). PPIB/CyPB plays an instrumental role in the formation of the collagen prolyl 3-hydroxylation complex with P3H1 and CRTAP in the ER [23]. Mutations in this gene lead to recessive forms of OI. PPIases catalyze the cis-trans isomerization of proline imidic peptide bonds in proteins, which ultimately assists protein folding and stability [23]. PPIB, a member of the peptidyl-prolyl cis-trans isomerase (PPIase) family, adopts the β-barrel structure typical of cyclophilins and is localized in the ER lumen [39]. Owing to its localization in this specialized cellular compartment, it is involved in many biological processes such as post-translational modification and proper folding of proteins such as type I collagen [40].
Finally, two ubiquitin proteins, ubiquitin B (UBB) and ubiquitin C (UBC), are interaction partners of HSP47. They differ in length (229 and 685 amino acids) and possess 3 and 9 ubiquitin domains (72 amino acids each; PF00240.22), respectively (Figure 9S-T). UBB and UBC are encoded by the UBB and UBC genes, mapped to chromosomes 17 (17p11.2) and 12 (12q24.31), respectively (Table 4). These highly conserved eukaryotic proteins are involved in protein ubiquitination, a multifaceted, dynamic post-translational modification mediated by the ubiquitin code carried in the 72-amino-acid ubiquitin domain [41]. Protein ubiquitination results in the clearance of aberrant proteins through degradation by the proteasome; hence, this process is associated with various physiological roles and with the regulation of various signaling pathways [41]. Mutations in these two ubiquitins are related to different human diseases such as Huntington's disease, Alzheimer's disease and polyglutamine diseases [42].

Discussion
HSP47 is a critical regulator of collagen maturation and associated embryonic development.
Great efforts have been made to uncover the molecular mechanisms and clinical relevance of HSP47 gene and protein functions, and a detailed molecular phylogenetic analysis was carried out previously [1]. Although mutations of HSP47 are known to play roles in human diseases, a detailed survey of its mutational hotspots was lacking [8]. We have addressed this gap by profiling germline and somatic mutations from various resources.
Multicellular eukaryotes have two primary cell types, germ cells and somatic cells, and mutations in these two cell types are called germline and somatic mutations, respectively. Germline mutations are inherited in a Mendelian manner, serve as a major source of evolutionary change, and contribute to familial diseases [43]. Somatic mutations, in contrast, are the primary cause of sporadic diseases, including cancer [43].
To our knowledge, this study also provides the largest mutational hotspot analysis of any member of the serpin superfamily studied to date [44-47]. These mutational hotspots will serve as a major resource for understanding the biology of HSP47, particularly how HSP47 regulates collagen maturation. The structure of canine HSP47 is known: it suggests that HSP47 does not undergo conformational changes and that two HSP47 monomers are required to stabilize a single collagen triple helix, and it identifies the residues crucial for collagen binding. However, the known OI-causing mutations Leu78Pro (in humans) and Leu326Pro (in dachshunds) lie far from the critical collagen-binding sites, yet they are still deleterious. This calls for further investigation of the roles of HSP47 and its mutational sites.
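The predominance of arginine-to-cysteine changes among the thirteen top-ranked missense mutations listed in the Abstract can be checked with a short tally. The following Python sketch is illustrative only and is not the pipeline used in this study; it assumes mutations are written in three-letter protein notation (e.g. Arg103Cys), and the helper name `tally_substitutions` is hypothetical.

```python
import re
from collections import Counter

# Three-letter protein notation: reference residue, position, alternate residue.
MISSENSE = re.compile(r"^([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")

# The thirteen top-ranked HSP47 missense mutations listed in the Abstract.
TOP_MUTATIONS = [
    "Ser76Trp", "Arg103Cys", "Arg116Cys", "Ser159Phe", "Arg167Cys",
    "Arg280Cys", "Trp293Cys", "Gly323Trp", "Arg339Cys", "Arg373Cys",
    "Arg377Cys", "Ser399Phe", "Arg405Cys",
]


def tally_substitutions(mutations):
    """Count (reference, alternate) residue pairs (hypothetical helper)."""
    counts = Counter()
    for name in mutations:
        match = MISSENSE.match(name)
        if match is None:
            raise ValueError(f"not a three-letter missense change: {name}")
        ref, _position, alt = match.groups()
        counts[(ref, alt)] += 1
    return counts


if __name__ == "__main__":
    # Print substitution types from most to least frequent.
    for (ref, alt), n in tally_substitutions(TOP_MUTATIONS).most_common():
        print(f"{ref}>{alt}: {n}")
```

Running the script lists Arg>Cys: 8 first, followed by Ser>Phe: 2, consistent with the arginine-cysteine change being the predominant mutation. Counting per residue position instead of per substitution type would be the natural extension for locating recurrent sites across larger mutation lists.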
Since HSP47 is a potential prognostic biomarker in cancer [48], these variants provide a platform for investigating its potential roles in different cancer types. We also found that HSP47 is differentially expressed in different cancers and in several normal tissues (Figure 6).
Knowledge of protein-protein interactions is crucial for understanding how the signaling networks surrounding a protein function. Our interactome analysis of HSP47 revealed that most of its top interaction partners are other molecular chaperones (Figure 7).
Taken together, the HSP47 protein-protein network is likely to be a critical network in collagen-related disorders. Hence, the remaining interaction partners should also be taken into consideration in future evaluations.
In conclusion, this study provides the largest repository of mutational hotspots of HSP47. It also summarizes the expression pattern of HSP47 across human cancer types and reports the top protein-protein interaction partners, which form a cocktail of different heat-shock proteins. These findings should prompt new investigations into the roles of HSP47 in human diseases from the perspectives of genetic variants and protein-protein networks.

Figure 1. SNPs are the major class of human HSP47 genetic variants deduced from 1000G.

Figure 3. Overview of somatic mutations of HSP47 identified from the COSMIC v86 database. A. Overview of mutation types, revealing that a large fraction are missense variants. B. Summary of nucleotide changes that generate the mutational profile of HSP47. C. Summary of missense variants identified in samples of different cancers. D. Top-ranked deleterious missense variants of HSP47 computed using CADD score (>20) and Grantham score (>50), with a summary of cancer type and number of samples possessing each deleterious variant. Ct, C-terminal end.
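Panel D ranks variants using two cut-offs. A minimal Python sketch of such a filter is given below, assuming each variant carries a PHRED-scaled CADD score and a Grantham distance; the `Variant` class, the `top_ranked` function, the secondary ranking by Grantham distance, and the variant records are all hypothetical illustrations, not the data or code behind Figure 3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """A missense variant with two deleteriousness-related scores."""
    name: str
    cadd_phred: float  # CADD PHRED-scaled score (higher = more likely deleterious)
    grantham: int      # Grantham distance between reference and alternate residues


def top_ranked(variants, cadd_cutoff=20.0, grantham_cutoff=50):
    """Keep variants strictly above both cut-offs, highest CADD first.

    Ties in CADD are broken by the larger Grantham distance; this ranking
    rule is an assumption, not one stated for Figure 3D.
    """
    passing = [
        v for v in variants
        if v.cadd_phred > cadd_cutoff and v.grantham > grantham_cutoff
    ]
    return sorted(passing, key=lambda v: (v.cadd_phred, v.grantham), reverse=True)


if __name__ == "__main__":
    # Hypothetical records with made-up scores, for illustration only.
    toy = [
        Variant("var_A", 27.1, 180),
        Variant("var_B", 18.0, 200),  # fails the CADD cut-off
        Variant("var_C", 24.0, 40),   # fails the Grantham cut-off
        Variant("var_D", 22.5, 98),
        Variant("var_E", 20.0, 60),   # CADD equal to 20 is excluded by ">"
    ]
    print([v.name for v in top_ranked(toy)])
```

With these toy records the sketch keeps var_A and var_D. Reproducing the stricter cut-offs used for the thirteen top-ranked mutations in the Abstract (CADD > 25, Grantham ≥ 151) would also require making the Grantham comparison inclusive.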

Figure 5. Overview of structural and conformational changes induced by the 13 top-ranked deleterious mutations.

Figure 7. Overview of different expression patterns of human HSP47 in cancer and normal tissue types. A. Summary of HSP47 expression patterns in different cancer types, deduced from dbDEPC 3.0 [19]. B. Summary of human HSP47 protein expression patterns in different normal tissues, derived from the Human Protein Atlas (HPA, https://www.proteinatlas.org/). C. Overview of human HSP47 expression patterns using RNA-Seq data from HPA; expression values are depicted as mean transcripts per million (TPM), corresponding to the mean values of the individual samples from each tissue type. D. Summary of RNA-seq-based HSP47 expression patterns in different normal tissues, shown as median reads per kilobase per million mapped reads (RPKM), derived from the Genotype-Tissue Expression (GTEx, https://gtexportal.org) datasets. E. Overview of HSP47 expression patterns in normal human tissues, reported as tags per million obtained through cap analysis of gene expression (CAGE) in the FANTOM5 project data (http://fantom.gsc.riken.jp/5/). Similar functional tissue groups share the same color code in B-E.
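Panels C and D report RNA-seq expression in two different units. As a hedged illustration of the textbook definitions (not the HPA or GTEx processing pipelines, which involve additional steps), for a gene i with c_i mapped reads and length l_i bases in a sample with N mapped reads, RPKM_i = c_i × 10^9 / (l_i × N), whereas TPM_i = 10^6 × (c_i / l_i) / Σ_j (c_j / l_j). The Python sketch below implements both; the function names and the toy counts are hypothetical, and N defaults to the sum of the supplied counts.

```python
def rpkm(counts, lengths_bp, total_mapped=None):
    """Reads per kilobase of feature per million mapped reads.

    counts: mapped read counts per gene; lengths_bp: gene lengths in bases.
    total_mapped defaults to sum(counts), i.e. it assumes every mapped read
    was assigned to one of the supplied genes.
    """
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths_bp must have the same length")
    n = sum(counts) if total_mapped is None else total_mapped
    return [c * 1e9 / (length * n) for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then scale to 1e6."""
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths_bp must have the same length")
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    scale = 1e6 / sum(rates)
    return [r * scale for r in rates]


if __name__ == "__main__":
    # Hypothetical toy sample with three genes (not data from HPA or GTEx).
    toy_counts = [100, 200, 300]
    toy_lengths = [1000, 2000, 1500]
    print("RPKM:", [f"{x:.0f}" for x in rpkm(toy_counts, toy_lengths)])
    print("TPM: ", [f"{x:.0f}" for x in tpm(toy_counts, toy_lengths)])
```

With the toy inputs, TPM values are 250,000, 250,000 and 500,000. Because TPM values in each sample always sum to one million, they are easier to compare across samples than RPKM, which is one reason the units in panels C and D are not directly interchangeable.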