Submitted: 28 July 2025
Posted: 29 July 2025
Abstract
Keywords:
1. Introduction
2. Materials and Methods
2.1. Data Sets
2.2. Annotation of Compositional Bias Using fLPS2
2.3. Patterny Flow Design
2.4. Moduley: Labelling Compositional Modules (CModules) and Other Possible Compositional Boundaries
2.5. Bandy: Discerning Compositional Banding
2.6. Blocky: Assessing Residue Segregation
2.7. Runny: Measuring Homopeptide Content
2.8. Repeaty: Calculating Repetitiveness
2.9. The Patterny Script and the Program Implementations
3. Results & Discussion
3.1. Rationale, Test Data Sets & Performance
3.2. Prevalences of Features in the DISPROTNR Set
3.3. Ranges of Behaviour for the Properties Explored
3.4. Detailed Example from DISPROT: Chromogranin-A from Domestic Cod
3.5. Detailed Example from CModulesYEAST: Putative Transcriptional Activator MSA2 from S. cerevisiae
3.6. Further Examples
3.7. Patterny Source Code Distribution
4. Conclusions
Supplementary Materials
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
| IDR | Intrinsically-disordered region |
| LCR | Low-complexity region |
| CBR | Compositionally-biased region |
| DPB | Distance to perfect banding |
| IE | Interval Entropy |
| Feature | DISPROTNR (total=6463) | CModulesYEAST (total=24043) | ASTRALSCOP40 (total=14844) |
|---|---|---|---|
| Modularity: | | | |
| CModules (≥1) | 2454 (38.0%) | --- | 10096 (68.0%)* |
| CModules (≥2) | 746 (11.5%) | --- | 4490 (30.2%) |
| CModules (≥3) | 309 (4.8%) | --- | 1794 (12.1%) |
| Banding: | | | |
| Bands (≥2) | 670 (10.4%) | 3217 (13.4%) | 1317 (8.9%) |
| Bands (≥3) | 389 (6.0%) | 1640 (11.0%) | 382 (2.6%) |
| Significantly even bands (≥2) | 430 (6.7%) | 2145 (8.9%) | 950 (6.4%) |
| Significantly uneven bands (≥2) | 17 (0.3%) | 59 (0.2%) | 1 (0.0%) |
| Significantly even bands (≥3) | 66 (1.0%) | 244 (1.0%) | 46 (0.3%) |
| Significantly uneven bands (≥3) | 14 (0.2%) | 51 (0.2%) | 1 (0.0%) |
| Blockiness (B): | | | |
| Significantly blocky | 560 (8.7%) | 1592 (6.6%) | 575 (3.9%) |
| Significantly un-blocky | 70 (1.1%) | 51 (0.2%) | 38 (0.3%) |
| Homopeptide content (hpep): | | | |
| Significant hpep enrichment | 802 (12.4%) | 2280 (9.5%) | 938 (6.3%) |
| Significant hpep lack | 9 (0.1%) | 16 (0.1%) | 0 (0.0%) |
| Repetitiveness [Interval Entropy (IE)]: | | | |
| Significantly repetitive | 598 (9.3%) | 2194 (9.1%) | 1672 (11.3%) |
| Significantly un-repetitive | 54 (0.8%) | 269 (1.1%) | 136 (0.9%) |
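
The percentages in the table above appear to be each feature count divided by the corresponding data set total. The following minimal Python sketch recomputes the CModules prevalences under that assumption. The function name `prevalence` and the dictionary layout are illustrative and are not part of the Patterny distribution.

```python
# Data set totals as given in the table header.
TOTALS = {"DISPROTNR": 6463, "ASTRALSCOP40": 14844}


def prevalence(count: int, dataset: str) -> float:
    """Percentage of sequences in `dataset` with a feature.

    Assumption: percentage = 100 * count / data set total, to one decimal place.
    """
    return round(100.0 * count / TOTALS[dataset], 1)


# Counts copied from the Modularity rows of the table.
cmodule_counts = {
    "DISPROTNR": {"CModules (≥1)": 2454, "CModules (≥2)": 746, "CModules (≥3)": 309},
    "ASTRALSCOP40": {"CModules (≥1)": 10096, "CModules (≥2)": 4490, "CModules (≥3)": 1794},
}

for dataset, rows in cmodule_counts.items():
    for feature, count in rows.items():
        print(f"{dataset:13s} {feature:14s} {count:6d} ({prevalence(count, dataset)}%)")
```

The same function could be applied to the other rows once their counts and the remaining data set total are added to the dictionaries.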
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).