Submitted: 17 July 2025
Posted: 18 July 2025
Abstract
Keywords:
1. Introduction
2. Materials and Methods
2.1. Dataset Selection
2.2. Differential Abundance Analysis
2.3. Biological Relevance Filtering and Overlap Analysis
2.4. Segregation and Functional Enrichment Analysis
2.5. Jaccard Analysis of Individual Works
2.6. Meta-Analysis
3. Results and Discussion
3.1. Protein Groups Variability
3.2. Differential Expression Analysis and Impact of Hypothesis Testing Methods
3.3. Biological Relevance in Differential Proteomics Analysis
3.4. Functional Enrichment and Similarity Profiles
3.5. Meta-Analysis
3.6. Conclusions and Implications
Supplementary Materials
References
- Han, D., Jiang, X., Li, X., Wang, Y., Lu, Y., Liu, S., ... & Dong, M. (2022). Quantitative proteomics in biomedical research: From discovery to clinical application. Journal of Proteomics, 254, 104445.
- Ting, L., Su, M., Sun, C., Li, S., & Li, R. (2021a). Benchmarking statistical methods for differential protein expression analysis in label-free quantitative proteomics. Journal of Proteome Research, 20(4), 1982–1993.
- Wang, B., Zeng, S., Tan, S., Tang, R., Wang, X., Wang, S., & Wu, X. (2021). Strategies for quantitative proteomic analysis in drug discovery and development. Expert Review of Proteomics, 18(9), 743–760.
- Wang, X., Lu, Y., & Liang, W. (2023). A practical guide to differential protein expression analysis in label-free quantitative proteomics. Frontiers in Cell and Developmental Biology, 11, 1118742.
- Liao, Y., Lin, D., Jin, H., Hu, H., Guo, L., & Hu, C. (2022). Strategies for sample preparation in proteomic research: Current status and future perspectives. Proteomics, 22(19), 2200007.
- Mallick, P., Kuncar, G., & Kratchmarova, I. (2011). Quantitative proteomics: Strategies and statistical considerations. Molecular BioSystems, 7(8), 2419–2430.
- Gessner, D., Kuras, M., & Vitek, O. (2020). Statistical aspects of label-free quantitative proteomics. In Mass Spectrometry-Based Quantitative Proteomics (pp. 37–58). Humana, New York, NY.
- Zhang, T., Zhang, N., & Zhang, W. (2020). Progress in label-free quantitative proteomics. Molecules, 25(14), 3244.
- Cox, J., & Mann, M. (2011). Quantitative proteome analysis with SILAC, metabolic labeling and quantifiable peptide-centric MS. In Methods in Molecular Biology (Vol. 752, pp. 193–207). Humana Press.
- Ruggles, K. V., Krug, K., Wang, Y., Dou, Y., & Huse, J. T. (2017). Proteogenomic analysis of IDH-mutant glioma. Cancer Cell, 31(4), 543–558.
- Kall, L., Canterbury, J. D., Sherman, B. T., & MacCoss, M. J. (2012). Peptide and protein identification from tandem mass spectrometry data. Current Protocols in Bioinformatics, 40(1), 13.9.1–13.9.23.
- Teo, G., Tan, T. J., Chuah, A., & Chong, F. T. (2021). A comprehensive review of mass spectrometry-based proteomic data analysis tools and workflows. Proteomics, 21(1-2), 2000099.
- Elias, J. E., & Gygi, S. P. (2007). Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry. Nature Methods, 4(3), 207–214.
- Chawade, A., Alexandersson, E., & Levander, F. (2014). Normalization strategies for protein quantification in mass spectrometry. Proteomics, 14(10), 1279–1290.
- Díaz, S., Sampedro-Torres, E., Peinado, P., Gil-Monzo, M., Piqueras, P., & Al-Amoudi, A. (2021). A review of normalization methods for label-free quantitative proteomics. Proteomics, 21(21-22), 2100062.
- Callister, S. J., Barry, R. C., Adkins, J. N., Johnson, E. T., Qian, W. J., Smith, R. D., & Lipton, M. S. (2006). Normalization approaches for membrane proteomics. Journal of Proteome Research, 5(1), 191–201.
- Schwanhäusser, B., Busse, D., Li, N., Dittmar, G., Schuchhardt, J., Wolf, J., ... & Mann, M. (2011). Global quantification of mammalian gene expression control by measuring absolute protein synthesis rates. Nature, 473(7347), 337–342.
- Cox, J., Zecha, J., & Mann, M. (2014). MaxQuant enables deep and reproducible proteome quantification. Molecular & Cellular Proteomics, 13(7), 1805–1813.
- Hicks, J. L., Garmire, L. X., & Garmire, M. L. (2015). A comprehensive evaluation of normalization methods for proteomic data. Scientific Reports, 5(1), 16410.
- Karp, N. A., & Lilley, K. S. (2007). Determining significance in quantitative proteomics. Molecular & Cellular Proteomics, 6(1), 13–20.
- Zhang, B., VerBerkmoes, N. C., Langston, M. A., Uberbacher, E., Hettich, R. L., & Samatova, N. F. (2006). Detecting differential and correlated protein expression in label-free shotgun proteomics. Journal of Proteome Research, 5(11), 2909–2918.
- Millikin, R. J., Shortreed, M. R., Scalf, M., & Smith, L. M. (2020). A Bayesian null interval hypothesis test controls false discovery rates and improves sensitivity in label-free quantitative proteomics. Journal of Proteome Research, 19(5), 1975–1981.
- Choi, H., Fermin, D., & Nesvizhskii, A. I. (2008). Significance analysis of spectral count data in label-free shotgun proteomics. Molecular & Cellular Proteomics, 7(12), 2373–2385.
- Lazar, C., Gatto, L., Ferro, M., Bruley, C., & Burger, T. (2016). Accounting for the multiple natures of missing values in label-free quantitative proteomics data sets to compare imputation strategies. Journal of Proteome Research, 15(4), 1116–1125.
- Ritchie, M. E., Phipson, B., Wu, D., Hu, Y., Law, C. W., Shi, W., & Smyth, G. K. (2015). limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research, 43(7), e47.
- Zhu, Y., Orre, L. M., Tran, Y. Z., Mermelekas, G., Johansson, H. J., Malyutina, A., Anders, S., & Lehtiö, J. (2020). DEqMS: A method for accurate variance estimation in differential protein expression analysis. Molecular & Cellular Proteomics, 19(6), 1047–1057.
- Crook, O. M., Chung, C.-W., & Deane, C. M. (2022). Challenges and opportunities for Bayesian statistics in proteomics. Journal of Proteome Research, 21(4), 849–864.
- Choi, H., Sheng, Q., Merrill, B. D., Sysko, A. D., & Gilmore, J. M. (2014a). MSstats: An R package for statistical analysis of quantitative mass spectrometry-based proteomic experiments. Bioinformatics, 30(17), 2524–2526.
- Kerr, M. K., Martin, M., & Churchill, G. A. (2000). Analysis of variance for gene expression microarray data. Journal of Computational Biology, 7(6), 819–837.
- Jow, H., Boys, R. J., & Wilkinson, D. J. (2014). Bayesian identification of protein differential expression in multi-group isobaric labelled mass spectrometry data. Statistical Applications in Genetics and Molecular Biology, 13(3), 329–347.
- Choi, H., Geng, Q., Hitchcock, G., Wang, C., Li, X., Gordon, O., & Shen, N. (2014). MSstats: An R package for statistical analysis of quantitative mass spectrometry-based proteomic experiments. Bioinformatics, 30(17), 2524–2526.
- Karpievitch, Y. V., Dabney, A. R., & Smith, R. D. (2012). Normalization and missing value imputation for label-free LC-MS analysis. BMC Bioinformatics, 13(Suppl 16), S5.
- Välikangas, T., Suomi, T., & Elo, L. L. (2018). A comprehensive evaluation of popular proteomics software workflows for label-free proteome quantification and imputation. Briefings in Bioinformatics, 19(6), 1344–1355.
- Amor, M., Diaz, M., Bianco, V., Svecla, M., Schwarz, B., Rainer, S., Pirchheim, A., Schooltink, L., Mukherjee, S., Grabner, G. F., Beretta, G., Lamina, C., Norata, G. D., Hackl, H., & Kratky, D. (2024). Identification of regulatory networks and crosstalk factors in brown adipose tissue and liver of a cold-exposed cardiometabolic mouse model. Cardiovascular Diabetology, 23(1), 298.
- Lozano-Terol, G., Chiozzi, R. Z., Gallego-Jara, J., Sola-Martínez, R. A., Vivancos, A. M., Ortega, Á., Heck, A. J. R., Díaz, M. C., & de Diego Puente, T. (2024). Relative impact of three growth conditions on the Escherichia coli protein acetylome. iScience, 27(2), 109017.
- Rodriguez, M. C., Mehta, D., Tan, M., & Uhrig, R. G. (2021). Quantitative Proteome and PTMome Analysis of Arabidopsis thaliana Root Responses to Persistent Osmotic and Salinity Stress. Plant Cell Physiology, 62(6), 1012–1029.
- Biełło, K. A., Lucena, C., López-Tenllado, F. J., Hidalgo-Carrillo, J., Rodríguez-Caballero, G., Cabello, P., Sáez, L. P., Luque-Almagro, V., Roldán, M. D., Moreno-Vivián, C., & Olaya-Abril, A. (2023). Holistic view of biological nitrogen fixation and phosphorus mobilization in Azotobacter chroococcum NCIMB 8003. Frontiers in Microbiology, 14, 1129721.
- R Core Team. (2024). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/.
- Smyth, G. K. (2004). Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Statistical Applications in Genetics and Molecular Biology, 3(1).
- Goodrich, B., Gabry, J., Ali, I., & Brilleman, S. (2022). rstanarm: Bayesian Applied Regression Modeling via Stan (Version 2.21.3) [R package]. https://mc-stan.org/rstanarm/.
- Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2013). Bayesian Data Analysis (3rd ed.). Chapman and Hall/CRC.
- Kruschke, J. K. (2014). Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan (2nd ed.). Academic Press.
- Conway, J. R., Lex, A., & Gehlenborg, N. (2017). UpSetR: An R package for the visualization of intersecting sets and their properties. Bioinformatics, 33(18), 2923–2924.
- Bindea, G., Mlecnik, B., Hackl, H., Charoentong, P., Tosolini, M., Kirilovsky, A., ... & Trajanoski, Z. (2009). ClueGO: A Cytoscape plug-in to decipher functionally grouped gene ontology and pathway annotation networks. Bioinformatics, 25(8), 1091–1093.
- Shannon, P., Markiel, A., Ozier, O., Baliga, N. S., Wang, J. T., Ramage, D., ... & Ideker, T. (2003). Cytoscape: A software environment for integrated models of biomolecular interaction networks. Genome Research, 13(11), 2498–2504.
- Jaccard, P. (1912). The distribution of the flora in the alpine zone. New Phytologist, 11(2), 37–50.
- Dunn, O. J. (2017). Multiple comparisons among means. Journal of the American Statistical Association, 62(320), 1418–1432.
- Krug, K., Kuka, M., Solovyev, A., & Kuster, B. (2020). Benchmarking of quantitative proteomics software for label-free data. Nature Methods, 17(10), 1007–1014.
- Ting, L. J., Low, T. Y., Phua, K. K. B., & Sze, S. K. (2021b). Benchmarking computational methods for label-free quantitative proteomics. Nature Communications, 12(1), 1–13.
- Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L. D., François, R., ... & Yutani, H. (2019). Welcome to the Tidyverse. Journal of Open Source Software, 4(43), 1686.
- Kassambara, A. (2020). ggpubr: “ggplot2” Based Publication Ready Plots (Version 0.4.0) [R package]. https://CRAN.R-project.org/package=ggpubr.
- Dinno, A. (2017). dunn.test: Dunn’s Test of Multiple Comparisons Using Rank Sums (Version 1.3.1) [R package]. https://CRAN.R-project.org/package=dunn.test.
- Student. (1908). The probable error of a mean. Biometrika, 6(1), 1–25.
- Welch, B. L. (1947). The generalization of Student’s problem when several different population variances are involved. Biometrika, 34(1/2), 28–35.
- Gabry, J., & Goodrich, B. (2018). rstanarm: Bayesian Applied Regression Modeling via Stan. Journal of Statistical Software, 87(13), 1–39.
- Bürkner, P.-C. (2017). brms: An R Package for Bayesian Multilevel Models Using Stan. Journal of Statistical Software, 80(1), 1–28.
- Goeminne, L. J., Govaert, E., De Neve, J., Mertens, I., Van Deun, K., Van Bocxlaer, K., Gevaert, K., & Clement, L. (2020). Bayesian modelling of quantitative proteomics data: A practical guide. Molecular & Cellular Proteomics, 19(3), 560–572.
- Kall, L., Cannings, D., & MacCoss, M. J. (2007). A statistical model for peptide identification and protein quantification. Journal of Proteome Research, 6(9), 3704–3711.
- Ferreira, D. F., & de Farias, L. B. (2012). The Kruskal–Wallis test for statistical analysis of experiments. Comunicata Scientiae, 3(2), 169–178.
- Ashburner, M., Ball, C. A., Blake, J. A., Botstein, D., Butler, H., Cherry, J. M., Davis, A. P., Dolinski, K., Dwight, S. S., Eppig, J. T., Harris, M. A., Hill, D. P., Issel-Tarver, L., Kasarskis, A., Lewis, S., Matese, J. C., Richardson, J. E., Ringwald, M., Sherlock, G., & White, R. (2000). Gene ontology: Tool for the unification of biology. Nature Genetics, 25(1), 25–29.
- Siegel, S., & Castellan, N. J. (1988). Nonparametric statistics for the behavioral sciences (2nd ed.). McGraw-Hill.
- Tyanova, S., Temu, T., Sinitcyn, P., Carlson, A., Hein, M. Y., Gehre, T., ... & Mann, M. (2016). The Perseus computational platform for comprehensive analysis of (prote)omics data. Nature Methods, 13(9), 731–740.
- Aebersold, R., & Mann, M. (2016). Mass-spectrometric exploration of proteome structure and function. Nature, 537(7620), 347–355. PMID: 27629641.





| Experimental Design | Most Commonly Used Test | More Appropriate Test |
|---|---|---|
| Simple Comparison (A vs. B) | Student’s t-test [23] | limma (moderated t-test) [25], DEqMS [26], Bayesian models [27] |
| Multiple Conditions | One-way ANOVA [28] | limma [25], DEqMS [26], Bayesian models [27] |
| Time Series Experiments | ANOVA / Linear Regression [28] | Linear mixed-effects models (MSstats) [28], limma [25], DEqMS [26], Bayesian [27] |
| Multifactorial (e.g., treatment × time) | Factorial ANOVA [28] | Mixed-effects models (MSstats) [28], limma [25], DEqMS [26], Bayesian [27] |
| Controlled Reference Mixtures | ANOVA / t-test [28] | limma [25], DEqMS [26], Bayesian [27] |
| Spectral Count Data | QSpec [29] | QSpec [29], hierarchical Bayesian count models [27] |
| Extended Time Series (>4 points) | Regression / Clustering [28] | Linear mixed-effects models (MSstats) [28], Bayesian time series [27] |
| Low Replication Designs | t-test / PLGEM-STN [23] | PLGEM-STN [23], limma [25], DEqMS [26], Bayesian [27] |

| Work | Median CV (Cond1) (%) | Median CV (Cond2) (%) | SD of \|Log2FC\| (Cond1 vs. Cond2) | Levene (p value) |
|---|---|---|---|---|
| 1 | 17.02 | 36.85 | 1.06 | 0 |
| 2 | 39.18 | 46.63 | 3.94 | 0.9313 |
| 3 | 17.16 | 12.92 | 2.73 | 0.1317 |
| 4 | 25.27 | 62.71 | 3.59 | 0.0117 |
| 5 | 74.69 | 62.2 | 3.9 | 0.001 |
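
The variability metrics in the table above (median CV per condition, SD of the absolute log2 fold change, and Levene's test) can be illustrated with a minimal Python sketch. This is not the authors' workflow, and the function names and toy intensities are hypothetical. It assumes that CVs are computed on linear-scale replicate intensities and that Levene's test is applied to the per-protein CVs of the two conditions using median centring (the Brown–Forsythe variant); the source does not state these details. Only the W statistic is computed here, since the p value requires an F distribution with (k − 1, N − k) degrees of freedom, available for example through `scipy.stats.levene`.

```python
import math
import statistics


def protein_cvs(proteins):
    """Per-protein coefficient of variation (%) from replicate intensities.

    `proteins` maps a protein ID to a list of linear-scale intensities.
    """
    return [
        100.0 * statistics.stdev(values) / statistics.mean(values)
        for values in proteins.values()
    ]


def median_cv(proteins):
    """Median of the per-protein CVs (%) for one condition."""
    return statistics.median(protein_cvs(proteins))


def abs_log2_fold_changes(cond1, cond2):
    """|log2(mean cond2 / mean cond1)| for proteins quantified in both conditions."""
    shared = sorted(cond1.keys() & cond2.keys())
    return [
        abs(math.log2(statistics.mean(cond2[p]) / statistics.mean(cond1[p])))
        for p in shared
    ]


def levene_statistic(*groups):
    """Levene W statistic with median centring (Brown-Forsythe variant)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviations of each observation from its own group's median.
    z = [[abs(x - statistics.median(g)) for x in g] for g in groups]
    z_means = [statistics.mean(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (m - z_grand) ** 2 for zi, m in zip(z, z_means))
    within = sum((x - m) ** 2 for zi, m in zip(z, z_means) for x in zi)
    return (n_total - k) / (k - 1) * between / within


# Hypothetical toy data: three proteins, three replicates per condition.
cond1 = {"P1": [100, 110, 90], "P2": [200, 240, 160], "P3": [50, 65, 35]}
cond2 = {"P1": [200, 220, 180], "P2": [100, 140, 60], "P3": [50, 85, 15]}

print("Median CV (Cond1) %:", round(median_cv(cond1), 2))
print("Median CV (Cond2) %:", round(median_cv(cond2), 2))
print("SD of |log2FC|:", round(statistics.stdev(abs_log2_fold_changes(cond1, cond2)), 2))
print("Levene W:", round(levene_statistic(protein_cvs(cond1), protein_cvs(cond2)), 2))
```

A small Levene p value, as reported for some works in the table, would indicate that the spread of the values differs between the two conditions, which is one reason tests assuming equal variances may be a poor fit for those datasets.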
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).