Preprint
Article

This version is not peer-reviewed.

Raman Spectroscopy of Protein-Polysaccharide Conjugates: A Comparative Study of Tree-Based Ensemble Models

Submitted: 29 December 2025

Posted: 30 December 2025

Abstract
Proteins with additives, especially at low additive concentrations, are of great research interest. Machine learning applied to Raman spectroscopy data can provide insight into the chemical structure of such mixtures or conjugates. Although a decision tree model can be powerful for both classification and regression tasks and gives easily interpretable predictions, it is prone to overfitting. Ensemble models that combine several decision trees can mitigate this problem. Five model types are discussed: RandomForest, GradientBoosting, AdaBoost, Voting, and Stacking. Raman spectra of whey protein isolate (5 wt. %) with different amounts of hyaluronic acid (0, 0.1, 0.25, and 0.5 wt. %) were used as datasets. Optimization established that ensembles of 200 decision trees with a maximum depth of four were optimal. AdaBoostClassifier was found to be the most effective at distinguishing whey protein isolate from its conjugates with hyaluronic acid, with 99.5% accuracy, 100% sensitivity, and 98.0% specificity. Stacking RandomForest, GradientBoosting, and AdaBoost regressors with a RidgeCV final estimator was the most effective approach in the regression task (R² = 0.963). According to the feature importance plots, the Raman bands most influential in the predictions were 1003 cm⁻¹ (phenylalanine, ring breathing), 1206 cm⁻¹ (C–C stretching), 1240 cm⁻¹ (amide III (β-sheet), N–H in-plane bend, C–N stretch), and 1399 cm⁻¹ (aspartic and glutamic acids, C=O stretch of COO⁻).
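The stacked regression setup named in the abstract can be sketched with scikit-learn, whose class names the abstract appears to use. This is a minimal illustration under stated assumptions, not the authors' pipeline: only the ensemble size (200 trees), the maximum depth (4), the three base regressors, the RidgeCV final estimator, and the hyaluronic acid levels come from the abstract. The helper names `make_synthetic_spectra` and `build_stacking_regressor`, the synthetic single-band spectra, the noise level, and all remaining hyperparameters (left at library defaults) are hypothetical.

```python
import numpy as np
from sklearn.ensemble import (
    AdaBoostRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Hyaluronic acid levels (wt. %) listed in the abstract.
HA_LEVELS = (0.0, 0.1, 0.25, 0.5)


def make_synthetic_spectra(n_per_level=20, seed=0):
    """Return toy spectra X and concentrations y (NOT real Raman data).

    Assumption for illustration only: one Gaussian band near 1003 cm^-1
    whose height grows linearly with concentration, plus Gaussian noise.
    """
    rng = np.random.default_rng(seed)
    wavenumbers = np.linspace(900.0, 1500.0, 40)
    band = np.exp(-0.5 * ((wavenumbers - 1003.0) / 15.0) ** 2)
    y = np.repeat(HA_LEVELS, n_per_level)
    X = (1.0 + 2.0 * y)[:, None] * band[None, :]
    X = X + rng.normal(scale=0.02, size=X.shape)
    order = rng.permutation(y.size)  # shuffle so CV folds mix all levels
    return X[order], y[order]


def build_stacking_regressor(n_trees=200, max_depth=4, seed=0):
    """Stack RandomForest, GradientBoosting and AdaBoost under RidgeCV."""
    base_models = [
        ("rf", RandomForestRegressor(
            n_estimators=n_trees, max_depth=max_depth, random_state=seed)),
        ("gb", GradientBoostingRegressor(
            n_estimators=n_trees, max_depth=max_depth, random_state=seed)),
        # AdaBoost takes its tree depth from the base estimator it wraps.
        ("ada", AdaBoostRegressor(
            DecisionTreeRegressor(max_depth=max_depth),
            n_estimators=n_trees, random_state=seed)),
    ]
    # RidgeCV is fitted on the cross-validated predictions of the base models.
    return StackingRegressor(estimators=base_models, final_estimator=RidgeCV())


if __name__ == "__main__":
    X, y = make_synthetic_spectra()
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0)
    model = build_stacking_regressor()
    model.fit(X_train, y_train)
    r2 = model.score(X_test, y_test)
    print(f"R^2 on held-out synthetic spectra: {r2:.3f}")
```

Any score this sketch prints describes the toy data only and says nothing about the reported R² of 0.963. Real spectra would normally need preprocessing, such as baseline correction and normalization, which the abstract does not describe.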
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and the preprint are cited in any reuse.
Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.


© 2025 MDPI (Basel, Switzerland) unless otherwise stated