Preprint Article, Version 1. Preserved in Portico. This version is not peer-reviewed.

Selection of the Most Informative Wavenumbers to Improve Prediction Accuracy of Milk Fatty Acid Profile Based on Milk Mid-Infrared Spectra Data

Version 1: Received: 19 October 2023 / Approved: 20 October 2023 / Online: 20 October 2023 (10:15:41 CEST)

How to cite: Lou, W.; Brito, L.F.; Zhao, X.; Li, J.; Wang, Y. Selection of the Most Informative Wavenumbers to Improve Prediction Accuracy of Milk Fatty Acid Profile Based on Milk Mid-Infrared Spectra Data. Preprints 2023, 2023101324. https://doi.org/10.20944/preprints202310.1324.v1

Abstract

Milk mid-infrared (MIR) spectra have been shown to provide valuable information on a wide range of traits that can be used in dairy cattle breeding programs. Selecting the most informative variables from such complex data can improve prediction accuracy and model robustness, as well as the interpretability of MIR spectra. Thus, we aimed to investigate the prediction performance of feature selection methods applied to MIR spectra data, using the milk fatty acid (FA) profile as an example to illustrate the evaluated procedure. MIR spectra, milk test-day records, and reference FA concentrations from 155 first-parity Holstein cows were used in the analyses. Four models comprising different sets of explanatory variables and five feature selection methods were evaluated. The results indicated that the Competitive Adaptive Reweighted Sampling (CARS) method can effectively select the most informative variables from the MIR spectra, resulting in higher prediction accuracies than the other variable selection approaches. The model including the selected MIR spectra and cow information variables [days in milk at the test day, age at the test day, pregnancy stage (in days), number of days open, number of inseminations, and somatic cell count] yielded the best FA profile predictions based on Partial Least Squares (PLS) regression. In particular, ten FAs (C8:0, C10:0, C14:1, C17:0 isomers, C18:1, C18:1 isomer, medium-chain FA, unsaturated FA, monounsaturated FA, and polyunsaturated FA) presented accuracies, measured as the coefficient of determination (R²cv), ranging from 0.66 to 0.85 in internal validation and from 0.65 to 0.84 in external validation. By running CARS 1,000 times in internal validation, we obtained the selection frequency of each milk MIR wavenumber for 35 FAs. The wavenumbers most associated with FAs were found between 1,003 and 1,145 cm⁻¹, with other discrete regions between 1,651 and 1,797 cm⁻¹ and between 2,834 and 2,954 cm⁻¹. These biomarkers may provide insights into the relationship between MIR spectra and FA phenotypes.
In conclusion, using CARS and cow information improved predictions of FAs based on MIR spectra in Chinese Holstein dairy cows. Additional validation studies should be conducted as larger datasets become available.

Keywords

feature selection; milk mid-infrared spectra; fatty acid concentration; regression

Subject

Biology and Life Sciences, Animal Science, Veterinary Science and Zoology
