Submitted: 07 May 2025
Posted: 08 May 2025
Abstract
Keywords:
1. Introduction
- How accurately can ML models predict distinct biomass regimes, and which factors most strongly drive these predictions?
- What is the relative influence of organic and mineral nitrogen inputs on biomass outcomes?
- How do nitrogen inputs compare with mowing frequency and other management practices in determining biomass yield?
2. Related Work
3. Methodology
The Biodiversity Exploratories
3.1. The Dataset
Biomass sampling
Management variables
Data Preprocessing and Cleaning
- Binary Classification: The mean value is used to distinguish between “low” and “high” biomass.
- Three-Class Classification: The 33rd and 66th percentiles define thresholds for “low”, “medium”, and “high” biomass.
- Four-Class Classification: The 25th, 50th, and 75th percentiles are used to assign four classes, labeled “very_low”, “low”, “high”, and “very_high” biomass.
- Five-Class Classification: Quintiles (20%, 40%, 60%, 80%) are computed to assign labels “very_low”, “low”, “medium”, “high”, and “very_high” biomass.
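The four labeling schemes above can be sketched in a few lines of pandas. This is an illustrative reconstruction, not the authors' code: the binary split uses the mean as described, the other schemes use equal-frequency quantiles via `pd.qcut`, and the label strings follow the class names used in the results tables.

```python
import numpy as np
import pandas as pd

def label_biomass(biomass: pd.Series, n_classes: int) -> pd.Series:
    """Assign quantile-based biomass class labels.

    The binary scheme splits at the mean value; the three-, four-, and
    five-class schemes split at equally spaced quantiles (33rd/66th
    percentiles, quartiles, and quintiles, respectively).
    """
    if n_classes == 2:
        # Binary split at the mean, not the median.
        return pd.Series(
            np.where(biomass >= biomass.mean(), "high", "low"),
            index=biomass.index,
        )
    labels = {
        3: ["low", "medium", "high"],
        4: ["very_low", "low", "high", "very_high"],
        5: ["very_low", "low", "medium", "high", "very_high"],
    }[n_classes]
    # pd.qcut places (approximately) equal numbers of samples per class.
    return pd.qcut(biomass, q=n_classes, labels=labels).astype(str)

# Synthetic stand-in for the biomass measurements.
rng = np.random.default_rng(0)
biomass = pd.Series(rng.gamma(2.0, 150.0, size=500))
three = label_biomass(biomass, 3)
print(three.value_counts())  # each class holds roughly a third of the samples
```

Because the binary scheme splits at the mean rather than the median, its two classes are generally not balanced for skewed biomass distributions, which is one motivation for the later oversampling step.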
Fertilization Preprocessing
- Step 1: Remove non-positive entries. For each fertilizer variable, non-positive values (including special codes such as -1) were set to missing.
- Step 2: Distinguish fertilized from non-fertilized samples. If the categorical field Fertilization was “no,” every fertilizer variable for that sample was left as missing. This prevents artificial zeros from compressing distribution ranges.
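A minimal sketch of the two steps, assuming hypothetical pandas column names taken from the variable table:

```python
import numpy as np
import pandas as pd

# Illustrative subset of the fertilizer variables (names follow the variable table).
FERT_COLS = ["minNitrogen_kgNha", "orgNitrogen_kgNha", "totalNitrogen_kgNha"]

def clean_fertilization(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the two fertilization preprocessing steps."""
    df = df.copy()
    # Step 1: non-positive entries (including sentinel codes such as -1)
    # become missing values.
    for col in FERT_COLS:
        df.loc[df[col] <= 0, col] = np.nan
    # Step 2: samples marked as non-fertilized keep missing values rather
    # than zeros, so artificial zeros do not compress the distribution
    # ranges of the fertilized samples.
    df.loc[df["Fertilization"] == "no", FERT_COLS] = np.nan
    return df

raw = pd.DataFrame({
    "Fertilization": ["yes", "no", "yes"],
    "minNitrogen_kgNha": [80.0, 0.0, -1.0],
    "orgNitrogen_kgNha": [20.0, 0.0, 30.0],
    "totalNitrogen_kgNha": [100.0, 0.0, 30.0],
})
cleaned = clean_fertilization(raw)
print(cleaned)
```

In this toy example the second sample (non-fertilized) ends up entirely missing, and only the sentinel `-1` is dropped from the third.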
3.2. Machine Learning
Feature Importance Analysis
Train/Test Split and Scaling
Data Augmentation with ADASYN
Hyperparameter Tuning
Model Evaluation and Visualization
Overview of the Experimental Setup
- Data Loading and Cleaning: Load the raw dataset, remove irrelevant columns, and handle missing values.
- Feature Engineering: Separate features into numeric and categorical sets, standardize numeric variables, and encode categorical variables.
- Biomass Categorization: Compute quantile-based thresholds and assign biomass observations to binary, three, four, or five classes.
- Data Splitting and Scaling: Split the data into training and validation sets; fit a StandardScaler on the training set and apply it to both sets.
- Oversampling: Use ADASYN to generate synthetic samples for minority classes and balance the training set.
- Hyperparameter Optimization and Training: Perform Bayesian Optimization with BayesSearchCV to tune CatBoost hyperparameters on the ADASYN training data, then train the final model.
- Validation and Analysis: Evaluate each model’s accuracy and other metrics on the original training set, the ADASYN-augmented training set, and the validation set. Generate confusion matrices, classification reports, distribution plots for nitrogen-related features, and extract feature importance scores.
4. Results and Discussion
4.1. Preliminary Evaluation
4.2. Results of the Binary Classification
4.3. Results of the Three-Quantile Classification
4.4. Results of the Four-Quantile Classification
4.5. Results of the Five-Quintile Classification
4.6. Discussion
5. Conclusions
Research Questions Answered
- Prediction accuracy and key drivers: Machine learning models can predict broad biomass regimes (binary split) with good accuracy (77 %) and clearly identify key drivers, but their reliability declines with finer class granularity.
- Influence of nitrogen inputs: Mineral nitrogen input consistently ranks among the top predictors; while organic nitrogen’s role cannot be fully assessed due to data limitations, total nitrogen remains an important feature.
- Comparison of management factors: Mowing frequency and mineral nitrogen inputs outweigh other management practices in determining biomass yield, with secondary factors (e.g. drainage, maintenance) providing additional, but smaller, predictive value.
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| ADASYN | Adaptive Synthetic Sampling Approach |
| CAT | CatBoost |
| BO | Bayesian Optimization |
| ML | Machine Learning |
References
| Variable | Description | Unit | Scale |
|---|---|---|---|
| Mowing | Cuts per year | y⁻¹ | Qty. |
| minNitrogen_kgNha | Mineral-N input | kg ha⁻¹ | Ratio |
| orgNitrogen_kgNha | Organic-N input | kg ha⁻¹ | Ratio |
| MowingConditioner | Conditioner on mowing machine | — | Bool |
| NbFertilization | Fertilizer applications per year | — | Qty. |
| totalNitrogen_kgNha | Total N (min. + org.) | kg ha⁻¹ | Ratio |
| Drainage | Drainage applied (yes/no) | — | Spec. |
| Leveling | Levelling/matting break-up | — | Qty. |
| Manure_tha | Solid manure applied | t ha⁻¹ | Ratio |
| Slurry_m3ha | Slurry applied | m³ ha⁻¹ | Ratio |
| Maintenance | Other maintenance present | — | Bool |
| MowingMachine | Machine type (rotary, knife, mulcher) | — | Spec. |
| minPhosphorus_kgPha | Mineral P input | kg ha⁻¹ | Ratio |
| Fertilization | Any fertilizer added | — | Bool |
| WaterLogging | Water-storage action | — | Spec. |
| Sulphur_kgSha | Sulphur input | kg ha⁻¹ | Ratio |
| minPotassium_kgKha | Mineral K input | kg ha⁻¹ | Ratio |
Test Data

| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| high | 0.72 | 0.69 | 0.71 | 49 |
| low | 0.79 | 0.82 | 0.81 | 71 |
| Accuracy | | | 0.77 | 120 |
| Macro Avg | 0.76 | 0.76 | 0.76 | 120 |
| Weighted Avg | 0.77 | 0.77 | 0.77 | 120 |

Synthetic Training Data (ADASYN)

| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| high | 0.84 | 0.64 | 0.73 | 1530 |
| low | 0.73 | 0.89 | 0.80 | 1650 |
| Accuracy | | | 0.77 | 3180 |
| Macro Avg | 0.78 | 0.76 | 0.76 | 3180 |
| Weighted Avg | 0.78 | 0.77 | 0.76 | 3180 |

Original Training Data (No ADASYN)

| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| high | 0.83 | 0.66 | 0.73 | 546 |
| low | 0.71 | 0.86 | 0.78 | 525 |
| Accuracy | | | 0.76 | 1071 |
| Macro Avg | 0.77 | 0.76 | 0.76 | 1071 |
| Weighted Avg | 0.77 | 0.76 | 0.75 | 1071 |
Test Data

| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| high | 0.66 | 0.64 | 0.65 | 39 |
| medium | 0.56 | 0.23 | 0.32 | 40 |
| low | 0.55 | 0.88 | 0.67 | 41 |
| Accuracy | | | 0.58 | 120 |
| Macro Avg | 0.59 | 0.58 | 0.55 | 120 |
| Weighted Avg | 0.59 | 0.58 | 0.55 | 120 |

Synthetic Training Data (ADASYN)

| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| high | 0.79 | 0.72 | 0.75 | 743 |
| medium | 0.79 | 0.53 | 0.63 | 710 |
| low | 0.63 | 0.88 | 0.74 | 776 |
| Accuracy | | | 0.71 | 2229 |
| Macro Avg | 0.74 | 0.71 | 0.71 | 2229 |
| Weighted Avg | 0.73 | 0.71 | 0.71 | 2229 |

Original Training Data (No ADASYN)

| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| high | 0.75 | 0.70 | 0.73 | 366 |
| medium | 0.74 | 0.48 | 0.58 | 353 |
| low | 0.60 | 0.85 | 0.70 | 352 |
| Accuracy | | | 0.68 | 1071 |
| Macro Avg | 0.70 | 0.68 | 0.67 | 1071 |
| Weighted Avg | 0.70 | 0.68 | 0.67 | 1071 |
Test Data

| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| very_high | 0.57 | 0.47 | 0.52 | 34 |
| high | 0.21 | 0.33 | 0.26 | 15 |
| low | 0.42 | 0.28 | 0.34 | 39 |
| very_low | 0.45 | 0.59 | 0.51 | 32 |
| Accuracy | | | 0.42 | 120 |
| Macro Avg | 0.41 | 0.42 | 0.41 | 120 |
| Weighted Avg | 0.45 | 0.42 | 0.43 | 120 |

Synthetic Training Data (ADASYN)

| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| very_high | 0.78 | 0.73 | 0.76 | 1274 |
| high | 0.76 | 0.54 | 0.63 | 1218 |
| low | 0.68 | 0.56 | 0.62 | 1208 |
| very_low | 0.55 | 0.84 | 0.67 | 1225 |
| Accuracy | | | 0.67 | 4925 |
| Macro Avg | 0.69 | 0.67 | 0.67 | 4925 |
| Weighted Avg | 0.69 | 0.67 | 0.67 | 4925 |

Original Training Data (No ADASYN)

| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| very_high | 0.70 | 0.70 | 0.70 | 264 |
| high | 0.68 | 0.49 | 0.57 | 282 |
| low | 0.60 | 0.49 | 0.54 | 259 |
| very_low | 0.53 | 0.78 | 0.63 | 266 |
| Accuracy | | | 0.62 | 1071 |
| Macro Avg | 0.63 | 0.62 | 0.61 | 1071 |
| Weighted Avg | 0.63 | 0.62 | 0.61 | 1071 |
Test Data

| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| very_high | 0.59 | 0.36 | 0.44 | 28 |
| high | 0.15 | 0.18 | 0.16 | 17 |
| medium | 0.24 | 0.21 | 0.22 | 24 |
| low | 0.24 | 0.19 | 0.21 | 26 |
| very_low | 0.39 | 0.64 | 0.48 | 25 |
| Accuracy | | | 0.33 | 120 |
| Macro Avg | 0.32 | 0.31 | 0.31 | 120 |
| Weighted Avg | 0.34 | 0.33 | 0.32 | 120 |

Synthetic Training Data (ADASYN)

| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| very_high | 0.77 | 0.70 | 0.73 | 1548 |
| high | 0.71 | 0.54 | 0.61 | 1552 |
| medium | 0.69 | 0.61 | 0.65 | 1558 |
| low | 0.58 | 0.53 | 0.56 | 1577 |
| very_low | 0.48 | 0.75 | 0.58 | 1477 |
| Accuracy | | | 0.62 | 7712 |
| Macro Avg | 0.65 | 0.63 | 0.63 | 7712 |
| Weighted Avg | 0.65 | 0.62 | 0.63 | 7712 |

Original Training Data (No ADASYN)

| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| very_high | 0.69 | 0.62 | 0.66 | 210 |
| high | 0.59 | 0.45 | 0.51 | 221 |
| medium | 0.60 | 0.54 | 0.57 | 214 |
| low | 0.52 | 0.49 | 0.50 | 212 |
| very_low | 0.47 | 0.71 | 0.57 | 214 |
| Accuracy | | | 0.56 | 1071 |
| Macro Avg | 0.58 | 0.56 | 0.56 | 1071 |
| Weighted Avg | 0.58 | 0.56 | 0.56 | 1071 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).