Preprint
Review

This version is not peer-reviewed.

Integrating Envirotyping and Phenomics for AI-Enabled Multi-Environment Genomic Prediction in Crop Breeding

Submitted: 13 April 2026
Posted: 14 April 2026


Abstract
Genomic prediction is now routine in crop improvement, but its main bottleneck has shifted from marker density to environmental complexity. Breeders rarely need predictions for one fixed environment; they need to rank genotypes across target populations of environments that differ in weather, soils, management, and stress timing. This makes genotype-by-environment interaction a primary breeding problem rather than a secondary statistical nuisance. This review examines how genomic, environmental, and phenomic information can be integrated to improve multi-environment prediction in crop breeding pipelines. The review is narrative rather than PRISMA-style, but the literature search and selection logic were structured and explicitly defined. Peer-reviewed English-language studies were identified through structured searches of Web of Science Core Collection and Scopus, supplemented by backward citation screening, with emphasis on literature published from January 2023 to March 2026. Four conclusions emerge. First, environmental information is most useful when it is developmentally aligned, biologically interpretable, and matched to the target population of environments. Second, strong structured statistical baselines remain highly competitive, especially in moderate-sized or highly unbalanced datasets, whereas gains from more flexible machine-learning and deep-learning approaches are most evident in large, sparse, heterogeneous, and multimodal settings. Third, phenomic markers often improve prediction for complex traits, especially yield, because they capture realized crop responses not fully represented by markers alone. Fourth, practical value depends less on isolated gains in predictive accuracy than on evaluation under realistic deployment scenarios, including untested genotype and untested environment settings. 
Progress therefore requires transparent reporting, benchmark design, stage-aware envirotyping, multimodal integration, uncertainty reporting, and cost-aware deployment.

1. Introduction

Artificial intelligence has become a dominant label in discussions of digital crop improvement, yet much of the current methodological difficulty in breeding is not caused by a shortage of algorithms as such. Marker-assisted selection, genome-wide genotyping, crop modeling, and remote sensing have all advanced quickly, and recent reviews now cover AI methods in crop research, broad genomic selection pipelines, drone phenotyping platforms, and phenomics more generally [1,2,3,4,5,6,7,8,9,10,11]. However, breeding programs rarely fail because they cannot fit a predictor to one dataset. They fail because rankings of genotypes change across years, locations, management systems, and stress patterns. In practical terms, the core problem is not only predicting phenotype from genotype, but predicting phenotype from genotype under incomplete, noisy, and shifting environmental representation [12,13,14,15,16,17,18,19,20,21,22,23,24,25].
This distinction matters because many breeding targets are deeply contingent on environmental context. Grain yield, flowering time stability, grain moisture, canopy development, adaptation to drought or heat windows, and harvestable quality traits are all influenced by interactions among genotype, development, weather trajectories, soils, and management [26,27,28,29,30,31,32,33,34,35,36]. When breeding decisions are made for a target population of environments rather than a single testing site, genotype-by-environment interaction is not background noise. It becomes the object that prediction models must represent [37].
The recent literature has clarified two recurring misunderstandings. The first is that adding more environmental data automatically improves genomic prediction. In practice, raw weather tables, poorly chosen environmental covariates, or site labels can add variance without improving transportable inference unless they are aligned with crop development and with the intended deployment scenario [37,38,39]. The second is that deep learning should generally outperform structured statistical baselines. The evidence does not support such a universal claim. Mixed models, factor-analytic formulations, and reaction-norm approaches remain difficult to beat in moderate datasets or in settings with severe missingness, whereas machine learning and deep learning become more compelling when data are larger, more heterogeneous, more multimodal, and evaluated in harder extrapolation settings [37,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58].
These trends make the present review timely. Recent reviews have already described AI in plant breeding in broad terms, summarized crop phenotyping technologies, or revisited genomic selection as a whole [1,2,3,4,5,6,7,8,9,10,11]. The contribution needed now is more specific. Between 2023 and 2026, the literature has shifted toward explicit integration of envirotyping, enviromics, phenomics, and multi-environment genomic prediction for operational breeding decisions. This period includes the expansion of benchmark-scale trial resources, the rise of stage-aware environmental covariates, increased use of multimodal learning, and more frequent testing of models under breeder-relevant validation scenarios such as new genotype or new environment prediction [59,60].
The central argument of this review is therefore narrow and practical. The next gains in genomic prediction for crop breeding are unlikely to come from marker-only modeling alone. They are more likely to come from better environmental representation, better use of field phenomics, and stricter alignment between model evaluation and the breeding decisions a program needs to make.
At the same time, the current evidence base is spreading into a wider set of crop-specific and platform-oriented discussions, including digitalized breeding frameworks, wheat-focused AI reviews, climate-smart cereal pipelines, and early foundation-model or platform-based concepts [61,62,63,64,65,66,67]. That broader diffusion is useful, but it also means that readers need clearer distinctions between conceptual enthusiasm, crop-specific proof of concept, and studies that evaluate breeder-relevant prediction under multi-environment uncertainty.

2. Scope, Search Strategy, and Positioning of This Review

2.1. Literature Search and Selection Strategy

This article is a narrative review, not a PRISMA-style systematic review, but the literature selection process was structured rather than ad hoc. Literature identification was performed on 27-28 March 2026 using Web of Science Core Collection and Scopus as the primary bibliographic databases, followed by structured backward citation screening from relevant recent reviews and primary studies. These databases were selected because they are widely used in review-based evidence synthesis and provide broad coverage of peer-reviewed plant breeding, crop science, genetics, phenotyping, and data-integration literature. Searches were conducted using combinations of title, abstract, and keyword terms, and journal-page checks and citation tracing were used selectively for scope confirmation and metadata verification.
The search period was defined a priori as January 2023 to March 2026 to capture the most recent methodological phase of AI-enabled multi-environment prediction. This range was chosen because it corresponds to a visible transition from broad conceptual discussion toward benchmark-oriented, data-integrative studies that explicitly combine genomic, environmental, and phenomic information. Representative search strings included combinations of the following terms: "multi-environment genomic prediction crop breeding", "genotype environment genomic prediction crop", "envirotyping genomic prediction crop breeding", "enviromics genomic prediction plant breeding", "phenomic prediction crop breeding", "high-throughput phenotyping genomic prediction crop", "deep learning genomic prediction crop breeding", "machine learning genomic selection crop breeding", "environmental covariates genomic prediction crop", and "phenomics-assisted genomic prediction crop". Refinement searches were then added for narrower corridors such as satellite enviromics, UAS-enabled phenotyping, field phenotyping plus genomic prediction, and multi-trait multi-environment prediction. The complete search strings are provided in Supplementary Table S1.
The inclusion logic was designed to match the review question rather than to maximize article count. Studies were retained when they were peer-reviewed, written in English, published within the defined period, and directly relevant to at least one of the following: multi-environment genomic prediction, genotype-by-environment modeling, environmental covariate engineering, envirotyping or enviromics for breeding, phenomics-assisted prediction, multimodal genomic prediction, or breeder-relevant deployment frameworks. Studies were excluded when their focus was unrelated to breeding prediction, for example generic smart farming, crop disease image classification, Internet-of-Things applications, purely molecular stress biology without a predictive breeding component, or non-peer-reviewed preprints used as stand-alone evidence.
Reviews and primary studies were handled differently. Recent reviews were retained mainly to define prior scope, terminology, and outstanding gaps, whereas primary studies were prioritized when the manuscript made comparative claims about model classes, validation scenarios, gains over baseline methods, or operational deployment. When several papers addressed similar questions, priority was given to studies that reported at least some combination of crop, trait, study scale, validation design, or explicit comparison against a baseline model. The structured database searches and crop-specific refinement searches were deliberately narrow and yielded a deduplicated working pool that was taken forward to full-text assessment. Backward citation screening was iterative rather than indefinite and was stopped when newly identified studies no longer changed crop coverage, modality coverage, or methodological conclusions. Supplementary Figure S1 summarizes this workflow, and Supplementary Table S2 lists the included studies and how they were used in the manuscript. This logic does not make the review exhaustive in the systematic-review sense or turn it into a PRISMA-style selection funnel, but it does make the selection process inspectable and fit for a focused methodological synthesis.

2.2. Terminology Used in This Review

Several terms are used inconsistently across the recent literature, so they are standardized here. Genotype-by-environment interaction (G×E) refers to differential genotype performance across environments. Target population of environments (TPE) refers to the environmental domain for which breeding decisions are intended [22]. In this review, envirotyping is used for biologically informed characterization of the environments experienced by a crop, typically using weather, soil, developmental, or management-derived descriptors [19,32,33]. Enviromics is used more broadly for large-scale environmental data integration for prediction, recommendation, or environmental similarity analysis, often including geospatial and remote-sensing layers [33,68,69,70,71]. Phenomics refers to high-dimensional plant-response measurements, especially repeated field-based measurements obtained through proximal or remote sensing [5,9,59,60,72,73,74,75]. In this review, the term phenomic markers refers to high-dimensional plant-response measurements used as predictive features, regardless of whether they originate from proximal sensing, UAS platforms, or derived temporal summaries. Multimodal learning refers to models that integrate more than one data layer, such as markers plus environmental covariates, or markers plus environmental plus phenomic information [41,44,56,76,77,78,79].
Two additional terms require practical clarification. Sparse testing refers to breeding strategies in which not all genotypes are tested in all environments, with prediction used to recover missing genotype-by-environment combinations [80,81]. Deployment scenario refers to the prediction context that the breeder faces, such as untested genotype in tested environment, tested genotype in untested environment, untested genotype in untested environment, or late-stage recommendation across a region [21,34,37,44,45]. Common abbreviations used repeatedly in the following sections are G×E, genotype-by-environment interaction; TPE, target population of environments; UAS, unmanned aircraft system; UAV, unmanned aerial vehicle; FA, factor analytic; and MTME, multi-trait multi-environment. UAS is used as the preferred umbrella term in the main text, whereas UAV is retained only where it remains conventional in the source literature.

2.3. What Distinguishes This Review from Recent Reviews

Several recent reviews have already addressed parts of this topic, but they do not fully cover the same analytical corridor. Some reviews are broad overviews of AI in crop science or plant breeding [6,7,11]. Others focus on genomic selection at large [2,10], or on phenotyping technologies and sensor systems [5,9]. By contrast, this review concentrates specifically on how genomic, environmental, and phenomic information are being integrated for multi-environment prediction, and how these models should be judged under operational breeding scenarios.
Three aspects define the intended novelty. First, the review is evidence-centered rather than only concept-centered. It contrasts representative studies across crops, data layers, model families, validation designs, and practical deployment stages. Second, it treats the 2023-2026 period as a distinct methodological phase, marked by benchmark expansion, increasing attention to new-environment prediction, more explicit environmental feature engineering, and broader use of multimodal learning. Third, it translates methodological discussion into deployment guidance, including failure modes, validation logic, uncertainty, reporting standards, and stage-specific model choice. In other words, the purpose is not merely to update citations, but to reorganize recent evidence around the practical problem of breeding under environmental uncertainty.
Figure 1. Conceptual workflow showing how genomic, environmental, and phenomic layers feed into model classes and breeding decisions; the arrows indicate that predictive value depends on aligning each data layer with validation design and decision stage [21,26,28,33,34,37,44,59].
Table 1. Neutral scoping comparison between selected recent reviews and the present review.
Period covered | Main scope | Environmental integration covered? | Phenomics integration covered? | Deployment/validation focus? | Distinctive focus relative to the present review
Broad methodological literature to 2022 [1] | Deep learning for crop genomic selection with environmental data | Yes | Indirect | Limited | Broad model survey; less emphasis on 2023-2026 comparative multimodal evidence and deployment framing
Historical genomic-selection literature to 2023 [2] | General genomic selection for crop improvement | Partial | Limited | Limited | Genomic-selection background; less specific emphasis on multi-environment prediction under explicit G×E and TPE logic
Historical drone-phenotyping literature to 2023 [5] | Drone imaging and phenotyping for breeding | Indirect | Yes | Limited | Sensor-platform overview; less emphasis on whether phenomics alters breeder-relevant prediction
Broad AI literature to 2023 [7] | AI methods across crop science | Partial | Partial | Limited | Broad AI coverage; less specific emphasis on multimodal prediction for breeding deployment
Historical field-phenotyping literature to 2024 [9] | Field crop phenotyping methods and trajectories | Indirect | Yes | Limited | Phenomics context; less explicit integration with genomic and environmental prediction
Historical genomic-selection literature to 2024 [10] | Applications and prospects of genomic selection | Partial | Limited | Partial | Breeding background; less emphasis on recent deployment scenarios, baseline choice, and reporting standards
Three scope boundaries should also be stated explicitly. First, this review is about prediction for breeding use, not only biological discovery. Second, it emphasizes field or breeding-program relevance rather than controlled-environment phenotyping alone. Third, it does not treat image-based disease detection, generic smart agriculture, or stand-alone omics reviews as central evidence unless they directly inform multi-environment breeding prediction.

3. Why Multi-Environment Genomic Prediction Has Become a Bottleneck

3.1. Prediction Targets in Breeding Are Deployment Specific

Breeding programs rarely ask a single predictive question. They often need to rank untested genotypes in tested environments, tested genotypes in untested environments, untested genotypes in untested environments, and materials that combine mean performance with environmental stability. The difference among these targets is not cosmetic. A model that performs well in random cross-validation can still fail when the environment itself changes, or when the prediction target is a genuinely future year or untested location [44,45,81].
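These deployment scenarios can be made concrete as data-splitting rules chosen before any model is fitted. The sketch below is a minimal illustration in Python; the table layout and column names (`genotype`, `environment`) are hypothetical placeholders rather than inputs from any cited study, but the two functions correspond to the untested-genotype and untested-environment settings described above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Toy multi-environment trial table: one row per genotype-environment record.
trials = pd.DataFrame({
    "genotype": rng.choice([f"G{i}" for i in range(20)], size=200),
    "environment": rng.choice([f"E{j}" for j in range(8)], size=200),
})

def untested_genotype_split(df, holdout_frac=0.2, seed=0):
    """Withhold whole genotypes: test genotypes never appear in training,
    even though their environments do (new-genotype prediction)."""
    rng = np.random.default_rng(seed)
    genos = df["genotype"].unique()
    n_test = max(1, int(len(genos) * holdout_frac))
    test_genos = set(rng.choice(genos, size=n_test, replace=False))
    mask = df["genotype"].isin(test_genos)
    return df[~mask], df[mask]

def untested_environment_split(df, holdout_env):
    """Withhold an entire environment, mimicking prediction for a
    genuinely new year-location (new-environment prediction)."""
    mask = df["environment"] == holdout_env
    return df[~mask], df[mask]

train, test = untested_genotype_split(trials)
assert not set(test["genotype"]) & set(train["genotype"])  # no leakage

train, test = untested_environment_split(trials, "E0")
assert set(test["environment"]) == {"E0"}
```

Random k-fold cross-validation mixes information across both dimensions and therefore tends to overstate accuracy for either deployment question, which is precisely the failure mode discussed above.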
This is why large multi-environment datasets have become so important. The curated maize Genomes-to-Fields resource linked more than 70,000 phenotypic records to genomic and environmental information across more than 130 year-locations and over 4,000 hybrids [26]. Such resources do not merely improve statistical power. They expose the mismatch between the environmental diversity that breeding programs care about and the subset of environments that are represented in routine trials. Even large trial networks still undersample stress timing, management variation, and environment types within the TPE [21,22].
Sparse testing studies in cassava, sugarcane, and other crops reinforce the same point. Prediction is becoming part of trial design rather than an afterthought applied once all data have been collected [80,81]. Consequently, multi-environment genomic prediction should be viewed as a design-and-deployment problem, not only a model-fitting problem.

3.2. Why Marker-Only Models Can Underperform for Environmentally Contingent Traits

Genomic prediction remains foundational in crop improvement, and recent reviews rightly emphasize its continuing value [82]. However, marker-only models can underperform in deployment scenarios where realized trait expression depends strongly on development-stage exposure to the environment. Grain yield is the clearest example, but flowering time, grain moisture, plasticity, and adaptation-related traits frequently show similar behavior [34,35,36].
The recent evidence is directionally consistent but conditional. In a winter wheat field study with 2,994 lines evaluated across two sites and two years, phenomic markers from multispectral, hyperspectral, and visual field data explained more yield variation than genomic markers alone, and combining the two data layers improved predictive performance over the strongest phenomic-only baseline [28]. In a maize multi-environment study using environmental feature engineering, machine-learning models reported gains over a factor-analytic mixed-model baseline, but those gains were tied to specific validation settings and environmental inputs rather than being universal [34]. In large maize trial networks, environmental covariates improved prediction in new environments when modeled through latent-factor or reaction-norm structures, but the magnitude of benefit depended on the environmental scenario and data sparsity [26,37].
The implication is not that markers are insufficient in general. Rather, markers encode inherited potential, whereas the realized phenotype is jointly determined by genotype, environment, management, and crop development. When the model does not represent environmental exposure well, genotype effects are forced to absorb structured variation they cannot fully explain. This is one reason why marginal gains observed within familiar environments can disappear when prediction is moved to unfamiliar or under-sampled conditions.
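A common way to represent this interplay in practice is the reaction-norm construction used in several of the studies cited above [26,37], in which a genomic relationship kernel and an environmental-covariate kernel are expanded to the record level and combined through an element-wise (Hadamard) product, so that genotype effects are allowed to bend along measured environmental gradients. The NumPy sketch below uses simulated marker and covariate matrices purely for illustration; the dimensions are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)

n_geno, n_env, n_mark, n_cov = 30, 6, 200, 5
M = rng.standard_normal((n_geno, n_mark))   # centered/scaled marker matrix
W = rng.standard_normal((n_env, n_cov))     # scaled environmental covariates

G = M @ M.T / n_mark                        # genomic relationship matrix
E = W @ W.T / n_cov                         # environmental similarity kernel

# One record per genotype-environment combination.
geno_idx = np.repeat(np.arange(n_geno), n_env)
env_idx = np.tile(np.arange(n_env), n_geno)

K_g = G[np.ix_(geno_idx, geno_idx)]         # main genomic effect
K_e = E[np.ix_(env_idx, env_idx)]           # main environmental effect
K_ge = K_g * K_e                            # reaction-norm G×E term
# (the Hadamard product of two PSD kernels is itself PSD, so K_ge is a
#  valid covariance structure for a random interaction effect)
```

In a mixed-model fit, K_g, K_e, and K_ge would serve as covariance structures for three random effects (roughly y = mu + g + e + ge + residual), so the interaction term captures only structured variation left over after the main genomic and environmental effects.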
Smaller crop-specific studies point in the same general direction but also reveal how uneven the evidence base still is. Hybrid grain sorghum, spring barley, cotton, white lupin, coffee, and lentil studies all suggest that multi-environment or multi-source prediction can be useful beyond the dominant maize and wheat systems, but these studies are often based on fewer environments, narrower trait panels, or more local breeding contexts [83,84,85,86,87,88].

3.3. The Target Population of Environments Is Not a Background Concept

Recent breeding theory has sharpened the role of the TPE and shown why it belongs at the center of prediction design [22]. A TPE is not just a list of trial sites. It is a distribution of environment types, frequencies, stress combinations, and management contexts relevant to selection and product placement. This has two practical consequences. First, evaluation datasets must be interpreted relative to what part of the TPE they represent. Second, environmental descriptors should be chosen for their relevance to the TPE, not merely for their availability.
This framing helps explain why prediction in plant breeding is both statistical and operational. A model can only learn from the environmental states it has seen, the descriptors it has been given, and the missing-data pattern built into the field design. Environmental representation is therefore not a secondary preprocessing step. It is a core determinant of what kind of extrapolation is possible.

4. Environmental Representation: Envirotyping, Enviromics, and Crop Context

4.1. What Counts as Useful Environmental Information

The most mature recent studies treat environmental information as more than appended weather tables. Usable environmental information is developmentally meaningful, spatially relevant, and aligned with the intended prediction target [68,69,70,71]. This can include weather summarized by crop stage, soil descriptors, geospatial terrain variables, crop-model outputs, management proxies, or larger-scale remote-sensing products.
That distinction matters because raw environmental abundance does not guarantee useful signal. Daily weather records can still be weak predictors if they are not aligned with phenology or stress windows. Stage-aware covariates, photoperiod descriptors, temperature or radiation summaries linked to crop development, and water-balance proxies are often more informative because they translate exposure into biologically interpretable quantities [89,90]. The question is not whether environmental data should be included, but which environmental representations remain meaningful when moved from one breeding decision context to another.
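As a minimal illustration of the difference between a raw weather table and stage-aware covariates, the sketch below summarizes a daily weather series within phenology-defined windows. The window boundaries, column names (`tmax`, `tmin`, `rain`), and the 32 °C heat threshold are hypothetical placeholders, not values drawn from the cited studies.

```python
import pandas as pd

def stage_aware_covariates(daily, stage_windows, heat_threshold=32.0):
    """Collapse a daily weather series into per-stage covariates.

    daily: DataFrame indexed by day after planting, with columns
           'tmax', 'tmin', 'rain' (assumed names).
    stage_windows: dict mapping stage name -> (start_day, end_day),
                   e.g. from observed or modeled phenology.
    """
    covs = {}
    for stage, (start, end) in stage_windows.items():
        window = daily.loc[start:end]  # inclusive label slice
        covs[f"{stage}_mean_tmax"] = float(window["tmax"].mean())
        covs[f"{stage}_heat_days"] = int((window["tmax"] > heat_threshold).sum())
        covs[f"{stage}_total_rain"] = float(window["rain"].sum())
    return pd.Series(covs)

# A 100-day season with a heat event around a hypothetical flowering window.
daily = pd.DataFrame({"tmax": 25.0, "tmin": 12.0, "rain": 2.0},
                     index=range(1, 101))
daily.loc[60:70, "tmax"] = 35.0
covs = stage_aware_covariates(daily, {"vegetative": (1, 55),
                                      "flowering": (56, 75)})
```

The same season-level weather yields different covariates for genotypes whose modeled stage windows differ, which is exactly the alignment that season-mean summaries discard.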

4.2. Envirotyping and Enviromics Should Not Be Conflated

The recent literature often uses envirotyping and enviromics interchangeably, but the distinction is useful and should be stated early. In this review, envirotyping refers to crop-relevant characterization of the environments that plants experience, often at the field or growth-stage level [19,32]. Enviromics refers to broader environmental data integration across scales, including gridded climate products, geospatial layers, and satellite-derived data used for recommendation, clustering, mapping, or extrapolation [33,68,69,70,71].
This distinction clarifies why satellite-enabled approaches can be promising without being automatically sufficient. Satellite-enabled enviromics expands the spatial reach of environmental profiling and can help map breeding zones or environmental similarity across landscapes [33]. Yet its breeding value still depends on how well landscape-scale descriptors connect to plot-scale crop development, management, and trial design. Put differently, enviromics widens coverage, but envirotyping determines whether that coverage becomes biologically relevant for prediction.

4.3. Feature Engineering Versus Sequence-Based Environmental Encoding

One of the most active recent debates is whether environmental information should be summarized into hand-engineered covariates or learned directly from temporal sequences. Neither option is universally superior.
Feature engineering remains attractive because it is transparent, computationally efficient, and closer to breeder reasoning. In the Genomes-to-Fields maize benchmark, environmental covariates derived using crop-model logic created a more informative basis for G×E analysis than site labels alone [26]. In the maize grain-yield study using machine learning and environmental data, the authors explicitly argued that the feature-engineering stage itself served as a viable envirotyping strategy, and they reported that an additive genetic-plus-environment formulation could match or exceed more explicit multiplicative interaction encodings while also using less memory and time [34].
Sequence-based encoding becomes more attractive when the temporal structure of the environment is itself informative and when datasets are large enough to support more flexible learning. GEFormer exemplifies this direction by combining genomic inputs with daily environmental sequences, dynamic convolution, and attention-based temporal processing [44]. The reported advantage was strongest in hard deployment settings involving untested genotypes and untested environments, which is precisely where breeders most need better extrapolation. Even so, such models depend on richer environmental histories and larger training resources than many breeding programs currently possess.
The recent evidence therefore supports a qualified position. Feature-engineered environmental covariates often remain the more practical choice in moderate-sized programs, whereas sequence-based encoding is most promising when environmental histories are dense, the deployment problem truly demands temporal representation learning, and strong baselines have already been exhausted [43,44]. Sequence-based encoding may be unjustified when environmental records are short, heavily imputed, weakly aligned with crop stages, or unavailable at the actual decision horizon.

4.4. Crop Growth Models and Ecophysiological Mediation

Another constructive trend in the recent literature is the reintegration of crop modeling with genome-enabled prediction [31,38,39,91,92]. This trend is relevant because the phenotype that breeders observe is mediated by development, source-sink relationships, canopy trajectories, stress timing, and recovery dynamics. Purely statistical predictors may absorb some of this structure, but they do not explicitly represent how the crop moved through the season.
Hybrid systems that combine crop or ecophysiological models with genomic prediction can help in at least three ways. First, they can generate development-stage-aligned environmental covariates. Second, they can provide genotype-specific parameters or trait trajectories that are difficult to observe directly in every field setting. Third, they can improve interpretability by linking prediction to crop processes rather than to opaque environmental summaries. However, these benefits are conditional. Process-based descriptors can also propagate model misspecification if phenology, water balance, or management effects are poorly represented. Hybridization should therefore be viewed as a disciplined way to inject biological structure into prediction, not as a shortcut to causality.

4.5. Evidence from Representative Recent Studies

The strongest way to assess current progress is to move beyond general claims and inspect what recent papers reported across crops, trial scales, and deployment settings. Tables 2A and 2B summarize representative primary studies from 2023-2026 that were selected because they reported at least part of the information needed to compare crop, traits, study scale, data layers, model family, validation scenario, baseline comparator, and practical breeding relevance. Table 2A covers multi-environment genomic prediction studies, including both broader multi-environment modeling papers and studies with explicit environmental descriptors. Table 2B covers multimodal studies that incorporate phenomic or related auxiliary data layers.
Table 2A. Representative 2023-2026 studies on multi-environment genomic prediction, including studies with explicit environmental descriptors.
Crop/trait(s) | Study scale | Data layers | Model family | Comparator baseline | Best reported value / gain | Deployment stage
Sesame; 9 agronomic traits | Diversity panel; 2 seasons [93] | Markers + MET field data | GBLUP, Bayes, RKHS, marker×environment | Single-environment models | 15%-58% improvement in predictive ability under multi-environment analyses relative to single-environment models | Early-to-mid stage MET support
Grain sorghum hybrids; hybrid performance | US sorghum production environments [32] | Markers + envirotype typologies | Hierarchical Bayesian reaction norms | Alternative envirotype and relationship structures in the same study | Study-specific qualitative improvement in new-environment prediction; no single pooled number reported | Sparse hybrid MET support
Maize; multi-trial performance | 4,402 varieties; 195 trials; 87.1% missing [37] | Markers + environmental covariates | MegaLMM with environmental regressions on latent factors | Univariate GBLUP | Study-specific qualitative improvement in new-environment prediction under extreme missingness relative to univariate GBLUP; no single pooled number reported | Large-network sparse testing
Field pea; seed protein and seed yield | 300 candidates; 3 contrasting environments [94] | Markers + multi-trait multi-environment phenotypes | MTME genomic prediction | Additive G-BLUP | Study-specific qualitative improvement in whole- and split-environment prediction relative to additive G-BLUP; no single pooled number reported | Preliminary MET support
Maize; grain yield | Large multi-environment trial dataset [34] | Markers + engineered environmental descriptors | Tree-based ML G+E and GEI models | Factor-analytic multiplicative mixed model | Up to 7% improvement in mean prediction accuracy under the authors' study-specific CV settings relative to a factor-analytic multiplicative mixed model | Mid-to-late stage MET prediction
Maize hybrids; grain moisture and grain yield | 2,126 hybrids; 34 environments; 9,355 SNPs [35] | Markers + 19 climatic factors / reduced climate sets | GBLUP-GE variants | GBLUP and reduced-climate GBLUP-GE variants | Prediction accuracy of 0.731 for grain moisture and 0.331 for grain yield under cross-region and 10-fold validation for the full GBLUP-GE19CF model | Regional MET recommendation
Maize, rice, and wheat; agronomic traits | Benchmark-scale multi-crop datasets [44] | Markers + daily environmental sequences | GEFormer with gMLP, dynamic convolution, and attention | 6 statistical and 4 ML comparators | Study-specific qualitative improvement in the hardest genotype/environment withholding settings relative to six statistical and four ML comparators; no single pooled number reported | Hard extrapolation benchmarking
Maize hybrids; plasticity, stability, and genomic prediction | Large multi-environment hybrid dataset [45] | Markers + reduced environmental parameters + trait-associated markers | AutoML framework | Marker-only genome-wide models | 14.02%-28.42% improvement in predictive ability under the authors' study-specific genomic prediction settings relative to marker-only genome-wide models | Climate-adaptive hybrid selection
Table 2B. Representative 2023-2026 studies integrating genomic, phenomic, or other multimodal inputs.

| Crop/trait(s) | Study scale | Data layers | Model family | Comparator baseline | Best reported value / gain | Deployment stage |
| --- | --- | --- | --- | --- | --- | --- |
| Winter wheat; grain yield | Winter wheat breeding dataset [95] | Genomic inputs + UAS-derived phenotypes | Genomic-only, phenotypic-only, and combined models | Genomic-only and phenotypic-only models | Study-specific qualitative improvement for combined genomic-plus-UAS prediction relative to genomic-only and phenotypic-only models; no single pooled number reported | Advanced yield testing |
| Winter wheat; grain yield | 2,994 lines; 2 sites; 2 years [28] | Markers + multispectral, hyperspectral, and visual phenomics | Phenomic-only, genomic-only, and combined models | Genomic-only and best phenomic-only models | Phenomic-only R² about 0.39-0.47, with combined models improving 6%-12% over the best phenomic-only model in cross-location prediction | Advanced yield testing |
| Coffea canephora; yield | Diverse population; 2 locations; 4 harvest seasons [87] | Genomic markers + NIR-based phenomics | Genomic selection vs phenomic selection | Genomic-only and phenomic-only predictors | Study-specific qualitative competitive performance of NIR-based phenomic predictors relative to genomic-only predictors in within- and across-location prediction; no single pooled number reported | Perennial selection support |
| Eucalyptus; multiple agronomic traits | Tree breeding populations adapted to arid environments [78] | SNP markers + spectral phenomics | MLP, CNN, and Bayesian models | Bayesian alphabet models | Prediction accuracy of 0.13-0.80 for MLP and 0.16-0.82 for CNN, relative to 0.08-0.66 for Bayesian models across traits | Tree breeding trait support |
| Winter wheat; grain yield | 4,094 genotypes; 11,593 plots; 2019-2022 [59] | Markers + UAS spectral reflectance indices | Univariate and multivariate genomic prediction | Base genomic prediction control | At least 16% higher prediction accuracy than the genomic control when test-year NDVI was available under leave-one-year-out validation; cross-year reliability remained limited | Late-stage seasonal decision support |
| Sesame; longitudinal traits and yield | Diversity panel over growing seasons [75] | Markers + temporal high-throughput phenotyping | Random regression, longitudinal GP, multi-trait GP | Single-trait longitudinal analysis | Study-specific qualitative improvement in future-phenotype forecasting and multi-trait prediction relative to single-trait longitudinal analysis; no single pooled number reported | Early repeated-phenotyping selection |
Performance metrics are study-specific and are reported as stated in the original articles; they are not directly comparable across studies. When the source paper did not provide a single directly transferable number, the table states explicitly that only a study-specific qualitative improvement is summarized here. Deployment-stage labels are interpretive summaries assigned in the present review to support cross-study comparison and are not necessarily the exact terminology used in the source articles.
The tables are intended to support structured comparison, not quantitative pooling. Several patterns emerge from these studies, but they should be phrased carefully. Multi-environment or envirotype-assisted models often outperform single-environment or genotype-only baselines in sesame, sorghum, maize, field pea, and winter wheat, but the size of improvement depends on trait, environmental sampling, baseline choice, and validation design [28,32,34,35,37,44,45,59,75,93,94]. Likewise, phenomic layers often improve prediction for yield-related traits, but their operational value depends strongly on when the phenomic data become available and whether the prediction target concerns future seasons, current-season within-program decisions, or regional deployment [28,59,60,75,87,95,96]. These patterns should be interpreted as recurring tendencies rather than directly pooled quantitative effects because study design, metric definitions, and validation scenarios differ substantially.
Additional studies in spring barley, cotton, coffee, lentil, and white lupin suggest that the same logic extends beyond the best-known benchmark crops, but they also show that transferability of conclusions across crop types should be treated cautiously [84,85,86,87,88]. The field therefore needs more crop-diverse benchmarking rather than simply more algorithm classes.

4.6. Environmental Extrapolation Remains Conditional

The main test of environmental modeling is not within-sample fit but extrapolation. Can the model help in genuinely new environments? Recent evidence suggests yes, but not under all conditions.
The grain sorghum envirotyping study is particularly instructive because it showed that prediction gains for new environments depended on the mega-environment itself; improvements were clearer in temperate environments than in subtropical ones [32]. The 2025 maize GBLUP-GE study also showed that increasing the number of target-environment observations in the training set improved prediction for grain moisture and yield, which implies that environmental coverage still matters even when climate covariates are available [35]. MegaLMM likewise improved new-environment prediction in a very large maize dataset, but it did so in a setting with 195 trials and 87.1% missing phenotypes, illustrating that success depended on both scale and model structure [37].
These examples support a cautious conclusion. Environmental descriptors help most when they capture relevant environment classes, align with crop development, and are evaluated under withholding schemes that genuinely mimic deployment. Environmental representation is therefore powerful but not plug-and-play.
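To make developmental alignment concrete, the following minimal sketch aggregates a daily weather series into per-stage environmental covariates. The stage boundaries, the single temperature variable, and all values here are hypothetical illustrations, not figures from any cited study; operational envirotyping would derive windows from phenology observations or crop models and do so per genotype and environment.

```python
import numpy as np

def stage_aware_covariates(daily_tmax, windows):
    """Aggregate a daily weather series into per-window means.

    daily_tmax : 1-D array of daily maximum temperatures for one environment
    windows    : dict mapping stage name -> (start_day, end_day), end exclusive
    """
    return {stage: float(np.mean(daily_tmax[s:e]))
            for stage, (s, e) in windows.items()}

# Toy 120-day season split into three hypothetical developmental windows.
season = np.concatenate([np.full(40, 22.0), np.full(40, 28.0), np.full(40, 31.0)])
covs = stage_aware_covariates(season, {"vegetative": (0, 40),
                                       "flowering": (40, 80),
                                       "grain_fill": (80, 120)})
print(covs)  # {'vegetative': 22.0, 'flowering': 28.0, 'grain_fill': 31.0}
```

The point of the sketch is that the same raw weather series yields different covariates depending on how windows are drawn, which is why stage misalignment degrades the biological interpretability of environmental descriptors.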

5. AI and Statistical Learning Architectures: What the Recent Evidence Actually Supports

5.1. Strong Baselines Still Define the Standard of Proof

The recent literature does not justify replacing structured statistical models by default. Mixed models, factor-analytic frameworks, reaction-norm approaches, and multi-trait formulations remain highly competitive because they handle unbalanced breeding data, relatedness, repeated environments, and missingness efficiently [21,23,26,30,37,42,43,94]. The correct comparison is not between "old statistics" and "new AI" as abstract categories. It is between methods that are credible for the specific data structure and deployment target of a breeding program.
MegaLMM is an important reminder that statistical innovation is still active. By extending a large-scale latent-factor framework to use environmental covariates for new-environment prediction, it improved performance relative to univariate GBLUP in a difficult maize setting characterized by heavy missingness and many trials [37]. Similarly, mmGEBLUP shows that linear mixed-model families continue to evolve in their handling of major genes, polygenes, and G×E effects [43]. These systems remain especially relevant when the number of environments is modest relative to the number of marker effects, or when interpretability and variance decomposition matter operationally.
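For readers less familiar with these baselines, the core GBLUP computation is compact. The sketch below builds a VanRaden-style genomic relationship matrix from 0/1/2 marker codes and applies a ridge-type shrinkage predictor to simulated toy data. The fixed variance ratio `lam` is a simplifying assumption for illustration; real pipelines estimate variance components, for example by REML.

```python
import numpy as np

def vanraden_g(M):
    """VanRaden-style genomic relationship matrix from 0/1/2 marker codes."""
    p = M.mean(axis=0) / 2.0                 # estimated allele frequencies
    Z = M - 2.0 * p                          # center each marker column
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

def gblup_fit(G, y, lam=1.0):
    """Shrinkage predictor u = G (G + lam*I)^-1 (y - mean).

    lam plays the role of the residual-to-genetic variance ratio,
    fixed here instead of being estimated from the data.
    """
    mu = y.mean()
    return mu + G @ np.linalg.solve(G + lam * np.eye(len(y)), y - mu)

# Simulated toy data: 60 lines, 300 markers, polygenic signal plus noise.
rng = np.random.default_rng(0)
M = rng.integers(0, 3, size=(60, 300)).astype(float)
y = M @ (rng.normal(size=300) * 0.1) + rng.normal(size=60)
G = vanraden_g(M)
yhat = gblup_fit(G, y)
```

The efficiency noted above comes from this structure: the n-by-n relationship matrix absorbs the marker dimension, so unbalanced data and relatedness are handled through `G` rather than through per-marker effects.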

5.2. When Machine Learning Adds Value

Machine learning becomes more attractive when nonlinear interactions are plausible, environmental descriptors are high dimensional, and a breeding program has enough heterogeneity to benefit from flexible representations [1,7,11,34,40,41,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,97,98,99,100,101]. Even then, the evidence is conditional rather than absolute.
One strand is machine learning with engineered environmental covariates. In the maize grain-yield study based on multi-environment trial data, adding engineered environmental features increased the study-defined mean prediction accuracy by up to 7% relative to a factor-analytic multiplicative mixed-model baseline across three study-specific validation scenarios [34]. This gain is notable, but it should be interpreted within its boundaries: the crop was maize, the trait was grain yield, the dataset was large, and the advantage depended on feature engineering that translated the environment into structured inputs. The result should not be read as proof that machine learning dominates mixed models in all crops or traits.
Automated machine learning offers a related but distinct contribution because its value lies in model search and joint optimization rather than in automation alone. In a large-scale hybrid maize study, the reported predictive ability, as defined in the original article, increased by 14.02%-28.42% relative to marker-only models under the authors' study-specific validation settings when reduced environmental parameters and trait-associated markers were integrated [45]. Here again, the conditions matter. The gain came from combining genomic and environmental information in a large hybrid breeding context, not from AutoML by itself. The study is therefore most convincing as evidence that joint environmental-feature reduction and model search can materially reshape predictive performance and genetic interpretation when the dataset is sufficiently rich.
More broadly, the design space now includes machine-learning-assisted climate resilience prediction, digitalized breeding workflows, crop-specific AI frameworks, and platformized genomic prediction systems [62,63,64,65,66,102]. These contributions expand the toolkit available to breeders, but they also make it more important to separate genuine evidence of decision improvement from generic claims that AI is modernizing breeding.

5.3. Where Deep Learning Is Most Credible

The strongest case for deep learning in this literature appears in multimodal, temporal, or otherwise hard-to-specify problems. Recent examples include multimodal deep learning in wheat [41], a deep-learning fusion framework for G×E genomic prediction [77], transformer-based genomic prediction [46], interpretable architectures such as Cropformer [53], and transfer-learning systems that integrate multi-trait information [56]. Deep learning is especially promising when multiple data streams must be fused and when interactions cannot be pre-specified easily.
However, the review evidence does not support the stronger claim that deep learning should generally replace simpler methods. Deep models usually require more environmental diversity, more careful validation, and more attention to leakage. Their performance is often most persuasive in scenarios involving untested environments, complex multimodal inputs, or time-resolved environmental sequences [1,41,44,46,53,56,77]. Evidence remains much thinner for routine deployment in smaller breeding programs, for crops outside maize and wheat, and for scenarios where multimodal data are sparse or delayed. In smaller or moderately structured breeding datasets, the operational advantage may remain uncertain even if the model is technically sophisticated.

5.4. Interpretation, Uncertainty, and the Credibility of Model Choice

Prediction models in breeding are decision tools, not only pattern-recognition systems. Breeders need to know not only which line ranks highly, but also under what environmental assumptions, with what uncertainty, and relative to which baseline. The current literature supports this need more as a recurring methodological gap than as a consistently reported evidence layer.
Recent studies have started to engage fragments of the problem. GEFormer provides an explicit architecture for genomic-environmental fusion and evaluates several deployment scenarios [44]. Cropformer emphasizes interpretability at the model-design level [53]. AutoML-based environmental modeling in maize links gains to reduced environmental parameters and classes of markers associated with plasticity, stability, and G×E [45]. The large-scale winter wheat UAS study, by contrast, is valuable partly because it states the conditional nature of its gains and the associated computing burden [59]. Still, breeder trust cannot rest on architecture names or attention maps alone. Interpretability is most useful when it clarifies which environmental windows, phenomic acquisition times, or trial structures make a prediction credible. As summarized conservatively in Table 4, formal uncertainty reporting remains uncommon even among representative recent studies, and many papers still report point estimates without formal rank-stability or decision-value summaries.
This challenge may become sharper as the surrounding tool ecosystem expands to include broader multi-omics pipelines, microbiome-linked adaptation signals, simulation environments, and genome foundation-model representations [67,103,104,105]. These resources may prove useful, but they should still be judged by whether they improve breeder-relevant decisions under transparent validation rather than by novelty alone.

5.5. Benchmark Hygiene, Leakage, and Fair Comparison

The expansion of AI methods has made benchmark hygiene one of the most important methodological issues in the field. Different studies vary in crop, relatedness structure, environmental withholding logic, trait definitions, and data timing. As a result, headline accuracy values are not directly comparable unless the validation framework is also compared [21,26,34,37,44,45,59].
Three risks are recurrent. The first is kinship leakage, in which closely related genotypes appear on both sides of the split. The second is environmental leakage, in which nearly duplicated year-location contexts are shared between training and test sets. The third is information-timing leakage, in which environmental or phenomic data available only late in the season are used to claim gains for decisions that breeders must make much earlier. These problems are not exclusive to AI studies, but they can exaggerate the apparent superiority of flexible models because such models readily exploit hidden overlap.
This concern is practical, not philosophical. A late-season multimodal model may be very useful for regional recommendation or product placement, yet irrelevant for early-stage preselection. Likewise, a model that performs well under random cross-validation may still be weak under environment withholding. Good benchmarking therefore requires decision-aligned validation, transparent timing of data availability, and explicit comparison against strong statistical baselines rather than straw-man baselines.
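These leakage risks translate directly into how cross-validation folds should be constructed. The following minimal sketch, with invented family and environment labels, defines test folds by whole groups rather than by random plot-level shuffling; this is the basic mechanic behind leave-one-environment-out and family-aware designs, shown here in plain Python rather than any specific library.

```python
from collections import defaultdict

def leave_one_group_out(labels):
    """Yield (group, train_idx, test_idx) with one whole group withheld."""
    groups = defaultdict(list)
    for i, g in enumerate(labels):
        groups[g].append(i)
    for g in sorted(groups):
        test_idx = groups[g]
        train_idx = [i for i in range(len(labels)) if labels[i] != g]
        yield g, train_idx, test_idx

# Toy records: one (environment, family) pair per plot; labels are hypothetical.
envs = ["E1", "E1", "E2", "E2", "E3", "E3"]   # year-location context per plot
fams = ["F1", "F2", "F1", "F2", "F3", "F3"]   # genotype family per plot

# Environment withholding: no year-location context is shared across the split.
env_folds = {g: test for g, _, test in leave_one_group_out(envs)}
# Family-aware withholding: close relatives never sit on both sides.
fam_folds = {g: test for g, _, test in leave_one_group_out(fams)}
```

Grouping by environment guards against environmental leakage, while grouping by family guards against kinship leakage; timing leakage requires the additional discipline of excluding data layers that would not yet exist at decision time.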
Table 3. Recurring methodological lessons and failure modes in AI-enabled multi-environment prediction.

| Issue | Typical manifestation | When most severe | Practical response |
| --- | --- | --- | --- |
| Kinship leakage [37] | Closely related genotypes occur in both training and test folds | Family-structured breeding populations | Use family-aware splits or pedigree/genomic relationship constraints |
| Environmental leakage [26] | Training and test sets share near-duplicate year-location contexts | Repeated trial networks and short time spans | Use leave-one-environment, leave-one-year, or site-withholding designs |
| Timing leakage [59,60,75] | Late-season phenomics or weather summaries are used for early-stage claims | Operationally compressed breeding timelines | State explicitly when each data layer becomes available |
| Misaligned environmental covariates [19,26,33,34,38] | Raw weather tables are added without stage alignment | Traits tied to developmental windows | Use stage-aware envirotyping or crop-model-informed summaries |
| Severe missing-data burden [37,80,81,94] | Sparse genotype-environment matrices distort apparent gains | Network trials and sparse testing | Report missingness pattern and compare against sparse-data-aware baselines |
| Weak baseline choice [34,37,42,43,44] | AI models are compared only with marker-only baselines | Method-comparison papers | Benchmark against strong factor-analytic, reaction-norm, or mixed-model baselines |
| Unclear decision framing [37,44,45,53,59] | Accuracy is reported without deployment stage, uncertainty, or cost context | Late-stage recommendation or expensive field validation | Report scenario, uncertainty, and deployment use-case together |

5.6. Minimum Reporting Recommendations for Future Studies

To improve comparability across the field, recent concerns about leakage, deployment mismatch, and incomplete reporting can be translated into a minimum reporting standard. The goal is not bureaucratic uniformity, but enough transparency for readers to evaluate whether a prediction study is relevant to breeding use.
Any reported gain should be linked to an explicit baseline and a clearly defined metric. For studies intended to support advancement or recommendation, reporting should also include uncertainty, rank-based results, computational cost, sensing burden, and, where possible, the availability of code, data, or benchmark metadata for reproducibility [21,26,34,37,44,45,59].

6. Phenomics-Assisted and Multimodal Prediction

6.1. Why Phenomic Markers Are Not Redundant with Genomic Markers

Field phenomics is no longer only a trait measurement technology. It increasingly acts as a predictive layer. Genomic markers describe inherited potential, whereas phenomic markers capture realized plant responses after genotype has interacted with the environment. For complex traits such as grain yield, this distinction is fundamental [5,9,28,59,60,72,73,74,75,95,96,106,107,108].
A winter wheat field study involving 2,994 lines evaluated at two sites over two years provides one of the clearest demonstrations [28]. In that dataset, phenomic variables derived from multispectral, hyperspectral, and visual field measurements explained more yield variation than genomic markers alone, and combining genomic and phenomic information improved predictive performance over the strongest phenomic-only model. The important qualifier is that this result concerns a large field dataset, a complex trait, and cross-location prediction. It should therefore be read as strong evidence for the value of phenomics in certain breeding contexts, not as proof that phenomic data generally replace genomics.

6.2. Timing of Phenomic Acquisition Matters as Much as Sensor Quality

The recent phenomics literature has matured beyond proof-of-concept image collection and now focuses more on timing, scale, trait engineering, and deployment [3,5,9,59,60,72,73,74,75,108]. This is an important shift because a phenomic signal is only useful if it becomes available in time to influence the intended breeding decision.
Two use cases should be distinguished. The first is current-season augmentation, in which phenomic measurements are used to improve an in-season or late-stage decision for the same testing cycle. The large-scale winter wheat UAS study illustrates this point clearly. Using spectral reflectance indices from 4,094 genotypes across 11,593 breeding plots, the authors showed that UAS-derived signals could improve genomic prediction [59]. When test-year NDVI was available under the study's leave-one-year-out setting, the reported prediction accuracy was at least 16% higher than the study control. This is strong evidence that phenomics can add value when the signal is available at the right decision time.
The second is cross-season transportable prediction, which is methodologically harder and operationally more demanding. The same winter wheat study showed that cross-year reliability was limited and that stronger multivariate models required substantial computing resources [59]. The lesson is not that UAS-enabled phenomics fails, but that its value depends on timing, repeatability, and whether the phenomic layer can be carried across seasons or locations without hidden leakage. Rice UAS-based nitrogen monitoring, coffee comparisons between genomic and phenomic selection, and temporal sesame phenotyping all reinforce that timing and use-case definition matter as much as sensor quality [75,87,109]. These two use cases should not be benchmarked interchangeably.

6.3. Temporal Phenotyping Changes the Prediction Problem

Time-series phenotyping adds a new dimension because it captures development, stress onset, and recovery rather than only end-point trait values [31,60,75,110]. Recent work in sesame showed that temporal high-throughput phenotyping can improve genomic prediction when longitudinal traits are modeled explicitly and linked to yield through multi-trait approaches [75]. Similarly, recent reviews of time-series trait prediction emphasize that treating every time point as an unrelated trait wastes temporal structure [60].
Temporal phenotyping is therefore not just "more data". It changes the prediction problem from static ranking to developmental inference. That opens new opportunities, but it also raises new risks involving irregular sampling, growth-stage alignment, and the temptation to use late-season signals to explain decisions that must be made earlier.

6.4. Multimodal Fusion Is Promising, But Not All Data Layers Earn Their Cost

Many of the most ambitious recent studies are no longer restricted to genotype plus one auxiliary input. They combine markers with phenomics, environmental covariates, remote sensing, metabolomic markers, or crop-model outputs [31,38,41,56,60,71,76,77,78,79,92,106,107,111]. This is one reason why deep learning and transfer learning have gained visibility.
Still, multimodal learning should be evaluated by decision value rather than by data volume. A data layer may improve retrospective accuracy while remaining operationally unattractive if it is costly, delayed, or difficult to standardize. The strongest case for multimodal prediction therefore arises when each layer contributes something distinct: markers support portability across untested material; environmental descriptors structure the exposure context; and phenomic measurements capture realized response. If a data stream does not change a decision in time or at acceptable cost, its breeding value remains limited even if it is scientifically interesting.
Crop dependence should also not be ignored. Horticultural hybrid prediction, lentil genomic selection, and legume-oriented genomic or multi-omics studies extend the relevance of multimodal prediction beyond large cereal programs, but often under different logistical and economic constraints [88,112,113]. A pipeline that is well matched to a national maize or wheat network may be unrealistic in a smaller horticultural or pulse breeding program.

6.5. Interpreting Cases Where Genomic and Phenomic Signals May Diverge

Although direct comparative evidence remains limited, a useful scenario occurs when genomic and phenomic predictions disagree. This should not automatically be treated as instability. It may instead indicate that the two data layers answer different questions. Genomic prediction is portable across untested material and therefore useful earlier in the breeding cycle. Phenomic prediction is more proximal to realized crop performance and can be highly informative for the current season or for specific stress patterns.
Recent studies help anchor this interpretation. In winter wheat, combining genomic information with UAS-derived data improved grain-yield prediction beyond genomic-only prediction, implying that the phenomic layer captured realized current-season information that markers alone did not represent [95]. In Coffea canephora, phenomic selection based on plant-response measurements was competitive with genomic selection for yield prediction across locations, again suggesting that the two predictors can emphasize different but agriculturally relevant signals [87]. By contrast, work on predictor bias in genomic and phenomic selection showed that seemingly strong phenomic results can be upwardly biased when independence is not preserved, and that unbiased phenomic prediction may be lower than earlier optimistic estimates [96]. For this reason, multimodal systems should not collapse all evidence into a single score too early. In some breeding settings, retaining separate genomic, environmental, and phenomic components for a time may improve interpretation and help distinguish broadly adapted material from material whose apparent advantage is more context-specific than broadly transportable.

7. From Prediction Accuracy to Breeding Use

7.1. Validation Design Must Mirror the Breeding Question

Many prediction studies still report metrics under random splits that are easier than the decisions breeders face. This is now one of the clearest weaknesses in the literature. Predicting a known genotype in a familiar environment is not the same as predicting an untested genotype in an untested environment, nor is it the same as filling sparse testing matrices across a trial network [21,26,30,34,35,36,37,44,45,81].
The best recent papers state the deployment scenario explicitly. GEFormer reported separate results for untested genotypes in tested environments, tested genotypes in untested environments, and untested genotypes in untested environments [44]. MegaLMM focused directly on new-environment prediction [37]. The pulse-crop MTME study contrasted whole-environment and split-environment cross-validation [94]. These are not minor technical choices. They determine whether a reported gain is relevant for early-stage selection, sparse testing, or late-stage deployment.

7.2. Breeding Stage Determines Which Model Family Is Realistic

Prediction value depends strongly on breeding stage. Early in the pipeline, programs need inexpensive ranking of many candidates, often before dense phenomics are available. Mid-pipeline, breeders may be able to use historical trial information and some environmental structure. Late-stage advancement and regional recommendation can justify richer models because fewer candidates remain and the cost of error is higher [10,21,26,28,34,35,37,45,59,79,81].
This logic means that there is no universally best model family. Marker-only or marker-plus-basic-environment models may still be adequate for early-stage preselection. Stage-aware environmental models are attractive for sparse testing and new-environment prediction. Rich multimodal models are most justified when their additional data streams arrive in time to change late-stage decisions. A review that ignores this stage structure risks overvaluing technically impressive models that do not fit breeding operations.

7.3. Uncertainty and Economic Decision Value Should Be Reported Together

Accuracy alone is not enough to define usefulness. Breeders often make threshold decisions: which lines move forward, which hybrids justify another year of testing, or which product candidates merit costly regional deployment. The practical value of a prediction therefore depends on uncertainty and on the economic consequence of ranking errors.
This is particularly important in multi-environment settings. A model may provide modest average gains while still offering large operational value if it reduces field-testing burden in expensive environments. Conversely, a sophisticated multimodal system may improve correlation slightly but remain economically unattractive if it requires repeated sensor flights, complex preprocessing, or high-performance computing beyond what the program can sustain [34,45,59,79,80]. For this reason, future studies should report not only predictive performance but also uncertainty, compute burden, sensing cost, and the decision stage for which the model is intended.
In practical breeding pipelines, rank-based agreement among the top selected entries may be more informative than global correlation metrics. Table 4 provides an illustrative reporting audit across representative environmental, multimodal, and bias-sensitive studies selected to span the main model classes and data modalities discussed in this review. It is not intended as an exhaustive audit of every study cited in Tables 2A and 2B or as a balanced sample by crop or algorithm. Formal uncertainty reporting remains uncommon across this representative set. Future studies should increasingly report rank-based measures such as top-k overlap, selection coincidence, or expected regret, because these are often closer to actual advancement decisions than correlation alone.
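As an illustration of such rank-based reporting, the sketch below computes top-k overlap between an observed and a predicted ranking. The genotype names and values are invented for the example; in practice k would match the program's advancement intensity.

```python
def top_k_overlap(observed, predicted, k):
    """Fraction of the observed top-k entries that the predicted top-k recovers."""
    top_obs = set(sorted(observed, key=observed.get, reverse=True)[:k])
    top_pred = set(sorted(predicted, key=predicted.get, reverse=True)[:k])
    return len(top_obs & top_pred) / k

# Hypothetical observed yields and model predictions for five genotypes.
obs  = {"g1": 9.1, "g2": 8.7, "g3": 7.9, "g4": 7.5, "g5": 6.8}
pred = {"g1": 8.8, "g3": 8.6, "g2": 7.2, "g5": 7.0, "g4": 6.1}

print(top_k_overlap(obs, pred, k=2))  # 0.5: only g1 is shared in the top two
```

A model with high global correlation can still score poorly on this metric near the selection threshold, which is exactly the regime where advancement decisions are made.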

7.4. A Practical Framework for Stage-Specific Deployment

One of the clearest messages of the recent literature is that model choice should start from the breeding decision, not from the algorithm class. Table 5 presents a matrix-style deployment framework linking breeding stage, typical candidate number, data realistically available at decision time, suitable validation design, realistic model families, and the main decision target. The framework is intentionally simple and comparative rather than decorative. It is not a replacement for program-specific optimization within a given breeding pipeline and its target population of environments, but it can help readers translate methodological claims into operational choices.
The framework also helps explain why strong baselines remain important. In early-stage preselection, there may be little justification for a highly multimodal system. In sparse testing, environmental covariates and reaction-norm models become more relevant. In late-stage regional recommendation, richer multimodal models may be justified because the decision is expensive, and the data horizon is longer. The same model class can therefore be either over-engineered or well matched depending on when it is used.

7.5. Practical Design Rules for Readers and Future Authors

Five practical rules follow from the evidence synthesized here. First, define the deployment scenario before choosing the model. Second, compare against strong mixed-model or factor-analytic baselines, not only against marker-only straw men. Third, report when each data layer becomes available relative to the breeding decision. Fourth, distinguish within-environment interpolation from genuine new-environment extrapolation. Fifth, discuss cost, uncertainty, and interpretability together rather than treating them as separate afterthoughts.
These rules are simple, but they address many of the recurring weaknesses in the recent literature. They also help explain why the same methodological claim can be valid in one breeding stage and unconvincing in another.

8. Current Limitations and Priorities for the Next Phase

The evidence base remains promising but uneven. The first limitation is crop imbalance. Maize and wheat dominate the strongest methodological papers because they benefit from large trial networks, national datasets, and comparatively mature phenotyping infrastructures [45,59]. This imbalance is not just a benchmarking issue; it shapes the apparent maturity of the field. Among the 14 primary studies summarized in Tables 2A and 2B, seven focus directly on maize or wheat, and nine concern cereal systems once sorghum and multi-crop cereal benchmarks are included. Methods that look robust in maize may be much less tested in pulses, oilseeds, roots and tubers, or minor cereals.
These counts refer only to the representative studies summarized in Tables 2A and 2B and should not be interpreted as a formal prevalence estimate for the entire screened literature.
Important breeding targets also differ across crop systems. Resource-use efficiency, drought tolerance, grain protein, seed quality, climate resilience, and stress-linked adaptation are all prominent breeding goals in barley, wheat, peanuts, cotton, legumes, horticultural crops, and other systems, yet the evidence connecting these targets to breeder-ready multi-environment prediction remains uneven [86,112,114,115,116,117,118,119]. Generalization across crops should therefore be made with explicit caution.
The second limitation is incomplete environmental representation. Even sophisticated pipelines still struggle to integrate management, soil heterogeneity, in-season biotic stress, and microclimate at the scale breeders need. Satellite-enabled enviromics, CLIM4OMICS-type resources, and gridded environmental products expand coverage, but they do not automatically solve alignment with plot-level breeding observations [33,71]. Environmental ontology and stage-aware alignment remain underdeveloped.
An additional limitation of this review is that the search was restricted to English-language peer-reviewed literature indexed primarily through database and backward citation tracing, which may underrepresent some relevant studies outside that coverage.
The third limitation is uncertainty reporting. Many papers still foreground mean accuracy values without making clear how prediction uncertainty, ranking instability, or confidence intervals would influence selection decisions. This weakens translation to real breeding.
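One low-cost way to address this limitation is to accompany any mean accuracy value with a resampling-based interval. The sketch below, using synthetic data and a plain nonparametric bootstrap (an assumption of this illustration, not a procedure taken from any cited study), shows how a 95% confidence interval for a correlation-based accuracy can be reported alongside the point estimate.

```python
# Illustrative sketch (synthetic data): nonparametric bootstrap
# confidence interval for predictive accuracy measured as the Pearson
# correlation between predicted and observed values.
import random

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def bootstrap_ci(pred, obs, n_boot=2000, alpha=0.05, seed=1):
    """Percentile bootstrap CI for the predicted-observed correlation."""
    rng = random.Random(seed)
    n = len(pred)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        stats.append(pearson([pred[i] for i in idx], [obs[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Synthetic predicted/observed values for 30 hypothetical genotypes.
pred = [0.5 * i + ((-1) ** i) * 0.8 for i in range(30)]
obs = [0.5 * i + ((-1) ** (i + 1)) * 0.6 for i in range(30)]

lo, hi = bootstrap_ci(pred, obs)
print(f"accuracy = {pearson(pred, obs):.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

Reporting the interval, not just the mean, lets a breeder judge whether an apparent ranking advantage is stable enough to change an advancement decision.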
The fourth limitation is interoperability. Multimodal systems require compatible metadata, synchronized timing, and consistent preprocessing across genomics, phenomics, and environmental records. Many breeding programs still lack low-friction pipelines connecting these layers from field to decision dashboard. Several digital breeding tools and platform initiatives describe infrastructure that may help address this problem, but direct evidence that these resources resolve operational bottlenecks in routine deployment remains limited [120,121].
The fifth limitation is the unresolved balance between predictive flexibility and biological explanation. Deep learning and transfer learning can represent complex nonlinear relationships, but breeder trust will remain limited unless predictions can be connected to interpretable environmental windows, physiological expectations, or clearly defined deployment logic [91,92,110].
The sixth limitation is economic realism. A model that improves an abstract metric by a small amount may or may not improve realized genetic gain once labor, sensing, compute cost, and cycle time are considered. This review therefore supports a broader reporting standard in which cost, uncertainty, timing, and interpretability are discussed alongside performance metrics.
Finally, future crop improvement itself is becoming more diverse in its germplasm and design goals. Work on next-generation domestication and broader digital breeding architectures suggests that prediction systems may soon need to cover more diverse genetic backgrounds and adaptation targets than current benchmarks represent [122]. If so, the need for explicit environmental representation, fair benchmarking, and deployment-aware reporting will only become stronger.

9. Conclusions

The recent literature shows that a more mature phase of multi-environment genomic prediction is emerging. The most useful advances are not generic claims that AI will transform crop improvement. They are concrete attempts to integrate genomic data, environmental representation, and field phenomics in ways that improve breeder-relevant prediction problems.
At the same time, the evidence argues against overstatement. Environmental information is not automatically useful; it works best when aligned with crop development, trait biology, and the target population of environments. Strong statistical baselines are still highly competitive, particularly in moderate or highly unbalanced datasets. AI gains are most convincing when the task is genuinely multimodal, temporally structured, or evaluated in difficult deployment scenarios. Phenomics frequently adds value for yield-related traits, but its practical utility depends on acquisition timing, cost, and whether the information arrives early enough to alter selection.
This leads to a pragmatic conclusion: the field should now move from algorithm-centered enthusiasm to benchmark-centered and deployment-centered rigor. To support that maturation, future studies should report scenario-specific validation, comparator baselines, data-availability timing, uncertainty, and deployment cost in a more standardized way.

Author Contributions

Conceptualization, X.L. and S.Y.; Methodology, X.L.; Software, X.L.; Validation, X.L., S.Y., and Y.J.; Formal Analysis, X.L.; Investigation, X.L.; Resources, X.L.; Data Curation, X.L.; Writing—Original Draft Preparation, X.L.; Writing—Review and Editing, X.L., S.Y., Y.J., Y.W., and D.Y.; Visualization, X.L.; Supervision, S.Y.; Project Administration, S.Y.; Funding Acquisition, Y.J. All authors have read and agreed to the published version of the manuscript.

Funding

This article was funded by the Heilongjiang Provincial Natural Science Foundation of China (PL2025D007).

Data Availability Statement

Data are available upon request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Sheikh Jubair; Michael Domaratzki. Crop genomic selection with deep learning and environmental data: A survey. Frontiers in Artificial Intelligence 2023, 5, 1040295-1040295. [CrossRef]
  2. Rabiya Parveen; Mankesh Kumar; Swapnil Swapnil; Digvijay Singh; Monika Shahani; Zafar Imam; Jyoti Prakash Sahoo. Understanding the genomic selection for crop improvement: current progress and future prospects. Molecular Genetics and Genomics 2023, 298(4), 813-821. [CrossRef]
  3. Jason Walsh; Eleni Mangina; Sónia Negrão. Advancements in Imaging Sensors and AI for Plant Stress Detection: A Systematic Literature Review. Plant Phenomics 2024, 6, 0153-0153. [CrossRef]
  4. Andekelile Mwamahonje; Zamu Mdindikasi; Devotha Mchau; Emmanuel T. Mwenda; Daines Nicodem Sanga; Ana Luísa Garcia-Oliveira; Chris O. Ojiewo. Advances in Sorghum Improvement for Climate Resilience in the Global Arid and Semi-Arid Tropics: A Review. Agronomy 2024, 14(12), 3025-3025. [CrossRef]
  5. Boubacar Gano; Sourav Bhadra; Justin M. Vilbig; Nurzaman Ahmed; Vasit Sagan; Nadia Shakoor. Drone-based imaging sensors, techniques, and applications in plant phenotyping for crop breeding: A comprehensive review. The Plant Phenome Journal 2024, 7(1). [CrossRef]
  6. Lixia Sun; Mingyu Lai; Fozia Ghouri; Muhammad Amjad Nawaz; Fawad Ali; Faheem Shehzad Baloch; Muhammad Azhar Nadeem; Muhammad Aasım; Muhammad Qasim Shahid. Modern Plant Breeding Techniques in Crop Improvement and Genetic Diversity: From Molecular Markers and Gene Editing to Artificial Intelligence—A Critical Review. Plants 2024, 13(19), 2676-2676. [CrossRef]
  7. Suvojit Bose; Saptarshi Banerjee; Soumya Kumar; Akash Saha; Debalina Nandy; Soham Hazra. Review of applications of artificial intelligence (AI) methods in crop research. Journal of Applied Genetics 2024, 65(2), 225-240. [CrossRef]
  8. Guilong Lu; Purui Liu; Qibin Wu; Shuzhen Zhang; Peifang Zhao; Yuebin Zhang; Youxiong Que. Sugarcane breeding: a fantastic past and promising future driven by technology and methods. Frontiers in Plant Science 2024, 15, 1375934-1375934. [CrossRef]
  9. Lukas Roth; Afef Marzougui; Achim Walter. A review of the journey of field crop phenotyping: From trait stamp collections and fancy robots to phenomics-informed crop performance predictions. Journal of Plant Physiology 2025, 311, 154542-154542. [CrossRef]
  10. Diana M. Escamilla; Dongdong Li; Karlene L. Negus; Kiara L. Kappelmann; Aaron Kusmec; Adam Vanous; Patrick S. Schnable; Xianran Li; Jianming Yu. Genomic selection: Essence, applications, and prospects. The Plant Genome 2025, 18(2), e70053-e70053. [CrossRef]
  11. Ana Luísa Garcia-Oliveira; Sangam L. Dwivedi; Subhash Chander; Charles Nelimor; Diaa Abd El Moneim; Rodomiro Ortíz. Breeding Smarter: Artificial Intelligence and Machine Learning Tools in Modern Breeding—A Review. Agronomy 2026, 16(1), 137-137. [CrossRef]
  12. Shaoming Huang; Krishna Kishore Gali; Gene Arganosa; Bunyamin Tar'an; Rosalind Bueckert; Thomas D. Warkentin. Breeding indicators for high-yielding field pea under normal and heat stress environments. Canadian Journal of Plant Science 2023, 103(3), 259-269. [CrossRef]
  13. Carlos A. Robles-Zazueta; Leonardo Crespo-Herrera; Francisco J. Piñera-Chávez; Carolina Rivera-Amado; Guðbjörg I. Aradóttir. Climate change impacts on crop breeding: Targeting interacting biotic and abiotic stresses for wheat improvement. The Plant Genome 2023, 17(1), e20365-e20365. [CrossRef]
  14. Jacob van Etten; Kauê de Sousa; Jill E. Cairns; Matteo Dell’Acqua; Carlo Fadda; Davíd Güereña; Joost van Heerwaarden; Teshale Assefa; Rhys Manners; Anna Müller. Data-driven approaches can harness crop diversity to address heterogeneous needs for breeding products. Proceedings of the National Academy of Sciences 2023, 120(14), e2205771120-e2205771120. [CrossRef]
  15. Osval A. Montesinos-López; Leonardo Crespo-Herrera; Carolina Saint Pierre; Alison R. Bentley; Roberto de la Rosa-Santamaria; José Alejandro Ascencio-Laguna; Afolabi Agbona; Guillermo Gerard; Abelardo Montesinos-López; José Crossa. Do feature selection methods for selecting environmental covariables enhance genomic prediction accuracy?. Frontiers in Genetics 2023, 14, 1209275-1209275. [CrossRef]
  16. J. R. Adams; Michiel E. de Vries; Fred A. van Eeuwijk. Efficient Genomic Prediction of Yield and Dry Matter in Hybrid Potato. Plants 2023, 12(14), 2617-2617. [CrossRef]
  17. Yoselin Benitez-Alfonso; Beth K Soanes; Sibongile Zimba; Besiana Sinanaj; Liam German; Vinay Sharma; Abhishek Bohra; Anastasia Kolesnikova; Jessica Dunn; Azahara C. Martín. Enhancing climate change resilience in agricultural crops. Current Biology 2023, 33(23), R1246-R1261. [CrossRef]
  18. Tingting Guo; Jialu Wei; Xianran Li; Jianming Yu. Environmental context of phenotypic plasticity in flowering time in sorghum and rice. Journal of Experimental Botany 2023, 75(3), 1004-1015. [CrossRef]
  19. Chloé Elmerich; Michel-Pierre Faucon; M. A. García; Patrice Jeanson; Guénolé Boulch; Bastien Lange. Envirotyping to control genotype x environment interactions for efficient soybean breeding. Field Crops Research 2023, 303, 109113-109113. [CrossRef]
  20. Artūrs Katamadze; Omar Vergara-Díaz; Estefanía Uberegui; Ander Yoldi-Achalandabaso; J. L. Araus; Rubén Vicente. Evolution of wheat architecture, physiology, and metabolism during domestication and further cultivation: Lessons for crop improvement. The Crop Journal 2023, 11(4), 1080-1096. [CrossRef]
  21. Hans-Peter Piepho; Justin Blancon. Extending Finlay–Wilkinson regression with environmental covariates. Plant Breeding 2023, 142(5), 621-631. [CrossRef]
  22. Mark Cooper; Owen Powell; Carla Gho; Tom Tang; Carlos D. Messina. Extending the breeder’s equation to take aim at the target population of environments. Frontiers in Plant Science 2023, 14, 1129591-1129591. [CrossRef]
  23. Jeffrey B. Endelman. Fully efficient, two-stage analysis of multi-environment trials with directional dominance and multi-trait genomic selection. Theoretical and Applied Genetics 2023, 136(4), 65-65. [CrossRef]
  24. Ashok Singamsetti; P.H. Zaidi; K. Seetharam; Madhumal Thayil Vinayan; Tiago Olivoto; Anima Mahato; Kartik Madankar; Munnesh Kumar; Kumari Shikha. Genetic gains in tropical maize hybrids across moisture regimes with multi-trait-based index selection. Frontiers in Plant Science 2023, 14, 1147424-1147424. [CrossRef]
  25. Paul Adunola; Maria Amélia Gava Ferrão; Romário Gava Ferrão; Aymbiré Francisco Almeida da Fonseca; P. S. Volpi; Marcone Comério; Abraão Carlos Verdin Filho; Patricio Muñoz; Luís Felipe V. Ferrão. Genomic selection for genotype performance and environmental stability in Coffea canephora. G3 Genes Genomes Genetics 2023, 13(6). [CrossRef]
  26. Marco Lopez-Cruz; Fernando Aguate; Jacob D. Washburn; Natalia de León; Shawn M. Kaeppler; Dayane Cristina Lima; Ruijuan Tan; Addie Thompson; Laurence Willard De La Bretonne; Gustavo de los Campos. Leveraging data from the Genomes-to-Fields Initiative to investigate genotype-by-environment interactions in maize in North America. Nature Communications 2023, 14(1), 6904-6904. [CrossRef]
  27. Văn Hiếu Nguyễn; Rose Imee Zhella Morantte; Vitaliano Lopena; Holden Verdeprado; Rosemary Murori; Alexis Ndayiragije; Sanjay Katiyar; Md Rafiqul Islam; Roselyne U. Juma; H. Flandez-Galvez. Multi-environment Genomic Selection in Rice Elite Breeding Lines. Rice 2023, 16(1). [CrossRef]
  28. Robert Jackson; Jaap B. Buntjer; Alison R. Bentley; Jacob Lage; Ed Byrne; Chris Burt; Peter Jack; Simon Berry; Edward Flatman; Bruno Poupard. Phenomic and genomic prediction of yield on multiple locations in winter wheat. Frontiers in Genetics 2023, 14, 1164935-1164935. [CrossRef]
  29. Carlos D. Messina; Carla Gho; Graeme Hammer; Tom Tang; Mark Cooper. Two decades of harnessing standing genetic variation for physiological traits to improve drought tolerance in maize. Journal of Experimental Botany 2023, 74(16), 4847-4861. [CrossRef]
  30. Diriba Tadese; Hans-Peter Piepho; Jens Hartung. Accuracy of prediction from multi-environment trials for new locations using pedigree information and environmental covariates: the case of sorghum (Sorghum bicolor (L.) Moench) breeding. Theoretical and Applied Genetics 2024, 137(8), 181-181. [CrossRef]
  31. Alper Adak; Seth C. Murray; Jacob D. Washburn. Deciphering temporal growth patterns in maize: integrative modeling of phenotype dynamics and underlying genomic variations. New Phytologist 2024, 242(1), 121-136. [CrossRef]
  32. Noah D. Winans; Jales M. O. Fonseca; Ramasamy Perumal; Patricia E. Klein; Robert R. Klein; William L. Rooney. Envirotyping can increase genomic prediction accuracy of new environments in grain sorghum trials depending on mega-environment. Crop Science 2024, 64(5), 2519-2533. [CrossRef]
  33. Rafael T Resende; Lee T. Hickey; Cibele Hummel do Amaral; Lucas Lemes de Souza Peixoto; Gustavo Eduardo Marcatti; Yunbi Xu. Satellite-enabled enviromics to enhance crop improvement. Molecular Plant 2024, 17(6), 848-866. [CrossRef]
  34. Igor Kuivjogi Fernandes; Caio Canella Vieira; Kaio Olímpio das Graças Dias; Samuel B. Fernandes. Using machine learning to combine genetic and environmental data for maize grain yield predictions across multi-environment trials. Theoretical and Applied Genetics 2024, 137(8), 189-189. [CrossRef]
  35. Jingxin Wang; Liwei Liu; Kunhui He; Takele Weldu Gebrewahid; Shang Gao; Qiu Tian; Zhanyi Li; Yiqun Song; Y. Y. Guo; Yanwei Li. Accurate genomic prediction for grain yield and grain moisture content of maize hybrids using multi-environment data. Journal of Integrative Plant Biology 2025, 67(5), 1379-1394. [CrossRef]
  36. Fatma Ozair; Alper Adak; Seth C. Murray; Ryan Timothy Alpers; Alejandro Castro Aviles; Dayane Cristina Lima; Jode W. Edwards; David Ertl; Michael A. Gore; Candice N. Hirsch. Phenotypic plasticity in maize grain yield: Genetic and environmental insights of response to environmental gradients. The Plant Genome 2025, 18(3), e70078-e70078. [CrossRef]
  37. Haixiao Hu; Renaud Rincent; Daniel E. Runcie. MegaLMM improves genomic predictions in new environments using environmental covariates. Genetics 2024, 229(1), 1-41. [CrossRef]
  38. Abdulqader Jighly; Anna Weeks; Brendan Christy; Garry J. O’Leary; Surya Kant; Rajat Aggarwal; David Hessel; Kerrie Forrest; Frank Technow; Josquin Tibbits. Integrating biophysical crop growth models and whole genome prediction for their mutual benefit: a case study in wheat phenology. Journal of Experimental Botany 2023, 74(15), 4415-4426. [CrossRef]
  39. Pratishtha Poudel; Bryan Naidenov; Charles Chen; Phillip D. Alderman; Stephen M. Welch. Integrating genomic prediction and genotype specific parameter estimation in ecophysiological models: overview and perspectives. in silico Plants 2023, 5(1). [CrossRef]
  40. Freddy Mora; Carlos Maldonado; Luma Alana Vieira Henrique; Renan Santos Uhdre; Carlos Alberto Scapim; Claudete Aparecida Mangolim. Multi-trait and multi-environment genomic prediction for flowering traits in maize: a deep learning approach. Frontiers in Plant Science 2023, 14, 1153040-1153040. [CrossRef]
  41. Abelardo Montesinos-López; Carolina Rivera; Francisco Pinto; Francisco Piñera; David S. González-González; Matthew Reynolds; Paulino Pérez-Rodríguez; Huihui Li; Osval A. Montesinos-López; José Crossa. Multimodal deep learning methods enhance genomic prediction of wheat breeding. G3 Genes Genomes Genetics 2023, 13(5). [CrossRef]
  42. Admas Alemu; Johanna Åstrand; Osval A. Montesinos-López; Julio Isidro y Sánchez; Javier Fernández-Gónzalez; Wuletaw Tadesse; Ramesh R. Vetukuri; Anders S. Carlsson; Alf Ceplitis; José Crossa. Genomic selection in plant breeding: Key factors shaping two decades of progress. Molecular Plant 2024, 17(4), 552-578. [CrossRef]
  43. Qi-Xin Zhang; Tianneng Zhu; Lin Feng; Dunhuang Fang; Xuejun Chen; Xiang-Yang Lou; Zhijun Tong; Bingguang Xiao; Haiming Xu. mmGEBLUP: an advanced genomic prediction scheme for genetic improvement of complex traits in crops through integrative analysis of major genes, polygenes, and genotype–environment interactions. Briefings in Bioinformatics 2024, 26(1). [CrossRef]
  44. Yao Zhou; Ming Yao; Chuang Wang; Ke Li; Junhao Guo; Yingjie Xiao; Jianbing Yan; Jianxiao Liu. GEFormer: A genotype-environment interaction-based genomic prediction method that integrates the gating multilayer perceptron and linear attention mechanisms. Molecular Plant 2025, 18(3), 527-549. [CrossRef]
  45. Kunhui He; Tingxi Yu; Shang Gao; Shoukun Chen; Liang Li; Xuecai Zhang; Changling Huang; Yunbi Xu; Jiankang Wang; B. M. Prasanna. Leveraging Automated Machine Learning for Environmental Data-Driven Genetic Analysis and Genomic Prediction in Maize Hybrids. Advanced Science 2025, 12(17), e2412423-e2412423. [CrossRef]
  46. Cuiling Wu; Yiyi Zhang; Zhiwen Ying; Ling Li; Jun Wang; Hui Yu; Mengchen Zhang; Xianzhong Feng; Xinghua Wei; Xiaogang Xu. A transformer-based genomic prediction method fused with knowledge-guided module. Briefings in Bioinformatics 2023, 25(1). [CrossRef]
  47. Wanjie Feng; Pengfei Gao; Xutong Wang. AI breeder: Genomic predictions for crop breeding. New Crops 2023, 1, 100010-100010. [CrossRef]
  48. Mohsen Yoosefzadeh-Najafabadi; Sepideh Torabi; Dan Tulpan; Istvan Rajcan; Milad Eskandari. Application of SVR-Mediated GWAS for Identification of Durable Genetic Regions Associated with Soybean Seed Quality Traits. Plants 2023, 12(14), 2659-2659. [CrossRef]
  49. Dwaipayan Sinha; Arun Kumar Maurya; Gholamreza Abdi; Muhammad Majeed; Rachna Agarwal; Rashmi Mukherjee; Sharmistha Ganguly; Robina Aziz; Manika Bhatia; Aqsa Majgaonkar. Integrated Genomic Selection for Accelerating Breeding Programs of Climate-Smart Cereals. Genes 2023, 14(7), 1484-1484. [CrossRef]
  50. Wanchao Zhu; Rui Han; Xiaoyang Shang; Tao Zhou; Chengyong Liang; Xiaomeng Qin; Hong Chen; Zaiwen Feng; Hongwei Zhang; Xingming Fan. The CropGPT project: Call for a global, coordinated effort in precision design breeding driven by AI using biological big data. Molecular Plant 2023, 17(2), 215-218. [CrossRef]
  51. Xiaoding Ma; Hao Wang; Shengyang Wu; Bing Han; Di Cui; Jin Liu; Qiang Zhang; Xiuzhong Xia; Peng Song; Cuifeng Tang. DeepCCR: large-scale genomics-based deep learning method for improving rice breeding. Plant Biotechnology Journal 2024, 22(10), 2691-2693. [CrossRef]
  52. Hai Wang; Mengjiao Chen; Xin Wei; Rui Xia; Dong Pei; Xuehui Huang; Bin Han. Computational tools for plant genomics and breeding. Science China Life Sciences 2024, 67(8), 1579-1590. [CrossRef]
  53. Hao Wang; Shen Yan; Wenxi Wang; Yongming Cheng; Jingpeng Hong; Qiang He; Xianmin Diao; Yunan Lin; Yanqing Chen; Yongsheng Cao. Cropformer: An interpretable deep learning framework for crop genomic prediction. Plant Communications 2024, 6(3), 101223-101223. [CrossRef]
  54. Jinchen Li; Zikang He; Guomin Zhou; Shen Yan; Jianhua Zhang. DeepAT: A Deep Learning Wheat Phenotype Prediction Model Based on Genotype Data. Agronomy 2024, 14(12), 2756-2756. [CrossRef]
  55. Kai Tong; Xiaojing Chen; Shen Yan; Liangli Dai; Yuxue Liao; Zhaoling Li; Ting Wang. PlantMine: A Machine-Learning Framework to Detect Core SNPs in Rice Genomics. Genes 2024, 15(5), 603-603. [CrossRef]
  56. Jinlong Li; Dongfeng Zhang; Feng Yang; Qiusi Zhang; Shouhui Pan; Xiangyu Zhao; Qi Zhang; Yanyun Han; Jinliang Yang; Kaiyi Wang. TrG2P: A transfer-learning-based tool integrating multi-trait data for accurate prediction of crop yield. Plant Communications 2024, 5(7), 100975-100975. [CrossRef]
  57. Hao Wu; Rui Han; Liang Zhao; Mengyao Liu; Hong Chen; Weifu Li; Lin Li. AutoGP: An intelligent breeding platform for enhancing maize genomic selection. Plant Communications 2025, 6(4), 101240-101240. [CrossRef]
  58. Ran Li; Dongfeng Zhang; Yanyun Han; Zhongqiang Liu; Qiusi Zhang; Qi Zhang; Xiaofeng Wang; Shouhui Pan; Jiahao Sun; Kaiyi Wang. Hybrid Deep Learning Approaches for Improved Genomic Prediction in Crop Breeding. Agriculture 2025, 15(11), 1171-1171. [CrossRef]
  59. Andrew W. Herr; Peter Schmuker; Arron H. Carter. Large-scale breeding applications of unoccupied aircraft systems enabled genomic prediction. The Plant Phenome Journal 2024, 7(1). [CrossRef]
  60. David Hobby; Alain J Mbebi; Zoran Nikoloski. Towards genetic architecture and genomic prediction of crop traits from time-series data: Challenges and breakthroughs. Journal of Plant Physiology 2025, 312, 154566-154566. [CrossRef]
  61. Chenji Zhang; Sirong Jiang; Yangyang Tian; Xiaorui Dong; Jianjia Xiao; Yanjie Lu; Tiyun Liang; Hongmei Zhou; Dabin Xu; Han Zhang. Smart breeding driven by advances in sequencing technology. Modern Agriculture 2023, 1(1), 43-56. [CrossRef]
  62. Muhammad Ahtasham Mushtaq; Hafiz Ghulam Muhu-Din Ahmed; Yawen Zeng. Applications of Artificial Intelligence in Wheat Breeding for Sustainable Food Security. Sustainability 2024, 16(13), 5688-5688. [CrossRef]
  63. Jiayi Fu; S.Q. Zheng; Longjiang Fan; Xiaoming Zheng; Qian Qian. Breeding 5.0: Artificial intelligence (AI)-decoded germplasm for accelerated crop innovation. Journal of Integrative Plant Biology 2025. [CrossRef]
  64. Yukang Zeng; Xiaoming Xu; Jiale Jiang; Shaohang Lin; Zehui Fan; Yao Meng; Ailijiang Maimaiti; Penghao Wu; Jiaojiao Ren. Genome-wide association analysis and genomic selection for leaf-related traits of maize. PLoS ONE 2025, 20(5), e0323140-e0323140. [CrossRef]
  65. My Abdelmajid Kassem. Harnessing Artificial Intelligence and Machine Learning for Identifying Quantitative Trait Loci (QTL) Associated with Seed Quality Traits in Crops. Plants 2025, 14(11), 1727-1727. [CrossRef]
  66. Donghyun Jeon; Yuna Kang; Solji Lee; Se-Hyun Choi; Yeonjun Sung; Tae-Ho Lee; Changsoo Kim. Digitalizing breeding in plants: A new trend of next-generation breeding based on genomic prediction. Frontiers in Plant Science 2023, 14, 1092584-1092584. [CrossRef]
  67. Javier Mendoza-Revilla; Evan Trop; Liam Gonzalez; Maša Roller; Hugo Dalla-Torre; Bernardo P. de Almeida; Guillaume Richard; Jonathan Caton; Nicolás López Carranza; Marcin J. Skwark. A foundational large language model for edible plant genomes. Communications Biology 2024, 7(1), 835-835. [CrossRef]
  68. Andrew Callister; Germano Costa-Neto; Ben P. Bradshaw; Stephen Elms; José Crossa; Jeremy Brawner. Enviromic prediction enables the characterization and mapping of Eucalyptus globulus Labill breeding zones. Tree Genetics & Genomes 2024, 20(1). [CrossRef]
  69. Saulo Fabrício da Silva Chaves; Michelle B. Damacena; Kaio Olimpio G. Dias; Caio Varonill de Almada Oliveira; Leonardo Lopes Bhering. Factor analytic selection tools and environmental feature-integration enable holistic decision-making in Eucalyptus breeding. Scientific Reports 2024, 14(1), 18429-18429. [CrossRef]
  70. Rafael Tassinari Resende; Alencar Xavier; Pedro Italo T. Silva; Marcela Pedroso Mendes; Diego Jarquín; Gustavo Eduardo Marcatti. GIS-based G × E modeling of maize hybrids through enviromic markers engineering. New Phytologist 2024, 245(1), 102-116. [CrossRef]
  71. Parisa Sarzaeim; Francisco Muñoz-Arriola; Diego Jarquín; Hasnat Aslam; Natalia de León. CLIM4OMICS: a geospatially comprehensive climate and multi-OMICS database for maize phenotype predictability in the United States and Canada. Earth system science data 2023, 15(9), 3963-3990. [CrossRef]
  72. Francisco Pinto; Mainassara Zaman-Allah; Matthew Reynolds; Urs Schulthess. Satellite imagery for high-throughput phenotyping in breeding plots. Frontiers in Plant Science 2023, 14, 1114670-1114670. [CrossRef]
  73. Hoa Thi Nguyen; Md. Arifur Rahman Khan; Thuong Thi Nguyen; Nhi Thi Pham; Nguyen Thi Anh Thu; Touhidur Rahman Anik; Minh Nguyen; Mao Li; Kien Huu Nguyen; Uttam Kumar Ghosh. Advancing Crop Resilience Through High-Throughput Phenotyping for Crop Improvement in the Face of Climate Change. Plants 2025, 14(6), 907-907. [CrossRef]
  74. Pengpeng Zhang; Jingyao Huang; Yuntao Ma; Xiujuan Wang; Mengzhen Kang; Youhong Song. Crop/Plant Modeling Supports Plant Breeding: II. Guidance of Functional Plant Phenotyping for Trait Discovery. Plant Phenomics 2023, 5, 0091-0091. [CrossRef]
  75. Idan Sabag; Ye Bi; Maitreya Mohan Sahoo; Ittai Herrmann; Gota Morota; Zvi Peleg. Leveraging genomics and temporal high-throughput phenotyping to enhance association mapping and yield prediction in sesame. The Plant Genome 2024, 17(3), e20481-e20481. [CrossRef]
  76. Claudia Aviles Toledo; Melba M. Crawford; Mitchell R. Tuinstra. Integrating multi-modal remote sensing, deep learning, and attention mechanisms for yield prediction in plant breeding experiments. Frontiers in Plant Science 2024, 15, 1408047-1408047. [CrossRef]
  77. Q. Zou; Shuaishuai Tai; Qi Yuan; Yang Nie; Heping Gou; Longfei Wang; Chuanxiu Li; Jing Yi; Fangchun Dong; Zhen Yue. Large-scale crop dataset and deep learning-based multi-modal fusion framework for more accurate G×E genomic prediction. Computers and Electronics in Agriculture 2024, 230, 109833-109833. [CrossRef]
  78. Freddy Mora-Poblete; Daniel Mieres-Castro; Antônio Teixeira do Amaral Júnior; Matías Balach; Carlos Maldonado. Integrating deep learning for phenomic and genomic predictive modeling of Eucalyptus trees. Industrial Crops and Products 2024, 220, 119151-119151. [CrossRef]
  79. Yang Xu; Wenyan Yang; Jiayong Qiu; Kai Zhou; Guangning Yu; Yu-Xiang Zhang; Xin Wang; Yuxin Jiao; Xinyi Wang; Shujun Hu. Metabolic marker-assisted genomic prediction improves hybrid breeding. Plant Communications 2024, 6(3), 101199-101199. [CrossRef]
  80. Julián García-Abadillo; Paul Adunola; Fernando Silva Aguilar; Jhon Henry Trujillo-Montenegro; John J. Riascos; Reyna Persa; Julio Isidro y Sánchez; Diego Jarquín. Sparse testing designs for optimizing predictive ability in sugarcane populations. Frontiers in Plant Science 2024, 15, 1400000-1400000. [CrossRef]
  81. Nelson Lubanga; Beatrice Elohor Ifie; Reyna Persa; Ibnou Dieng; Ismail Rabbi; Diego Jarquín. Sparse testing designs for optimizing resource allocation in multi-environment cassava breeding trials. The Plant Genome 2025, 18(1), e20558-e20558. [CrossRef]
  82. Rahul Kumar; Sankar Prasad Das; Burhan U. Choudhury; Amit Kumar; Nitish Ranjan Prakash; Ramlakhan Verma; Mridul Chakraborti; Ayam Gangarani Devi; Bijoya Bhattacharjee; Rekha Das. Advances in genomic tools for plant breeding: harnessing DNA molecular markers, genomic selection, and genome editing. Biological Research 2024, 57(1), 80-80. [CrossRef]
  83. Daniel Crozier; Fabian Leon; Jales M. O. Fonseca; Patricia E. Klein; Robert R. Klein; William L. Rooney. Inbred phenotypic data and non-additive effects can enhance genomic prediction models for hybrid grain sorghum. Crop Science 2023, 63(3), 1183-1196. [CrossRef]
  84. Paolo Annicchiarico; Abco J. de Buck; Dimitrios Ν. Vlachostergios; Dennis Heupink; Avraam Koskosidis; Nelson Nazzicari; Margherita Crosta. White Lupin Adaptation to Moderately Calcareous Soils: Phenotypic Variation and Genome-Enabled Prediction. Plants 2023, 12(5), 1139-1139. [CrossRef]
  85. Johanna Åstrand; Firuz Odilbekov; Ramesh R. Vetukuri; Alf Ceplitis; Aakash Chawade. Leveraging genomic prediction to surpass current yield gains in spring barley. Theoretical and Applied Genetics 2024, 137(12), 260-260. [CrossRef]
  86. Zitong Li; Qian-Hao Zhu; Philippe Moncuquet; Iain W. Wilson; Danny Llewellyn; Warwick N. Stiller; Shiming Liu. Quantitative genomics-enabled selection for simultaneous improvement of lint yield and seed traits in cotton (Gossypium hirsutum L.). Theoretical and Applied Genetics 2024, 137(6), 142-142. [CrossRef]
  87. Paul Adunola; E Flores; E. M. Riva-Souza; Maria Amélia Gava Ferrão; João Felipe de Brites Senra; Marcone Comério; Marcelo Curitiba Espíndula; Abraão Carlos Verdin Filho; P. S. Volpi; Aymbiré Francisco Almeida da Fonseca. A comparison of genomic and phenomic selection methods for yield prediction in Coffea canephora. The Plant Phenome Journal 2024, 7(1). [CrossRef]
  88. Alem Gebremedhin; Yongjun Li; Arun S. K. Shunmugam; Shimna Sudheesh; Hossein Valipour Kahrood; Matthew Hayden; Garry M. Rosewarne; Sukhjiwan Kaur. Genomic selection for target traits in the Australian lentil breeding program. Frontiers in Plant Science 2024, 14, 1284781-1284781. [CrossRef]
  89. Alper Adak; Seth C. Murray; José Ignacio Varela; Valentina Infante; Jennifer Wilker; Claudia Irene Calderón; Nithya Subramanian; Natalia de León; Jianming Yu; Matthew A. Stull. Photoperiod associated late flowering reaction norm: Dissecting loci and genomic-enviromic associated prediction in maize. Field Crops Research 2024, 311, 109380-109380. [CrossRef]
  90. Ali Raza; Shanza Bashir; Tushar Khare; Benjamin Karikari; Rhys G. R. Copeland; Monica Jamla; Saghir Abbas; Sidra Charagh; Spurthi N. Nayak; Ivica Djalović. Temperature-smart plants: A new horizon with omics-driven plant breeding. Physiologia Plantarum 2024, 176(1). [CrossRef]
  91. Javaid Akhter Bhat; Xianzhong Feng; Zahoor Ahmad Mir; Aamir Raina; Kadambot H. M. Siddique. Recent advances in artificial intelligence, mechanistic models, and speed breeding offer exciting opportunities for precise and accelerated genomics-assisted breeding. Physiologia Plantarum 2023, 175(4), e13969-e13969. [CrossRef]
  92. Troels Mouritzen; Katharina Meurer; Elesandro Bornhofen; Luc Janss; Martin Weih; Stig Uggerhøj Andersen. Faba bean genetics and crop growth models – progress to date and opportunities for integration. Plant and Soil 2025, 514(1), 47-64. [CrossRef]
  93. Idan Sabag; Ye Bi; Zvi Peleg; Gota Morota. Multi-environment analysis enhances genomic prediction accuracy of agronomic traits in sesame. Frontiers in Genetics 2023, 14, 1108416. [CrossRef]
  94. Rica Amor Saludares; Sikiru Adeniyi Atanda; Lisa Piche; Hannah Worral; Françoise Dalprá Dariva; Kevin McPhee; Nonoy Bandillo. Multi-trait multi-environment genomic prediction of preliminary yield trial in pulse crop. The Plant Genome 2024, 17(3), e20496. [CrossRef]
  95. Osval A. Montesinos-López; Andrew W. Herr; José Crossa; Arron H. Carter. Genomics combined with UAS data enhances prediction of grain yield in winter wheat. Frontiers in Genetics 2023, 14, 1124218. [CrossRef]
  96. Hermann Gregor Dallinger; Franziska Löschenberger; Herbert Bistrich; Christian Ametz; Herbert Hetzendorfer; Laura Morales; Sebastian Michel; Hermann Buerstmayr. Predictor bias in genomic and phenomic selection. Theoretical and Applied Genetics 2023, 136(11), 235. [CrossRef]
  97. Mohsen Yoosefzadeh-Najafabadi; Mohsen Hesami; Milad Eskandari. Machine Learning-Assisted Approaches in Modernized Plant Breeding Programs. Genes 2023, 14(4), 777. [CrossRef]
  98. Pengfei Gao; Haonan Zhao; Zheng Luo; Y.-T. Lin; Wanjie Feng; Yaling Li; Fanjiang Kong; Xia Li; Chao Fang; Xutong Wang. SoyDNGP: a web-accessible deep learning framework for genomic prediction in soybean breeding. Briefings in Bioinformatics 2023, 24(6). [CrossRef]
  99. Elżbieta Wójcik-Gront; Bartłomiej Zieniuk; Magdalena Pawełkowicz. Harnessing AI-Powered Genomic Research for Sustainable Crop Improvement. Agriculture 2024, 14(12), 2299. [CrossRef]
  100. Chaokun Yan; Jiabao Li; Qi Feng; Junwei Luo; Huimin Luo. ResDeepGS: A deep learning-based method for crop phenotype prediction. Methods 2025, 244, 65-74. [CrossRef]
  101. Rita Dublino; Maria Raffaella Ercolano. Artificial intelligence redefines agricultural genetics by unlocking the enigma of genomic complexity. The Crop Journal 2025, 13(5), 1350-1362. [CrossRef]
  102. Abu Saleh Muhammad Saimon; Mohammad Moniruzzaman; Md Shafiqul Islam; M. Ahmed; Md. Mizanur Rahaman; Sazzat Hossain; Mia Md Tofayel Gonee Manik. Integrating Genomic Selection and Machine Learning: A Data-Driven Approach to Enhance Corn Yield Resilience Under Climate Change. Journal of Environmental and Agricultural Studies 2023, 4(2), 20-27. [CrossRef]
  103. Rajib Roychowdhury; Soumya Prakash Das; Amber Gupta; Parul Parihar; K. Chandrasekhar; Umakanta Sarker; Ajay Kumar; Devade Pandurang Ramrao; C. Sudhakar. Multi-Omics Pipeline and Omics-Integration Approach to Decipher Plant’s Abiotic Stress Tolerance Responses. Genes 2023, 14(6), 1281. [CrossRef]
  104. Xiaoming He; Danning Wang; Yong Jiang; Meng Li; Manuel Delgado-Baquerizo; Chloee M. McLaughlin; Caroline Marcon; Li Guo; Marcel Baer; Yudelsy Antonia Tandrón Moya. Heritable microbiome variation is correlated with source environment in locally adapted maize varieties. Nature Plants 2024, 10(4), 598-617. [CrossRef]
  105. Jon Bančič; Philip B. Greenspoon; R. Chris Gaynor; Gregor Gorjanc. Plant breeding simulations with AlphaSimR. Crop Science 2024, 65(1). [CrossRef]
  106. Nathan Fumia; Ramakrishnan M. Nair; Ya-Ping Lin; Cheng-Ruei Lee; Hung-Wei Chen; Eric von Wettberg; Michael B. Kantar; Roland Schafleitner. Leveraging genomics and phenomics to accelerate improvement in mungbean: A case study in how to go from GWAS to selection. The Plant Phenome Journal 2023, 6(1). [CrossRef]
  107. Danuta Cembrowska-Lech; Adrianna Krzemińska; Tymoteusz Miller; Anna Nowakowska; Cezary Adamski; Martyna Radaczyńska; Grzegorz Mikiciuk; Małgorzata Mikiciuk. An Integrated Multi-Omics and Artificial Intelligence Framework for Advance Plant Phenotyping in Horticulture. Biology 2023, 12(10), 1298. [CrossRef]
  108. Andrew W. Herr; Alper Adak; Matthew E. Carroll; Dinakaran Elango; Soumyashree Kar; Changying Li; Sarah E. Jones; Arron H. Carter; Seth C. Murray; Andrew H. Paterson. Unoccupied aerial systems imagery for phenotyping in cotton, maize, soybean, and wheat breeding. Crop Science 2023, 63(4), 1722-1749. [CrossRef]
  109. Sizhe Xu; Xingang Xu; Qingzhen Zhu; Meng Yang; Guijun Yang; Haikuan Feng; Min Yang; Qilei Zhu; Hanyu Xue; Binbin Wang. Monitoring leaf nitrogen content in rice based on information fusion of multi-sensor imagery from UAV. Precision Agriculture 2023, 24(6), 2327-2349. [CrossRef]
  110. David Hobby; Hao Tong; Marc C. Heuermann; Alain J Mbebi; Roosa A. E. Laitinen; Matteo Dell’Acqua; Thomas Altmann; Zoran Nikoloski. Predicting plant trait dynamics from genetic markers. Nature Plants 2025, 11(5), 1018-1027. [CrossRef]
  111. Adnan Amin; Wajid Zaman; SeonJoo Park. Harnessing Multi-Omics and Predictive Modeling for Climate-Resilient Crop Breeding: From Genomes to Fields. Genes 2025, 16(7), 809. [CrossRef]
  112. Shameela Mohamedikbal; Hawlader Abdullah Al-Mamun; Mitchell Bestry; Jacqueline Batley; David Edwards. Integrating multi-omics and machine learning for disease resistance prediction in legumes. Theoretical and Applied Genetics 2025, 138(7), 163. [CrossRef]
  113. Ce Liu; Shengli Du; Aimin Wei; Zhihui Cheng; Huanwen Meng; Yike Han. Hybrid Prediction in Horticulture Crop Breeding: Progress and Challenges. Plants 2024, 13(19), 2790. [CrossRef]
  114. Meng Geng; Søren K. Rasmussen; Cecilie S. L. Christensen; Weiyao Fan; Anna Maria Torp. Molecular breeding of barley for quality traits and resilience to climate change. Frontiers in Genetics 2023, 13, 1039996. [CrossRef]
  115. Cristiana Paina; Per L. Gregersen. Recent advances in the genetics underlying wheat grain protein content and grain protein deviation in hexaploid wheat. Plant Biology 2023, 25(5), 661-670. [CrossRef]
  116. Naveen Puppala; Spurthi N. Nayak; Álvaro Sanz-Sáez; Charles Chen; Mura Jyostna Devi; Nivedita Nivedita; Yin Bao; Guohao He; Sy M. Traore; David A. Wright. Sustaining yield and nutritional quality of peanuts in harsh environments: Physiological and molecular basis of drought and heat stress tolerance. Frontiers in Genetics 2023, 14, 1121462. [CrossRef]
  117. Marlon-Schylor L. le Roux; K. Kunert; Christopher A. Cullis; Anna-Maria Botha. Unlocking Wheat Drought Tolerance: The Synergy of Omics Data and Computational Intelligence. Food and Energy Security 2024, 13(6). [CrossRef]
  118. Keo Corak; Rue K. Genger; Philipp W. Simon; Julie C. Dawson. Comparison of genotypic and phenotypic selection of breeding parents in a carrot (Daucus carota) germplasm collection. Crop Science 2023, 63(4), 1998-2011. [CrossRef]
  119. Prabhu Govindasamy; Senthilkumar Muthusamy; Muthukumar Bagavathiannan; Jake Mowrer; Prasanth Tej Kumar Jagannadham; Aniruddha Maity; Hanamant M. Halli; G. K. Sujayananad; Rajagopal Vadivel; Das T. K. Nitrogen use efficiency—a key to enhance crop productivity under a changing climate. Frontiers in Plant Science 2023, 14, 1121073. [CrossRef]
  120. Vishnu Ramasubramanian; Cleiton Antônio Wartha; Lovepreet Singh; Paolo Vitale; Sushan Ru; Siddhi J. Bhusal; Aaron J. Lorenz. GS4PB: An R Shiny application to facilitate a genomic selection pipeline for plant breeding. The Plant Genome 2025, 18(4), e70150. [CrossRef]
  121. Mohammad Muzahidur Rahman Bhuiyan; Inshad Rahman Noman; M. A. Aziz; Md Mizanur Rahaman; Md Rashedul Islam; Mia Md Tofayel Gonee Manik; Kallol Das. Transformation of Plant Breeding Using Data Analytics and Information Technology: Innovations, Applications, and Prospective Directions. Frontiers in Bioscience-Elite 2025, 17(1), 27936. [CrossRef]
  122. Aubrey Streit Krug; Emily B. M. Drummond; David L. Van Tassel; Emily Warschefsky. The next era of crop domestication starts now. Proceedings of the National Academy of Sciences 2023, 120(14). [CrossRef]
Table 4. Illustrative audit of uncertainty and deployment reporting across representative recent studies.
Main modality / model type | Uncertainty reported? | Ranking stability reported? | Compute burden reported? | Sensing / data-acquisition burden discussed? | Deployment stage explicit?
Environmental covariates + MegaLMM [37] | No | Partial | No | No | Yes
Engineered envirotyping + tree-based ML [34] | No | No | Yes | No | Yes
Daily environmental sequences + deep learning [44] | No | No | Partial | No | Yes
AutoML with environmental feature reduction [45] | No | No | No | Partial | Partial
Bias analysis in genomic vs phenomic selection [96] | Partial | No | No | No | Partial
UAS phenomics + genomic prediction [59] | No | Partial | Yes | Partial | Yes
Temporal high-throughput phenotyping + longitudinal GP [75] | No | Partial | No | Partial | Yes
Note: Studies were selected to span environmental, multimodal, and bias-sensitive settings, covering the main model classes and data modalities discussed in the review; the table is illustrative rather than exhaustive. Coding rules were intentionally conservative. Uncertainty reported = confidence interval, posterior interval, prediction variance, or another formal uncertainty summary explicitly reported. Ranking stability reported = top-k overlap, rank correlation across resamples or environments, or explicit stability analysis. Compute burden reported = runtime, memory, hardware requirement, or training-cost information. Sensing / acquisition burden discussed = explicit discussion of phenotyping logistics, flight frequency, acquisition cost, or availability constraints. Partial = mentioned qualitatively but not quantified.
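The ranking-stability criterion above refers to quantities such as top-k overlap and rank correlation across resamples. As a minimal sketch of how these summaries can be computed (the function and variable names are illustrative, not drawn from any of the audited studies), the following assumes a matrix of predicted genetic values with one row per resample:

```python
import numpy as np
from itertools import combinations

def top_k_overlap(pred_a, pred_b, k):
    """Fraction of the top-k candidates shared by two prediction vectors."""
    top_a = set(np.argsort(pred_a)[::-1][:k])
    top_b = set(np.argsort(pred_b)[::-1][:k])
    return len(top_a & top_b) / k

def spearman_rho(pred_a, pred_b):
    """Spearman rank correlation as Pearson correlation of ranks
    (no tie handling; adequate for continuous predictions)."""
    ra = np.argsort(np.argsort(pred_a))
    rb = np.argsort(np.argsort(pred_b))
    return np.corrcoef(ra, rb)[0, 1]

def ranking_stability(preds, k=20):
    """Mean pairwise top-k overlap and rank correlation across resamples.

    `preds` is an (n_resamples, n_genotypes) array of predicted values.
    """
    overlaps, rhos = [], []
    for a, b in combinations(range(len(preds)), 2):
        overlaps.append(top_k_overlap(preds[a], preds[b], k))
        rhos.append(spearman_rho(preds[a], preds[b]))
    return float(np.mean(overlaps)), float(np.mean(rhos))
```

High overlap with modest rank correlation (or the reverse) is informative on its own: selection decisions depend on the top of the ranking, not on global agreement.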
Table 5. Matrix-Style Deployment Framework for AI-Enabled Multi-Environment Prediction [10,21,28,34,37,59,79].
Breeding stage / operational use case | Typical candidate number | Data realistically available at decision time | Recommended validation split | Model families realistic for the stage | Main decision target
Early preselection / untested genotypes in mostly familiar contexts | 1,000–50,000+ | Markers, pedigree, family structure, coarse site-year labels, sometimes basic historical environment summaries | Family-aware CV or untested-genotype-in-tested-environment splits | GBLUP, simple G×E terms, reaction norms, tree models only when strong covariates already exist | Cull lines and prioritize retention
Sparse testing across METs / recovering missing G×E cells | 200–10,000 | Markers, historical environmental covariates, trial history, partial phenotype matrices, possibly stage-aware envirotyping summaries | Leave-site-year-out, leave-one-environment-out, or sparse-testing mask recovery | Factor-analytic models, reaction norms, MTME, engineered-feature ML, environment-aware mixed models | Fill missing trial cells and support advancement
Late-stage regional recommendation / placement and advancement decisions | 20–1,000 | Markers, site histories, richer environmental profiles, partial phenomics, management context, sometimes current-season UAS or sensor data | Leave-year-out or region holdout with explicit ranking-stability checks | Multimodal fusion, interpretable DL, hybrid biological-statistical models, phenomics-augmented GP when timing is honest | Placement, regional recommendation, product advancement
Untested genotype in untested environment / hard extrapolation | Case-specific | Markers plus dense environmental histories; phenomics only if available before the decision | Joint genotype-and-environment withholding with strict temporal and relatedness control | Reaction norms with strong envirotyping; sequence-based DL only when data scale and diversity justify it | Stress-test transportability and quantify decision risk
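The validation splits recommended above can be made concrete with a few lines of code. The sketch below, using a hypothetical long-format trial table (column names `genotype`, `environment`, and `yield_t_ha` are invented for illustration), constructs a leave-one-environment-out split and a joint genotype-and-environment withholding split of the kind required for the untested-genotype-in-untested-environment scenario:

```python
import pandas as pd

# Hypothetical long-format trial table: one row per genotype-environment record.
trials = pd.DataFrame({
    "genotype":    ["G1", "G1", "G2", "G2", "G3", "G3"],
    "environment": ["E1", "E2", "E1", "E2", "E1", "E2"],
    "yield_t_ha":  [5.1, 4.3, 5.6, 4.0, 4.8, 4.9],
})

def leave_one_environment_out(df, env_col="environment"):
    """Yield (held-out environment, train rows, test rows) for each environment."""
    for env in df[env_col].unique():
        test = df[df[env_col] == env]
        train = df[df[env_col] != env]
        yield env, train, test

def joint_withholding(df, test_genos, test_envs,
                      geno_col="genotype", env_col="environment"):
    """Untested-genotype-in-untested-environment split: the test set pairs
    held-out genotypes with held-out environments, and the training set
    excludes both, so no information about either reaches the model."""
    in_test = df[geno_col].isin(test_genos) & df[env_col].isin(test_envs)
    in_train = ~df[geno_col].isin(test_genos) & ~df[env_col].isin(test_envs)
    return df[in_train], df[in_test]
```

Note that joint withholding discards the mixed cells (held-out genotype in a training environment, and vice versa) entirely; keeping them in the training set would turn the hard extrapolation case back into one of the easier scenarios. The temporal and relatedness controls recommended in the table (e.g., excluding close relatives of test genotypes from training) would be applied on top of this skeleton.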
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.