Ideas in Genomic Selection with the Potential to Transform Plant Molecular Breeding: A Review

Estimation of breeding values through Best Linear Unbiased Prediction (BLUP) using pedigree-based kinship and Marker-Assisted Selection (MAS) are the two fundamental breeding methods used before and after the introduction of genetic markers, respectively. The emergence of high-density genome-wide markers has led to the development of two parallel series of approaches inspired by BLUP and MAS, which are collectively referred to as Genomic Selection (GS). The first series of GS methods alters pedigree-based BLUP by replacing pedigree-based kinship with marker-based kinship in a variety of ways, including weighting markers by their effects in genome-wide association study (GWAS), joining both pedigree and marker-based kinship together in a single-step BLUP, and substituting individuals with groups in a compressed BLUP. The second series of GS methods estimates the effects for all genetic markers simultaneously. For the second series methods, the marker effects are summed together regardless of their individual significance. Instead of fitting individuals as random effects like in the BLUP series, the second series fits markers as random effects. Differing assumptions regarding the underlying distribution of these marker effects have resulted in the development of many Bayesian-based GS methods. This review highlights critical concept developments for both of these series and explores ongoing GS developments in machine learning, multiple trait selection, and adaptation for hybrid breeding. Furthermore, considering the increasing use and variety of GS methods in plant breeding programs, this review addresses important concerns for future GS development and application, such as the use of GWAS-assisted GS, the long-term effectiveness of GS methods, and the valid assessment of prediction accuracy.


I. INTRODUCTION
Globally, crops represent 55% of calories and 40% of protein for human consumption, and 36% of calories and 53% of protein for animal feed (Cassidy et al. 2013). Many crops are also sources for industrial material and energy supply, such as maize for ethanol. The inevitable growth of the world's population demands a 50% increase in food production over the next three decades. As evidenced by past achievements, the development of crop varieties through breeding will continue to play a critical role in boosting crop production.
Before applications using genetic markers, traditional plant breeding relied on phenotyping to identify superior genetic lines. The introduction of genetic markers allowed breeders to partially eliminate phenotyping and conduct selection at an early stage with high selection accuracy. One such application is Marker-Assisted Selection (MAS), which is effective for simple traits controlled by major genes. For complex traits controlled by multiple genes, assessment of breeding values of genetic lines has to incorporate genetic markers spanning the entire genome, termed Genomic Selection (GS). In 1994, Rex Bernardo achieved an early milestone for the development of GS by replacing pedigree-based kinship (Henderson 1975b(Henderson , 1975a(Henderson , 1976a with a marker-based kinship estimated using RFLP (Restriction Fragment Length Polymorphism) markers spanning the entire maize genome (Bernardo 1994). Instead of using only the markers associated with traits of interest, as in MAS, GS uses all the genomic markers to define the kinship among individuals.
This marker-based kinship has the advantage of capturing Mendelian sampling errors and segregation distortion that were not able to be identified using pedigree-based kinship. As Best Linear Unbiased Prediction (BLUP) is used to evaluate the genomic value of recombinant inbred lines, the method has come to be known as genomic BLUP, or gBLUP (VanRaden 2008;Zhang et al. 2007).
Compared with MAS, which identifies genetic markers that co-segregate with causal genes and have statistically significant effects on traits of interest, GS uses all the available markers across a genome, regardless of whether they have significant effects or not. Inspired by MAS, the second milestone of GS was introduced by Meuwissen et al. in 2001, which first estimates the effects of all markers in a parental generation, then sums them together to predict performance of future progeny. These markers are treated as random effects. Although it is commonly understood that the random effects follow normal distributions, a variety of assumptions on the prior variance distribution have prompted the development of different methods by which the Bayesian theorem has been used, resulting in a series of Bayesian alphabet-named methods. These methods include Bayes A, where markers are assumed to have different variances, and Bayes B, where a certain proportion of markers are restricted to have no effect. Similar to the Bayesian alphabet series, kinship-based BLUP methods have also evolved into their own BLUP alphabet series, including combining marker-based kinship and pedigree-based kinship in a single-step BLUP (ssBLUP) and predicting individuals' genetic effects based on their corresponding groups using kinship compression (compressed BLUP, or cBLUP).
The development of GS methods and their application remains one of the most active areas in genomic research. These developments have been reviewed multiple times over the last 25 years.
This review highlights the critical ideas that are in the process of transforming plant molecular breeding. First, we review the method developments to improve selection accuracy, including the first implementation of gBLUP in maize, the expansion from MAS to GS Bayesian methods, the development of machine learning methods, and the use of Genome-Wide Association Studies (GWAS) for GS. Secondly, we review three factors that influence selection responses, including hybrid breeding, enhancement using multiple traits, and preservation of variation for long-term selection. Finally, we review the pitfalls of the assessment of prediction accuracy and discuss the transformation of plant breeding as a result of GS.

II. BLUP ALPHABET
The mixed linear model (MLM) equation invented by Charles Henderson in 1976(Henderson 1976) combines random and fixed effects in a single equation to simultaneously calculate best linear unbiased predictions (BLUPs) for random effects and best linear unbiased estimates for fixed effects. Compared to the general linear model, which contains only fixed effects except for the residual, the MLM contains both the random residual and random animal effects. In many cases, the levels of random effects are equal to or larger than the number of observations. To avoid the overfitting problem, the MLM equation was built on the optimization of likelihood, instead of the least squared error used in general linear model. The maximum likelihood approach also allows the random effects to have a variance structure. In the case where random variables represent individual additive genetic effects, the kinship among individuals can be derived from pedigrees using the definition developed by Sewall Wright (Wright 1922). In 1976, Henderson developed an algorithm to build the inverse of the numerator relationship matrix (A -1 ) directly from a pedigree (Henderson 1976) in computing time that is linear to the number of individuals ( Figure 1). Figure 1. Best Linear Unbiased Prediction and its deviations for genomic selection. Individual phenotypes (y) are decomposed into their genetic effects (u) and residuals (e). Under decomposition of phenotypes (y) of individuals into genetic effects (u) and residuals (e), Dr.
Henderson developed the mixed linear model equation in the 1960s to directly solve for u as the Best Linear Unbiased Prediction of breeding values by using the pedigree-based kinship to define the variance structure of the genetic effects. The pedigree-based kinship was substituted with the marker-based kinship by Dr. Bernardo in 1994. Individuals' genetic effects were predicted indirectly by summing all the marker effects (si) following normal distribution with mean of zero and variance of for i=1 to n covering the genome (Meuwissen et. al., 2001). Since then, the direct BLUP methods have been evolved into multiple formats, including the prediction of individuals' genetic effects based on their group effects (Wang et.al., 2019).

6
The MLM can also contain random effects for individuals that do not have observations, which makes it possible to evaluate the genetic merit of bulls from their daughters in animal breeding. Before genomic selection, the BLUP method, in conjunction with pedigree, served as the primary method in animal breeding, including dairy and beef cattle, sheep, chicken, and pig.
Many free software packages have been developed, e.g., VCE (Groeneveld 1994), MTDFREML, DMU, ASReml, and BLUPF90 (Misztal et al. 2002). Since crops rarely perform differently due to sex, and more importantly, because plant breeding relies on the selection among siblings which cannot be differentiated by pedigree, BLUP was not used as much in crops as in livestock until genetic markers covering the entire genome became available. In 1994, Bernardo used genomewide RFLP markers to define kinship among maize lines to evaluate their genetic merits using BLUP (Bernardo 1994). Marker-based kinship has advantages over pedigree-based kinship, because it captures the realized kinship, which includes factors that are not part of pedigree-based kinship, including Mendelian sampling error and segregation distortion. However, similarities between the two approaches have allowed many software packages originally developed for pedigree-based kinship to be extended to include marker-based kinship with minor modifications ( Figure 2). In 2007, marker-based kinship was added into MTDFREML as the second way to define kinship among individuals (Zhang et al. 2007). A more efficient algorithm was developed (VanRaden 2008) by multiplying the genotype matrix and its transpose and adjusting for allele frequency: where G is a marker-based kinship (genomic relationship matrix), W is a matrix of centralized genotypes, and allele frequency is noted as . In 2011, gBLUP with the efficient kinship algorithm was implemented in an R package, rrBLUP (Endelman 2011).
Since it is often not possible to genotype every individual, especially ancestral individuals, a single-step BLUP (or ssBLUP) method was developed that uses both marker-based kinship and pedigree-based kinship simultaneously (Aguilar et al. 2010;Christensen and Lund 2010;Legarra et al. 2009). The inverse of the joint kinship (H) is estimated by using an arbitrary weight ( ) between 0 and 1: 7 where A22 is the pedigree relationship matrix among genotyped animals. Studies demonstrated that this method had better prediction accuracy than BLUP using pedigree alone (Chen et al. 2011).
Since then, ssBLUP has dominated the application of genomic selection in animal breeding Zhang et al. 2016).
Figure 2. Direct and indirect estimation of genomic breeding values. When n individuals' phenotypes (y) are decomposed into a black box and residuals (e), the black box has to be parameterized to make it most likely to have the observed phenotypes for the given parameters. The direct approaches specify a genetic variance ( ) to define the distribution of the individuals' genetic effects by incorporating kinship (K) derived from pedigree or genetic markers. The indirect approaches estimate the effects of genetic markers (g) first and then sum them ( ) to estimate genomic breeding values indirectly (note that the symbols with and without hat represent the estimate and the true effect, respectively). When K is defined as M'M and marker effects have independent normal distribution with mean of zero and a variance ( ), the direct and indirect approaches give the same estimates of genomic breeding values.
Before genomic BLUP gained popularity, a different approach developed by Meuwissen et al. in 2001 boosted the usage of all genetic markers, as opposed to MAS, which uses only the statistically significant markers. The 2001 study fit all markers as a random effect and summed them together to predict the genetic merit of individuals. These marker effects were assumed to follow a normal distribution with zero mean and variances. When the variances are assumed to be a constant, the model become a ridge regression. When the variances are assumed to be random variables that follow an inverse Chi-square distribution, the model has to be solved using Bayesian methods. It was demonstrated that ridge regression gives a prediction identical to the gBLUP when the efficient kinship algorithm is used (VanRaden 2008).
Since the random effects in MLM are specific for a trait of interest, their variance structure should also be trait specific. With pedigree kinship, the specification is limited; however, markerbased kinship can be specific for a trait of interest. Instead of having all the genetic markers contribute equally to the marker-based kinship, these markers can be weighted for a particular trait based on their effects estimated by GWAS. This modified marker-based kinship method was named TABLUP (Zhang et al. , 2011. After the introduction of the SUPER GWAS method , TABLUP was further specialized for traits controlled by a small number of genes. In this specialization, kinship is built using only the associated markers determined by the maximum likelihood method. Consequently, the method for genomic selection was named SUPER BLUP, or sBLUP. Both real data and simulations demonstrated that SUPER BLUP was superior to regular gBLUP for traits controlled by a small number of genes (Tang et al. 2016;Jiabo Wang et al. 2018).
Another modified marker-based kinship method considers hierarchical population structures that involve clustered groups of closely related individuals. In this case, individuals can cluster into a closely related group, which is at a greater distance from another group. For traits with low heritability, the phenotypic deviation of an individual from the population mean may be dominated by error rather than by the genetic difference. However, the phenotypic average across individuals in a group can be much closer to the genetic average of the group than the individual phenotype is to the individual genetic value. This group genetic value can be a better representation of individual genetic value than the prediction of individual genetic value based on individual phenotype. Simulations demonstrated that GWAS using groups as random effects for background control identified more causal mutations than when using individuals as random effects in an MLM . Since individuals are compressed into groups for GWAS, the method was named compressed MLM. For GS, the method was similarly called compressed BLUP (cBLUP).
Tests with both real and simulated data have demonstrated that cBLUP is superior to gBLUP and sBLUP for traits with low heritabilities ) and populations with this hierarchical structure.
There is only one random effect in addition to the residual effect in all BLUP methods reviewed above, including the traditional BLUP based on pedigree relationships, regular gBLUP based on genomic relationships derived from all genetic markers, single-step BLUP combining pedigree-based and marker-based kinships together, TABLUP with markers weighted by GWAS, SUPER BLUP with kinship derived from associated markers only, and compressed BLUP with individuals predicted by their corresponding groups. In such cases, likelihood is a function of the ratio (λ) between residual variance and the genetic variance using the efficient mixed-model association (EMMA) algorithm (Kang et al. 2008). The ratio can be expressed as a function of heritability: λ = . Interval search can be efficiently used to find the heritability that maximizes the likelihood by varying heritability from 0 to 1 with limited intervals. The EMMA algorithm has been implemented in rrBLUP for gBLUP (Endelman 2011), and in GAPIT for gBLUP, SUPER BLUP, and compressed BLUP (Tang et al. 2016).

III. BAYESIAN ALPHABET
When available markers cover an entire genome, the marker based-kinship can be transformed in a way that is not feasible for pedigree-based kinship. The total additive genetic effects of a trait can be decomposed into the sum of individual marker effects ( Figure 2). The milestone research by Meuwissen et al. documented this decomposition (Meuwissen et al. 2001). To solve the "large p, small n" problem (i.e., more parameters than observations), genetic markers were treated as random effects. If these markers are fitted as fixed effects, it causes overfitting in the fixed-effect model. These marker effects were assumed to follow a normal distribution with zero mean and an unknown variance. In the 2001 study, one approach was to set the unknown variance as a constant, which turned the genomic prediction problem into a well-established statistical problem known as ridge regression (Figure 3). Markers are fitted simultaneously as random effects following normal distribution with mean of zero and variances that vary across methods. Ridge regression uses a constant (black) for all the variances. The variances are different across all markers and follow a prior distribution (gray) in Bayes A method. Bayes A turns to Bayes B when some of the variances are restricted to be zero for part of marker ( ). Bayes B turns to Bayes CPi when the non-zero variances are restricted to be the same.
In reality, it is rarely true that all genetic marker effects follow the same distribution, as assumed by the ridge regression method. On the other hand, treating these unknown variances as random variables also needs restriction to eliminate potential model overfitting. The Bayesian approach effectively solves this problem by setting the distribution hierarchy (Tipping and Faul 2003). For example, each marker effect can follow a different normal distribution with its variance parameter. These variances are then required to follow a single inverse Chi-square distribution, governed by the hyperparameter prior distribution. This restriction for the prior distribution of variances was named Bayes A ( Figure 3).
The Bayesian approach can also accommodate a variety of different prior distributions.
Ridge regression has the highest restrictions on both marker effects and hyperparameters. All the marker effects follow the same normal distribution. The variance is a constant with a flat prior distribution domain from negative infinity to positive infinity. In contrast, Bayes A has the least restriction. Each marker has its own distribution with a distinct variance. Bayes A can be modified to restrict a proportion () of markers to have no effect. This modified method is termed Bayes B.
Similarly, Bayes B can be further modified to create another method (Cπ) by restricting all nonzero marker effects to have the same normal distribution with variances following an inverse Chisquare distribution ( Figure 3). Comparisons were conducted for the Bayesian methods developed between 2001 and 2011 (Kärkkäinen and Sillanpää 2012), including Bayesian Lasso, in which the inverse Chi-square distribution from Cπ is changed to a Laplace distribution (double exponential).
For ridge regression, there is only one random effect besides the residual. The variance components can be explicitly estimated using algorithms such EMMA (Kang et al. 2008).
Bayesian methods have more parameters than ridge regression. In practice, the parameter estimation is conducted by sampling. Let y denote the data (e.g., phenotype and genotype), θ denotes the parameters (e.g., marker effects and variance parameters of marker effect distribution), (y) denotes the probability density of y (constant), (θ) denotes the probability density for θ (prior), ( |θ) denotes the likelihood of y given θ, (θ|y) denotes the probability density of θ given y (posterior). With enough samples of the posterior, parameters for the posterior distribution can be inferred by using sample means. The problem is that the posterior distribution is hard to get, but Bayesian transformation can be used to transform a hard problem to an easier problem: The equation transforms posterior (θ|y) into the product of an easy-to-solve probability density of y given θ and a prior distribution which represents a belief. For joint distribution of multiple

IV. MACHINE LEARNING
The genomic BLUP method introduced by Bernardo in 1994 can also be interpreted as a special case of machine learning. The kinship matrix is termed "kernel" in the field of engineering. A kinship matrix K can be multiplied by itself to derive a new kernel K 2 =K'K. The process can be iterated until there are no further changes with multiplication. During the matrix multiplication process toward equilibrium, the optimum level of multiplication can be determined by the typical training process of machine learning, i.e., by dividing the entire population into training and testing populations. The recent application of machine learning in genomic prediction has expanded to include many domains of machine learning, including neural network or deep learning, which has been compared with other machine learning methods in crops like wheat. Only a few methods of machine learning for genomic prediction, including random forest, gradient descent, and deep learning, will be reviewed here.

Principles of Supervised and Deep Learning Methods
Machine learning (ML) is a domain of computer science that uses algorithms to gain the ability to automatically learn and improve from experience without being explicitly programmed (Samuel 1959). Some of the main advantages of ML methods over conventional statistical methods in large genomic data analysis include the following: (1) they have the ability to deal with "large p, small n" problems; (2) they are black-box approaches that do not require any prior knowledge about distributions of responsible variables or underlying genetic models affecting traits of interest; (3) they can take multiple interactions or correlations among features into account; and (4) they can provide high prediction accuracy due to both training and validation procedures built into the processes of individual algorithms that allow users to predefine training and validation datasets, or allow ML to apply a randomly assigned cross-validation approach to a large population for ML can be divided into supervised and unsupervised methods. Supervised learning involves using labelled input training data (e.g., SNPs or sequence variants and phenotypes) to learn a prediction function either for regression or classification of a response variable, then examining the prediction error of the derived function using validation data. A prediction error consists of two parts: a variance and a bias. The goal of a learning algorithm is to minimize both bias and variance. Examples of supervised learning methods include Random Forest (Breiman 2001) and Gradient Boosting Machine (GBM) (Friedman 2001). Both Random Forest and GBM are decision tree-based ensemble methods. The main difference between Random Forest and GBM is that decision trees in Random Forest are independently generated with replacement and prediction error of an individual feature (e.g., a SNP) indicated by a variable importance measurement value, which is calculated by averaging prediction errors of all the decision trees that contain the specific feature. GBM proceeds through a step-wise assembly of many "weak learners" (e.g., small subsets of SNPs) to build a predictive model, i.e., the output of the later trees relies heavily on the results from previously sampled trees ).
Unsupervised learning involves discovering existing patterns (e.g., groups or associations) among input data variables themselves without information from an output response variable (Hastie et al. 2009). Examples of unsupervised learning methods include principal component analysis and cluster analysis, two popular pattern identification methods that are used widely for high dimensionality omics data analysis. Unsupervised learning can provide features for GS but cannot directly be used for GS. For example, principal components derived from genetic markers are important features to predict genetic potential (Dong et al. 2018).
Deep learning is another family of ML methods that can be used for both supervised and unsupervised learning. It derives representative information from input data by using a hierarchy of multiple layers of units (neurons). Each neuron computes a weighted sum of its inputs, and the

Application of Supervised Machine Learning in Genomic Prediction
To date, studies using supervised ML methods and large SNP chips in genomic prediction have produced mixed results in livestock and plant species, depending on several factors. Some of these include presence or absence of population structure, additive and epistatic nature of phenotypes, ML methods used, data quality (e.g., the amount of missing data), and representation of training with two conventional statistical methods (gBLUP and a two-step sequential method). The study concluded that "for almost all the phenotypes considered, standard machine learning methods outperformed the methods from classical statistical genetics" (Grinberg et al. 2019). Regarding individual methods, GBM performed very well for traits with greater complexity. In the presence of population structure, gBLUP was the best. However, in the presence of missing data, Random Forest was the most robust (Grinberg et al. 2019).
General consensus is that for moderate to high heritability traits that are driven by additive genetic models, there is no additional benefit to using machine learning methods for genomic prediction in comparison to conventional gBLUP or Bayesian methods (

Application of Deep Learning in Genomic Prediction
Early studies using shallow neural networks (e.g., single-layer networks) for nonparametric prediction of complex traits in plant and animal breeding suggested that neural networks could give high prediction accuracy (Gianola et al. 2011;González-Camacho et al. 2012;González-Camacho et al. 2016;Pérez-Rodríguez et al. 2012

Challenges and Opportunities for Future ML Applications in Breeding Programs
Genomic selection is revolutionizing genetic improvement programs. However, it still has difficulty predicting hard-to-measure and low-heritability traits (e.g., disease resistance traits). The digital revolution and high-throughput genomic technologies provide a new era for future smart breeding programs that require drawing real-time large data information from biological (e.g., genomics), environmental (including climate records), and management sources to accurately predict future individual performance. A great challenge here is how to integrate multiple sources of data, especially complex genetics-by-environment-by-management (GxExM) interactions, into an efficient prediction model. Systematic approaches that combine multiple types of omics information with environment and other sources are needed to predict complex phenotypes (Misra et al. 2018;Sperschneider 2019

V. GWAS-ASSISTED GENOMIC SELECTION
Molecular breeding started from single-marker applications and MAS still plays a key role in practice. Using combined validated and newly discovered markers has the potential to boost the prediction accuracy of genomic selection. Both theoretical and practical studies have been conducted to investigate the impact of incorporating these markers.
One of the key factors contributing to the success of GS is its use of genome-wide markers to capture the signals of both large-and small-effect causal mutations underlying traits (Lipka et al. 2015). Since GS methods can also be used for GWAS, the top-ranked markers from GWAS have been subsequently used for GS in two steps of ridge regression (Resende et al. 2012). As sequencing technology continues to improve, the likelihood of these causal mutations being included in the GS model increases. The direct benefit is to retain prediction accuracy over generations, because linkage disequilibrium breakdown between closely linked markers and causal mutations should theoretically become less frequent (Weller and Ron 2010). The indirect benefit is the increased number of validated mutations that provide options for conventional GS models.
In addition to integrating GWAS and GS in one process, such as in SUPER BLUP, there are several ways to conduct GWAS first and then merge the GWAS results into the GS process.

Various Approaches to Incorporate Validated Mutations
Considering the improvements in sequencing technology, various approaches for the specific incorporation of causal mutations into GS models have been explored (Figure 4). First, associated markers were incorporated as fixed effects (Figure 4d). A simulation study demonstrated that it improved prediction accuracy in the RR-BLUP model to include quantitative trait nucleotides, each explaining at least 10% of the genetic variance (Bernardo 2014). Real traits also demonstrated the same trend (Odilbekov et al. 2019). interest by using a genome-wide threshold, the red line in (a) from Genome-Wide Association Study (GWAS). The associated markers may (red arrows) or may not (black arrows) be genetically linked to the causal genes indicated by the dashed vertical lines. Some of the associated markers may not be used for MAS, as they are close to markers with stronger association signals (black dashed arrow). Genomic Selection (GS) uses all markers to define the kinship among individuals to predict their breeding values either by weighting all the markers equally (b) or by their strength of association (c). Other variations include incorporating the associated SNPs as fixed effects (d), or random effects (e) in the regular GS (b). Another variation treats the fragments with associated SNPs and the rest of the fragments as different random effects indicated by different levels of gray (f).
Second, the standard GS model was extended to include an additional random effect for the causal mutations (Brøndum et al. 2015). The additional random effect has its own kernel derived from the markers at the peaks in a genome-wide association study ( Figure 4e).
Third, the kernel, the genomic relationship matrix in the conventional GS model, was modified to put more weight on causal markers (Fragomeni et al. 2017). Such a genomic relationship matrix was derived from a weighting approach, where casual mutations underlying a trait received a stronger weight than the remaining genome-wide markers. In their simulation study, a 1.67-fold increase in prediction accuracy was observed when the genomic relationship matrix was weighted on the causal mutations ( Figure 4c).

Mixed Results for the Incorporation of Associated Markers
When applied in practice, mixed results were obtained when the GS model was modified to account for peak marker-trait associations (Lopes et al. 2017;Moore et al. 2017; Ingvarsson 2019). Prediction accuracy increased more than 10% when peak-associated markers from GWAS conducted in training sets were included as fixed-effect covariates for two out of four traits evaluated in rice (Spindel et al. 2016). Similarly, a 3-14% increase in prediction accuracy was reported for six Fusarium head blight resistance traits in wheat when a major-effect QTL was included in the GS model as a fixed-effect covariate (Arruda et al. 2016).
Other studies reported more modest increases, or even decreases, in prediction accuracy when using the same approach. When peak GWAS associations were included as fixed-effect covariates in gBLUP and Bayes B models, increases of 0.1-1% in prediction accuracy were observed for nine of 11 rice traits and two of three cattle traits . Similarly, increases in prediction accuracies and bias were observed when meta-GWAS signals were included as fixed effect covariates in GS models evaluating stature in Jersey and Holstein bulls (Raymond et al. 2018). In a simulation study conducted in maize and sorghum, only 60 of 216 simulated genetic traits demonstrated improvement when peak-associated markers from a GWAS conducted on training sets were included in an RR-BLUP model (Rice and Lipka 2019). Moreover, the study observed an increase in the variance of prediction accuracies across replicate traits, as well as an increase in bias in the prediction accuracies, with the inclusion of such markers as fixed effects.
Similar mixed results were obtained when incorporating peak GWAS signals as random effects in the GS model. While an increase of up to 5% in reliability was observed for dairy production traits known to be controlled by major-effect genes, smaller increases in prediction accuracy were observed for mastitis and fertility, two traits where major-effect genes have not been consistently reported across dairy cattle breeds (Brøndum et al. 2015). Decrease in prediction accuracy was observed when the weighted genomic relationship matrix was based on GWAS results instead of the simulated effects of quantitative trait nucleotides (Fragomeni et al. 2017).
These mixed results suggest that directly accounting for peak-associated genomic signals in GS models does not guarantee an increase in prediction accuracy and that the performance of such an augmented GS model depends upon the genetic architecture of the trait. Nevertheless, the adaptation of GS models to account for major-effect loci remains a relatively unexplored area of research. In particular, a systematic study of the prediction accuracies of such augmented GS models as a function of sample size would shed light on their utility to the ever-increasing amounts of data that are becoming available.

VI. HYBRID BREEDING
Many major crops like maize and rice utilize heterosis for yield improvement and development of novel cultivars. Genomic selection in hybrids has been previously reviewed for circumstances of low marker density (Zhao et al. 2015), and here we focus on the application of high marker density, which creates both opportunities and challenges. For single-cross hybridization breeding in maize, GS is widely employed to select candidate combinations by predicting F1 phenotypes from the corresponding genotypes of parental inbred lines (Cui et al. 2019;Riedelsheimer et al. 2012).
Therefore, GS prediction of F1 hybrids, which is used mainly to evaluate the hybrid performance of yield-related traits, must consider non-additive effects arising from heterotic loci in crop genomes (Zhou et al. 2012). In commercial maize breeding programs, application of GS usually partitions the training and prediction populations with the ratio of 1:4, to balance phenotyping costs and prediction accuracy (Erbe et al. 2010;Tan et al. 2017). The training population is composed of F1 hybrids generated from crossing parental lines from two heterotic groups between which the progeny frequently expresses superior heterosis for grain yield. In a North Carolina II (NC-II) design (Garretsen and Keuls 1978) between the two heterotic groups of inbred lines, the F1 hybrids with field-measured phenotypes in the training population accounted for 20% of all possible combinations, and the GS model predicted the phenotypes for the remaining 80% of combinations. The top 10-20% of F1 combinations may be selected for field trials. Because heterotic effects may vary greatly among populations, traits, and environments, multiple factors need to be considered when designing a GS project for the best prediction of hybrid performance (Guo et al. 2019).

Factors Affecting Hybrid Prediction Accuracy
Many factors can affect hybrid prediction accuracy. First of all, the genetic backgrounds of training and prediction populations need to be uniform, which is achieved when the training samples are representative of the prediction population. This can be verified by examining the phylogenetic tree of parental lines built based on their genotypes. When this is not the case, the genetic discrepancy between training and prediction populations will lead to problematic overfitting of the GS model, arising from the insufficiently expressed phenotypes, if phenotypes were collected in one place. For an extreme example in maize, if temperate and tropic maize lines are unevenly distributed in training and prediction populations, and the phenotypes are collected from temperate regions, the phenotypes for the tropic maize may not be fully expressed. Thus, model overfitting may occur as the correlation between genotype and phenotype is not correctly established for tropic maize.
Secondly, pedigree kinship and heterotic pattern of the two parental groups are essential for predictive effectiveness so that GS prediction with a much smaller training population than prediction population may achieve ideal accuracy to support decision making (Technow et al. 2014;Velazco et al. 2019). If the pedigree kinship and heterotic pattern were not explicit, heterotic performance of F1 phenotypes may not be fully expressed, which may result in failure to establish the correct correlation between genotypes and phenotypes of F1 hybrids.
Third, heterosis often involves interactions between genotype and environment (G×E), including the influences arising from macro-environment and micro-environment ).
While the macro-environment refers to the major ecological regions classified as different accumulative temperature belts, the micro-environment refers to the different planting conditions within the same ecological region, such as the local weather, nutritional conditions, microbial populations in the soil, and pathogenic conditions. Because macro-environment mainly influences flowering time attributed to photoperiodic genes with major effects, it is feasible to carry out a screening of parental lines based on prediction of the optimal ecological regions from genotypes to facilitate the design of training and prediction populations in a GS project.
Photo-thermal time was proposed to represent the environmental influence of one specific location on flowering time. Then, the correlation between genotype and photo-thermal time was modeled as a genotype-specific reaction-norm to quantitatively represent the contribution from G×E to phenotypic variations . Thus, when the genotype of an inbred line and photo-thermal time of different locations are available, flowering time can be predicted to assist selection of the optimal environment for the inbred line to construct training populations. In contrast, the influence from micro-environment may be too complicated to be accurately modeled, as its effect is relatively minor and fluctuates. The ideal solution is to estimate its contribution to phenotypic variations from multi-location field trials with the BLUP algorithm, so that the genetic contribution can be accurately modeled for GS prediction.

Heterosis Decomposition
Heterosis is generally considered the result of dominant and epistatic effects involving complex allelic, intra-genomic, and inter-genomic interactions. From the genetic perspective, heterosis can be dissected into two components based on the composition of an F1's phenotype. These to the trait, which may be inferred from the genomic relationship or kinship of the inbred lines in a pedigree (Bernardo 1994, Bernardo 1999Crossa et al. 2017 (Zhou et al. 2012). Therefore, GS prediction of different traits requires identification of heterotic QTLs, as well as quantitative inference of associated heterotic effects separately for each trait.

Modeling Heterosis
In practice, breeders traditionally use combining ability, including general combining ability (GCA) and specific combining ability (SCA), to evaluate hybrid performance and select candidate combinations. While the GCA is generally considered a reflection of the additive background effects of one parental genome, the SCA reflects non-additive interaction effects between the two parental genomes (Aliu et al. 2008). This is consistent with the assumption that the strength of heterosis expression of an F1 hybrid is the summed result of additive and non-additive QTLs. Thus, the outstanding GCA and SCA together result in excellent heterosis. Because GCA and SCA may arise from different loci with distinct types of genetic effects, the appropriate solution to GS prediction on F1 hybrids is to separately consider the effects from GCA and SCA.
Evaluation of GCA can be implemented through a linear model using the gBLUP (genomic Best Linear Unbiased Prediction) method, which uses the genomic relationship matrix inferred from whole-genome markers to predict phenotypes. Environmental factors can also be integrated into the gBLUP model as covariates (fixed effects). Because SCA is due to heterotic loci with nonadditive effects, a small panel of heterotic SNPs with quantitatively estimated effects is essential for an accurate evaluation of SCA in two possible ways. The first way to evaluate SCA is to integrate heterotic SNPs into the gBLUP model as fixed effects to enhance prediction accuracy.
The second way is to use non-linear models, such as Bayesian and machine learning methods, to predict MPH, followed by combining GCA and SCA to assist selection of candidate F1 combinations for field trials. The compilation of the heterotic SNPs may be implemented via the GWAS analysis of the MPH value of a trait to first identify heterotic QTLs. Then, one of the haplotypic tagging SNPs with significant p-value, closest to the peak SNP of the QTL, may be selected as the heterotic SNP. The number of heterotic SNPs used in the panel is usually dependent on the number of significant QTLs, typically 3 to 5 SNPs depending on the trait.

VII. MULTIPLE TRAITS
It is common that new varieties of crops are selected for multiple traits. For example, crop breeders observe phenotypic data to evaluate multiple traits in different categories, such as resistance to biotic stress (e.g., insect, plant disease, fungal) or abiotic stress (e.g., cold, heat, wind, flood), grain quality (e.g., color, taste, shape, nutrient level), and yield components (e.g., grain weight or biomass). The multiple traits being selected for very often share a certain level of common genetic architecture and thus are genetically correlated. This applies to non-agricultural species as well; an evaluation of genetic correlations among 24 traits in human genetic studies demonstrated shared genetic controls between many complex traits and diseases (Bulik-Sullivan et al. 2015).
For a long time, researchers have taken advantage of genetically correlated traits to map the genetic control of complex traits through various quantitative genetics approaches (Banerjee et al. 2008;Jiang and Zeng 1995;Xu et al. 2009). However, when genomic selection models were initially developed, the genome-wide marker effects were modeled to calculate the genomic estimated breeding value of a single trait (Bernardo 1994;Meuwissen et al. 2001). It took almost two decades before multi-trait genomic selection (MT-GS) received full attention. In 2011, the advantage of MT-GS was clearly demonstrated over single-trait genomic selection for genetically correlated traits using simulated data for animal breeding scenarios (Calus and Veerkamp 2011).
In 2012, Jia and Jannink presented more detailed model development of multivariate BayesA and BayesCπ using plant examples (Jia and Jannink 2012). In this report, the authors demonstrated for the first time that traits with low heritability can benefit from correlated traits with high heritability in GS. This insight provides one approach to apply MT-GS to accelerate genetic gain, especially for traits with low heritability or that are expensive to measure. Furthermore, the ability to impute phenotypes for focal traits with missing data using secondary traits measured on all individuals in both training and selection populations can improve the prediction accuracy of the individuals for selection by over 50% compared with prediction based on the focal traits alone.
The MT-GS statistical framework has been widely applied to broad breeding scenarios and human genetic studies after its initial published application in animals (Calus and Veerkamp 2011) and plants (dos Santos et al. 2020;Jia and Jannink 2012;Watson et al. 2019) In dairy cattle, reproductive and yield traits were modeled together through MT-GS (Guo et al. 2014) and confirmed the benefit of a trait with low heritability from MT-GS methods using a high-heritability trait. In humans, multivariate gBLUP was applied for genetic risk prediction of multiple diseases (Maier et al. 2015). With the MT-GS model, risk prediction is significantly improved, equivalent to the effects of increased sample size, by 34% for schizophrenia, 68% for bipolar disorder, and 76% for major depressive disorders with single-trait models (Maier et al. 2015). In plants, MT-GS has been evaluated in breeding data for soybean (Volpato et al. 2019), rice , cranberry (Covarrubias-Pazaran et al. 2018), rye (Schulthess et al. 2016), and hybrid wheat disease prediction (Schulthess et al. 2018).

Multi-trait Genomic Selection Models
The MT-GS model using gBLUP with marker-based kinship can be implemented as a regular BLUP with the numerator relationship matrix calculated from pedigree information. The only difference is the substitution of kinship. The MT-GS model of Bayesian methods is typically a linear regression as below.
y = u + X j a j d j + e where y is the a matrix (n  m) for m trait observations on n individuals, is a design matrix (n  p) representing the p marker genotypes on n individuals, a j is a vector of size m for the effects of molecular marker j on all m traits, d j is an unknown indicator variable with value 0 if marker j is not in the model and value 1 otherwise, is a matrix (n  m) of residuals.
The effect vector a j for marker j in a Bayesian MT-GS model is estimated from hierarchical models using multivariable normal distribution N(0, ). The variance of the distribution for marker j can be 0 if the marker j is not in the model under the BayesCπ framework where all markers share a common variance value. In contrast, under the BayesA method, the variance and covariance of each marker for multiple traits are estimated separately (Jia and Jannink 2012). Under the Bayesian framework, the correlation between the adjacent SNP markers on the chromosome due to linkage disequilibrium can be incorporated through the Bayesian multivariate antedependence model (Jiang et al. 2015). In addition, a multi-trait GS model can be expanded to a multi-trait and multi-environment Bayesian model with an available R package, named BMTME,

Multi-trait GS Applications in Plant Breeding
The MT-GS framework has been developed using different statistical or machine learning algorithms to fit different data types or needs in plant breeding. First, the primary and original application of MT-GS was to improve the prediction accuracy of traits of interest by integrating multiple correlated traits together. The insight that traits of low heritability can benefit from correlated traits with high heritability through the MT-GS model narrows the selection of favorable secondary traits. Second, when a secondary trait is available for individuals in both training and selection populations, the framework of phenotype imputation (Jia and Jannink 2012) or trait-X e å a j assisted genomic selection can improve the prediction accuracy by over 50% compared with the single-trait selection strategy (Fernandes et al. 2018). In practice, such secondary traits can be measured on individuals in an early stage of selection to implement stronger selection intensity.
High-throughput phenotyping using platforms such as unmanned aerial vehicles has significantly enriched data collection for secondary traits (Hassan et al. 2019;Moreira et al. 2020). As the rapid development of high-throughput phenotyping techniques continues, more relevant and useful phenotypes or traits will be available for MT-GS applications in plant breeding to meet different breeding needs. Additionally, the MT-GS model has the potential to shed light on the molecular characterization underlying correlated traits, especially when a specific DNA segment is detected for multiple traits of interest. The confirmed pleiotropy can be used to identify targets of genome editing or another molecular engineering approach to introduce a beneficial mutation to unlink the negative correlation between the multiple traits of interest (Börgel 2018).

VIII. LONG-TERM SELECTION
To increase the response of long-term selection, it is necessary to balance genetic gain and genetic diversity. Under long-term selection, the effective population size is reduced because of selection.
Consequently, genetic diversity decreases, causing the loss of favorable alleles and an increase in the rate of inbreeding (Bijma 2012). Although there are numerous simulation and cross-validation studies that report GS is more accurate than conventional selection, there is no direct evidence that GS increases long-term genetic gain in practical breeding schemes (Henryon et al. 2014).
In 2018, Doekes et al. evaluated genome-wide and region-specific diversity in Holstein-Friesian bulls in the Dutch-Flemish breeding program from 1986 to 2015 (Doekes et al. 2018).
After the introduction of GS, rates of inbreeding and kinship substantially increased. Some methods used to reduce inbreeding, such as the method of optimal contribution selection (OCS), which aims to maximize genetic gain with a restricted rate of inbreeding, caused a drop in the rates of inbreeding and kinship (Meuwissen 1997), suggesting that careful management of genetic diversity is indispensable to enhancing long-term genetic gain in GS breeding. To enhance longterm genetic gain, various methods have been proposed. The methods are roughly divided into the five categories reviewed here, including (1) upweighting rare favorable alleles, (2) genomic optimal contribution selection (GOCS), (3) selection based on potential progeny, (4) marker densities and prediction models, and (5) design of breeding populations.

Upweighting Rare Favorable Alleles
The idea behind this approach is to weight favorable alleles with lower frequencies more heavily to avoid losing such alleles from a population. In 2009, Goddard argued that the optimal weight to fix all favorable alleles in long-term selection depended only on the frequencies of the alleles, whilst the optimal weight in single-cycle selection depended only on the effects of the alleles, and suggested that the practical optimal is an intermediate of these two extremes (Goddard 2009). In 2010, Jannink proposed to weight markers based on both the effects and frequencies of favorable alleles. This approach has been modified to further improve long-term genetic gain (Jannink 2010).
In 2014, Sun and VanRaden proposed the inclusion of a parameter in the weight, to balance longterm and short-term gains (Sun and VanRaden 2014). To increase long-term genetic gain within a fixed time horizon, a dynamic weighting method was proposed , which decreases weights on the rate of favorable alleles over the time horizon.

Genomic Optimal Contribution Selection
To balance genetic gain and the rate of inbreeding, Optimal Contribution Selection (OCS) has been efficiently used in animal breeding (Doekes et al. 2018).  (Gorjanc et al. 2018). Another method was proposed to select optimal crosses by balancing genetic gain and inbreeding in a different formulation (Akdemir and Sánchez 2016). Recently, it was found via simulations that OCS based on pedigree relationships realized greater genetic gain than GOCS, even when gBLUP was used as the prediction method (Mark Henryon et al. 2019), suggesting that OCS is better than GOCS even in GS breeding.

Selection Based on Potential Progeny
The idea behind this approach is to select candidates based on the genetic ability of potential progeny that can be produced from the candidates. In 2015, Daetwyler et al. proposed optimal haploid value selection, which predicts the best doubled haploid that can be produced from a candidate (Daetwyler et al. 2015). The authors found that optimum haploid value selection preserved a greater degree of genetic diversity than GS and was important in achieving long-term genetic gain. In 2018, Müller et al. proposed an expected maximum haploid breeding value, which predicts the expected GEBV of the best double haploid lines produced from a candidate (Müller et al. 2018). This expected maximum haploid breeding value has two advantages over optimum

Marker Densities and Prediction Models
Long-term genetic gain is potentially influenced by marker densities and prediction models because weights on favorable alleles can be changed with differences in marker densities and prediction models. In 2014, MacLeod et al. performed simulations to evaluate whether wholegenome sequence data increased the accuracy over recurrent GS (MacLeod et al. 2014). They found that whole-genome data increased the accuracy with large effective population sizes but provided little advantage with reduced effective population size. In 2015, Liu et al. performed simulations and found that, compared to gBLUP, Bayesian Lasso led to less genetic drift and reduced loss of favorable alleles, and more effectively controlled the rate of both pedigree and genomic inbreeding .

Design of Breeding Populations
Genetic diversity can be maintained by designing breeding populations. In 2016, Yabe et al. proposed the island-model design of breeding populations, in which a large population was split into multiple subpopulations and each subpopulation received migrants from the others. The method was inspired by evolutionary algorithms, as well as population genetics. A simulation study suggested that the island-model GS better maintained genetic variation and genetic gain in later generations than did the normal GS (Yabe et al. 2016).
No study has comprehensively compared the potential of the methods in long-term selection, although some comparative studies have been published (de Rezende Neves et al. 2018;Müller et al. 2018). Moreover, various approaches can be extended from current and novel approaches. A "benchmark test" system for selection methods would be useful for the relative evaluation of the potential of the methods ).

IX. ASSESSMENT OF PREDICTION ACCURACY
The ultimate of goal of GS is to accurately predict the genetic potential for individuals that do not have phenotypes yet. To evaluate the accuracy of a prediction method or a breeding program, individuals with phenotypes must be used appropriately masked to conduct a valid assessment of prediction accuracy. Model fit in a training population cannot be used for the assessment of prediction accuracy due to overfitting problems, because most of the prediction involves more parameters than the number of observations. Validation of prediction must be conducted in an independent population, where the observations are not used to derive those parameters for prediction. The use of a separated population eliminates any bias and violation of using observations for both training and validation. The disadvantage is not being able to evaluate prediction accuracy when adding a proportion of the independent population as the training population. Consequently, cross-validation is one of the major approaches to evaluate prediction accuracy.
During the cross-validation process, the entire population is divided into k folds (e.g., five or ten). One of them is used as a testing population, while the rest are used as a training population.
Then, a model is built using only the training population and used to predict phenotypes for the testing population. The Pearson correlation coefficient between the observed and the predicted phenotypes can be instantly calculated from the individuals in the current testing population. This process of training a model and predicting phenotypes for a single testing fold is iterated until all folds have been treated separately as the testing population. When correlation coefficients are calculated separately for each testing fold and then averaged, this final accuracy value is referred to as the "instant" prediction accuracy. Alternatively, a single correlation can be calculated using predicted phenotypes across all folds. In this case, the accuracy value is referred to as the "hold" prediction accuracy (Zhou et al. 2016).
Both the hold and instant methods are used to evaluate prediction accuracy through crossvalidation. The difference between the two methods has not been fully recognized by the public, because details have rarely been provided in literature reports regarding the method used to calculate the correlation coefficients. In particular, GS models with substantial negative crossvalidated prediction accuracies may be suffering from bias related to how the accuracy was calculated. For example, significant negative correlation coefficients ranging from -0.24 to -0.42 were published in a genomic prediction study of maize . One explanation could be the switch of prediction sign. The other possibility is the expectation bias under the null hypothesis that there is no correlation between observed and predicted phenotypes in crossvalidation, for either hold or instant methods or both.
The difference between the instant and hold method accuracy was investigated. Using simulated and real data, it was demonstrated that both approaches exhibited downward bias under certain circumstances (Zhou et al. 2016). When the expected accuracy was zero, instant accuracy remained unbiased, whereas hold accuracy was negatively biased, especially with more folds.
Instant accuracy exhibited bias only when expected accuracy was far from 0 and 1, and the inference population was small. This bias can be corrected to a nearly unbiased estimate by modifying the formula used to calculate the instant accuracy. Considering these potential biases, it is important that future reports involving cross-validated accuracies also include methodological details regarding how the accuracy was calculated.
In addition to systematic bias due to hold accuracy and Person correlation using small samples, there many other factors for biased estimation of prediction accuracy (Runcie and Cheng 2019), including bias in testing data selection, overlap between testing and training populations, and pre-selection of GWAS features before splitting data for cross-validation. Because specific details regarding cross-validation procedures are overlooked or incomplete, it is not possible to determine if cross-validation errors have contributed to the mixed results. Proper cross-validation procedure involves dividing the entire population first, then GWAS should be conducted using the training population only ( Figure 5). However, if GWAS is conducted using the entire population before dividing it into training and testing populations, it violates this requirement even if the effects of the associated markers are reevaluated using only the testing population for prediction. Figure 5. Valid and invalid procedures to evaluate enhancement of incorporating GWAS results in genomic selection. Prediction accuracy is evaluated in a testing population whose phenotypes should not be used in any format to develop the predictions. Any contamination by using the phenotypes of the testing population may result in errors in prediction. The valid procedure (above the dashed line) should divide the whole population into training and testing populations first. Then the phenotypes of the testing population should not be used in any cases except for comparison between the predicted and observed phenotypes. An invalid procedure (below the dashed line) conducts activities on the whole population before splitting it into training and testing populations. In this example, Genome-Wide Association Study (GWAS) is conducted on the whole population. The phenotypes in the testing population are used for marker classification. Even if effects of associated markers identified by GWAS are re-estimated in the training population only, the selection of the associated markers is contaminated by the usage of the phenotypes in the testing population.
To demonstrate the bias of overestimated prediction accuracy, we simulated a random number as a phenotype for each of the 926 individuals genotyped with 4853 SNPs from the standard dataset of Loblolly Pine (Resende et al. 2012). There were no connections between the phenotypes and genotypes. The expected prediction accuracy was zero. The model fitting in the training population exhibited overestimates in all methods examined. The prediction accuracies in the testing population were zeros for using Ridge Regression, or GWAS assisted Ridge Regression using the valid procedure. However, when the GWAS assisted Ridge Regression was conducted using an invalid procedure, an average artifact of positive prediction accuracy (0.14) was observed ( Figure 6). Figure 6. Accuracy of predictions on randomly generated noise. Random noises were generated as phenotypes for 926 individuals genotyped with 4853 SNPs. These SNPs were used to predict the random noises in training and testing populations by using two methods: 1) ridge regression (RR); and 2) genome-wide association study (GWAS)-assisted RR. GWAS-assisted RR required RR to be conducted twice; see Resende et al. (Genetics, 2012, Vol 190, 1503-1510 for details. The first RR ranked markers based on their absolute effects. The second RR reestimated the effects for the markers that maximized the correlation between predicted and observed phenotypes. The 926 individuals were randomly separated into the training population (80%) and testing population (20%). The marker effects were estimated for all the SNPs using RR from the training population. The estimated marker effects were used to predict the random noises in the training and the testing populations, where Pearson correlation coefficients were calculated separately as prediction accuracies. The GWAS-assisted RR was conducted using two procedures: valid and invalid. The valid procedure (GWAS-RR/Valid) split training and testing populations first. GWAS and estimation of marker effects were conducted in the training population. The invalid procedure (GWAS-RR/Invalid) was contaminated by the usage of observations from the testing population. GWAS was conducted first on the entire population to select the associated SNPs. These SNPs were then used to estimate marker effects in RR conducted in the training population. The process was replicated 1000 times. The results suggested an artifact of positive prediction accuracy in the training population, and also an artifact of positive prediction accuracy (0.14) in the testing population, when the invalid procedure was used.

X. GS TRANSFORMED PLANT BREEDING
Although GS methods were introduced to plant breeding earlier than in animals (Bernardo 1994;Meuwissen et al. 2001), animal breeders adapted to GS applications much faster due to savings of time and phenotyping (Hickey et al. 2017). Because marker-based kinship is easy to implement in existing animal genetic evaluation systems (VanRaden 2008;Zhiwu Zhang et al. 2007), gBLUP and its derivations such as single-step BLUP have been widely used in practical animal breeding (Hayes et al. 2009;Hickey et al. 2017;Jonas and De Koning 2013). Using cattle as an example, to reach reliable prediction of a bull's milk potential, hundreds of daughters are required. This can be done relatively soon after the birth of the bull at a cost of several hundred dollars, or even much less. In contrast, the incorporation of GS approaches into plant breeding programs initially lagged behind animal breeding programs. A 2011 in-depth review of genomic selection methodology and application in plants (Lorenz et al. 2011) identified only a single peer-reviewed study that focused on empirical GS accuracy in crops before 2011 (Lorenzana and Bernardo 2009).
A breakthrough for the incorporation of GS in plants was the reduced cost of acquiring high-throughput, genome-wide genotypes for large genomes with high repeat content, which is typical for many agronomically important plant species. In the last decade, two methodologies for genotyping plant species have emerged that can economically scale to the sample sizes required for GS and are adaptable to a wide range of species. These are high-density, single-nucleotide polymorphism arrays (SNP chips) and genotyping-by-sequencing (Ganal et al. 2019). A major review of GS in 2017  indicates that the availability of these high-throughput genotyping methods has helped GS to become more feasible in plant studies. Numerous empirical studies were listed that reported predictive accuracies of different GS models in several plant species. By far, the most represented plant group were the cereal grains with over 40 papers listed.
Many non-grain species were also reported, with 7 in legumes, 5 in clonally propagated crops, and 13 for forest trees. Across all of these papers there was consensus that while the best specific GS modeling approach varied depending on the trait architecture, GS models in general were capable of predicting many agronomically important traits with a high enough accuracy compared to phenotypic selection to warrant incorporation into plant breeding programs. Since this review in 2017, the number of studies demonstrating the predictive capabilities of GS has increased and expanded from previously studied crops such as wheat ( incorporating GS into breeding programs were also discussed and indicated that GS can be substantially cheaper than phenotypic selection and brings the advantage that genotypes can be used for predicting multiple traits. While most discussion regarding the cost advantage of GS analysis has primarily focused on grains (Bernardo 2009;Endelman et al. 2014;Heffner et al. 2010; Voss-Fels et al. 2019), a large portion of the identified cost benefit is due to a decreased breeding cycle (Rutkoski 2019). Therefore, the cost advantage of GS is likely to be similar or even amplified in other species with longer breeding cycles, such as perennial crops and trees. However, considering the high structural diversity in plant breeding programs and biological differences between plant species, more specific studies are needed that explore the logistics of costeffectively incorporating GS into different types of breeding programs. The ongoing dialogue surrounding GS in plant breeding is no longer primarily a question of capability but instead has shifted towards implementation.

XI. FUTURE PROSPECTS
The studies reviewed here reach both inside and outside the breeder's equation. With rapid advances in genotyping technologies during the first two decades in the 21st century, agronomics and genomics are fully connected as a straight line through genomic selection in plant breeding.
This line is expanding into a surface through phenomics, formatting a Genomics-Phenomics-Agronomics (GPA) paradigm. Phenomics not only measures some existing agronomic traits in a high-throughput fashion, but also adds new heritable measurements that are correlated with conventional agronomic traits, or completely new traits with new agronomic content. Examples of this rapidly expanding research area include metabolomics (Zampieri et al. 2017), automated hyperspectral imaging of plant foliage (Thomas et al. 2018), and nondestructive large-scale imaging of root systems with minirhizotrons (Svane et al. 2019). For decades to come, the GPA paradigm is expected to continually transform the breeder's equation, inside and outside. These transformations include at least five areas: 1) transformation from the prediction of parental performance to the prediction of the potential of progeny; 2) expansion of prediction beyond additive effects to include dominant, epistatic, and genetic-by-environment interaction effects; 3) utilization of pleiotropy among conventional agronomics traits and high-throughput phenomics traits; 4) a deeper understanding of the genetic architecture of agronomic traits through GWAS and integration of these architectures into GS models; and 5) the utilization of emerging big data and machine learning approaches to take advantage of the high dimensionality of high-throughput data, especially deep learning using artificial neural networks.

Funding:
The