Omics in major cereals: applications, challenges, and prospects

Abstract: Omics technologies, viz., genomics, transcriptomics, proteomics, metabolomics, and phenomics, are becoming an integral part of virtually every commercial cereal breeding program because they provide substantial dividends per unit time in both pre-breeding and breeding phases. Continuous advances in cereal-omics promise cost benefits in addition to time efficiency. In this review, we provide a comprehensive overview of the established cereal-omics methods in five major cereals, viz., rice, sorghum, maize, barley, and bread wheat. We cover the evolution of technologies in each omics section independently and concentrate on their use to improve economically important agronomic as well as biotic and abiotic stress-related traits. Advancements in (1) identification, mapping, and sequencing of molecular/structural variants, (2) high-density transcriptomics data to study gene expression patterns, (3) global and targeted proteome profiling to study protein structure and interaction, (4) metabolomic profiling to quantify organ-level small-density metabolites and their composition, and (5) high-resolution high-throughput image-based phenomics approaches are surveyed in this review.

We describe how these technologies have evolved to keep pace with crop improvement programs focusing on breeding applications. We section this review according to the technologies mentioned above and, in every section, describe their applications in cereals.


Cereal genomics: evolution from sparse genetic markers to whole-genome sequencing
Identification of molecular markers, i.e., the observable polymorphisms among individuals of a population within a given DNA sequence, laid the foundations of modern genomics. In the 1980s, the detection of restriction fragment length polymorphisms (RFLPs) and their subsequent association with several traits of primary agronomic importance foreshadowed the promise of genomics to improve the genetic gain per unit time. Later, many other marker systems, most notably microsatellite or simple sequence repeat (SSR) markers, were used to map quantitative trait loci (QTLs). Nevertheless, despite their usefulness in locating polymorphisms and in applied breeding, these systems were time- and cost-inefficient and low throughput. For example, the first SSR map of wheat harbored only 279 loci [1]. Most economically important traits, e.g., grain yield, disease resistance, and grain protein content, are polygenic, i.e., they are controlled by the concerted action of several small- to medium-effect genetic loci [2]. Therefore, sparse genetic linkage maps harboring a limited number of genetic loci are inefficient for improving highly complex or quantitative traits, mainly because they often lack the loci linked to the trait.
Detection of single nucleotide polymorphisms (SNPs), the smallest unit of DNA polymorphism, provides an opportunity to survey virtually millions of sites within a species' DNA. Thus, SNPs have become the marker platform of choice. High throughput, high efficiency, reproducibility, and low cost per data point have enabled large-scale germplasm evaluations in many cereals and, consequently, have resulted in the almost complete replacement of RFLP and SSR marker platforms [3]. Major methods for SNP detection in cereals include array-based genotyping and genotyping-by-sequencing. Several sequencing technologies are available for both forms of SNP detection [3]. High-density SNP genotyping is invaluable for identifying the genetic underpinning of economically relevant traits and for laying the foundation of whole-genome sequencing.

Variants apart from SNPs
While SNPs are an essential source for identifying and mapping traits of interest, studies show that SNPs alone do not represent all the genomic variation that contributes to the resulting phenotype. Other variants, e.g., structural variations (SVs), which may be up to 1 kb long, play an essential role as well. Insertions-deletions (InDels; smaller polymorphisms varying from 1 to 50 bp), inversions, translocations, and copy number variations (CNVs) all come under the umbrella of structural variations. Maize was the first cereal in which hundreds of SVs were identified. However, this number was later found to be an underestimate, and efforts were initiated to discover more SVs among higher eukaryotes [10].
The study of structural variations has recently accelerated in crop plants, primarily due to the generation of reference genome sequences. Based on the sequence similarity at the DNA breakpoints, SVs are formed mainly by two mechanisms, viz., non-homologous end-joining (NHEJ) and non-allelic homologous recombination (NAHR) [for review see 11]. Apart from these mechanisms, transposons also generate SVs. In general, SVs can be detected mainly by three methods, viz., (1) resequencing, (2) de novo assembly, and (3) pangenome assembly. The resequencing approach mainly identifies CNVs and presence-absence variations (PAVs), whereas the de novo approach identifies inversions in addition to CNVs and PAVs. Nevertheless, resequencing remains the preferred approach to detect SVs because of its low cost and because it does not require a de novo assembly for each variety under investigation. CNVs arise from unbalanced DNA modifications that lead to a variable number of copies of a specific DNA sequence [10]. CNVs may vary from 1 kb to several Mb. Studies show that, along with SNPs and InDels, CNVs are key contributors to intra-species genetic variation. PAVs can be considered the extreme form of CNVs: a genomic sequence is present in one individual and absent from another. In the past few years, SVs have been shown to affect several traits in different cereals. For instance, a 17.1-kb tandem duplication at the GL7 locus in rice leads to an increase in grain size [12], CNVs of Vrn-A1 and Ppd-B1 affect flowering time in wheat [13], and a 7-bp deletion in the HvGA20ox2 gene reduces plant height and delays flowering time in barley [14].
Figure 3. Types of structural variants. Inverted triangles show the position of a given variation with respect to the reference genome.
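As a rough illustration of how a resequencing approach can flag CNVs and PAVs, the sketch below compares the read depth of each region in one resequenced sample with the sample's median depth. The function name, the region names, and the ratio thresholds (0.1, 0.6, and 1.5) are illustrative assumptions rather than values taken from the studies cited above; real pipelines would also correct for factors such as GC content and mappability.

```python
from statistics import median

def classify_regions(depth_by_region, absent_below=0.1, loss_below=0.6, gain_above=1.5):
    """Label each region by its read depth relative to the sample median.

    The thresholds are hypothetical defaults chosen for illustration only.
    """
    baseline = median(depth_by_region.values())
    if baseline <= 0:
        raise ValueError("median depth must be positive")
    calls = {}
    for region, depth in depth_by_region.items():
        ratio = depth / baseline
        if ratio < absent_below:
            calls[region] = "absent (PAV)"       # almost no reads map here
        elif ratio < loss_below:
            calls[region] = "copy-number loss"
        elif ratio > gain_above:
            calls[region] = "copy-number gain"
        else:
            calls[region] = "normal"
    return calls

# Hypothetical mean read depths for five regions of one resequenced line.
depths = {"geneA": 30.0, "geneB": 31.0, "geneC": 0.5, "geneD": 62.0, "geneE": 29.0}
calls = classify_regions(depths)
```

In this toy input, `geneC` has almost no mapped reads and is labeled as a presence-absence variant, while `geneD` has roughly twice the median depth and is labeled as a copy-number gain.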

Genetic mapping
Several statistical methods can be employed to link polymorphisms to the traits under investigation, the most common of which are regression analyses. In cereals, as in many other crops, polymorphisms or variations among individuals can be (1) artificially generated, i.e., by crossing different parents, or (2) surveyed in a natural population, e.g., a set of elite lines, gene bank accessions, etc. In the following, we describe the most common methods to link genetic polymorphisms to the traits under investigation.
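The regression idea can be sketched with a single-marker test: the phenotype is regressed on the allele dosage at one marker, and the slope estimates the additive effect of that marker. The code below is a minimal ordinary least squares sketch; the 0/1/2 genotype coding, the function name, and the toy data are assumptions made for illustration, and real analyses would add significance testing and covariates.

```python
def single_marker_regression(genotypes, phenotypes):
    """Ordinary least squares of phenotype on allele dosage (coded 0, 1, 2).

    Returns (slope, intercept, r_squared); the slope is the estimated
    additive effect of one copy of the counted allele.
    """
    n = len(genotypes)
    if n != len(phenotypes) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_g = sum(genotypes) / n
    mean_p = sum(phenotypes) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    syy = sum((p - mean_p) ** 2 for p in phenotypes)
    sxy = sum((g - mean_g) * (p - mean_p) for g, p in zip(genotypes, phenotypes))
    if sxx == 0:
        raise ValueError("marker is monomorphic in this sample")
    slope = sxy / sxx
    intercept = mean_p - slope * mean_g
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r_squared

# Hypothetical data: six lines scored for one marker and one trait.
genotypes = [0, 0, 1, 1, 2, 2]
phenotypes = [4.0, 4.2, 5.1, 4.9, 6.0, 5.8]
slope, intercept, r_squared = single_marker_regression(genotypes, phenotypes)
```

Here each additional copy of the counted allele raises the trait value by about 0.9 units, and the marker explains most of the phenotypic variance in this small, artificial sample.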

Linkage mapping
Linkage mapping refers to mapping quantitative trait loci (QTLs) in segregating populations that are mostly artificially created. Many traits of economic importance, such as grain yield, stress tolerance, and disease resistance, are of quantitative nature, i.e., they are governed by the concerted action of many genetic loci [2]. Therefore, segregating populations harboring hundreds of individuals are required to dissect the genetic nature of a quantitative trait. Different types of segregating populations, such as F2 populations, recombinant inbred lines (RILs), doubled haploid (DH) populations, heterogeneous inbred families (HIFs), near-isogenic lines (NILs), advanced intercross recombinant inbred lines (AI-RILs), backcross inbred lines, and multiparent advanced generation intercross (MAGIC) populations, are developed based mainly on the available resources and research objectives. These segregating populations are mostly based on crosses between contrasting parents, resulting in limited genetic diversity. Linkage mapping is the most commonly used method to detect genes underlying essential traits. Nevertheless, the resources and time needed to develop these mapping populations, coupled with a narrow genetic base, low allelic richness, and low mapping resolution, are some of the drawbacks of linkage mapping.

Genome-wide association studies
Genome-wide association studies (GWAS) take advantage of the long history of recombination events in diverse natural populations to dissect the genetic nature of a trait. The use of natural populations overcomes the constraints of linkage mapping, as it increases the mapping resolution and reduces the research time [15]. GWAS was initially used to study complex traits in humans and was then adopted for animals and some model organisms. In the last decade, with improvements in genotyping techniques, decreased sequencing costs, and robust statistical methods, researchers have adopted GWAS for dissecting the genetic architecture of complex traits in plants. GWAS identifies marker-trait associations (MTAs) that can be attributed to the strength of linkage disequilibrium (LD) between polymorphic markers across a set of diverse germplasm. In a nutshell, GWAS evaluates the association of each genotyped marker with a trait of interest that has been scored across a diverse natural population. GWAS can be used to study both qualitative and quantitative traits. Several aspects must be considered before starting the research, e.g., selection of the genotyping platform, sample or population size and structure, statistical analyses, and correction for multiple testing (Bonferroni correction or false discovery rate (FDR) correction). Table 1 describes the use of linkage mapping, GWAS, or both for cereal improvement.
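The two multiple-testing corrections named above can be compared on a small set of marker p-values. The sketch below implements the Bonferroni threshold and, as one common FDR procedure, the Benjamini-Hochberg step-up rule; the choice of Benjamini-Hochberg, the function names, and the p-values are assumptions made for illustration.

```python
def bonferroni(p_values, alpha=0.05):
    """Declare a marker significant if p <= alpha / number of tests."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure for FDR control.

    Finds the largest rank k with p_(k) <= (k / m) * alpha and declares
    the k smallest p-values significant.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            significant[idx] = True
    return significant

# Hypothetical p-values for six markers tested against one trait.
p_values = [0.001, 0.008, 0.012, 0.041, 0.20, 0.74]
bonf_calls = bonferroni(p_values)
bh_calls = benjamini_hochberg(p_values)
```

With these toy values, Bonferroni retains two markers and Benjamini-Hochberg retains three, which reflects the general pattern that FDR control is less conservative than family-wise error control.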

The study of species-level variations via pangenomes
The pangenome aims to catalog genic presence-absence variations within a species [41]. A pangenome contains a core genome, i.e., genomic sequences present in all the individuals of a species, and a variable genome, i.e., genomic sequences present in only some individuals. The first step to establish a pangenome in any crop species is selecting a diverse set of genotypes, including domesticated and wild progenitors, for sequence assembly. It is also wise to choose genotypes of breeding and genetic value to increase the pangenome's importance for future breeding programs. Genotypes belonging to secondary and tertiary gene pools of a particular species are added to form a genus-level pangenome. Reference-quality genomes are then generated for this small set of accessions and aligned to the reference genome to detect structural variations. k-mers present in the SVs are extracted and counted in short-read data from a diversity panel to genotype the underlying SVs, and the resulting k-mer count matrices are used as biallelic markers in QTL mapping or genome-wide association studies [42]. Pangenomes have already been established in various cereals, viz., rice [43], wheat [44], and barley [41]. In barley, a pangenome of 20 assemblies was constructed, single-copy k-mers from the structural variants in these 20 assemblies were detected, and a k-mer abundance matrix was used to perform GWAS for lemma adherence [41].
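The k-mer genotyping step can be sketched as follows: k-mers are extracted from an SV sequence, counted in the short reads of each accession, and the counts are turned into a presence/absence call. This is a simplified sketch under stated assumptions: the function names, the toy sequences, `k = 4`, and the `min_fraction` threshold are hypothetical, and the code ignores reverse complements, sequencing errors, and the single-copy filtering mentioned above.

```python
from collections import Counter

def kmers(seq, k):
    """Return all overlapping k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def kmer_count_matrix(sv_sequence, reads_by_accession, k):
    """Count the k-mers of one SV in the reads of every accession."""
    targets = sorted(set(kmers(sv_sequence, k)))
    target_set = set(targets)
    matrix = {}
    for accession, reads in reads_by_accession.items():
        counts = Counter()
        for read in reads:
            counts.update(km for km in kmers(read, k) if km in target_set)
        matrix[accession] = [counts[km] for km in targets]
    return targets, matrix

def biallelic_calls(matrix, min_fraction=0.5):
    """Call the SV present (1) if enough of its k-mers were observed."""
    calls = {}
    for accession, row in matrix.items():
        observed = sum(1 for count in row if count > 0)
        calls[accession] = 1 if observed >= min_fraction * len(row) else 0
    return calls

# Hypothetical SV sequence and short reads from two accessions.
sv_sequence = "ACGTTGCA"
reads_by_accession = {
    "acc1": ["GGACGTTGCATT"],           # read spans the SV
    "acc2": ["GGACCCATT", "TTTTAAAA"],  # reads lack the SV
}
targets, matrix = kmer_count_matrix(sv_sequence, reads_by_accession, k=4)
calls = biallelic_calls(matrix)
```

The resulting 0/1 vector per SV has the same shape as a biallelic marker and could, in principle, be passed to the same association tests used for SNPs.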

Challenges and prospects in crop-genomics
In the past, whole-genome sequencing efforts were hindered mainly by (1) the extensive and repetitive genome sequences of the cereals and (2) the absence of technologies and algorithms robust and accurate enough to generate and assemble long, correct sequences. This was perhaps the most important reason why considerable efforts by international consortia were required. Although large-scale genome sequence production and assembly are currently costly, continuous innovation in technologies will make future large-scale reference-quality genome assemblies easier, mainly because of the small cost-outcome differential. It can be safely speculated that the construction of genome assemblies will continue to the point where the difference between whole-genome genotyping and whole-genome sequencing will be negligible [4]. With improvements in sequencing and computing facilities, the output per unit of input will increase, which will benefit cereal geneticists and breeders. As described elsewhere, robust QTL mapping and gene cloning hinge on the availability of dense genetic/physical maps. Advances in genomics will help in fast and accurate mapping of traits. Also, with the availability of dense marker information, methods for predicting genotypic values (in the case of inbreeding crops) or breeding (additive) values (in the case of outcrossing crops) will become more efficient, improving genetic gains per unit time and cost.

Cereal transcriptomics
The genetic content of all the cells of an organism is the same; even so, different cells perform different functions and possess varying compositions under diverse circumstances. As per the central dogma of molecular biology, DNA is transcribed to RNA, and RNA is in turn translated to proteins, which are the functional units [45]. Therefore, mRNA serves as a transient intermediary in the execution of the genetic information stored in DNA. The whole set of RNA transcripts produced by an organism under specific conditions is called the transcriptome, and the study of the transcriptome is known as transcriptomics [46].

Transcriptomics techniques
The first attempt to study RNA transcripts was made in the 1970s, when mRNA libraries of silk moths were converted to cDNA using reverse transcriptase [47]. Later, in the 1980s, Sanger sequencing was used to sequence RNA transcripts, yielding Expressed Sequence Tags (ESTs) [46]. ESTs were used to determine the gene content of an organism. RNA transcript quantification was subsequently performed using techniques such as northern blotting and, later, qRT-PCR [48]. However, these techniques cover only a small part of the transcriptome. In 1995, the first sequencing-based method for transcriptomics, Serial Analysis of Gene Expression (SAGE), was developed [49].
SAGE involves preparing a short sequence tag (10-14 bp) from a unique position in each transcript, which can be used to identify that transcript. Sequence tags are then linked together to form long serial molecules, which are cloned and sequenced. The number of times a particular tag is observed provides the expression level of the corresponding gene. SAGE can also help to identify new genes expressed in a tissue or under specific conditions [49].
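The counting logic behind SAGE can be sketched in a few lines: tags recovered from the sequenced concatemers are tallied, and each tally is assigned to the gene that the tag identifies. The tag sequences, the gene names, and the tag-to-gene lookup table below are hypothetical; tags without a known gene are kept separately because they may point to new genes.

```python
from collections import Counter

def count_sage_tags(tags, tag_to_gene):
    """Tally SAGE tags per gene; tags with no known gene go to 'unassigned'."""
    tag_counts = Counter(tags)
    gene_counts = Counter()
    for tag, n in tag_counts.items():
        gene_counts[tag_to_gene.get(tag, "unassigned")] += n
    return gene_counts

# Hypothetical 10-bp tags recovered from sequenced concatemers.
tags = ["AAACCCGGGT"] * 3 + ["TTTGGGCCCA"] + ["ACACACACAC"]
tag_to_gene = {"AAACCCGGGT": "geneX", "TTTGGGCCCA": "geneY"}

gene_counts = count_sage_tags(tags, tag_to_gene)
total = sum(gene_counts.values())
# Relative expression as the share of all observed tags.
relative = {gene: n / total for gene, n in gene_counts.items()}
```

In this toy library, `geneX` accounts for three of the five tags, so it would be read as the most highly expressed of the tagged genes.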

Massively parallel signature sequencing (MPSS) is a sequencing-based approach used to analyze the level of gene expression by quantifying the mRNA transcripts present in a sample. MPSS uses a 17-20 bp signature sequence adjacent to the 3'-end of the mRNA to identify it. Each signature sequence is first cloned onto microbeads in a way that ensures that only one type of DNA sequence is on each microbead. The microbeads are arrayed in a flow cell for sequencing and quantification. Each signature sequence (MPSS tag) in an MPSS dataset is analyzed and compared with all other signatures, and all identical signatures are counted. The expression level of any single gene is calculated by dividing the number of signatures for that gene present in the sample by the total number of signature sequences identified. Later, two well-defined techniques that provide high-throughput transcriptomics data came into existence: microarray and RNA-seq. The progression and advancements in techniques of gene expression analysis are displayed in Figure 4.

Microarray quantifies a set of RNA transcripts by their hybridization to complementary probes fixed on a platform. It has been used to assay thousands of genes at a low cost per gene. Advancements in array design and fluorescence detection systems have boosted the sensitivity and accuracy of this technique. A microarray consists of several probes on a solid platform, i.e., a glass or silicon chip. Fluorescently labeled transcripts hybridize to their complementary probes on these chips. The intensity of fluorescence at each probe quantifies the respective transcript [50]. Microarrays fall broadly into two categories: low-density spotted arrays and high-density probe arrays. Low-density spotted arrays use large probes and different fluorophores for test and control, whereas high-density probe arrays have higher resolution and use a single fluorophore for the test [51].
Initially, Affymetrix (Santa Clara, CA) developed a high-density array, the GeneChip, and later, NimbleGen developed a more advanced high-density array using maskless photochemistry. Even though this technique is efficient in revealing the transcripts in an organism, it requires prior knowledge of ESTs and of the organism's genome assembly so that probes can be designed to generate the chip.
RNA-seq is defined as sequencing the cDNAs of mRNA transcripts and quantifying them based on the number of reads from each transcript. High-throughput sequencing platforms have greatly reduced the cost of sequencing and increased its accuracy. Sequencing platforms such as Roche 454, Illumina, SOLiD, PacBio, and Nanopore (compared in Table 2) have enabled RNA-seq to provide extensive genome coverage [52,53]. RNA-seq provides a tremendous amount of information about the genes present and their activation at a particular time point under specific conditions. In recent years, the availability of NGS technologies has boosted RNA-seq over the microarray technique, as illustrated by Google Trends data for the last ten years (Figure 5). For RNA sequencing, only the mRNA transcripts are sorted out from the different kinds of RNAs. The mRNAs, which carry a 5'-cap and a poly-A tail, are separated using poly-A tail-specific probes. Small RNAs are removed based on their size by gel electrophoresis. The mRNAs are fragmented through hydrolysis or sonication according to the read length limit of the sequencing technology. The selected mRNA is used to synthesize cDNA, which can be amplified if the amount is not sufficient and is finally sequenced on NGS platforms [54]. More recently, Nanopore technology has been used for RNA-seq to sequence RNA directly, without conversion into cDNA. It improves on previous sequencing techniques in that it detects modified bases, which are otherwise masked during cDNA synthesis, and avoids the bias introduced during the cDNA amplification step. The number of reads and the coverage of the genome determine the sensitivity and accuracy of RNA-seq. The Encyclopedia of DNA Elements (ENCODE) recommends 70x coverage for standard RNA-seq and up to 500x coverage for rare transcripts [55].
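The relationship between read number and coverage can be made concrete with the usual back-of-the-envelope estimate, in which mean coverage equals the number of reads times the read length divided by the size of the sequenced target. The sketch below assumes this simple estimate; the function names, the 100-bp read length, and the 50-Mb target size are hypothetical values chosen for illustration.

```python
import math

def mean_coverage(n_reads, read_length, target_size):
    """Average depth: total sequenced bases divided by target size (bp)."""
    return n_reads * read_length / target_size

def reads_needed(coverage, read_length, target_size):
    """Smallest number of reads that reaches the requested mean coverage."""
    return math.ceil(coverage * target_size / read_length)

# Hypothetical experiment: 100-bp reads and a 50-Mb target.
target_size = 50_000_000
read_length = 100
reads_for_70x = reads_needed(70, read_length, target_size)
achieved = mean_coverage(reads_for_70x, read_length, target_size)
```

Under these assumptions, 35 million reads would be needed to reach a mean coverage of 70x; because transcript abundance is highly uneven, the depth at rare transcripts would be much lower than this average.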

Transcriptome for improving abiotic stress tolerance in plants
With the increase in whole-genome transcriptomic studies in plants, genes related to stress response, downstream signaling, and synthesis of stress-response molecules are being uncovered [56]. A plethora of information on the transcriptomics of cereal crops such as rice, wheat, maize, barley, and sorghum is available. This information has provided insight into the coordination of different biological processes in various plant tissues under stress conditions [57]. The study of drought stress during the flowering or fruiting stage of the plant gives information about the interaction of the reproductive system, hormone signaling, and metabolic pathways. Table 3 highlights the use of microarray and RNA-seq techniques in different crops to identify differentially regulated genes during various abiotic stress conditions.
Comparative transcriptome analysis of drought-tolerant and drought-susceptible cultivars points to candidate genes and mechanisms of adaptation under drought stress [58]. Earlier studies revealed that 20 CIPK genes are upregulated in rice, specifically under drought stress conditions. However, recent RNA-seq experiments indicate that CIPK genes are overexpressed under various abiotic stresses, such as salinity stress, cold stress, and ABA treatment [59]. RNA-seq studies reported that rice cultivars tolerant to salinity show a quicker response to salinity, earlier induction of H2O2, and earlier signal transduction than sensitive ones [60]. Salinity-tolerant cultivars set up an adaptive program by limiting sodium to the roots and old leaves of the plant and activating genes related to photosynthesis in new leaves. In another study, two inbred lines with extreme cold tolerance and sensitivity were used for whole-genome transcriptomics; bioinformatic analysis of the transcriptomic data showed that 948 differentially expressed genes (DEGs) out of a total of 19,794 genes were mainly associated with DNA binding, ATP binding, and protein kinase activity [61]. RNA-seq of drought-resistant and drought-susceptible sorghum cultivars at the seedling stage under PEG-induced drought revealed 180 DEGs; 70 genes upregulated in response to drought stress were uncharacterized novel genes or were associated with transcription factors and signal transduction under stress [62].

Transcriptomics for crop improvement against biotic stress
Crop yield is challenged by various biotic stresses caused by bacteria, viruses, fungi, insect pests, and weeds [74]. Most plant breeding programs aim to develop genotypes that are tolerant or resistant to plant pathogens and insect pests so that crop losses due to biotic stress can be mitigated. Plants have evolved various biochemical and physiological mechanisms to escape biotic stresses [75]. In response to pathogen infection, plants activate salicylic acid (SA), jasmonic acid (JA), and ethylene (ET) signaling, reactive oxygen species (ROS) production, the hypersensitive response, and the release of toxic compounds and phytoalexins [76]. Therefore, understanding the molecular-level changes in plants in response to pathogen attack is crucial to develop disease-resistant crop varieties.
Various transcriptomic studies have been conducted in cereal crops to decipher disease resistance mechanisms and to identify resistance (R) genes. Whole-genome transcriptome analysis of four wheat cultivars, Wuhan 1, Nyubai, HC374, and Shaw, after head inoculation with Fusarium graminearum revealed upregulation of leucine-rich repeat receptor kinases (LRR-RKs), a class of receptor kinases involved in disease resistance, at different time points in resistant and susceptible cultivars. The differential expression profiles of the different genotypes show various genotype-specific defense responses [77]. Table 4 summarizes other important examples where transcriptome data were used to study the plant response against various plant pathogens. Magnaporthe oryzae, which causes blast disease in rice, was the first pathogenic fungus to be sequenced. Hence, Magnaporthe oryzae-rice is considered a model pathosystem to understand molecular host-pathogen interactions. High-quality transcriptomic studies via RNA-seq provide essential information to uncover genomic-level interactions of host-pathogen systems [78]. It is well known that the Xa23 gene in Oryza sativa confers broad-spectrum resistance to most of the biotypes of Xanthomonas oryzae pv. oryzae (Xoo). Transcriptome profiling of NILs with Xa23 (CBB23) and without Xa23 (JG30) before and after inoculation with Xoo provides insight into the downstream genes and pathways involved in the resistance provided by the Xa23 gene. In total, 1645 DEGs were found, and most of these are associated with phenylpropanoid biosynthesis, followed by flavonoid biosynthesis and phytohormone signaling [79].

Challenges and prospects in transcriptomics
Transcriptomic studies have faced various challenges over time; most have been resolved with the advancement of techniques, and some are still being addressed. Microarrays are limited to depicting the expression level of only known genes. RNA-seq overcame this limitation by providing a complete profile of the transcripts present at a given stage or time point, without being restricted to known transcripts. It lowers the background and increases the clarity of the experiment; however, the analysis of NGS data in RNA-seq is tedious and time-consuming. A further hurdle of NGS procedures is that read coverage may not be uniform along the genome due to variation in nucleotide composition between genomic regions. In RNA-seq, a long transcript is expected to yield more reads than a short transcript at the same expression level. To normalize the counts with respect to transcript length, software packages represent RNA-seq data by transformed quantities such as RPKM (Reads Per Kilobase per Million mapped reads) or the related FPKM (Fragments Per Kilobase per Million mapped reads). Software such as Cufflinks/Cuffdiff provides an integrated analysis pipeline from the aligned reads to the differential expression results, where the inference is based on FPKM values. Further improvements in RNA-seq are revolutionizing transcriptomic studies in plants and should help to develop crop varieties that can withstand biotic and abiotic stress and produce a higher yield.
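The length normalization described above can be written out directly: RPKM is the read count of a transcript divided by its length in kilobases and by the total number of mapped reads in millions. The sketch below is a minimal illustration of that definition and not the estimation procedure used by Cufflinks; the function name, transcript names, lengths, and read counts are hypothetical.

```python
def rpkm(read_counts, lengths_bp):
    """Reads Per Kilobase per Million mapped reads for each transcript.

    RPKM = reads * 1e9 / (transcript length in bp * total mapped reads)
    """
    total_mapped = sum(read_counts.values())
    if total_mapped == 0:
        raise ValueError("no mapped reads")
    return {
        name: count * 1e9 / (lengths_bp[name] * total_mapped)
        for name, count in read_counts.items()
    }

# Hypothetical library of one million mapped reads.
read_counts = {"long_gene": 2000, "short_gene": 500, "rest": 997_500}
lengths_bp = {"long_gene": 4000, "short_gene": 1000, "rest": 1_995_000}
values = rpkm(read_counts, lengths_bp)
```

In this toy library, the long transcript collects four times as many reads as the short one, yet both receive the same RPKM value, which is the length bias the normalization is meant to remove.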

Cereal proteomics
Advances in genomic techniques provide a blueprint of possible gene products and have changed our way of studying biological systems. However, as the genome is static, it cannot capture the correlation between mRNA and protein abundance, post-translational modifications, protein function, or localization. Also, it does not give a biological snapshot of an organism at a particular developmental time point. Therefore, it is essential to study protein structures and their interactions to explore their role during plant growth and development. Proteomics is a systematic, high-throughput approach for comprehensive identification and analysis of protein expression in a cell, tissue, or organelle of an organism at a particular time under specific conditions [89]. The first report of two-dimensional gel electrophoresis (2-DE) dates back to 1975 and provided the first glimpse of the protein levels and isoforms in cells. Marc Wilkins coined the term proteomics in 1994, as an extension of the word "proteome" (PROTein complement of the genOME), at the first 2-DE meeting in Siena, Italy [90]. The study of proteome profiles provides deep insight into various metabolic processes and their interaction with different regulatory pathways in a biological system. Proteomics is a powerful tool that provides a more robust representation of cell functioning than other techniques, including genomics tools.
The advancements in proteomics in the last decade have led to new and improved technologies, including two-dimensional polyacrylamide gel electrophoresis (2D-PAGE), liquid chromatography (LC), mass spectrometry (MS), etc. which have enabled fast and accurate protein identifications.

Technical advances in proteomics
In the recent past, various proteomics approaches have been developed and adopted in plants. These tools pave the way for high-throughput proteome analysis for quantification, localization, protein-protein interactions, and post-translational modifications (PTMs). All proteomics technologies have three main steps: protein extraction, separation (gel-based or gel-free/column-based methods), and identification or quantification (mass spectrometry, MS) [91]. Gel-free techniques can be label-free, such as liquid chromatography coupled with mass spectrometry (LC-MS), or tag-based, such as ICAT, iTRAQ, etc. [92] (Figure 6). A single technology cannot comprehensively analyze the complete plant proteome due to its complex and dynamic nature. Therefore, multiple approaches are used to improve the understanding, resolution, and coverage of the plant proteome. Table 5 lists the proteomics techniques used to study abiotic and biotic stress responses in cereal crops, including wheat, barley, rice, maize, and sorghum. Factors such as the availability of resources and facilities and the intended application, e.g., global or targeted profiling, determine the choice of proteomics approach [91].
Figure 6. Schematic representation of various proteomics approaches.

Global proteome profiling
Global proteome profiling is considered one of the best approaches for comparing two or more proteomes or generating a reference proteome map. It is categorized into gel-based and gel-free/shotgun approaches [93]. Gel-free proteomics has been gaining popularity over the years due to its increased reproducibility and lower bias compared with gel-based proteomics [94].

Gel-based approaches
Gel-based approaches are the most popular, versatile, and mature methods of protein separation and quantification. They allow the identification of low-abundance proteins and the characterization of protein isoforms on a large scale and are less expensive than gel-free approaches.
Two-dimensional polyacrylamide gel electrophoresis (2D-PAGE) is considered the workhorse of proteomics due to its affordability and familiarity. It resolves proteins based on two independent parameters: isoelectric point (pI) and molecular mass (M). The resolved proteins can be stained with Coomassie blue, silver nitrate, or SYPRO Ruby for visualization. It is widely used in expression proteomics studies.
Difference gel electrophoresis (DIGE) was developed to overcome the gel-to-gel variation and limited reproducibility of 2D-PAGE. In this approach, several protein samples labeled at their lysine residues with different fluorophores (CyDye2, CyDye3, CyDye5) are separated simultaneously on a single gel [95]. DIGE is used to elucidate variations in protein expression in response to various biotic and abiotic stresses.
Three-dimensional gel electrophoresis (3DGE) is an advancement of 2D-PAGE to overcome the co-migration interferences [96]. It uses two different buffers with different ion carriers and gives very accurate protein and post-translational modifications (PTMs) identification [97].

Gel-free approaches
Gel-free approaches were developed to overcome the limitations of gel-based approaches, such as the inability to separate the entire proteome, the infrequent detection of low-abundance proteins, and labor-intensive workflows. They include quantitative approaches such as tag-based labeling (ICAT, iTRAQ), metabolic labeling (SILAC), and label-free methods (MudPIT) [93].
Isotope-Coded Affinity Tagging (ICAT) is an in vitro isotopic labeling approach for protein quantification, which involves the use of an affinity tag (biotin), a linker carrying a stable isotope, and a reactive group that binds to the thiol groups (cysteines) of proteins. The labeled tryptic peptides are first fractionated by chromatography and then identified by mass spectrometry (MS) [98]. ICAT is mainly used to identify novel proteins controlling a vital biological function in a particular cultivar [99].
Isobaric Tagging for Relative and Absolute Quantification (iTRAQ) is a multiplex protein quantification technique that uses isobaric tags to label the N-terminus and side-chain amine groups of proteins. The sensitivity of protein quantification from different sources in one test is much higher than with ICAT [100]. Crop breeders use this technique to identify markers for biotic and abiotic stresses, which can later be used in designing genetically modified crops.
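Relative quantification with isobaric tags comes down to comparing reporter ion intensities between channels for the same peptide. The sketch below assumes a 4-plex experiment with reporter ions at m/z 114 to 117 and expresses each channel relative to a chosen reference channel; the function name, the intensity values, and the choice of channel 114 as the control are hypothetical, and real workflows would also apply isotope-impurity correction and normalization across peptides.

```python
import math

def reporter_ratios(intensities, reference_channel):
    """Log2 ratio of each reporter ion channel to the reference channel."""
    reference = intensities[reference_channel]
    if reference <= 0:
        raise ValueError("reference intensity must be positive")
    return {
        channel: math.log2(value / reference)
        for channel, value in intensities.items()
    }

# Hypothetical reporter ion intensities for one peptide in four samples.
intensities = {"114": 1000.0, "115": 2000.0, "116": 500.0, "117": 1000.0}
log2_ratios = reporter_ratios(intensities, reference_channel="114")
```

A log2 ratio of 1 for channel 115 would indicate a twofold higher abundance than in the reference sample, and a value of -1 for channel 116 a twofold lower abundance.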
Stable Isotope Labeling by Amino Acids in Cell Culture (SILAC) is a metabolic labeling technique and the most potent approach for dynamic quantitative plant proteome studies. It relies on in vivo labeling of cell populations grown in medium containing either 14N or 15N [101]. It is advantageous for identifying proteome changes in signaling pathways triggered by PTMs in response to stress [102].
Multi-dimensional protein identification technology (MudPIT) is a shotgun proteomics tool used for complex multi-dimensional protein analysis [94]. It is a less complex and highly sensitive technique for the identification of low abundance proteins. In this approach, the biphasic or triphasic microcapillary columns are used to separate digested proteins, followed by performing tandem MS. This technology has been used to unravel the mechanisms involved in controlling tiller numbers in rice.

Targeted proteome profiling
Targeted proteome profiling is a selective analysis of interacting proteins or post-translationally modified proteins using PTM-specific stains, antibodies, or targeted MS assays [91]. It can be classified into gel-based, affinity and reactive chemistry-based, and MS-based targeted proteomics.

Gel-based proteomics
Here, the proteome is separated by 2D-PAGE and then stained with a PTM-specific stain, such as the phosphoprotein-specific gel stain Pro-Q Diamond. However, these approaches are no longer widely used because they fail to identify less abundant proteins [91].

Affinity and Reactive chemistry-based proteomics
In this approach, specific proteins are isolated, enriched, and purified by different techniques such as immunoprecipitation (IP), strong cation exchange (SCX), strong anion exchange (SAX), and immobilized metal affinity chromatography (IMAC). These techniques can be used individually or coupled with one another to enhance efficiency [103]. However, the aforementioned techniques suffer from precision errors between samples. To overcome this shortcoming, selected/multiple reaction monitoring (SRM/MRM) techniques are combined with isotopic labeling [104].

Bioinformatics in Proteomics
Technical advances in proteomics approaches have made it possible to generate a massive amount of high-quality protein expression data. Associating these data with other -omics technologies such as genomics, transcriptomics, metabolomics, and phenomics remains challenging. Bioinformatics plays a fundamental role in overcoming this bottleneck by reducing the analysis time and providing statistically significant results. Some of the major proteomics databases currently in use are the PRoteomics IDEntifications database (PRIDE) [125], PeptideAtlas [126], and the Mass Spectrometry Interactive Virtual Environment (MassIVE). Various comprehensive databases for plant proteomics, such as the Plant Proteomics Database (PPDB), 1001 Proteomes, the pep2pro database, DIPOS, etc. [127][128][129][130], as well as different web-based prediction tools such as GelMap [131], MRMaid [132], the PeptideAtlas SRM Experiment Library (PASSEL) [133], etc., have been developed to assist proteome analysis.

Challenges and prospects in proteomics
Proteomic analysis complements both transcriptomics and metabolomics in elucidating the cellular mechanisms of plants and is thus a vital tool for crop improvement. Recent advancements in proteomics techniques have enabled us to unravel plant biology. However, we still need to overcome the various limitations of these techniques to develop smart crops that have high grain quality and can withstand multiple stresses. Emerging technologies such as peptidomics, phosphoproteomics, and redox proteomics will provide in-depth insight into molecular interactions and protein function [134]. With the ever-changing climate, new plant variants are continuously being introduced to cope with these fluctuations. Novel proteomic tools will enable us to generate more stress-tolerant or stress-adaptive cultivars.

Cereal metabolomics
Metabolomics is a relatively new "-omics" technology for deciphering plant metabolomes and, hence, understanding complex biological systems. Metabolomics allows comprehensive profiling and comparison of the small molecules (<1500 Da) of a cell, tissue, organ, or organism [135]. It deals with the identification and quantification of metabolites in a biological system to investigate their composition and their interactions with the environment [136]. Moreover, compared with genomics, transcriptomics, and proteomics, metabolomics focuses on investigating biological activities and is, therefore, relatively easier to relate to the phenotype [137].
Based on the purpose of the study, metabolomics can be differentiated into two types, viz., targeted and untargeted. Targeted metabolomics deals with the absolute quantification of one or a few metabolites from a predefined set of known substances. The targeted approach, therefore, tends to be highly sensitive and quantitative, and can be useful for tracking metabolites known to be associated with a specific stress. Untargeted metabolomics, in contrast, is a discovery-based approach that measures the relative abundances of several hundred to thousands of detectable metabolites. Because it can also measure mass spectrometric features of unknown metabolites, the untargeted approach enhances the chances of sensing unintended effects [136].
In recent years, metabolomics has been used to understand biotic and abiotic stresses in crop plants, and many studies summarize the metabolomic advances in corn, sorghum, wheat, rice, and barley, investigating the composition of these crops and/or their products and their applications for crop improvement (reviewed in [138][139][140]). Understanding plant metabolic processes would be beneficial for improving crop yield and human nutrition aspects in crop breeding programs. These reviews also highlight future perspectives for integrating metabolomics with other "-omics" technologies.

Overview of metabolomic pipeline
The workflow for metabolomics involves a series of steps, including experimental design, sample preparation and extraction, metabolite detection using analytical techniques, and data processing and analysis using bioinformatics techniques. Since metabolomics involves a wide range of diverse compounds, variations in metabolite concentration (∼10^6) can complicate the downstream analyses [137]. Thus, for comprehensive metabolomic analyses, it is essential to carefully choose (1) the experimental design, (2) the sample preparation and extraction protocols, and (3) the detection technologies.
Numerous extraction protocols are available for metabolomics analysis [141,142]. Standard sample preparation protocols involve collecting plant material in liquid nitrogen, followed by rapid cooling or freeze-drying of the sample and then extraction using 80% or 100% methanol [143]. Furthermore, experimental parameters such as the concentration of solvents, extraction time, and temperature can also influence the observed metabolite profiles. Hence, optimizing the metabolomic protocol is an essential step. For example, targeted metabolomics can be optimized to increase the signal-to-noise (s/n) ratio of the desired metabolite or to decrease the time and cost of experimentation [136]. The untargeted approach must be optimized for reproducibility, i.e., for the ratio of the actual variation in a biological sample to the variation due to experimental errors. Several experimental design approaches, such as fractional factorial or D-optimal designs, can be used to optimize metabolomic protocols [136].
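To make the idea of a fractional factorial design concrete, the following is a minimal sketch in Python of a half-fraction (2^(3-1)) design for the three extraction parameters named above. The function name, the factor names, and the low/high settings are assumptions chosen purely for illustration; they are not taken from the cited protocols.

```python
from itertools import product


def half_fraction_design(factors):
    """Build a 2^(3-1) half-fraction design for three two-level factors.

    The first two factors take all four -1/+1 combinations; the third is
    fixed by the generator C = A * B, so 4 runs replace the 8 runs of the
    full factorial (at the cost of aliasing each main effect with a
    two-factor interaction).
    """
    if len(factors) != 3:
        raise ValueError("this sketch handles exactly three factors")
    names = list(factors)
    runs = []
    for a, b in product((-1, 1), repeat=2):
        coded = (a, b, a * b)
        # Map coded level -1 to the low setting and +1 to the high setting.
        settings = {
            name: factors[name][0 if level == -1 else 1]
            for name, level in zip(names, coded)
        }
        runs.append((coded, settings))
    return runs


# Hypothetical (low, high) settings, for illustration only.
extraction_factors = {
    "methanol_percent": (80, 100),
    "extraction_minutes": (15, 60),
    "temperature_celsius": (4, 25),
}

design = half_fraction_design(extraction_factors)
for coded, settings in design:
    print(coded, settings)
```

Each coded column is balanced (two runs at each level), which is what allows main effects to be estimated from only four extractions; a D-optimal design would instead require a numerical search and is not sketched here.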

Analytical and data processing techniques in crop metabolomics
Several techniques, e.g., gas chromatography mass spectrometry (GC-MS) [144], liquid chromatography mass spectrometry (LC-MS) [145], capillary electrophoresis mass spectrometry (CE-MS) [146], nuclear magnetic resonance (NMR) [147], and vibrational spectroscopy (VS) [136], have been applied in crop metabolomic studies. Table 6 provides an overview of commonly used analytical techniques along with their advantages and limitations. With recent advancements in technology, other methods such as gas chromatography time-of-flight mass spectrometry (GC-TOF-MS) [148], ultra-performance liquid chromatography mass spectrometry (UPLC-MS) [149], capillary electrophoresis time-of-flight mass spectrometry (CE-TOF-MS) [150], high-performance liquid chromatography (HPLC) [151], and liquid chromatography high-resolution mass spectrometry (LC-HRMS) [152] have also been utilized in crop metabolomic studies.
After analysis with one or more of the techniques mentioned above, the data undergo a series of pre-processing steps, including cleaning, noise reduction, baseline correction, alignment, peak deconvolution, normalization, and scaling. Numerous online platforms have been developed to support data mining, data assessment, data processing, and data interpretation in metabolomics. Different automated and/or semi-automated peak detection tools, such as Compare, COMSPARI, LineUp, MarVis, MarkerLynx, MetAlign, MarkerView, Metabolic Profiler, MET-IDEA, MSFACTs, MathDAMP, MZmine, Profile, and Sieve, can be used for data processing [138,153]. Statistical analyses, e.g., principal component analysis (PCA), multivariate curve resolution (MCR), hierarchical cluster analysis (HCA), partial least squares discriminant analysis (PLS-DA), and batch-learning self-organizing map (BL-SOM), are commonly used to make meaningful inferences from large metabolomics datasets [138,154,155].
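Two of the pre-processing steps listed above, normalization and scaling, can be illustrated with a short Python sketch. It assumes total-sum normalization per sample and autoscaling (unit-variance scaling) per metabolite, which are common choices but only two of several options; the function names and the peak intensities are made up for illustration.

```python
from statistics import mean, stdev


def normalize_total_sum(sample):
    """Divide each peak intensity by the summed intensity of the sample."""
    total = sum(sample)
    if total == 0:
        raise ValueError("sample has zero total intensity")
    return [x / total for x in sample]


def autoscale(matrix):
    """Mean-center each metabolite (column) and divide by its sample SD."""
    columns = list(zip(*matrix))
    centers = [mean(col) for col in columns]
    spreads = [stdev(col) for col in columns]
    if any(s == 0 for s in spreads):
        raise ValueError("a column has zero variance and cannot be autoscaled")
    return [
        [(x - c) / s for x, c, s in zip(row, centers, spreads)]
        for row in matrix
    ]


# Made-up peak intensities: rows are samples, columns are metabolite features.
raw = [
    [1200.0, 300.0, 500.0],
    [2500.0, 500.0, 1000.0],
    [900.0, 450.0, 150.0],
]

normalized = [normalize_total_sum(row) for row in raw]
scaled = autoscale(normalized)
for row in scaled:
    print([round(value, 3) for value in row])
```

Normalization removes differences in overall signal between samples, while autoscaling gives every metabolite equal weight, so abundant metabolites do not dominate multivariate methods such as PCA.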
After profiling metabolites in a particular plant species, metabolic pathways can be reconstructed from a list of functionally annotated genes available from the databases, such as KEGG pathway or KNApSAcK [156,157].

Applications of metabolomics for crop improvement
Metabolomics has been widely used to investigate the adaptive responses of plants to stress. It plays an essential role in investigating the synthesis of specific metabolites under various stresses to understand how plants adapt to unfavorable surroundings. Metabolomic studies uncover new compounds that accumulate, and novel metabolic pathways that are active, under different stress conditions [140]. In addition, metabolomic studies improve the understanding of previously recognized metabolic pathways. Over the last decade, several metabolome studies have investigated changes in metabolite concentration under various biotic and abiotic stress factors, as described in Table 7. The drought stress response has been studied by metabolomic approaches in rice [144], wheat [149], maize [148], and sorghum [158]. Variations in phytohormones and other metabolites in the roots of barley plants under salinity stress were reported by Cao et al. (2017). In rice, profiles of flavone-glycosides, which are major secondary metabolites, were evaluated in relation to abiotic stress and herbivores [159]. Researchers have also reported natural metabolic variation in rice [160]. Moreover, metabolites associated with specific loci can potentially be utilized as biomarkers in association studies [140]. Metabolome quantitative trait loci (mQTL) analysis treats metabolite concentrations in plant tissues as traits (m-traits) and can, therefore, provide a comprehensive understanding of their genetic background. Furthermore, mQTL analysis can discover novel relationships between metabolic pathways, structural genes, and agronomically important traits, and can hence assist in crop breeding. For example, Chen et al. (2014) provided a comparative mQTL mapping between rice and maize.

Challenges and prospects in metabolomics
Integrating metabolomics with genetic approaches can facilitate the study of the genetic regulation of plant metabolism. Furthermore, combining high-throughput genome sequencing and reverse genetics with metabolomics tools can shorten development time, as in metabolomics-assisted breeding. These novel plant breeding approaches can help crop improvement programs produce high-yielding crops, stress-tolerant germplasm, and climate-adapted crop varieties. The combination of metabolomics with other omics technologies such as transcriptomics, proteomics, phenomics, and genomics can be used to investigate complex metabolic pathways in plants. Metabolic profiling combined with genome-wide prediction studies can be utilized to screen desirable agronomic traits in genotypes and hybrids by genetic mapping, thus opening new opportunities to enhance crop genetics (Wen et al. 2014). Furthermore, genome-wide association studies combined with metabolic and gene-expression studies can explore environmental effects on the phenotypic plasticity of plants under various biotic and abiotic conditions [161].
Prospects of metabolomics may include screening metabolic markers to understand plant metabolism. Emerging technologies such as single-cell metabolomics and metabolome-scale labeling will improve metabolite interpretation, metabolic pathway elucidation, and metabolite quantification at the single-cell level [162]. Recent technological advancements such as the single-probe MS technique have potential for near in situ targeted metabolomic analyses with minimum cell manipulation at the cellular level [163]. The future challenge for metabolomics will be to make better use of the available metabolomic information and to interpret it correctly for possible applications.

Phenomics for crop improvement
During the last two decades, genomics has revolutionized plant breeding, mainly because a reduction in genotyping costs has led to the adoption of new technologies such as linkage mapping, genome-wide association studies, genomic selection, and rapid generation advance [175]. Accurate genetic mapping and genome-wide selection require precise phenotyping of the plants. However, plant phenomics, i.e., applying tools and methodologies to study plant growth, development, performance, and composition, is still in its infancy as a field and has therefore lagged behind genomics [176]. The conventional field phenotyping employed by the majority of plant breeders is labor-intensive, costly, and subjective [177]. Plant phenomics is a rapidly expanding domain that ranges from high-throughput field phenotyping to cellular-level imaging. During the last decade, more focus has been given to field-based high-throughput phenotyping (HTP), primarily to predict agronomic and physiological traits [178]. In this regard, HTP has demonstrated its potential for non-destructive phenotyping of various agronomic, physiological, and biotic and abiotic stress-related traits (Kefauver et al. 2017) by (1) utilizing high-throughput tools and platforms, (2) processing images and implementing algorithms for the extraction of raw data, and (3) linking the processed data to the target traits [179].
Various aerial and ground-based HTP platforms have been developed for measuring different plant traits at different growth stages with greater precision, throughput, and accuracy [180]. As shown in Table 8, these platforms have demonstrated their superiority in rice, wheat, maize, barley, and sorghum (Tables 9, 10). Figure 7 outlines the most used sensors and data analysis methods for managing various stresses in the five most important cereals, namely, rice, wheat, maize, barley, and sorghum. The development of novel imaging sensors for non-invasively phenotyping a wide range of organs, tissues, and physiological processes has provided a substantial impetus to HTP [181]. This section of the review concentrates on (1) various phenotyping platforms that are currently being used to accelerate genetic gains in key cereals, viz., rice, wheat, maize, barley, and sorghum, (2) advancements in imaging sensors and subsequent analyses, and (3) the application of machine and deep learning methods for solving the "big data" problems in phenomics.

Plant phenotyping platforms
HTP depends on the imaging sensor used. Advanced phenotyping platforms have improved data capture capabilities by adding mobility, throughput, and inbuilt data storage at relatively low cost. Unmanned aerial vehicles (UAVs) are the most widely adopted because of their reliability, cost, and technical requirements; however, some countries have still not adopted them because of regulations controlling their flight. Several cart- and tractor-mounted tools have similarly been adopted for various crops, although their utilization is also stage-dependent [176]. Moreover, several cheap handheld platforms provide spectral and time-series information. However, these handheld devices face issues of standardization and low throughput and, because they are usually mounted on poles, cover less of the canopy [182]. Table 8 provides detailed information about various platforms utilized during the last decade.

Table 8. List of phenotyping platforms and their utilization.

Phenotyping platform/technique | Utilization | Reference
BreedVision | Tractor-pulled multisensory phenotyping platform with RGB, multispectral, and time of flight sensors | [182]
Grow screen fluoro | Work under controlled conditions for quantification of fluorescence pigments | [183]
Leaf curtain arrays | Utilized for leaf area and plant height estimation | [184]
LEAF-E | Estimates the total leaf growth and rate of development | [185]
Phenocart | A movable platform in the field used for high throughput phenotyping | [178]
Phenopsis | Used to study drought tolerance abilities under control conditions | [186]
Phenoplant | Used to obtain chlorophyll fluorescence parameters under controlled conditions | [187]
Phenovator | Used for phenotyping a large number of samples under controlled conditions by providing fluorescence, multispectral and RGB images | [188]
Pushcarts | Carts with different sensors used to study plant response to drought, heat, and other stresses; operated by one person | [176]
Terrestrial laser scanning | Used for measuring plant height and architecture under field conditions | [189]
TRiP | Used to study circadian changes in plants with a series of images and TrRiP algorithm | [190]
Unmanned aerial platforms | Multiple sensors can be employed for measuring various traits throughout the field | [191]

Imaging sensors and analysis
Imaging sensors have enabled the collection of high-resolution, multidimensional data from plants to quantify plant growth, yield, stress, and physiological processes under both controlled and field conditions. The recent development of sensor technology that measures signals from the gamma-ray to the radio-wave regions of the electromagnetic spectrum has provided a plethora of information to plant scientists. These imaging sensors include spectroscopy, sound navigation and ranging (SONAR), light detection and ranging (LIDAR), X-ray computed tomography (CT), thermal, visible to near-infrared, multispectral, hyperspectral, fluorescence, time of flight (ToF), positron emission tomography, and stereovision [188,190]. The use of these imaging sensors with autonomous platforms has opened the door to HTP. Tables 9 and 10 provide detailed information about the different imaging sensors utilized for studying agronomic traits and biotic and abiotic stresses in the five most important cereal crops grown in the world: rice, wheat, maize, barley, and sorghum.

RGB/visible imaging
RGB cameras, i.e., regular digital cameras, capture true-color images in the visible region of the electromagnetic spectrum. This is the cheapest and most often used sensor for plant phenotyping studies. These sensors record reflectance in the red, green, and blue regions of the visible spectrum. RGB imaging has been used to estimate plant biomass, different pigments, tiller count, yield traits, flowering time, biotic stresses, plant height, germination, and emergence rates [192,193].

Multispectral imaging
Multispectral cameras provide information about specific wavelength bands from the visible and infrared regions of the spectrum. These reflectance bands are used to derive different vegetation indices, which give information about photosynthetic efficiency, pigments, nutrient status, water status, and plant senescence [194]. The essential indices utilized include the normalized difference vegetation index (NDVI), water index (WI), anthocyanin reflectance index (ARI), and simple ratio (SR) [195].
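As a brief illustration of how such indices are derived from band reflectance, the following Python sketch computes two of them using their standard definitions, NDVI = (NIR - Red) / (NIR + Red) and SR = NIR / Red. The function names, plot labels, and reflectance values are hypothetical and serve only as an example.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index: (NIR - Red) / (NIR + Red)."""
    denominator = nir + red
    if denominator == 0:
        raise ValueError("NIR and red reflectance sum to zero")
    return (nir - red) / denominator


def simple_ratio(nir, red):
    """Simple ratio: NIR / Red."""
    if red == 0:
        raise ValueError("red reflectance is zero")
    return nir / red


# Hypothetical mean (NIR, red) reflectance per plot, on a 0-1 scale.
plots = {
    "plot_1": (0.50, 0.10),  # dense, green canopy
    "plot_2": (0.30, 0.20),  # sparse or senescing canopy
}

for name, (nir, red) in plots.items():
    print(name, round(ndvi(nir, red), 3), round(simple_ratio(nir, red), 2))
```

The two indices carry the same information on different scales, since NDVI = (SR - 1) / (SR + 1); NDVI is bounded between -1 and 1, which makes it easier to compare across sensors and dates.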

Hyperspectral imaging
These imaging sensors cover the whole visible and infrared regions at high resolution, recording reflectance across the entire range in narrow bands because of the sensor's small bandwidth. These sensors have the best spatial and spectral resolution, resulting in more useful information. This imaging platform has been used for studying plant health status, leaf growth, grain yield prediction, biotic stresses, water status, plant height, and chlorophyll content [195,196].

Thermal imaging
These sensors provide information about plant water status by measuring radiation in the infrared region to estimate canopy temperature and transpiration rate. Thermal imaging has been used for detecting plant water status, disease-infected plants, and fruit maturity [197,198].

Fluorescence imaging
Fluorescence sensors provide information about photochemistry changes by capturing photosystem II's fluorescence emissions. Plants absorb a specific portion of the electromagnetic spectrum and thus have a characteristic emission spectrum. Fluorescence sensors provide information about the photosynthesis rate, chlorophyll content, and various physiological processes in plants [199].

X-ray computed tomography
These imaging sensors generate 3D tomographic images of objects from an extensive series of 2D radiographic images taken with computer-processed X-rays. The images reveal root architecture by separating objects according to their different densities. X-ray CT has been utilized for studying root traits, tiller morphology, and grain quality [179,200].
In addition to all these imaging sensors, there are several others: positron emission tomography, magnetic resonance imaging, SONAR, laser scanning, LIDAR, and time of flight. Readers are referred to other publications for details on these sensors [179,195].

Challenges and prospects in crop phenomics
The continuous use of aerial and ground-based HTP platforms with different imaging sensors at multiple time points across the growth stages of plants has produced big data and, with it, problems of storage and of extracting valuable information. This issue is being addressed by adopting machine and deep learning tools for data analysis to draw valid conclusions from big data sets [179]. Machine learning (ML) is an interdisciplinary approach to data analysis that uses probability, statistics, classification, regression, decision theory, data visualization, and neural networks to relate the extracted information to the observed phenotype. ML gives plant breeders, pathologists, and agronomists a significant advantage because many parameters can be extracted and analyzed together for each trait, whereas traditionally only a single feature was examined at a time [205]. The other great breakthrough with ML is directly linking the variables extracted from HTP data to plant stresses, biomass accumulation, grain yield, and soil characteristics [206,215]. ML's biggest success lies in inferring trends from the data and generalizing the results by training a model. Various ML models have been applied to HTP, namely support vector machines [204], discriminant analysis [210], k-means clustering [195], neural networks [211], and dimensionality reduction [179]. All these models help identify, classify, quantify, and predict different phenotyping components in plants.
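As a concrete example of one of the models listed above, the following is a minimal Python sketch of k-means clustering (Lloyd's algorithm) applied to per-plot image features. The two features, their values, and the choice to seed the centroids with the first k points are assumptions made only to keep the example small and deterministic; practical analyses would normally use an established library and a more careful initialization.

```python
import math


def kmeans(points, k, iterations=100):
    """Lloyd's algorithm; the first k points seed the centroids (a simplification)."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                updated.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                updated.append(centroids[j])  # keep the centroid of an empty cluster
        if updated == centroids:
            break  # converged: centroids no longer move
        centroids = updated
    return labels, centroids


# Hypothetical per-plot features on a 0-1 scale: (NDVI, canopy cover fraction).
plot_features = [
    (0.82, 0.90), (0.35, 0.40), (0.78, 0.85),
    (0.30, 0.45), (0.80, 0.88), (0.38, 0.42),
]

labels, centroids = kmeans(plot_features, k=2)
print(labels)
print(centroids)
```

Because k-means relies on Euclidean distance, features measured on very different scales (for example, NDVI and canopy temperature in degrees Celsius) should be scaled before clustering; the example sidesteps this by using two features that already share a 0-1 range.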
In addition, the recent transformation brought by deep learning (DL) in other fields such as traffic signaling, health care, voice and image recognition, consumer analytics, and medical diagnostics has provided plant scientists with a new tool for image analysis in HTP [215]. DL models automatically learn patterns from extensive data sets using non-linear activation functions to reach conclusions such as classifications or predictions. The important DL models used for phenomics include, but are not limited to, multilayer perceptrons, generative adversarial networks, convolutional neural networks, and recurrent neural networks [212]. These data analysis tools open a new path for expanding the prospects of HTP in plant breeding.
Overall, in this phenomics section, we reviewed the advent of high-throughput phenotyping, aerial and ground-based platforms, imaging sensors and their analysis, data analysis methods, and phenotyping bottlenecks and prospects, covering the whole pipeline of HTP utilization in breeding programs.