Opportunities and challenges for interpreting rare variation in clinically important genes

Genome sequencing is enabling precision medicine—tailoring treatment to the unique constellation of variants in an individual’s genome. The impact of recurrent pathogenic variants is often understood, leaving a long tail of rare genetic variants that are uncharacterized. The problem of uncharacterized rare variation is especially acute when it occurs in genes of known clinical importance with functionally consequent frequent variants and associated mechanisms. Variants of unknown significance (VUS) in these genes are discovered at a rate that outpaces current ability to classify them using databases of previous cases, experimental evaluation, and computational predictors. Clinicians are thus left without guidance about the significance of variants that may have actionable consequences. Computational prediction of the impact of rare genetic variation is increasingly becoming an important capability. In this paper, we review the technical and ethical challenges of interpreting the function of rare variants in two settings: inborn errors of metabolism in newborns, and pharmacogenomics. We propose a framework for a genomic learning healthcare system with an initial focus on early-onset treatable disease in newborns and actionable pharmacogenomics. We argue that (1) a genomic learning healthcare system must allow for continuous collection and assessment of rare variants, (2) emerging machine learning methods will enable algorithms to predict the clinical impact of rare variants on protein function, and (3) ethical considerations must inform the construction and deployment of all rare-variation triage strategies, particularly with respect to health disparities arising from unbalanced ancestry representation.


Introduction
We are approaching an era in which genome sequencing at birth will be a widespread practice with the potential to revolutionize healthcare. Interpretation of the genetic variants identified by sequencing, however, is a significant challenge and limits the use of DNA sequencing as a primary diagnostic screen 1 . Current algorithms used to interpret the significance of genetic mutations are not reliable enough to be used without additional clinical data 2 . Yet, accumulating biomedical data enables machine learning algorithms to predict the consequence of genetic variants with increasing accuracy. The pairing of modern algorithms and widespread genome sequencing is beginning to deliver precision medicine in limited settings 3 , but the broad interpretation of rare genetic variation requires algorithmic advances and improved access to data. The identification of rare variation responsible for unusual clinical phenotypes is a particularly difficult challenge because both the responsible gene and the associated variation must be identified. A slightly more tractable problem is the identification of clinically important variants in genes that are already known to be clinically significant and have known mechanisms for influencing phenotype.
This paper focuses on two fields that have known clinically important genes and in the near term should benefit greatly from improved rare variant interpretation: pharmacogenomics (PGx) and inborn errors of metabolism (IEM). IEM and PGx are subfields of genetics characterized by monogenic phenotypes for which therapeutic action can be taken in response to clinically important variants in known genes. Both fields have been revolutionized by low-cost sequencing and the curation of large databases cataloguing the effects of specific genetic variants.
Furthermore, both fields struggle with interpretation of the phenotypic effects of rare variants that have not been clinically evaluated.
As an interdisciplinary team supported by the Chan Zuckerberg Biohub, we approach these two challenges by addressing both computational and ethical issues, in order to develop a framework for genome-informed medical care that benefits all. Here we review the current practices and limitations of variant interpretation in PGx and IEM and highlight recent computational advances that will allow researchers to improve precision medicine. Ethical considerations in these activities primarily address health disparities, since existing genetic and genomic databases are not inclusive of individuals of diverse ancestries. As the recent strategic vision from the US National Human Genome Research Institute (NHGRI) attests, there are significant societal implications of a genomic learning healthcare system that we cannot afford to oversimplify 4 . Our focus on genes of known consequence should ultimately generalize to the more difficult cases where the gene, function and mechanism are not well understood.

PGx and IEM in current clinical practice
For both PGx and IEM, our detailed understanding of the biological processes at play (the genes that are critical and how they interact) has reached a point at which routine genetic screens can inform clinical decision making. In the United States, PGx testing is mandated by the Food and Drug Administration for a number of drugs due to safety concerns, and recommended for many others. Testing for IEMs is routine practice for nearly all newborns in the United States, but the role of genetic testing is largely limited to second-tier screens and carrier testing. These two fields are linked in more ways than may superficially appear. Most known PGx- and IEM-driven phenotypes are caused by variants in a single gene. Because these traits are monogenic, understanding the impact of variants in the underlying genes is critically important, and the problem space is narrow enough to make a solution tractable. Additionally, the mechanisms of disease and treatment response are generally understood.
PGx describes how an individual's response to medication is influenced by genetic variation in pharmacogenes: genes encoding proteins involved in the pharmacokinetics and pharmacodynamics of a drug 5 . Many pharmacogenes have common genetic variants with known clinical significance. These variants can affect the metabolism, transport and action of drugs throughout the body and may influence efficacy or lead to adverse events. Studies have shown as many as 99.8% of individuals carry at least one genetic variant that could lead to adverse outcomes for at least one drug [6][7][8] . In the past, clinical practice overlooked the influence of genetics on drug response and, except for several extreme cases 9 , used a standardized dose of any particular drug for most patients, with some trial-and-adjustment to determine the ideal drug and dosage. This error-prone process can lead to decreased efficacy and increased incidence of adverse events that could be otherwise avoided 10 . Clinical practice may be moving towards genetic testing prior to drug dosing, though current practice is still limited to physician-guided treatment, with genotyping or sequencing ordered by a physician and carried out clinically ( Fig 1A). To date there are 60 drugs with clinical dosing guidelines published by the Clinical Pharmacogenomics Implementation Consortium (CPIC) and 94 drugs with guidelines from the Dutch Pharmacogenomics Working Group (DPWG) 11 . As the inexpensive interrogation of genetic information gains a foothold in clinical medicine, pharmacogenetic information will increasingly be used in standard care. Importantly, when genetic information is used to guide dosing, the current focus is on common polymorphisms in individuals of European ancestry. Common polymorphisms in other ancestral groups and rare variants are generally not included in current clinical dosing guidelines. 
This can lead to health disparities based on ancestry, and is problematic for all individuals since rare variants are estimated to contribute to as much as 50% of interindividual variation in drug response 12 .
IEM encompass more than 1,000 genetic disorders, including organic acidemias, urea cycle defects, lysosomal storage disorders, and disorders of amino acid metabolism 13 . IEM are characterized by monogenic mutations that can affect protein function and result in altered metabolite levels. The majority are autosomal recessive disorders. Many IEM are severe, early-onset conditions amenable to therapeutic intervention, with early treatment leading to significantly improved clinical outcomes. Since the consequences of unrecognized IEM in pre-symptomatic newborns can be catastrophic, detection before symptom manifestation is essential. Newborn screening (NBS), a pervasive public health effort, detects over 40 of the most common, treatable IEM using biochemical tests performed in blood samples taken shortly after birth.
Population-level NBS has been a routine part of care in the United States since the 1960s.
Presently, NBS detects IEM by identifying elevated metabolites in blood, which is performed using tandem mass spectrometry (MS/MS), an inexpensive and rapid test. However, disorders may be missed, some analytes are non-specific, and follow-up testing may be time-consuming and complex 1,14 . DNA sequencing has the potential to more accurately identify disorders for which MS/MS detection is not optimal, and also identify disorders for which there is no appropriate metabolite screen.
Figure caption: Hypothetical future approach to patient care in the fields of PGx and IEM. All individuals undergo whole genome sequencing at birth. Machine learning models use detected variants to predict phenotype (disease risk or differential drug response). Ethics are considered and clinical action is taken accordingly.
Carrier testing provides an opportunity to detect rare variants in IEM and other disease-associated genes 15 before conception. However, interpretation of genetic screening results still faces significant challenges 16 , especially in cases identifying variants of uncertain significance (VUS) where risk for inherited disease cannot be definitively assessed and actionability is questionable.
The falling cost of next generation sequencing will continue to expand the identification of genomic variants that may cause IEM or alter drug response. While many genetic variants have catalogued associations with disease phenotypes or drug response, the majority are of unknown clinical consequence. Generating experimental data to validate the pathogenicity of individual variants is tedious and expensive, though recent advances have facilitated more large-scale generation of data 17 . Several databases attempt to catalog variants in disease-causing genes, but there is no central catalog for associated functional data. Thus, alternative methods for determining or predicting functional effects of genetic variants are urgently needed.

Ethical considerations in rare variant interpretation
Genome-informed precision medicine must include analysis of ethical, legal and social implications (ELSI) in order to improve upon rather than exacerbate existing health disparities 4 .
We have identified six chief concerns with enhancing computational predictors for the phenotypic effects of rare variation at the scale proposed here. First, the uncertainty of results and, second, the return of clinical results can either improve or compromise clinical care. Although enhanced computational predictors for IEM and PGx can minimize harm from the trial-and-error of current clinical practices, consistency in clinical education and approaches to ambiguous and incidental findings will be critical to determining societal benefit. Third, there are differences between research and clinical stakeholders in approaching the classifications of VUS that need to be reconciled. Fourth, the underrepresentation of minority groups in current datasets and the underlying research that informs them needs particular attention in order to create a larger and more diverse reference genome so that biases can be reduced. Fifth, an effective genomic learning healthcare system must account for privacy risks. Sixth, there need to be transparent data-sharing expectations across all levels of human input into the learning system. Building on previous ethical frameworks 18,19 and the need for a nuanced approach 20 , we suggest that tradeoffs between ensuring individual control over data and the social obligations of individuals have yet to be engaged with at the level of ethical governance provisions. Discussion of these concerns is guided by three central ethical questions, summarized in Table 1 and outlined within the Spotlight Boxes. For the use of predictive algorithms as the primary methods of analysis for IEM and PGx to be ethically justified, these methods must provide equal or greater certainty than current methods. Improving screening and predictive analysis for IEM and PGx at the testing level is contingent upon the accuracy of results, the provisions around returning results, and the impact on clinical care. 
Even pathogenic results can have variable penetrance, and VUSs, with the possibility of reclassification, can cause significant consternation on the part of both the clinician and the patient 21 .
Perhaps most thoroughly documented in cancer genetics 22 , the clinical return of genetic results is rarely straightforward. The prohibition against the return of uncertain results, outlined by the American College of Medical Genetics and Genomics (ACMG), is such that even if there is a suspicion that an uncertain variant is pathogenic, it should conservatively be classified as a VUS since this information is used in medical decisions 2 .
The follow-up of uncertain results is complicated by clinician/researcher and patient/individual expectations and understandings of actionability. Genomic literacy across different healthcare professional roles is limited 23,24 . The disclosing of sequencing results should be contingent upon what has been previously explained to the patient/parent about incidental findings and potential treatments 25 . As healthcare delivery is already biased with regard to decisions about referrals or withdrawals of care, it will be challenging for algorithms to correct for existing biases in the handling of results 26 . Uncertain and incidental (or secondary) results in clinical care should be considered in the context of existing slippages of fiduciary obligations (such as clinician biases and/or patient mistrust) that emerging tests may or may not be able to compensate for 27 . The NHGRI has called for greater diversity amongst the genomic scientist workforce 4 .
In order to contain immediate risks around uncertainty of results and focus resources, is there a case for tiered approaches? For example, beginning with targeted sequencing and, upon accuracy improvements, expanding programs to include non-targeted sequencing, or, at the individual level, only sequencing specific genes as a second tier option if a positive test result arises in genome sequencing? Certainly, implementing genome sequencing at the routine screening level requires greater computational accuracy, accessibility and more nuanced ethical safeguards 4,20 . In the US healthcare context, it is difficult to resolve the issue of healthcare insurance coverage. Can financial disparity in the follow-up of results be partially alleviated with temporary coverage through Risk Sharing Agreements between payers and manufacturers of tests 28 ? Can ethical priorities of the clinician and patient transaction be corroborated with the needs of the genomic learning healthcare system (that must maximize scarce resources) such that genomic sequencing improves healthcare across all of society?
Similarly, efforts have been made to catalog the relationship between genetic variation and drug response, exemplified by databases including PharmVar and PharmGKB [33][34][35] . Like ClinVar, PharmVar relies on user submissions of discovered haplotypes in genes related to pharmacogenomics.
These variant databases encapsulate the combined expertise of thousands of clinical researchers across the world, but also reveal a large amount of uncertainty. The majority of possible missense variants in IEM and PGx genes are classified as VUS or are altogether missing from databases.
ClinVar alone contains more than 6,000 variants classified as VUS in IEM genes, and more than 10,000 VUS in PGx genes (Figure 2a).
To combine the best features of variant databases and computational predictors, automated systems that use both in tandem are already being tested to predict the pathogenicity of rare variants. Consider one recent study evaluating IEM detection by sequencing dried blood spots drawn from newborns 1 . This study compared the performance of MS/MS to exome sequencing as a primary screen for IEM on a set of 805 newborns with confirmed IEM. Variants identified by sequencing were automatically assessed on rarity, protein consequence, predicted pathogenicity (including CADD), and matched with catalogued pathogenic variants in ClinVar and HGMD to predict disease status. Overall, this combination was neither sufficiently sensitive nor specific compared to MS/MS, and exome sequencing notably missed a number of cases in which a pair of rare, protein-altering variants were absent from the causal gene. However, performance varied between IEM, and in some cases, provided more specific diagnoses than analyte testing. 32% of pathogenic variants were absent from HGMD and ClinVar. Critically, sequencing led to several false positives in which an individual harbored a pair of rare, protein-altering variants in an IEM gene, but did not have the associated disorder. These false positives significantly limit the ability to use DNA sequencing for screening, and could be mitigated by more accurate computational methods that distinguish pathogenic from benign protein-altering variants.
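The kind of automated assessment described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the cited study: the `Variant` fields, the consequence labels, the 1% allele-frequency cutoff, and the score cutoff of 20 are all assumptions chosen for illustration. The sketch flags a gene when an individual carries at least two candidate alleles, which is the recessive pattern discussed in the text.

```python
from dataclasses import dataclass

# Illustrative consequence labels; a real annotation tool may use different terms.
PROTEIN_ALTERING = {"missense", "nonsense", "frameshift", "splice_site"}


@dataclass(frozen=True)
class Variant:
    gene: str
    allele_frequency: float      # population frequency as a fraction (0.01 == 1%)
    consequence: str             # predicted protein consequence
    pathogenicity_score: float   # a CADD-like scaled score (assumed scale)
    catalogued_pathogenic: bool  # already listed as pathogenic in a database
    allele_count: int = 1        # 1 = heterozygous, 2 = homozygous


def is_candidate(v, max_af=0.01, min_score=20.0):
    """Return True if the variant passes the (hypothetical) triage filters."""
    if v.catalogued_pathogenic:
        return True
    return (
        v.allele_frequency < max_af
        and v.consequence in PROTEIN_ALTERING
        and v.pathogenicity_score >= min_score
    )


def flag_recessive_genes(variants, max_af=0.01, min_score=20.0):
    """List genes in which one individual carries two or more candidate alleles.

    Phase is not checked, so two heterozygous variants on the same chromosome
    copy would still be flagged.
    """
    allele_totals = {}
    for v in variants:
        if is_candidate(v, max_af, min_score):
            allele_totals[v.gene] = allele_totals.get(v.gene, 0) + v.allele_count
    return sorted(gene for gene, total in allele_totals.items() if total >= 2)
```

Because the filter only counts rare, protein-altering alleles, it reproduces the failure mode noted above: an individual with two benign but rare protein-altering variants in the same gene would be flagged, which is why a more accurate pathogenicity predictor matters.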
Ethics spotlight 2: Can we view the classification of VUS as a social justice opportunity?
Whether the classification of VUS and IEMs can offer a fairer distribution of the benefits of sequencing technologies across all population groups is a significant question. Most large datasets in the US contain homogeneous ancestry that is unrepresentative of disadvantaged groups 48,49 . In addition to the need to improve predictive methods for IEMs, screened individuals need to be considered as part of a social group in relationship to a wider and unequal social system. The moral obligations embedded within the ethics of clinical research and practice need to be better integrated 18 . For individuals seeking healthcare, polygenic risk scores are more accurate for patients of European ancestry because the data on which algorithms are trained comes largely from individuals of European ancestry 50,51 . Similarly, variant impact predictors tend to be trained on catalogued variants from databases, which are not representative of all ancestries. ClinVar was recently found to be missing a large number of hearing impairment variants that primarily affect individuals of African ancestry 52 , likely indicative of a broader pattern. For variant predictors, this bias will lead to greater reliance on European ancestry variants and European genetic context, producing less accurate classification of IEM and PGx variants in other ancestries (e.g., African ancestry), which would only compound existing injustice in healthcare access for underrepresented populations 53,54 . Disparity in ancestry representation is especially stark in data sources for Genome Wide Association Studies (GWAS), where individuals of European ancestry make up a disproportionate 81% of the data set population 48 .
Can we alleviate healthcare disparity by closing current ancestry gaps in GWAS? Given evidence that polygenic risk scores can be improved upon by incorporating datasets for a broader range of genetic ancestries 55 , it is imperative that GWAS strive for fairer training data also. As the field matures to consider the role of genetic modifiers 56 , as well as social and environmental interactions 57 , results from GWAS of diverse individuals are needed to consider the effects of genetic modifiers and the environment on variants. Newborn screening programs, with their mandatory collection and the near universal application of testing, provide a diverse and truly representative set of individuals 58 . That said, racial discrimination in healthcare and healthcare research is not simply resolvable through technical fixes. Redressing data under-representation and health equity in machine learning precision medicine must be viewed in the context of broader social change, which we pick up on in the next Box regarding questions of social obligation.

Opportunities in rare variant evaluation
In predicting the effect of a variant on gene function, we can predict its effects on the system, such as a metabolic pathway, and then on the physiology/pathophysiology. Cataloging observed likely clinically impactful variants in databases such as ClinVar and PharmVar 32 can be effective for determining the pathogenicity of more frequent rare variants (allele frequency between 0.01% and 1%). These variants are common enough that they have been identified in multiple individuals and therefore the effect on phenotype can be verified. However, ultra-rare variants, defined as having an allele frequency less than 0.01%, are responsible for a large portion of rare genetic disorders. Publicly available databases of phenylketonuria (PKU) patients indicate that 60% of cases involve at least one ultra-rare SNV, and in 28% of cases the individual carries an ultra-rare variant on both copies of PAH. Some of these ultra-rare variants may be de novo mutations, and the individual may be the only person known to harbor that exact variant 59 .
Emerging computational algorithms may serve as a means for evaluating the impact of rare variants in IEM and PGx genes. As previously stated, existing algorithms have limited ability to accurately predict the impact of variants in these genes, especially among rare variants. Methods have been developed to specifically evaluate variants in pharmacogenes, but these are largely based on existing methods and may have some of the same inherent biases 41 . Machine learning has revolutionized computer vision and natural language processing by effectively analyzing spatial and sequential data [60][61][62] . With the rapid growth of biological data, deep learning has also been extensively used in bioinformatics [63][64][65][66][67][68][69][70] , including transcription factor binding site prediction 71 , genome functional annotation 72 , and assessment of variant function 73,74 . 
Several methods have been developed specifically for the evaluation of alleles in pharmacogenes, namely CYP2D6 75,76 .
These purpose-built models outperform existing methods and are capable of assessing the impact of any combination of variants observed in a haplotype, rather than single variants. One major drawback of deep learning is that it requires an immense amount of data in order to estimate the large number of parameters required for good performance 77,78 .
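The allele-frequency bins used earlier in this section (ultra-rare below 0.01%, rare between 0.01% and 1%) can be written down as a small Python helper. This is a sketch under stated assumptions: the function name `frequency_class` is hypothetical, the bin edges are treated as inclusive on the "rare" side, and anything above 1% is labeled "common", a convention the text implies but does not state.

```python
def frequency_class(allele_frequency):
    """Bin a population allele frequency (a fraction, so 0.01 means 1%)."""
    if not 0.0 <= allele_frequency <= 1.0:
        raise ValueError("allele frequency must be a fraction between 0 and 1")
    if allele_frequency < 0.0001:   # below 0.01%
        return "ultra-rare"
    if allele_frequency <= 0.01:    # 0.01% up to 1% (edge handling is an assumption)
        return "rare"
    return "common"                 # above 1% (assumed label)
```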
Transfer learning offers an opportunity to leverage the power of deep learning in situations where data is limited. It is difficult to obtain sufficient data to develop phenotype prediction algorithms from genomic data using deep learning, especially when we only have tens or hundreds of individuals with both genome sequencing data and well-characterized clinical or molecular phenotypes.
Transfer learning is an emerging approach for overcoming the limited data challenges. The idea is to build models that perform a task (X) that is similar to the goal task (Y) but for which there are large amounts of relevant real or simulated data. Once the model for solving task X is performing well, it can be refined with data relevant to task Y. In the case of predicting variants, we might build a model using data from a well-studied gene (X) and then refine the model with data from a poorly-studied gene (Y). The resulting model may perform very well on Y, since the "lessons" learned in modeling X transfer well to Y [79][80][81][82][83][84] . There are several flavors of transfer learning that have been applied to applications in genetics and proteomics. Convolutional neural network (CNN) based approaches pre-train weights of convolutional layers on large datasets that can be fine-tuned on smaller datasets 75 . Transformer based approaches, frequently used in natural language processing, have been applied to functional predictions of variants in proteins 85,86 .
Graph-CNNs have been used to make drug binding predictions using protein structure data after being pre-trained using an unsupervised learning step 87 . These transfer learning methods could in theory be used to create structure-based predictions of the effect of amino acid changes on drug binding. These methods combined with chemoinformatics approaches for representing drug molecules could be used to create substrate-specific predictions of drug-protein interactions and how genetic variants may influence that behavior.
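The pre-train-then-refine idea described above (fit on data-rich task X, then refine on data-poor task Y) can be sketched in a few lines of Python. This is a deliberately small stand-in, assuming a logistic regression on synthetic variant features rather than any of the deep architectures cited; the function names, the simulated data, and the step counts are all illustrative assumptions.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights, features):
    return sigmoid(sum(w * x for w, x in zip(weights, features)))


def log_loss(weights, rows, labels):
    """Mean negative log-likelihood of the labels under the model."""
    eps = 1e-12
    total = 0.0
    for x, y in zip(rows, labels):
        p = min(max(predict(weights, x), eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(rows)


def fit(rows, labels, init_weights, steps, lr=0.5):
    """Full-batch gradient descent that starts from init_weights."""
    w = list(init_weights)
    n = len(rows)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in zip(rows, labels):
            err = predict(w, x) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def simulate(n, true_weights, rng):
    """Synthetic variants: features in [-1, 1], labels drawn from a logistic model."""
    rows, labels = [], []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0) for _ in true_weights]
        rows.append(x)
        labels.append(1 if rng.random() < predict(true_weights, x) else 0)
    return rows, labels


def pretrain_then_finetune(source, target, n_features, pre_steps=300, ft_steps=20):
    """Fit on the data-rich source gene, then refine briefly on the target gene."""
    pretrained = fit(source[0], source[1], [0.0] * n_features, pre_steps)
    finetuned = fit(target[0], target[1], pretrained, ft_steps)
    return pretrained, finetuned


if __name__ == "__main__":
    rng = random.Random(0)
    source = simulate(500, [3.0, -2.0, 1.0, 0.5], rng)  # well-studied gene (X)
    target = simulate(15, [2.5, -2.0, 1.5, 0.0], rng)   # poorly studied gene (Y)
    pre, ft = pretrain_then_finetune(source, target, n_features=4)
    print("target loss before refinement:", log_loss(pre, *target))
    print("target loss after refinement: ", log_loss(ft, *target))
```

The design choice worth noting is that refinement starts from the pretrained weights and runs for only a few steps, so the small target dataset adjusts the model rather than defining it. How well this works in practice depends on how similar the two genes really are, which the synthetic data here simply assumes.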
The underlying homology between genes existing in gene families and across species may allow for an increased ability to perform transfer learning. We may be able to use knowledge learned in some domains to inform others. Not surprisingly, some rare diseases have received more attention than others, often due to the frequency of disease, serendipitous factors, and scientific opportunities. These well-studied diseases typically have significantly more variant impact data available than others. PKU has an incidence of 1 in 10,000 newborns, and there are hundreds of disease-associated catalogued variants. In comparison, tyrosine hydroxylase deficiency (THD) affects fewer than 1 in 100,000 newborns and has been associated with fewer than 20 variants.
Sequencing benefits individuals with THD less, simply because the disease is rarer and few known pathogenic variants exist. The chemical similarity of phenylalanine and tyrosine leads to a high degree of homology between phenylalanine hydroxylase (PAH) and tyrosine hydroxylase (TH), which presents an opportunity to transfer knowledge about PKU variants to better understand THD-for example, in understanding which parts of the protein may be more or less tolerant of non-synonymous mutations.
Ultimately, the goal of any variant interpretation method is to improve clinical care. Integration of genetics into the clinic is already quite challenging, and integration of computational methods for predicting variant function is rife with further challenges. Learning health systems have long been proposed as models for improving healthcare [88][89][90] , but integration of genetic data into such a system would allow for the accumulation of data to train more sophisticated predictive models as well as an opportunity to iteratively improve upon such algorithms.
A genomic learning healthcare system would allow for rapid collection and phenotyping of rare variants. Learning health systems have been proposed in healthcare since 2007, but none have integrated genetics to inform patient treatment 89 . In these implemented systems, the algorithms are constantly improving based on a feedback loop of data that is collected over the course of patient treatment. A genomic learning healthcare system would operate in much the same way, with the addition that clinical decision support is provided based on genetic data as well as clinical data 28 . In this proposed system, collection, sequencing, and analysis of patient data would be required as a first step, and would need to be available as part of the patient's clinical record in the electronic health system. This would enable clinical decision support for IEM and PGx related conditions, providing doctors with diagnosis and treatment guidance. Then, the algorithms underlying the clinical decision support can be evaluated regularly and updated based on newly available patient data. In addition to evaluating the algorithms, sequencing and analyzing important genes for every individual treated will allow for more rapid collection and phenotyping of ultra-rare variants.
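The feedback loop described above, in which observations gathered during care accumulate until a variant can be re-evaluated, can be illustrated with a minimal Python sketch. The class name `VariantEvidenceStore`, its methods, and the review threshold of five observations are hypothetical; a real system would also need consent handling, phenotype definitions, and statistical review, none of which is modeled here.

```python
from collections import defaultdict


class VariantEvidenceStore:
    """Accumulates phenotype observations per variant (hypothetical design)."""

    def __init__(self):
        # variant_id -> [affected_count, unaffected_count]
        self._counts = defaultdict(lambda: [0, 0])

    def record(self, variant_id, affected):
        """Add one carrier observation collected during routine care."""
        self._counts[variant_id][0 if affected else 1] += 1

    def observed_rate(self, variant_id):
        """Fraction of observed carriers who were affected, or None if unseen."""
        if variant_id not in self._counts:
            return None
        affected, unaffected = self._counts[variant_id]
        return affected / (affected + unaffected)

    def ready_for_review(self, min_observations=5):
        """Variants with enough observations to justify re-evaluation."""
        return sorted(
            variant_id
            for variant_id, (affected, unaffected) in self._counts.items()
            if affected + unaffected >= min_observations
        )
```

In such a design, each clinical encounter calls `record`, and a periodic job uses `ready_for_review` to decide which variants, and which predictive models, should be re-examined with the newly available data.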
The ultimate goal of a genomic learning healthcare system is to improve treatment for all patients by leveraging their genetic data. This includes triaging rare variants that may be previously unseen in patients and potentially making clinical decisions based on their predicted impact. As a conservative first step, a genomic learning healthcare system could implement existing clinical guidance models for IEM and PGx, such as the pharmacogenomics dosing recommendations from the Clinical Pharmacogenomics Implementation Consortium (which was recently supported by Medicare). Once genetic data are collected for each patient, predictive models for rare variants can be developed and implemented in clinical practice at such a time when there is sufficient confidence in the predictions of the model. Careful analysis will be needed in selecting and evaluating predictive models for both IEM and PGx, and it is likely that gene-specific models will be needed. We illustrate this framework in Figure 3, before turning to the ethical questions to be taken into account.
Ethics spotlight 3: Can blood spot DNA be viewed as a public good in a genomic learning healthcare system?
Data sources and determination of data ownership and consent for secondary data use will significantly shape whether or not genomic learning healthcare systems can improve accuracy and reduce biases. Learning health care systems present unique ethical challenges that traditional clinical and research ethics, focusing on individual harms and a sharp research/clinical divide, will find difficult to address 18 . Data collection and input (step 1 and step 2 of Fig 3) differ between clinical and public health repositories in terms of provisions around secondary use. The use of AI in healthcare systems is complicated by issues arising from the possible encoding and routinization of human bias, even with the use of seemingly neutral data sources 49 . 
Further, AI is becoming the repository of the collective medical mind 26 . More than simply doing no harm, a genomic learning healthcare system should actively support greater health equity 4,95 . Can clinical data be viewed as a 'public good' insofar as all stakeholders (both healthcare and private industry) hold a moral obligation to use and share clinical data in ways that benefit society over and above individual or commercial interests 19 ?
If clinical data are viewed as a public good, the greatest difficulty would be how to deal with computational predictors and healthcare outcomes that accurately capture differences not so much resulting from human input biases but rather serving unfair social conditions. For public health data use, it is important to identify and address social and political inconsistencies in the ethical oversight from Institutional Review Boards and government bodies, particularly in regard to informed consent and anonymization of data 96 . This requires careful consideration for the nuances of beneficence regarding collections and distribution of genomic information 97 . The current justification for the mandatory nature of newborn screening rests on the potential harms to the child were they not screened for these treatable conditions (see Johnston et al. 2018 for a full historical justification) 20 . Safeguards are needed to protect the storage and research use of genetic data, which could become more identifiable 98 . With such protections, could the practice of informed consent with individuals be seen as less important than another process to ensure respect for autonomy at a group level, in order to meet social obligations to contribute to both greater knowledge and efforts to reduce social inequity in health 18 ? Because biobanks of newborn blood spots provide a rich and unique dataset for research and improving newborn screening (and other genetic testing), with enormous potential for contribution to a genomic learning healthcare system, the loss of such potential (if secondary use of blood spots is only permissible on an individual consent basis) needs to be carefully weighed against ethical concerns about respect for individual control. How do we ensure respect for individuals in a genomic learning healthcare system that relies on the collective contributions of entire populations in order for everyone to potentially benefit? 
These are issues that our research must continue to engage with directly.

Conclusion
The defining problem of the genomic age is the interpretation of human variation. In reviewing computational advancements and ethical concerns, we look to develop gene-specific variant interpretation algorithms with a genomic learning healthcare system that builds from a focus on early-onset treatable disease in newborns and actionable pharmacogenomics recommendations.
We seek diagnosis of IEM and treatment for PGx that is tailored to each individual, and treatment outcomes that are shared to improve treatment for future patients across all of society. The existing system is the first step towards this goal, as evidenced by confirmatory sequencing of patients and variant cataloging in databases such as ClinVar. Yet the existing system falls short, because it is reactive rather than predictive, and accurate treatment depends on whether the variant has been previously seen and cataloged. Importantly, it remains to be determined whether computational methods can alleviate health inequity that is reinforced by these limited variant databases. Pervasive sequencing may indeed present a social justice opportunity: to actively promote a more fair and consistent distribution of treatment across all population groups. Yet, there are many barriers in the way, including unrepresentative sequencing databases, secondary data use permissions, barriers to healthcare access, and existing biases at the human interface of research and caregiving.
There are technical challenges, including accurate variant classification, data limitations, and growing numbers of variants of uncertain significance. A combination of integrated learning and transfer learning can overcome existing data limitations in order to improve the computational prediction of variants. An increased understanding of each patient's variants will enable more precise diagnosis and treatment. Most importantly, as more patients provide information into the system, lessons learned from one patient may inform the care of and benefit all patients. A dynamic and fair genomic learning healthcare system will create the greatest patient benefit from the captured genomic and phenotypic information, but this will fundamentally depend on careful consideration of societal implications.