Correction of the Sickle Cell Mutation Through Base and Prime Editing in Hematopoietic Stem Cells

Sickle cell disease is characterized by stiff, “sickled” red blood cells that have difficulty moving through the bloodstream and do not efficiently carry oxygen. It is an inherited disease with severely limited treatment options, and is caused by a point mutation. Its prevalence in black and brown communities makes the already limited treatment options even less accessible. Base editing and prime editing are two relatively recent discoveries in the field of genome editing and were developed after the groundbreaking discovery of the CRISPR Cas9 system. While not fully tested, they hold a lot of promise in providing alternative treatment options for sickle cell disease. Both editing systems are able to install individual point mutations in the beta globin gene, which is where the sickle cell mutation occurs, and can thus cure sickle cell disease (in theory). In this paper we outline the mechanisms of CRISPR-Cas9 systems and base and prime editing, and provide insight into how to apply them to treat SCD. Further investigation should be done on specific editing systems and designs to use to ensure optimal treatment of SCD.
Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 21 September 2020 doi:10.20944/preprints202009.0490.v1 © 2020 by the author(s). Distributed under a Creative Commons CC BY license.


INTRODUCTION
Sickle cell anemia (also known as sickle cell disease, or SCD) is an inherited disease that causes red blood cells to become stiff and misshapen. It is an autosomal recessive trait, characterized both by the unusual shape of the red blood cells and by a resulting decrease in oxygen flow throughout the body. SCD is caused by a single point mutation in the gene encoding the beta chain of the hemoglobin protein. A thymine in the DNA is swapped out with an adenine, causing valine to be incorporated in place of glutamic acid. The hemoglobin with this mutation becomes cohesive, meaning the proteins stick to each other, which is what causes the stiffness and shape of the red blood cells. As these cells move through the body, they clog capillaries, reducing the flow of blood, and therefore oxygen, to many parts of the body. This results in pain throughout the body and more serious conditions like heart attack or stroke.
There are currently no cures for SCD outside of expensive and largely inaccessible treatments, but many strides are being made towards alternatives through genome editing. A prominent genome editing technique is CRISPR, or clustered regularly interspaced short palindromic repeats, which induces breaks in the DNA. The most commonly used CRISPR protein is Cas9: it is bound to a guide RNA (gRNA) encoding a reprogrammable target sequence, and the Cas9 enzyme cuts the DNA there. Despite the relative ease of CRISPR gene editing, there are a few downsides. Indels, or random insertions and deletions, are often found at the target site, as are undesirable off-target double strand breaks (DSBs) in the DNA. This makes it extremely difficult to use Cas9 to fix single point mutations such as the one that causes SCD. Further development of genome editing techniques has produced new CRISPR-based technologies for more targeted and precise editing. In base editing, an enzyme is used to swap out single bases with little risk of inducing indels, making it much more useful for treating and curing SCD than CRISPR Cas9 alone. Prime editing is a newer technique similar to base editing that changes single base pairs with a wider range of possible changes, and can also install precise insertions or deletions at a given target without DSBs. Base editing and prime editing both offer new ways for the mutation causing SCD to be directly modified and treated.

CLINICAL BACKGROUND
Sickle cell disease is most prevalent in areas with high rates of malaria, because carrying the sickle cell trait confers some protection against malaria. These regions are found in Africa and the Middle East. Thus, people within demographics originating from these areas exhibit higher rates of SCD (1). Within the United States, SCD is most common in black communities, and it is more likely to be inherited by African-Americans than by those of other races and ethnicities. While the appearance of SCD in these communities is due to genetic factors, its impact on the population is exacerbated by the lack of quality healthcare provided to them.
Sickle cell disease is characterized by stiff, "sickled" red blood cells that have difficulty moving through the bloodstream and do not efficiently carry oxygen (figure 1a). The most prominent symptom associated with SCD is periodic episodes of pain or sickle crisis in which people experience bouts of bodily pain that sometimes result in hospitalization as well as later conditions like necrosis and lung disease. Acute chest syndrome, hypersplenism, and other conditions can occur (2). It is also associated with anemia, and tissue injury with organ dysfunction and failure (3). The life expectancy for those with SCD is therefore shortened in comparison to the general population (2). Psychological effects of SCD include higher rates of anxiety and depression due to the mental tax of chronic illness. Perceived stress is also extremely common in patients, linking back to symptoms of anxiety and depression (4).
The hemoglobin protein functions as an oxygen carrier. In healthy adults, it is made up of two beta globin chains and two alpha globin chains, with each chain also containing a heme group. Heme groups, which are also found in myoglobin, allow these proteins to bind oxygen effectively. In those expressing the sickle cell trait, the beta chain of the hemoglobin protein is modified, meaning that a beta globin gene in the DNA is mutated (5). The specific mutation occurs in the beta globin gene on chromosome 11p15.5, where the triplet for glutamic acid (CTC on the template strand) is mutated to the triplet for valine (CAC) (figure 1b). The beta globin becomes abnormal, forming hydrophobic interactions with other beta globins. These hydrophobic interactions lead to polymerization of the hemoglobin as a whole, making the red blood cell stiff and sickle-shaped (figure 1c). This modified hemoglobin that is characteristic of SCD is known as HbS, or sickle hemoglobin (3).
Those homozygous for the sickle mutation have two copies of this hemoglobin gene and their genotype is known as HbSS. Others are compound heterozygotes with the genotype HbSC, carrying one copy of the sickle hemoglobin gene and one copy of HbC, or hemoglobin C. HbC occurs when that same glutamic acid is replaced by lysine rather than valine. HbSC is generally much less severe than HbSS, but still makes up about a third of SCD cases overall (6,7,8).
There are also some rarer heterozygous genotypes like HbS beta thalassemia, HbSD, HbSE, and HbSO, where one inherits one sickle cell gene and another abnormal beta thalassemia, hemoglobin D, hemoglobin E, or hemoglobin O gene (9).
Treatment of SCD includes blood transfusion, which reduces the concentration of sickled red blood cells in the bloodstream and appears to reduce complications like stroke in SCD patients. The only cure that exists is human progenitor cell transplantation, which can use bone marrow, peripheral blood stem cells, or umbilical cord cells (10,11). However, the cost of blood transfusions can be extremely high per unit and the cost of cell transplants can be even higher. This creates barriers for the black and often impoverished communities affected by SCD, withholding treatment from those who cannot afford it. Myeloablative allogeneic hematopoietic stem cell transplantation (HSCT) is a current treatment used for SCD that seems to be most effective in children (12). Other treatments related to stem cell transplantation are being investigated, like the nonmyeloablative version of HSCT, which worked in adults severely afflicted by SCD in a short-term study (13).
Another possible cure that is being tested is gene therapy. LentiGlobin is a gene therapy by Bluebird Bio that treats SCD by delivering modified, functional beta globin genes into hematopoietic stem cells (HSCs), which differentiate into red blood cells. Upon becoming red blood cells, these stem cells produce a new kind of hemoglobin, HbA, diluting the concentration of sickled hemoglobin being produced (14). There have so far been few adverse effects and LentiGlobin is set to be approved in late 2021 (15).

THERAPEUTIC STRATEGY
Recent developments in genome editing techniques have led to a series of breakthroughs. CRISPR systems allow for editing that targets specific DNA sequences, and they are much easier to reprogram than earlier strategies that utilized TALENs and zinc fingers. TALENs, or transcription activator-like effector nucleases, are built from effector domains that each recognize a specific nucleotide. Zinc fingers are similar, but each of them recognizes a three to six base pair sequence. After recognizing a certain sequence, these nucleases cleave the DNA at that sequence. It is extremely tedious to use zinc fingers and TALENs to edit DNA sequences because many of them must be arranged to recognize each new target sequence. The challenges posed by zinc fingers and TALENs were then addressed by the development of CRISPR systems.
CRISPR systems are naturally occurring bacterial and archaeal systems. They are composed of CRISPR-associated (Cas) genes, noncoding RNA sequences, and repetitive elements (direct repeats), which are interspaced with short sequences derived from protospacers and together give rise to crRNA, or CRISPR RNA (16,17). Protospacers are exogenous DNA targets that are associated with a certain PAM (protospacer adjacent motif) sequence (18,19,20). Well-characterized types of CRISPR systems include type I, type II, type IV, and type V. Type II CRISPR systems are usually associated with Cas9 or Cas9-like proteins, and contain Cas1 and Cas2 genes, much like type I systems (21). They also contain genes encoding tracrRNA, or trans-activating CRISPR RNA, which pairs with crRNA to form gRNA (21,22,23). Cas12 and Cas13, which like Cas9 belong to class 2 systems, have also been used in gene editing.
The most commonly used CRISPR enzyme is Cas9, an endonuclease that uses a reprogrammable guide RNA target sequence to cut DNA at specific sites. Within a CRISPR system, a Cas9 enzyme binds a piece of RNA that is complementary to a target DNA sequence. This RNA is known as guide RNA, or gRNA. In its natural form, gRNA consists of separate tracrRNA and crRNA molecules. Single guide RNA, or sgRNA, is a combined tracr-crRNA molecule. Different sgRNAs result in different levels of CRISPR system activity (24). In CRISPR Cas9 systems, the sgRNA pairs with the target sequence, Cas9 recognizes the specific PAM sequence next to it, and the enzyme cuts the DNA a few bases upstream of the PAM, generating a DSB (figure 2). From here, the cut ends of the DNA can be rejoined through either non-homologous end joining (NHEJ) or homology directed repair (HDR) (figure 3) (17). In NHEJ, exonucleases chew away the bases on the ends of the cut DNA so that it can be stitched back together. This can result in indels and frameshift mutations, which can lead to loss of function in a gene or have other potentially harmful effects (25). They can cause dangerous amino acid and protein mutations, which can result in uncontrollable cell growth (cancer) and other detrimental conditions. In HDR, a template for repairing the DNA at the DSB is delivered along with the protein that cuts it. Polymerases build off of this template to create a bridge with a specific sequence between the cut ends of the DNA. Attempts at HDR are also frequently accompanied by very high numbers of indels, making it a risky strategy to use (26). Several gene editing techniques were developed following the discovery of CRISPR (figure 4), including base editing and prime editing.
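The targeting rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it assumes an SpCas9-style NGG PAM, a 20-nucleotide protospacer, and a cut three base pairs upstream of the PAM, none of which are specified in the text above, and the input sequence is a made-up toy sequence rather than a real gene. The function name `find_cas9_sites` is hypothetical.

```python
import re

def find_cas9_sites(dna, spacer_len=20):
    """Return (protospacer, pam, cut_index) for each NGG PAM on the given strand.

    cut_index is the 0-based position in `dna` where the double strand break
    would fall, assumed here to be 3 bp upstream of the PAM.
    """
    sites = []
    # A lookahead is used so that overlapping PAMs are all reported.
    for match in re.finditer(r"(?=([ACGT]GG))", dna):
        pam_start = match.start()
        if pam_start < spacer_len:
            continue  # not enough upstream sequence for a full protospacer
        protospacer = dna[pam_start - spacer_len:pam_start]
        sites.append((protospacer, match.group(1), pam_start - 3))
    return sites

# Hypothetical 30-nt toy sequence, not taken from any real genome.
toy = "TTACGATCCGATTACGCTAGCTAAGGCTAA"
for protospacer, pam, cut in find_cas9_sites(toy):
    # Show where the break would fall within the toy sequence.
    print(protospacer, pam, toy[:cut] + "|" + toy[cut:])
```

A real guide design tool would also scan the reverse strand and score off-target matches, which this sketch deliberately leaves out.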
In base editing, editors install specific single-base mutations in DNA. These editors combine dCas9, a catalytically inactive Cas9, with a single strand deaminase enzyme. The dCas9 targets a DNA strand and unwinds the double helix without initiating a DSB, and the strand with the PAM forms a single stranded R-loop. The deaminase enzyme then targets a specific base on this R-loop to edit (27,28,29). Two types of deaminase enzymes can be attached to the dCas9 in a base editing system. Cytidine deaminases target cytosines in the DNA and convert them to uracils, which are then read as thymines by polymerases (30,31) (figure 5a). There are four different base editors that use cytidine deaminases. BE1 is the first base editor that was developed, using the mechanisms described above. BE2 counters endogenous DNA repair mechanisms that are unfavorable for base editing by inhibiting uracil DNA glycosylase, or UDG, with uracil DNA glycosylase inhibitor (UGI) (32). These endogenous repair mechanisms often reverse the changes made by base editors, and inhibiting them increased the efficiency of base editors by 3x (30). BE3 nicks the unedited DNA strand so that the host repair machinery works to repair the nick rather than the edited bases on the other strand (33). BE3 uses a Cas9 nickase rather than dCas9 for this (27). BE4 encodes two UGIs rather than just one as in BE2, increasing efficiency even more (31). Deoxyadenosine deaminases were developed after cytidine deaminases through directed evolution of a single stranded RNA deaminase. Deoxyadenosine deaminases are a key part of adenine base editors (ABEs), which convert adenosines in the R-loop to inosines (figure 5a). These inosines are then read by polymerases as guanines, resulting in an A to G mutation (34).
Since SCD is caused solely by a point mutation in one gene, base editing is a strategy that would likely be extremely useful in treating and curing it. However, the mutation causing SCD is a thymine to adenine mutation, and correcting it would mean converting from A to T, which is not possible with current base editors. Converting this adenine to a guanine instead changes the amino acid from a valine to an alanine, which does not reverse the SCD mutation but results in the production of Hb G-Makassar, a naturally occurring hemoglobin variant that does not negatively affect patients and acts like normal hemoglobin (35). Different base editors and spacer sequences for correcting the SCD mutation were compared using the CRISPR BE-Hive, a web app that predicts the purity and editing efficiency of different base editors given a target sequence (figure 5b). It was found that the adenine base editor, or ABE, worked better than ABE-CP1041 at changing the adenine causing SCD to a guanine. While ABE has an overall lower editing efficiency than ABE-CP1041, it has a significantly higher purity, which minimizes the possibility of changing other amino acids in the polypeptide chain. In the 3.5% of cases where valine is not changed to alanine in mES cells, there is no amino acid change at all, which also minimizes risk compared to actively changing other amino acids. Meanwhile, ABE-CP1041 with the same spacer sequence in mES cells has a 1.1% chance of changing a completely different amino acid from a serine to a proline. It also has a 41.4% chance of making no amino acid changes at all, meaning it does not edit HBB as well as ABE even though its efficiency is about 10% higher (efficiency is based on a 30% average efficiency level).
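The amino acid consequences described above can be checked with a small Python sketch. It assumes that the triplets CTC and CAC given earlier are written for the template strand, read 3' to 5', since that is the reading under which they encode glutamic acid and valine. The codon table is limited to the three codons needed here, and the function names are hypothetical.

```python
# Base pairing used to go from the template strand to the coding strand.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Minimal codon table: only the codons needed for this example.
CODON_TO_AA = {"GAG": "Glu", "GTG": "Val", "GCG": "Ala"}

def amino_acid_from_template(template_3to5):
    """Return (coding codon 5'->3', amino acid) for a template triplet written 3'->5'."""
    coding = "".join(COMPLEMENT[base] for base in template_3to5)
    return coding, CODON_TO_AA[coding]

def adenine_base_edit(template_3to5, position):
    """Model an ABE outcome: A is deaminated to inosine, which is read as G."""
    if template_3to5[position] != "A":
        raise ValueError("an adenine base editor only acts on adenine")
    return template_3to5[:position] + "G" + template_3to5[position + 1:]

healthy = "CTC"                        # template triplet for glutamic acid
sickle = "CAC"                         # T -> A mutation, template triplet for valine
edited = adenine_base_edit(sickle, 1)  # A -> G at the mutated position

for label, triplet in [("healthy", healthy), ("sickle", sickle), ("ABE-edited", edited)]:
    print(label, triplet, amino_acid_from_template(triplet))
```

Under these assumptions the edited triplet encodes alanine rather than glutamic acid, which matches the Hb G-Makassar outcome described in the text rather than a true reversion.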
Prime editing is a newer gene editing strategy somewhat similar to base editing in that it can correct individual bases. It can install essentially any mutation or edit in a DNA sequence. Prime editing uses a pegRNA, or prime editing guide RNA, and a Cas9 nickase fused to a reverse transcriptase domain (figure 6a). The pegRNA contains a specific sequence that hybridizes to the target DNA sequence, which is then nicked by the Cas9 nickase to produce a flap of DNA. Only the strand containing the PAM is nicked. From here, the pegRNA extension, which contains the template for DNA repair that installs the desired edit, binds to the nicked sequence. The resulting edited flap can then replace the original DNA sequence, although which flap is incorporated is left to chance. The edited flap is a 3' sequence that contains the DNA synthesized by the reverse transcriptase from the template in the pegRNA, and the original flap is a 5' sequence. 5' ends are more likely to be excised while 3' ends are more likely to be ligated, which makes prime editing even more efficient. The entire process continues to cycle until the edited flap successfully replaces the non-edited flap (figure 6b). This basic mechanism is known as PE1, and better prime editing systems were developed afterward. PE2 engineers the reverse transcriptase in PE1 to improve overall editing efficiency. It makes prime editing more compatible with shorter primer binding site sequences (PBS sequences). PE3 systems were developed after. They increase editing efficiency by about 3x in comparison to PE2 and about 9x in comparison to PE1 by using an additional sgRNA alongside the Cas9 nickase to nick the non-edited strand. This approach still produces a high number of indels. PE3b is a PE3 system that uses sgRNAs with spacer sequences that match the edited DNA strand instead of the original one. These sgRNAs prevent nicking of the second strand until after the edit has been installed in the sequence containing the PAM, which reduces concurrent nicks and lowers the rate of indels by about 13 times in comparison to PE3 (26).
Prime editing has also been compared to HDR, in which DNA repair mechanisms build off of a template delivered with the editing protein. While HDR is in theory an excellent tool for installing specific mutations in DNA, it results in a large number of indels (26) (figure 6c). This can be attributed to the fact that HDR relies on Cas9 cutting, which induces DSBs and thus causes indels. Additionally, HDR is typically only active in dividing cells, so using it in stem cells or other non-dividing cells is not feasible (36).
Due to the ability of prime editing systems to install virtually any mutation in a given DNA sequence, they hold a lot of promise for the future of treating and curing genetic diseases like SCD. While base editing cannot completely reverse the mutation causing SCD due to the limits of the deaminases in the system, prime editing can. A pegRNA optimized for treating SCD specifically must be developed. This can be done virtually in PrimeDesign, which lets the user choose specific attributes and visualize the completed pegRNA and extension. PrimeDesign reports pegRNA spacers, pegRNA extensions, and ngRNA spacers. The pegRNA extension section includes PBS length, PBS GC content, RTT length, and RTT GC content, as well as the extension itself. No PBS length or GC content is explicitly more beneficial than another. RTT lengths also vary, with shorter or longer lengths being more efficient depending on the specific target site. However, the one steadfast guideline is that the pegRNA's 3' extension should not start with C, because this generally results in a lower editing efficiency (26). An example pegRNA to edit sickle beta globin (figure 6a) contains a spacer sequence of CATGGTGCATCTGACTCCTG on the negative strand, with a spacer GC content of 0.55. The peg-to-edit distance is 5, and the pegRNA extension is GACTTCTCCTCAGGAGTCAGATGCACC. It has a PBS length of 14 and an RTT length of 13, with relatively high GC contents for each of these: 0.57 and 0.54, respectively (figure 7). While GC content typically does not have a large impact on efficiency (assuming it is between 0.40 and 0.60), a higher GC content translates to a more stable pegRNA (26). This pegRNA is most likely not optimized for SCD and requires further investigation into RTT and PBS lengths.
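The numbers quoted for the example pegRNA can be reproduced directly from the two sequences given above. The Python sketch below makes two assumptions that the text does not state: that the extension is written 5' to 3' with the RTT first and the PBS last, and that the nick falls three nucleotides upstream of the PAM, so that the PBS pairs with the 14 spacer bases immediately before the nick. Variable and function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

def gc_content(seq):
    """Fraction of G and C bases in the sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)

# Sequences and lengths as quoted in the text.
SPACER = "CATGGTGCATCTGACTCCTG"
EXTENSION = "GACTTCTCCTCAGGAGTCAGATGCACC"
PBS_LEN, RTT_LEN = 14, 13

# Assumed layout of the extension: 5'-RTT-PBS-3'.
rtt = EXTENSION[:RTT_LEN]
pbs = EXTENSION[RTT_LEN:]

# Assumed nick position: 3 nt upstream of the PAM, i.e. after spacer base 17.
nick = len(SPACER) - 3
pbs_pairs_with_spacer = reverse_complement(pbs) == SPACER[nick - PBS_LEN:nick]

# DNA that the reverse transcriptase would write onto the nicked strand.
new_flap = reverse_complement(rtt)

print("spacer GC:", round(gc_content(SPACER), 2))
print("PBS:", pbs, "GC:", round(gc_content(pbs), 2))
print("RTT:", rtt, "GC:", round(gc_content(rtt), 2))
print("PBS pairs with spacer:", pbs_pairs_with_spacer)
print("newly synthesized flap:", new_flap)
```

Under these assumptions the synthesized flap begins with the last three spacer bases (CTG) and continues with AGG, so the triplet spanning the end of the spacer reads GAG, the glutamic acid codon, which is consistent with the stated goal of reversing the sickle mutation.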
The current animal models most commonly used for SCD are transgenic mouse models. These models express human hemoglobin instead of mouse hemoglobin (37,38,39). Transgenic mice with human beta globin genes were first developed with correct regulation of the expression of the gene and were first used to study globin switching, which is the switch from fetal to adult hemoglobin (39,40). Mice were then used as models for beta thalassemia, allowing scientists to experiment with gene regulation and phenotypic variances (41). Mouse models of sickle cell disease were then created, the first being SAD mice. These mice exhibited mild symptoms of the disease along with visibly sickled cells, but did not show signs of anemia in adulthood. They expressed a modified type of hemoglobin known as SAD hemoglobin, which included a novel SAD beta globin gene and otherwise normal elements (42). SAD mice were used extensively in testing potential therapies for SCD, specifically in inhibiting the agents actually causing the sickling (37). Further development resulted in more complex models expressing both HbS and HbS-Antilles, which is a form of sickle hemoglobin with reduced oxygen affinity and solubility (43). HbS-only "knock-out" models were then made, with the mice exhibiting severe sickle cell anemia. These mice had a very short lifespan and low blood hemoglobin levels (44,45). Other versions of S-only models were also developed, the most effective one being the "knock-in" model. This model resulted in mice with less severe SCD and a longer lifespan (46).
Using prime editing and base editing beyond a theoretical treatment of SCD is one of the logical next steps. In base editing, the base editor must first be tested within a cell culture, editing stem cells that eventually produce sickle red blood cells. This provides a way to see how well the SCD mutation is altered in hematopoietic stem cells without using a complex model to do so. Similarly, with prime editing, testing the prime editing system in vitro is essential to gauging the effectiveness of the pegRNA and associated components.
Following in vitro testing is in vivo testing. As previously established, there are mouse models of SCD that have been used extensively to test different gene therapies (37,39,47). Models can be ordered from mouse vendors, and offer a more realistic outcome for gene editing than in vitro testing. Editors can be delivered to cell cultures via lipofection, whereas delivery to living organisms is limited to specific viral and nonviral techniques. Viral delivery techniques are the most promising option for delivery of editing systems (33). This is due to their ability to enter cells in a less traumatic way than nonviral delivery methods like electroporation. They consist of viruses with parts of their genomes removed, which increases safety. Retroviral delivery, while often advantageous, works poorly in vivo and thus should not be used in this context (48,49,50,51,52,53,54,55). Of the different viral techniques, AAV (adeno-associated virus) and lentiviral delivery seem to be the most promising. Others include herpes simplex viruses (HSVs) and adenoviruses. HSVs and adenoviruses can cause immune responses, making them less effective than AAV, which is both non-inflammatory and non-pathogenic (33,56). Lentiviral delivery is immunogenic as well but offers an advantage over AAV in that it can package a much larger amount of DNA (57). AAV would require multiple deliveries due to its small packaging capacity. Both strategies should be further evaluated and used to deliver editors to stem cells both in vitro and in vivo. The results of in vivo testing give a foundation for how genome editing for SCD actually manifests in living organisms.

DISCUSSION
Sickle cell disease is caused by a single point mutation that modifies an amino acid. It affects mostly black and brown communities originating from areas with high rates of malaria, and lacks accessible and affordable treatments. Base and prime editing, while not lower-cost alternatives, offer new ways to treat SCD that will hopefully be more accessible to some. Base editing modifies the mutation causing SCD so that a different amino acid is formed, one that is not the original but still functions similarly and effectively treats SCD. Prime editing directly reverses the mutation, turning the sickle hemoglobin back into normal hemoglobin. Both have their own benefits, with base editing being better tested and able to modify the mutation, and prime editing being able to reverse the mutation entirely.
Physical testing of base editing and prime editing to treat SCD is necessary to ensure that they actually work. In vitro and in vivo studies should be completed and documented to accurately portray how base editing and prime editing affect SCD in an experimental setting outside of online simulators. Prime editing specifically requires a carefully optimized pegRNA developed for the cell line(s) used, since optimal RTT and PBS lengths differ from cell to cell. This will call for further investigation into prime editing specifically for SCD. Additionally, delivery techniques for editing systems should be further investigated. Lentiviral delivery and AAV should be compared to determine which is best for base editing and prime editing, respectively. This can be done by further research into previous uses of them, as well as experimentation on hematopoietic stem cells in SCD transgenic model mice.
Due to the presence of human HbS in model mice, a study affirming the effectiveness of base and prime editors in mouse models would support their ability to modify human sickle hemoglobin. This will allow scientists to determine whether base and prime editing are possible treatments for SCD, and what kinds of guidelines need to be implemented to ensure that the editors function optimally. These can include which kind of prime editing system to use and which delivery method to use. Assuming the study yields desirable results, further testing would need to be done to confirm the effectiveness of the editors, specifically in different hematopoietic stem cell lines in vitro, because the efficiency of base editors and prime editors differs between cell lines.