Transgene Mapping in Animals: What to Choose?

Alexander Smirnov; Maksim Makarenko; Anastasia Yunusova

doi:10.20944/preprints202505.0649.v1

Submitted:

08 May 2025

Posted:

09 May 2025

You are already at the latest version

Abstract

The phenomenal progress in biotechnology and genomics is both inspiring and overwhelming — a classic curse of choice, particularly when it comes to selecting methods for mapping transgene DNA integration sites. Transgene localization remains a crucial task for validation of transgenic mouse or other animal models generated by pronuclear microinjection. Due to the inherently random nature of DNA integration, reliable characterization of the insertion site is essential. Over the years, a vast number of mapping methods have been developed, and new approaches continue to emerge, making the choice of the most suitable technique increasingly complex. Factors such as cost, required reagents, and the nature of the generated data require careful consideration. In this review, we provide a structured overview of current transgene mapping techniques, which we have broadly classified into three categories: classic PCR-based methods (such as inverse PCR and TAIL-PCR), next-generation sequencing with target enrichment, and long-read sequencing platforms (PacBio and Oxford Nanopore). To aid in decision-making, we included a comparative table summarizing approximate costs for the methods. While each approach has its own advantages and limitations, we highlight our top four recommended methods, which we believe offer the best balance of cost-effectiveness, reliability and simplicity for identifying transgene integration sites.

Keywords:

transgene

;

TAIL-PCR

;

long-read sequencing

;

genome walking

;

pronuclear microinjection

Subject:

Biology and Life Sciences - Biochemistry and Molecular Biology

1. Introduction

Transgenic animals are the backbone of modern biology. It is nothing short of a scientific marvel that foreign DNA can integrate into a genome without direct assistance. However, in many cases, transgene insertion remains a black box — unless we can precisely determine the integration site. While early transgene mapping methods were often laborious and technically challenging, the excitement of discovering an integration site, especially when something was unexpectedly misplaced, was undeniable. One integration might have landed within a coding gene, another - next to a non-coding RNA that had only recently been annotated, sometimes the host gene would interact with transgene and form a hybrid transcript. With the explosion of biotechnology and genomics, a vast array of transgene mapping methods has emerged. The field has progressed tremendously — from the inventive PCR protocols of the early 2000s [1,2] to modern long-read sequencing with target enrichment and multi-omics approaches [3]. Today, integration sites, local chromatin states, and expression levels can all be analyzed in parallel with unprecedented precision and throughput [4]. Although most reviews adopt a historical, archivist perspective, we will deviate from this approach and instead focus on the practical appeal of mapping methods. Despite the wealth of available catalogs, including a recent comprehensive overview [3], researchers who simply want to sequence their transgenic mouse (or other creature) may find themselves overwhelmed (to say the least) by the sheer variety of available techniques.

The choice of mapping method is not just a matter of financial constraints (sigh!), but also depends on the type of data one seeks to obtain, including read length, on-target (transgene) coverage and discovery of accompanying genome rearrangements. So, what is the best approach? Is it expensive to just apply whole-genome sequencing (WGS) to your mice and what coverage would be enough? Would targeted locus amplification (TLA) be optimal to resolve multicopy concatemers? Should one save costs and rely on thermal asymmetric interlace PCR (TAIL-PCR), leaving the results to sheer luck (quite literally)? In this review, we aim to share our experience and recommendations, focusing on sequencing-based approaches for random transgene insertions in animals and, to some extent, in cultured cells.

1.1. Features of Random Transgenic Insertion in Animals

In this review, we focus on transgenic animals in which random integration of DNA is typically achieved via pronuclear injection. A natural question arises: if genotyping can be easily performed using qPCR to distinguish between heterozygotes and homozygotes, why bother identifying the exact integration locus at all (Figure 1)?

From a practical perspective, knowing the integration site can prevent downstream complications, especially in cases involving multiple insertions. A large-scale analysis of F0 mouse founder lines showed that approximately 20% had more than one integration site [5]. Multiple transgene loci may lead to unexpected segregation patterns, complicating both genotyping and phenotype interpretation.

Transgene integration is often influenced by position effect variegation (PEV) — the insertion site can significantly affect transgene expression levels. This is well illustrated in Chinese hamster ovary (CHO) cells, widely used in industrial protein production. Studies have shown that transgene insertions often occur in transcriptionally active regions, which are also prone to structural instability, including rearrangements over time [6,7,8]. As reviewed by Cabrera et al, integration into such regions may enhance expression but can also interfere with endogenous gene regulation [9]. Transgenes may be influenced by regulatory sequences located at considerable distances [10], emphasizing the importance of identifying their integration sites.

Furthermore, studies in mice have shown that nearly half of random integrations could potentially disrupt host gene function, either by inserting into introns or causing deletions of coding exons — for example, 45% (17/38) in report of Yan et al [11], and 53% (21/40) in another work [12]. A recent study of the widely used Ucp1-Cre mouse line — which exhibits lethality in homozygous animals — revealed that the integration of a BAC transgene resulted in a large deletion and inversion affecting four genes, with potential additional effects on seven neighboring genes [13]. Notably, the presence of an active Ucp1 gene copy, which should not exist in the experimental model, influenced fat tissue homeostasis. Such cases are frequent, and genomic sequencing of established mouse strains often resembles archaeological investigation. According to the Mouse Genome Database, only 5% of over 8,000 documented mouse transgenic lines have had their integration sites mapped [12].

Another unanticipated feature of random integration is the cointegration of unrelated DNA fragments. Initially considered rare, such events are now frequently observed thanks to deep genome sequencing in both cell lines and animals [14,15,16]. New quantitative methods analyzing CRISPR/Cas9-induced DNA breaks have shown that DNA is often incorporated at double-stranded break (DSB) ends — at frequencies of 0.1-1% per DSB. This includes not only cotransfected DNA (which is expected to be abundant) but also genomic segments, repetitive elements, and regulatory sequences. For example, Geng et al reported a striking “insertional bingo” event, discovering a ~200 bp fragment of E. coli DNA, a ~6 kb Cas9 plasmid backbone, and a local genomic duplication at the Cas9 target site [16].

Following pronuclear microinjection, the DNA repair machinery recognizes linear transgene ends and attempts to resolve DSBs by ligating whatever DNA is available [17]. Most commonly, transgene fragments are joined into concatemers, but integrations can also include plasmid backbones, bacterial genomic DNA, or even telomeric repeats (see recent review [18]). In the well-known hornless cattle case, a 200 bp “Celtic” allele was introduced using TALENs, but a plasmid backbone fragment was later discovered during FDA re-evaluation. This contamination could have been identified early using plasmid-specific primers — a practice that should become standard in long-term projects. Another illustrative case is the mouse line described by Chiang et al, in which the transgene was fragmented and inserted into host genome with a 168 bp segment of Corynebacterium DNA [19]. This sequence likely originated from the lab environment during DNA preparation. Cointegrations of E. coli fragments are very common as well [12,15]. Curiously, Hussmann et al even identified a 165 bp bovine DNA fragment integrated into a CRISPR/Cas9 reporter in human cells — presumably captured from fetal bovine serum in the culture medium [20]. These findings highlight that the nucleus is a crowded environment, and the risk of foreign DNA integration at DSBs is non-negligible and should be carefully considered during mapping. (Note to self: These risks can potentially be minimized by treating plasmid preps with exonucleases to remove bacterial contaminants and carefully performing gel extraction steps during DNA preparation for microinjections. Better be safe than risk commemorating your sloppiness in a genome of transgenic animal).

Random integration is also frequently accompanied by large-scale structural rearrangements, including deletions, inversions, tandem duplications, and chromosomal translocations. Goodwin et al found that over 50% of analyzed mouse lines carried chromosomal deletions, while 15 out of 40 also harbored duplications [12]. Similarly, Cain-Hom et al reported two chromosomal translocations, two cointegrations, and three duplications near the insertion sites in Cre-deleter rodent lines [14]. Numerous other cases involving large tandem duplications have also been described [21,22,23,24]. The underlying reasons for the high frequency of duplications near integration sites remain to be fully elucidated.

Even when such structural changes do not directly affect the phenotype, they can interfere with genotyping, copy number analysis, and transgene detection. Therefore, high-resolution mapping—such as through long-read sequencing (LRS) or TLA — is strongly recommended, even for supposedly "well-characterized" transgenic lines.

1.2. PCR-Based Methods for Transgene Mapping

Between 1990 and 2010, before affordable WGS became widely available, scientists had to get creative in their quest to locate transgene insertion sites. The early 21st century saw a boom in transgenesis, creating an urgent need for robust and accessible mapping techniques. And scientists rose to the challenge. The field exploded with innovation, leading to the development of numerous PCR-based methods — many falling under the umbrella of "genome walking" term [2,25,26]. In an impressive 2011 review, Leoni et al. catalogued 53 different genome walking methods, based on diverse principles such as: ligation of universal adapters to linearized genomic DNA (LM-PCR), linear amplification using biotinylated primers (LAM-PCR), circularization of restriction fragments (Inverse PCR (iPCR)), annealing of semi-random primers (e.g., TAIL-PCR, PST-PCR) [2]. While these basic principles remain unchanged, many clever modifications have since appeared — ranging from improved degenerate primer designs for TAIL-PCR, to tagmentation-assisted adapter ligation [27], or sonication-based approaches replacing enzymatic restriction in inverse PCR[28]. The number of such methods likely exceeds a few hundred by now. All of them can be effective for transgene mapping in animals, but without direct meta-analysis under similar conditions, it’s not very useful to discuss each one in detail.

Inverse PCR (iPCR) is one of the earliest and most widely used PCR-based approaches for mapping transgene integration sites [29,30]. Genomic DNA is first digested with restriction enzymes. The resulting fragments, including transgene–genome junctions, are self-ligated to form circularized DNA molecules. Outward-facing primers complementary to the transgene sequence amplify the unknown flanking region. Its efficiency remains remarkably high. For example, in the TRIP-Cas9 project, hundreds of transposon insertions were successfully mapped using iPCR [31]. We have also used iPCR to excise hundreds of transgene copies from a single embryo sample in order to analyze concatemer structures [32].

LM-PCR also involves restriction digestion of genomic DNA, followed by ligation with universal adapters [33]. The transgene-genome junction is amplified via nested PCR using a combination of gene-specific and adapter primers. Unlike iPCR, this method does not require digestion inside the transgene. However, its efficiency is affected by the need for adapter sets tailored to different restriction enzymes. Newer modifications introduce an additional digestion step to eliminate non-specific ligation products, but this requires precise restriction site planning and custom adapter preparation [34]. A recent version uses A-tailing, biotinylated primers, streptavidin capture, and secondary amplification [35]. Splinkerette PCR uses a specially designed hairpin adapter (formed from two ~48/61 nt annealed oligos), which provides greater specificity compared to simple ligation or circularization [36,37]. It has been used in mapping transposon and viral insertion sites [38,39,40] and was recently adapted for mapping integrations in CHO cells with high efficiency [41].

Originally developed for mapping lentiviral integrations, Linear Amplification Mediated PCR (LAM-PCR) uses a biotinylated primer to linearly amplify single-stranded DNA, which is then captured by streptavidin beads [42]. A second strand is synthesized with random primers and digested with restriction enzymes to create a ligation site for PCR adapters. This enrichment strategy improves specificity over standard LM-PCR. Later improvements replaced the restriction step: after capturing the ssDNA, a single-stranded adapter is ligated, and amplification proceeds with two primers [43]. LAM-PCR has also been combined with sonication for deep profiling of viral integration sites [44].

Let’s be honest — who wants to spend weeks optimizing complicated ligations when you could just throw a pool of degenerate primers at the problem and see what sticks. This is where TAIL-PCR shines. Unlike other methods, it doesn’t require restriction digestion, primer biotinylation, or commercial kits. All that’s needed is a few gene-specific primers and a set of arbitrary degenerate (AD) primers, such as 5’-NGTCGASWGANAWGAA-3’. First reaction of TAIL-PCR typically involves the following steps: high-stringency cycles with high annealing temperature to let sequence-specific (SS) primers generate single-stranded DNA, low-stringency cycle (~25°C) where AD primers bind randomly to genomic DNA, and normal amplification with nested SS and AD primers to enrich transgene–genome junctions. This is followed by nested PCR to improve specificity (Figure 2A).

Originally developed for T-DNA insertion mapping in plants, TAIL-PCR had a 50–70% success rate [45,46]. Later, hiTAIL-PCR improved specificity by optimizing primer structure and PCR cycling [47]. Some reports noted only 20–30% [48] or 39–69% [49] efficiency of this method, which was improved by AD primers redesign. Others observed up to 83% success of TAIL-PCR in mouse transgene mapping even when using original method [11]. Compared to other methods, TAIL-PCR has a broader range of applications and high efficiency for mapping random insertions in transgenic animals [11,50,51], cell cultures [52], zebrafish [53], and plants [54,55]. Dozens of related methods have emerged based on the same thermal asymmetry principle, including Wristwatch PCR [56], Fork PCR [57], PER-PCR [58], PST-PCR [59]. These modifications aim to reduce non-specific products or extend amplicons beyond 3-4 kb to capture structural rearrangements flanking the integration site.

Compared to other genome walking methods, TAIL-PCR remains one of the most accessible transgene mapping tools, especially for beginners. We recommend classical hiTAIL-PCR using multiple long AD primer pools to minimize the risk of amplifying transgene–transgene junctions [47]. In our own experience, this method worked in over 80% of cases [60,61], later we reanalyzed the uncharted cases with another transgene-specific primers and found end truncations [62]. That said, genomic rearrangements at transgene ends can reduce the reliability of all PCR-based approaches — sometimes, there’s just no primer-binding site at all [63]. Chimeric products due to PCR [64] and ambiguous bands where parts of the transgene map to different chromosomes [65] are not uncommon, so always confirm results with alternative methods like long-distance PCR or long-read sequencing.

While PCR-based mapping isn’t the gold standard anymore in the next-generation sequencing (NGS) era, it still offers valuable solutions for small-scale, cost-sensitive projects. Among them, TAIL-PCR remains our go-to for locating transgene insertions — requiring little more than a few PCRs and Sanger reads.

1.3. Next-Generation Sequencing and Target Enrichment

NGS has become an essential part of genomics research [66]. Objectively, the most efficient way to identify transgene integration sites is through WGS at sufficient coverage (Figure 2B). But what is the optimal genome coverage for reliable transgene detection? Several studies of transgenic mouse lines have shown that a haploid genome coverage as low as 8x [67] or 11.5x [68] may be sufficient for mapping small insertions. Srivastava et al reported unsuccessful mapping using standard paired-end sequencing at 18x coverage, and applied mate-pair sequencing instead [69]. WGS is also commonly used for mapping insertions in transgenic farm animals. Zhang et al. sequenced a transgenic cow carrying a human lactoferrin BAC insert. Although the bovine genome was sequenced at ~10× coverage, the effective coverage of the BAC insert reached 20–50× due to concatemerization. However, internal rearrangements made the structure too complex for short-read NGS to resolve [51]. Another study used ultra-deep sequencing (~268×) to analyze a 3.1 kb SRY-GFP construct knock-in [70]. Despite the high coverage, two alleles with complex structural variants had to be resolved using PacBio. Interestingly, the Cas9-linearized vector caused concatemerization but no random integrations outside of the intended “safe harbor” locus [70]. The same group successfully sequenced F1 offspring of a hornless bull with notorious backbone integration [71] at 20× coverage [72]. Transgenic crops are generally sequenced at ~13–14× [73], 21× [74], 29× [75], or even 70× [76], although T-DNA insertions are usually less repetitive and easier to map. These examples illustrate the approximate sequencing depth needed to identify insertion sites. On modern Illumina platforms (e.g., 150 bp paired-end reads), such coverage can still be relatively expensive, especially for large-scale screening (Table 1).

To improve detection sensitivity and reduce sequencing costs, target enrichment techniques have been developed to increase the proportion of reads covering transgene-genome junctions. Since an insertion site represents only a tiny fraction of the mouse genome, sequencing a 10 kb transgene at high coverage (>10×) requires just a few thousand reads—an insignificant portion of a typical NGS dataset (Table 1). Many enrichment methods have emerged (see recent review [3]), some based on earlier molecular biology strategies such as LAM-PCR [77], TAIL-PCR [78], inverse PCR [79], while others involve newer approaches like chromatin crosslinking or Cas9-mediated enrichment [80]. All these are used to enrich sequencing libraries prior to high-throughput sequencing. In this section, we briefly describe several popular enrichment methods applicable to transgene mapping: biotinylated probe capture (hybrid capture), chromatin-crosslinking (TLA), and others. The final choice depends on user expertise and available resources.

Perhaps the most widely used enrichment technique for mapping transgene integration is TLA [81]. TLA builds on the principles of chromatin conformation capture (3C/Hi-C). In the first step, formaldehyde crosslinks chromatin, fixing together DNA regions that are physically close — including the transgene and flanking genomic sequences. Next, the DNA is digested with a frequent-cutting enzyme (e.g., NlaIII), followed by religation under dilute conditions to promote intramolecular ligation. After reverse crosslinking, a second round of digestion and religation produces circular DNA molecules enriched in ligation products near the transgene. PCR with outward-facing transgene-specific primers amplifies these circles, allowing selective enrichment of flanking genomic regions. The resulting fragments are subjected to standard library preparation and sequencing (Figure 2B). Although the TLA protocol appears complex, it can be performed in any lab with modest resources [81,82,83,84]. However, data analysis requires proficiency in interpreting chromatin ligation-based datasets. For this reason, many researchers outsource TLA mapping to commercial providers like Cergentis [13,85,86,87,88]. TLA has proven particularly valuable in large-scale transgenic mouse studies [12].

Typically, the region with the highest coverage — often exceeding 100 kb—indicates the most likely insertion site. Large constructs like BACs may require 5–6 primer sets and rounds of TLA to achieve sufficient coverage [13,87]. A major advantage of TLA is that the resulting amplicons contain not only flanking sequences but also the entire transgene. Also, because homologous chromosomes occupy distinct nuclear territories, TLA is also capable of haplotyping, detecting SNVs, and identifying large structural variants near integration sites. However, the use of short Illumina reads (~150 bp) limits resolution in repetitive regions and fails to fully resolve complex concatemers. Combining TLA with LRS can improve structural resolution: transgene flanks identified by TLA can guide Cas9 digestion and Nanopore-based enrichment [89,90].

Another widely used method is hybrid target capture [3], including solid-state microarrays [91] and magnetic beads. The latter approach is more convenient. Biotinylated DNA or RNA probes anneal to denatured, fragmented genomic DNA. Hybridized molecules are captured using streptavidin-coated magnetic beads. The captured DNA is then extended by polymerase to complete sequencing templates. A major advantage of hybrid capture is that overlapping 60-120 nt probes can cover an entire transgene sequence, — especially useful for random integration mapping, because the borders of the insert could be truncated. This method has been used successfully in multiple studies [19,92], and is considered cost-effective once probes are synthesized (Table 1). For instance, Magembe et al used a pool of 413 xGen Lockdown probes to tile an 18 kb T-DNA region in plants. They found around 10-20% of target reads in the NGS data. Although probe coverage was uneven, 30 and 27 of each of the T-DNA ends from 34 lines were successfully mapped [92]. In another study, capture probes targeted bovine leukemia virus (BLV) insertions with modest enrichment: 10.2% of the total reads mapped to the target proviral genome [93]. Iwase et al used hybrid enrichment to detect HIV-1 integration sites and generated around 5% of the target reads of the total data [94].

An intriguing and recent addition to the toolbox is T7-based transcriptional mapping, used by Li et al for mapping transposon insertions [95]. This method requires addition of a ~20 bp T7 promoter near the end of the transgene. Genomic DNA is subjected to in vitro transcription, followed by cDNA synthesis using random primers—eliminating the need for restriction digestion or ligation. Although the effective read length depends on the transcription reaction, cDNAs can exceed 1 kb, enabling efficient transgene-genome junction recovery. This approach is promising but may suffer from loss of the T7 sequence during random integration events.

In summary, NGS is a powerful tool for mapping transgene insertions. For many applications, WGS or commercial TLA remains the best choice (Figure 2B), depending on budget and available expertise. However, the limited read length of short-read platforms often complicates mapping—especially for rearranged or repetitive regions. For example, in the study by Siddique et al, only one end of a T-DNA insertion was resolved even at 36× coverage [96]. In another report, Peng et al mapped a complex insertion in a repetitive region of the maize genome but even 41× WGS and TAIL-PCR failed to identify the region, which required long-read sequencing [97]. For transgenic core facilities or large-scale mouse projects, implementing enrichment protocols such as hybrid capture or TLA can significantly improve mapping outcomes. In this review, we only scratched the surface of available tools. While many protocols are low-cost, they require significant optimization and bench skills. Still, for researchers who can manage custom biotinylated probe synthesis or chromatin crosslinking, the results are often worth the effort.

As one colleague once remarked, during yet another transgene mapping crisis: "What am I supposed to do with these 100 bp snippets? Give me long reads or I’m out!" (personal communication).

1.4. Long-Read Sequencing

In recent years, two independent platforms — PacBio (Pacific Biosciences) and Oxford Nanopore Technologies — have developed third-generation sequencing (TGS), also referred to as single-molecule sequencing (SMS) or LRS [98,99]. These technologies routinely produce reads in the 10–100 kb range and avoid PCR-associated artifacts. LRS has been successfully applied for genome polishing [100], sequencing of repetitive chromosome regions [101], and even for whole-genome assembly from single sandflies [102]. Here, we focus on the use of LRS for transgene mapping and concatemer structure analysis.

The PacBio platform is based on single-molecule real-time (SMRT) sequencing. Fragmented DNA is ligated to single-stranded hairpin adapters from both sides, and a sequencing primer anneals to the hairpin region. Fluorescently labeled nucleotides allow base detection as docked polymerase molecule replicates the circularized DNA in a special well (SMRT cell). PacBio reads are typically limited to 25–30 kb, so that the circular consensus sequencing (CCS) strategy enables multiple polymerase passes over the same molecule, greatly increasing accuracy. PacBio has been used for transgene mapping in mice [32,103] and plants [104], although it is less frequently used than Nanopore. The cost of both LRS platforms continues to fall and is now broadly comparable, depending on the specific instrument (Table 1).

Oxford Nanopore sequencing works by measuring ionic current changes as DNA moves through a biological nanopore embedded in a membrane [111]. This enables extremely long (megabase) reads, although average read lengths are typically similar to PacBio. Different authors casually report long reads around 200 kb [24], 238 kb [106], or 351 kb [112]. Occasionally such reads could contain transgenes and provide valuable insight into concatemer structure.

Below are selected examples to guide Nanopore-based experimental planning. Technology and chemistry improvements are ongoing, but for most transgene mapping experiments, a single MinION flow cell (typically R9 series) can produce 5–10 Gb of data — sufficient for a typical animal or plant transgenic line. In one early study, Nicholls et al generated 4.88 Gb (1.8× haploid genome coverage) using a MinION run that yielded 611,279 reads with an N50 of 28 kb. Among these, 25 reads contained transgene fragments, but only one 5.5 kb read spanned the genome–transgene junction within a 450 kb concatemer [15]. Suzuki et al used a single MinION flow cell to sequence a transgenic mouse, obtaining 3 Gb of data (1× hgc; 922,210 reads; N50 = 7.6 kb). A 21.5 kb read covering one and a half copies of the transgene allowed successful integration mapping [106]. Another group investigated Cre-deleter mouse lines that failed to yield homozygotes in PCR screenings. TLA identified a 95 kb tandem duplication close to the floxed cassette in the gene of interest with unedited coding sequence. Three Nanopore runs produced 13 Gb (4.4× hgc; 699,343 reads; N50 = 40.7 kb), identifying 9 on-target reads and unambiguously resolving the rearrangement [22]. Giraldo et al sequenced transgenic crops using one flow cell per sample and obtained 7.3–10.4 Gb with sufficient on-target coverage, though average read lengths varied from 1.6 to 12 kb [113]. In a soybean study, Li et al generated 2.8 Gb (2.5× hgc; 1,061,117 reads) and found two reads spanning transgene–genome junctions. The results confirmed the site previously mapped by TAIL-PCR, highlighting the latter’s cost-efficiency [112].

These examples illustrate that running a single MinION may yield only a few useful reads and become a costly endeavor as transgenes represent only ~0.01% of the genome. Enrichment strategies are often necessary when working with transgene mapping. In contrast to NGS-based enrichment methods, LRS approaches must preserve long DNA fragments. Two commonly used strategies—hybrid capture and Cas9 digestion—are compatible with LRS.

For PacBio, DNA is usually fragmented and size-selected to ~10–20 kb, while Nanopore sequencing often uses high-molecular-weight DNA [114]. Biotinylated probe enrichment for PacBio has been used to enrich symbiont genomes by 11–200× [115] or blood group system loci by 737x [114]. Biotin-based PacBio enrichment method, LIFE-seq, was introduced by Zhang et al [104]. This method uses 75 nt tiling probes to cover known plasmid sequences (~99% coverage). Seven transgenic crop samples were enriched and sequenced, yielding 1.8–2.7 Gb per sample. On average, 17,000–25,000 unique CCS reads (average length ~6 kb, N50 ~17 kb) were obtained. These data enabled mapping of insertion sites and partial concatemer reconstruction. Biotin enrichment was also applied to Nanopore sequencing. In the soybean study mentioned earlier, enrichment allowed identification of 51 transposon integration sites from a single Nanopore flow cell [112]. Although probe synthesis is costly and may reduce read length during sample preparation [115], this strategy avoids transgene fragmentation and does not require preservation of transgene ends. Other enrichment strategies for LRS include sonication-based inverse PCR (SIP) [28] and TLA-seq [116], although these are complex and less standardized than Cas9-based methods.

The CRISPR/Cas9 system has become a favored tool for target enrichment. In this approach, guide RNAs define cleavage points in the genome or transgene, producing ligation-compatible ends. Though PacBio-compatible [117,118], most applications in transgene mapping use Nanopore. One widely adopted method is nCATS (Nanopore Cas9-Targeted Sequencing), where high-molecular-weight DNA is dephosphorylated, treated with Cas9–gRNA RNPs, and only the phosphorylated cut ends are ligated to Nanopore adapters [107]. nCATS method has become very popular for human diagnostics with enrichment of targeted regions of 25x [119], 665x [120], >100x [121].

nCATS has been successfully applied to transgene mapping in various organisms [23,89,122]. Low et al used nCATS to confirm site-specific integration of human ACE2 transgene into the Rosa26 locus via Bxb1-mediated recombination [23]. With one flow cell they achieved 195× coverage of an 8.5 kb cassette. In the same study, they sequenced mouse line with random multicopy integration of a similar transgene, and two 70–80 kb contigs were identified which contained the transgene-genome borders [23]. Other group compared nCATS, TLA, and Southern blotting to map transgene insertions in CHO cells [90]. For small transgenes (3–6 copies), nCATS produced contigs up to 41.6 kb from 22 reads and successfully resolved rearrangements. Notably, this allowed confirmation of peculiar Southern blot results obtained earlier—demonstrating the continuity of two mapping technologies [90]. nCATS is now supported by an official Nanopore protocol, but it requires prior knowledge of flanking sequences and would not be useful for initial transgene mapping.

Alternative Cas9 enrichment method is based on the same principle but DNA is digested inside the transgene region (Figure 2D). Funnily enough, this otherwise straightforward approach still lacks a definitive and concise acronym. The method is inconsistently named across publications and is referred to as “Targeted Cas9 sequencing” in the official Nanopore protocol — a term easily confused with nCATS, which, unlike this method, requires prior knowledge of the flanking sequences. For clarity, we propose a temporary name: CHAD (CRISPR-based Homing for Anchored Detection). (As a side note: given how much scientists enjoy inventing acronyms — see the many creative efforts for TAIL-PCR modifications — it might be time to standardize the terminology, especially considering the growing popularity of the CHAD approach.) One of the first applications of this strategy was AFIS-seq (Amplification-Free Integration Site sequencing), which mapped lentiviral integrations using paired Cas9 cuts inside the transgene. Enrichment ranged from 285–1612×, with average read lengths of ~12 kb [44]. In comparison to NGS-based S-EPTS/LM-PCR method, AFIS-seq provided fewer ambiguous reads thanks to longer sequencing length. McDonald et al applied CHAD with a single cut to human samples to study mobile elements. One flow cell yielded ~110,000 reads, 31% of which were on-target (54× enrichment; N50 = 25 kb) [123]. Similarly, Hertel et al used dual cuts flanking eGFP transgene in CHO cells, achieving 86–244× enrichment and revealing unplanned random integrations [124]. Bryant et al used CRISPR-LRS with paired gRNAs to map several transgenes in mice. For a 217 kb BAC, 9 reads (0.03%) spanned transgene-genome borders. With extra guides, enrichment improved to 0.15–0.35%. However, internal concatemer structure was lost due to Cas9 fragmentation: in the Sm22-Cre mouse line where qPCR detected ~20 copies, Nanopore only detected a max of 4 per read [24]. Ironically, WGS of the ultra-high molecular weight (HMW) DNA with Nanopore generated more useful detail in a few reads (6 selected reads, 89 kb average read length) than Cas9 enrichment due to the longer read sizes. We also applied CHAD to a 5 kb hACE2 concatemer (~70 copies). Nanopore WGS (0.25× genome coverage) yielded 15 transgene reads, while CHAD produced 864 reads longer than 3 kb, mapping one border at the cost of losing internal concatemer structure [108]. We suspect that reads with the second transgene-genome border were lost because we enriched with only one Cas9 site instead of two (Figure 2D). Importantly, Cas9 often blocks the PAM-distal end [125], hindering adapter ligation and reducing coverage in the respective direction by 2–10× [119,120,126]. Thermolabile Proteinase K treatment [126] or using Cpf1, which does not block ends, may help to improve nuclease-based targeting [127].

Finally, a novel Nanopore-compatible method, Xdrop, offers an original approach to target enrichment [109,128]. In this technique, the target locus is captured indirectly using a short PCR amplicon that is designed to lie within or near the region of interest, such as a transgene. HMW genomic DNA is mixed with PCR reagents and primers, and encapsulated in droplets using an oil emulsion system. During the droplet PCR fluorescence is triggered by an intercalating dye only in droplets that contain the specific target DNA. Typically, only about 0.01% of the double emulsion droplets will contain the desired fragment. These fluorescent droplets are then isolated via FACS (fluorescence-activated cell sorting) and subjected to single-molecule multiple displacement amplification (dMDA) to amplify the enriched genomic DNA. The resulting product is then sequenced using the Nanopore platform [109,128].

Early publications have already demonstrated the successful use of this method to map transgenes in mice [109] and plants [110], as well as to detect complex genomic rearrangements in human cells [16,109]. These studies show that indirect targeting by droplet PCR provides very high enrichment levels (100× to 3000×) and enables detailed resolution of internal rearrangements, albeit at the cost of reduced average read length (around 5 kb) [109]. Given the technical complexity and specialized instrumentation involved, it is unlikely that Xdrop will be used routinely for mapping transgenes in animal models. However, one clear advantage is that indirect enrichment preserves the internal structure of concatemers, which is often lost in Cas9-based methods.

Ultimately, we would recommend the CHAD approach for most transgene mapping scenarios. While a typical Nanopore run on a standard flow cell may yield only 3–5 reads per million reads covering a transgene border—sometimes with no guarantee of successful mapping—Cas9 enrichment offers a more targeted and controlled strategy, and it is not especially difficult to implement. One full run using this method requires a single flow cell (~$800) and a library prep kit (~$200), both of which can potentially be reused, making it cost-effective for many labs (Table 1). Unfortunately, CHAD destroys the internal concatemer structure unlike the original nCATS, where the cuts are introduced in the flanking sequences and preserve the concatemer structure, with up to 30-100 kp inside concatemer, which could be enough to assemble the whole insert, depending on the transgene size (Clappier et al., 2023).

Compared to PCR-based techniques, there are few disadvantages to LRS, aside from the requirement for larger quantities of high-molecular-weight genomic DNA (in the microgram range), which is usually not a problem when working with animal tissue, but may be a problem with valuably founders or tiny model animals. At the same time, it’s important to note that current Cas9 enrichment workflows generally lead to low sequencing coverage, making them unsuitable for applications requiring single-nucleotide resolution, such as precise indel detection or barcode identification.

Figure 3. Decision tree for choosing a mapping method.

2. Conclusions

Transgene mapping remains a critical yet technically diverse task, with no universal solution. In this review, we evaluated a range of available methods — from classic PCR-based genome walking to advanced enrichment protocols for NGS and LRS (Figure 3). For small-scale projects or initial screening, we recommend hiTAIL-PCR as a low-cost and accessible method. It requires minimal optimization and demonstrates high success rates, especially when transgene ends are preserved. For reliable integration site search, WGS and TLA allow high-throughput mapping, though they could be costly and typically require access to sequencing facilities and bioinformatics support. When long-range information is essential, particularly in concatemer inserts or rearranged regions, Nanopore sequencing combined with Cas9-based enrichment (e.g., CHAD) is currently the most promising approach, because it enables sequencing of long DNA fragments for easy alignments. However, LRS methods are still evolving and can be technically demanding, with variable enrichment efficiency and sensitivity to DNA quality. Looking ahead, the future of transgene mapping is promising. Perhaps in five years emerging techniques such as adaptive sampling, AI-enhanced base calling, and real-time alignment filtering will likely make LRS accessible and targeted to specific regions. This will signify the end of the old genome walking era, but until than we have to keep walking.

Author Contributions

Conceptualization, A.S.; Visualization, A.S., A.Y.; Writing — Original Draft Preparation, all authors; Writing—Review & Editing, all authors. All authors have read and agreed to the published version of the manuscript.

Funding

Preparation of the review was supported by Russian Science Foundation grant #22-74-10013.

Acknowledgments

Comparison of costs and labor time for transgene mapping methods was performed by Maksim Makarenko and supported by the grant of the state program of the «Sirius» Federal Territory «Scientific and technological development of the «Sirius» Federal Territory» (Agreement №26-03, 27/09/2024). Access to the article publisher sites for data analysis was provided by the Ministry of Education and Science of the Russia Federation, state project FWNR-2022-0019.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

NGS	Next-generation sequencing
LRS	Long-read sequencing
TLA	Targeted locus amplification
WGS	Whole-genome sequencing

References

Tonooka, Y.; Fujishima, M. Comparison and Critical Evaluation of PCR-Mediated Methods to Walk along the Sequence of Genomic DNA. Appl Microbiol Biotechnol 2009, 85, 37–43. [Google Scholar] [CrossRef] [PubMed]
Leoni, C.; Volpicella, M.; De Leo, F.; Gallerani, R.; Ceci, L.R. Genome Walking in Eukaryotes. The FEBS Journal 2011, 278, 3953–3977. [Google Scholar] [CrossRef] [PubMed]
Zhu, Z.; Lu, S.; Wang, H.; Wang, F.; Xu, W.; Zhu, Y.; Xue, J.; Yang, L. Innovations in Transgene Integration Analysis: A Comprehensive Review of Enrichment and Sequencing Strategies in Biotechnology. ACS Appl. Mater. Interfaces 2025, 17, 2716–2735. [Google Scholar] [CrossRef] [PubMed]
Vandereyken, K.; Sifrim, A.; Thienpont, B.; Voet, T. Methods and Applications for Single-Cell and Spatial Multi-Omics. Nat Rev Genet 2023, 24, 494–515. [Google Scholar] [CrossRef]
Nakanishi, T.; Kuroiwa, A.; Yamada, S.; Isotani, A.; Yamashita, A.; Tairaka, A.; Hayashi, T.; Takagi, T.; Ikawa, M.; Matsuda, Y.; et al. FISH Analysis of 142 EGFP Transgene Integration Sites into the Mouse Genome. Genomics 2002, 80, 564–574. [Google Scholar] [CrossRef]
Bandyopadhyay, A.A.; O’Brien, S.A.; Zhao, L.; Fu, H.; Vishwanathan, N.; Hu, W. Recurring Genomic Structural Variation Leads to Clonal Instability and Loss of Productivity. Biotech & Bioengineering 2019, 116, 41–53. [Google Scholar] [CrossRef]
Lee, J.S.; Kildegaard, H.F.; Lewis, N.E.; Lee, G.M. Mitigating Clonal Variation in Recombinant Mammalian Cell Lines. Trends in Biotechnology 2019, 37, 931–942. [Google Scholar] [CrossRef]
Dhiman, H.; Campbell, M.; Melcher, M.; Smith, K.D.; Borth, N. Predicting Favorable Landing Pads for Targeted Integrations in Chinese Hamster Ovary Cell Lines by Learning Stability Characteristics from Random Transgene Integrations. Computational and Structural Biotechnology Journal 2020, 18, 3632–3648. [Google Scholar] [CrossRef]
Cabrera, A.; Edelstein, H.I.; Glykofrydis, F.; Love, K.S.; Palacios, S.; Tycko, J.; Zhang, M.; Lensch, S.; Shields, C.E.; Livingston, M.; et al. The Sound of Silence: Transgene Silencing in Mammalian Cell Engineering. Cell Systems 2022, 13, 950–973. [Google Scholar] [CrossRef]
Laboulaye, M.A.; Duan, X.; Qiao, M.; Whitney, I.E.; Sanes, J.R. Mapping Transgene Insertion Sites Reveals Complex Interactions Between Mouse Transgenes and Neighboring Endogenous Genes. Front. Mol. Neurosci. 2018, 11, 385. [Google Scholar] [CrossRef]
Yan, B.-W.; Zhao, Y.-F.; Cao, W.-G.; Li, N.; Gou, K.-M. Mechanism of Random Integration of Foreign DNA in Transgenic Mice. Transgenic Res 2013, 22, 983–992. [Google Scholar] [CrossRef] [PubMed]
Goodwin, L.O.; Splinter, E.; Davis, T.L.; Urban, R.; He, H.; Braun, R.E.; Chesler, E.J.; Kumar, V.; Van Min, M.; Ndukum, J.; et al. Large-Scale Discovery of Mouse Transgenic Integration Sites Reveals Frequent Structural Variation and Insertional Mutagenesis. Genome Res. 2019, 29, 494–505. [Google Scholar] [CrossRef] [PubMed]
Halurkar, M.S.; Inoue, O.; Singh, A.; Mukherjee, R.; Ginugu, M.; Ahn, C.; Bonatto Paese, C.L.; Duszynski, M.; Brugmann, S.A.; Lim, H.-W.; et al. The Widely Used Ucp1-Cre Transgene Elicits Complex Developmental and Metabolic Phenotypes. Nat Commun 2025, 16, 770. [Google Scholar] [CrossRef]
Cain-Hom, C.; Splinter, E.; van Min, M.; Simonis, M.; van de Heijning, M.; Martinez, M.; Asghari, V.; Cox, J.C.; Warming, S. Efficient Mapping of Transgene Integration Sites and Local Structural Changes in Cre Transgenic Mice Using Targeted Locus Amplification. Nucleic Acids Res 2017, gkw1329. [Google Scholar] [CrossRef]
Nicholls, P.K.; Bellott, D.W.; Cho, T.-J.; Pyntikova, T.; Page, D.C. Locating and Characterizing a Transgene Integration Site by Nanopore Sequencing. G3 Genes|Genomes|Genetics 2019, 9, 1481–1486. [Google Scholar] [CrossRef]
Geng, K.; Merino, L.G.; Wedemann, L.; Martens, A.; Sobota, M.; Sanchez, Y.P.; Søndergaard, J.N.; White, R.J.; Kutter, C. Target-Enriched Nanopore Sequencing and de Novo Assembly Reveals Co-Occurrences of Complex on-Target Genomic Rearrangements Induced by CRISPR-Cas9 in Human Cells. Genome Res. 2022, genome;gr.276901.122v2. [CrossRef]
Guirouilh-Barbat, J.; Lambert, S.; Bertrand, P.; Lopez, B.S. Is Homologous Recombination Really an Error-Free Process? Front. Genet. 2014, 5. [Google Scholar] [CrossRef]
Smirnov, A.; Battulin, N. Concatenation of Transgenic DNA: Random or Orchestrated? Genes 2021, 12, 1969. [Google Scholar] [CrossRef]
Chiang, C.; Jacobsen, J.C.; Ernst, C.; Hanscom, C.; Heilbut, A.; Blumenthal, I.; Mills, R.E.; Kirby, A.; Lindgren, A.M.; Rudiger, S.R.; et al. Complex Reorganization and Predominant Non-Homologous Repair Following Chromosomal Breakage in Karyotypically Balanced Germline Rearrangements and Transgenic Integration. Nat Genet 2012, 44, 390–397. [Google Scholar] [CrossRef]
Hussmann, J.A.; Ling, J.; Ravisankar, P.; Yan, J.; Cirincione, A.; Xu, A.; Simpson, D.; Yang, D.; Bothmer, A.; Cotta-Ramusino, C.; et al. Mapping the Genetic Landscape of DNA Double-Strand Break Repair. Cell 2021, 184, 5653–5669.e25. [Google Scholar] [CrossRef]
Ohigashi, I.; Yamasaki, Y.; Hirashima, T.; Takahama, Y. Identification of the Transgenic Integration Site in Immunodeficient Tgε26 Human CD3ε Transgenic Mice. PLoS ONE 2010, 5, e14391. [Google Scholar] [CrossRef]
Sailer, S.; Coassin, S.; Lackner, K.; Fischer, C.; McNeill, E.; Streiter, G.; Kremser, C.; Maglione, M.; Green, C.M.; Moralli, D.; et al. When the Genome Bluffs: A Tandem Duplication Event during Generation of a Novel Agmo Knockout Mouse Model Fools Routine Genotyping. Cell Biosci 2021, 11, 54. [Google Scholar] [CrossRef] [PubMed]
Low, B.E.; Hosur, V.; Lesbirel, S.; Wiles, M.V. Efficient Targeted Transgenesis of Large Donor DNA into Multiple Mouse Genetic Backgrounds Using Bacteriophage Bxb1 Integrase. Sci Rep 2022, 12, 5424. [Google Scholar] [CrossRef] [PubMed]
Bryant, W.B.; Yang, A.; Griffin, S.H.; Zhang, W.; Rafiq, A.M.; Han, W.; Deak, F.; Mills, M.K.; Long, X.; Miano, J.M. CRISPR-Cas9 Long-Read Sequencing for Mapping Transgenes in the Mouse Genome. The CRISPR Journal 2023, 6, 163–175. [Google Scholar] [CrossRef]
Hui, E.K.-W.; Wang, P.-C.; Lo, S.J. PCR-Based Strategies to Clone Unknown DNA Regions from Known Foreign Integrants: An Overview. In PCR Cloning Protocols; Humana Press: New Jersey, 2002; ISBN 978-1-59259-177-0. [Google Scholar]
Kalendar, R.; Shustov, A.V.; Seppänen, M.M.; Schulman, A.H.; Stoddard, F.L. Palindromic Sequence-Targeted (PST) PCR: A Rapid and Efficient Method for High-Throughput Gene Characterization and Genome Walking. Sci Rep 2019, 9, 17707. [Google Scholar] [CrossRef]
Hamada, M.; Nishio, N.; Okuno, Y.; Suzuki, S.; Kawashima, N.; Muramatsu, H.; Tsubota, S.; Wilson, M.H.; Morita, D.; Kataoka, S.; et al. Integration Mapping of piggyBac-Mediated CD19 Chimeric Antigen Receptor T Cells Analyzed by Novel Tagmentation-Assisted PCR. EBioMedicine 2018, 34, 18–26. [Google Scholar] [CrossRef]
Alquezar-Planas, D.E.; Löber, U.; Cui, P.; Quedenau, C.; Chen, W.; Greenwood, A.D. DNA Sonication Inverse PCR for Genome Scale Analysis of Uncharacterized Flanking Sequences. Methods Ecol Evol 2021, 12, 182–195. [Google Scholar] [CrossRef]
Triglia, T.; Peterson, M.G.; Kemp, D.J. A Procedure for in Vitro Amplification of DNA Segments That Lie Outside the Boundaries of Known Sequences. Nucl Acids Res 1988, 16, 8186–8186. [Google Scholar] [CrossRef]
Ochman, H.; Gerber, A.S.; Hartl, D.L. Genetic Applications of an Inverse Polymerase Chain Reaction. Genetics 1988, 120, 621–623. [Google Scholar] [CrossRef]
Schep, R.; Leemans, C.; Brinkman, E.K.; Van Schaik, T.; Van Steensel, B. Protocol: A Multiplexed Reporter Assay to Study Effects of Chromatin Context on DNA Double-Strand Break Repair. Front. Genet. 2022, 12, 785947. [Google Scholar] [CrossRef]
Smirnov, A.; Fishman, V.; Yunusova, A.; Korablev, A.; Serova, I.; Skryabin, B.V.; Rozhdestvensky, T.S.; Battulin, N. DNA Barcoding Reveals That Injected Transgenes Are Predominantly Processed by Homologous Recombination in Mouse Zygote. Nucleic Acids Research 2019, gkz1085. [Google Scholar] [CrossRef]
O’Malley, R.C.; Alonso, J.M.; Kim, C.J.; Leisse, T.J.; Ecker, J.R. An Adapter Ligation-Mediated PCR Method for High-Throughput Mapping of T-DNA Inserts in the Arabidopsis Genome. Nat Protoc 2007, 2, 2910–2917. [Google Scholar] [CrossRef] [PubMed]
Yu, D.; Zhou, T.; Sun, X.; Sun, Z.; Sheng, X.; Tan, Y.; Liu, L.; Ouyang, N.; Xu, K.; Shi, K.; et al. Cyclic Digestion and Ligation-Mediated PCR Used for Flanking Sequence Walking. Sci Rep 2020, 10, 3434. [Google Scholar] [CrossRef]
Lung, J.; Hung, M.-S.; Chen, C.-Y.; Yang, T.-M.; Lin, C.-K.; Fang, Y.-H.; Jiang, Y.-Y.; Liao, H.-F.; Lin, Y.-C. An Optimized Ligation-Mediated PCR Method for Chromosome Walking and Fusion Gene Chromosomal Breakpoints Identification. Biology Methods and Protocols 2024, 9, bpae037. [Google Scholar] [CrossRef] [PubMed]
Uren, A.G.; Mikkers, H.; Kool, J.; Van Der Weyden, L.; Lund, A.H.; Wilson, C.H.; Rance, R.; Jonkers, J.; Van Lohuizen, M.; Berns, A.; et al. A High-Throughput Splinkerette-PCR Method for the Isolation and Sequencing of Retroviral Insertion Sites. Nat Protoc 2009, 4, 789–798. [Google Scholar] [CrossRef]
Potter, C.J.; Luo, L. Splinkerette PCR for Mapping Transposable Elements in Drosophila. PLoS ONE 2010, 5, e10168. [Google Scholar] [CrossRef]
Dambrot, C.; Buermans, H.P.J.; Varga, E.; Kosmidis, G.; Langenberg, K.; Casini, S.; Elliott, D.A.; Dinnyes, A.; Atsma, D.E.; Mummery, C.L.; et al. Strategies for Rapidly Mapping Proviral Integration Sites and Assessing Cardiogenic Potential of Nascent Human Induced Pluripotent Stem Cell Clones. Experimental Cell Research 2014, 327, 297–306. [Google Scholar] [CrossRef]
Jia, W.; Guan, Z.; Shi, S.; Xiang, K.; Chen, P.; Tan, F.; Ullah, N.; Diaby, M.; Guo, M.; Song, C.; et al. The Annotation of Zebrafish Enhancer Trap Lines Generated with PB Transposon. CIMB 2022, 44, 2614–2621. [Google Scholar] [CrossRef]
Sato, M.; Inada, E.; Saitoh, I.; Nakamura, S.; Watanabe, S. In Vivo Piggybac-Based Gene Delivery towards Murine Pancreatic Parenchyma Confers Sustained Expression of Gene of Interest. IJMS 2019, 20, 3116. [Google Scholar] [CrossRef]
Han, H.-J.; Kim, D.H.; Baik, J.Y. A Splinkerette PCR-Based Genome Walking Technique for the Identification of Transgene Integration Sites in CHO Cells. Journal of Biotechnology 2023, 371–372, 1–9. [Google Scholar] [CrossRef]
Schmidt, M.; Schwarzwaelder, K.; Bartholomae, C.; Zaoui, K.; Ball, C.; Pilz, I.; Braun, S.; Glimm, H.; Von Kalle, C. High-Resolution Insertion-Site Analysis by Linear Amplification–Mediated PCR (LAM-PCR). Nat Methods 2007, 4, 1051–1057. [Google Scholar] [CrossRef]
Gabriel, R.; Eckenberg, R.; Paruzynski, A.; Bartholomae, C.C.; Nowrouzi, A.; Arens, A.; Howe, S.J.; Recchia, A.; Cattoglio, C.; Wang, W.; et al. Comprehensive Genomic Access to Vector Integration in Clinical Gene Therapy. Nat Med 2009, 15, 1431–1436. [Google Scholar] [CrossRef] [PubMed]
van Haasteren, J.; Munis, A.M.; Gill, D.R.; Hyde, S.C. Genome-Wide Integration Site Detection Using Cas9 Enriched Amplification-Free Long-Range Sequencing. Nucleic Acids Research 2021, 49. [Google Scholar] [CrossRef] [PubMed]
Singer, T.; Burke, E. High-Throughput TAIL-PCR as a Tool to Identify DNA Flanking Insertions. In Plant Functional Genomics; Humana Press: New Jersey, 2003; ISBN 978-1-59259-413-9. [Google Scholar]
Liu, Y.-G.; Whittier, R.F. Thermal Asymmetric Interlaced PCR: Automatable Amplification and Sequencing of Insert End Fragments from P1 and YAC Clones for Chromosome Walking. Genomics 1995, 25, 674–681. [Google Scholar] [CrossRef] [PubMed]
Liu, Y.-G.; Chen, Y. High-Efficiency Thermal Asymmetric Interlaced PCR for Amplification of Unknown Flanking Sequences. BioTechniques 2007, 43, 649–656. [Google Scholar] [CrossRef]
Zhang, H.; Xu, W.; Feng, Z.; Hong, Z. A Low Degenerate Primer Pool Improved the Efficiency of High-Efficiency Thermal Asymmetric Interlaced PCR to Amplify T-DNA Flanking Sequences in Arabidopsis Thaliana. 3 Biotech 2018, 8, 14. [Google Scholar] [CrossRef]
Wu, L.; Di, D.-W.; Zhang, D.; Song, B.; Luo, P.; Guo, G.-Q. Frequent Problems and Their Resolutions by Using Thermal Asymmetric Interlaced PCR (TAIL-PCR) to Clone Genes in Arabidopsis T-DNA Tagged Mutants. Biotechnology & Biotechnological Equipment 2015, 29, 260–267. [Google Scholar] [CrossRef]
Luo, W.; Li, Z.; Huang, Y.; Han, Y.; Yao, C.; Duan, X.; Ouyang, H.; Li, L. Generation of AQP2-Cre Transgenic Mini-Pigs Specifically Expressing Cre Recombinase in Kidney Collecting Duct Cells. Transgenic Res 2014, 23, 365–375. [Google Scholar] [CrossRef]
Zhang, R.; Yin, Y.; Zhang, Y.; Li, K.; Zhu, H.; Gong, Q.; Wang, J.; Hu, X.; Li, N. Molecular Characterization of Transgene Integration by Next-Generation Sequencing in Transgenic Cattle. PLoS ONE 2012, 7, e50348. [Google Scholar] [CrossRef] [PubMed]
Zelensky, A.N.; Schimmel, J.; Kool, H.; Kanaar, R.; Tijsterman, M. Inactivation of Pol θ and C-NHEJ Eliminates off-Target Integration of Exogenous DNA. Nat Commun 2017, 8, 66. [Google Scholar] [CrossRef]
Kondrychyn, I.; Garcia-Lecea, M.; Emelyanov, A.; Parinov, S.; Korzh, V. Genome-Wide Analysis of Tol2 Transposon Reintegration in Zebrafish. BMC Genomics 2009, 10, 418. [Google Scholar] [CrossRef]
Johansson, O.N.; Töpel, M.; Pinder, M.I.M.; Kourtchenko, O.; Blomberg, A.; Godhe, A.; Clarke, A.K. Skeletonema Marinoi as a New Genetic Model for Marine Chain-Forming Diatoms. Sci Rep 2019, 9, 5391. [Google Scholar] [CrossRef] [PubMed]
Gong, W.; Zhou, Y.; Wang, R.; Wei, X.; Zhang, L.; Dai, Y.; Zhu, Z. Analysis of T-DNA Integration Events in Transgenic Rice. Journal of Plant Physiology 2021, 266, 153527. [Google Scholar] [CrossRef] [PubMed]
Wang, L.; Jia, M.; Li, Z.; Liu, X.; Sun, T.; Pei, J.; Wei, C.; Lin, Z.; Li, H. Wristwatch PCR: A Versatile and Efficient Genome Walking Strategy. Front. Bioeng. Biotechnol. 2022, 10, 792848. [Google Scholar] [CrossRef] [PubMed]
Pan, H.; Guo, X.; Pan, Z.; Wang, R.; Tian, B.; Li, H. Fork PCR: A Universal and Efficient Genome-Walking Tool. Front. Microbiol. 2023, 14, 1265580. [Google Scholar] [CrossRef]
Li, H.; Lin, Z.; Guo, X.; Pan, Z.; Pan, H.; Wang, D. Primer Extension Refractory PCR: An Efficient and Reliable Genome Walking Method. Mol Genet Genomics 2024, 299, 27. [Google Scholar] [CrossRef]
Kalendar, R.; Shustov, A.V.; Schulman, A.H. Palindromic Sequence-Targeted (PST) PCR, Version 2: An Advanced Method for High-Throughput Targeted Gene Characterization and Transposon Display. Front. Plant Sci. 2021, 12, 691940. [Google Scholar] [CrossRef]
Burkov, I.A.; Serova, I.A.; Battulin, N.R.; Smirnov, A.V.; Babkin, I.V.; Andreeva, L.E.; Dvoryanchikov, G.A.; Serov, O.L. Expression of the Human Granulocyte–Macrophage Colony Stimulating Factor (hGM-CSF) Gene under Control of the 5′-Regulatory Sequence of the Goat Alpha-S1-Casein Gene with and without a MAR Element in Transgenic Mice. Transgenic Res 2013, 22, 949–964. [Google Scholar] [CrossRef]
Serova, I.A.; Dvoryanchikov, G.A.; Andreeva, L.E.; Burkov, I.A.; Dias, L.P.B.; Battulin, N.R.; Smirnov, A.V.; Serov, O.L. A 3,387 Bp 5′-Flanking Sequence of the Goat Alpha-S1-Casein Gene Provides Correct Tissue-Specific Expression of Human Granulocyte Colony-Stimulating Factor (hG-CSF) in the Mammary Gland of Transgenic Mice. Transgenic Res 2012, 21, 485–498. [Google Scholar] [CrossRef]
Smirnov, A.V.; Kontsevaya, G.V.; Feofanova, N.A.; Anisimova, M.V.; Serova, I.A.; Gerlinskaya, L.A.; Battulin, N.R.; Moshkin, M.P.; Serov, O.L. Unexpected Phenotypic Effects of a Transgene Integration Causing a Knockout of the Endogenous Contactin-5 Gene in Mice. Transgenic Res 2018, 27, 1–13. [Google Scholar] [CrossRef]
Le Saux, A.; Houdebine, L.-M.; Jolivet, G. Chromosome Integration of BAC (Bacterial Artificial Chromosome): Evidence of Multiple Rearrangements. Transgenic Res 2010, 19, 923–931. [Google Scholar] [CrossRef]
Won, M.; Dawid, I.B. PCR Artifact in Testing for Homologous Recombination in Genomic Editing in Zebrafish. PLoS ONE 2017, 12, e0172802. [Google Scholar] [CrossRef] [PubMed]
Pillai, M.M.; Venkataraman, G.M.; Kosak, S.; Torok-Storb, B. Integration Site Analysis in Transgenic Mice by Thermal Asymmetric Interlaced (TAIL)-PCR: Segregating Multiple-Integrant Founder Lines and Determining Zygosity. Transgenic Res 2008, 17, 749–754. [Google Scholar] [CrossRef] [PubMed]
Giani, A.M.; Gallo, G.R.; Gianfranceschi, L.; Formenti, G. Long Walk to Genomics: History and Current Approaches to Genome Sequencing and Assembly. Computational and Structural Biotechnology Journal 2020, 18, 9–19. [Google Scholar] [CrossRef] [PubMed]
Ji, Y.; Abrams, N.; Zhu, W.; Salinas, E.; Yu, Z.; Palmer, D.C.; Jailwala, P.; Franco, Z.; Roychoudhuri, R.; Stahlberg, E.; et al. Identification of the Genomic Insertion Site of Pmel-1 TCR α and β Transgenes by Next-Generation Sequencing. PLoS ONE 2014, 9, e96650. [Google Scholar] [CrossRef]
Yong, C.S.M.; Sharkey, J.; Duscio, B.; Venville, B.; Wei, W.-Z.; Jones, R.F.; Slaney, C.Y.; Mir Arnau, G.; Papenfuss, A.T.; Schröder, J.; et al. Embryonic Lethality in Homozygous Human Her-2 Transgenic Mice Due to Disruption of the Pds5b Gene. PLoS ONE 2015, 10, e0136817. [Google Scholar] [CrossRef]
Srivastava, S.K.; Wolinski, P.; Pereira, A. A Strategy for Genome-Wide Identification of Gene Based Polymorphisms in Rice Reveals Non-Synonymous Variation and Functional Genotypic Markers. PLoS ONE 2014, 9, e105335. [Google Scholar] [CrossRef]
Owen, J.R.; Hennig, S.L.; McNabb, B.R.; Mansour, T.A.; Smith, J.M.; Lin, J.C.; Young, A.E.; Trott, J.F.; Murray, J.D.; Delany, M.E.; et al. One-Step Generation of a Targeted Knock-in Calf Using the CRISPR-Cas9 System in Bovine Zygotes. BMC Genomics 2021, 22, 118. [Google Scholar] [CrossRef]
Carlson, D.F.; Lancto, C.A.; Zang, B.; Kim, E.-S.; Walton, M.; Oldeschulte, D.; Seabury, C.; Sonstegard, T.S.; Fahrenkrug, S.C. Production of Hornless Dairy Cattle from Genome-Edited Cell Lines. Nat Biotechnol 2016, 34, 479–481. [Google Scholar] [CrossRef]
Young, A.E.; Mansour, T.A.; McNabb, B.R.; Owen, J.R.; Trott, J.F.; Brown, C.T.; Van Eenennaam, A.L. Genomic and Phenotypic Analyses of Six Offspring of a Genome-Edited Hornless Bull. Nat Biotechnol 2020, 38, 225–232. [Google Scholar] [CrossRef]
Niu, L.; He, H.; Zhang, Y.; Yang, J.; Zhao, Q.; Xing, G.; Zhong, X.; Yang, X. Efficient Identification of Genomic Insertions and Flanking Regions through Whole-Genome Sequencing in Three Transgenic Soybean Events. Transgenic Res 2021, 30, 1–9. [Google Scholar] [CrossRef]
Guo, B.; Guo, Y.; Hong, H.; Qiu, L.-J. Identification of Genomic Insertion and Flanking Sequence of G2-EPSPS and GAT Transgenes in Soybean Using Whole Genome Sequencing Method. Front. Plant Sci. 2016, 7. [Google Scholar] [CrossRef] [PubMed]
Xu, W.; Zhang, H.; Zhang, Y.; Shen, P.; Li, X.; Li, R.; Yang, L. A Paired-End Whole-Genome Sequencing Approach Enables Comprehensive Characterization of Transgene Integration in Rice. Commun Biol 2022, 5, 667. [Google Scholar] [CrossRef] [PubMed]
Kovalic, D.; Garnaat, C.; Guo, L.; Yan, Y.; Groat, J.; Silvanovich, A.; Ralston, L.; Huang, M.; Tian, Q.; Christian, A.; et al. The Use of Next Generation Sequencing and Junction Sequence Analysis Bioinformatics to Achieve Molecular Characterization of Crops Improved Through Modern Biotechnology. The Plant Genome 2012, 5, plantgenome2012.10–0026. [Google Scholar] [CrossRef]
Volpicella, M.; Leoni, C.; Costanza, A.; Fanizza, I.; Placido, A.; Ceci, L.R. Genome Walking by Next Generation Sequencing Approaches. Biology 2012, 1, 495–507. [Google Scholar] [CrossRef]
Zhao, S.; Wang, Y.; Zhu, Z.; Chen, P.; Liu, W.; Wang, C.; Lu, H.; Xiang, Y.; Liu, Y.; Qian, Q.; et al. Streamlined Whole-Genome Genotyping through NGS-Enhanced Thermal Asymmetric Interlaced (TAIL)-PCR. Plant Communications 2024, 5, 100983. [Google Scholar] [CrossRef]
Salnikov, P.A.; Khabarova, A.A.; Koksharova, G.S.; Mungalov, R.V.; Belokopytova, P.S.; Pristyazhnuk, I.E.; Nurislamov, A.R.; Somatich, P.; Gridina, M.M.; Fishman, V.S. Here and There: The Double-Side Transgene Localization. Vestn. VOGiS 2021, 25, 607–612. [Google Scholar] [CrossRef]
Malekshoar, M.; Azimi, S.A.; Kaki, A.; Mousazadeh, L.; Motaei, J.; Vatankhah, M. CRISPR-Cas9 Targeted Enrichment and Next-Generation Sequencing for Mutation Detection. The Journal of Molecular Diagnostics 2023, 25, 249–262. [Google Scholar] [CrossRef]
De Vree, P.J.P.; De Wit, E.; Yilmaz, M.; Van De Heijning, M.; Klous, P.; Verstegen, M.J.A.M.; Wan, Y.; Teunissen, H.; Krijger, P.H.L.; Geeven, G.; et al. Targeted Sequencing by Proximity Ligation for Comprehensive Variant Detection and Local Haplotyping. Nat Biotechnol 2014, 32, 1019–1025. [Google Scholar] [CrossRef]
Wang, G.; Zhang, C.; Kambara, H.; Dambrot, C.; Xie, X.; Zhao, L.; Xu, R.; Oneglia, A.; Liu, F.; Luo, H.R. Identification of the Transgene Integration Site and Host Genome Changes in MRP8-Cre/Ires-EGFP Transgenic Mice by Targeted Locus Amplification. Front. Immunol. 2022, 13, 875991. [Google Scholar] [CrossRef]
Stadermann, A.; Gamer, M.; Fieder, J.; Lindner, B.; Fehrmann, S.; Schmidt, M.; Schulz, P.; Gorr, I.H. Structural Analysis of Random Transgene Integration in CHO Manufacturing Cell Lines by Targeted Sequencing. Biotech & Bioengineering 2022, 119, 868–880. [Google Scholar] [CrossRef]
Lefferts, J.W.; Boersma, V.; Hagemeijer, M.C.; Hajo, K.; Beekman, J.M.; Splinter, E. Targeted Locus Amplification and Haplotyping. In Haplotyping; Peters, B.A., Drmanac, R., Eds.; Methods in Molecular Biology; Springer US: New York, NY, 2023; Vol. 2590, pp. 31–48 ISBN 978-1-07-162818-8.
Tosh, J.L.; Rickman, M.; Rhymes, E.; Norona, F.E.; Clayton, E.; Mucke, L.; Isaacs, A.M.; Fisher, E.M.C.; Wiseman, F.K. The Integration Site of the APP Transgene in the J20 Mouse Model of Alzheimer’s Disease. Wellcome Open Res 2018, 2, 84. [Google Scholar] [CrossRef] [PubMed]
Hinteregger, B.; Loeffler, T.; Flunkert, S.; Neddens, J.; Birner-Gruenberger, R.; Bayer, T.A.; Madl, T.; Hutter-Paier, B. Transgene Integration Causes RARB Downregulation in Homozygous Tg4–42 Mice. Sci Rep 2020, 10, 6377. [Google Scholar] [CrossRef]
Wong, A.M.; Patel, T.P.; Altman, E.K.; Tugarinov, N.; Trivellin, G.; Yanovski, J.A. Characterization of the Adiponectin Promoter + Cre Recombinase Insertion in the Tg(Adipoq-Cre)1Evdr Mouse by Targeted Locus Amplification and Droplet Digital PCR. Adipocyte 2021, 10, 21–27. [Google Scholar] [CrossRef] [PubMed]
Fan, Y.; Chen, W.; Wei, R.; Qiang, W.; Pearson, J.D.; Yu, T.; Bremner, R.; Chen, D. Mapping Transgene Insertion Sites Reveals the α-Cre Transgene Expression in Both Developing Retina and Olfactory Neurons. Commun Biol 2022, 5, 411. [Google Scholar] [CrossRef]
Leitner, K.; Motheramgari, K.; Borth, N.; Marx, N. Nanopore Cas9-targeted Sequencing Enables Accurate and Simultaneous Identification of Transgene Integration Sites, Their Structure and Epigenetic Status in Recombinant Chinese Hamster Ovary Cells. Biotech & Bioengineering 2023, 120, 2403–2418. [Google Scholar] [CrossRef]
Clappier, C.; Böttner, D.; Heinzelmann, D.; Stadermann, A.; Schulz, P.; Schmidt, M.; Lindner, B. Deciphering Integration Loci of CHO Manufacturing Cell Lines Using Long Read Nanopore Sequencing. New Biotechnology 2023, 75, 31–39. [Google Scholar] [CrossRef]
DuBose, A.J.; Lichtenstein, S.T.; Narisu, N.; Bonnycastle, L.L.; Swift, A.J.; Chines, P.S.; Collins, F.S. Use of Microarray Hybrid Capture and Next-Generation Sequencing to Identify the Anatomy of a Transgene. Nucleic Acids Research 2013, 41, e70–e70. [Google Scholar] [CrossRef]
Magembe, E.M.; Li, H.; Taheri, A.; Zhou, S.; Ghislain, M. Identification of T-DNA Structure and Insertion Site in Transgenic Crops Using Targeted Capture Sequencing. Front. Plant Sci. 2023, 14, 1156665. [Google Scholar] [CrossRef]
Ohnuki, N.; Kobayashi, T.; Matsuo, M.; Nishikaku, K.; Kusama, K.; Torii, Y.; Inagaki, Y.; Hori, M.; Imakawa, K.; Satou, Y. A Target Enrichment High Throughput Sequencing System for Characterization of BLV Whole Genome Sequence, Integration Sites, Clonality and Host SNP. Sci Rep 2021, 11, 4521. [Google Scholar] [CrossRef]
Iwase, S.C.; Miyazato, P.; Katsuya, H.; Islam, S.; Yang, B.T.J.; Ito, J.; Matsuo, M.; Takeuchi, H.; Ishida, T.; Matsuda, K.; et al. HIV-1 DNA-Capture-Seq Is a Useful Tool for the Comprehensive Characterization of HIV-1 Provirus. Sci Rep 2019, 9, 12326. [Google Scholar] [CrossRef]
Li, X.; Chen, W.; Martin, B.K.; Calderon, D.; Lee, C.; Choi, J.; Chardon, F.M.; McDiarmid, T.A.; Daza, R.M.; Kim, H.; et al. Chromatin Context-Dependent Regulation and Epigenetic Manipulation of Prime Editing. Cell 2024, 187, 2411–2427.e25. [Google Scholar] [CrossRef] [PubMed]
Siddique, K.; Wei, J.; Li, R.; Zhang, D.; Shi, J. Identification of T-DNA Insertion Site and Flanking Sequence of a Genetically Modified Maize Event IE09S034 Using Next-Generation Sequencing Technology. Mol Biotechnol 2019, 61, 694–702. [Google Scholar] [CrossRef] [PubMed]
Peng, C.; Mei, Y.; Ding, L.; Wang, X.; Chen, X.; Wang, J.; Xu, J. Using Combined Methods of Genetic Mapping and Nanopore-Based Sequencing Technology to Analyze the Insertion Positions of G10evo-EPSPS and Cry1Ab/Cry2Aj Transgenes in Maize. Front. Plant Sci. 2021, 12, 690951. [Google Scholar] [CrossRef]
Van Dijk, E.L.; Jaszczyszyn, Y.; Naquin, D.; Thermes, C. The Third Revolution in Sequencing Technology. Trends in Genetics 2018, 34, 666–681. [Google Scholar] [CrossRef]
Warburton, P.E.; Sebra, R.P. Long-Read DNA Sequencing: Recent Advances and Remaining Challenges. Annu. Rev. Genom. Hum. Genet. 2023, 24, 109–132. [Google Scholar] [CrossRef]
Adams, P.E.; Thies, J.L.; Sutton, J.M.; Millwood, J.D.; Caldwell, G.A.; Caldwell, K.A.; Fierst, J.L. Identifying Transgene Insertions in Caenorhabditis Elegans Genomes with Oxford Nanopore Sequencing. PeerJ 2024, 12, e18100. [Google Scholar] [CrossRef]
Jain, M.; Koren, S.; Miga, K.H.; Quick, J.; Rand, A.C.; Sasani, T.A.; Tyson, J.R.; Beggs, A.D.; Dilthey, A.T.; Fiddes, I.T.; et al. Nanopore Sequencing and Assembly of a Human Genome with Ultra-Long Reads. Nat Biotechnol 2018, 36, 338–345. [Google Scholar] [CrossRef]
Huang, M.; Kingan, S.; Shoue, D.; Nguyen, O.; Froenicke, L.; Galvin, B.; Lambert, C.; Khan, R.; Maheshwari, C.; Weisz, D.; et al. Improved High Quality Sand Fly Assemblies Enabled by Ultra Low Input Long Read Sequencing. Sci Data 2024, 11, 918. [Google Scholar] [CrossRef]
Meier, M.J.; Beal, M.A.; Schoenrock, A.; Yauk, C.L.; Marchetti, F. Whole Genome Sequencing of the Mutamouse Model Reveals Strain- and Colony-Level Variation, and Genomic Features of the Transgene Integration Site. Sci Rep 2019, 9, 13775. [Google Scholar] [CrossRef]
Zhang, H.; Li, R.; Guo, Y.; Zhang, Y.; Zhang, D.; Yang, L. LIFE-Seq: A Universal L Arge I Ntegrated DNA F Ragment E Nrichment Seq Uencing Strategy for Deciphering the Transgene Integration of Genetically Modified Organisms. Plant Biotechnology Journal 2022, 20, 964–976. [Google Scholar] [CrossRef]
Sheehan, M.; Kumpf, S.W.; Qian, J.; Rubitski, D.M.; Oziolor, E.; Lanz, T.A. Comparison and Cross-Validation of Long-Read and Short-Read Target-Enrichment Sequencing Methods to Assess AAV Vector Integration into Host Genome. Molecular Therapy - Methods & Clinical Development 2024, 32, 101352. [Google Scholar] [CrossRef]
Suzuki, O.; Koura, M.; Uchio-Yamada, K.; Sasaki, M. Analysis of the Transgene Insertion Pattern in a Transgenic Mouse Strain Using Long-Read Sequencing. Exp. Anim. 2020, 69, 279–286. [Google Scholar] [CrossRef] [PubMed]
Gilpatrick, T.; Lee, I.; Graham, J.E.; Raimondeau, E.; Bowen, R.; Heron, A.; Downs, B.; Sukumar, S.; Sedlazeck, F.J.; Timp, W. Targeted Nanopore Sequencing with Cas9-Guided Adapter Ligation. Nat Biotechnol 2020, 38, 433–438. [Google Scholar] [CrossRef]
Smirnov, A.; Nurislamov, A.; Koncevaya, G.; Serova, I.; Kabirova, E.; Chuyko, E.; Maltceva, E.; Savoskin, M.; Zadorozhny, D.; Svyatchenko, V.A.; et al. Characterizing a Lethal CAG-ACE2 Transgenic Mouse Model for SARS-CoV-2 Infection Using Cas9-Enhanced Nanopore Sequencing. Transgenic Res 2024, 33, 453–466. [Google Scholar] [CrossRef]
Blondal, T.; Gamba, C.; Møller Jagd, L.; Su, L.; Demirov, D.; Guo, S.; Johnston, C.M.; Riising, E.M.; Wu, X.; Mikkelsen, M.J.; et al. Verification of CRISPR Editing and Finding Transgenic Inserts by Xdrop Indirect Sequence Capture Followed by Short- and Long-Read Sequencing. Methods 2021, 191, 68–77. [Google Scholar] [CrossRef]
Zarka, K.A.; Jagd, L.M.; Douches, D.S. T-DNA Characterization of Genetically Modified 3-R-Gene Late Blight-Resistant Potato Events with a Novel Procedure Utilizing the Samplix Xdrop® Enrichment Technology. Front. Plant Sci. 2024, 15, 1330429. [Google Scholar] [CrossRef]
Wang, Y.; Zhao, Y.; Bollas, A.; Wang, Y.; Au, K.F. Nanopore Sequencing Technology, Bioinformatics and Applications. Nat Biotechnol 2021, 39, 1348–1365. [Google Scholar] [CrossRef]
Li, S.; Jia, S.; Hou, L.; Nguyen, H.; Sato, S.; Holding, D.; Cahoon, E.; Zhang, C.; Clemente, T.; Yu, B. Mapping of Transgenic Alleles in Soybean Using a Nanopore-Based Sequencing Strategy. Journal of Experimental Botany 2019, 70, 3825–3833. [Google Scholar] [CrossRef]
Giraldo, P.A.; Shinozuka, H.; Spangenberg, G.C.; Smith, K.F.; Cogan, N.O.I. Rapid and Detailed Characterization of Transgene Insertion Sites in Genetically Modified Plants via Nanopore Sequencing. Front. Plant Sci. 2021, 11, 602313. [Google Scholar] [CrossRef]
Steiert, T.A.; Fuß, J.; Juzenas, S.; Wittig, M.; Hoeppner, M.P.; Vollstedt, M.; Varkalaite, G.; ElAbd, H.; Brockmann, C.; Görg, S.; et al. High-Throughput Method for the Hybridisation-Based Targeted Enrichment of Long Genomic Fragments for PacBio Third-Generation Sequencing. NAR Genomics and Bioinformatics 2022, 4, lqac051. [Google Scholar] [CrossRef]
Lefoulon, E.; Vaisman, N.; Frydman, H.M.; Sun, L.; Voland, L.; Foster, J.M.; Slatko, B.E. Large Enriched Fragment Targeted Sequencing (LEFT-SEQ) Applied to Capture of Wolbachia Genomes. Sci Rep 2019, 9, 5939. [Google Scholar] [CrossRef]
Tilleman, L.; Rubben, K.; Van Criekinge, W.; Deforce, D.; Van Nieuwerburgh, F. Haplotyping Pharmacogenes Using TLA Combined with Illumina or Nanopore Sequencing. Sci Rep 2022, 12, 17734. [Google Scholar] [CrossRef]
Hafford-Tear, N.J.; Tsai, Y.-C.; Sadan, A.N.; Sanchez-Pintado, B.; Zarouchlioti, C.; Maher, G.J.; Liskova, P.; Tuft, S.J.; Hardcastle, A.J.; Clark, T.A.; et al. CRISPR/Cas9-Targeted Enrichment and Long-Read Sequencing of the Fuchs Endothelial Corneal Dystrophy–Associated TCF4 Triplet Repeat. Genetics in Medicine 2019, 21, 2092–2102. [Google Scholar] [CrossRef]
Tsai, Y.-; Brown, K.; Bernardi, M.; Harting, J.; Clelland, C. Single-Molecule Sequencing of the C9orf72 Repeat Expansion in Patient iPSCs. BIO-PROTOCOL 2024, 14. [Google Scholar] [CrossRef]
Watson, C.M.; Crinnion, L.A.; Lindsay, H.; Mitchell, R.; Camm, N.; Robinson, R.; Joyce, C.; Tanteles, G.A.; Halloran, D.J.O.; Pena, S.D.J.; et al. Assessing the Utility of Long-Read Nanopore Sequencing for Rapid and Efficient Characterization of Mobile Element Insertions. Laboratory Investigation 2021, 101, 442–449. [Google Scholar] [CrossRef]
Stangl, C.; De Blank, S.; Renkens, I.; Westera, L.; Verbeek, T.; Valle-Inclan, J.E.; González, R.C.; Henssen, A.G.; Van Roosmalen, M.J.; Stam, R.W.; et al. Partner Independent Fusion Gene Detection by Multiplexed CRISPR-Cas9 Enrichment and Long Read Nanopore Sequencing. Nat Commun 2020, 11, 2861. [Google Scholar] [CrossRef]
Xu, S.; Shiomi, H.; Yamashita, Y.; Koyama, S.; Horie, T.; Baba, O.; Kimura, M.; Nakashima, Y.; Sowa, N.; Hasegawa, K.; et al. CRISPR-Cas9-Guided Amplification-Free Genomic Diagnosis for Familial Hypercholesterolemia Using Nanopore Sequencing. PLoS ONE 2024, 19, e0297231. [Google Scholar] [CrossRef]
López-Girona, E.; Davy, M.W.; Albert, N.W.; Hilario, E.; Smart, M.E.M.; Kirk, C.; Thomson, S.J.; Chagné, D. CRISPR-Cas9 Enrichment and Long Read Sequencing for Fine Mapping in Plants. Plant Methods 2020, 16, 121. [Google Scholar] [CrossRef]
McDonald, T.L.; Zhou, W.; Castro, C.P.; Mumm, C.; Switzenberg, J.A.; Mills, R.E.; Boyle, A.P. Cas9 Targeted Enrichment of Mobile Elements Using Nanopore Sequencing. Nat Commun 2021, 12, 3586. [Google Scholar] [CrossRef]
Hertel, O.; Neuss, A.; Busche, T.; Brandt, D.; Kalinowski, J.; Bahnemann, J.; Noll, T. Enhancing Stability of Recombinant CHO Cells by CRISPR/Cas9-Mediated Site-Specific Integration into Regions with Distinct Histone Modifications. Front. Bioeng. Biotechnol. 2022, 10, 1010719. [Google Scholar] [CrossRef]
Reginato, G.; Dello Stritto, M.R.; Wang, Y.; Hao, J.; Pavani, R.; Schmitz, M.; Halder, S.; Morin, V.; Cannavo, E.; Ceppi, I.; et al. HLTF Disrupts Cas9-DNA Post-Cleavage Complexes to Allow DNA Break Processing. Nat Commun 2024, 15, 5789. [Google Scholar] [CrossRef]
Keraite, I.; Becker, P.; Canevazzi, D.; Frias-López, C.; Dabad, M.; Tonda-Hernandez, R.; Paramonov, I.; Ingham, M.J.; Brun-Heath, I.; Leno, J.; et al. A Method for Multiplexed Full-Length Single-Molecule Sequencing of the Human Mitochondrial Genome. Nat Commun 2022, 13, 5902. [Google Scholar] [CrossRef] [PubMed]
Lu, W.; Lan, X.; Zhang, T.; Sun, H.; Ma, S.; Xia, Q. Precise Characterization of Bombyx Mori Fibroin Heavy Chain Gene Using Cpf1-Based Enrichment and Oxford Nanopore Technologies. Insects 2021, 12, 832. [Google Scholar] [CrossRef]
Madsen, E.B.; Höijer, I.; Kvist, T.; Ameur, A.; Mikkelsen, M.J. Xdrop: Targeted Sequencing of Long DNA Molecules from Low Input Samples Using Droplet Sorting. Human Mutation 2020, 41, 1671–1679. [Google Scholar] [CrossRef] [PubMed]

Figure 1. Features of the DNA integration in the pronuclear microinjection to be considered during transgene mapping. These features could complicate the transgene mapping and data analysis (details in the main text). Image was generated with ChatGPT.

Figure 2. Selected methods for transgene mapping in animals. (A) hiTAIL-PCR. The schematic overview of the method shows the main steps which may differ between alternative TAIL-PCR approaches. SP – sequence-specific primer, ADP – arbitrary degenerate primer. (B) NGS-based methods: WGS and TLA. (C) Nanopore with enrichment by Cas9 digestion.

Table 1. Comparison of costs and labor time for transgene mapping methods. Estimates are based on a hypothetical 10 kb transgene and should be adjusted according to the expected insert size. Pricing and time estimates exclude DNA isolation and do not account for delivery time, which may vary significantly depending on geographic location. High-throughput sequencing using platforms such as Revio (PacBio), PromethION (Nanopore), and NovaSeq 6000 (Illumina) is typically outsourced to specialized service providers rather than conducted in individual laboratories. Therefore, when planning such experiments, it is essential to consider additional factors, including probe design and synthesis time, shipping logistics, and service turnaround—each of which can substantially affect the overall project timeline.

	Preparation price per sample	Sequencing price per sample	Total / On-target data (coverage)	Preparation / Run time
Inverse PCR [30]	20-30$	10-30$ (Sanger)	Several kb	~ 9-12/3-4 h
TAIL-PCR [47]	40-50$	10-30$ (Sanger)	Several kb	~ 8-12/3-4 h
WGS by NGS (Illumina paired-end 150 bp) [51,67,68,70]	75-135$	NGS Option A: NovaSeq 6000 S4 ~160-250$ NGS Option B: NextSeq 500/550 ~1900-2400$	30 GB / >0.01% (coverage >10)	NGS Option A: ~ 3-5/45 h NGS Option B: ~ 3-5/35 h
NGS + TLA (commercial) [12,13,87]	1000-2000 $		NA	Weeks
NGS + TLA (lab) [82,83,84]	50-75$	NGS Option A: ~35-70$ NGS Option B: ~200-250$	3 GB / ~30–70% (coverage >30)	NGS Option A: ~36-48/35 h NGS Option B: ~36-48/45 h
NGS + hybrid capture (using 120 nt commercial tiling probes) [92,93]	180-250$	NGS Option A: ~10-20$ NGS Option B: ~75-150$	1 GB / ~40–80%, up to 95% based on the probes (coverage >30)	NGS Option A: ~24-36/45 h NGS Option B: ~24-36/35 h
NGS + hybrid capture (probes made in the lab)	50-60$	NGS Option A: ~10-20$ NGS Option B: ~75-150$	1 GB / ~80–90%, up to 93% (coverage >50)	NGS Option A: ~50/45 h NGS Option B: ~50/35 h
NGS + T7 In vitro transcription [95]	50–70$	NGS Option A: ~35-70$ NGS Option B: ~200-250$	3GB / ~35–70% (coverage >30)	NGS Option A: ~6-9/45 h NGS Option B: ~6-9/35 h
PacBio WGS [32,105]	~100–150$	900-1600$	45-90 GB / >0.01% (coverage >15-25)	6-10/24-36 h
PacBio + hybrid capture (using 120nt commercial probes) [104]	~350–500$	125-200 $	5-10 GB / 40–60% (coverage >30)	30-40/24-36 h
Oxford Nanopore Technologies (ONT) WGS [15,100,106]	~100–150$	ONT Option A: MinION, 2-3 flow cells 1200–2400$ ONT Option B: PromethION (shared) 300-600$	60-90 GB / >0.01% (coverage 20-30)	ONT Option A: ~5-7/24-60 h ONT Option B: ~5-7/48-72 h
ONT + nCATs [23,90,107]	~160–200$	ONT Option A (1 flow cell): 600–800$ ONT Option B: 100-150$	30 GB / 10–40%, depends on the gRNA (coverage 20-30)	ONT Option A: ~7-10/24-60 h ONT Option B: ~7-10/48-72 h
ONT + internal cuts (AFIS-seq, CRISPR-LRS) [24,44,108]	150-200$	ONT Option A (1 flow cell): 600–800$ ONT Option B: 100-150$	30 GB / 5-40%, depends on the gRNA (coverage >30)	ONT Option A: ~7-10/24-60 h ONT Option B: ~7-10/48-72 h
Nanopore + Xdrop (commercial) [16,109,110]	650-900$	ONT Option A (1 flow cell): 600–800$ ONT Option B: 100-150$	30 GB / ~60–90% (coverage >100)	ONT Option A: ~4-5 days/24-60 h ONT Option B: ~4-5 days/48-72 h

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permit the free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.