Deep learning in medical image registration: introduction and survey

Image registration (IR) is a process that deforms images to align them with respect to a reference space, making it easier for medical practitioners to examine various medical images in a standardized reference frame, such as having the same rotation and scale. This document introduces image registration using a simple numeric example. It provides a definition of image registration along with a space-oriented symbolic representation. This review covers various aspects of image transformations, including affine, deformable, invertible, and bidirectional transformations, as well as medical image registration algorithms such as Voxelmorph, Demons, SyN, Iterative Closest Point, and SynthMorph. It also explores atlas-based registration and multistage image registration techniques, including coarse-fine and pyramid approaches. Furthermore, this survey paper discusses medical image registration taxonomies, datasets, evaluation measures, such as correlation-based metrics, segmentation-based metrics, processing time, and model size. It also explores applications in image-guided surgery, motion tracking, and tumor diagnosis. Finally, the document addresses future research directions, including the further development of transformers.


Nomenclature
bx  translation on the X-axis
by  translation on the Y-axis
bz  translation on the Z-axis
C^ij_pq  correspondence of image p in space i and image q in space j
DT  average training time
E  the total number of elements in a set
e  (as in xe) the order of an element in a set X
∅  a reference to an unknown codomain (used with raw data)
p  an index of a registration example in a dataset
X^∅i_p = {x1, ...}  the domain values of an image p in space i
X^ij  the transformed domain after applying T^ij to X^∅i, such that X^ij = T^ij(X^∅i)
X^i-j  the outcome of applying T^i-j to X^∅i, such that X^i-j = T^i-j(X^∅i), but before any post-processing such as resampling
xe  element number e in a set X

When a novice human reads or hears the concept of "image registration" for the first time, the word "registration" may not provide a clue about what image registration engineers do. A curious non-native English speaker may look for a hint in a dictionary such as Oxford or Cambridge, but none of the listed senses seems related. In those dictionaries, the word "registration" is mainly associated with an entry in an official record or list, such as the addition of a new citizen to a national register or the enrollment of a student in a course (an entry in the record of enrolled students). Similarly, the license plate number on the back and front of a car is called a registration number in British English (an entry in the record of licensed cars). A related sense is found in Merriam-Webster's dictionary under the word "register," but not under the word "registration": "register" (noun) is defined as a correct alignment. Nevertheless, the Cambridge and Oxford dictionaries do not list a related sense under "register" or "registration." Links are in Table 1.
Table 1. Registration in dictionaries (Oxford, Cambridge, Merriam-Webster); columns: Dictionary, Word, Link.

The concept of "image registration" in computer vision could have been coined under the influence of the early printing industry. In the printing industry, registration is the process of getting an image printed at the same location on the paper each time. It also means the perfect alignment of printing components (e.g., dots, lines, colors) with respect to each other. Figure 1 shows an example of printing misalignment. The misalignment in early printing machines depended on the initial settings in addition to the movement of the paper as it ran through the printing machine. Hence, marks like crosshairs used to be printed on paper boundaries to check a proper alignment/registration (Stallings, 2010). You may have seen a crosshair like the one shown on the right of Figure 1 in old printed documents. In addition to ink printing on paper, registration covered other kinds of printing such as embossing and metallic foiling.
In color printing, basic colors were printed one after another. For example, the basic colors of the CMYK color model are cyan, magenta, yellow, and key/black, which form the acronym CMYK. A misalignment between colors may result in overlapping replicas (Wikipedia, 2023).

Figure 1. An example of printing misalignment and a crosshair on the right boundary. Crosshairs were used in the early printing industry to check misalignments like the misalignment between copies/pages.

In summary, the concept of image registration in computer vision seems to have been influenced by the printing industry. In computer vision, it is common to call the image that will be aligned a "moving image" and the reference image a "fixed image." The naming of a "moving" and a "fixed" image suits the movement of a paper in a printing process. However, it is still very common to read "moving image" and "fixed image" in computer image registration papers, despite the lack of a moving part, since IR is done digitally by computer algorithms only.
The sense of registration in the printing industry can be seen as a narrow case of an expanding IR arena. IR includes aligning identical images, aligning images of non-rigid objects, and aligning images of different objects and/or different dimensions (such as aligning a 2D X-ray image with a 3D MRI image).
In the previous paragraphs, a potential connection was drawn between the concept of IR and registration in the printing industry. Another potential, but less obvious, connection is between IR and registration in the music industry. Registration is known to organists (musicians who play the organ) as the selection of organ stops. An organ stop is a part of an organ that controls the flow of air to certain pipes; hence, a musician basically combines stops to generate sounds. What is common between organ registration and IR is that both are processes of finding a configuration for an intended outcome. That outcome is a melody in the case of organ registration and an aligned image in the case of IR. Can organ registration be considered a type of IR? Answering such questions entails a full technical definition of image registration (see the next section).

Image registration definition
Humans align objects mentally before deciding whether two rotated objects are similar or not, according to cognitive psychology (Cooper, 1975). Likewise, it is easier for medical practitioners to compare aligned medical images. To demonstrate this, a reader can compare the left side and the right side of Figure 2.

IR definitions in the literature
Image registration definitions in the literature of medical image registration can be categorized into three main definitions:
Definition 1: Finding a transformation between two images that are related to each other such that the images are of the same object or similar objects, the same region, or similar regions (Stewart et al., 2004).
Definition 2: The process of overlaying two or more images of the same scene taken at different times, from different viewpoints, and/or by different sensors (Zitova et al., 2003).
Definition 3: The process of transforming different images into one coordinate system with matched content (Chen, X., Wang, et al., 2022).
The first definition limited registration to two images. The second definition extended the number of images beyond two, but it limited registration to the "same scene," while the first definition included "similar scenes." Hence, the alignment of an image of a human brain with another of a monkey's brain is covered by the first definition, but not clearly by the second. For medical images, the alignment between images of different patients is known as inter-patient registration; hence, inter-patient registration is not covered by the second definition. The first two definitions limited registration to "same" or "similar" scenes, but no clear criterion was provided to distinguish what can be considered similar and what cannot. Similarity in physics entails the preservation of at least one quantity, such that the quantity measured in case 1 (the first image) equals that in case 2 (the second image). Given this fundamental definition of similarity, we argue that the concept of "similarity" should not be part of the definition unless there really is a determined preserved quantity. Some may argue that the preserved quantity is the loss function used for optimization. Although that optimization function guides an approximate solution to a registration problem, the function is not grounded (the optimization function itself can be adapted). Moreover, that optimization function or similarity measure is not preserved for the images being registered. Hence, we argue that the concept of similarity should not be part of a definition of IR, but it is part of an engineered IR solution.

Figure 2. Examples of mental alignment (left: before registration; right: after registration; panels: "Are they the same?" — rotational alignment; "Same size?" — rotational and translational alignment)
The proposed definition of image registration, which will be introduced later in this section, eliminates the concept of similarity; instead, it considers that any images with a correspondence relation can be registered. For example, the registration of an image of an orange with another of an apple is possible, and likewise the registration of a sound with a video.
The third definition introduced a "coordinate system," but does image registration entail a coordinate system? If yes, is it one coordinate system or more? The answer will be no if a single example of a registration process can be done without a coordinate system (falsifiability). Let's assume a mapping between two rectangles is expressed by colors, where corresponding points share the same color; such a mapping registers the rectangles without any coordinate system. The need for a coordinate system in image registration is similar to the need for an absolute reference frame in Newtonian physics. In Newtonian physics, measurements of moving objects are recorded with respect to an absolute reference point that is chosen arbitrarily and called the origin of a coordinate system. However, that coordinate system is just a tool; it is not an essential part of the world. In his theory of relativity, Einstein suggested a relative frame of reference where the movement of an object is measured with reference to any point. An absolute coordinate system is a useful tool that makes solving the IR problem easier, especially for defining locations, but that tool is not essential since we can define an object without it using a graph or a set. Our argument here is not to propose the elimination of coordinate systems (yet), but to propose a more accurate concept than the coordinate system, which is the concept of "space." The alignment of images happens in a "space." That space may or may not have a coordinate system associated with it. The presence or absence of a coordinate system does not affect the components of a space. Some spaces do not even accept a coordinate system, such as topological spaces and spaces of infinite dimensions. Replacing the concept of the "coordinate system" with the concept of the "space" not only makes the definition more accurate, but also suggests solving the registration problem in mathematical spaces beyond coordinate systems.
Table 2 shows a list of image registration definitions extracted from review papers on MIR along with the constraints that each definition imposed. The differences between IR definitions concern the following constraints: C1, registration is limited to two images; C2, registration is limited to the "same scene"; C3, a coordinate system is essential. Overall, the definition above is distinguished from the image registration definitions mentioned earlier in the following points:
1- Registration is not limited to two images; instead, multiple images can be registered in a space called a correspondence space.
2- The possibility of registration is not limited to "similar" images; instead, registration is between any images with a correspondence relation (as in mathematics). The correspondence relation can be explicit, as in the iterative closest point method, or implicit, as in deep learning approaches. Both approaches are explained in section 7.3, titled MIR algorithms.
3- A coordinate system is not part of the definition; instead, the concept of a space (as in mathematics) is used for a more accurate and generic definition.
Although constraints C1, C2, or C3 were common in the surveyed MIR papers between 2021 and 2022, some IR definitions that do not impose any of those constraints were found in earlier publications. (Modersitzki, 2003) defined IR as finding an optimal geometric transformation between corresponding images, where the notions "optimal" and "corresponding" were considered dependent on the application. Unfortunately, only the first chapter of (Modersitzki, 2003)'s book was found open access. In (Brown, 1992), two definitions were provided. While the first definition, in the abstract, "to match two or more pictures taken, for example, at different times, from different sensors, or different viewpoints," does not explicitly impose any of the constraints C1-C3, the second definition, in a subsection titled "definition," imposed C1: "a mapping between two images both spatially and with respect to intensity." (Fitzpatrick et al., 2000) defined IR as "the determination of a geometrical transformation that aligns points in one view of an object with corresponding points in another view of that object or another object," which seems to comply with C1, but rejects C2, and does not impose C3.

Introductory example
Readers who have some knowledge of image processing or modern algebra are advised to skip this section, which targets novice readers. This section demonstrates an image transformation pipeline using a simple example (see Figure 3).

Image deformation using a displacement field:
A displacement field ∆X represents domain relocation distances (see Equation 2), such that a 2D displacement value of <1,-1> moves a pixel 1 unit on the horizontal axis and -1 unit on the vertical axis.
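As a concrete sketch of this step, a displacement field can be applied by adding each displacement vector to its pixel coordinate. The 2x2 grid and the uniform <1, -1> field below are illustrative assumptions, not the values of Equation 2 or Figure 3:

```python
import numpy as np

# Hedged sketch: applying a displacement field dX to the domain (pixel
# coordinates) of a tiny 2D image. The grid and the uniform <1, -1>
# displacement are illustrative assumptions.

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])          # domain: (x, y) pixel coordinates
dX = np.array([[1, -1], [1, -1], [1, -1], [1, -1]])     # displacement field: one vector per pixel

# Each pixel moves 1 unit on the horizontal axis and -1 unit on the vertical axis.
X_new = X + dX
```

Note that only the domain (the coordinates) changes here; the co-domain (the pixel values) is untouched.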

post-processing
A domain transformation may relocate pixels to locations that violate space constraints. A constraint that is commonly violated after a domain transformation of a digital graphical image is that domain values should be uniformly distributed integers. For example, a domain value of (0.3, 0.17) violates the mentioned constraint since 0.3 and 0.17 are not integers.
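A minimal sketch of this post-processing step, assuming nearest-neighbor resampling (the non-integer coordinates below are illustrative):

```python
import numpy as np

# Hedged sketch of post-processing: a domain transformation may produce
# non-integer coordinates such as (0.3, 0.17); nearest-neighbor resampling
# rounds them back onto the integer pixel grid. Coordinates are illustrative.

transformed = np.array([[0.3, 0.17], [1.6, 0.9], [0.49, 1.51]])
resampled = np.rint(transformed).astype(int)  # snap each coordinate to the nearest grid point
```

More elaborate schemes (bilinear, spline interpolation) adjust co-domain values instead of coordinates, but the idea of restoring the integer grid is the same.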

IR constraints
Constraints in machine learning (ML) are assumptions that limit the search space for a solution to a problem. Such assumptions are vital for generalization beyond the known examples. When the true solution is unknown, all solutions that ML models provide become equal if no assumptions are made (Mitchell, 1980). ML constraints were reviewed in Goyal, A., & Bengio, Y. (2022) and considered essential to the development of a higher level of machine intelligence, the next generation of deep learning. An example of an ML constraint is the rotation equivariance assumption, which implies that a rotation of the input entails a similar rotation of the output. Geometric deep learning (GDL) was introduced in Bronstein et al. (2021) as a framework that studies the incorporation of constraints into neural network architectures based on unified geometric principles (e.g., symmetry). An example of the incorporation of priors through wavelets was done by Oyallon & Mallat (2015).
Three common IR constraints were shown to be invalid in Section 2. This section introduces IR constraints that were common among IR methods.

1-The proximity constraint
The proximity constraint assumes a space k in which correspondent elements are in proximity, as in Equation 3.
A special case of the proximity constraint is when Prox() is the distance between points in a Euclidean space.Equation 3 in this special case becomes as shown in Equation 4. The special proximity constraint assumes that, for images with a correspondence relation between them, there is a space K in which the correspondent points will be located at the same location or very close to each other.
A limitation of the mathematical expression in Equation 4 is that 1) it measures the proximity as a distance in a correspondence space, and 2) it uses point-based proximity.
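A small numeric sketch of the special case, under the assumed form of Equation 4 (corresponding points mapped into a Euclidean correspondence space k should nearly coincide); the point sets and tolerance are illustrative assumptions:

```python
import numpy as np

# Hedged sketch of the special proximity constraint: corresponding points,
# once mapped into a common Euclidean space k, should be at (nearly) the
# same location. The point sets and the 0.1 tolerance are assumptions.

pts_i_in_k = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])     # points of image i mapped to space k
pts_j_in_k = np.array([[0.05, 0.0], [1.0, 0.02], [0.0, 0.98]])  # their correspondents from image j

prox = np.linalg.norm(pts_i_in_k - pts_j_in_k, axis=1)  # point-wise Euclidean proximity
constraint_holds = bool(np.all(prox < 0.1))             # proximity below a chosen tolerance
```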

2-Co-domain preservation constraint
The second constraint assumes a transformation T^i-k that transforms the domain but not the co-domain, as in Equation 5.
Image registration methods, to the best of the authors' knowledge so far, include a spatial transformation that affects the domain only. In other words, T^i-k relocates pixels/voxels without changing their colors/values. In general, deep learning architectures used for MIR impose the second constraint explicitly. A deep learning model takes images as input and produces a deformation field. Then the deformation field adjusts the domain of an image using a spatial transformation operation to yield a registered version of the image. Co-domain values might be fine-tuned in a post-processing step (e.g., resampling) to fit the discretized representation of digital images in a Euclidean space, but that post-processing does not affect the co-domain preservation constraint.

Figure 8. Examples of image interpolation
Is there an IR method that violates the second constraint? A registration method that does not impose a spatial transformation step could be a potential falsifier. In theory, neural networks (NNs) can approximate any function according to the universal approximation theorem. Hence, an end-to-end neural network can, in theory, register images without an imposed spatial transformation step, but no such model exists yet to the best of the authors' knowledge. Hence, the second constraint holds until an end-to-end neural network model, without an explicit or implicit spatial transformation unit, is developed in practice.
Another toy falsifier is that the registration in Figure 5

Related survey papers
Table 3 compares this work to related survey papers that appeared in the search query in section 5. Two highly cited review papers (Zitova et al., 2003; Haskins et al., 2020) were added to the table although they were published before 2021.
It could be in the interest of novice readers to read about the history of IR and its etymology (see section 1), in addition to a simple numeric example that demonstrates the basics of IR (see section 3), since no review paper was found that addresses these parts, to the best of the authors' knowledge. Advanced readers could be interested in the novel constraint-based analyses of IR introduced in the previous sections. Different from the other survey papers shown in Table 3, which were mainly descriptive with no or just a few equations, this survey introduces a symbolic framework of the IR components (see the nomenclature) that has been used to express tens of equations.
Zitova et al. (2003) structured their paper based on the classical IR pipeline, starting with feature detection, followed by feature matching, mapping function, image transformation, and resampling.
Haskins et al. (2020) tracked the development of MIR algorithms, covering 1) deep iterative methods that are based on similarity estimation, 2) supervised transformation estimation, which entails ground truth labels that are not easily affordable, and 3) unsupervised transformation estimation methods, which overcome the challenge of ground truth labels. Finally, 4) weakly supervised approaches were discussed.
Chen, X. et al. (2021) first provided a framework for image registration, then explained the basic units of DL and reviewed DL methods such as deep similarity, supervised, unsupervised, weakly supervised, and RL. The authors discussed the challenges of MIR: 1) different preprocessing steps lead to different results, 2) few studies quantify the uncertainty of predicted registration, and 3) limited (small-scale) data. Finally, possible research directions were highlighted: 1) hybrid models (classical methods and deep learning), and 2) boosting MIR performance with priors.
Another survey (2022) reviewed the evaluation metrics of unsupervised MIR in a sample of 55 papers. The statistics showed that: 1) the majority of papers were handling unimodal registration (82%), 2) a private dataset was more likely to be used than a publicly available dataset, 3) most papers worked with MR images (61%), and 4) the most researched ROI was the brain at 44%, then the heart at 15%.
A further review covered supervised, unsupervised, and semi-supervised approaches, then addressed ideas of DL that were shown to improve the outcomes: attention, involvement of domain knowledge, and uncertainty estimation. It then briefly reviewed classification, detection, segmentation, and registration. Finally, the paper highlighted ideas for future improvement, including a fully end-to-end deep learning model for MIR and the incorporation of domain knowledge. The authors also highlighted important points for large-scale applications of deep learning in clinical settings, such as having large datasets publicly available as well as reproducible code, and the need for more clinical-based evaluation that involves domain experts from the medical field rather than limiting the evaluation to theoretical evaluation metrics.

Taxonomies
A registration algorithm consists of a set of assumptions (prior knowledge) and a margin of uncertainty (the unknown part), which is expressed using variables (e.g., model parameters). For example, if a programmer knows exactly how to register any images, in a way similar to having a formula that finds the roots of any quadratic equation, then s/he will just embed that prior knowledge (the formula) in the code. However, there is no such generic formula yet for most IR cases. Accordingly, variables are introduced and adjusted using an optimization method.

Deformation types
Transformation functions in MIR can be categorized based on their deformability into rigid, affine, and deformable transformations as shown in Figure 9.
In physics, the shape and size of a rigid body do not change under force. When you push a small solid steel bar, the location and/or the orientation of the bar may change, but the bar itself remains the same (e.g., the same mass, shape, and size). Likewise, a rigid transformation preserves the distances between every pair of points. Accordingly, rotations and translations are rigid transformations, or proper rigid transformations in distinction to reflections, which are called improper rigid transformations as they do not preserve handedness.
A rigid transformation T^ij preserves the distances between any two points on the object of interest, such that the constraint ||x^∅i_k - x^∅i_l|| = ||x^ij_k - x^ij_l|| holds for every pair of points k, l ∈ the set Mp. A rigid transformation can be expressed as in Equation 6.
Where ṽ is the newly transformed vector after the application of a rigid transformation to a vector v, which could be the position of a point in Euclidean space; b is a translation vector, and A is an orthogonal transformation (see the appendix for a definition) that sets the orientation.
A rigid transformation is a subcategory of a bigger group of transformations called affine transformations. Affine transformations preserve parallelism and lines, but impose no constraints on the preservation of distances. Thus, an affine transformation can be expressed in the same form as Equation 6, but with A not required to be orthogonal. The formula of a proper rigid transformation in a 3D space consists of 6 unknown variables: 3 rotation angles (θx, θy, θz) and 3 translations (bx, by, bz), as shown in Equation 8, where the subscripts x, y, z denote 3 perpendicular coordinates.
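A quick numeric check of the rigidity constraint, under the assumed form ṽ = Av + b of Equation 6 (the 2D rotation angle, translation, and points are illustrative):

```python
import numpy as np

# Hedged sketch of a rigid motion v~ = A v + b: A is an orthogonal
# (rotation) matrix, b a translation vector. Values are illustrative.

theta = np.pi / 4
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # orthogonal: A.T @ A = I
b = np.array([2.0, -1.0])                          # translation (bx, by)

pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0]])
moved = pts @ A.T + b                              # apply v~ = A v + b to each point

# Rigidity check: pairwise distances are preserved.
d_before = np.linalg.norm(pts[0] - pts[2])
d_after = np.linalg.norm(moved[0] - moved[2])
```

Replacing A with a non-orthogonal invertible matrix gives a general affine transformation, which keeps lines and parallelism but not distances.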
Transformations that do not preserve the rigidity or affinity constraints are called deformable transformations.

Optimization phase
Image registration entails an optimization step in which a model's parameters are adjusted to minimize/maximize an objective function. Optimization can occur, as shown in Figure 10: 1) during the development phase, as in DL approaches; 2) during the running phase, as in iterative methods; or 3) in both, e.g., active learning approaches, or test-time training as it is called in (Zhu et al., 2021). The objective function of MIR is expressed in Equation 9 as a weighted sum of two components: the first quantifies the registration error, which represents the proximity between the predicted registration and the correct one, and the second is a regularization component.

Loss = registration_error + regularization (9)
Optimization methods such as gradient descent, evolutionary algorithms, and search are iterative. Hence, the optimization step adds a time overhead to the phase in which it takes place. Thus, DL approaches take a long training time but a short registration time.
Approaches that run optimization in both phases aim at further improving the registration despite a slight increase in computation time. To reduce the run-time overhead, the bulk of the optimization of the model parameters occurs in the training phase, while only slight fine-tuning occurs during the run phase to customize the results (Zhu et al., 2021).
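As an illustration of Equation 9, a hedged sketch with a mean-squared-error registration term and a finite-difference smoothness regularizer on the displacement field; the regularizer choice and the weight lambda_reg are assumptions, not the paper's exact formulation:

```python
import numpy as np

# Hedged sketch of Loss = registration_error + lambda * regularization.
# MSE as the registration error and a gradient-smoothness penalty as the
# regularizer are common but assumed choices here.

def mir_loss(fixed, warped_moving, disp_field, lambda_reg=0.01):
    # Registration error: distance between the warped moving image and the fixed image.
    registration_error = np.mean((fixed - warped_moving) ** 2)
    # Regularization: penalize non-smooth displacement fields via spatial differences.
    gx = np.diff(disp_field, axis=0)
    gy = np.diff(disp_field, axis=1)
    regularization = np.mean(gx ** 2) + np.mean(gy ** 2)
    return registration_error + lambda_reg * regularization

rng = np.random.default_rng(0)
fixed = rng.random((8, 8))
# Identical images and a zero displacement field give zero loss.
loss_perfect = mir_loss(fixed, fixed, np.zeros((8, 8, 2)))
```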

MIR algorithms
This section discusses selected registration algorithms, mainly the algorithms that were used as baselines against which the performance of a new algorithm is compared. A taxonomy of MIR algorithms is shown in Figure 11.
Figure 11. MIR methods taxonomy

The diagram of directly supervised image registration approaches is shown in Figure 12. Initially, input images are fed to neural networks, which produce a registration field. The registration field is applied to the moving image to relocate its pixels in a process called spatial transformation, represented as a yellow circle in the figures below. An example of supervised registration can be seen in (Lee et al., 2022).

Deep learning approaches
The main question is how neural networks learn to estimate the registration field. In the directly supervised approach, a ground truth label is provided during the training phase. The ground truth label could be the registration field, as shown in Figure 12 (left), or the warped image, as shown in Figure 12 (right). A challenge of directly supervised MIR approaches is their need for ground truth labels, which entails medical experts annotating a large number of images. To overcome the need for ground truth labels, unsupervised MIR has been proposed.

b. Unsupervised deep learning approach: Voxelmorph
Unsupervised MIR approaches do not entail an external supervision signal. Instead, the fixed image (input) is assumed to replace the ground truth label of the registered image, ⟨X^ij′, Y^ij′⟩ ≈ ⟨X^∅j, Y^∅j⟩, as in Voxelmorph (Balakrishnan et al., 2019). This assumption is useful when the fixed image and the moving image have similar modalities/co-domains. However, the assumption may not work well if the fixed image and the registered image are of different modalities (e.g., one is 3D MRI, and the other is 2D X-ray) unless a way is developed to bridge the gap between the two modalities. This has been reported in the results shown in Synthmorph (Hoffmann et al., 2021). Even for images of the same modality, co-domain dissimilarities can be a problem with this approach. For example, if the contrast of the fixed image is different from that of the moving image, then the mean squared error MSE(Y^∅j, Y^ij) may not represent the error adequately. However, a loss function like cross-correlation (CC) is more resilient against the contrast problem than MSE due to its scale invariance property: CC(Y1, Y2) = CC(Y1, α×Y2), where α is a scale number.
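The scale-invariance property CC(Y1, Y2) = CC(Y1, α×Y2) can be checked numerically; the sketch below uses a normalized cross-correlation and random images as illustrative assumptions:

```python
import numpy as np

# Hedged sketch: normalized cross-correlation is unchanged when one image
# is rescaled by alpha > 0, while MSE is not. Images are random/illustrative.

def ncc(y1, y2):
    # Normalized cross-correlation of two flattened, mean-centered images.
    a = y1.ravel() - y1.mean()
    b = y2.ravel() - y2.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(1)
y1 = rng.random((16, 16))
y2 = rng.random((16, 16))
alpha = 3.0                                   # a contrast/scale change

cc_same = ncc(y1, y2)
cc_scaled = ncc(y1, alpha * y2)               # unchanged by the scaling
mse_same = np.mean((y1 - y2) ** 2)
mse_scaled = np.mean((y1 - alpha * y2) ** 2)  # changes with the scaling
```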
MIR using Voxelmorph yielded results much faster than non-deep-learning MIR methods without degradation of the registration quality. Voxelmorph cut the registration runtime to minutes/seconds, compared to the hours needed by the non-deep-learning methods used before it. Voxelmorph superseded non-deep-learning methods when segmentation labels were added to the registration. Synthmorph generated images in two steps: first, segmentation labels were generated randomly; then, fixed and moving images were generated given the segmentation labels. The results yielded by Synthmorph were superior to classical methods even when the images were of different modalities.

Non-deep learning methods
MIR methods that do not involve deep neural networks are called "non-deep learning methods," "classical methods," or "iterative methods."

a. Iterative Closest Point (ICP)
ICP (Arun, 1987; Estépar, 2004; Bouaziz, 2013) alternates between two goals: the establishment of a correspondence C^ij, and finding a transformation T^ij that optimizes a loss function. A loss function quantifies the quality of a registration (see section 8). A demonstration of the ICP process is shown in Figure 13. Let the moving image be a blue line of 4 marked points, and the fixed image a similar black line. The loss function can be a point-wise Euclidean distance. First, 1) a correspondence is established between the points on each line such that each point is matched with its closest neighboring point. Notice that the correspondence is not 1-to-1, as the two bottom black points are matched with the same point, and the top blue point is not matched; 2) the blue line is translated to minimize the distance between the two lines; 3) another correspondence is found (a 1-to-1 correspondence this time); and 4) the black line is transformed (rotation and translation) based on the new correspondence.
ICP, like other iterative approaches, takes a longer registration time than DL approaches. The establishment of a correspondence between nearest neighbors is straightforward, but it is not always optimal and can get stuck in local optima.
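The alternation described above can be sketched minimally; the version below is translation-only to stay short (a full ICP would also estimate a rotation, e.g., via SVD/the Kabsch algorithm), and the point sets are illustrative assumptions:

```python
import numpy as np

# Hedged, translation-only ICP sketch: alternate between nearest-neighbor
# correspondence and the optimal translation for that correspondence.

def icp_translation(moving, fixed, n_iters=10):
    moved = moving.copy()
    for _ in range(n_iters):
        # 1) Correspondence: match each moving point to its closest fixed point.
        d = np.linalg.norm(moved[:, None, :] - fixed[None, :, :], axis=2)
        nearest = fixed[np.argmin(d, axis=1)]
        # 2) Transformation: the best translation is the mean residual.
        moved = moved + (nearest - moved).mean(axis=0)
    return moved

fixed = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
moving = fixed + np.array([0.5, -2.0])       # a translated copy of the fixed line
registered = icp_translation(moving, fixed)
```

Note how the nearest-neighbor correspondence need not be 1-to-1 in early iterations, exactly as in the Figure 13 walkthrough.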
Figure 14. A demonstration of ICP registration (alternating correspondence and transformation steps)

b. Demons
A deformable IR approach was proposed by Thirion (1996). The name of the Demons approach was influenced by Maxwell's demons paradox in thermodynamics. Maxwell assumed a membrane that allows particles of type A to pass in one direction, while particles of type B can pass in the opposite direction, which would end up having all particles of type A on one side of the membrane and all particles of type B on the other side, as shown in Figure 15. That state of organized particles corresponds to a decrease in entropy, which contradicts the second law of thermodynamics. The solution to the paradox is that the demons generate entropy as they organize the particles, resulting in a greater total entropy than before the separation of the particles.

Figure 15. Maxwell's membrane with demons
Influenced by Maxwell's demons, Thirion suggested distributing particles (demons) on the boundaries of an object (see Figure 16), such that a demon pushes locally either inside or outside the object based on the prediction of a binary classifier. It has been shown that what Thirion's demons do is object matching using optical flow.
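The optical-flow flavor of the method can be sketched with the commonly cited demons force, v = (m − f)·∇f / (‖∇f‖² + (m − f)²); the 1D signals below are illustrative assumptions, not Thirion's experiments:

```python
import numpy as np

# Hedged sketch of a demons-style update force in 1D:
#   v = (m - f) * grad(f) / (|grad(f)|^2 + (m - f)^2)
# where f is the fixed intensity profile and m the moving one.
# The sinusoidal profiles are illustrative.

f = np.sin(np.linspace(0, np.pi, 50))           # "fixed" intensity profile
m = np.sin(np.linspace(0, np.pi, 50) + 0.1)     # slightly shifted "moving" profile

grad_f = np.gradient(f)
diff = m - f
denom = grad_f ** 2 + diff ** 2
denom_safe = np.where(denom > 1e-12, denom, 1.0)   # avoid division by zero
v = np.where(denom > 1e-12, diff * grad_f / denom_safe, 0.0)
```

In the full algorithm this force field is smoothed (e.g., with a Gaussian) and applied iteratively to deform the moving image.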

c. Symmetric Image normalization (SyN)
The main idea of SyN is to assume a symmetric and invertible transformation. Instead of transforming space i to space j directly, SyN symmetrically transforms both space i and space j to an intermediate space k, such that T^ik = (T^jk)^-1. In this case, T^ik can be seen as half a step forward towards space j, and T^jk as half a step backward towards i (see Figure 17). The symmetric invertibility constraint of SyN can be expressed as in Equation 10.

ANTs on GitHub: https://github.com/ANTsX/ANTs

Chen, T. et al. (2002) compared three registration tools: SPM12, FSL, and AFNI. SPM12 was recommended for novice users in the area of medical image analysis; it provided stable outcome images of "maximum contrast information" needed for tumor diagnosis. AFNI was recommended for advanced users and researchers due to the advanced capabilities needed for tasks such as volume estimation. FSL was considered suitable for mid-level users.

Correspondence space
MIR alignment occurs in a correspondence space k. The correspondence space can be the space in which an input image is located (internal), or it can be a new space (external). MIR in an internal correspondence space has been the most common among MIR methods. Examples of MIR in an internal space can be seen in the methods mentioned earlier, which included a transformation from the space of the moving image (i) to the space of the fixed image (j). An example of MIR in an external space is atlas-based registration (Wang, Z. et al., 2022).

Atlas-based registration
An atlas is a standard or reference image that represents a population of images. One way to form an atlas of a brain is to compute the average image of a population of brain images, which is expected to be smooth and symmetrical. However, that is not the only way: Dey et al. (2021) suggested an atlas generated by GANs. Another way to form an atlas is by IR in an external correspondence space. An example of atlas-based registration is the Aladdin framework (Ding, Z. et al., 2022) shown in Figure 18. Aladdin transformations are bidirectional and invertible.
• Invertibility: for a transformation T ik, there is an inverse transformation (T ik)^-1.
• Bidirectionality: a bidirectional registration maps spaces in both directions, from i to k and vice versa (i ↔ k: T ik and T ki). Accordingly, a bidirectional IR model (Ding, W. et al., 2022; Andreadis et al., 2022; Ye et al., 2021) can yield two wrapped images, X ik and X ki. A unidirectional registration, on the other hand, maps a single space i into another space j but not vice versa.
An example of an invertible bidirectional MIR model in an internal correspondence space, namely InverseNet (Nazib et al., 2022), is shown in Figure 19. The spatial transformation unit imposes isomorphism, since the registration field maps each pixel from one location to exactly one other location, which is a 1:1 correspondence. However, the resampling step can affect the 1:1 correspondence relation, for example when two nearby points are merged in the target image, which makes metamorphism possible but not guaranteed. Diffeomorphism can be achieved by an integration layer ∫ before the spatial transformation.
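To see how resampling can break the 1:1 correspondence even when the registration field itself is injective, consider a 1-D sketch with a hypothetical field (values chosen for illustration):

```python
import numpy as np

# A 1-D "registration field": each source pixel is mapped to a real-valued
# target coordinate. The field itself is injective (1:1) by construction.
field = np.array([0.0, 1.6, 2.4, 3.0])   # hypothetical field values
assert len(np.unique(field)) == len(field)

# Nearest-neighbour resampling onto the integer grid merges two sources:
targets = np.round(field).astype(int)
print(targets.tolist())   # [0, 2, 2, 3] -- pixels 1 and 2 collide at index 2
```

After rounding, two distinct source pixels land on the same target index, so the pixel-level correspondence is no longer 1:1.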
Metamorphosis (Maillard et al., 2022) is a deep learning model that addresses metamorphic registration. Metamorphosis estimates the wrapped image without an explicit spatial transformation unit. Instead, alternative constraints were added as two equations embedded in the network as layers; however, there is no information on whether a spatial transformation holds implicitly. Metamorphosis outperformed diffeomorphic registration methods, especially when the ground-truth correspondence was metamorphic. However, its runtime was 10-20 times that of Voxelmorph. The runtime is defined in the evaluation measures section.

Multistage image registration
Figure 23. Taxonomy of image registration stages

Instead of solving the registration problem for high-resolution images entirely in one high-dimensional space, the problem can be divided into multiple registration problems at various scales. Figure 23 shows a taxonomy of multistage image registration. Multistage MIR approaches save computational resources and time, in addition to enhancing registration results.

Coarse-fine registration:
A coarse-fine registration (Himthani et al., 2022; Naik et al., 2022; Saadat et al., 2022; Van Houtte et al., 2022) consists of two stages. The first stage, called coarse registration, aims at finding a fast but suboptimal registration solution. That solution is fine-tuned in the second stage. For example, the coarse registration could be an affine registration that aligns position and orientation, while the fine registration could be a deformable registration method that aligns deformed parts.
The parameters of a rigid transformation of a high-resolution image can be found using a downscaled version of the image, which saves computation time and energy. The parameters of a rigid transformation are either independent of the scale (e.g., rotation) or linearly dependent on it (translations). Assume an image of 1000x1000 pixels and its lower-resolution version of 100x100 (downscaling by 10). Scaling does not affect angles; hence, if an object is rotated by 30 degrees in the downscaled image, it will also be rotated by the same angle in the high-resolution image. However, distances between objects do change according to a fixed scale. If the distance between two objects in the low-resolution image is 25 units, then the equivalent distance in the high-resolution image will be 10 × 25 = 250, where 10 is the scaling ratio between the two images. Hence, a rigid registration problem can be solved on a downscaled version of the images, and the solution can then be transferred to the higher-resolution images.
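This transfer rule can be sketched numerically; the points, the 30-degree angle, and the translation values below are hypothetical:

```python
import numpy as np

def rigid(points, theta_deg, tx, ty):
    """Apply a 2-D proper rigid transformation (rotation + translation)."""
    t = np.deg2rad(theta_deg)
    R = np.array([[np.cos(t), -np.sin(t)],
                  [np.sin(t),  np.cos(t)]])
    return points @ R.T + np.array([tx, ty])

scale = 10.0                                   # 1000x1000 vs 100x100 image
low_res_pts = np.array([[5.0, 5.0], [30.0, 5.0]])
high_res_pts = low_res_pts * scale             # same objects at full resolution

# Suppose the low-resolution solve gave: 30 degrees, translation (2, 3).
low_out = rigid(low_res_pts, 30, 2, 3)

# Transfer: the angle is scale-invariant; translations scale linearly.
high_out = rigid(high_res_pts, 30, 2 * scale, 3 * scale)

assert np.allclose(high_out, low_out * scale)  # identical up to the scale factor
```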

Figure 23 distinguishes default (single-stage) registration from coarse-fine registration (e.g., feature-based then intensity-based, or rigid then deformable) and pyramid registration.

Pyramid image registration
A pyramid consists of multi-scale images, where registration occurs at multiple stages. The idea of a pyramid representation has been well studied in classical computer vision (Adelson et al., 1984) and was utilized later in deep learning architectures such as pyramid GANs (Denton et al., 2015; Lai et al., 2017). A pyramid registration (Wang et al., 2022; Chen, J. et al., 2022; Zhang, L. et al., 2021; Zhang, G. et al., 2021) starts with a downscaled version of the moving image, followed by several operations of registration and upscaling, as shown in Figure 24. After every registration step, the proximity between the wrapped image and the downscaled fixed image improves. Multi-stage registration can be seen as a sort of curriculum learning (Bengio et al., 2009; Burduja et al., 2021), such that the first stages learn to solve easier problems and later stages learn the more difficult tasks. In (Wang, C. et al., 2022), both pyramid and coarse-fine registration were used.
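A minimal sketch of the multi-scale loop follows, assuming a stand-in single-stage registration (a centroid-based translation estimator instead of a learned model) and images whose sides are divisible by the pooling factors:

```python
import numpy as np

def downscale(img, factor):
    """Average-pool downscaling (assumes sides divisible by factor)."""
    h, w = img.shape
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def register_stage(fixed, moving):
    """Stand-in for one registration stage: a global translation estimated
    from intensity centroids (a real pyramid would use a learned model)."""
    def centroid(im):
        ys, xs = np.nonzero(im > 0.5)
        return np.array([ys.mean(), xs.mean()])
    return centroid(fixed) - centroid(moving)

def pyramid_register(fixed, moving, levels=3):
    total = np.zeros(2)
    for lvl in reversed(range(levels)):            # coarsest stage first
        shift = np.round(total).astype(int)        # warp by the field so far
        warped = np.roll(np.roll(moving, shift[0], axis=0), shift[1], axis=1)
        f = downscale(fixed, 2 ** lvl) if lvl else fixed
        m = downscale(warped, 2 ** lvl) if lvl else warped
        total += register_stage(f, m) * (2 ** lvl)  # upscale the displacement
    return total

fixed = np.zeros((32, 32)); fixed[8:16, 8:16] = 1.0
moving = np.zeros((32, 32)); moving[12:20, 14:22] = 1.0
print(pyramid_register(fixed, moving))   # ~[-4., -6.]: undoes the (4, 6) shift
```

Each stage refines the displacement estimated at the coarser stage, mirroring the registration-then-upscaling loop of Figure 24.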
Figure 24. Pyramid registration of three stages

Space geometry
A taxonomy of spaces, as proposed in geometric deep learning (GDL), is shown in Figure 25. A space can be Euclidean, like RGB images (pixels distributed regularly in a rectangle). Non-Euclidean spaces are represented as sets, graphs, meshes, or manifolds. Examples of MIR for non-Euclidean data, specifically 3D point clouds, have been presented in (Terpstra et al., 2022; Su et al., 2021).

Feature-based and pixel-based registration
Feature-based registration (Saiti et al., 2022; Santarossa et al., 2022; Wang, H. et al., 2022; Liu et al., 2021) involves explicit feature extraction or selection; thus, the input to the registration algorithm is not the image itself but representative features of that image, such as its histogram (Ban et al., 2022). In pixel-based approaches, images are fed directly to the model without feature extraction. In general, DL registration approaches are pixel-based, as neural networks extract features implicitly. Some works used both features and pixels (Ringel et al., 2022; Yang, Y. et al., 2021).

Medical imaging modalities
Medical imaging modalities are imaging techniques used to visualize the body and its components (Kasban et al., 2015). The main medical imaging modalities in MIR are:

a. X-ray
X-ray imaging uses ionizing radiation (X-rays) to produce two-dimensional images of bones and dense tissues. X-rays are absorbed differently by different tissues, allowing visualization of structures like bones, lungs, and some organs. X-rays are quick and relatively inexpensive, and thus suitable for some diagnostic purposes, such as detecting fractures, lung infections, and dental issues. However, they provide limited detail about soft tissues.

b. Computed Tomography (CT) scan
A CT scan, also known as CAT (Computerized Axial Tomography), is a non-invasive imaging technique that uses X-rays to create detailed cross-sectional images of the body. A CT scan provides a more detailed view of bones, blood vessels, and solid organs compared to traditional X-rays. It is especially useful for imaging areas like the brain, chest, abdomen, and pelvis. However, CT scans involve exposure to ionizing radiation, and repeated scans should be minimized to reduce radiation exposure. During a CT scan, the X-ray source rotates around the patient, and multiple X-ray images are captured from different angles. These images are then processed by a computer to create cross-sectional slices, allowing doctors to visualize the body in detail. CT scans are commonly used in emergencies, trauma cases, and cancer staging, among other applications. MIR of CT images was reported in (Dida et al., 2022; Gao et al., 2022).

c. Magnetic Resonance Imaging (MRI)
MRI uses strong magnetic fields and radio waves to create detailed images of tissues, organs, and the central nervous system. It provides high-resolution, multi-planar images, making it ideal for diagnosing conditions in the brain, spinal cord, muscles, and joints. MRI does not use ionizing radiation, which makes it safer, but it can be more time-consuming and expensive compared to X-rays and CT scans. MIR of MR images was reported in (Li et al., 2022; Meng et al., 2022; Himthani et al., 2022; Kujur et al., 2022; Wu et al., 2022; Ashfaq et al., 2022).

d. Ultrasound (US)
Ultrasound, also known as sonography, uses high-frequency sound waves to create real-time images of internal organs and structures. It is commonly used for imaging the abdomen, pelvis, heart, and the developing fetus during pregnancy. Ultrasound is non-invasive and does not involve ionizing radiation. It provides real-time imaging and is excellent for assessing blood flow and certain soft-tissue abnormalities. However, it may not provide images as detailed as those of MRI and CT.

e. Positron Emission Tomography (PET)
PET is a functional imaging technique that provides information about metabolic activity and cellular function. It involves the injection of a radioactive tracer that emits positrons. The interaction between the tracer and tissues produces gamma rays, which are detected by the PET scanner. PET is valuable in oncology (cancer imaging) and neurology (e.g., detecting Alzheimer's disease). PET can be combined with CT imaging to provide both functional and anatomical information in a single scan.

MIR is considered unimodal when there are no modality differences between the images involved in the registration process; otherwise, the registration is considered multimodal (see Figure 26). An example of unimodal registration is when both the moving and fixed images are X-rays. An example of multimodal registration is when the fixed image is of the T1-weighted MRI modality and the moving image is T2-weighted MRI. T2-weighted MRI enhances the signal of water and suppresses the signal of fatty tissue, while T1-weighted MRI does the opposite. Examples of multimodal registration can be seen in (Van et al., 2022; Begum et al., 2022; Xu et al., 2021).
Figure 26. MIR taxonomy based on the modalities

Evaluation measures
IR evaluation measures can be categorized, as shown in Figure 27, into 1) time-based measures, which focus on the time needed to finish a task; 2) size-based measures, which focus on the memory resources that an MIR algorithm occupies; 3) smoothness measures, which focus on the smoothness of the registration field (expressed by the Jacobian); and 4) proximity-based measures, which find the deviation of a registration outcome from the ground truth. Proximity can be expressed using distances between objects in a space, overlap between sets, or correlations between variables.

In practice, obtaining the registration outcome in a short time is a desired property. The Voxelmorph algorithm, which uses deep learning for medical image registration, has shown a runtime (RT) reduction from hours to seconds while keeping almost the same performance. The computation time of a registration process depends on the software as well as the hardware (Alcaín et al., 2021); thus, a fair comparison of registration algorithms entails testing the computation time on the same hardware. The shorter RT of Voxelmorph compared to iterative approaches can be attributed partially to the hardware, as the matrix multiplications used in DL are faster when run on a GPU. However, even on CPUs, Voxelmorph remains faster than iterative methods, on a scale of minutes for Voxelmorph versus hours for iterative methods. The main reason for the longer RT of iterative approaches is the optimization done during the runtime; Voxelmorph-like approaches do not optimize variables during the run phase, as all variables are optimized in the training phase, before run time.

b) The Jaccard coefficient is similar to DSC, with the slight modification shown in Equation 28.

The smoothness of the registration field
A non-smooth registration field can relocate a pixel far away from all its adjacent pixels after registration; a smooth registration field is more likely to keep nearby pixels relatively close to each other after relocation.
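The two overlap metrics can be sketched on toy binary masks (the masks below are illustrative); the standard formulas are DSC(A, B) = 2|A∩B| / (|A| + |B|) and Jaccard(A, B) = |A∩B| / |A∪B|:

```python
import numpy as np

def dice(a, b):
    """Dice similarity coefficient between binary masks: 2|A∩B| / (|A|+|B|)."""
    inter = np.logical_and(a, b).sum()
    return 2.0 * inter / (a.sum() + b.sum())

def jaccard(a, b):
    """Jaccard coefficient between binary masks: |A∩B| / |A∪B|."""
    inter = np.logical_and(a, b).sum()
    return inter / np.logical_or(a, b).sum()

a = np.zeros((4, 4), bool); a[1:3, 1:3] = True   # 4 pixels
b = np.zeros((4, 4), bool); b[1:3, 2:4] = True   # 4 pixels, 2 overlapping
print(dice(a, b), jaccard(a, b))                 # 0.5 0.3333...
```

The two are monotonically related (Jaccard = DSC / (2 − DSC)), which is why rankings of methods under the two metrics usually agree.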

a. The determinant of the Jacobian (JOCA)

Model size
A model size can be expressed by the number of bytes that a model occupies on a storage device, or by the total number of its parameters.
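A finite-difference sketch of JOCA for a 2-D displacement field follows; the deformation is phi(x) = x + disp(x), so the Jacobian is I + grad(disp), and non-positive determinants indicate folding. The shrinking field below is a hypothetical example:

```python
import numpy as np

def jacobian_det(disp):
    """Determinant of the Jacobian of a 2-D deformation phi(x) = x + disp(x).
    `disp` has shape (2, H, W): per-pixel displacements (dy, dx)."""
    dy_dy, dy_dx = np.gradient(disp[0])
    dx_dy, dx_dx = np.gradient(disp[1])
    return (1 + dy_dy) * (1 + dx_dx) - dy_dx * dx_dy

# hypothetical smooth field: uniform 10% shrink toward the image center
H = W = 16
ys, xs = np.mgrid[0:H, 0:W].astype(float)
disp = np.stack([-0.1 * (ys - H / 2), -0.1 * (xs - W / 2)])

joca = jacobian_det(disp)
folded = int((joca <= 0).sum())          # non-positive determinant = folding
sdlogj = np.log(joca[joca > 0]).std()    # SDlogJ-style smoothness summary
```

For this uniform shrink, the determinant is 0.9 × 0.9 = 0.81 everywhere, so no voxel is folded and the log-Jacobian has zero spread.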

Clinical-based evaluation
Virtual evaluation using computer-based metrics (above) may not always align perfectly with practical evaluation by medical experts. Thus, clinical-based evaluation and the involvement of domain experts from the medical field have been recommended by Chen, X., Wang et al. (2022) to characterize the reliability of MIR tools (Huang et al., 2022).
The challenges of MIR assessment include: 1) the lack of ground-truth labels in practical scenarios, which makes it difficult to evaluate an MIR outcome convincingly; 2) medical experts' assessments can be subjective and may vary among experts; 3) the unstable outcomes of some MIR algorithms, which yield results of different registration quality for the same input image; and 4) the quality of the data can have a substantial impact on registration results, making it challenging to compare algorithms across datasets of varying quality (Chen, T. et al., 2022).

Medical imaging datasets
A list of public datasets used in the literature is summarized in Table 5. The datasets were categorized based on the region of interest (ROI), such as brain, chest, etc., and the medical imaging type.

Medical applications
Changing the frame of reference might mislead humans, as in the phenomenon of not recognizing an object after it has been flipped (e.g., the old/young lady face in Figure 2). Hence, it is easier for medical practitioners to evaluate a medical image in a standard reference frame (e.g., of fixed orientation and scale). Thus, registration is an essential part of medical diagnoses that depend on imaging technologies. IR has been applied in retina imaging (Ho et al., 2021), breast imaging (Ringel et al., 2022; Ying et al., 2022), HIFU treatment of heart arrhythmias (Dahman et al., 2022), and cross-staining alignment (Wang et al., 2022). Selected applications of MIR are discussed below.
Image-guided surgery
Image-guided surgery (IGS) incorporates imaging modalities such as CT and US to assist surgeons during surgical procedures. For example, surgeons can visualize internal anatomy, pinpoint the location of tumors or lesions, and determine optimal incision points. Image-guided surgery enables surgeons to precisely target specific areas and avoid critical structures during a procedure.
Before an IGS, a patient's preoperative images are loaded into a software or surgical navigation system (Wang, D. et al., 2022). The collected images are then aligned with images taken during the surgery (intra-operative) using registration algorithms. Having images with key points/landmarks improves the registration process in terms of speed and precision. The landmarks can be selected manually by medical experts in computer software (Schmidt et al., 2022; Wang, Y. et al., 2022), or they can be fiducial markers, which are small devices placed in a patient's body, such as gold seeds injected to mark a tumor before radiation therapy. The number of landmarks needed for a precise registration can be reduced by integrating semantic segmentation and by using a standard template (atlas) instead of preoperative images, as shown by (Su et al., 2021). An alignment with no landmarks was tested by (Robertson et al., 2022) for catheter placement in non-immobilized patients.
As examples of the use of MIR in IGS (Vijayan et al., 2021; Upendra et al., 2021, February), 2D intra-operative and 3D preoperative images were aligned in real-time surgical navigation systems (Ashfaq et al., 2022). A similar 2D-3D alignment was needed for the deep brain stimulation procedure, which involves the placement of neuro-electrodes into the brain to treat movement disorders such as Parkinson's disease and dystonia (Uneri et al., 2021). A real-time biopsy navigation system was developed by (Dupuy et al., 2021) to align 2D US intra-operative images with 3D TRUS preoperative images and to estimate in real time the biopsy target of a prostate based on its previous trajectory.

Tumor diagnosis and therapy
A tumor is an abnormal mass or growth of cells in the body. Tumors can develop in various tissues or organs and can be either benign or malignant. Benign tumors are non-cancerous and typically do not invade nearby tissues or spread to other parts of the body; they are generally not life-threatening, but medical attention and/or treatment may still be required. Malignant tumors, on the other hand, are cancerous. They have the potential to invade surrounding tissues and can spread to other parts of the body through the bloodstream or lymphatic system. Malignant tumors grow rapidly and can be life-threatening. Medical experts often diagnose a tumor and plan therapy depending on the tumor's growth over time, as recorded in aligned medical images. Accordingly, MIR has been used for radiotherapy (Fu et al., 2022; Vargas-Bedoya et al., 2022) and proton therapy (Hirotaki et al., 2022).

Motion processing
The human body experiences normal deformation over time. Some deformations occur at a slower pace, such as the growth of bones over a lifetime (e.g., human height grows from a few feet in newborns to several feet in adults), while others occur at a faster pace, such as heartbeats. The heart experiences alternating contractions and relaxations while pumping blood at a frequency of 1-3 beats per second. MIR helps to analyze such temporospatial deformations and the resulting movements. 2D-3D motion registration of bones, particularly foot and ankle structures, was addressed in (Djurabekova et al., 2022) by manipulating segmented bones from static scans and matching digitally reconstructed radiographs to X-ray projections.

Other research directions

Transformers
Transformers are a DL architecture that uses the attention mechanism solely, dispensing with convolutional and recurrent units (Vaswani et al., 2017). Transformers have contributed to noticeable improvements in computer vision, audio processing, and language processing tasks (Lin et al., 2022). The improvement can be seen in products like GPT-2 and ChatGPT, which are examples of Generative Pre-trained Transformers (GPT).
Transformers can be decomposed into basic mathematical components that distinguish them from recurrent and convolutional networks: 1) the position encoding, which explicitly feeds the position of a token as an input; 2) the product operation between features, which is manifested explicitly in the product between the key and the query of the attention mechanism, and implicitly within the exponential function of the SoftMax (e^(a+b) = e^a × e^b); and 3) the exponential function, which represents a transformation into another space.
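The key-query product and the SoftMax exponential can be seen in a minimal scaled dot-product attention sketch (single head, no learned projections; the token count and dimension are arbitrary):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))   # the SoftMax exponential
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    """Single-head scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])   # explicit key-query product
    return softmax(scores) @ V

rng = np.random.default_rng(0)
Q, K, V = rng.standard_normal((3, 4, 8))   # 4 tokens of dimension 8
out = attention(Q, K, V)
assert out.shape == (4, 8)
```

In a full transformer, Q, K, and V are learned linear projections of the (position-encoded) input, and several such heads run in parallel.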
In MIR, (Mok et al., 2022) proposed the use of the attention mechanism for affine MIR, such that multi-head attention was used in the encoder and convolutional units in the decoder. Transformers were embedded partially for deformable MIR in Transmorph (Chen, J. et al., 2022). Transmorph is a coarse-fine IR in which affine alignment is conducted in the first stage, followed by deformable alignment in the second stage. The latter stage is a Voxelmorph-like registration with a U-Net architecture, except that the encoder consists of transformers instead of ConvNets. Transmorph introduced transformers (self-attention blocks) as part of the encoder only, not the decoder. Ma et al. (2022) attributed the difficulty of developing transformers for MIR to the large number of trainable parameters of a transformer unit compared to convolutional units. To reduce the number of parameters, the authors proposed the use of both convolutional units and transformer units in an MIR model, SymTrans (Ma et al., 2022). SymTrans embedded transformers in both the encoder and the decoder (two blocks in the encoder and two in the decoder).
The utilization of transformers in MIR was not as fast and revolutionary as in other domains. That could be attributed to the relatively small number of images in MIR datasets compared to other tasks. For example, millions of images were used for the ViLT model (Kim et al., 2021), and up to 0.8 billion images for the GiT model (Wang, J. et al., 2022).

No Registration
Another potential research direction is the elimination of the image registration step from the medical image analysis pipeline. In theory, an end-to-end deep learning model learns an automatic medical image analysis task (e.g., disease detection) without an explicit registration step. In (Chen, X., Zhang, et al., 2022), the authors proposed eliminating the registration step entirely by developing a breast cancer prediction model using vision transformers and multi-view images.

Other research directions explored before include Fourier-transform-based IR (Zitova et al., 2003), reinforcement-learning-based IR (Chen, X. et al., 2021; George et al., 2021; Sutton et al., 1994), and GAN-based MIR (Xiao et al., 2021; Chaudhary et al., 2022; Dey et al., 2021; Goodfellow et al., 2020). There could be further research interest in these MIR research directions in the future.

Figure 3. A transformation of an image from space i to space j consists of a domain deformation of a moving image followed by resampling.

Figure 4. The replacement of the domain of the moving image X ∅i by the domain of the wrapped image before any post-processing, X i−j, yields a wrapped image. Figure 7 demonstrates how X i−j was obtained by the addition of ∆X to X ∅i.

Figure 7 .
Figure 7.The displacement field transforms the domain of the moving image.

Figure 6 .
Figure 6.A displacement field estimated by an algorithm is used to register a moving image.
can be done by applying the following codomain transformation {a: b, b: c, c: d, d: a}, which yields the same outcome shown in Figure 5C. That transformation alters the codomain and preserves the domain. IR is possible outside the codomain-preservation constraint. One may wonder what makes the constraint hold.
above used earlier for rigid transformation, except that A is a linear transformation/matrix with no orthogonality constraint. In an affine registration, the transformation T ij imposes the constraint T ij (x k ∅i − x l ∅i ) = T ij (x k ∅i ) − T ij (x l ∅i ) = x k ij − x l ij for every pair of points k, l ∈ the set Mp. Scaling and shear mapping are examples of an affine, but not rigid, transformation. The formula of a 2D proper rigid transformation (rotation and translation) is shown in Equation 7. The variables are the rotation angle θ, the translation on the x-axis b x, and the translation on the y-axis b y:

X̃ = [ cos(θ)  −sin(θ) ; sin(θ)  cos(θ) ] X + [ b x ; b y ]   (Equation 7)

Figure 19 .
Figure 19.An example of a MIR model that estimates a transformation field and its inverse (InverseNet)

Figure 20 .
Figure 20.diagrams of unidirectional and bidirectional IR in internal and external correspondence spaces

Figure 25 .
Figure 25. Space geometry taxonomy

Time
a) Average registration runtime (RT): the runtime is the average registration time per image. The registration time is measured from the moment t p,1 at which an image p is loaded until the registered image is obtained at time t p,2, including the post-processing time. See Equation 18, where N is the number of examples in a dataset:

RT = (1/N) Σ p=1..N (t p,2 − t p,1)   (Equation 18)
Figure 27. Evaluation Metrics

b) dist(a, b) is the distance between point a in the first set and point b in the second set. inf b∈B (dist(a, b)) is the infimum distance between point a and all the points in set B. The HD95 metric replaces the supremum in the equation with the 95th percentile, which results in less sensitivity to outliers.
e) Domain distance: the center-of-mass (COM) distance measures the displacement between the centers of two sets A and B, as shown in Equations 25 and 26: COM(A, B) = dist(Center(A), Center(B)).
The Dice similarity coefficient (DSC) measures the overlap between two segmentation sets. DSC is similar to the F1 score used in classification problems: the segmentation problem is a classification problem at the pixel level, in which a pixel/point is assigned a segmentation label that can be true or false, and F1 = 2TP/(FP + FN + 2TP).
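The distance-based metrics can be sketched on toy point sets (the sets below are illustrative); HD95 takes the 95th percentile of the directed infimum distances in both directions, and COM compares the set centers:

```python
import numpy as np

def directed_distances(A, B):
    """For each point a in A, the infimum distance to set B."""
    diffs = A[:, None, :] - B[None, :, :]
    return np.sqrt((diffs ** 2).sum(-1)).min(axis=1)

def hd95(A, B):
    """Symmetric 95th-percentile Hausdorff distance between two point sets."""
    d = np.concatenate([directed_distances(A, B), directed_distances(B, A)])
    return np.percentile(d, 95)

def com_distance(A, B):
    """Distance between the centers of mass of two point sets."""
    return np.linalg.norm(A.mean(axis=0) - B.mean(axis=0))

A = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
B = A + np.array([3.0, 0.0])          # B is A shifted by 3 along x
print(com_distance(A, B), hd95(A, B)) # both 3.0 for a pure translation
```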
b) The standard deviation of the log Jacobian (SDlogJ): SDlogJ = σ(log(JOCA)) (Equation 33).

Cardiac motion was tracked by (Ye et al., 2021) using tagging magnetic resonance imaging (t-MRI), where an unsupervised bidirectional MIR model estimated the motion field between consecutive frames. (Upendra et al., 2021, November) focused on motion extraction from 4D cardiac cine magnetic resonance imaging (CMRI), mainly the development of patient-specific right ventricle (RV) models based on kinematic analysis. A DL deformable MIR was used to estimate the motion of the RV and generate isosurface meshes of cardiac geometry. Respiratory movement can affect the quality of medical imaging by causing motion blur. To overcome this, (Hou et al., 2022) proposed an unsupervised MIR framework for respiratory motion correction in PET (Positron Emission Tomography) images. (Chaudhary et al., 2022) focused on lung tissue expansion, which is typically estimated by registering multiple scans. To reduce the number of needed scans, Chaudhary et al. (2022) proposed the use of generative adversarial learning to estimate local tissue expansion of the lungs from a single CT scan.

Figure 33. Invertibility statistics
Figure 34. Data type statistics

Image registration etymology: in dictionaries
Y ∅i p = {y 1 … y E } codomain values of an image p in space i, where ∅ is a reference unknown codomain (used with raw data). p is an index of a registration example in a dataset.
′ the apostrophe indicates ground truth; for example, <X′ ij p , Y′ ij p > is the ground-truth outcome <domain and codomain> of image p after IR to space j.

Table 2: Image registration definitions in MIR review papers

C1 (Haskins et al., 2020): "…aligning two images so that anatomical features would spatially coincide. This is required when analyzing pairs of images that were taken at different times or taken by different imaging modalities."
C2 (Zitova et al., 2003): "Image registration is the process of overlaying images (two or more) of the same scene taken at different times, from different viewpoints, and/or by different sensors. The registration geometrically aligns two images (the reference and sensed images)."
C3 (Chen, X., Wang et al., 2022): "the process of aligning two or more images into one coordinate system with matched contents."
(Haskins et al., 2020): "The process of transforming different image datasets into one coordinate system with matched imaging contents, which has significant applications in medicine. Registration may be necessary when analyzing a pair of images that were acquired from different viewpoints, at different …"

The goal is to obtain a transformed image X ij by deforming the moving image X ∅i.

Table 3
Statistics and figures summarized the results, showing, for example, that the distribution of the ROIs was 36% for the prostate (Bashkanov et al., 2021; Yang, Q. et al., 2021; Yang et al., 2022), 33% for the head and neck, and 26% for the thorax. Another figure showed the most frequent evaluation metrics, ordered as follows: DSC > HD > TRE.
Dossun et al. (2022) reviewed the performance of deformable IR in radiotherapy treatments in real patients. First, the scope of the paper and the paper selection process were explained. Then a taxonomy of MIR evaluation metrics was mentioned, but no explanation or formula was provided. A table of 7 pages compared the surveyed papers. Abbasi et al. (…). Huang et al. (2022) reviewed AI applications in brain tumor imaging from a medical practitioner's perspective. They pointed out the lack of, and the need for, studies about the use of AI tools in routine clinical practice to characterize the validity and utility of the developed AI tools. Zhang, Y. et al. (2021) elaborated on AI registration successes and highlighted challenges: 1) the lack of large databases with precise annotation; 2) the need for guidance from medical experts in some cases; 3) differing expert opinions in the case of some ambiguous images; 4) the exclusion of non-imaging patient data, like age and medical history; and 5) the interpretability of AI models. Decuyper et al. (2021) started with an explanation of DL components, covering neural network layers (CNNs, activations, normalization, pooling, and dropout) and DL architectures (e.g., Resnet, GANs, U-Net). Then the paper explained medical image acquisition and reconstruction. After a brief elaboration on IR categories, the paper elaborated on their challenges: 1) traditional iterative methods work well with unimodal images but poorly with multimodal images or in the presence of noise; 2) deep iterative methods imply non-convex optimization that is difficult to converge; 3) in RL, deformable transformation results in a high-dimensional space of possible actions, which makes it computationally difficult to train RL agents (most previous works dealt with rigid transformation, a low-dimensional search space); 4) supervised learning approaches need ground-truth labels; and 5) unsupervised approaches face difficulty in back-propagating the gradients through the multiple different steps. Finally, specific application areas were reviewed: chest pathology, breast cancer, cardiovascular diseases, abdominal diseases, neurological diseases, and whole-body imaging.

This work reviewed MIR. The list of surveyed works was collected by searching with the keywords "medical image registration" in the Scopus database. The search query was limited to open-access MIR papers written in English and published between 2021 and 2022. The number of retrieved records was 270, out of which 38 papers were excluded based on the abstract for irrelevance (e.g., they are about medical images but not MIR). Another 41 papers were excluded because the authors could not find open-access versions of those papers as of December 2022. Out of the remaining 191 papers, 96 had been reviewed at the time of writing this draft, in addition to 10 papers published before 2021. The research questions and sub-questions of this work are shown in Table 4. The outcomes of the survey are summarized in the appendix.
Deep learning approaches use multiple layers of neural networks. Neural networks can estimate the transformation function in the registration problem entirely using unknown variables (called neurons). Hence, the transformation in this case is considered an implicit function, in distinction from explicit transformation functions, which assume a tractable formula for the transformation, such as the rigid transformations shown in Equations 6-8. DL approaches were also called earlier "non-parametric methods".

a. Directly supervised deep learning approaches