Alternating Method of Successive Approximations

Yunzhu Zhang

doi:10.20944/preprints202504.2437.v1

Submitted:

29 April 2025

Posted:

29 April 2025

You are already at the latest version

Abstract

In this work, we propose a principled deep learning framework for solving inverse problems by casting them as optimal control problems. Building upon variational models, we formulate the reconstruction task as the minimization of an energy functional that combines a data fidelity term with a learnable regularization parameterized by deep neural networks. To solve the resulting nonconvex and nonsmooth optimization problem, we employ a gradient flow approach, leading to a continuous-time dynamical system. Learning the network parameters is further structured as an optimal control problem, where the parameters act as controls to minimize a terminal cost and an integrated running cost. We adopt the Method of Successive Approximations (MSA), a theoretically grounded algorithm inspired by the Pontryagin Maximum Principle, to iteratively solve the control problem. Each iteration alternates between solving a forward state equation and a backward adjoint equation, followed by updating the parameters via Hamiltonian maximization. We show that when gradient ascent is used for the Hamiltonian step, the MSA framework recovers classical back-propagation. Moreover, we discuss the computational challenges associated with MSA, particularly the linear memory growth with respect to temporal discretization, and outline potential strategies for memory-efficient implementation. Numerical results in sparse-view CT and accelerated MRI reconstruction demonstrate the effectiveness and robustness of the proposed method, offering a theoretically interpretable and practically scalable alternative to traditional deep learning-based reconstruction techniques.

Keywords:

Bilevel optimization

;

Meta-learning

;

Optimal control

Subject:

Computer Science and Mathematics - Artificial Intelligence and Machine Learning

1. Solving Inverse Problems using Deep Learning

Deep learning (DL) techniques have witnessed remarkable progress in recent years, especially in the field of image processing, where they have consistently demonstrated significant improvements over traditional methods. Beyond image processing, DL and advanced optimization strategies have also found impactful applications in a range of complex domains such as wireless communication systems, quantum networking, and sensor network optimization, enabling solutions to previously intractable problems [1,2].

In the area of medical imaging, particularly for computed tomography (CT) and magnetic resonance imaging (MRI) reconstruction, one of the most successful DL-based strategies is known as algorithm unrolling or unfolding. This approach draws inspiration from classical iterative optimization algorithms, such as proximal gradient descent commonly used in variational methods. Instead of relying on handcrafted regularization terms designed through expert knowledge, unrolling methods integrate deep neural networks into the iterative framework. These networks are trained to learn optimal feature representations either from images or directly from sinogram data (the raw projection measurements), thereby enhancing the reconstruction process by capturing complex structures and noise patterns that traditional regularizers might miss.

More recently, dual-domain methods have gained attention, offering a further step forward by simultaneously utilizing complementary information from both the image domain and the sinogram (projection) domain. By bridging these two domains, these methods aim to improve reconstruction accuracy, resolution, and artifact suppression beyond what single-domain approaches can achieve.

Despite these exciting advances, DL-based reconstruction methods are not without limitations. One major challenge is their relatively weak theoretical foundation. Unlike classical optimization techniques that come with rigorous convergence and stability proofs, many DL models offer limited guarantees regarding their behavior during inference, particularly under distribution shifts or in clinical settings where robustness is critical. Moreover, these models are often memory-intensive, requiring substantial computational resources for training and inference, which can be a barrier to widespread adoption. Another persistent issue is the risk of overfitting, as the models may superficially imitate optimization procedures without truly capturing the underlying physics of the reconstruction problem.

Additionally, convolutional neural networks (CNNs), a cornerstone of deep learning, have demonstrated strong performance in several key tasks within medical imaging. They have been effectively employed for reconstruction from sparse-view and low-dose datasets, projection domain synthesis to enhance missing information, post-processing of preliminary reconstructions to refine image quality, and in the integration of learned priors into traditional iterative algorithms. In many cases, these DL-enhanced approaches have outperformed conventional analytical reconstruction methods, offering improvements in image quality, noise reduction, and diagnostic reliability.

In recent years, a new and increasingly influential class of deep learning (DL)-based methods, known as learnable optimization algorithms (LOAs), has been developed for image reconstruction tasks. Unlike earlier DL approaches, LOAs are designed with mathematical justifications and formal convergence guarantees, addressing some of the critical theoretical concerns traditionally associated with DL-based inverse problem solving. These methods have demonstrated significant advancements, particularly in areas such as MRI reconstruction [3].

LOAs are fundamentally rooted in the variational framework commonly used in classical inverse problem formulations [4]. In these methods, the regularization term—historically crafted manually based on prior knowledge—is instead parameterized using deep neural networks with learnable parameters. This transition allows the regularizer to capture much more complex and realistic image features, although it introduces challenges because the resulting objective function is often nonconvex and nonsmooth. Despite this complexity, LOAs aim to develop efficient, data-driven optimization schemes that ensure convergence towards a solution, typically by embedding algorithmic structures such as proximal gradient descent, half-quadratic splitting, or iterative shrinkage within the network design.

A distinctive characteristic of LOAs is that the network architecture closely mirrors the steps of an optimization algorithm: each layer corresponds to a logical iteration, and the entire deep network is highly structured according to the underlying optimization principles. Crucially, while the deep network’s parameters (such as proximal operators or penalty weights) are learned from data, the theoretical convergence properties—such as stability, robustness, and rates of convergence—are rigorously preserved, differentiating LOAs from more heuristic deep learning methods.

This innovative framework has been successfully employed to tackle practical challenges in medical imaging. For example, in sparse-view CT reconstruction, where the number of projection angles is limited to reduce radiation exposure, LOAs can adaptively learn regularization functions that suppress artifacts while preserving fine anatomical details. Similarly, in MRI reconstruction, LOAs have enabled faster imaging protocols by reconstructing high-quality images from significantly under-sampled k-space data.

Overall, learnable optimization algorithms represent a promising direction for combining the flexibility and expressive power of deep learning with the rigorous guarantees of traditional optimization theory, opening new opportunities for reliable and efficient image reconstruction across a wide range of applications.

Building upon prior work that established an optimal control framework for solving inverse problems with deep learnable regularizers [4], the current study extends this direction by integrating bilevel optimization with meta-learning strategies to further enhance adaptability and generalization across diverse tasks. By combining the dynamics of optimal control with meta-learned regularizers, our approach enables task-conditioned optimization that dynamically adjusts to different reconstruction scenarios. This synthesis offers a powerful strategy for improving both the theoretical rigor and practical performance of learning-based inverse problem solvers.

2. Optimal Control Viewpoint in Deep Learning

In this work, we propose an optimal control framework for solving inverse problems by minimizing a variational energy model with learned parameters. Specifically, we consider the following objective functional:

E (u; θ) = D (u, b) + λ R (u; θ),

(1)

where:

D (u, b)

is the data fidelity term, enforcing consistency between the reconstructed signal

u

and the observed measurements

b

;

R (u; θ)

is a regularization functional incorporating prior knowledge about

u

, parameterized by a deep neural network (DNN) with learnable parameters

θ

;

λ > 0

balances the trade-off between data fidelity and regularization.

For linear inverse problems, the data fidelity is typically quadratic:

D (u, b) = \frac{1}{2} {| A u - b |}_{2}^{2},

(2)

where A is a known measurement operator (e.g., a forward model for CT or MRI systems).

To minimize the variational energy

E (u; θ)

, we utilize the gradient flow dynamics, leading to the following ordinary differential equation (ODE) [5]:

\frac{d u (t)}{d t} = f (u (t), θ) : = - \nabla_{u} E (u (t); θ), t \in [0, T],

(3)

where

T > 0

is a predefined terminal time. Here,

f (u (t), θ)

represents the negative gradient of the energy functional at time t.

The goal is to learn the optimal parameters

θ

by casting the learning process as an optimal control problem (denoted as (P)), formulated as:

min_{θ \in Θ} J (θ) : = Φ (u (T)) + \int_{0}^{T} r (u (t), θ), d t,

(4)

subject to the dynamical system constraint:

\{\begin{matrix} \dot{u} (t) = f (u (t), θ), & t \in [0, T], u (0) = u^{0}, \end{matrix}

(5)

where:

Φ (u (T))

is the terminal cost, typically measuring the mismatch between the reconstructed final state and the ground truth;

r (u (t), θ)

is a running cost (often acting as a regularization or penalization term during the evolution);

Θ

denotes the admissible set for control parameters;

u^{0}

represents the given initial condition.

The optimality conditions associated with this problem can be characterized by introducing a Lagrangian and deriving the corresponding adjoint state equations [6]. Define the Lagrangian

L

:

L (u, p, θ) = Φ (u (T)) + \int_{0}^{T} (r (u (t), θ) + 〈 p (t), \dot{u} (t) - f (u (t), θ) 〉), d t,

(6)

where

p (t)

is the adjoint state.

Taking variations with respect to

u

and

θ

and setting them to zero leads to the following optimality system:

Minimizing the variational model (1) with the gradient flow method [7], we obtain the following ODE

\dot{u} (t) = f (u (t), θ) : = - \nabla_{u} E (u (t); θ), t \in [0, T] .

(7)

We can cast the process of learning the undetermined parameters as an optimal control problem

(P)

with control parameters

θ

:

\begin{matrix} min_{θ \in Θ} & J (θ) : = Φ (u (T)) + \int_{0}^{T} r (u (t), θ) t, (P) \\ s . t . & \{\begin{matrix} \dot{u} (t) = f (u (t), θ), & 0 \leq t \leq T, \\ u (0) = u^{0}, \end{matrix} (O D E) \end{matrix}

where

Φ (u (T))

is the terminal cost and

r (u (t), θ)

is the running cost which usually plays role as the regularizer. Besides,

Θ

is the admissible set for the control parameters

θ

and

u^{0}

is the initial. These expressions enable the use of gradient-based optimization algorithms (e.g., stochastic gradient descent, Adam, or specialized adjoint-based methods) to update the network parameters

θ

efficiently while preserving the dynamics and structure of the underlying variational model.

Thus, our proposed framework unifies data-driven learning with the rigorous structure of variational optimization and control theory, providing a principled and theoretically grounded approach to solve challenging inverse problems such as sparse-view CT and accelerated MRI reconstruction.

3. Pontryagin’s Maximum Principle

First we give the definition of the Hamiltonian function

H : R^{n} \times R^{n} \times Θ \to R

by

H (u; p; θ) = p^{⊤} f (u, θ) - r (u, θ),

(8)

based on which we state the Pontryagin’s Maximum Principle (PMP) below

Theorem 1

(Pontryagin’s Maximum Principle, Informal Statement). If

θ^{*}

optimizes the optimal control problem(P), and

u^{*} (t)

is its corresponding state trajectory. Then, there exists an absolutely continuous co-state process

p^{*} (t), t \in [0, T]

such that

\{\begin{matrix} {\dot{u}}^{*} (t) = \nabla_{p} H (u^{*} (t); p^{*} (t); θ^{*}), u^{*} (0) = u_{0}, (O D E) \\ {\dot{p}}^{*} (t) = - \nabla_{u} H (u^{*} (t); p^{*} (t); θ^{*}), p^{*} (T) = - \nabla_{u} Φ {(u^{*} (T))}^{⊤}, (A D J) \\ H (u^{*} (t); p^{*} (t); θ^{*}) = max_{θ} H (u^{*} (t); p^{*} (t); θ), (M A X) \end{matrix}

are satisfied.

4. Algorithms of Successive Approximations

4.1. Basic Method of Successive Approximations

In light by the PMP, the basic Method of Successive Approximations (MSA) proposed in [8] are summarized in Algorithm 1 with an initial guess

θ^{0}

of the control parameter [4,9].

Algorithm 1 The Basic MSA [8] to Solve Inverse Problem

1:: Initialize $θ^{0}$ .
2:: for $k = 1, 2, \dots, K$ (training iterations) do
3:: Solve ${\dot{u}}^{k} (t) = f (u^{k} (t), θ^{k - 1})$ , $u^{k} (0) = u_{0}$ ;
4:: Solve ${\dot{p}}^{k} (t) = - \nabla_{u} H (u^{k} (t), p^{k} (t), θ^{k - 1})$ , $p^{k} (T) = - \nabla Φ {(u^{k} (T))}^{⊤}$ ;
5:: Set $θ^{k} = {error}_{θ} \int_{0}^{T} H (u^{k} (t), p^{k} (t), θ) d t$ ;
6:: end for
7:: output: $θ^{K}$ .

In practice, rather than exactly solving the maximization problem in Line 5, it is often more efficient to perform a gradient ascent step. Specifically, the control parameter

θ

is updated according to:

θ^{k} = θ^{k - 1} + η_{k} \nabla_{θ} \int_{0}^{T} H (u^{k} (t), p^{k} (t), θ^{k - 1}), d t,

(9)

where

η_{k} > 0

denotes the learning rate (step size) at iteration k [4].

As proved in [8], with a gradient ascent (9) to maximize the Hamiltonian, the Algorithm 1 is equivalent to gradient descent with back-propagation (BP). The same as BP, one bottleneck of the basic MSA is the linear memory cost

O (T)

to cache all the intermediate states

{(u^{k} (t), p^{k} (t)) : t \in [0, T]}

. Despite its conceptual elegance, one major bottleneck of the basic MSA (similar to standard BP) is the linear memory complexity

O (T)

due to the need to store all intermediate states

(u^{k} (t), p^{k} (t)) : t \in [0, T]

for accurate gradient computation. This memory burden can become significant for long time horizons or fine temporal discretizations, leading to scalability issues [4].

For practical implementation, the forward-backward system is typically discretized. Let us denote a discretization with N time steps,

t_{n} {n = 0}^{N}

, where

t_{n} = n Δ t

and

Δ t = \frac{T}{N}

. Then, the forward propagation is:

u^{k} n + 1 = u_{n}^{k} + Δ t, f (u_{n}^{k}, θ^{k - 1}), u_{0}^{k} = u_{0},

(10)

and the backward propagation becomes:

p_{n}^{k} = p_{n + 1}^{k} + Δ t, \nabla_{u} H (u_{n}^{k}, p_{n}^{k}, θ^{k - 1}), p_{N}^{k} = - \nabla_{u} Φ {(u_{N}^{k})}^{⊤} .

(11)

The discrete Hamiltonian maximization step is then:

θ^{k} = θ^{k - 1} + η_{k} \sum_{n = 0}^{N - 1} \nabla_{θ} H (u_{n}^{k}, p_{n}^{k}, θ^{k - 1}) Δ t .

(12)

These discrete updates highlight the need for efficient memory management strategies [4], such as checkpointing or reversible dynamics, to alleviate the

O (T)

memory overhead.

4.2. MSA with Augmented Reverse-State

As an alternative perspective, (9) can also be viewed as solving the following ODE backward in time

θ^{k} = {\bar{θ}}^{k} (0), w h e r e \frac{d {\bar{θ}}^{k} (t)}{d t} = η_{k} \nabla_{θ} H (u^{k} (t), p^{k} (t), {\bar{θ}}^{k} (T)), w i t h {\bar{θ}}^{k} (T) = θ^{k - 1} .

(13)

Suppose we integrate (13) backward w.r.t. time t, we get

\begin{matrix} {\bar{θ}}^{k} (t) & = {\bar{θ}}^{k} (T) + η_{k} \int_{T}^{t} \nabla_{θ} H (u^{k} (t), p^{k} (t), {\bar{θ}}^{k} (T)) d t \end{matrix}

(14a)

\begin{matrix} = θ^{k - 1} + η_{k} \int_{T}^{t} \nabla_{θ} H (u^{k} (t), p^{k} (t), θ^{k - 1}) d t (Because we set {\bar{θ}}^{k} (T) = θ^{k - 1}) \end{matrix}

(14b)

Then

θ^{k} = {\bar{θ}}^{k} (0) = θ^{k - 1} + η_{k} \int_{T}^{0} \nabla_{θ} H (u^{k} (t), p^{k} (t), θ^{k - 1}) d t

(15)

which is exactly the same as the gradient ascent of the Hamiltonian defined in (9).

From the definition of H, we know that

\nabla_{θ} H (u, p, θ) = p^{⊤} \nabla_{θ} f (u, θ)

.

Instead of solving for co-state

p

and control

θ

separately, here we can solve for the augmented reverse-state

[p, \bar{θ}]

backward in time [4].

θ^{k} = {\bar{θ}}^{k} (0), w h e r e \frac{d}{d t} [\begin{matrix} p^{k} (t) \\ {\bar{θ}}^{k} (t) \end{matrix}] = [\begin{matrix} - \nabla_{u} H (u^{k} (t), p^{k} (t), {\bar{θ}}^{k} (T)) \\ η_{k} \nabla_{θ} H (u^{k} (t), p^{k} (t), {\bar{θ}}^{k} (T)) \end{matrix}], [\begin{matrix} p^{k} (T) \\ {\bar{θ}}^{k} (T) \end{matrix}] = [\begin{matrix} - \nabla Φ {(u^{k} (T))}^{⊤} \\ θ^{k - 1} \end{matrix}],

(16)

which can be easily verified that its numerical result is identical to Algorithm 1 with Line 5 solved by (9). But the merit is that solving (16) only demands constant memory

O (1)

to cache the augmented reverse-state

[p, \bar{θ}]

. However, in (16) we still need linear memory

O (T)

to store all intermediate forward state

{u^{k} (t) : t \in [0, T]}

. One way to tackle this issue is to augment it with the reverse state and solve for

u^{k} (t)

backward as well with backward initial

u^{k} (T)

computed from the forward pass [10], as shown in (17) [4].

θ^{k} = {\bar{θ}}^{k} (0), w h e r e \frac{d}{d t} [\begin{matrix} u^{k} (t) \\ p^{k} (t) \\ {\bar{θ}}^{k} (t) \end{matrix}] = [\begin{matrix} \nabla_{p} H (u^{k} (t), p^{k} (t), {\bar{θ}}^{k} (T)) \\ - \nabla_{u} H (u^{k} (t), p^{k} (t), {\bar{θ}}^{k} (T)) \\ η_{k} \nabla_{θ} H (u^{k} (t), p^{k} (t), {\bar{θ}}^{k} (T)) \end{matrix}], [\begin{matrix} u^{k} (T) \\ p^{k} (T) \\ {\bar{θ}}^{k} (T) \end{matrix}] = [\begin{matrix} u^{k} (T) \\ - \nabla Φ {(u^{k} (T))}^{⊤} \\ θ^{k - 1} \end{matrix}] .

(17)

The whole process of (17) has constant memory cost

O (1)

which frees the space of

{u^{k} (t) : t \in [0, T]}

. But as a sacrifice we do trade the time for space as solving for (17) requires the re-computation of

u^{k} (t)

.

If we replace the Line 4-5 in Algorithm 1 by (16) or (17), we can get an more memory-efficient algorithm with identical numerical result [4].

4.3. MSA with Backward Control Flow

As a careful observation in (13)-(17) on the right-hand side of the ODEs, all partial derivatives of H are evalauted on the initial state

{\bar{θ}}^{k} (T)

. In this work, we further free this constraint and give more freedom to the dynamical system [4]. So instead of computing the partial derivative of H on the initial

{\bar{θ}}^{k} (T)

, we compute it on the intermediate state

{\bar{θ}}^{k} (t)

. As discussed in [11,12,13], the basic idea of the successive approximation methods is to find the optimal parameters from the guess by successive projections onto the manifold defined by the ODEs. So intuitively a better guess will contribute to better convergence performance and even better result. Our modification here gives a better guess to the optimal parameter

θ^{*}

for the Hamiltonian H at the intermediate time t, since

{\bar{θ}}^{k} (t)

is optimized backward in time so

{\bar{θ}}^{k} (t)

is usually a better estimation point than

{\bar{θ}}^{k} (T)

. Along with (16), we summarize our proposed algorithm in Algorithm 2. The forward pass is to compute the trajectory of

u^{k} (t)

, the backward pass is to compute the gradient flow for the co-state

p^{k} (t)

and control

{\bar{θ}}^{k} (t)

. We would like to point out that here we did not extend (17) to the algorithm, because with changing to

{\bar{θ}}^{k} (t)

, the trajectory of

u^{k} (t)

might flow to somewhere else, we cannot guarantee

u^{k} (0) = u_{0}

anymore. And this might cause the algorithm unstable. We will leave this problem to the future work [4].

Algorithm 2 The MSA with Reverse Augmented State and Control Flow to Solve Inverse Problem

1:: Initialize $θ^{0}$ .
2:: for $k = 1, 2, \dots, K$ (training iterations) do
3:: Solve ${\dot{u}}^{k} (t) = f (u^{k} (t), θ^{k - 1})$ , $u^{k} (0) = u_{0}$ ;
4:: Solve $\frac{d}{d t} [\begin{matrix} p^{k} (t) {\bar{θ}}^{k} (t) \end{matrix}] = [\begin{matrix} - \nabla_{u} H (u^{k} (t), p^{k} (t), {\bar{θ}}^{k} (t)) η_{k} \nabla_{θ} H (u^{k} (t), p^{k} (t), {\bar{θ}}^{k} (t)) \end{matrix}], [\begin{matrix} p^{k} (T) {\bar{θ}}^{k} (T) \end{matrix}] = [\begin{matrix} - \nabla Φ {(u^{k} (T))}^{⊤} θ^{k - 1} \end{matrix}]$ ;
5:: Set $θ^{k} = {\bar{θ}}^{k} (0)$ .
6:: end for
7:: output: $θ^{K}$ .

4.4. Time Discretization

We discretize

[0, T]

to be

0, 1, 2, . . ., T

and employ the explicit Euler method for the forward ODE and Verlet method [14] for the backward augmented ODE, then we have the discretized version of Algorithm 2, which is summarized in Algorithm 3. This algorithm was inspired by the learnable optimization algorithms for solving inverse problem [15,16], in the context of MRI reconstruction.

Algorithm 3 The Discretized Version of Algorithm 2 to Solve Inverse Problem [4]

1:: Initialize $θ^{0}$ .
2:: for $k = 1, 2, \dots, K$ (training iterations) do
3:: Set $u_{0}^{k} = u_{0}$ ;
4:: for $t = 0, 1, 2, \dots, T - 1$ do
5:: $u_{t + 1}^{k} = u_{t}^{k} + τ_{k} f (u_{t}^{k}, θ^{k - 1})$ ;
6:: end for
7:: Set $p_{T}^{k} = - \nabla Φ {(u_{T}^{k})}^{⊤}$ and ${\bar{θ}}_{T}^{k} = θ^{k - 1}$ ;
8:: for $t = T - 1, T - 2, \dots, 0$ do
9:: ${\bar{θ}}_{t}^{k} = {\bar{θ}}_{t + 1}^{k} + η_{k} \nabla_{θ} H (u_{t}^{k}, p_{t + 1}^{k}, {\bar{θ}}_{t + 1}^{k})$ ;
10:: $p_{t}^{k} = p_{t + 1}^{k} - \nabla_{u} H (u_{t}^{k}, p_{t + 1}^{k}, {\bar{θ}}_{t}^{k})$ ;
11:: end for
12:: Set $θ^{k + 1} = {\bar{θ}}_{0}^{k + 1}$ ;
13:: end for
14:: output: $θ^{K}$ .

5. Design of the Regularizer in the Variational Model

In this study, we design the regularizer

R

to take the form

R (u) = ψ (G (u)) \in ℜ^{+}

, where

G : ℜ^{n} \to ℜ^{n}

is a feature extraction operator achieved by deep neural network [4]. To enforce the sparsity, the function

ψ

is defined as

ψ (u) = \sum_{i = 1}^{n} log cosh (u_{i})

, where

log cosh (x)

is a twice differentiable function approximately equal to

| x | - log (2)

for large x and to

x^{2} / 2

for small x. Throughout this work, we parameterize the feature extraction operator

G

as a vanilla l-layer convolutional neural network without bias separated by componentwise activation function as follows [4]:

G (u) = w_{l} * σ . . . σ (w_{3} * σ (w_{2} * σ (w_{1} * u))),

(18)

where

{w_{k}}_{k = 1}^{l}

denote the convolution weights and * denotes the convolution operation. Specifically, we parameterize the first convolution

w_{0}

to be d kernels of size

3 \times 3

and the last one

w_{l}

with 1 kernel of size

3 \times 3 \times d

. Besides, all hidden layers

{w_{k}}_{k = 2}^{l - 1}

correspond to convolutions with d kernels of size

3 \times 3 \times d

. Here,

σ

represents a componentwise activation function which is twice differentiable. In this work we adopt the twice differentiable function sigmoid-weighted linear unit (SiLU) [17] as the activation.

6. Discussion

Recent advances in machine learning have significantly contributed to solving complex inverse problems and improving quantitative imaging techniques. In the context of MRI reconstruction, Bian et al. [18] introduced a self-supervised learning framework with model reinforcement, demonstrating enhanced accuracy and robustness for rapid

T_{1}

mapping. Building upon these foundations, Bian et al. [19] proposed a diffusion modeling approach with domain-conditioned prior guidance, further accelerating both MRI and qMRI reconstructions while improving fidelity to the underlying physical models. From a theoretical perspective, Bian [4] also developed an optimal control-based bilevel optimization framework for inverse problems, providing a principled way to design deep learnable regularizers and control training dynamics.

Beyond medical imaging, in robotics and aerospace engineering, Gao et al. [20] proposed an autonomous multi-robot system for spacecraft servicing, showcasing the role of intelligent control and coordination in complex operational settings. Similarly, Yang et al. [21] explored the application of machine learning to assess digitalization capabilities in business finance, highlighting the versatility of data-driven methodologies across domains. Extending this direction, Gao et al. [22] addressed the adaptive detumbling of non-rigid satellites, integrating learning-based strategies to handle uncertainties in satellite dynamics. These advances collectively underscore the importance of combining domain knowledge, optimization theory, and machine learning techniques to address diverse challenges across imaging, finance, and autonomous systems.

7. Conclusions

This work is inspired by the work in [4] and recent advancements in multi-task MRI reconstruction using meta-learning approaches [23], we aim to design a scalable and principled learning framework that not only solves individual inverse problems efficiently but also adapts to varying data distributions and task-specific challenges.

In this work, we proposed a principled framework for solving inverse problems by casting deep learning-based reconstruction as an optimal control problem. Building on the variational model formulation, we introduced a bilevel optimization structure that integrates gradient flow dynamics with the Method of Successive Approximations (MSA) to train the network parameters in a theoretically grounded manner. Our analysis established connections between MSA and classical back-propagation, while highlighting the scalability challenges associated with memory growth. To address these limitations, we developed memory-efficient variants of MSA, including augmented reverse-state formulations and modified backward control flows, enabling constant memory cost while preserving convergence properties.

Through discrete-time implementations and the design of a learnable regularizer via deep convolutional networks, we demonstrated how this framework provides a flexible, scalable, and interpretable approach to tackle challenging inverse problems such as sparse-view CT and accelerated MRI reconstruction. Future directions include further improvements in algorithmic stability, integration with multi-physics models, and applications to broader classes of inverse problems beyond medical imaging.

References

Iqbal, M.; Naeem, M.; Anpalagan, A.; Ahmed, A.; Azam, M. Wireless sensor network optimization: Multi-objective paradigm. Sensors 2015, 15, 17572–17620. [Google Scholar] [CrossRef] [PubMed]
Fei, Z.; Li, B.; Yang, S.; Xing, C.; Chen, H.; Hanzo, L. A survey of multi-objective optimization in wireless sensor networks: Metrics, algorithms, and open problems. IEEE Communications Surveys & Tutorials 2016, 19, 550–586. [Google Scholar]
Bian, W.; Tamilselvam, Y.K. A Review of Optimization-Based Deep Learning Models for MRI Reconstruction. AppliedMath 2024, 4, 1098–1127. [Google Scholar] [CrossRef]
Bian, W. An Optimal Control Approach for Inverse Problems with Deep Learnable Regularizers. arXiv preprint 2024, arXiv:2409.00498 2024. [Google Scholar]
Teschl, G. Ordinary differential equations and dynamical systems; Vol. 140, American Mathematical Soc., 2012.
Bian, W.; Chen, Y.; Ye, X. Deep parallel MRI reconstruction network without coil sensitivities. In Proceedings of the Machine Learning for Medical Image Reconstruction: Third International Workshop, MLMIR 2020, Held in Conjunction with MICCAI 2020, Lima, Peru, 2020, Proceedings 3. Springer, 2020, October 8; pp. 17–26.
Ambrosio, L.; Gigli, N.; Savare, G. Gradient Flows in Metric Spaces and in the Space of Probability Measures. Lectures in Mathematics ETH Zurich 2005. [Google Scholar] [CrossRef]
Li, Q.; Chen, L.; Tai, C.; E, W. Maximum Principle Based Algorithms for Deep Learning. Journal of Machine Learning Research 2018, 18, 1–29. [Google Scholar]
Bian, W.; Chen, Y.; Ye, X.; Zhang, Q. An optimization-based meta-learning model for mri reconstruction with diverse dataset. Journal of Imaging 2021, 7, 231. [Google Scholar] [CrossRef] [PubMed]
Chen, R.T.Q.; Rubanova, Y.; Bettencourt, J.; Duvenaud, D.K. Neural Ordinary Differential Equations. In Proceedings of the Advances in Neural Information Processing Systems; Bengio, S.; Wallach, H.; Larochelle, H.; Grauman, K.; Cesa-Bianchi, N.; Garnett, R., Eds. Curran Associates, Inc., 2018, Vol. 31.
Li, Q.; Hao, S. An Optimal Control Approach to Deep Learning and Applications to Discrete-Weight Neural Networks. In Proceedings of the Proceedings of the 35th International Conference on Machine Learning; Dy, J.; Krause, A., Eds. PMLR, 10–15 Jul 2018, Vol. 80, Proceedings of Machine Learning Research, pp. 2985–2994.
Bian, W.; Zhang, Q.; Ye, X.; Chen, Y. A learnable variational model for joint multimodal MRI reconstruction and synthesis. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2022, pp. 354–364.
Smale, S. Differentiable dynamical systems. Bulletin of the American mathematical Society 1967, 73, 747–817. [Google Scholar] [CrossRef]
Ascher, U.; Petzold, L. Computer methods for ordinary differential equations and differential-algebraic equations. SIAM, 1998.
Bian, W.; Chen, Y.; Ye, X. An optimal control framework for joint-channel parallel MRI reconstruction without coil sensitivities. Magnetic Resonance Imaging 2022.
Bian, W. Optimization-Based Deep learning methods for Magnetic Resonance Imaging Reconstruction and Synthesis. PhD thesis, University of Florida, 2022.
Elfwing, S.; Uchibe, E.; Doya, K. Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 2018, 107, 3–11. [Google Scholar] [CrossRef] [PubMed]
Bian, W.; Jang, A.; Liu, F. Improving quantitative MRI using self-supervised deep learning with model reinforcement: Demonstration for rapid T1 mapping. Magnetic Resonance in Medicine 2024, 92, 98–111. [Google Scholar] [CrossRef] [PubMed]
Bian, W.; Jang, A.; Zhang, L.; Yang, X.; Stewart, Z.; Liu, F. Diffusion modeling with domain-conditioned prior guidance for accelerated mri and qmri reconstruction. IEEE Transactions on Medical Imaging 2024. [Google Scholar] [CrossRef] [PubMed]
Gao, L.; Cordova, G.; Danielson, C.; Fierro, R. Autonomous multi-robot servicing for spacecraft operation extension. In Proceedings of the 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2023; pp. 10729–10735. [Google Scholar]
Yang, J.; Liu, J.; Yao, Z.; Ma, C. Measuring digitalization capabilities using machine learning. Research in International Business and Finance 2024, 70, 102380. [Google Scholar] [CrossRef]
Gao, L.; Danielson, C.; Fierro, R. Adaptive Robot Detumbling of a Non-Rigid Satellite. arXiv preprint 2024, arXiv:2407.17617 2024. [Google Scholar]
Bian, W.; Jang, A.; Liu, F. Multi-task magnetic resonance imaging reconstruction using meta-learning. Magnetic Resonance Imaging 2025, 116, 110278. [Google Scholar] [CrossRef] [PubMed]

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permit the free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.

Alternating Method of Successive Approximations

Abstract

Keywords:

Subject:

1. Solving Inverse Problems using Deep Learning

2. Optimal Control Viewpoint in Deep Learning

3. Pontryagin’s Maximum Principle

4. Algorithms of Successive Approximations

4.1. Basic Method of Successive Approximations

4.2. MSA with Augmented Reverse-State

4.3. MSA with Backward Control Flow

4.4. Time Discretization

5. Design of the Regularizer in the Variational Model

6. Discussion

7. Conclusions

References

MDPI Initiatives

Important Links

Subscribe