Correct Degree Selection for Koopman Mode Decomposition

Kilho Shin; Shodai Asaoka

doi:10.20944/preprints202512.1616.v1

Submitted: 10 December 2025

Posted: 18 December 2025


Abstract

Fourier Decomposition (FD) and Koopman Mode Decomposition (KMD) are important tools for time series data analysis, applied across a broad spectrum of applications. Both aim to decompose time series functions into superpositions of countably many wave functions, with strikingly similar mathematical foundations. These methodologies derive from the linear decomposition of functions within specific function spaces: FD uses a fixed basis of sine and cosine functions, while KMD employs eigenfunctions of the Koopman linear operator. A notable distinction lies in their scope: FD is confined to periodic functions, while KMD can decompose functions into exponentially amplifying or damping waveforms, making it potentially better suited for describing phenomena beyond FD’s capabilities. However, practical applications of KMD often show that, despite accurate approximation of training data, its prediction accuracy is limited. This paper clarifies that this issue is closely related to the number of wave components used in the decomposition, referred to as the degree of a KMD. Existing methods use predetermined, arbitrary, or ad hoc values for this degree. We demonstrate that using a degree different from a value uniquely determined by the data allows infinitely many KMDs to fit the training data accurately, which explains why current methods, which select a single KMD from these candidates, struggle with prediction accuracy. Furthermore, we introduce mathematically supported algorithms to determine the correct degree. Simulations verify that our algorithms can identify the right degrees and generate KMDs that make accurate predictions, even with noisy data.

Keywords: Koopman mode decomposition; uniquely feasible degree; Hankel matrix; spectral decomposition; model selection; dynamical systems; time series analysis

Subject:

Computer Science and Mathematics - Mathematics

1. Introduction

A wide range of natural and social phenomena are observed as superpositions of multiple nonlinear elemental processes. For example, recorded audio signals typically include not only the target speech—for instance, a conversation between individuals—but also various environmental noise components. Similarly, variations in the geomagnetic field arise from both internal processes, such as temporal fluctuations in the Earth’s main magnetic field, and external perturbations, such as solar flares and solar wind. Decomposing such observations into constituent processes and extracting only the components relevant to the study is a fundamental procedure in scientific research. Among such methods, frequency analysis—where time-series data are decomposed into countably (often finitely) many frequency components—plays a central role in data science.

A foundational principle across the natural sciences involves reducing nonlinear phenomena to linear problems, enabling analysis via linear algebra. For instance, kernel methods in machine learning embed data into high-dimensional (often infinite-dimensional Hilbert) spaces to facilitate linear solutions. Likewise, neural networks—which approximate arbitrary continuous (and thus potentially nonlinear) functions via linear combinations followed by nonlinear activations—rely on efficient linear transformations during training. The backpropagation algorithm, essential for learning from large-scale data, exemplifies this reliance.

A well-established and widely applied technique based on this principle is Fourier decomposition, which forms the foundation of frequency analysis across a wide range of fields. Grounded in functional analysis, it represents a function as a linear combination of frequency components, typically expressed in terms of an orthonormal basis of trigonometric functions. This decomposition facilitates tasks such as signal characterization and noise reduction and has found broad applications in speech recognition and compression, image processing, radar and sonar analysis, time-series forecasting, and medical imaging.

Koopman Mode Decomposition (KMD) has recently attracted considerable theoretical attention as a powerful extension of Fourier decomposition. Its origin traces back to the 1931 work of B.O. Koopman, who formulated a representation of nonlinear dynamical systems through linear operators acting on function spaces—now referred to as Koopman operators. The theoretical foundation of this framework was subsequently formalized, and beginning in the 1990s, research by I. Mezić and collaborators renewed interest in its potential to reveal latent dynamics in nonlinear systems. The development of Dynamic Mode Decomposition (DMD) by P.J. Schmid ignited a new wave of research and led to advanced extensions such as Extended DMD (EDMD), which enable practical estimation of Koopman spectral components from data beyond the original limitations of DMD. Although KMD often provides accurate representations of observed data, it has been noted that its predictive accuracy may deteriorate under certain conditions.

The present study aims to address this limitation by identifying the sources of prediction error in KMD and proposing efficient algorithms to extract those Koopman modes that, if present, are the only viable candidates for accurate forecasting.

To illustrate Koopman Mode Decomposition and our contributions, we begin by recalling the concept of Fourier Decomposition (FD). Let

f (t)

be a

2 π

-periodic, complex-valued function in

L^{2}

. That is,

f (t + 2 π) = f (t)

for all t, and

\int_{- π}^{π} {| f (t) |}^{2} d t < \infty .

The space of such functions forms a Hilbert space, where the inner product between

g (t)

and

h (t)

is defined by

〈 g, h 〉 = \int_{- π}^{π} g (t) \cdot \bar{h (t)} d t,

with

\bar{x}

denoting the complex conjugate of x. Then,

f (t)

admits the decomposition:

f (t) = \sum_{n = - \infty}^{\infty} c_{n} e^{i n t} .

(1)

This decomposition is justified by the fact that the family

\{\frac{1}{\sqrt{2 π}} e^{i n t} ∣ n \in Z\}

forms a countable orthonormal basis for the Hilbert space of

2 π

-periodic functions in

L^{2} ([0, 2 π])

. The convergence in Equation (1) is understood in the

L^{2}

-norm. Although such convergence does not imply pointwise convergence, Riesz’s theorem [1] [Theorem 3.12] guarantees that a subsequence of the partial sums converges pointwise almost everywhere.

Koopman Mode Decomposition (KMD) [2] is similar to FD in that it expresses a function as a sum of oscillatory components. However, unlike FD, KMD allows for exponentially growing or decaying components. Hence, if a KMD of

f (t)

exists, it takes the form:

f (t) = \sum_{n = 0}^{\infty} c_{n} λ_{n}^{t},

(2)

where

λ_{n}^{t} = {| λ_{n} |}^{t} e^{i arg (λ_{n}) t}

. Thus, unless

| λ_{n} | = 1

, each term represents an exponentially growing (if

| λ_{n} | > 1

) or decaying (if

| λ_{n} | < 1

) wave (see Figure 1). The

λ_{n}

constitute a countable subset of the spectrum of the so-called Koopman operator [2].

KMD is expected to provide a more flexible framework for representing diverse phenomena and has been applied in a wide array of domains, including: fluid dynamics [3,4,5,6], chaotic systems [7], neuroscience [8], plasma physics [9,10,11], sports analytics [12], robotics [13], and video processing [14].

In practical settings, both FD and KMD rely on a finite number of observations. Without restricting the summations in Equation (1) and Equation (2) to finitely many terms, the decomposition becomes ill-posed. We therefore approximate the function by a finite superposition of ℓ oscillatory components, where ℓ is called the degree of the decomposition.

In the case of the Discrete Fourier Transform (DFT), we assume observations

f (t_{0}), \dots, f (t_{T - 1})

at

t_{k} = \frac{2 π k}{T}

, for

k = 0, \dots, T - 1

. Since

e^{i m t_{k}} = e^{i n t_{k}}

whenever

m \equiv n mod T

, the problem of finding the coefficients

c_{0}, \dots, c_{T - 1}

reduces to solving the linear system:

(f (t_{0}), \dots, f (t_{T - 1})) = (c_{0} \dots c_{T - 1}) [\begin{matrix} 1 & 1 & \dots & 1 \\ 1 & e^{i \frac{2 π}{T}} & \dots & e^{i \frac{2 π (T - 1)}{T}} \\ ⋮ & ⋮ & ⋱ & ⋮ \\ 1 & e^{i \frac{2 π (T - 1)}{T}} & \dots & e^{i \frac{2 π {(T - 1)}^{2}}{T}} \end{matrix}] .

(3)

This system has a unique solution, as the coefficient matrix is a square Vandermonde matrix over distinct T-th roots of unity.

In general, an

m \times n

matrix

V_{n ∣ a_{1}, \dots, a_{m}} = [\begin{matrix} 1 & a_{1} & a_{1}^{2} & \dots & a_{1}^{n - 1} \\ ⋮ & ⋮ & ⋮ & ⋱ & ⋮ \\ 1 & a_{m} & a_{m}^{2} & \dots & a_{m}^{n - 1} \end{matrix}],

is referred to as a Vandermonde matrix, whose determinant when

m = n

is given by

det V_{m ∣ a_{1}, \dots, a_{m}} = \prod_{i > j} (a_{i} - a_{j}) .

The square matrix on the right-hand side of Equation (3) is a Vandermonde matrix. Writing $x_{k} = f (t_{k})$ and taking ℓ = T, Equation (3) can be restated as

(x_{0} \dots x_{T - 1}) = (c_{0} \dots c_{ℓ - 1}) V_{T ∣ α^{0}, α^{1}, \dots, α^{ℓ - 1}},

(4)

where

α = e^{2 π i / T}

is a primitive T-th root of unity.

By the aforementioned invertibility of the Vandermonde matrix generated by distinct points

α^{0}, \dots, α^{T - 1}

, Equation (3) admits a unique solution:

(c_{0} \dots c_{T - 1}) = \frac{1}{T} (x_{0} \dots x_{T - 1}) V_{T ∣ α^{0}, α^{1}, \dots, α^{T - 1}}^{*} .

(5)

Here,

M^{*}

denotes the conjugate transpose (i.e., Hermitian transpose) of a matrix

M

.
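The inversion formula in Equation (5) can be checked numerically. The following is a minimal sketch, assuming NumPy and arbitrary random samples in place of real observations; the variable names are illustrative and not taken from the paper.

```python
import numpy as np

T = 8
alpha = np.exp(2j * np.pi / T)            # primitive T-th root of unity
nodes = alpha ** np.arange(T)             # alpha^0, ..., alpha^(T-1)

# Row n of V is (1, alpha^n, alpha^(2n), ..., alpha^((T-1)n)).
V = np.vander(nodes, T, increasing=True)

rng = np.random.default_rng(0)
x = rng.standard_normal(T) + 1j * rng.standard_normal(T)   # stand-in samples (assumption)

# Equation (5): c = (1/T) x V^*
c = x @ V.conj().T / T

# Equation (3): reconstruct the samples as x = c V
x_rec = c @ V
```

With this row-vector convention, `c` coincides with `np.fft.fft(x) / T`, because V V^* = T I for the T-th roots of unity.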

In contrast, the KMD problem can be formulated analogously as:

[x_{0} \dots x_{T - 1}] = [m_{1} \dots m_{ℓ}] V_{T ∣ λ_{1}, \dots, λ_{ℓ}},

(6)

with the following distinctions:

Each observable $x_{t}$ is an m-dimensional vector. We denote the matrix of observations by $X = [x_{0} \dots x_{T - 1}]$ .
The eigenvalues $λ_{1}, \dots, λ_{ℓ}$ and the corresponding modes $m_{1}, \dots, m_{ℓ}$ are unknown and must be determined.
The choice $ℓ = T$ , which is required for DFT, is entirely unsuitable for KMD: for any set of pairwise distinct $λ_{1}, \dots, λ_{T}$ , there always exists a corresponding set of modes $m_{1}, \dots, m_{T}$ such that Equation (6) holds exactly, so the data impose no constraint on the eigenvalues.

Despite the increased complexity of the KMD problem, several numerical methods exist to solve Equation (6) for a given degree ℓ, including: Dynamic Mode Decomposition (DMD), which is typically applicable when

ℓ = rank X

; the Arnoldi method, applicable when

ℓ = T - 1

; and the vector Prony method, which allows arbitrary ℓ. These methods yield approximate solutions minimizing the residual sum of squares (RSS), especially in the presence of observation noise.

However, how to determine an appropriate degree ℓ has remained unresolved. We illustrate its importance through the following example, which highlights the predictive risk of an inappropriate choice.

Example 1.

Consider one-dimensional observables given by

X = [1 1 1 1 3 5 7]

. It is readily verified that Equation (6) admits no solution if

ℓ \leq 3

. For

ℓ = 4

, the roots

λ_{1}, \dots, λ_{4}

of the following equation

f (x; α) = x^{4} - x^{3} - x - 1 + α (x - 1) = 0

(7)

uniquely determine the modes

m_{1}, \dots, m_{4}

such that Equation (6) is satisfied, thereby yielding a valid KMD for any value of the parameter α. As illustrated in Figure 2, all such KMDs exactly reproduce the observed sequence

X

for

t = 0, \dots, 6

, yet their extrapolations for

t \geq 7

differ significantly.

This example highlights a key issue: even if an algorithm happens to return a single quartic KMD, it is merely one among infinitely many KMDs that fit the observed data. Consequently, the forecast made by such a KMD is almost certainly different from the ground truth, and the chance of accurate prediction is negligibly small.
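Example 1 can be reproduced numerically. The sketch below assumes NumPy, takes three arbitrary values of α, computes the roots of Equation (7), and solves Equation (6) for the modes by least squares; the helper names `quartic_kmd` and `evaluate` are hypothetical.

```python
import numpy as np

X = np.array([1, 1, 1, 1, 3, 5, 7], dtype=complex)
T = X.size

def quartic_kmd(alpha):
    """Eigenvalues and modes of the degree-4 KMD attached to f(x; alpha) in Equation (7)."""
    # Coefficients of x^4 - x^3 + 0*x^2 + (alpha - 1)*x - (1 + alpha), highest power first.
    lam = np.roots([1, -1, 0, alpha - 1, -(1 + alpha)])
    V = np.vander(lam, T, increasing=True)        # 4 x T Vandermonde matrix
    # Modes solve X = m V; least squares returns the exact solution when one exists.
    m = np.linalg.lstsq(V.T, X, rcond=None)[0]
    return lam, m

def evaluate(lam, m, t):
    """Value of sum_i m_i * lam_i**t at integer time t."""
    return np.sum(m * lam ** t)

alphas = [0.0, 1.0, 2.0]
fits, forecasts = [], []
for a in alphas:
    lam, m = quartic_kmd(a)
    fits.append(np.array([evaluate(lam, m, t) for t in range(T)]))
    forecasts.append(evaluate(lam, m, T))         # extrapolated value at t = 7
```

Each choice of α reproduces the seven observed values, while the extrapolated value at t = 7 follows the linear recurrence encoded by Equation (7) and equals 11 - 2α, that is, 11, 9, and 7 for the three choices above.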

Thus, for the sake of predictive accuracy, it is crucial to select ℓ such that it is uniquely feasible, defined as follows:

Definition 1.

Given an observable matrix

X

, a degree ℓ is said to be feasible if there exists at least one solution

(λ_{1}, \dots, λ_{ℓ}; m_{1}, \dots, m_{ℓ})

to Equation (6). Moreover, if this solution is unique, then ℓ is said to be uniquely feasible.

This paper develops a theoretical framework for uniquely feasible degrees and, based on this foundation, proposes efficient and practical algorithms to determine whether a given set of observables

X

admits a uniquely feasible degree—and if so, to identify it. We also demonstrate through simulations that the KMD selected by our algorithms can yield highly accurate predictions.

2. Theoretical Frameworks Underlying Koopman Mode Decomposition

A key significance of Koopman Mode Decomposition (KMD) is its ability to analyze the dynamics of a nonlinear system using only methods from linear algebra. In this section, we provide a brief review of the theoretical framework of KMD, which bridges nonlinear dynamics and linear algebra.

2.1. Temporal Transition of States and Semigroup Property

Let Z denote a (possibly unobservable) state space. Under a deterministic assumption, once a state

ζ \in Z

is observed at some time, the state of the system after an elapsed time

t \geq 0

is uniquely determined and is denoted by

ζ_{t}

. Accordingly, the temporal evolution of the system is described by the mapping

\hat{ζ} : Z \times [0, \infty) ∋ (ζ, t) \mapsto ζ_{t} \in Z .

In the discrete-time setting,

\hat{ζ}

is instead defined on

Z \times ({0} \cup N)

, which can be regarded as a special case of the continuous-time formulation.

While the notation

\hat{ζ} (ζ, t)

emphasizes the bivariate nature of the mapping, the map

t \mapsto ζ_{t}

is essentially regarded as a univariate function of t, with the initial state

ζ

fixed. The deterministic assumption also requires the identity

ζ_{s + t} = {(ζ_{s})}_{t}

, which is equivalently expressed as

\hat{ζ} (ζ, s + t) = \hat{ζ} (\hat{ζ} (ζ, s), t)

for all

s, t \geq 0

. This implies that if we define

σ^{t} : = \hat{ζ} (\cdot, t) : Z \to Z

for

t \geq 0

, then the family

{σ^{t} : Z \to Z ∣ t \in [0, \infty)}

forms a one-parameter semigroup under composition, that is,

σ^{s + t} = σ^{t} \circ σ^{s}

holds for all

s, t \geq 0

.

2.2. Koopman Operator

We denote the space of

C

-valued functions defined over Z by

C^{Z}

. The function space

C^{Z}

forms a

C

-algebra, equipped with addition, multiplication, and scalar multiplication, defined as follows for

f, g \in C^{Z}

and

z \in C

:

(f + g) (ζ) = f (ζ) + g (ζ), (f g) (ζ) = f (ζ) g (ζ), and (z f) (ζ) = z f (ζ)

In particular,

C^{Z}

is a vector space over

C

.

The Koopman operator parameterized by time t is defined as

U^{t} : C^{Z} ∋ f \mapsto f \circ σ^{t} \in C^{Z} .

It is straightforward to verify that the Koopman operator is a

C

-algebra homomorphism and, in particular, a linear operator. Furthermore, we have:

Proposition 1.

The collection of Koopman operators

{U^{t} ∣ t \in [0, \infty)}

forms a one-parameter semigroup. That is,

U^{s + t} = U^{t} \circ U^{s}

holds for any

s, t \in [0, \infty)

.
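The algebraic properties of the Koopman operator can be illustrated in discrete time. The sketch below is a toy example, assuming the logistic map as the state transition and two arbitrary observables; none of these choices come from the paper.

```python
import numpy as np

def sigma(z):
    """One step of a toy nonlinear map on Z = [0, 1] (logistic map; illustrative choice)."""
    return 3.7 * z * (1.0 - z)

def flow(z, t):
    """sigma^t: apply sigma t times (discrete time, t = 0, 1, 2, ...)."""
    for _ in range(t):
        z = sigma(z)
    return z

def koopman(f, t):
    """U^t f = f o sigma^t, returned as a new function on Z."""
    return lambda z: f(flow(z, t))

f = np.cos
g = lambda z: z ** 2
z = np.linspace(0.0, 1.0, 11)
s, t = 2, 3

# Semigroup property: U^{s+t} f = (U^t o U^s) f.
semigroup_lhs = koopman(f, s + t)(z)
semigroup_rhs = koopman(koopman(f, s), t)(z)

# Linearity and multiplicativity (C-algebra homomorphism).
lin_lhs = koopman(lambda u: 2.0 * f(u) + 3.0 * g(u), t)(z)
lin_rhs = 2.0 * koopman(f, t)(z) + 3.0 * koopman(g, t)(z)
prod_lhs = koopman(lambda u: f(u) * g(u), t)(z)
prod_rhs = koopman(f, t)(z) * koopman(g, t)(z)
```

Although the map itself is nonlinear, the operator acting on observables is linear, which is the point exploited throughout this section.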

2.3. Koopman Generator

In general, when a one-parameter semigroup

T = {T^{t} ∣ t \in [0, \infty)}

is defined on a Banach space

B

, it is said to be strongly continuous if, for every

x \in B

, the following norm convergence holds:

lim_{t ↓ 0} ∥ T^{t} x - x ∥ = 0 .

A strongly continuous one-parameter semigroup

T

has several important properties [15] [Chapter 13]:

Each $T^{t}$ is bounded; that is, the operator norm $∥ T^{t} ∥$ is well-defined. More precisely, there exist constants $M \geq 1$ and $ω \in R$ such that $∥ T^{t} ∥ \leq M e^{ω t}$ for all $t \geq 0$ .
The set $D (A)$ of all $x \in B$ for which

$A x : = lim_{t ↓ 0} \frac{T^{t} x - x}{t}$

exists is a dense linear subspace of $B$ , and A is a closed linear operator with domain $D (A)$ . This operator A is called the infinitesimal generator of ${T^{t}}$ .
If ${T^{t}}$ is bounded, i.e., ${sup}_{t \geq 0} ∥ T^{t} ∥ < \infty$ , then $D (A) = B$ .
For every $x \in D (A)$ , the derivative

$\frac{d}{d t} T^{t} x = lim_{τ ↓ 0} \frac{T^{t + τ} x - T^{t} x}{τ}$

exists, and we have

$\frac{d}{d t} T^{t} x = T^{t} A x = A T^{t} x$

for all $t \geq 0$ and $x \in D (A)$ .
If $A x = λ x$ for some $x \neq 0$ and $λ \in C$ (that is, $λ$ is an eigenvalue of A), then

$T^{t} x = e^{λ t} x .$
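A finite-dimensional instance makes these properties concrete. The sketch below assumes the rotation group of the plane, whose generator is a 2 x 2 matrix, and checks the semigroup law, the generator as a difference quotient, and the eigenvalue relation; the function name `T_op` is hypothetical.

```python
import numpy as np

A = np.array([[0.0, -1.0], [1.0, 0.0]])    # generator of planar rotations

def T_op(t):
    """T^t = exp(tA), written in closed form as the rotation by angle t."""
    return np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])

s, t, h = 0.4, 1.1, 1e-6
x = np.array([1.0, -1.0j])                 # eigenvector of A with eigenvalue i
lam = 1j

semigroup_gap = np.linalg.norm(T_op(s + t) - T_op(t) @ T_op(s))
generator_gap = np.linalg.norm((T_op(h) - np.eye(2)) / h - A)   # O(h) discretization error
eigen_gap = np.linalg.norm(T_op(t) @ x - np.exp(lam * t) * x)
```

Here the eigenvalue of the generator is purely imaginary, so the corresponding component neither grows nor decays.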

When we say that the Koopman operator semigroup

U = {U^{t} ∣ t \in [0, \infty)}

is strongly continuous, we assume that it acts on a Banach space

B

whose elements can be regarded as functions in

C^{Z}

in some way (e.g.,

B \subseteq C^{Z}

), and that for each

f \in B

the limit

lim_{t ↓ 0} U^{t} f = f

holds in the norm of

B

. Pointwise convergence of functions is a more primitive notion, and although these two modes of convergence are generally independent, they are closely related in certain settings.

Let X be a compact topological space and let $C (X)$ denote the space of continuous functions on X. Since every continuous function on a compact space is bounded, we may equip $C (X)$ with the supremum norm:

${∥ f ∥}_{\infty} : = sup_{x \in X} | f (x) | .$

With this norm, $C (X)$ is a Banach space. In this setting, convergence in norm is equivalent to uniform convergence, and in particular, uniform convergence implies pointwise convergence.
Let $(X, Σ, μ)$ be a measure space, and let $B = L^{p} (μ)$ for $1 \leq p < \infty$ . Elements of $L^{p} (μ)$ are equivalence classes of measurable functions that are equal almost everywhere. Thus, any statement about pointwise convergence should be interpreted in terms of representatives of these equivalence classes, that is, convergence almost everywhere. If a sequence ${f_{n}}_{n \in N} \subset L^{p} (μ)$ satisfies

$lim_{n \to \infty} {∥ f_{n} - f ∥}_{L^{p}} = 0,$

then there exists a subsequence that converges to f pointwise almost everywhere. This follows from the completeness of $L^{p}$ spaces and is sometimes referred to as a version of the Riesz convergence theorem (see [1] [Theorem 3.12]).

If the Koopman operator semigroup

{U^{t} ∣ t \in [0, \infty)}

is bounded, then its infinitesimal generator, referred to as the Koopman generator, and defined by

K f = lim_{t ↓ 0} \frac{U^{t} f - f}{t}, f \in D (K),

(8)

is defined on the entire Banach space.

We next consider the case in which the Koopman operators are bounded. Let

(Z, Σ, μ)

be a measure space, and suppose that for each

t \geq 0

the map

σ^{t} : Z \to Z

is measurable. We examine the boundedness of the associated Koopman operator

U^{t}

acting on

L^{p} (μ)

.

A sufficient condition for

U^{t}

to be bounded on

L^{p} (μ)

is that

σ^{t}

is non-expansive with respect to

μ

, meaning that

μ ({(σ^{t})}^{- 1} (A)) \leq μ (A) for all A \in Σ .

In this case, for any

f \in L^{p} (μ)

,

{∥ U^{t} f ∥}_{L^{p} (μ)}^{p} = \int_{Z} {| f (σ^{t} (z)) |}^{p} d μ (z) = \int_{Z} {| f (z) |}^{p} d ({(σ^{t})}_{*} μ) (z),

where

{(σ^{t})}_{*} μ

denotes the pushforward of

μ

by

σ^{t}

. If

σ^{t}

is non-expansive, then

{(σ^{t})}_{*} μ \leq μ

(as measures), and hence

{∥ U^{t} f ∥}_{L^{p} (μ)} \leq {∥ f ∥}_{L^{p} (μ)},

so in particular

sup_{t \geq 0} {∥ U^{t} ∥}_{L^{p} (μ) \to L^{p} (μ)} \leq 1 < \infty .

(For

p = \infty

, the same argument shows

∥ U^{t} ∥_{L^{\infty} \to L^{\infty}} \leq 1

.)

Conversely, if

σ^{t}

is expansive, i.e., there exists

A \in Σ

such that

μ ({(σ^{t})}^{- 1} (A)) > μ (A),

then the family

{U^{t}}_{t \geq 0}

may fail to be uniformly bounded in t (even though each fixed

U^{t}

can still be bounded).

In many applications, especially in ergodic theory and dynamical systems,

σ^{t}

is assumed to be measure-preserving, that is,

μ ({(σ^{t})}^{- 1} (A)) = μ (A) for all A \in Σ .

This implies

{(σ^{t})}_{*} μ = μ

, and hence

U^{t}

acts as an isometry on

L^{p} (μ)

for every

1 \leq p \leq \infty

. On

L^{2} (μ)

, the Koopman operators are therefore unitary. If, in addition,

{σ^{t}}_{t \in R}

forms a measure-preserving flow (so that

{U^{t}}_{t \in R}

is a strongly continuous unitary group), then by Stone’s theorem [1] [Theorem 13.40] the Koopman generator

K

is skew-adjoint, i.e.,

K^{*} = - K

. Since unitary and skew-adjoint operators are normal, the spectral theorem applies and provides the functional-analytic foundation for the Koopman Mode Decomposition.
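On a finite state space with the counting measure, these statements reduce to matrix facts. The sketch below is an illustration under that assumption: a cyclic shift (a bijection, hence measure-preserving) yields a unitary Koopman matrix, whereas a constant map (expansive in the sense above) yields an operator norm larger than one. The helper `koopman_matrix` is hypothetical.

```python
import numpy as np

n = 6

def koopman_matrix(sigma):
    """Matrix of U f = f o sigma on functions f: {0, ..., n-1} -> C, stored as vectors."""
    U = np.zeros((n, n))
    for z in range(n):
        U[z, sigma(z)] = 1.0       # (U f)(z) = f(sigma(z))
    return U

U_shift = koopman_matrix(lambda z: (z + 1) % n)   # bijection: preserves the counting measure
U_const = koopman_matrix(lambda z: 0)             # collapses every state onto state 0

norm_shift = np.linalg.norm(U_shift, 2)           # operator norm on l^2
norm_const = np.linalg.norm(U_const, 2)
eigvals_shift = np.linalg.eigvals(U_shift)
```

For the shift, all eigenvalues lie on the unit circle (they are the n-th roots of unity). For the constant map, the preimage of state 0 is the whole space, and the operator norm equals the square root of n.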

2.4. Koopman Mode Decomposition and Spectral Theorem

To introduce the Koopman mode decomposition, we assume that the Koopman operator semigroup

{U^{t} ∣ t \in [0, \infty)}

defined on a Banach space

B

is strongly continuous in the norm of

B

and induces a Koopman generator defined on the entire

B

. For example, this holds when the semigroup

{U^{t}}

is bounded.

Let

σ_{p} (K) \subset C

denote the point spectrum of

K

, and let

V_{λ} \subseteq B

be the eigenspace corresponding to

λ \in σ_{p} (K)

. When f belongs to the completion in

B

of the linear span of

⋃_{λ \in σ_{p} (K)} V_{λ}

, that is, when

f = \sum_{λ \in σ_{p} (K)} ϕ_{λ}, ϕ_{λ} \in V_{λ},

holds in the norm of

B

, only countably many

ϕ_{λ}

are nonzero. We denote the corresponding eigenvalues by

{λ_{n} ∣ n = 1, 2, \dots}

. Then the Koopman mode decomposition of f is expressed as

f = \sum_{n = 1}^{N} ϕ_{λ_{n}}, N \in N \cup {\infty},

and the following relations hold:

\begin{matrix} K f = \sum_{n = 1}^{N} λ_{n} ϕ_{λ_{n}}, U^{t} f = \sum_{n = 1}^{N} e^{λ_{n} t} ϕ_{λ_{n}}, t \in [0, \infty) . \end{matrix}

If, in addition, every element of the one-parameter semigroup

{σ^{t} ∣ t \in [0, \infty)}

is measure-preserving on a measure space

(Z, Σ, μ)

, then the Koopman operators

U^{t}

and the Koopman generator

K

defined on

L^{2} (Z)

are unitary and skew-adjoint, respectively; that is, they are normal linear operators defined on the entire

B

. Hence, the Koopman mode decomposition can be understood in the context of the spectral theorem.

The spectral theorem asserts that a normal operator T defined on a Hilbert space

H

can be represented as

T = \int_{σ (T)} λ d E (λ),

where E is a projection-valued measure, which plays the role of a Borel measure defined on the Borel

σ

-algebra

\mathcal{B}

of

σ (T) \subseteq C

. For each

B \in \mathcal{B}

, the value

E (B)

is an orthogonal projection operator on

H

rather than a real number, and the measure E satisfies the following properties:

Orthogonality: $E (B_{1}) E (B_{2}) = E (B_{1} \cap B_{2})$ .
Countable additivity: For any countable mutually disjoint family $\{B_{i}\}_{i \in I} \subseteq \mathcal{B}$ ,

$E (⋃_{i \in I} B_{i}) = \sum_{i \in I} E (B_{i}),$

where the convergence is in the strong operator topology.

Based on the projection-valued measure E, the integral of a measurable function F over

C

is defined as

F (T) = \int_{σ (T)} F (λ) d E (λ),

in complete analogy with the Lebesgue integral.

This integral representation is essential, because the spectrum

σ (T)

may be distributed continuously in

C

. Furthermore, letting

σ_{d} (T) \subseteq σ_{p} (T)

denote the set of all isolated eigenvalues of T, we can express the decomposition as

F (T) = \sum_{λ \in σ_{d} (T)} F (λ) E ({λ}) + \int_{σ (T) ∖ σ_{d} (T)} F (λ) d E (λ),

(9)

where each

E ({λ})

for

λ \in σ_{d} (T)

coincides with the orthogonal projection onto the eigenspace of

λ

.

For

f \in L^{2} (Z)

—an equivalence class of functions equal almost everywhere— the Koopman mode decomposition of f and the actions of the Koopman generator

K

and the Koopman operators

U^{t}

are obtained by setting

T = K

and assuming that the integral part of Equation (9) vanishes (or is negligible). Since

E ({λ}) f

is nonzero only for a countable subset of

σ_{d} (K)

, we label the corresponding eigenvalues as

{λ_{n}}_{n = 1}^{N}

for

N \in N \cup {\infty}

, and write

ϕ_{λ_{n}} : = E ({λ_{n}}) f

.

For $F (x) = 1$ , we obtain the Koopman mode decomposition of f:

$f = \sum_{λ \in σ_{d} (T)} E ({λ}) f = \sum_{n = 1}^{N} ϕ_{λ_{n}} .$
For $F (x) = x$ , we obtain the action of $K$ :

$K f = \sum_{λ \in σ_{d} (T)} λ E ({λ}) f = \sum_{n = 1}^{N} λ_{n} ϕ_{λ_{n}} .$
For $F (x) = e^{t x}$ , we obtain the action of $U^{t}$ :

$e^{t K} f = \sum_{λ \in σ_{d} (T)} e^{t λ} E ({λ}) f = \sum_{n = 1}^{N} e^{t λ_{n}} ϕ_{λ_{n}} .$
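In finite dimensions the spectrum is purely discrete, so the three identities above can be verified directly. The sketch below assumes a randomly generated 3 x 3 skew-adjoint matrix standing in for the Koopman generator and a truncated power series for the matrix exponential; the names `apply_F` and `expm_series` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
Q, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
lam = 1j * np.array([-1.0, 0.5, 2.0])      # purely imaginary, so K is skew-adjoint
K = Q @ np.diag(lam) @ Q.conj().T

# E({lam_k}) is the orthogonal projection onto the k-th eigenspace.
E = [np.outer(Q[:, k], Q[:, k].conj()) for k in range(n)]

def apply_F(F, f):
    """F(K) f = sum_k F(lam_k) E({lam_k}) f (discrete part of Equation (9))."""
    return sum(F(lam[k]) * (E[k] @ f) for k in range(n))

def expm_series(A, terms=40):
    """Matrix exponential by a truncated power series (adequate for small norms)."""
    out = np.eye(A.shape[0], dtype=complex)
    term = np.eye(A.shape[0], dtype=complex)
    for k in range(1, terms):
        term = term @ A / k
        out = out + term
    return out

f = rng.standard_normal(n) + 1j * rng.standard_normal(n)
t = 0.7
f_rec = apply_F(lambda x: 1.0, f)              # F(x) = 1
Kf = apply_F(lambda x: x, f)                   # F(x) = x
Utf = apply_F(lambda x: np.exp(t * x), f)      # F(x) = exp(t x)
```

The infinite-dimensional case additionally requires the continuous part of the spectrum to vanish or be negligible, which this finite example cannot illustrate.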

3. Discrete Koopman Mode Decomposition

The objective of Discrete Koopman Mode Decomposition (DKMD) is analogous to that of the Discrete Fourier Transform (DFT): to obtain a decomposition that fits a given finite sequence of samples of an unknown function. We begin by recalling the formulation of the DFT.

3.1. DFT and Vandermonde Matrix

Let

(x_{0}, \dots, x_{T - 1})

denote the values of an unknown function observed at times

t = 0, 1, \dots, T - 1

. The DFT seeks coefficients

c_{0}, c_{\pm 1}, c_{\pm 2}, \dots

such that

x_{t} = \sum_{n = - \infty}^{\infty} c_{n} e^{i 2 π n \frac{t}{T}}, t = 0, \dots, T - 1 .

Since this is an underdetermined system with infinitely many unknowns, we restrict to a finite sum. Moreover, since

n \equiv n^{'} (mod T)

implies

e^{i 2 π n \frac{t}{T}} = e^{i 2 π n^{'} \frac{t}{T}}

, we may assume

n = 0, 1, \dots, T - 1

. Let

ω : = e^{i \frac{2 π}{T}}

. Then

x_{t} = \sum_{n = 0}^{T - 1} c_{n} ω^{n t}, t = 0, \dots, T - 1 .

In matrix form, we can write

(\begin{matrix} x_{0} & x_{1} & \dots & x_{T - 1} \end{matrix}) = (\begin{matrix} c_{0} & c_{1} & \dots & c_{T - 1} \end{matrix}) [\begin{matrix} 1 & 1 & 1 & \dots & 1 \\ 1 & ω & ω^{2} & \dots & ω^{T - 1} \\ 1 & ω^{2} & ω^{4} & \dots & ω^{2 (T - 1)} \\ ⋮ & ⋮ & ⋮ & ⋱ & ⋮ \\ 1 & ω^{T - 1} & ω^{2 (T - 1)} & \dots & ω^{{(T - 1)}^{2}} \end{matrix}] .

(10)

The coefficient matrix above is an instance of the Vandermonde matrix.

Definition 2.

Let

a_{1}, \dots, a_{m} \in C

and

n \in N

. The associated Vandermonde matrix is defined as

V_{n ∣ a_{1}, \dots, a_{m}} = [\begin{matrix} 1 & a_{1} & a_{1}^{2} & \dots & a_{1}^{n - 1} \\ ⋮ & ⋮ & ⋮ & ⋱ & ⋮ \\ 1 & a_{m} & a_{m}^{2} & \dots & a_{m}^{n - 1} \end{matrix}] .

(11)

Basic properties. For pairwise distinct nodes

a_{1}, \dots, a_{m}

, the Vandermonde matrix satisfies:

$det V_{m ∣ a_{1}, \dots, a_{m}} = \prod_{1 \leq j < i \leq m} (a_{i} - a_{j})$ .
$rank V_{n ∣ a_{1}, \dots, a_{m}} = min {m, n}$ .
Viewing $V_{n ∣ a_{1}, \dots, a_{m}}$ as a linear map $F : C^{m} \to C^{n}$ , F is injective if and only if $m \leq n$ , surjective if and only if $m \geq n$ , and bijective if and only if $m = n$ .

In particular,

det V_{T ∣ ω^{0}, ω^{1}, \dots, ω^{T - 1}} = \prod_{0 \leq m < n < T} (ω^{n} - ω^{m}) \neq 0,

so Equation (10) has a unique solution for

(c_{0}, \dots, c_{T - 1})

.
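The determinant and rank properties can be checked on a small example. The sketch below assumes NumPy and four arbitrarily chosen distinct nodes; the helper `vandermonde` is a hypothetical wrapper around `np.vander` following the row convention of Equation (11).

```python
import numpy as np
from itertools import combinations

def vandermonde(nodes, n):
    """V_{n | a_1..a_m}: row i is (1, a_i, a_i^2, ..., a_i^(n-1))."""
    return np.vander(np.asarray(nodes, dtype=complex), n, increasing=True)

a = [0.5, -1.0, 2.0, 1j]                   # pairwise distinct nodes (arbitrary choice)
m = len(a)

# Determinant of the square case versus the product formula over pairs j < i.
det_numeric = np.linalg.det(vandermonde(a, m))
det_formula = np.prod([a[i] - a[j] for j, i in combinations(range(m), 2)])

# Rank equals min(m, n) for n below, equal to, and above m.
ranks = {n: np.linalg.matrix_rank(vandermonde(a, n)) for n in (2, 4, 6)}
```

With m = 4 nodes, the computed ranks for n = 2, 4, 6 are 2, 4, and 4, in line with the rank formula.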

3.2. Formulation of DKMD

Assume that the Koopman mode decomposition of a function f is expressed as a countably infinite sum of eigenfunctions

{ϕ_{n}}_{n \in N}

of the Koopman generator

K

. For example, assume that the spectrum of

K

consists only of a countably infinite set of isolated eigenvalues

{λ_{n}}_{n \in N}

:

f = \sum_{n = 1}^{\infty} ϕ_{n} .

Then, without loss of generality, we may assume that the values

{e^{λ_{n}}}_{n \in N}

are pairwise distinct.

Given finitely many samples

x_{t} = f (t)

for

t = 0, 1, \dots, T - 1

, DKMD seeks to satisfy

x_{t} = \sum_{n = 1}^{\infty} c_{n} e^{λ_{n} t}, where c_{n} : = ϕ_{n} (0) .

(12)

In the same way as in the DFT case, since the infinitely many unknowns

{c_{n}}_{n = 1}^{\infty}

and

{λ_{n}}_{n = 1}^{\infty}

make the system underdetermined, we restrict their number to a finite

ℓ < \infty

. Then the system can be written compactly as

(x_{0}, x_{1}, \dots, x_{T - 1}) = (c_{1}, c_{2}, \dots, c_{ℓ}) V_{T ∣ e^{λ_{1}}, \dots, e^{λ_{ℓ}}} .

(13)

If

e^{λ_{1}}, \dots, e^{λ_{ℓ}}

are pairwise distinct and

ℓ \geq T

, the Vandermonde matrix represents a surjective linear mapping from

C^{ℓ}

onto

C^{T}

. Then we have the following:

Proposition 2.

For any

(x_{0}, \dots, x_{T - 1}) \in C^{T}

, pairwise distinct

e^{λ_{1}}, \dots, e^{λ_{ℓ}}

, and

ℓ \geq T

, there exists

(c_{1}, \dots, c_{ℓ}) \in C^{ℓ}

satisfying Equation (13).
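Proposition 2 can be observed numerically: with ℓ ≥ T, arbitrary distinct eigenvalues fit arbitrary data. The sketch below assumes random observations and an arbitrary set of distinct complex values standing in for e^{λ_1}, ..., e^{λ_ℓ}.

```python
import numpy as np

rng = np.random.default_rng(2)
T, ell = 6, 8                                   # ell >= T
x = rng.standard_normal(T).astype(complex)      # arbitrary observations (assumption)

# Arbitrary pairwise distinct values playing the role of e^{lambda_n}.
mu = np.exp(1j * np.linspace(0.3, 2.8, ell)) * np.linspace(0.8, 1.2, ell)

V = np.vander(mu, T, increasing=True)           # ell x T, rank T
# One solution of x = c V (the minimum-norm one returned by least squares).
c = np.linalg.lstsq(V.T, x, rcond=None)[0]
residual = np.linalg.norm(c @ V - x)
```

Because the eigenvalues were chosen without reference to the data, an exact fit of this kind carries no information about the underlying dynamics.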

Thus, in contrast to the DFT case, where the given values

x_{0}, \dots, x_{T - 1}

are decomposed into exactly T Fourier components, performing DKMD requires determining the number ℓ of Koopman modes within the range

ℓ < T

. Specifically, in the nonlinear system (13), the Koopman degree ℓ itself appears as an additional unknown, alongside

e^{λ_{1}}, \dots, e^{λ_{ℓ}}

and

c_{1}, \dots, c_{ℓ}

.

In this regard, the primary contribution of this paper is to establish criteria for determining ℓ, and to present computationally efficient methods for estimating it both exactly and approximately, particularly when the given observations are contaminated by noise.

In Section 5, we review an existing method from the literature for solving the equation in the special case where ℓ is known.

4. Definitions and Notations

The following definitions and descriptions are essential for the subsequent sections.

We extend the aforementioned formalization of DKMD from the decomposition of

C

-valued functions to that of

C^{m}

-valued functions for a dimension

m \geq 1

. This modification does not alter the fundamental nature of the problem but better aligns with real-world applications of DKMD, such as fluid dynamics.

Definition 3

(DKMD). Let

[x_{0} \dots x_{T - 1}]

be a matrix of observables at times

t = 0, \dots, T - 1

, where each

x_{t}

is an m-dimensional column vector. The discrete Koopman mode decomposition (DKMD) of this observable matrix is a matrix factorization given by

[x_{0} \dots x_{T - 1}] = [m_{1} \dots m_{ℓ}] V_{T ∣ μ_{1}, \dots, μ_{ℓ}},

(14)

where

μ_{1}, \dots, μ_{ℓ} \in C

are pairwise distinct.

Definition 4

(Koopman Eigenvalues, Modes, and Degree). Given a DKMD as in Equation (14), we refer to ℓ as the Koopman degree,

μ_{1}, \dots, μ_{ℓ}

as the Koopman eigenvalues, and

m_{1}, \dots, m_{ℓ}

as the Koopman modes.

For convenience, let

X

and

M

denote the observable matrix and the matrix of modes, respectively:

X = [\begin{matrix} x_{0} & \dots & x_{T - 1} \end{matrix}], M = [\begin{matrix} m_{1} & \dots & m_{ℓ} \end{matrix}] .

Then the DKMD can be expressed compactly as

X = M V_{T ∣ μ_{1}, \dots, μ_{ℓ}} .

(15)

Table 1 summarizes the principal notations used throughout this article.

5. Computing DKMD for Known Degrees (Related Work)

In this section, we introduce the vector Prony method [16], which estimates the unknown Koopman eigenvalues

μ_{1}, \dots, μ_{ℓ}

and the Koopman mode matrix

M

that satisfy Equation (15), given the observable matrix

X

and the Koopman degree ℓ.

The procedure first computes the Koopman eigenvalues, and subsequently computes the Koopman modes based on the obtained eigenvalues.

5.1. Computing the Koopman Eigenvalues

We first introduce the characteristic polynomial associated with a DKMD, whose roots correspond to the Koopman eigenvalues.

Definition 5

(Characteristic Polynomial). For a DKMD

X = M V_{T ∣ μ_{1}, \dots, μ_{ℓ}}

, the polynomial

f (X) = \prod_{i = 1}^{ℓ} (X - μ_{i})

is called the characteristic polynomial of the DKMD.

If the characteristic polynomial of a DKMD is expressed as

f (X) = X^{ℓ} + a_{ℓ - 1} X^{ℓ - 1} + \dots + a_{0},

then the following recurrence relation holds:

Proposition 3.

For all integers j with

0 \leq j \leq T - ℓ - 1

,

x_{j + ℓ} + a_{ℓ - 1} x_{j + ℓ - 1} + \dots + a_{0} x_{j} = 0 .

Proof.

The statement follows from

x_{j + ℓ} + a_{ℓ - 1} x_{j + ℓ - 1} + \dots + a_{0} x_{j} = M diag (μ_{1}^{j}, \dots, μ_{ℓ}^{j}) {[\begin{matrix} f (μ_{1}) & \dots & f (μ_{ℓ}) \end{matrix}]}^{T} = 0 .

□
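The recurrence of Proposition 3 can be verified on synthetic data. The sketch below assumes a DKMD built from three arbitrarily chosen eigenvalues and random two-dimensional modes, and uses `np.poly` to obtain the coefficients of the characteristic polynomial.

```python
import numpy as np

rng = np.random.default_rng(3)
m_dim, ell, T = 2, 3, 10
mu = np.array([0.9 * np.exp(0.5j), 0.9 * np.exp(-0.5j), 1.05])   # distinct eigenvalues
M = rng.standard_normal((m_dim, ell)) + 1j * rng.standard_normal((m_dim, ell))

V = np.vander(mu, T, increasing=True)      # ell x T Vandermonde matrix
X = M @ V                                  # Equation (15): column t is x_t

# np.poly returns monic coefficients, highest power first: [1, a_{ell-1}, ..., a_0].
a = np.poly(mu)[::-1]                      # reordered as [a_0, ..., a_{ell-1}, 1]

# Left-hand side of Proposition 3 for j = 0, ..., T - ell - 1.
residuals = np.array([
    sum(a[k] * X[:, j + k] for k in range(ell + 1)) for j in range(T - ell)
])
```

All entries of `residuals` vanish up to rounding error, one m-dimensional vector per admissible shift j.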

Using Hankel matrices (Definition 6), the statement of Proposition 3 can be compactly expressed as

H_{0 ∣ X_{ℓ}^{T - 1}} = - [\begin{matrix} a_{0} & \dots & a_{ℓ - 1} \end{matrix}] H_{ℓ - 1 ∣ X_{0}^{T - 2}} .

(16)

Both sides have T - ℓ block columns, one for each index j in Proposition 3: the left-hand side collects x_{ℓ}, \dots, x_{T - 1}, while the right-hand side involves only x_{0}, \dots, x_{T - 2}.

Definition 6

(Hankel Matrix). Given a matrix

X = [x_{0}, \dots, x_{T - 1}]

, the kth Hankel matrix of

X

is defined for

0 \leq k \leq T - 1

as

H_{k ∣ X} = [\begin{matrix} {(X_{0}^{k})}^{T} & {(X_{1}^{k + 1})}^{T} & \dots & {(X_{T - k - 1}^{T - 1})}^{T} \end{matrix}],

where

H_{k ∣ X} \in C^{(k + 1) \times m (T - k)}

.

Although

(a_{0}, \dots, a_{ℓ - 1})

that satisfy Equation (16) may not exist, and even if they exist they may not be unique, the following least-squares optimality always holds:

- H_{0 ∣ X_{ℓ}^{T - 1}} H_{ℓ - 1 ∣ X_{0}^{T - 2}}^{+} \in \underset{a_{0}, \dots, a_{ℓ - 1}}{arg min} {∥H_{0 ∣ X_{ℓ}^{T - 1}} + [\begin{matrix} a_{0} & \dots & a_{ℓ - 1} \end{matrix}] H_{ℓ - 1 ∣ X_{0}^{T - 2}}∥}_{F} .

That is, the pseudoinverse yields a least-squares estimate of the characteristic coefficients even when Equation (16) is inconsistent.
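The eigenvalue step can be sketched numerically as follows. This is a minimal illustration rather than the authors' implementation: it assumes NumPy, the helper names `hankel` and `prony_eigenvalues` are hypothetical, and both Hankel matrices are trimmed to T - ℓ block columns so that they match the range of j in Proposition 3.

```python
import numpy as np

def hankel(X, k):
    """k-th Hankel matrix of an m x T array X, laid out as (k+1) x m(T-k)."""
    X = np.atleast_2d(np.asarray(X))
    T = X.shape[1]
    # Block i is the transpose of the columns x_i, ..., x_{i+k}.
    return np.hstack([X[:, i:i + k + 1].T for i in range(T - k)])

def prony_eigenvalues(X, ell):
    """Least-squares characteristic coefficients and their roots (illustrative helper)."""
    X = np.atleast_2d(np.asarray(X, dtype=complex))
    T = X.shape[1]
    lhs = hankel(X[:, ell:], 0)          # blocks x_ell, ..., x_{T-1}
    rhs = hankel(X[:, :T - 1], ell - 1)  # blocks starting at x_0, ..., x_{T-ell-1}
    a = (-lhs @ np.linalg.pinv(rhs)).ravel()   # (a_0, ..., a_{ell-1})
    # np.roots expects the highest-degree coefficient first.
    monic = np.concatenate(([1.0], a[::-1]))
    return a, np.roots(monic)

# Fibonacci-type data obeys x_{j+2} - x_{j+1} - x_j = 0.
a, mu = prony_eigenvalues([1, 1, 2, 3, 5, 8, 13], ell=2)
print(a.real, np.sort(mu.real))
```

For this input the recovered coefficients correspond to the polynomial X^2 - X - 1, whose roots are the two eigenvalues.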

5.2. Computing the Koopman Modes

Once the eigenvalues

μ_{1}, \dots, μ_{ℓ}

are computed, the linear equation (14) determines

M

. Since

X V_{T ∣ μ_{1}, \dots, μ_{ℓ}}^{+} \in \underset{M}{arg min} {∥X - M V_{T ∣ μ_{1}, \dots, μ_{ℓ}}∥}_{F}

holds, if a solution exists,

X V_{T ∣ μ_{1}, \dots, μ_{ℓ}}^{+}

provides one possible solution, which is not necessarily unique.

Equation (15) can be viewed as a factorization of linear mappings:

X : C^{T} \to C^{m}, V_{T ∣ μ_{1}, \dots, μ_{ℓ}} : C^{T} \to C^{ℓ}, M : C^{ℓ} \to C^{m} .

The condition for the existence of such an

M

is

ker X \supseteq ker V_{T ∣ μ_{1}, \dots, μ_{ℓ}} .

Under this condition, the solution is unique if and only if

V_{T ∣ μ_{1}, \dots, μ_{ℓ}}

is surjective, which holds if and only if

ℓ \leq T

and the nodes

μ_{1}, \dots, μ_{ℓ}

are pairwise distinct.

Proposition 4.

For an observable matrix

X

and a Koopman degree ℓ with

ℓ \leq T

, a DKMD of

X

with Koopman eigenvalues

μ_{1}, \dots, μ_{ℓ}

, is unique if it exists.
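The mode computation above can be illustrated with a short sketch. It assumes NumPy and takes the Vandermonde matrix to be the ℓ × T matrix whose (i, j) entry is μ_i^j, which is the layout implied by x_j = Σ_i m_i μ_i^j; the function name `koopman_modes` is hypothetical.

```python
import numpy as np

def koopman_modes(X, mu):
    """Return M = X V^+ and the Frobenius residual ||X - M V||_F."""
    X = np.atleast_2d(np.asarray(X, dtype=complex))
    T = X.shape[1]
    # V[i, j] = mu_i ** j, an ell x T Vandermonde matrix.
    V = np.vander(np.asarray(mu, dtype=complex), N=T, increasing=True)
    M = X @ np.linalg.pinv(V)
    return M, np.linalg.norm(X - M @ V)

phi, psi = (1 + 5 ** 0.5) / 2, (1 - 5 ** 0.5) / 2
M, residual = koopman_modes([1, 1, 2, 3, 5, 8, 13], [phi, psi])
print(M.real, residual)
```

A residual near zero indicates that the kernel condition holds for the chosen eigenvalues; a clearly nonzero residual indicates that no exact M exists for them.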

6. The Contributions of This Article

Given an observable matrix, the vector Prony method introduced in the previous section computes a DKMD for a specified Koopman degree ℓ. However, a DKMD does not always exist for the given degree, and even when it does, it may not be unique.

Definition 7

(Feasible Degree). A Koopman degree ℓ is said to be feasible for the observables if a DKMD of degree ℓ exists for the given observable matrix.

Although the Koopman degree ℓ must satisfy

ℓ < T

(Proposition 2), a DKMD may exist for multiple values of ℓ. Thus, selecting the optimal feasible degree is essential to obtain an optimal DKMD.

For this selection, we consider two independent principles:

Minimality: The optimal degree should be the smallest among all feasible degrees. This principle is analogous to Occam’s razor, favoring the simplest representation that adequately explains the observations.
Uniqueness: The optimal degree should correspond to a unique DKMD. If multiple DKMDs exist for a given ℓ, then, as described in a later section, the set of such decompositions forms a continuum in which different DKMDs yield distinct eigenvalues and modes. Consequently, any particular DKMD extracted from this continuum, such as one obtained by the vector Prony method, may fail to reproduce the true dynamics precisely.

In this article, we demonstrate that if a degree satisfies the uniqueness criterion, it also satisfies the minimality criterion, but the converse does not necessarily hold. Thus, between these two principles, we adopt the uniqueness criterion as the standard for selecting the optimal Koopman degree.

Definition 8

(Uniquely Feasible Degree). A Koopman degree is said to be uniquely feasible if a DKMD of that degree exists and is unique for the given observables.

In summary, the objective of this paper is to establish a theoretical framework for uniquely feasible degrees. Specifically, we demonstrate and present the following:

A uniquely feasible degree for a given observable matrix, if it exists, is the smallest among all feasible degrees.
Several structural properties of uniquely feasible degrees lead to computationally efficient algorithms for determining them.
These algorithms are further extended to handle noisy observables via least-squares formulations.

7. Finding Uniquely Feasible Degrees

7.1. Key Indices: Hankel Dimension and Codimension

We first summarize the results of the previous sections in a theorem that provides a necessary and sufficient condition for an observable matrix to admit a DKMD.

For notational convenience, we first introduce the following definition.

Definition 9

(Square-free coefficient vector). A vector

(\begin{matrix} a_{0} & \dots & a_{n - 1} & 1 \end{matrix}) \in C^{n + 1}

is called a square-free coefficient vector if the algebraic equation

X^{n} + a_{n - 1} X^{n - 1} + \dots + a_{0} = 0

has no repeated roots.

Theorem 1

(Feasibility condition). Let

X

be an observable matrix and

ℓ < T

. The following statements are equivalent:

1. $X$ admits a DKMD with Koopman degree ℓ; equivalently, ℓ is feasible for $X$ .
2. There exists a square-free coefficient vector $(\begin{matrix} a_{0} & \dots & a_{ℓ - 1} & 1 \end{matrix})$ satisfying

$(\begin{matrix} a_{0} & \dots & a_{ℓ - 1} & 1 \end{matrix}) H_{ℓ ∣ X} = 0 .$

(17)

Proof.

(1 ⇒ 2) Suppose

X

admits a DKMD with Koopman eigenvalues

μ_{1}, \dots, μ_{ℓ}

. The coefficient vector

a = (\begin{matrix} a_{0} & \dots & a_{ℓ - 1} & 1 \end{matrix})

of the characteristic polynomial of this DKMD is square-free by the definition of DKMD (Definition 3), and the assertion of Proposition 3 can be restated as

a {(X_{i}^{i + ℓ})}^{T} = 0 for i = 0, \dots, T - ℓ - 1,

which further implies Equation (17).

(2 ⇒ 1) Let

μ_{1}, \dots, μ_{ℓ}

denote the distinct roots of the square-free polynomial

X^{ℓ} + a_{ℓ - 1} X^{ℓ - 1} + \dots + a_{0} = 0 .

We consider

X, V_{T ∣ μ_{1}, \dots, μ_{ℓ}},

and

M

as representing linear mappings

F : C^{T} \to C^{m}

,

G : C^{T} \to C^{ℓ}

, and

H : C^{ℓ} \to C^{m}

, respectively. Then, the rank-nullity theorem implies

ker G = ⨁_{i = 0}^{T - ℓ - 1} C {(\begin{matrix} 0^{i} & a & 0^{T - ℓ - 1 - i} \end{matrix})}^{T},

and

ker F \supseteq ker G

follows from

X {(\begin{matrix} 0^{i} & a & 0^{T - ℓ - 1 - i} \end{matrix})}^{T} = X_{i}^{i + ℓ} a^{T} = 0

for all

i = 0, \dots, T - ℓ - 1

, which implies the existence of H. □

The square-free coefficient vector in Equation (17) lies in the orthogonal complement

V {(H_{ℓ ∣ X})}^{⊥}

, which is the subspace of

C^{ℓ + 1}

consisting of all vectors orthogonal to every column of

H_{ℓ ∣ X}

. Theorem 1 thus shows that the existence of a DKMD depends on the structure of

V (H_{k ∣ X})

and its orthogonal complement. This motivates the following key indices.

Definition 10

(Hankel dimension and codimension). Given an observable matrix

X

, the Hankel dimension and codimension of order k (

k < T

) are defined as

\dim_{H} (k ∣ X) = rank H_{k ∣ X}, {codim}_{H} (k ∣ X) = k + 1 - \dim_{H} (k ∣ X) .
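As a hedged illustration of Definition 10, both indices can be computed from a numerical rank. The sketch assumes NumPy; the helper names and the absolute rank tolerance `tol` are illustrative choices that should be adapted to the scale and noise level of the data.

```python
import numpy as np

def hankel(X, k):
    """k-th Hankel matrix of an m x T array X, laid out as (k+1) x m(T-k)."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    T = X.shape[1]
    return np.hstack([X[:, i:i + k + 1].T for i in range(T - k)])

def hankel_dim(X, k, tol=1e-8):
    # Numerical rank: singular values below tol are treated as zero.
    return int(np.linalg.matrix_rank(hankel(X, k), tol=tol))

def hankel_codim(X, k, tol=1e-8):
    return k + 1 - hankel_dim(X, k, tol)

# A single exponential x_j = 2^j has rank-one Hankel matrices for every k.
X = [1, 2, 4, 8, 16]
dims = [hankel_dim(X, k) for k in range(5)]
codims = [hankel_codim(X, k) for k in range(5)]
print(dims, codims)
```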

Using these indices, we can restate Theorem 1 as follows:

Corollary 1.

For an observable matrix

X

and a Koopman degree

ℓ < T

, a necessary condition for the existence of an ℓ-degree DKMD of

X

is

{codim}_{H} (ℓ ∣ X) \geq 1 .

However, the condition

{codim}_{H} (ℓ ∣ X) \geq 1

is not sufficient, for the following two reasons:

Even if we can find $(\begin{matrix} a_{0} & \dots & a_{ℓ - 1} & a_{ℓ} \end{matrix})$ with

$(\begin{matrix} a_{0} & \dots & a_{ℓ - 1} & a_{ℓ} \end{matrix}) H_{ℓ ∣ X} = 0,$

it may happen that $a_{ℓ} = 0$ , meaning that the polynomial corresponding to $(\begin{matrix} a_{0} & \dots & a_{ℓ - 1} & a_{ℓ} \end{matrix})$ is of degree lower than ℓ, which cannot induce a DKMD of degree ℓ.
Even if

$(\begin{matrix} a_{0} & \dots & a_{ℓ - 1} & 1 \end{matrix}) H_{ℓ ∣ X} = 0,$

the resulting polynomial

$X^{ℓ} + a_{ℓ - 1} X^{ℓ - 1} + \dots + a_{0} = 0$

may have repeated roots, rendering it unsuitable as a characteristic polynomial.

Our purpose here is to identify a Koopman degree ℓ that admits a unique DKMD. We can represent the condition for a uniquely feasible degree using the Hankel codimension, assuming that ℓ is feasible, that is, an ℓ-degree DKMD exists.

Corollary 2.

For a feasible degree ℓ of an observable matrix

X

with

ℓ < T

, the following hold:

1. The DKMD is unique if and only if ${codim}_{H} (ℓ ∣ X) = 1$ .
2. The set of ℓ-degree DKMDs forms a continuum if and only if ${codim}_{H} (ℓ ∣ X) > 1$ .

Proof.

Since ℓ is feasible,

V {(H_{ℓ ∣ X})}^{⊥}

contains a nonzero vector, and there exists a coefficient vector

{(\begin{matrix} a_{0} & \dots & a_{ℓ - 1} & 1 \end{matrix})}^{T} \in V {(H_{ℓ ∣ X})}^{⊥}

which determines the characteristic polynomial of a DKMD of

X

.

If

{codim}_{H} (ℓ ∣ X) = 1

, that is, if

dim V {(H_{ℓ ∣ X})}^{⊥} = 1

, then only

(\begin{matrix} a_{0} & \dots & a_{ℓ - 1} & 1 \end{matrix})

can determine a characteristic polynomial, implying that the DKMD is unique by Proposition 4.

On the other hand, if

{codim}_{H} (ℓ ∣ X) > 1

, there exists a coefficient vector

(\begin{matrix} b_{0} & \dots & b_{ℓ - 1} & 1 \end{matrix}) \in V {(H_{ℓ ∣ X})}^{⊥} ∖ C (\begin{matrix} a_{0} & \dots & a_{ℓ - 1} & 1 \end{matrix}) .

When we define

f_{α} (X) = (1 - α) f (X) + α (X^{ℓ} + b_{ℓ - 1} X^{ℓ - 1} + \dots + b_{0}),

since

f_{0} (X)

is square-free, there exists

ϵ > 0

such that

f_{α} (X)

remains square-free for any

α \in (- ϵ, ϵ)

. In particular, each distinct value of

α

yields a distinct DKMD. □

7.2. The Hankel Dimension and Codimension for $m = 1$

In this section, we investigate the Hankel dimension and codimension in the restricted case where

m = 1

, i.e., when

X

consists of a single row with T components. These results will be extended to the general case

m \geq 1

in Section 7.4.6.

Lemma 1.

Let

X

be an observable matrix with one row and T columns. Furthermore, suppose that

X

admits a DKMD of the form

X = [\begin{matrix} m_{1} & \dots & m_{ℓ} \end{matrix}] V_{T ∣ μ_{1}, \dots, μ_{ℓ}}

such that

ℓ \leq \frac{T + 1}{2}

and each

m_{i}

is nonzero. Then, the leftmost

ℓ \times ℓ

block submatrix of

H_{ℓ - 1 ∣ X}

has nonzero determinant.

Proof.

The leftmost

ℓ \times ℓ

block submatrix is expressed as:

\begin{matrix} [\begin{matrix} {(X_{0}^{ℓ - 1})}^{T} & {(X_{1}^{ℓ})}^{T} & \dots & {(X_{ℓ - 1}^{2 ℓ - 2})}^{T} \end{matrix}] & = {(V_{ℓ ∣ μ_{1}, \dots, μ_{ℓ}})}^{T} diag (m_{1}, \dots, m_{ℓ}) V_{ℓ ∣ μ_{1}, \dots, μ_{ℓ}} . \end{matrix}

The determinant of this matrix is nonzero, because

μ_{1}, \dots, μ_{ℓ}

are mutually distinct, implying

det V_{ℓ ∣ μ_{1}, \dots, μ_{ℓ}} \neq 0,

and

m_{1}, \dots, m_{ℓ}

are nonzero, implying

det diag (m_{1}, \dots, m_{ℓ}) \neq 0

. □

Theorem 2.

Let

X

,

m_{1}, \dots, m_{ℓ}

, and

μ_{1}, \dots, μ_{ℓ}

be as in Lemma 1. Assume further that

ℓ \leq \frac{T + 1}{2}

. Then, the Hankel dimension and codimension are given by:

\begin{matrix} \dim_{H} (k ∣ X) & = \begin{cases} k + 1, & for k \in \{0, \dots, ℓ - 1\}; \\ ℓ, & for k \in \{ℓ, \dots, T - ℓ\}; \\ T - k, & for k \in \{T - ℓ + 1, \dots, T - 1\}; \end{cases} \\ {codim}_{H} (k ∣ X) & = \begin{cases} 0, & for k \in \{0, \dots, ℓ - 1\}; \\ k + 1 - ℓ, & for k \in \{ℓ, \dots, T - ℓ\}; \\ 2 k - T + 1, & for k \in \{T - ℓ + 1, \dots, T - 1\} . \end{cases} \end{matrix}

Proof.

Note that

H_{k ∣ X}

consists of

k + 1

rows and

T - k

columns.

If

0 \leq k \leq ℓ - 1

, equivalently if

k + 1 \leq ℓ

and

T - k \geq ℓ

, the leftmost

(k + 1) \times ℓ

submatrix of

H_{k ∣ X}

coincides with the top

(k + 1) \times ℓ

submatrix of

H_{ℓ - 1 ∣ X}

, implying its rank is

k + 1

by Lemma 1. Thus, all

k + 1

rows of

H_{k ∣ X}

are linearly independent, and

\dim_{H} (k ∣ X) = k + 1

follows.

If

T - ℓ + 1 \leq k \leq T - 1

, equivalently if

k + 1 > ℓ

and

T - k < ℓ

, the entire

H_{k ∣ X}

has

T - k

columns. Its top

ℓ \times (T - k)

submatrix coincides with the leftmost

ℓ \times (T - k)

submatrix of

H_{ℓ - 1 ∣ X}

, implying its rank is

T - k

by Lemma 1. Thus, all

T - k

columns of

H_{k ∣ X}

are linearly independent, and

\dim_{H} (k ∣ X) = T - k

follows.

If

ℓ \leq k \leq T - ℓ

, equivalently if

k + 1 > ℓ

and

T - k \geq ℓ

, the top-left

ℓ \times ℓ

submatrix of

H_{k ∣ X}

coincides with that of

H_{ℓ - 1 ∣ X}

, and hence, Lemma 1 implies that the leftmost ℓ columns of

H_{k ∣ X}

are linearly independent.

On the other hand, we let

\prod_{i = 1}^{ℓ} (X - μ_{i}) = X^{ℓ} - a_{ℓ - 1} X^{ℓ - 1} - \dots - a_{0}

denote the characteristic polynomial. Then each eigenvalue

μ_{i}

satisfies

μ_{i}^{ℓ} = a_{ℓ - 1} μ_{i}^{ℓ - 1} + \dots + a_{0}

. For the i-th element

x_{i}

of

X

with

i \geq ℓ

,

\begin{matrix} x_{i} & = \sum_{j = 1}^{ℓ} m_{j} μ_{j}^{i} = \sum_{j = 1}^{ℓ} m_{j} μ_{j}^{i - ℓ} \cdot μ_{j}^{ℓ} = \sum_{j = 1}^{ℓ} m_{j} μ_{j}^{i - ℓ} (\sum_{n = 0}^{ℓ - 1} a_{n} μ_{j}^{n}) \\ = \sum_{n = 0}^{ℓ - 1} a_{n} (\sum_{j = 1}^{ℓ} m_{j} μ_{j}^{i - ℓ + n}) = \sum_{n = 0}^{ℓ - 1} a_{n} x_{i - ℓ + n} . \end{matrix}

This recurrence relation shows that for any

i \geq ℓ

, the element

x_{i}

is determined by

x_{i - ℓ}, \dots, x_{i - 1}

. Consequently, any column of

H_{k ∣ X}

is a linear combination of the leftmost ℓ columns, which have been proven to be linearly independent. Therefore,

\dim_{H} (k ∣ X) = ℓ

.

The claims about the Hankel codimension follow directly from the definition

{codim}_{H} (k ∣ X) = k + 1 - \dim_{H} (k ∣ X)

. □

Corollary 3.

If

ℓ \leq \frac{T}{2}

, then

k = ℓ

is the unique value satisfying

{codim}_{H} (k ∣ X) = 1

.

If

ℓ = \frac{T + 1}{2}

, then

X

admits no uniquely feasible degree.
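The piecewise formula of Theorem 2 can be checked numerically on a synthetic single-row signal. The sketch below assumes NumPy; the eigenvalues (1, -1, 2), the unit modes, and the length T = 8 are arbitrary illustrative choices with ℓ = 3 ≤ (T + 1)/2, and the function names are hypothetical.

```python
import numpy as np

def hankel_codim(x, k, tol=1e-8):
    """Hankel codimension of a single-row signal x (the case m = 1)."""
    x = np.asarray(x, dtype=float)
    H = np.array([x[i:i + k + 1] for i in range(len(x) - k)]).T  # (k+1) x (T-k)
    return k + 1 - int(np.linalg.matrix_rank(H, tol=tol))

def theorem2_codim(k, ell, T):
    """Codimension predicted by Theorem 2."""
    if k <= ell - 1:
        return 0
    if k <= T - ell:
        return k + 1 - ell
    return 2 * k - T + 1

mu = np.array([1.0, -1.0, 2.0])   # pairwise distinct eigenvalues
m = np.array([1.0, 1.0, 1.0])     # nonzero modes
T, ell = 8, 3
x = np.array([np.sum(m * mu ** j) for j in range(T)])

observed = [hankel_codim(x, k) for k in range(T)]
predicted = [theorem2_codim(k, ell, T) for k in range(T)]
print(observed, predicted)
```

In this run the value 1 appears only at k = ℓ = 3, in line with Corollary 3.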

7.3. Examples

In this section, we introduce three examples of an observable matrix

X

, each demonstrating different properties regarding the existence of a uniquely feasible degree:

No Koopman degree ℓ satisfies ${codim}_{H} (ℓ ∣ X) = 1$ , meaning that no uniquely feasible degree exists (Example 2).
A Koopman degree ℓ with ${codim}_{H} (ℓ ∣ X) = 1$ exists, but the corresponding characteristic polynomial is not square-free. As a result, a uniquely feasible degree does not exist (Example 3).
A uniquely feasible degree exists, ensuring that a DKMD is uniquely determined for the degree (Example 4).

These examples illustrate the conditions under which a DKMD is uniquely determined and the role of Hankel codimension in establishing uniqueness.

Example 2.

Consider the observable matrix

X = [\begin{matrix} 1 & 1 & 1 & 1 & 3 & 5 & 7 \end{matrix}] .

The Hankel matrices for

ℓ = 1, 2, 3

are computed as:

\begin{matrix} H_{1 ∣ X} = [\begin{matrix} 1 & 1 & 1 & 1 & 3 & 5 \\ 1 & 1 & 1 & 3 & 5 & 7 \end{matrix}], H_{2 ∣ X} = [\begin{matrix} 1 & 1 & 1 & 1 & 3 \\ 1 & 1 & 1 & 3 & 5 \\ 1 & 1 & 3 & 5 & 7 \end{matrix}], and H_{3 ∣ X} = [\begin{matrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 3 \\ 1 & 1 & 3 & 5 \\ 1 & 3 & 5 & 7 \end{matrix}] . \end{matrix}

These computations yield:

{codim}_{H} (1 ∣ X) = {codim}_{H} (2 ∣ X) = {codim}_{H} (3 ∣ X) = 0,

implying that

X

admits no DKMD for these degrees by Corollary 1.

On the other hand, for

ℓ = 4, 5, 6

, we have

H_{4 ∣ X} = [\begin{matrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 3 \\ 1 & 3 & 5 \\ 3 & 5 & 7 \end{matrix}], H_{5 ∣ X} = [\begin{matrix} 1 & 1 \\ 1 & 1 \\ 1 & 1 \\ 1 & 3 \\ 3 & 5 \\ 5 & 7 \end{matrix}], and H_{6 ∣ X} = [\begin{matrix} 1 \\ 1 \\ 1 \\ 1 \\ 3 \\ 5 \\ 7 \end{matrix}] .

It follows that

{codim}_{H} (4 ∣ X) = 2, {codim}_{H} (5 ∣ X) = 4, and {codim}_{H} (6 ∣ X) = 6,

implying that DKMDs form a continuum by Corollary 2.

For example, for

ℓ = 4

,

V {(H_{4 ∣ X})}^{⊥} = C {(\begin{matrix} - 1 & - 1 & 0 & - 1 & 1 \end{matrix})}^{T} \oplus C {(\begin{matrix} - 1 & 1 & 0 & 0 & 0 \end{matrix})}^{T} .

Thus, the possible candidates for characteristic polynomials take the form:

f_{a} (x) = x^{4} - x^{3} - x - 1 + a (x - 1),

(18)

whose discriminant is computed as:

D = - 27 a^{4} + 54 a^{3} - 783 a^{2} - 900 a - 500 .

Since

f_{a} (X) = 0

has no repeated roots if and only if

D \neq 0

, it follows that 4-degree DKMDs form a continuum.
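The claims of Example 2 for ℓ = 4 can be cross-checked numerically. The sketch assumes NumPy; it verifies that the two stated vectors annihilate the Hankel matrix and compares the stated discriminant with one computed from the roots of f_a for a few sample values of a. The helper names are hypothetical.

```python
import numpy as np

def hankel(X, k):
    """k-th Hankel matrix of an m x T array X, laid out as (k+1) x m(T-k)."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    T = X.shape[1]
    return np.hstack([X[:, i:i + k + 1].T for i in range(T - k)])

H4 = hankel([1, 1, 1, 1, 3, 5, 7], 4)
v1 = np.array([-1, -1, 0, -1, 1])
v2 = np.array([-1, 1, 0, 0, 0])

def f_coeffs(a):
    # f_a(x) = x^4 - x^3 - x - 1 + a (x - 1), highest degree first.
    return np.array([1, -1, 0, -1 + a, -1 - a], dtype=float)

def numeric_discriminant(coeffs):
    # For a monic polynomial: product of squared root differences.
    r = np.roots(coeffs)
    d = 1.0 + 0j
    for i in range(len(r)):
        for j in range(i + 1, len(r)):
            d *= (r[i] - r[j]) ** 2
    return d.real

def stated_discriminant(a):
    return -27 * a**4 + 54 * a**3 - 783 * a**2 - 900 * a - 500

for a in (0, 1, 2):
    print(a, numeric_discriminant(f_coeffs(a)), stated_discriminant(a))
```

At a = 0 the polynomial factors as (x^2 + 1)(x^2 - x - 1), and both computations give a discriminant of -500.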

Example 3.

When we let

X = [\begin{matrix} 1 & 1 & - 3 & 5 & - 7 \end{matrix}]

, the Hankel matrices for

ℓ = 1, 2, 3

are computed as:

\begin{matrix} H_{1 ∣ X} = [\begin{matrix} 1 & 1 & - 3 & 5 \\ 1 & - 3 & 5 & - 7 \end{matrix}], H_{2 ∣ X} = [\begin{matrix} 1 & 1 & - 3 \\ 1 & - 3 & 5 \\ - 3 & 5 & - 7 \end{matrix}], and H_{3 ∣ X} = [\begin{matrix} 1 & 1 \\ 1 & - 3 \\ - 3 & 5 \\ 5 & - 7 \end{matrix}], \end{matrix}

implying

{codim}_{H} (1 ∣ X) = 0, {codim}_{H} (2 ∣ X) = 1, and {codim}_{H} (3 ∣ X) = 2 .

In fact, we have

V {(H_{1 ∣ X})}^{⊥} = (0)

, and furthermore,

\begin{matrix} V {(H_{2 ∣ X})}^{⊥} = C {(\begin{matrix} 1 & 2 & 1 \end{matrix})}^{T} and V {(H_{3 ∣ X})}^{⊥} = C {(\begin{matrix} 1 & 2 & 1 & 0 \end{matrix})}^{T} \oplus C {(\begin{matrix} 0 & 1 & 2 & 1 \end{matrix})}^{T} . \end{matrix}

This implies that there exists no DKMD for any of these degrees because:

1. For $ℓ = 1$ , no characteristic polynomial exists.
2. For $ℓ = 2$ , if a DKMD existed, the corresponding characteristic polynomial would be

$X^{2} + 2 X + 1 = {(X + 1)}^{2},$

which is not square-free.
3. For $ℓ = 3$ , if a DKMD existed, the corresponding characteristic polynomial would be of the form

$X^{3} + 2 X^{2} + X + a (X^{2} + 2 X + 1) = (X + a) {(X + 1)}^{2},$

which is also not square-free.

On the other hand, for

ℓ = 4

, we have

H_{4 ∣ X} = X^{T}

and

\begin{matrix} V {(H_{4 ∣ X})}^{⊥} & = C {(\begin{matrix} 7 & 0 & 0 & 0 & 1 \end{matrix})}^{T} \oplus C {(\begin{matrix} 1 & 2 & 1 & 0 & 0 \end{matrix})}^{T} \\ \oplus C {(\begin{matrix} 0 & 1 & 2 & 1 & 0 \end{matrix})}^{T} \oplus C {(\begin{matrix} 0 & 0 & 1 & 2 & 1 \end{matrix})}^{T}, \end{matrix}

implying that there exists a DKMD whose characteristic polynomial is

X^{4} + 7 = \prod_{k = 0}^{3} (X - \sqrt[4]{7} e^{\frac{i π}{4} (2 k + 1)}) .

Furthermore, all possible characteristic polynomials of DKMDs for these observables are of the form

f_{a, b, c} (X) = a (X^{4} + 7) + (1 - a) (X^{4} + 2 X^{3} + X^{2}) + b (X^{3} + 2 X^{2} + X) + c (X^{2} + 2 X + 1) .

Since

f_{a, b, c} (X)

is square-free for

(a, b, c) = (1, 0, 0)

, and since the set of

(a, b, c) \in C^{3}

for which

f_{a, b, c} (X)

is square-free is an open subset of

C^{3}

, we conclude that the set of possible four-degree DKMDs for

X

forms a continuum.
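One way to see why the repeated root in Example 3 rules out a degree-two DKMD is to note that the data follow the closed form x_j = (1 - 2j)(-1)^j, a polynomial factor times an exponential rather than a sum of two pure exponentials. The check below is an illustrative aside, assuming NumPy; the closed form is derived here and is not taken from the text.

```python
import numpy as np

x = np.array([1, 1, -3, 5, -7])
j = np.arange(len(x))

# Closed form with a factor linear in j, as produced by a double root at -1.
closed_form = (1 - 2 * j) * (-1) ** j

# Recurrence encoded by (1, 2, 1): x_{j+2} + 2 x_{j+1} + x_j = 0.
recurrence = x[2:] + 2 * x[1:-1] + x[:-2]

# Roots of X^2 + 2X + 1, both equal to -1 up to rounding.
roots = np.roots([1, 2, 1])
print(closed_form, recurrence, roots)
```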

Example 4.

We expand

X

of Example 2 by adding one more dimension to the observables. Let the observable matrix

X

be given by

X = [\begin{matrix} 1 & 1 & 2 & 3 & 5 & 8 & 13 \\ 1 & 1 & 1 & 1 & 3 & 5 & 7 \end{matrix}] .

For

ℓ = 1, 2, 3

, the corresponding Hankel matrices are computed as:

\begin{matrix} H_{1 ∣ X} = [\begin{matrix} 1 & 1 & 1 & 1 & 2 & 1 & 3 & 1 & 5 & 3 & 8 & 5 \\ 1 & 1 & 2 & 1 & 3 & 1 & 5 & 3 & 8 & 5 & 13 & 7 \end{matrix}], \\ H_{2 ∣ X} = [\begin{matrix} 1 & 1 & 1 & 1 & 2 & 1 & 3 & 1 & 5 & 3 \\ 1 & 1 & 2 & 1 & 3 & 1 & 5 & 3 & 8 & 5 \\ 2 & 1 & 3 & 1 & 5 & 3 & 8 & 5 & 13 & 7 \end{matrix}], and H_{3 ∣ X} = [\begin{matrix} 1 & 1 & 1 & 1 & 2 & 1 & 3 & 1 \\ 1 & 1 & 2 & 1 & 3 & 1 & 5 & 3 \\ 2 & 1 & 3 & 1 & 5 & 3 & 8 & 5 \\ 3 & 1 & 5 & 3 & 8 & 5 & 13 & 7 \end{matrix}], \end{matrix}

implying

{codim}_{H} (1 ∣ X) = {codim}_{H} (2 ∣ X) = {codim}_{H} (3 ∣ X) = 0 .

On the other hand,

{codim}_{H} (4 ∣ X) = 1, {codim}_{H} (5 ∣ X) = 2, and {codim}_{H} (6 ∣ X) = 5

follows from

H_{4 ∣ X} = [\begin{matrix} 1 & 1 & 1 & 1 & 2 & 1 \\ 1 & 1 & 2 & 1 & 3 & 1 \\ 2 & 1 & 3 & 1 & 5 & 3 \\ 3 & 1 & 5 & 3 & 8 & 5 \\ 5 & 3 & 8 & 5 & 13 & 7 \end{matrix}], H_{5 ∣ X} = [\begin{matrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 2 & 1 \\ 2 & 1 & 3 & 1 \\ 3 & 1 & 5 & 3 \\ 5 & 3 & 8 & 5 \\ 8 & 5 & 13 & 7 \end{matrix}], and H_{6 ∣ X} = [\begin{matrix} 1 & 1 \\ 1 & 1 \\ 2 & 1 \\ 3 & 1 \\ 5 & 3 \\ 8 & 5 \\ 13 & 7 \end{matrix}] .

Therefore, if

X

admits a uniquely feasible degree, it must be four. In fact,

V {(H_{4 ∣ X})}^{⊥} = C {(\begin{matrix} - 1 & - 1 & 0 & - 1 & 1 \end{matrix})}^{T}

holds, and the corresponding characteristic polynomial is

f (x) = x^{4} - x^{3} - x - 1,

which is square-free, implying that

ℓ = 4

is uniquely feasible.
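Example 4 can be reproduced end to end with a short sketch: compute the codimensions, take the degree with codimension one, read the characteristic coefficients from the left null vector of the Hankel matrix, and recover the modes. This assumes NumPy and uses an SVD to obtain the null vector, which is an implementation choice made here and not necessarily the authors' algorithm.

```python
import numpy as np

def hankel(X, k):
    """k-th Hankel matrix of an m x T array X, laid out as (k+1) x m(T-k)."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    T = X.shape[1]
    return np.hstack([X[:, i:i + k + 1].T for i in range(T - k)])

X = np.array([[1, 1, 2, 3, 5, 8, 13],
              [1, 1, 1, 1, 3, 5, 7]], dtype=float)
T = X.shape[1]

codims = [k + 1 - int(np.linalg.matrix_rank(hankel(X, k), tol=1e-8))
          for k in range(T)]
ell = codims.index(1)                    # degree with Hankel codimension one

# Left null vector of H_ell: last left singular vector (smallest singular value).
U, s, Vh = np.linalg.svd(hankel(X, ell))
a = U[:, -1].conj() / U[-1, -1].conj()   # scaled to (a_0, ..., a_{ell-1}, 1)
# If the last entry were (near) zero, this scaling would fail and ell is infeasible.

mu = np.roots(a[::-1])                   # roots of the characteristic polynomial
gap = min(abs(mu[i] - mu[j]) for i in range(ell) for j in range(i + 1, ell))

V = np.vander(mu, N=T, increasing=True)  # V[i, j] = mu_i ** j
M = X @ np.linalg.pinv(V)
residual = np.linalg.norm(X - M @ V)
print(codims, ell, np.round(a, 6), gap, residual)
```

A strictly positive `gap` confirms numerically that the roots are distinct, and a residual near zero confirms that the resulting factorization reproduces X.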

7.4. Important Properties of the Hankel Dimension and Codimension

This subsection introduces the properties of the Hankel dimension and codimension that underpin our algorithms for determining uniquely feasible degrees; the main ones are summarized below. For convenience, we denote by L the smallest degree with positive Hankel codimension, which is the candidate for the minimum feasible degree:

L = min {ℓ \in [0, T) ∣ {codim}_{H} (ℓ ∣ X) > 0} .

(19)

Best Possible Upper Bound of a Uniquely Feasible Degree.

Let

r = rank X

. If

ℓ > \frac{r T}{r + 1}

, then

{codim}_{H} (ℓ ∣ X) > 1

. Hence, no uniquely feasible degree can exceed

\frac{r T}{r + 1}

, which is also the sharpest possible upper bound.

Monotonic Increase of the Hankel Codimension.

The Hankel codimension

{codim}_{H} (ℓ ∣ X)

is strictly increasing with respect to ℓ over the interval

[L, T)

.

Equivalence Between Unique and Minimal Feasibility.

The monotonicity of the codimension implies that, if ℓ is uniquely feasible, then

ℓ = L

. In particular, if

{codim}_{H} (L ∣ X) > 1

, no uniquely feasible degree exists.

Saturation of the Hankel Dimension.

If

L \leq \frac{T}{2}

and

ℓ \in [L, T - L]

, then

\dim_{H} (ℓ ∣ X) = L, {codim}_{H} (ℓ ∣ X) = ℓ + 1 - L .

In particular,

{codim}_{H} (L ∣ X) = 1

holds, which implies L is the only candidate for a uniquely feasible degree.
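The properties summarized above suggest a simple screening procedure, sketched here under the assumption that NumPy is available: compute the codimension profile, locate L from Equation (19), and test whether the codimension at L equals one. The function names are hypothetical, and a square-freeness check of the resulting polynomial is still required afterwards.

```python
import numpy as np

def hankel(X, k):
    """k-th Hankel matrix of an m x T array X, laid out as (k+1) x m(T-k)."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    T = X.shape[1]
    return np.hstack([X[:, i:i + k + 1].T for i in range(T - k)])

def codim_profile(X, tol=1e-8):
    X = np.atleast_2d(np.asarray(X, dtype=float))
    return [k + 1 - int(np.linalg.matrix_rank(hankel(X, k), tol=tol))
            for k in range(X.shape[1])]

def candidate_degree(profile):
    """Return (L, flag): L as in Equation (19) and whether codim_H(L | X) == 1."""
    # Raises StopIteration if every codimension is zero.
    L = next(k for k, c in enumerate(profile) if c > 0)
    return L, profile[L] == 1

def strictly_increasing_from(profile, L):
    tail = profile[L:]
    return all(b > a for a, b in zip(tail, tail[1:]))

profile3 = codim_profile([1, 1, -3, 5, -7])        # data of Example 3
profile2 = codim_profile([1, 1, 1, 1, 3, 5, 7])    # data of Example 2
print(profile3, candidate_degree(profile3))
print(profile2, candidate_degree(profile2))
```

For the data of Example 3 the screening returns L = 2 with codimension one, yet that degree is still rejected later because its polynomial has a repeated root; for Example 2 the codimension at L = 4 is already two.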

7.4.1. Invariance under Basis Transformations

For

X \in C^{m \times T}

,

Y \in C^{n \times T}

, and

A \in C^{n \times m}

, we assume

Y = A X .

Their corresponding Hankel matrices are related by

H_{k ∣ Y} = H_{k ∣ X} diag (\underbrace{A^{T}, \dots, A^{T}}_{T - k}),

where

diag (\underbrace{A^{T}, \dots, A^{T}}_{T - k})

is the block-diagonal

m (T - k) \times n (T - k)

matrix given by

diag (\underbrace{A^{T}, \dots, A^{T}}_{T - k}) = [\begin{matrix} A^{T} & & 0 \\ & ⋱ & \\ 0 & & A^{T} \end{matrix}] .

This implies, in particular,

\dim_{H} (k ∣ Y) \leq \dim_{H} (k ∣ X) and {codim}_{H} (k ∣ Y) \geq {codim}_{H} (k ∣ X) .

Furthermore, if a square-free polynomial

f (x) = x^{k} + a_{k - 1} x^{k - 1} + \dots + a_{0}

is a characteristic polynomial of a DKMD for

X

, it is also a characteristic polynomial of a DKMD for

Y

. In fact, we have

(\begin{matrix} a_{0} & \dots & a_{k - 1} & 1 \end{matrix}) H_{k ∣ Y} = (\begin{matrix} a_{0} & \dots & a_{k - 1} & 1 \end{matrix}) H_{k ∣ X} diag (\underbrace{A^{T}, \dots, A^{T}}_{T - k}) = 0 .

Furthermore, if in addition there exists

B \in C^{m \times n}

such that

X = B Y,

then

\dim_{H} (k ∣ Y) = \dim_{H} (k ∣ X) and {codim}_{H} (k ∣ Y) = {codim}_{H} (k ∣ X)

holds, and the correspondence of characteristic polynomials is bijective.

The condition for

Y = A X

is that the row space of

Y

is contained in the row space of

X

, and symmetrically, the condition for

X = B Y

is that the row space of

Y

contains the row space of

X

. Hence, both

Y = A X

and

X = B Y

hold if and only if the row space of

Y

and the row space of

X

are identical. In other words, under the hypothesis

Y = A X

, the necessary and sufficient condition for the existence of

B \in C^{m \times n}

such that

X = B Y

is

rank Y = rank X

.

If

Y = A X

and

rank Y = rank X

hold, the minimum

{min}_{B} {∥ X - B Y ∥}_{F}

is zero, and hence,

{∥ X - X Y^{+} Y ∥}_{F} = 0

holds, implying

X = X Y^{+} Y .

Furthermore, this bijective correspondence between characteristic polynomials yields a bijective correspondence between DKMDs. In fact,

X = M V_{T ∣ μ_{1}, \dots, μ_{k}},

which is a DKMD of

X

, yields

Y = (A M) V_{T ∣ μ_{1}, \dots, μ_{k}},

which is a DKMD of

Y

. Conversely, a DKMD of

Y

Y = M V_{T ∣ μ_{1}, \dots, μ_{k}}

corresponds to a DKMD of

X

X = (X Y^{+} M) V_{T ∣ μ_{1}, \dots, μ_{k}} .

The latter correspondence is the inverse of the former by Proposition 4.

Thus, we have:

Theorem 3.

If

rank Y = rank X

holds for

Y = A X

, then the following statements hold:

1. $\dim_{H} (k ∣ X) = \dim_{H} (k ∣ Y)$ for each $k \in \{0, 1, \dots, T - 1\}$ .
2. ${codim}_{H} (k ∣ X) = {codim}_{H} (k ∣ Y)$ for each $k \in \{0, 1, \dots, T - 1\}$ .
3. The set of characteristic polynomials for $X$ is identical with that for $Y$ .
4. A DKMD $X = M V_{T ∣ μ_{1}, \dots, μ_{k}}$ of $X$ can be converted to a DKMD $Y = (A M) V_{T ∣ μ_{1}, \dots, μ_{k}}$ of $Y$ , while a DKMD $Y = M V_{T ∣ μ_{1}, \dots, μ_{k}}$ of $Y$ can be converted to a DKMD $X = (X Y^{+} M) V_{T ∣ μ_{1}, \dots, μ_{k}}$ of $X$ . These conversions yield a bijective correspondence between the set of DKMDs for $X$ and that for $Y$ .

As an application of Theorem 3, the following two cases are particularly important:

When $A$ is a nonsingular $m \times m$ matrix, we have $rank Y = rank X$ automatically, and thus Theorem 3 provides the invariance of the Hankel dimension, Hankel codimension, characteristic polynomials, and DKMDs under basis transformations in $C^{m}$ .
When $A \in C^{r \times m}$ with $r = rank X$ is selected to satisfy $rank (A X) = rank X$ , the matrix $Y = A X$ has fewer rows than $X$ , and thus $Y$ requires less computation to obtain DKMDs than $X$ .
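The row-reduction case can be illustrated as follows. The sketch assumes NumPy and picks A as the conjugate transpose of the leading r left singular vectors of X, which is one convenient choice satisfying rank (A X) = rank X; the mixing matrix `C` and the helper names are hypothetical.

```python
import numpy as np

def hankel(X, k):
    """k-th Hankel matrix of an m x T array X, laid out as (k+1) x m(T-k)."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    T = X.shape[1]
    return np.hstack([X[:, i:i + k + 1].T for i in range(T - k)])

def codim_profile(X, tol=1e-8):
    X = np.atleast_2d(np.asarray(X, dtype=float))
    return [k + 1 - int(np.linalg.matrix_rank(hankel(X, k), tol=tol))
            for k in range(X.shape[1])]

# Four observables that are linear mixtures of two underlying rows (rank 2).
base = np.array([[1, 1, 2, 3, 5, 8, 13],
                 [1, 1, 1, 1, 3, 5, 7]], dtype=float)
C = np.array([[1, 0], [0, 1], [1, 1], [2, -1]], dtype=float)
X = C @ base

r = int(np.linalg.matrix_rank(X, tol=1e-8))
U, s, Vh = np.linalg.svd(X, full_matrices=False)
A = U[:, :r].conj().T          # r x m, so Y = A X has only r rows
Y = A @ X
B = X @ np.linalg.pinv(Y)      # recovers X = B Y when rank Y = rank X

print(r, codim_profile(X), codim_profile(Y))
```

Both profiles coincide, so the search for a uniquely feasible degree can be carried out on the smaller matrix Y.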

7.4.3. The Best Possible Upper Bound for a Uniquely Feasible Degree

We have seen that a uniquely feasible degree is always less than T; this bound can be sharpened to the best possible one. We begin with the following observation:

Proposition 5.

For an observable matrix

X

with T columns, we have

r = rank X < T

if

X

admits a uniquely feasible degree.

Proof.

If ℓ is a uniquely feasible degree, Proposition 2 implies

ℓ < T

. On the other hand,

X = M V_{T ∣ μ_{1}, \dots, μ_{ℓ}}

implies

r \leq ℓ

. The assertion follows. □

Based on this proposition, we now establish the best possible upper bound for a Koopman degree that admits a unique DKMD.

Theorem 5.

If

{codim}_{H} (ℓ ∣ X) = 1

, then

ℓ \leq \frac{r T}{r + 1}

. Furthermore,

⌊ \frac{r T}{r + 1} ⌋

is the best possible upper bound for a uniquely feasible degree.

Proof.

When we take

Y = A X \in C^{r \times T}

with

rank Y = r

, Theorem 3 implies

\dim_{H} (ℓ ∣ X) = \dim_{H} (ℓ ∣ Y) = rank H_{ℓ ∣ Y} \leq r (T - ℓ),

equivalently,

{codim}_{H} (ℓ ∣ X) \geq ℓ + 1 - r (T - ℓ) .

Since

{codim}_{H} (ℓ ∣ X) = 1

by hypothesis,

ℓ + 1 - r (T - ℓ) \leq 1

, which yields

ℓ \leq \frac{r T}{r + 1} .

To show that

⌊ \frac{r T}{r + 1} ⌋

is the best possible upper bound for a uniquely feasible degree, we construct an

r \times T

observable matrix

X

with

rank X = r

and

\dim_{H} (ℓ ∣ X) = ℓ

.

First, we determine a sufficiently long series

Y

with mutually distinct eigenvalues

(μ_{1}, \dots, μ_{ℓ})

and nonzero modes

(m_{1}, \dots, m_{ℓ})

:

Y = [\begin{matrix} x_{0} & \dots & x_{\hat{T} - 1} \end{matrix}] = [\begin{matrix} m_{1} & \dots & m_{ℓ} \end{matrix}] V_{\hat{T} ∣ μ_{1}, \dots, μ_{ℓ}},

where

\hat{T}

is chosen sufficiently large so that we can construct an observable matrix

X = [\begin{matrix} x_{a_{1}} & \dots & x_{a_{1} + T - 1} \\ x_{a_{2}} & \dots & x_{a_{2} + T - 1} \\ ⋮ & ⋱ & ⋮ \\ x_{a_{r}} & \dots & x_{a_{r} + T - 1} \end{matrix}] .

Without loss of generality, we let

0 = a_{1} < a_{2} < \dots < a_{r}

and further suppose

a_{i + 1} - a_{i} \leq T - ℓ

for

i = 1, \dots, r - 1

.

We now verify that the columns of

H_{ℓ ∣ X}

correspond exactly to the first

a_{r} + T - ℓ

columns of

H_{ℓ ∣ Y}

. The i-th row of

X

contributes

T - ℓ

consecutive columns to

H_{ℓ ∣ X}

, where the last such column is

{(x_{a_{i} + T - ℓ - 1} \dots x_{a_{i} + T - 1})}^{T}

. On the other hand, the first column contributed by the

(i + 1)

-th row is

{(x_{a_{i + 1}} \dots x_{a_{i + 1} + ℓ})}^{T}

. Since

a_{i + 1} - a_{i} \leq T - ℓ

implies

a_{i + 1} \leq a_{i} + T - ℓ

, we have

a_{i + 1} \leq a_{i} + T - ℓ - 1 + 1

, which means the columns contributed by the i-th and

(i + 1)

-th rows are contiguous (or overlapping) in

H_{ℓ ∣ Y}

, with no gaps. Consequently, the set of columns of

H_{ℓ ∣ X}

coincides exactly with the first

a_{r} + T - ℓ

columns of

H_{ℓ ∣ Y}

.

Thus,

rank H_{ℓ ∣ X} = ℓ

holds if and only if

a_{r} + T - ℓ \geq ℓ

by Theorem 2. Since

a_{r}

satisfies

r - 1 \leq a_{r} \leq (r - 1) (T - ℓ)

, the condition for such an

a_{r}

to exist is that

2 ℓ - T \leq (r - 1) (T - ℓ), i.e., ℓ \leq \frac{r T}{r + 1} .

Since ℓ must be an integer, the largest feasible value is

ℓ = ⌊ \frac{r T}{r + 1} ⌋

, completing the proof. □
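As a quick arithmetic check of Theorem 5, the bound can be evaluated for the earlier examples. The function name below is hypothetical, and only integer arithmetic is used.

```python
def unique_degree_upper_bound(r, T):
    """Largest integer ell satisfying ell <= r * T / (r + 1)."""
    return (r * T) // (r + 1)

# Example 4: rank 2, T = 7. The uniquely feasible degree 4 attains the bound.
bound_example4 = unique_degree_upper_bound(2, 7)

# Example 2: rank 1, T = 7. The smallest degree with positive codimension
# is 4, which exceeds the bound, consistent with no uniquely feasible degree.
bound_example2 = unique_degree_upper_bound(1, 7)

print(bound_example4, bound_example2)
```

For r = 1 the bound reduces to the floor of T/2, which agrees with the range ℓ ≤ T/2 in Corollary 3.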

7.4.4. Monotonic Increase of the Hankel Codimension

In this section, we establish that the Hankel codimension increases strictly monotonically beyond a certain threshold. This property plays a crucial role in identifying uniquely feasible degrees.

Theorem 6.

In the domain $ℓ \in \{L, L + 1, \dots, T - 1\}$, ${codim}_{H} (ℓ ∣ X)$ is strictly monotonically increasing.

Proof.

Let $L \leq ℓ < k < T$. First, for a nonzero vector $w \in V {(H_{L ∣ X})}^{⊥}$, define

w^{0, ℓ - L} = {(w^{T} \underbrace{0 \dots 0}_{ℓ - L})}^{T} \in C^{ℓ + 1} .

This is a nonzero vector in $V {(H_{ℓ ∣ X})}^{⊥}$, so we have ${codim}_{H} (ℓ ∣ X) > 0$.

Next, let $v_{1}, \dots, v_{d}$ be a basis of $V {(H_{ℓ ∣ X})}^{⊥}$, where $d = {codim}_{H} (ℓ ∣ X)$. By reordering if necessary, we may assume that

j_{1} := \max \{j : {(v_{1})}_{j} \neq 0\} \geq \max \{j : {(v_{i})}_{j} \neq 0\}

holds for all $i \in \{1, \dots, d\}$.

We claim that the following $d + (k - ℓ)$ vectors are linearly independent:

v_{1}^{1, k - ℓ - 1}, v_{1}^{2, k - ℓ - 2}, \dots, v_{1}^{k - ℓ, 0}, v_{1}^{0, k - ℓ}, v_{2}^{0, k - ℓ}, \dots, v_{d}^{0, k - ℓ},

which all belong to $V {(H_{k ∣ X})}^{⊥} \subset C^{k + 1}$. To prove the claim, let us consider

\sum_{s = 1}^{k - ℓ} c_{s} v_{1}^{s, k - ℓ - s} + \sum_{i = 1}^{d} c_{i}^{'} v_{i}^{0, k - ℓ} = 0 .

The vector $v_{1}^{s, k - ℓ - s}$ has the form ${(0^{s} v_{1}^{T} 0^{k - ℓ - s})}^{T} \in C^{k + 1}$, so its $(j_{1} + s)$-th component equals ${(v_{1})}_{j_{1}} \neq 0$.

First, we prove $c_{s} = 0$ for $s = k - ℓ, k - ℓ - 1, \dots, 1$ by backward induction on s.

For $s = k - ℓ$, consider the $(j_{1} + k - ℓ)$-th component of the left-hand side. Among all vectors in the sum, only $v_{1}^{k - ℓ, 0}$ has a nonzero value at this position, namely ${(v_{1})}_{j_{1}}$. The vectors $v_{i}^{0, k - ℓ}$ for $i = 1, \dots, d$ have the form ${(v_{i}^{T} 0^{k - ℓ})}^{T}$, and since $k - ℓ \geq 1$, we have $j_{1} + k - ℓ > j_{1}$, which implies that their $(j_{1} + k - ℓ)$-th component is zero. Similarly, for $t < k - ℓ$, the vector $v_{1}^{t, k - ℓ - t}$ has nonzero components only up to position $j_{1} + t < j_{1} + k - ℓ$, so its $(j_{1} + k - ℓ)$-th component is also zero. Therefore, $c_{k - ℓ} {(v_{1})}_{j_{1}} = 0$, which gives $c_{k - ℓ} = 0$.

For $s < k - ℓ$, assuming $c_{s + 1} = \dots = c_{k - ℓ} = 0$ by the induction hypothesis, we consider the $(j_{1} + s)$-th component of the left-hand side. By the same argument as above, only $v_{1}^{s, k - ℓ - s}$ has a nonzero value at position $j_{1} + s$, namely ${(v_{1})}_{j_{1}}$. Therefore, $c_{s} {(v_{1})}_{j_{1}} = 0$, which gives $c_{s} = 0$.

This reduces the equation to $\sum_{i = 1}^{d} c_{i}^{'} v_{i}^{0, k - ℓ} = 0$, which implies $c_{i}^{'} = 0$ for $i = 1, \dots, d$ by the linear independence of $v_{1}, \dots, v_{d}$.

Therefore,

{codim}_{H} (k ∣ X) \geq d + (k - ℓ) = {codim}_{H} (ℓ ∣ X) + (k - ℓ) > {codim}_{H} (ℓ ∣ X),

which completes the proof. □
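The shift-and-pad construction used in this proof can be checked numerically. The sketch below is only an illustration under stated assumptions: the eigenvalues `mu`, the mode matrix `M`, and the helper names `hankel_matrix` and `shift_pad` are hypothetical, the Hankel matrix is built from the windows of each row as in the proof of Theorem 5, and orthogonality is tested with the unconjugated product $v^{T} H = 0$, which matches reading $v$ as a vector of polynomial coefficients.

```python
import numpy as np


def hankel_matrix(X, k):
    """H_{k|X}: every length-(k+1) window of every row of X, stored as a column."""
    X = np.atleast_2d(X)
    m, T = X.shape
    cols = [X[i, t:t + k + 1] for i in range(m) for t in range(T - k)]
    return np.array(cols).T  # shape (k + 1, m * (T - k))


def shift_pad(v, a, b):
    """v^{a,b}: a leading zeros, then v, then b trailing zeros."""
    return np.concatenate([np.zeros(a), v, np.zeros(b)])


# Hypothetical exact data: L = 3 eigenvalues, m = 2 observables, T = 12 samples.
mu = np.array([0.9, 0.5 + 0.5j, 0.5 - 0.5j])
M = np.array([[1.0, 2.0, 2.0], [-1.0, 1.0 + 1.0j, 1.0 - 1.0j]])
T, L, k = 12, 3, 6
X = M @ (mu[:, None] ** np.arange(T)[None, :])

# Coefficients (a_0, ..., a_{L-1}, 1) of prod_n (x - mu_n), lowest degree first.
v = np.poly(mu)[::-1]

# Stack v^{0,k-L}, v^{1,k-L-1}, ..., v^{k-L,0} as rows.
shifts = np.array([shift_pad(v, s, k - L - s) for s in range(k - L + 1)])

# Each shifted copy annihilates every column of H_{k|X} (unconjugated product).
residual = np.abs(shifts @ hankel_matrix(X, k)).max()

# The copies are linearly independent, so codim_H(k|X) >= k - L + 1.
n_independent = np.linalg.matrix_rank(shifts)
print(residual, n_independent)
```

Here the basis of $V {(H_{L ∣ X})}^{⊥}$ has a single element, so the $k - L + 1$ shifted copies are exactly the vectors counted in the claim.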

7.4.5. Equivalence Between Unique and Minimal Feasibility

If a uniquely feasible degree exists, it must coincide with the minimum feasible degree L. This conclusion follows directly from Theorem 6 as an immediate corollary.

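Corollary 4.

If $X$ admits a uniquely feasible degree, it must be L.

Proof.

By Corollary 2, a uniquely feasible degree ℓ must satisfy ${codim}_{H} (ℓ ∣ X) = 1$, so in particular $ℓ \geq L$ by the definition of L. Since ${codim}_{H} (L ∣ X) \geq 1$ and the codimension increases strictly on $ℓ \geq L$ by Theorem 6, we have ${codim}_{H} (ℓ ∣ X) \geq 2$ for every $ℓ > L$. Hence the condition can be met only when $ℓ = L$. □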

7.4.6. Saturation of $\dim_{H} (ℓ ∣ X)$

In this section, we present a theorem that extends Theorem 2. This result plays a crucial role in the development of algorithms for determining uniquely feasible degrees, particularly in the case where $L \leq \frac{T + 1}{2}$ for the minimum feasible degree L.

Theorem 7.

If $L \leq \frac{T + 1}{2}$ and an L-degree DKMD exists, then:

\begin{aligned} \dim_{H} (ℓ ∣ X) & = \begin{cases} ℓ + 1, & \text{for } 0 \leq ℓ \leq L - 1, \\ L, & \text{for } L \leq ℓ \leq T - L; \end{cases} \\ {codim}_{H} (ℓ ∣ X) & = \begin{cases} 0, & \text{for } 0 \leq ℓ \leq L - 1, \\ ℓ - L + 1, & \text{for } L \leq ℓ \leq T - L . \end{cases} \end{aligned}

Proof.

We express the L-degree DKMD given by hypothesis in the form

X = M V_{T ∣ μ_{1}, \dots, μ_{L}} .

No column of $M$ is a zero vector, since L is the minimum feasible degree. Indeed, if some column of $M$ were zero, removing it together with the corresponding eigenvalue (say $μ_{L}$) would give $X = M^{'} V_{T ∣ μ_{1}, \dots, μ_{L - 1}}$ for some $M^{'} \in C^{m \times (L - 1)}$. Then the coefficient vector $(\begin{matrix} b_{0} & b_{1} & \dots & b_{L - 2} & 1 \end{matrix})$ of the polynomial

(x - μ_{1}) \dots (x - μ_{L - 1}) = x^{L - 1} + b_{L - 2} x^{L - 2} + \dots + b_{0}

would belong to $V {(H_{L - 1 ∣ X})}^{⊥}$, contradicting the definition of L.

By Theorem 3, we can transform $X$ by multiplying a matrix $A$ such that $rank X = rank (A X)$ without changing the assertion. In particular, we may assume the following:

$X \in C^{r \times T}$ with $rank X = r$ . To achieve this, take $i_{1}, \dots, i_{r}$ such that the rows $X 〈 i_{1} 〉, \dots, X 〈 i_{r} 〉$ are linearly independent. Then, define $A \in C^{r \times m}$ so that the j-th row of $A$ has 1 as the $i_{j}$ -th component and 0 for the other components.
The first row of $M$ has no zero component: $m_{1 i} \neq 0$ for $i = 1, 2, \dots, L$ . Since each column of $M$ is nonzero and $X \in C^{r \times T}$ , we can find a nonsingular matrix $A \in C^{r \times r}$ such that the first row of $A M$ has no zero component.

After this transformation, we apply Theorem 2 to the first row of $X$:

X 〈 1 〉 = (\begin{matrix} m_{11} & \dots & m_{1 L} \end{matrix}) V_{T ∣ μ_{1}, \dots, μ_{L}},

and obtain

\dim_{H} (ℓ ∣ X) \geq \dim_{H} (ℓ ∣ X 〈 1 〉) = \begin{cases} ℓ + 1, & \text{for } 0 \leq ℓ \leq L - 1, \\ L, & \text{for } L \leq ℓ \leq T - L, \end{cases}

since the columns of $H_{ℓ ∣ X 〈 1 〉}$ are among those of $H_{ℓ ∣ X}$. For $0 \leq ℓ \leq L - 1$, in particular, the equality $\dim_{H} (ℓ ∣ X) = ℓ + 1$ holds because $\dim_{H} (ℓ ∣ X) \leq ℓ + 1$ by definition.

Using this lower bound, the claim $\dim_{H} (ℓ ∣ X) = L$ for $L \leq ℓ \leq T - L$ can be proven by induction on ℓ.

For the base case $ℓ = L$, the definition of L gives ${codim}_{H} (L ∣ X) \geq 1$, which implies $\dim_{H} (L ∣ X) = L + 1 - {codim}_{H} (L ∣ X) \leq L$. Combined with $\dim_{H} (L ∣ X) \geq L$ shown above, we have $\dim_{H} (L ∣ X) = L$.

For the inductive step, assume $ℓ > L$ and $\dim_{H} (ℓ - 1 ∣ X) = L$, i.e., ${codim}_{H} (ℓ - 1 ∣ X) = ℓ - L$. By Theorem 6, ${codim}_{H} (ℓ ∣ X) > {codim}_{H} (ℓ - 1 ∣ X) = ℓ - L$, which gives

\dim_{H} (ℓ ∣ X) = ℓ + 1 - {codim}_{H} (ℓ ∣ X) < ℓ + 1 - (ℓ - L) = L + 1,

and consequently, $\dim_{H} (ℓ ∣ X) \leq L$. Combined with $\dim_{H} (ℓ ∣ X) \geq L$, we conclude $\dim_{H} (ℓ ∣ X) = L$.

Finally, the expressions for ${codim}_{H} (ℓ ∣ X)$ follow directly from the definition of the Hankel codimension. □

Although Theorem 7 requires $L \leq \frac{T + 1}{2}$, if $L = \frac{T + 1}{2}$ (which occurs only when T is odd), then $L = \frac{T + 1}{2} > \frac{T - 1}{2} = T - L$ holds, implying that the range $L \leq ℓ \leq T - L$ in Theorem 7 is empty. In this boundary case, the theorem provides information only for $ℓ = 0, 1, \dots, L - 1 = \frac{T - 1}{2}$, namely that $\dim_{H} (ℓ ∣ X) = ℓ + 1$ and ${codim}_{H} (ℓ ∣ X) = 0$ in this range.

For $L < \frac{T + 1}{2}$, which for integer L is equivalent to $L \leq \frac{T}{2}$, we have:

Corollary 5.

If L is feasible and satisfies $L \leq \frac{T}{2}$, then L is a uniquely feasible degree.

Proof.

Since L is feasible, an L-degree DKMD exists. The condition $L \leq \frac{T}{2}$ implies $L < \frac{T + 1}{2}$ and $L \leq T - L$, so Theorem 7 applies at $ℓ = L$ and gives ${codim}_{H} (L ∣ X) = 1$. By Corollary 2, this means that L is uniquely feasible. □
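The saturation pattern of Theorem 7 can be observed on a small exact example. The following is a minimal sketch, assuming noise-free data built from three hypothetical eigenvalues; the names `hankel_matrix` and `hankel_dim` and the relative rank tolerance are illustrative choices, not part of the theory.

```python
import numpy as np


def hankel_matrix(X, k):
    """H_{k|X}: every length-(k+1) window of every row of X, stored as a column."""
    X = np.atleast_2d(X)
    m, T = X.shape
    cols = [X[i, t:t + k + 1] for i in range(m) for t in range(T - k)]
    return np.array(cols).T


def hankel_dim(X, k, rtol=1e-9):
    """Numerical rank of H_{k|X}: singular values above rtol times the largest."""
    s = np.linalg.svd(hankel_matrix(X, k), compute_uv=False)
    return int(np.sum(s > rtol * s[0]))


# Hypothetical exact data with minimum feasible degree L = 3 and T = 12.
mu = np.array([0.9, 0.5 + 0.5j, 0.5 - 0.5j])
M = np.array([[1.0, 2.0, 2.0], [-1.0, 1.0 + 1.0j, 1.0 - 1.0j]])
T, L = 12, 3
X = M @ (mu[:, None] ** np.arange(T)[None, :])

# Tabulate dim_H and codim_H on the range 0 <= l <= T - L covered by Theorem 7.
dims = [hankel_dim(X, ell) for ell in range(T - L + 1)]
for ell, d in enumerate(dims):
    print(f"l={ell:2d}  dim_H={d}  codim_H={ell + 1 - d}")
```

The Hankel dimension grows as $ℓ + 1$ until it reaches L and then stays at L, so the codimension first becomes positive, with value 1, exactly at $ℓ = L$.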

8. Algorithms

Leveraging the five properties mentioned in Section 7.4, we develop efficient algorithms to search for a uniquely feasible degree L and to determine an L-degree characteristic polynomial, which provides eigenvalues of a DKMD. Once mutually distinct eigenvalues are obtained, the associated Koopman modes are calculated as described in Section 5.2. Our algorithms are categorized into:

One that applies to the case where $L \leq \frac{T}{2}$ and determines L by $\dim_{H} (⌊ \frac{T}{2} ⌋ ∣ X)$ ;
Another that performs a binary search to determine L in the case when $L > \frac{T}{2}$ .

To introduce the algorithms, we start by addressing a theoretical scenario where the observables exactly consist of a finite number of wave components. In this case, an exact DKMD is obtained. Subsequently, we consider a practical scenario where the observables comprise a finite number of dominant wave components, an infinite number of minor wave components, and noise. Here, our algorithms focus on extracting only the dominant wave components, effectively filtering out the minor components and the noise, resulting in an approximate DKMD.

8.1. A Theoretical Scenario

We first present Algorithm 1, which reduces the dimension of each observable so that decomposing the reduced observable matrix is equivalent to decomposing the original one. This reduction is practically useful for more efficient computation and clearer understanding of the underlying structure. We then present algorithms that decompose an observable matrix for two cases: $L \leq \frac{T}{2}$ (Algorithms 2 and 3) and $\frac{T}{2} < L \leq \frac{r T}{r + 1}$ (Algorithm 4).

8.1.1. Dimension Reduction

Although each observable vector, i.e., a column vector of $X$, is of dimension m, the rank r of $X$ can be smaller than m. In this case, Algorithm 1 determines a dimensionally reduced observable matrix $Y \in C^{r \times T}$ and a conversion matrix $A \in C^{r \times m}$ such that $Y = A X$ and $rank Y = rank X = r$. By Theorem 3, the Hankel dimensions and codimensions, as well as the set of characteristic polynomials, are invariant under this conversion, and DKMDs for $X$ and those for $Y$ are mutually converted by matrix multiplication by $A$ and $X Y^{+}$.

Such $A$ and $Y$ can be constructed by selecting r linearly independent rows of $X$ and defining $A$ so that these rows become the rows of $Y = A X$. Since these r rows of $X$ are linearly independent and form all the rows of $Y$, we have $rank Y = r = rank X$.

Algorithm 1 Dimension reduction of the observable matrix.

Require: $X \in C^{m \times T}$
Ensure: Matrices $A \in C^{r \times m}$ and $Y \in C^{r \times T}$ for $r = rank X$ with $Y = A X$ and $rank Y = r$
1: Find $i_{1}, \dots, i_{r} \in \{1, 2, \dots, m\}$ such that the row vectors $X 〈 i_{1} 〉, \dots, X 〈 i_{r} 〉$ are linearly independent;
2: Determine a matrix $A \in C^{r \times m}$ such that the $(j, i_{j})$ component is 1 and all other components are 0 for $j = 1, \dots, r$;
3: return $A$ and $Y = A X$.

By applying Algorithm 1, we can reduce the problem of decomposing $X \in C^{m \times T}$ to the problem of decomposing $Y \in C^{r \times T}$ with $r = rank X \leq m$. This reduction provides benefits in terms of computational efficiency and clearer understanding of the data structure when executing the algorithms presented below. However, these algorithms are formulated in general terms and do not require that such a dimension reduction has been performed.
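One possible way to carry out the row selection of Algorithm 1 is sketched below. It uses a greedy scan that keeps a row whenever it raises the numerical rank; this particular selection rule, the function name `reduce_dimension`, and the tolerance are assumptions made for illustration, since the algorithm only requires that some set of r linearly independent rows be found.

```python
import numpy as np


def reduce_dimension(X, rtol=1e-9):
    """Greedy sketch of Algorithm 1: keep each row of X that raises the rank."""
    X = np.atleast_2d(X)
    m = X.shape[0]
    # Absolute rank tolerance relative to the largest singular value of X.
    tol = rtol * max(np.linalg.norm(X, 2), 1e-300)
    picked = []
    for i in range(m):
        if np.linalg.matrix_rank(X[picked + [i]], tol=tol) > len(picked):
            picked.append(i)
    # Selection matrix: the (j, i_j) entry is 1 and all other entries are 0.
    A = np.zeros((len(picked), m))
    A[np.arange(len(picked)), picked] = 1.0
    return A, A @ X


# Hypothetical example: the third row is the sum of the first two, so r = 2.
X = np.array([[1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
              [1.0, 0.0, 1.0, 0.0, 1.0, 0.0],
              [2.0, 2.0, 4.0, 4.0, 6.0, 6.0]])
A, Y = reduce_dimension(X)
print(A)
print(Y.shape)
```

A pivoted QR decomposition would be a more robust choice for ill-conditioned data; the greedy scan is used here only because it mirrors the statement of the algorithm directly.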

8.1.2. Case $L \leq \frac{T}{2}$

Algorithm 2 first investigates whether $L \leq \frac{T}{2}$ by leveraging the equivalence between $L \leq ⌊\frac{T}{2}⌋$ and ${codim}_{H} (⌊\frac{T}{2}⌋ ∣ X) > 0$. Indeed, if $L \leq ⌊\frac{T}{2}⌋$, then ${codim}_{H} (⌊\frac{T}{2}⌋ ∣ X) \geq {codim}_{H} (L ∣ X) > 0$ holds by Theorem 6. Conversely, if ${codim}_{H} (⌊\frac{T}{2}⌋ ∣ X) > 0$, then $L \leq ⌊\frac{T}{2}⌋$ by the definition of L.

If this investigation reveals $L \leq \frac{T}{2}$, the algorithm identifies L as $\dim_{H} (⌊\frac{T}{2}⌋ ∣ X)$ by Theorem 7, and then executes Algorithm 3 to determine whether L is uniquely feasible. If $L > \frac{T}{2}$, the algorithm returns the value continue, indicating that Algorithm 4 should be used.

When invoked, Algorithm 3 verifies the following:

A vector ${(a_{0}, a_{1}, \dots, a_{L - 1}, 1)}^{T}$ exists in $V {(H_{L ∣ X})}^{⊥}$. This can be efficiently verified by performing a QR decomposition of $H_{L ∣ X}$.
If such a vector exists, the polynomial equation

$x^{L} + a_{L - 1} x^{L - 1} + \dots + a_{1} x + a_{0} = 0$

has no repeated roots.

If both conditions are satisfied, this confirms that L is feasible, and we can then apply Theorem 7. As a result, we have ${codim}_{H} (L ∣ X) = 1$, meaning that L is uniquely feasible. Thus, the polynomial obtained in the verification is square-free and serves as the characteristic polynomial of the unique L-degree DKMD of $X$. In this case, the algorithm returns the obtained characteristic polynomial. Otherwise, it returns the value no_solution, indicating that no uniquely feasible degree exists.

Algorithm 2 Search for an L-degree characteristic polynomial when $L \leq \frac{T}{2}$.

Require: $X \in C^{m \times T}$
Ensure: The signal continue if $L > \frac{T}{2}$; the characteristic polynomial if $L \leq \frac{T}{2}$ is uniquely feasible; no_solution if $L \leq \frac{T}{2}$ is not uniquely feasible.
1: if ${codim}_{H} (⌊\frac{T}{2}⌋ ∣ X) > 0$ then
2:     Let $L = \dim_{H} (⌊\frac{T}{2}⌋ ∣ X)$;
3:     Execute Algorithm 3;
4:     return the return value of Algorithm 3;
5: else
6:     return continue
7: end if

Algorithm 3 Determine the characteristic polynomial.

Require: $X \in C^{m \times T}$ and L
Ensure: An L-degree characteristic polynomial or no_solution
1: if $\exists {(a_{0} a_{1} \dots a_{L - 1} 1)}^{T} \in V {(H_{L ∣ X})}^{⊥}$ then
2:     Let $f (x) = x^{L} + a_{L - 1} x^{L - 1} + \dots + a_{1} x + a_{0}$;
3:     if $f (x) = 0$ has no repeated roots then
4:         return $f (x)$
5:     end if
6: end if
7: return no_solution
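Algorithms 2 and 3 can be sketched together for the exact (noise-free) case as follows. This is an illustration under several assumptions: the monic vector is sought by least squares rather than by the QR decomposition mentioned in the text, it has $L + 1$ components because $H_{L ∣ X}$ has $L + 1$ rows, orthogonality uses the unconjugated product, and the tolerances, test data, and function names are hypothetical.

```python
import numpy as np


def hankel_matrix(X, k):
    """H_{k|X}: every length-(k+1) window of every row of X, stored as a column."""
    m, T = X.shape
    cols = [X[i, t:t + k + 1] for i in range(m) for t in range(T - k)]
    return np.array(cols).T


def hankel_dim(X, k, rtol=1e-9):
    """Numerical rank of H_{k|X}."""
    s = np.linalg.svd(hankel_matrix(X, k), compute_uv=False)
    return int(np.sum(s > rtol * s[0]))


def algorithm3(X, L, rtol=1e-9, root_tol=1e-6):
    """Return monic degree-L coefficients (highest degree first), or None."""
    if L == 0:
        return None
    H = hankel_matrix(X, L)
    top, last = H[:L].T, H[L]
    # Find a with (a_0, ..., a_{L-1}, 1) H = 0, i.e. top @ a = -last.
    a, *_ = np.linalg.lstsq(top, -last, rcond=None)
    if np.linalg.norm(top @ a + last) > rtol * np.linalg.norm(H):
        return None  # no monic vector annihilates H_{L|X}
    coeffs = np.concatenate([[1.0], a[::-1]])
    roots = np.roots(coeffs)
    # Pairwise root distances; the identity keeps the diagonal out of the minimum.
    gaps = np.abs(roots[:, None] - roots[None, :]) + np.eye(L)
    return coeffs if gaps.min() > root_tol else None


def algorithm2(X):
    """Return 'continue', 'no_solution', or the polynomial coefficients."""
    half = X.shape[1] // 2
    dim = hankel_dim(X, half)
    if half + 1 - dim == 0:      # codim_H(floor(T/2) | X) = 0, so L > T/2
        return "continue"
    poly = algorithm3(X, dim)    # L = dim_H(floor(T/2) | X)
    return "no_solution" if poly is None else poly


# Hypothetical exact data with L = 3 <= T/2.
mu = np.array([0.9, 0.5 + 0.5j, 0.5 - 0.5j])
M = np.array([[1.0, 2.0, 2.0], [-1.0, 1.0 + 1.0j, 1.0 - 1.0j]])
X = M @ (mu[:, None] ** np.arange(12)[None, :])
poly = algorithm2(X)
print(np.roots(poly))
```

The roots of the returned polynomial are the estimated Koopman eigenvalues; for this example they should agree with `mu` up to rounding.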

8.1.3. Case $\frac{T}{2} < L \leq \frac{rT}{r + 1}$

Algorithm 4 details the procedure for cases when $L > \frac{T}{2}$. Note that if L is uniquely feasible, then $L \leq \frac{r T}{r + 1}$ holds by Theorem 5, and this gives the best possible upper bound.

The algorithm first verifies whether ${codim}_{H} (⌊\frac{r T}{r + 1}⌋ ∣ X) > 0$. If this condition does not hold, no uniquely feasible degree exists, and the algorithm returns the signal no_solution.

If ${codim}_{H} (⌊\frac{r T}{r + 1}⌋ ∣ X) > 0$, then L lies in the range $⌊\frac{T}{2}⌋ < L \leq ⌊\frac{r T}{r + 1}⌋$. The algorithm utilizes a binary search to find L, leveraging the fact that ${codim}_{H} (ℓ ∣ X)$ is a strictly increasing function by Theorem 6.

Since the identified L does not necessarily satisfy ${codim}_{H} (L ∣ X) = 1$, the algorithm must verify ${codim}_{H} (L ∣ X) = 1$ before executing Algorithm 3 to confirm that L is uniquely feasible.
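The binary search described above can be sketched as follows for exact data. The code is an illustration under stated assumptions: `algorithm3` is a least-squares stand-in for Algorithm 3, `None` plays the role of no_solution, the rank tolerances are arbitrary, and the seven test eigenvalues and random-phase modes are hypothetical.

```python
import numpy as np


def hankel_matrix(X, k):
    """H_{k|X}: every length-(k+1) window of every row of X, stored as a column."""
    m, T = X.shape
    cols = [X[i, t:t + k + 1] for i in range(m) for t in range(T - k)]
    return np.array(cols).T


def hankel_codim(X, k, rtol=1e-9):
    """k + 1 minus the numerical rank of H_{k|X}."""
    s = np.linalg.svd(hankel_matrix(X, k), compute_uv=False)
    return k + 1 - int(np.sum(s > rtol * s[0]))


def algorithm3(X, L, rtol=1e-9, root_tol=1e-6):
    """Stand-in for Algorithm 3: monic degree-L coefficients, or None."""
    H = hankel_matrix(X, L)
    a, *_ = np.linalg.lstsq(H[:L].T, -H[L], rcond=None)
    if np.linalg.norm(H[:L].T @ a + H[L]) > rtol * np.linalg.norm(H):
        return None
    coeffs = np.concatenate([[1.0], a[::-1]])  # highest degree first
    roots = np.roots(coeffs)
    gaps = np.abs(roots[:, None] - roots[None, :]) + np.eye(L)
    return coeffs if gaps.min() > root_tol else None


def algorithm4(X):
    """Binary search for L in (T/2, rT/(r+1)]; assumes codim_H(floor(T/2)|X) = 0."""
    T = X.shape[1]
    r = int(np.linalg.matrix_rank(X))
    lo, hi = T // 2, (r * T) // (r + 1)
    if hankel_codim(X, hi) == 0:
        return None                      # no uniquely feasible degree
    while hi - lo > 1:
        k = (lo + hi) // 2
        c = hankel_codim(X, k)
        if c == 0:
            lo = k                       # L lies above k
        elif c == 1:
            return algorithm3(X, k)      # codimension 1 occurs only at k = L
        else:
            hi = k                       # L lies below k
    return algorithm3(X, hi) if hankel_codim(X, hi) == 1 else None


# Hypothetical exact data: 7 eigenvalues, m = 2, T = 12, so T/2 < 7 <= 8.
rng = np.random.default_rng(0)
radii = np.array([0.95, 1.0, 1.0, 0.98, 0.98, 1.02, 1.02])
angles = np.array([0.0, 0.6, -0.6, 1.4, -1.4, 2.3, -2.3])
mu = radii * np.exp(1j * angles)
M = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, size=(2, 7)))
X = M @ (mu[:, None] ** np.arange(12)[None, :])
poly = algorithm4(X)
```

Because the codimension is strictly increasing from L onward, a midpoint with codimension 1 can only be L itself, which is why the search may stop there immediately.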

8.2. A Practical Scenario

In practice, even if the Koopman operator has only discrete eigenvalues, the number of eigenvalues can be infinite, and in addition, observables can contain error signals. In such situations, the purpose of DKMD is to find a finite number of major wave components that most significantly affect the observables. From the viewpoint of executing our algorithms, the presence of minor components and errors makes direct computation of Hankel dimensions via QR decomposition impractical. In fact, $rank H_{ℓ ∣ X} = ℓ + 1$ may always hold, which makes it impossible to identify the Hankel dimensions. In this section, we present a method to estimate Hankel dimensions for the major components via singular value decomposition (SVD) rather than QR decomposition.

Algorithm 4 Search for an L-degree characteristic polynomial when $L \in (\frac{T}{2}, \frac{r T}{r + 1}]$.

Require: $X \in C^{m \times T}$ with ${codim}_{H} (⌊\frac{T}{2}⌋ ∣ X) = 0$
Ensure: Either the characteristic polynomial of the DKMD of $X$ for the uniquely feasible degree $L > \frac{T}{2}$, if present, or no_solution, otherwise.
1: if ${codim}_{H} (⌊\frac{r T}{r + 1}⌋ ∣ X) = 0$ then
2:     return no_solution
3: end if
4: Let $l = ⌊\frac{T}{2}⌋$;
5: Let $h = ⌊\frac{r T}{r + 1}⌋$;
6: while $h - l > 1$ do
7:     Let $k = ⌊\frac{l + h}{2}⌋$;
8:     if ${codim}_{H} (k ∣ X) = 0$ then
9:         Let $l = k$;
10:     else if ${codim}_{H} (k ∣ X) = 1$ then
11:         Let $L = k$;
12:         Execute Algorithm 3;
13:         return the return value of Algorithm 3;
14:     else
15:         Let $h = k$;
16:     end if
17: end while
18: if ${codim}_{H} (h ∣ X) = 1$ then
19:     Let $L = h$;
20:     Execute Algorithm 3;
21:     return the return value of Algorithm 3;
22: else
23:     return no_solution;
24: end if

We assume that the observables are represented as

x_{t} = \sum_{n = 1}^{ℓ} {\hat{m}}_{n} {\hat{λ}}_{n}^{t} + \sum_{n = 1}^{\infty} m_{n}^{'} {λ_{n}^{'}}^{t} + ε_{t},

where ${\hat{λ}}_{n}^{t}$ and ${λ_{n}^{'}}^{t}$ represent major and minor wave components, respectively, and $ε_{t}$ is noise. We define:

\begin{matrix} {\hat{x}}_{t} = \sum_{n = 1}^{ℓ} {\hat{m}}_{n} {\hat{λ}}_{n}^{t}, \hat{X} = [{\hat{x}}_{0} \dots {\hat{x}}_{T - 1}]; \\ x_{t}^{'} = \sum_{n = 1}^{\infty} m_{n}^{'} {λ_{n}^{'}}^{t} + ε_{t}, X^{'} = [x_{0}^{'} \dots x_{T - 1}^{'}], \end{matrix}

and our basic assumption is that $x_{t}^{'}$ is a small perturbation. Our aim is to estimate $rank H_{ℓ ∣ \hat{X}}$ from $H_{ℓ ∣ X}$, taking advantage of the fact that $rank H_{ℓ ∣ \hat{X}}$ equals the number of positive singular values of $H_{ℓ ∣ \hat{X}}$.

We consider $A = \hat{A} + A^{'} \in C^{m \times n}$ with $m \leq n$. Let $σ_{1} \geq \dots \geq σ_{m} \geq 0$, ${\hat{σ}}_{1} \geq \dots \geq {\hat{σ}}_{m} \geq 0$, and $σ_{1}^{'} \geq \dots \geq σ_{m}^{'} \geq 0$ be the singular values of $A$, $\hat{A}$, and $A^{'}$, respectively. Then, $|σ_{i} - {\hat{σ}}_{i}| \leq σ_{1}^{'}$ holds for all $i \in \{1, \dots, m\}$. This can be proven as follows. By the Courant–Fischer min-max theorem [17], the i-th singular value of $A$ satisfies

σ_{i} = \min_{\substack{V \subseteq C^{n} \\ \dim V = n - i + 1}} \max_{\substack{v \in V \\ ∥ v ∥ = 1}} ∥ A v ∥ .

Since $A = \hat{A} + A^{'}$, we have $σ_{i} - {\hat{σ}}_{i} \leq σ_{1}^{'}$ as follows:

\begin{aligned} σ_{i} & = \min_{\substack{V \subseteq C^{n} \\ \dim V = n - i + 1}} \max_{\substack{v \in V \\ ∥ v ∥ = 1}} ∥ A v ∥ \\ & \leq \min_{\substack{V \subseteq C^{n} \\ \dim V = n - i + 1}} \max_{\substack{v \in V \\ ∥ v ∥ = 1}} (∥ \hat{A} v ∥ + ∥ A^{'} v ∥) \\ & \leq \min_{\substack{V \subseteq C^{n} \\ \dim V = n - i + 1}} \max_{\substack{v \in V \\ ∥ v ∥ = 1}} ∥ \hat{A} v ∥ + σ_{1}^{'} = {\hat{σ}}_{i} + σ_{1}^{'} . \end{aligned}

By the same reasoning, we also obtain ${\hat{σ}}_{i} - σ_{i} \leq σ_{1}^{'}$.

Furthermore, if $rank \hat{A} = r < m$, that is,

{\hat{σ}}_{1} \geq \dots \geq {\hat{σ}}_{r} > 0 = {\hat{σ}}_{r + 1} = \dots = {\hat{σ}}_{m},

then we have

σ_{i} \begin{cases} \geq {\hat{σ}}_{i} - σ_{1}^{'} \geq {\hat{σ}}_{r} - σ_{1}^{'}, & \text{if } i \leq r, \\ \leq σ_{1}^{'}, & \text{if } i > r . \end{cases}

Therefore, if ${\hat{σ}}_{r}$ is sufficiently larger than $σ_{1}^{'}$, there exists a large gap between $σ_{i}$ for $i \leq r$ and $σ_{i}$ for $i > r$, and hence, we can estimate r from $σ_{1}, \dots, σ_{m}$. Thus, if we can assume that the smallest positive singular value of $H_{ℓ ∣ \hat{X}}$ is sufficiently larger than the largest singular value of $H_{ℓ ∣ X^{'}}$, we can apply this method to estimate $\dim_{H} (ℓ ∣ \hat{X})$.
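The perturbation bound and the resulting gap can be illustrated on a small synthetic matrix. In the sketch below, the matrix sizes, the singular values 5, 3, 2 of the low-rank part, the noise scale, and the rule "pick the largest ratio between consecutive singular values" are all assumptions chosen for illustration; the text above only requires that a clear gap exist, not a specific detection rule.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 8, 20, 3

# Rank-r "major" part with known singular values 5, 3, 2.
U, _ = np.linalg.qr(rng.standard_normal((m, r)))
W, _ = np.linalg.qr(rng.standard_normal((n, r)))
A_hat = U @ np.diag([5.0, 3.0, 2.0]) @ W.T

# Small perturbation, rescaled so that its largest singular value is 1e-3.
A_prime = rng.standard_normal((m, n))
A_prime *= 1e-3 / np.linalg.norm(A_prime, 2)

sigma = np.linalg.svd(A_hat + A_prime, compute_uv=False)
sigma_hat = np.linalg.svd(A_hat, compute_uv=False)
sigma_prime_1 = np.linalg.norm(A_prime, 2)

# Largest observed shift of a singular value, to compare with sigma'_1.
max_shift = np.abs(sigma - sigma_hat).max()


def estimate_rank(singular_values):
    """Position of the largest ratio between consecutive singular values."""
    s = np.maximum(np.asarray(singular_values, dtype=float), 1e-300)
    return int(np.argmax(s[:-1] / s[1:])) + 1


print(max_shift, sigma_prime_1, estimate_rank(sigma))
```

With real data the gap is rarely this clean, so inspecting the logarithm of the singular values, as done for the simulations in Section 9, is a sensible complement to any automatic rule.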

8.3. Time Complexities

The computationally intensive operations in these algorithms are QR decomposition (QRD), singular value decomposition (SVD), and equation solving (EqS). Table 2 lists how many times each algorithm executes them: Algorithms 1 to 3 perform at most one decomposition and at most one equation solve each, while the binary search of Algorithm 4 performs fewer than ${log}_{2} T$ decompositions. The algorithms are therefore efficient.

9. Simulations

Through simulations, we investigate the accuracy of Algorithms 2 to 4 in estimating Koopman eigenvalues and making predictions.

9.1. Synthetic Datasets Used in the Simulations

The synthetic datasets of observables are constructed with $m = 2$ and $T = 50$, which makes $X$ a $2 \times 50$ matrix, and are randomly generated by performing the following steps:

Sample as many distinct Koopman eigenvalues, each classified as either major or minor, as specified in Table 3. Let $λ_{1}, \dots, λ_{N}$ denote these eigenvalues. For each $λ_{i}$ , its complex conjugate ${\bar{λ}}_{i}$ must also be in the set. Furthermore, every conjugate pair $(λ, \bar{λ})$ is sampled independently as follows:
- $| λ | = | \bar{λ} |$ is sampled according to a log-normal distribution with parameters $μ = 0$ and $σ = 0.01$ , whose probability density function is $\frac{1}{x σ \sqrt{2 π}} exp (- \frac{{(ln x)}^{2}}{2 σ^{2}})$ . The median, mean, and variance of this distribution are $e^{0} = 1$ , $e^{\frac{σ^{2}}{2}}$ , and $e^{2 σ^{2}} - e^{σ^{2}}$ , respectively.
- $arg λ = - arg \bar{λ}$ is sampled uniformly from the interval $(0, π)$ .
The distribution of $| λ |$ is designed to restrict the occurrence of samples too far from 1, because $| λ |$ much larger than 1 causes the observables to diverge, while a component with $| λ |$ much smaller than 1 decays rapidly.
Determine the Koopman mode $m_{i} = {(m_{1 i}, m_{2 i})}^{⊤}$ corresponding to $λ_{i}$ with the following constraints:
- $m_{1 j} = {\bar{m}}_{1 i}$ and $m_{2 j} = {\bar{m}}_{2 i}$ hold whenever $λ_{j} = {\bar{λ}}_{i}$ ;
- The modes associated with the major eigenvalues must have significant magnitudes, while those associated with the minor eigenvalues must have smaller magnitudes.
To satisfy the second requirement, we use a function $ς : C \to [0, 1]$ defined below, which has sharp peaks only at the sampled major eigenvalues $λ_{i_{1}}, \dots, λ_{i_{k}}$ :

$ς (λ ∣ λ_{i_{1}}, \dots, λ_{i_{k}}) = \max_{j = 1, \dots, k} \frac{2}{1 + e^{100 [{(| λ | - | λ_{i_{j}} |)}^{2} + {(arg λ - arg λ_{i_{j}})}^{2}]}} .$

For each Koopman eigenvalue $λ_{i}$ , the mode $m_{i}$ is determined by sampling the argument of each component uniformly at random, while setting the magnitude to $ς (λ_{i} ∣ λ_{i_{1}}, \dots, λ_{i_{k}})$ , i.e., $| m_{1 i} | = | m_{2 i} | = ς (λ_{i} ∣ λ_{i_{1}}, \dots, λ_{i_{k}})$ .
Construct $X$ as $X = [m_{1}, \dots, m_{N}] V_{T ∣ λ_{1}, \dots, λ_{N}}$ . If the inclusion of noise is required, add to $X$ a noise matrix $[ε_{i t}]$ with $i \in \{1, 2\}$ and $t \in \{0, 1, \dots, T - 1\}$ , where each $ε_{i t}$ is independently sampled from a normal distribution $N (0, 0.01^{2})$ .

In addition, to evaluate the predictive accuracy of our algorithms, we compute observable values for $t \in \{50, 51, \dots, 79\}$ using the Koopman eigenvalues and Koopman modes determined above.
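The sampling steps above can be sketched in code as follows. This is an illustration, not the authors' generator: the function names are hypothetical, the eigenvalue counts of Table 3 are assumed to include both members of each conjugate pair (so 10 major eigenvalues means 5 pairs), the peak function is evaluated against the major eigenvalues and their conjugates, and the real part of $X$ is taken because the imaginary parts cancel by conjugate symmetry.

```python
import numpy as np


def sample_pairs(rng, n_pairs, sigma=0.01):
    """Eigenvalues with log-normal modulus and argument uniform on (0, pi)."""
    radius = rng.lognormal(mean=0.0, sigma=sigma, size=n_pairs)
    angle = rng.uniform(0.0, np.pi, size=n_pairs)
    return radius * np.exp(1j * angle)


def peak(lam, majors):
    """The function varsigma: equal to 1 at a major eigenvalue, small elsewhere."""
    d_abs = np.abs(lam)[:, None] - np.abs(majors)[None, :]
    d_arg = np.angle(lam)[:, None] - np.angle(majors)[None, :]
    expo = np.minimum(100.0 * (d_abs**2 + d_arg**2), 700.0)  # avoid overflow
    return (2.0 / (1.0 + np.exp(expo))).max(axis=1)


def make_dataset(rng, n_major_pairs, n_minor_pairs, T=50, noise_std=0.0):
    """Return eigenvalues, modes (2 x N), and the 2 x T observable matrix."""
    major = sample_pairs(rng, n_major_pairs)
    minor = sample_pairs(rng, n_minor_pairs)
    half = np.concatenate([major, minor])
    majors_all = np.concatenate([major, major.conj()])
    # Mode magnitude from the peak function, phase uniformly random.
    magnitude = peak(half, majors_all)
    phase = rng.uniform(0.0, 2.0 * np.pi, size=(2, half.size))
    modes_half = magnitude * np.exp(1j * phase)
    # Close the sets under complex conjugation.
    lam = np.concatenate([half, half.conj()])
    modes = np.concatenate([modes_half, modes_half.conj()], axis=1)
    X = (modes @ (lam[:, None] ** np.arange(T)[None, :])).real
    if noise_std > 0:
        X = X + rng.normal(0.0, noise_std, size=X.shape)
    return lam, modes, X


# Hypothetical counterpart of Scenario 3: 10 major, 90 minor eigenvalues, noise.
lam, modes, X = make_dataset(np.random.default_rng(0), 5, 45, noise_std=0.01)
print(X.shape, lam.size)
```

Future values for the prediction test can be generated with the same call by passing a larger `T` and `noise_std=0.0`, then keeping the columns for $t \geq 50$.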

9.2. Simulation Scenarios

We conduct simulations under the following four distinct scenarios:

Scenarios 1 and 2 investigate the case where an exact DKMD is obtained via QR decomposition (Section 8.1).
Scenarios 3 and 4 investigate the case where an approximated DKMD is obtained via singular value decomposition (Section 8.2).
Scenarios 1 and 3 are used to investigate Algorithm 2.
Scenarios 2 and 4 are used to investigate Algorithm 4.

9.3. Results of the Simulations

We have obtained excellent results in the simulations for all scenarios. In the case of estimating an exact DKMD, the estimated Koopman eigenvalues and the predictions for $t \in \{50, 51, \dots, 79\}$ are identical to the ground truth within numerical precision. In the case of estimating an approximated DKMD, the estimated Koopman eigenvalues and the predictions for $t \in \{50, 51, \dots, 79\}$ show close agreement with the ground truth.

Scenarios 1 and 2: For both scenarios, Algorithms 2 and 4 correctly identify the ground truth uniquely feasible degrees. Furthermore, we observe that the estimated eigenvalues (the upper-right panels of Figure 3(a,b)) and the predictions for $t \in \{50, 51, \dots, 79\}$ (the bottom panels of Figure 3(a,b)) are identical to the ground truth within numerical precision.
Scenario 3: The upper-left panel in Figure 4(a) depicts the logarithm of the singular values of $H_{25 ∣ X}$ . Evidently, we observe a significant gap between the top ten singular values and those that follow, leading to the conclusion that $\dim_{H} (25 ∣ X) = 10$ , indicating that $ℓ = 10$ is the uniquely feasible degree. Furthermore, the estimated eigenvalues (the upper-right panel) and the predictions (the bottom panels) show excellent agreement with the ground truth.
Scenario 4: The upper-left panel in Figure 4(b) depicts the result of a singular value decomposition of $H_{ℓ ∣ X}$ when the binary search of Algorithm 4 visits $ℓ = 30$ . Since the top 30 singular values are significantly greater than those that follow, we can conclude that ${codim}_{H} (30 ∣ X) = 1$ , meaning that $ℓ = 30$ is the uniquely feasible degree. The estimated eigenvalues and the predictions also show excellent agreement with the ground truth.
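The prediction step evaluated in these scenarios can be sketched as follows. The sketch assumes that the Koopman modes are fitted by least squares through the Moore–Penrose pseudoinverse of the Vandermonde matrix, $M = X V^{+}$, in line with the pseudoinverse entry of Table 1; the function names, the four test eigenvalues, and the shortened horizon ($T = 20$ with 10 future steps) are hypothetical.

```python
import numpy as np


def vandermonde(eigenvalues, times):
    """Rows are the eigenvalues raised to the given time indices."""
    return eigenvalues[:, None] ** np.asarray(times)[None, :]


def fit_modes(X, eigenvalues):
    """Least-squares modes M = X V^+ for the training window t = 0, ..., T - 1."""
    V = vandermonde(eigenvalues, np.arange(X.shape[1]))
    return X @ np.linalg.pinv(V)


def predict(modes, eigenvalues, times):
    """Evaluate the decomposition sum_n m_n * lambda_n^t at the given times."""
    return modes @ vandermonde(eigenvalues, times)


# Hypothetical ground truth with two conjugate pairs of eigenvalues.
lam = np.array([0.98 * np.exp(0.7j), 0.98 * np.exp(-0.7j),
                1.01 * np.exp(2.0j), 1.01 * np.exp(-2.0j)])
true_modes = np.array([[1 + 1j, 1 - 1j, 0.5j, -0.5j],
                       [2.0, 2.0, 1 - 1j, 1 + 1j]])
T = 20
X = predict(true_modes, lam, np.arange(T))

# Fit the modes on the training window and extrapolate 10 steps ahead.
modes = fit_modes(X, lam)
future = np.arange(T, T + 10)
error = np.abs(predict(modes, lam, future) - predict(true_modes, lam, future)).max()
print(error)
```

In the exact setting the eigenvalues come from the characteristic polynomial returned by Algorithm 3, so the extrapolation error reflects only rounding; with noisy data it also reflects the eigenvalue estimation error.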

10. Conclusions

We have developed a theoretical framework to estimate the correct degree (uniquely feasible degree) of Discrete Koopman Mode Decomposition (DKMD) from a given dataset of observables. The degree of DKMD corresponds to the number of Koopman eigenvalues involved. We demonstrate that unless the correct degree is used, infinitely many DKMDs with different Koopman eigenvalues and modes can fit the training data. However, these variations lead to divergent predictions for future time steps, resulting in unreliable forecasts. Furthermore, the theory provides efficient algorithms to identify uniquely feasible degrees.

References

1. Rudin, W. Real and Complex Analysis, 3rd ed.; McGraw-Hill: New York, 1987.
2. Koopman, B.O. Hamiltonian systems and transformation in Hilbert space. Proceedings of the National Academy of Sciences of the United States of America 1931, 17, 315.
3. Schmid, P.J. Dynamic mode decomposition of numerical and experimental data. Journal of Fluid Mechanics 2010, 656, 5–28.
4. Rowley, C.W.; Mezić, I.; Bagheri, S.; Schlatter, P.; Henningson, D.S. Spectral analysis of nonlinear flows. Journal of Fluid Mechanics 2009, 641, 115–127.
5. Tu, J.H.; Rowley, C.W.; Luchtenburg, D.M.; Brunton, S.L.; Kutz, J.N. On dynamic mode decomposition: Theory and applications. Journal of Computational Dynamics 2014, 1, 391.
6. Taira, K.; Brunton, S.L.; Dawson, S.T.; Rowley, C.W.; Colonius, T.; McKeon, B.J.; Schmidt, O.T.; Gordeyev, S.; Theofilis, V.; Ukeiley, L.S. Modal analysis of fluid flows: An overview. AIAA Journal 2017, 55, 4013–4041.
7. Brunton, S.L.; Brunton, B.W.; Proctor, J.L.; Kaiser, E.; Kutz, J.N. Chaos as an intermittently forced linear system. Nature Communications 2017, 8, 1–9.
8. Brunton, B.W.; Johnson, L.A.; Ojemann, J.G.; Kutz, J.N. Extracting spatial–temporal coherent patterns in large-scale neural recordings using dynamic mode decomposition. Journal of Neuroscience Methods 2016, 258, 1–15.
9. Taylor, R.; Kutz, J.N.; Morgan, K.; Nelson, B.A. Dynamic mode decomposition for plasma diagnostics and validation. Review of Scientific Instruments 2018, 89, 053501.
10. Kaptanoglu, A.A.; Morgan, K.D.; Hansen, C.J.; Brunton, S.L. Characterizing magnetized plasmas with dynamic mode decomposition. Physics of Plasmas 2020, 27, 032108.
11. Kusaba, A.; Shin, K.; Shepard, D.; Kuboyama, T. Predictive Nonlinear Modeling by Koopman Mode Decomposition. In Proceedings of the 2020 International Conference on Data Mining Workshops (ICDMW), 2020; IEEE; pp. 811–819.
12. Fujii, K.; Takeishi, N.; Kibushi, B.; Kouzaki, M.; Kawahara, Y. Data-driven spectral analysis for coordinative structures in periodic human locomotion. Scientific Reports 2019, 9, 1–14.
13. Berger, E.; Sastuba, M.; Vogt, D.; Jung, B.; Ben Amor, H. Estimation of perturbations in robotic behavior using dynamic mode decomposition. Advanced Robotics 2015, 29, 331–343.
14. Takeishi, N.; Kawahara, Y.; Yairi, T. Sparse nonnegative dynamic mode decomposition. In Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP), 2017; IEEE; pp. 2682–2686.
15. Rudin, W. Functional Analysis, 2nd ed.; McGraw-Hill Series in Higher Mathematics; McGraw-Hill, Inc.: New York, St. Louis, San Francisco, 1991; p. xvi + 424.
16. Susuki, Y.; Mezić, I. A Prony approximation of Koopman mode decomposition. In Proceedings of the 2015 54th IEEE Conference on Decision and Control (CDC), 2015; IEEE; pp. 7022–7027.
17. Horn, R.A.; Johnson, C.R. Matrix Analysis, 2nd ed.; Cambridge University Press, 2012.

Figure 1. Comparison between FD and KMD.

Figure 2. Illustration of multiple valid KMDs with divergent predictions.

Figure 3. Results of simulations for Scenarios 1 and 2.

Figure 4. Results of simulations for Scenarios 3 and 4.

Table 1. Notation summary.

Notation	Description
ℓ	Koopman degree.
$x_{0}, \dots, x_{T - 1} \in C^{m}$	Column vectors of observables.
$μ_{1}, \dots, μ_{ℓ} \in C$	Koopman eigenvalues of an ℓ-degree DKMD.
$m_{1}, \dots, m_{ℓ} \in C^{m}$	Koopman modes of an ℓ-degree DKMD.
$X \in C^{m \times T}$	Observable matrix $[x_{0} \dots x_{T - 1}]$ .
$M \in C^{m \times ℓ}$	Mode matrix $[m_{1} \dots m_{ℓ}]$ .
$X_{i}^{j} \in C^{m \times (j - i + 1)}$	Submatrix $[x_{i} \dots x_{j}]$ for $0 \leq i \leq j < T$ .
$X 〈 i 〉 \in C^{T}$	The ith row vector of $X$ .
$V_{n ∣ a_{1}, \dots, a_{k}} \in C^{k \times n}$	Vandermonde matrix (Definition 2).
$H_{k ∣ X} \in C^{(k + 1) \times m (T - k)}$	Hankel matrix (Definition 6).
$\dim_{H} (k ∣ X)$	kth Hankel dimension of $X$ , defined as $rank H_{k ∣ X}$ (Definition 10).
${codim}_{H} (k ∣ X)$	kth Hankel codimension of $X$ , defined as $k + 1 - \dim_{H} (k ∣ X)$ (Definition 10).
L	The smallest ℓ such that ${codim}_{H} (ℓ ∣ X) > 0$ .
$A_{i j}$	Entry of a matrix $A$ at row i and column j.
${∥ A ∥}_{F}$	Frobenius norm of $A$ : ${∥ A ∥}_{F} = \sqrt{\sum_{i, j} {| A_{i j} |}^{2}}$ .
$A^{+}$	Moore–Penrose pseudoinverse of $A$ ; $B = C A^{+}$ minimizes ${∥ C - B A ∥}_{F}$ .
$X^{T} \in C^{n \times m}$	Transpose of $X \in C^{m \times n}$ .
$X^{*} \in C^{n \times m}$	Conjugate transpose of $X \in C^{m \times n}$ .
$V (M)$	Subspace spanned by the column vectors of $M$ .
$[\begin{matrix} A & B \end{matrix}] \in C^{k \times (m + n)}$	Matrix obtained by appending the n columns of $B \in C^{k \times n}$ to $A \in C^{k \times m}$ .
$V^{⊥} \subset C^{n}$	Orthogonal complement of a subspace $V \subseteq C^{n}$ .
$v_{i}$	The ith component of a vector $v$ .
$0^{n} \in C^{n}$	n-dimensional zero row vector $(0 \dots 0)$ .
$v^{a, b} \in C^{n + a + b}$	Column vector defined as $v^{a, b} = {(0^{a} v^{T} 0^{b})}^{T}$ for $v \in C^{n}$ .
$diag (A_{1}, \dots, A_{k})$	Block diagonal matrix with diagonal blocks $A_{1}, \dots, A_{k}$ .

Table 2. Time complexity of the algorithms in the theoretical and practical scenarios.

Algorithms	Theoretical		Practical
	QRD	EqS	SVD	EqS
Algorithm 1	1	0	1	0
Algorithm 2	1	0	1	0
Algorithm 3	1	1	1	1
Algorithm 4	$< {log}_{2} T$	0	$< {log}_{2} T$	0

Table 3. Simulation scenarios.

Scenario number	Type	# Major eigenvalues	# Minor eigenvalues	Noise inclusion
1	$L \leq \frac{T}{2}$	10	0	No
2	$L > \frac{T}{2}$	30	0	No
3	$L \leq \frac{T}{2}$	10	90	Yes
4	$L > \frac{T}{2}$	30	70	Yes

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
