Differential Topology and Matrix Analysis: An Overview

Petko H. Petkov

doi:10.20944/preprints202601.1792.v1

Submitted:

22 January 2026

Posted:

23 January 2026

You are already at the latest version

Abstract

The present overview aims to illustrate the application of differential topology methods to some important problems in matrix analysis. In particular, it focuses on the use of smooth manifolds and smooth mappings to study fundamental issues such as the determination of matrix rank and the computation of the Jordan form in presence of uncertainties. Various aspects of numerical matrix analysis are discussed, including the genericity of matrix problems, characterization of singular sets in the parameter space, the distance to ill-posed problems and its relation to problem conditioning. The paper also addresses the conditioning of matrix problems in both deterministic and probabilistic settings and the regularization of ill--posed matrix problems. Several examples are provided to illustrate these concepts and their practical relevance.

Keywords:

matrix computations

;

numerical analysis

;

smooth manifolds

;

smooth maps

;

singularities

;

conditioning

;

probabilistic bounds

;

regularization

Subject:

Computer Science and Mathematics - Computational Mathematics

MSC: 53Z50; 53-01; 57R45; 58K05; 65F15; 65F22; 22E70

1. Introduction

Contemporary mathematical models used across the natural sciences are typically not single models corresponding to fixed parameter values; rather, they represent families of models that vary as parameters change within certain bounds. Such models arise in physics, chemistry, biology, control theory, statistics, aerodynamics, hydrodynamics, the social sciences, and several other disciplines. Studying these models using traditional methods is challenging, because the properties of a family of models cannot be described as continuous functions of the parameters if one considers only individual models corresponding to fixed parameter values. This difficulty necessitates the use of families of models whose descriptions and properties depend smoothly on the parameters. Consequently, in recent years, specialists from various scientific fields have shown growing interest in methods of differential topology, whose objects of study are smooth manifolds and smooth mappings. Representing mathematical models in the natural sciences as smooth manifolds, with dimension determined by the number of independent parameters, allows the essential properties of families of models to be expressed as smooth functions of these parameters, thereby greatly facilitating the solution of the corresponding mathematical problems. In this connection, it is appropriate to cite the German mathematician and philosopher Hermann Weyl, who wrote [1], p. 90: Topology has the peculiarity that questions belonging to its domain may under certain circumstances be decidable, even though the continua to which they are addressed may not be given exactly but only vaguely, as is always the case in reality.

The questions addressed by differential topology are global in nature, as they concern the manifold as a whole. Differential topology combines the study of qualitative properties of sets in spaces of arbitrary dimension, which is the domain of topology, with the methods of classical analysis, which enable quantitative analysis under small parametric variations. In this regard, it is appropriate to quote the words of the American mathematician Marston Morse [2], Foreword:

Any problem which is nonlinear in character, which involves more than one coordinate system or variable, or whose structure is initially defined in the large, is likely to require considerations of topology and group theory in order to arrive at its meaning and its solution. In the solution of such problems classical analysis will frequently appear as an instrument in the small, integrated over the whole problem with the aid of group theory or topology.

Differential topology is a broad mathematical discipline whose primary goal is the study and characterization of the global properties of manifolds. A central theme in this field is the transition from local to global properties: many concepts in differential topology can be understood by examining how local behavior extends to the global structure. Another fundamental notion is manifold transversality, which describes the manner in which two manifolds intersect and provides a framework for understanding generic intersections and their stability.

The present overview aims to illustrate the application of methods from differential topology to the solution of several important mathematical problems arising across the natural and applied sciences. In particular, it focuses on the use of smooth manifolds and smooth mappings in matrix analysis, a field with wide–ranging applications in physics, engineering, biology, and beyond. The motivation for this overview arises from the absence of a single comprehensive reference on this subject, as the relevant material is currently scattered across numerous books and research articles.

It should be noted that some mathematical rigor has been deliberately relaxed to make the exposition accessible to specialists from different disciplines.

The overview is organized into six sections.

In Section 2, we present the basic concepts from differential topology necessary for the subsequent discussion. We briefly consider smooth manifolds and smooth maps between them, including the differential of a map and singular points of varieties. Some fundamental facts about Lie groups and matrix groups are also included.

Section 3 is devoted to the geometry of matrix spaces. We examine important characteristics of problems in this space, such as genericity and well–posedness. Condition numbers of matrix problems are discussed in detail to demonstrate the connection between the distance to ill-posed problems and problem conditioning. We also present results on the probabilistic distribution of matrix condition numbers obtained using methods from differential topology.

The important problem of matrix rank is considered in Section 4. We study the orbits of matrices with different ranks and show how small matrix perturbations can move a matrix to an orbit with lower codimension. The problem of determining the numerical rank of a matrix in the presence of uncertainties is also discussed.

Another fundamental problem in matrix analysis—the determination of the Jordan form of a matrix—is addressed in Section 5. We consider orbits and bundles of matrices with fixed Jordan form and investigate their generic properties. The reduction to the “true” Jordan form is described as an ill–posed problem, whose solution can be obtained via regularization methods. This leads to the concept of the numerical Jordan form, which is defined using the tools of differential topology.

In Section 6, we study matrices depending on parameters. It is shown that smooth properties of such matrices can be determined using versal deformations. Several examples of bifurcation diagrams are provided to illustrate how the Jordan form of a matrix depends on the varying parameters.

All computations in this paper were performed using MATLAB^®Version 9.9 (R2020b) [3], employing IEEE double precision arithmetic with a unit roundoff

u \approx 1.11 \times 10^{- 16}

.

2. A Glimpse into Differential Topology

The presentation in this section follows the classic textbooks by Guillemin and Pollack [4], Lee [5,6], and Arnold [7,8], as well as the books by Tu [9] and by Burns and Gidea [10], which are written in a language accessible to non–mathematicians. Excellent introductions to manifold theory for non–specialists include the books by Milnor [11] and Wallace [12]. One of the most authoritative sources in this field is Hirsch’s book [13]; however, reading it requires a very strong mathematical background. Applications of manifolds in mechanics are discussed in depth in [14].

2.1. Smooth Manifolds

A manifold is a multidimensional generalization of the concepts of a line and a surface, without singular points. When studying manifolds, the notion of dimension plays a central role. Generally speaking, the dimension is the number of independent quantities (or parameters) required to specify a point on the manifold. Manifolds of dimension one are lines and curves, while manifolds of dimension two are surfaces. Typical examples of two–dimensional manifolds include planes and spheres, as well as other familiar surfaces such as cylinders, ellipsoids, paraboloids, and tori. A key feature of these examples is that an n–dimensional manifold “looks” locally like

R^{n}

: every point of a manifold

M

has a neighborhood that is topologically equivalent to an open subset of

R^{n}

. Thus, in one-dimensional manifolds each point has a neighborhood resembling a line segment; in two–dimensional manifolds, each point has a neighborhood resembling an open disk; and in three-dimensional manifolds, a neighborhood resembling an open ball.

In this sense, manifolds are sets in which the neighborhood of every point has the same local topological structure as the n–dimensional Euclidean space.

We note that the concept of dimension, as used in the characterization of manifolds, belongs to the most fundamental ideas in mathematics. An excellent overview of the significance of this concept in geometry and algebra is given by Manin in [15].

In Figure 1 we illustrate the decomposition of three-dimensional Euclidean space into layers (or strata) of manifolds defined by the equation

x^{2} + y^{2} - z^{2} = C

for different values of C. An essential feature of this decomposition is that the individual layers do not intersect. Note that the innermost layer (the two opposite cones with a common vertex at the origin) is not smooth; rather, it is an algebraic manifold, since the vertex is a singular point (see Section 2.7).

Definition 1.

Two subsets of Euclidean spaces

U \subseteq R^{k}, V \subseteq R^{n}

are topologically equivalent, or homeomorphic (from the Greek word meaning “similar form”), if there exists a one–to–one correspondence

φ : U \to V

, such that both φ and its inverse are continuous. Such a correspondence is called a homeomorphism.

Based on these considerations, a provisional definition of a topological manifold can be given. We can consider n–dimensional manifold as a subset of some Euclidean space

R^{k}

that is locally Euclidean of dimension n, i.e., every point of

M

has a neighborhood in

M

that is homeomorphic to a ball in

R^{k}

.

For example, every one–dimensional manifold is homeomorphic to either a line or a circle.

Definition 2.

A topological space

M

is called an n–dimensional topological manifold, if it is locally homeomorphic to

R^{n}

.

Every topological manifold

M

is a Hausdorff space: for every pair of distinct points

p, q \in M

, there exist disjoint open subsets

U, V \subseteq M

such that

p \in U

and

q \in V

.

In most cases, the analysis of manifolds cannot be performed directly on the manifold itself. Instead, it is necessary to describe the manifold unambiguously in an appropriate coordinate space and to apply analytical methods to this representation. For this purpose, coordinate charts and a manifold atlas are employed.

An open chart of

M

is defined as a pair

(U, φ)

, where U is an open subset of the space

M

, and

φ

a homeomorphism from U onto an open subset of the coordinate space

R^{n}

. To each point

p \in U

there corresponds, in a one-to-one manner, an n–tuple of numbers

φ : p \in U \mapsto (x^{1} (p), \dots, x^{n} (p)) \in φ (U),

which are called its local coordinates (Figure 2).

On the basis of the concept of an open chart, a rigorous definition of a topological manifold can be introduced.

Let A be a finite or countable set. A topological space

M

is called an n–dimensional topological manifold if there exists a collection of open charts

(U_{α}, φ_{α}), α \in A

, such that:

For each $α$ of the set A, $φ_{α} (U_{α})$ is an open subset of $R^{n}$ .
$M = ⋃_{α \in A} U_{α}$ .

Such a collection of charts is called an atlas of the topological manifold (Figure 3). The atlas of

M

is denoted by

A (M)

.

In the general case, each chart

(U_{i}, φ_{i})

is obtained using a different mapping

φ_{i}

associated with the subset

U_{i}

. In this way, one can study complex manifolds composed of several subsets with different properties. This constitutes an important advantage of manifolds over simpler topological objects which consist of a single set with fixed properties and are described by only one chart.

Example 1.

(a) The coordinate space $R^{n}$ is an n–dimensional topological manifold: its atlas consists of a single open chart $(R^{n}, i_{R^{n}})$ , where $i_{R^{n}} : R^{n} \to R^{n}$ is the identity map.
(b) (Atlas of the two–dimensional sphere).

Let $N = (0, \dots, 0, 1) \in S^{k}$ denote the north pole of the k–dimensional sphere, and let $S = (0, \dots, 0, - 1)$ denote its south pole.

The stereographic projection $φ_{N}$ from $U_{1} = S^{k} ∖ {N}$ onto $R^{k}$ is the mapping that sends a point p to the point where the line through N and p intersects the subspace of $R^{k + 1}$ defined by $x^{k + 1} = 0$ (the projection plane). See Figure 4 for the case $k = 2$ . It is a smooth, bijective map from the entire sphere, except for the projection point, onto the whole plane. Stereographic projection provides a way to represent the sphere by a plane, but it can also be used for other curved surfaces, such as deformed spheres and hyperboloids.

The map $φ_{N} : S^{k} ∖ {N} \to R^{k}$ is given by the formula

$(x^{1}, \dots, x^{k + 1}) \mapsto \frac{1}{1 - x^{k + 1}} (x^{1}, \dots, x^{k}) .$

Analogously, the projection $φ_{S}$ from $U_{2} = S^{k} ∖ {S}$ onto $R^{k}$ is defined by

$(x^{1}, \dots, x^{k + 1}) \mapsto \frac{1}{1 + x^{k + 1}} (x^{1}, \dots, x^{k}) .$

These projections are homeomorphisms from $U_{1}$ and $U_{2}$ onto $R^{2}$ .

Let us define on the sphere $S^{2} \subset R^{3}$ an atlas consisting of two charts, using the stereographic projections

$φ_{N} (x) = (\frac{x^{1}}{1 - x_{3}}, \frac{x^{2}}{1 - x^{3}}), φ_{S} (x) = (\frac{x^{1}}{1 + x_{3}}, \frac{x^{2}}{1 + x^{3}}) .$

In this case, the open sets $φ_{N} (U_{1})$ and $φ_{S} (U_{2})$ , where $U_{1} = S^{2} ∖ {N}$ and $U_{2} = S^{2} ∖ {S}$ , are palnes in $R^{3}$ that are tangent to the sphere at the points S and N, respectively.

The atlas of the sphere is shown in Figure 5. The family of circles lying on the sphere and tangent at the point N is mapped in the lower chart to a family of parallel straight lines, while in the upper chart it is mapped to a family of tangent circles.

The definition of a topological space does not allow one to define differentiable functions or other concepts from mathematical analysis on a manifold. However, many important applications of manifolds involve mathematical analysis. For example, the application of manifold theory in geometry includes properties such as volume and curvature. Typically, volumes are computed by integration, while curvatures are determined through differentiation, so extending these concepts to manifolds requires a way to make integration and differentiation meaningful on a manifold.

Similarly, applications in classical mechanics involve solving ordinary differential equations on manifolds. To give these concepts meaning, it is necessary to define an additional structure on the manifold. In order to make sense of derivatives of real-valued functions, curves, or manifolds, it is necessary to introduce a new type of manifold called a smooth manifold. This is a topological manifold equipped with an additional structure compatible with its topology, which allows one to determine which functions to or from the manifold are smooth.

Let

M

be an n–dimensional topological manifold, and let

A

be an atlas of the manifold

M

. Consider any two charts in the atlas

A

, denoted by

(U, φ)

and

(V, ψ)

.

Definition 3.

The coordinate transformation

ψ \circ φ^{- 1} : φ (U \cap V) \in R^{n} \to ψ (U \cap V) \in R^{n},

which maps the intersection

φ (U \cap V)

in

R^{n}

to

ψ (U \cap V)

in

R^{n}

, is called smooth if the transition functions

\begin{matrix} y^{1} & = & y^{1} (x^{1}, \dots, x^{n}), \\ ⋮ & ⋮ \\ y^{n} & = & y^{n} (x^{1}, \dots, x^{n}) \end{matrix}

have continuous partial derivatives of all orders (that is, they are infinitely differentiable, belonging to the class

C^{\infty}

in the open set

φ (U \cap V)

), and the determinant

\det ([\frac{\partial y_{i}}{\partial x_{j}}]),

of the Jacobian matrix of the transformation is nonzero.

This also implies that the inverse coordinate transformation of

{(ψ \circ φ^{- 1})}^{- 1} = φ \circ ψ^{- 1}

is smooth, since the corresponding transition functions

\begin{matrix} x^{1} & = & x^{1} (y^{1}, \dots, y^{n}), \\ ⋮ & ⋮ \\ x^{n} & = & x^{n} (y^{1}, \dots, y^{n}) \end{matrix}

have continuous partial derivatives of all orders in the open set

ψ (U \cap V)

.

If the coordinate transformation is smooth, we say that the corresponding charts are smoothly compatible.

An atlas

A

is called smooth if:

Any two charts in the atlas are smoothly compatible.
Every point $p \in M$ lies in the domain of at least one chart.

A smooth atlas

A

on

M

is called maximal if it is not properly contained in any larger smooth atlas. This means that any chart that is smoothly compatible with every chart in

A

is already included in

A

. A maximal atlas is also called a complete atlas.

Definition 4.

An n–dimensional topological manifold

M

is said to have a smooth structure if there exists an atlas

A

on the manifold satisfying the following properties:

For any two charts in $A$ , the corresponding coordinate transformation is smooth.
The atlas $A$ is maximal.

Definition 5.

n–dimensional topological manifold that has a smooth structure is called an n–dimensional smooth manifold.

Example 2.

(a) (Normed vector spaces). Let V be a finite-dimensional real vector space. Any norm on V defines a topology that is independent of the choice of norm. With this topology, V is an n–dimensional topological manifold with a natural smooth structure defined as follows. Any (ordered) basis $(e^{1}, \dots, e^{n})$ of V defines a basic isomorphism $E : R^{n} \to V$ by

$E (x) = \sum_{i = 1}^{n} x^{i} e^{i} .$

This map is a homeomorphism, so $(V, E^{- 1})$ is a chart. The collection of all such charts defines a smooth structure, called the standard smooth structure on V.
(b) (Matrix spaces). Let $M (m \times n, R)$ denote the set of $m \times n$ matrices with real entries. Since it is a real vector space of dimension $m \cdot n$ under matrix addition and scalar multiplication, $M (m \times n, R)$ is a smooth $m \cdot n$ –dimensional manifold. (Since the spaces $M (m \times n, R)$ and $R^{m \cdot n}$ are isometric, they can be identified by “stacking” all matrix entries into a single row or column.) A chart on this manifold is given by

$φ : R^{m \times n} \to R^{m \cdot n} : X \mapsto vec (X),$

where $vec (X)$ denotes the vector obtained from the matrix elements as described above. The dimension of this manifold is $m \cdot n$ . The space $R^{m \times n}$ can be equipped with a Euclidean structure via the inner product

$〈 M_{1}, M_{2} 〉 = vec {(M_{1})}^{T} vec (M_{2}) = tr (M_{1}^{T} M_{2}) .$

The norm induced by this inner product is the Frobenius norm, defined by

${∥ M ∥}_{F}^{2} = tr (M^{T} M),$

i.e., ${∥ M ∥}_{F}^{2}$ is the sum of the squares of all entries of M.

Similarly, the space $M (m \times n, C)$ of $m \times n$ complex matrices is a vector space of dimension $2 m n$ over $R$ and therefore is a smooth manifold of dimension $2 \cdot m \cdot n$ . In the special case $m = n$ (square matrices), the notations $M (m \times n, R)$ and $M (m \times n, C)$ are abbreviated as $M (n, R)$ and $M (n, C)$ , respectively.
(c) (General linear group). The general linear group $GL (n, R)$ is the set of invertible $n \times n$ matrices with real entries. It is an $n^{2}$ –dimensional manifold because it is an open subset of the $n^{2}$ –dimensional vector space $M (n, R)$ , namely the set where the (continuous) determinant function $det (A)$ is nonzero.
(d) (Spaces of linear maps). Let V and W be finite–dimensional real vector spaces, and let $L (V, W)$ denote the set of linear maps from V to W. Since $L (V, W)$ is itself a finite–dimensional vector space (whose dimension is the product of the dimensions of V and W), it naturally carries the structure of a smooth manifold, just as in item (b).

A submanifold of a manifold

M

is a subset

P \subset M

that itself has the structure of a manifold. The sphere

S^{2}

, defined by the equation

x^{2} + y^{2} + z^{2} = 1

, is an example of a subset of the coordinate space

R^{3}

that inherits the natural manifold topology from

R^{3}

.

In a number of important applications of manifolds, one encounters spaces that would be smooth manifolds except that they have a “boundary” of some kind. Elementary examples of such spaces are closed intervals in

R

, closed balls in

R^{n}

, and closed hemispheres in

S^{n}

. The study of manifolds with boundary requires a generalization of the definition of a manifold; see, for example, [6], Ch. 1.

An overview of the historical development of the concept of a manifold can be found in [16].

2.2. Smooth Maps

Let

U \subset R^{n}

be an open set. Any mapping

f : U \to R^{m}

can be represented as an ordered collection of m functions:

f (x^{1}, \dots, x^{n}) = (f^{1} (x^{1}, \dots, x^{n}), f^{2} (x^{1}, \dots, x^{n}), \dots, f^{m} (x^{1}, \dots, x^{n})) .

The mapping

f : U \to R^{m}

is called smooth (

f \in C^{\infty}

), if each function

f^{k}, k = 1, 2, \dots, m

, has continuous partial derivatives of all orders. Mappings with only continuous functions are called

C^{0}

mappings. In the case where all functions

f^{k}

are analytic (a function is called analytic if its Taylor series converges to it in a neighborhood of each point), the mapping f is called analytic (

f \in C^{ω}

). We have the inclusion

C^{\infty} \supset C^{ω}

.

Let

M

and

P

be smooth manifolds of dimensions n and k, respectively, and let

f : M \to P

be an arbitrary map. The map f is called smooth at point p of the manifold

M

if there exist a local chart

(U, φ)

containing p and a local chart

(V, ψ)

containing

f (p)

such that

f (U) \subseteq V

and the composition

ψ \circ f \circ φ^{- 1}

is a smooth map from

φ (U)

to

ψ (V)

(Figure 6).

A map f is called smooth if it is smooth at every point p of the manifold

M

(see Figure 7).

Every smooth map is continuous.

The set of smooth maps of the form

f : M \to P

is denoted by

C^{\infty} (M, P)

.

Let

M

and

P

be smooth manifolds.

A smooth map

f : M \to P

is called a diffeomorphism if it is bijective and its inverse

f^{- 1} : P \to M

is also smooth.

The manifold

M

is said to be diffeomorphic to the manifold

P

if there exists a diffeomorphism

f : M \to P

. This is denoted by

M ≅ P

.

Example 3.

In Figure 8, an open disk and an open ellipse are shown, which are diffeomorphic. The disk is mapped to the ellipse by the smooth map

f : (x, y) \mapsto (x / a, y / b),

and the ellipse is mapped back to the disk by the inverse map

f^{- 1} : (x, y) \mapsto (a x, b y),

which is also smooth.

Similar to the case when two topological spaces are considered “the same” if they are homeomorphic, two smooth manifolds are regarded as indistinguishable if they are diffeomorphic. A central question in the theory of smooth manifolds is the study of properties of smooth manifolds that are preserved under diffeomorphisms.

2.3. Tangent Space

In the study of the metric properties of regions in Euclidean space, an important role is played by properties that are defined in a theoretically infinitesimal neighborhood of a fixed point, by neglecting quantities of higher order relative to the distance to that point. Similarly, in the study of smooth manifolds it is appropriate to neglect infinitesimal quantities of higher order in order to simplify the analysis of a given problem. One way to achieve this is to introduce special concepts analogous to tangent vectors to curves and tangent planes to surfaces, as used in mathematical analysis.

Let

v_{p}

be a vector in the coordinate space

R^{n}

based at a point

p \in R^{n}

. Then, for every smooth function f defined in a neighborhood of the point p, the directional derivative determined by the vector

v_{p}

is defined as follows:

{\frac{\partial f}{\partial v}|}_{p} = lim_{t \to 0} \frac{f (p + t v_{p}) - f (p)}{t} = \frac{d}{d t} f (p + t v_{p}) |_{t = 0},

where

t \geq 0

is a numerical parameter (in analysis, one usually considers a vector

v_{p}

of unit length). In a coordinate system, we have the formula:

{\frac{\partial f}{\partial v}|}_{p} = {\frac{\partial f}{\partial x^{1}}|}_{p} v^{1} + \dots + {\frac{\partial f}{\partial x^{n}}|}_{p} v^{n} = {〈 grad (f (p)), v 〉}_{p},

where

grad (f) = (\frac{\partial f}{\partial x^{1}}, \dots, \frac{\partial f}{\partial x^{n}})

is the gradient of the function f, and

x^{1}, \dots, x^{n}

,

v^{1}, \dots, v^{n}

are the coordinates of the point p and the vector

v_{p}

, respectively. Furthermore, the quantity

{\frac{\partial f}{\partial v}|}_{p} = \sum_{i = 1}^{n} v^{i} \frac{\partial}{\partial x^{i}} f (p)

will be called the derivative of the smooth function f in the direction of the vector $v_{p}$ at the point p and will be denoted by

D_{p}^{v} f

. In this notation, it is understood that the partial derivatives are evaluated at p, since

v_{p}

is a vector at p. Note that

D_{p}^{v} f

is a number, not a function. We write

D_{p}^{v} = \sum_{i = 1}^{n} v^{i} {\frac{\partial}{\partial x^{i}}|}_{p}

for the map that sends the function f to the number

D_{p}^{v} f

.

In this way, given a vector

v_{p}

at the point p, an operation is defined on the set

C^{\infty} (p)

of smooth functions in a neighborhood of p:

D_{p}^{v} : C^{\infty} (p) \to R,

performed according to the rule

f (p) \mapsto D_{p}^{v} f .

Definition 6.

A tangent vector at point p of the manifold

M

is a rule

v_{p} : C^{\infty} (p) \to R,

which assigns to each function f in the set

C^{\infty} (p)

a number

v_{p} f

satisfying the following properties:

$v_{p} (f + g) = v_{p} f + v_{p} g$ ,
$v_{p} (α f) = α v_{p} f$ ,
$v_{p} (f \cdot g) = f (p) \cdot v_{p} g + v_{p} f \cdot g (p)$ ,

where

f, g \in C^{\infty} (p), α \in R

.

The association

v_{p} \mapsto D_{p}^{v}

of the directional derivative

D_{p}^{v}

with the tangent vector

v_{p}

allows tangent vectors to be characterized as certain operators acting on functions.

The set of vectors that are tangent to the manifold

M

at a point p is denoted by

T_{p} M

. We now define on this set the operations of addition of tangent vectors and scalar multiplication of a tangent vector.

Let

v_{p}, w_{p} \in T_{p} M, f \in C^{\infty} (p)

and

α \in R

. We define

\begin{matrix} (v_{p} + w_{p}) f & = & v_{p} f + w_{p} f, \\ (α v_{p}) f & = & α v_{p} f . \end{matrix}

It is not difficult to verify that both the sum of the tangent vectors

v_{p}

and

w_{p}

, and the product of a tangent vector with a scalar

α v_{p}

, are also tangent vectors. In this way, the set

T_{p} M

becomes a vector space. It is called the tangent space to the smooth manifold

M

at the point p.

Let

(U, φ)

be a local chart (coordinate system) and let

p \in U

. For every function

f \in C^{\infty} (p)

one can construct a smooth function

\hat{f} = f \circ φ^{- 1}

, defined on the open subset

φ (U)

of

R^{n}

(see Figure 9).

By computing the partial derivatives of

\hat{f}

with respect to the variables

x^{1}, \dots, x^{n}

, we obtain

\frac{\partial}{\partial x^{i}} \hat{f} (φ (p)) = {\frac{\partial}{\partial x^{i}}|}_{φ (p)} (f \circ φ^{- 1}), i = 1, \dots, n .

In this way, for each function

f \in C^{\infty} (p)

, one can associate n numbers

f \mapsto {\frac{\partial}{\partial x^{1}}|}_{φ (p)} (f \circ φ^{- 1}), \dots, {\frac{\partial}{\partial x^{n}}|}_{φ (p)} (f \circ φ^{- 1}) .

We see that the choice of a system of local coordinates determines n vectors in the tangent space

T_{φ (p)} M

acting according to the rule

f \mapsto {\frac{\partial}{\partial x^{i}}|}_{φ (p)} (f \circ φ^{- 1}), i = 1, \dots, n .

These vectors are conventionally denoted by

{\frac{\partial}{\partial x^{i}}|}_{p} .

The quantities

\partial / \partial x^{i} |_{p}

represent linear partial differential operators, which act according to the rule

{\frac{\partial}{\partial x^{i}}|}_{p} f = {\frac{\partial f}{\partial x^{i}}|}_{p} = {\frac{\partial}{\partial x^{i}}|}_{φ (p)} (f \circ φ^{- 1}) = \frac{\partial \hat{f}}{\partial x^{i}} (\hat{p}) .

Theorem 1.

[6], Ch. 3 The vectors

{\frac{\partial}{\partial x^{1}}|}_{p}, \dots, {\frac{\partial}{\partial x^{n}}|}_{p}

form a basis of the tangent space

T_{p} M

.

In this way, we obtain the important result that the dimension of the tangent space

T_{p} M

is equal to the dimension of the manifold

M

,

\dim (T_{p} M) = \dim (M) .

The orthogonal complement of the tangent space is called the normal space and is denoted by

N_{p} M

.

Instead of working with the dimension of the manifold

M

, it is often more convenient to use the codimension of

M

, denoted by

codim (M)

, which is equal to the dimension of the normal space

N_{p} M

. Since

T_{p} M + N_{p} M = S,

where

S

is the ambient manifold, we have

\dim (M) + codim (M) = \dim (S) .

2.4. Differential of a Map

To analyze the action of smooth maps on tangent vectors, it is necessary to consider the differentiation of such maps. In the case of a smooth manifold between Euclidean spaces, the total derivative of the map at a point (represented by its Jacobian matrix) is a linear map that provides the “best linear approximation” of the map near the given point. In the case of manifolds, a linear map is defined between the tangent spaces.

Definition 7.

Let

M

and

P

be smooth manifolds, and let

f : M \to P

be a smooth map. The differential of f at a point

p \in M

is the linear map from the tangent space of

M

to the tangent space of

P

,

D_{p} f : T_{p} M \to T_{f (p)} P,

defined as follows (see Figure 10). Let

v \in T_{p} M

. Consider a curve

φ : R \to M

with

φ (0) = p

whose tangent vector is v. Then

D_{p} f v

is the tangent vector to the curve

f \circ φ : R \to P

.

Proposition 1.

(Propertis of the differential). Let

M

,

P

, and

Z

be smooth manifolds. Let

f : M \to P

and

g : P \to Z

be smooth maps, and let

p \in M

.

(a) $D_{p} f : T_{p} M \to T_{f (p)} P$ is a linear map.
(b) $D_{p} (g \circ f) = D_{f (p)} g \circ D_{p} f : T_{p} M \to T_{g \circ f (p)} Z$ .
(c) $D_{p} (i_{M}) = i_{T_{p} M} : T_{p} M \to T_{p} M$ , where $i_{M}$ is the identity map on $M$ .
(d) If f is a diffeomorphism, then $D_{p} f : T_{p} M \to T_{f (p)} P$ is an isomorphism, and ${(D_{p} f)}^{- 1} = D_{f (p)} (f^{- 1})$ .

Using

(x^{1}, \dots, x^{n})

to denote coordinates in the domain of f and

(y_{1}, \dots, y^{m})

to denote coordinates in the codomain, the action of

D_{p} f

on a typical basis vector is

D_{p} f ({\frac{\partial}{\partial x^{i}}|}_{p}) = \frac{\partial f^{j}}{\partial x^{i}} (p) {\frac{\partial}{\partial y^{j}}|}_{f (p)} .

(1)

Therefore, the matrix of

D_{p} f

in terms of the coordinate bases is

[\begin{matrix} \frac{\partial f_{1}}{\partial x^{1}} (p) & \dots & \frac{\partial f_{1}}{\partial x^{n}} (p) \\ ⋮ & ⋱ & ⋮ \\ \frac{\partial f_{m}}{\partial x^{1}} (p) & \dots & \frac{\partial f_{m}}{\partial x^{n}} (p) \end{matrix}] .

This matrix is precisely the Jacobian matrix of f at p, which is the matrix representation of the total derivative

D f (p) : R^{n} \to R^{m}

.

Example 4.

We consider the maps

f : B_{R}^{3} \to R^{3}

and

g : R^{3} \to B_{R}^{3}

, defined by

\begin{matrix} f (x) & = & (\frac{x^{1}}{\sqrt{1 - {x^{1}}^{2}}}, \frac{x^{2}}{\sqrt{1 - {x^{2}}^{2}}}, \frac{x^{3}}{\sqrt{1 - {x^{3}}^{2}}}), \\ g (x) & = & (\frac{x^{1}}{\sqrt{1 + {x^{1}}^{2}}}, \frac{x^{2}}{\sqrt{1 + {x^{2}}^{2}}}, \frac{x^{3}}{\sqrt{1 + {x^{3}}^{2}}}), \end{matrix}

where

B_{R}^{3}

is the open 3–dimensional ball of radius R. These maps are smooth, and it can be shown that they are inverses of each other for

{∥ x ∥}_{2} = \sqrt{{x^{1}}^{2} + {x^{2}}^{2} + {x^{3}}^{2}} < 1

. Therefore, both maps are diffeomorphisms.

Let

p = (p^{1}, p^{2}, p^{3})

be a point on the sphere

S_{R}^{2}

that does not coincide with the north pole. To obtain the tangent space

T_{f (p)} f (S_{R}^{2})

, we need to compute the differential

D_{p} f : R^{2} \to T_{f (p)} f (S_{R}^{2})

. For a local parametrization of

S_{R}^{2}

and

f (S_{R}^{2})

in

R^{2}

, we use the stereographic projections (see Example 1)

\begin{matrix} φ : S_{R}^{2} \to R^{2}, (x^{1}, x^{2}) & = & φ (p) = (\frac{p^{1}}{1 - p^{3}}, \frac{p^{2}}{1 - p^{3}}), \\ φ^{- 1} : R^{2} \to S_{R}^{2}, (p^{1}, p^{2}, p^{3}) & = & φ^{- 1} (x) = \frac{1}{1 + {x^{1}}^{2} + {x^{2}}^{2}} (2 x^{1}, 2 x^{2}, {x^{1}}^{2} + {x^{2}}^{2} - 1), \\ ψ : f (S_{R}^{2}) \to R^{2}, ψ (y) & = & (\frac{y^{1}}{1 - y^{3}}, \frac{y^{2}}{1 - y^{3}}), \end{matrix}

where

y = f (p) = (\frac{p^{1}}{\sqrt{1 - {p^{1}}^{2}}}, \frac{p^{2}}{\sqrt{1 - {p^{2}}^{2}}}, \frac{p^{3}}{\sqrt{1 - {p^{3}}^{2}}}) .

The matrices of partial derivatives are

\begin{matrix} D_{y} ψ & = & [\begin{matrix} 1 / (1 - y^{3}) & 0 & - y^{1} / (1 - y^{3}) \\ 0 & 1 / (1 - y^{3}) & - y^{1} / (1 - y^{3}) \end{matrix}], \\ D_{p} f & = & [\begin{matrix} 1 / \sqrt{1 - {p^{1}}^{2}} & 0 & 0 \\ - 2 {p^{1}}^{2} / (1 - {p^{1}}^{2}) \\ 0 & 1 / \sqrt{1 - {p^{2}}^{2}} & 0 \\ - 2 {p^{2}}^{2} / (1 - {p^{2}}^{2}) \\ 0 & 0 & 1 / \sqrt{1 - {p^{3}}^{2}} \\ - 2 {p^{3}}^{2} / (1 - {p^{3}}^{2}) \end{matrix}], \\ D_{x} φ^{- 1} & = & \frac{2}{{(1 + {x^{1}}^{2} + {x^{2}}^{2})}^{2}} [\begin{matrix} 1 - {x^{1}}^{2} + {x^{2}}^{2} & - 2 x^{1} x^{2} \\ - 2 x^{1} x^{2} & 1 + {x^{1}}^{2} - {x^{2}}^{2} \\ 2 x^{1} & 2 x^{2} \end{matrix}] . \end{matrix}

The tangent space

T_{f (p)} f (S_{R}^{2})

is obtained as the image of

R^{2}

under the differential

D_{x} f = D_{p} f D_{x} φ^{- 1}

.

Figure 11 shows the diffeomorphic image

S_{R}^{2} \mapsto f (S_{R}^{2})

of a sphere with radius

R = 0.97

. The tangent spaces

T_{p} S_{R}^{2}

and

T_{f (p)} f (S_{R}^{2})

are computed for the points

p = R (1 / \sqrt{3}, - 1 / \sqrt{3}, 1 / \sqrt{3})

and

x = φ (p), y = f (p)

.

The matrix of the differential

D_{\hat{p}} \hat{f} : R^{2} \to R^{2}

is obtained from

D_{\hat{p}} \hat{f} = D_{y} ψ D_{p} f D_{x} φ^{- 1} .

Very good descriptions of tangent vectors and tangent spaces of manifolds, accompanied by examples, are given in [6], Ch. 3, [10], Ch. 1, [9,17], Ch. 3, [12], Ch. 4, [18], Ch. 4.

2.5. Tangent and Normal Bundle

Let

M

be a smooth manifold. Let us define the disjoint union of all tangent spaces of

M

,

T M = ⋃_{p \in M} T_{p} M .

The set

T M

is called tangent bundle of the manifold

M

. The term bundle indicates that

T M

consists of “layers”– the tangent spaces at the individual points of the manifold. The tangent bundle of an m–dimensional manifold in

R^{n}

is itself a manifold whose dimension is equal to

2 m

.

In the trivial case, when

M = R^{n}

, for each

p \in R^{n}

the tangent space

T_{p} R^{n}

can be identified with

R^{n}

. Therefore, in this case we have

T R^{n} ≅ R^{n} \times R^{n}

, i.e., the tangent bundle

T R^{n}

is diffeomorphic to the Cartesian product

R^{n} \times R^{n}

.

The only tangent bundles that can be easily visualized are those of the real line

R

and of the circle. The tangent bundle of two–dimensional manifolds is four–dimensional and therefore difficult to visualize.

The tangent bundle of the real line coincides with

R^{2}

. The tangent bundle of the circle is obtained by considering all tangent spaces (Figure 12, top) and combining them disjointly into a smooth manifold (Figure 12, bottom).

The map

π : T M \to M

, which assigns to each tangent vector v the point

p \in U

, at which the vector is tangent to

M

(

v \in T M

), is called the the natural projection. The preimage of a point

p \in M

under the natural projection,

π^{- 1} (p)

, is the tangent space

T_{p} M

. This space is called the fiber of the bundle over the point p.

Let

M

be a smooth manifold. Let us define the disjoint union of all normal spaces of

M

,

N M = ⋃_{p \in M} N_{p} M .

The set

N M

is called normal bundle. The normal bundle of a circle is shown in Figure 13.

Assume that

M \subseteq R^{n}

is an m–dimensional submanifold. The normal bundle

N M

consists of all vectors that are normal to

M

:

N M = \{(p, v) \in R^{n} \times R^{n} : p \in M, v \in N_{p} M\} .

In this case,

N M

can be viewed as an n–dimensional submanifold of the Cartesian product

R^{n} \times R^{n}

.

Let

M \subseteq R^{n}

be an m–dimensional submanifold without boundary. Then

N M

is a smooth manifold of dimension n.

Normal bundles can be considered more generally when we have a submanifold

P \subset M

, in order to understand the geometry of

P

within

P

in

M

.

Definition 8.

(Normal bundle of a submanifold). Let

M \subset R^{m}

be a manifold without boundary, and let

P

be a submanifold of

M

. The normal bundle of

P

in

M

is defined as the set

N (P, M) : = \{(p, v) : p \in P, v \in T_{p} M a n d v ⊥ T_{p} P\} .

The normal bundle

N (P, M)

is a smooth manifold of dimension equal to

\dim (M)

.

2.6. Tubular Neighborhoods

In this section we consider an important application of normal bundles, which is characterized by the fact that every smooth manifold

M

without boundary possesses a special type of neighborhood.

Generally speaking, a tubular neighborhood U of a smooth submanifold

M

is an open set around the submanifold whose structure resembles that of the normal bundle. This definition can be made more concrete by the following example. Let us consider a smooth plane curve without self–intersections. At each point of the curve we draw a straight line perpendicular to the curve. Except in the case where the manifold is a straight line, these lines will intersect in a complicated way (Figure 14). However, if we consider a narrow strip around the curve, the portions of the normal lines contained in this strip will not intersect and will cover the entire strip without gaps.

The tubular neighborhood of a space curve in

R^{3}

is shown in Figure 15.

Let

M \subseteq R^{n}

be an m–dimensional submanifold. Viewing the normal bundle

N M

as a submanifold of

R^{n} \times R^{n}

, we define the smooth map

E : N M \to R^{n}, (p, v) \mapsto p + v .

It maps each normal space

N_{p} M

affinely through p and orthogonally to

T_{p} M

. The tubular neighborhood of

M

is the neighborhood U of

M

in

R^{n}

which is the diffeomorphic image, under this map, of an open subset

V \subset N M

of the form

V = \{(p, v) \in N M : | v | < ε (p)\},

for some positive continuous function

ε : M \to R^{+}

(Figure 16). We have the following definition:

Definition 9.

(Tubular neighborhood). Let

M \subset R^{n}

be a smooth manifold without boundary. A tubular neighborhood of

M

is an open subset U of

R^{n}

containing

M

such that E maps an open subspace

V \subset N M

diffeomorphically onto U, where V is defined by a smooth function

ε : M \to R^{+}

.

The key property of smooth manifolds embedded in a Euclidean space is that they always possess a tubular neighborhood.

Theorem 2.

Every embedded submanifold of

R^{n}

has a tubular neighborhood.

Tubular neighborhoods and their properties are studied in detail in [6], Ch. 6, [10], Ch. 6, [13], Ch. 5.

2.7. Singular and Regular Points

Definition 10.

Let f be a smooth map from an m–dimensional manifold

M

to an n–dimensional manifold

P

.

(a) A point $p \in M$ is called critical or singular point of f if the derivative $D_{p} f : T_{p} M \to T_{f (p)} P$ is not surjective; that is, if the rank of the Jacobian matrix ${(\frac{\partial f}{\partial x})|}_{p}$ is smaller than the dimension n of $P$ . The image $f (p)$ of a critical point is called a critical value of f.
(b) A point $p \in M$ is called a regular point of f if it is not critical. A point $q \in P$ is called a regular value of f if its inverse image $f^{- 1} (q)$ contains no critical points.

Note that if the dimension of

M

is smaller than the dimension of

P

, then all points of

M

are critical points of f. On the other hand, if

f (M)

is not the whole of

P

, then all points of

P ∖ f (M)

are regular values.

Example 5.

(a) Let $M$ be a torus embedded as a submanifold in three-dimensional Euclidean space. From Figure 17 it can be seen that there exist exactly four horizontal planes (that is, planes of the form $z = const$ ), denoted $H_{1}, H_{2}, H_{3}, H_{4}$ , which are tangent planes to $M$ at the points $p_{1}, p_{2}, p_{3}, p_{4}$ , respectively. This corresponds to the fact that the function z restricted to $M$ has critical points at $p_{1}, p_{2}, p_{3}, p_{4}$ . These critical points correspond to the critical values $z_{1} = f (p_{1}), z_{2} = f (p_{2}), z_{3} = f (p_{3}), z_{4} = f (p_{4})$ .
(b) Let $f : R^{n + 1} \to R$ be the mapping

$x = (x^{1}, \dots, x^{n + 1}) \mapsto {| x |}^{2} = {x^{1}}^{2} + \dots + {x^{n + 1}}^{2} .$

The derivative $\frac{\partial f}{\partial x}$ at point $x = (x^{1}, \dots, x^{n + 1})$ is the linear mapping given, in the standard basis, by the matrix $(2 x^{1}, \dots, 2 x^{n + 1})$ . Thus, $D_{x} f : R^{n + 1} \to R$ is surjective unless $f (x) = 0$ , so every nonzero real number is a regular value of f. In particular, we obtain the sphere $S^{n} = f^{- 1} (1)$ as an n–dimensional manifold.
(c) We consider the case in which the full–rank condition for the Jacobian matrix is not satisfied. Let the manifold $M \subset R^{3}$ be defined by the equation ${x^{1}}^{2} - {x^{2}}^{2} - {x^{3}}^{2} = 0$ (see Figure 18). On the set $M ∖ 0$ , a structure of a two–dimensional $C^{\infty}$ submanifold can be defined. At the point 0, all minors of the Jacobian matrix vanish and its rank is not maximal. Therefore, the set $M$ is an algebraic variety [34], and the point 0 is a singular point of this variety.

The study of abrupt changes that arise in families of objects depending smoothly on parameters is the subject of singularity theory [19,20,21,22]. This theory deals with the classification of types of changes and the characterization of the sets of parameters that give rise to sudden transitions. Singularity theory forms the foundation of the famous catastrophe theory [23,24,25,26,27,28].

2.8. Sard’s Theorem and Morse Functions

The following result shows that almost every point in the image of a smooth map is a regular value.

Theorem 3.

(Sard’s Theorem). Suppose that

M

and

P

are smooth n–manifolds and

f : M \to P

is a smooth map. Then the set of critical values of f has measure zero in

P

.

The proof of Theorem 3 can be found in ([10], Ch. 1]).

Sard’s Theorem is illustrated in Figure 19. This theorem is a key result in differential topology and is applied in many situations. For example, it is crucial for Thom’s Transversality Theorem (see Theorem 4).

An equivalent formulation of Sard’s Theorem is as follows:

If

f : M \to P

is a smooth map between manifolds, then almost every

y \in P

is a regular value of f, i.e., the set of regular values is a dense subset of

P

, or equivalently, every open subset of

P

contains a regular value.

It should be emphasized that this theorem refers to regular values, not regular points. For example, a constant function of one variable has no regular points (all points are critical), but it has only one critical value, so the remaining points in

R

are regular values. A set consisting of a single point clearly has measure zero.

Sard’s lemma was published by the American mathematician Arthur Sard in 1942.

It is interesting to study the behavior of a function f near its critical points. If

M

is a compact set, every function on it must have a maximum and a minimum. However, if

f (x)

has an extremum, its derivative

D_{x} f

must be equal to zero. Thus, over a compact domain, every function has at least two critical points (except in the case

M = {a s i n g l e p o i n t}

).

Let us consider a smooth function

f : R^{k} \to R

. Locally, near a point

c \in R^{k}

, f can be expressed using the Taylor series:

f (x) = f (c) + \sum_{i = 1}^{k} \frac{\partial f}{\partial x^{i}} (c) \cdot (x^{i} - c^{i}) + \frac{1}{2} \sum_{i, j = 1}^{k} \frac{\partial^{2} f}{\partial x^{i} \partial x^{j}} (c) \cdot (x^{i} - c^{i}) (x^{j} - c^{j}) + o (| x^{3} |) .

If c is a critical point, then by definition:

D_{c} f = (\frac{\partial f}{\partial x^{1}} (c), \dots, \frac{\partial f}{\partial x^{k}} (c)) = (0, \dots, 0) .

Therefore, in the neighborhood of a critical point, we have

f (x) = f (c) + \frac{1}{2} \sum_{i, j = 1}^{k} \frac{\partial^{2} f}{\partial x^{i} \partial x^{j}} (c) \cdot (x^{i} - c^{i}) (x^{j} - c^{j}) + o (| x^{3} |) .

(2)

Hence, the best possible approximation of the local behavior of f at the point c is given by the Hessian matrix of second derivatives:

H = [\frac{\partial^{2} f}{\partial x^{i} \partial x^{j}}] .

Note that the Hessian H is a real symmetric matrix and therefore has only real eigenvalues. At a point where all eigenvalues of the Hessian are positive, the function f has a minimum; at a point where they are negative, f has a maximum.

Definition 11.

(Non-degenerate critical points and Morse functions). For a smooth function

f : R^{k} \to R

, a point

c \in R^{k}

where

D_{c} f = 0

but the Hessian matrix

H_{c} (f) = [\frac{\partial^{2} f}{\partial x^{i} \partial x^{j}} (c)]

is invertible at c, is called non-degenerate critical point of f. If all critical points of f are non-degenerate, then f is called a non-degenerate function or Morse function.

Morse’s lemma was published by the American mathematician Marston Morse in 1925.

Computations in the neighbourhood of algebraic singularities are considered in [29].

2.9. Transverse Intersection of Manifolds

Let

M

and

P

be smooth submanifolds of an ambient manifold

S

. They are said to intersect transversally if for every

x \in M \cup P

, their tangent spaces at x satisfy

T_{M} (x) + T_{P} (x) = T_{S} (x) .

(Figure 20). That is, the directions tangent to

M

together with the directions tangent to

P

span all possible directions of the ambient manifold.

The term frequently used synonymously for transversal is general position, i.e., two manifolds which intersect transversally are said to be in general position.

Two linear subspaces

X

and

Y

of a linear space

L

are transverse, if their sum is equal to the whole space,

X + Y = L

. For instance, two planes intersecting at nonzero angles in

R^{3}

are transverse (Figure 21).

In three–dimensional space, transverse curves do not intersect. Indeed, if

C_{1}

and

C_{2}

are curves in

R^{3}

, then

dim C_{1} + dim C_{2} = 1 + 1 < 3,

so their tangent spaces cannot span the tangent space of the ambient manifold at a common point.

A curve transverse to a surface intersects the surface in isolated points. In this case

dim C + dim S = 1 + 2 = 3,

and transversality implies that at each intersection point the tangent line to the curve together with the tangent plane to the surface span

R^{3}

.

Similarly, two surfaces transverse to each other intersect in a curve. Indeed,

dim S_{1} + dim S_{2} = 2 + 2 > 3,

and their transverse intersection has dimension

2 + 2 - 3 = 1 .

Note that two perpendicular planes in

R^{3}

intersect transversally, whereas two perpendicular lines lying in one and the same plane do not (Figure 22, left). Curves that are tangent to a surface at a point (for example, curves lying entirely on a surface) do not intersect the surface transversally. The same is true for planes that are tangent to a surface at a point (Figure 22, right). If an intersection of two submanifolds is transverse, then it is itself a smooth submanifold whose codimension is equal to the sums of the codimensions of the two intersecting manifolds.

The following result shows that transverse intersections are generic among intersections of smooth manifolds.

Theorem 4.

(Thom’s Transversality Theorem). ([4], Ch. 2) Suppose we have a family of smooth maps

f_{s} : M \to P, f_{s} (x) = F (x, s),

where each map depends smoothly on a parameter s belonging to a parameter space

S

. Assume that

M

may have a boundary, while

P

and a given submanifold

Z \subset P

do not.

If the full mapping

F (x, s)

, as well as its restriction to the boundary of

M

, intersects

Z

in a transversal way, then for “almost all” choices of the parameter s the corresponding map

f_{s} (x)

also intersects

Z

transversally, both in the interior of

M

and on its boundary.

In other words, transversality is a generic property: although a particular map may fail to be transversal, a small perturbation – obtained by slightly changing the parameter s – will typically restore transversality.

The concept of transversality was developed by the French mathematician René Thom in the 1950s.

2.10. Lie Groups

Lie groups are one of the powerful tools of differential topology, applied in a variety of areas, such as the theory of differential equations, the study of special functions, and matrix analysis. In this section, some basic information about Lie groups and their properties is provided.

The theory of Lie groups and Lie algebras is presented in depth in [6], Ch. 7, [9], Ch. 4, [30], Ch. 3. A comprehensive overview of Lie group theory, matrix Lie groups, and matrix Lie algebras is given in [31]. The group–theoretic approach to matrices and vector spaces is developed in detail in [32]. Applications of Lie groups in the theory of differential equations are discussed in [33].

2.10.1. Basic Definitions

Definition 12.

A group is a set G together with a group operation, usually called multiplication, such that for any two elements g and h in G, their product

g \cdot h

is again an element of G. The group operation is required to satisfy the following properties:

(1): Associativity. If g, h, and k are elements of G, then

$g \cdot (h \cdot k) = (g \cdot h) \cdot k .$
(2): Existence of an identity element. There exists a distinguished element $e \in G$ , called the identity element, which satisfies

$e \cdot g = g \cdot e = g$

for all $g \in G$ .
(3): Existence of an inverse element. For every $g \in G$ , there exists an inverse element, denoted $g^{- 1}$ , which satisfies

$g \cdot g^{- 1} = e = g^{- 1} \cdot g .$

Below are some elementary examples of groups.

Example 6.

(a) Let $G = Z$ be the set of integers with the group operation being addition. Clearly, associativity holds, the identity element is 0, and the “inverse”of an integer x is $- x$ .
(b) Similarly, $G = R$ – the set of real numbers, is also a group under addition. Again, the identity element is 0, and the inverse of a real number x is $- x$ . In both cases, the group operation is commutative: $g \cdot h = h \cdot g$ for all $g, h \in G$ . Such groups are called Abelian.
(c) Let $G = G L (n, Q)$ be the set of invertible $n \times n$ matrices with rational entries. The group operation is matrix multiplication. The identity element is the identity matrix I, and the inverse of a matrix A is the usual inverse matrix, whose entries are again rational numbers.
(d) Similarly, the general linear group $G L (n, R)$ – the set of invertible $n \times n$ matrices with real entries – is a group under matrix multiplication with the same identity element and inverses as in the previous example.

Lie groups are smooth manifolds that are also groups, in which the multiplication and inversion operations are smooth maps. Besides providing many interesting examples of manifolds in their own right, Lie groups are a fundamental tool in the study of more general manifolds.

In examples (b) and (d) given above, we actually have Lie groups, since the sets

R

and

G L (n)

are smooth manifolds. In both cases, the group operation is smooth (in fact, analytic). This leads to the following general definition of a Lie group.

Definition 13.

(Lie Groups). A Lie group is a smooth manifold G that is also a group in the algebraic sense, with the property that the multiplication map

m : G \times G \to G

and the inverse map

i : G \to G

, defined by

m (g, h) = g \cdot h, i (g) = g^{- 1},

are both smooth maps of manifolds.

In fact, one can equivalently state that if G is a smooth manifold with a group structure such that the map

G \times G \to G, (g, h) \mapsto g h^{- 1}

is smooth, then G is a Lie group.

Lie groups exist at the boundary between algebra and topology. The algebraic properties of Lie groups follow from the group axioms, while their geometric properties arise from the parametrization of group elements by points of a differentiable manifold. At the topological level, a Lie group is homogeneous, meaning that every point of the manifold parametrizing the group looks the same as any other point.

The dimension of a Lie group is the dimension of the manifold that parametrizes the group operations. If G is an r–dimensional smooth manifold, the corresponding Lie group is also called an r–parameter Lie group. In particular, every Lie group is a topological group, that is, a topological space equipped with a group structure such that the multiplication and inversion maps are continuous.

If G is a Lie group, any element

g \in G

defines the maps

L_{g}, R_{g} : G \to G

, called respectively the left translation and right translation, given by

L_{g} (h) = g h, R_{g} (h) = h g .

The maps

L_{g}

and

R_{g}

are smooth and, in fact, are diffeomorphisms of of G.

Example 7.

(Lie Groups). Each of the following manifolds is a Lie group with the specified group operation.

(a) The general linear group $G L (n, R)$ is the set of all invertible $n \times n$ matrices with real entries. It is a group with group operation given by matrix multiplication, as already noted in Example 6(d), and it is an open subset of the vector space $M (n, R)$ . As will be shown below, multiplication is smooth since the entries of the matrix product $A B$ are polynomials in the entries of A and B.
(b) Let $G L^{+} (n, R)$ denote the subset of $G L (n, R)$ consisting of matrices with positive determinant. Since $d e t (A B) = d e t (A) d e t (B)$ and $det (A^{- 1}) = 1 / det (A)$ , it is a subgroup of $G L (n, R)$ . Because this subset is the preimage of $(0, \infty)$ under the continuous determinant function, it is an open subset of $G L (n, R)$ and hence an $n^{2}$ –dimensional manifold. The group operations are the restrictions of those on $G L (n, R)$ and are therefore smooth. Thus $G L^{+} (n, R)$ is a Lie group.
(c) The complex general linear group $G L (n, C)$ is the group of invertible complex $n \times n$ matrices under matrix multiplication. It is an open submanifold of $M (n, C)$ and hence a $2 n^{2}$ –dimensional smooth manifold. It is a Lie group since matrix multiplication and inversion are smooth functions of the real and imaginary parts of the matrix entries.
(d) If V is an arbitrary real or complex vector space, $G L (V)$ denotes the set of invertible linear transformations from V to itself. It is a group under composition of functions. If V has finite dimension n, any choice of basis of V determines an isomorphism of $G L (V)$ with $G L (n, R)$ or $G L (n, C)$ , so that $G L (V)$ is a Lie group. The transition map between two such isomorphisms is given by a map of the form $A \mapsto B A B^{- 1}$ , where B is the change-of-basis matrix, which is smooth. Consequently, the smooth manifold structure on $G L (V)$ is independent of the choice of basis.
(e) The field of real numbers $R$ is a Lie group under addition, with the inverse given by $x \mapsto - x$ .
(f) Let $G = R^{r}$ with its natural manifold structure, and let the group operation be vector addition $(x, y) \mapsto x + y$ . The inverse of a vector x is $- x$ . These operations are smooth, so $R^{r}$ provides an example of an r–parameter Abelian Lie group. Similarly, $C$ and $C^{n}$ are Lie groups under addition.
(g) Let $G = S O (2)$ be the group of planar rotations. That is,

$G = \{[\begin{matrix} cos θ & - sin θ \\ sin θ & cos θ \end{matrix}] : 0 \leq θ < 2 π\},$

where θ denotes the angle of rotation of a vector under multiplication by the rotation matrix. Note that G can be identified with the unit circle

$S^{1} = \{(cos θ, sin θ) : 0 \leq θ < 2 π\}$

in $R^{2}$ , which allows one to endow $S O (2)$ with a manifold structure.

2.10.2. Matrix Groups

In this section we consider several matrix Lie groups that play an important role in matrix analysis and matrix computations. For brevity, in what follows we shall denote the general linear group

G L (n, R)

simply by

G L (n)

.

General linear group

As noted above, the general linear group

$G L (n) = \{A \in M (n) : det (A) \neq 0\}$

of all invertible $n \times n$ matrices with real entries is a smooth manifold of dimension $n^{2}$ , since it is an open subset of the space of all $n \times n$ matrices $M (n)$ . Indeed,

$G L (n) = R^{n^{2}} \subset \{det (A) = 0\},$

where the space $R^{n^{2}}$ is identified with the space $M (n)$ of all real $n \times n$ matrices. Since the determinant function $det : M (n) \to R$ is continuous —- because $det (A)$ is a polynomial in the matrix entries -— the set $R^{n^{2}} \subset \{det (A) = 0\}$ is open. Hence it is a domain and therefore a smooth manifold of dimension $n^{2}$ .

To prove that $G L (n)$ is a Lie group, we must verify that matrix multiplication and matrix inversion are smooth operations. Given two matrices A and B in $G L (n)$ , the element in position $(i, j)$ of the product $A B$ is given by

${(A B)}_{i j} = \sum_{k = 1}^{n} a_{i k} b_{k j} .$

Thus, ${(A B)}_{i j}$ is a polynomial of degree two in the entries of A and B. Consequently, the matrix multiplication map

$m : G L (n) \times G L (n) \to G L (n)$

is a smooth mapping.

Recall that the $(i j)$ –minor of a matrix A is the determinant of the submatrix $A_{i j}$ , obtained by deleting the i–th row and the j–th column of A. According to Cramer’s rule the $(i j)$ –th entry of $A^{- 1}$ is given by

${(A^{- 1})}_{i j} = \frac{1}{det (A)} . {(- 1)}^{i + j} (the (j, i) - minor of A),$

which is a smooth function of the entries $a_{i j}$ , provided that $det (A) \neq 0$ . That is, the mapping

$M (n) \to R, A \mapsto {(A^{- 1})}_{i j}$

is smooth, since it depends smoothly on the entries of A. Therefore, the matrix inversion map

$i : G L (n) \to G L (n)$

is also smooth.

The complex general linear group $G L (n, C)$ is likewise a Lie group with respect to matrix multiplication and inversion. The set $G L (n, C)$ is an open subset of $M (n, C)$ and hence is a smooth manifold of dimension $2 n^{2}$ . It is a Lie group because matrix multiplication and inversion are smooth functions of the real and imaginary parts of the matrix entries.
Special Linear Group

This group is defined as

$S L (n) = \{A \in M (n) : det (A) = 1\} .$

Geometrically, $S L (n)$ consists of all transformations of $R^{n}$ that preserve both volume and orientation. It can be shown that $S L (n) = {det}^{- 1} (1)$ is a smooth manifold of dimension $n^{2} - 1$ , defined by the condition $det (A) - 1 = 0$ . Since this manifold is a subset of the Lie group $G L (n)$ with the operation inherited from $G L (n)$ , $S L (n)$ is also a Lie group. Moreover, the tangent space of $S L (n)$ at the identity is the subspace of $M (n, R)$ consisting of all matrices with zero trace.
Group of Orthogonal Matrices

The group of orthogonal $n \times n$ matrices is defined as

$O (n) = \{A \in G L (n) : A^{T} A = I_{n}\} .$

Thus, $O (n)$ is a subset of $R^{n^{2}}$ , defined by $n^{2}$ equations

$A^{T} A - I_{n} = 0$

in terms of the entries $a_{i j}$ of the matrix A. It can be shown that exactly $n (n + 1) / 2$ of these equations, corresponding to the entries on and above the diagonal, are independent and satisfy the maximal rank condition for $O (n)$ everywhere. Therefore, $O (n)$ is a submanifold of $G L (n)$ of dimension $n - n (n + 1) / 2 = n (n - 1) / 2$ . Moreover, matrix multiplication and the matrix inversion operation remain smooth when restricted to $O (n)$ . Consequently, $O (n)$ itself is a Lie group.
Special Orthogonal Group

The equation

$A^{T} A = I_{n},$

used in the definition of the orthogonal group $O (n)$ , in particular implies that every matrix $A \in O (n)$ is invertible with $A^{- 1} = A^{T}$ . Consequently, the determinant of $A \in O (n)$ must satisfy ${(det (A))}^{2} = 1$ , i.e., $det (A) = \pm 1$ . In this way, $O (n)$ is divided into two disconnected components: the subset of matrices with determinant $+ 1$ and the subset of matrices with determinant $- 1$ .

If A and B both have determinant -1, then their product $A B$ has determinant $+ 1$ . Therefore, the subset of orthogonal matrices with determinant $- 1$ is not closed under multiplication and is not a subgroup of $O (n)$ . The other component, however, is a Lie subgroup of $O (n)$ and is called the special orthogonal group, denoted by $S O (n)$ ,

$S O (n) = \{A \in O (n) : det (A) = 1\} \subset O (n) .$

The subgroup $S O (n)$ is a Lie group.
Unitary and Special Unitary Groups

The unitray group $U (n)$ is defined as

$U (n) : = \{A \in G L (n, C) : A^{H} A = I_{n}\},$

where $A^{H} = {\bar{A}}^{T}$ denotes the Hermitian (complex conjugate transpose) of A. A similar argument as for $O (n)$ shows that $U (n)$ is a submanifold o $G L (n, C)$ and that $\dim U (n) = n^{2}$ .

The special unitary group $S U (n)$ is defined as the subgroup of $U (n)$ consisting of matrices with determinant equal to 1.
Group of Upper Unit Triangular Matrices

The group $T (n)$ of upper triangular matrices with ones on the main diagonal is an $n (n - 1) / 2$ –parameter Lie group. As a manifold, $T (n)$ can be identified with the Euclidean space $R^{n (n - 1) / 2}$ , since each matrix is uniquely determined by its entries above the diagonal. For example, in the case of $T (3)$ we identify the matrix

$[\begin{matrix} 1 & x & z \\ 0 & 1 & y \\ 0 & 0 & 1 \end{matrix}] \in T (3)$

with the vector $(x, y, z)$ in $R^{3}$ . However, except for the special case $T (2)$ , the group $T (n)$ is not isomorphic to the Abelian Lie group $R^{n (n - 1) / 2}$ .

Several other important matrix groups are defined by imposing linear or quadratic constraints on the

n^{2}

entries of

G L (n, F)

or

S L (n, F)

.

3. Geometry of Matrix Space

3.1. The Matrix Space

Let us consider the linear operator

A : F^{n} \to F^{m},

represented by a rectangular matrix

A \in F^{m \times n}

. The finite–dimensional space of all linear operators from

F^{n}

to

F^{m}

is isometric to the vector space

F^{m \cdot n}

, that is, the two spaces may be regarded as the same metric space. This makes it possible to study the space of linear operators using methods from the analysis of metric and topological spaces.

Let the set

M (m, n)

of all

m \times n

matrices with entries in

F

be endowed with the topology induced by the natural mapping

i : F^{m \cdot n} \to M (m, n)

:

(x_{1}, x_{2}, \dots, x_{m n}) \mapsto [\begin{matrix} x_{1} & x_{2} & \dots & x_{n} \\ x_{n + 1} & x_{n + 2} & \dots & x_{2 n} \\ ⋮ & ⋮ & ⋱ & ⋮ \\ x_{(m - 1) n + 1} & x_{(m - 1) n + 2} & \dots & x_{m n} \end{matrix}] .

The mapping i is a homeomorphism that induces on

M (m, n)

the structure of a smooth

(C^{\infty})

manifold of dimension

m n

. In other words, the set

M (m, n)

of all rectangular matrices

A \in F^{m \times n}

forms a smooth manifold of dimension

m \cdot n

embedded in the vector space

F^{m \cdot n}

.

According to the above considerations, a complex rectangular matrix

A \in C^{m \times n}

can be represented as a point in the complex linear space

C^{m \times n}

of dimension

m \cdot n

. Similarly, a real

m \times n

matrix A can be viewed as a point in the real space

R^{N}

, with

N = m \cdot n

, by arranging its entries in a prescribed order and interpreting them as coordinates. The corresponding space is called the matrix space and is isometric to a vector space of the same dimension. In the same way, a complex polynomial of degree n can be identified with a point in

C^{N}

, where

N = n + 1

by using its coefficients as coordinates.

The distance between two points in the matrix space is defined by

dist (A, B) = ∥ A - B ∥ .

Depending on the norm used, we shall implement the 2–norm distance

{dist}_{2} (A, B) = {∥ A - B ∥}_{2}

or the Frobenius norm distance

{dist}_{F} (A, B) = {∥ A - B ∥}_{F}

.

3.2. Generic and Well–Posed Problems

Consider the following idealized geometric interpretation of a computational problem

y = f (x)

.

Let

X

denote the space of the data x, let

Y

denote the space of solutions y and let

f : X \to Y

be (a generally nonlinear) operator mapping

x \in X

to

y \in Y

. In general, the spaces

X

and

Y

may be arbitrary topological spaces, but for the purposes of our analysis we restrict attention to metric spaces and, in particular, to normed vector spaces.

The components of x are called parameters and the space

X

is referred to as the parameter space. The dimension of the parameter space denoted by

\dim (X)

is equal to the number N of parameters. Each computational problem is therefore identified with a point in

X

.

If the vector x represents a

m \times n

matrix with independent entries, than

X

is the full matrix space with dimension

m \cdot n

. However, if the number of parameters N is smaller than

m \cdot n

, the matrix entries are no longer independent. Consequently, the parameter space in this case will have smaller dimension than the corresponding full matrix space.

If the mapping

f : x \to y

is linear, the problem is called linear; otherwise it is called nonlinear. If the inverse operator

f^{- 1}

exists (at least locally) for the given data x, the problem is said to be regular. In the opposite case, the problem is called singular.

Suppose

P (x)

is a property that may be asserted about a problem x. This property represents a function

P : X \to {0, 1}

, where

P (x) = 1

(resp. 0) indicates that

P

holds (resp. fails) at x. In applications where x represents a data of a physical problem subject to errors and uncertainties, it is important to understand the topological features of

P

.

For instance, if

P

holds at a nominal parameter value

x_{0}

, it is useful to know whether

P

also holds at points

x = x_{0} + δ x

in a neighborhood of

x_{0}

, corresponding to small deviations

δ x

of the parameters from their nominal values. Usually, the property

P

of interest holds for all sets of parameter values except those corresponding to points x lying on some surface in the parameter space, which are thus atypical.

From the point of view of algebraic geometry [34], such a surface represents a variety

S

whose dimension satisfies

0 < \dim (S) < N

. We call

S

the variety of singular cases. Typically,

S

is a closed subset of

X

, while the set of problems for which

P

holds contains an open and dense subset of

X

.

We say that a property

P

is generic relative to $S$ if

P (x) = 0

only for points

x \in S

. A generic property of a parameter space is a property that holds at “almost all” points of that space. Intuitively, a generic property is one that “almost all”problems satisfy. Equivalently, if the parameters are chosen at random, then

P = 1

with probability 1.

It should be emphasized that, depending on the problem, the variety of singular cases may have a complex structure. In the special case when the variety has no singular points, it is referred to as manifold.

Since regular problems are generic, they have been studied more thoroughly than the corresponding singular problems. Therefore, the analysis of generic cases is always a primary focus when investigating phenomena and processes described by a given mathematical model. Nevertheless, there are situations in which it becomes necessary to examine non–generic cases. In this context, it is appropriate to quote the words of the great German mathematician Leopold Kronecker, spoken in 1874 [35]: “It is customary–especially in algebraic problems–to encounter truly new difficulties when one moves away from the cases usually regarded as general. As soon as one penetrates beneath the surface of the so-called generality, which excludes all particularities in the true generality encompassing all singularities, one typically first meets the real challenges of investigation, but at the same time also the richness of new perspectives and phenomena contained in its depths.”

A property

P

is said to be well posed at x if

P

holds through some neighborhood of x in

X

. A problem parametrized by data in

X

is said to be well posed at a point

x_{0}

, if it is solvable for all data points

x_{0} + δ

in some neighborhood of

x_{0}

. According to the French mathematician Jacques Hadamard [36], a mathematical problem is well posed, if it satisfies the following three conditions:

a solution exists,
the solution is unique,
the solution depends continuously on the data of the problem.

If a problem fails to satisfy at least one of these conditions – namely, if a solution does not exist for some set

S

of parameters

x \in S

, if the solution is not unique, or if the solution does not depend continuously on the parameters – then the problem is called an ill–posed problem.

The ill–posed problems typically form a variety

S

within the space of all problems. This variety is referred to as the set of ill–posed problems.

If the property

P

is generic relative to

S

, then

P

is well posed at every point in the complement

S^{c}

(where

S \cup S^{c} = X

). Since the dimension of

S

is less than the dimension of

X

, it follows that almost all problems in

X

are well posed.

It should be noted, however, that many meaningful and important problems in physics and computational mathematics are not well–posed, that is, they are ill–posed.

Example 8.

Consider the problem of finding the inverse of a square matrix A of order n. In this case, the property

P

is the nonsingularity of the matrix A and the set

S

consists of all singular matrices. The set of singular matrices is an algebraic variety, since it can be defined as the zero set of the polynomial

\det (A)

. Thus, the equation

\det (A) = 0

describes the variety

S

.

For example, if

n = 2, N = 3

and

A = [\begin{matrix} a & 1 \\ c & b \end{matrix}],

then the variety

S

is determined by the equality

a b = c

and represents a hyperbolic paraboloid – a two dimensional manifold in the three–dimensional parameter space (see Figure 23). All points outside

S

correspond to nonsingular matrices A, that is matrices for which

P (A) = 1

.

Example 9.

Consider the eigenvalue problem for the same matrix, as in the previous example. In this case, the property

P

is possessed by matrices with distinct eigenvalues, and the set

S

consists of all matrices with a double eigenvalue. Since the eigenvalues of A are given by

λ_{1, 2} = \frac{a + b \pm \sqrt{{(a + b)}^{2} - 4 (a b + c)}}{2},

the set

S

is obtained by setting the discriminant equal to zero:

{(a + b)}^{2} - 4 (a b + c) = 0 .

The corresponding surface in the three–dimensional parameter space is shown in Figure 24.

3.3. Conditioning of Computational Problems

To obtain an accurate solution of a computational problem, it is not sufficient for the problem to be well posed. If the problem lies close, in some sense, to the set

S

of ill–posed problems, then it may be highly sensitive to variations in the data, and the discrepancy between the computed and the exact solutions can be very large.

The sensitivity of a given mathematical problem describes how its solution varies in the vicinity of a nominal solution. This important property of mathematical problems is the subject of perturbation theory, which is widely used in science and engineering.

A simplified quantitative characterization of sensitivity can be obtained by using the notion of conditioning.

Conditioning is a property of a computational problem that characterizes the sensitivity of its solution to small changes (or perturbations) in the data. A problem is said to be well conditioned, if small perturbations in the data lead to small changes in the solution. Conversely, if small changes in the data may result large changes in the solution, the problem is called ill conditioned.

It should be emphasized that the conditioning is an intrinsic property of the problem itself. It does not depend on the numerical precision used to solve the problem, nor does it depend on the particular numerical method implemented.

The theory of conditioning is developed in [37] and [38]. Various kinds of condition numbers are discussed in [39] and [40], while the computing of condition estimates is addressed in [42], Ch. 15, and [41,43,44].

3.4. Condition Numbers

Consider the computational problem

y = f (x)

, where f maps the input data x to the output y. The problem is said to be well conditioned if

f (x)

is relatively insensitive to small perturbations

δ x

in x, that is,

f (x)

remains close to

f (x_{0})

when

x = x_{0} + δ x

is near

x_{0}

. The precise meaning of “insensitive”, “small”and “close”depends on the specific context of the problem. Conversely, if small changes in x produce large variations in f, the problem is ill conditioned.

The conditioning of a computational problem can be formally characterized as illustrated in Figure 25 where

∥ . ∥

denotes the vector norms in

X

and

Y

, respectively.

Definition 14.

(Absolute condition number). The absolute $δ x$ –condition number of the mapping

f : X \to Y

at the point

x_{0} \in X

is defined as

κ_{δ x}^{a b s} (f, x_{0}) = \inf {ρ : f [S_{x} (x_{0}, ∥ δ x ∥)] \subset S_{y} (f (x_{0}), ρ ∥ δ x ∥)} .

(3)

According to this definition, the condition number

κ_{δ x}^{a b s}

is the smallest number

ρ

such that the image

f [S_{x} (x_{0}, ∥ δ x ∥)]

of the sphere

S_{x} (x_{0}, ∥ δ x ∥)

lies entirely within the sphere

S_{y} (f (x_{0}, ρ δ x)

of a radius

ρ ∥ δ x ∥

.

Defined in this way, the condition number can be interpreted as a Lipschitz constant for the mapping f, i.e., it is the smallest number M for which

∥ f (x_{0} + δ x) - f (x_{0}) ∥ \leq M ∥ x - x_{0} ∥

holds for all points on the boundary of

S (x_{0}, ∥ δ x ∥)

.

As illustrated in Figure 25, the condition number

κ

provides a quantitative measure of how perturbations in the data are transmitted to the solution space

Y

. If

κ

is relatively small, the problem is well conditioned; if

κ

is large, the problem is ill conditioned.

Definition 15.

(Relative condition number). The relative $δ x$ -condition number of the mapping

f : X \to Y

at the point

x_{0} \in X

is defined as

κ_{δ x}^{r e l} (f, x_{0}) = \inf {ρ : f [S_{x}^{r} (x_{0}, ∥ δ x ∥)] \subset S_{y}^{r} (f (x_{0}), ρ ∥ δ x) ∥} .

(4)

In many practical applications, it is common to consider the condition number if the perturbation in the limit of infinitesimally small perturbations

∥ δ x ∥

. Although this represents a theoretical approximation, it is widely used in perturbation analysis because it is easier to compute.

Definition 16.

(Asymptotic condition number). The asymptotic absolute and asymptotic relative condition number of

f : X \to Y

are defined, respectively, as

\begin{matrix} (i) κ^{a b s} (f, x_{0}) & = & \lim_{∥ δ x ∥ \to 0} κ_{δ x}^{a b s} (f, x_{0}), \end{matrix}

(5)

\begin{matrix} (i i) κ^{r e l} (f, x_{0}) & = & \lim_{∥ δ x ∥ \to 0} κ_{δ x}^{r e l} (f, x_{0}) . \end{matrix}

(6)

If the Jacobian matrix

D_{x_{0}} f

of the mapping f exists, then the asymptotic absolute and relative condition numbers are given by

\begin{matrix} (i) κ^{a b s} (f, x_{0}) & = & ∥ D_{x_{0}} f ∥, \end{matrix}

(7)

\begin{matrix} (i i) κ^{r e l} (f, x_{0}) & = & \frac{∥ D_{x_{0}} f ∥ ∥ x_{0} ∥}{∥ f (x_{0}) ∥}, \end{matrix}

(8)

where

∥ D_{x_{0}} f ∥

denotes the operator (subordinate norm) of the derivative

D_{x_{0}} f

in

Y

.

The asymptotic condition number is usually referred to as the condition number of the problem. Note that this condition number does not exist for all mappings f.

The asymptotic absolute and relative condition numbers can be expressed as

κ^{a b s} (f, x_{0}) : = lim_{∥ δ x ∥ \to 0} sup \frac{∥ f (x_{0} + δ x) - f (x_{0}) ∥}{∥ δ x ∥} = ∥ D_{x_{0}} f ∥,

(9)

\begin{matrix} κ^{r e l} (f, x_{0}) & : = & lim_{∥ δ x ∥ \to 0} sup \frac{∥ f (x_{0} + δ x) - f (x_{0}) ∥ / ∥ f (x_{0}) ∥}{∥ δ x ∥ / ∥ x_{0} ∥} \\ = & \frac{∥ D_{x_{0}} f ∥ ∥ x_{0} ∥}{∥ f (x_{0}) ∥} \end{matrix}

(10)

which coincides with the expressions in (7), (8).

From this definition, it follows that

\begin{matrix} \frac{∥ f (x_{0} + δ x) - f (x_{0}) ∥}{∥ f (x_{0}) ∥} & \leq & κ^{r e l} (f, x_{0}) \frac{∥ δ x ∥}{∥ x_{0} ∥} + o (δ x), \end{matrix}

(11)

where

h = o (| δ x |)

means

| h | / | δ x | \to 0

as

| δ x | \to 0

.

Equation (11) shows that the relative change in the result can be approximated by the product of the relative condition number

κ^{r e l} (f, x_{0})

and the relative perturbation

∥ δ x ∥ / ∥ x_{0} ∥

in the data. The ratio

∥ δ x ∥ / ∥ x_{0} ∥

is called backward error since the change in the solution is represented as an equivalent perturbation in the input data equivalent to the observed change in the solution. Computational methods for which the backward error is small are called numerically stable.

Consequently, the quantity

κ^{r e l} (f, x_{0}) \frac{∥ δ x ∥}{∥ x_{0} ∥}

provides an estimate of the relative forward error in the solution, given the relative backward error. If the condition number is large, even small perturbations in the input may produce a large forward error, highlighting the difficulty of the computational problem.

It should be noted that even well–conditioned problems may have specific perturbations for which the sensitivity estimate overstates the true change in the solution. When the sensitivity estimate consistently produces overly pessimistic bounds, it is called a conservative estimate.

3.5. Conditioning of Basic Matrix Problems

In this section we present two case studies that illustrate the concept of conditioning in matrix computations.

3.5.1. Conditioning of a Linear System of Equations

Consider the linear system

A y = x,

where

A \in C^{n \times n}

is nonsingular and

y = A^{- 1} x, x \neq 0

. Solving the system corresponds to applying the linear operator

x \to A^{- 1} x

.

The entries of A serve as parameters, and the set of singular matrices forms a variety

S

, representing ill–posed problems. Small perturbations

δ A

may therefore lead to large changes in the solution.

Assuming that A is perturbed to

A + δ A

, the perturbed solution satisfies

y + δ y = {(A + δ A)}^{- 1} x,

(12)

The matrix

A + δ A

remains nonsingular provided

∥ A^{- 1} δ A ∥ < 1 .

(13)

Under this condition, standard perturbation theory [45], Ch. 3, [46], Sect. 2.6, [47], Sect. 1.2) yields the relative error bound

\frac{∥ δ y ∥}{∥ y ∥} \leq \frac{∥ A ∥ ∥ A^{- 1} ∥}{1 - ∥ A^{- 1} δ A ∥} \frac{∥ δ A ∥}{∥ A ∥} .

(14)

According to Definition 15, the product

κ^{r e l} = ∥ A ∥ ∥ A^{- 1} ∥

represents the relative condition number of the linear system. It satisfies

κ^{r e l} \geq 1

and is invariant under scalar multiplication of A. For sufficiently small perturbations, the bound simplifies asymptotically to

\frac{∥ δ y ∥}{∥ y ∥} \leq κ^{r e l} \frac{∥ δ A ∥}{∥ A ∥} .

(15)

If

κ^{r e l}

is small, the system is well conditioned; if large, it is ill conditioned. When

κ^{r e l} \approx 1 / u

, where

u

is the unit roundoff, the matrix is considered singular to working precision, and accurate solutions cannot be expected numerically.

Following standard notation, the condition number is denoted by

cond (A) = ∥ A ∥ ∥ A^{- 1} ∥ .

The value of the condition number depends on the chosen norm. In particular, using the singular value decomposition

A = U Σ V^{T}

,

{cond}_{2} (A) = \frac{σ_{max}}{σ_{\min}},

where

σ_{max}

and

σ_{\min}

are the largest and smallest singular values of A. The 2–norm or the Frobenius–norm condition numbers are invariant under unitary (orthogonal) transformations.

Example 10.

Consider the linear system

A y = x

, where

A = [\begin{matrix} - 6 & 2 & 5 & - 3 & p & 1 \\ 3 & - 1 & - 4 & 3 & - 8 & - 2 \\ q & 7 & - 2 & - 5 & 4 & - 3 \\ - 2 & - 8 & 9 & 4 & - 5 & 7 \\ 9 & 0 & 6 & - 1 & - 7 & 3 \\ 5 & 8 & - 3 & - 6 & 2 & 4 \end{matrix}], x = [\begin{matrix} - 2 \\ 7 \\ - 5 \\ - 3 \\ 9 \\ 1 \end{matrix}]

and

p \in [- 10, 10], q \in [- 10, 10]

are variable parameters. Assume that the matrix A is subject to the perturbation

δ A = 10^{- 6} [\begin{matrix} 2 & 5 & 8 & 9 & 3 & 4 \\ 6 & - 4 & - 7 & - 2 & - 5 & 1 \\ - 3 & 1 & 0 & 4 & 6 & - 2 \\ - 4 & 1 & 2 & - 9 & - 8 & 5 \\ 8 & - 5 & 3 & 2 & 0 & 1 \\ - 7 & 9 & - 2 & - 3 & 5 & 0 \end{matrix}] .

In Figure 26, we display the relative perturbation

\frac{{∥ δ y ∥}_{2}}{{∥ y ∥}_{2}}

in the solution caused by the perturbation

δ A

, as a function of the parameters p and q. This is shown together with the corresponding relative perturbation estimate

\frac{∥ δ y^{e s t} ∥_{2}}{{∥ y ∥}_{2}} = \frac{{cond}_{2} (A) \frac{{∥ δ A ∥}_{2}}{{∥ A ∥}_{2}}}{1 - {cond}_{2} (A) \frac{{∥ δ A ∥}_{2}}{{∥ A ∥}_{2}}} .

The peak values in the Figure represent ill conditioned matrices. The maximum condition number of the perturbed matrices is

{cond}_{2} (A) = 3.21 \times 10^{5}

which leads to maximum relative perturbations of size

0.1

.

3.5.2. Conditioning of the Eigenvalue Problem

We consider, in a simplified setting, the problem of sensitivity of the eigenvalues of a matrix. This problem is characterized by fundamentally different properties in the regular case, when the eigenvalues are distinct, and in the singular case, when the eigenvalues are multiple and possess nonlinear elementary divisors.

Let us first examine the asymptotic sensitivity of the eigenvalue problem in the regular case. In this case, the eigenvalue problem is well posed, since the variations in the eigenvalues depend linearly on the perturbations in the entries of the matrix.

Let

A \in C^{n \times n}

be a given square matrix. If A has a simple eigenvalue

λ

, with corresponding right eigenvector

x \in C^{n}

and left eigenvector

y \in C^{n}

, then

A x = λ x and y^{H} A = λ y^{H},

where

y^{H}

denotes the conjugate transpose of y. Since

λ

a simple eigenvalue, it follows that

y^{H} x = 〈 x, y 〉 \neq 0 .

For every sufficiently small perturbation

δ A \in C^{n \times n}

, there exists a unique eigenvalue

λ + δ λ

of

A + δ A

, that is close to

λ

. Therefore, we have

(A + δ A) (x + δ x) = (λ + δ λ) (x + δ x),

which implies that, up to terms of second order,

δ A x + A δ x \approx δ λ x + λ δ x .

Multiplying from the left by

y^{H}

, we obtain

δ λ = \frac{1}{〈 x, y 〉} y^{H} δ A x + o (∥ δ A ∥) .

Moreover,

max_{{∥ δ A ∥}_{F} \leq 1} | y^{H} δ A x | = ∥ x ∥ ∥ y ∥ .

Therefore, absolute condition number of the matrix A with respect to the regular eigenvalue problem can be defined as

κ_{λ}^{a b s} (A) : = \frac{∥ x ∥ ∥ y ∥}{〈 x, y 〉} .

We have that

κ_{λ}^{a b s} (A) = {∥ P ∥}_{F},

(16)

where

P = \frac{x y^{H}}{y^{H} x}

is the spectral projector onto the invariant subspace generated by x. If the right and left eigenvector are normalized so that

y^{H} x = 1

, then

κ_{λ}^{a b s} (A) : = ∥ x ∥ ∥ y ∥ .

The condition number is homogeneous, since multiplying A by a scalar does not change

κ_{λ}^{a b s} (A)

. It is also invariant under unitary (or orthogonal) transformations.

If

κ_{λ}^{a b s} (A)

is large, then

λ

is poorly conditioned. Poorly conditioned eigenvalues are computed with large errors as a consequence of their high sensitivity to perturbations in the matrix. As the separation between eigenvalues decreases, their sensitivity increases, and in the limiting case of defective eigenvalues, their condition numbers become infinitely large. In such cases, the linear estimate

| δ λ | \leq κ_{λ}^{a b s} (A) ∥ δ A ∥,

(17)

is no longer valid, and the eigenvalue problem becomes ill–posed. In this case, it is justified to set

κ_{λ}^{a b s} (A) : = \infty

, where

λ

is a multiple eigenvalue of A.

We note that the eigenvalue sensitivity estimates provided by the linear algebra package LAPACK [48] and the software system MATLAB^® [3] yield meaningful results only for well–posed problems, specifically when the matrices have simple eigenvalues, i.e., eigenvalues to which correspond linearly independent eigenvectors.

Example 11.

Consider the upper triangular matrix

A = [\begin{matrix} λ_{1} & 1 \\ 0 & λ_{1} + ε \end{matrix}], ε \neq 0 .

The matrix A has two simple eigenvalues

λ_{1}

and

λ_{2} = λ_{1} + ε

. The corresponding spectral projectors are

P_{λ_{1}} = [\begin{matrix} 1 & - 1 / ε \\ 0 & 0 \end{matrix}] a n d P_{λ_{2}} = [\begin{matrix} 0 & 1 / ε \\ 0 & 1 \end{matrix}] .

Both eigenvalues have the same absolute condition number given by

κ_{λ}^{a b s} (A) = \frac{\sqrt{1 + {| ε |}^{2}}}{| ε |} .

As

ε \to 0

, the condition number tends to infinity, since the two eigenvalues of A coalesce (

λ_{1} = λ_{2} = λ

) and the eigenvalue problem becomes ill-posed.

Since the eigenvalue condition number becomes infinite when

λ

is a multiple eigenvalue, the variety of singular cases for the eigenvalue problem consists of all matrices with multiple eigenvalues. This variety forms a hypersurface in the space of

n \times n

matrices, i.e., it has dimension

N - 1

, where

N = n^{2}

.

For matrices with multiple eigenvalues, small perturbations of the entries can cause large changes in the eigenvalues. Such matrices are characterized by eigenvalues that appear in nonlinear elementary divisors.

Following [49,50], we now examine the sensitivity of the eigenvalue problem in the singular case.

Let

J = diag (Γ_{1}^{1}, \dots, Γ_{1}^{r 1}, \dots, Γ_{q}^{1}, \dots, Γ_{q}^{r_{q}}),

where, for

j = 1, \dots, q

,

Γ_{j}^{1} = \dots = Γ_{j}^{r_{j}} = [\begin{matrix} λ & 1 \\ . . \\ . . \\ . 1 \\ λ \end{matrix}]

is a Jordan block of dimension

n_{j}

, repeated

r_{j}

times, and

n_{1} \geq n_{2} \geq \dots \geq n_{q} .

The eigenvalue

λ

is semisimple (participating in a scalar Jordan block, i.e., nondefective) if

q = n_{1} = 1

, and nonderogatory (participating in only one Jordan block) if

q = r_{1} = 1

. It follows that the algebraic and geometric multiplicities of

λ

, are, respectively,

m = \sum_{j = 1}^{q} r_{j} n_{j} and g = \sum_{j = 1}^{q} r_{q} .

Denote by

x_{1}^{k}

the right eigenvector and by

y_{1}^{k}

the left eigenvector of A associated with

Γ_{1}^{k}, k = 1, \dots, r_{1}

. With these eigenvectors we construct the matrices

X_{1} = [x_{1}^{1}, \dots, x_{1}^{r_{1}}], Y_{1} = [y_{1}^{1}, \dots, y_{1}^{r_{1}}],

corresponding to the Jordan blocks of maximum size

n_{1}

. Note that the

r_{1}

columns of

X_{1}

and

Y_{1}

are linearly independent right and left eigenvectors, respectively, each associated with a separate Jordan chain of maximal length

n_{1}

.

Assume that the matrix A is perturbed to

A + ε B

, where

ε

is small. It is shown in [49,50] that the eigenvalues

\tilde{λ}

of

A + ε B

converging to

λ

as

ε \to 0

satisfy

| \tilde{λ} {- λ | \leq c (α ε)}^{1 / n_{1}},

(18)

for all sufficiently small positive

ε

, where

α = ∥ X_{1} Y_{1}^{H} ∥_{2}

and

c > 1

.

The bound (18) shows that the sensitivity of a multiple eigenvalue depends on

ε^{1 / n_{1}}

, where

n_{1}

is the size of the largest Jordan block associated with

λ

. In the case where

λ

is simple, we have

n_{1} = r_{1} = 1

, and

α = ∥ x ∥ ∥ y ∥

, where the right eigenvector x and the left eigenvector y are normalized so that

y^{H} x = 1

. In this case, the bound (18) coincides with the bound (17) valid for a simple eigenvalue. It is important to note that the bound (18) remains finite, while the linear bound (17) becomes infinite in this case.

If

λ

is nonderogatory (i.e., there is only one Jordan block corresponding to

λ

), then

r_{1} = 1

, and again in (18) we obtain

α = ∥ x ∥ ∥ y ∥

, where x and y are the corresponding right and left eigenvectors associated with

Γ_{1}^{1}

.

The following example illustrates how the dimension of the Jordan block associated with a multiple eigenvalue affects its sensitivity.

Example 12.

Consider a Jordan block

J_{n} (λ)

of nth order with eigenvalue λ. If the zero entry in position

(n, 1)

is replaced by a small number

ε > 0

, the characteristic equation of the perturbed block becomes

{(λ - \tilde{λ})}^{n} - {(- 1)}^{n} ε = 0

and the multiple eigenvalue λ is split into n distinct eigenvalues

{\tilde{λ}}_{k} = λ + ε^{1 / n} (cos (2 k π / n) + j_{o} sin (2 k π / n)), k = 0, 1, \dots, n - 1 .

In the given case

n_{1} = 6, r_{1} = 1

and

J_{n} (λ)

has one right eigenvector

x = {[1 0 0 0 0 0]}^{T}

and one left eigenvector

y = {[0 0 0 0 1]}^{T}

so that

α = ∥ x ∥ ∥ y ∥ = 1

. The perturbed eigenvalues satisfy

| {\tilde{λ}}_{k} - λ | = ε^{1 / n},

as predicted by (18).

For instance, consider

J_{6} (- 1) = [\begin{matrix} - 1 & 1 & 0 & 0 & 0 & 0 \\ 0 & - 1 & 1 & 0 & 0 & 0 \\ 0 & 0 & - 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & - 1 & 1 & 0 \\ 0 & 0 & 0 & 0 & - 1 & 1 \\ 0 & 0 & 0 & 0 & 0 & - 1 \end{matrix}],

with

n = 6

and

λ = - 1

. For

ε = 10^{- 6}

, the perturbed eigenvalues are

\begin{matrix} - 0.900000000000000, \\ - 0.950000000000000 + j_{o} 0.086602540378444, \\ - 0.950000000000000 - j_{o} 0.086602540378444, \\ - 1.050000000000000 + j_{o} 0.086602540378444, \\ - 1.050000000000000 - j_{o} 0.086602540378444, \\ - 1.100000000000000 . \end{matrix}

The perturbations in the eigenvalues are exactly

ε^{1 / 6} = 0.1

, which is

10^{5}

larger than the original perturbation

ε = 10^{- 6}

.

In Figure 27, we show the eigenvalue perturbations of

J_{6} (- 1)

for 20 equally spaced values of ε between

10^{- 8}

(bottom circle) and

10^{- 6}

(top circle). Note that the number of the eigenvalue loci is equal to the size of the Jordan block.

Thus, a perturbation of size ε can induce changes of order

ε^{1 / n}

in the eigenvalues, which can be large even for moderate n. Since the trace of the matrix is unchanged, the mean of the eigenvalues

\sum_{1}^{n} λ_{i} / n

remains λ.

In this example, the eigenvalues are highly sensitive to perturbations because the matrix is completely defective. In contrast, eigenvalues corresponding to linear elementary divisors generally exhibit low sensitivity. This highlights that the analysis of eigenvalue sensitivity must explicitly or implicitly account for the Jordan structure of the matrix.

A large body of results on the sensitivity of matrix eigenvalues can be found in the books [46,51,52,53], and [54]. The book by Stewart and Sun [55] provides a detailed presentation of various methods for perturbation analysis of eigenvalue and eigenvector problems. Comprehensive surveys of such methods are also given in [56] and [57]. Several important results on eigenvalue sensitivity have been published in [58,59,60,61,62,63,64,65], among others. Theoretical and practical aspects of eigenvalue conditioning analysis are addressed in [66,67,68,69,70,71].

3.6. Distance to an Ill–Posed Problem and Conditioning

The study of the sensitivity of a system of linear equations done in Section 3.5, shows that with the increasing of the condition number the matrix becomes closer and closer to a singular matrix. The analysis of several other problems of numerical analysis shows that, in a similar way, the corresponding condition number is inversely proportional to the distance of the problem to the set of ill–posed problems. Thus, as a problem gets closer to the set of ill–posed ones, its condition number approaches infinity.

The geometry of ill–conditioning in numerical computations has been developed by Smale [72,73,74], Renegar [75,76], and Demmel [77,78,79]. These works show that many problems in numerical analysis, particularly in matrix computations, satisfy the property that the condition number of a problem is proportional to (or bounded by a multiple of) the reciprocal of the distance to the set of ill–posed problems. Consequently, as a problem approaches the variety of ill–posed problems, its condition number can increase without bound.

Determining the probability of encountering problems with a given condition number is closely related to computing the volume of a tubular neighborhood around a manifold, a topic studied rigorously in the mathematical discipline of geometric probability [80,81,82]. The conditioning of computational problems from the perspective of geometric probability is examined in depth in the book by Bürgisser and Cucker [83]. Additionally, estimates of the distance from a matrix to the set of matrices with multiple eigenvalues are provided in [84,85,86,87,88].

3.6.1. Distance to the Set of Singular Matrices

The distance of a given matrix $A \in C^{n \times n}$ to the variety of singular cases $S$ is defined as

dist (A, S) = \min ∥ A - Z ∥ : Z \in S \geq 0 .

(19)

To define the distance between matrices in

C^{n \times n}

, we will use the 2–norm or the Frobenius norm, noting that similar results hold if another matrix norm is used.

Consider first the matrix inversion. In this setting, the following classic result holds [83,89], Ch. 1:

Theorem 5.

Let

A \in C^{n \times n}

be nonsingular. Then

dist (A, S) = \frac{1}{∥ A^{- 1} ∥} .

By defining

{cond}_{F} (A) : = \infty

for a singular matrix, we immediately obtain the relationship between the distance to singularity and the condition number.

Corollary 1.

For any nonzero

A \in C^{n \times n}

, the following holds:

{cond}_{F} (A) = \frac{{∥ A ∥}_{F}}{{dist}_{F} (A, S)} .

This results shows that for a normalized problem with

{∥ A ∥}_{F} = 1

, the condition number of the matrix in respect to inversion is inversely proportional to the distance from A to the set of singular matrices. In other words, the closer a matrix is to singularity, the larger its condition number, and hence the more sensitive it is to perturbations.

Example 13.

Figure 28 illustrates the hypersurfaces of constant condition number with respect to inversion,

{cond}_{F} (A) = 5, 10, 20

, for the matrix

A = [\begin{matrix} a & 1 \\ c & b \end{matrix}]

considered in Examples 8 and 9.

3.6.2. Distance to the Set of Defect Matrices

Consider now the eigenvalue problem. We have the following result.

Theorem 6.

[78] The distance of the matrix A to the set of matrices with multiple eigenvalues satisfies

{dist}_{F}^{λ} (A, S) \leq \frac{\sqrt{2} {∥ A ∥}_{F}}{{∥ P ∥}_{F}},

where P is the spectral projector associated with the eigenvalue of interest.

Using the expression for the eigenvalue condition number (16), and assuming that

{∥ A ∥}_{F} = 1

, we obtain

{dist}_{F}^{λ} (A, S) \leq \frac{\sqrt{2}}{κ_{λ}^{a b s} (A)}

which confirms that the eigenvalue condition number of the normalized problem is inversely proportional of the distance to the set of ill–posed problems.

In this way, the set of problems whose condition number is at least K is approximately the set of problems within distance

C / K

(with C a constant) from the variety of ill–posed matrices. As one approaches

S

, the conditioning of the problem worsens, which is why

S

is called pejorative manifold (from the Latin word pejorare - to make worse) by Kahan [90].

The eigenvalue problem for matrices with ill–conditioned eigenvalues is studied in [91,92].

3.7. Probabilistic Distribution of Condition Numbers

The idea of the probabilistic distribution analysis in the parameter space of a property characterizing the computational problem can be described in general terms [83], Ch. 2. The parameter space is endowed with a probability distribution, and a certain real–valued function

g (x)

defined on this space is considered as a random variable. The goal is to estimate quantities such as the probability that

g (x) \geq K

for a given K, which provides information about the behavior of g.

In this section, we show that the geometric structure of computational problems allows one to estimate the probability distribution of problems with a given condition number

κ (A)

in the parameter space. To this end, we exploit the fact that the multiplication by a scalar does not change

κ (A)

, i.e., the condition number is homogeneous in the parameter space. This permits normalization of the problems to unit norm, so it suffices to consider only problems that lie on the unit sphere in

R^{N}

or

C^{N}

.

Due to the homogeneity of the condition number, its distribution in the parameter

space induces the same distribution of

x / ∥ x ∥

over the unit sphere. It is natural to assume that problems are uniformly distributed in the parameter space, since each problem is as likely as any other. The uniformity allows us to bound the volume of the set of problems with condition number at least K, which lie within distance

C / K

of the variety of ill–posed problems. Consequently, the probability that

κ (A) > K

is proportional to the volume of the corresponding set of problems.

Figure 29 shows an interpretation of the probabilistic distribution of the condition number in three-dimensional space. Let

T (S, ϵ)

denote the set of all points in the unit ball

B {(1)}^{N}

that lie within distance

ϵ

of

S

,

T (S, ϵ) = {z : ∥ z ∥ \leq 1, dist (z, S) \leq ϵ} .

Such variety represents a tubular neighborhood (Section 2.7).

The ratio

\frac{vol (T (S, ϵ))}{vol (B_{1}^{N})},

where

vol (M)

denotes the volume of a manifold

M

, gives the fraction of the unit ball within the distance

ϵ

of

S

.

We are interested in the part of

T (S, ϵ)

that intersects the unit ball, namely,

T_{I} (S, ϵ) = T (S, ϵ) \cap B_{1}^{N} .

Note that

T_{I} (S, ϵ)

contains all singular problems lying on the unit sphere after normalization. The volume of this set is used to determine the probability distribution of the scaled singular problems

x / ∥ x ∥

in the

ϵ

-neighborhood of

S

. By definition, this probability is given by the ratio of the volume of

T_{I} (S, ϵ)

to the volume of the unit ball

B_{1}^{N}

, that is,

Prob (dist (x, S) \leq ϵ) = \frac{{vol}_{N - 1} (T_{I} (S, ϵ))}{vol (B_{1}^{N})},

(20)

According to this definition it holds that

0 < Prob (dist (x, S) \leq ϵ) < 1 .

Example 14.

Consider the matrix

[\begin{matrix} a & 1 \\ c & b \end{matrix}]

from Examples 8 and 9. The unit ball

B_{1}^{3}

of all matrices with unit Frobenius norm is given by the equation

a^{2} + b^{2} + c^{2} = 1 .

The variety of singular matrices (

\det (A) = a b - c = 0

) that lie on the unit sphere, is described by

b = \pm \sqrt{\frac{1 - a^{2}}{1 + a^{2}}}, c = a b .

The manifold

S

of singular matrices is represented by a hyperbolic paraboloid which is a surface with codimension 1. The sets

T (S, ϵ)

and

T_{I} (S, ϵ)

are tubular neighborhoods.

To determine

Prob (dist (x, S) \leq ϵ)

, it is necessary to compute the quantities

{vol}_{N - 1} (T_{I} (S, ϵ))

and

vol (B_{1}^{N})

.

Proposition 2.

The volume of the unit ball is given by the formula

vol (B_{1}^{N}) = \frac{2 π^{n / 2}}{n Γ (n / 2)},

where the gamma-function

Γ (n)

for a positive integer n is computed from

Γ (n) = n! .

Various proofs of this classical result can be found in [93].

Determining the quantity

{vol}_{N - 1} (T_{I} (S, ϵ))

is related to computing the volume of a tubular neighborhood of a real or complex manifold and represents a difficult problem. Using formulas for the volumes of tubular neighborhoods derived in [76,94], the following theorem was proved in [78], providing an upper bound for

{vol}_{N - 1} (T_{I} (S, ϵ))

in the complex case.

Theorem 7.

(Volume of a complex tubular neighborhood). [78] Assume that

M

is a

2 d

–dimensional complex manifold in

C^{N}

. Let

T_{I} (S, ϵ)

be the part of the unit ball in

C^{N}

that lies in a distance

ε \leq 1

from

M

. Then

\begin{matrix} {vol}_{N - 1} (T_{I} (S, ϵ)) & \leq & \frac{\sqrt{π} Γ (N + 1 / 2)}{Γ (N - d + 1 / 2) Γ (d + 1 / 2)} e^{2} N^{2} {(N - 1)}^{2 N - 2 d - 2} \\ \deg (M) ε^{2 (N - d)} {(1 + N ε)}^{2 d} . \end{matrix}

In the above expression,

e \approx 2.7183

denotes Euler’s number, and

\deg (M)

is the so–called degree of

M

, which generalizes the notion of the degree of a polynomial and is defined as the number of intersection points of an n–dimensional manifold

P

in

R^{N}

with an (

N - n

)–dimensional affine subspace of

R^{N}

. If

M

is a hypersurface

(d = N - 1)

, this upper bound can be improved to

{vol}_{N - 1} (T_{I} (S, ϵ)) \leq e^{2} N^{3} \deg (M) ε^{2} {(1 + N ε)}^{2 (N - 1)} .

The expressions for the volume of a real tubular neighborhood are more complicated and can be found in [83,95,96], Ch. 21.

3.7.1. Probability Conditioning of Matrix Inversion

For computational convenience, when determining the probability of occurrence of a matrix with a given condition number, instead of the usual condition number

κ_{2} (A) = {∥ A ∥}_{2} {∥ A^{- 1} ∥}_{2}

, we shall use the nearly equivalent scaled condition number

κ (A) = {∥ A ∥}_{F} {∥ A^{- 1} ∥}_{2}

. Since

{∥ A ∥}_{2} \leq {∥ A ∥}_{F}

, it follows that

κ_{2} (A) \leq κ (A)

.

Using Theorem 7, the following result concerning the probabilistic distribution of the matrix inversion problem was proved in [78].

Theorem 8.

Let

A \in C^{n \times n}

be a random matrix distributed in the parameter space such that

A / {∥ A ∥}_{F}

is uniformly distributed on the unit sphere. Define

κ (A) = {∥ A ∥}_{F} ∥ A^{- 1} ∥_{2} = {∥ A^{- 1} ∥}_{2} = 1 / {dist}_{2} (A, S),

where

S

denotes the set of singular matrices and

{dist}_{2} (A, S) = 1 / {∥ A^{- 1} ∥}_{2}

. Then for a given number

K > 0

, the probability that

κ (A) \geq K

satisfies

\frac{{(1 - 1 / K)}^{2 n^{2} - 2}}{2 n^{4} K^{2}} \leq Prob (κ (A) \geq K) \leq \frac{e^{2} n^{5} {(1 + n^{2} / K)}^{2 n^{2} - 2}}{K^{2}}

(21)

and

Prob (κ (A) \geq K) = \frac{n^{2} (n - 1)}{K^{2}} + o (\frac{1}{K^{2}}) .

(22)

Expression (21) provides lower and upper bounds for the probability, while (22) gives its asymptotic behavior as

K \to \infty

.

Theorem 8 was refined in [97], where the following exact result was obtained:

Theorem 9.

For a complex

n \times n

matrix A, the probability that the condition number

κ (A)

satisfies

κ (A) \geq K

is

Prob (κ (A) \geq K) = 1 - {(1 - n / K^{2})}^{n^{2} - 1}, K > \sqrt{n} .

(23)

For large K and

n > 20

, the asymptotic estimate holds:

Prob (κ (A) \geq K) \approx n^{3 / 2} / K, K > > \sqrt{n}, n > > 1 .

According to this result, the probability that the condition number of a matrix exceeds a given value K is inversely proportional to K. That is, as a matrix approaches the set of singular matrices, the condition number

κ (A) = \frac{{∥ A ∥}_{F}}{{dist}_{2} (A, S)}

increases, but the likelihood of encountering a very ill-conditioned matrix decreases.

A graphical interpretation of equation (23) for

n = 10, 100, 1000

is shown in Figure 30, where the probability estimate is plotted as a function of

dist (A, S) / {∥ A ∥}_{F} = 1 / K

. As the matrix dimension increases, the probability of encountering a larger condition number also increases.

Using a similar technique, the following result concerning the condition number with respect to inversion was obtained in [98]:

Theorem 10.

For all

n \geq 1, 0 < ε \leq 1

, it holds that

\sup_{Z \in S^{n^{2} - 1}} E_{Z \in B_{ε} (A)} (ln κ (A) (Z)) \leq 6 ln n + 2 ln \frac{1}{ε} + 4.7,

where the expectation

E

is taken over all Z uniformly distributed in the open ball

B_{ε} (A)

of radius ε centered at A on the unit sphere

S^{n^{2} - 1}

.

This result reflects the fact that, as

ε \to 0

, the condition number tends to infinity.

3.7.2. Probability Conditioning of Eigenvalues

A result similar to Theorem 8, can be obtained with respect to the eigenvalue problem.

Theorem 11.

[78] Let

A \in C^{n \times n}

be a random matrix distributed in the parameter space such that

A / {∥ A ∥}_{F}

is uniformly distributed on the unit sphere. Define

κ_{λ} (A) : = max_{λ (A)} {∥ P_{λ (A)} ∥}_{F} = \sqrt{2} / {dist}_{F} λ (A, S),

where the maximum is taken over all eigenvalues

λ (A)

of A, and

P_{λ (A)}

denotes the spectral projector associated with

λ (A)

. Then for any given number

K > 0

the probability that

κ_{λ} (A) \geq K

, satisfies

Prob (κ_{λ} (A) \geq K) \leq \frac{e^{2} n^{5} (n - 1) {(1 + \sqrt{2} n^{2} / K)}^{2 n^{2} - 2}}{K^{2}} .

(24)

In Figure 31 we show the upper probability bound given by (24) for

n = 5, 10, 15

. This bound is meaningful for

K > 11828

, which means that for

dist (A, S) > 1.1956 \times 10^{- 4}

the matrices with

κ_{λ} (A) \leq 11828

appear with a probability equal to 1.

An analogue of Theorem 10 is given by the following result.

Theorem 12.

[98] For all

n \geq 1

and

0 < ε \leq 1

, the following holds:

(a) For all real matrices,

$\sup_{A \in S^{n^{2} - 1}} E_{Z \in B_{ε} (A)} (ln κ_{λ} (Z)) \leq 8 ln n + 2 ln \frac{1}{ε} + 5.1 .$
(b) For all complex matrices,

$\sup_{A \in S^{2 n^{2} - 1}} E_{Z \in B_{ε} (A)} (ln κ_{λ} (Z)) \leq 8 ln n + 2 ln \frac{1}{ε} + 6.5 .$

Similar results are obtained for the polynomial zero finding, see [78].

The assumed uniform distribution is a continuous model which is a good approximation only as long as the finite–precision numbers of the computer arithmetic are dense enough to resemble the continuum. For a detailed discussion of this limitation of the method, see [78].

4. Geometry of Matrix Rank

The variety of singular cases in computational problems can have a highly complex structure, depending on the problem being solved. In this section, we focus on the geometric structure of the variety of singular cases in matrix space, particularly those associated with the rank of rectangular matrices.

4.1. Orbits of Matrices with Constant Rank

In studying the manifolds of rectangular matrices, different matrix manifolds can be obtained through equivalent transformations of matrices with a fixed rank. Specifically, in the space of

m \times n

rectangular matrices, the set of all matrices equivalent to a given matrix A form a smooth manifold in

C^{m \times n}

. This manifold is defined as

O (A) = {P A Q^{- 1} : \det (P) \det (Q) \neq 0},

(25)

and is called the orbit of A. Since the equivalent transformations preserve the rank of a matrix, each orbit consists of all matrices of a fixed rank r and the entire space

C^{m \cdot n}

is partitioned into orbits containing matrices of the same rank.

Consider the linear operator

A : C^{n} \to C^{m}

represented by the rectangular matrix

A \in C^{m \times n}

. The finite–dimensional space of all linear operators from

C^{n}

to

C^{m}

is isomorphic to the vector space

C^{m \cdot n}

. The linear operators

A

of maximal rank

r = \min {m, n}

form an everywhere dense subset of

C^{m \cdot n}

. Such operators are called regular or non-singular. Hence, the non–singularity is a generic property of the operators in

C^{m \cdot n}

. According to a well known results from linear algebra, a non–singular operator can b e represented in a suitable choice of bases by the

m \times n

matrix

A_{0} = [\begin{matrix} I_{r} \\ 0_{m - r, r} \end{matrix}] i f m \geq r = n

or

A_{0} = [\begin{matrix} I_{r} & 0_{r, n - r} \end{matrix}] i f r = m \leq n .

If

r < \min {m, n}

, the operator is singular and, in a suitable choice of bases, it can be represented by the

m \times n

matrix

A_{0} = [\begin{matrix} I_{r} & 0_{r, n - r} \\ 0_{m - r, r} & 0_{m - r, n - r} \end{matrix}] .

The matrix

A_{0}

is called the normal form of A and the differences

m - r

and

n - r

are referred to as coranks.

4.2. Dimension of an Orbit with Fixed Rank

In what follows, we will show that the singular operators, corresponding to matrices A of rank

r < \min {m, n}

form an orbit in

C^{m \times n}

, whose codimension is the product

(m - r) (n - r)

.

For a rectangular matrix

A \in C^{m \times n}

, define the linear operator

L_{A} : (X, Y) \to X A - A Y, (X, Y) \in C^{m \times m} \times C^{n \times n} .

Let

T_{A} O (A)

denote the tangent space to the orbit

O (A)

at A and let

R (L_{A})

denote the image of

L_{A}

. Then we have the following result.

Lemma 1.

T_{A} O (A) = R (L_{A}) .

Proof. Consider the nonlinear transformation

ψ

from the sets of

m \times m

and

n \times n

nonsingular matrices into the space

C^{m \times n}

, defined by

ψ (P, A, Q) = P A Q^{- 1} .

Assume that the matrices

| X |

and

| Y |

are sufficiently small, and set

P = I_{m} + X, Q = I_{n} + Y,

so that P and Q remain nonsingular. Then, for

| X | \to 0

and

| Y | \to 0

, we have the expansions

P^{- 1} = I_{m} - X + {O (| X |}^{2}), Q^{- 1} = I_{n} - Y + {O (| Y |}^{2})

. Substituting these into

ψ

gives

\begin{matrix} ψ (I_{m} + X, A, I_{n} + Y) & = & (I_{m} + X) A {(I_{n} + Y)}^{- 1} \\ = & A + X A - A Y + O (| Y |^{2}) \\ = & ψ (I_{m}, A, I_{n}) + L_{A} (X, Y) + {O (| Y |}^{2}), \end{matrix}

(26)

where

L_{A} (X, Y) = X A - A Y

. Hence, the tangent vectors to the orbit

O (A)

are given by the differential

D ψ (X, Y) = L_{A} (X, Y) = X A - A Y .

Evaluating the differential at the identity,

P = I_{m}, Q = I_{n}

, we obtain

T_{A} O (A) = R (L_{A}),

i.e., the tangent space to the orbit at A is exactly the range of the linear operator

L_{A}

.

The dimension of the orbit of matrices with fixed rank can be determined by noting that

\dim (O (A)) = T_{A} O (A),

where

T_{A} O (A)

is the tangent space at A. Define the subspace

Z_{A} = {(X, Y) \in C^{m \times m} \times \in C^{n \times n} : X A - A Y = 0} .

Clearly,

Z_{A}

is the null space of the linear operator

L_{A}

. Since for any linear operator, the dimension of the domain equals the sum of the dimensions of the range and null space, we have

C^{m \times n} = R (L_{A}) + N (L_{A}),

where

N (L_{A})

is the null space. Therefore,

\begin{matrix} \dim (C^{m \times n}) & = & \dim (R (L_{A})) + \dim (N (L_{A})) \\ = & \dim (T_{A} O) + \dim (Z_{A}) \\ = & \dim (O (A)) + \dim (Z_{A}) . \end{matrix}

This shows that the dimension of the orbit is equal to the dimension of the ambient space minus the dimension of the null space of

L_{A}

.

Thus, we arrive at the important relationship

\dim (Z_{A}) = \dim (C^{m \times n}) - \dim (O (A)) = codim (O (A)),

(27)

i.e., the dimension of the subspace

Z_{A}

is equal to the dimension of the normal space

N_{O}

, which in turn equals the codimension of the orbit

O (A)

. Since

\dim (Z_{A})

is invariant under equivalent transformations of A, it follows that

\dim (Z_{A}) = \dim (Z_{A_{0}}),

where

Z_{A_{0}} = \{(X, Y) : X A_{0} - A_{0} Y = 0\}

and

A_{0} = [\begin{matrix} I_{r} & 0_{r, n - r} \\ 0_{m - r, r} & 0_{m - r, n - r} \end{matrix}] .

is the normal form of a rank–r matrix.

Partitioning the matrices X and Y as

X = [\begin{matrix} X_{11} & X_{12} \\ X_{21} & X_{22} \end{matrix}], X_{11} \in C^{r \times r}, Y = [\begin{matrix} Y_{11} & Y_{12} \\ Y_{21} & Y_{22} \end{matrix}], Y_{11} \in C^{r \times r},

we have

L_{A} (X, Y) = [\begin{matrix} X_{11} & 0_{r, n - r} \\ X_{21} & 0_{m - r, n - r} \end{matrix}] - [\begin{matrix} Y_{11} & Y_{12} \\ 0_{m - r, r} & 0_{m - r, n - r} \end{matrix}] = [\begin{matrix} X_{11} - Y_{11} & - Y_{12} \\ X_{21} & 0_{m - r, n - r} \end{matrix}] .

From this expression, it follows that

\begin{matrix} \dim (R) (L_{A}) & = & \dim (C^{r \times r} \times C^{(m - r) \times r} \times C^{r \times (n - r)}) \\ = & \dim (C^{r \times r}) + \dim (C^{(m - r) \times r}) + \dim (C^{r \times (n - r)}) \\ = & r^{2} + (m - r) r + (n - r) r . \end{matrix}

Hence, the dimension of the null space of

L_{A}

is

\dim (Z_{A}) = \dim (P (L_{A})) = m n - r^{2} - (m - r) r - (n - r) r = (m - r) (n - r)

which leads to the following important result.

Theorem 13.

The codimension of rank–r matrices in

C^{m \times n}

is

(m - r) (n - r)

.

The dimension and codimension counts are illustrated in Figure 32. The dimension of the orbit is given by

\dim (O (A)) = m n - (m - r) (n - r) = (m + n - r) r

The perturbation with the minimum number of parameters that acts in the normal space of

O (A_{0})

has the form

δ A = [\begin{matrix} 0_{r \times r} & 0_{r \times (n - r)} \\ δ_{11} & δ_{12} & \dots & δ_{1, n - r} \\ 0_{(m - r) \times r} & δ_{21} & δ_{22} & \dots & δ_{2, n - r} \\ ⋮ & ⋮ & ⋱ & ⋮ \\ δ_{m - r, 1} & δ_{m - r, 2} & \dots & δ_{m - r, n - r} \end{matrix}],

where

δ

is an

(m - r) \times (n - r)

block with independent entries

δ_{i j}

. This perturbation lies in the normal space to the orbit

O (A_{0})

and is transversal (i.e., in general position) to

O (A_{0})

.

The concept of the matrix rank was introduced by the German mathematician Ferdinand Georg Frobenius in 1879 [99], albeit implicitly in connection with determinants. Theorem 13 has been proved using various methods in ([83], p. 470), and [100,101].

4.3. Stratification of Orbits with Fixed Rank

The partition of a space into a finite number of submanifolds defined by algebraic equations and inequalities is called a stratification. The stratification means that the matrix space

C^{m \times n}

is decomposed into manifolds, called strata, which are arranged in different layers. The partition of the matrix space into orbits with fixed rank is an example of matrix stratification.

We say that a complex manifold

M

is embedded in another complex manifold

P

, if

M

is contained in the closure

\bar{P}

of

P

, which we denote by

M \subseteq \bar{P} .

Every orbit is embedded in itself and in all orbits of equal or higher rank (equivalently, equal or lower codimension).

Theorem 14.

(Orbit Embedding Theorem). In the matrix space

C^{m \times n}

, the orbit

O_{r}

of matrices of rank r is embedded in the orbit

O_{s}

if and only if only

r \leq s

or, equivalently, if and only if

codim (O_{r}) \geq codim (O_{s}) .

More precisely, the following chain of inclusions holds:

O_{0} \subseteq O_{1} \subseteq \dots \subseteq O_{\min {m, n}} .

Theorem 14 describes the stratification of the matrix space

C^{m \times n}

into orbits of different ranks. According to this result, if a matrix A belongs to the orbit

O_{r}

, it also lies in the closure of all orbits

O_{s}

with higher rank

s \geq r

. These higher–rank orbits have strictly smaller codimension than the orbit containing A.

Example 15.

Consider families of

6 \times 4

matrices of different rank. For each rank

r = 0, 1, 2, 3, 4

there exists an orbit

O_{r}

consisting of all

6 \times 4

matrices of rank r. The codimension of each orbit is given by

(m - r) (n - r)

. These orbits form a stratification of the matrix space

C^{6 \times 4}

, which can be summarized as follows:

\begin{matrix} rank : & 0 & < & 1 & < & 2 & < & 3 & < & 4 \\ orbit : & O_{0} & \subseteq & O_{1} & \subseteq & O_{2} & \subseteq & O_{3} & \subseteq & O_{4} \\ dimension : & 0 & < & 9 & < & 16 & < & 21 & < & 24 \\ codimension : & 24 & > & 15 & > & 8 & > & 3 & > & 0 \end{matrix}

The table clearly illustrates the inclusion relations between the orbits: as the rank increases, the dimension of the orbit increases while its codimension decreases. The full-rank orbit

O_{4}

is open and dense in

C^{6 \times 4}

, whereas the lower–rank orbits form boundary strata of increasing codimension.

The next example illustrates how infinitely small perturbations can cause a matrix A of a given rank to move from an orbit of higher codimension to an orbit of lower codimension.

Example 16.

Consider rank transitions of a

5 \times 3

matrix (blank entries are understood to be 0):

\begin{matrix} \begin{matrix} rank (A) = 3 \\ [\begin{matrix} η \\ ϵ \\ δ \end{matrix}] \\ codim (O_{3} (A)) = 0 \end{matrix} \overset{δ \to 0}{⟶} \begin{matrix} rank (A) = 2 \\ [\begin{matrix} η \\ ϵ \\ 0 \end{matrix}] \\ codim (O_{2} (A)) = 3 \end{matrix} \overset{ε \to 0}{⟶} \begin{matrix} rank (A) = 1 \\ [\begin{matrix} η \\ 0 \\ 0 \end{matrix}] \\ codim (O_{1} (A)) = 8 \end{matrix} \\ ⟸ perturbation increasing \end{matrix}

Note that perturbations of general position act from right to left increasing the rank of the matrix. In contrast, decreasing the rank from left to right requires perturbations with a special (diagonal) structure. Such perturbations lie in the tangent space of the corresponding matrix orbit

O (A)

and are therefore not in general position with respect to

O (A)

and not generic.

From the figure we see that the rightmost

5 \times 3

matrix A with

rank (A) = 1

(and

codim (O_{1} (A)) = (5 - 1) (3 - 1) = 8

) is arbitrarily close to matrices with

rank (A) = 2

(whose orbit has codimension

codim (O_{2} (A)) = 3

). In turn, these matrices are arbitrary close to matrices with

rank (A) = 3

(for which

codim (O_{3} (A)) = 0

).

Thus, by applying a sufficiently small perturbation that is in general position to A, one can move from the orbit

O_{1} (A), rank (A) = 1

, to the orbit

O_{2} (A), rank (A) = 2

. In other words, the orbit

O_{1} (A), rank (A) = 1

is contained in the closure of

O_{2} (A)

, and

O_{2} (A)

, is said to cover

O_{1} (A)

. Formally, this relation is written as

O_{1} (A) \subset {\bar{O}}_{2},

where

{\bar{O}}_{2}

denotes the closure of the orbit of rank–2 matrices.

Similarly, a small perturbation in general position allows one to move from

O_{2} (A), rank (A) = 2

to

O_{3} (A), rank (A) = 3

. This implies that

O_{2} (A) \subset {\bar{O}}_{3} .

Taken together, these closure relations show that, via a sufficiently small perturbation, one can move directly from

O_{1} (A), rank (A) = 1

to

O_{3} (A), rank (A) = 3

, passing through intermediate orbits of lower codimension.

It is important in numerical computations that adding a perturbation in general position to a matrix A can only decrease the codimension of the orbit in which the perturbed matrix lies; it can never increase it. This implies that, due to rounding errors, numerical algorithms effectively operate on matrices that belong to orbits (manifolds) of codimension 0, i.e., the full–rank or maximal–rank orbits.

For orbits of matrices with fixed rank, it is possible to compute exactly the distance in the matrix space from a given matrix to an orbit corresponding to a prescribed rank.

Theorem 15.

(Schmidt-Mirsky Theorem). [45], Ch. 1 Let

A \in C^{m \times n}

be a matrix with singular values

σ_{1} \geq σ_{2} \geq \dots \geq σ_{r} > 0,

where

r = rank (A)

. Then the distance from A to an orbit of matrices with rank

k < r

is given by

\begin{matrix} {dist}_{2} (A, O (B : rank (B) = k)) & = & σ_{k + 1}, k = 1, 2, \dots, r - 1, \\ {dist}_{F} (A, O (B : rank (B) = k)) & = & \sqrt{σ_{k + 1}^{2} + \dots + σ_{r}^{2}}, k = 1, 2, \dots, r - 1 . \end{matrix}

Thus, the matrix A lies at distances

σ_{2}, σ_{3}, \dots, σ_{r}

from the orbits corresponding to rank

1, 2, \dots, r - 1

, respectively.

4.4. Numerical Rank of a Matrix

An important task arising in matrix computations is determining the rank of a given matrix A in the presence of uncertainties in its entries. This problem is closely related to solving linear systems of less-than-full rank and to the numerical determination of the Jordan structure of a matrix. The difficulty in determining the rank stems from the fact that it is not a continuous function of the matrix entries and may change abruptly under arbitrarily small perturbations of these entries.

Let

A \in C^{m \times n}

, and without loss of generality assume that

m \geq n

. Let

A = U Σ V^{H}

be the singular value decomposition (SVD) of A, where

U \in C^{m \times m}

,

V \in C^{n \times n}

are unitary matrices, and

Σ = diag (σ_{1}, \dots, σ_{k}, \dots, σ_{n})

contains the singular values of A, with

σ_{1} \geq \dots σ_{k} \geq \dots \geq σ_{n}

. Note that

σ_{1} = {∥ A ∥}_{2}

.

The “theoretical” rank of A is defined as the number of nonzero singular values. In practice, the matrix A contains measurement, approximation, and discretization errors. Therefore, instead of the exact rank of A, we determine the rank of a perturbed matrix

A + δ A

, where the perturbation

δ A

satisfies

{∥ δ A ∥}_{2} \leq tol {∥ A ∥}_{2}

for some small positive number

tol

. The quantity

tol

can be interpreted as the relative uncertainty in A.

These considerations lead to the concept of the numerical rank of a matrix relative to the tolerance

{tol ∥ A ∥}_{2}

, defined as

k = k (A, tol) : = \min_{{∥ δ A ∥}_{2} \leq tol {∥ A ∥}_{2}} rank (A + δ A) .

(28)

Unlike the “theoretical”; rank, the numerical rank is stable in the sense that perturbations smaller than the tolerance

tol ∥ A ∥

will not change the rank of A.

The concept of the numerical rank of a matrix is discussed in detail in [46], Sec. 5.4, [45], Ch. ], [53], Sec. 3.5, and [102], Ch. 3; see also [103,104,105]. Efficient procedures for determining the numerical rank are described in [106,107] and [108].

In connection with the stability of the numerical rank, the following can be noted.

In general terms, a mathematical object is called structurally stable, if its structure remains unchanged under perturbations of the object’s parameters. If the object remains stable under large parameter variations, it is called robust.

The normal canonical form of a rectangular matrix is an example of an object that is structurally unstable. The concepts of structural stability, genericity, and transversality, when applied to dynamical system, are discussed in depth in [109], Ch. 16, [110], Ch. 4, [111], Ch. 12, and [112], Ch. 3.

5. Geometry of Jordan Form

5.1. Orbits of Matrices with Fixed Jordan Form

In the space of

n \times n

square matrices, the set of all matrices similar to a given matrix A form a manifold in

C^{n \times n}

. This manifold, defined by

O (A) = {P A P^{- 1} : \det (P) \neq 0},

(29)

is called the orbit of the matrix A. All matrices lying in the same orbit have the same eigenvalues and the same dimensions of the Jordan blocks.

The bundle

B (A)

of the matrix A is defined as the union of all orbits. It consists of all matrices whose Jordan canonical forms differ only in their eigenvalues while having the same number of distinct eigenvalues and the same sizes of the corresponding Jordan blocks (i.e., the same Segre characteristics). If two matrices have identical Jordan structures but different distinct eigenvalues, they belong to the same bundle. For example, all diagonal matrices with simple eigenvalues form a single bundle.

Each bundle is a manifold in the space of matrices, whose strata are the individual orbits. Within a given bundle, an orbit consists precisely of those matrices that share the same eigenvalues.

As an illustration, consider the two matrices with Jordan forms

J_{1} = [\begin{matrix} - 1 & 1 \\ - 1 \\ 1 \\ 2 \end{matrix}], J_{2} = [\begin{matrix} 2 & 1 \\ 2 \\ 3 \\ 4 \end{matrix}] .

These matrices lie in different orbits, since their eigenvalues differ, but they belong to the same bundle because their Jordan block structures are identical.

The most important characteristic of orbits and bundles are their dimensions. The dimension of an orbit, denoted by

\dim (O (A))

, is equal to the dimension of its tangent space

T_{O} (x)

. In practice, it is often more convenient to work with the codimension of

O (A)

denoted by

codim (O (A))

, which is equal to the dimension of the normal space

N_{O} (x)

. Since

T_{O} (x) + N_{O} (x) = C^{n \times n},

it follows that

\dim (O (A)) + codim (O (A)) = n^{2} .

The following result can now be established.

Theorem 16.

[100] Let

S_{i} = {s_{1 i} \geq s_{2 i} \geq \dots \geq s_{q_{i}, i}}

be the Segre characteristic of A associated with the eigenvalue

λ_{i}

. Then the codimension of the orbit

O (A)

is given by

\begin{matrix} codim (O (A)) & = & \sum_{i = 1}^{p} \sum_{j = 1}^{q_{i}} (2 j - 1) s_{j i} \\ = & \sum_{i = 1}^{p} (s_{1 i} + 3 s_{2 i} + 5 s_{3 i} + \dots + (2 q_{i} - 1) s_{q_{i}, i}), \end{matrix}

(30)

where p denotes the number of the distinct eigenvalues of A. Note that the complex conjugate eigenvalues are counted as two distinct eigenvalues.

The difference between orbits and bundles is that the eigenvalues of the matrices belonging to a bundle are not fixed. In other words, while an orbit consists of matrices similar to A with the same eigenvalues, a bundle allows the eigenvalues to vary, provided the Jordan structure remains unchanged. As a consequence, the tangent space of a bundle contains one additional dimension for each distinct eigenvalue compared to the tangent space of the corresponding orbit. Hence the codimension of the bundle

B (A)

is given by

\begin{matrix} codim (B (A)) & = & \sum_{i = 1}^{p} (- 1 + \sum_{j = 1}^{q_{i}} (2 j - 1) s_{j i}) \\ = & \sum_{i = 1}^{p} (- 1 + s_{1 i} + 3 s_{2 i} + 5 s_{3 i} + \dots + (2 q_{i} - 1) s_{q_{i}, i}) . \end{matrix}

(31)

Comparing (30) and (31), we obtain

codim (B (A)) = codim (O (A)) - p,

that is, the codimension of a bundle is equal to the codimension of the corresponding orbit minus the number of distinct eigenvalues. This relation reflects the fact that a bundle possesses additional degrees of freedom – one for each distinct eigenvalue – which are absent in a fixed orbit.

Note that simple eigenvalues contribute nothing to the sum in (31). Furthermore, the codimension of a bundle does not depend on the order n of the matrix, but only on the sizes of the Jordan blocks corresponding to multiple eigenvalues. Thus, the bundle codimension provides a measure of the dependencies in the space of matrices imposed by the Jordan structure of the matrix.

The partition of the matrix space into orbits and bundles, represented by manifolds with corresponding codimensions, was introduced by the Russian mathematician Vladimir Arnold in [100] in connection with the determination of normal forms of matrices depending on a minimal number of perturbation parameters. This approach makes it possible to apply methods of differential topology to the study of matrix problems [113,114,115,116,117,118], Ch. 14.

5.2. Generic and Nongeneric Jordan Bundles

Let

X = R^{n}

or

X = C^{n}

, and denote by

L X

the set of

n \times n

matrices, or equivalently, the set of linear maps on

X

. By writing all entries of a matrix columnwise as a vector, we may identify

L X

with

R^{n \cdot n}

or

C^{n \cdot n}

, respectively. Then the following important result holds [120], Sect. 5.6.

Theorem 17.

The set

M

of matrices in

L X

that have n distinct eigenvalues is open and dense in

L X

.

According to this theorem, having all distinct eigenvalues is a generic property of

n \times n

matrices, that is, “almost all” matrices have distinct eigenvalues and are therefore diagonalizable. Matrices with non–diagonal (Jordan) forms, whose Jordan blocks have specified sizes (the Segre characteristics), lie on a bundle of

X

, whose dimension is determined by the sizes of the blocks.

In the case of two distinct eigenvalues,

μ_{1}

and

μ_{2}

, each orbit is determined by a fixed combination of these eigenvalues and corresponds to a plane in the 3-dimensional space. Note that the distance between planes corresponding to infinitesimally close distinct eigenvalues is itself infinitesimal. The set of matrices with simple eigenvalues forms a bundle with dimension 4, which is dense set in the space of

2 \times 2

matrices. Such matrices are most probable, in the sense that a matrix with randomly chosen entries almost surely has distinct (simple) eigenvalues. For this reason, this case is refrerred to as the most generic case.

In Table 1 we show the different Jordan forms corresponding to the case of a single eigenvalue

λ

with algebraic multiplicity n. Note that the single nth order Jordan block associated with such eigenvalue represents the most generic Jordan structure in the case of multiple eigenvalue. The corresponding bundle contains all matrices A which are similar to nth–order companion matrices.

Table 2 summarizes the most generic and the most degenerate cases of

n \times n

matrices in terms of their Jordan structures. The most generic case corresponds to matrices with n distinct eigenvalues, each forming a separate Jordan block; in this case, the orbit has maximal dimension

n^{2} - n

and codimension n, while the coresponding bundle has dimension

n^{2}

and codimension 0. Conversely, the most degenerate case corresponds to a single eigenvalue with n scalar Jordan blocks; here, the orbit is zero–dimensional with maximal codimension

n^{2}

, and the bundle has minimal dimension 1 and codimension

n^{2} - 1

. This table illustrates how the Jordan structure directly influences the dimensions of orbits and bundles.

In the general case, when a matrix A belongs to a given bundle, it may also lie in the closure of many other bundles corresponding to different Segre characteristics. These bundles have smaller codimension than the bundle containing the original matrix, forming a hierarchy of stratification of Jordan structures. Adding a perturbation in general position decreases the codimension of the bundle into which the perturbed matrix moves. The stratification of Jordan structures has been studied by Edelman, Elmroth, and Kågström [115], who show that Jordan and Kronecker canonical forms can be represented as integer partitions. These partitions reveal closure relations of orbits and bundles through simple combinatorial rules, which can also be used to determine whether one structure is more generic than another.

5.3. The Reduction into Jordan Form as an Ill–Posed Problem

Determining the Jordan canonical form of a square nonsymmetric matrix A with defective eigenvalues by means of a computer is one of the most challenging problems in numerical matrix analysis. This difficulty arises for two main reasons. First, deciding which eigenvalues are multiple in the presence of rounding errors is inherently problematic. Second, the determination of the sizes of the Jordan blocks associated with a given multiple eigenvalue (the Segre characteristic) is closely related to computing of the numerical rank of a matrix, which itself is a difficult task working with finite–precision arithmetic.

The Jordan canonical form is structurally unstable in the sense that it is not a continuous function of the matrix entries. The following example of a

2 \times 2

matrix illustrates this instability.

Example 17.

Let

A (ε) = [\begin{matrix} 0 & 1 \\ ε & 0 \end{matrix}],

where ε is a small positive number. For

ε = 0

, this matrix has the Jordan canonical form

J = [\begin{matrix} 0 & 1 \\ 0 & 0 \end{matrix}],

whereas for

ε > 0

it has the Jordan canonical form

J = [\begin{matrix} - \sqrt{ε} & 0 \\ 0 & \sqrt{ε} \end{matrix}] .

Clearly, the Jordan canonical form of

A (ε)

changes its structure discontinuosly at

ε = 0

; that is, it is not continuous at this point. For small ε, the nonsingular matrix that diagonalizes

A (ε)

,

V = [\begin{matrix} 1 & 1 \\ - \sqrt{ε} & \sqrt{ε} \end{matrix}]

is ill–conditioned, since

{cond}_{2} (V) = \frac{1}{\sqrt{ε}} .

For example, if

ε = 10^{- 8}

, then the condition number of V with respect to inversion is

{cond}_{2} (V) = 10^{4}

.

The eigenvalue problem for a matrix A with distinct eigenvalues

λ_{i}, i = 1, 2, \dots, n

is a well–posed computational problem, since for a sufficiently small perturbation

δ A

, the eigenvalues

{\tilde{λ}}_{i}, i = 1, 2, \dots, n

of the perturbed matrix

A + δ A

satisfy the inequality

| {\tilde{λ}}_{i} - λ_{i} | \leq κ_{λ_{i}} {∥ δ A ∥}_{2},

(32)

where

κ_{λ_{i}}

is the condition number of

λ_{i}

(Section 3.5.2). If

κ_{λ_{i}}

is large, then the eigenvalue

λ_{i}

is ill–conditioned and in the limiting case of defective eigenvalues the condition number becomes infinite. In such cases, the eigenvalue problem is ill–posed.

In case of an ill–posed eigenvalue problem, a perturbation of magnitude

ε

applied to a Jordan block

J_{s_{j i}} (λ_{i}) \in C^{s_{j i} \times s_{j i}}

of order

s_{j i} > 1

may change its eigenvalues by an amount proportional to

ε^{1 / s_{j i}}

, whose derivative at

ε = 0

is infinite (see Example 12). This extreme sensitivity is the source of the ill–posedness.

The set of matrices with defective eigenvalues forms a low–dimensional surface

Π

in the

n^{2}

–dimensional parameter space of matrix entries. Consequently, matrices that lie in the vicinity of

Π

give rise to ill–conditioned eigenvalue problems. When the problem lies exactly on

Π

, it usually has bounded conditioning since the sensitivity of the defective eigenvalues is finite (see (18). For this reason, the solution of ill–conditioned problems is often obtained by projecting them onto

Π

and solving the resulting ill–posed problem. Such an approach is called regularization.

The use of regularization in the solution of the eigenvalue problem is illustrated by the following example.

Example 18.

Consider the matrix

A = [\begin{matrix} - 109 & - 126 & 675 \\ - 45 & - 52 & 279 \\ - 26 & - 30 & 161 \end{matrix}] .

Using the MATLAB^®function eig , one obtains the eigenvalues

\begin{matrix} 1.999999999999170 + j_{o} 0.000000000000000, \\ - 0.999999999999574 + j_{o} 0.000001024409445, \\ - 0.999999999999574 - j_{o} 0.000001024409445 . \end{matrix}

The corresponding eigenvalue condition numbers, determined by the MATLAB^®function condeig , are

\begin{matrix} 1.6689517668 \cdot 10^{2}, \\ 1.3224701370 \cdot 10^{8}, \\ 1.3224701370 \cdot 10^{8}, \end{matrix}

which shows that the last two eigenvalues are extremely sensitive. As a consequence, the eigenvector matrix V is also ill conditioned, with condition number

cond (V) = 3.24 \cdot 10^{8} .

The eigenvalue problem can be regularized in the following way. The matrix A is reduced to Jordan form, using the algorithm presented in [121,122]. Within this algorithm, the last two eigenvalues are recognized as multiple and are replaced by their mean value,

(λ_{2} + λ_{3}) / 2 = - 0.999999999999574 .

This approach is justified by the fact that, although the individual eigenvalues may be highly sensitive, their mean value is not. In the present case, the errors in the computed second and third eigenvalues are of the order

\sqrt{{u ∥ A ∥}_{2}} \approx 1 \cdot 10^{- 6}

, whereas the mean value changes by only

4.26 \cdot 10^{- 13}

, which is of order of the backward error

{u ∥ A ∥}_{2}

, where

u

denotes the unit roundoff.

In this way the ill conditioned eigenvalue problem is projected onto the set of ill–posed problems which includes the matrices with defective multiple eigenvalues. As a result of the reduction of A to Jordan form, one obtains

\begin{matrix} J & = & Z^{- 1} A Z \\ = & [\begin{matrix} 1.999999999999214 & 0 & 0 \\ 0 & - 0.999999999999611 & 15.248813586295865 \\ 0 & 0 & - 0.999999999999611 \end{matrix}], \end{matrix}

where

Z = [\begin{matrix} - 0.886460226183731 & 0.928279121632659 & 0.334637240705078 \\ - 0.406294270334237 & 0.309426373877600 & - 0.936984273974103 \\ - 0.221615056545937 & 0.206284249251711 & - 0.100391172211498 \end{matrix}] .

Clearly, there is a quadratic elementary divisor corresponding to the eigenvalue

- 1.0

, that is, this eigenvalue belongs to a Jordan block of size 2. In the same time, the transformation matrix Z has condition number

cond (Z) = 333.79

, which is relatively modest.

In summary, due to rounding errors and the nature of standard eigenvalue algorithms, the eigenvalues are initially computed as simple but ill–conditioned. The algorithm for reduction to Jordan form correctly recognizes these eigenvalues as multiple and determines them with maximal possible accuracy. In this way, the ill-conditioned eigenvalue problem is transformed into the problem of determining multiple eigenvalues, which is then solved accurately by reducing the matrix to Jordan form.

Example 18 confirms that the computation of defective multiple eigenvalues of a matrix is an ill–posed problem. This problem can be regularized by applying an appropriate criterion to determine the dimensions of the Jordan blocks when constructing the Jordan structure of the matrix. In this way, one can determine the exact canonical form

\hat{J}

of a near matrix

A + δ A

, where the norm of

δ A

provides an upper bound on the distance of A to the regularized problem with computed structure

\hat{J}

.

5.4. Numerical Jordan Form

Following the presentation of Zeng [123,124] and Zeng and Li [125], the problem of determining the Jordan form of a matrix in presence of errors can be formalized as follows.

The determination of the Jordan form of a matrix with an ill–conditioned eigenvalue problem may be illustrated in a simplified setting, as shown in Figure 33 where the space

R^{3}

is used as a substitution for the

n^{2}

dimensional parameter space of matrix entries. (Note that the ordering of the strata in the figure is purely illustrative.) The objective is to find the Jordan canonical form of the matrix A, represented as a point lying on a manifold

Π

.

In practice, the exact matrix A is not known. Instead, one works with an approximation

\tilde{A} = A + δ A

, which is contaminated by empirical and/or rounding errors

δ A

satisfying

∥ δ A ∥ < ε

. With respect to the Frobenius norm, the point

\tilde{A}

lies inside a sphere of radius

ε

centered at A. From a theoretical point of view, the matrix

\tilde{A}

typically has distinct eigenvalues and it is therefore used by numerical methods to compute an approximation of the eigenstructure. However, since

\tilde{A}

lies outside the manifold

Π

, its eigenvalue problem is ill–conditioned, and the corresponding numerical results may contain large errors.

To regularize the problem, the point

\tilde{A}

is projected onto the manifold

Π

, yielding a new matrix

\hat{A}

whose eigenvalues are defective but with bounded sensitivity as shown by (18). Note that there exist several nonintersecting manifolds, each corresponding to a different Jordan structure. It can be shown rigorously [124], that the best regularization results are obtained when the distance

dist (\hat{A}, \tilde{A}) = ∥ \hat{A} - \tilde{A} ∥

between

\tilde{A}

and

Π

is minimal, which corresponds to the orthogonal projection of

\tilde{A}

onto the closest pejorative manifold

Π

. This observation shows that the numerical determination of the Jordan form can be recast as a least–squares problem. As a result, the Jordan form J of

\hat{A}

is taken as the Jordan canonical form of A. The quantity

dist (\hat{A}, A) = ∥ \hat{A} - A ∥

characterizes the backward error in finding J.

Thus, we arrive at the following rigorous definition of the notion of a numerical Jordan form.

Definition 17.

Let

A \in C^{n \times n}

and let

ε > 0

. Suppose that an approximation

\tilde{A} = A + δ A

of A is given, where

∥ δ A ∥ < ε

. Let

Π \subset C^{n \times n}

be a matrix bundle such that

codim (Π) = \max {codim (Π^{'}) : dist (\tilde{A}, Π^{'}) < ε}

(33)

and let

\hat{A} \in Π

be a matrix satisfying

∥ \hat{A} {- A ∥}_{F} = \min_{B \in Π} {∥ B - A ∥}_{F}

with exact Jordan decomposition

\hat{A} = Z J Z^{- 1}

. Then the matrix J is called the numerical Jordan canonical form of A within ε, and

Z J Z^{- 1}

is called the numerical Jordan decomposition of A within ε.

Example 19.

Consider the

6 \times 6

matrix

A = [\begin{matrix} - 670 & - 1302 & 4610 & 11257 & 895 & - 16559 \\ - 1481 & - 2740 & 9987 & 24831 & 3390 & - 40853 \\ 820 & 1652 & - 5758 & - 13884 & - 556 & 18800 \\ - 792 & - 1525 & 5441 & 13336 & 1225 & - 20142 \\ - 320 & - 574 & 2127 & 5345 & 905 & - 9323 \\ - 184 & - 340 & 1240 & 3084 & 424 & - 5083 \end{matrix}]

whose exact Jordan canonical form is

J = diag ([\begin{matrix} - 2 & 1 \\ - 2 & 1 \\ - 2 & 1 \\ - 2 \end{matrix}], [\begin{matrix} - 1 & 1 \\ - 1 \end{matrix}]) .

This matrix is defective but nonderogatory. It belongs to a bundle Π in the 36–dimensional space whose codimension is

4 + 2 - 2 = 4

. The diagonal elements of the Schur form, computed using the function schur in MATLAB^®, are

\begin{matrix} - 2.005581021025717 + j_{o} 0.000000000000000, \\ - 2.000018276197160 + j_{o} 0.005599556627086, \\ - 2.000018276197160 - j_{o} 0.005599556627086, \\ - 1.994382426681778 + j_{o} 0.000000000000000, \\ - 0.999999999949297 + j_{o} 0.000018939381468, \\ - 0.999999999949297 - j_{o} 0.000018939381468 \end{matrix} .

These eigenvalues are distinct and therefore correspond exactly to the eigenvalues of a nearby matrix

\tilde{A} + δ A

, which lies in a bundle of codimension

codim (B) = 0

. Using a numerical algorithm, one computes a Jordan form

\hat{J}

with Segre characteristics

s_{11} = 4

and

s_{12} = 2

associated with the eigenvalues

\begin{matrix} - 2.000000000025454, \\ - 2.000000000025454, \\ - 2.000000000025454, \\ - 2.000000000025454 \end{matrix}

and

\begin{matrix} - 0.999999999949297, \\ - 0.999999999949297, \end{matrix}

respectively. Thus, the eigenvalues of A are computed correctly to approximately eleven decimal digits. The computed Segre characteristics show that the matrix

\begin{matrix} \hat{A} & = & Z \hat{J} Z^{- 1} = 10^{4} \cdot \\ [\begin{matrix} - 0.066999999999759 & - 0.130200000000348 & 0.461000000000600 \\ - 0.148100000000135 & - 0.274000000000015 & 0.998700000000243 \\ 0.082000000000064 & 0.165199999999896 & - 0.575799999999809 \\ - 0.079199999999947 & - 0.152499999999883 & 0.544099999999599 \\ - 0.031999999999793 & - 0.057399999999809 & 0.212699999999166 \\ - 0.018399999999934 & - 0.033999999999923 & 0.123999999999690 \end{matrix} \dots \to \\ \leftarrow \dots & \begin{matrix} 1.125699999999035 & 0.089499999992746 & - 1.655899999978640 \\ 2.483100000001330 & 0.339000000002283 & - 4.085300000008000 \\ - 1.388400000000210 & - 0.055600000002015 & 1.880000000005861 \\ 1.333599999999064 & 0.122500000000053 & - 2.014199999998984 \\ 0.534499999997335 & 0.090499999997931 & - 0.932299999990937 \\ 0.308399999999091 & 0.042399999999476 & - 0.508299999997412 \end{matrix}], \end{matrix}

belongs to the same bundle Π as A, with codimension equal to 4. The exact Jordan canonical form of

\hat{A}

is

\hat{J}

.

The relative distance between

\hat{A}

and A, which characterizes the backward error in computing

\hat{J}

, is

dist (\hat{A}, A) = ∥ \hat{A} {- A ∥}_{F} / {∥ A ∥}_{F} = 4.16 \cdot 10^{- 12} .

Note that in some cases the matrix

\hat{A}

may lie in a different bundle from A, if the Segre characteristics of A are not identified correctly.

It should be noted that, in contrast to the “theoretical case”, the numerical Jordan structure remains unchanged within a certain set of parameter values defined by the inequality

{∥ E ∥}_{F} \leq ε .

(34)

The parameter set determined by (34), is represented as a ball of radius

ε

, centered at the singular point corresponding to the theoretical case. Thus, unlike the exact Jordan form, which is defined for a single combination of parameters at a singular point, the numerical Jordan form remains the same for all parameter values contained in the ball of radius

ε

.

Efficient numerical algorithm for computing the numerical Jordan form of a matrix is presented in [121,122,126,127], see also [128].

6. Matrices Depending on Parameters

6.1. Matrix Deformations

In physical problems, the entries of a matrix A often depend on certain parameters. Suppose these parameters belong to a parameter space

P^{k}

with

\dim (P^{k}) = k

, where k is the number of independent parameters. Let

A_{0} \in C^{m \times n}

be a fixed matrix. A family of matrices

A (δ), δ \in Δ \subseteq P^{k},

where

Δ

is a neighborhood of the origin in

P^{k}

, is called a deformation, of

A_{0}

if the mapping

A : Δ \to C^{m \times n}

is such that each entry of

A (δ)

is a convergent power series in the

δ_{i}

, and

A (0) = A_{0}

. A deformation is also called matrix family and the subset

Δ \subseteq P^{k}

is referred to as the base of the family.

A function that is locally given by a convergent power series is called analytic, and the complex case, it is called holomorphic. A holomorphic function is infinitely differentiable at every point in its domain and is therefore continuous. A mapping

A : Δ \to C^{m \times n}

whose entries are holomorphic functions is called holomorpic mapping. Since the entries depend smoothly on the parameters, such mappings are convenient for computational implementation.

When a matrix depends on parameters, we say that it is given a family of matrices. In practice, we are usually interested in the family locally, i.e., for small changes of the parameters near fixed values. In such cases, we speak of deformations of the matrix corresponding to these small parameter changes.

In Figure 34 we symbolically represent a k–parameter generic matrix family that intersects the variety of singular cases transversally (i.e., the intersection occurs at “nonzero angle”). Matrix families that are transverse to all varieties are called generic families.

A family with k parameters can be viewed as a k-dimensional manifold in the matrix space. For instance, a one–parameter family is represented by a curve in a 3-dimensional matrix space with codimension equal to 2, while the variety of singular cases has codimension equal 1.

The variety of singular cases depends on the specific problem. In the context of solving linear systems of equations or inverting matrices, this variety is the set of all singular matrices. For the eigenvalue problem, the variety of singular cases consists of all matrices that are defective or/and derogatory. In this setting, the one–dimensional family typically contains matrices with simple eigenvalues except at the singular points, where the curve representing the one–parameter family intersects the manifold of singular cases.

By the principle of transversality, in the general case, the variety of singular cases has dimension

n \cdot m - k

and a codimension equal to k. Hence, the codimension of the variety of singular cases is equal to the number of parameters that determine the matrix family.

We now illustrate this concept with an example related to the eigenvalue problem.

Example 20.

Consider a two-parameter deformation of the

2 \times 2

Jordan block

A_{0} = [\begin{matrix} - 1 & 1 \\ 0 & - 1 \end{matrix}]

given by

A (x, y) = A_{0} + Δ (x, y), Δ (x, y) = [\begin{matrix} x + 10 x^{2} + y + 6 y^{2} & 0 \\ 2 x^{2} + 3 y^{2} & - 5 x + 7 y \end{matrix}] .

In this example, the orbit of

A_{0}

has dimension 2, and the tangent space of the perturbation

Δ (x, y)

also has dimension 2 for each pair

x, y

. Consequently,

\dim (O (A_{0})) + \dim (O (Δ (x, y)) = n^{2},

i.e., the perturbation

Δ (x, y)

is in general position with respect to the orbit of

A_{0}

for all

x, y

.

Figure 35 shows the matrix family

A (x, y)

for

x \in [- 0.2, 0.2], y \in [- 0.2, 0.2]

. The entries

a_{11} = - 1, a_{21} = 0, a_{22} = - 1

correspond to a singular point at which A has double eigenvalue and canonical form which is a

2 \times 2

Jordan block

A_{0}

. By introducing infinitesimal changes in x and y, the double eigenvalue splits into two simple eigenvalues, and the Jordan form becomes diagonal.

The singular points corresponding to matrices with multiple eigenvalues in the form

A = [\begin{matrix} μ & ν \\ 0 & μ \end{matrix}], ν \neq 0,

lie on the variety of the singular cases

S

, which in this example represents a plane in the three–dimensional parameter space.

6.2. Versal Deformations

We shall adopt the following terminology. By a map

B (η) \to A (δ)

of one family into another family, we mean a correspondence in which to each value of the parameter

η

of the family

B (η)

there corresponds a definite value

δ = φ (η)

of the parameter of the family

A (δ)

.

A versal family of matrices is one into which we can map every other family of matrices by means of a suitable mapping.

A universal family is a versal family

A (δ)

with the additional property that in mapping any family

B (η)

into it, the “change of parameters”

δ = φ (η)

is uniquely determined by

B (η)

.

A miniversal family is a versal family depending on the minimum possible number of parameters. Clearly, such families are of particular interest from a computational point of view.

The term “versal” is formed from the word “universal” by dropping the prefix “uni”, which signifies the uniqueness of the map

φ

. As Arnold remarks [129], “versal” is the intersection of the concepts of “universal” and “transversal”.

To formalize these notations rigorously, we introduce the following definitions.

Two deformations

A (δ)

and

A^{'} (δ)

of the

m \times n

matrix

A_{0}

are called equivalent if there exist deformations

P (δ)

and

Q (δ)

of the identity matrices

I_{m}

and

I_{n}

, respectively, both defined on the same base

Δ

, such that

A^{'} (δ) = P (δ) A (δ) Q^{- 1} (δ) .

In other words, the deformation

A^{'} (δ)

is obtained from

A (δ)

via an equivalent matrix transformation.

Let

A (δ)

and

A^{'} (η)

be deformations of

A_{0}

, where

δ \in Δ \subseteq P^{k}

and

η \in H \subseteq P^{ℓ}

, with

P^{k}

and

P^{ℓ}

being parameter spaces of dimensions k and ℓ, respectively.

If there exist deformations

P (η)

of the identity matrix

I_{m}

and

Q (η)

of the identity matrix

I_{n}

, defined for

η \in \tilde{H} \subseteq H

, together with a holomorphic mapping

φ : \tilde{H} \to Δ

satisfying

φ (0) = 0

, such that

A^{'} (η) = P (η) A (φ (η)) Q^{- 1} (η), η \in \tilde{H},

then we say that

A^{'} (η)

is induced from the deformation

A (δ)

via

P (η), Q (η)

and

φ (η)

(see Figure 36).

A deformation

A (δ)

of a matrix

A_{0}

is said to be versal, if every other deformation

A^{'} (η)

of

A_{0}

is equivalent to a deformation induced from

A (δ)

via a suitable parameter change. That is, there exist deformations

P (η)

of

I_{m}

and

Q (η)

of

I_{n}

, and a holomorphic mapping

φ (η)

with

P (0) = I_{m}, Q (0) = I_{n}, φ (0) = 0,

such that

A^{'} (η) = P (η) A (φ (η)) Q^{- 1} (η), P (0) = I_{m}, Q (0) = I_{n}, φ (0) = 0 .

(35)

A versal deformation of

A_{0}

is called universal, if the inducing map

φ

is uniquely determined by

A^{'}

.

A versal deformation of

A_{0}

is called miniversal if its parameter space has minimal dimension among all versal deformations of

A_{0}

.

We now proceed to a characterization of versal deformations.

The following theorem gives a condition under which a matrix deformation is versal.

Theorem 18.

[100] A deformation

A (δ)

of

A_{0}

is a versal deformation if and only if

A (δ)

is transversal to the orbit of

A_{0}

at

δ = 0

.

We now show that a versal deformation

A (δ)

is indeed transversal. Let

A^{'} (η)

be an arbitrary deformation of

A_{0}

. By the versality of

A (δ)

, there exist deformations

P (η), Q (η)

, an a mapping

φ (η)

such that

A^{'} (η) = P (η) A (φ (η)) Q^{- 1} (η), η \in \tilde{H} .

Differentiating and taking into account that

\frac{\partial Q^{- 1}}{\partial η} = - Q^{- 1} \frac{\partial Q}{\partial η} Q^{- 1}, Q (0) = Q^{- 1} (0) = I_{n}

we obtain

A_{*}^{'} = A_{*} φ_{*} + P_{*} A_{0} - A_{0} Q_{*},

where the subscript * denotes differentiation with respect to

η

at

η = 0

.

Consequently, for every tangent vector

v \in P^{ℓ}

at the base of

A^{'}

, we have that

A_{*}^{'} v = A_{*} φ_{*} v + (P_{*} A_{0} - A_{0} Q_{*}) v .

By Lemma 1, the vector

(P_{*} A_{0} - A_{0} Q_{*}) v

belongs to the tangent space of the orbit

O (A_{0})

. Therefore, any vector in the tangent space

T_{A_{0}} A (δ)

at

A_{0}

can be expressed as the sum of a vector in the image of

A_{*}

and a vector tangent to the orbit of

A_{0}

.

Hence,

A (δ)

is transversal to the orbit of

A_{0}

, as illustrated in Figure 37.

A proof that a deformation which is transversal to the orbit is versal, can be found in ([113], Sect. 2.9).

According to Theorem 18, we have

T_{A_{0}} A (δ) + T_{A_{0}} O (A_{0}) = C^{m \times n} .

(36)

This implies that the dimension of the parameter space of a versal deformation of

A_{0}

is equal to

\dim (Δ) = m \cdot n - \dim (O (A_{0})),

i.e., it equals the codimension of the orbit

O (A_{0})

.

Equation (36) represents a particular case of a general situation, which can be described as follows. Let

N \subset M

be a smooth submanifold of a manifold M. Consider a mapping

A : Δ \to M

of another manifold

Δ

into M, and let

δ \in Δ

be such that

A (δ) \in N

.

The mapping

A : Δ \to M

is said to be transversal to

N \subset M

at

δ

if the tangent space to M at

A (δ)

is the sum of the image of the differential of A and the tangent space to N, i.e.,

T_{A (δ)} M = D A (T_{δ} Δ) + T_{A (δ)} N .

(37)

Equation (37) gives a condition for a transversal intersection of the manifolds

Δ

and

N

(Figure 38).

Example 21.

(a) Let

$A_{0} = [\begin{matrix} λ_{1} \\ λ_{2} \\ ⋱ \\ λ_{n} \end{matrix}], λ_{i} \neq λ_{j} for i \neq j .$

Then the codimension of the orbit $O (A_{0})$ is equal to n, and an n–parameter versal deformation of $A_{0}$ is

$A (δ) = [\begin{matrix} λ_{1} + δ_{1} \\ λ_{2} + δ_{2} \\ ⋱ \\ λ_{n} + δ_{n} \end{matrix}],$

where $δ_{i}, i = 1, 2, \dots, n$ are arbitrary parameters. This deformation is both universal and miniversal.
(b) Let $A_{0} = 0$ . Then $codim (O (A_{0})) = n^{2}$ , and an $n^{2}$ –parameter versal deformation of $A_{0}$ is

$A (δ) = [\begin{matrix} δ_{11} & δ_{12} & \dots & δ_{1 n} \\ δ_{21} & δ_{22} & \dots & δ_{2 n} \\ ⋮ & ⋮ & ⋱ & ⋮ \\ δ_{n 1} & δ_{n 2} & \dots & δ_{n n} \end{matrix}],$

that is, the family of all $n \times n$ matrices. This deformation is also miniversal.

Equation (35) may be interpreted as a local approximation, in a neighbourhood of the origin, of an arbitrary matrix family

A (δ)

by a versal family

A^{'} (η)

. A versal deformation of a matrix

A_{0}

thus plays the role of a “normal form” into which not only the single matrix

A_{0}

, but also any family of matrices sufficiently close to

A_{0}

, can be transformed.

Naturally, this normal form must itself depend on parameters. Its pricipal advantage is that both the entries of the normal form and the similarity transformation leading to it can be chosen to depend smoothly on the entries of the original matrix, as they vary in a neighbourhood of

A_{0}

.

Moreover, provided that the second order terms neglected in the linear approximation (26) are sufficiently small, versal deformations preserve the bundle containing

A_{0}

, that is, the matrices of the family

A (δ)

remain in the same bundle as

A_{0}

.

Further details on the versal deformations of matrices can be found in [117,119,130]. Such deformations are used to construct normal forms of square matrices that depend smoothly on parameters [100].

6.3. Bifurcation Diagrams

Consider the structure and properties of the set in the parameter space, corresponding to the variety of singular cases in the matrix space associated with the eigenvalue problem. In this case, the parameter space can be partitioned into subsets, correspondingly to the partition of the matrix space in bundles. The exceptional values of the parameters to which correspond matrices with multiple eigenvalues (the singular cases), constitute a subset in the parameter space. This subset is called bifurcation diagram. The bifurcation diagram of a generic family of matrices represents a finite union of varieties – to each bundle of orbits corresponds its own variety in the parameter space

P

. The codimension k of a variety in the parameter space of a generic family is equal to the codimension of the corresponding bundle in the space of all matrices, i.e., to the number of parameters determining the matrix family. Therefore, the bifurcation diagram of a family of matrices is a partition of the parameter space

P

according to Jordan types of matrices. The partition consists of grouping together the matrices with the same dimensions of the Jordan blocks differing only in the eigenvalues. Thus, the bifurcation diagrams allow studying the partition of space of matrices into matrices with Jordan forms of distinct types. This makes them a useful tool to analyze the qualitative metamorphosis (or “catastrophes”) of a matrix family.

The partition of the matrix space into matrices with the same dimensions of the Jordan blocks groups the matrices into bundles of equal codimensions and represents a finite stratification of the space of matrices. In the space of families of matrices of order n, the families transversal to the stratification into Jordan types constitute an everywhere dense set. Clearly, such families consist of matrices with simple eigenvalues.

The term “bifurcation”was introduced by Poincaré in 1885 in a paper that marks the beginning of bifurcation theory [131].

An accessible introduction to the bifurcation theory of dynamical systems is given in [132], while a more comprehensive and in-depth treatment can be found in [22,133,134]. A bifurcation analysis of eigenvalues and generalized eigenvalues is presented in [135], Ch. 2.

Below, we consider bifurcation diagrams of two–, and three–parameter matrix families, associated with strata containing Jordan blocks of different sizes.

Example 22.

Consider a family of third order matrices in companion form

A = [\begin{matrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ - a_{3} & - a_{2} & - a_{1} \end{matrix}]

with characteristic equation

z^{3} + a_{1} z^{2} + a_{2} z + a_{3} = 0 .

Using the substitution

z = t - a_{1} / 3,

we obtain the depressed cubic equation

t^{3} + p t + q = 0,

(38)

p = a_{2} - a_{1}^{2} / 3, q = 2 a_{1}^{3} / 27 - a_{1} a_{2} / 3 + a_{3} .

Representation (38) shows that the parametric space associated with the matrix bundle under consideration is two–dimensional, coinciding with the codimension of the bundle.

In the generic case, the discriminant

Δ = - (4 p^{3} + 27 q^{2})

is nonzero, and the characteristic equation has three disjoint roots

α, β

, γ. In this case, the Jordan form of A is diagonal,

J = [\begin{matrix} α & 0 & 0 \\ 0 & β & 0 \\ 0 & 0 & γ \end{matrix}] .

This stratum will be denoted by

α β γ

.

If the discriminant

Δ = 0

and

p \neq 0

, then the cubic equation has a double root

α = α_{1} = α_{2} = - \frac{3 q}{2 p} - \frac{a_{1}}{3} = \frac{9 a_{3} - a_{1} a_{2}}{2 (a_{1}^{2} - 3 a_{2})}

and a simple root

β = - \frac{3 q}{p} - \frac{a_{1}}{3} = \frac{4 a_{1} a_{2} - 9 a_{3} - a_{1}^{3}}{a_{1}^{2} - 3 a_{2}} .

The Jordan form is then

J = [\begin{matrix} α & 1 & 0 \\ 0 & α & 0 \\ 0 & 0 & β \end{matrix}]

with a

2 \times 2

Jordan block corresponding to the double eigenvalue α and a

1 \times 1

block corresponding to the simple eigenvalue β. This stratum is denoted by

α^{2} β

.

If

Δ = 0

and

p = 0

, then

p = q = 0

and

α = α_{1} = α_{2} = α_{3} = - \frac{a_{1}}{3}

is a triple eigenvalue of A. Since the companion matrices are non–derogatory, the triple eigenvalue participates in one Jordan block:

J = [\begin{matrix} α & 1 & 0 \\ 0 & α & 1 \\ 0 & 0 & α \end{matrix}] .

This stratum is denoted by

α^{3}

.

In Figure 39 we show the discriminant Δ of the characteristic equation of a

3 \times 3

companion matrix for

a_{1} = 1

and various values of the coefficients

a_{2}

and

a_{3}

. The bifurcation diagram of the matrix bundle is obtained as the intersection of the discriminant surface with the plane

Δ = 0

. The bifurcation diagram forms a semi–cubic parabola–a curve with a singular point in the shape of a cusp, which corresponds to the triple eigenvalue α (Figure 40).

Example 23.

Consider a family of

2 \times 2

matrices

A = [\begin{matrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{matrix}] .

In the regular (generic) case, such matrices have the diagonal Jordan form

J = [\begin{matrix} α & 0 \\ 0 & β \end{matrix}],

where α and β are simple eigenvalues. For simplicity, a stratum of this type will be denoted by

α β

.

We are interested in the degenerate case when the Jordan form

J = [\begin{matrix} α & 0 \\ 0 & α \end{matrix}]

consists of two

1 \times 1

blocks and the double eigenvalue α is semisimple. This stratum will be denoted by

α α

. The matrices

P J P^{- 1} = α I_{2}

are non–defective but derogatory. In the given case, the eigenvalue

α = - a_{1} / 2

is determined as the double root of the quadratic equation

z^{2} + a_{1} z + a_{2} = 0

for values of the parameters

a_{1}

and

a_{2}

zeroing the discriminant,

Δ = a_{1}^{2} - 4 a_{2} = 0 .

(39)

Equation (39) is parameterized as

\sqrt{x_{1}^{2} + x_{2}^{2}} = \pm r,

(40)

where

x_{1} = r cos (Θ), x_{2} = r sin (Θ), 0 \leq Θ \leq 2 π, r = 2 \sqrt{a_{2}} .

The expression (40) describes a variety of singular cases in

C^{3}

in the form of a cone with vertex at

r = 0

(Figure 41). The point

r = 0

, corresponding to the double semisimple eigenvalue α, is a singular point of the variety, while the conical surfaces for

r \neq 0

corresponds to bundles of type

α^{2}

with codimension 1 (single Jordan block of order 2). The points

α β

outside the bifurcation diagram represent matrices with two distinct simple eigenvalues α and β.

Example 24.

Consider a three–parameter matrix family consisting of

4 \times 4

companion matrices of the form

A = [\begin{matrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ - e & - d & - c & 0 \end{matrix}] .

The singular cases of this family consist of matrices that are defective and non-derogatory. Note that a fourth order companion matrix with a quadruple eigenvalue has only one Jordan block in its Jordan canonical form.

The characteristic equation of the matrix A has the form of a depressed quartic polynomial equation

z^{4} + c z^{2} + d z + e = 0 .

This equation has repeated roots if and only if

Δ (c, d, e) = 0,

(41)

where

Δ (c, d, e) = 16 c^{4} e - 4 c^{3} d^{2} - 128 c^{2} e^{2} + 144 c d^{2} e - 27 d^{4} + 256 e^{3}

is the discriminant of the quartic polynomial

z^{4} + c z^{2} + d z + e

.

Equation (41) can be written as a biquadratic equation

a_{1} d^{4} + a_{2} d^{2} + a_{3} = 0

with respect to the parameter d, where

\begin{matrix} a_{1} & = & - 27, \\ a_{2} & = & 144 c e - 4 c^{3}, \\ a_{3} & = & 16 c^{4} e - 128 c^{2} e^{2} + 256 e^{3} . \end{matrix}

This equation has four roots given by

d = \pm \sqrt{(- a_{2} \pm \sqrt{a_{2}^{2} - 4 a_{1} a_{3}}) / (2 a_{1})} .

Thus, the bifurcation diagram corresponding to repeated roots depends on the two free parameters c and e.

The surface representing the points

(c, d, e)

shown in Figure 42, is called a swallowtail.

According to the data presented in Table 3, the point

α^{4}

(the swallowtail point) represents the

4 \times 4

Jordan blocks

J_{4} (α)

of defective matrices, corresponding to the most degenerate companion matrices A).

The curve

α^{3} β

, consisting of two cuspidal edges emanating from the swallowtail point, corresponds to the Jordan forms

J_{3} (α) \oplus J_{1} (β)

. The curve

α^{2} β^{2}

, given by the intersection of the swallowtail wings, represents the Jordan forms

J_{2} (α) \oplus J_{2} (β)

.

The surface

α^{2} β γ

, known as the swallowtail surface, represents the Jordan forms

J_{2} (α) \oplus J_{1} (β) \oplus J_{1} (γ)

. Finally, the points

α β γ δ

, corresponding to the region outside the swallowtail, represent the diagonal Jordan forms

J_{1} (α) \oplus J_{1} (β) \oplus J_{1} (γ) \oplus J_{1} (δ),

that is, matrices with four simple eigenvalues.

The latest case is the most generic one and therefore occurs with the highest probability.

7. Conclusions

The brief survey of the applications of differential topology in matrix analysis shows that the introduction of the concepts of smooth manifolds and smooth mappings enables the analysis of various global properties of parameter-dependent matrices, including the genericity of the problems considered, the identification of singular cases, the estimation of the distance to ill–posed problems, the conditioning of problems in both deterministic and probabilistic terms, and the determination of numerical solutions as a problem for finding an optimal bundle containing an approximate solution.

Methods of differential topology are particularly well suited for matrices whose entries are only approximately known and for computations subject to rounding errors. All of this provides reason to expect that differential topology methods will find wider applications in matrix analysis.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets generated during the current study are available from the author upon reasonable request.

Conflicts of Interest

The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

$F$ ,	number field;
$N$ ,	set of natural numbers $1, 2, \dots, n, \dots$ ;
$Z$ ,	set of integers $0, \pm 1, \pm 2, \dots, \pm n, \dots$ ;
$Q$ ,	set of rational numbers $p / q$ , where
	p and q are integers;
$R$ ,	set of real numbers;
$R^{+}$ ,	set of positive real numbers;
$C$ ,	set of complex numbers;
$j_{o} = \sqrt{- 1}$ ,	imaginary unit;
$A \cup B$ ,	union of sets A and B;
$A \cap B$ ,	intersection of sets A and B;
$A ∖ B$ ,	difference of sets A and B;
$A \times B$ ,	Cartesian product of sets A and B;
$C^{\infty}$ ,	class of smooth functions;
$C^{ω}$ ,	class of analytic functions;
$〈 x, y 〉$ ,	scalar product of the vectors x and y;
$C^{n \times m}$ ,	space of $n \times m$ complex matrices;
$R^{n \times m}$ ,	space of $n \times m$ real matrices;
$\| A \|$ ,	matrix of absolute values of the entries of A;
$A^{T}$ ,	transposed of A;
$A^{H}$ ,	Hermitian conjugate of A;
$A^{- 1}$ ,	inverse of A;
$0_{m \times n}$ ,	$m \times n$ zero matrix;
$I_{n}$ ,	$n \times n$ identity matrix;
$det (A)$ ,	determinant of A;
$rank (A)$ ,	rank of A;
$tr (A)$ ,	trace of a square matrix A;
$σ_{i} (A)$ ,	ith singular value of A;
$δ A$ ,	perturbation of A;
${∥ x ∥}_{2}$ ,	Euclidean norm of the vector x;
${∥ A ∥}_{2}$ ,	spectral norm of A;
${∥ A ∥}_{F}$ ,	Frobenius norm of A;
$M$ ,	manifold;
$\dim (M)$ ,	dimension of the manifold $M$ ;
$codim (M)$ ,	codimension of the manifold $M$ ;
$vol (M)$ ,	volume of the manifold $M$ ;
$f : M \to P$ ,	mapping of the manifold $M$ into the manifold $P$ ;
$i_{M} : M \to M$ ,	identity map of $M$ into $M$ ,
$D_{x} f$ ,	differential of the map f at the point x;
$g \circ f$ ,	composition of the maps g and f;
$O (A)$ ,	orbit of the matrix A;
$B (A)$ ,	bundle of the matrix A;
$T_{x} M$ ,	tangent space of $M$ at the point x;
$T M$ ,	tangent bundle of $M$ ;
$N_{x} M$ ,	normal space of $M$ at the point x;
$N M$ ,	normal bundle of $M$ ;
$S^{n}$ ,	n–dimensional sphere in $R^{n + 1}$ ;
$T^{n}$ ,	n–dimensional torus in $R^{n + 1}$ ;
$B_{r} (p)$ ,	open ball centered at p with a radius r
$B_{r} [p]$ ,	closed ball centered at p with a radius r;
$X = R (X)$ ,	subspace spanned by the columns of X;
$P {y > y}$ ,	probability of the event ${y > y}$ ;
$E {ξ}$ ,	expected value (mean) of the random variable $ξ$ ;
$v ⊥ S$ ,	vector v orthogonal to a subspace S;
$: =$ ,	equal by definition.

References

H. Weyl. Philosophy of Mathematics and Natural Science. Princeton University Press, Princeton, NJ, 2009. ISBN: 9780691141206.
M. Morse. The Calculus of Variations in the Large. AMS, New York, 1934.
The MathWorks, Inc., Natick, MA. MATLAB Version 9.9.0.1538559 (R2020b), 2020. http://www.mathworks.com.
V. Guillemin and A. Polack. Differential Topology. Prentice–Hall, Inc., Englewood Cliffs, NJ, 1974. ISBN: 0-13-212605-2.
J.M. Lee. Introduction to Topological Manifolds, volume 202 of Graduate Texts in Mathematics. Springer, New York, second edition, 2011. [CrossRef]
J.M. Lee. Introduction to Smooth Manifolds, volume 218 of Graduate Texts in Mathematics. Springer, New York, second edition, 2013. [CrossRef]
V.I. Arnold. Geometrical Methods in the Theory of Ordinary Differential Equations. Springer-Verlag, New York, second edition, 1988. ISBN: 978-1-4612-6994-6.
V.I. Arnol’d. Ordinary Differential Equations. Springer-Verlag, Berlin, 1992. ISBN: 3-540-54813-0.
L.W. Tu. An Introduction to Manifolds. Springer–Verlag, New York, 2011. [CrossRef]
K. Burns and M. Guidea. Differential Geometry and Topology. With a View to Dynamical Systems. Chapman & Hall/CRC, Boca Raton, FL, 2005. ISBN: 978-1584882534.
J.W. Milnor. Topology from the Differential Viewpoint. The University Press of Virginia, Charlotteville, VA, 1965.
A.H. Wallace. Differential Topology. First Steps. W.A. Benjamin, Inc., New York, 1968.
M.W. Hirsh. Differential Topology. Springer–Verlag, New York, 1976. [CrossRef]
V.I. Arnold. Mathematical Methods of Classical Mechanics. Springer-Verlag, New York, second edition, 1989. ISBN: 0-387-96890-3.
Y. I. Manin. The notion of dimension in geometry and algebra. Bull. Amer. Math. Soc. (N.S.), 43:139–161, 2006. [CrossRef]
E. Scholz. The concept of manifold, 1850–1950. In I.M. James, editor, History of Topology, pages 25–64. Elsevier, Amsterdam, 1999. ISBN: 0-444-82375-1.
B. Chow and C. Chow. Lectures on Differential Geometry. American Mathematical Society, Providence, RI, 2024. ISBN: 978-1-4704-7804-9.
Yu. Borisovich and N. Bliznyakov and Ya. Izrailevich and T. Fomenko. Introduction to Topology. Mir Publishers, Moskow, 1985.
V.I. Arnold, V.V. Goryunov, O.V. Lyashko, and V.A. Vasil’ev. Dynamical Systems VIII. Singularity Theory II, Applications, volume 39 of Encyclopaedia of Mathematical Sciences. Springer–Verlag, Berlin, 1993. [CrossRef]
V.I. Arnold, V.V. Goryunov, O.V. Lyashko, and V.A. Vasil’ev. Singularity Theory I, volume 6 of Encyclopaedia of Mathematical Sciences. Springer–Verlag, Berlin, 1998. [CrossRef]
M. Golubitsky and V. Guillemin. Stable Mappings and Their Singularities. Springer–Verlag, New York, 1973. ISBN: 0-387-90073-3-X.
J. Guckenheimer. Bifurcations of dynamical systems. In C. Marchioro, editor, Dynamical Systems, volume 78, pages 7–123. Springer, Berlin, 2010. [CrossRef]
R. Thom. Structural stability, catastrophe theory, and applied mathematics. SIAM Rev., 19:189–201, 1977. [CrossRef]
E.C. Zeeman. Catastrophe theory. In W. Güttinger and H. Eikemeier, editors, Structural Stability in Physics, volume 4 of Springer Series in Synergetics, pages 12–22. Springer, Berlin, 1979. [CrossRef]
V.I. Arnold. Catastrophe Theory. Springer-Verlag, Berlin, 1992. ISBN: 978-3-540-54811-9. [CrossRef]
of Encyclopaedia of Mathematical Sciences. [CrossRef]
M. Golubitsky. An introduction to catastrophe theory and its applications. SIAM Rev., 20:352–387, 1978. [CrossRef]
J. Guckenheimer. The catastrophe controversy. Math. Intelligencer, 1:15–20, 1978. [CrossRef]
F. Chaitin-Chatelin, V. Frayssé, and T. Braconnier. Computations in the neighbourhood of algebraic singularities. Numer. Funct. Anal. Optim., 16:287–302, 1995. [CrossRef]
C.T.C. Wall. Differential Topology. Cambridge University Press, Cambridge, UK, 2016. ISBN: 978-1-107-15352-3.
R. Gilmore. Lie groups: General theory. In J.P. Françoise, G.L. Naber, and T. S. Tsun, editors, Encyclopedia of Mathematical Physics, pages 286––304. Elsevier, Amsterdam, 2006. [CrossRef]
J.B. Carrell. Groups, Matrices, and Vector Spaces. Springer, New York, 2017. ISBN: 978-0-387-79427-3.
P.J. Olver. Applications of Lie Groups to Differential Equations. Springer–Verlag, New York, 1993. ISBN: 978-0-387-95000-6.
K. Kendig. Elementary Algebraic Geometry. Springer–Verlag, New York, 1977. ISBN: 978-1-4615-6901-5.
L. Kronecker. Über Schaaren von quadratischen und bilinearen formen. Monatsberichte der Königlich Preussischen Akademie der Wissenschaften zu Berlin vom Jahre 1874, pages S. 59–76, 149–156, 206–232, 1874. = Werke, 1 (1895), Teubner, Leipzig, pp. 349–413. [CrossRef]
J. Hadamard. Lectures on Cauchy’s Problem in Linear Partial Differential Equations. Yale University Press, New Haven, 1923.
J.R. Rice. A theory of condition. SIAM J. Numer. Anal., 3:287–310, 1966. [CrossRef]
A.J. Geurts. A contribution to the theory of condition. Numer. Math., 39:85–96, 1982. [CrossRef]
I. Gohberg and I. Koltracht. Mixed, componentwise, and structured condition numbers. SIAM J. Matrix Anal. Appl., 14:688–704, 1993. [CrossRef]
W. Wang and Y. Wei. Mixed and component–wise condition numbers for matrix decompositions. Theoret. Comput. Sci., 681:199–216, 2017. [CrossRef]
W.W. Hager. Condition estimates. SIAM J. Sci. Statist. Comput., 5:311–316, 1984. [CrossRef]
N.J. Higham. Accuracy and Stability of Numerical Algorithms. SIAM, Philadelphia, PA, second edition, 2002. ISBN: 978-0-898715-21-7.
C. Kenney and A.J. Laub. Condition estimates for matrix functions. SIAM J. Matrix Anal. Appl., 10:191–209, 1989. [CrossRef]
R. Mathias. Condition estimation for matrix functions via the Schur decomposition. SIAM J. Matrix Anal. Appl., 16:565–578, 1995. [CrossRef]
G.W. Stewart. Matrix Algorithms, volume I: Basic Decompositions. SIAM, Philadelphia, PA, 1998. ISBN: 0-89871-414-1.
G.H. Golub and C.F. Van Loan. Matrix Computations. The John Hopkins University Press, Baltimore, MD, fourth edition, 2013. ISBN: 978-1-4214-0794-4.
Å. Björck. Numerical Methods in Matrix Computations. Springer, New York, 2015. [CrossRef]
E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D.C. Sorensen. LAPACK Users’ Guide. SIAM, Philadelphia, PA, third edition, 1999. ISBN: 0-89871-447-8.
V.B. Lidskii. Perturbation theory of non–conjugate operators. USSR Computational Mathematics and Mathematical Physics, 6:73–85, 1966. [CrossRef]
J. Moro, J.V. Burke, and M.L. Overton. On the Lidskii–Vishik-Lyusternik perturbation theory for eigenvalues of matrices with arbitrary Jordan structure. SIAM J. Matrix Anal. Appl., 18:793–817, 1997. [CrossRef]
J.H. Wilkinson. The Algebraic Eigenvalue Problem. Clarendon Press, Oxford, UK, 1965. ISBN: 978-0-19-853418-1.
G.W. Stewart. Matrix Algorithms, volume II: Eigensystems. SIAM, Philadelphia, PA, 2001. ISBN: 0-89871-503-2.
J.W. Demmel. Applied Numerical Linear Algebra. SIAM, Philadelphia, PA, 1997. ISBN: 978-0-898713-89-3.
R. Bhatia. Perturbation Bounds for Matrix Eigenvalues. Classics in Applied Mathematics. SIAM, Philadelphia, PA, 2007. [CrossRef]
G.W. Stewart and J.-G. Sun. Matrix Perturbation Theory. Academic Press, San Diego, CA, 1990. ISBN: 978-0126702309.
R. Li. Matrix perturbation theory. In L. Hogben, editor, Handbook of Linear Algebra, Discrete Math. Appl., pages (21–1)–(21–20). CRC Press, Boca Raton, FL, second edition, 2014. ISBN: 978-1-4665-0729-6.
A. Greenbaum, R.-C. Li, and M.L. Overton. First-order perturbation theory for eigenvalues and eigenvectors. SIAM Rev., 62:463–482, 2020. [CrossRef]
L. Elsner. On the variation of the spectra of matrices. Linear Algebra Appl., 47:127–138, 1982. [CrossRef]
L. Elsner. An optimal bound for the spectral variation of two matrices. Linear Algebra Appl., 71:77–80, 1985. [CrossRef]
L. Elsner. Perturbation theorems for the matrix eigenvalue problem. Port. Math., 43:69–76, 1986.
P. Henrici. Bounds for iterates, inverses, spectral variation and fields of values for non–normal matrices. Numer. Math., 4:24–40, 1962. [CrossRef]
Y. Nakatsukasa. Algorithms and Perturbation Theory for Matrix Eigenvalue Problems and the Singular Value Decomposition. PhD thesis, Office of Graduate Studies, University of California, Davis, CA, 2011.
J.–G. Sun. Stability and accuracy: Perturbation analysis of algebraic eigenproblems. Technical Report 98.07, Department of Computing Science, Umeå University, Umeå, Sweden, 1998. (Revised version, 2002).
J.H. Wilkinson. Sensitivity of eigenvalues. Util. Math., 25:5–76, 1984.
J.H. Wilkinson. Sensitivity of eigenvalues, part II. Util. Math., 30:243–286, 1986.
Z. Bai, J. Demmel, and A. Mckenney. On computing condition numbers for the nonsymmetric eigenproblem. ACM Trans. Math. Software, 19:202–223, 1993. [CrossRef]
S.P. Chen, R. Feldman, and B.N. Parlett. Algorithm 517. A program for computing the condition numbers of matrix eigenvalues without computing eigenvectros. ACM Trans. Math. Software, 3:186–203, 1977. [CrossRef]
M. Karow, D. Kressner, and F. Tisseur. Structured eigenvalue condition numbers. SIAM J. Matrix Anal. Appl., 28:1052–1068, 2006. [CrossRef]
R.M. Lin, Z. Wang, and M.K. Lim. A practical algorithm for the efficient computation of eigenvector sensitivities. Computer Methods in Applied Mechanics and Engineering, 130:355–367, 1996. [CrossRef]
J.–g. Sun. On condition numbers of a nondefective multiple eigenvalue. Numer. Math., 61:265–275, 1992. [CrossRef]
C. Van Loan. On estimating the condition of eigenvalues and eigenvectors. Linear Algebra Appl., 88–89:715–732, 1987. [CrossRef]
S. Smale. The fundamental theorem of algebra and complexity theory. AMS Bull. (New Series), 4:1–35, 1981. [CrossRef]
S. Smale. On the efficiency of algorithms of analysis. AMS Bull. (New Series), 13:87–121, 1985. [CrossRef]
S. Smale. Some remarks on the foundations of numerical analysis. SIAM Rev., 32:211–220, 1990. [CrossRef]
J. Renegar. On the efficiency of Newton’s method in approximating all zeros of a system of complex polynomials. Math. Oper. Res., 12:121–148, 1987. [CrossRef]
J. Renegar. Is it possible to know a problem instance is ill–posed? J. Complexity, 10:1–56, 1994. [CrossRef]
J.W. Demmel. On condition numbers and the distance to the nearest ill–posed problem. Numer. Math., 51:251–289, 1987. [CrossRef]
J.W. Demmel. The probability that a numerical analysis problem is difficult. Math. Comp., 50:449–480, 1988. [CrossRef]
J.W. Demmel. Steve Smale and the geometry of ill–conditioning. In M.W. Hirsch, J.E. Marsden, and M.Shub, editors, From Topology to Computation: Proceedings of the Smalefest, pages 305–316. Springer–Verlag, New York, 1993. [CrossRef]
D.A. Klain and G.-C. Rota. Introduction to Geometric Probability. Lezioni Lincee. Cambridge University Press, Cambridge, UK, 1997. ISBN: 978-0-521-59362-5.
L.A. Santaló. Integral Geometry and Geometric Probability. Cambridge University Press, New York, second edition, 2004. ISBN: 978-0-521-52344-8.
H. Solomon. Geometric Probability. SIAM, Philadelphia, PA, 1978. ISBN: 978-0898710250.
P. Bürgisser and F. Cucker. Condition. The Geometry of Numerical Algorithms. Springer, Heidelberg, 2013. ISBN: 978-3-642-38895-8.
R. O. Akinola, M. A. Freitag, and A. Spence. The calculation of the distance to a nearby defective matrix. Numer. Linear Algebra Appl., 21:403––404, 2013. [CrossRef]
G. Armentia, J.-M. Gracia, and F.-E. Velasco. Nearest matrix with a prescribed eigenvalue of bounded multiplicities. Linear Algebra Appl., 592:188–209, 2020. [CrossRef]
A. Kalinina, A. Uteshev, M. Goncharova, and E. Lezhnina. On the distance to the nearest defective matrix. In F. Boulier, M. England, I. Kotsireas, T.M. Sadykov, and E.V. Vorozhtsov, editors, Computer Algebra in Scientific Computing. CASC 2023., volume 14139 of Lecture Notes in Computer Science, pages 255–271, Cham, Switzerland, 2023. Springer Nature. [CrossRef]
R.A. Lippert. Fixing multiple eigenvalues by a minimal perturbation. Linear Algebra Appl., 432:1785––1817, 2010. [CrossRef]
A.N. Malyshev. A formula for the 2–norm distance from a matrix to the set of matrices with multiple eigenvalues. Numer. Math., 83:443–454, 1999. [CrossRef]
C. Eckart and G. Young. The approximation of one matrix by another of lower rank. Psychometrika, 3:211–218, 1936. [CrossRef]
W. Kahan. Conserving confluence curbs ill–condition. Technical Report AD–766 916, University of California, Berkely, CA, August 1972.
A. Ruhe. Properties of a matrix with a very ill–conditioned eigenproblem. Numer. Math., 15:57–60, 1970. [CrossRef]
J.H. Wilkinson. Note on matrices with a very ill–conditioned eigenproblem. Numer. Math., 19:176–178, 1972. [CrossRef]
D.J. Smith and M.K. Vamanamurthy. How small is a unit ball. Mathematics Magazine, 62:101–107, 1989. [CrossRef]
H. Weyl. On the volume of tubes. Amer. J. Math., 61:461–472, 1939. [CrossRef]
R. Wongkew. Volumes of tubular neighbourhoods of real algebraic varieties. Pacific Journal of Mathematics, 159:177–184, 1993. [CrossRef]
M. Lotz. On the volume of tubular neighborhoods of real algebraic varieties. Poc. Amer. Math. Soc., 143:1875–1889, 2014. [CrossRef]
A. Edelman. On the distribution of the scaled condition number. Math. Comp., 58: 185–190. 1992. [CrossRef]
P. Bürgisser, F. Cucker, and M. Lotz. The probability that a slightly perturbed numerical analysis problem is difficult. Math. Comp., 77:1559–-1583, 2008. [CrossRef]
G. Frobenius. Theorie der linearen Formen mit ganzen Coefficienten. Journal für die reine und angewandte Mathematik, 86:146–208, 1879. [CrossRef]
V.I. Arnold. On matrices depending on parameters. Russian Math. Surveys, 26:29–43, 1971. [CrossRef]
J.W. Demmel and A. Edelman. The dimension of matrices (matrix pencils) with given Jordan (Kronecker) canonical form. Linear Algebra Appl., 230:61–87, 1995. [CrossRef]
P.C. Hansen. Rank–Deficient and Discrete Ill–Posed Problems: Numerical Aspects of Linear Inversion. SIAM, Philadelphia, PA, 1998. ISBN: 978-0-898714-03-6.
G.H. Golub, V. Klema, and G.W. Stewart. Rank degeneracy and least squares problems. Technical Report TR–456, Department of Computer Science, University of Maryland, College Park, MD, 1976.
G.W. Stewart. Rank degeneracy. SIAM J. Sci. Stat. Comput., 5:403–413, 1984. [CrossRef]
G.W. Stewart. On the early history of the singular value decomposition. SIAM Rev., 35:551–566, 1993. [CrossRef]
J. Demmel and B. Kågström. The generalized Schur decomposition of an arbitrary pencil A-λB: Robust software with error bounds and applications. Part I: Theory and algorithms. ACM Trans. Math. Software, 19:160–174, 1993. [CrossRef]
J. Demmel and B. Kågström. The generalized Schur decomposition of an arbitrary pencil A-λB: Robust software with error bounds and applications. Part II: Software and applications. ACM Trans. Math. Software, 19:175–201, 1993. [CrossRef]
B. Kågström and P. Wiberg. Extracting partial canonical structure for large scale eigenvalue problems. Numer. Algorithms, 24:195–237, 2000. [CrossRef]
M.W. Hirsh and S. Smale. Differential Equations, Dynamical Systems, and Linear Algebra. Academic Press, San Diego, CA, 1974. ISBN: 0-12-349550-4.
J. Palis and W. de Melo. Geometric Theory of Dynamical Systems: An Introduction. Springer–Verlag, New York, 1982. ISBN: 978-1-4612-5705-9.
S. Wiggins. Introduction to Applied Nonlinear Dynamical Systems and Chaos. Texts in Applied Mathematics. Springer, New York, second edition, 2003. ISBN: 0-387-00177-8-4.
D.K. Arrowsmith and C.M. Place. An Introduction to Dynamical Systems. Cambridge University Press, Cambridge, UK, 1990. ISBN: 978-0-521-31650-7.
S.-N. Chow, C. Li, and D. Wang. Normal Forms and Bifurcation of Planar Vector Fields. Cambridge University Press, London, UK, 1994. ISBN: 978-0-521-37226-8.
A. Edelman, E. Elmroth, and B. Kågström. A geometric approach to perturbation theory of matrices and matrix pencils. part i: Versal deformations. SIAM J. Matrix Anal. Appl., 18:653––692, 1997. [CrossRef]
A. Edelman, E. Elmroth, and B. Kågström. A geometric approach to perturbation theory of matrices and matrix pencils. part ii: A stratification-enhanced staircase algorithm. SIAM J. Matrix Anal. Appl., 20:667––699, 1999. [CrossRef]
R. Gilmore. Catastrophe Theory for Scientists and Engineers. Dover Publications, New York, 1993. ISBN: 0-486-67539-4.
L. Stolovitch. On the computation of a versal family of matrices. Numer. Algorithms, 4:25–46, 1993. [CrossRef]
A.A. Mailybaev. Computation of multiple eigenvalues and generalized eigenvectors for matrices dependent on parameters. Numer. Linear Algebra Appl., 13:419––436, 2006. [CrossRef]
A. Dmytryshyn. Versal deformations: A tool of linear algebra. Cornell University Library, ArXiv e–prints in Representation Theory [math.RT] 2312.14910v1, 1–15, 2023. [CrossRef]
M.W. Hirsh, S. Smale, and R.L. Devaney. Differential Equations, Dynamical Systems, and an Introduction to Chaos. Elsevier, Amsterdam, 2013. ISBN: 978-0-12-382010-5.
B. Kågström and A. Ruhe. An algorithm for numerical computation of the Jordan normal form of a complex matrix. ACM Trans. Math. Software, 6:398–419, 1980. [CrossRef]
B. Kågström and A. Ruhe. Algorithm 560: JNF, an algorithm for numerical computation of the Jordan normal form of a complex matrix. ACM Trans. Math. Software, 6:437–443, 1980. [CrossRef]
Z. Zeng. Computing multiple roots of inexact polynomials. Math. Comp., 74:869–903. [CrossRef]
Z. Zeng. Geometric modeling and regularization of algebraic problems. Cornell University Library, ArXiv e-prints in Numerical Analysis [math.NA] 2102.08830v1, 2021. [CrossRef]
Z. Zeng and T.-Y. Li. A numerical method for computing the Jordan canonical form. Cornell University Library, ArXiv e–prints in Numerical Analysis [math.NA] 2103.0206v1, 2021. [CrossRef]
V.N. Kublanovskaya. On a method of solving the complete eigenvalue problem for a degenerate matrix. USSR Computational Mathematics and Mathematical Physics, 6:1–14, 1966. [CrossRef]
A. Ruhe. An algorithm for numerical determination of the structure of a general matrix. BIT Numerical Mathematics, 10:196–216, 1970. [CrossRef]
P. Petkov. The Numerical Jordan Form. World Scientific, Singapore, 2024. ISBN: 978-981-12-8644-5.
V.I. Arnol’d. Lectures on bifurcations in versal families. Russian Math. Surveys, 27:54–123, 1972. [CrossRef]
A.A. Mailybaev. Transformation to versal deformations of matrices. Linear Algebra Appl., 337:87–108, 2001. [CrossRef]
H. Poincaré. Sur l’équilibre d’une masse fluide animé d’un mouvement de rotation. Acta Math., 7:259–380, 1885. [CrossRef]
P. Glendinning. Bifurcation theory. In N.J. Higham, editor, The Prinston Companion to Applied Mathematics, pages 393–402. Prinston University Press, Prinston, NJ, 2015. ISBN: 978-0-691-15039-0.
J.D. Crawford. Introduction to bifurcation theory. Rev. Mod. Phys., 63:991–1037, 1991. [CrossRef]
Y.A. Kuznetsov. Elements of Applied Bifurcation Theory. Springer, New York, fourth edition, 2004. [CrossRef]
A. P. Seyranian and A. A. Mailybaev. Multiparameter Stability Theory with Mechanical Applications. World Scientific, Singapore, 2003. ISBN: 981-238-406-5.

Figure 1. Manifolds in the 3–dimensional space.

Figure 2. Open chart.

Figure 3. An atlas of a manifold.

Figure 4. Stereographic projection of the sphere.

Figure 5. An atlas of the sphere.

Figure 6. Smooth map between two manifolds.

Figure 7. The projection of the torus

T^{2}

onto the circle

S^{1}

is a smooth map.

Figure 7. The projection of the torus

T^{2}

onto the circle

S^{1}

is a smooth map.

Figure 8. The ellipse

\frac{x^{2}}{a^{2}} + \frac{y^{2}}{b^{2}} < 1

and the open disk

x^{2} + y^{2} < 1

are diffeomorphic.

Figure 8. The ellipse

\frac{x^{2}}{a^{2}} + \frac{y^{2}}{b^{2}} < 1

and the open disk

x^{2} + y^{2} < 1

are diffeomorphic.

Figure 9. Tangent space of a manifold.

Figure 10. Differential of a smooth map.

Figure 11. Tangent space of a difeomorphic map of a sphere.

Figure 12. The tangent bundle of a circle.

Figure 13. The normal bundle of a circle.

Figure 14. A tubular neighborhood in

R^{2}

with normal lines.

Figure 14. A tubular neighborhood in

R^{2}

with normal lines.

Figure 15. A tubular neighborhood in

R^{3}

.

Figure 15. A tubular neighborhood in

R^{3}

.

Figure 16. A tubular neighborhood.

Figure 17. Critical points and values of the function

f (x, y, z) = z

.

Figure 17. Critical points and values of the function

f (x, y, z) = z

.

Figure 18. The case of a singular point.

Figure 19. Sard’s Theorem.

Figure 20. Transverse intersection of two manifolds.

Figure 21. Transverse planes in

R^{3}

.

Figure 21. Transverse planes in

R^{3}

.

Figure 22. Intersections which are not transverse.

Figure 23. Variety of

2 \times 2

singular matrices in the parameter space.

Figure 23. Variety of

2 \times 2

singular matrices in the parameter space.

Figure 24. Variety of

2 \times 2

matrices with a double eigenvalue.

Figure 24. Variety of

2 \times 2

matrices with a double eigenvalue.

Figure 25. Geometric interpretation of the condition number.

Figure 26. Perturbations in the solution of a linear system in the parameter space.

Figure 27. Sensitivity of the eigenvalues of a Jordan block for values of

ε

between

10^{- 8}

and

10^{- 6}

.

Figure 27. Sensitivity of the eigenvalues of a Jordan block for values of

ε

between

10^{- 8}

and

10^{- 6}

.

Figure 28. Third–order matrices with different condition numbers in the parameter space.

Figure 29. Intersection of the unit sphere with the variety of singular problems.

Figure 30. Probability distribution of the condition number with respect to inversion.

Figure 31. Probability distribution of the eigenvalue condition number.

Figure 32. Dimension and codimension of the orbit

O (A)

.

Figure 32. Dimension and codimension of the orbit

O (A)

.

Figure 33. Determining the numerical Jordan form.

Figure 34. A generic k-parameter family and the variety of singular cases.

Figure 35. Two–parameter matrix deformation and the manifold of singular cases.

Figure 36. Versal deformation.

Figure 37. The versal deformation

A (δ)

is transverse to

O (A_{0})

.

Figure 37. The versal deformation

A (δ)

is transverse to

O (A_{0})

.

Figure 38. The mapping

A : Δ \to M

is transverse to the manifold

N

.

Figure 38. The mapping

A : Δ \to M

is transverse to the manifold

N

.

Figure 39. Bifurcation diagram for 3rd order companion matrices.

Figure 40. Bifurcation diagram for

α^{3}

in the parameter plane.

Figure 40. Bifurcation diagram for

α^{3}

in the parameter plane.

Figure 41. Bifurcation diagram for matrices with double eigenvalue.

Figure 42. Swallowtail diagram for matrices of type

α^{4}

.

Figure 42. Swallowtail diagram for matrices of type

α^{4}

.

Table 1. Jordan forms corresponding to an n-tuple eigenvalue.

Most generic Jordan form
Single nth order Jordan block
$J = [\begin{matrix} λ & 1 \\ λ & 1 \\ λ & ⋱ \\ ⋱ & 1 \\ λ \end{matrix}]$ ,	$\begin{matrix} S = {n} \\ \dim (O) = n^{2} - n \\ codim (O) = n \\ \dim (B) = n^{2} - n + 1 \\ codim (B) = n - 1 \end{matrix}$
Intermediate cases
$1 < q < n$ Jordan blocks
with the same eigenvalue
$J = [\begin{matrix} J_{s_{1}} (λ) \\ J_{s_{2}} (λ) \\ J_{s_{3}} (λ) \\ ⋱ \\ J_{s_{q}} (λ) \end{matrix}]$ ,	$\begin{matrix} S = {s_{1}, s_{2}, \dots, s_{q}} \\ s_{1} \geq s_{2} \geq \dots \geq s_{q} \geq 1 \\ n - 1 \leq codim (B) \leq n^{2} - 1 \\ codim (B) = s_{1} + 3 s_{2} + \dots + (2 q - 1) s_{q} \end{matrix}$
Most degenerate Jordan form
n scalar blocks with the same eigenvalue
$J = [\begin{matrix} λ \\ λ \\ λ \\ ⋱ \\ λ \end{matrix}]$ ,	$\begin{matrix} S = {1, 1, \dots, 1} \\ \dim (O) = 0 \\ codim (O) = n^{2} \\ \dim (B) = 1 \\ codim (B) = n^{2} - 1 \end{matrix}$

Table 2. Most generic and most degenerate orbits and bundles of

n \times n

matrices.

Table 2. Most generic and most degenerate orbits and bundles of

n \times n

matrices.

Case	Number	Number	Segre characteristics	Dimensions
	of Jordan	of distinct		of orbits
	blocks	eigenvalues		and bundles
Most	n	n	$S = [{1}, {1}, \dots, {1}]$	$\dim (O) = n^{2} - n$
generic				$codim (O) = n$
				$\dim (B) = n^{2}$
				$codim (B) = 0$
Most	n	1	$S = {1, 1, \dots, 1}$	$\dim (O) = 0$
degene-				$codim (O) = n^{2}$
rate				$\dim (B) = 1$
				$codim (B) = n^{2} - 1$

Table 3. Bundles of Jordan forms represented by the swallowtail diagram.

Notation	Segre	$codim (B)$	$\dim (B)$	Representation
	characteristics			in $C^{3}$
$α^{4}$	{4}	3	0	point
$α^{3} β$	{3}, {1}	2	1	curve
$α^{2} β^{2}$	{2}, {2}	2	1	curve
$α^{2} β γ$	{2}, {1}, {1}	1	2	surface
$α β γ δ$	{1}, {1}, {1}, {1}	0	3	complement of
				the swallowtail

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2026 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permit the free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.

Differential Topology and Matrix Analysis: An Overview

Abstract

Keywords:

Subject:

1. Introduction

2. A Glimpse into Differential Topology

2.1. Smooth Manifolds

2.2. Smooth Maps

2.3. Tangent Space

2.4. Differential of a Map

2.5. Tangent and Normal Bundle

2.6. Tubular Neighborhoods

2.7. Singular and Regular Points

2.8. Sard’s Theorem and Morse Functions

2.9. Transverse Intersection of Manifolds

2.10. Lie Groups

2.10.1. Basic Definitions

2.10.2. Matrix Groups

3. Geometry of Matrix Space

3.1. The Matrix Space

3.2. Generic and Well–Posed Problems

3.3. Conditioning of Computational Problems

3.4. Condition Numbers

3.5. Conditioning of Basic Matrix Problems

3.5.1. Conditioning of a Linear System of Equations

3.5.2. Conditioning of the Eigenvalue Problem

3.6. Distance to an Ill–Posed Problem and Conditioning

3.6.1. Distance to the Set of Singular Matrices

3.6.2. Distance to the Set of Defect Matrices

3.7. Probabilistic Distribution of Condition Numbers

3.7.1. Probability Conditioning of Matrix Inversion

3.7.2. Probability Conditioning of Eigenvalues

4. Geometry of Matrix Rank

4.1. Orbits of Matrices with Constant Rank

4.2. Dimension of an Orbit with Fixed Rank

4.3. Stratification of Orbits with Fixed Rank

4.4. Numerical Rank of a Matrix

5. Geometry of Jordan Form

5.1. Orbits of Matrices with Fixed Jordan Form

5.2. Generic and Nongeneric Jordan Bundles

5.3. The Reduction into Jordan Form as an Ill–Posed Problem

5.4. Numerical Jordan Form

6. Matrices Depending on Parameters

6.1. Matrix Deformations

6.2. Versal Deformations

6.3. Bifurcation Diagrams

7. Conclusions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

Abbreviations

References

MDPI Initiatives

Important Links

Subscribe