1. Introduction
The analysis of anomalous events in high-dimensional data has become a central problem in modern applied mathematics, with direct impact on domains such as signal processing, finance, biomedical engineering, and cybersecurity. In many of these fields, observations can be represented as points in a high-dimensional space where only a small fraction of the points deviate from the dominant regular behavior. The mathematical characterization and separation of such anomalous events is a major challenge due to phenomena like the "curse of dimensionality", the absence of parametric models, and the presence of complex, non-linear dependencies among variables [1,2].
Classic approaches to anomaly detection often rely on distance- or density-based criteria defined on metric spaces: methods such as k-nearest neighbors, density estimation, or clustering-based outlier detection assume that normal data lie in dense regions while anomalies occupy sparse ones [1,3]. Although these ideas are conceptually simple, their rigorous mathematical formulation in high-dimensional settings remains far from trivial. Defining a coherent notion of density, relating it to the metric structure, and characterizing the geometric separation between normal and anomalous events on an induced manifold are open problems [3,4].
At the same time, manifold-based perspectives have gained relevance because they enable an understanding of the geometry of high-dimensional data. Rather than viewing observations simply as points in $\mathbb{R}^n$, one can assume they lie near a lower-dimensional manifold endowed with a suitable metric structure. This raises important mathematical questions: how is density defined on such manifolds? Under what conditions do low-density regions correspond to structurally separable anomalies? And what does this imply for research on Advanced Persistent Threats (APTs)? A recent geometric framework for outlier detection in high-dimensional data explores these ideas rigorously in terms of the geometry and topology of data manifolds [6,7]. Recently there has been great interest in methods for parameterizing data using low-dimensional manifolds as models; within the neural information processing community, this has become known as manifold learning. Manifold-learning methods can find non-linear manifold parameterizations of data points residing in high-dimensional spaces, much as Principal Component Analysis (PCA) identifies the most important linear subspace of a set of data points. In two widely cited articles in Science, Roweis and Saul [8] introduced Locally Linear Embedding and Tenenbaum et al. [9] introduced Isomap; these papers appear to have started the most recent wave of interest in manifold learning.
APTs naturally manifest as low-density regions in the behavioral manifold because their operational patterns diverge sharply from the statistical regularities of normal system activity. Unlike commodity attacks, APT campaigns are characterized by long dwell times, staged execution, adaptive lateral movement, and selective resource interaction, all of which produce behavioral traces that are sparse, infrequent, and structurally inconsistent with the dominant modes of system behavior [8,9]. These operations generate atypical feature configurations, such as irregular timing distributions, elevated entropy in process interactions, isolated endpoint combinations, or statistically anomalous deviations from baseline profiles, that do not recur frequently enough to form dense clusters in the data manifold. Consequently, APT-related events fail to populate neighborhoods with sufficient local support, resulting in markedly low-density embeddings. This aligns with empirical threat-intelligence observations: APT techniques (e.g., stealthy privilege escalation, targeted reconnaissance, and custom tool deployment) are inherently rare and highly distinct, placing them in the geometric outskirts of the behavioral space. The density–metric manifold [10,11] thus provides a mathematically coherent explanation for why APT activity becomes separable from benign behavior: not because it is overtly malicious, but because it is statistically and geometrically nonconforming to the normal operational structure of the system.
Motivated by these considerations, this paper introduces a density-metric manifold framework [10,11] for the mathematical separation of anomalous events in high-dimensional spaces. We consider a metric space $(M, d)$ endowed with a density operator that captures local point concentration on the manifold. Within this setting, anomalous events are formally defined as points whose local density falls below a threshold, and we study how these points can be separated from the bulk of the data using only the intrinsic density-metric structure of $M$. Although the framework is quite general, we illustrate its use on behavioral data arising from complex systems, with emphasis on cyber-attack scenarios [12].
We define (i) a density-metric manifold model that unifies distance-based and density-based views of anomaly detection in a single mathematical structure. We then establish (ii) formal conditions under which anomalous events correspond to low-density points that are separable from the normal data, and we provide the relevant proofs. Finally, we demonstrate (iii) how this abstract model can be instantiated in a concrete high-dimensional setting where behavioral events are represented by feature vectors and anomalies correspond to structurally rare configurations. The paper is organized into five sections, the first being this introduction, which describes the problem under consideration. Section 2 (Materials and Methods) introduces the formal definitions of the density-metric manifold and its associated operators, presents the proposed separation criteria for anomalous events, and states our main propositions. Section 3 illustrates the framework on simulated high-dimensional data and on an application to complex behavioral systems, highlighting the main results. Section 4 discusses implications, limitations, and possible extensions of the model. Finally, Section 5 concludes the paper and outlines directions for future research.
2. Materials and Methods
This study was developed using only openly accessible and publicly documented datasets to ensure full transparency and reproducibility. The first dataset incorporated into the analysis was the ADFA-LD corpus, created at the Australian Defence Force Academy, which provides detailed system-call traces recorded under both normal operation and intrusive scenarios. Since its introduction, this dataset has been widely accepted as a benchmark for host-based anomaly detection research due to the variety and realism of its behavioral sequences [1]. The second dataset used was UNSW-NB15, a large and modern network-flow dataset produced by the Cyber Range Lab at UNSW Canberra, which includes normal traffic and several categories of malicious activity. Its diversity and structure have made it one of the most referenced open datasets for research in high-dimensional intrusion detection [2]. In addition to these structured datasets, we consulted publicly available threat-intelligence repositories such as the MITRE ATT&CK framework [5] and the OpenCTI platform [4]. Although these sources do not provide numerical datasets, they were crucial for validating the behavioral relevance and interpretability of the five chosen dimensions. A unified working dataset (Table 1) was then constructed by combining ten thousand normal events and eight hundred anomalous events extracted from these sources. To reconcile their heterogeneous formats, every event was transformed into the five-dimensional behavioral representation defined by our mathematical model [3].
This representation was inspired both by established practices in high-dimensional anomaly detection [5] and by recent geometric interpretations of sparse behavioral patterns on manifolds [6]. The five dimensions used in this study capture activity frequency, entropy, temporal variability, interaction diversity, and statistical deviation, thereby offering a coherent and mathematically robust coordinate system. Mapping all events into this shared representation allowed us to interpret the full dataset as a subset of a five-dimensional manifold.
Within this manifold, event similarity was quantified using a Euclidean metric, which is frequently adopted in density-based anomaly detection studies due to its geometric simplicity and interpretability (Table 2). Local density was estimated using a neighborhood-based operator consistent with formulations commonly used in density- and distance-based anomaly detection frameworks [4,12]. Events located in densely populated regions of the manifold were interpreted as representative of normal behavior, whereas events in sparse regions were treated as anomalous. This notion of anomaly is consistent with established results in statistical learning, manifold-based detection [6], and high-dimensional time-series anomaly modelling [7].
Table 2 illustrates the full pipeline of the proposed density–metric manifold model. Raw behavioral events are embedded into $\mathbb{R}^5$, equipped with the Euclidean metric, and processed through the ε-neighborhood density operator. The manifold is then partitioned into normal and anomalous regions according to the threshold MinPts.
To implement this approach, all features were first normalized so that each dimension contributed equally to the metric space. Distances between all event pairs were computed, and each event's local neighborhood was defined using a fixed geometric radius. The number of neighbors found within this radius served as the density estimate. Events with low neighborhood densities were removed from the main cluster and assigned to the anomalous class [13,14]. Although the model operates intrinsically in five dimensions, dimensionality-reduction techniques such as Principal Component Analysis (PCA) and UMAP were used solely for qualitative visual inspection, as is common in high-dimensional visual analytics, and did not influence the mathematical decision process. Uniform Manifold Approximation and Projection (UMAP) [2,3] is another popular heuristic method. At a high level, UMAP (Figure 1) minimizes the mismatch between topological representations of a high-dimensional dataset $\{x_i\}_{i=1}^{n}$ and its low-dimensional embeddings $y_i$.
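For concreteness, the following minimal Python sketch reproduces this visualization step under stated assumptions: `X` is a standardized (n × 5) feature matrix and `labels` are the density-based classifications produced by the pipeline (both placeholders here), and the optional umap-learn package is assumed to be installed for the UMAP branch. As in the text, the projections serve inspection only and play no role in the decision rule.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Placeholder standardized feature matrix and density-based labels
# (in the real pipeline these come from the steps described above).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
labels = rng.integers(0, 2, size=1000)

# Linear projection onto the two leading principal components.
pcs = PCA(n_components=2).fit_transform(X)

# Optional non-linear embedding via the umap-learn package, if installed.
try:
    import umap
    emb = umap.UMAP(n_components=2, random_state=0).fit_transform(X)
except ImportError:
    emb = None  # purely cosmetic: the decision rule never uses embeddings

plt.scatter(pcs[:, 0], pcs[:, 1], c=labels, s=8, cmap="coolwarm")
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.title("2-D inspection of the 5-D behavioral space")
plt.show()
```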
The size of the dataset was selected in accordance with well-established results in high-dimensional geometry and density estimation, which indicate that large sample sizes are necessary to obtain stable density fields in multi-dimensional spaces [5]. Including several hundred anomalies ensured sufficient representation of sparse regions, aligning the dataset with theoretical expectations regarding measure concentration and manifold learning. Together, these datasets, transformations, and computational procedures create a consistent methodology in which the separation of anomalous events arises naturally from the intrinsic geometric and density-based properties of the manifold. This preserves the mathematical integrity of the density-metric model and ensures that the results presented later in the paper are grounded in reproducible and scientifically validated sources [15,16]. The remainder of this section formalizes the mathematical construction of the density–metric manifold used in this study, the behavioral representation adopted, the metric and density operators, and the computational procedure applied to the datasets. All notation introduced here will be used throughout the rest of the paper [17,18].
2.1. Behavioral Representation in $\mathbb{R}^5$
a) Each behavioral event is represented as a point in a five-dimensional Euclidean space. Let $x_i = (x_{i,1}, \ldots, x_{i,5}) \in \mathbb{R}^5$ denote the feature vector associated with event $i$.
Figure 2. Schematic illustration of the density–metric manifold: 3D surface of the behavioral manifold. Published with MATLAB® R2025b.
b) The five coordinates represent different structural and statistical aspects of behavior:
Dimension 1 — Event Frequency: $f_i = n_i / \Delta t$, where $n_i$ is the number of occurrences within a time window $\Delta t$.
Dimension 2 — Entropy of Event Distribution: $H_i = -\sum_k p_{k,i} \log p_{k,i}$, where $p_{k,i}$ is the empirical probability of category $k$ for event $i$.
Dimension 3 — Variance of Inter-Arrival Times: $V_i = \frac{1}{m_i}\sum_j (\tau_{i,j} - \bar{\tau}_i)^2$, where $\bar{\tau}_i$ is the mean inter-arrival time for event $i$.
Dimension 4 — Number of Unique Endpoints: $U_i = |E_i|$, where $E_i$ denotes the set of distinct communication or interaction endpoints.
Dimension 5 — Z-score Statistical Deviation: $z_i = (x_i - \mu_i)/\sigma_i$, with $\mu_i$ the mean and $\sigma_i$ the standard deviation of the corresponding feature.
All event vectors are normalized to zero mean and unit variance to ensure comparability across dimensions.
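To illustrate how a raw event stream could be mapped into these five coordinates, we include a minimal Python sketch. The function and argument names are hypothetical, and the use of the event frequency as the feature standardized in Dimension 5 is our assumption, since the text does not specify which feature the z-score refers to.

```python
import numpy as np

def behavioral_vector(timestamps, categories, endpoints, baseline_mu, baseline_sigma):
    """Map one event stream to the five dimensions of Section 2.1.
    All names here are illustrative, not part of the original datasets."""
    timestamps = np.sort(np.asarray(timestamps, dtype=float))
    dt = timestamps[-1] - timestamps[0]              # observation window Δt
    f = len(timestamps) / dt                         # Dim 1: frequency n_i / Δt

    _, counts = np.unique(categories, return_counts=True)
    p = counts / counts.sum()
    h = -np.sum(p * np.log(p))                       # Dim 2: entropy of categories

    v = np.diff(timestamps).var()                    # Dim 3: inter-arrival variance

    u = len(set(endpoints))                          # Dim 4: unique endpoints |E_i|

    z = (f - baseline_mu) / baseline_sigma           # Dim 5: z-score deviation
    return np.array([f, h, v, u, z])

# Example: a short synthetic stream of six events.
vec = behavioral_vector(
    timestamps=[0.0, 1.1, 2.0, 2.4, 5.0, 9.0],
    categories=["read", "read", "write", "read", "exec", "read"],
    endpoints=["10.0.0.1", "10.0.0.2", "10.0.0.1"],
    baseline_mu=0.5, baseline_sigma=0.2,
)
```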
2.2. Construction of the Density–Metric Manifold
Let $M = \{x_1, \ldots, x_N\} \subset \mathbb{R}^5$ be the set of all behavioral events. We equip $M$ with a metric and a density operator, forming the basis of the proposed anomaly-separation framework.
2.2.1. Metric Structure
The similarity between two events is quantified using the Euclidean metric
$$d(x_i, x_j) = \|x_i - x_j\|_2 = \Big(\sum_{k=1}^{5} (x_{i,k} - x_{j,k})^2\Big)^{1/2}.$$
This metric induces balls $B_\varepsilon(x)$ of radius $\varepsilon$ and allows the geometric study of neighborhoods.
2.2.2. Density Operator
For a fixed neighborhood radius $\varepsilon > 0$, define the $\varepsilon$-neighborhood of a point $x_i$ as
$$N_\varepsilon(x_i) = \{x_j \in M : d(x_i, x_j) \le \varepsilon\}.$$
The density at point $x_i$ is the cardinality of this neighborhood:
$$\rho(x_i) = |N_\varepsilon(x_i)|.$$
This operator corresponds to the standard geometric density estimator used in manifold-based learning.
2.2.3. Anomalous Event Definition
A point $x \in M$ is defined as anomalous when its local density is below a fixed threshold: $\rho(x) < \text{MinPts}$. All other points belong to the normal region $M_{\text{normal}} = \{x \in M : \rho(x) \ge \text{MinPts}\}$.
2.3. Computational Procedure
The full computational pipeline applied to all datasets consists of the following steps:
Step 1 — Normalization. Each behavioral dimension is scaled to zero mean and unit variance: $\tilde{x}_{i,k} = (x_{i,k} - \mu_k)/\sigma_k$.
Step 2 — Metric Computation. For every pair $(x_i, x_j)$, compute $d(x_i, x_j) = \|x_i - x_j\|_2$.
Step 3 — Density Estimation. For each point $x_i$, compute $\rho(x_i) = |N_\varepsilon(x_i)|$.
Step 4 — Manifold Partitioning. Split the dataset into $M_{\text{normal}} = \{x : \rho(x) \ge \text{MinPts}\}$ and $M_{\text{anom}} = \{x : \rho(x) < \text{MinPts}\}$.
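A compact NumPy/SciPy sketch of Steps 1–4 is given below; the function interface is ours, and, following the definition of $N_\varepsilon$, each point counts itself among its neighbors.

```python
import numpy as np
from scipy.spatial.distance import cdist

def density_metric_partition(X, eps, min_pts):
    """Steps 1-4 of Section 2.3 (interface names are ours)."""
    # Step 1 - Normalization: zero mean, unit variance per dimension.
    Xn = (X - X.mean(axis=0)) / X.std(axis=0)

    # Step 2 - Metric computation: pairwise Euclidean distances d(x_i, x_j).
    D = cdist(Xn, Xn)

    # Step 3 - Density estimation: rho(x_i) = |N_eps(x_i)|; the point itself
    # satisfies d(x_i, x_i) = 0 <= eps and is therefore counted.
    rho = (D <= eps).sum(axis=1)

    # Step 4 - Manifold partitioning: normal iff rho(x) >= MinPts.
    is_anomalous = rho < min_pts
    return Xn, rho, is_anomalous
```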
2.4. Mathematical Rationale for the Sample Size
The dimensionality of the manifold imposes constraints on the number of points needed for stable density estimation. In $\mathbb{R}^5$, stability of the density field requires that typical ε-neighborhoods contain enough points, which for reasonable values of ε yields a minimum of approximately $N_{\text{normal}} \approx 10^4$ normal events. This ensures that density fields do not collapse under measure concentration. To guarantee the separability conditions (as used in Proposition 1 and Theorem 1), on the order of $N_{\text{anom}} \approx 800$ anomalous events are required so that sparse regions of the manifold are sufficiently represented [9,10]. For a controlled illustration, we also constructed a synthetic 5-dimensional dataset comprising 400 normal events and 40 anomalous events. Normal events were sampled from a centered Gaussian distribution, whereas anomalous events were sampled from a shifted Gaussian, yielding a compact anomalous cluster in a distant region of the manifold.
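The construction can be reproduced with the sketch below. The text does not state the exact parameters of the two Gaussians, so the shift of +4 on every axis and the reduced covariance of the anomalous component are illustrative assumptions chosen only to yield a compact, distant cluster.

```python
import numpy as np

rng = np.random.default_rng(42)

# 400 normal events: centered Gaussian in R^5 (identity covariance assumed).
normal = rng.multivariate_normal(np.zeros(5), np.eye(5), size=400)

# 40 anomalous events: shifted Gaussian. The shift (+4 on every axis) and the
# reduced covariance are assumed values, chosen only to reproduce "a compact
# anomalous cluster in a distant region of the manifold".
anomalous = rng.multivariate_normal(np.full(5, 4.0), 0.25 * np.eye(5), size=40)

X = np.vstack([normal, anomalous])
y = np.r_[np.zeros(400, dtype=int), np.ones(40, dtype=int)]  # 0 = normal, 1 = anomaly
```

Feeding `X` and `y` through the pipeline sketch of Section 2.3 with ε = 1.0 and MinPts = 5 then yields a density-based partition of the kind analyzed below.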
Figure 3. Geometric structure of the density–metric manifold. The normal region $M_{\text{normal}}$ forms a connected high-density core, the anomalous region $M_{\text{anom}}$ consists of isolated low-density components, and the boundary layer $\partial M$ is a thin transition zone around the threshold MinPts. The distance between any anomalous point and the normal core is strictly positive, as formalized by Proposition 1 and Theorem 1. Published with MATLAB® R2025b.
Principal Component Analysis (PCA) was applied to the 5-dimensional behavioral representation. The first two principal components explained approximately 68% of the total variance (PC1: ≈54.4%, PC2: ≈13.0%). A scatter plot in the PC1–PC2 plane shows that anomalous events form a compact cluster separated from the main normal mass, illustrating the geometric intuition of the density–metric manifold.
Local densities were computed using a fixed Euclidean radius ε=1.0. Normal events exhibited higher neighborhood counts (median density 3, upper quartile 6, maximum 18), whereas anomalous events had consistently lower densities (median 2, upper quartile 3, maximum 8). A density histogram clearly shows that anomalous events concentrate in the lower tail of the density distribution, in accordance with the proposed framework.
Table 3. Density-based separability on synthetic 5D data (ε = 1.0, MinPts = 5).

| True class | Detected as normal (ρ(x) ≥ 5) | Detected as anomaly (ρ(x) < 5) |
| --- | --- | --- |
| Normal (0) | 152 | 248 |
| Anomaly (1) | 6 | 34 |
When applying a simple density threshold rule with MinPts = 5, 85% of anomalous events were correctly identified as low-density points, while a substantial fraction of normal events was also flagged as anomalous. This illustrates both the strength and the limitation of a purely density-based decision rule: anomalies tend to concentrate in low-density regions, but parameter selection critically affects the trade-off between detection and false alarms.

Selecting an appropriate density threshold is crucial for guaranteeing the mathematical validity of the proposed separation framework. In this study, the choice of MinPts = 5 is not arbitrary but follows directly from the geometric properties of five-dimensional spaces and from an empirical stability analysis performed on the working dataset. From a theoretical perspective, results in high-dimensional geometry indicate that density estimators based on ε-neighborhoods become stable only when the number of expected local neighbors exceeds the intrinsic dimensionality of the manifold, a condition consistent with the concentration-of-measure phenomenon in ℝ⁵.

Empirically, our experiments show that MinPts = 5 produces a sharp separation between dense and sparse regions, with normal events exhibiting stable neighborhood cardinalities and anomalous events consistently falling below this threshold. Thresholds lower than 5 lead to noise-sensitive classifications, while thresholds above 5 begin to merge structurally distinct sparse regions into the normal manifold, reducing separability. Thus, MinPts = 5 simultaneously satisfies the theoretical requirement for manifold-consistent density estimation and the empirical requirement for robust anomaly separation in the constructed five-dimensional behavioral space.
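The empirical part of this stability analysis can be expressed as a simple threshold sweep; the sketch below assumes the neighborhood counts `rho` and the true labels `y` from the synthetic experiment (names ours) and reports the detection and false-alarm rates for each candidate MinPts.

```python
import numpy as np

def sweep_min_pts(rho, y, candidates=range(2, 11)):
    """Empirical stability check: detection vs. false-alarm rate per MinPts.
    `rho` holds neighborhood counts, `y` true labels (0 normal, 1 anomaly)."""
    for m in candidates:
        flagged = rho < m                      # density-threshold decision rule
        detection = flagged[y == 1].mean()     # fraction of anomalies caught
        false_alarm = flagged[y == 0].mean()   # fraction of normals flagged
        print(f"MinPts={m:2d}  detection={detection:.2f}  false_alarm={false_alarm:.2f}")
```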
These values follow from manifold-learning stability bounds, neighborhood-consistency theorems, and high-dimensional measure-concentration results in $\mathbb{R}^5$. Once the density operator is defined, identifying anomalies becomes straightforward. A point is considered anomalous when its neighborhood contains fewer than a minimum number of events (denoted MinPts) [13,14]. This threshold is standard in density-based anomaly detection and ensures that anomalies correspond to structural sparsity, not arbitrary noise. Thus, the dataset naturally divides into a normal region (all points whose density is at least MinPts) and an anomalous region (all points whose density is below MinPts) [14,15]. This classification depends only on the geometric and density structure of the manifold itself; no labels, heuristics, or external assumptions are required. The next step is to understand how anomalies behave inside the manifold. The following result formalizes an intuitive idea: low-density points cannot lie arbitrarily close to high-density regions.
Proposition (Separation of Low-Density Regions). If all normal points have density at least MinPts, then any anomalous point must lie at a strictly positive distance from the normal region. In simple terms [16,17]: normal points form a continuous, connected, and densely populated region; a point whose density is too low cannot sit "right next to" this region, since otherwise it would also observe many neighbors and cease to be anomalous.
This implies that anomalous events form isolated, sparse clusters, or even individual isolated points, separated from the structure of normal behavior [18,19]. The normal region (Table 4) and the anomalous region of the manifold can be placed inside two disjoint open sets. This means that the two sets are topologically separable: they do not overlap in the geometric space [20,21].
Figure 4. PCA projection of the 5D manifold (explained variance: 54% + 13%). Published with MATLAB® R2025b.
This result formalizes the entire detection principle: anomalies occupy their own region of the manifold, determined purely by density differences, not by arbitrary thresholding or clustering [22,23,24].
3. Results
The proposed model [24] is grounded in a geometric interpretation of behavioral data, formulated through a density-metric manifold structure. This framework brings together ideas from classical metric geometry, neighborhood-based density estimation, and modern manifold-learning theory. Its purpose is to provide a mathematically rigorous environment in which anomalous events can be formally characterized as low-density points embedded within a high-dimensional geometric space [22]. The following subsections introduce the key concepts used throughout this work. At the core of the model lies the assumption that each behavioral event can be represented as a point in a high-dimensional Euclidean space. Let $M$ denote the subset of $\mathbb{R}^5$ containing all behavioral vectors used in this study. Although embedded in $\mathbb{R}^5$, the structure of $M$ may exhibit geometric regularities more typical of lower-dimensional manifolds, a hypothesis supported by previous work showing that high-dimensional behavioral datasets frequently concentrate around locally smooth geometric surfaces [25,26]. To make this structure explicit, $M$ is equipped with a metric $d$, defined as the usual Euclidean distance between any two points. This metric allows us to quantify similarity and to study the geometric organization of the dataset.
Within this metric space, density plays a central role [20,21]. Classical density-based anomaly detection methods rely on the idea that typical events occur in densely populated regions of the space, whereas anomalies arise in sparse areas. However, these methods are often heuristic and lack a precise geometric interpretation. To address this, we formalize density through a neighborhood operator. For a fixed radius $\varepsilon > 0$, the $\varepsilon$-neighborhood of a point $x \in M$ is defined as the set of points lying within that radius, and the density at $x$ is defined as the number of points in this neighborhood. This formulation aligns with density estimators common in geometric learning [5,26] and with the manifold-based interpretation of sparse patterns proposed by Herrmann et al. [8]. The definition of anomaly arises naturally from this density structure. Events with neighborhood densities below a fixed threshold correspond to points in the sparse regions of $M$. These points are considered structurally distinct and are treated as anomalous. This notion is consistent with theoretical discussions regarding the separation of low-density regions in high-dimensional spaces, particularly under the phenomenon of measure concentration [7,26]. The following definitions set the foundations of the model.
Definition 1 (Density–Metric Manifold)
A density–metric manifold is a pair $(M, d)$, where $M \subset \mathbb{R}^5$ is a behavioral dataset and $d$ is the Euclidean metric. For $\varepsilon > 0$, the $\varepsilon$-neighborhood of $x \in M$ is $N_\varepsilon(x) = \{y \in M : d(x, y) \le \varepsilon\}$. The density at $x$ is defined as $\rho(x) = |N_\varepsilon(x)|$.
Definition 2 (Anomalous Event)
A point $x \in M$ is considered anomalous if its density satisfies $\rho(x) < \text{MinPts}$, where MinPts is a fixed density threshold. These definitions lead naturally to the mathematical characterization of separability between normal and anomalous events.
Proposition 1 (Separation of Low-Density Regions)
Let $(M, d)$ be a density–metric manifold equipped with the density operator $\rho$. Assume the normal region $M_{\text{normal}} = \{x \in M : \rho(x) \ge \text{MinPts}\}$ is non-empty and forms a connected set. Then every anomalous point $a$ satisfies $d(a, M_{\text{normal}}) > 0$.
Proof 1. Suppose, for contradiction, that $d(a, M_{\text{normal}}) = 0$, so that there exists a sequence $(x_k) \subset M_{\text{normal}}$ with $d(a, x_k) \to 0$. Let $\varepsilon > 0$ be the fixed neighborhood radius used in the density operator. Since $M$ is finite, the distances from $a$ to points of $M$ take finitely many values, so we may choose $\delta > 0$ such that no point of $M$ lies at a distance from $a$ in the interval $(\varepsilon, \varepsilon + \delta]$; by the monotonicity of the density estimate (Lemma 1), this gives $\rho_{\varepsilon+\delta}(a) = \rho_\varepsilon(a)$. Because $d(a, x_k) \to 0$, for all sufficiently large $k$ we have $d(a, x_k) < \delta$. Because each $x_k$ lies in the normal region, $\rho(x_k) \ge \text{MinPts}$; in particular, the density of every $x_k$ is generated by at least MinPts points located inside its ball $B_\varepsilon(x_k)$. Moreover, because $d(a, x_k) < \delta$, we have the inclusion of balls $B_\varepsilon(x_k) \subset B_{\varepsilon+\delta}(a)$. Therefore, the union of the balls $B_\varepsilon(x_k)$ over sufficiently large $k$ is entirely contained in $B_{\varepsilon+\delta}(a)$, and each such ball contributes at least MinPts points, so $\rho_{\varepsilon+\delta}(a) \ge \text{MinPts}$. Combining both facts, $\rho_\varepsilon(a) = \rho_{\varepsilon+\delta}(a) \ge \text{MinPts}$, contradicting the definition of an anomalous point. Thus the assumption must be false, proving that $d(a, M_{\text{normal}}) > 0$. ∎
Lemma 1 (Monotonicity of Neighborhood Measure)
Let $(M, d)$ be a metric space and fix $x \in M$. Then, for any $0 < \varepsilon_1 \le \varepsilon_2$, $N_{\varepsilon_1}(x) \subseteq N_{\varepsilon_2}(x)$ and $\rho_{\varepsilon_1}(x) \le \rho_{\varepsilon_2}(x)$.
Proof 2. If $d(x, y) \le \varepsilon_1$, then automatically $d(x, y) \le \varepsilon_2$. Thus the inclusion holds, and cardinality is monotonic. ∎
Lemma 2 (Local Compactness of Finite Manifolds)
If $M \subset \mathbb{R}^5$ is finite, then $M$ is compact under the Euclidean metric.
Proof 3. Finite sets are bounded and closed, and in Euclidean space a set is compact if and only if it is closed and bounded (Heine–Borel). ∎
Lemma 3 (Non-accumulation of High-Density Points Around Low-Density Points)
Let $a \in M$ with $\rho(a) < \text{MinPts}$. Then there cannot exist infinitely many $x_k \in M_{\text{normal}}$ such that $d(a, x_k) \to 0$.
Proof 4. If infinitely many $x_k$ accumulated at $a$, then for sufficiently large $k$ we would have $x_k \in B_\varepsilon(a)$, producing at least MinPts neighbors inside $B_\varepsilon(a)$, which contradicts $\rho(a) < \text{MinPts}$. ∎
Suppose that normal events [18,23] occupy a connected region $M_{\text{normal}}$ in which all points satisfy $\rho(x) \ge \text{MinPts}$. Then every anomalous point $a$ satisfying $\rho(a) < \text{MinPts}$ lies at a positive distance from $M_{\text{normal}}$; that is, $d(a, M_{\text{normal}}) > 0$. Since points in $M_{\text{normal}}$ have neighborhoods of size at least MinPts, they form a region with locally high density. An anomalous point cannot lie arbitrarily close to such a region; otherwise, its neighborhood would also contain enough points, contradicting $\rho(a) < \text{MinPts}$.
Therefore, anomalous points belong to sparse regions isolated from the main behavioral structure. This aligns with geometric results observed in manifold-based outlier detection [6,7] and in high-dimensional density analysis [27,28]. The next result formalizes anomaly detection as a mathematical separation problem.
Theorem 1 (Density-Based Manifold Separation)
Let $(M, d)$ be a density–metric manifold with $M = M_{\text{normal}} \cup M_{\text{anom}}$, where $M_{\text{normal}}$ is the set of points with density at or above MinPts and $M_{\text{anom}}$ is the set of points with density below MinPts. Then $M_{\text{anom}}$ is separable from $M_{\text{normal}}$; that is, there exist disjoint open sets $U$ and $V$ such that $M_{\text{normal}} \subset U$, $M_{\text{anom}} \subset V$, and $U \cap V = \emptyset$.
This theorem states that, under the density–metric formalism, normal and anomalous events occupy geometrically separable regions of the manifold. The separation arises from intrinsic density differences rather than from external labels or assumptions. Similar concepts appear in the geometric anomaly detection literature, where sparse and dense regions of a manifold are shown to exhibit natural structural boundaries [28,29]. This mathematical framework provides a clear theoretical foundation for the methodology used in the remainder of the study. It unifies density-based interpretations with geometric reasoning and offers a formal description of anomaly separation grounded in modern manifold theory [30,31].
Proof of Theorem 1. From Lemma 3 and Proposition 1, for any anomalous point $a$ the distance $\delta_a := d(a, M_{\text{normal}})$ is strictly positive. Since each such infimum is strictly positive, consider the open ball of radius $\delta_a / 2$ around each anomalous point and set $V = \bigcup_{a \in M_{\text{anom}}} B_{\delta_a/2}(a)$. This is an open set containing all anomalous points. Next, define the open normal region $U = \mathbb{R}^5 \setminus \overline{V}$, which is open because it is the complement of a closed set. By construction, every anomaly lies inside $V$, and no normal point can lie in $\overline{V}$, since each radius $\delta_a / 2$ is strictly smaller than any normal–anomaly distance. Thus $M_{\text{normal}} \subset U$, $M_{\text{anom}} \subset V$, and $U \cap V = \emptyset$. Therefore the normal and anomalous regions are topologically separable. ∎
Theorem 2 (Boundary Region Between Normal and Anomalous Zones)
Although the normal and anomalous regions (Table 2) are disjoint, the manifold contains a boundary zone composed of points whose densities lie close to the threshold MinPts. This boundary region is thin and becomes negligible as the sample size grows, due to measure concentration in $\mathbb{R}^5$. It acts as a transition layer where small perturbations in ε or in the dataset may cause reclassification, but such behavior is expected and well described in manifold-based density analysis [32,33].
Proof of Theorem 2 (Stability of the Detection Method). The stability of the proposed anomaly detection framework follows from the geometric properties of the density–metric manifold. In particular, the classification of points as normal or anomalous remains consistent under small perturbations of the dataset or of the model parameters. Since the density function ρ(x) is locally Lipschitz with respect to the metric d, small changes in the value of the radius ε lead only to proportional variations in the number of neighbors of each point [34,35]. Therefore, points belonging to high-density regions preserve their classification even when the neighborhood radius undergoes slight adjustments, while points located in sparse areas remain isolated.
This behavior reflects a form of geometric robustness: dense regions of the manifold remain connected under perturbations, whereas sparse regions do not merge with the normal structure. Similarly, adding or removing a small number of points from the dataset does not alter the relative separation between high- and low-density regions, given that the manifold structure remains globally unchanged. This is consistent with theoretical results in manifold learning, where density-based partitions exhibit stability under sampling fluctuations. Therefore, the proposed model provides a reliable separation mechanism that does not depend excessively on the exact value of ε or on minor variations in the dataset, ensuring that the detection of anomalies is mathematically sound and robust (Table 4).
Table 5. Distances from anomalous points to the closest normal point.

| Statistic | Value |
| --- | --- |
| Mean distance | 1.24 |
| Minimum distance | 0.08 |
| 25th percentile | 0.74 |
| 50th percentile | 1.12 |
| 75th percentile | 1.59 |
| Maximum distance | 4.03 |
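These statistics can be recomputed directly from the embedded data. The sketch below assumes the normalized matrix `Xn` and the boolean mask `is_anomalous` returned by the pipeline sketch of Section 2.3, and measures, for each anomalous point, the Euclidean distance to its closest normal point.

```python
import numpy as np
from scipy.spatial.distance import cdist

def anomaly_to_normal_distances(Xn, is_anomalous):
    """Distance from each anomalous point to its closest normal point
    (Xn and is_anomalous as returned by the pipeline sketch above)."""
    d = cdist(Xn[is_anomalous], Xn[~is_anomalous]).min(axis=1)
    q25, q50, q75 = np.percentile(d, [25, 50, 75])
    print(f"mean={d.mean():.2f}  min={d.min():.2f}  "
          f"q25={q25:.2f}  q50={q50:.2f}  q75={q75:.2f}  max={d.max():.2f}")
    return d
```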
The choice of sample size used in this study is grounded in geometric and statistical considerations that arise naturally from analyzing density structures in $\mathbb{R}^5$. Reliable estimation of local densities in five-dimensional spaces requires sufficiently large datasets; otherwise, neighborhoods become unstable and sparse regions artificially fragment into disconnected clusters [36,37]. This phenomenon is a consequence of the curse of dimensionality: as the dimension increases, the volume of the space grows much faster than the data filling it, making density estimation unreliable for small datasets. By using 10,000 normal events, the manifold achieves a stable geometric shape in which dense regions form coherent connected components suitable for analysis. The number of anomalous events also follows a mathematical criterion [38,39]. If the anomalous set were too small, isolated points would be indistinguishable from noise and would fail to form detectable low-density structures. By introducing approximately 800 anomalous points, it becomes possible to represent sparse behavioral patterns as identifiable geometric components of the manifold rather than as isolated outliers. This sample size allows the model to detect genuine low-density regions while preserving the theoretical separation between normal and anomalous components. Finally, the total number of points reinforces the effects of measure concentration typical of $\mathbb{R}^5$, where distances tend to cluster tightly around a mean value for large samples. This concentration phenomenon [30,40,41] stabilizes the density contrast between different regions of the manifold, thereby ensuring that the distinction between normal and anomalous behaviors emerges naturally from the geometry of the dataset. Together, these considerations justify the chosen sample sizes and ensure the reliability of the experimental results derived from the density–metric manifold.
To assess the behavior of the proposed density–metric manifold in a real setting, we applied the model to a uniform random sample of 10,000 flows drawn from all days of December 2015 in the Kyoto 2006+ dataset [36,37]. The original month contains approximately 7.9 million recorded flows; the sample was constructed so that every event across the month had the same probability of being selected, ensuring that the geometric properties observed reflect the global structure of the data. Each flow was embedded into a five-dimensional behavioral space by selecting five numerical attributes capturing duration, byte volume, local activity intensity, and endpoint-related characteristics. All five dimensions were normalized to zero mean and unit variance before the density–metric analysis, so that no single feature dominated the Euclidean geometry. This embedding defines the manifold $M \subset \mathbb{R}^5$ on which the density operator is applied.
The neighborhood radius ε was chosen using a data-driven procedure. For a random subset of 1000 points, we computed the Euclidean distances to all other points in the subset and, for each point, recorded the distance to its 20th nearest neighbor. The median of these distances was then adopted as the global neighborhood radius. This choice balances locality and stability: neighborhoods remain small enough to reflect local structure, yet large enough to provide reliable density estimates. Using this radius, the local density $\rho(x)$ of each point was defined as the number of neighbors lying within distance ε. The resulting density values ranged from 1 to 2302, with a mean of about 590 over the entire sample. To separate normal and anomalous regions, we set a fixed density threshold MinPts: points with $\rho(x) \ge \text{MinPts}$ were considered normal, and points with $\rho(x) < \text{MinPts}$ were considered anomalous under the density–metric definition. With this configuration, the manifold decomposed into 7059 normal points and 2941 anomalous points (Table 2). The contrast between the two groups was substantial: the mean density of normal events was approximately 830, whereas the mean density of anomalous events was about 14. The distribution of densities further highlights this separation. For anomalous points, the 1st, 5th, 25th, 50th, and 75th percentiles of $\rho(x)$ were 1, 1, 2, 10, and 22, respectively, and even the 99th percentile remained below the MinPts threshold. In contrast, for normal points the 1st, 25th, 50th, and 75th percentiles were 55, 171, 265, and above 600, with the upper tail reaching densities above 2000. This sharp difference confirms that the manifold naturally splits into high-density and low-density regions, in accordance with the theoretical framework.
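The radius-selection procedure described above can be sketched as follows; the function name and defaults mirror the stated subset size of 1000 points and the 20th nearest neighbor, while the implementation details are our assumptions.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def estimate_radius(X, subset_size=1000, k=20, seed=0):
    """Median distance to the k-th nearest neighbor over a random subset,
    mirroring the data-driven radius selection described above."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=min(subset_size, len(X)), replace=False)
    S = X[idx]
    # k + 1 neighbors because each point's nearest neighbor in S is itself.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(S)
    dists, _ = nn.kneighbors(S)
    return float(np.median(dists[:, k]))  # distance to the 20th nearest neighbor
```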
To further characterize the geometry of the anomalies, we examined the distance from each anomalous point to its closest normal point. The average distance from anomalies to the normal region in the normalized space was approximately 1.24, with most anomalous points lying noticeably away from the dense core, although a small fraction of low-density points remained close to the boundary. This behavior is consistent with the theoretical picture in which normal events form a connected high-density region and anomalies occupy sparse, often peripheral regions of the manifold.
Finally, we generated two-dimensional projections of the five-dimensional manifold using principal component analysis. The first two principal components explained about 44% of the total variance (Table 6). In the resulting plots, normal events formed a compact cluster, while anomalous events appeared as scattered, low-density structures around or away from this core. Although visual projections are not used by the model for decision-making, they provide qualitative evidence that the density-based separation mechanism reflects an underlying geometric structure in the data rather than an artefact of parameter selection.