Adaptive Anomaly Detection for Non-Stationary Time-Series: A Continual Learning Framework with Dynamic Distribution Monitoring

Yingxin Ou; Sumeng Huang; Feiyang Wang; Kan Zhou; Yingyi Shu

doi:10.20944/preprints202512.2403.v1

Submitted:

25 December 2025

Posted:

26 December 2025

You are already at the latest version

Abstract

Non-stationary time-series data poses significant challenges for anomaly detection systems due to evolving patterns and distribution shifts that render traditional static models ineffective. This paper presents a novel continual learning framework that integrates dynamic distribution monitoring mechanisms to enable adaptive anomaly detection in non-stationary environments. The proposed framework employs a dual-module architecture consisting of a distribution drift detector and an adaptive learning component. The distribution drift detector utilizes statistical hypothesis testing to identify temporal shifts in data distributions, while the adaptive learning module employs rehearsal-based continual learning strategies with dynamic memory management to maintain model performance across evolving patterns. We introduce a hybrid loss function that balances stability and plasticity, preventing catastrophic forgetting while enabling rapid adaptation to new distributions. Experimental results demonstrate an average F1-score improvement of 11.3% over the best-performing baseline, highlighting the robustness and adaptability of the proposed framework under non-stationary conditions while maintaining computational efficiency suitable for real-time applications.

Keywords:

anomaly detection

;

non-stationary time-series

;

continual learning

;

distribution shift

;

concept drift

;

adaptive learning

;

dynamic monitoring

Subject:

Computer Science and Mathematics - Artificial Intelligence and Machine Learning

I. Introduction

Time-series anomaly detection has emerged as a critical component in numerous domains, including industrial monitoring, cybersecurity, financial fraud detection, and healthcare systems [1], Traditional anomaly detection approaches typically assume that time-series data follow a stationary distribution in which statistical properties remain constant over time. However, real-world systems frequently exhibit non-stationary behavior characterized by temporal distribution shifts, evolving patterns, and concept drift. These dynamic changes can severely degrade the performance of static detection models, leading to increased false alarm rates and reduced sensitivity to genuine anomalies.

Detecting anomalies in non-stationary environments has attracted considerable attention. Deep-learning models-notably autoencoders, recurrent neural networks, and transformers- have shown a strong ability to capture complex temporal dependencies in time-series data [2]. Despite these advances, many methods fail when distribution shifts violate the i.i.d. assumption. If incoming data statistics deviate substantially from the training distribution, model performance often degrades rapidly, requiring frequent retraining or manual intervention.

Continual learning, also known as lifelong learning, offers a promising paradigm for addressing non-stationarity by enabling models to incrementally acquire new knowledge while preserving previously learned information [3]. Recent work has begun exploring the intersection of continual learning and anomaly detection, particularly in the context of concept drift adaptation [4]. However, existing approaches often lack explicit mechanisms for detecting and characterizing distribution shifts, relying instead on passive adaptation strategies that may respond too slowly to abrupt changes or too aggressively to benign fluctuations.

This paper introduces a comprehensive framework for adaptive anomaly detection in non-stationary time series that explicitly incorporates distribution monitoring capabilities. Our approach makes several key contributions. First, we develop a distribution drift detection module based on statistical hypothesis testing that continuously monitors incoming data streams for significant distributional changes. This module distinguishes between virtual drift, which affects input distributions without impacting decision boundaries, and actual drift, which necessitates model adaptation [5]. Second, we propose an adaptive learning architecture that employs rehearsal-based continual learning with dynamic memory management, allowing the system to selectively retain representative samples from historical distributions while efficiently incorporating new patterns. Third, we introduce a hybrid loss function that explicitly balances the competing objectives of stability and plasticity, preventing catastrophic forgetting while enabling rapid adaptation to evolving data distributions.

Our framework addresses several limitations of existing approaches. Unlike methods that treat all distributional changes uniformly, our system adaptively modulates its learning rate and memory update strategy based on the detected severity and type of drift. Furthermore, our approach maintains computational efficiency through careful design choices, making it suitable for deployment in resource-constrained environments requiring real-time processing. We validate our framework through extensive experiments on multiple benchmark datasets representing diverse application domains, demonstrating consistent improvements over state-of-the-art baselines in terms of detection accuracy, adaptation speed, and computational efficiency.

II. Related Work

Time-series Non-stationary anomaly detection is commonly addressed by combining explicit drift monitoring with continual adaptation. Adaptive windowing provides an online hypothesis-testing mechanism to detect distribution changes and trigger updates under time-varying streams [6]. Recent continual learning under concept drift emphasizes adaptive memory realignment so rehearsal buffers remain representative as patterns evolve, reducing catastrophic forgetting while preserving fast adaptation [7], and adaptive-learning anomaly detection frameworks similarly couple drift-aware updating with practical online constraints [8]. For temporal representation learning, classical sequence encoders such as LSTM remain a foundational choice for capturing long-range dependencies [9], while attention-enhanced LSTM variants improve selectivity by focusing on salient segments for more stable detection signals [10]. Alternative backbones such as temporal convolution with attention provide efficient multi-scale temporal modeling [11], and Transformer-style modeling is often paired with change-point detection to explicitly capture regime transitions that coincide with drift events [12]. Attention-based sequence modeling more broadly continues to be used to extract robust temporal dependencies from complex time series [13].

To further improve robustness under evolving patterns, a range of controlled adaptation and structured modeling techniques have been explored. System-level evolution and coordination mechanisms provide general strategies for maintaining performance as components and conditions change [14], while uncertainty- and risk-aware summarization offers a principled way to compress evolving context without discarding risk-relevant information, which is useful when maintaining stable memory/replay or state summaries over time [15]. Explainable representation learning encourages feature spaces that support more transparent decisions and updates [16]. Graph-based modeling introduces relational inductive bias that can stabilize representations when signals are coupled across variables or entities [17]. Parameter-efficient adaptation methods such as modular adapters with structural priors enable constrained updates that better balance stability and plasticity [18], and multi-scale LoRA provides lightweight adaptation across granularities for rapid post-drift adjustment [19]. Information-constrained retrieval complements these ideas by explicitly controlling what external evidence is incorporated into the learning loop, reducing noise amplification during adaptation [20]. Finally, meta-learning targets rapid adaptation under scarcity and evolving patterns, offering transferable principles for fast recovery after drift is detected [21].

III. Methodology

A. Problem Formulation

Consider a continuous time-series data stream

X = {x_{1}, x_{2}, \dots {, x}_{t}, \dots}

where

x_{t} \in R^{d}

represents a d-dimensional observation at time t. Our objective is to detect anomalies in this stream while adapting to distribution shifts. We define the anomaly detection task as learning a scoring function

f_{θ}

:

R^{d} \to R

parameterized by

θ

, where higher scores indicate greater anomaly likelihood.

In non-stationary environments, the joint distribution

P_{t} (X, Y)

evolves over time, where

Y \in {0,1}

denotes normal or anomalous labels. Following the taxonomy of distribution shifts [13], we distinguish between two types of drift: virtual drift affecting

P (X)

and actual drift affecting

P (Y | X)

. Our framework focuses on detecting and adapting to actual drift, which directly impacts detection performance.

B. Distribution Drift Detection Module

The drift detection module continuously monitors the incoming data stream to identify material distribution shifts that can invalidate the assumptions of a static anomaly detector. We employ a two-phase strategy that unifies (i) statistical hypothesis testing for shift discovery and (ii) performance monitoring for shift qualification, so that adaptation is triggered only when the detected drift is both statistically credible and operationally consequential.

In the first phase, we perform hypothesis testing between a reference window and an incoming window to flag distributional change points. This choice is motivated by the practical reality that upstream pipeline dynamics can reshape the observed data distribution; Gao et al. [22] highlight how heterogeneous ETL environments and scheduling changes can alter downstream data characteristics, making explicit drift monitoring a necessary control signal rather than an optional diagnostic. In the second phase, we monitor detector behavior—such as error distribution shifts, alert rate instability, and confidence degradation—to decide whether the drift meaningfully affects decision quality. This step is designed to reduce unnecessary model updates and to strengthen traceability; it aligns with Lai et al. [23] in emphasizing explainable, causally grounded assessment where changes are linked to interpretable evidence rather than treated as opaque triggers. Finally, because the stream may contain heterogeneous, high-dimensional signals, we monitor drift on learned representations rather than raw inputs; this follows the representation-centric modeling rationale used by Xie and Chang [24] for heterogeneous record-based sequences, where transformer-derived features provide a more stable basis for downstream risk identification and monitoring. The resulting two-phase drift decision rule and the adaptation trigger conditions are formalized as follows:

Statistical Distribution Monitoring: We maintain a sliding reference window

W_{r e f}

containing recent samples from the current distribution and a test window

W_{t e s t}

for incoming data. To detect distribution shifts, we apply the Kolmogorov-Smirnov (KS) test, a non-parametric method that compares empirical cumulative distribution functions without assumptions about underlying distributions.

For each dimension

i

of the input space, we compute the KS statistic:

D_{K S}^{(i)} = \sup_{x} |F_{r e f}^{(i)} (x) - F_{t e s t}^{(i)} (x)|

Where

F_{r e f}^{(i)}

and

F_{t e s t}^{(i)}

are the empirical cumulative distribution functions for dimension

i

in the reference and test windows, respectively. The overall drift score is computed as:

D_{d r i f t} = \frac{1}{d} \sum_{i = 1}^{d} D_{K S}^{(i)}

A distribution shift is detected when

D_{d r i f t}

exceeds a predefined threshold

τ_{d r i f t}

typically set using statistical significance levels.

Performance-Based Drift Detection: To complement statistical monitoring, we track the model’s detection performance using an exponentially weighted moving average (EWMA) of prediction confidence scores:

C_{t} = α \cdot c_{t} + (1 - α) \cdot C_{t - 1}

where

c_{t}

represents the prediction confidence at time

t

and

α

is the smoothing parameter. A significant decrease in

C_{t}

indicates potential concept drift requiring model adaptation. We detect performance drift when:

Δ C = |C_{t} - C_{t - w}| > τ_{p e r f}

where

w

is the monitoring window size and

τ_{p e r f}

is the performance drift threshold.

C. Adaptive Continual Learning Module

Upon detecting distribution drift, the adaptive learning module updates the anomaly detection model while preserving knowledge of previous distributions. Our approach is built upon a deep autoencoder architecture enhanced with attention mechanisms for temporal feature extraction.

Base Architecture: The core detection model consists of an encoder-decoder structure with LSTM layers augmented by attention mechanisms. The encoder maps input sequences to a latent representation:

h_{t} = {LSTM}_{e n c} (x_{t}, h_{t - 1})

z_{t} = Attention (h_{1}, \dots, h_{T})

The decoder reconstructs the input from the latent representation:

\hat{x_{t}} = {LSTM}_{d e c} (z_{t}, s_{t - 1})

The anomaly score is computed based on reconstruction error and latent space deviation:

A (x_{t}) = λ_{1} | \hat{x_{t}} - x_{t} |_{2} + λ_{2} | z_{t} - μ_{z} |_{2}

where

μ_{z}

represents the mean latent representation of normal samples, and

λ_{1}

,

λ_{2}

are weighting coefficients.

Continual Learning Strategy: To enable continual adaptation without catastrophic forgetting, we employ a rehearsal-based approach with dynamic memory management. The memory buffer

M

stores representative samples from encountered distributions. During training on a new data batch

B_{n e w}

, we combine it with samples from memory:

B_{t r a i n} = B_{n e w} \cup Sample (M, k)

where

k

is the number of samples retrieved from memory. The model is updated by minimizing a hybrid loss function:

L_{t o t a l} = L_{r e c o n} + β_{1 L_{r e g}} + β_{2 L_{c o n t r a s t}}

The reconstruction loss ensures accurate modeling of normal patterns:

L_{r e c o n} = \frac{1}{|B_{t r a i n}|} \sum_{x \in B_{t r a i n}} | x - \hat{x} |_{2}^{2}

The regularization loss prevents drastic parameter changes:

L_{r e g} = \sum_{j} | θ_{j} - θ_{j}^{o l d} |_{2}^{2}

Where

θ^{o l d}

represents parameters before the current update. The contrastive loss maintains separation between normal and anomalous representations in latent space:

L_{c o n t r a s t} = - l o g \frac{\exp (sim (z_{i}, z_{i}^{+}) / τ)}{\sum_{j} \exp (sim (z_{i}, z_{j}) / τ)}

where

sim (\cdot, \cdot)

denotes cosine similarity,

z_{i}^{+}

is a positive sample, and

τ

is the temperature parameter.

D. Dynamic Memory Management

Effective memory management is critical for balancing adaptation speed with knowledge retention. We propose an adaptive memory update strategy that responds to detected drift severity.

Drift-Aware Memory Update: Upon detecting distribution drift, we classify its severity based on

D_{d r i f t}

magnitude. For mild drift

(D_{d r i f t} < τ_{h i g h})

, we gradually replace outdated samples. For severe drift

{(D}_{d r i f t} \geq τ_{h i g h})

, we perform aggressive memory realignment by removing samples with high reconstruction error under the new distribution.

The memory update probability for sample

x_{m} \in M

is computed as:

P_{r e m o v e} (x_{m}) = σ (γ \cdot \frac{| x_{m} - \hat{x_{m}} |_{2} - μ_{e r r}}{σ_{e r r}})

where

σ

is the sigmoid function,

u_{e r r}

and

σ_{e r r}

are the mean and standard deviation of reconstruction errors in current memory, and

γ

controls the removal aggressiveness.

Diverse Sample Selection: When adding new samples to memory, we employ a diversity-based selection criterion to ensure comprehensive coverage of the current distribution. We use k-means clustering in the latent space and select samples closest to cluster centroids:

S_{n e w} = {\arg \min_{x \in C_{j}} | z_{x} - c_{j} |_{2} ∣ j = 1, \dots, K}

where

C_{j}

represents the j-th,

c_{J}

is its centroid, and

K

is the number of clusters.

E. Adaptive Learning Rate Scheduling

To balance stability and plasticity, we dynamically adjust the learning rate based on drift severity and model confidence. The adaptive learning rate is computed as:

η_{t} = η_{b a s e} \cdot (1 + ω \cdot D_{d r i f t}) \cdot \exp (- ρ \cdot C_{t})

Where

η_{b a s e}

is the base learning rate,

ω

controls drift sensitivity, and

ρ

modulates the influence of model confidence. This formulation increases learning rate when drift is detected while decreasing it when the model demonstrates high confidence, preventing unnecessary updates to stable patterns.

IV. Experiments

A. Datasets

We evaluate our framework on four publicly available benchmark datasets representing diverse application domains and drift characteristics: SWaT (Secure Water Treatment): A dataset from a water treatment testbed containing 11 days of continuous operation with 51 sensors and actuators. The dataset includes both normal operations and various cyber-physical attack scenarios, representing gradual and abrupt distribution shifts [25]. With 946,722 samples and an anomaly ratio of 11.98%, SWaT provides a realistic testbed for industrial monitoring applications. SMAP (Soil Moisture Active Passive): A real-world dataset from NASA containing telemetry data from spacecraft sensors. It includes 25 dimensions with 135,183 samples and a 13.13% anomaly ratio, exhibiting natural temporal evolution in sensor readings [26]. The dataset demonstrates gradual drift patterns typical of space systems. MSL (Mars Science Laboratory): Another spacecraft dataset from NASA with 55 channels monitoring the Mars rover systems over 132,801 samples. With a 10.72% anomaly ratio, this dataset demonstrates long-term non-stationary behavior with seasonal variations, making it ideal for evaluating adaptation to slow, continuous drift. SMD (Server Machine Dataset): A 5-week dataset from a large internet company containing 38 dimensions from 28 server machines, totaling 708,405 samples. With only a 4.16% anomaly ratio, it exhibits concept drift due to varying workloads and operational conditions [27], representing mixed drift patterns common in cloud computing environments. Table I summarizes the key characteristics of these datasets, including their dimensionality, length, anomaly ratio, and drift types.

B. Results and Analysis

Table II presents the comprehensive performance comparison across all datasets. Our proposed framework consistently outperforms baseline methods, achieving the highest average F1-score of 0.847 across datasets, representing a 11.3% improvement over the best baseline (AnomalyTransformer at 0.734).

On the SWaT dataset, our method achieves an F1-score of 0.891, significantly outperforming the second-best method (TranAD at 0.812). The SMAP and MSL datasets show similar trends, with our framework achieving F1-scores of 0.832 and 0.819, respectively. The SMD dataset presents the most challenging scenario due to its high-dimensional feature space and frequent concept drift, yet our framework maintains robust performance with an F1-score of 0.846.

Figure 1 visualizes the F1-score comparison across all methods and datasets, clearly demonstrating the consistent superiority of our approach.

To evaluate adaptation capabilities, we analyze performance across different drift severities. Figure 2 shows F1-score degradation as drift magnitude increases. Our framework maintains stable performance across mild to severe drift conditions, with only a 7.2% F1-score decrease from no drift to severe drift scenarios. In contrast, static methods like LSTM-VAE and USAD experience 31.4% and 28.7% degradation, respectively.

The explicit drift detection mechanism proves crucial. When comparing against EWC-AE, which employs continual learning without drift detection, our method shows a 12.8% better F1-score under moderate drift conditions. Table III details drift detection accuracy, showing our KS-test-based approach achieves 94.3% accuracy in identifying distribution shifts with minimal false positives (3.2% false positive rate).

V. Conclusions

This paper presented a comprehensive framework for adaptive anomaly detection in non-stationary time-series environments. By explicitly integrating distribution monitoring capabilities with continual learning mechanisms, our approach addresses the fundamental challenge of maintaining detection accuracy under evolving data patterns. The framework combines statistical drift detection, performance monitoring, and dynamic memory management to enable responsive adaptation while preventing catastrophic forgetting.

Extensive experimental evaluation on multiple benchmark datasets demonstrates the effectiveness of our framework, achieving an average F1-score of 0.847 across diverse application domains, representing a 11.3% improvement over state-of-the-art baselines. The framework maintains stable performance across varying drift severities, with only 7.2% degradation from no drift to severe drift conditions, while static methods experience up to 31.4% performance loss. Computational efficiency analysis confirms the practical feasibility of our approach for real-time applications, demonstrating competitive inference speed and reasonable memory requirements suitable for deployment in resource-constrained environments.

Several promising directions exist for future research. First, extending the framework to handle multivariate time-series with inter-dimensional relationships could improve detection accuracy in complex systems. Second, incorporating uncertainty quantification would provide confidence estimates for detection decisions. Third, developing fully unsupervised versions would broaden applicability to scenarios where ground truth is unavailable. Finally, a theoretical analysis of convergence properties would strengthen the foundation of adaptive anomaly detection methods. In conclusion, our work demonstrates that explicitly incorporating distribution monitoring into anomaly detection frameworks significantly improves robustness in non-stationary environments. As real-world systems become increasingly dynamic and complex, adaptive detection approaches will play a crucial role in maintaining reliable anomaly identification capabilities over extended operational periods.

References

V. Chandola, A. Banerjee and V. Kumar, “Anomaly Detection: A Survey,” ACM Computing Surveys, vol. 41, no. 3, pp. 1–58, 2009.
T. Chen, X. Liu, B. Xia, W. Wang and Y. Lai, “Unsupervised Anomaly Detection of Industrial Robots Using Sliding-Window Convolutional Variational Autoencoder,” IEEE Access, vol. 8, pp. 47072–47081, 2020.
J. Kirkpatrick, R. Pascanu, N. Rabinowitz et al., “Overcoming Catastrophic Forgetting in Neural Networks,” Proceedings of the National Academy of Sciences, vol. 114, no. 13, pp. 3521–3526, 2017.
Y. Chen and H. Dai, “Concept Drift Adaptation with Continuous Kernel Learning,” Information Sciences, vol. 649, article 119645, 2023.
L. Feng, S. Wang, J. Wang and H. Lu, “Continual Learning with Strategic Selection and Forgetting for Network Intrusion Detection,” IEEE Transactions on Information Forensics and Security, vol. 19, pp. 8066–8080, 2024.
A. Bifet and R. Gavaldà, “Learning from Time-Changing Data with Adaptive Windowing,” Proceedings of the 2007 SIAM International Conference on Data Mining, pp. 443–448, 2007.
A. Ashrafee, S. Paul, S. Haque and M. Hasan, “Holistic Continual Learning under Concept Drift with Adaptive Memory Realignment,” arXiv preprint, arXiv:2507.02310, 2025.
G. A. Ahmadi-Assalemi, G. Epiphaniou, I. Haider and H. M. Al-Khateeb, “Adaptive Learning Anomaly Detection and Classification Model for Cyber and Physical Threats in Industrial Control Systems,” IET Cyber-Physical Systems: Theory & Applications, 2025.
S. Hochreiter and J. Schmidhuber, “Long Short-Term Memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
J. Li, Q. Gan, Z. Liu, C. Chiang, R. Ying and C. Chen, “An Improved Attention-Based LSTM Neural Network for Intelligent Anomaly Detection in Financial Statements,” 2025.
N. Lyu, F. Chen, C. Zhang, C. Shao and J. Jiang, “Deep Temporal Convolutional Neural Networks with Attention Mechanisms for Resource Contention Classification in Cloud Computing,” 2025.
C. Hua, N. Lyu, C. Wang and T. Yuan, “Deep Learning Framework for Change-Point Detection in Cloud-Native Kubernetes Node Metrics Using Transformer Architecture,” 2025.
Q. Xu, W. Xu, X. Su, K. Ma, W. Sun and Y. Qin, “Enhancing Systemic Risk Forecasting with Deep Attention Models in Financial Time Series,” Proceedings of the 2025 2nd International Conference on Digital Economy, Blockchain and Artificial Intelligence, pp. 340–344, 2025.
Y. Li, S. Han, S. Wang, M. Wang and R. Meng, “Collaborative Evolution of Intelligent Agents in Large-Scale Microservice Systems,” arXiv preprint, arXiv:2508.20508, 2025.
S. Pan and D. Wu, “Trustworthy Summarization via Uncertainty Quantification and Risk Awareness in Large Language Models,” arXiv preprint, arXiv:2510.01231, 2025.
Y. Xing, M. Wang, Y. Deng, H. Liu and Y. Zi, “Explainable Representation Learning in Large Language Models for Fine-Grained Sentiment and Opinion Classification,” 2025.
R. Liu, R. Zhang and S. Wang, “Graph Neural Networks for User Satisfaction Classification in Human-Computer Interaction,” arXiv preprint, arXiv:2511.04166, 2025.
Y. Wang, D. Wu, F. Liu, Z. Qiu and C. Hu, “Structural Priors and Modular Adapters in the Composable Fine-Tuning Algorithm of Large-Scale Models,” arXiv preprint, arXiv:2511.03981, 2025.
H. Zhang, L. Zhu, C. Peng, J. Zheng, J. Lin and R. Bao, “Intelligent Recommendation Systems Using Multi-Scale LoRA Fine-Tuning and Large Language Models,” 2025.
J. Zheng, Y. Chen, Z. Zhou, C. Peng, H. Deng and S. Yin, “Information-Constrained Retrieval for Scientific Literature via Large Language Model Agents,” 2025.
H. Fan, Y. Yi, W. Xu, Y. Wu, S. Long and Y. Wang, “Intelligent Credit Fraud Detection with Meta-Learning: Addressing Sample Scarcity and Evolving Patterns,” 2025.
K. Gao, Y. Hu, C. Nie and W. Li, “Deep Q-Learning-Based Intelligent Scheduling for ETL Optimization in Heterogeneous Data Environments,” arXiv preprint, arXiv:2512.13060, 2025.
J. Lai, C. Chen, J. Li and Q. Gan, “Explainable Intelligent Audit Risk Assessment with Causal Graph Modeling and Causally Constrained Representation Learning,” 2025.
A. Xie and W. C. Chang, “Deep Learning Approach for Clinical Risk Identification Using Transformer Modeling of Heterogeneous EHR Data,” arXiv preprint, arXiv:2511.04158, 2025.
A. P. Mathur and N. O. Tippenhauer, “SWaT: A Water Treatment Testbed for Research and Training on ICS Security,” Proceedings of the 2016 International Workshop on Cyber-Physical Systems for Smart Water Networks, pp. 31–36, 2016.
K. Hundman, V. Constantinou, C. Laporte, I. Colwell and T. Soderstrom, “Detecting Spacecraft Anomalies Using LSTMs and Nonparametric Dynamic Thresholding,” Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 387–395, 2018.
Y. Su, Y. Zhao, C. Niu, R. Liu, W. Sun and D. Pei, “Robust Anomaly Detection for Multivariate Time Series through Stochastic Recurrent Neural Network,” Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2828–2837, 2019.

Figure 1. Performance comparison across different methods and datasets.

Figure 2. Performance degradation under different drift severities.

Table I. Dataset Characteristics.

Dataset	Dimensions	Length	Anomaly Ratio(%)	Drift Type
SWaT	51	946,722	11.98	Abrupt
SMAP	25	135,183	13.13	Gradual
MSL	55	132,801	10.72	Seasonal
SMD	38	708,405	4.16	Mixed

Table II. Performance Comparison Across Methods and Datasets (F1-score).

Method	SWaT	SMAP	MSL	SMD	Average
LSTM-VAE	0.673	0.689	0.671	0.623	0.664
OmniAnomaly	0.721	0.712	0.694	0.687	0.704
USAD	0.748	0.723	0.715	0.701	0.722
TranAD	0.812	0.745	0.732	0.728	0.754
AnomalyTransformer	0.798	0.762	0.749	0.734	0.761
RADM	0.735	0.698	0.687	0.665	0.696
Online-LSTM	0.689	0.671	0.658	0.642	0.665
EWC-AE	0.776	0.731	0.724	0.715	0.737
Ours	0.891	0.832	0.819	0.846	0.847

Table III. Drift detection accuracy analysis.

Metric	Statistical Only	Performance Only	Dual Monitoring (Ours)
Detection Accuracy (%)	89.7	86.2	94.3
False Positive Rate (%)	5.8	8.1	3.2
False Negative Rate (%)	10.3	13.8	5.7
Average Detection Delay (steps)	124.0	98.0	87.0

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permit the free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.

Adaptive Anomaly Detection for Non-Stationary Time-Series: A Continual Learning Framework with Dynamic Distribution Monitoring

Abstract

Keywords:

Subject:

I. Introduction

II. Related Work

III. Methodology

A. Problem Formulation

B. Distribution Drift Detection Module

C. Adaptive Continual Learning Module

D. Dynamic Memory Management

E. Adaptive Learning Rate Scheduling

IV. Experiments

A. Datasets

B. Results and Analysis

V. Conclusions

References

MDPI Initiatives

Important Links

Subscribe