Preprint
Review

This version is not peer-reviewed.

How does Active Learning Tackle Domain Adaptation: A Survey on Active Domain Adaptation and Active Continual Learning

Submitted: 01 May 2026
Posted: 04 May 2026


Abstract
Machine learning and deep learning models trained on a source domain often suffer from performance degradation when deployed to new target domains due to domain shifts arising from differences in data distributions, acquisition conditions, or temporal variations. Domain adaptation addresses this issue by transferring knowledge from labeled source data to unlabeled target data. However, acquiring labels for target-domain samples is often costly or impractical in real-world applications. To improve label efficiency, active domain adaptation (ADA) and active continual learning (ACL) integrate active learning strategies into domain adaptation and continual learning frameworks. ADA selectively queries informative target samples to enhance adaptation performance, while ACL extends this paradigm to sequential settings, enabling models to adapt to evolving data streams while mitigating catastrophic forgetting. This survey provides a systematic review of ADA and ACL, focusing on their advances and applications. We further examine extensions of ADA such as source-free ADA, integration with semi-supervised learning, and advanced techniques for handling challenging adaptation scenarios. In addition, we summarize applications across computer vision, medical imaging, robotics, natural language processing, and scientific and engineering tasks. Finally, we discuss open challenges and future directions, including robust adaptation under complex distribution shifts and reliable semi-supervised adaptation.
Keywords: 

1. Introduction

Machine learning (ML) and deep learning (DL) models have achieved remarkable success across a wide range of tasks, including computer vision [1,2], natural language processing [3,4], and scientific and engineering problems [5,6]. These models are typically trained under the assumption that training and test data follow the same distribution. However, in many real-world scenarios this assumption does not hold. When models trained on a source domain are applied to a new target domain with a different data distribution, their performance often degrades significantly. This phenomenon, commonly referred to as domain shift, arises from variations in data acquisition conditions, storage environments, feature distributions, or class priors across domains [7]. Domain adaptation has emerged as an important paradigm to address this challenge by transferring knowledge from labeled source domains to unlabeled or sparsely labeled target domains [8]. Existing domain adaptation methods aim to learn domain-invariant representations or align feature distributions between domains in order to improve generalization to new environments [9]. While these approaches have achieved promising results, many of them rely on the assumption that target-domain data are fully unlabeled or that large amounts of labeled source data remain accessible. In practice, however, labeling new data can be expensive and time-consuming, and the most informative samples for adaptation are often unknown. Moreover, their applicability is limited when the source-domain and target-domain data exhibit large distribution shifts.
Active learning (AL) provides a feasible solution to this problem by enabling models to selectively query informative samples for annotation [10,11]. Instead of labeling large amounts of data indiscriminately, AL strategies aim to identify samples that are expected to provide the greatest benefit for model improvement. By integrating AL with domain adaptation, active domain adaptation (ADA) methods employ query strategies to select the most informative target-domain samples for annotation, thus training models with these data for improving generalization on target domains [12,13]. ADA can effectively reduce annotation cost while improving adaptation performance.
Beyond static domain adaptation scenarios, many real-world applications involve data that arrive sequentially over time, resulting in continually evolving data distributions [14,15]. In such settings, models must not only adapt to new domains but also retain knowledge learned from previous domains. Active continual learning (ACL) addresses this challenge by combining active sample selection with continual learning mechanisms that mitigate catastrophic forgetting [14]. ACL methods enable models to efficiently adapt to new tasks or domains by selectively querying new informative samples for training while maintaining performance on previously encountered data.
Despite the rapid development of ADA and ACL methods, a systematic overview of these methods is still lacking: existing surveys typically focus on domain adaptation, active learning, or continual learning independently. Furthermore, recent learning paradigms such as active source-free domain adaptation, the integration of ADA with semi-supervised learning, and applications across diverse domains have not yet been comprehensively investigated.

1.1. Contributions of the Survey

To address the lack of a unified perspective in this field, this survey provides a comprehensive and structured review of recent advances in active domain adaptation (ADA) and active continual learning (ACL). By synthesizing recent developments in these areas, this survey aims to offer researchers and practitioners a unified view of how active learning can enhance domain adaptation and continual learning, ultimately enabling more robust, efficient, and adaptable ML/DL systems in dynamic real-world environments. First, we present a detailed investigation of different types of query strategies in ADA, highlighting their design principles and their impact on model generalization under domain shift. By discussing the rationale behind these design principles, our survey offers guidance on selecting and applying query strategies in ADA.
Beyond conventional settings, we further review emerging learning paradigms that extend the applicability of ADA in practical scenarios, including active source-free domain adaptation, the integration of ADA with semi-supervised learning, class-balanced ADA, and multi-fidelity ADA. These paradigms are particularly important in real-world applications where labeled data are scarce, imbalanced, or inaccessible. In addition, we systematically examine several challenging yet critical scenarios, including label distribution shift, open-set and universal domain adaptation, multi-source and multi-target adaptation, and cross-modality adaptation, and discuss how existing ADA methods address these challenges.
Moreover, this survey provides a comprehensive overview of applications of ADA across diverse domains, including computer vision, medical data analysis, robotics, natural language processing, graph learning, and scientific and engineering problems, demonstrating the practical impact and versatility of ADA.
Next, we provide a comprehensive review of recent advances in active continual learning, covering query strategies for adaptation and continual learning techniques that mitigate catastrophic forgetting. We also review the applications of these ACL methods.
Finally, we identify open challenges and promising future directions, such as shift-robust source-free adaptation frameworks and reliable semi-supervised ADA.

1.2. Structure of the Survey

The remaining sections of this survey are organized as follows (Figure 1): Section II formulates the ADA problem and covers various query strategies in ADA. Sections III and IV present four emerging learning paradigms and five challenging scenarios in ADA, respectively. Section V presents the applications of ADA in various fields. Section VI focuses on advances in ACL and its applications. Finally, Section VII discusses open challenges and future directions.

2. Active Domain Adaptation and Query Strategies

2.1. Problem Formulation

Active domain adaptation (ADA) aims to improve model generalization on a new unlabeled target domain by selectively acquiring annotations for a small number of informative target samples to train models (Figure 2A). Formally, given a labeled source domain $\mathcal{S}$ and an unlabeled target domain $\mathcal{T}$, the source domain contains $N_s$ labeled samples $X_s = \{x_s\}_{s=1}^{N_s}$ with corresponding labels $Y_s = \{y_s\}_{s=1}^{N_s}$, while the target domain contains $N_t$ unlabeled samples $X_t = \{x_t\}_{t=1}^{N_t}$.
A model $F_s(\Theta)$ is first trained on the source domain using $(X_s, Y_s)$. ADA then integrates an AL query strategy $\pi(\cdot)$ that selects a small subset of informative target samples for annotation. Specifically, the query strategy selects $N_{AL}$ samples $X_t^l = \{x_t^l\}_{t=1}^{N_{AL}}$ from $X_t$, which are then annotated by an oracle to obtain labels $Y_t^l$. The model is subsequently trained for adaptation using both the labeled source data $(X_s, Y_s)$ and the queried target samples $(X_t^l, Y_t^l)$. By strategically selecting informative samples, ADA aims to maximize adaptation performance while minimizing annotation cost.
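The query-annotate-retrain loop described above can be sketched in a few lines of NumPy. The nearest-centroid classifier, entropy-based query strategy, and the toy two-class shifted-domain setup below are illustrative stand-ins, not components of any particular ADA method:

```python
import numpy as np

class NearestCentroid:
    """Minimal stand-in for the model F(Theta); illustrative only."""
    def fit(self, x, y):
        self.classes_ = np.unique(y)
        self.mu_ = np.stack([x[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict_proba(self, x):
        d = ((x[:, None, :] - self.mu_[None]) ** 2).sum(-1)
        p = np.exp(-d)
        return p / p.sum(axis=1, keepdims=True)

def entropy_query(probs, budget):
    """Stand-in query strategy pi(.): pick the `budget` highest-entropy samples."""
    h = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    return np.argsort(-h)[:budget]

rng = np.random.default_rng(0)
# Source: two Gaussian classes; target: the same classes under a mean shift.
x_s = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y_s = np.repeat([0, 1], 50)
x_t, y_t = x_s + 1.5, y_s                  # y_t plays the oracle's ground truth

model = NearestCentroid().fit(x_s, y_s)    # F_s(Theta): source-only training
picked = entropy_query(model.predict_proba(x_t), budget=10)   # queried X_t^l
model.fit(np.vstack([x_s, x_t[picked]]),   # adapt on source + queried target
          np.concatenate([y_s, y_t[picked]]))
```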
A fundamental principle of ADA is to prioritize the annotation of informative target samples that provide the greatest benefit for improving model performance. Thus, the effectiveness of ADA largely depends on the design of query strategies that identify the most informative samples for annotation. Existing query strategies generally evaluate sample informativeness from three complementary perspectives: uncertainty, diversity, and hybrid criteria that combine both aspects.

2.2. Uncertainty-Based Query Strategies

Uncertainty-based query strategies select samples for which the model's predictions have low confidence, indicating regions where the model lacks knowledge. These strategies are motivated by the principle that labeling uncertain samples provides the most informative feedback for refining model decision boundaries.

2.2.1. Classical Uncertainty Metrics

Several classical uncertainty metrics have been widely adopted in ADA: predictive entropy [16], predictive confidence [17], and prediction margin [16].
We assume a model $F(\Theta)$ that produces predictive probabilities $p(x_t)$ for a target sample $x_t$ via a softmax function, and the uncertainty-based query score $Q$ of the sample $x_t$ is computed over all class labels $c \in \{1, \ldots, C\}$. The predictive entropy measures the uncertainty of the predicted class distribution. High entropy indicates that the model assigns similar probabilities across multiple classes, suggesting high uncertainty; querying high-entropy samples thus helps the model acquire the knowledge needed to distinguish between classes. The entropy-based query score is calculated as
$$Q(x_t) = -\sum_{c=1}^{C} p_c(x_t) \log p_c(x_t).$$
Another commonly used metric is prediction confidence, which evaluates the maximum predicted probability. Samples with low confidence are considered more informative for annotation. The confidence-based query score is calculated as
$$Q(x_t) = \max_{c \in \{1, \ldots, C\}} p_c(x_t).$$
A third metric is the prediction margin, which measures the difference between the top two predicted probabilities. A small margin indicates that the model is uncertain about two competing classes. The margin-based query score is calculated as
$$Q(x_t) = p_1(x_t) - p_2(x_t),$$
where $p_1$ and $p_2$ denote the highest and second-highest predicted probabilities for classes $c_1$ and $c_2$, respectively.
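The three classical scores above can be computed directly from softmax outputs. A minimal NumPy sketch; the example probability matrix is hypothetical:

```python
import numpy as np

def entropy_score(p):
    """Q(x_t) = -sum_c p_c log p_c; HIGHER means more uncertain."""
    return -(p * np.log(p + 1e-12)).sum(axis=-1)

def confidence_score(p):
    """Q(x_t) = max_c p_c; LOWER-scoring samples are queried first."""
    return p.max(axis=-1)

def margin_score(p):
    """Q(x_t) = p_1 - p_2, top two probabilities; LOWER is queried first."""
    top2 = np.sort(p, axis=-1)[..., -2:]
    return top2[..., 1] - top2[..., 0]

# Hypothetical softmax outputs: row 0 is confident, row 1 is uncertain.
p = np.array([[0.90, 0.05, 0.05],
              [0.40, 0.35, 0.25]])
```

Note the sign conventions: with entropy, higher scores are queried first, whereas with confidence and margin, lower scores are queried first.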

2.2.2. Disagreement-Based Uncertainty Metrics

Disagreement-based uncertainty metrics measure uncertainty by evaluating disagreement among predictions from multiple models or across related samples. Instead of relying on a single model, Query-by-Committee (QBC) constructs multiple predictors and measures disagreement among their predictions. Target samples that produce large prediction discrepancies are considered informative for annotation [18,19,20].
Several methods estimate uncertainty by measuring prediction disagreement among local neighboring samples. These approaches assume that samples with similar semantic representations should exhibit consistent predictions. If a sample shows large prediction inconsistencies with its nearest neighbors in the feature space, it is likely to be uncertain and informative for model improvement. For example, the Local Context-Aware ADA (LADA) and ADA with Balancing Uncertainty and Diversity (ADA-BUD) frameworks evaluate the inconsistency between a sample's predictive probabilities and those of its K nearest neighbors [21,22], while the Minimum Happy Points Learning (MHPL) and Local Uncertainty Energy Transfer (LUET) frameworks measure neighbor-weighted predictive entropy and energy [23,24], respectively.
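A minimal sketch of the neighbor-disagreement idea, using the KL divergence between each sample's prediction and its neighborhood consensus; this is a generic illustration under assumed choices (Euclidean neighbors, mean consensus), not the exact criterion of LADA, ADA-BUD, MHPL, or LUET:

```python
import numpy as np

def neighbor_inconsistency(feats, probs, k=3):
    """KL divergence between each sample's prediction and the mean
    prediction of its k nearest feature-space neighbors; large values
    flag samples that disagree with their local neighborhood."""
    d = ((feats[:, None] - feats[None]) ** 2).sum(-1)
    np.fill_diagonal(d, np.inf)                 # exclude self-matches
    nn = np.argsort(d, axis=1)[:, :k]
    p_nn = probs[nn].mean(axis=1)               # neighborhood consensus
    return (probs * np.log((probs + 1e-12) / (p_nn + 1e-12))).sum(-1)

# Six nearby points; the last one contradicts its neighbors' predictions.
feats = (np.arange(6) * 0.1).reshape(-1, 1)
probs = np.array([[0.9, 0.1]] * 5 + [[0.1, 0.9]])
scores = neighbor_inconsistency(feats, probs)
```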
Perturbation-based strategies evaluate uncertainty by measuring differences of predictions among samples and their perturbations. These methods generate perturbed versions of target samples and measure the stability of model predictions. Large prediction variations indicate that the model operates in poorly understood regions of the input space with high epistemic uncertainty, suggesting that the sample is informative for annotation [18,25,26,27,28].

2.2.3. Novel Uncertainty Metrics

Although classical metrics are simple and effective, they rely on deterministic point predictions that may be overconfident, particularly under domain shift [29], leading to poorly calibrated and unreliable uncertainty estimates on shifted data. To achieve reliable uncertainty estimation, several methods have explored probabilistic modeling approaches. For example, Gaussian mixture models (GMMs) can be employed to model the distribution of feature embeddings. Given a GMM with $K$ components $\{(\pi_k, \mu_k, \Sigma_k)\}_{k=1}^{K}$, the probability density of an instance $x_t$ is computed as
$$p(x_t) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_t; \mu_k, \Sigma_k).$$
By modeling feature distributions, GMM-based ADA approaches estimate uncertainty based on the likelihood of samples belonging to learned clusters [30,31,32].
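The mixture density above can be evaluated as follows for fixed, diagonal-covariance components; the two-component parameters are hypothetical, and in practice the mixture would be fit to feature embeddings (e.g., via EM) before scoring samples:

```python
import numpy as np

def gmm_density(x, weights, means, variances):
    """p(x_t) = sum_k pi_k N(x_t; mu_k, Sigma_k) with diagonal Sigma_k;
    low density marks samples unlikely under the learned clusters."""
    dens = np.zeros(len(x))
    for w, mu, var in zip(weights, means, variances):
        norm = np.prod(2 * np.pi * var) ** -0.5
        dens += w * norm * np.exp(-0.5 * (((x - mu) ** 2) / var).sum(-1))
    return dens

# Hypothetical 2-component mixture in a 2-D embedding space.
weights = np.array([0.5, 0.5])
means = np.array([[0.0, 0.0], [5.0, 5.0]])
variances = np.array([[1.0, 1.0], [1.0, 1.0]])
in_cluster = gmm_density(np.array([[0.2, 0.1]]), weights, means, variances)
outlier = gmm_density(np.array([[10.0, 10.0]]), weights, means, variances)
```

Low-density samples such as the outlier would be treated as highly uncertain and prioritized for annotation.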
Energy-based methods provide another perspective on uncertainty estimation. In energy-based ADA frameworks, such as EADA [33], the free energy derived from model logits is used to quantify prediction confidence. Samples with higher energy values are more uncertain and are therefore prioritized for annotation. Energy-based ADA methods thus query high-uncertainty samples for adaptation by measuring the free-energy gap between labeled source data and unlabeled target data [22,33]. Subsequent works further improve this strategy by incorporating local neighborhood information; for example, LUET aggregates energy among neighboring samples to improve robustness [24]. Other works, such as the Locality Preserving Transfer (LPT) framework [34], measure free energy based only on predictions for the target sample, without access to source data.
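The free-energy score can be sketched from raw logits via a numerically stable log-sum-exp; the temperature value and the example logits are illustrative, and EADA's full objective involves more than this single score:

```python
import numpy as np

def free_energy(logits, T=1.0):
    """E(x) = -T * log sum_c exp(f_c(x) / T), via a stable log-sum-exp.
    Higher (less negative) energy signals higher uncertainty."""
    m = logits.max(axis=-1)
    return -(m + T * np.log(np.exp((logits - m[..., None]) / T).sum(-1)))

confident = free_energy(np.array([[10.0, 0.0, 0.0]]))   # one dominant logit
uncertain = free_energy(np.array([[1.0, 1.0, 1.0]]))    # flat logits
```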
More recently, diffusion-based approaches generate multiple probability samples through latent diffusion models to estimate uncertainty distributions rather than single-point predictions. A diffusion-based probabilistic uncertainty estimation method has thus been proposed that captures both data-level and prediction-level uncertainties beyond a point estimate, yielding diffusion-based ADA (DiffADA) [35].
Additionally, evidential learning methods model predictive uncertainty using a Dirichlet distribution over class probabilities, enabling its decomposition into data (aleatoric) uncertainty and distributional (epistemic) uncertainty. Evidential models improve the calibration of uncertainty estimates and enhance query reliability. A Dirichlet-based uncertainty calibration (DUC) approach has thus been proposed to improve the measurement of sample uncertainty during querying [36], and has been further employed in various scenarios [37,38,39].

2.3. Diversity-Based Query Strategies

While uncertainty-based strategies prioritize samples near model decision boundaries, they may repeatedly select similar samples located in the same region of the feature space. This can lead to redundant annotations and insufficient coverage of the whole target-domain distribution. Diversity-based query strategies address this limitation by selecting representative samples that capture the overall structure of the target data distribution, thus training models to learn sufficient knowledge for adaptation. By ensuring coverage of the target distribution, these strategies improve the robustness of domain adaptation and reduce the risk of overfitting to a narrow subset of the data.

2.3.1. Clustering-Based Diversity Sampling

Most diversity-based methods operate in the feature embedding space and aim to select samples that maximize coverage of the target domain. Clustering-based approaches are among the most widely used techniques. These methods group target samples into clusters using algorithms such as k-means or k-nearest neighbors (KNN), and then select representative samples from each cluster. By ensuring that samples are selected from multiple clusters, these approaches improve the representativeness of the query set and help models learn the global structure of the target distribution [21,27,34,40,41,42,43,44,45,46,47].
Core-set based sampling is another widely adopted diversity strategy. The core-set method aims to select a subset of samples that best approximates the distribution of the entire dataset by minimizing the maximum distance between selected samples and the remaining data points [48]. Although effective in maximizing diversity, the core-set strategy may occasionally select outliers that lie in sparse regions of the feature space. To mitigate this issue, density-aware core-set methods have been proposed that incorporate local density information when selecting representative samples. These methods prioritize samples located in dense regions of the data manifold, thereby improving the robustness of diversity-based sampling [49].
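The core-set objective is commonly approximated with the greedy k-center algorithm. A sketch of that greedy approximation; the three-cluster toy features are hypothetical:

```python
import numpy as np

def k_center_greedy(feats, budget, seed_idx=0):
    """Greedy k-center core-set selection: repeatedly add the sample
    farthest from the current selection, shrinking the maximum distance
    from any point to its nearest selected center."""
    picked = [seed_idx]
    d_min = np.linalg.norm(feats - feats[seed_idx], axis=1)
    for _ in range(budget - 1):
        nxt = int(d_min.argmax())               # current farthest point
        picked.append(nxt)
        d_min = np.minimum(d_min, np.linalg.norm(feats - feats[nxt], axis=1))
    return picked

# Three well-separated pairs; a budget of 3 should cover all three regions.
feats = np.array([[0.0, 0.0], [0.1, 0.0],
                  [10.0, 0.0], [10.1, 0.0],
                  [0.0, 10.0], [0.0, 10.1]])
picked = k_center_greedy(feats, budget=3)
```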

2.3.2. Intra-Domain Diversity Sampling

Another common strategy measures pairwise similarity between target samples to avoid selecting redundant instances and thereby maximize intra-domain diversity. Most of these methods evaluate intra-domain dissimilarity using cosine similarity to compute semantic distances among neighbors [50,51], while the Submodular Subset Selection for Virtual Adversarial Active Domain Adaptation (S3VAADA) framework employs a different metric, the Bhattacharyya coefficient, to measure semantic similarity [25]. Similarly, prototype-based approaches, for example prototype-guided class-balanced ADA (PCADA) [52], estimate the statistical distance between target samples and class prototypes to select representative instances, while the Learn From The Learnt (LFTL) framework measures the similarity between actively queried target samples and unlabeled samples [53].
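A minimal sketch of a pairwise-similarity redundancy filter: candidates are walked in priority order, and any candidate too similar to an already-selected sample is skipped. The cosine metric and threshold are assumed choices for illustration; specific methods such as S3VAADA use other similarity measures:

```python
import numpy as np

def diverse_pick(feats, cand_order, budget, sim_thresh=0.95):
    """Greedily walk candidates in priority order, skipping any whose
    cosine similarity to an already-picked sample exceeds sim_thresh."""
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    picked = []
    for i in cand_order:
        if all(f[i] @ f[j] < sim_thresh for j in picked):
            picked.append(i)
        if len(picked) == budget:
            break
    return picked

# The second sample nearly duplicates the first, so it is skipped.
feats = np.array([[1.0, 0.0], [0.99, 0.01], [0.0, 1.0]])
picked = diverse_pick(feats, cand_order=[0, 1, 2], budget=2)
```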

2.3.3. Domain Discrepancy Diversity Sampling

Despite their advantages, diversity-based strategies that rely solely on target-domain structure may overlook the knowledge already learned from the source domain. Since the model has been trained on source data, the informativeness of a target sample depends not only on its representativeness within the target domain but also on its discrepancy from the source distribution. To address this issue, discrepancy-based diversity strategies measure the distance between target samples and source-domain knowledge. These strategies generally fall into two groups. The first employs discriminative models to estimate domain differences: domain discriminators are trained to distinguish source from target samples, and target samples that are strongly classified as belonging to the target domain are considered source-dissimilar and informative for annotation [13,18,54,55,56,57,58]. The second directly measures distributional discrepancies and knowledge gaps between source and target samples. For instance, some approaches compute distances between feature embeddings of target samples and source prototypes or medoids to identify target samples that exhibit large distribution shifts [59,60,61], while others identify source-dissimilar or source-like samples in the target domain [31,62,63].
However, discrepancy-based strategies that evaluate distances to all source samples may introduce a bias toward high-density regions of the source domain. To alleviate this issue, more refined sampling mechanisms have been proposed. For example, the Select by Distinctive Margin (SDM) strategy selects target samples based on their distance to difficult source examples located near the decision boundary, rather than considering all source samples equally [64]. This strategy enables the selection of samples that provide more informative signals for adapting decision boundaries across domains.

2.4. Hybrid Query Strategies

To fully evaluate sample informativeness, recent ADA methods adopt hybrid query strategies that integrate multiple sampling criteria, most commonly combining uncertainty estimation with diversity or representativeness measures when selecting target samples for annotation. Hybrid strategies aim to balance two complementary objectives: selecting samples that are highly informative for refining decision boundaries while maintaining sufficient diversity to cover the target distribution. These strategies combine the criteria in two ways: sequentially or jointly.

2.4.1. Sequential Hybrid Strategies

A common design of hybrid query strategies is to evaluate uncertainty and diversity sequentially. These methods first identify a candidate pool of uncertain samples and then select diverse samples from this subset to construct the final query set. This sequential design avoids a high computational burden: classical uncertainty metrics (e.g., entropy and confidence) are cheap to compute on all samples, while restricting the more expensive diversity measurements (e.g., clustering) to a smaller candidate subset keeps their cost manageable. For example, the Active Adversarial Domain Adaptation (AADA) method first measures predictive uncertainty using classification confidence and subsequently evaluates domain discrepancy using a domain discriminator to select source-dissimilar samples [13]. Similarly, the discriminative ADA (AC-DA) [54], Transferable Loss-based ADA (TL-ADA) [55], and semi-supervised domain adaptation (SSDA) [42] frameworks all compute predictive entropy to identify uncertain samples and subsequently apply intra-target-domain dissimilarity, domain discrepancy, and k-means clustering, respectively, to ensure diversity among selected samples. Other approaches adopt similar sequential designs. The Stochastic Adversarial Gradient Embedding (SAGE) method estimates uncertainty through the variance of feature embeddings before and after annotation and then applies k-means++ clustering to select diverse samples from the uncertain candidate pool [41]. Neighborhood-aware ADA frameworks adopt sequential hybrid sampling to measure informativeness among neighbors: LADA [21] and LUET [24] identify uncertain samples by evaluating their consistency and energy against those of their nearest neighbors and then apply KNN or clustering to ensure diversity.
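The uncertainty-then-diversity pipeline can be sketched as entropy pre-filtering followed by k-means over the candidate pool. The pool size, iteration count, and plain Lloyd-style k-means (in place of the k-means++ or clustering variants used by specific methods) are illustrative choices:

```python
import numpy as np

def sequential_hybrid(probs, feats, budget, pool_factor=5, n_iter=20, seed=0):
    """Stage 1: keep the pool_factor*budget most uncertain samples by
    entropy. Stage 2: k-means over that pool, then take the pool sample
    closest to each centroid as the final (diverse) query set."""
    h = -(probs * np.log(probs + 1e-12)).sum(-1)
    pool = np.argsort(-h)[: pool_factor * budget]     # uncertain candidates
    x = feats[pool]
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), budget, replace=False)]
    for _ in range(n_iter):                           # plain Lloyd iterations
        assign = ((x[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        for k in range(budget):
            if (assign == k).any():
                centers[k] = x[assign == k].mean(0)
    d = ((x[:, None] - centers[None]) ** 2).sum(-1)
    return np.unique(pool[d.argmin(0)])               # may merge duplicates

rng = np.random.default_rng(1)
probs = rng.dirichlet(np.ones(3), size=100)           # mock softmax outputs
feats = rng.normal(size=(100, 2))                     # mock feature embeddings
picked = sequential_hybrid(probs, feats, budget=10)
```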

2.4.2. Joint Hybrid Strategies

Instead of applying uncertainty and diversity sequentially, some ADA methods jointly evaluate multiple informativeness metrics. These strategies typically compute query scores based on uncertainty, diversity, or distributional discrepancy separately, and combine them into a unified sampling objective. For instance, the S3VAADA framework employs a unified objective which simultaneously evaluates perturbation-based uncertainty, intra-domain dissimilarity-based diversity, and distributional source-target divergence [25]. Similarly, clustering environment-aware learning (CEAL) integrates multiple criteria when selecting samples. It evaluates uncertainty using prediction margins, measures diversity through feature-space similarity within clusters, and incorporates distributional similarity between source and target samples to guide sample selection [51]. In contrast, the Transferable Query Selection (TQS) method incorporates two uncertainty metrics, Query-by-Committee and prediction margin, and one diversity metric, domain discrimination, into a unified objective [18]. By integrating multiple complementary criteria, these hybrid strategies provide a more comprehensive evaluation of sample informativeness, which has been shown to improve the effectiveness of active sampling in domain adaptation scenarios.
Table 1. The summary of ADA and ASFDA methods.

3. Emerging Learning Paradigms in Active Domain Adaptation

3.1. Active Source-Free Domain Adaptation

ADA methods typically rely on access to both source and target data to exploit cross-domain correlations during adaptation. However, in many real-world scenarios, source data cannot be accessed due to privacy regulations, proprietary restrictions, or storage limitations. To address this limitation, active source-free domain adaptation (ASFDA) adapts models to the target domain without accessing source data during adaptation (Figure 2B).
Formally, a source domain $\mathcal{S}$ contains $N_s$ labeled samples $(X_s, Y_s) = \{(x_s, y_s)\}_{s=1}^{N_s}$, and a target domain $\mathcal{T}$ contains $N_t$ unlabeled samples $X_t = \{x_t\}_{t=1}^{N_t}$. A model $F(\Theta)$ is first initialized on the source domain. ASFDA employs an AL query strategy $\pi(\cdot)$ to select a small subset of informative target samples $X_t^l = \{x_t^l\}_{t=1}^{N_{AL}}$ for annotation $Y_t^l$. The model is then adapted using the queried labeled samples $(X_t^l, Y_t^l)$ without using source data, improving its performance on the target domain while minimizing annotation cost.

3.1.1. Source-Free Query Strategies

Query strategies in ASFDA largely follow the same principles as those in standard ADA, aiming to identify informative and representative target samples for annotation. A few studies have explored strategies based solely on uncertainty estimation [82] or diversity sampling [43]. However, relying on a single criterion may lead to redundant or unrepresentative selections; most ASFDA methods therefore adopt hybrid strategies that evaluate both uncertainty and diversity to improve the quality of the query set.
Some methods employ classical uncertainty metrics and clustering-based diversity sampling sequentially in their hybrid strategies. Representative methods include LPT [34] and Self-adaptive Clustering-based Active Learning (SCAL) [43], which identify uncertain samples using free energy and confidence, respectively, before applying clustering-based sampling. Similarly, the Feature Mixing and Self-Training (FMAS) framework estimates uncertainty by evaluating prediction inconsistency after modifying feature embeddings, and then performs diverse sampling via weighted k-means clustering [27]. In contrast, some works reverse the order of uncertainty and diversity evaluation, such as Diverse Structure Learning (DSL), which first performs k-means clustering to ensure diversity and subsequently selects samples with high predictive entropy for annotation [79].
In addition to clustering-based sampling, several methods exploit structural relationships among samples to improve diverse sample selection. These methods first identify uncertain samples and then exclude their semantically similar neighbors from annotation to avoid redundant selections. For example, MHPL identifies highly uncertain samples by measuring the neighbor-weighted entropy of predictions and selects them for annotation while excluding their neighbors from selection [23]. Similarly, the structure-based uncertainty estimation model (SUEM) models the target feature distribution using a Gaussian mixture model (GMM) and evaluates uncertainty through probability margins that reflect cluster-membership confidence [78]; it also incorporates a neighbor-exclusion mechanism to avoid selecting redundant samples.

3.1.2. Source-Free Source Knowledge Utilization

The absence of source data in ASFDA restricts the direct transfer of source-domain knowledge and limits the ability to guide sample selection using source information. To overcome this limitation, several methods attempt to implicitly exploit source knowledge during the query process or adaptation stage.
One strategy leverages historical model behavior or predictions to represent knowledge from source data. For example, LFTL identifies challenging samples that consistently yield low-confidence predictions across several previous query rounds [53]. To maintain query diversity, LFTL reduces the probability of selecting samples from classes that have already been frequently queried.
Another strategy focuses on identifying source-like samples within the target domain. Since conventional domain adaptation methods typically align target distributions to source distributions, these approaches first locate target samples that resemble the source domain and then use them as anchors for adaptation. For instance, SQAdapt identifies source-like samples by jointly evaluating prediction uncertainty, sensitivity to data augmentation, and diversity [26]. After selecting informative samples for annotation, SQAdapt constructs class prototypes using both labeled and pseudo-labeled target samples and performs self-training to refine the model. Target samples exhibiting low uncertainty and low sensitivity are considered source-like and used to guide the alignment of other target samples.

3.2. Semi-Supervised Learning in Active Domain Adaptation

ADA typically queries only a small subset of target samples for annotation, leaving a large portion of target data unused during training. This limited utilization of available data hinders adaptation performance. To address this limitation, many ADA methods incorporate semi-supervised learning (SSL) by generating pseudo labels for unlabeled target samples and using them together with annotated samples for model training (Figure 2C).
Formally, given a source domain $\mathcal{S}$ with labeled data $(X_s, Y_s)$ and a target domain $\mathcal{T}$ with unlabeled data $X_t$, ADASSL employs a query strategy $\pi(\cdot)$ to select informative target samples $X_t^l$ for annotation $Y_t^l$. In addition, ADASSL utilizes a pseudo-labeling strategy to select a subset of (or all) remaining unlabeled samples $X_t^p$ and assign them pseudo labels $Y_t^p$. The labeled samples $(X_t^l, Y_t^l)$ and pseudo-labeled samples $(X_t^p, Y_t^p)$ are used to train the model in a semi-supervised manner.
When source data are inaccessible, ASFDASSL follows a similar procedure but relies solely on the pretrained source model without accessing the original source dataset (Figure 2D). The queried labeled samples and pseudo-labeled samples are jointly used to train the model for adaptation.
Besides query strategies, the pseudo-labeling strategy is another important component of ADASSL, determining which samples receive pseudo labels and how. Existing ADASSL methods employ different strategies to generate pseudo labels for unlabeled target samples.

3.2.1. Pseudo Labeling Strategies

Some pseudo-labeling strategies assign pseudo labels to all remaining unlabeled samples after active selection. These ADASSL methods first query a subset of informative samples for annotation and train the model on them; the supervised model is then used to generate pseudo labels for all remaining unlabeled samples, and the pseudo-labeled and labeled samples are combined for semi-supervised fine-tuning. For example, the Clustering Uncertainty-weighted Embeddings (CLUE) [65] and multi-anchor ADA (MADA) [66] approaches query samples via uncertainty-weighted clustering and multi-anchor clustering, respectively, and subsequently assign pseudo labels to the remaining unlabeled target data. Other frameworks such as informative path planning (IPP) [30] and SSDA [42] adopt similar strategies to enable semi-supervised training after sample querying.

3.2.2. Selective Pseudo Labeling Strategies

However, pseudo labels generated for some samples may be unreliable or inaccurate, so using all pseudo-labeled samples during training may introduce noisy supervision and degrade model performance. Since models are more likely to generate noisy pseudo labels for highly uncertain samples, many pseudo-labeling strategies mitigate this issue by selectively assigning pseudo labels to a subset of samples with high confidence or low uncertainty.
Some ADASSL methods estimate uncertainty for all target samples and then select one subset with high uncertainty for annotation and another subset with low uncertainty or high confidence for pseudo-labeling, such as ADA with Multi-level Contrastive Units (ADA-MCU) [68], LPT [34], updated class consensus dictionary (UCCDA) [71], EFfective Target Labeling (EFTL) [50], and Gaussian Process-based Active Sampling (GPAS) [80]. Other methods, such as FMAS [27] and Bridging Inactive and Active Samples (BIAS) in ASFDA [82], estimate both uncertainty and diversity during active selection and pseudo-label confident samples. In contrast, PCADA further integrates class balancing by selecting confident samples from low-frequency classes for pseudo-label generation [52].
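The selective variant can be sketched with a confidence threshold. This is an illustrative sketch only: `split_by_confidence` and the threshold `tau` are hypothetical names, and real methods use richer uncertainty and diversity measures than top-class probability.

```python
def split_by_confidence(probs, budget, tau=0.9):
    """Selective pseudo-labeling sketch: annotate the `budget` least
    confident samples; pseudo-label only the remaining samples whose
    top-class probability reaches the threshold tau."""
    conf = [max(p) for p in probs]
    ranked = sorted(range(len(probs)), key=lambda i: conf[i])  # least confident first
    query_idx = set(ranked[:budget])
    pseudo = {i: p.index(max(p)) for i, p in enumerate(probs)
              if i not in query_idx and conf[i] >= tau}
    return query_idx, pseudo
```

Unlike the pseudo-label-all paradigm, samples in the middle confidence band are simply left out of training, trading data utilization for label reliability.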
Other methods propose structure-aware mechanisms for identifying reliable pseudo-labeled samples. These methods compute feature representations for target samples and pseudo-label the confident neighbors of actively labeled samples in LADA [21], or their clustered neighbors in cluster-based approaches such as Self-adaptive Clustering-based Active Learning (SCAL) [43] and SUEM [78]. Because neighboring samples in feature space share high semantic similarity, the model can transfer semantic knowledge from labeled samples and generate reliable pseudo labels for their neighbors.
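The neighbor-propagation idea can be illustrated with a tiny nearest-neighbor sketch, assuming precomputed feature vectors; `propagate_to_neighbors` is a hypothetical name and real methods add confidence checks or clustering on top of this.

```python
def propagate_to_neighbors(features, labeled, k=1):
    """Structure-aware pseudo-labeling sketch: each actively labeled anchor
    propagates its label to its k nearest unlabeled neighbors in feature space.
    features: list of feature vectors; labeled: {index: label}."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    unlabeled = [i for i in range(len(features)) if i not in labeled]
    pseudo = {}
    for anchor, label in labeled.items():
        nearest = sorted(unlabeled, key=lambda i: dist(features[i], features[anchor]))[:k]
        for i in nearest:
            pseudo.setdefault(i, label)  # first anchor to claim a neighbor wins
    return pseudo
```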
More advanced methods design query strategies that implicitly separate informative samples from reliable pseudo-labeled samples. Since the source-trained model already encodes source knowledge, it can generate reliable and accurate pseudo labels for source-similar target samples. Domain discrepancy is therefore evaluated to identify source-dissimilar samples for annotation and source-similar samples for pseudo-labeling in Dual-Focus Memory Contrastive Learning (DumDA) [63], while prediction uncertainty and prototype consistency are jointly utilized in Divide-and-Adapt (DiaNA) [31].
Integrating semi-supervised learning into ADA has become an effective strategy for improving label efficiency by leveraging the large pool of unlabeled target data. While active learning focuses on identifying the most informative samples for annotation, semi-supervised learning enables models to exploit additional information from unlabeled data through pseudo-labeling or representation learning. The combination of these two paradigms allows models to improve annotation efficiency and data utilization, which is particularly important in scenarios where labeling costs are high. However, the success of SSL-based ADA methods strongly depends on the reliability of pseudo labels. Incorrect pseudo labels may introduce noisy supervision into the training process and negatively affect adaptation performance. Therefore, most approaches emphasize confidence-aware pseudo-label selection or structure-consistency pseudo-label propagation to improve pseudo-label quality.

3.3. Class-Balanced Active Domain Adaptation

Active query strategies may inadvertently select samples predominantly from certain classes, resulting in class-imbalanced query sets. Such imbalance limits the representativeness of annotated samples and may hinder the model’s ability to learn the full distribution of the target domain. To address this issue, some mechanisms help construct more balanced query sets, enabling models to better capture the underlying structure of the target domain and improving adaptation performance. Existing class-balanced ADA methods employ two main mechanisms to achieve class-balanced sample selection.
One mechanism adjusts the query probability based on class frequency. These methods explicitly reduce the probability of selecting samples from classes that have already been frequently queried [21,22,53,63]. A similar mechanism is utilized in semantic segmentation by Dynamic Density-aware ADA (D2ADA), which queries more samples from categories exhibiting larger domain shifts while reducing queries for well-aligned categories [61].
Another mechanism improves class balance during pseudo-label generation. Instead of adjusting the query process, these methods generate additional pseudo labels for samples belonging to underrepresented classes or with low class frequency [52,80].
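The frequency-based reweighting mechanism can be sketched as follows; this is an illustrative sketch (the function name and inverse-frequency weighting are assumptions, not a specific published rule) that downscales acquisition scores for predicted classes that already dominate the query set.

```python
from collections import Counter

def class_balanced_scores(scores, pred_labels):
    """Class-balanced querying sketch: divide each sample's acquisition
    score by its predicted-class frequency, so samples from frequently
    selected classes are queried less often."""
    freq = Counter(pred_labels)
    return [s / freq[c] for s, c in zip(scores, pred_labels)]
```

Ranking by the rescaled scores then naturally favors underrepresented classes without hard class quotas.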

3.4. Multi-Fidelity Active Domain Adaptation

When the domain gap between source and target distributions is large, direct adaptation may be ineffective. Multi-fidelity active domain adaptation (MFADA) addresses this problem by introducing intermediate domains with varying query costs [83]. This framework progressively adapts the model from the source domain to the target domain while actively selecting informative samples from each intermediate domain, enabling efficient adaptation under limited annotation budgets.

4. Challenging Scenarios in Active Domain Adaptation

Despite the success of ADA, many real-world applications involve complex domain shifts that significantly degrade adaptation performance. These challenges arise when the relationship between source and target domains deviates from the standard assumption of shared label spaces and moderate distribution shifts. Major challenging scenarios in ADA include label distribution shift, open-set domain adaptation, universal domain adaptation, multi-source or multi-target domain adaptation, and cross-modality adaptation. These scenarios are summarized as follows:
  • Label distribution shift (Figure 3A): the class frequencies differ significantly between the source S and target T domains, leading to a change in the marginal label distribution P(y), i.e., P_S(y) ≠ P_T(y).
  • Open set domain adaptation (Figure 3B): the target domain contains classes that are absent in the source domain. If the source and target label spaces are denoted as Y_S and Y_T, respectively, the relationship becomes Y_S ⊂ Y_T, where the additional classes correspond to unknown categories Y_unknown = Y_T \ Y_S.
  • Universal domain adaptation (Figure 3C): the overlap between the source Y_S and target Y_T label spaces is unknown, requiring models to simultaneously handle shared and domain-private classes.
  • Multi-source and multi-target domain adaptation (Figure 3D and Figure 3E):
    Multi-source DA: models are trained on multiple source domains {S_1, ..., S_n} and adapted to a target domain T, where large distribution discrepancy and heterogeneity exist between source and target domains, P_{S_i} ≠ P_T, and among the source domains themselves, P_{S_i} ≠ P_{S_j}.
    Multi-target DA: models are adapted from a single source domain S to multiple target domains {T_1, ..., T_n}, where large distribution discrepancy and heterogeneity exist between source and target domains, P_S ≠ P_{T_i}, and among the target domains themselves, P_{T_i} ≠ P_{T_j}.
  • Cross-modality adaptation: source and target samples originate from different data modalities M_S ≠ M_T, resulting in substantial domain shifts between the modality-specific distributions P_S(X_{M_S}) ≠ P_T(X_{M_T}).
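For the first scenario, the magnitude of label distribution shift can be quantified by comparing empirical label distributions. The sketch below uses total-variation distance and is purely illustrative (the function name is an assumption); in practice target labels are unknown and would have to be estimated, e.g., from pseudo labels.

```python
from collections import Counter

def label_tv_distance(ys, yt):
    """Total-variation distance between the empirical label distributions of
    source labels ys and (estimated) target labels yt: 0 means identical
    marginals P_S(y) = P_T(y); 1 means completely disjoint supports."""
    ps, pt = Counter(ys), Counter(yt)
    classes = set(ps) | set(pt)
    return 0.5 * sum(abs(ps[c] / len(ys) - pt[c] / len(yt)) for c in classes)
```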

4.1. Label Distribution Shift

Label distribution shift arises when class frequencies differ between source and target domains, often biasing models toward dominant source classes and degrading performance on underrepresented target classes. To address this issue, ADA methods typically incorporate distribution-aware or class-balanced query strategies.
A representative class of works aligns source and target label distributions through distribution matching. For example, the LAbel distribution Matching through Density-aware Active sampling (LAMDA) framework selects a subset of target samples that approximates the global target distribution and then resamples source data to match this distribution, enabling more balanced supervised adaptation [84]. Low-confidence samples are annotated, while high-confidence ones are pseudo-labeled.
Another class of methods adopts class-aware querying, explicitly prioritizing underrepresented categories. These approaches estimate class frequencies or category-specific importance and allocate labeling budgets accordingly. For instance, category-aware ADA strategies select samples for individual classes based on their contributions to class-wise performance [85], while methods such as PCADA adjust pseudo-label generation for underrepresented classes [52]. Overall, these approaches mitigate bias by encouraging more uniform coverage of target classes.

4.2. Open-Set and Universal Domain Adaptation

In many real-world scenarios, source and target label spaces are only partially overlapping, leading to open-set or universal domain adaptation problems.
In open-set domain adaptation (OSDA), the target domain contains additional unknown classes that are not present in the source domain. Thus, models must be adapted to simultaneously recognize patterns from known classes and identify patterns from unknown ones. A common strategy is to leverage uncertainty estimation to identify samples likely belonging to unknown classes. Evidential approaches assign high uncertainty to samples that deviate from known-class evidence and prioritize them for annotation [37,39]. In source-free settings, OSDA methods employ clustering with an uncertainty threshold to separate target samples into common and novel groups, enabling the training of specialized classifiers for known and unknown classes [79].
Universal domain adaptation (UniDA) generalizes this setting by assuming no prior knowledge of label space overlap. Thus, it is unknown whether models are adapted to target samples from shared classes or target private classes [86]. ADA methods for UniDA typically combine domain alignment with active sampling, using uncertainty and diversity to identify both shared and private classes [87]. This allows models to simultaneously adapt to common categories while discovering new ones.

4.3. Multi-Source or Multi-Target Domain Adaptation

Real-world applications often involve multiple source or target domains, introducing additional heterogeneity.
In multi-target ADA, the goal is to adapt a single model across multiple target domains. Existing methods such as Multi-Target ADA (MT-ADA) address this by jointly aligning distributions across domains via decomposed discrimination while actively selecting informative samples using uncertainty and diversity criteria [88]. Prototype-based approaches such as Progressive Prototype Refinement (PPR) construct shared latent representations, aligning distributions to the shared prototypes and thus enabling consistent generalization across domains [89].
Conversely, multi-source ADA integrates knowledge from heterogeneous source domains. Recent approaches such as Multi-source Active Domain Adaptation (MS-ADA) learn domain-agnostic representations from multiple sources (e.g., via hyper-networks or shared prototypes) and then actively query target samples with high uncertainty to guide adaptation [38]. These strategies effectively unify multi-source knowledge before transferring it to the target domain.

4.4. Cross-Modality Adaptation

In some tasks, source and target data are collected from different sensing modalities, resulting in severe cross-modal domain shifts. Conventional unimodal ADA methods often struggle in this setting due to heterogeneous feature spaces. To address this, recent methods such as Curiosity-Driven Active Adaptation Network (CD-A2N) map multi-modal data into a shared latent space and design query strategies based on cross-modal discrepancy [90]. For example, samples whose embeddings deviate significantly from source distributions are prioritized for annotation. Such approaches explicitly account for modality gaps and provide a principled way to guide adaptation under extreme domain shifts.

5. Applications of Active Domain Adaptation

5.1. Active Domain Adaptation in Natural Images

5.1.1. Image Classification

The first work combining active learning and domain adaptation appeared in the Active Learning Domain Adaptation (ALDA) framework to train traditional classifiers (e.g., SVMs) [12]. It was applied to tasks such as adapting classifiers from virtual images to real-world images [91] and multimedia classification [92]. With the emergence of deep learning, ADA methods have increasingly focused on adapting deep neural networks for natural image classification, and they are evaluated on multiple benchmark datasets, including Office-31 [93], Office-Caltech10 [94], Office-Home [95], VisDA-2017 [96], LSDAC [97], Image-CLEF [98], and Adaptiope [99].
Thermal imagery is inherently more robust than RGB imaging under low-light conditions and is therefore widely used in scenarios with limited illumination. However, training models directly on thermal data is often impractical due to the scarcity of well-annotated datasets. In contrast, large-scale RGB image datasets are readily available, motivating the use of ADA to transfer knowledge from RGB to thermal domains. For example, a spectral transfer guided (STG) method adapts models trained on RGB images to thermal imagery by actively selecting informative thermal samples based on predictive uncertainty and feature diversity, thereby improving performance under cross-modality domain shifts [62].

5.1.2. Object Detection

Object detection differs from image classification in that informativeness must be evaluated at the region level, reflecting the structured and spatial nature of detection tasks. Accordingly, ADA methods for object detection focus on identifying informative regions or proposals rather than entire images. An existing agreement-based approach designs a multi-criteria query strategy that combines complementary signals, including prediction inconsistency (e.g., disagreement between source-trained and target-adapted detectors), domain representativeness, and feature diversity, to capture both uncertainty and distributional characteristics [56].
Beyond generic query strategies, some methods integrate ADA directly into modern detection architectures, such as YOLO [100], enabling tighter coupling between adaptation and detection pipelines. YOLO detection is extended with ADA in ADAID-YOLO, which improves the adaptation performance of YOLO by actively selecting samples with large pseudo-label discrepancies and region-level uncertainty for domain-adaptive training [75].
In addition, task-specific error patterns play a critical role in guiding sample selection. In particular, false negative (FN) errors are a major source of performance degradation under domain shift in object detection. Recent methods explicitly model such errors by estimating the likelihood of missed detections and prioritizing samples with high FN risk for annotation via a False Negative Prediction Module (FNPM) [58]. This task-aware querying strategy enables more targeted supervision, leading to improved detection performance after adaptation.

5.1.3. Semantic Segmentation

Semantic segmentation introduces unique challenges for ADA due to its dense, pixel-wise prediction nature, requiring query strategies that capture fine-grained spatial uncertainty and structural information. Therefore, existing methods primarily differ in how they define and evaluate informativeness across pixel-level, region-level, and hybrid representations.
At a fundamental level, many approaches rely on pixel-level uncertainty estimation, where informativeness is quantified using entropy or prediction disagreement, such as ADA-MCU [68] and SSDA [42]. While these strategies are computationally straightforward and capture local ambiguity, they often overlook spatial structure and semantic coherence across regions. To address this limitation, a second line of work focuses on region-level modeling, grouping neighboring pixels into semantic regions and evaluating their informativeness using combined measures of uncertainty and domain representativeness, such as Dynamically Balancing Domainness and Uncertainty (DBDU) [57]. This enables more structured sampling that better reflects object-level characteristics.
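The step from pixel-level to region-level informativeness can be sketched concretely. The code below is an illustrative sketch (function names and the mean-entropy aggregation are assumptions): it computes per-pixel predictive entropy from a grid of class-probability vectors and scores a rectangular region by its mean entropy.

```python
import math

def pixel_entropy(prob_map):
    """Per-pixel predictive entropy for an H x W grid of class-probability vectors."""
    return [[-sum(q * math.log(q) for q in px if q > 0) for px in row]
            for row in prob_map]

def region_score(ent, top, left, h, w):
    """Region-level informativeness sketch: mean pixel entropy inside a window."""
    vals = [ent[r][c] for r in range(top, top + h) for c in range(left, left + w)]
    return sum(vals) / len(vals)
```

Ranking candidate windows by `region_score` yields structured queries instead of isolated high-entropy pixels.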
Building on this, several methods adopt structure-aware metrics to improve region-level selection. In particular, region impurity has been widely adopted to quantify class heterogeneity within a region [101], providing a principled way to identify ambiguous or mixed semantic areas in ADA for semantic segmentation [32,72,74,77]. Such metrics are integrated with entropy-based uncertainty and diversity criteria in the dynamic weighting and boundary-aware (DWBA) method [72]. A boundary-based ADA (BADA) method further extends this with boundary-aware strategies, which prioritize regions near object boundaries where segmentation errors are most likely to occur [32].
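As a hedged illustration of the region-impurity idea, one simple instantiation measures the entropy of the predicted-class histogram inside a region; the function name and this exact formulation are assumptions for illustration, not the definition used by any particular cited method.

```python
import math
from collections import Counter

def region_impurity(pred_labels):
    """Region impurity sketch: entropy of the predicted-class histogram
    inside a region. A single-class region scores 0; mixed-class
    (semantically ambiguous) regions score higher."""
    flat = [c for row in pred_labels for c in row]
    n = len(flat)
    return -sum((k / n) * math.log(k / n) for k in Counter(flat).values())
```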
Beyond single-level designs, recent advances such as the Label Fusion with Prototype (LFP) method emphasize hybrid query strategies that jointly consider pixel-level and region-level informativeness, achieving a better balance between local uncertainty and global structure [77]. Additionally, alternative representations have been explored to define informative regions beyond prediction space, such as embedding-based methods (e.g., hyperbolic representations) in hyperbolic ADA [76] and distribution-aware criteria that measure domain density differences between source and target domains for underrepresented samples in D2ADA [61].
Finally, to address the complexity of segmentation data, some methods adopt multi-anchor representations to better capture diverse feature distributions. By representing domains with multiple class-wise prototypes rather than a single centroid, these approaches such as MADA and MADAv2 enable more effective selection of complementary samples, and further incorporate region- and pixel-level data augmentation in long-tail scenarios [66,69].

5.1.4. Multi-Tasks

Most ADA methods are designed for single-task settings, whereas recent efforts have begun to explore multi-task adaptation. For example, the label-agnostic active source-free domain adaptation framework (SALAD) jointly integrates active sampling with source-free adaptation across multiple tasks, including classification, detection, and segmentation [70]. Its query strategy combines uncertainty derived from the source model with the target model, enabling task-agnostic selection of informative samples for efficient multi-task adaptation.

5.1.5. Remote Sensing

Remote sensing is a key application domain for ADA due to substantial domain shifts caused by variations in sensors, acquisition conditions, and geographic regions. These challenges have motivated the development of query strategies that effectively capture both uncertainty and representativeness across heterogeneous data distributions in ADA methods.
Early ADA approaches in remote sensing primarily focused on traditional machine learning models, where informative samples were selected based on margin-based uncertainty [102], and further enhanced with diversity-aware sampling through clustering [40]. With the advancement of deep learning, more sophisticated strategies have been developed for both classification and segmentation tasks, emphasizing region-level entropy uncertainty estimation in Error-Aware ADA (EasySeg) [74] and feature-space similarity to ensure that selected samples are both informative and representative of target-domain distributions for land cover classification (LCC) [60].
In hyperspectral image classification, where each pixel contains rich spectral information, early ADA methods focused on adapting SVM-based classifiers using margin-based uncertainty strategies [103,104]. More recent deep learning approaches, such as the adversarial discriminative (ADADL) [67] and PCADA [52], further exploit feature-space structure by combining uncertainty estimation with diversity or distance-based criteria. This enables more effective selection of samples that capture complex spectral variations and domain discrepancies.

5.1.6. Vehicle Re-Identification

Vehicle re-identification (ReID) aims to match vehicle instances across cameras and environments, making it highly sensitive to domain shifts between datasets collected from different platforms. To address this challenge, ADA-based approaches employ uncertainty- and diversity-driven query strategies to select informative target samples for adaptation. For example, the two-stage active learning (TSAL) framework first constructs a candidate pool via simple sampling schemes and then refines selection by prioritizing samples with high perturbation-based uncertainty and feature diversity, enabling more effective adaptation across heterogeneous ReID datasets [73].

5.2. Active Domain Adaptation in Robotics

ADA plays a crucial role in robotics, where collecting large-scale real-world data is costly and often impractical, motivating sim-to-real adaptation. To address this, existing approaches leverage uncertainty- and diversity-driven query strategies to efficiently select informative real-world samples for adaptation. For instance, multi-view uncertainty and metadata-based diversity have been used to capture both perception ambiguity and contextual variability across environments in MetaMVUC [81]. Some methods, such as informative path planning (IPP) [30], integrate ADA with active data acquisition, enabling embodied agents to actively explore environments and collect high-value samples based on predictive uncertainty, thereby improving adaptation efficiency in dynamic settings.

5.3. Active Domain Adaptation in Medical Data Analysis

Medical data are inherently heterogeneous due to variations in imaging hardware, acquisition protocols, and patient populations, resulting in substantial domain shifts in appearance and feature distributions. Because deep learning models rely heavily on such low-level and structural cues, models trained on source domains often generalize poorly to new clinical environments. ADA addresses this limitation by selectively querying informative target samples, enabling label-efficient adaptation (Table 2).
A key challenge in medical ADA lies in data heterogeneity and representation. Medical data span multiple modalities, including volumetric imaging (CT/MR), whole-slide pathology images, physiological signals, and videos, requiring task-specific preprocessing (e.g., slices, patches, segments, or frames). Consequently, query strategies must be carefully designed to operate in the appropriate scenarios, balancing computational efficiency with clinical relevance.

5.3.1. Classification for Diagnosis

In diagnostic classification, ADA typically operates on image- or segment-level representations, where informativeness is evaluated in feature space. Existing methods such as ALFREDO largely follow a unified paradigm that evaluates entropy-aware uncertainty, clustering-based diversity, and domain discrepancy on feature representations to guide image sample selection [106]. Ultra-wide-field (UWF) fundus images are utilized to identify the stage of diabetic retinopathy, and an ASFDA method selects samples with high feature-space diversity to guide adaptation [107]. Ultrasound imaging is a 2D modality commonly utilized for the diagnosis of breast diseases, and the ADAptation framework integrates uncertainty and diversity sampling while leveraging diffusion-based reconstruction to generate source-style samples for feature discrepancy analysis [46].
Beyond imaging data, neurophysiological signals such as intracranial electroencephalogram (iEEG) exhibit large subject-to-subject shifts due to differences in human brain anatomy. Since full-sequence evaluation is computationally prohibitive, methods such as Neighborhood Uncertainty and Diversity (NUD) and ADAADL instead operate on short signal segments, applying similar uncertainty–diversity criteria [113,114].

5.3.2. Medical Image and Video Segmentation

5.3.2.1. CT and MR images.
Segmentation in medical imaging poses unique challenges due to dense pixel/voxel-level predictions and high annotation cost. Segmentation in CT and MR images is widely used for clinical applications such as tumor localization, organ delineation, and treatment planning, but suffers from large domain shifts across scanners and institutions. Early ADA methods predominantly adopt slice-level querying, selecting informative 2D slices using a single metric, such as domain discrepancy diversity in STDR and versatile ASFDA for nasopharyngeal carcinoma tumor MR segmentation [47,108]. Others employ hybrid strategies, such as entropy uncertainty and clustering diversity in UGTST for MR prostate segmentation [44], and domain discrepancy and Monte Carlo dropout uncertainty in Influence Points Learning (IPL) [111]; these are further incorporated into semi-supervised learning paradigms.
However, slice-based query strategies exhibit inherent limitations. First, they may prioritize slices with large foreground regions while ignoring those near boundary regions, so models cannot learn the full feature distribution of the target organ. Second, these strategies tend to select slices containing organs with dominant patterns, limiting the model's ability to learn small anatomical structures in multi-organ segmentation. Third, they query non-consecutive slices and thus fail to capture 3D anatomical relationships. To address these issues, recent advances focus on volume-level querying, where informativeness is evaluated over entire 3D scans. Methods such as ASFDA-ISOH [28] and STFR-PFCM [112] integrate uncertainty estimation, organ diversity, and cross-domain feature alignment to enable more holistic adaptation for pancreas and pancreatic tumor segmentation, particularly in cross-modality settings. This transition reflects an important trend from local to global sampling strategies in medical ADA.
5.3.2.2. Pathologic images.
Whole-slide pathology images introduce extreme resolution challenges, making full-image annotation infeasible. Therefore, ADA methods typically divide slides into smaller patches and operate on patch-level sampling. These methods select informative patches via similar metrics, such as foreground uncertainty in CUP for vessel segmentation [109] and cluster entropy for cervical cancer segmentation [105].
5.3.2.3. Medical videos.
For medical videos, ADA methods such as STAR typically perform frame-level selection, incorporating both spatial uncertainty and temporal diversity [115]. By accounting for temporal redundancy, these approaches reduce annotation cost while preserving critical dynamic information. Post-processing mechanisms (e.g., pseudo-label refinement) further improve robustness in semi-supervised settings by filtering noisy pseudo labels.

5.3.3. Multi-Task Medical Data Analysis

Beyond single-task settings, recent ADA frameworks explore multi-task adaptation, aiming to support tasks such as detection, localization, and segmentation within a unified framework. These methods such as Template Choosing Policy (TECP) [110] often rely on task-agnostic similarity or representativeness measures to guide sample selection, suggesting a shift toward more generalizable and scalable ADA paradigms in medical applications.

5.3.4. Active Domain Adaptation of VFM/VLM in Medical Data

Recent advances in vision foundation models (VFMs) and vision-language models (VLMs) have demonstrated strong generalization capabilities due to large-scale self-supervised pretraining. However, adapting these models to domain-specific medical tasks remains necessary. ADA and ASFDA play a crucial role by identifying a minimal set of informative target samples, measured via uncertainty and domain discrepancy, for efficiently fine-tuning 3D VFMs for CT and MR segmentation [116]. They may further be integrated with semi-supervised learning to improve fine-tuning efficiency by utilizing unlabeled samples. Recent approaches such as fairness-aware ADA incorporate fairness considerations into leveraging VLMs, ensuring balanced sampling across demographic groups [117]. These developments highlight an important direction: ADA is evolving from purely performance-driven adaptation toward trustworthy and equitable model deployment in clinical settings.

5.4. Active Domain Adaptation in Natural Language Processing

Models in natural language processing (NLP) often experience significant performance degradation when transferred from a source domain to a new target domain. This degradation is primarily caused by several types of domain shifts:
  • Vocabulary shift: distinct lexical expressions are used to describe the same underlying concepts, and the distribution of tokens or textual features differs across domains, leading models trained on source-domain vocabulary to encounter unfamiliar or differently distributed words in the target domain (Figure 4A). This shift can be expressed as P_S(x) ≠ P_T(x) with P_S(y|x) = P_T(y|x), where x denotes textual features such as words, subwords, or embeddings.
  • Context shift: contextual patterns that determine meaning vary across domains. Because NLP models rely heavily on contextual information to infer semantics, differences in contextual usage may cause incorrect predictions (Figure 4B). This shift is represented as P_S(y|x) ≠ P_T(y|x), where y denotes labels and x represents contextual features.
  • Label shift: the distribution of labels changes between domains, causing models trained on the source distribution to produce biased predictions in the target domain (Figure 4C). This shift can be written as P_S(y) ≠ P_T(y).
To mitigate these issues, ADA has been explored in several NLP tasks for efficient model adaptation. In word sense disambiguation, models must determine the correct meaning of ambiguous words based on context. Since word sense distributions can vary substantially across domains, models trained on a source corpus often misidentify the predominant sense in a new domain. An ADA approach addresses this label shift by querying highly confident target-domain predictions to efficiently adapt the model's sense distribution [118].
In dependency parsing, models trained on newswire text frequently perform poorly on other domains because of differences in syntactic structures and vocabulary usage. To address this context shift, an ADA strategy selects target-domain sentences with high perplexity, which indicate unfamiliar linguistic patterns and low predictive confidence, for annotation and model retraining [119].
Similarly, coreference resolution systems trained on newswire data often struggle in specialized domains such as biomedical literature due to domain-specific terminology and rare entity mentions. To handle this vocabulary shift, an ADA method queries instances with low predictive confidence for manual labeling to improve model adaptation [120].
In sentiment analysis, both vocabulary and contextual shifts frequently occur across domains because sentiment expressions vary widely across different domains, and usage patterns of the same word also vary largely. An active sentiment domain adaptation method addresses these shifts by selecting instances with high classification uncertainty for annotation, allowing the model to learn domain-specific sentiment patterns efficiently [121].

5.5. Active Domain Adaptation in Graph Learning

Graph domain adaptation is inherently challenging due to the complex topology and relational dependencies among nodes. Domain shifts may arise from variations in node features, connectivity patterns, and structural relationships, making direct transfer across graphs difficult [122]. To address this, ADA methods focus on selective node-level querying, where informative nodes are identified based on both predictive uncertainty and structural inconsistency. A common strategy is to leverage model disagreement and topological uncertainty to guide sampling. For example, the Dual Consistency Delving with Topological Uncertainty (DELTA) framework identifies candidate nodes by measuring prediction inconsistency and further refines selection using graph structure-aware uncertainty and domain discrepancy [122]. More broadly, these approaches highlight that effective graph ADA requires integrating feature-level and topology-aware criteria rather than relying on conventional uncertainty alone.
In real-world applications such as cross-city trip prediction, graph-based ADA methods such as Graph Embedding Network and Active Domain Adaptation (AGENDA) further incorporate hybrid query strategies that combine uncertainty, domain discrepancy, and diversity [45]. By actively selecting representative and informative samples, these frameworks enable robust adaptation across cities with differing spatial structures and human mobility patterns. Overall, graph ADA emphasizes the importance of structure-aware active sampling in relational domains.
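The structure-aware querying idea can be sketched as follows. This is an illustrative score in the spirit of topology-aware graph ADA, not the published DELTA or AGENDA algorithms; the mixing weight `alpha` and the L1 disagreement measure are assumptions:

```python
import numpy as np

# Illustrative node-querying score: combine predictive entropy with
# disagreement between a node's prediction and its graph neighbors'.
def entropy(probs):
    return -(probs * np.log(probs + 1e-12)).sum(axis=-1)

def neighbor_disagreement(probs, adjacency):
    """Mean L1 distance between each node's predictive distribution
    and those of its neighbors in the graph."""
    scores = np.zeros(len(probs))
    for i in range(len(probs)):
        nbrs = np.flatnonzero(adjacency[i])
        if len(nbrs) > 0:
            scores[i] = np.abs(probs[nbrs] - probs[i]).sum(axis=1).mean()
    return scores

def query_nodes(probs, adjacency, budget, alpha=0.5):
    """Select the top-`budget` nodes by a feature-plus-topology score."""
    score = alpha * entropy(probs) + (1 - alpha) * neighbor_disagreement(probs, adjacency)
    return np.argsort(-score)[:budget]
```

A node that is both uncertain and inconsistent with its neighborhood scores highest, reflecting the claim above that graph ADA benefits from combining feature-level and topology-aware criteria.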

5.6. Active Domain Adaptation in Science and Engineering

In science and engineering, ML/DL models are widely used for prediction and pattern recognition, yet their performance often degrades under environmental variability and system evolution. ADA provides a practical solution by adapting models with minimal labeled data, making it particularly valuable in scenarios where annotation is costly or infeasible (Table 3).
One representative application arises in electronic nose systems, where gas sensor responses used for gas recognition drift due to environmental variations and hardware aging over time, leading to distribution shifts between historical and newly collected measurements. To address this issue, a domain-adaptation-based active ensemble learning (DAEL) framework employs a query-by-committee strategy to adapt gas classification models across drifted sensor distributions [20].
Geological analysis, such as logging lithology identification, utilizes models to infer rock types from well-logging data, but models trained on one well often generalize poorly to another because logging signals vary across geological environments and measurement conditions. Here, ADA methods such as active adaptation for logging lithology identification (AALLI) rely on query-by-committee uncertainty sampling for active labeling, and are combined with reliable pseudo-labeling in semi-supervised learning [19].
In industrial safety assessment, classification models must detect system damage under changing operational conditions and previously unseen scenarios. These shifts make it difficult for models trained on known operating modes to generalize to new environments. ADA frameworks such as adversarial weighted active domain adaptation (AWADA) integrate adversarial learning and distribution alignment with active querying to handle unseen conditions and evolving system behaviors [123]. Similar challenges arise in Industrial Internet of Things (IIoT) security, where intrusion detection models must operate across diverse and dynamic network environments. The presence of noisy, incomplete, or scarce labeled data further complicates cross-domain generalization; a dual active domain adaptation approach addresses this by combining active sampling with robust domain adaptation techniques to improve cross-environment generalization [124].

6. Active Continual Learning

In many real-world applications, data arrive sequentially and their distributions evolve over time. Models trained on earlier data often fail to generalize to newly observed domains, while continuous retraining may cause catastrophic forgetting, where previously acquired knowledge is overwritten by new learning. ACL addresses this challenge by integrating AL and continual learning (CL). Specifically, ACL methods employ AL strategies to query informative samples from newly observed data streams and utilize CL techniques to retain knowledge learned from earlier domains (Figure 5, Table 4).

6.1. Problem Formulation

ACL considers a sequence of domains D = {D_1, D_2, ..., D_T}. The initial domain D_1 contains N_1 labeled samples (X_1, Y_1) used to train a model F(Θ). At each subsequent task i, a new domain D_i provides unlabeled data X_i. ACL employs a query strategy π(·) to select a small subset of informative samples X_i^l from X_i for annotation. The model is then updated with these labeled samples (X_i^l, Y_i^l), continuously adapting to the new domains. CL mechanisms ensure that the model preserves performance on previously encountered domains {D_1, ..., D_{i-1}}.
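The formulation above can be summarized as a schematic loop. Here `query`, `train`, `consolidate`, and the per-domain `oracle` annotator are placeholders for a concrete query strategy π(·), model update, CL mechanism, and labeling process respectively:

```python
# Schematic sketch of the ACL loop; the callables are placeholders for
# concrete components, not any specific published method.
def active_continual_learning(model, domains, query, train, consolidate, budget):
    labeled_x, labeled_y = domains[0]              # D_1 arrives fully labeled
    model = train(model, labeled_x, labeled_y)
    for unlabeled_x, oracle in domains[1:]:        # D_2, ..., D_T arrive unlabeled
        idx = query(model, unlabeled_x, budget)    # pi(.) selects X_i^l
        x_l = [unlabeled_x[i] for i in idx]
        y_l = [oracle(x) for x in x_l]             # annotate the queried subset
        model = train(model, x_l, y_l)             # adapt to D_i
        model = consolidate(model)                 # CL step: mitigate forgetting
    return model
```

The separation between `query` and `consolidate` mirrors the two components surveyed below: query strategies (Section 6.2) and forgetting-mitigation mechanisms (Section 6.3).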

6.2. Query Strategies in Active Continual Learning

6.2.1. Uncertainty-Based Strategies

Uncertainty-based querying is one of the most widely adopted strategies in ACL, as uncertain samples are often the most informative for adapting to evolving data distributions. Existing methods employ various uncertainty measures, including margin-based uncertainty in Online Self-Adaptive Mirror Descent (OSAMD) [125], predictive entropy in active source-free batch normalization adaptation (ASFBNA) [14], predictive confidence in Continual Uncertainty-aware Active Learner (CUAL) and active continual learning approach for quality monitoring (ACL-QM) [126,127], and query-by-committee in Few-Shot Continual Active Learning (FoCAL) and Dynamic Active Learning (DAL) [128,129]. In addition, CUAL combines uncertainty-driven querying with pseudo-labeling of high-confidence samples, facilitating semi-supervised adaptation [126]. Overall, uncertainty-based methods emphasize rapid adaptation to distribution shifts, though they may be sensitive to model miscalibration in non-stationary environments.
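The uncertainty measures named above can be computed from softmax outputs as follows; this is a generic sketch, and each cited framework applies its measure inside its own training pipeline:

```python
import numpy as np

# Generic uncertainty scores over softmax probabilities; higher = more uncertain.
def confidence_score(probs):
    """Low maximum probability => high uncertainty (as in CUAL / ACL-QM)."""
    return 1.0 - probs.max(axis=-1)

def margin_score(probs):
    """Small gap between the top-2 classes => high uncertainty (as in OSAMD)."""
    top2 = np.sort(probs, axis=-1)[..., -2:]
    return 1.0 - (top2[..., 1] - top2[..., 0])

def entropy_score(probs):
    """Predictive entropy (as in ASFBNA)."""
    return -(probs * np.log(probs + 1e-12)).sum(axis=-1)
```

All three agree that a uniform prediction is maximally uncertain, but they rank intermediate cases differently, which is one reason the empirical comparisons in Section 6.2.5 find strategy effectiveness to be scenario-dependent.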

6.2.2. Diversity-Based Strategies

In contrast, diversity-based strategies aim to ensure that selected samples are representative of the evolving data distribution, thereby improving coverage of newly emerging patterns. Methods such as CASA and CASAv2 typically rely on clustering or feature-space grouping to partition streaming data into pseudo-domains and select representative samples from each group [130,131]. Such strategies are particularly effective in scenarios with significant domain heterogeneity, as they promote balanced sampling across different data modes rather than focusing solely on difficult samples.
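A minimal sketch of this clustering-based selection follows, using a toy k-means to form groups (stand-ins for pseudo-domains) and annotating the sample nearest each centroid. This is illustrative only, not the CASA/CASAv2 procedure:

```python
import numpy as np

# Toy k-means plus nearest-to-centroid selection for diversity-based querying.
def kmeans(X, k, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        assign = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        centers = np.array([X[assign == j].mean(axis=0) if (assign == j).any()
                            else centers[j] for j in range(k)])
    return centers, assign

def representative_samples(X, k):
    """One representative index per cluster: the member closest to its centroid."""
    centers, assign = kmeans(X, k)
    picks = []
    for j in range(k):
        members = np.flatnonzero(assign == j)
        if len(members) > 0:
            dists = ((X[members] - centers[j]) ** 2).sum(-1)
            picks.append(int(members[np.argmin(dists)]))
    return picks
```

Because each cluster contributes one sample regardless of its difficulty, the selection stays balanced across data modes, which is the property the paragraph above attributes to diversity-based strategies.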

6.2.3. Hybrid Strategies

Hybrid approaches combine uncertainty and diversity to balance informativeness and representativeness. Typically, uncertainty is first used to identify candidate samples, followed by diversity-based selection to reduce redundancy. The online active continual learning (OACL) framework follows this design, combining confidence-based uncertainty with class discrepancy to support lifelong robotic object recognition [132]. This sequential design improves sampling efficiency while maintaining broad coverage of the data distribution. Recent methods such as Energy Alignment Sampling Strategy (EASS) further incorporate alternative uncertainty measures, such as energy-based scores, alongside feature-space diversity to enhance robustness [133]. These hybrid strategies reflect a general trend toward multi-criteria query mechanisms in ACL.
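The two-stage design can be sketched as below: shortlist by uncertainty, then spread selections in feature space by greedy farthest-point selection. This illustrates the general pattern rather than the exact OACL or EASS algorithms; the `shortlist` size and squared Euclidean metric are assumptions:

```python
import numpy as np

# Two-stage hybrid query: uncertainty filter, then greedy diversity selection.
def hybrid_query(features, uncertainty, budget, shortlist=10):
    cand = np.argsort(-uncertainty)[:shortlist]    # stage 1: most uncertain first
    chosen = [int(cand[0])]
    while len(chosen) < budget:                    # stage 2: reduce redundancy
        dists = ((features[cand][:, None] - features[chosen]) ** 2).sum(-1)
        farthest = int(np.argmax(dists.min(axis=1)))   # farthest from chosen set
        chosen.append(int(cand[farthest]))
    return chosen
```

In the test below, the two most uncertain samples are near-duplicates, and the diversity stage skips the redundant one in favor of a more distant candidate, which is exactly the redundancy reduction described above.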

6.2.4. ACL-Specific Query Strategies

Unlike conventional ADA, ACL must also account for catastrophic forgetting in streaming settings. Therefore, several methods design query strategies that explicitly consider the interaction between new learning and prior knowledge. For example, Accumulated informativeness-based Active Continual Learning (AccuACL) uses Fisher information to estimate the impact of each sample on model parameters, prioritizing samples that improve adaptation while minimizing interference with previously learned tasks [134]. Other methods, such as the active continual learning approach with Motion Planning Networks (ACL-MPNet), integrate episodic memory mechanisms, where actively selected samples populate memory buffers for rehearsal-based continual learning [135]. These strategies highlight the need for task-aware and memory-aware querying in ACL.

6.2.5. Empirical Analyses of Query Strategies

Several works systematically evaluated query strategies under different ACL scenarios. Meta-Continual Active Learning (Meta-CAL) evaluates uncertainty-based, diversity-based, representation-based, and random sampling across domain-, class-, and task-incremental learning scenarios [136]. Similarly, Balancing knowledge retention and learnability (BKRL) investigates the interaction between AL and CL strategies and shows that uncertainty sampling performs best in domain-incremental settings, while diversity-based sampling is more effective when new classes are introduced [137]. These findings underscore that the effectiveness of query strategies in ACL is scenario-dependent, and no single strategy is universally optimal.

6.3. Mitigating Catastrophic Forgetting in Active Continual Learning

CL techniques play a critical role in ACL by preventing catastrophic forgetting during sequential adaptation. Existing CL methods can generally be categorized into three groups.

6.3.1. Replay-Based Methods

Replay-based (memory-based or rehearsal-based) methods maintain a memory buffer of previously observed samples or features and periodically retrain the model using these stored examples. Representative techniques include Experience Replay [139], Pseudo-Rehearsal, and Gradient Episodic Memory [140]. Many ACL frameworks adopt replay mechanisms, including CUAL [126], ACL-MPNet [135], FoCAL [128], CASA [130], CASAv2 [131], OACL [132], DAL [129], ACL-QM [127], RBACA [138], and EASS [133].
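A common building block of such frameworks is a bounded memory that stays statistically uniform over the stream. The reservoir-sampling buffer below is a generic sketch, not the buffer policy of any particular method listed above:

```python
import random

# Minimal reservoir-sampling replay buffer for rehearsal-based ACL sketches.
class ReplayBuffer:
    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.data = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, sample):
        """Reservoir sampling keeps a uniform random subset of the stream."""
        self.seen += 1
        if len(self.data) < self.capacity:
            self.data.append(sample)
        else:
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.data[j] = sample

    def replay_batch(self, n):
        """Stored samples to mix into the next training batch."""
        return self.rng.sample(self.data, min(n, len(self.data)))
```

During adaptation to a new domain, each training batch is augmented with `replay_batch(...)` samples from earlier domains, which is the mechanism by which replay counteracts forgetting.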

6.3.2. Regularization-Based Methods

Regularization-based approaches identify parameters that are important for previously learned tasks and penalize changes to these parameters during adaptation. Representative methods include Elastic Weight Consolidation [141] and Synaptic Intelligence [142]. Some ACL approaches incorporate similar principles; for example, AccuACL uses Fisher information to constrain parameter updates [134], while OSAMD employs knowledge distillation to preserve previous knowledge [125].
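The EWC-style penalty can be written compactly: changes to parameters are weighted by their (approximate) Fisher importance, so parameters critical to earlier tasks resist being overwritten. A minimal numerical sketch, with a diagonal Fisher matrix assumed precomputed:

```python
import numpy as np

# EWC-style quadratic consolidation penalty and its gradient contribution.
def ewc_penalty(params, old_params, fisher, lam=1.0):
    """Penalty added to the new-task loss; large for important parameters."""
    return 0.5 * lam * np.sum(fisher * (params - old_params) ** 2)

def penalized_gradient(task_grad, params, old_params, fisher, lam=1.0):
    """Gradient of the new-task loss plus the consolidation term."""
    return task_grad + lam * fisher * (params - old_params)
```

In the test below, the parameter with high Fisher importance is the one whose drift dominates both the penalty and the corrective gradient, illustrating how important parameters are pulled back toward their previous-task values.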

6.3.3. Parameter-Isolation Methods

Parameter-isolation techniques allocate task-specific parameters to prevent interference between tasks. ASFBNA, for instance, adapts models by updating task-specific batch normalization parameters while freezing the remaining network weights [14].
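A simplified sketch of this batch-normalization-based isolation keeps per-task normalization statistics while the (implicit) backbone stays untouched. The real ASFBNA method also adapts the BN affine parameters, which are omitted here for brevity:

```python
import numpy as np

# Per-task batch-normalization statistics as a parameter-isolation sketch:
# adapting to a new task touches only that task's (mean, var) entry.
class TaskBatchNorm:
    def __init__(self):
        self.stats = {}  # task_id -> (mean, var) estimated on that task's data

    def adapt(self, task_id, batch):
        """Store task-specific statistics; no backbone weights are modified."""
        self.stats[task_id] = (batch.mean(axis=0), batch.var(axis=0))

    def normalize(self, task_id, x, eps=1e-5):
        mean, var = self.stats[task_id]
        return (x - mean) / np.sqrt(var + eps)
```

Because each task owns its statistics, adapting to a new domain cannot overwrite the normalization behavior of earlier tasks, which is the interference-prevention property this family of methods relies on.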
Additionally, several studies have benchmarked different CL strategies across domain-, class-, and task-incremental scenarios, such as Elastic Weight Consolidation and Experience Replay in BKRL [137] and replay-based and regularization-based strategies in Meta-CAL [136], highlighting the importance of selecting appropriate CL techniques for different ACL settings.

6.4. Applications of Active Continual Learning

6.4.1. Medical Data Analysis

In medical imaging, data streams from new scanners or acquisition protocols can introduce domain shifts over time. The CASA and CASAv2 frameworks monitor incoming medical images to detect distribution changes and actively select informative samples for annotation, enabling continual adaptation across scanner domains [130,131]. Similarly, the RBACA method integrates rehearsal-based continual learning with active sample selection for medical image segmentation and diagnosis [138].

6.4.2. Robotics

Robotic systems operate in dynamic environments and must continuously learn from new observations while retaining prior knowledge. ACL-MPNet applies memory-guided active sampling and gradient episodic memory to adapt motion planning networks using streaming data [135]. In addition, DAL and OACL enable continual adaptation in prosthetic control and robotic object recognition by selecting informative samples from evolving environments [129,132].

7. Challenges and Future Directions

This survey reviewed recent advances in ADA and ACL, two complementary paradigms for handling distribution shifts across domains and over time. By actively querying informative samples, these approaches improve adaptation performance while reducing annotation cost. ADA focuses on mitigating discrepancies between labeled source data and unlabeled target data through selective sampling, while ACL extends this paradigm to sequential settings, enabling models to adapt to evolving domains while mitigating catastrophic forgetting. Despite notable progress, several challenges remain that limit performance.
A key challenge in ADA lies in designing robust query strategies under severe domain and label distribution shifts in source-free settings. Most existing methods inherit designs from standard ADA and rely heavily on target-domain structural cues, such as feature distributions, clustering structures, or neighborhood relationships. However, these signals are often derived from source-trained models and may become unreliable when domain gaps are large. As a result, both uncertainty estimation and diversity-based sampling can be significantly degraded. Future research should therefore focus on developing shift-robust querying mechanisms, potentially by incorporating more reliable cross-domain representations, adaptive uncertainty calibration, or auxiliary knowledge beyond source-induced features.
Another open challenge arises from the integration of semi-supervised learning with ADA, particularly in ensuring the reliability of pseudo labels. Naively assigning pseudo labels to all target samples risks introducing noisy supervision, while overly conservative selection limits the benefits of leveraging unlabeled data. This highlights a fundamental trade-off between label quantity and label quality. To address this issue, future work should explore more principled pseudo-labeling strategies, such as confidence-aware selection, prototype-based alignment, graph-based label propagation, and consistency regularization, to improve both the accuracy and utility of pseudo labels in domain adaptation.
A key challenge in ACL lies in jointly optimizing sample selection and knowledge retention under non-stationary data streams. Existing approaches often treat AL query strategies and CL mechanisms as loosely coupled components, prioritizing either rapid sequential adaptation or knowledge stability without explicitly modeling their interaction. This separation can lead to suboptimal trade-offs, where aggressively selecting informative samples accelerates adaptation but exacerbates catastrophic forgetting, while conservative CL strategies may hinder learning of new distributions. To address these issues, future research should focus on designing unified, task-aware query strategies that explicitly account for both informativeness and forgetting risk. Developing adaptive query strategies enables ACL methods to dynamically balance exploration of new knowledge and retention of past knowledge. In addition, establishing streaming benchmarks and evaluation protocols will be essential for systematically assessing how different AL–CL integrations perform under realistic continual learning scenarios.

References

  1. Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. Imagenet large scale visual recognition challenge. Int. J. Comput. Vis. 2015, 115, 211–252. [Google Scholar] [CrossRef]
  2. Han, K.; Wang, Y.; Chen, H.; Chen, X.; Guo, J.; Liu, Z.; Tang, Y.; Xiao, A.; Xu, C.; Xu, Y.; et al. A survey on vision transformer. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 87–110. [Google Scholar] [CrossRef]
  3. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30. [Google Scholar]
  4. Jawahar, G.; Sagot, B.; Seddah, D. What does BERT learn about the structure of language? In Proceedings of the Proceedings of the 57th annual meeting of the association for computational linguistics, 2019; pp. 3651–3657. [Google Scholar]
  5. Goh, G.B.; Hodas, N.O.; Vishnu, A. Deep learning for computational chemistry. J. Comput. Chem. 2017, 38, 1291–1307. [Google Scholar] [CrossRef]
  6. Khalil, R.A.; Saeed, N.; Masood, M.; Fard, Y.M.; Alouini, M.S.; Al-Naffouri, T.Y. Deep learning in the industrial internet of things: Potentials, challenges, and emerging applications. IEEE Internet Things J. 2021, 8, 11016–11040. [Google Scholar] [CrossRef]
  7. Luo, Y.; Zheng, L.; Guan, T.; Yu, J.; Yang, Y. Taking a closer look at domain shift: Category-level adversaries for semantics consistent domain adaptation. In Proceedings of the Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019; pp. 2507–2516. [Google Scholar]
  8. Kouw, W.M.; Loog, M. A review of domain adaptation without target labels. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 43, 766–785. [Google Scholar] [CrossRef] [PubMed]
  9. Zhao, H.; Des Combes, R.T.; Zhang, K.; Gordon, G. On learning invariant representations for domain adaptation. In Proceedings of the International conference on machine learning. PMLR, 2019; pp. 7523–7532. [Google Scholar]
  10. Huang, S.J.; Jin, R.; Zhou, Z.H. Active learning by querying informative and representative examples. Adv. Neural Inf. Process. Syst. 2010, 23. [Google Scholar] [CrossRef]
  11. Ren, P.; Xiao, Y.; Chang, X.; Huang, P.Y.; Li, Z.; Gupta, B.B.; Chen, X.; Wang, X. A survey of deep active learning. ACM Comput. Surv. (CSUR) 2021, 54, 1–40. [Google Scholar] [CrossRef]
  12. Saha, A.; Rai, P.; Daumé, H., III; Venkatasubramanian, S.; DuVall, S.L. Active supervised domain adaptation. In Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 2011; Springer; pp. 97–112. [Google Scholar]
  13. Su, J.C.; Tsai, Y.H.; Sohn, K.; Liu, B.; Maji, S.; Chandraker, M. Active adversarial domain adaptation. In Proceedings of the Proceedings of the IEEE/CVF winter conference on applications of computer vision, 2020; pp. 739–748. [Google Scholar]
  14. Machireddy, A.; Krishnan, R.; Ahuja, N.; Tickoo, O. Continual active adaptation to evolving distributional shifts. In Proceedings of the Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022; pp. 3444–3450. [Google Scholar]
  15. Cai, Z.; Sener, O.; Koltun, V. Online continual learning with natural distribution shifts: An empirical study with visual data. In Proceedings of the Proceedings of the IEEE/CVF international conference on computer vision, 2021; pp. 8281–8290. [Google Scholar]
  16. Wang, D.; Shang, Y. A new active labeling method for deep learning. In Proceedings of the 2014 International joint conference on neural networks (IJCNN); IEEE, 2014; pp. 112–119. [Google Scholar]
  17. Li, M.; Sethi, I.K. Confidence-based active learning. IEEE Trans. Pattern Anal. Mach. Intell. 2006, 28, 1251–1261. [Google Scholar] [CrossRef]
  18. Fu, B.; Cao, Z.; Wang, J.; Long, M. Transferable query selection for active domain adaptation. In Proceedings of the Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2021; pp. 7272–7281. [Google Scholar]
  19. Chang, J.; Kang, Y.; Zheng, W.X.; Cao, Y.; Li, Z.; Lv, W.; Wang, X.M. Active domain adaptation with application to intelligent logging lithology identification. IEEE Trans. Cybern. 2021, 52, 8073–8087. [Google Scholar] [CrossRef]
  20. Yan, J.; Sun, R.; Liu, T.; Duan, S. Domain-adaptation-based active ensemble learning for improving chemical sensor array performance. Sens. Actuators A Phys. 2023, 357, 114411. [Google Scholar] [CrossRef]
  21. Sun, T.; Lu, C.; Ling, H. Local context-aware active domain adaptation. In Proceedings of the Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023; pp. 18634–18643. [Google Scholar]
  22. Tian, Q.; Li, Y.; Yu, J.; Shen, J.; Ou, W. Rethinking Active Domain Adaptation: Balancing Uncertainty and Diversity. Image Vis. Comput. 2025, 158, 105492. [Google Scholar] [CrossRef]
  23. Wang, F.; Han, Z.; Zhang, Z.; He, R.; Yin, Y. Mhpl: Minimum happy points learning for active source free domain adaptation. In Proceedings of the Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023; pp. 20008–20018. [Google Scholar]
  24. Sun, Y.; Shi, G.; Dong, W.; Li, X.; Dong, L.; Xie, X. Local Uncertainty Energy Transfer for Active Domain Adaptation. IEEE Transactions on Image Processing, 2025. [Google Scholar]
  25. Rangwani, H.; Jain, A.; Aithal, S.K.; Babu, R.V. S3vaada: Submodular subset selection for virtual adversarial active domain adaptation. In Proceedings of the Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021; pp. 7516–7525. [Google Scholar]
  26. Li, S.; Zhang, R.; Gong, K.; Xie, M.; Ma, W.; Gao, G. Source-free active domain adaptation via augmentation-based sample query and progressive model adaptation. IEEE Transactions on Neural Networks and Learning Systems, 2023. [Google Scholar]
  27. Tian, Q.; Zhang, H. Feature mixing and self-training for source-free active domain adaptation. Comput. Electr. Eng. 2023, 111, 108966. [Google Scholar] [CrossRef]
  28. Yang, J.; Yu, X.; Qiu, P.; Marcus, D.; Sotiras, A. Active Source-Free Cross-Domain and Cross-Modality Adaptation for Volumetric Medical Image Segmentation by Image Sensitivity and Organ Heterogeneity Sampling. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, 2025; Springer; pp. 3–12. [Google Scholar]
  29. Nguyen, A.; Yosinski, J.; Clune, J. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In Proceedings of the Proceedings of the IEEE conference on computer vision and pattern recognition, 2015; pp. 427–436. [Google Scholar]
  30. Zurbrügg, R.; Blum, H.; Cadena, C.; Siegwart, R.; Schmid, L. Embodied active domain adaptation for semantic segmentation via informative path planning. IEEE Robot. Autom. Lett. 2022, 7, 8691–8698. [Google Scholar] [CrossRef]
  31. Huang, D.; Li, J.; Chen, W.; Huang, J.; Chai, Z.; Li, G. Divide and adapt: Active domain adaptation via customized learning. In Proceedings of the Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2023; pp. 7651–7660. [Google Scholar]
  32. Xu, X.; Yen, G.G.; Zhao, C.; Sun, Q.; Ren, W.; Sheng, L.; Tang, Y. Boundary-Based Active Domain Adaptation for Semantic Segmentation Under Adverse Conditions. IEEE Transactions on Neural Networks and Learning Systems, 2025. [Google Scholar]
  33. Xie, B.; Yuan, L.; Li, S.; Liu, C.H.; Cheng, X.; Wang, G. Active learning for domain adaptation: An energy-based approach. Proc. Proc. AAAI Conf. Artif. Intell. 2022, Vol. 36, 8708–8716. [Google Scholar] [CrossRef]
  34. Li, X.; Du, Z.; Li, J.; Zhu, L.; Lu, K. Source-free active domain adaptation via energy-based locality preserving transfer. In Proceedings of the Proceedings of the 30th ACM international conference on multimedia, 2022; pp. 5802–5810. [Google Scholar]
  35. Du, Z.; Li, J. Diffusion-based probabilistic uncertainty estimation for active domain adaptation. Adv. Neural Inf. Process. Syst. 2023, 36, 17129–17155. [Google Scholar]
  36. Xie, M.; Li, S.; Zhang, R.; Liu, C.H. Dirichlet-based Uncertainty Calibration for Active Domain Adaptation. In Proceedings of the The Eleventh International Conference on Learning Representations, 2023. [Google Scholar]
  37. Bao, W.; Yu, Q.; Kong, Y. Evidential deep learning for open set action recognition. In Proceedings of the Proceedings of the IEEE/CVF international conference on computer vision, 2021; pp. 13349–13358. [Google Scholar]
  38. Zhang, W.; Lv, Z.; Zhou, H.; Liu, J.W.; Li, J.; Li, M.; Li, Y.; Zhang, D.; Zhuang, Y.; Tang, S. Revisiting the domain shift and sample uncertainty in multi-source active domain transfer. In Proceedings of the Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024; pp. 16751–16761. [Google Scholar]
  39. Tian, Q.; Yu, J.; Zhao, Y.; Li, W.; Lei, Z. Evidential Deep Learning for Open-Set Active Domain Adaptation. IEEE Transactions on Neural Networks and Learning Systems, 2025. [Google Scholar]
  40. Persello, C. Interactive domain adaptation for the classification of remote sensing images using active learning. IEEE Geosci. Remote Sens. Lett. 2012, 10, 736–740. [Google Scholar] [CrossRef]
  41. Bouvier, V.; Very, P.; Chastagnol, C.; Tami, M.; Hudelot, C. Stochastic Adversarial Gradient Embedding for Active Domain Adaptation. In Proceedings of the ECML, Bilbao, Spain, 2021. [Google Scholar]
  42. Wen, L.; Xu, Y.; Feng, Z.; Zhou, J.; Zhou, L.; Wang, Y. Semi-supervised domain adaptation for semantic segmentation via active learning with feature-and semantic-level alignments. IEEE Transactions on Intelligent Vehicles, 2024. [Google Scholar]
  43. Sun, Z.; Lin, L.; Yu, Y. You only label once: A self-adaptive clustering-based method for source-free active domain adaptation. IET Image Process. 2024, 18, 1268–1282. [Google Scholar]
  44. Luo, Z.; Luo, X.; Gao, Z.; Wang, G. An uncertainty-guided tiered self-training framework for active source-free domain adaptation in prostate segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, 2024; Springer; pp. 107–117. [Google Scholar]
  45. Liao, C.; Chen, C.; Zhang, W.; Guo, S.; Liu, C. AGENDA: Predicting Trip Purposes with A New Graph Embedding Network and Active Domain Adaptation. ACM Trans. Knowl. Discov. From Data 2024, 18, 1–25. [Google Scholar] [CrossRef]
  46. Duan, Y.; Huang, Y.; Yang, X.; Han, L.; Xie, X.; Zhu, Z.; He, P.; Chan, K.H.; Cui, L.; Im, S.K.; et al. ADAptation: Reconstruction-Based Unsupervised Active Learning for Breast Ultrasound Diagnosis. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, 2025; Springer; pp. 35–45. [Google Scholar]
  47. Wang, H.; Zhang, S.; Chen, J.; He, Y.; Xu, J.; Huang, H.; Xiao, J.; Li, L.; Liao, W.; Zhang, S.; et al. Versatile Source-Free Active Domain Adaptation for multi-center and multi-rater medical image segmentation. Inf. Fusion 2025, 103586. [Google Scholar] [CrossRef]
  48. Sener, O.; Savarese, S. Active Learning for Convolutional Neural Networks: A Core-Set Approach. In Proceedings of the International Conference on Learning Representations, 2018. [Google Scholar]
  49. Liu, S.; Jiang, Z.; Li, Y.; Peng, J.; Wang, Y.; Lin, W. Density matters: improved core-set for active domain adaptive segmentation. Proc. Proc. AAAI Conf. Artif. Intell. 2024, Vol. 38, 13999–14007. [Google Scholar] [CrossRef]
  50. He, J.; Liu, B.; Yin, G. Enhancing semi-supervised domain adaptation via effective target labeling. Proc. Proc. AAAI Conf. Artif. Intell. 2024, Vol. 38, 12385–12393. [Google Scholar] [CrossRef]
  51. Zhu, J.; Chen, X.; Hu, Q.; Xiao, Y.; Wang, B.; Sheng, B.; Chen, C.P. Clustering environment aware learning for active domain adaptation. IEEE Trans. Syst. Man. Cybern. Syst. 2024, 54, 3891–3904. [Google Scholar] [CrossRef]
  52. Luo, H.; Zhong, S.; Gong, C. Prototype-Guided Class-Balanced Active Domain Adaptation for Hyperspectral Image Classification. IEEE Transactions on Geoscience and Remote Sensing, 2025. [Google Scholar]
  53. Lyu, M.; Hao, T.; Xu, X.; Chen, H.; Lin, Z.; Han, J.; Ding, G. Learn from the learnt: Source-free active domain adaptation via contrastive sampling and visual persistence. In Proceedings of the European Conference on Computer Vision, 2024; Springer; pp. 228–246. [Google Scholar]
  54. Zhou, F.; Shui, C.; Yang, S.; Huang, B.; Wang, B.; Chaib-draa, B. Discriminative active learning for domain adaptation. Knowl.-Based Syst. 2021, 222, 106986. [Google Scholar] [CrossRef]
  55. Han, K.; Kim, Y.; Han, D.; Lee, H.; Hong, S. TL-ADA: Transferable loss-based active domain adaptation. Neural Netw. 2023, 161, 670–681. [Google Scholar] [CrossRef]
  56. Menke, M.; Wenzel, T.; Schwung, A. Bridging the gap: Active learning for efficient domain adaptation in object detection. Expert Syst. With Appl. 2024, 254, 124403. [Google Scholar] [CrossRef]
  57. Zhang, S.; Zhang, L.; Liu, Z. Active domain adaptation for semantic segmentation via dynamically balancing domainness and uncertainty. Image Vis. Comput. 2024, 148, 105132. [Google Scholar] [CrossRef]
  58. Nakamura, Y.; Ishii, Y.; Yamashita, T. Active domain adaptation with false negative prediction for object detection. In Proceedings of the Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024; pp. 28782–28792. [Google Scholar]
  59. mathelin, A.D.; Deheeger, F.; MOUGEOT, M.; Vayatis, N. Discrepancy-Based Active Learning for Domain Adaptation. In Proceedings of the International Conference on Learning Representations, 2022. [Google Scholar]
  60. Kalita, I.; Kumar, R.N.S.; Roy, M. Deep learning-based cross-sensor domain adaptation under active learning for land cover classification. IEEE Geosci. Remote Sens. Lett. 2021, 19, 1–5. [Google Scholar] [CrossRef]
  61. Wu, T.H.; Liou, Y.S.; Yuan, S.J.; Lee, H.Y.; Chen, T.I.; Huang, K.C.; Hsu, W.H. D 2 ada: Dynamic density-aware active domain adaptation for semantic segmentation. In Proceedings of the European Conference on Computer Vision, 2022; Springer; pp. 449–467. [Google Scholar]
  62. Ustun, B.; Kaya, A.K.; Ayerden, E.C.; Altinel, F. Spectral transfer guided active domain adaptation for thermal imagery. In Proceedings of the Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023; pp. 449–458. [Google Scholar]
  63. Tian, Q.; Pan, J.; Yang, Y.; Ou, W. Dual-Focus Memory Contrastive Learning for Active Domain Adaptation. Neural Netw. 2025, 108224. [Google Scholar] [CrossRef]
  64. Xie, M.; Li, Y.; Wang, Y.; Luo, Z.; Gan, Z.; Sun, Z.; Chi, M.; Wang, C.; Wang, P. Learning distinctive margin toward active domain adaptation. In Proceedings of the Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022; pp. 7993–8002. [Google Scholar]
  65. Prabhu, V.; Chandrasekaran, A.; Saenko, K.; Hoffman, J. Active domain adaptation via clustering uncertainty-weighted embeddings. In Proceedings of the Proceedings of the IEEE/CVF international conference on computer vision, 2021; pp. 8505–8514. [Google Scholar]
  66. Ning, M.; Lu, D.; Wei, D.; Bian, C.; Yuan, C.; Yu, S.; Ma, K.; Zheng, Y. Multi-anchor active domain adaptation for semantic segmentation. In Proceedings of the Proceedings of the IEEE/CVF international conference on computer vision, 2021; pp. 9112–9122. [Google Scholar]
  67. Saboori, A.; Ghassemian, H. Adversarial discriminative active Deep Learning for domain adaptation in hyperspectral images classification. Int. J. Remote Sens. 2021, 42, 3981–4003. [Google Scholar] [CrossRef]
  68. Zhang, H.; Zhang, R. Active domain adaptation with multi-level contrastive units for semantic segmentation. In Proceedings of the Proceedings of the Asian Conference on Computer Vision, 2022; pp. 1640–1657. [Google Scholar]
  69. Ning, M.; Lu, D.; Xie, Y.; Chen, D.; Wei, D.; Zheng, Y.; Tian, Y.; Yan, S.; Yuan, L. MADAv2: Advanced multi-anchor based active domain adaptation segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 13553–13566. [Google Scholar] [CrossRef]
  70. Kothandaraman, D.; Shekhar, S.; Sancheti, A.; Ghuhan, M.; Shukla, T.; Manocha, D. Salad: Source-free active label-agnostic domain adaptation for classification, segmentation and detection. In Proceedings of the Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2023; pp. 382–391. [Google Scholar]
  71. Tian, Q.; Zhou, L.; Zhu, Y.; Kang, L. Active domain adaptation with mining diverse knowledge: An updated class consensus dictionary approach. Inf. Sci. 2024, 667, 120485. [Google Scholar] [CrossRef]
72. Guan, L.; Yuan, X. Dynamic weighting and boundary-aware active domain adaptation for semantic segmentation in autonomous driving environment. IEEE Trans. Intell. Transp. Syst. 2024. [Google Scholar]
  73. Shang, L.; Zhao, D.; Nie, Y.; Zhao, K.; Xiao, L.; Dai, B. A Two-Stage Active Domain Adaptation Framework for Vehicle Re-Identification. In Proceedings of the Chinese Conference on Pattern Recognition and Computer Vision (PRCV), 2024; Springer; pp. 380–394. [Google Scholar]
  74. Yang, L.; Chen, H.; Yang, A.; Li, J. EasySeg: An error-aware domain adaptation framework for remote sensing imagery semantic segmentation via interactive learning and active learning. IEEE Trans. Geosci. Remote Sens. 2024, 62, 1–18. [Google Scholar] [CrossRef]
75. Han, F.; Ye, P.; Duan, S.; Wang, L. Ada-iD: Active Domain Adaptation for Intrusion Detection. In Proceedings of the 32nd ACM International Conference on Multimedia, 2024; pp. 7404–7413. [Google Scholar]
  76. Franco, L.; Mandica, P.; Kallidromitis, K.; Guillory, D.; Li, Y.T.; Darrell, T.; Galasso, F. Hyperbolic Active Learning for Semantic Segmentation under Domain Shift. In Proceedings of the Forty-first International Conference on Machine Learning, 2024. [Google Scholar]
  77. Peng, J.; Sun, M.; Lim, E.G.; Wang, Q.; Xiao, J. Prototype Guided Pseudo Labeling and Perturbation-based Active Learning for domain adaptive semantic segmentation. Pattern Recognit. 2024, 148, 110203. [Google Scholar] [CrossRef]
  78. Ouyang, J.; Zhang, Z.; Meng, Q.; Chi, J. Structure-Based Uncertainty Estimation for Source-Free Active Domain Adaptation. IET Comput. Vis. 2025, 19, e70020. [Google Scholar] [CrossRef]
  79. Wang, F.; Han, Z.; Sun, H.; Yin, Y. Active source-free open-set domain adaptation. Knowl.-Based Syst. 2025, 114342. [Google Scholar] [CrossRef]
  80. Safaei, B.; Vibashan, V.; Patel, V.M. Certainty and uncertainty guided active domain adaptation. In Proceedings of the 2025 IEEE International Conference on Image Processing (ICIP); IEEE, 2025; pp. 2342–2347. [Google Scholar]
81. Gilles, M.; Furmans, K.; Rayyes, R. Metamvuc: Active learning for sample-efficient sim-to-real domain adaptation in robotic grasping. IEEE Robot. Autom. Lett. 2025. [Google Scholar]
  82. Wang, F.; Han, Z.; Yin, Y. BIAS: Bridging Inactive and Active Samples for active source free domain adaptation. Knowl.-Based Syst. 2024, 284, 111151. [Google Scholar]
  83. Sagawa, S.; Hino, H. Cost-effective framework for gradual domain adaptation with multifidelity. Neural Netw. 2023, 164, 731–741. [Google Scholar] [CrossRef]
  84. Hwang, S.; Lee, S.; Kim, S.; Ok, J.; Kwak, S. Combating label distribution shift for active domain adaptation. In Proceedings of the European Conference on Computer Vision, 2022; Springer; pp. 549–566. [Google Scholar]
  85. Xiao, W.; Gu, J.; Liu, H. Category-aware active domain adaptation. In Proceedings of the Forty-first International Conference on Machine Learning, 2024. [Google Scholar]
86. You, K.; Long, M.; Cao, Z.; Wang, J.; Jordan, M.I. Universal domain adaptation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019; pp. 2720–2729. [Google Scholar]
87. Ma, X.; Gao, J.; Xu, C. Active universal domain adaptation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021; pp. 8968–8977. [Google Scholar]
88. Zhang, L.; Xu, L.; Motamed, S.; Chakraborty, S.; De la Torre, F. D3GU: Multi-target active domain adaptation via enhancing domain alignment. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024; pp. 2577–2586. [Google Scholar]
89. Zhu, Y.; Ai, J.; Wu, L.; Guo, D.; Jia, W.; Hong, R. An active multi-target domain adaptation strategy: Progressive class prototype rectification. IEEE Trans. Multimed. 2024. [Google Scholar]
90. Yao, X.; Peng, X.; Gao, J.; Yuan, Z.; Wu, X.; Xu, C. Active Cross-Modal Domain Adaptation. IEEE Trans. Multimed. 2025. [Google Scholar]
  91. Vázquez, D.; López, A.; Ponsa, D.; Marin, J. Cool world: domain adaptation of virtual and real worlds for human detection using active learning. In Proceedings of the Advances in Neural Information Processing Systems–Workshop on Domain Adaptation: Theory and Applications, 2011. [Google Scholar]
  92. Liu, G.; Yan, Y.; Subramanian, R.; Song, J.; Lu, G.; Sebe, N. Active domain adaptation with noisy labels for multimedia analysis. World Wide Web 2016, 19, 199–215. [Google Scholar] [CrossRef]
  93. Saenko, K.; Kulis, B.; Fritz, M.; Darrell, T. Adapting visual category models to new domains. In Proceedings of the European conference on computer vision, 2010; Springer; pp. 213–226. [Google Scholar]
  94. Gong, B.; Shi, Y.; Sha, F.; Grauman, K. Geodesic flow kernel for unsupervised domain adaptation. In Proceedings of the 2012 IEEE conference on computer vision and pattern recognition. IEEE, 2012; pp. 2066–2073. [Google Scholar]
95. Venkateswara, H.; Eusebio, J.; Chakraborty, S.; Panchanathan, S. Deep hashing network for unsupervised domain adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017; pp. 5018–5027. [Google Scholar]
  96. Peng, X.; Usman, B.; Kaushik, N.; Hoffman, J.; Wang, D.; Saenko, K. Visda: The visual domain adaptation challenge. arXiv 2017, arXiv:1710.06924. [Google Scholar] [CrossRef]
97. Peng, X.; Bai, Q.; Xia, X.; Huang, Z.; Saenko, K.; Wang, B. Moment matching for multi-source domain adaptation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019; pp. 1406–1415. [Google Scholar]
  98. Caputo, B.; Müller, H.; Martinez-Gomez, J.; Villegas, M.; Acar, B.; Patricia, N.; Marvasti, N.; Üsküdarlı, S.; Paredes, R.; Cazorla, M.; et al. Imageclef 2014: Overview and analysis of the results. In Proceedings of the International conference of the cross-language evaluation forum for European languages, 2014; Springer; pp. 192–211. [Google Scholar]
99. Ringwald, T.; Stiefelhagen, R. Adaptiope: A modern benchmark for unsupervised domain adaptation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2021; pp. 101–110. [Google Scholar]
  100. Jiang, P.; Ergu, D.; Liu, F.; Cai, Y.; Ma, B. A Review of Yolo algorithm developments. Procedia Comput. Sci. 2022, 199, 1066–1073. [Google Scholar] [CrossRef]
101. Xie, B.; Yuan, L.; Li, S.; Liu, C.H.; Cheng, X. Towards fewer annotations: Active learning via region impurity and prediction uncertainty for domain adaptive semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022; pp. 8068–8078. [Google Scholar]
  102. Persello, C.; Bruzzone, L. A novel active learning strategy for domain adaptation in the classification of remote sensing images. In Proceedings of the 2011 IEEE International Geoscience and Remote Sensing Symposium. IEEE, 2011; pp. 3720–3723. [Google Scholar]
  103. Matasci, G.; Tuia, D.; Kanevski, M. SVM-based boosting of active learning strategies for efficient domain adaptation. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2012, 5, 1335–1343. [Google Scholar] [CrossRef]
  104. Deng, C.; Liu, X.; Li, C.; Tao, D. Active multi-kernel domain adaptation for hyperspectral image classification. Pattern Recognit. 2018, 77, 306–315. [Google Scholar] [CrossRef]
  105. Liu, X.; Araki, K.; Harada, S.; Yoshizawa, A.; Terada, K.; Kurata, M.; Nakajima, N.; Abe, H.; Ushiku, T.; Bise, R. Cluster entropy: Active domain adaptation in pathological image segmentation. In Proceedings of the 2023 IEEE 20th International Symposium on Biomedical Imaging (ISBI); IEEE, 2023; pp. 1–5. [Google Scholar]
106. Mahapatra, D.; Tennakoon, R.; George, Y.; Roy, S.; Bozorgtabar, B.; Ge, Z.; Reyes, M. ALFREDO: Active Learning with FeatuRe disEntanglement and DOmain adaptation for medical image classification. Med. Image Anal. 2024, 97, 103261. [Google Scholar] [CrossRef] [PubMed]
  107. Ran, J.; Zhang, G.; Xia, F.; Zhang, X.; Xie, J.; Zhang, H. Source-free active domain adaptation for diabetic retinopathy grading based on ultra-wide-field fundus images. Comput. Biol. Med. 2024, 174, 108418. [Google Scholar] [CrossRef]
108. Wang, H.; Chen, J.; Zhang, S.; He, Y.; Xu, J.; Wu, M.; He, J.; Liao, W.; Luo, X. Dual-reference source-free active domain adaptation for nasopharyngeal carcinoma tumor segmentation across multiple hospitals. IEEE Trans. Med. Imaging 2024. [Google Scholar]
  109. Wang, H.; Luo, X.; Chen, W.; Tang, Q.; Xin, M.; Wang, Q.; Zhu, L. Advancing uwf-slo vessel segmentation with source-free active domain adaptation and a novel multi-center dataset. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, 2024; Springer; pp. 75–85. [Google Scholar]
  110. Quan, Q.; Yao, Q.; Zhu, H.; Wang, Q.; Zhou, S.K. Which images to label for few-shot medical image analysis? Med. Image Anal. 2024, 96, 103200. [Google Scholar] [CrossRef]
111. Chen, Y.; Luo, X.; Chen, R.; Li, Y.; Zhang, H.; Lyu, H.; Song, H.; Li, K. Source-Free Active Domain Adaptation via Influential-Points-Guided Progressive Teacher for Medical Image Segmentation. IEEE Trans. Med. Imaging 2025. [Google Scholar]
112. Qin, C.; Wang, Y.; Zeng, F.; Zhang, J.; Cao, Y.; Yin, X.; Huang, S.; Chen, D.; Zhang, H.; Ju, Z. Active Domain Adaptation Based on Probabilistic Fuzzy c-means Clustering for Pancreatic Tumor Segmentation. IEEE Trans. Fuzzy Syst. 2025. [Google Scholar]
  113. Wang, K.; Yang, M.; Liu, A.; Li, C.; Qian, R.; Chen, X. Active source-free domain adaptation for intracranial EEG classification via neighborhood uncertainty and diversity. Biomed. Signal Process. Control 2025, 104, 107464. [Google Scholar] [CrossRef]
  114. Ghasemigarjan, R.; Mikaeili, M.; Setarehdan, S.K.; Saboori, A. Enhancing EEG-based sleep staging efficiency with minimal channels through adversarial domain adaptation and active deep learning. J. Neural Eng. 2025, 22, 046043. [Google Scholar] [CrossRef]
  115. Li, J.; Wang, H.; Wang, W.; Qin, J.; Wang, Q.; Zhu, L. Source-Free Active Domain Adaptation for Efficient Medical Video Polyp Segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, 2025; Springer; pp. 499–509. [Google Scholar]
  116. Yang, J.; Marcus, D.S.; Sotiras, A. Adapting Medical Vision Foundation Models for Volumetric Medical Image Segmentation via Active Learning and Selective Semi-supervised Fine-tuning. arXiv 2025, arXiv:2509.10784. [Google Scholar] [CrossRef]
117. Wang, H.; Chen, W.; Luo, X.; Xing, Z.; Liu, L.; Qin, J.; Wu, S.; Zhu, L. Toward Fair and Accurate Cross-Domain Medical Image Segmentation: A VLM-Driven Active Domain Adaptation Paradigm. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025; pp. 24102–24112. [Google Scholar]
118. Chan, Y.S.; Ng, H.T. Domain adaptation with active learning for word sense disambiguation. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, 2007; pp. 49–56. [Google Scholar]
  119. Attardi, G.; Simi, M.; Zanelli, A. Domain adaptation by active learning. In Proceedings of the International Workshop on Evaluation of Natural Language and Speech Tool for Italian, 2012; Springer; pp. 77–85. [Google Scholar]
120. Zhao, S.; Ng, H.T. Domain adaptation with active learning for coreference resolution. In Proceedings of the 5th International Workshop on Health Text Mining and Information Analysis (Louhi), 2014; pp. 21–29. [Google Scholar]
121. Wu, F.; Huang, Y.; Yan, J. Active sentiment domain adaptation. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 2017; Volume 1, pp. 1701–1711. [Google Scholar]
  122. Wang, P.; Cao, Y.; Russell, C.; Shen, Y.; Luo, J.; Zhang, M.; Heng, S.; Luo, X. DELTA: Dual Consistency Delving with Topological Uncertainty for Active Graph Domain Adaptation. Transactions on Machine Learning Research, 2025. [Google Scholar]
123. Liu, C.; He, X. Adversarial Weighted Active Domain Adaptation for Safety Assessment in Open Environments. IEEE Trans. Ind. Inform. 2024. [Google Scholar]
124. Ma, W.; Lan, X.; Liu, R.; Wang, J.; Zhou, Q. A Dual Active Domain Adaptation Approach with Loss Prediction for IIoT Intrusion Detection under Imperfect Samples. IEEE Internet Things J. 2025. [Google Scholar]
  125. Zhou, S.; Zhao, H.; Zhang, S.; Wang, L.; Chang, H.; Wang, Z.; Zhu, W. Online continual adaptation with active self-training. In Proceedings of the International conference on artificial intelligence and statistics. PMLR, 2022; pp. 8852–8883. [Google Scholar]
  126. Rios, A.S.; Ndiour, I.J.; Sydir, J.; Datta, P.; Tickoo, O.; Ahuja, N. CUAL: Continual Uncertainty-aware Active Learner. In Proceedings of the NeurIPS 2024 Workshop on Scalable Continual Learning for Lifelong Foundation Models, 2024. [Google Scholar]
  127. Bauer, J.C.; Trattnig, S.; Geng, P.; Raffin, T.; Daub, R. A continual active learning approach to adapt neural networks to distribution shifts in quality monitoring applications. Int. J. Adv. Manuf. Technol. 2025, 1–17. [Google Scholar] [CrossRef]
  128. Ayub, A.; Fendley, C. Few-shot continual active learning by a robot. Adv. Neural Inf. Process. Syst. 2022, 35, 30612–30624. [Google Scholar]
  129. Johnson, C.; Maldonado-Contreras, J.; Young, A. Accelerating constrained continual learning with dynamic active learning: A study in adaptive speed estimation for lower-limb prostheses. In Proceedings of the 2024 International Symposium on Medical Robotics (ISMR); IEEE, 2024; pp. 1–8. [Google Scholar]
  130. Perkonigg, M.; Hofmanninger, J.; Langs, G. Continual active learning for efficient adaptation of machine learning models to changing image acquisition. In Proceedings of the International Conference on Information Processing in Medical Imaging, 2021; Springer; pp. 649–660. [Google Scholar]
  131. Perkonigg, M.; Hofmanninger, J.; Herold, C.; Prosch, H.; Langs, G. Continual Active Learning Using Pseudo-Domains for Limited Labelling Resources and Changing Acquisition Characteristics. Mach. Learn. Biomed. Imaging 2022, 1, 1–28. [Google Scholar] [CrossRef]
132. Nie, X.; Deng, Z.; He, M.; Fan, M.; Tang, Z. Online active continual learning for robotic lifelong object recognition. IEEE Trans. Neural Netw. Learn. Syst. 2023. [Google Scholar]
  133. Zhang, X.; Loo, C.K.; Chuah, J.H. Active continual learning with Energy Alignment Sampling Strategy (EASS) for structural damage classification. Appl. Intell. 2025, 55, 886. [Google Scholar] [CrossRef]
  134. Park, J.; Park, D.; Lee, J.G. Active Learning for Continual Learning: Keeping the Past Alive in the Present. In Proceedings of the The Thirteenth International Conference on Learning Representations, 2025. [Google Scholar]
  135. Qureshi, A.H.; Miao, Y.; Yip, M.C. Active continual learning for planning and navigation. In Proceedings of the ICML 2020 Workshop on Real World Experiment Design and Active Learning, 2020. [Google Scholar]
  136. Ho, S.; Liu, M.; Gao, S.; Gao, L. Learning to learn for few-shot continual active learning. Artif. Intell. Rev. 2024, 57, 280. [Google Scholar] [CrossRef]
  137. Vu, T.T.; Khadivi, S.; Ghorbanali, M.; Phung, D.; Haffari, G. Active continual learning: On balancing knowledge retention and learnability. In Proceedings of the Australasian Joint Conference on Artificial Intelligence, 2024; Springer; pp. 137–150. [Google Scholar]
  138. Daniel, R.; Verdelho, M.R.; Barata, C.; Santiago, C. Continual Deep Active Learning for Medical Imaging: Replay-Based Architecture for Context Adaptation. In Proceedings of the Iberian Conference on Pattern Recognition and Image Analysis, 2025; Springer; pp. 108–121. [Google Scholar]
  139. Rolnick, D.; Ahuja, A.; Schwarz, J.; Lillicrap, T.; Wayne, G. Experience replay for continual learning. Adv. Neural Inf. Process. Syst. 2019, 32. [Google Scholar]
  140. Lopez-Paz, D.; Ranzato, M. Gradient episodic memory for continual learning. Adv. Neural Inf. Process. Syst. 2017, 30. [Google Scholar]
  141. Kirkpatrick, J.; Pascanu, R.; Rabinowitz, N.; Veness, J.; Desjardins, G.; Rusu, A.A.; Milan, K.; Quan, J.; Ramalho, T.; Grabska-Barwinska, A.; et al. Overcoming catastrophic forgetting in neural networks. Proc. Natl. Acad. Sci. 2017, 114, 3521–3526. [Google Scholar] [CrossRef]
  142. Zenke, F.; Poole, B.; Ganguli, S. Continual learning through synaptic intelligence. In Proceedings of the International conference on machine learning. PMLR, 2017; pp. 3987–3995. [Google Scholar]
Figure 1. The organization of our survey. We systematically review active domain adaptation in four aspects: (1) Active Domain Adaptation and Query Strategies, covering various advanced query strategies; (2) Emerging Learning Paradigms; (3) Challenging Scenarios in Active Domain Adaptation; and (4) Applications. We also review Active Continual Learning. Finally, we discuss challenges and future directions.
Figure 2. The paradigms of (A) active domain adaptation; (B) active source-free domain adaptation; (C) active domain adaptation integrated with semi-supervised learning; and (D) active source-free domain adaptation integrated with semi-supervised learning.
Figure 3. The paradigms of (A) label distribution shift; (B) open set domain adaptation; (C) universal domain adaptation; (D) multi-source domain adaptation; and (E) multi-target domain adaptation.
Figure 4. Examples of (A) Vocabulary shift, (B) Context shift, and (C) Label shift.
Figure 5. The paradigm of active continual learning.
Table 2. The summary of ADA and ASFDA methods in medical data analysis.
Table 3. The summary of applications of ADA methods in graph learning, science, and engineering.
Table 4. The summary of active continual learning methods.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.