Preprint
Article

This version is not peer-reviewed.

Semi-Supervised Traffic Sign Detection with Dynamic Pseudo-Label Selection and Gated-Feature-Fusion-Based Proposal Refinement

Submitted: 29 April 2026
Posted: 30 April 2026


Abstract
Accurate traffic sign detection is important for the safety of autonomous driving systems. However, fully supervised methods require a large amount of manual annotation, which is cost-prohibitive and time-consuming. Semi-supervised methods employ a small amount of labeled data and a large amount of unlabeled data to train the models, hence largely reducing the annotation costs. However, these methods have the following challenges: (1) with an imbalanced long-tail class distribution of traffic signs, they tend to achieve poor performance on tail classes; (2) they often fail to detect small traffic signs. To solve these issues, we propose a Semi-Supervised Traffic Sign Detection method with Dynamic Pseudo-Label Selection and Gated-Feature-Fusion-based Proposal Refinement. Firstly, we design a Class-Distribution-based Dynamic Pseudo-Label Selection module (CD-DPLS) to select pseudo-labels for different classes based on the class distribution information, which reduces the tendency to select more pseudo-labels from head classes instead of tail classes, thereby improving the tail class detection performance. Secondly, we employ a Gated-Feature-Fusion-based Proposal Refinement strategy (GFF-PR) to refine detection proposals by fusing different-scale features with a gating mechanism, which facilitates the detection of small traffic signs. Besides, we use an Adaptive-Weight Focal Loss (AWFL), with which the weight of each pseudo-label is determined by the ratio between its classification confidence and the corresponding class-specific classification-confidence threshold. Experiments on traffic sign datasets demonstrate that the proposed method outperforms state-of-the-art semi-supervised approaches, with mAP50 scores of 11.5% and 36.3% using only 1% and 10% labeled data, respectively.

1. Introduction

Traffic signs provide essential visual guidance to drivers by conveying regulatory, warning, and informational messages in the road environment. In real-world driving scenarios, failing to correctly detect traffic signs can significantly increase the risk of traffic accidents. With the continuous advancement of autonomous driving and Advanced Driver Assistance Systems (ADAS), automatic and accurate traffic sign detection has become increasingly vital for driving safety [1].
In recent years, deep-learning-based object detection methods have achieved great breakthroughs, largely due to the availability of large-scale annotated datasets [2,3,4,5,6,7]. These fully supervised methods have demonstrated excellent performance across various research domains. However, they heavily rely on extensive manual annotation, which is inherently labor-intensive and cost-prohibitive, especially for traffic sign detection tasks that require high-precision bounding boxes for small targets.
To reduce the dependence on annotated data, researchers have turned to semi-supervised learning methods [8,9,10,11], which rely on a small amount of labeled data and a large amount of unlabeled data. By exploiting information from the unlabeled data, semi-supervised learning can achieve accurate and robust detection results while significantly lowering the dependence on expensive annotations, particularly when labeled data is hard to obtain. The semi-supervised learning methods mainly fall into two categories: Consistency training and Pseudo-labeling [12,13,14,15,16,17,18,19,20,21,22].
Consistency training typically constructs a regularization term to enforce prediction invariance across different perturbations, thereby smoothing the decision boundary. However, this mechanism inherently depends on the reliability of supervision signals. If the underlying pseudo-labels are inaccurate due to class imbalance, consistency regularization may inadvertently reinforce incorrect predictions.
Existing pseudo-labeling methods in traffic scenes usually rely on a single classification-confidence threshold for all object classes. However, this simple strategy ignores the difference in classification confidences of object classes, which is due to the imbalanced class distribution. As shown in Figure 1, the traffic sign dataset exhibits imbalance in instance counts of different classes, where the classes on the left appear much more frequently than those on the right. This is characterized by a long-tail, imbalanced class distribution. Here we take the 18 classes with more than 500 instances as “head” classes and the remaining 67 classes with fewer than 500 instances as “tail” classes. Since the model mainly learns from the head classes, it tends to predict these classes with higher classification confidences. In contrast, due to insufficient training, the model tends to predict tail classes with lower classification confidences. Therefore, a fixed threshold will filter out many potential positive samples from tail classes and treat them as background, which will lead to a severe loss of useful supervision information from tail classes.
To address these challenges, we propose a semi-supervised traffic sign detection method with dynamic pseudo-label selection and Gated-Feature-Fusion-based Proposal Refinement. Built on YOLOv11, the proposed method introduces class-aware pseudo-label selection, gated cross-scale feature fusion, and adaptive loss weighting to reduce the bias toward selecting pseudo-labels from head classes over tail classes and to improve the detection of small traffic signs. The main contributions of this paper are summarized as follows:
(1) We propose a Class-Distribution-based Dynamic Pseudo-Label Selection module (CD-DPLS). Instead of using a single global classification-confidence threshold for all classes, the CD-DPLS assigns different classification-confidence thresholds to different traffic sign classes to avoid retaining more pseudo-labels for head classes while filtering out tail classes. To make the class distribution estimation more reliable under limited labeled data, we combine two kinds of class distributions: the class distribution from labeled data and the class distribution provided by CLIP on unlabeled data. Thus, the proposed method can adjust class-specific classification-confidence thresholds dynamically to improve the detection performance on tail classes.
(2) We propose a Gated-Feature-Fusion-based Proposal Refinement strategy (GFF-PR). To avoid the missed detections of small traffic signs, the GFF-PR fuses different-scale features through a gating mechanism, and uses the fused features to refine object proposals. Specifically, the GFF-PR uses two types of proposals based on the fused feature pyramid for training. The first type of proposals consists of those whose fused confidence (computed as the product of classification confidence and localization confidence) exceeds a dynamic positive threshold determined from the mean and standard deviation of fused confidences in the current batch. The second type of proposals consists of those whose fused confidence lies between the dynamic positive threshold and the fixed negative threshold of 0.1. These proposals are retained only when the confidence gain between the fused feature pyramid and original feature pyramid exceeds 0.2. This strategy thereby improves the detection performance of small traffic signs.
(3) We propose an Adaptive-Weight Focal Loss (AWFL). Existing methods usually treat all pseudo-labels above the classification-confidence threshold equally, while the AWFL assigns each pseudo-label an adaptive weight based on the ratio between its classification confidence and the class-specific classification-confidence threshold. As a result, pseudo-labels, which are much higher than the aforementioned threshold, can get larger weights. This mitigates the issues caused by the imbalanced class distributions.

3. Methods

3.1. Overview of Our Method

The semi-supervised traffic sign detection method aims to address the limitations of existing methods in imbalanced class distributions and small object detection. The overall framework is illustrated in Figure 2. The proposed method consists of three main components: the Class-Distribution-based Dynamic Pseudo-Label Selection module, the Gated-Feature-Fusion-based Proposal Refinement strategy, and the Adaptive-Weight Focal Loss. The efficient single-stage fully convolutional network, YOLOv11, is adopted as the baseline detector.
We divide the training data into a small set of labeled data $D_L = \{x_i^l, y_i^l\}_{i=1}^{N_l}$ and a large set of unlabeled data $D_U = \{x_i^u\}_{i=1}^{N_u}$, typically satisfying $N_u \gg N_l$.
The framework consists of a teacher model and a student model with identical structures. The student model updates its parameters via backpropagation, while the teacher model’s parameters are updated using the Exponential Moving Average (EMA) of the student model’s parameters to ensure stability in pseudo-label generation.
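The EMA update of the teacher's parameters can be sketched as follows. This is a minimal illustration over a plain dictionary of scalar parameters, not the actual PyTorch state-dict handling:

```python
def ema_update(teacher_params, student_params, alpha=0.999):
    """Update teacher parameters as an exponential moving average of the
    student's: t <- alpha * t + (1 - alpha) * s, for each parameter."""
    return {name: alpha * t + (1.0 - alpha) * student_params[name]
            for name, t in teacher_params.items()}
```

A large `alpha` (0.999 in the experiments) means the teacher changes slowly, which stabilizes the pseudo-labels it generates.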

3.2. Class-Distribution-Based Dynamic Pseudo-Label Selection

Existing pseudo-label selection methods mainly rely on fixed confidence estimation to filter unreliable predictions. Although such strategies can improve the quality of pseudo-labels to some extent, they are still insufficient for traffic sign detection under class imbalance. In particular, teacher models tend to produce higher classification confidences for head classes, while the classification confidences of tail classes are lower due to limited training samples. Therefore, using a fixed global threshold or a simple Top-k strategy will suppress the tail classes, introducing pseudo-label noise for head classes. This issue may accumulate during iterative semi-supervised training, further exacerbating the model’s preference for head classes.
More importantly, using fixed classification-confidence thresholds alone cannot explicitly regulate the distribution of pseudo-labels. As a result, even after low-quality pseudo-labels are filtered out, the generated pseudo-labels may still remain imbalanced across classes, which prevents the detector from learning tail classes. To address this issue, we propose the Class-Distribution-based Dynamic Pseudo-Label Selection (CD-DPLS) module, as illustrated in Figure 3. Instead of applying fixed classification-confidence thresholds to all classes, the CD-DPLS exploits the class distribution derived from labeled data and unlabeled data to dynamically adjust the classification-confidence threshold for each class. The CD-DPLS improves the detection performance on tail classes.

3.2.1. Theoretical Basis and Distribution Estimation

The effectiveness of class distribution probabilities is grounded in a statistical observation: in semi-supervised learning settings, even when the proportion of labeled data is small, the empirical class distribution of the labeled data can still approximate the class distribution characteristics of the entire dataset.
Assuming the labeled dataset $D_L$ and the unlabeled dataset $D_U$ are independent and identically distributed, according to the generalization of Hoeffding's inequality, for any class $k$, the deviation between the estimated class distribution probability $\gamma_k$ and the true distribution $\gamma_k^*$ satisfies the following probability bound:

$$\left| \gamma_k - \gamma_k^* \right| \le \sqrt{\frac{\log(2n)}{2n}} + \sqrt{\frac{\log(2m)}{2m}}$$

where $n$ and $m$ denote the numbers of labeled and unlabeled data, respectively. As the number of labeled samples $n$ increases, this error converges at a rate of $O_p(1/\sqrt{n})$.
However, this distribution estimation approach based on labeled data still has clear limitations in long-tailed semi-supervised scenarios. On the one hand, when the labeling ratio is low, the number of samples from tail classes in the labeled set is often limited. As a result, although the overall error bound decreases as $n$ grows, the estimated class probabilities for different classes may still fluctuate. On the other hand, the unlabeled data is usually more abundant than the labeled data and thus contains richer information. Relying only on the labeled set makes it difficult to fully capture the latent distribution characteristics of the entire training dataset. Therefore, we calibrate the dataset's class distribution by combining the class distribution probabilities derived from labeled data with those from unlabeled data estimated by a pre-trained CLIP, ultimately obtaining more reliable class distribution probabilities for the subsequent dynamic selection process.

3.2.2. CLIP-Based Class Distribution Estimation

Since class labels in traffic sign detection tasks are often abbreviations (e.g., “pl50”, “p23”) that lack semantic information, directly applying a pre-trained CLIP for zero-shot inference is challenging. To this end, we design a Class-Semantic Mapping Mechanism that leverages CLIP's generalization ability to extract class distribution probabilities $\gamma_k^{clip}$ from unlabeled data, complementing the class distribution probabilities $\gamma_k^{label}$ derived from labeled data.
Because these engineering codes cannot be interpreted directly by the CLIP text encoder, we construct a mapping function $M$ that maps the abbreviated label space $C_{abbr}$ to the natural language description space $C_{desc}$. For example, “pl50” is mapped to “Speed limit 50 km/h”, and “p23” is mapped to “No left turn”. To enhance the contextual representation of textual features, we insert the mapped descriptions into a prompt template to generate the text embedding $t_k$ for each class $k$:

$$t_k = \mathrm{Encoder}_{\text{text}}\big(\text{“A photo of a ”} + M(\text{class}_k) + \text{“ traffic sign”}\big)$$
Then we use the CLIP to perform sampling inference on the unlabeled dataset D U to estimate its distribution. Specifically, for unlabeled images, we use the preliminary prediction boxes from the teacher model to crop them and input them into the CLIP image encoder to extract visual features v i . By calculating the cosine similarity between the visual features and all class text embeddings t k , we obtain the semantic prediction probability for the sample:
$$p_k^{clip}(v_i) = \frac{\exp\big(\mathrm{sim}(t_k, v_i)/\tau\big)}{\sum_{j=1}^{K} \exp\big(\mathrm{sim}(t_j, v_i)/\tau\big)}$$
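The temperature-scaled softmax over similarities can be sketched in a few lines. This is a generic illustration of the computation, not CLIP's actual implementation; the temperature value 0.07 is the common CLIP default, taken here as an assumption:

```python
import math

def clip_class_probs(sims, tau=0.07):
    """Softmax over cosine similarities between one crop's visual feature
    and the K class text embeddings; tau is the temperature."""
    exps = [math.exp(s / tau) for s in sims]
    z = sum(exps)
    return [e / z for e in exps]
```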
By aggregating these semantic predictions over a subset of unlabeled samples, we obtain the CLIP-based class distribution estimate $\gamma_k^{clip}$. Finally, to combine the class distribution probabilities derived from labeled and unlabeled data, we fuse the two via a weighted summation, yielding the final fused class distribution probability $\gamma_k$:
$$\gamma_k = \beta \cdot \gamma_k^{label} + (1 - \beta) \cdot \gamma_k^{clip}$$
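The fusion step can be sketched as follows, starting from raw per-class counts on each side. Normalizing counts into probabilities before the weighted sum is an assumption about the bookkeeping; the paper only specifies the final weighted combination:

```python
def fuse_class_distributions(label_counts, clip_counts, beta=0.6):
    """Fuse per-class probabilities from labeled-data instance counts and
    CLIP-predicted class counts on unlabeled data; beta weights the labeled
    side, matching gamma_k = beta * gamma_label + (1 - beta) * gamma_clip."""
    n_label, n_clip = sum(label_counts), sum(clip_counts)
    return [beta * l / n_label + (1.0 - beta) * c / n_clip
            for l, c in zip(label_counts, clip_counts)]
```

Because both inputs are normalized, the fused vector is itself a valid probability distribution.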

3.2.3. Class-Distribution-Based Threshold Setting

To achieve better alignment between the pseudo-label distribution and the true distribution, we formulate the semi-supervised learning process as an optimization problem with regularization constraints. Unlike traditional methods that only minimize classification loss, we introduce a class-aware regularization term to explicitly control the quantity of pseudo-labels generated for each class.
Specifically, for class $k$, the teacher model predicts the classification confidences for the corresponding traffic sign on the unlabeled dataset, and these confidences are sorted in descending order as $Conf_{k,1} \ge Conf_{k,2} \ge \cdots \ge Conf_{k,m}$. Additionally, we introduce positive- and negative-sample reliability ratios $\eta_1, \eta_0 \in (0, 1)$ to filter out false positive samples. The top $\eta_1 \gamma_k$ proportion of samples are selected as positive pseudo-labels, while the bottom $\eta_0 (1 - \gamma_k)$ proportion are selected as negative pseudo-labels. Thereby, the classification-confidence thresholds of positive and negative samples are computed as follows:

$$\tau_k^+ = Conf_{k,\,\lfloor \eta_1 \gamma_k m \rfloor}, \qquad \tau_k^- = Conf_{k,\,m - \lfloor \eta_0 (1 - \gamma_k) m \rfloor}$$

Under this definition, $\tau_k^+$ and $\tau_k^-$ vary dynamically with the class distribution probability to improve the detection accuracy of each class.
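The threshold computation can be sketched as follows. The floor-based rank indexing and the clamp to at least one sample are our assumptions; the paper does not spell out the exact rounding:

```python
def class_thresholds(sorted_confs, gamma_k, eta1=0.90, eta0=0.97):
    """Given teacher confidences for class k sorted in descending order,
    derive the positive threshold from the top eta1*gamma_k fraction and
    the negative threshold from the bottom eta0*(1 - gamma_k) fraction."""
    m = len(sorted_confs)
    n_pos = max(int(eta1 * gamma_k * m), 1)          # positives to keep
    n_neg = max(int(eta0 * (1.0 - gamma_k) * m), 1)  # negatives to keep
    tau_pos = sorted_confs[n_pos - 1]
    tau_neg = sorted_confs[m - n_neg]
    return tau_pos, tau_neg
```

A tail class with small $\gamma_k$ keeps fewer but proportionally allocated positives, so its threshold adapts to its own confidence range instead of a global cutoff.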

3.2.4. Dynamic Pseudo-Label Selection

Finally, the classification pseudo-label $\hat{y}_k$ for an unlabeled sample $x$ is determined by the teacher model according to the following rule:

$$\hat{y}_k = \begin{cases} 1, & Conf_k(x) \ge \tau_k^+ \\ 0, & Conf_k(x) \le \tau_k^- \\ \text{ignore}, & \text{otherwise} \end{cases}$$
This mechanism encourages the pseudo-label distribution of each class to better align with the target class distribution.
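The three-way selection rule above translates directly into code; here `None` stands in for the "ignore" outcome:

```python
def assign_pseudo_label(conf, tau_pos, tau_neg):
    """Positive above tau_pos, negative below tau_neg, ignored in between
    (ignored samples contribute no supervision signal)."""
    if conf >= tau_pos:
        return 1
    if conf <= tau_neg:
        return 0
    return None
```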

3.3. Gated-Feature-Fusion-Based Proposal Refinement Strategy

In the previous section, we introduced the CD-DPLS to address the challenge of imbalanced class distributions. However, in semi-supervised object detection, pseudo-label quality is also limited by the missed detection of small objects. The small objects often obtain low classification confidences due to their limited visual information, making them more likely to be mistaken as background. As a result, many potentially useful positive samples are ignored during the training.
To address this issue, we propose the Gated-Feature-Fusion-based Proposal Refinement strategy (GFF-PR). The main idea is to enhance the feature representation of small traffic signs by adaptively fusing features from different pyramid levels. Based on the fused feature pyramid, the model generates proposals for training and further refines them. In this way, proposals, which are weak in the original feature pyramid but become more reliable after feature fusion, can be identified and retained as valuable training signals.

3.3.1. Feature Pyramid Construction for Feature Fusion

To introduce scale robustness at the feature level, we construct a fused feature pyramid. Specifically, the teacher model processes the original-scale image and a 0.5× down-sampled image in parallel. After extracting features with the YOLOv11 backbone, we align and fuse the original-scale feature layer $F_i$ with the down-sampled feature layer $F_{i-1}^{down}$. To achieve adaptive feature fusion, we design a gated fusion function $g(\cdot)$ to dynamically compute the weights. As shown in Figure 4, for the $i$-th feature layer, we first concatenate the original-scale feature $F_i$ and the down-sampled feature $F_{i-1}^{down}$ along the channel dimension, and generate the fusion coefficient $\lambda_i$ through a lightweight network composed of global average pooling and a Multi-Layer Perceptron (MLP):

$$\lambda_i = \sigma\big(\mathrm{MLP}(\mathrm{AvgPool}([F_i, F_{i-1}^{down}]))\big)$$

where $\sigma$ is the Sigmoid activation function, which normalizes the weights to $(0, 1)$. The final fused feature $F_i^{fused}$ is obtained by weighted summation:

$$F_i^{fused} = \lambda_i \cdot F_i + (1 - \lambda_i) \cdot F_{i-1}^{down}$$
This mechanism allows the network to adaptively select whether to retain the detailed information from the original feature layer or the semantic information from the down-sampled feature layer, based on the specific scale characteristics of the target.
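The gating computation can be sketched numerically. Here a single linear layer stands in for the paper's MLP, the feature maps are assumed already aligned to the same `(C, H, W)` shape, and the gate is a scalar per layer; these are simplifications of the actual architecture:

```python
import numpy as np

def gated_fuse(F_i, F_down, w, b):
    """Gated fusion of two same-shape (C, H, W) feature maps: global-average-
    pool the concatenated channels, apply a linear layer + sigmoid to get the
    gate lambda, and blend the maps as lambda*F_i + (1-lambda)*F_down."""
    pooled = np.concatenate([F_i.mean(axis=(1, 2)), F_down.mean(axis=(1, 2))])
    lam = 1.0 / (1.0 + np.exp(-(w @ pooled + b)))  # sigmoid gate in (0, 1)
    return lam * F_i + (1.0 - lam) * F_down
```

With zero weights the gate sits at 0.5 and the output is the plain average of the two maps; training moves the gate toward whichever scale carries more useful detail.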

3.3.2. Candidate Boxes Selection Based on Confidence Gain

Since YOLOv11 is a one-stage anchor-free detector, it lacks the candidate box set generated by a Region Proposal Network (RPN). To evaluate candidates under the dense prediction paradigm, we utilize the prediction branch structure of YOLO to construct a dual confidence metric.
Dense Prediction Alignment and Association: Let $B$ be the set of predictions output by the teacher model on the original feature pyramid, and $B^{fused}$ be the set of predictions on the fused feature pyramid. For each candidate box $b_i \in B^{fused}$, we first compute its fused confidence and categorize it into the positive, ambiguous, or negative group according to the positive threshold $\tau_{pos}$ and negative threshold $\tau_{neg}$. $\tau_{pos}$ is dynamically determined from the mean and standard deviation of fused confidences in the current batch, while $\tau_{neg}$ is fixed at 0.1. For each ambiguous candidate box, we compare it with the corresponding prediction $b_i' \in B$ at the same spatial location.
Dual Confidence Gain Calculation: The confidences in the original and fused feature pyramids are computed as

$$Conf(b_i) = Cls(b_i) \cdot IoU(b_i)$$

$$Conf_{fused}(b_i) = Cls_{fused}(b_i) \cdot IoU_{fused}(b_i)$$

Here, $Cls(\cdot)$ represents the classification confidence predicted by the network, and $IoU(\cdot)$ represents the localization confidence predicted by the network.
Pseudo-Label Generation: We define the fusion confidence gain as $\Delta Conf = Conf_{fused}(b_i) - Conf(b_i)$. If $\Delta Conf$ exceeds 0.2, the ambiguous proposals are retained, and the proposals $b_i$ from the fused feature pyramid are adopted. Therefore, the final candidate box set consists of two parts: positive candidates whose fused confidence exceeds the dynamic positive threshold, and ambiguous candidates that satisfy the fused-confidence gain criterion. In this way, the GFF-PR reduces missed detections of small traffic signs through feature fusion.
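The selection logic can be sketched as follows. The specific form of the dynamic positive threshold, mean plus one standard deviation of the batch's fused confidences, is our assumption; the paper only states it is derived from the mean and standard deviation:

```python
def select_proposals(cands, tau_neg=0.1, gain_thresh=0.2):
    """cands: list of (conf_orig, conf_fused) pairs for one batch.
    Keep positives above the dynamic threshold, plus ambiguous candidates
    whose fused-vs-original confidence gain exceeds gain_thresh."""
    fused = [c[1] for c in cands]
    mu = sum(fused) / len(fused)
    sd = (sum((x - mu) ** 2 for x in fused) / len(fused)) ** 0.5
    tau_pos = mu + sd  # assumed form of the dynamic positive threshold
    keep = []
    for conf, conf_f in cands:
        if conf_f >= tau_pos:
            keep.append((conf, conf_f))      # positive candidate
        elif conf_f > tau_neg and conf_f - conf > gain_thresh:
            keep.append((conf, conf_f))      # ambiguous, rescued by the gain
    return keep
```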

3.4. Overall Optimization Objective

To fully leverage the proposed semi-supervised framework, we design a comprehensive objective function. The total loss L t o t a l for the student model is composed of a supervised loss on labeled data and a multi-task unsupervised loss on unlabeled data:
$$L_{total} = L_{sup} + L_{unsup} = L_{sup}^{cls} + L_{sup}^{loc} + L_{unsup}^{AWFL} + L_{unsup}^{loc} + \mu \cdot L_{unsup}^{iou}$$
For the labeled data D L , we employ standard detection losses. Specifically, L s u p c l s utilizes the Binary Cross Entropy Loss to minimize the classification error between the prediction and the ground truth class labels. For bounding box regression, L s u p l o c adopts the Generalized IoU Loss.
For the unlabeled data D U , the loss terms are computed using the refined pseudo-labels:
Adaptive-Weight Focal Loss: For the unsupervised classification loss, standard semi-supervised methods typically use a binary cross-entropy or standard Focal Loss, and treat all filtered pseudo-labels as equally important ground truths. This is ineffective because pseudo-labels have different levels of reliability. Moreover, directly using the raw classification confidences as the weight leads to a side-effect: it gives higher weights to head-class pseudo-labels and lower weights to tail-class pseudo-labels. As a result, the model pays less attention to the very classes that need more supervision. To address this, we propose the Adaptive-Weight Focal Loss ($L_{unsup}^{AWFL}$). Instead of relying on raw classification confidences, we introduce a Class-Adaptive Relative Weight strategy that adjusts the contribution of each pseudo-label based on how much its classification confidence exceeds the class-specific classification-confidence threshold. The formulation is:
$$L_{unsup}^{AWFL} = -\sum_j \omega_j \cdot (1 - p_j)^{\gamma} \log p_j$$

where $p_j$ is the student model's classification confidence for the target class, and $\gamma$ is the focusing parameter. Crucially, $\omega_j$ is defined as the ratio between the teacher model's classification confidence and the class-specific classification-confidence threshold:

$$\omega_j = \min\left(\frac{p_j^{fused}}{\tau_{c_j}^+},\ 1.0\right)$$

where $p_j^{fused}$ represents the classification confidence of the $j$-th sample in the fused feature pyramid, and $\tau_{c_j}^+$ denotes the positive classification-confidence threshold corresponding to the predicted class $c_j$ of the sample. By defining $\omega_j$ as the ratio of the classification confidence to the class-specific threshold, we ensure that tail-class samples are assigned high importance weights once they exceed their corresponding thresholds. This mitigates the issues caused by the imbalanced class distributions.
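Putting the two formulas together, the loss over a set of positive pseudo-labels can be sketched as a minimal scalar implementation (the real loss operates on dense prediction tensors):

```python
import math

def awfl_loss(p_student, p_teacher_fused, tau_pos, gamma=2.0):
    """Adaptive-Weight Focal Loss: sum of focal terms, each weighted by
    w_j = min(teacher fused confidence / class-specific threshold, 1)."""
    total = 0.0
    for p, p_f, tau in zip(p_student, p_teacher_fused, tau_pos):
        w = min(p_f / tau, 1.0)
        total += -w * (1.0 - p) ** gamma * math.log(p)
    return total
```

Note that a tail-class sample whose confidence just reaches its (low) class threshold already receives the full weight of 1, whereas under raw-confidence weighting it would be down-weighted relative to head-class samples.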
Localization Loss: Consistent with the supervised branch, we employ GIoU Loss for bounding box regression.
IoU Consistency Loss: To improve the student’s estimation of localization quality, we employ Binary Cross Entropy Loss as an auxiliary task.

4. Experiments

The proposed semi-supervised object detection method is implemented in Python using the PyTorch deep learning framework. It employs YOLOv11s as the detector, with over 300 epochs of pre-training followed by 100 epochs of semi-supervised fine-tuning. During training, both the labeled and unlabeled batches contain 16 samples. The Adam optimizer is used with a constant learning rate of 0.001. The Exponential Moving Average (EMA) parameter $\alpha$ is set to 0.999, and the loss weight $\mu$ is set to 1.

4.1. Evaluation Metrics

To evaluate the performance of the proposed method, we adopt mAP50, $AP_S$, $AP_M$, and $AP_L$ as evaluation metrics.
The mean Average Precision (mAP) is a standard metric for object detection, computed from the Precision-Recall (P-R) curve:

$$mAP = \int_0^1 P(R)\, dR$$

$$P = \frac{TP}{TP + FP}$$

$$R = \frac{TP}{TP + FN}$$

where $P(R)$ denotes the precision at a given recall level $R$; $TP$, $FP$, and $FN$ denote the numbers of true positives, false positives, and false negatives, respectively. mAP50 denotes the mAP computed at an IoU threshold of 0.5. $AP_S$, $AP_M$, and $AP_L$ represent the average precision for small, medium, and large-scale objects, respectively.
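The integral above can be approximated numerically from sampled P-R points. This rectangle-rule sketch is a generic illustration, not the exact interpolation scheme used by any particular benchmark:

```python
def average_precision(recalls, precisions):
    """Rectangle-rule approximation of the area under the P-R curve.
    Assumes recalls are sorted in ascending order, paired with precisions."""
    ap, prev_r = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += p * (r - prev_r)  # width of the recall step times its precision
        prev_r = r
    return ap
```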

4.2. Ablation Study

In ablation studies, we conducted experiments using the 10% labeled data setting. As shown in Table 1, we conduct the experiments to evaluate the contribution of each component of the proposed semi-supervised traffic sign detection method.
As shown in Table 1, the baseline model, Efficient Teacher, achieves only 23.2% mAP50 with 10% labeled data, whereas the CD-DPLS improves the mAP50 from 23.2% to 33.2%. This improvement mainly comes from the ability of the CD-DPLS to dynamically adjust the pseudo-label selection threshold for each class according to the class distribution probabilities, thereby improving the detection performance of the proposed method.
We then evaluate the effectiveness of the GFF-PR independently based on the baseline model. The experimental results show that the GFF-PR improves the mAP50 from 23.2% to 33.8% (see Table 1). This is because the GFF-PR successfully retains the missed detections of small traffic signs. When the CD-DPLS and the GFF-PR are used together, the proposed method further increases the mAP50 from 23.2% to 35.6% due to advantages of the CD-DPLS and the GFF-PR.
To further verify the effectiveness of the proposed loss function, we replace the focal loss in Efficient Teacher with AWFL. As shown in the last row of Table 1, our method with the AWFL improves the mAP50 from 35.6% to 36.3%, achieving an additional gain of 0.7%. By combining the CD-DPLS, GFF-PR, and AWFL, the proposed method achieves the best overall performance.
To visually demonstrate the effectiveness of the proposed components in improving traffic sign detection performance, we conduct an experiment with rare traffic sign classes, as well as small traffic signs. As shown in Figure 5, the baseline model, Efficient Teacher, misses the detection of the tail class “pr40”, while it misclassifies other tail-class objects, such as “pr60” and “w66”. In addition, it also fails to detect small traffic signs in the scenes.
With the CD-DPLS, the model not only detects the previously missed tail-class “pr40”, but also corrects the misclassification of “pr60” and “w66” by dynamically adjusting the classification-confidence thresholds, thereby improving the detection performance on tail classes. With the GFF-PR, the model can capture small distant targets.
Finally, after adding the AWFL for overall optimization, the model shows higher classification confidence.
Different Feature Scales: To verify the impact of fused features in the GFF-PR module, we compare the detection accuracy and inference speed of the down-sampled feature ($F^{down}$), the original feature ($F$), and the fused feature ($F^{fused}$). Details are shown in Table 2.
The experimental results show that, although the down-sampled feature $F^{down}$ has the fastest inference speed (56 FPS), the reduced feature-map resolution loses the details of small objects, resulting in only 8.9% small-object average precision ($AP_S$) and the lowest overall mAP50.
In contrast, the fused feature $F^{fused}$ achieves the best detection performance. Specifically, $AP_S$ increases from 12.4% to 15.4%, demonstrating the effectiveness of the GFF-PR strategy for small traffic sign detection. Although the inference speed decreases to 46 FPS, the model still satisfies real-time detection requirements. Besides small objects, the fused feature strategy also improves detection performance on medium and large objects.
Pseudo-Label Selection Strategies: To validate the effectiveness of the proposed CD-DPLS, we compare the model performance under different pseudo-label filtering strategies. Traditional semi-supervised object detection methods typically employ a fixed classification-confidence threshold to filter pseudo-labels; however, this traditional strategy exhibits limitations in the scenario of imbalanced class distributions.
As shown in Table 3, the threshold setting of 0.9 drops the mAP50 of tail classes to 15.4% and yields only 20.5 valid pseudo-labels per image on average, limiting the overall mAP50 to 30.5%. Conversely, the threshold setting of 0.5 limits the overall mAP50 to 33.8% because of the incorrect pseudo-labels it introduces. In contrast, by dynamically adjusting classification-confidence thresholds based on the fused class distribution probability, the CD-DPLS boosts tail-class detection performance to 24.8% and achieves the best overall performance (36.3% mAP50).

4.3. Parameter Sensitivity Analysis

4.3.1. Positive- and Negative-Sample Reliability Ratios

The performance of the CD-DPLS is highly dependent on the threshold settings used for pseudo-label filtering. Specifically, this process is controlled by the positive- and negative-sample reliability ratios $\eta_1$ and $\eta_0$, which define the proportions of reliable positive and negative pseudo-labels. To explore the configuration of these two ratios, we conduct a joint grid search within the ranges of $\eta_1 \in [0.80, 1.00]$ and $\eta_0 \in [0.90, 0.99]$ under the 10% labeled data setting. The experimental results are detailed in Table 4.
As shown in Table 4, the best detection performance (36.3%) of the proposed method is achieved when $\eta_1 = 0.90$ and $\eta_0 = 0.97$. These optimal settings enable the proposed method to select more reliable pseudo-labels.

4.3.2. Class Distribution Fusion Weight

The parameter $\beta$ serves as a balancing factor between the class distribution probabilities derived from labeled data ($\gamma_k^{label}$) and those derived from unlabeled data ($\gamma_k^{clip}$). We conduct a sensitivity analysis by varying $\beta$ from 0.0 to 1.0 with a step size of 0.2 under the 10% labeled data setting.
As shown in Table 5, the detection performance of the proposed method follows an inverted U-shaped trend, indicating that using only the labeled or only the unlabeled class distribution probability is not the best choice. The best detection performance (36.3% mAP50) is achieved at $\beta = 0.6$, demonstrating that combining labeled and unlabeled class distribution probabilities provides the most accurate fused class distribution probability, which enables better classification-confidence thresholds for each class.

4.4. Comparison with the State-of-the-Art Methods

To comprehensively evaluate our method, we compare it with state-of-the-art two-stage, one-stage, and Transformer-based semi-supervised detectors under identical settings. As shown in Table 6, our method achieves the highest mAP50 across all labeled data ratios. Under the 10% labeled setting, our method reaches 36.3% mAP50, outperforming PseCo and Semi-DETR by 8.5% and 6.9%, respectively. This improvement mainly comes from the combination of the three proposed modules: the CD-DPLS, GFF-PR, and AWFL. As the labeled data ratio increases, the detection performance of the proposed method gradually improves. Regarding inference speed, our method achieves 45.8 FPS, outperforming the two-stage and Transformer-based models. Although our method is slower than the original Efficient Teacher (50.5 FPS) due to the computational overhead of the CD-DPLS, GFF-PR, and AWFL, the mAP50 gain (from 23.2% to 36.3%) justifies the trade-off between efficiency and accuracy.

4.5. Visualization

To demonstrate the effectiveness of the proposed semi-supervised detection method, we select several representative cases of small objects and tail classes for qualitative comparison. Figure 6 presents the detection results of our method, the ground truth, and several semi-supervised methods: Semi-DETR and PseCo (the best-performing end-to-end and two-stage methods, respectively), as well as our baseline, Efficient Teacher, implemented on both YOLOv5 and YOLOv11.
(1) Small-object detection: As shown in the distant traffic sign cases, small targets occupy only a few pixels in the image and are therefore easily ignored by deep networks. Specifically, Efficient Teacher and PseCo miss small objects, and all the comparison methods except ours also produce false detections. In contrast, with the help of the GFF-PR, our method better recovers small objects by exploiting the fused features.
(2) Tail-class detection: As shown in Figure 6, all the comparison methods except ours misclassify the tail-class objects as other classes. Furthermore, PseCo also exhibits missed detections for tail-class objects. In contrast, the CD-DPLS retains more pseudo-labels for tail classes by dynamically adjusting the classification-confidence thresholds according to the fused class distribution probability, while the AWFL mitigates the issues caused by the imbalanced class distribution. As shown in the last row of Figure 6, our proposed method predicts the correct classes with higher confidence for tail-class objects such as “pr30”, “w45” and “w60”.
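The adaptive weighting of the AWFL mentioned above, where each pseudo-label is weighted by the ratio of its classification confidence to its class-specific threshold, can be sketched as follows. The standard focal-loss form and the focusing parameter gamma_f are assumptions for illustration; only the confidence-to-threshold ratio follows the description in the text.

```python
import math

# Illustrative sketch of the Adaptive-Weight Focal Loss (AWFL) idea.
# The focal term follows the standard focal loss; the adaptive weight
# w = confidence / class_threshold is the mechanism described in the text.
# All other details are assumptions, not the paper's exact formulation.

def awfl_term(p, confidence, class_threshold, gamma_f=2.0):
    """Loss contribution of one pseudo-labeled positive sample.

    p               -- predicted probability for the pseudo-label class
    confidence      -- teacher's classification confidence for the pseudo-label
    class_threshold -- class-specific threshold produced by the CD-DPLS
    """
    w = confidence / class_threshold                # adaptive pseudo-label weight
    focal = -((1.0 - p) ** gamma_f) * math.log(p)   # standard focal-loss term
    return w * focal

# A confident pseudo-label from a tail class (low threshold) is weighted up.
print(awfl_term(p=0.8, confidence=0.85, class_threshold=0.6))
```

Because tail classes receive lower thresholds, their pseudo-labels obtain larger ratios and thus larger loss weights, counteracting the imbalanced class distribution.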

5. Conclusion

In this paper, we propose a semi-supervised traffic sign detection method to address two challenges in traffic scenes: imbalanced class distributions and missed small-object detections. The proposed CD-DPLS combines class distribution probabilities from labeled and unlabeled data to dynamically adjust pseudo-label selection, improving detection performance on tail classes. In addition, the GFF-PR improves the detection of small objects by leveraging fused features to re-evaluate ambiguous proposals. Finally, we introduce the AWFL, which assigns adaptive weights to traffic sign classes, to mitigate the issues caused by the imbalanced class distributions.
Experiments show that the proposed method outperforms existing state-of-the-art semi-supervised detection methods while maintaining a real-time inference speed of 45.8 FPS, demonstrating a good balance between detection accuracy and efficiency. The experimental results also suggest that the proposed method is an effective solution for real-world intelligent transportation systems.

Author Contributions

Conceptualization, Y.S. and G.Y.; methodology, C.X., Y.S. and M.C.; software, C.X.; validation, C.X.; formal analysis, C.X.; investigation, C.X.; resources, G.Y.; writing—original draft preparation, C.X.; writing—review and editing, Y.S. and M.C.; visualization, C.X.; supervision, Y.S., G.Y. and M.C.; project administration, C.X.; funding acquisition, Y.S. and G.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was partially supported by the National Natural Science Foundation of China (Grant No. 61671255), Nantong Social Livelihood Science and Technology Plan (Grant No. MS2024025), Jiangsu Province’s “Qinglan Project” for Middle-Aged and Young Academic Leaders (2023), and the Postgraduate Research & Practice Innovation Program of Jiangsu Province (Grant No. SJCX25_2016).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy restrictions.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Møgelmose, A.; Trivedi, M.M.; Moeslund, T.B. Vision-based traffic sign detection and analysis for intelligent driver assistance systems: Perspectives and survey. IEEE Trans. Intell. Transp. Syst. 2012, 13, 1484–1497. [Google Scholar] [CrossRef]
  2. Sun, H.; Wang, R.; Li, Y.; et al. SET: Spectral enhancement for tiny object detection. Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR) 2025, 4713–4723. [Google Scholar]
  3. Sun, H.; Li, Y.; Yang, L.; et al. Uncertainty-aware gradient stabilization for small object detection. Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV) 2025, 8407–8417. [Google Scholar]
  4. Wang, Y.; Bochkovskiy, A.; Liao, H.-Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. arXiv 2022, arXiv:2207.02696. [Google Scholar]
  5. Yang, C.; Zhuang, K.; Chen, M.; et al. Traffic sign interpretation via natural language description. IEEE Trans. Intell. Transp. Syst. 2024, 25, 18939–18953. [Google Scholar] [CrossRef]
  6. Tan, M.; Pang, R.; Le, Q.V. EfficientDet: Scalable and efficient object detection. Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR) 2020, 10781–10790. [Google Scholar]
  7. Yang, W.; Wang, C.; Zhang, T.; et al. SA3Det++: Side-aware quality estimation for semi-supervised 3D object detection. IEEE Trans. Pattern Anal. Mach. Intell. 2025, 47, 10664–10679. [Google Scholar] [CrossRef]
  8. Shehzadi, T.; Hashmi, K.A.; Sarode, S.; et al. STEP-DETR: Advancing DETR-based semi-supervised object detection with super teacher and pseudo-label guided text queries. Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV) 2025, 3069–3079. [Google Scholar]
  9. Chen, C.; Han, J.; Debattista, K. Virtual category learning: A semi-supervised learning method for dense prediction with extremely limited labels. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 5595–5611. [Google Scholar] [CrossRef]
  10. Luo, Y.; Zhu, J.; Li, M.; et al. Smooth neighbors on teacher graphs for semi-supervised learning. Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR) 2018, 8896–8905. [Google Scholar]
  11. Yang, X.; Li, P.; Zhou, Q.; et al. Dense information learning based semi-supervised object detection. IEEE Trans. Image Process. 2025, 34, 1022–1035. [Google Scholar] [CrossRef]
  12. Zeng, X.; Liu, X.; Xiang, X. Confidence-weighted teacher: Semi-supervised object detection based on confidence correction. Pattern Recognit. Comput. Vis. (PRCV), LNCS 2025, 15043, 1–15. [Google Scholar]
  13. Zhang, B.; Wang, Z.; Du, B. Boosting semi-supervised object detection in remote sensing images with active teaching. IEEE Geosci. Remote Sens. Lett. 2024, 21, 1–5. [Google Scholar] [CrossRef]
  14. Huang, T.-K.; Yeh, M.-C. Improving semi-supervised object detection by ROI-enhanced contrastive learning. APSIPA ASC 2024, 1–6. [Google Scholar]
  15. Zhang, R.; Xu, C.; Xu, F.; et al. S3OD: Size-unbiased semi-supervised object detection in aerial images. ISPRS J. Photogramm. Remote Sens. 2025, 221, 179–192. [Google Scholar] [CrossRef]
  16. Tran, P.V. SimLTD: Simple supervised and semi-supervised long-tailed object detection. Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR) 2025, 4672–4681. [Google Scholar]
  17. Yang, X.; Song, Z.; King, I.; et al. A survey on deep semi-supervised learning. IEEE Trans. Knowl. Data Eng. 2023, 35, 8934–8954. [Google Scholar] [CrossRef]
  18. Wang, C.; Xu, C.; Li, X.; et al. Multi-clue consistency learning to bridge gaps between general and oriented objects in semi-supervised detection. Proc. AAAI 2025, 7582–7590. [Google Scholar] [CrossRef]
  19. Zhao, T.; Fang, Q.; Shi, S.; et al. Density-guided dense pseudo-label selection for semi-supervised oriented object detection. Proc. IEEE Int. Conf. Image Process. (ICIP) 2024, 1092–1098. [Google Scholar]
  20. Shehzadi, T.; Hashmi, K.A.; Stricker, D.; et al. Sparse Semi-DETR: Sparse learnable queries for semi-supervised object detection. Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR) 2024, 5840–5850. [Google Scholar]
  21. Cao, F.; Yan, K.; Chen, H.; et al. SSCD-YOLO: Semi-supervised cross-domain YOLOv8 for pedestrian detection in low-light conditions. IEEE Access 2025, 13, 61225–61236. [Google Scholar] [CrossRef]
  22. Chen, S.; Zhang, Z.; Zhang, L.; et al. A semi-supervised learning framework combining CNN and multiscale transformer for traffic sign detection and recognition. IEEE Internet Things J. 2024, 11, 19500–19519. [Google Scholar] [CrossRef]
  23. Yang, Y.; Luo, H.; Xu, H.; et al. Towards real-time traffic sign detection and classification. IEEE Trans. Intell. Transp. Syst. 2016, 17, 2022–2031. [Google Scholar] [CrossRef]
  24. Yuan, X.; Hao, X.; Chen, H.; et al. Robust traffic sign recognition based on color global and local oriented edge magnitude patterns. IEEE Trans. Intell. Transp. Syst. 2014, 15, 1466–1477. [Google Scholar] [CrossRef]
  25. Liu, C.; Chang, F.; Chen, Z.; et al. Fast traffic sign recognition via high-contrast region extraction and extended sparse representation. IEEE Trans. Intell. Transp. Syst. 2016, 17, 79–92. [Google Scholar] [CrossRef]
  26. Chen, S.; Zhang, Z.; Ma, H.; et al. A content-adaptive hierarchical deep learning model for detecting arbitrary oriented road surface elements using MLS point clouds. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–16. [Google Scholar] [CrossRef]
  27. Wang, J.; Chen, Y.; Dong, Z.; et al. Improved YOLOv5 network for real-time multi-scale traffic sign detection. Neural Comput. Appl. 2023, 35, 7853–7865. [Google Scholar] [CrossRef]
  28. Manzari, O.N.; Boudesh, A.; Shokouhi, S.B. Pyramid transformer for traffic sign detection. Int. Conf. Comput. Knowl. Eng. 2022, 112–116. [Google Scholar]
  29. Vaswani, A.; Shazeer, N.; Parmar, N.; et al. Attention is all you need. Adv. Neural Inf. Process. Syst. (NeurIPS) 2017, 5998–6008. [Google Scholar]
  30. Wang, G.; Zhou, K.; Wang, L.; et al. Context-aware and attention-driven weighted fusion traffic sign detection network. IEEE Access 2023, 11, 42104–42112. [Google Scholar] [CrossRef]
  31. Zhang, J.; Xie, Z.; Sun, J.; et al. A cascaded R-CNN with multiscale attention and imbalanced samples for traffic sign detection. IEEE Access 2020, 8, 29742–29754. [Google Scholar] [CrossRef]
  32. Sohn, K.; Zhang, Z.; Li, C.L.; et al. A simple semi-supervised learning framework for object detection. arXiv 2020, arXiv:2005.04757. [Google Scholar] [CrossRef]
  33. Liu, Y.-C.; Ma, C.-Y.; He, Z.; et al. Unbiased teacher for semi-supervised object detection. arXiv 2021, arXiv:2102.09480. [Google Scholar] [CrossRef]
  34. Tang, Y.; Chen, W.; Luo, Y.; et al. Humble teachers teach better students for semi-supervised object detection. Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR) 2021, 3132–3141. [Google Scholar]
  35. Xu, M.; Zhang, Z.; Hu, H.; et al. End-to-end semi-supervised object detection with soft teacher. Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV) 2021, 3060–3069. [Google Scholar]
  36. Zhang, J.; Lin, X.; Zhang, W.; et al. Semi-DETR: Semi-supervised object detection with detection transformers. Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR) 2023, 23809–23818. [Google Scholar]
  37. Wang, P.; Cai, Z.; Yang, H.; et al. Omni-DETR: Omni-Supervised Object Detection with Transformers. Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR) 2022, 9367–9376. [Google Scholar]
  38. Li, G.; Li, X.; Wang, Y.; et al. PseCo: Pseudo labeling and consistency training for semi-supervised object detection. arXiv 2022, arXiv:2203.16348. [Google Scholar]
  39. Xu, B.; Chen, M.; Guan, W.; et al. Efficient Teacher: Semi-supervised object detection for YOLOv5. arXiv 2023, arXiv:2302.07577. [Google Scholar] [CrossRef]
  40. Liu, Y.C.; Ma, C.Y.; Kira, Z. Unbiased Teacher v2: Semi-supervised object detection for anchor-free and anchor-based detectors. Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR) 2022, 9819–9828. [Google Scholar]
  41. Luo, G.; Zhou, Y.; Jin, L.; et al. Towards end-to-end semi-supervised learning for one-stage object detection. arXiv 2023, arXiv:2306.00930. [Google Scholar]
  42. Zhou, H.; Ge, Z.; Liu, S.; et al. Dense Teacher: Dense pseudo-labels for semi-supervised object detection. arXiv 2022, arXiv:2206.07246. [Google Scholar]
Figure 1. Imbalanced class distribution in the traffic sign dataset. The horizontal axis shows the classes of traffic signs, and the vertical axis shows the number of traffic sign instances, revealing a severe imbalance between head classes and tail classes.
Figure 2. The overall architecture of the proposed semi-supervised traffic sign detection framework. The framework is based on the Efficient Teacher paradigm using YOLOv11 as the baseline. It contains two parallel streams: the teacher model generates pseudo-labels from weakly augmented unlabeled data, while the student model learns from labeled and unlabeled data under stronger augmentation. The framework includes three key components: the Class-Distribution-based Dynamic Pseudo-Label Selection module (CD-DPLS), which dynamically sets class-specific pseudo-label classification-confidence thresholds; the Gated-Feature-Fusion-based Proposal Refinement strategy (GFF-PR), which fuses multi-scale features to refine proposals and recover small traffic signs; and the Adaptive-Weight Focal Loss (AWFL), which mitigates the issues caused by the imbalanced class distributions.
Figure 3. Schematic illustration of the Class-Distribution-based Dynamic Pseudo-Label Selection module (CD-DPLS). The CD-DPLS combines the class distribution from labeled data with the class distribution estimated by CLIP from unlabeled data. By using the fused class distribution, the module dynamically adjusts the classification-confidence threshold for each class.
Figure 4. Detailed architecture of the Gated Feature Fusion module. This module builds a fused feature pyramid by adaptively combining features from different scales. Specifically, it takes the original feature map F_i and the downsampled feature F_{i-1}^down as inputs, and feeds the concatenated result into a lightweight gating network, composed of Global Average Pooling and a Multi-Layer Perceptron, to generate the fusion weight λ_i. The final fused feature F_i^fused is then obtained through weighted fusion. In this way, the module balances spatial details from the original feature with stronger semantic information from the downsampled feature, making it more effective for small traffic signs.
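The gating mechanism in the Figure 4 caption (Global Average Pooling, an MLP, and a weighted combination of the two feature maps) can be sketched in NumPy as follows. The tensor shapes, the MLP sizes, and the assumption that the downsampled feature has already been resized to match F_i are illustrative; only the GAP → MLP → gate → weighted-fusion pipeline follows the caption.

```python
import numpy as np

# NumPy sketch of the gated feature fusion described for Figure 4.
# Shapes and MLP widths are illustrative assumptions, not the paper's values.

rng = np.random.default_rng(0)

def gap(x):
    """Global average pooling over the spatial dimensions of a (C, H, W) map."""
    return x.mean(axis=(1, 2))                      # -> (C,)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_fusion(f_i, f_down, w1, w2):
    """Fuse the original-scale feature f_i with the downsampled feature f_down
    using a gate computed from their concatenated pooled descriptors."""
    g = np.concatenate([gap(f_i), gap(f_down)])     # (2C,) pooled descriptor
    h = np.maximum(w1 @ g, 0.0)                     # MLP hidden layer (ReLU)
    lam = sigmoid(w2 @ h)                           # scalar fusion weight in (0, 1)
    return lam * f_i + (1.0 - lam) * f_down, lam

c, height, width = 8, 16, 16
f_i = rng.standard_normal((c, height, width))
f_down = rng.standard_normal((c, height, width))    # assumed resized to f_i's shape
w1 = rng.standard_normal((4, 2 * c)) * 0.1          # hidden width 4 (illustrative)
w2 = rng.standard_normal((4,)) * 0.1
f_fused, lam = gated_fusion(f_i, f_down, w1, w2)
print(f_fused.shape, float(lam))
```

The learned gate lets the network decide, per feature level, how much spatial detail versus semantic context to keep, which is what makes the fused pyramid helpful for small signs.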
Figure 5. Visual comparison of different components in the proposed method. As the components are gradually introduced, the proposed method shows improvements in tail-class detection, distant small-object detection, as well as prediction confidence.
Figure 6. Visual comparison of the proposed method with state-of-the-art semi-supervised methods. The figure shows the ground truth and the detection results of Semi-DETR, PseCo, Efficient Teacher (YOLOv5 and YOLOv11), and our method on representative small-object and tail-class cases.
Table 1. Ablation study on the effectiveness of different components under the 10% labeled data setting. “CD-DPLS” denotes the Class-Distribution-based Dynamic Pseudo-Label Selection module, “GFF-PR” denotes the Gated-Feature-Fusion-based Proposal Refinement strategy, and “AWFL” denotes the Adaptive-Weight Focal Loss.
CD-DPLS GFF-PR AWFL mAP50 mAP50:95
23.2% 12.8%
27.9% 17.5%
27.5% 17.9%
25.7% 19.2%
28.8% 19.8%
32.1% 20.4%
Table 2. Performance comparison of different feature scales. We compare the detection accuracy and inference speed utilizing strictly down-sampled features (F^down), original-scale features (F), and the proposed fused features (F^fused).
Feature Level mAP50 AP_S AP_M AP_L FPS
F^down 32.8% 8.9% 22.1% 36.1% 56
F 34.1% 12.4% 24.5% 33.5% 52
F^fused 36.3% 15.4% 25.8% 36.5% 46
Table 3. Performance comparison of different pseudo-label selection strategies. We compare the detection accuracy on head and tail classes, as well as the dynamic thresholding process from initial candidates to final valid pseudo-labels, using fixed thresholds (0.9, 0.5) and the proposed CD-DPLS. “All” denotes all classes in the dataset, “Head” the head classes, and “Tail” the tail classes.
Threshold Strategy mAP50(All) mAP50(Head) mAP50(Tail) Avg. Initial Candidates Avg. Pseudo-labels
0.9 (Fixed) 30.5% 58.2% 15.4% 138.6 20.5
0.5 (Fixed) 33.8% 52.1% 20.6% 162.4 45.8
CD-DPLS (Dynamic) 36.3% 60.5% 24.8% 145.2 32.4
Table 4. Sensitivity analysis of positive- and negative-sample reliability ratios. The table reports the mAP50 performance under varying combinations of η_1 and η_0.
η_1 \ η_0 0.90 0.93 0.95 0.97 0.99
0.80 33.8% 34.2% 34.5% 34.2% 33.9%
0.85 34.2% 34.6% 34.9% 34.9% 34.5%
0.90 34.5% 34.9% 35.1% 36.3% 35.3%
0.95 34.3% 34.7% 34.8% 35.2% 34.9%
1.00 33.9% 34.3% 34.5% 34.7% 34.4%
Table 5. Sensitivity analysis of the class distribution fusion weight β in the CD-DPLS. β = 0 represents using only class distribution probabilities derived from unlabeled data, while β = 1 represents using only those derived from labeled data.
β Value 0.0 0.2 0.4 0.6 0.8 1.0
mAP50 33.9% 35.6% 36.1% 36.3% 35.8% 35.1%
Table 6. Comparison with state-of-the-art semi-supervised object detection methods on the traffic sign dataset. The results are reported using mAP50 with 1%, 2%, 5%, and 10% labeled data ratios. FPS indicates the inference speed on the same hardware. Our method achieves superior accuracy across different labeled data ratios while maintaining real-time performance.
Category Method 1% 2% 5% 10% FPS
End-to-end Omni-DETR [37] 9.07% 15.1% 20.1% 28.7% 10.7
Semi-DETR [36] 9.72% 16.2% 21.5% 29.4% 9.1
Two-stage STAC [32] 5.54% 7.36% 12.9% 21.2% 14.4
Unbiased Teacher [33] 6.81% 12.5% 16.7% 24.5% 16.8
PseCo [38] 8.04% 14.8% 19.1% 27.8% 15.7
Humble Teacher [34] 6.48% 9.31% 15.2% 24.5% 17.8
One-stage Efficient Teacher (YOLOv5) [39] 7.33% 12.5% 15.7% 23.2% 50.5
Efficient Teacher (YOLOv11) 8.58% 14.7% 19.4% 27.8% 49.3
Unbiased Teacher v2 [40] 7.19% 9.82% 16.2% 23.5% 48.8
One Teacher [41] 7.47% 11.4% 15.8% 24.7% 46.7
Dense Teacher [42] 8.03% 12.8% 16.9% 25.1% 48.6
Ours 11.5% 18.9% 26.6% 36.3% 45.8
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.