Submitted: 28 January 2026
Posted: 29 January 2026
Abstract
Keywords:
1. Introduction

- We propose Class-Adaptive Ensemble-Vote Consistency (AEVC), a semi-supervised text classification framework that integrates co-training, consistency regularization, and pseudo-labeling.
- We introduce the Dynamically Weighted Ensemble Prediction (DWEP) module, which adaptively combines the predictions of multiple classification heads according to their class-specific confidence and consistency, yielding more robust pseudo-labels.
- We develop the Class-Aware Pseudo-Label Adjustment (CAPLA) mechanism, which mitigates class imbalance through category-specific pseudo-label filtering and dynamic weighting, substantially improving recognition of minority classes (a schematic sketch of how DWEP and CAPLA interact follows this list).
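To make the interaction between the two modules concrete ahead of the formal definitions in Section 3, the sketch below shows one way the ensemble-then-filter pipeline could be realized. All names, shapes, and values (head_probs, head_reliability, the example thresholds and weights) are illustrative assumptions rather than the paper's exact implementation.

```python
import numpy as np

# Schematic of DWEP + CAPLA pseudo-label generation (assumed shapes/names).
# head_probs: (H, B, C) softmax outputs of H heads for B unlabeled texts over
# C classes; head_reliability: (H, C) running per-head, per-class estimate of
# confidence/consistency (e.g., an EMA of agreement with the ensemble).

def dwep_ensemble(head_probs, head_reliability):
    """DWEP: combine heads with normalized per-class reliability weights."""
    w = head_reliability / head_reliability.sum(axis=0, keepdims=True)  # (H, C)
    return np.einsum('hc,hbc->bc', w, head_probs)                       # (B, C)

def capla_filter(ens_probs, class_thresholds, class_weights):
    """CAPLA: class-specific acceptance thresholds and loss weights."""
    conf = ens_probs.max(axis=1)                 # ensemble confidence per text
    pseudo = ens_probs.argmax(axis=1)            # hard pseudo-labels
    mask = conf >= class_thresholds[pseudo]      # per-class acceptance
    return pseudo, class_weights[pseudo] * mask  # zero weight if rejected

# Toy run: H=3 heads, B=4 texts, C=2 classes (class 1 treated as minority).
rng = np.random.default_rng(0)
head_probs = rng.dirichlet(np.ones(2), size=(3, 4))
ens = dwep_ensemble(head_probs, np.ones((3, 2)))  # uniform reliability here
labels, weights = capla_filter(ens, np.array([0.95, 0.80]),
                               np.array([0.90, 1.85]))
print(labels, weights)
```

Minority classes receive a lower acceptance threshold and a higher loss weight, so more of their pseudo-labels survive filtering and contribute more strongly to training.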
2. Related Work
2.1. Semi-Supervised Text Classification
2.2. Addressing Class Imbalance in Text Classification
3. Method

3.1. Model Architecture and Multi-Head Design
3.2. Dynamically Weighted Ensemble Prediction (DWEP)
3.3. Class-Aware Pseudo-Label Adjustment (CAPLA)
3.4. Overall Training Objective
4. Experiments
4.1. Experimental Setup
4.1.1. Datasets
4.1.2. Labeled Data Amount
4.1.3. Baseline Methods
- FixMatch [5]: A foundational consistency-regularization method that pseudo-labels weakly augmented inputs and keeps only predictions above a fixed confidence threshold (a minimal sketch of this shared recipe follows this list).
- FreeMatch [5]: An extension of FixMatch that replaces the fixed threshold with an adaptive, self-adjusting one.
- MarginMatch [5]: Enhances pseudo-label filtering by incorporating a dynamic pseudo-margin strategy.
- MultiMatch [18]: A strong baseline that combines multi-head consistency with a more elaborate pseudo-label weighting scheme; given its architectural similarity to AEVC, it serves as our most direct point of comparison.
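All four baselines share the same consistency-regularization backbone and differ mainly in how pseudo-labels are filtered and weighted. The sketch below shows that shared recipe in a FixMatch-like form; the function and variable names are illustrative, not taken from any of the cited implementations.

```python
import torch
import torch.nn.functional as F

# Minimal sketch of the FixMatch-style unsupervised objective: pseudo-label
# the weakly augmented view, keep predictions above a confidence threshold
# tau, and train the strongly augmented view on the surviving labels.

def fixmatch_unsup_loss(logits_weak, logits_strong, tau=0.95):
    probs = torch.softmax(logits_weak.detach(), dim=-1)
    conf, pseudo = probs.max(dim=-1)
    mask = (conf >= tau).float()   # fixed threshold (FixMatch); FreeMatch
                                   # adapts tau during training, MarginMatch
                                   # adds a margin-based filter on top
    loss = F.cross_entropy(logits_strong, pseudo, reduction='none')
    return (loss * mask).mean()

# Toy usage: batch of 8 unlabeled examples, 4 classes.
logits_w, logits_s = torch.randn(8, 4), torch.randn(8, 4)
print(fixmatch_unsup_loss(logits_w, logits_s).item())
```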
4.1.4. Evaluation Metrics
4.1.5. Data Preprocessing and Augmentation
- Weak Augmentation: implemented as the identity transformation; the input text is left unchanged.
- Strong Augmentation: back-translation, in which text is translated from English into an intermediate language (e.g., German or Russian) and then back into English. This yields semantically equivalent but syntactically diverse samples (see the back-translation sketch after this list).
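The paper does not name the translation system used for back-translation. A minimal sketch with the Hugging Face MarianMT checkpoints from Helsinki-NLP, assumed here purely for illustration, would look as follows.

```python
from transformers import MarianMTModel, MarianTokenizer

# Back-translation sketch for strong augmentation (English -> German ->
# English). The Helsinki-NLP checkpoints are one plausible choice, not
# necessarily the one used in the paper.

def load_pair(name):
    tok = MarianTokenizer.from_pretrained(name)
    model = MarianMTModel.from_pretrained(name)
    return tok, model

def translate(texts, tok, model):
    batch = tok(texts, return_tensors="pt", padding=True, truncation=True)
    out = model.generate(**batch, max_new_tokens=128)
    return tok.batch_decode(out, skip_special_tokens=True)

en_de = load_pair("Helsinki-NLP/opus-mt-en-de")
de_en = load_pair("Helsinki-NLP/opus-mt-de-en")

def strong_augment(texts):
    # Round-trip translation preserves meaning but perturbs surface form.
    return translate(translate(texts, *en_de), *de_en)

print(strong_augment(["The movie was surprisingly good."]))
```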
4.1.6. Training Details
4.2. Performance Comparison
4.2.1. Results on Balanced Datasets
4.2.2. Results on Highly Imbalanced Datasets
4.3. Ablation Study
- Removing the CAPLA mechanism (i.e., using uniform confidence thresholds and weights across all classes) raises the average error rate from 29.95% to 31.10%. This degradation highlights CAPLA's role in mitigating class imbalance by adaptively promoting learning on minority classes.
- Disabling the DWEP module (i.e., replacing dynamic weighting with a simple average over head predictions) while retaining CAPLA yields an error rate of 30.65%. This indicates that dynamic weighting improves the quality and robustness of pseudo-labels even when class-aware adjustments are in place: more reliable heads contribute more to the ensemble prediction.
- Removing both DWEP and CAPLA reduces AEVC to a basic multi-head consistency model with standard pseudo-labeling, and the error rate rises further to 32.05%. This variant performs worse than MultiMatch (31.50% in Table 2), underscoring the combined contribution of the two proposed modules. The sketch after this list shows how each ablated variant differs from the full model.
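In the following sketch, the names and the uniform 0.95 threshold are assumptions; it only illustrates which code path each ablation variant toggles, with `head_probs` shaped (H, B, C) as in the earlier sketch.

```python
import numpy as np

# Illustrative mapping from the ablation variants to code paths.

def ensemble(head_probs, head_reliability, use_dwep=True):
    if use_dwep:  # full model: dynamic per-head, per-class weights
        w = head_reliability / head_reliability.sum(axis=0, keepdims=True)
        return np.einsum('hc,hbc->bc', w, head_probs)
    return head_probs.mean(axis=0)  # w/o DWEP: simple average ensemble

def class_schedule(minority_thr, majority_thr, minority_w, majority_w,
                   is_minority, use_capla=True):
    # is_minority: boolean (C,) mask marking minority classes.
    if not use_capla:  # w/o CAPLA: one uniform threshold, unit weights
        return np.full(is_minority.shape, 0.95), np.ones(is_minority.shape)
    thr = np.where(is_minority, minority_thr, majority_thr)
    w = np.where(is_minority, minority_w, majority_w)
    return thr, w
```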
4.4. Human Evaluation on Minority Class Samples
4.5. Analysis of Class-Aware Pseudo-Label Adjustment (CAPLA)
4.6. Impact of Dynamic Ensemble Weighting (DWEP)
4.7. Sensitivity to Labeled Data Amount
4.8. Hyperparameter Sensitivity Analysis
5. Conclusions
References
- Chen, L.; Garcia, F.; Kumar, V.; Xie, H.; Lu, J. Industry Scale Semi-Supervised Learning for Natural Language Understanding. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Papers; Association for Computational Linguistics, 2021; pp. 311–318.
- Hsieh, C.Y.; Li, C.L.; Yeh, C.K.; Nakhost, H.; Fujii, Y.; Ratner, A.; Krishna, R.; Lee, C.Y.; Pfister, T. Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes. In Findings of the Association for Computational Linguistics: ACL 2023; Association for Computational Linguistics, 2023; pp. 8003–8017.
- Wei, K.; Zhong, J.; Zhang, H.; Zhang, F.; Zhang, D.; Jin, L.; Yu, Y.; Zhang, J. Chain-of-Specificity: Enhancing Task-Specific Constraint Adherence in Large Language Models. In Proceedings of the 31st International Conference on Computational Linguistics, 2025; pp. 2401–2416.
- Luo, Y.; Zheng, Z.; Zhu, Z.; You, Y. How Does the Textual Information Affect the Retrieval of Multimodal In-Context Learning? In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024; pp. 5321–5335.
- Li, J.; Pan, J.; Tan, V.Y.F.; Toh, K.; Zhou, P. Towards Understanding Why FixMatch Generalizes Better Than Supervised Learning. In Proceedings of the Thirteenth International Conference on Learning Representations (ICLR 2025), Singapore, 24–28 April 2025.
- Luo, Y.; Ren, X.; Zheng, Z.; Jiang, Z.; Jiang, X.; You, Y. CAME: Confidence-Guided Adaptive Memory Efficient Optimization. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1), 2023; pp. 4442–4453.
- Tan, Q.; He, R.; Bing, L.; Ng, H.T. Document-Level Relation Extraction with Adaptive Focal Loss and Knowledge Distillation. In Findings of the Association for Computational Linguistics: ACL 2022; Association for Computational Linguistics, 2022; pp. 1672–1681.
- Shi, W.; Li, F.; Li, J.; Fei, H.; Ji, D. Effective Token Graph Modeling Using a Novel Labeling Strategy for Structured Sentiment Analysis. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1), 2022; pp. 4232–4241.
- Li, T.; Luo, Y.; Zhang, W.; Duan, L.; Liu, J. Harder-Net: Hardness-Guided Discrimination Network for 3D Early Activity Prediction. IEEE Transactions on Circuits and Systems for Video Technology 2024.
- Hoxha, A.; Shehu, B.; Kola, E.; Koklukaya, E. A Survey of Generative Video Models as Visual Reasoners. 2026.
- Li, W.; Zhang, X.; Zhao, S.; Zhang, Y.; Li, J.; Zhang, L.; Zhang, J. Q-Insight: Understanding Image Quality via Visual Reinforcement Learning. arXiv 2025, arXiv:2503.22679.
- Zhang, X.; Li, W.; Zhao, S.; Li, J.; Zhang, L.; Zhang, J. VQ-Insight: Teaching VLMs for AI-Generated Video Quality Understanding via Progressive Visual Reinforcement Learning. arXiv 2025, arXiv:2506.18564.
- Xu, Z.; Zhang, X.; Zhou, X.; Zhang, J. AvatarShield: Visual Reinforcement Learning for Human-Centric Video Forgery Detection. arXiv 2025, arXiv:2505.15173.
- Zhou, H.; Wang, J.; Cui, X. Causal Effect of Immune Cells, Metabolites, Cathepsins, and Vitamin Therapy in Diabetic Retinopathy: A Mendelian Randomization and Cross-Sectional Study. Frontiers in Immunology 2024, 15, 1443236.
- Xuehao, C.; Dejia, W.; Xiaorong, L. Integration of Immunometabolic Composite Indices and Machine Learning for Diabetic Retinopathy Risk Stratification: Insights from NHANES 2011–2020. Ophthalmology Science 2025, 100854.
- Hui, J.; Cui, X.; Han, Q. Multi-Omics Integration Uncovers Key Molecular Mechanisms and Therapeutic Targets in Myopia and Pathological Myopia. Asia-Pacific Journal of Ophthalmology 2026, 100277.
- Uchendu, A.; Ma, Z.; Le, T.; Zhang, R.; Lee, D. TURINGBENCH: A Benchmark Environment for Turing Test in the Age of Neural Text Generation. In Findings of the Association for Computational Linguistics: EMNLP 2021; Association for Computational Linguistics, 2021; pp. 2001–2016.
- Sirbu, I.; Popovici, R.; Caragea, C.; Trausan-Matu, S.; Rebedea, T. MultiMatch: Multihead Consistency Regularization Matching for Semi-Supervised Text Classification. CoRR 2025.
- Hu, X.; Zhang, C.; Ma, F.; Liu, C.; Wen, L.; Yu, P.S. Semi-Supervised Relation Extraction via Incremental Meta Self-Training. In Findings of the Association for Computational Linguistics: EMNLP 2021; Association for Computational Linguistics, 2021; pp. 487–496.
- Wei, K.; Sun, X.; Zhang, Z.; Zhang, J.; Zhi, G.; Jin, L. Trigger Is Not Sufficient: Exploiting Frame-Aware Knowledge for Implicit Event Argument Extraction. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1), 2021; pp. 4672–4682.
- Wei, K.; Yang, Y.; Jin, L.; Sun, X.; Zhang, Z.; Zhang, J.; Li, X.; Zhang, L.; Liu, J.; Zhi, G. Guide the Many-to-One Assignment: Open Information Extraction via IoU-Aware Optimal Transport. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1), 2023; pp. 4971–4984.
- Xiao, B.; Shen, Q.; Wang, D.Z. From Text to Multi-Modal: Advancing Low-Resource-Language Translation through Synthetic Data Generation and Cross-Modal Alignments. In Proceedings of the Eighth Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2025), 2025; pp. 24–35.
- Zhang, L.; Ding, J.; Xu, Y.; Liu, Y.; Zhou, S. Weakly-Supervised Text Classification Based on Keyword Graph. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing; Association for Computational Linguistics, 2021; pp. 2803–2813.
- Yoo, K.M.; Park, D.; Kang, J.; Lee, S.W.; Park, W. GPT3Mix: Leveraging Large-Scale Language Models for Text Augmentation. In Findings of the Association for Computational Linguistics: EMNLP 2021; Association for Computational Linguistics, 2021; pp. 2225–2239.
- Gao, T.; Yao, X.; Chen, D. SimCSE: Simple Contrastive Learning of Sentence Embeddings. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing; Association for Computational Linguistics, 2021; pp. 6894–6910.
- Zang, J.; Liu, H. Improving Text Semantic Similarity Modeling through a 3D Siamese Network. arXiv 2023, arXiv:2307.09274.
- Zang, J.; Liu, H. Modeling Selective Feature Attention for Lightweight Text Matching. In Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024; pp. 6624–6632.
- Li, C.; Xu, H.; Tian, J.; Wang, W.; Yan, M.; Bi, B.; Ye, J.; Chen, H.; Xu, G.; Cao, Z.; et al. mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing; Association for Computational Linguistics, 2022; pp. 7241–7259.
- Rojas, M.A.; Gu, H.; Carranza, R. Instruction Tuning for Multimodal Models: A Survey of Data, Methods, and Evaluation. 2025.
- Si, S.; Zhao, H.; Chen, G.; Li, Y.; Luo, K.; Lv, C.; An, K.; Qi, F.; Chang, B.; Sun, M. GATEAU: Selecting Influential Samples for Long Context Alignment. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, Suzhou, China, 2025; pp. 7380–7411.
- Si, S.; Ma, W.; Gao, H.; Wu, Y.; Lin, T.E.; Dai, Y.; Li, H.; Yan, R.; Huang, F.; Li, Y. SpokenWOZ: A Large-Scale Speech-Text Benchmark for Spoken Task-Oriented Dialogue Agents. In Proceedings of the Thirty-Seventh Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2023.
- Si, S.; Zhao, H.; Luo, K.; Chen, G.; Qi, F.; Zhang, M.; Chang, B.; Sun, M. A Goal Without a Plan Is Just a Wish: Efficient and Effective Global Planner Training for Long-Horizon Agent Tasks. arXiv 2025, arXiv:2510.05608.
- Xiao, B.; Yin, Z.; Shan, Z. Simulating Public Administration Crisis: A Novel Generative Agent-Based Simulation System to Lower Technology Barriers in Social Science Research. arXiv 2023, arXiv:2311.06957.
- Xiao, B.; Bennie, M.; Bardhan, J.; Wang, D.Z. Towards Human Cognition: Visual Context Guides Syntactic Priming in Fusion-Encoded Models. arXiv 2025, arXiv:2502.17669.
- Ye, D.; Lin, Y.; Huang, Y.; Sun, M. TR-BERT: Dynamic Token Reduction for Accelerating BERT Inference. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies; Association for Computational Linguistics, 2021; pp. 5798–5809.
- Dai, L.; Xu, Y.; Ye, J.; Liu, H.; Xiong, H. SePer: Measure Retrieval Utility through the Lens of Semantic Perplexity Reduction. arXiv 2025, arXiv:2503.01478.
- Dai, X.; Chalkidis, I.; Darkner, S.; Elliott, D. Revisiting Transformer-Based Models for Long Document Classification. In Findings of the Association for Computational Linguistics: EMNLP 2022; Association for Computational Linguistics, 2022; pp. 7212–7230.
- Wei, J.; Huang, C.; Vosoughi, S.; Cheng, Y.; Xu, S. Few-Shot Text Classification with Triplet Networks, Data Augmentation, and Curriculum Learning. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies; Association for Computational Linguistics, 2021; pp. 5493–5500.
- Deng, Z.; Peng, H.; He, D.; Li, J.; Yu, P. HTCInfoMax: A Global Model for Hierarchical Text Classification via Information Maximization. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies; Association for Computational Linguistics, 2021; pp. 3259–3265.
- Lehman, E.; Jain, S.; Pichotta, K.; Goldberg, Y.; Wallace, B. Does BERT Pretrained on Clinical Notes Reveal Sensitive Data? In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies; Association for Computational Linguistics, 2021; pp. 946–959.
- Zang, J.; Liu, H. Explanation-Based Bias Decoupling Regularization for Natural Language Inference. In Proceedings of the 2024 International Joint Conference on Neural Networks (IJCNN); IEEE, 2024; pp. 1–8.
- Gera, A.; Halfon, A.; Shnarch, E.; Perlitz, Y.; Ein-Dor, L.; Slonim, N. Zero-Shot Text Classification with Self-Training. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing; Association for Computational Linguistics, 2022; pp. 1107–1119.
- Du, M.; Manjunatha, V.; Jain, R.; Deshpande, R.; Dernoncourt, F.; Gu, J.; Sun, T.; Hu, X. Towards Interpreting and Mitigating Shortcut Learning Behavior of NLU Models. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies; Association for Computational Linguistics, 2021; pp. 915–929.
- Wang, Z.; Mekala, D.; Shang, J. X-Class: Text Classification with Extremely Weak Supervision. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies; Association for Computational Linguistics, 2021; pp. 3043–3053.
- Shen, J.; Qiu, W.; Meng, Y.; Shang, J.; Ren, X.; Han, J. TaxoClass: Hierarchical Multi-Label Text Classification Using Only Class Names. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies; Association for Computational Linguistics, 2021; pp. 4239–4249.


Table 1. Error rates (%) on the balanced datasets.

| Dataset (Labeled Samples) | FixMatch | FreeMatch | MarginMatch | MultiMatch | AEVC (Ours) | Improvement over MultiMatch (%)† |
|---|---|---|---|---|---|---|
| IMDB (100) | 32.15 | 31.02 | 30.58 | 29.83 | 29.15 | 0.68 |
| AG News (200) | 31.88 | 30.95 | 30.51 | 30.33 | 29.78 | 0.55 |
| Amazon Review (200) | 33.01 | 32.18 | 31.65 | 31.29 | 30.55 | 0.74 |
| Yahoo! Answers (200) | 29.80 | 28.92 | 28.35 | 27.91 | 27.32 | 0.59 |
| Yelp Review (200) | 31.50 | 30.65 | 30.12 | 29.75 | 29.05 | 0.70 |
| Average (Balanced) | 31.67 | 30.74 | 30.24 | 29.82 | 29.17 | 0.65 |

† Absolute error-rate reduction of AEVC relative to MultiMatch.
Table 2. Average error rates (%) under the highly imbalanced settings.

| Setting | FixMatch | FreeMatch | MarginMatch | MultiMatch | AEVC (Ours) | Improvement over MultiMatch (%)† |
|---|---|---|---|---|---|---|
| Avg. (Imbalanced) | 35.10 | 34.05 | 32.90 | 31.50 | 29.95 | 1.55 |

† Absolute error-rate reduction of AEVC relative to MultiMatch.
Table 3. Ablation study: average error rate (%) under the imbalanced settings.

| Model Variant | Average Error Rate (%) |
|---|---|
| AEVC (Full Model) | 29.95 |
| AEVC w/o CAPLA (Uniform Thresholds/Weights) | 31.10 |
| AEVC w/o DWEP (Simple Average Ensemble) | 30.65 |
| AEVC w/o DWEP & w/o CAPLA (Basic Multi-Head) | 32.05 |
Table 4. Pseudo-label acceptance and weighting for minority vs. majority classes, with and without CAPLA.

| Model Variant | Class Type | Accepted Pseudo-Labels (%)† | Avg. Pseudo-Label Weight†† |
|---|---|---|---|
| AEVC (Full Model) | Minority | 65.8 | 1.85 |
| AEVC (Full Model) | Majority | 78.1 | 0.90 |
| AEVC w/o CAPLA (Uniform Thresh/Weights) | Minority | 42.3 | 1.00 |
| AEVC w/o CAPLA (Uniform Thresh/Weights) | Majority | 81.5 | 1.00 |

† Share of pseudo-labels passing the confidence filter. †† Average training weight assigned to accepted pseudo-labels.
Table 5. Sensitivity to the amount of labeled data: error rates (%).

| Dataset | Labeled Samples | MultiMatch | AEVC (Ours) | Improvement (%)† |
|---|---|---|---|---|
| IMDB | 20 | 35.12 | 33.95 | 1.17 |
| IMDB | 40 | 33.50 | 32.20 | 1.30 |
| IMDB | 100 | 29.83 | 29.15 | 0.68 |
| AG News | 40 | 34.80 | 33.55 | 1.25 |
| AG News | 100 | 32.50 | 31.70 | 0.80 |
| AG News | 200 | 30.33 | 29.78 | 0.55 |

† Absolute error-rate reduction of AEVC relative to MultiMatch.
Table 6. Hyperparameter sensitivity: average error rate (%).

| Hyperparameter | Value | Average Error Rate (%) |
|---|---|---|
| Unsupervised Loss Weight | 0.5 | 30.40 |
| Unsupervised Loss Weight | 1.0 (Default) | 29.95 |
| Unsupervised Loss Weight | 2.0 | 30.25 |
| Unsupervised Loss Weight | 5.0 | 31.05 |
| Number of Heads (H) | 2 | 30.30 |
| Number of Heads (H) | 3 (Default) | 29.95 |
| Number of Heads (H) | 4 | 30.15 |
| Number of Heads (H) | 5 | 30.45 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).