Submitted: 29 May 2025
Posted: 30 May 2025
Abstract
Keywords:
1. Introduction
2. Related Work
3. Methodology
3.1. Data Preprocessing
3.1.1. Data Generation
3.1.2. Data Augmentation
3.1.3. Data Validation and Quality Check
3.1.4. Data Splitting
3.2. Teacher Model
3.2.1. DeBERTa Model
3.2.2. Mamba Models
3.2.3. Ensemble Model
3.3. Student Model
3.4. Training of the Student Model
3.4.1. Data Selection
3.4.2. Prediction Averaging
3.5. Training Details and Fine-Tuning
- Context Length: One model uses a context length of 128 tokens, while the other uses 256 tokens.
- Batch Size: Both models are trained with a batch size of 16 for short-context inputs.
- Learning Rate: A linear warm-up followed by linear decay is employed, which stabilizes early training and aids convergence.
- Dropout: Dropout is disabled during training and inference, ensuring consistent predictions.
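The warm-up/decay schedule described above can be sketched as a simple step-to-learning-rate function. This is a minimal illustration, not the authors' implementation; the warm-up fraction is an assumption, as the paper does not state it.

```python
def linear_warmup_linear_decay(step, total_steps, peak_lr, warmup_frac=0.1):
    """Linear warm-up to peak_lr, then linear decay to zero.

    warmup_frac is an assumed hyperparameter (not given in the paper).
    """
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        # Ramp up linearly from ~0 to peak_lr over the warm-up phase.
        return peak_lr * (step + 1) / warmup_steps
    # Decay linearly from peak_lr at the end of warm-up to 0 at total_steps.
    remaining = total_steps - warmup_steps
    return peak_lr * max(0.0, (total_steps - step) / remaining)
```

The schedule is continuous at the warm-up/decay boundary, which avoids a learning-rate jump between the two phases.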
3.6. Inference Process
4. Evaluation Metrics
- Accuracy: Measures the proportion of correctly classified instances among all instances:

  $$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

  where TP, TN, FP, and FN represent true positives, true negatives, false positives, and false negatives, respectively.
- F1-Score: Provides a harmonic mean of precision and recall, balancing the trade-off between false positives and false negatives:

  $$F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$

  with $\text{Precision} = \frac{TP}{TP + FP}$ and $\text{Recall} = \frac{TP}{TP + FN}$.
- Logarithmic Loss (LogLoss): Captures the uncertainty in predictions, penalizing confident yet incorrect predictions:

  $$\text{LogLoss} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \right]$$

  where $y_i$ is the true label and $p_i$ is the predicted probability for instance $i$.
- Area Under the Receiver Operating Characteristic Curve (AUROC): Evaluates the model's ability to distinguish between classes by plotting the true positive rate (TPR) against the false positive rate (FPR):

  $$\text{AUROC} = \int_0^1 TPR \, d(FPR)$$

  The AUROC value ranges from 0.5 (random guessing) to 1 (perfect classification).
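The four metrics above can be computed directly from confusion-matrix counts and predicted probabilities. The sketch below uses only the standard definitions; AUROC is computed via its rank interpretation (the probability that a random positive scores above a random negative), which is equivalent to the area under the ROC curve.

```python
import math

def accuracy(tp, tn, fp, fn):
    # Fraction of correctly classified instances.
    return (tp + tn) / (tp + tn + fp + fn)

def f1_score(tp, fp, fn):
    # Harmonic mean of precision and recall.
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def log_loss(y_true, y_prob, eps=1e-15):
    # Binary cross-entropy; probabilities are clipped to avoid log(0).
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

def auroc(y_true, y_score):
    # Probability that a random positive outscores a random negative
    # (ties count half); O(n^2), fine for illustration.
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```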
5. Experiment Results
6. Conclusion

| Parameter | Range/Values |
|---|---|
| Sampling Temperature | [0, 2] |
| Top-K Filter | [Disabled, 20, 40] |
| Top-P Filter | [0.5, 1] |
| Frequency Penalty | [0, 0.5] |
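One way to draw a decoding configuration from the ranges in the table above is uniform sampling per parameter. The function name and the uniform-sampling scheme are illustrative assumptions; the paper only gives the ranges.

```python
import random

def sample_generation_params(rng=random):
    """Draw one decoding configuration from the ranges in the table above.

    Uniform sampling within each range is an assumption; the paper states
    only the ranges/values, not how they are sampled.
    """
    return {
        "temperature": rng.uniform(0.0, 2.0),
        "top_k": rng.choice([None, 20, 40]),   # None = Top-K filter disabled
        "top_p": rng.uniform(0.5, 1.0),
        "frequency_penalty": rng.uniform(0.0, 0.5),
    }
```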

| Augmentation Technique | Probability |
|---|---|
| Spelling Correction (jamspell) | 70% for persuade, 20% for others |
| Character Blacklist Removal | 70% for persuade, 20% for others |
| Typo Introduction | 10% per typo |
| Capitalization Flipping | 10% per flip |
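The augmentation pipeline in the table above can be sketched as follows. The spelling-correction and blacklist steps are hypothetical stubs (the paper uses jamspell and an unspecified character blacklist), and the per-character typo/flip operations are simplified illustrations.

```python
import random

# Hypothetical stand-ins for the real tools; neither the jamspell call
# nor the blacklist contents are specified here.
def correct_spelling(text):
    return text  # stub: a real version would run jamspell correction

BLACKLIST = set("\x00\ufffd")  # assumed blacklist contents

def remove_blacklisted(text):
    return "".join(c for c in text if c not in BLACKLIST)

def augment(text, source, rng=random):
    """Apply the augmentations from the table above with their listed
    probabilities: 70% for persuade-sourced text (20% otherwise) for the
    first two steps, then 10% per character for typos and case flips."""
    p_clean = 0.7 if source == "persuade" else 0.2
    if rng.random() < p_clean:
        text = correct_spelling(text)
    if rng.random() < p_clean:
        text = remove_blacklisted(text)
    chars = list(text)
    for i, c in enumerate(chars):
        if rng.random() < 0.10:                    # introduce a typo here
            chars[i] = rng.choice("abcdefghijklmnopqrstuvwxyz")
        elif c.isalpha() and rng.random() < 0.10:  # flip capitalization
            chars[i] = c.swapcase()
    return "".join(chars)
```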

| Model | Accuracy | F1-Score | LogLoss | AUROC |
|---|---|---|---|---|
| Baseline (DeBERTa) | 91.2% | 89.7% | 0.233 | 0.928 |
| Baseline (Mamba) | 88.5% | 87.3% | 0.295 | 0.910 |
| Proposed Ensemble | 93.8% | 91.4% | 0.195 | 0.943 |

| Model Variant | Accuracy | F1-Score |
|---|---|---|
| Without Data Augmentation | 91.7% | 90.1% |
| Without Short Context Model | 92.3% | 90.5% |
| Proposed Full Model | 93.8% | 91.4% |

| Model Variant | LogLoss | AUROC |
|---|---|---|
| Without Data Augmentation | 0.228 | 0.930 |
| Without Short Context Model | 0.215 | 0.935 |
| Proposed Full Model | 0.195 | 0.943 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).