Submitted: 23 May 2026
Posted: 27 May 2026
Abstract
Keywords:
1. Introduction
1.1. Contributions
1.2. Scope
2. Materials and Methods
2.1. The Soft-Aspect ABSA Framework
2.1.1. Problem Formulation and Notation
2.1.2. Topic-Conditioned Aspect-Membership Posteriors
2.1.3. Aspect-Conditional Polarity Heads
2.1.4. Class-Weighted and Focal-Loss Corrections
2.1.5. Bootstrap Stability Protocol
2.2. Substrate Pipeline for the Case Study
2.2.1. Preprocessing
2.2.2. TF-IDF Vectorisation
2.2.3. Spectral Clustering with NMF Post-Processing
2.2.4. Supervised Polarity Head
2.3. Data
2.4. Bootstrap Label Scheme
2.5. Evaluation
3. Results
3.1. Corpus Characterisation
3.2. Topic Discovery and the Four-Aspect Substrate
3.3. Bootstrap Stability of the Cluster Count

3.4. Baseline Polarity Head: The Classifier-Collapse Failure Mode
3.5. Loss-Level Corrections: Class-Weighted and Focal-Loss Remediations


4. Discussion
4.1. Why Headline Accuracy Hides Classifier Failure
4.2. The Framework as an Empirical Remediation Pathway
4.3. The Tiny-Cluster Artefact and the Limits of Silhouette Point Estimates
4.4. The Role of Weak Supervision
4.5. Methodological Implications
5. Limitations and Future Work
5.1. Limitations
5.2. Future Work
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References





| Cluster | Aspect Name | Reviews (n) | Positive Share | Top-10 NMF Terms |
|---|---|---|---|---|
| C0 | Account & Transactions | 101 | 0.337 | account, money, app, number, bank, wallet, good, login, transfer, mobile |
| C1 | Positive Praise (small) | 8 | 0.375 | convenient, simple, fast, easy, great, use, app, code, operation, alot |
| C2 | Access & Device Errors | 98 | 0.071 | access, app, denied, phone, device, developer, please, sim, open, mode |
| C3 | Overall App Quality | 85 | 0.541 | best, app, banking, easy, gcb, use, far, one, great, feature |
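The per-cluster counts and positive shares above imply a pooled class prior for the whole corpus. The following is a minimal sketch of that arithmetic, not the authors' code: it assumes the shares are rounded to three decimals and that each share times its cluster size rounds back to a whole number of positive reviews. The function name `pooled_positive_share` is hypothetical.

```python
# Rows copied from the cluster table: (cluster, number of reviews, positive share).
clusters = [
    ("C0", 101, 0.337),
    ("C1", 8, 0.375),
    ("C2", 98, 0.071),
    ("C3", 85, 0.541),
]


def pooled_positive_share(rows):
    """Return (total reviews, estimated positive reviews, pooled positive share)."""
    total = sum(n for _, n, _ in rows)
    # ASSUMPTION: shares are rounded, so n * share is rounded back to whole reviews.
    positives = sum(round(n * share) for _, n, share in rows)
    return total, positives, positives / total


total, positives, share = pooled_positive_share(clusters)
print(total, positives, round(share, 3))

# Relative size of the smallest cluster (C1), as a fraction of the corpus.
smallest = min(clusters, key=lambda row: row[1])
print(smallest[0], round(smallest[1] / total, 3))
```

Under these assumptions the four clusters sum to 292 reviews, of which roughly 90 (about 31%) are positive, and C1 holds under 3% of the corpus. The positive share ranges from 0.071 in C2 to 0.541 in C3, so the imbalance differs sharply by aspect.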

| Metric | Value | Interpretation |
|---|---|---|
| Accuracy | 0.6949 | Equals the test-set majority-class prior. |
| Balanced accuracy | 0.5000 | Equals 0.5, the score of a constant predictor. |
| Precision (positive) | 0.0000 | Model never predicts positive. |
| Recall (positive) | 0.0000 | No positive review recovered. |
| F1 (positive) | 0.0000 | Harmonic mean collapses with recall. |
| ROC-AUC | 0.9336 | Probabilities are well-ranked; only the threshold is wrong. |
| PR-AUC (positive) | 0.9290 | High area under PR curve confirms ranking is informative. |
| MCC | 0.0000 | No correlation between prediction and label. |
| Cohen’s κ | 0.0000 | No agreement beyond chance. |
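The threshold-dependent rows of this table are exactly what a predictor that always outputs the negative class would produce. The sketch below illustrates this with the standard metric definitions. The test-set counts (59 reviews, 18 positive) are an assumption inferred from the reported accuracy (0.6949 ≈ 41/59). They are not stated in the table, and `binary_metrics` is a hypothetical helper.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Compute threshold-dependent metrics from binary confusion counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # Precision is set to 0 by convention when nothing is predicted positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    # Cohen's kappa: observed agreement relative to chance agreement.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total**2
    kappa = (accuracy - expected) / (1 - expected) if expected < 1 else 0.0
    return {
        "accuracy": accuracy,
        "balanced_accuracy": (recall + specificity) / 2,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
        "kappa": kappa,
    }


# ASSUMED test split: 59 reviews, 18 positive, 41 negative (inferred, not reported).
# A collapsed classifier predicts "negative" for every review.
collapsed = binary_metrics(tp=0, fp=0, fn=18, tn=41)
for name, value in collapsed.items():
    print(f"{name}: {value:.4f}")
```

With these assumed counts the function gives accuracy 0.6949 and balanced accuracy 0.5, with precision, recall, F1, MCC and κ all equal to zero. ROC-AUC and PR-AUC are computed from the ranking of predicted probabilities rather than from thresholded labels, which is why they can remain high while every thresholded metric collapses.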

| Configuration | Accuracy | Balanced accuracy | F1 (positive) | Recall (positive) | ROC-AUC | MCC |
|---|---|---|---|---|---|---|
| Baseline BCE | 0.695 | 0.500 | 0.000 | 0.000 | 0.934 | 0.000 |
| Class-weighted CE | 0.797 | 0.745 | 0.647 | 0.611 | 0.896 | 0.507 |
| Focal loss γ = 1 | 0.864 | 0.902 | 0.818 | 1.000 | 0.986 | 0.746 |
| Focal loss γ = 2 | 0.746 | 0.770 | 0.667 | 0.833 | 0.862 | 0.500 |
| Focal loss γ = 5 | 0.831 | 0.862 | 0.773 | 0.944 | 0.942 | 0.672 |
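The rows of this comparison can be checked for internal consistency by rebuilding each confusion matrix from the reported accuracy and positive-class recall, then recomputing the other metrics. The sketch below does this under an assumed test split of 59 reviews with 18 positives. These counts are inferred from the reported values and are not stated in the source. The helpers `reconstruct` and `derived` are hypothetical.

```python
import math

# ASSUMPTION: 59 test reviews, 18 positive (inferred from accuracy 0.695 ~ 41/59).
N_TOTAL, N_POS = 59, 18
N_NEG = N_TOTAL - N_POS

# Reported values: (accuracy, balanced accuracy, F1+, recall+, MCC).
reported = {
    "Baseline BCE": (0.695, 0.500, 0.000, 0.000, 0.000),
    "Class-weighted CE": (0.797, 0.745, 0.647, 0.611, 0.507),
    "Focal loss gamma=1": (0.864, 0.902, 0.818, 1.000, 0.746),
    "Focal loss gamma=2": (0.746, 0.770, 0.667, 0.833, 0.500),
    "Focal loss gamma=5": (0.831, 0.862, 0.773, 0.944, 0.672),
}


def reconstruct(accuracy, recall):
    """Recover (tp, fp, fn, tn) from rounded accuracy and positive recall."""
    tp = round(recall * N_POS)
    tn = round(accuracy * N_TOTAL) - tp
    return tp, N_NEG - tn, N_POS - tp, tn


def derived(tp, fp, fn, tn):
    """Recompute balanced accuracy, positive-class F1 and MCC from counts."""
    balanced = (tp / N_POS + tn / N_NEG) / 2
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return balanced, f1, mcc


results = {}
for name, (acc, bal, f1, rec, mcc) in reported.items():
    counts = reconstruct(acc, rec)
    results[name] = (counts, derived(*counts))
    print(name, counts, [round(v, 3) for v in results[name][1]])
```

Under the assumed split, the recomputed balanced accuracy, F1 and MCC agree with every reported row to three decimals. For focal loss with γ = 1, the reconstructed counts are 18 true positives, 8 false positives, 0 false negatives and 33 true negatives, so its perfect recall comes at the cost of 8 negative reviews labelled positive.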

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).