Preprint Article Version 1 Preserved in Portico This version is not peer-reviewed

POS‐HC: A Part‐of‐Speech Hierarchical Clustering Approach for Normative Texts Partition

Version 1 : Received: 27 February 2024 / Approved: 27 February 2024 / Online: 28 February 2024 (08:28:07 CET)

How to cite: Li, W.; Liu, Y.; Deng, K.; Wu, X. POS‐HC: A Part‐of‐Speech Hierarchical Clustering Approach for Normative Texts Partition. Preprints 2024, 2024021575. https://doi.org/10.20944/preprints202402.1575.v1

Abstract

Chinese texts often contain a substantial volume of normative content, and analyzing such content has become a focus of recent semantic analysis work. Improving text classification and clustering is key to uncovering the semantics of large text collections. Empirical research in computational social science has shown that general-purpose text clustering methods need better explanatory power when applied to normative texts. This study introduces a hierarchical clustering method, POS-HC, that incorporates part-of-speech (POS) features and proposes a new semantic network model. The method combines a POS-based feature weight calculation with hierarchical clustering. It is evaluated on three text datasets covering normative reports, news articles, and social media content. The experimental results show that POS-HC outperforms the traditional TF and TF-IDF feature extraction methods in clustering accuracy, and also outperforms some existing clustering methods. Its classification effectiveness is particularly strong in the semantic classification of normative texts.
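The general idea of combining POS-based feature weights with hierarchical clustering can be illustrated with a minimal sketch. It is not the authors' POS-HC implementation: the weight table `POS_WEIGHTS`, the default weight, the cosine distance, and the average-linkage merge rule are all assumptions chosen for illustration, and the toy documents are assumed to be already tokenized and POS-tagged.

```python
import math
from collections import Counter

# Hypothetical POS weights: assumed for illustration, not values from the paper.
POS_WEIGHTS = {"NOUN": 1.0, "VERB": 0.8, "ADJ": 0.5}
DEFAULT_WEIGHT = 0.1  # assumed weight for any other tag


def pos_weighted_vector(tagged_tokens):
    """Build a sparse term vector; each occurrence adds its POS weight."""
    vec = Counter()
    for word, pos in tagged_tokens:
        vec[word] += POS_WEIGHTS.get(pos, DEFAULT_WEIGHT)
    return vec


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two sparse vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def agglomerative(vectors, n_clusters):
    """Average-linkage agglomerative clustering over document indices."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Mean pairwise distance between the two candidate clusters.
                total = sum(
                    cosine_distance(vectors[p], vectors[q])
                    for p in clusters[i]
                    for q in clusters[j]
                )
                d = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return sorted(sorted(c) for c in clusters)


# Toy, pre-tagged documents (hypothetical data).
docs = [
    [("policy", "NOUN"), ("requires", "VERB"), ("reform", "NOUN")],
    [("policy", "NOUN"), ("promotes", "VERB"), ("reform", "NOUN")],
    [("team", "NOUN"), ("wins", "VERB"), ("match", "NOUN")],
    [("team", "NOUN"), ("loses", "VERB"), ("match", "NOUN")],
]
vectors = [pos_weighted_vector(d) for d in docs]
clusters = agglomerative(vectors, 2)
print(clusters)
```

In this sketch, nouns contribute more to document similarity than verbs, so the two policy documents and the two sports documents merge first. A real pipeline would take its tags from a POS tagger and its weights from the method described in the paper.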

Keywords

NLP; semantic partition; clustering techniques; POS

Subject

Computer Science and Mathematics, Artificial Intelligence and Machine Learning
