Preprint Article, Version 1. Preserved in Portico. This version is not peer-reviewed.

Automatic Detection of Stop Words for Texts in the Uzbek Language

Version 1 : Received: 21 April 2022 / Approved: 26 April 2022 / Online: 26 April 2022 (10:30:54 CEST)

How to cite: Madatov, K.; Bekchanov, S.; Vičič, J. Automatic Detection of Stop Words for Texts in the Uzbek Language. Preprints 2022, 2022040234 (doi: 10.20944/preprints202204.0234.v1).

Abstract

Stop words play an important role in information retrieval and text analysis. This study aims to automatically analyze and detect stop words in Uzbek-language texts. Because few methods exist for the automatic detection of stop words in Uzbek texts, we analyzed a newly prepared corpus. Uzbek belongs to the family of agglutinative languages. As in all agglutinative languages, detecting stop words in Uzbek texts is more complex than in inflected languages. In inflected languages, auxiliary words, articles, and prepositions can be included in the stop-word group; in agglutinative languages, the meanings of such words are hidden in the text. It is therefore not appropriate to apply every known stop-word detection method for inflected languages directly to agglutinative languages. In this work, we investigate the “School corpus”, which contains 731,156 Uzbek words. We applied the bigram method of analysis to the corpus and propose a collocation method for automatically detecting stop words in Uzbek texts. The results show that the collocation method performs six times better than the bigram method.
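The abstract does not spell out how bigram analysis is used, so the following is only a minimal sketch of one plausible reading: count adjacent word pairs and rank words by how often they take part in any bigram. The function names, the scoring rule, and the toy token list are assumptions made for illustration and may differ from the authors' actual procedure.

```python
from collections import Counter


def bigram_counts(tokens):
    """Count adjacent word pairs (bigrams) in a list of tokens."""
    return Counter(zip(tokens, tokens[1:]))


def candidate_stop_words(tokens, top_n=3):
    """Rank words by their total frequency across all bigrams they occur in.

    This scoring rule is an illustrative assumption, not the paper's method.
    """
    scores = Counter()
    for (left, right), count in bigram_counts(tokens).items():
        scores[left] += count
        scores[right] += count
    return [word for word, _ in scores.most_common(top_n)]


# Hypothetical toy sample; a real run would tokenize the full corpus.
tokens = "men va sen va u ham keldi va ketdi".split()
print(candidate_stop_words(tokens, top_n=1))
```

In this toy sample the conjunction "va" takes part in more bigrams than any other word, so it is ranked first. On a real corpus, a frequency threshold or a manually checked cutoff would likely be needed before treating the top-ranked words as stop words.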

Keywords

stop word detection; Uzbek language; agglutinative language; algorithm

Subject

MATHEMATICS & COMPUTER SCIENCE, General & Theoretical Computer Science
