Preprint Article, Version 1. This version is not peer-reviewed.

Determining Quality of Articles in Polish Wikipedia Based on Linguistic Features

Version 1: Received: 2 January 2018 / Approved: 3 January 2018 / Online: 3 January 2018 (02:03:51 CET)

How to cite: Lewoniewski, W.; Węcel, K.; Abramowicz, W. Determining Quality of Articles in Polish Wikipedia Based on Linguistic Features. Preprints 2018, 2018010017 (doi: 10.20944/preprints201801.0017.v1).

Abstract

Wikipedia is the most popular and the largest user-generated source of knowledge on the Web. The quality of the information in this encyclopedia is often questioned. Therefore, Wikipedians have developed an award system for high-quality articles, which follows specific style guidelines. Nevertheless, more than 1.2 million articles in Polish Wikipedia remain unassessed. This paper considers over 100 linguistic features to determine the quality of Wikipedia articles in the Polish language. We evaluate our models on 500,000 articles from Polish Wikipedia. Additionally, we discuss the importance of linguistic features for quality prediction.

Subject Areas

Wikipedia; Polish; information quality; linguistic features; linguistics; data mining; NLP
