Preprint, Article, Version 1, Preserved in Portico. This version is not peer-reviewed.

Determining Quality of Articles in Polish Wikipedia Based on Linguistic Features

Version 1: Received: 2 January 2018 / Approved: 3 January 2018 / Online: 3 January 2018 (02:03:51 CET)

A peer-reviewed article of this Preprint also exists.

Lewoniewski W., Węcel K., Abramowicz W. (2018) Determining Quality of Articles in Polish Wikipedia Based on Linguistic Features. In: Damaševičius R., Vasiljevienė G. (eds) Information and Software Technologies. ICIST 2018. Communications in Computer and Information Science, vol 920. Springer, Cham.

Abstract

Wikipedia is the most popular and the largest user-generated source of knowledge on the Web. The quality of the information in this encyclopedia is often questioned. Therefore, Wikipedians have developed an award system for high-quality articles that follow specific style guidelines. Nevertheless, more than 1.2 million articles in Polish Wikipedia remain unassessed. This paper considers over 100 linguistic features to determine the quality of Wikipedia articles in the Polish language. We evaluate our models on 500,000 articles from Polish Wikipedia. Additionally, we discuss the importance of linguistic features for quality prediction.

Keywords

Wikipedia; Polish; information quality; linguistic features; linguistics; data mining; NLP

Subject

Computer Science and Mathematics, Information Systems
