Lewoniewski W., Węcel K., Abramowicz W. (2018) Determining Quality of Articles in Polish Wikipedia Based on Linguistic Features. In: Damaševičius R., Vasiljevienė G. (eds) Information and Software Technologies. ICIST 2018. Communications in Computer and Information Science, vol 920. Springer, Cham.
Abstract
Wikipedia is the most popular and the largest user-generated source of knowledge on the Web. The quality of information in this encyclopedia is often questioned. Therefore, Wikipedians have developed an award system for high-quality articles that follow specific style guidelines. Nevertheless, more than 1.2 million articles in Polish Wikipedia are unassessed. This paper considers over 100 linguistic features to determine the quality of Wikipedia articles in the Polish language. We evaluate our models on 500,000 articles from Polish Wikipedia. Additionally, we discuss the importance of linguistic features for quality prediction.
Keywords
Wikipedia; Polish; information quality; linguistic features; linguistics; data mining; NLP
Subject
MATHEMATICS & COMPUTER SCIENCE, Information Technology & Data Management
Copyright
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.