Version 1
Received: 5 November 2018 / Approved: 7 November 2018 / Online: 7 November 2018 (14:47:58 CET)
How to cite:
Mendonca dos Santos, I.; Trouve, A.; Fukuda, A.; Murakami, K. Exploring the effects of Clustering Algorithms on Free Text Recommendation. Preprints 2018, 2018110172. https://doi.org/10.20944/preprints201811.0172.v1
APA Style
Mendonca dos Santos, I., Trouve, A., Fukuda, A., & Murakami, K. (2018). Exploring the effects of Clustering Algorithms on Free Text Recommendation. Preprints. https://doi.org/10.20944/preprints201811.0172.v1
Chicago/Turabian Style
Mendonca dos Santos, I., A. Trouve, Akira Fukuda, and Kazuaki Murakami. 2018. "Exploring the effects of Clustering Algorithms on Free Text Recommendation." Preprints. https://doi.org/10.20944/preprints201811.0172.v1
Abstract
In this paper, we study the effects of applying classical clustering algorithms, such as k-Means, to free text recommender systems. A typical recommender system may face problems when the number of items in its database grows from a few items to hundreds of items. Currently, one of the most prominent techniques for scaling to a larger database is clustering; however, clustering may reduce the accuracy of the system when it is applied without taking the underlying items into consideration. In this work, we build a conceptual text recommender system and use k-Means to partition its search space into different groups. We study how varying the number of clusters affects the system's performance in light of two measurements: recommendation time and precision. We also analyze whether the clustering is affected by the text representation we use. All the techniques used in this study use word embeddings to represent the documents. One of the main findings of this work is that clustering can reduce the recommendation time by a factor of up to almost 30 without greatly affecting the initial accuracy. Another interesting finding is that increasing the number of clusters does not translate directly into a linear improvement in performance.
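The idea of partitioning the search space with k-Means can be illustrated with a minimal sketch. It is not the authors' implementation: the random vectors stand in for word-embedding document representations, and the names `kmeans`, `recommend`, and `brute_force` are hypothetical helpers introduced only for this example. The sketch assumes that a query is compared only against documents in the cluster whose centroid is nearest to it, which is where the time saving would come from.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def nearest(vec, centroids):
    """Index of the centroid closest to vec."""
    return min(range(len(centroids)), key=lambda c: sq_dist(vec, centroids[c]))


def kmeans(vectors, k, iters=20, seed=0):
    """Plain Lloyd-style k-Means; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        labels = [nearest(v, centroids) for v in vectors]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    # Final assignment so that labels match the returned centroids.
    labels = [nearest(v, centroids) for v in vectors]
    return centroids, labels


def recommend(query, vectors, centroids, labels, top_n=5):
    """Rank only the documents in the cluster nearest to the query."""
    target = nearest(query, centroids)
    candidates = [i for i, lab in enumerate(labels) if lab == target]
    ranked = sorted(candidates, key=lambda i: -cosine(query, vectors[i]))
    return ranked[:top_n]


def brute_force(query, vectors, top_n=5):
    """Baseline: rank every document against the query."""
    ranked = sorted(range(len(vectors)), key=lambda i: -cosine(query, vectors[i]))
    return ranked[:top_n]


if __name__ == "__main__":
    rng = random.Random(42)
    # Assumption: 200 random 16-dimensional vectors stand in for embeddings.
    docs = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(200)]
    centroids, labels = kmeans(docs, k=8)
    query = docs[0]
    print("clustered:", recommend(query, docs, centroids, labels))
    print("baseline: ", brute_force(query, docs))
```

With k clusters, each query is scored against roughly 1/k of the documents plus k centroids, so the cost saved on document comparisons is partly offset by the centroid comparisons as k grows. Comparing the clustered results with the baseline ranking gives a simple way to estimate how much precision is lost for a given k.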
Keywords
Clustering; Recommender-Systems; Word-Embedding
Subject
Computer Science and Mathematics, Information Systems
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.