Version 1
: Received: 6 June 2021 / Approved: 7 June 2021 / Online: 7 June 2021 (16:16:18 CEST)
Version 2
: Received: 23 November 2021 / Approved: 23 November 2021 / Online: 23 November 2021 (14:45:31 CET)
Version 3
: Received: 9 February 2022 / Approved: 17 February 2022 / Online: 17 February 2022 (13:15:23 CET)
How to cite:
Shoeibi, N.; Shoeibi, N.; Chamoso, P.; AlizadehSani, Z.; Corchado, J. M. Similarity Approximation of Twitter Profiles. Preprints 2021, 2021060196. https://doi.org/10.20944/preprints202106.0196.v2
APA Style
Shoeibi, N., Shoeibi, N., Chamoso, P., AlizadehSani, Z., & Corchado, J. M. (2021). Similarity Approximation of Twitter Profiles. Preprints. https://doi.org/10.20944/preprints202106.0196.v2
Chicago/Turabian Style
Shoeibi, N., Zakie AlizadehSani and Juan M. Corchado. 2021. "Similarity Approximation of Twitter Profiles." Preprints. https://doi.org/10.20944/preprints202106.0196.v2
Abstract
Social media platforms have become an undeniable part of daily life over the past decade. Analyzing the information shared on them is a crucial step toward understanding human behavior. Social media analysis aims to give users a better experience and to increase user satisfaction. First, however, it is necessary to know how, and along which aspects, users should be compared. In this paper, an intelligent system is proposed to measure the similarity of Twitter profiles. First, the timeline of each profile is extracted using the official Twitter API, and all of this information is passed to the proposed system. Three aspects of a profile are then derived in parallel. Behavioral ratios are time-series information reflecting the consistency and habits of the user; dynamic time warping is used to compare the behavioral ratios of two profiles. Next, the audience network of each user is extracted, and Jaccard similarity is used to estimate the similarity of the two sets. Finally, for content similarity, the tweets are preprocessed according to the feature extraction method; TF-IDF and DistilBERT are used for feature extraction, and the resulting representations are compared using cosine similarity. Results show that TF-IDF performs slightly better; therefore, this more straightforward solution is selected for the model. The system reports the similarity level of different profiles. In the case study, a Random Forest classification model trained on almost 20,000 users achieved 97.24% accuracy. This comparison makes it possible to find duplicate profiles with nearly the same behavior and content.
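The three comparison measures named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the whitespace tokenizer, the absolute-difference cost in the dynamic time warping step, and the smoothed IDF formula are all assumptions chosen to keep the example dependency-free, and the DistilBERT variant is omitted.

```python
import math
from collections import Counter


def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences.

    Assumption: the local cost is the absolute difference of two values.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


def jaccard_similarity(x, y):
    """Size of the intersection divided by size of the union of two sets."""
    x, y = set(x), set(y)
    if not x and not y:
        return 1.0  # convention chosen for this sketch: two empty sets match
    return len(x & y) / len(x | y)


def tfidf_vectors(docs):
    """Sparse TF-IDF vectors (dicts) for a list of documents.

    Assumptions: lowercase whitespace tokenization and a smoothed IDF,
    log((1 + N) / (1 + df)) + 1.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine_similarity(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)
```

In this sketch, a lower `dtw_distance` means more similar behavioral series, while `jaccard_similarity` and `cosine_similarity` grow with similarity. How the three scores are combined into a single similarity level is not specified in the abstract and is left out here.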
Keywords
Twitter; Social Media; Social Networking; Social Network Analytics; Text Similarity; Natural Language Processing; User Engagement; DistilBERT
Subject
Computer Science and Mathematics, Artificial Intelligence and Machine Learning
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.