Preprint Article, Version 1. Preserved in Portico. This version is not peer-reviewed.

Investigating Misinformation about COVID-19 on YouTube using Topic Modeling, Sentiment Analysis, and Language Analysis

Version 1 : Received: 21 December 2023 / Approved: 21 December 2023 / Online: 21 December 2023 (11:52:17 CET)

A peer-reviewed article of this Preprint also exists.

Thakur, N.; Cui, S.; Knieling, V.; Khanna, K.; Shao, M. Investigation of the Misinformation about COVID-19 on YouTube Using Topic Modeling, Sentiment Analysis, and Language Analysis. Computation 2024, 12, 28.

Abstract

The work presented in this paper makes multiple scientific contributions to the analysis of misinformation about COVID-19 on YouTube. First, topic modeling of the descriptions of YouTube videos containing misinformation about COVID-19 revealed four distinct themes: Promotion and Outreach Efforts, Treatment for COVID-19, Conspiracy Theories regarding COVID-19, and COVID-19 and Politics. Second, topic-specific sentiment analysis revealed the sentiment associated with each of these themes. For Promotion and Outreach Efforts, 45.8% of the videos were neutral, 39.8% were positive, and 14.4% were negative. For Treatment for COVID-19, 38.113% were positive, 31.343% were neutral, and 30.544% were negative. For Conspiracy Theories regarding COVID-19, 46.9% were positive, 31.0% were neutral, and 22.1% were negative. For COVID-19 and Politics, 35.70% were positive, 32.86% were negative, and 31.44% were neutral. Third, topic-specific language analysis was performed to detect the languages in which the video descriptions for each topic were published on YouTube. This analysis revealed multiple novel insights. For instance, for all four themes, English was the most widely used language and Spanish the second-most widely used. Fourth, the patterns of sharing these videos on other social media platforms, such as Facebook and Twitter, were investigated. The results revealed that videos with descriptions in English were shared the highest number of times on Facebook and Twitter. Finally, correlation analysis was performed on multiple characteristics of these videos. The results revealed that the correlation between the length of the video title and the number of Tweets, as well as the correlation between the length of the video title and the number of Facebook posts, was statistically significant.
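
To illustrate the kind of correlation analysis described above, the following is a minimal sketch and not the authors' implementation. It assumes a Pearson correlation coefficient and a two-sided permutation test for significance; the abstract does not state which coefficient or test was used. The function names and the toy data are hypothetical.

```python
import random
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def permutation_p_value(xs, ys, n_permutations=2000, seed=0):
    """Two-sided permutation p-value for the observed |r|.

    Shuffling ys breaks any association with xs, so the share of shuffles
    with |r| at least as large as the observed value estimates the p-value.
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical toy data: video titles and the number of Tweets sharing each video.
titles = [
    "Short title",
    "A somewhat longer video title",
    "An even longer video title about a treatment claim",
    "Brief",
    "A medium length title here",
    "A very long and detailed video title describing several claims at once",
]
tweet_counts = [12, 30, 55, 4, 27, 80]

title_lengths = [len(t) for t in titles]
r = pearson_r(title_lengths, tweet_counts)
p = permutation_p_value(title_lengths, tweet_counts)
```

Here `r` measures the strength of the linear relationship between title length and Tweet count, and `p` would be compared against a chosen significance level such as 0.05. The same two functions could be applied to the number of Facebook posts.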

Keywords

COVID-19; YouTube; Misinformation; Big Data; Data Analysis; Topic Modeling; Sentiment Analysis; Correlation Analysis

Subject

Computer Science and Mathematics, Computer Science
