Preprint, Article, Version 2. Preserved in Portico. This version is not peer-reviewed.
An Exploratory Study of Tweets about the SARS-CoV-2 Omicron Variant: Insights from Sentiment Analysis, Language Interpretation, Source Tracking, Type Classification, and Embedded URL Detection
Version 1: Received: 17 May 2022 / Approved: 18 May 2022 / Online: 18 May 2022 (08:09:18 CEST)
Version 2: Received: 7 July 2022 / Approved: 7 July 2022 / Online: 7 July 2022 (08:36:40 CEST)
Thakur, N.; Han, C.Y. An Exploratory Study of Tweets about the SARS-CoV-2 Omicron Variant: Insights from Sentiment Analysis, Language Interpretation, Source Tracking, Type Classification, and Embedded URL Detection. COVID 2022, 2, 1026-1049.
Abstract
This paper presents the findings of an exploratory study of the Big Data continuously generated on Twitter through the sharing of information, news, views, opinions, ideas, knowledge, feedback, and experiences about the COVID-19 pandemic, with a specific focus on the Omicron variant, which is the globally dominant variant of SARS-CoV-2 at this time. A total of 12,028 tweets about the Omicron variant were studied, and the characteristics of the tweets that were analyzed include sentiment, language, source, type, and embedded URLs. The findings of this study are manifold. First, sentiment analysis showed that 50.5% of the tweets had a ‘neutral’ emotion. The other emotions, ‘bad’, ‘good’, ‘terrible’, and ‘great’, were found in 15.6%, 14.0%, 12.5%, and 7.5% of the tweets, respectively. Second, language interpretation showed that 65.9% of the tweets were posted in English, followed by Spanish or Castilian, French, Italian, Japanese, and other languages, which were found in 10.5%, 5.1%, 3.3%, 2.5%, and <2% of the tweets, respectively. Third, source tracking showed that “Twitter for Android” was associated with 35.2% of the tweets, followed by “Twitter Web App”, “Twitter for iPhone”, “Twitter for iPad”, “TweetDeck”, and all other sources, which accounted for 29.2%, 25.8%, 3.8%, 1.6%, and <1% of the tweets, respectively. Fourth, studying the type of tweets revealed that retweets accounted for 60.8% of the tweets, followed by original tweets and replies, which accounted for 19.8% and 19.4% of the tweets, respectively. Fifth, embedded URL analysis showed that the most common domain embedded in the tweets was twitter.com, followed by biorxiv.org, nature.com, wapo.st, nzherald.co.nz, recvprofits.com, science.org, and other URLs.
Finally, to support similar research and development in this field centered around the analysis of tweets, we have developed an open-access Twitter dataset that comprises tweets about the SARS-CoV-2 Omicron variant since the first detected case of this variant on November 24, 2021.
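As an illustration of the kind of tallies reported in the abstract (percentage shares per tweet type and counts of embedded URL domains), the following is a minimal sketch in Python. It is not the research tool used in the study: the record layout (`type`, `urls`), the helper names `percentage_shares` and `url_domain`, and the sample records are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from urllib.parse import urlparse


def percentage_shares(labels):
    """Return {label: percent of items}, rounded to one decimal place."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: round(100 * n / total, 1) for label, n in counts.most_common()}


def url_domain(url):
    """Return the host part of a URL in lowercase, without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


# Hypothetical records; the field names and URLs are assumptions,
# not the schema or content of the published dataset.
sample_tweets = [
    {"type": "retweet", "urls": ["https://www.nature.com/articles/example"]},
    {"type": "retweet", "urls": ["https://twitter.com/i/status/1"]},
    {"type": "retweet", "urls": []},
    {"type": "original", "urls": ["https://www.biorxiv.org/content/example"]},
    {"type": "reply", "urls": ["https://twitter.com/i/status/2"]},
]

# Share of each tweet type, as percentages of all sample tweets.
type_shares = percentage_shares(t["type"] for t in sample_tweets)

# Number of times each domain appears among the embedded URLs.
domain_counts = Counter(url_domain(u) for t in sample_tweets for u in t["urls"])

print(type_shares)
print(domain_counts.most_common())
```

Note that this sketch counts shortened links (such as those on wapo.st) under the shortener's own domain; it does not resolve them to their destination.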
Keywords
COVID-19; SARS-CoV-2; Omicron; Twitter; tweets; sentiment analysis; big data; Natural Language Processing; Data Science; Data Analysis
Subject
Computer Science and Mathematics, Information Systems
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Received:
7 July 2022
Commenter:
Nirmalya Thakur
Commenter's Conflict of Interests:
Author
Comment:
The following are the changes that have been made in this version of the preprint:
1. The Twitter dataset that was developed while conducting this study is now hosted online as open-access and the link to the dataset has been included in the paper.
2. A proper explanation has been added on how this study and the associated dataset are compliant with Twitter privacy policies, Twitter developer guidelines, and guidelines for content redistribution of Twitter.
3. The methodology section has been revised to include the details about the research tool that was used for this study.
4. The discussion section has been rewritten to highlight the scientific contributions of this paper.