DATASET | doi:10.20944/preprints202004.0263.v1
Subject: Public Health And Healthcare, Health Policy And Services Keywords: Twitter; Arabic; COVID-19
Online: 16 April 2020 (08:15:28 CEST)
The spread of the coronavirus (COVID-19) across the globe has affected our lives on many different levels. The world we knew before the spread of the virus has changed. Every country has taken preventive measures, including social distancing, travel restrictions, and curfews, to control the spread of the disease. With these measures in place, people have shifted to online social media platforms, such as Twitter, to maintain connections. In this paper, we describe a coronavirus data set of Arabic tweets collected from January 1, 2020, primarily from hashtags populated from Saudi Arabia. This data set is available to the research community to glean a better understanding of the societal, economic, and political effects of the outbreak and to help policy makers make better decisions for fighting this epidemic.
REVIEW | doi:10.20944/preprints202103.0734.v1
Subject: Business, Economics And Management, Accounting And Taxation Keywords: Digital citizenship; Twitter; e-participation
Online: 30 March 2021 (12:17:49 CEST)
(1) Background: Spain launched an official campaign, #EsteVirusLoParamosUnidos, to try and unite the efforts of the entire country through citizen cooperation to combat coronavirus. The research goal is to analyze the Twitter campaign’s repercussion on general citizen feeling. (2) Methods: The research is based on a composite design that triangulates from a theoretical model, a quantitative analysis and a qualitative analysis. (3) Results: Of the 7357 tweets in the sample, 72.32% were found to be retweets. Four content families were extracted: politics, education, messages to society and defense of occupational groups. The feelings expressed ranged along a continuum, from unity, admiration and support at one end to discontent and criticism regarding the health situation at the other. (4) Conclusions: The development of networked sociopolitical and technical measures that enable citizen participation facilitates the development of new patterns of interaction between governments and digital citizens, increasing citizens’ possibilities of influencing the public agenda and therefore strengthening citizen engagement vis-à-vis such situations.
ARTICLE | doi:10.20944/preprints202002.0170.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: Twitter; dataset; redundancy; reduction; archive
Online: 13 February 2020 (12:45:44 CET)
The data from social networks like Twitter is a valuable source for research but full of redundancy, making it hard to provide large-scale, self-contained, and small datasets. Data recording is a common problem in social media-based studies and could be standardized. Sadly, this is hardly done. This paper reports on lessons learned from a long-term evaluation study recording the complete public sample of the German and English Twitter stream. It presents a recording solution proposal that merely chunks a linear stream of events to reduce redundancy. If events are observed multiple times within the time-span of a chunk, only the latest observation is written to the chunk. A 10 gigabyte raw Twitter dataset covering 1.2 million tweets of 120,000 users recorded between June and September 2017 was used to analyze expectable compression rates. It turned out that resulting datasets need only between 10% and 20% of the original data size without losing any event, metadata or the relationships between single events. This kind of redundancy reduction recording makes it possible to curate large-scale (even nation-wide), self-contained, and small datasets of social networks for research in a standardized and reproducible manner.
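The chunking idea in the abstract above can be sketched as follows. This is a minimal illustration, not the paper's recording solution: the event shape `(timestamp, event_id, payload)`, the function name `chunk_stream`, and the use of fixed-length time windows are all assumptions made for the example.

```python
from typing import Any, Dict, Iterable, List, Tuple

# Hypothetical event shape: (timestamp in seconds, event id, payload).
Event = Tuple[float, str, Any]


def chunk_stream(events: Iterable[Event], chunk_seconds: float) -> List[Dict[str, Any]]:
    """Split a time-ordered event stream into fixed-length chunks.

    Within each chunk only the latest observation of an event id is kept,
    so repeated observations of the same event are stored once per chunk.
    """
    chunks: List[Dict[str, Any]] = []
    current: Dict[str, Any] = {}
    chunk_index = None
    for timestamp, event_id, payload in events:
        index = int(timestamp // chunk_seconds)
        if chunk_index is not None and index != chunk_index:
            # The stream moved into a new time window: close the old chunk.
            chunks.append(current)
            current = {}
        chunk_index = index
        current[event_id] = payload  # a later observation overwrites an earlier one
    if current:
        chunks.append(current)
    return chunks
```

The achievable size reduction depends on how often the same event recurs inside one chunk, which this sketch does not measure.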
ARTICLE | doi:10.20944/preprints202308.1832.v1
Online: 29 August 2023 (03:08:24 CEST)
This study analyzes the evolution of Corporate and Social Responsibility (CSR) and Environmental, Social, and Governance (ESG) concepts in social media, specifically on Twitter, from 2007 to 2022. The research aims to understand how society perceives organizational practices related to these aspects. Twitter provides an authentic environment for capturing user perceptions. The methodology involved collecting Tweets using Python and analyzing them in English, Spanish, and Portuguese with RStudio. The study's main findings highlight significant contributions and concerns within the CSR and ESG debates. While CSR's popularity showed a downward trend from 2010 to 2019, ESG grew exponentially during the pandemic and post-pandemic periods. Word analysis revealed the impact of specific hashtags: RSC in Spanish, #CSR in English, and #ESG in Portuguese. Society's growing concerns center on environmental issues, such as combating global warming and ensuring a better quality of life, alongside financial transparency and ethical practices in companies. However, the social aspect of ESG, represented by the letter "S," did not garner the same attention as environmental and governance matters. This discrepancy might stem from a lack of awareness or lower priority in public discussions on social media, overshadowing the social dimension in the ESG framework.
ARTICLE | doi:10.20944/preprints202007.0172.v1
Subject: Public Health And Healthcare, Public Health And Health Services Keywords: COVID-19; Twitter; Topic Discovery; Arabic
Online: 9 July 2020 (07:25:19 CEST)
The new coronavirus outbreak (COVID-19) has swept the world since December 2019, posing a global threat to all countries and communities on the planet. Information about the outbreak has been spreading rapidly on different social media platforms at an unprecedented level. As it continues to spread in different countries, people tend to increasingly share information and stay up-to-date with the latest news. It is crucial to capture the discussions and conversations happening on social media to better understand human behavior during pandemics and adjust possible strategies to combat the pandemic. In this work, we analyze the Arabic content of Twitter to capture the main topics discussed among Arabic users. We utilize Non-negative Matrix Factorization (NMF) to discover main issues and topics based on a dataset of Arabic tweets from early January to the end of April, and identify the most frequent unigrams, bigrams, and trigrams of the tweets. The final discovered topics are then presented and discussed; they can be roughly classified into COVID-19 origin topics, prevention measures in different Arabic countries, prayers and supplications, news and reports, and finally topics related to preventing the spread of the disease such as curfew and quarantine. To the best of our knowledge, this is the first work addressing the issue of detecting COVID-19 related topics from Arabic tweets.
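The n-gram frequency step mentioned in the abstract above could look roughly like this. It is a sketch only: the function name `top_ngrams` and the whitespace tokenization are assumptions, real Arabic text would need proper normalization and tokenization, and the NMF topic discovery itself is not shown.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_ngrams(texts: Iterable[str], n: int, k: int) -> List[Tuple[Tuple[str, ...], int]]:
    """Return the k most frequent n-grams across all texts.

    Tokenization is a plain whitespace split (an assumption for this sketch).
    """
    counts: Counter = Counter()
    for text in texts:
        tokens = text.split()
        # zip over n shifted copies of the token list yields consecutive n-tuples
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts.most_common(k)
```

Calling it with `n=1`, `n=2`, and `n=3` gives unigram, bigram, and trigram rankings respectively.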
REVIEW | doi:10.20944/preprints202008.0235.v1
Subject: Social Sciences, Media Studies Keywords: Twitter; Disaster; Risk Reduction; Preparedness; Response; Recovery
Online: 10 August 2020 (04:45:36 CEST)
Background: Twitter is a major tool for communication in emergencies such as natural disasters. This online social network allows the user to produce content, and it is not designed exclusively for news releases, as opposed to other service providers. Aim: The aim of this study is to investigate Twitter uses in natural disasters and pandemics. Methods: The included studies reported the role of Twitter in natural disasters. Studies reporting on settings other than natural disasters (such as man-made disasters) and on other social media were excluded. A comprehensive literature search of electronic databases, including MEDLINE, Web of Science, CINAHL, PsycINFO, Cochrane Register of Controlled Trials (CENTRAL) and EMBASE, was used to identify records published until May 2020 that matched the inclusion criteria. The study characteristics were extracted from the qualified studies, including year of publication, findings, and geographical location of the study. A narrative synthesis was used for this literature review. Results: The search identified 822 articles, of which 780 were removed: 256 were not available, 311 were not relevant, 16 were duplicates, and 197 were not related to emergencies. 45 articles met the selection criteria and were included in the review. Eleven themes were found in the narrative synthesis including early warning, disseminating information and misinformation, advocacy, personal gains, assessment, various roles of organizations, public mood, geographical analysis, charity, using influencers, and trust. Conclusions: It is recommended that influential individuals be identified in each country and community before disasters occur so that the necessary information can be disseminated in response to disasters. Preventing the spread of misinformation is one of the most important issues in times of disaster, especially pandemics. Disseminating accurate, transparent, and prompt information from relief organizations and governments can help. Also, analyzing Twitter data can be a good source for understanding the mental state of the community, estimating the number of injured people, estimating the points affected by natural disasters, and modeling the prevalence of epidemics. Therefore, various groups such as politicians, the government, non-governmental organizations, aid workers, and the health system can use this information to plan and implement interventions.
ARTICLE | doi:10.20944/preprints202007.0257.v1
Subject: Social Sciences, Geography, Planning And Development Keywords: Twitter; Spatiotemporal analysis; Mega-events; Olympic Games
Online: 12 July 2020 (14:25:23 CEST)
Olympic Games have a huge impact on the cities where they are held, both during the actual celebration of the event and before and after it. This study presents a new approach based on spatial analysis, GIS, and data coming from Location Based Social Networks to model the spatiotemporal dimension of impacts associated with the Rio 2016 Olympic Games. Geolocalized data from Twitter are used to analyze the activity pattern of users from two different viewpoints. The first monitors the activity of Twitter users during the event: the arrival of visitors, where they came from, and the use residents and tourists made of different areas of the city. The second assesses the spatiotemporal use of the city by Twitter users before the event, compared to the use during and after the event. The results not only reveal which spaces were the most used while the Games were being held but also changes in the urban dynamics after the Games. Both approaches can be used to assess the impacts of mega-events and to improve the management and allocation of urban resources such as transport and public services infrastructure.
ARTICLE | doi:10.20944/preprints202306.1992.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Twitter; child-rearing information; Machine Learning; Numerical Classification
Online: 28 June 2023 (10:20:48 CEST)
It is difficult to obtain necessary information accurately from Social Networking Service (SNS) while raising children, and it is thought that there is a certain demand for the development of a system that presents appropriate information to users according to the child's developmental stage. There are still few examples of research on knowledge extraction that focuses on childcare. This research aims to develop a system that extracts and presents useful knowledge for people who are actually raising children, using texts about childcare posted on Twitter. In many systems, numbers in text data are treated as plain strings like words and are normalized to zero or simply ignored. In this paper, we created a set of tweet texts and a set of profiles created according to the developmental stages of infants from "0-year-old child" to "6-year-old child". For each set, we constructed classification models that predict numbers from "0" to "6" from sentences, comparing ML algorithms such as NB (Naive Bayes), LR (Logistic Regression), ANN (Approximate Nearest Neighbor algorithms search), XGBoost, RF (random forest), decision trees, and SVM (Support Vector Machine) with BERT (Bidirectional Encoder Representations from Transformers), a neural language model. The accuracy rate predicted by the BERT classifier was slightly higher than that of the NB, LR, ANN, XGBoost, RF, decision trees, and SVM classifiers, indicating that the BERT classification method was better.
ARTICLE | doi:10.20944/preprints202304.1230.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Twitter; child-rearing information; Machine Learning; Numerical Classification
Online: 29 April 2023 (08:14:03 CEST)
It is difficult to obtain necessary information accurately from Social Networking Service (SNS) while raising children, and it is thought that there is a certain demand for the development of a system that presents appropriate information to users according to the child's developmental stage. There are still few examples of research on knowledge extraction that focuses on childcare. This research aims to develop a system that extracts and presents useful knowledge for people who are actually raising children, using texts about childcare posted on Twitter. In many systems numbers in text data are just strings like words and are normalized to zero or simply ignored. In this paper, we created a set of tweet texts and a set of profiles created according to the developmental stages of infants from "0-year-old child" to "6-year-old child". For each set, we used Support Vector Machine (SVM), and Bidirectional Encoder Representations from Transformers (BERT), a neural language model, to construct a classification model that predicts numbers from "0" to "6" from sentences. The accuracy rate predicted by the BERT classifier was slightly higher than that of the SVM classifier, indicating that the BERT classification method was better.
ARTICLE | doi:10.20944/preprints202202.0306.v1
Subject: Public Health And Healthcare, Public Health And Health Services Keywords: COVID-19; Twitter; Blood Clots; Social Media; Clots
Online: 24 February 2022 (09:35:35 CET)
After the first weeks of vaccination against SARS-CoV-2, several cases of acute thrombosis were reported. These news reports began to be shared frequently across social media platforms. The aim of this study was to conduct an analysis of Twitter data related to the overall discussion. Data was retrieved from 14th March to 14th April using the keyword ‘blood clots’. A dataset with n=266,677 tweets was retrieved, and a systematic random sample of 5% of tweets (n=13,334) was entered into NodeXL for further analysis. Social network analysis was used to analyse the data by drawing upon the Clauset-Newman-Moore algorithm. Influential users were identified by drawing upon the betweenness centrality metric. Text analysis was applied to identify the key hashtags and websites used at this time. More than half of the network was comprised of retweets and the largest groups within the network were broadcast clusters where a number of key users were retweeted. The most popular narratives were around highlighting the low risk of obtaining a blood clot from a vaccine and highlighting higher blood clot risks in medicines commonly consumed. A wide variety of actors drove the discussion on Twitter, ranging from writers, physicians, the general public, academics, and celebrities to journalists. Twitter was used to highlight the low potential of obtaining a blood clot from a vaccine and encouraged vaccinations among the public.
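A systematic random sample of roughly 5% can be drawn by picking a random starting offset and then taking every 20th item. The sketch below assumes that procedure; the abstract does not state how the sample was actually drawn, and the function name `systematic_sample` and its parameters are hypothetical.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def systematic_sample(items: Sequence[T], step: int, seed: Optional[int] = None) -> List[T]:
    """Take every `step`-th item, starting from a random offset in [0, step)."""
    rng = random.Random(seed)
    start = rng.randrange(step)
    return list(items[start::step])
```

With `step=20` the sample holds about 5% of the items; the exact count can differ by one depending on the random starting offset.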
ARTICLE | doi:10.20944/preprints202110.0070.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: Online social networks (OSNs); Deep Learning; cyberbullying; Twitter
Online: 5 October 2021 (08:27:41 CEST)
Online social networks (OSNs) play an integral role in facilitating social interaction; however, these social networks increase antisocial behavior, such as cyberbullying, hate speech, and trolling. Aggression or hate speech that takes place through short message service (SMS) or the Internet (e.g., in social media platforms) is known as cyberbullying. Therefore, automatic detection utilizing natural language processing (NLP) is a necessary first step that helps prevent cyberbullying. This research proposes an automatic cyberbullying method to detect aggressive behavior using a consolidated deep learning model. This technique utilizes multichannel deep learning based on three models, namely, the bidirectional gated recurrent unit (BiGRU), transformer block, and convolutional neural network (CNN), to classify Twitter comments into two categories: aggressive and not aggressive. Three well-known hate speech datasets were combined to evaluate the performance of the proposed method. The proposed method achieved promising results. The accuracy of the proposed method was approximately 88%.
ARTICLE | doi:10.20944/preprints202110.0015.v1
Subject: Public Health And Healthcare, Nursing Keywords: Social media; Community; Facebook; Twitter; Google; Information; Interaction
Online: 1 October 2021 (12:03:09 CEST)
Background: Caregivers often use the internet to access information related to stroke care to improve preparedness, thereby reducing uncertainty and enhancing the quality of care. Method: Social media communities used by caregivers of people affected by stroke were identified using popular keywords searched for using Google. Communities were filtered based on their ability to provide support to caregivers. Data from the included communities were extracted and analysed to determine the content and level of interaction. Results: There was a significant rise in the use of social media by caregivers of people affected by stroke. The most popular social media communities were charitable and governmental organizations with the highest user interaction – this was for topics related to stroke prevention, signs and symptoms, and caregiver self-care delivered through video-based resources. Conclusion: Findings show the ability of social media to support stroke caregiver needs and practices that should be considered to increase their interaction and support.
ARTICLE | doi:10.20944/preprints202004.0031.v1
Subject: Public Health And Healthcare, Health Policy And Services Keywords: Saudi Arabia, COVID-19, Sentiment Analysis, Twitter, Measures
Online: 3 April 2020 (11:45:33 CEST)
Background: Countries around the world are facing extraordinary challenges in implementing various measures to slow down the spread of the novel coronavirus (COVID-19). Guided by international recommendations, Saudi Arabia has implemented a series of infection control measures after the detection of the first confirmed case in the country. However, in order for these measures to be effective, public attitudes and compliance must be conducive as perceived risk is strongly associated with health behaviors. The primary objective of this study is to assess Saudis’ attitudes towards COVID-19 preventive measures to guide future health communication content. Methods: A Naïve Bayes machine learning model was used to run Arabic sentiment analysis of Twitter posts through the Natural Language Toolkit (NLTK) library in Python. Tweets containing hashtags pertaining to seven public health measures imposed by the government were collected and analyzed. Results: A total of 53,127 tweets were analyzed. All measures, except one, showed more positive tweets than negative. Measures that pertain to religious practices showed the most positive sentiment. Discussion: Saudi Twitter users showed support and positive attitudes towards the infection control measures to combat COVID-19. It is postulated that this conducive public response is reflective of the overarching, longstanding popular confidence in the government. Religious notions may also play a positive role in preparing believers at times of crises. Findings of this study broaden our understanding of how to develop proper public health messages and promote stronger compliance with measures to control COVID-19.
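For readers unfamiliar with the classifier named in the abstract above, here is a minimal multinomial Naive Bayes sketch written with the Python standard library only. It is not the study's NLTK pipeline: the class name `NaiveBayes`, the whitespace tokenization, and the add-one (Laplace) smoothing are assumptions made for illustration, and the test data is English rather than Arabic.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple


class NaiveBayes:
    """Multinomial Naive Bayes over whitespace tokens with add-one smoothing."""

    def __init__(self) -> None:
        self.class_counts: Counter = Counter()
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocabulary: Set[str] = set()

    def fit(self, samples: Iterable[Tuple[str, str]]) -> "NaiveBayes":
        """Count documents per label and token occurrences per label."""
        for text, label in samples:
            self.class_counts[label] += 1
            for token in text.split():
                self.word_counts[label][token] += 1
                self.vocabulary.add(token)
        return self

    def predict(self, text: str) -> str:
        """Return the label with the highest log posterior for the text."""
        if not self.class_counts:
            raise ValueError("fit() must be called before predict()")
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = "", -math.inf
        for label, doc_count in self.class_counts.items():
            score = math.log(doc_count / total_docs)  # log prior
            total_words = sum(self.word_counts[label].values())
            for token in text.split():
                count = self.word_counts[label][token]
                # add-one smoothing keeps unseen tokens from zeroing the score
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Counting the predicted labels per hashtag would then give the kind of positive-versus-negative comparison the abstract reports.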
ARTICLE | doi:10.20944/preprints202305.0512.v1
Subject: Social Sciences, Government Keywords: E-participation; social media; E-government; Twitter; local government
Online: 8 May 2023 (10:23:30 CEST)
Communication and effective interactions are inevitable necessities in every organizational setting. In this era of information and communication technology, where limitations and difficulties in proper communication and interactions between different entities of various organizations have been greatly reduced, the government, stakeholders, and citizens of different nations should also utilize these available tools to maximize performance in governance through interactions and e-participation between the citizens, stakeholders, and the government parastatals. This research focuses on examining the available and most preferred applications or platforms which encourage the best level of communication and interaction through e-participation among the citizens, stakeholders, and government at the local government level, taking Nigeria as a case study.
ARTICLE | doi:10.20944/preprints202305.0389.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: LDA; Topic Modeling; Twitter; Time Series; Sentiment Analysis; CIER
Online: 6 May 2023 (08:13:29 CEST)
The use of the voice of the customer as a main input to guide decision making towards customer centricity strategies has become a necessity for companies. This research proposes a structured method of textual processing using the KDD (Knowledge Discovery in Databases) methodology applied to the tweets of users of Colombian public sector companies, through the analysis of temporal sentiments and topic modeling to identify the areas in which actions should be taken to improve the perception of service. To this end, tweets from January to June 2022 are processed, followed by a temporal analysis of the evolution of the sentiment based on 3 enriched dictionaries; then, the LDA (Latent Dirichlet Allocation) algorithm is implemented to find the areas of concern for the user, and a method is proposed to homologate the CIER (Comisión de Integración Energética Regional) survey. Finally, metrics are detailed to follow up the perception of the service. It is concluded that for the Acueducto the topic with the highest number of complaints is related to "Water truck request", for Enel "Service Outages", and for Vanti "Case solution and request information". Also, the homologation of 3 of the 5 pillars on which the CIER survey is based is presented.
REVIEW | doi:10.20944/preprints202302.0364.v1
Subject: Social Sciences, Media Studies Keywords: Twitter; Social Media; Altmetrics; Citations; Orthopedic Research Society; Publication
Online: 22 February 2023 (01:34:43 CET)
The purpose of this paper is to highlight the main themes and insights from the Journal of Orthopedic Research (JOR)/JOR Spine Workshop during the Orthopedic Research Society 2022 Annual Meeting in Tampa Bay, Florida. This workshop, organized by JOR Editor-in-Chief Dr. Linda Sandell, focused on communication strategies (particularly using social media) to broadcast published work in orthopedic research. In this manuscript, we summarize data that support the beneficial impacts of amplifying scholarly works on social media and outline a linearized workflow for constructing Twitter posts, which can be generalized to other social media platforms, to share academic research. Finally, we identify resources to alleviate barriers to social media use and help promote professionalism and success online in the orthopedic research community. As early career scientists in orthopedics, we see immense value in using social media, particularly Twitter, to communicate our research findings and build our scholarly networks. We hope this information will be persuasive to those in the orthopedic field and be broadly applicable to others in related scientific fields who wish to disseminate findings and engage a public audience on social media. For the Orthopedic Research Society and the Journal of Orthopedic Research, social media can assist in accomplishing our mission of creating a world without musculoskeletal limitations via the timely dissemination of orthopedic research findings and news.
DATA DESCRIPTOR | doi:10.20944/preprints202206.0246.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: dataset; twitter; tweets; IMDb ratings; movies; sentiment analysis; NLP
Online: 17 June 2022 (04:39:16 CEST)
In this paper we present a dataset that contains a collection of tweets generated as reactions to the release of 50 different movies. The dataset can be used for gaining useful insights regarding the conversation that is generated around a particular movie. It is particularly suitable for conducting sentiment analysis and other NLP techniques. The dataset contains approximately 2.5 million tweets with their related metadata and covers 50 movies. For each movie, its IMDb rating is included. The movies are the 25 releases with the highest number of votes during 2020 and 2021. The collected tweets represent the reactions of the Twitter community during the first week after the US release date of that particular movie. The tweets per movie ranged from 1,000 to approximately 200,000, with an average of 50,000 per release. We used the Internet Archive Wayback Machine to retrieve the IMDb movie rating one week after the US release date. The tweets and related metadata have been collected using the Tweet Downloader tool.
Subject: Social Sciences, Media Studies Keywords: COVID-19; Twitter; Geo-Tagged; Metropolitan; Computational Social Science
Online: 18 May 2021 (10:24:58 CEST)
One of the unfortunate findings from the ongoing COVID-19 crisis is the disproportionate impact the crisis has had on people and communities who were already socioeconomically disadvantaged. It has, however, been difficult to study this issue at scale and in greater detail using social media platforms like Twitter. Several COVID-19 Twitter datasets have been released, but they have very broad scope, both topically and geographically. In this paper, we present a more controlled and compact dataset that can be used to answer a range of potential research questions (especially pertaining to computational social science) without requiring extensive preprocessing or tweet-hydration from the earlier datasets. The proposed dataset comprises tens of thousands of geotagged (and in many cases, reverse-geocoded) tweets originally collected over a 255-day period in 2020 over 10 metropolitan areas in North America. Since there are socioeconomic disparities within these cities (sometimes to an extreme extent, as witnessed in 'inner city neighborhoods' in some of these cities), the dataset can be used to assess such socioeconomic disparities from a social media lens, in addition to comparing and contrasting behavior across cities.
ARTICLE | doi:10.20944/preprints202301.0415.v2
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Psychological Health; Drugs; Twitter; Machine Learning; Big Data; Drug Abuse; Toxicology; Social Factors; Economic Factors; Environmental Factors
Online: 27 February 2023 (13:31:40 CET)
Mental health issues can have significant impacts on individuals and communities and hence on social sustainability. Mental health treatment faces several challenges; more important, however, is removing the root causes of mental illnesses, because doing so can help prevent mental health problems from occurring or recurring. This requires a holistic approach to understanding mental health issues, which is missing from the existing research. Mental health should be understood in the context of social and environmental factors. More research and awareness are needed, as well as interventions to address root causes. The effectiveness and risks of medications should also be studied. This paper proposes a big data and machine learning-based approach for the automatic discovery of parameters related to mental health from Twitter data. The parameters are discovered from three different perspectives: Drugs & Treatments, Causes & Effects, and Drug Abuse. We used Twitter to gather 1,048,575 tweets in Arabic about psychological health in Saudi Arabia. We built a big data machine learning software tool for this work. A total of 52 parameters were discovered for all three perspectives. We defined 6 macro-parameters (Diseases & Disorders, Individual Factors, Social & Economic Factors, Treatment Options, Treatment Limitations, and Drug Abuse) to aggregate related parameters. We provide a comprehensive account of mental health, causes, medicines and treatments, mental health and drug effects, and drug abuse, as seen on Twitter, discussed by the public and health professionals. Moreover, we identify their associations with different drugs. The work will open new directions for social media-based identification of drug use and abuse for mental health, as well as other micro and macro factors related to mental health. The methodology can be extended to other diseases and provides a potential for discovering evidence for forensic toxicology from social and digital media.
ARTICLE | doi:10.20944/preprints202207.0196.v1
Subject: Biology And Life Sciences, Food Science And Technology Keywords: Food Security; Machine Learning; Topic Modeling; Twitter; Natural Language Processing
Online: 13 July 2022 (09:14:39 CEST)
Objective: Food security during public health emergencies relies on situational awareness of needs and resources. Artificial intelligence (AI) has revolutionized situational awareness during crises, allowing the allocation of resources to needs through machine learning algorithms. Limited research exists monitoring Twitter for changes in the food security-related public discourse during the COVID-19 pandemic. We aim to address that gap with AI by classifying food security topics on Twitter and showing topic frequency per day. Methods: Tweets were scraped from Twitter from January 2020 through December 2021 using food security keywords. Latent Dirichlet Allocation (LDA) topic modeling was performed, followed by time-series analyses on topic frequency per day. Results: 237,107 tweets were scraped and classified into topics, including food needs and resources, emergency preparedness and response, and mental/physical health. After the WHO’s pandemic declaration, there were relative increases in topic density per day regarding food pantries, food banks, economic and food security crises, essential services, and emergency preparedness advice. Threats to food security in Tigray emerged in 2021. Conclusions: AI is a powerful yet underused tool to monitor food insecurity on social media. Machine learning tools to improve emergency response should be prioritized, along with measurement of impact. Further food insecurity word patterns testing, as generated by this research, with supervised machine learning models can accelerate the uptake of these tools by policymakers and aid organizations.
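Once each tweet has been assigned a topic, the "topic frequency per day" series described above reduces to a grouped count. The sketch below assumes the topic labels already exist as `(date, topic)` pairs; the function name `topic_frequency_per_day` is hypothetical, and the LDA step that would produce the labels is not shown.

```python
from collections import Counter, defaultdict
from datetime import date
from typing import Dict, Iterable, Tuple


def topic_frequency_per_day(assignments: Iterable[Tuple[date, str]]) -> Dict[date, Counter]:
    """Count how many tweets fall into each topic on each day.

    Returns a dict ordered by date, mapping each day to a Counter of topics.
    """
    daily: Dict[date, Counter] = defaultdict(Counter)
    for day, topic in assignments:
        daily[day][topic] += 1
    return dict(sorted(daily.items()))
```

Dividing each count by the day's total would give a relative share per topic, which is one way to compare days with very different tweet volumes.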
Subject: Computer Science And Mathematics, Information Systems Keywords: Online Social Media prediction, Covid-19 prediction, Twitter, Google Trends
Online: 3 June 2021 (11:37:56 CEST)
As the coronavirus disease 2019 (COVID-19) continues to rage worldwide, the United States has become the most affected country with more than 34.1 million total confirmed cases up to June 1, 2021. In this work, we investigate correlations between online social media and Internet search for the COVID-19 pandemic among 50 U.S. states. By collecting the state-level daily trends through both Twitter and Google Trends, we observe a high but state-specific lag correlation with the number of daily confirmed cases. We further find that the predictive accuracy measured by the correlation coefficient is positively correlated with a state’s demographics, air traffic volume and GDP development. Most importantly, we show that a state’s early infection rate is negatively correlated with the lag to the previous peak in Internet search and tweeting about COVID-19, indicating that earlier collective awareness on Twitter/Google correlates with a lower infection rate. Lastly, we demonstrate that correlations between online social media and search trends are sensitive to time, mainly due to shifts in public attention.
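A lag correlation of the kind described above can be estimated by shifting one daily series against the other and computing Pearson's r at each shift. The following is a sketch under stated assumptions, not the authors' method: the function names `pearson` and `best_lag` are hypothetical, both series are assumed to be equally long daily values, and neither series may be constant over the compared window.

```python
import math
from typing import Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def best_lag(signal: Sequence[float], cases: Sequence[float], max_lag: int) -> Tuple[int, float]:
    """Find the lag (in days) at which `signal` best anticipates `cases`.

    For each lag, signal[t] is paired with cases[t + lag]; the lag with the
    highest Pearson r is returned together with that r.
    """
    best = (0, -math.inf)
    for lag in range(max_lag + 1):
        x = signal[: len(signal) - lag]
        y = cases[lag:]
        r = pearson(x, y)
        if r > best[1]:
            best = (lag, r)
    return best
```

Running this per state on a tweet-volume or search-interest series and the daily confirmed cases would yield one lag and one correlation coefficient per state.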
ARTICLE | doi:10.20944/preprints202005.0015.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: COVID-19; coronavirus; machine learning; sentiment analysis; textual analytics; Twitter
Online: 2 May 2020 (13:52:28 CEST)
Along with the Coronavirus pandemic, another crisis has manifested itself in the form of mass fear and panic phenomena, fuelled by incomplete and often inaccurate information. There is therefore a tremendous need to address and better understand COVID-19's informational crisis and gauge public sentiment, so that appropriate messaging and policy decisions can be implemented. In this research article, we identify public sentiment associated with the pandemic using Coronavirus specific Tweets and R statistical software, along with its sentiment analysis packages. We demonstrate insights into the progress of fear-sentiment over time as COVID-19 approached peak levels in the United States, using descriptive textual analytics supported by necessary textual data visualizations. Furthermore, we provide a methodological overview of two essential machine learning classification methods, in the context of textual analytics, and compare their effectiveness in classifying Coronavirus Tweets of varying lengths. We observe a strong classification accuracy of 91% for short Tweets, with the Naive Bayes method. We also observe that the logistic regression classification method provides a reasonable accuracy of 74% with shorter Tweets, and both methods showed relatively weaker performance for longer Tweets. This research provides insights into Coronavirus fear sentiment progression, and outlines associated methods, implications, limitations and opportunities.
ARTICLE | doi:10.20944/preprints201905.0141.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: twitter spam detection; adversarial machine learning; online social networks; survey
Online: 13 May 2019 (01:49:41 CEST)
Online Social Networks (OSNs), such as Facebook and Twitter, have become a very important part of many people’s daily lives. Unfortunately, the high popularity of these platforms makes them very attractive to spammers. Machine-learning (ML) techniques have been widely used as a tool to address many cybersecurity application problems (such as spam and malware detection). However, most of the proposed approaches do not consider the presence of adversaries that target the defense mechanism itself. Adversaries can launch sophisticated attacks to undermine deployed spam detectors during either the training or the prediction (test) phase. Not considering these adversarial activities at the design stage makes OSNs’ spam detectors prone to a range of adversarial attacks. This paper thus surveys the attacks against Twitter spam detectors in an adversarial environment. In addition, a general taxonomy of potential adversarial attacks is proposed by applying common frameworks from the literature. Examples of adversarial activities on Twitter are provided, based on observation of Arabic trending hashtags. A new type of spam tweet (the adversarial spam tweet), which can be used to undermine a deployed classifier, was found. In addition, possible countermeasures that could increase the robustness of Twitter spam detectors against such attacks are investigated.
COMMUNICATION | doi:10.20944/preprints202309.0047.v1
Subject: Public Health And Healthcare, Public Health And Health Services Keywords: MPox; big data; data analysis; data science; Twitter; natural language processing
Online: 1 September 2023 (10:23:41 CEST)
In the last decade and a half, the world has experienced the outbreak of a range of viruses such as COVID-19, H1N1, flu, Ebola, Zika Virus, Middle East Respiratory Syndrome (MERS), Measles, and West Nile Virus, just to name a few. During these virus outbreaks, the usage and effectiveness of social media platforms increased significantly as such platforms served as virtual communities, enabling their users to share and exchange information, news, perspectives, opinions, ideas, and comments related to the outbreaks. Analysis of this Big Data of conversations related to virus outbreaks using concepts of Natural Language Processing such as Topic Modeling has attracted the attention of researchers from different disciplines such as Healthcare, Epidemiology, Data Science, Medicine, and Computer Science. The recent outbreak of the MPox virus has resulted in a tremendous increase in the usage of Twitter. Prior works in this field have primarily focused on the sentiment analysis and content analysis of these Tweets, and the few works that have focused on topic modeling have multiple limitations. This paper aims to address this research gap and makes two scientific contributions to this field. First, it presents the results of performing Topic Modeling on 601,432 Tweets about the 2022 Mpox outbreak, which were posted on Twitter between May 7, 2022, and March 3, 2023. The results indicate that the conversations on Twitter related to Mpox during this time range may be broadly categorized into four distinct themes - Views and Perspectives about MPox, Updates on Cases and Investigations about Mpox, MPox and the LGBTQIA+ Community, and MPox and COVID-19. Second, the paper presents the findings from the analysis of these Tweets. The results show that the theme that was most popular on Twitter (in terms of the number of Tweets posted) during this time range was - Views and Perspectives about MPox. 
It is followed by the theme of MPox and the LGBTQIA+ Community, which is followed by the themes of MPox and COVID-19 and Updates on Cases and Investigations about Mpox, respectively. Finally, a comparison with prior works in this field is also presented to highlight the novelty and significance of this research work.
ARTICLE | doi:10.20944/preprints202008.0487.v1
Subject: Social Sciences, Geography, Planning And Development Keywords: Twitter; data reliability; risk communication; data mining; Google Cloud Vision API
Online: 22 August 2020 (02:32:40 CEST)
While Twitter has been touted to provide up-to-date information about hazard events, the reliability of tweets is still a concern. Our previous publication extracted relevant tweets containing information about the 2013 Colorado flood event and its impacts. Using the relevant tweets, this research further examined the reliability (accuracy and trueness) of the tweets by examining the text and image content and comparing them to other publicly available data sources. Both manual identification of text information and automated (Google Cloud Vision API) extraction of images were implemented to balance accurate information verification and efficient processing time. The results showed that both the text and images contained useful information about damaged/flooded roads/street networks. This information will help emergency response coordination efforts and informed allocation of resources when enough tweets contain geocoordinates or locations/venue names. This research will help identify reliable crowdsourced risk information to enable near-real time emergency response through better use of crowdsourced risk communication platforms.
ARTICLE | doi:10.20944/preprints202007.0648.v1
Subject: Social Sciences, Media Studies Keywords: Twitter; Social Media; NLP; Tweet; User Categorizations and Mathematical Frame Work
Online: 26 July 2020 (17:23:33 CEST)
Social networking applications such as Twitter have increasingly gained significance in terms of socio-economic, political, and religious as well as entertainment sectors. This in turn, has witnessed a wide gamut of information explosion in the social networking realm that can tend to be both useful as well as misleading at the same point of time. Spam detection is one such solution that caters to this problem through identification of irrelevant users and their data. However, existing research has so far laid primary focus on user profile information through activity detection and relevant techniques that may underperform when these profiles exhibit characteristics of temporal dependency, poor reflection of generated content from the user profile, etc. This is the primary motivation for this paper that addresses the aforementioned problem of user profiles by focusing on both profile information and content-based spam detection. To this end, this work delivers three significant contributions. Firstly, exhaustive use of Natural language processing (NLP) techniques has been rendered towards creation of a new comprehensive dataset with a wide range of content-based features. Secondly, this dataset has been fed into a customized state-of-art hybrid machine learning model that has been exclusively built using a combination of both machine learning and deep learning techniques. Extensive simulation based analysis not only records over 98% accuracy but also establishes the practical applicability of this proposal by proving that modeling based on the mixed profile and content-generated data is more capable of spam detection in contrast to each of these standalone approaches. Finally, a novel methodology based on logistic regression is proposed and supported by analytical formulations. This paves the way for the custom-built dataset to be analyzed and corresponding probabilities to be obtained that differentiate legitimate users from spammers. 
The obtained mathematical outcome can henceforth be used for future prediction of user categories through appropriate parameter tuning for any given dataset. This makes our method a truly generic one capable of identifying and classifying different user categories.
ARTICLE | doi:10.20944/preprints202007.0019.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: COVID-19; Coronavirus; reopen; sentiment analysis; Twitter; Census; Binary Logit Model
Online: 3 July 2020 (08:35:46 CEST)
Investigating and classifying the sentiments of social media users (e.g., positive, negative) towards an item, situation, or system is very popular among researchers. However, they rarely discuss the underlying socioeconomic factors associated with such sentiments. This study attempts to explore the factors associated with positive and negative sentiments of people about reopening the economy in the United States (US) amidst the COVID-19 global crisis. It takes into consideration situational uncertainties (i.e., changes in work and travel patterns due to lockdown policies), the economic downturn and associated trauma, and emotional factors such as depression. To understand people's sentiment about reopening the economy, Twitter data was collected, representing the 51 states of the US including Washington DC. State-wide socioeconomic characteristics of the people (e.g., education, income, family size, and employment status), built environment data (e.g., population density), and the number of COVID-19 related cases were collected and integrated with the Twitter data to perform the analysis. A binary logit model was used to identify the factors that influence people toward a positive or negative sentiment. The results from the logit model demonstrate that family households, people with low education levels, people in the labor force, low-income people, and people with higher house rent are more interested in reopening the economy. In contrast, households with a high number of members and high income are less interested in reopening the economy. The accuracy of the model is good (i.e., the model can correctly classify 56.18% of the sentiments). The Pearson chi-squared test indicates that overall this model has high goodness-of-fit.
This study provides a clear indication to the policymakers where to allocate resources and what policy options they can undertake to improve the socioeconomic situations of the people and mitigate the impacts of pandemics in the current situation and as well as in the future.
ARTICLE | doi:10.20944/preprints202009.0213.v1
Subject: Social Sciences, Geography, Planning And Development Keywords: twitter; discourse analysis; Covid-19; Coronavirus; disinformation; misinformation; social media activity; downplay
Online: 10 September 2020 (03:28:01 CEST)
Misinformation can amplify humanity's most significant challenges. As the novel coronavirus spreads across the world, concerns about the spread of misinformation and about people downplaying its severity are also growing. This article investigates social media activity in May 2020, specifically on Twitter, with respect to COVID-19: the themes of tweets, where the discussion is emerging from, disinformation shared about the virus, and its relationship with the COVID-19 incidence rate at the state and county level. A geodatabase of all geotagged COVID-19 related tweets was compiled. Multiscale Geographically Weighted Regression (MGWR) was employed to examine the association between social media activity, population, and the spatial variability of disease incidence; our results suggest that MGWR could explain 96.7% of the variation. Moreover, content analysis of the COVID-19 related Twitter dataset reveals a strong spatial relationship between social media activity and known cases of COVID-19. Discourse analysis was conducted on tweets to index those downplaying the pandemic or disseminating disinformation; its findings suggest that states where Twitter users spread more misinformation and showed more resistance to pandemic management measures in May experienced a surge in the number of cases in July.
ARTICLE | doi:10.20944/preprints202302.0032.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: Social network; Twitter; Structural analysis; Echo chamber; Detection; Case study; German language; Disinformation
Online: 2 February 2023 (06:54:35 CET)
Background: This study presents a graph-based and purely structural analysis to detect echo chambers on Twitter. Echo chambers are a concern as they can spread misinformation and reinforce harmful stereotypes and biases in social networks. Methods: The study recorded the German-language Twitter stream over two months, capturing about 180,000 accounts and their interactions. It focuses on retweet interaction patterns in the German-speaking Twitter stream and found that greedy modularity maximization and the HITS metric are the most effective methods for identifying echo chambers. Results: The purely structural detection approach was able to identify an echo chamber (red community) that was focused on a few topics with a triad of Anti-Covid, right-wing populism, and pro-Russian positions (very likely reinforced by Kremlin-orchestrated troll accounts). In contrast, a blue community was much more heterogeneous and showed "normal" communication interaction patterns. Conclusions: The study highlights the effects of echo chambers as they can make political discourse dysfunctional and foster polarization in open societies. The presented results contribute to identifying problematic interaction patterns in social networks often involved in the spread of disinformation by problematic actors. It is important to note that not the content but only the interaction patterns would be used as a decision criterion, thus avoiding problematic content censorship.
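To make the HITS metric mentioned in this abstract concrete, the following is a minimal power-iteration sketch in plain Python. It assumes each retweet is a directed edge from the retweeting account to the original author; the abstract does not specify the edge direction or the implementation. The function name `hits`, the account names, and the edge list are hypothetical.

```python
from math import sqrt


def hits(edges, iterations=50):
    """Power-iteration HITS on a directed edge list; returns (hub, authority) dicts."""
    nodes = {node for edge in edges for node in edge}
    hub = {node: 1.0 for node in nodes}
    auth = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        # Authority update: sum the hub scores of the accounts pointing at each node.
        auth = {node: 0.0 for node in nodes}
        for src, dst in edges:
            auth[dst] += hub[src]
        norm = sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {node: v / norm for node, v in auth.items()}
        # Hub update: sum the authority scores of the accounts each node points at.
        hub = {node: 0.0 for node in nodes}
        for src, dst in edges:
            hub[src] += auth[dst]
        norm = sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {node: v / norm for node, v in hub.items()}
    return hub, auth


# Made-up retweet edges: (retweeting account, original author).
retweets = [("u1", "a"), ("u2", "a"), ("u3", "a"), ("u3", "b")]
hub, auth = hits(retweets)
top_authority = max(auth, key=auth.get)
top_hub = max(hub, key=hub.get)
```

In this toy graph the most retweeted account receives the highest authority score and the account that retweets most widely receives the highest hub score. Community detection by greedy modularity maximization, the other method named in the abstract, is not shown here.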
Subject: Computer Science And Mathematics, Algebra And Number Theory Keywords: Urdu Twitter Dataset; Urdu Natural language processing (NLP); Urdu text Sentiments and Emoticons
Online: 24 March 2021 (12:03:46 CET)
This article presents a dataset of tweets in the Urdu language. There are 1,140,824 tweets in the dataset, collected from Twitter for September and October 2020. This large-scale corpus of tweets is generated by pre-processing that includes removing columns containing user information, retweet counts, and follower information; removing duplicate tweets; removing unnecessary punctuation, links, symbols, and spaces; and finally extracting emojis if present in the tweet text. In the final dataset, each tweet record contains columns for tweet id, text, and emoji extracted from the text with a sentiment score. Emojis are extracted to validate Machine Learning models used for multilingual sentiment and behavior analysis. They are extracted using a Python script that searches for emojis from a list of the 751 most frequently used emojis. If an emoji is present in the text, a column with the emoji description and sentiment score is added.
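The emoji lookup described in this abstract might look roughly like the sketch below. The authors' script and their 751-emoji list are not reproduced here: the three-entry `EMOJI_LEXICON`, its descriptions, and its scores are made-up placeholders, and the function names are hypothetical.

```python
# Hypothetical three-entry lexicon; descriptions and scores are illustrative only
# and are NOT taken from the dataset's 751-emoji list.
EMOJI_LEXICON = {
    "\U0001F602": ("face with tears of joy", 0.22),
    "\u2764": ("red heart", 0.75),
    "\U0001F62D": ("loudly crying face", -0.09),
}


def extract_emojis(text, lexicon=EMOJI_LEXICON):
    """Return (emoji, description, score) for each lexicon emoji, in order of appearance."""
    found = []
    for ch in text:
        if ch in lexicon:
            description, score = lexicon[ch]
            found.append((ch, description, score))
    return found


def strip_emojis(text, lexicon=EMOJI_LEXICON):
    """Return the text with lexicon emojis removed and outer whitespace trimmed."""
    return "".join(ch for ch in text if ch not in lexicon).strip()


tweet = "good news \U0001F602\u2764"
records = extract_emojis(tweet)
clean_text = strip_emojis(tweet)
```

This sketch matches single code points only. Emojis built from several code points (variation selectors, skin-tone modifiers, zero-width-joiner sequences) would need sequence-aware matching.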
Subject: Business, Economics And Management, Accounting And Taxation Keywords: Covid-19; Twitter; sustainable cities; sustainable citizenship; environmental awareness; responsible consumption; sustainable tourism
Online: 5 February 2021 (22:15:27 CET)
The social confinement resulting from the COVID-19 crisis temporarily reduced greenhouse gas emissions. Although experts consider that the decrease in pollution rates was not drastic, some surveys detect a growth in social concern about the climate. In this new climate-conscious environment, municipalities and local governments are promoting a new way of living in and caring for cities, even before national and international freedom of movement can be regained. This work analyzes the connection between the new climate awareness arising from the COVID-19 crisis, the proposals of sustainable citizenship around the world, and their communication on Twitter to educate the new eco-conscious audience. The methodology mixes quantitative and qualitative analysis, using the Twitonomy Premium tool and the Twitter research tool, with data extracted at the end of December 2020. Among the top 10 most influential and active accounts, the results show educational institutions, local institutions, companies, neighborhood associations, and influencers. The impossibility of experiencing the city has not prevented citizen education and commitment to making real change for when the city and its citizens return to normality. This new normality, however, must be different: more ecological, more responsible, more sustainable, and practiced from early childhood.
ARTICLE | doi:10.20944/preprints201901.0077.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Privacy; security; Machine Learning; K-Means; Natural Language Processing; Twitter; Private Information Retrieving
Online: 8 January 2019 (15:40:09 CET)
The violation of privacy, whether one's own or other people's, is a very current problem that concerns not only the web but also private life. In the 1990s it was expected that, by now, any routine operation then carried out "manually" would be performed through mobile phones or personal computers. The problem lies in the distribution network that allows information to be shared and brought together; as a result, the network becomes unsafe if it is subjected to attacks. Nowadays we put personal information on the web because otherwise we are seen as “weak”. This work aims to measure and analyze how much information is shared by users of a pre-established social network, using a set of machine learning algorithms and techniques.
ARTICLE | doi:10.20944/preprints201803.0247.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: security; social sentiment sensor; hackers; social media; statistics; L1 regression; twitter; cyber attacks
Online: 29 March 2018 (07:47:48 CEST)
In recent years, online social media information has been the subject of study in several data science fields due to its impact on users as a communication and expression channel. Data gathered from online platforms such as Twitter has the potential to facilitate research on social phenomena based on sentiment analysis, which usually employs Natural Language Processing and Machine Learning techniques to interpret sentimental tendencies related to users’ opinions and make predictions about real events. Cyber attacks are not isolated from opinion subjectivity on online social networks. Various security attacks are performed by hacker activists motivated by reactions to polemic social events. In this paper, a methodology for tracking social data that can trigger cyber attacks is developed. Our main contribution lies in the monthly prediction of tweets with content related to security attacks and the incidents detected based on ℓ1 regularization.
ARTICLE | doi:10.20944/preprints202303.0335.v1
Subject: Medicine And Pharmacology, Internal Medicine Keywords: NLP; NLU; Twitter; Sentiment Analysis; Opinion Mining; Nigeria; Election; Machine Learning; BERT; LSTM; SVM
Online: 20 March 2023 (02:52:52 CET)
Election outcomes have been predicted in the past with the help of various state-of-the-art language models. Sentiment analysis, also known as opinion mining, helps in establishing the opinions of the public about a particular subject. Twitter has grown in popularity and proven to be a key tool in mining people’s sentiments concerning elections and other trending subjects of interest. The outcome of the just concluded Presidential election in Nigeria shifts the focus to the Lagos State governorship election. In this study, we propose a Bidirectional Encoder Representations from Transformers (BERT) model for the sentiment analysis of the governorship election in Lagos State, Nigeria, using Twitter data. A total of 800,000 personal and public tweets concerning the three prominent contesting candidates were scraped from Twitter using carefully selected search queries. The tweets were preprocessed to remove noise and inconsistencies. The preprocessed tweets were passed into the pretrained and fine-tuned BERT model. The result was analyzed to establish the sentiments of the public about the candidates. The social networks of the candidates were also analyzed. Parameter tuning yielded different results with different learning rates (LR). Results showed that a learning rate of 1e-7 gave the best performance, and that accuracy was higher with smaller learning rates and with larger epoch sizes.
REVIEW | doi:10.20944/preprints202211.0427.v1
Subject: Social Sciences, Media Studies Keywords: Corporate Social Responsibility; Twitter; Stakeholder Management; Social Media Communication; Social Media; CSR; Communication Strategy
Online: 23 November 2022 (01:14:53 CET)
Corporate social responsibility (CSR) has become increasingly important for companies in recent years. On the one hand, regulatory frameworks require the disclosure of measures for sustainable management. On the other hand, for long-term corporate success, stakeholders must be strategically engaged in the dialog on sustainability aspects. Social media, and Twitter in particular, offer the potential to foster a meaningful stakeholder dialogue on CSR topics. Due to Elon Musk's acquisition in the fall of 2022, this strategic disruption provides an opportunity to systematically capture the platform's past activities and strategies to synthesize practical information that can guide Twitter usage decision making and be used for research to serve as the basis for future comparative longitudinal studies of changes in usage. We conducted a literature review including 42 papers to contribute to the body of evidence on CSR communication strategies on Twitter across industries and countries by deriving interdisciplinary suggestions for strategic CSR-related stakeholder management. Results cover relevant CSR topics, prioritized stakeholder groups for CSR communication on Twitter and successful communication strategies for companies to obtain beneficial results, such as generating social media capital. The results contribute to the strategic planning and implementation of CSR stakeholder management on Twitter and offer starting points for future studies on social media mining and CSR communication strategies.
ARTICLE | doi:10.20944/preprints202210.0238.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: NLP; NLU; Twitter; Sentiment Analysis; Opinion Mining; Nigeria; Election; Machine Learning; BERT; LSTM; SVM
Online: 17 October 2022 (12:01:42 CEST)
Introduction: Social media platforms such as Facebook, LinkedIn, and Twitter, among others, have been used as a tool for staging protests, conducting opinion polls, and planning campaign strategy, and as a medium of agitation and a place for expressing interests, especially during elections. Past studies have established people’s opinions on elections using social media posts. The advent of state-of-the-art algorithms for unstructured text processing implies tremendous progress in natural language processing and understanding. Aim: In this work, a natural language framework is designed to understand the Nigeria 2023 presidential election based on public opinion, using a Twitter dataset. Methods: A raw dataset of 2,059,113 records with 18 dimensions, concerning discourse around the Nigeria 2023 elections, was collected from Twitter. Sentiment analysis was performed on the preprocessed dataset using three different machine learning models, namely: Long Short-Term Memory (LSTM) Recurrent Neural Network, Bidirectional Encoder Representations from Transformers (BERT) and Linear Support Vector Classifier (LSVC) models. Personal tweet analysis of the three candidates provided insight on their campaign strategies and personalities, while public tweet analysis established the public’s opinion about them. The performance of the models was also compared using accuracy, recall, false positive rate, precision and F-measure. Results: The LSTM model gave an accuracy, precision, recall, AUC and F-measure of 88%, 82.7%, 87.2%, 87.6% and 82.9% respectively; the BERT model gave an accuracy, precision, recall, AUC and F-measure of 94%, 88.5%, 92.5%, 94.7% and 91.7% respectively; and the LSVC model gave an accuracy, precision, recall, AUC and F-measure of 73%, 81.4%, 76.4%, 81.2% and 79.2% respectively. Conclusion: The experimental results show that sentiment analysis and other Natural Language Processing tasks can aid in the understanding of the social media space. Results also revealed the leverage of each aspirant towards winning the election.
We conclude that sentiment analysis can form a general basis for generating insights for election and modeling election outcomes.
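The evaluation measures named in this abstract (accuracy, precision, recall, false positive rate, and F-measure) can be computed from confusion-matrix counts. The sketch below assumes a binary labeling (1 = positive, 0 = negative), which is a simplification; the abstract does not state how many sentiment classes were used. The function name `binary_metrics` and the label vectors are made up for illustration.

```python
def binary_metrics(y_true, y_pred):
    """Compute common classification metrics for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall) if precision + recall else 0.0
    )
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
        "f_measure": f_measure,
    }


# Made-up labels for eight tweets.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
```

With three true positives, three true negatives, one false positive, and one false negative, accuracy, precision, recall, and F-measure all equal 0.75 and the false positive rate is 0.25. AUC, which the abstract also reports, requires ranked scores rather than hard labels and is not covered by this sketch.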
COMMUNICATION | doi:10.20944/preprints202206.0172.v3
Subject: Computer Science And Mathematics, Information Systems Keywords: Monkeypox; monkey pox; Twitter; Dataset; Tweets; Social Media; Big Data; Data Mining; Data Science
Online: 25 July 2022 (09:41:19 CEST)
ARTICLE | doi:10.20944/preprints202109.0216.v2
Subject: Public Health And Healthcare, Public Health And Health Services Keywords: Rural Health; Twitter Messaging; Social Media; Covid-19, SARS-CoV-2; coronavirus; social network analysis
Online: 19 November 2021 (14:41:47 CET)
Individuals from rural areas are increasingly using social media as a means of communication, receiving information, or actively complaining of inequalities and injustices. This study captured 57 days’ worth of Twitter data from June to August 2021 related to rural health using English language keywords. The study utilised social network analysis and natural language processing to analyse the data. It was found that Twitter served as a fruitful platform to raise awareness of problems faced by those living in rural areas. Overall, Twitter was utilised in rural areas to express complaints, to debate, and share information. Twitter could be leveraged as a powerful social listening tool for individuals and organisations who want to gain insight into popular narratives around rural health.
ARTICLE | doi:10.20944/preprints202108.0516.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: machine learning; time; naive bayes classification; recurrent neural networks, Twitter; social media data; automatic classification
Online: 27 August 2021 (11:23:50 CEST)
Machine learning (ML) is increasingly useful as data grows in volume and accessibility, as it can perform tasks (e.g. categorisation, decision making, anomaly detection, etc.) through experience and without explicit instruction, even when the data are too vast, complex, highly variable, or full of errors to be analysed in other ways. Thus, ML is great for natural language, images, or other complex and messy data available in large and growing volumes. Selecting an ML algorithm depends on many factors, as algorithms vary in supervision needed, tolerable error levels, and ability to account for order or temporal context, among many other things. Importantly, ML methods for explicitly ordered or time-dependent data struggle with errors or data asymmetry. Most data are at least implicitly ordered, potentially allowing a hidden ‘arrow of time’ to affect non-temporal ML performance. This research explores the interaction of ML and implicit order by training two ML algorithms on Twitter data before performing automatic classification tasks under conditions that balance volume and complexity of data. Results show that performance was affected, suggesting that researchers should carefully consider time when selecting appropriate ML algorithms, even when time is only implicitly included.
ARTICLE | doi:10.20944/preprints202105.0447.v1
Subject: Public Health And Healthcare, Health Policy And Services Keywords: Vaccine; Sentiment analysis; Public Sentiment Scenarios framework; COVID-19; Coronavirus; Twitter; Textual analytics; Public policy
Online: 19 May 2021 (13:51:57 CEST)
There exists a compelling need to better understand the temporal dynamics of public sentiment towards COVID-19 vaccines in the US on a national and state-wise level for facilitating appropriate public policy applications. Our analysis of social media data from early February of 2021 and late March of 2021 shows that in spite of overall strength of positive sentiment, and increasing numbers of Americans being fully vaccinated, negative sentiment about COVID-19 vaccines still persists among sections of people who are hesitant towards the vaccine. In this study, we performed sentiment analytics on vaccine tweets, studied changes in public sentiment over time, conducted vaccination sentiment validation using actual vaccination data from the US CDC and Household Pulse Survey (HPS), explored influence of maturity of Twitter user-accounts and generated geographic mapping of sentiments by location of Twitter users. Furthermore, we leverage the emotion polarity based Public Sentiment Scenarios (PSS) framework which was developed for COVID-19 sentiment analytics, to systematically analyze directions for public policy processes to potentially improve the administration of vaccines. Application of the PSS framework provides important time sensitive insights for state and federal government agencies and associated organizations to better implement public policy processes for healthcare management, communication, transparency, motivation and societal operational policies such as social distancing. These insights are expected to contribute to processes that can expedite the vaccination program and move closer to the cherished herd immunity goal.
ARTICLE | doi:10.20944/preprints202106.0196.v3
Subject: Computer Science And Mathematics, Information Systems Keywords: Twitter; Social Media; Social Networking; Social Network Analytic; DistilBERT; Text Similarity; Natural Language Processing; Character Computing
Online: 17 February 2022 (13:15:23 CET)
Social media platforms have been an undeniable part of everyday life for the past decade. Analyzing the information being shared is a crucial step to understanding human behavior. Social media analysis aims to guarantee a better experience for the user and higher user satisfaction. Before any further conclusion can be derived, it is first necessary to know how to compare users. In this paper, a hybrid model is proposed that measures the similarity of Twitter profiles and quantifies their degree of likeness by calculating features that reflect users’ behavioral habits. For this, the timeline of each profile is first extracted using the official Twitter API. Then, in parallel, three aspects of a profile are considered. Behavioral ratios are time-series-related information showing the consistency and habits of the user; dynamic time warping is used to compare the behavioral ratios of two profiles. Next, the audience network is extracted for each user, and Jaccard similarity is used to estimate the similarity of the two sets. Finally, for the content similarity measurement, the tweets are preprocessed according to the feature extraction method; TF-IDF and DistilBERT are employed for feature extraction, and the resulting features are compared using cosine similarity. Results show that TF-IDF has slightly better performance; therefore, this more straightforward solution is selected for the model. In the case study, a Random Forest classification model trained on almost 20,000 users achieved 97.24% accuracy. This comparison enables us to find duplicate profiles with nearly the same behavior and content.
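Two of the similarity measures named in this abstract, Jaccard similarity for audience sets and cosine similarity for content, can be sketched as follows. For brevity the content vectors here are raw term counts rather than the TF-IDF or DistilBERT features used in the paper, and the function names and sample data are hypothetical. Dynamic time warping for the behavioral ratios is not shown.

```python
from collections import Counter
from math import sqrt


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two collections treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen for this sketch: two empty sets are identical
    return len(a & b) / len(a | b)


def cosine(text_a, text_b):
    """Cosine similarity of whitespace-tokenized term-count vectors."""
    va = Counter(text_a.lower().split())
    vb = Counter(text_b.lower().split())
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = sqrt(sum(c * c for c in va.values())) * sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


# Made-up audiences (follower ids) and tweets for two profiles.
audience_sim = jaccard({"u1", "u2", "u3"}, {"u2", "u3", "u4"})
content_sim = cosine("new phone review today", "phone review coming today")
```

Here the two audiences share two of four distinct followers, giving a Jaccard similarity of 0.5, and the two tweets share three of their four terms, giving a cosine similarity of 0.75. How the paper combines the three aspect scores into one profile similarity is not specified in the abstract.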
COMMUNICATION | doi:10.20944/preprints202309.1969.v1
Subject: Computer Science And Mathematics, Computer Science Keywords: Twitter; Data Analysis; Big Data; Exoskeletons; Data Science; Text Analysis; Sentiment Analysis; Content Analysis; Natural Language Processing
Online: 28 September 2023 (13:25:30 CEST)
This paper presents multiple novel findings from a comprehensive analysis of about 150,000 tweets about exoskeletons posted between May 2017 and May 2023. First, findings from content analysis and temporal analysis of these tweets reveal the specific months per year when a significantly higher volume of tweets was posted and the time windows when the highest number of tweets, the lowest number of tweets, tweets with the highest number of hashtags, and tweets with the highest number of user mentions were posted. Second, the paper shows that there are statistically significant correlations between the number of tweets posted per hour and different characteristics of these tweets. Third, the paper presents a multiple linear regression model to predict the number of tweets posted per hour in terms of these characteristics of tweets. The R² score of this model was observed to be 0.9540. Fourth, the paper reports that the 10 most popular hashtags were #exoskeleton, #robotics, #iot, #technology, #tech, #innovation, #ai, #sci, #construction and #news. Fifth, sentiment analysis of these tweets was performed using VADER and the DistilRoBERTa-base library. The results show that the percentages of positive, neutral, and negative tweets were 46.8%, 33.1%, and 20.1%, respectively. The results also show that in the tweets that did not express a neutral sentiment, the sentiment of surprise was the most common sentiment. It was followed by the sentiments of joy, disgust, sadness, fear, and anger. Furthermore, analysis of hashtag-specific sentiments revealed several novel insights; for instance, for almost all the months in 2022, the usage of #ai in tweets about exoskeletons was mainly associated with a positive sentiment. Sixth, text processing-based approaches were used to detect possibly sarcastic tweets and tweets that contained news.
Finally, a comparison of positive tweets, negative tweets, neutral tweets, possibly sarcastic tweets, and tweets that contained news, in terms of different characteristic properties of these tweets, is presented. The findings reveal multiple novel insights; for instance, the average number of hashtags used in tweets that contained news has considerably increased since January 2022.
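The abstract above reports a multiple linear regression model with an R² score of 0.9540. As a rough illustration of what such a fit and score involve, the sketch below solves ordinary least squares through the normal equations and computes R² as 1 - SS_res / SS_tot. It is not the authors' code: the two feature columns (imagined here as per-hour hashtag and mention counts), the toy data, and all function names are hypothetical.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular or nearly singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(aug[i][j] * solution[j] for j in range(i + 1, n))
        solution[i] = (aug[i][n] - known) / aug[i][i]
    return solution


def fit_linear_regression(features, targets):
    """Return [intercept, coef_1, ..., coef_k] from the normal equations."""
    rows = [[1.0] + [float(v) for v in f] for f in features]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coefficients, feature_row):
    """Evaluate the fitted linear model on one row of features."""
    return coefficients[0] + sum(
        c * v for c, v in zip(coefficients[1:], feature_row)
    )


def r_squared(targets, predictions):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(targets) / len(targets)
    ss_res = sum((y - p) ** 2 for y, p in zip(targets, predictions))
    ss_tot = sum((y - mean_y) ** 2 for y in targets)
    if ss_tot == 0.0:
        raise ValueError("R^2 is undefined when all targets are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical toy data: (hashtags per hour, mentions per hour) -> tweets per hour.
features = [(1, 0), (0, 1), (1, 1), (2, 1), (3, 2)]
targets = [3, 4, 6, 8, 13]
coefficients = fit_linear_regression(features, targets)
score = r_squared(targets, [predict(coefficients, f) for f in features])
```

The normal equations are used here only to keep the sketch dependency-free; for real data a numerically more stable solver (for example, a QR-based least-squares routine) would usually be preferred.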
COMMUNICATION | doi:10.20944/preprints202303.0453.v1
Subject: Social Sciences, Media Studies Keywords: COVID-19; MPox; Twitter; Big Data; Data Mining; Data Analysis; Sentiment Analysis; Data Science; Social Media; Monkeypox
Online: 27 March 2023 (08:39:28 CEST)
Mining and analysis of the Big Data of Twitter conversations have been of significant interest to the scientific community in the fields of healthcare, epidemiology, big data, data science, computer science, and their related areas, as can be seen from several works in the last few years that focused on sentiment analysis and other forms of text analysis of Tweets related to Ebola, E-Coli, Dengue, Human papillomavirus (HPV), Middle East Respiratory Syndrome (MERS), Measles, Zika virus, H1N1, influenza-like illness, swine flu, flu, Cholera, Listeriosis, cancer, Liver Disease, Inflammatory Bowel Disease, kidney disease, lupus, Parkinson's, Diphtheria, and West Nile virus. The recent outbreaks of COVID-19 and MPox have served as "catalysts" for Twitter usage related to seeking and sharing information, views, opinions, and sentiments involving both these viruses. While a few works published in the last few months have focused on sentiment analysis of Tweets related to either COVID-19 or MPox, none of the prior works in this field has analyzed Tweets focusing on both COVID-19 and MPox at the same time. To address this research gap, a total of 61,862 Tweets that focused on MPox and COVID-19 simultaneously, posted between May 7, 2022, and March 3, 2023, were studied to perform sentiment analysis and text analysis. The findings of this study are manifold. First, the results of sentiment analysis show that almost half the Tweets (the actual percentage is 46.88%) had a negative sentiment. They were followed by Tweets that had a positive sentiment (31.97%) and Tweets that had a neutral sentiment (21.14%). Second, this paper presents the top 50 hashtags that were used in these Tweets. Third, it presents the top 100 most frequently used words that are featured in these Tweets. The findings of text analysis show that some of the commonly used words directly referred to either or both viruses.
In addition to this, the presence of words such as "Polio", "Biden", "Ukraine", "HIV", "climate", and "Ebola" in the list of the top 100 most frequent words indicates that topics of conversations on Twitter in the context of COVID-19 and MPox also included a high level of interest related to other viruses, President Biden, and Ukraine. Finally, a comprehensive comparative study that involves a comparison of this work with 49 prior works in this field is presented to uphold the scientific contributions and relevance of the same.
DATA DESCRIPTOR | doi:10.20944/preprints202206.0146.v2
Subject: Computer Science And Mathematics, Information Systems Keywords: COVID-19; COVID; Omicron; online learning; remote learning; online education; Twitter; dataset; Tweets; social media; Big Data
Online: 21 July 2022 (08:05:19 CEST)
COMMUNICATION | doi:10.20944/preprints202206.0383.v2
Subject: Computer Science And Mathematics, Information Systems Keywords: Exoskeleton; Twitter; Tweets; Big Data; social media; Data Mining; dataset; Data Science; Natural Language Processing; Information Retrieval
Online: 21 July 2022 (04:06:53 CEST)
The exoskeleton technology has been rapidly advancing in the recent past due to its multitude of applications and diverse use-cases in assisted living, military, healthcare, firefighting, and industry 4.0. The exoskeleton market is projected to increase by multiple times of its current value within the next two years. Therefore, it is crucial to study the degree and trends of user interest, views, opinions, perspectives, attitudes, acceptance, feedback, engagement, buying behavior, and satisfaction, towards exoskeletons, for which the availability of Big Data of conversations about exoskeletons is necessary. The Internet of Everything style of today's living, characterized by people spending more time on the internet than ever before, with a specific focus on social media platforms, holds the potential for the development of such a dataset, by the mining of relevant social media conversations. Twitter, one such social media platform, is highly popular amongst all age groups, where the topics found in the conversation paradigms include emerging technologies such as exoskeletons. To address this research challenge, this work makes two scientific contributions to this field. First, it presents an open-access dataset of about 140,000 tweets about exoskeletons that were posted in a 5-year period from May 21, 2017, to May 21, 2022. Second, based on a comprehensive review of the recent works in the fields of Big Data, Natural Language Processing, Information Retrieval, Data Mining, Pattern Recognition, and Artificial Intelligence that may be applied to relevant Twitter data for advancing research, innovation, and discovery in the field of exoskeleton research, a total of 100 Research Questions are presented for researchers to study, analyze, evaluate, ideate, and investigate based on this dataset.
ARTICLE | doi:10.20944/preprints201808.0269.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: social sensing; supervised learning; statistical methods; social networks; twitter; tweets; natural disaster; random forest, kernel density estimation
Online: 15 August 2018 (11:34:43 CEST)
In recent years, online social networks have received considerable attention in spatial modelling given the critical information that can be extracted from them about events in real time; one of the most relevant applications concerns natural disasters such as earthquakes. Although some data retrieved from these social networks carry embedded geographic information provided by GPS, in many cases such information is not available. An alternative solution is to reconstruct specific locations using probabilistic language models, more specifically those based on Named Entity Recognition (NER), which extracts names from a user’s description of an event occurring in a specific place (e.g., a collapsed building on a specific avenue). In this work, we present a methodology to use Twitter as a social sensor system for disasters. The methodology scores NER locations with a kernel density estimation function for different subtopics originating from a natural disaster and maps them into a geographic space. The proposed methodology is evaluated with tweets related to the 2017 earthquake in Mexico.
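The abstract above describes scoring NER-derived locations with a kernel density estimation function. A minimal sketch of that idea follows, assuming a two-dimensional Gaussian kernel with a single fixed bandwidth and treating coordinates as plain planar values. The kernel choice, the bandwidth, the function name, and the sample coordinates are all assumptions, since the abstract does not specify them.

```python
import math


def gaussian_kde_score(point, samples, bandwidth):
    """Estimate the density at `point` from 2-D `samples` with a Gaussian kernel.

    Each sample contributes exp(-d^2 / (2 h^2)), where d is the planar distance
    to `point` and h is the bandwidth; the sum is normalized so that the
    estimate integrates to 1 over the plane.
    """
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    if not samples:
        return 0.0
    px, py = point
    normalizer = 1.0 / (2.0 * math.pi * bandwidth ** 2 * len(samples))
    total = 0.0
    for sx, sy in samples:
        squared_distance = (px - sx) ** 2 + (py - sy) ** 2
        total += math.exp(-squared_distance / (2.0 * bandwidth ** 2))
    return normalizer * total


# Hypothetical geocoded mentions for one subtopic (planar x, y coordinates).
mentions = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
dense_score = gaussian_kde_score((0.05, 0.05), mentions, bandwidth=0.5)
sparse_score = gaussian_kde_score((2.5, 2.5), mentions, bandwidth=0.5)
```

With these made-up points, a location inside the cluster of mentions scores higher than one far from any mention, which is the property a density-based ranking of candidate locations relies on. Real latitude and longitude values would need a projection or a great-circle distance before a planar kernel is meaningful.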
COMMUNICATION | doi:10.20944/preprints202310.0157.v1
Subject: Public Health And Healthcare, Public Health And Health Services Keywords: online learning; COVID-19; Twitter; Data Analysis; Natural Language Processing; Sentiment Analysis; Subjectivity Analysis; Toxicity Analysis; Diversity Analysis
Online: 3 October 2023 (12:59:21 CEST)
This paper presents several novel findings from a comprehensive analysis of about 50,000 Tweets about online learning during COVID-19, posted on Twitter between November 9, 2021, and July 13, 2022. First, the results of sentiment analysis from VADER, Afinn, and TextBlob show that a higher percentage of these tweets were positive. The results of gender-specific sentiment analysis indicate that for positive tweets, negative tweets, and neutral tweets, between males and females, males posted a higher percentage of the tweets. Second, the results from subjectivity analysis show that the percentages of least opinionated, neutral opinionated, and highly opinionated tweets were 56.568%, 30.898%, and 12.534%, respectively. The gender-specific results for subjectivity analysis indicate that for each subjectivity class, males posted a higher percentage of tweets as compared to females. Third, toxicity detection was performed on the tweets to detect different categories of toxic content: toxicity, obscene, identity attack, insult, threat, and sexually explicit. The gender-specific analysis of the percentage of tweets posted by each gender in each of these categories revealed several novel insights. For instance, for the sexually explicit category, females posted a higher percentage of tweets as compared to males. Fourth, gender-specific tweeting patterns for each of these categories of toxic content were analyzed to understand the trends of the same. The results revealed multiple patterns of tweeting behavior; for instance, the intensity of obscene content in tweets about online learning by males and females has decreased since May 2022. Fifth, the average activity of males and females per month was calculated. The findings indicate that the average activity of females has been higher than that of males in all months other than March 2022.
Finally, country-specific tweeting patterns of males and females were also analyzed, which revealed multiple novel insights; for instance, in India a higher percentage of the tweets about online learning during COVID-19 were posted by males as compared to females.
COMMUNICATION | doi:10.20944/preprints202309.0694.v1
Subject: Public Health And Healthcare, Public Health And Health Services Keywords: COVID-19; long COVID; social media; Twitter; big data; data analysis; natural Language processing; data science; sentiment analysis
Online: 12 September 2023 (05:32:14 CEST)
Since the outbreak of COVID-19, social media platforms, such as Twitter, have experienced a tremendous increase in conversations related to Long COVID. The term “Long COVID” describes the persistence of symptoms of COVID-19 for several weeks or even months following the initial infection. Recent works in this field have focused on sentiment analysis of Tweets related to COVID-19 to unveil the multifaceted spectrum of emotions, viewpoints, and perspectives held by the Twitter community. However, most of these works did not focus on Long COVID, and the few works that focused on Long COVID have limitations. Furthermore, no prior work in this field has investigated Tweets where individuals self-reported experiencing Long COVID on Twitter. The work presented in this paper aims to address these research challenges by presenting multiple novel findings from a comprehensive analysis of a dataset comprising 1,244,051 Tweets about Long COVID, posted on Twitter between May 25, 2020, and January 31, 2023. First, the analysis shows that the average number of Tweets per month where individuals self-reported Long COVID on Twitter was considerably higher in 2022 than in 2021. Second, findings of sentiment analysis using VADER show that the percentages of Tweets with positive, negative, and neutral sentiment were 43.12%, 42.65%, and 14.22%, respectively. Third, the analysis of sentiments associated with these Tweets also shows that the emotion of sadness was expressed in most of these Tweets. It was followed by the emotions of fear, neutral, surprise, anger, joy, and disgust, respectively.
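The abstract above reports positive, negative, and neutral percentages obtained with VADER. VADER produces a compound score in the range -1 to 1 for each text, and a common convention is to call a text positive at 0.05 or above and negative at -0.05 or below. The sketch below applies that convention to precomputed scores and tallies the percentages. Whether the paper used these exact cut-offs is an assumption, and the scores shown are made up for illustration rather than taken from the dataset.

```python
from collections import Counter

# Assumed cut-offs: the conventional VADER compound-score thresholds.
POSITIVE_THRESHOLD = 0.05
NEGATIVE_THRESHOLD = -0.05


def label_from_compound(compound):
    """Map a compound score in [-1, 1] to a sentiment label."""
    if compound >= POSITIVE_THRESHOLD:
        return "positive"
    if compound <= NEGATIVE_THRESHOLD:
        return "negative"
    return "neutral"


def sentiment_percentages(compound_scores):
    """Return the share of each label, in percent, over all scores."""
    total = len(compound_scores)
    if total == 0:
        raise ValueError("at least one score is required")
    counts = Counter(label_from_compound(score) for score in compound_scores)
    return {
        label: 100.0 * counts[label] / total
        for label in ("positive", "negative", "neutral")
    }


# Hypothetical compound scores for four tweets.
breakdown = sentiment_percentages([0.6, -0.4, 0.0, 0.2])
```

Keeping the thresholds as named constants makes it easy to check how sensitive the reported split is to the cut-off, which matters when the positive and negative shares are as close as the 43.12% and 42.65% quoted above.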
ARTICLE | doi:10.20944/preprints201908.0226.v1
Subject: Environmental And Earth Sciences, Environmental Science Keywords: crowdsourcing; citizen science; ecotourism; Facebook; Flickr; photo-elicitation; Instagram; photovoice; social media; social networking sites; Twitter; wildlife conservation
Online: 21 August 2019 (10:34:58 CEST)
The first two decades of the 21st century have seen the emergence of the modern citizen science movement, increased demand for niche eco and wildlife tourism experiences, and the willingness of people to voluntarily share information and photographs online. To varying extents, the rapid growth of these three phenomena has been driven by the availability of portable smart devices, access to the Web 2.0 internet from almost anywhere on the planet, and the development of applications and services, including social media/networking sites (SNSs). In addition, the number of peer-reviewed publications that explore how text and images shared on SNSs can be data-mined for academic research has surged in recent years. This systematic quantitative review has two goals. The first goal is to provide an overview of how the photographs that ecotourists share online are contributing to wildlife tourism research. The second goal is to promote the emerging photovoice technique as a theoretical context for social research based on the photographs and comments that ecotourists share on SNSs. From the perspectives of community benefits, conservation behaviours, and environmental education, there are many similarities between authentic ecotourism experiences and quality ecological citizen science programs. Much of the literature regarding the theory and practice of citizen science reports on the difficulties of attracting, training, motivating and retaining community members. The synthesis of this review is that crowdsourcing wildlife and tourism data from comments and photographs that ecotourists share on SNSs is a credible method of research that provides a self-replenishing pool of citizen scientists.
ARTICLE | doi:10.20944/preprints202205.0238.v2
Subject: Computer Science And Mathematics, Information Systems Keywords: COVID-19; SARS-CoV-2; Omicron; Twitter; tweets; sentiment analysis; big data; Natural Language Processing; Data Science; Data Analysis
Online: 7 July 2022 (08:36:40 CEST)
This paper presents the findings of an exploratory study on the continuously generated Big Data on Twitter related to the sharing of information, news, views, opinions, ideas, knowledge, feedback, and experiences about the COVID-19 pandemic, with a specific focus on the Omicron variant, which is the globally dominant variant of SARS-CoV-2 at this time. A total of 12028 tweets about the Omicron variant were studied, and the specific characteristics of tweets that were analyzed include sentiment, language, source, type, and embedded URLs. The findings of this study are manifold. First, from sentiment analysis, it was observed that 50.5% of tweets had the ‘neutral’ emotion. The other emotions (‘bad’, ‘good’, ‘terrible’, and ‘great’) were found in 15.6%, 14.0%, 12.5%, and 7.5% of the tweets, respectively. Second, the findings of language interpretation showed that 65.9% of the tweets were posted in English. It was followed by Spanish or Castilian, French, Italian, Japanese, and other languages, which were found in 10.5%, 5.1%, 3.3%, 2.5%, and <2% of the tweets, respectively. Third, the findings from source tracking showed that “Twitter for Android” was associated with 35.2% of tweets. It was followed by “Twitter Web App”, “Twitter for iPhone”, “Twitter for iPad”, “TweetDeck”, and all other sources, which accounted for 29.2%, 25.8%, 3.8%, 1.6%, and <1% of the tweets, respectively. Fourth, studying the type of tweets revealed that retweets accounted for 60.8% of the tweets; they were followed by original tweets and replies, which accounted for 19.8% and 19.4% of the tweets, respectively. Fifth, in terms of embedded URL analysis, the most common domain embedded in the tweets was found to be twitter.com, which was followed by biorxiv.org, nature.com, wapo.st, nzherald.co.nz, recvprofits.com, science.org, and other URLs.
Finally, to support similar research and development in this field centered around the analysis of tweets, we have developed an open-access Twitter dataset that comprises tweets about the SARS-CoV-2 Omicron variant since the first detected case of this variant on November 24, 2021.
ARTICLE | doi:10.20944/preprints202111.0023.v1
Subject: Engineering, Control And Systems Engineering Keywords: Twitter; Social Media Analysis; User Behavior Mining; Crime Detection; Feature Extraction; Graph Analysis; Natural Language Processing; Text Classification; Aspect-based Sentiment Analysis; DistilBERT
Online: 1 November 2021 (15:25:19 CET)
Maintaining a healthy cyber society is a big challenge due to users’ freedom of expression and behavior. It can be addressed by monitoring and analyzing users’ behavior and taking proper actions towards them. This research aims to present a platform that monitors the public content on Twitter by extracting tweet data. After maintaining the data, the users’ interactions are analyzed using Graph Analysis methods. Then the users’ behavioral patterns are analyzed by applying Metadata Analysis, in which the timeline of each profile is obtained; the time-series behavioral features of users are also investigated. Then, in the Abnormal Behavior Detection Filtering component, the interesting profiles are selected for further examination. Finally, in the Contextual Analysis component, the contents are analyzed using natural language processing techniques: a binary text classification model (SVM + TF-IDF with 88.89% accuracy) detects whether a tweet is related to crime. Then, a sentiment analysis method is applied to the crime-related tweets to perform aspect-based sentiment analysis (DistilBERT + FFNN with 80% accuracy), because sharing positive opinions about a crime-related topic can threaten society. This platform aims to provide the end-user (Police) with suggestions to control hate speech or terrorist propaganda.
REVIEW | doi:10.20944/preprints202005.0234.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: SIRD; Twitter; GHSI; Pre-symptomatic; EHR; Contact tracing; On-line survey; qRT-PCR; X-ray; CT/HRCT; CNN; Autoencoder; Drug affinity; CPI; and Inflation.
Online: 14 May 2020 (11:25:57 CEST)
The world is now experiencing a major health calamity due to the coronavirus disease (COVID-19) pandemic, caused by the severe acute respiratory syndrome coronavirus clade 2 (SARS-CoV-2). The foremost challenge facing the scientific community is to explore the growth and transmission capability of the virus. Use of artificial intelligence (AI), such as deep learning, in (i) rapid disease detection from x-ray/computerized tomography (CT)/high-resolution computed tomography (HRCT) images, (ii) accurate prediction of the epidemic patterns and their saturation throughout the globe, (iii) identification of the epicenter in each country/state and forecasting the disease from social networking data, (iv) prediction of drug-protein interactions for repurposing the drugs, and (v) socio-economic impact and prediction of future relapses, has attracted much attention. In the present manuscript, we describe the role of various AI-based technologies for rapid and efficient detection from CT images complementing quantitative real time polymerase chain reaction (qRT-PCR) and immunodiagnostic assays. AI-based technologies to anticipate the current pandemic pattern, the possibility of future relapses, and the socio-economic impact are also discussed. We inspect how the virus transmits depending on different factors, such as population density and mobility, among others. We show how AI-based mobile apps for contact tracing and surveys can prevent transmission. A modified deep learning technique can assess the affinity of the most probable drugs to treat COVID-19. Here, a few effective antiviral drugs, such as Geneticin, Avermectin B1, and Ancriviroc, among others, have been reported with their appropriate validation from previous investigations.
ARTICLE | doi:10.20944/preprints202211.0005.v1
Subject: Medicine And Pharmacology, Epidemiology And Infectious Diseases Keywords: Epidemics; Twitter; Natural Language Processing; Topic Modelling; Sentiment Analysis; ARI; Cholera; Ebola; HIV/AIDS; Influenza; Malaria; Spanish influenza; Swine flu; Tuberculosis; Typhus; Yellow fever; and Zika
Online: 1 November 2022 (01:17:14 CET)
At the end of 2019, while the world was being hit by the COVID-19 virus and, consequently, was living through a global health crisis, many other pandemics were putting humankind in danger. The role of social media is of paramount importance in these kinds of contexts, since social media help health systems to cope with emergencies by supporting activities such as the identification of public concerns, the detection of infections’ symptoms, and the tracing of virus diffusion. In this paper, we have analyzed comments on events related to cholera, Ebola, HIV/AIDS, influenza, malaria, Spanish influenza, swine flu, tuberculosis, typhus, yellow fever, and Zika, collecting 369,472 tweets from the 3rd of March to the 15th of September, 2022. Our analysis started with the collection of comments composed of unstructured texts, to which we applied natural language processing solutions. Afterward, we employed topic modelling and sentiment analysis techniques to obtain a collection of people’s concerns and attitudes toward these pandemics. According to our findings, people's discussions were mostly about malaria, influenza, and tuberculosis, and the focus was on the diseases themselves. As regards emotions, the most common were fear, trust, and disgust, with trust appearing mainly in HIV/AIDS tweets.