Hold On! Your emotion and behaviour when falling for fake news in social media

Researchers are concerned about the impact of fake news on democracy, and it can also escalate into life-threatening problems. As fake news continues to spread, so do people's emotional and behavioural responses to it on social media. This opens a back door for cyber-criminals to entice people (i.e. by taking advantage of victims' emotional and behavioural traits) to click on fraudulent links (e.g. phishing links) embedded in fake news. Therefore, we investigate how people's emotional and behavioural features influence reading and diffusing fake news on social media. We propose a classification model incorporating people's behavioural features and their emotions to better detect fake news in social media. Our results reveal that fake news carries more negative emotions than legitimate news and that both the title and the content of a news item/post are equally important. Furthermore, we identify strong correlations between some of the behavioural and emotional features. Finally, we conclude that emotional and behavioural features are important for fake news classification, as they improve the accuracy of detecting fake news, and that the findings of our study can ultimately be used to develop a risk score prediction model for fake news in social media.

the five months preceding the election day spread either fake or extremely biased news, i.e. 30 million tweets from 2.2 million users contained links to fake news outlets [5]. Fake news is also said to have influenced the UK European Union membership (Brexit) referendum. More recently, the Australian Parliament emphasised that fake news and misinformation loomed over the 2019 Australian federal election [7].
The most serious cases are those where fake news leads to life-threatening issues, as with the contemporary example of the Coronavirus (i.e. COVID-19) pandemic [16]. During the COVID-19 pandemic, as the Coronavirus continued to spread, so did misinformation via social media platforms. Some enticing news content spreading through Facebook or Twitter appeared to come from a legitimate source containing valid advice from medical professionals, but turned out to be erroneous and, in some cases, dangerously egregious. For example, Johns Hopkins University warned of messages being shared on social media platforms that claimed to offer an "excellent summary" of COVID-19 but were labelled "misattributed" by Snopes [16].
For the news industry, telling stories through information has traditionally been organised around "facts" [2]. Fake news contains stories that are not true, or stories that have some truth but are not 100 per cent accurate. Due to the pervasiveness of Internet and communication technologies [4], social media (e.g. Facebook, Twitter, or even professional networking websites like LinkedIn) has become the main vehicle for the prevalence of fake news and the dissemination of misinformation [3,12]. These platforms allow almost anyone with a profile to publish opinions or share thoughts with the world. However, the problem arises when people skip fact-checking, for example, checking the source of the material they read online before sharing it, resulting in fake news spreading quickly or even "going viral" on social media websites. On the other hand, it has become a challenge to detect the source of news stories appearing on the Internet, which can make it difficult to assess their legitimacy.
During the COVID-19 pandemic, a study revealed that Facebook failed to identify fake news on its platforms [6]. As reported, Facebook approved a number of paid advertisements making all manner of claims about COVID-19, including malicious deceptions (i.e. hoaxes) such as the idea that drinking bleach helps keep us healthy. Moreover, fake news can leave readers more susceptible to phishing attacks [4]. For example, cyber-criminals can craft fake news and spread it through social media, enticing victims to click on malicious links (a.k.a. phishing links) associated with the news. These attacks aim to steal confidential information, such as usernames, passwords, and online banking details, from their victims. One can argue that the main reason for the sudden escalation of fake news diffusion in social media during disease pandemics or election times is that people are often more emotional and tend to react or respond quickly during such times.
The likelihood of news on social media being fake can be associated with many factors, such as the metadata of the news articles, for example, source, country of publication, and top image, as validated by several studies [15]. So far, however, very little research has been published on the human aspects of fake news identification [8], for example, how people's psychological traits (e.g. emotions or behaviours) influence their trust decisions when dealing with fake news. Fake news plays on readers' emotions together with their behavioural tendencies (e.g. reacting to or sharing the news) to trap them by leveraging feelings (i.e. sentiments) and behaviours. Therefore, it is imperative to investigate how people's emotional and behavioural features influence their falling for (e.g. reading and disseminating) fake news in social media.
A number of studies have analysed emotions in news to detect fake news [1,2,9,17]. These studies are mostly limited to the negative and positive polarities of keywords. They fail to consider numerous lexical features of the news/posts in their sentiment analysis, for example, emoticons (e.g. ':-)'), sentiment-related acronyms and initialisms (e.g. 'LoL', 'WTF'), and commonly used slang (e.g. 'nah', 'meh', 'giggly'). Furthermore, emotions expressed in other forms in social media, for example, through the title of the news/post, the images and videos used, or users' comments and reactions, are not considered in previous studies [1,2,9]. Moreover, little attention has been paid to analysing users' behavioural features to detect fake news in social media [12,17].
On the one hand, it is important to understand that perpetrators use people's (i.e. readers') emotions and behaviour against them; on the other hand, it is also important to learn how emotional and behavioural features are used to create fake news that can manipulate readers in social media [12]. Therefore, this study proposes a comprehensive model based on users' behavioural and emotional features to investigate their impact on fake news identification. Specifically, using our proposed model, we study how different behavioural features (such as the number of likes, shares, and comments) and emotional features (such as positive, negative, and neutral sentiment scores in the news content or title) influence reading fake news in social media. Moreover, we propose a classification model incorporating people's behavioural features and emotions to better detect fake news in social media. The findings of our study can be used in a risk prediction model, deployable as a browser plug-in for social media apps, to predict the risk score of any news content being fake in social media.

METHODOLOGY
In this section, we describe the proposed framework for our study, as outlined in Figure 1. It consists of four main categories: data types, models, analysis, and outcome.

Data types
News or posts in social media often contain a rich set of hidden information in the form of emotional and behavioural features. The three types of data required for our proposed framework to identify fake news are: (1) Emotional features: The emotions expressed in news can have a significant impact on drawing a user into fake news [1]. Highly charged emotional content can easily lead victims to fall for fake news and further diffuse it to others in social media. Emotions are not readily available as features in the data; rather, they are hidden in the text of the news content, the title of the news/post, the emoticons used, the images and/or videos included, as well as in the comments and replies to the news. These emotions can be positive, negative, or neutral. For example, joy, happiness, love, interest, satisfaction, contentment, serenity, and amusement are positive emotions, while fear, anger, disgust, sadness, hate, and panic are negative emotions. Neutral emotions are middle-ground feelings that are neither extremely positive nor extremely negative; for example, 'OK', 'not too bad', and 'nothing much' express neutral emotions. Extracting these emotions from social media news data using sentiment analysis techniques helps improve the classification/prediction model used for fake news identification.
(2) Behavioural features: These features correspond to the behavioural aspects of users with regard to the news/posts in social media. Behavioural features can be captured from the number of shares or re-tweets, number of likes, number of reactions, number of comments, and number of replies to a news/post in social media. Behavioural features influence users in reading and sharing news, as users are often influenced by the behaviour of other users (who could be known friends, unknown users, users in a common circle of friends, or popular users) in social media. We believe incorporating such behavioural features into the classification/prediction model for fake news identification can help improve accuracy.
(3) Ground truth labels: Most of the classification/prediction models used for fake news identification are supervised machine learning methods (for example, support vector machines, random forests, or deep learning neural networks) [14], and hence require training data with ground truth in the form of 'fake' and 'legitimate' labels. Ground truth labels are required in our study for the analysis tasks as well as for training the fake news classification and prediction models.
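To make the three data types concrete, the sketch below shows one possible way to encode a single news/post as a labelled feature vector for a supervised model. The field names and encoding are our own illustrative assumptions, not a schema prescribed by the framework.

```python
# Illustrative sketch only: one possible encoding of a news/post into the
# emotional features, behavioural features, and ground-truth label described
# above. Field names are hypothetical, not a prescribed schema.
from dataclasses import dataclass

@dataclass
class NewsItem:
    # Emotional features (e.g. produced by a sentiment analyser)
    title_pos: float
    title_neg: float
    title_neu: float
    content_pos: float
    content_neg: float
    content_neu: float
    # Behavioural features
    shares: int
    likes: int
    comments: int
    # Ground-truth label: True = fake, False = legitimate
    is_fake: bool

    def features(self) -> list:
        """Flatten into a numeric feature vector for a classifier."""
        return [self.title_pos, self.title_neg, self.title_neu,
                self.content_pos, self.content_neg, self.content_neu,
                float(self.shares), float(self.likes), float(self.comments)]
```

A training set is then simply a list of such items, with `features()` feeding the classifier and `is_fake` supplying the ground-truth label.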
In our experimental study, we use the following datasets encompassing different categories, including online news articles, print media data, social media news, and short political statements: (1) Buzz_feed: This dataset, available on Kaggle 1, contains 1932 fake and 2537 real online news articles with title, text, authors, source, and metadata features.
(2) FA-KES: This Kaggle dataset 2 consists of 804 news articles from several media outlets, including mobilisation press, loyalist press, and diverse print media.
(3) News: This is another Kaggle news dataset 3 containing around 20,700 news entries with the attributes title, author, text, and label.
(4) SMNews: This dataset contains around 13,000 social media news entries 4 with several attributes, including title, text, and the numbers of replies, likes, comments, participants, and shares.
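Datasets like these typically arrive as CSV files; the sketch below shows a minimal loader that pulls out a text field and a label field. The column names 'text' and 'label' are hypothetical placeholders — each Kaggle dataset has its own schema, so the names would need adjusting per dataset.

```python
# Minimal CSV loader sketch for news datasets. The column names 'text' and
# 'label' are hypothetical placeholders; each dataset has its own schema.
import csv
import io

def load_news(csv_text: str, text_col="text", label_col="label"):
    """Yield (text, label) pairs from CSV content."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        yield row[text_col], row[label_col]

# Tiny in-memory example standing in for a downloaded dataset file.
SAMPLE = """text,label
Miracle cure found,fake
Council approves new park,real
"""
records = list(load_news(SAMPLE))
```

In practice one would pass the file's contents (or adapt the function to take a file handle) and map each dataset's label column onto the common 'fake'/'legitimate' labels.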

Models
Three different models are used in our proposed framework, each corresponding to a different task.
(1) Sentiment score modelling: Sentiment analysis is important in fake news identification to gauge the attitudes, sentiments, and emotions contained in the news [1]. We use VADER (Valence Aware Dictionary and sEntiment Reasoner) [9], an open-source lexicon- and rule-based sentiment analysis tool specifically developed to extract sentiments expressed in social media. It analyses sentiment based on the use of exclamation marks, capitalisation, intensifying words (e.g. extremely, considerably), conjunctions (e.g. but, in addition to), emojis, slang, acronyms, and emoticons. The output of this model is the positive, negative, neutral, and compound sentiment scores. The positive, negative, and neutral scores sum to 1.0, while the compound score is the sum of the lexicon ratings normalised to between -1 and +1. A compound score above 0.05 indicates positive sentiment, below -0.05 indicates negative sentiment, and anything in between indicates neutral sentiment.
(2) Fake news classification: A supervised machine learning method (such as random forest or support vector machine) is used to train the classification model on the emotional and behavioural features extracted from the training data. The trained model is then validated on the test data. New data is subsequently classified as fake or legitimate by the trained model.
(3) Risk score prediction: Based on the analysis results of our study, a risk score prediction model can be developed to predict the likelihood of news or posts in social media being fake based on the behavioural and emotional aspects of the news. Predicted risk scores can be used to alert users to the potential risk.
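As a concrete illustration of how the three models could fit together, the sketch below scores negative sentiment with a toy lexicon (a simplified stand-in for VADER, not VADER itself), trains a random forest on synthetic emotional and behavioural features, and reuses the classifier's predicted probability of the 'fake' class as a risk score. The lexicon, feature distributions, and labels are all fabricated for demonstration; nothing here reproduces the study's datasets or results.

```python
# Minimal end-to-end sketch of the three models: (1) a toy lexicon-based
# sentiment scorer standing in for VADER, (2) a random forest classifier,
# and (3) a risk score taken from the classifier's predicted probability.
# All data below is synthetic and illustrative only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

NEGATIVE = {"panic", "fear", "death", "dangerous", "hoax", "hate"}

def neg_score(text: str) -> float:
    """Fraction of tokens that are negative (toy stand-in for VADER's 'neg')."""
    tokens = text.lower().split()
    return sum(t in NEGATIVE for t in tokens) / max(len(tokens), 1)

# (2) Train on synthetic items: [title neg score, content neg score, shares].
rng = np.random.default_rng(0)
n = 300
y = rng.integers(0, 2, n)                      # 1 = fake, 0 = legitimate
X = np.column_stack([
    rng.random(n) * 0.3 + 0.4 * y,             # fake titles skew negative
    rng.random(n) * 0.3 + 0.3 * y,             # fake content skews negative
    rng.poisson(20, n) + 25 * y,               # fake posts get more shares
])
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# (3) Risk score for a new post = predicted probability of the 'fake' class.
def risk_score(title: str, content: str, shares: int) -> float:
    features = [[neg_score(title), neg_score(content), float(shares)]]
    return float(clf.predict_proba(features)[0, 1])
```

A browser plug-in could then flag any post whose risk score exceeds a chosen threshold.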

Analysis
We conduct four different types of analysis with regard to fake news identification in social media. Each of these analyses contributes to understanding the emotional and behavioural features of legitimate vs. fake news.
(1) Comparison of sentiment scores for fake and legitimate news: We compare the sentiment scores of fake and legitimate news calculated by the sentiment model to study the different ranges of sentiment scores and patterns. This allows us to study which sentiment (negative, positive, or neutral) is most influential in fake news and which aspect of the emotions (news title, content, images, videos, or emoticons) contributes significantly to the sentiment scores. For example, fake news titles might have significantly higher negative sentiment scores than real news titles, as fake news generally has a strong title to emotionally attract users.
(2) Feature importance: We study which features are important in the fake news classification task. We rank the features according to the importance assigned by the classifier [11]. For example, linear and logistic regression models use the feature coefficients to determine importance, while decision tree and random forest models use the reduction in the split criterion, such as Gini impurity or entropy, to calculate feature importance scores. These scores help identify the most significant features for classifying fake versus legitimate news. Moreover, they can be used as weights for the corresponding features in the risk score prediction model.
(3) Correlation analysis: We study the correlations between emotional and behavioural features, for example, between sentiment scores and the number of likes or shares. We use the Pearson correlation for this analysis, which is a statistical method to calculate the strength of the correlation between two continuous variables/features [11].
(4) Prediction accuracy: We study how effective the classification model trained on emotional and behavioural features is on the test dataset. This analysis compares the labels ('fake' and 'legitimate') predicted by the classifier with the ground-truth labels to evaluate how accurate the predictions are. We use the standard metrics of precision, recall, and F1-measure to evaluate classification accuracy [11].
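The two statistics underpinning analyses (3) and (4) can be computed directly. The sketch below implements Pearson correlation and precision/recall/F1 from first principles (equivalent results are available from scipy.stats.pearsonr and scikit-learn's metrics module); the example values in the usage note are illustrative.

```python
# Self-contained implementations of two analysis statistics used above:
# Pearson correlation between two features, and precision/recall/F1 for
# comparing predicted labels against ground truth.
import math

def pearson(x: list, y: list) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def precision_recall_f1(y_true: list, y_pred: list, positive="fake"):
    """Precision, recall, and F1 for the given positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

For example, feeding `pearson` a post-level negative-sentiment column and a share-count column yields a correlation coefficient in [-1, 1], and `precision_recall_f1` compares a classifier's predicted labels against the ground-truth labels.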

Outcome
The outcomes of our study are the findings of the analysis, the predictions for new data using the classification model trained on the emotional and behavioural aspects of news, and the risk scores for new data produced by the prediction model that estimates the likelihood of news/posts in social media being fake. The prediction model can also be made interactive, such that input, ratings, or suggestions from users about the predicted risk score are used to improve the model.

RESULTS
In this section, we present the results of our experimental study with regard to the four different analyses used in our framework.
1. Comparison of sentiment scores for fake and legitimate news: Figure 2 compares the positive and negative sentiment scores for the titles and content of fake and real/legitimate news. In general, negative emotions are abundant in both real and fake news; however, negative sentiment scores are higher for fake news than for real news. This shows that fake news often uses strongly negative emotions to trap victims. Moreover, the emotions in a news title play an important role in pulling in victims, especially in social media.
2. Feature importance: Figure 3 analyses which of the emotional and behavioural features are most important for identifying fake news using the random forest classification model. The results show that negative and neutral emotions in the news content and title are more essential for distinguishing fake from real news than positive emotions. Specifically, positive emotions in the title are largely common to both fake and real news, and the title's positive score is therefore not a significant feature for classification. The same pattern appears in the news content, where positive scores are not as important as negative and neutral scores. The negative and neutral sentiment scores of the news title have importance close to that of the sentiment scores of the news content, which means that sentiment analysis of the title should be an essential feature in any model used for fake news identification. It is worth noting that we achieved similar results with other classification models (such as support vector machines and logistic regression); however, due to space limitations we do not include all the results.
3. Correlation analysis: The correlations we observe imply that news with negative emotions triggers more actions by users in social media, and therefore cyber-criminals often use negative emotions in fake news as a successful strategy.
4. Accuracy of classification: Finally, Figure 5 evaluates the accuracy of our classification model built using the behavioural and emotional features of news. The results show that our approach of using emotional and behavioural features achieves higher accuracy in fake news identification. The SMNews dataset contains many more emotional and behavioural features than the other datasets, followed by the Buzz_feed dataset, and therefore these datasets achieve the highest accuracy with our model. The Liar dataset contains only news content, in short texts, and therefore does not provide much information to the classification model. Similarly, the FA-KES and Kaggle News datasets do not contain any behavioural features, and thus the model's accuracy on these datasets is lower. The higher accuracy achieved on datasets containing more emotional and behavioural features validates the significance of these features for fake news identification.

Summary of key findings
We summarise the key findings of our study as follows: (1) Fake news tends to have more negative emotions, with higher negative sentiment scores, than legitimate news.
(2) Emotions in the title of news/posts are equally important as emotions in the news content to detect fake news.
Specifically, negative emotions in the title play a key role in classifying news as fake or not.
(3) Strong correlations exist between some of the behavioural and emotional features. One interesting observation is that neutral sentiment in news has a negative correlation with the number of likes or shares, while negative sentiment has a strong positive correlation with those behavioural features. This reveals that news with negative emotions is more likely to be diffused in social media, and since fake news has higher negative sentiment scores than real news, it is more likely to be diffused. For example, posts containing strong emotions about the COVID-19 death toll tend to be shared and/or commented on by many people. Cyber-criminals take advantage of this by crafting fake news with strong emotional elements so that many people fall for cyber-attacks.
(4) Finally, when more emotional and behavioural features are available, the accuracy of fake news classification improves significantly, which necessitates using such information for effective fake news classification. Moreover, these features are abundantly available in social media. Utilising such valuable information is key to successfully classifying and predicting potential fake news/posts in social media.

BREAKING INTO PEOPLE'S MINDSET VIA FAKE NEWS
Our study findings show that people's emotions and behaviours play a significant role in falling for fake news in social media. Therefore, fake news can be crafted to entice readers into malicious activities by manipulating their emotions and behaviours.

Breaking into people's emotions via fake news
Our results revealed that fake news contains more negative emotions than legitimate news. The present findings also highlight that the emotions of both title and content are equally important and, in particular, that negative emotion helps distinguish fake news from legitimate news. This demonstrates how perpetrators can further their malicious activities by crafting fake news with negative emotions.
Fake news that contains more negative emotions can obviously leave someone feeling miserable and sad, take away their confidence, and affect their well-being. This was seen in the case of the American election in 2016 [13]. Similarly, during the "Brexit" referendum and, more recently, the Australian federal election, the Australian Parliament emphasised that fake news and misinformation not only influenced people's confidence but were also used to share racist comments and hate speech encouraging violence on social media [7]. As a result, such reactions indirectly helped the perpetrators make the fake news more popular within a short time on social media.
Perpetrators (i.e. cyber-criminals) can lure people into performing various malicious tasks by crafting fake news with strong emotional content. A contemporary example is that cyber-criminals used fake news during the COVID-19 pandemic as a "weapon" to steal sensitive information, such as usernames, passwords, or online banking details, from victims, as people were emotionally vulnerable during the pandemic. As reported by the Australian Cyber Security Centre, almost 100 new COVID-19-related scams, involving the loss of money or personal information, have been reported since March 2020. Recipients were asked to click on a link to learn more about COVID-19 restrictions or to educate themselves about staying safe during the pandemic. These "COVID-19-themed" scams therefore reflect our findings, highlighting the strategy of crafting fake news with a title that contains highly negative emotions.
In summary, fake news can be crafted to entice readers to perform malicious activities by manipulating their emotions. This is the so-called art of human hacking (e.g. "identity theft" or "phishing") through psychological traits (in our case, incorporating emotional and behavioural features into fake news).

Breaking into people's behaviours via fake news
Our results revealed strong correlations between some of the behavioural and emotional features. One interesting observation is that news containing negative emotions is more likely to be diffused in social media. Hence, fake news is more likely than real news to be diffused in social media, since it has higher negative sentiment scores. In other words, people tend to react more to fake news (e.g. reading, reacting, commenting, and sharing) than to legitimate news on social media platforms.
For example, one tends to believe a fake news item/post about a vaccine developed for COVID-19 if it was shared by a close friend or family member. When others react or comment on the post positively, any distrust of the (fake) news/post can disappear without the reader even noticing. Perpetrators use this social engineering strategy to increase the likelihood of the fake news/post being read by many people (i.e. victims), and thereby increase their susceptibility to an attack (e.g. a phishing attack) if that was the intended target.
Geeng et al. [8] investigated how people interact with fake news in social media. Their findings revealed that participants often took posts at face value or looked to the poster for context when they were uncertain. One can argue that perpetrators might take advantage of such user behaviour (e.g. taking fake news/posts at face value, looking at the context of the news, or looking at others' behaviour towards the news/posts) to further their malicious activities.
For instance, perpetrators (a.k.a. cyber-criminals) can use click farms (where a large group of low-paid workers is hired to react to and share/re-tweet fake news/posts in social media) to influence users' behaviour towards fake news. Once users are swayed by the fake likes/reactions, perpetrators can lure them into clicking on fraudulent links associated with the fake news posts, soliciting them to find out more. People who click on the fraudulent links may unfortunately disclose their credentials to the hackers. Moreover, in some circumstances, once the link is clicked, a malicious application (so-called "ransomware") is automatically installed on the victim's computer, which encrypts all the data, including files, folders, audio, videos, and images, leaving the system essentially inaccessible to the victim. To restore access to the data, the hackers may demand a ransom, typically paid in a virtual currency such as Bitcoin, so that no one can trace who the hackers or the victims are.

LIMITATIONS AND FUTURE WORK
While the initial results of our preliminary study are promising in terms of reflecting current example scenarios (e.g. fake news during the COVID-19 pandemic and election times) as well as some previous cross-domain research (as discussed in the previous section), it has several limitations that open up interesting future directions: (1) Conducting a user study to investigate how readers' emotions and behaviours are affected by fake news is an important research direction.
(2) Validating the importance of emotions conveyed in images and videos for fake news identification is important, as they often contain a lot of hidden emotion. This requires appropriate datasets containing images and/or videos with ground-truth labels, i.e. fake and real news with images and videos need to be collected.
(3) Similarly, investigating other emotional aspects, associated with, for example, religion and racism, requires relevant datasets. Since datasets with such information are not readily available, we would need to crawl such information from social media or collect data through a user study, which might not be possible due to privacy considerations.
(4) The behavioural features in this study are limited to the numbers of shares, comments, likes, reactions, replies, and participants of the news/posts. However, other behavioural features could play an important role in fake news identification. Such advanced features include the number of mutual friends or common users who have liked, shared, or commented, as well as the number of popular or favourite users (e.g. celebrities) who have reacted to, shared, or replied to posts/news. Since users in a common circle of friends and popular/favourite users have more influential power, it would be interesting to study the importance of such features in fake news identification, which obviously requires crawling such data from social media.

CONCLUSION
The widespread use of social media has led to the proliferation of fake news. Social media has increasingly become one of the main sources of news due to its low cost, easy access, and rapid diffusion of published news. However, the trustworthiness of published news has become a serious question, bringing significant risk of exposure to 'fake news', which is intentionally written to mislead readers, leading to psychological, financial, and economic damage to individuals, organisations, and governments.
In this paper, we propose a novel framework for fake news identification in social media. Our framework leverages the vastly available emotional and behavioural information of users reading news/posts in social media to improve the accuracy of fake news identification. We conducted an extensive experimental study on several datasets, which shows the efficacy of using users' emotional and behavioural features for fake news identification. The findings of our study can be used in a risk prediction model to predict the likelihood of news or posts being fake, eventually implemented as a browser plug-in for social media apps to alert users and make them aware of the potential risks associated with each post/news item in social media. Our findings can also be used to educate users about which aspects to be aware of when reading (fake) news in social media.