Public Opinion Evolution Based on the Two-Dimensional Theory of Emotion and Top2Vec-RoBERTa

Shaowen Wang; Qingyang Liu; Yanrong Hu; Hongjiu Liu

doi:10.20944/preprints202412.2020.v1

Submitted: 20 December 2024

Posted: 24 December 2024


Abstract

In today’s information age, dominated by social media, online platforms have become crucial venues for information dissemination. While the free flow of information promotes public participation, it also introduces certain challenges. Therefore, analyzing the evolution of public opinion and extracting public sentiment holds significant practical value for managing online public sentiment. This study takes the Zibo Barbecue incident as a case study, utilizing the two-dimensional theory of emotion and Top2Vec for thematic analysis of public opinion comments. By combining sentiment dictionary methods with the RoBERTa model, we conducted a sentiment polarity analysis of public opinion comments. The results show that the RoBERTa model achieved an accuracy of 98.46% on the test set. The proposed method effectively uncovers public sentiment biases and the influencing factors on public emotions during the evolution of public opinion events, providing a more comprehensive understanding of the emotional dynamics throughout the development of public sentiment. This deeper insight aids in addressing issues related to public opinion more effectively.

Keywords: BiliBili; sentiment analysis; public opinion evolution; Top2Vec; RoBERTa; two-dimensional theory of emotion

Subject: Computer Science and Mathematics - Artificial Intelligence and Machine Learning

1. Introduction

In today’s information age, dominated by social media, online platforms have become vital venues for information dissemination. Every day, hundreds of millions of users share various types of information on these platforms, including numerous high-profile events. This information often contains rich personal opinions and emotional biases. The rise of social media has made information dissemination faster and more widespread, where a single post or comment can spark extensive discussion and reaction within a short period. This rapid spread and large-scale interaction have not only transformed the way information circulates but also profoundly influenced the formation and evolution of public opinion. Through posting, commenting, and sharing, users express their personal stances, creating a complex information ecosystem. This phenomenon has simultaneously brought about unprecedented social changes and challenges [1]. While the free flow of information promotes public participation, it also brings certain challenges. Information on social media may include false or misleading content, and even malicious or hate speech, all of which pose potential threats to public opinion and social stability [2]. Therefore, in this dynamic and ever-changing environment, understanding and harnessing the power of social media has become a crucial topic in the information age [3].

The "Zibo Barbecue Incident" not only garnered widespread attention in a short period but also sparked intense public discussion on online platforms. This paper takes the "Zibo Barbecue Incident" as a case study, using Python web scraping to collect comments from videos related to the event on the BiliBili platform as the dataset. Textual topic mining was performed using Top2Vec, and a custom sentiment dictionary was constructed based on the Dalian University of Technology Sentiment Dictionary for annotating the text. The sentiment classification was then evaluated using the RoBERTa model. According to the lifecycle theory, the public opinion dissemination cycle was divided into the initiation, outbreak, decline, and cessation stages. Based on the two-dimensional theory of emotion, comments were classified along the valence and arousal dimensions, and two-dimensional topic extraction was performed accordingly. Finally, the evolution of public opinion was analyzed through the evolution of sentiment means, two-dimensional topic analysis, and the evolution of comment popularity. This approach provides valuable insights for understanding and addressing related issues and offers a useful reference for public opinion analysis in similar events.

The innovations of this study are as follows:

(1) Traditional sentiment analysis typically focuses only on the categories of emotions, neglecting the intensity of emotions. This study integrates the two-dimensional theory of emotion, analyzing changes in different emotional states from the dimensions of valence and arousal. This approach allows for a more comprehensive capture and understanding of the emotional dynamics in the evolution of public opinion.

(2) By combining a sentiment dictionary with deep learning models, this study addresses the limitation of low efficiency in traditional manual annotation. The sentiment dictionary provides rich prior knowledge to help identify emotional tendencies in the text, while the deep learning model, through automatic learning from large datasets, further enhances classification accuracy and generalization ability. Through model training, the accuracy of sentiment classification results has been more comprehensively evaluated and validated.

The second part of this study reviews the current state of research methods in online public opinion. The third part outlines the research framework and methodology of this study. The fourth part presents the research process and results analysis. The fifth part concludes the study.

2. Related Research

Compared to traditional public opinion, online public opinion relies on major information exchange platforms, offering a broader reach that can connect with groups across different regions, age groups, professions, and interests. Traditional public opinion is limited by geography and media channels, whereas online public opinion transcends these limitations, often exerting a more extensive influence [4]. Analyzing comment data on social media can help us understand public sentiment and the patterns of public opinion evolution [5]. In recent years, sentiment classification and topic mining have been widely applied and extensively researched in this field [6].

(1) Research on Sentiment Classification Methods

Sentiment classification methods can be categorized into fine-grained and coarse-grained approaches. Fine-grained analysis involves sentiment classification methods based on emotional polarity and intensity using sentiment dictionaries. Sentiment classification methods based on sentiment dictionaries use pre-constructed lexicons to determine the sentiment orientation of the text by matching and analyzing the emotional vocabulary within the text. Constructing a sentiment dictionary involves filtering and categorizing vocabulary to create a lexicon that accurately reflects emotional nuances. Zhang et al. expanded sentiment dictionaries by extracting and constructing related dictionaries, such as network terminology dictionaries and negation dictionaries, to enhance topic monitoring on Weibo [7]. Nie et al. proposed a method that combines semantic mapping functions with dictionary construction to capture the rich emotions hidden in hotel review texts [8]. Liu et al. combined sentiment dictionaries with pre-trained word embeddings and used TF-IDF values for weighting. By calculating the weights of sentiment words and neutral words separately and highlighting the role of sentiment words in sentence vectors, they improved the accuracy of sentiment analysis [9].

Coarse-grained text sentiment analysis methods involve using machine learning or deep learning techniques to classify the overall sentiment of the entire text.

Machine learning-based sentiment classification methods rely on training large-scale labeled datasets to learn patterns and features of emotional expression. By extracting text features and applying classifiers, these methods identify sentiments within the text. Stefanis et al. explored the emotions related to daily COVID-19 monitoring reports posted on Facebook pages and used machine learning algorithms to predict sentiment classifications [10]. Rahman et al. proposed a multilayer classification model that employs supervised machine learning techniques, achieving better recall rates in sentiment classification tasks [11]. Hokijuliandy et al. used a combination of SVM classification and chi-square feature selection methods for sentiment analysis. Their analysis of user comments revealed the main trends in positive reviews [12].

Deep learning models use word embedding techniques (such as Word2Vec and GloVe) to simplify feature engineering and capture semantic information. They employ Recurrent Neural Networks (RNNs) and Long Short-Term Memory networks (LSTMs) to handle sequential data, and use Convolutional Neural Networks (CNNs) to capture local features [13]. They also incorporate attention mechanisms and Transformer models (such as BERT) to capture global dependencies [14], and enhance the model’s generalization ability through large-scale pre-training and fine-tuning. Sisi et al. used a CNN model, combining encoded emotional sequence features with traditional word embedding features for email sentiment classification [15]. Arbane et al. used Bi-LSTM to reveal various issues related to COVID-19 public opinion, aiming to understand people’s concerns during the pandemic [16]. Pota et al. used the BERT model to evaluate the impact of tweet preprocessing operations on sentiment analysis performance. They considered available data in two languages (English and Italian) to assess language dependency [17]. He et al. proposed a BERT-CNN-BiLSTM-Att hybrid model for text sentiment analysis, addressing issues of ambiguity and feature extraction in the sentiment analysis process [18].

(2) Research on Topic Mining Methods

Topic mining, as an important technique in natural language processing, aims to discover hidden semantic structures and thematic information from text data. The traditional topic model, Latent Dirichlet Allocation (LDA), proposed by Blei et al., models the distribution of vocabulary over topics using probabilistic distributions. It has been widely applied to thematic analysis and document summarization tasks [19]. Zhao et al. used word frequency statistics and LDA methods to identify key terms related to tourism in Nanjing, thereby promoting tourism development [20]. Uthirapathy et al. used the LDA method to identify topics related to climate change in an existing Twitter dataset of public discussions [21]. Yoo et al. utilized LDA and Word2Vec algorithms to extract papers related to specific keywords from research on COVID-19 and identified detailed topics [22].

In summary, there are some limitations in existing research. Current mainstream topic mining methods mainly rely on LDA models and Word2Vec technology. However, these methods have limited capabilities in understanding complex semantic relationships, particularly when dealing with unstructured social media text, where capturing deep semantic information is challenging. Although deep learning-based methods provide various metrics to evaluate model performance, they often require cumbersome manual annotation, which is a massive engineering task for large-scale data and carries a significant degree of subjectivity. Additionally, relying solely on models for sentiment analysis lacks theoretical support and has lower credibility.

Therefore, this study develops a public opinion topic analysis framework based on the two-dimensional theory of emotion and lifecycle theory, using the Top2Vec topic mining method. In addition, it combines a sentiment dictionary with the RoBERTa model to perform sentiment polarity analysis on public opinion comments. The sentiment dictionary is used to calculate sentiment values and perform initial sentiment classification, while the RoBERTa model is used to evaluate the accuracy of sentiment classification.

3. Materials and Methods

3.1. Research Framework

This study uses the "Zibo Barbecue Incident" as a case study, selecting comment data from videos related to the event on the BiliBili platform to construct a text dataset. It proposes a public opinion evolution research method based on the two-dimensional theory of emotion and the Top2Vec-RoBERTa model. The overall research framework is illustrated in Figure 1 and includes five stages: data collection and preprocessing, division of the public opinion dissemination cycle, sentiment analysis, topic extraction, and public opinion evolution analysis.

The first part involves crawling comments from videos related to the "Zibo Barbecue Incident" on the BiliBili platform. The collected text is preprocessed by removing irrelevant comments, deleting duplicates, and eliminating emojis. The jieba segmentation tool and a custom vocabulary are used to segment Chinese sentences, with a stopword list from Sichuan University applied to obtain the text dataset.

The second part divides the specific stages of public opinion based on changes in the volume of comments over time, according to the lifecycle method.

In the third part, a custom sentiment dictionary is constructed based on the Dalian University of Technology Sentiment Dictionary. Sentiment values are calculated for the comment corpus related to key figures in the public opinion, and sentiment polarity is annotated. The RoBERTa model is then used to evaluate the sentiment classification performance of the dictionary, resulting in the identification of public emotional attitudes.

The fourth part involves categorizing sentiment values into valence and arousal based on the two-dimensional theory of emotion and using the Top2Vec model to identify topics within these dimensions.

The fifth part includes analyzing sentiment mean evolution, two-dimensional topic analysis, and comment popularity evolution to perform a comprehensive analysis of public opinion evolution.

3.2. Research Methods

3.2.1. Data Pre-Processing Methods

Text data preprocessing includes removing duplicate and irrelevant texts, eliminating emojis, tokenization, and removing stopwords. Words are the smallest semantic units in text, and the accuracy of text segmentation directly impacts the results of sentiment classification. By comparing the segmentation results with the stopword list and removing stopwords, the complexity of subsequent calculations is reduced, and the performance of classification predictions is improved. The completeness of the vocabulary affects the results of tokenization.
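The preprocessing steps listed above can be illustrated with a minimal sketch. It is not the authors' script: the emoji ranges, the function name `preprocess`, and the default whitespace tokenizer are assumptions chosen so that the example runs with the standard library alone. For Chinese comments, a segmenter such as `jieba.lcut` would be passed as `tokenize`.

```python
import re

# Rough emoji ranges; an assumption for illustration, not an exhaustive list.
EMOJI_PATTERN = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF\uFE0F]")


def preprocess(comments, stopwords, tokenize=str.split):
    """Strip emojis, drop empty and duplicate comments, tokenize, remove stopwords.

    `tokenize` defaults to whitespace splitting so the sketch has no external
    dependencies; a Chinese segmenter would be supplied in practice.
    """
    seen = set()
    cleaned = []
    for text in comments:
        text = EMOJI_PATTERN.sub("", text).strip()
        if not text or text in seen:  # skip empty and duplicate comments
            continue
        seen.add(text)
        tokens = [tok for tok in tokenize(text) if tok not in stopwords]
        if tokens:  # keep only comments with content left after filtering
            cleaned.append(tokens)
    return cleaned
```

Deduplication is applied after emoji removal here, so two comments that differ only by an emoji count as duplicates; whether the original pipeline orders the steps this way is not stated in the paper.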

3.2.2. Two-Dimensional Theory of Emotion

The Two-Dimensional Theory of Emotion, proposed by Russell [23], is an important model in emotional psychology, as shown in Figure 2. This theory suggests that emotional states can be described using two fundamental dimensions: Valence and Arousal. Valence represents the positive or negative polarity of an emotion, ranging from extreme pleasure (such as joy and happiness) to extreme displeasure (such as sadness and anger). Arousal indicates the level of activation of an emotion, ranging from high activation (such as excitement and anger) to low activation (such as calmness and fatigue).

In sentiment analysis, the Two-Dimensional Theory of Emotion provides an effective method to refine emotional categories. By analyzing the valence and arousal scores of emotional vocabulary in the text, we can more accurately assess the emotional state of the text. For example, although "happiness" and "anger" are both high-arousal emotions, the former has a positive valence, while the latter has a negative valence. In this way, the Two-Dimensional Theory of Emotion not only helps to identify the basic polarity of emotions (positive or negative) but also provides insights into the specific nature and intensity of emotions.

3.2.3. Top2Vec

Top2Vec is an algorithm for topic modeling and semantic search that automatically detects text topics and generates vector representations through joint embedding, dimensionality reduction, and clustering. The algorithm uses techniques such as Doc2Vec to create joint embeddings of documents and words. It then applies UMAP for dimensionality reduction to identify dense regions and uses HDBSCAN for clustering to compute topic vectors, with topic words determined based on word vectors.

Compared to traditional LDA, Top2Vec automatically discovers the number of topics through HDBSCAN and does not require a stopword list. In model training, Top2Vec requires minimal human intervention beyond selecting parameters during the process [24].
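The final step of the pipeline, computing a topic vector and selecting topic words, can be sketched as follows. This is an illustration under stated assumptions rather than the Top2Vec library's API: the embedding, UMAP, and HDBSCAN stages are assumed to have already produced document vectors grouped into one cluster, the vectors are toy values, and the helper names `topic_vector` and `topic_words` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def topic_vector(doc_vectors):
    """Centroid (arithmetic mean) of the document vectors in one cluster."""
    n = len(doc_vectors)
    return [sum(column) / n for column in zip(*doc_vectors)]


def topic_words(topic_vec, word_vectors, top_n=3):
    """Words whose vectors are closest to the topic vector, nearest first."""
    ranked = sorted(
        word_vectors,
        key=lambda word: cosine(topic_vec, word_vectors[word]),
        reverse=True,
    )
    return ranked[:top_n]


# Toy data: two documents in one cluster and three candidate words.
cluster_docs = [[1.0, 0.0], [0.8, 0.2]]
words = {"barbecue": [1.0, 0.0], "tasty": [0.5, 0.5], "price": [0.0, 1.0]}
centroid = topic_vector(cluster_docs)
nearest = topic_words(centroid, words, top_n=2)
```

Because documents and words share one embedding space, the words nearest to a cluster centroid serve as a description of that cluster, which is why no separate word-topic distribution is needed as in LDA.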

3.2.4. Sentiment Polarity Recognition Based on Sentiment Dictionary and RoBERTa

Constructing the Sentiment Dictionary

By using a sentiment dictionary to calculate the sentiment values of comments, the time cost of manual annotation can be reduced. This study uses the Dalian University of Technology Sentiment Dictionary as the foundational dictionary. Based on Ekman’s model, this dictionary refines positive emotions by incorporating a sentiment category of "good" and categorizes emotions into 7 major categories and 21 subcategories, with emotion intensity graded on a scale of 1, 3, 5, 7, and 9. There are 7 types of parts of speech: noun, verb, adjective, adverb, network words, idioms, and prepositional phrases. The sentiment values are calculated using this dictionary to determine sentiment polarity. The format of the sentiment vocabulary is generally as shown in Table 1, with specific emotion classifications detailed in Table 2. One emotional word may correspond to multiple emotions.

RoBERTa Sentiment Polarity Classification

RoBERTa (Robustly optimized BERT approach) is an improved version of BERT (Bidirectional Encoder Representations from Transformers), as shown in Figure 3. It has been pre-trained on a large-scale corpus, learning rich language representations.

In this study, the RoBERTa model is fine-tuned for the task of sentiment polarity recognition. The main feature of RoBERTa is its bidirectional attention mechanism, which allows the model to consider both the preceding and following context of a word, enhancing its understanding of the context.

4. Results

4.1. Data Collection and Pre-Processing

This study employs Python web scraping to collect comment texts related to the "Zibo Barbecue Incident" from the BiliBili platform. The dataset, using keywords "Zibo Explosion" and "Zibo Barbecue," includes comments from content on BiliBili between "2023-03-22" and "2023-07-01." The study uses the Jieba segmentation tool, matching the text with words from a custom vocabulary. A stopword list is used to filter out stopwords. After removing duplicate data and meaningless texts, a total of 17873 valid comments are obtained.

4.2. Lifecycle Classification

The distribution of public opinion data over time is shown in Figure 4. Using the public opinion evolution cycle classification method, the data for the "Zibo Barbecue Incident" is divided into four stages: the Initiation Stage (2023-03-22 to 2023-04-07), the Outbreak Stage (2023-04-08 to 2023-04-11), the Decline Stage (2023-04-12 to 2023-05-06), and the Resolution Stage (2023-05-07 to 2023-07-01).
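Given these date ranges, assigning a comment to a stage is a simple lookup. The sketch below uses the stage boundaries stated above; the names `STAGES` and `stage_of` are introduced here for illustration only.

```python
from datetime import date

# Stage boundaries as stated in the text (both ends inclusive).
STAGES = [
    ("Initiation", date(2023, 3, 22), date(2023, 4, 7)),
    ("Outbreak", date(2023, 4, 8), date(2023, 4, 11)),
    ("Decline", date(2023, 4, 12), date(2023, 5, 6)),
    ("Resolution", date(2023, 5, 7), date(2023, 7, 1)),
]


def stage_of(day):
    """Return the lifecycle stage containing `day`, or None if out of range."""
    for name, start, end in STAGES:
        if start <= day <= end:
            return name
    return None
```

Returning `None` for dates outside the collection window makes out-of-range comments easy to spot instead of silently assigning them to the first or last stage.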

4.3. Sentiment Analysis

4.3.1. Sentiment Calculation

In this study, sentiment scores and emotions for each text are calculated based on the vocabulary and corresponding emotion labels and scores from the Dalian University of Technology sentiment dictionary. First, each text is segmented into individual words using the jieba tokenizer, and the segmented results are matched with the vocabulary in the sentiment dictionary to extract all matching sentiment words. Then, based on the matched sentiment words and their corresponding scores, the total sentiment score and the number of corresponding emotions for each text are calculated. An example of the calculation results is shown in Table 3.
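The matching and scoring procedure described above might look like the following sketch. The paper does not give the exact scoring formula, so summing signed intensities is an assumption, as are the toy lexicon entries, the sign convention, and the names `score_comment` and `polarity`. None of the entries are taken from the actual dictionary.

```python
from collections import Counter

# Toy lexicon: word -> (emotion category, intensity, sign).
# Hypothetical placeholder entries, not taken from the real dictionary.
LEXICON = {
    "delicious": ("Good", 7, +1),
    "happy": ("Happy", 5, +1),
    "cheat": ("Disgust", 5, -1),
}


def score_comment(tokens, lexicon=LEXICON):
    """Return (total sentiment score, count of matched words per emotion)."""
    total = 0
    emotions = Counter()
    for tok in tokens:
        if tok in lexicon:
            emotion, intensity, sign = lexicon[tok]
            total += sign * intensity  # assumed rule: sum of signed intensities
            emotions[emotion] += 1
    return total, dict(emotions)


def polarity(total):
    """Map a total score to a coarse polarity label."""
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"
```

Tokens absent from the lexicon contribute nothing, so a comment with no matched sentiment words receives a score of 0 and a neutral label under this rule.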

4.3.2. RoBERTa Model Sentiment Classification

This study uses the RoBERTa model to evaluate the accuracy of sentiment labels and further classify the comments annotated with the sentiment dictionary. The experimental environment for the research includes Windows, Jupyter Notebook as the development environment, Python 3.8 as the programming language, and TensorFlow 2.10.0 as the deep learning framework.

(1) Dataset Splitting: From the corpus, 20% is randomly selected as the test set. Then, 20% is randomly selected from the remaining 80% of the dataset to form the validation set, with the remaining portion used as the training set. Overall, the dataset is divided into test, validation, and training sets in a ratio of 0.2:0.16:0.64.

(2) Model Training and Validation: In the RoBERTa model, input text is tokenized and encoded into a format understandable by the model, converting the text into a series of integers, each corresponding to a word or subword in the RoBERTa vocabulary. Special tokens, such as [CLS] and [SEP], are then added to the tokenized representation to ensure the model correctly processes the input. An attention mask value is generated for each token. The data is then input into the model for training. After six epochs, the model began to overfit, so the number of training epochs was set to 6 for this study. The curves showing the accuracy and loss values for the training and validation sets over the training epochs are illustrated in Figure 5.
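The two steps above can be sketched in plain Python. This is an illustrative sketch, not the training code used in the study: the random seed, the toy vocabulary `TOY_VOCAB`, the `[PAD]` and `[UNK]` entries, and the maximum length are assumptions, and a real pipeline would use the pretrained model's own tokenizer and vocabulary.

```python
import random


def split_dataset(items, test_frac=0.2, val_frac=0.2, seed=42):
    """Hold out a test set, then take a validation set from the remainder."""
    items = list(items)
    random.Random(seed).shuffle(items)  # seed is an assumed value
    n_test = round(len(items) * test_frac)
    test, rest = items[:n_test], items[n_test:]
    n_val = round(len(rest) * val_frac)  # 20% of the remaining 80% = 16%
    val, train = rest[:n_val], rest[n_val:]
    return train, val, test


# Toy vocabulary for illustration only.
TOY_VOCAB = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3, "zibo": 4, "barbecue": 5}


def encode(tokens, vocab=TOY_VOCAB, max_len=8):
    """Map tokens to ids, add [CLS]/[SEP], pad, and build the attention mask."""
    ids = [vocab["[CLS]"]]
    ids += [vocab.get(tok, vocab["[UNK]"]) for tok in tokens[: max_len - 2]]
    ids.append(vocab["[SEP]"])
    mask = [1] * len(ids)  # 1 marks real tokens
    padding = max_len - len(ids)
    ids += [vocab["[PAD]"]] * padding
    mask += [0] * padding  # 0 marks padding the model should ignore
    return ids, mask
```

With 100 items, `split_dataset` yields 64 training, 16 validation, and 20 test items, matching the 0.64:0.16:0.2 ratio. The attention mask lets the model distinguish real tokens from padding when comments of different lengths are batched together.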

As shown in Figure 5, after training, the RoBERTa model achieved an accuracy of 98.67% on the validation set and 98.46% on the test set, indicating good model fitting performance. The method of annotating comment sentiment values based on the dictionary achieved high accuracy, demonstrating significant feasibility and practical value.

4.3.3. Two-Dimensional Emotion Analysis

Based on the lifecycle and emotional valence, "Happy," "Good," and "Surprise" are categorized as high valence emotions, while "Sadness," "Anger," "Disgust," and "Fear" are categorized as low valence emotions. The results are shown in Table 4 and Figure 6.

From the valence dimension, it can be observed that in the Initiation Stage, the proportion of high-valence and low-valence emotions in public comments is relatively low. This suggests limited emotional feedback, likely due to the event being in its early stages and attracting less public attention. During the Outbreak Stage, there is a significant increase in both high-valence and low-valence emotions, with proportions being nearly equal. This reflects a vigorous reaction and diverse public sentiment as the controversy intensifies. In the Decline Stage, both high-valence and low-valence emotions show a similar but lower proportion, indicating weakened emotional responses as the event wanes. In the Resolution Stage, high-valence emotions slightly surpass low-valence emotions. Although overall emotional feedback remains balanced, positive comments slightly dominate, which may indicate that the public feels relatively satisfied with the resolution of the issue.

Based on emotional arousal, "Good," "Sadness," and "Disgust" are categorized as low arousal emotions, while "Happy," "Anger," "Surprise," and "Fear" are categorized as high arousal emotions. The results are shown in Table 5 and Figure 7.

During the Initiation Stage, the proportion of high arousal comments is 3.39%, while low arousal comments account for 5.39%, indicating a relatively calm public emotional response at this time. During the Outbreak Stage, the proportion of high arousal comments rises significantly to 36.29%, while low arousal comments are 37.86%, showing that the event triggered intense public attention and emotional reactions. In the Decline Stage, the proportion of high arousal comments further increases to 52.89%, while low arousal comments account for 50.64%, indicating that despite the event gradually fading, public emotions remain highly agitated. Finally, in the Resolution Stage, the proportion of high arousal comments drops to 7.43%, with low arousal comments at 6.11%, reflecting a significant decrease in emotional arousal after the event’s resolution, with comments becoming calmer.
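The valence and arousal groupings used in this section can be written as a small lookup. The category assignments below are exactly those stated in the text; the names `HIGH_VALENCE`, `HIGH_AROUSAL`, and `quadrant` are introduced here for illustration.

```python
# Groupings as stated in the text.
HIGH_VALENCE = {"Happy", "Good", "Surprise"}
LOW_VALENCE = {"Sadness", "Anger", "Disgust", "Fear"}
HIGH_AROUSAL = {"Happy", "Anger", "Surprise", "Fear"}
LOW_AROUSAL = {"Good", "Sadness", "Disgust"}


def quadrant(emotion):
    """Return (valence level, arousal level) for one of the seven categories."""
    if emotion not in HIGH_VALENCE | LOW_VALENCE:
        raise ValueError(f"unknown emotion category: {emotion}")
    valence = "high" if emotion in HIGH_VALENCE else "low"
    arousal = "high" if emotion in HIGH_AROUSAL else "low"
    return valence, arousal
```

Each of the seven categories falls into exactly one valence group and one arousal group, so every annotated comment can be placed in one of the four quadrants of the two-dimensional model.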

4.3.4. Evolution of Sentiment Mean

Figure 8 shows the evolution of the average sentiment values of public comments on the same date throughout the public opinion period. Significant fluctuations in sentiment values are observed between different dates. During the Resolution Stage, the number of comments decreases sharply, with some dates having only a few comments. In such cases, the sentiment values calculated from a small number of comments may cause extreme fluctuations in the results. Therefore, sentiment values from the Resolution Stage are excluded from the analysis, focusing only on data from stages with a higher volume of comments.
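A minimal sketch of this aggregation is given below, assuming each comment is available as a (date, sentiment score) pair. The cutoff date is the last day of the Decline Stage, so Resolution Stage dates are dropped as described; the function name `daily_sentiment_means` is hypothetical.

```python
from collections import defaultdict
from datetime import date


def daily_sentiment_means(records, last_day=date(2023, 5, 6)):
    """Average sentiment per date, ignoring dates after `last_day`.

    `records` is an iterable of (date, sentiment_score) pairs.
    """
    buckets = defaultdict(list)
    for day, score in records:
        if day <= last_day:  # drop Resolution Stage dates
            buckets[day].append(score)
    return {day: sum(scores) / len(scores) for day, scores in sorted(buckets.items())}
```

An alternative to excluding a whole stage would be to drop only the dates with fewer than some minimum number of comments; that variant is not what the text describes and is mentioned only as a design option.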

By analyzing the sentiment means during the high-comment volume stages, we can more clearly capture the evolution of public sentiment throughout the public opinion event. Significant fluctuations in sentiment are closely related to specific points in time. In the initiation stage of the public opinion, due to the limited understanding of the event, comments exhibit considerable diversity, leading to noticeable positive and negative fluctuations in sentiment means. These polarized comments reflect public emotional uncertainty and incomplete information at the early stage of the event. As time progresses and more information is disclosed, public understanding of the event deepens. During the outbreak stage of the public opinion, the overall sentiment mean reaches its peak. This corresponds to the public’s positive feedback on Zibo barbecue after the pandemic ended and they experienced it firsthand. The gradual stabilization of sentiment reflects the diminishing impact of the event.

4.4. Topic Analysis

Based on the Two-Dimensional Theory of Emotion, the study categorizes texts according to valence and arousal dimensions and uses Top2Vec for topic extraction. The distribution of topics and keywords under these two dimensions is shown in Table 6.

From the perspective of valence, Topic 1 illustrates a pleasant dining experience with an overall positive emotional inclination. Topic 2 includes both positive emotions such as "honest" and "reassured" as well as negative emotions like "short weight," resulting in a mixed emotional tendency. Topic 3 involves positive experiences related to the city and marketing, with an overall positive emotional inclination. Topic 4 encompasses both positive aspects such as "harmonious governance" and negative aspects like "deceitful," resulting in a more complex emotional tone. This indicates that the quality of dining experiences and market management significantly affects the public’s emotional experience from positive to negative.

From the perspective of arousal, Topic 1 mainly involves novel and special experiences with moderate emotional arousal, displaying a certain level of excitement. Topic 2 includes elements of surprise and astonishment, with higher arousal that may provoke stronger emotional reactions. Topic 3 has lower emotional arousal, showing a more calm emotion. Topic 4 primarily describes local characteristics and stable experiences, also with low arousal, conveying a sense of calm and satisfaction. This suggests that integrity and pricing significantly influence public emotional responses, highlighting the importance of better serving public needs.

From this, we can conclude that public emotional experiences are diverse, influenced by factors such as dining out and market management. Novel and special experiences can enhance positive emotions, while integrity and pricing have significant impacts on emotional responses, emphasizing the importance of maintaining freshness and integrity. Despite an overall positive emotional tendency, issues in market management still provoke negative emotions, indicating a need for further supervision and management. Additionally, stable and reliable experiences convey calm and satisfaction, showing that stability and reliability are key factors in improving public satisfaction. Therefore, focusing on diverse factors, particularly novelty, integrity, fairness, and stability, is crucial for enhancing public emotional experiences and overall well-being.

4.5. Public Opinion Evolution Analysis

The details of the "Zibo Barbecue Incident" are shown in Table 7. Due to the pandemic, students from Shandong University were quarantined at home in Zibo. During this period, the local government warmly hosted them for free and arranged a barbecue for them before their departure. This event added warmth and human touch to the image of Zibo city and marked the initiation of the incident, with relatively low public attention at this time.

In the Initiation Stage, public sentiment began to fluctuate gradually. This corresponds to the topic of "students group visiting Zibo for barbecue" gaining traction on social media starting April 5th. The extensive discussions and shares about Zibo barbecue on social platforms rapidly increased the event’s popularity, causing a surge in public attention. However, the public’s understanding was insufficient, and sentiment was mixed. In the Outbreak Stage, public sentiment was highly positive, corresponding to April 8th to April 10th, 2023. The incident gained traction due to the confirmed integrity of local businesses and the release of several favorable policies by local authorities, making it a hot topic on social media. In the Decline Stage, although the event remained a topic of discussion, attention began to decline gradually, and sentiment levels stabilized, indicating that public emotional responses had become more stable.

Keywords in the public opinion themes such as "political stability," "honest," "reassuring," and "dishonest" indicate the public’s focus on policies and businesses. In fact, the confirmation of local favorable policies and conscientious businesses resulted in positive public sentiment regarding the "Zibo Barbecue Incident." Overall, conscientious businesses and positive government policies contribute to favorable evaluations and development in the local area.

5. Conclusions

This study proposes a method for analyzing the evolution of public sentiment based on the Two-Dimensional Theory of Emotion and the Top2Vec-RoBERTa model, incorporating a sentiment analysis approach that combines sentiment dictionaries with deep learning techniques. By integrating these two methods, sentiment analysis results regarding the central figures in public opinion are obtained. Using the "Zibo Barbecue Incident" as a case study, 17873 comments from BiliBili videos related to the event were collected as samples. The sentiment of these comments was annotated and analyzed using the Dalian University of Technology sentiment dictionary and the RoBERTa model, which reduced the workload of manual annotation. Top2Vec, combined with the Two-Dimensional Theory of Emotion, was used to analyze changes in emotional states from both the valence and arousal dimensions, providing a more comprehensive understanding of the emotional dynamics throughout the development of public opinion. The RoBERTa model was used to evaluate the dictionary-based sentiment classification and achieved an accuracy of 98.46% on the test set. The analysis of sentiment mean evolution, two-dimensional topic analysis, and comment popularity evolution provided deeper insights and solutions for related issues. A limitation of this study is that emojis were not taken into account when calculating sentiment values with the sentiment dictionary. Future research will further consider more granular sentiment classification to improve sentiment analysis accuracy.

Author Contributions

Writing—Original Draft Preparation: Hongjiu Liu; Supervision: Hongjiu Liu; Funding Acquisition: Hongjiu Liu; Investigation: Yanrong Hu; Formal Analysis: Yanrong Hu; Resources: Shaowen Wang; Methodology: Shaowen Wang; Validation: Shaowen Wang; Data Curation: Qingyang Liu. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Humanity and Social Science Foundation of the Ministry of Education of China (Nos. 18YJA630037 and 21YJA630054).

Institutional Review Board Statement

This article does not contain any studies with human participants or animals performed by any of the authors.

Data Availability Statement

All computer code used in this study is available in the GitHub repository (https://github.com/w164186/Mycode.git).

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper, "Public Opinion Evolution Based on the Two-Dimensional Theory of Emotion and Top2Vec-RoBERTa".

References

Wang, J.; Zhang, X.; Liu, W.; Li, P. Spatiotemporal pattern evolution and influencing factors of online public opinion——Evidence from the early-stage of COVID-19 in China. Heliyon 2023, 9.
Ren, S.; Gong, C.; Zhang, C.; Li, C. Public opinion communication mechanism of public health emergencies in Weibo: take the COVID-19 epidemic as an example. Frontiers in Public Health 2023, 11, 1276083.
Xie, Q.; Han, Q.; Chen, D. Analysis of Sports Popular Trend Based on Public Opinion Mining of New Media. Mathematical Problems in Engineering 2022, 2022, 9144231.
Zhang, C.; Ma, N.; Sun, G. Using Grounded Theory to Identify Online Public Opinion in China to Improve Risk Management—The Case of COVID-19. International Journal of Environmental Research and Public Health 2022, 19, 14754.
Xu, B.; Liu, Y. The role of big data in network public opinion within the colleges and universities. Soft Computing 2022, 26, 10853–10862.
Smitha, E.; Sendhilkumar, S.; Mahalakshmi, G. Intelligence system for sentiment classification with deep topic embedding using N-gram based topic modeling. Journal of Intelligent & Fuzzy Systems 2023, 45, 1539–1565.
Zhang, S.; Wei, Z.; Wang, Y.; Liao, T. Sentiment analysis of Chinese micro-blog text based on extended sentiment dictionary. Future Generation Computer Systems 2018, 81, 395–403.
Nie, R.x.; Tian, Z.p.; Wang, J.q.; Chin, K.S. Hotel selection driven by online textual reviews: Applying a semantic partitioned sentiment dictionary and evidence theory. International Journal of Hospitality Management 2020, 88, 102495.
Liu, H.; Chen, X.; Liu, X. A study of the application of weight distributing method combining sentiment dictionary and TF-IDF for text sentiment analysis. IEEE Access 2022, 10, 32280–32289.
Stefanis, C.; Giorgi, E.; Kalentzis, K.; Tselemponis, A.; Nena, E.; Tsigalou, C.; Kontogiorgis, C.; Kourkoutas, Y.; Chatzak, E.; Dokas, I.; et al. Sentiment analysis of epidemiological surveillance reports on COVID-19 in Greece using machine learning models. Frontiers in Public Health 2023, 11, 1191730.
Rahman, H.; Tariq, J.; Masood, M.A.; Subahi, A.F.; Khalaf, O.I.; Alotaibi, Y. Multi-tier sentiment analysis of social media text using supervised machine learning. Comput. Mater. Contin. 2023, 74, 5527–5543.
Hokijuliandy, E.; Napitupulu, H.; Firdaniza. Application of SVM and Chi-Square Feature Selection for Sentiment Analysis of Indonesia’s National Health Insurance Mobile Application. Mathematics 2023, 11, 3765.
Alrashidi, M.; Selamat, A.; Ibrahim, R.; Fujita, H. Social Recommender System Based on CNN Incorporating Tagging and Contextual Features. Journal of Cases on Information Technology (JCIT) 2024, 26, 1–20.
Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805.
Liu, S.; Lee, I. Sequence encoding incorporated CNN model for Email document sentiment classification. Applied Soft Computing 2021, 102, 107104.
Arbane, M.; Benlamri, R.; Brik, Y.; Alahmar, A.D. Social media-based COVID-19 sentiment classification model using Bi-LSTM. Expert Systems with Applications 2023, 212, 118710.
Pota, M.; Ventura, M.; Fujita, H.; Esposito, M. Multilingual evaluation of pre-processing for BERT-based sentiment analysis of tweets. Expert Systems with Applications 2021, 181, 115119.
He, A.; Abisado, M. Text Sentiment Analysis of Douban Film Short Comments Based on BERT-CNN-BiLSTM-Att Model. IEEE Access 2024.
Blei, D.M.; Ng, A.Y.; Jordan, M.I. Latent Dirichlet allocation. Journal of Machine Learning Research 2003, 3, 993–1022.
Zhao, N.; Fan, G.; Qi, Z.; Shi, J. Exploring the current situation of cultural tourism scenic spots based on LDA model——Take Nanjing, Jiangsu Province, China as an example. Procedia Computer Science 2023, 221, 826–832.
Uthirapathy, S.E.; Sandanam, D. Topic Modelling and Opinion Analysis On Climate Change Twitter Data Using LDA And BERT Model. Procedia Computer Science 2023, 218, 908–917.
Yoo, S.y.; Lim, G.g. A study on the classification of research topics based on COVID-19 academic research using Topic modeling. Journal of Intelligence and Information Systems 2022, 28, 155–174.
Russell, J.A. A circumplex model of affect. Journal of Personality and Social Psychology 1980, 39, 1161.
Ghasiya, P.; Okamura, K. Investigating COVID-19 news across four nations: A topic modeling and sentiment analysis approach. IEEE Access 2021, 9, 36645–36656.

Figure 1. Research Framework.

Figure 2. Conceptual Model of the Two-Dimensional Theory of Emotion.

Figure 3. RoBERTa Model.

Figure 4. Distribution of Public Opinion Data.

Figure 5. Changes in Loss and Accuracy over Model Training Epochs.

Figure 6. Valence Public Emotion Analysis.

Figure 7. Arousal Public Emotion Analysis.

Figure 8. Evolution of Sentiment Mean.

Table 1. Sentiment Vocabulary Ontology.

Word	Part of Speech	Number of Meanings	Meaning Sequence	Emotion Category	Intensity	Polarity
Fearless	idiom	1	1	PH	7	1
Cash-strapped	idiom	1	1	NE	7	0
Tight Thoughtful	adj	1	1	PH	5	1
Exaggeration	idiom	1	1	NN	5	2

Table 2. Classification of Sentiment Words.

Emotion Category	Emotion Type	Example Words
Happy	Joy (PA)	Joyful, happy, smiling, overjoyed
	Calm (PE)	Secure, relieved, at ease, calm and untroubled
Good	Respect (PD)	Admire, respect, salute, revere
	Praise (PH)	Heroic, excellent, distinguished, praiseworthy
	Trust (PG)	Trust, rely on, believe, be confident
	Love (PB)	Fondness, beloved, love, cherish
	Wish (PK)	Wish, desire, hope for, long for
Anger	Anger (NA)	Angry, outraged, furious, enraged
Sadness	Sad (NB)	Sad, sorrowful, heartbroken, grief-stricken
	Despair (NJ)	Desperate, hopeless, desolate, devastated
	Regret (NH)	Regret, remorse, guilt, sorrow
	Pity (PF)	Pity, sympathy, compassion, sorrow
Fear	Anxiety (NI)	Anxious, uneasy, apprehensive, jittery
	Fear (NC)	Fearful, scared, afraid, terrified
	Shame (NG)	Shameful, disgraced, humiliated, mortified
Disgust	Disgust (NE)	Disgusted, repulsed, sickened, revolted
	Hate (ND)	Hatred, loathing, abhorrence, aversion
	Contempt (NN)	Contemptuous, scornful, disdainful, sneering
	Jealousy (NK)	Jealous, envious, resentful, covetous
	Regret (NL)	Regretful, remorseful, sorry, rueful
Surprise	Surprise (PC)	Astonished, amazed, surprised, shocked

Table 3. Sentiment Dictionary Calculation Results.

Time	Content	anger	disgust	fear	sadness	surprise	good	happy	label
2023-04-08	That’s a real good businessman.	0	0	0	0	0	1	0	1
2023-04-08	Why bother others when you’re so old? If you want to eat, rob the tourists yourself.	0	1	0	0	0	0	0	-1
2023-04-08	I brought a scale in Shandong. I’m sorry. I brought a scale in Chengdu. I’m sorry.	0	0	0	2	0	0	0	0
2023-04-08	The Art of Speaking	0	0	0	0	0	0	0	0
2023-04-08	I’m a bit touched and proud of Zibo, but I’m touched and proud of the most normal and desirable thing I’ve ever seen.	0	0	0	0	0	0	2	1

Table 4. Valence Analysis of Public Sentiment.

	Initiation Stage	Outbreak Stage	Decline Stage	Resolution Stage
High Valence(num)	666	4632	6364	783
Percentage(%)	5.35%	37.22%	51.14%	6.29%
Low Valence(num)	218	1878	2451	305
Percentage(%)	4.49%	38.71%	50.52%	6.28%

Table 5. Arousal Analysis of Public Sentiment.

	Initiation Stage	Outbreak Stage	Decline Stage	Resolution Stage
High Arousal(num)	83	889	1296	182
Percentage(%)	3.39%	36.29%	52.89%	7.43%
Low Arousal(num)	801	5621	7519	906
Percentage(%)	5.39%	37.86%	50.64%	6.11%

Table 6. Distribution of Topics and Keywords in Two Dimensions.

	Valence	Arousal
Topic1	Taste, Northeast, Grocery Shopping, Going Out, Small Cakes, Atmosphere	Classmates, Special, First Time, Express, Feelings
Topic2	Business, Honest, Short Weight, Dining, Assured	Grocery Shopping, Unexpectedly, Integrity, Prices, Exceeded Expectations
Topic3	Observing, Marketing, Qingdao, Special, Affirmative, Remember	Taxi, Two, Jinan, Hotel, Remember
Topic4	Feelings, Harmonious Governance, Deceitful, Customers, Market Supervision	Qingdao, Weifang, Yantai, Experience, Harmonious Governance, Simple and Honest

Table 7. The Process of the Zibo Barbecue Incident.

Time	Incidents
2022-05	During the home quarantine incident in Zibo involving Shandong University students, the local government warmly hosted them for free. Before leaving, they were treated to a barbecue, and they agreed to meet again in Zibo for barbecue when the spring arrives.
2023-04-05	With the public largely recovered from COVID-19, the correct way to enjoy Zibo barbecue began to spread. The topic of "college students organizing trips to Zibo for barbecue" gradually gained popularity on online platforms.
2023-04-08	A popular Douyin influencer tested the fairness of scales in Zibo and found that no store was shortchanging customers. The genuine and honest quality of Zibo locals once again brought Zibo barbecue into the spotlight. A wealth of user-generated content continues to be produced on various online platforms, generating high levels of engagement.
2023-04-10	The city of Zibo held a special press conference on barbecue, launched 24 dedicated high-speed train services for weekend round trips, and introduced 21 new customized barbecue bus routes as part of a series of comprehensive services. Consequently, "Zibo Barbecue" trended continuously on major platforms, and both "Zibo Barbecue" and the city of Zibo became the latest internet sensations.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.