2. Related Research
Compared to traditional public opinion, online public opinion relies on major information exchange platforms, offering a broader reach that can connect with groups across different regions, age groups, professions, and interests. Traditional public opinion is limited by geography and media channels, whereas online public opinion transcends these limitations, often exerting a more extensive influence [4]. Analyzing comment data on social media can help us understand public sentiment and the patterns of public opinion evolution [5]. In recent years, sentiment classification and topic mining have been widely applied and extensively researched in this field [6].
(1) Research on Sentiment Classification Methods
Sentiment classification methods can be divided into fine-grained and coarse-grained approaches. Fine-grained analysis classifies text by emotional polarity and intensity using sentiment dictionaries: a pre-constructed lexicon is matched against the emotional vocabulary in the text to determine its sentiment orientation. Constructing a sentiment dictionary involves filtering and categorizing vocabulary to create a lexicon that accurately reflects emotional nuances. Zhang et al. expanded sentiment dictionaries by extracting and constructing related dictionaries, such as network terminology dictionaries and negation dictionaries, to enhance topic monitoring on Weibo [7]. Nie et al. proposed a method that combines semantic mapping functions with dictionary construction to capture the rich emotions hidden in hotel review texts [8]. Liu et al. combined sentiment dictionaries with pre-trained word embeddings and used TF-IDF values for weighting. By calculating the weights of sentiment words and neutral words separately and highlighting the role of sentiment words in sentence vectors, they improved the accuracy of sentiment analysis [9].
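To make the dictionary-matching idea concrete, the following is a minimal sketch of lexicon-based scoring with a negation dictionary. It is not the method of any of the cited works: the lexicon entries, their weights, the negation list, and the function names `lexicon_score` and `classify` are all assumptions chosen for illustration.

```python
# Toy lexicons: every entry and weight is a made-up assumption for illustration.
SENTIMENT_LEXICON = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0}
NEGATIONS = {"not", "never", "no"}


def lexicon_score(text: str) -> float:
    """Sum the polarity of matched words, flipping the sign right after a negation."""
    score = 0.0
    negate = False
    for token in text.lower().split():
        word = token.strip(".,!?;:")
        if word in NEGATIONS:
            negate = True
            continue
        if word in SENTIMENT_LEXICON:
            value = SENTIMENT_LEXICON[word]
            score += -value if negate else value
        # A negation only affects the word directly after it.
        negate = False
    return score


def classify(text: str) -> str:
    """Map the summed score to a coarse polarity label."""
    score = lexicon_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Under these toy weights, a sentence such as "The room was not good" receives a negative score because the negation flips the polarity of "good". A real dictionary would also need intensity modifiers and a wider negation scope.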
Coarse-grained sentiment analysis uses machine learning or deep learning techniques to classify the overall sentiment of an entire text.
Machine learning-based sentiment classification methods rely on training with large-scale labeled datasets to learn patterns and features of emotional expression. By extracting text features and applying classifiers, these methods identify sentiments within the text. Stefanis et al. explored the emotions related to daily COVID-19 monitoring reports posted on Facebook pages and used machine learning algorithms to predict sentiment classifications [10]. Rahman et al. proposed a multilayer classification model that employs supervised machine learning techniques, achieving better recall rates in sentiment classification tasks [11]. Hokijuliandy et al. used a combination of SVM classification and chi-square feature selection methods for sentiment analysis. Their analysis of user comments revealed the main trends in positive reviews [12].
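As an illustration of chi-square feature selection, the sketch below scores each term by the chi-square statistic of its 2x2 contingency table against a target class and keeps the highest-scoring terms. It covers only the selection step; the classifier that would be trained on the selected features is omitted. The toy documents, labels, and the function names `chi_square` and `select_top_k` are assumptions and are not taken from the cited work.

```python
def chi_square(docs, labels, term, target):
    """Chi-square statistic of the 2x2 table (term present?) x (label == target?)."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_class = label == target
        if has_term and in_class:
            a += 1  # term present, target class
        elif has_term:
            b += 1  # term present, other class
        elif in_class:
            c += 1  # term absent, target class
        else:
            d += 1  # term absent, other class
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denominator


def select_top_k(docs, labels, target, k):
    """Return the k terms most associated (positively or negatively) with target."""
    vocabulary = set().union(*docs)
    # Sort by descending score, then alphabetically so ties are deterministic.
    ranked = sorted(vocabulary, key=lambda t: (-chi_square(docs, labels, t, target), t))
    return ranked[:k]


# Hypothetical toy corpus: each document is a set of tokens.
DOCS = [{"good", "app"}, {"good", "fast"}, {"bad", "app"}, {"slow", "bad"}]
LABELS = ["pos", "pos", "neg", "neg"]
```

In this toy corpus, "good" and "bad" separate the two classes perfectly and receive the highest scores, while "app" appears equally in both classes and scores zero, so it would be dropped before training.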
Deep learning models use word embedding techniques (such as Word2Vec and GloVe) to simplify feature engineering and capture semantic information. They employ Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks to handle sequential data, use Convolutional Neural Networks (CNNs) to capture local features [13], incorporate attention mechanisms and Transformer models (such as BERT) to capture global dependencies [14], and enhance the model’s generalization ability through large-scale pre-training and fine-tuning. Sisi et al. used a CNN model, combining encoded emotional sequence features with traditional word embedding features for email sentiment classification [15]. Arbane et al. used Bi-LSTM to reveal various issues related to COVID-19 public opinion, aiming to understand people’s concerns during the pandemic [16]. Pota et al. used the BERT model to evaluate the impact of tweet preprocessing operations on sentiment analysis performance. They considered available data in two languages (English and Italian) to assess language dependency [17]. He et al. proposed a BERT-CNN-BiLSTM-Att hybrid model for text sentiment analysis, addressing issues of ambiguity and feature extraction in the sentiment analysis process [18].
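For reference, the way attention captures global dependencies can be seen in the scaled dot-product attention used by Transformer models such as BERT. The symbols below follow the standard Transformer notation and are not defined in this paper: Q, K, and V are the query, key, and value matrices, and d_k is the key dimension.

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
```

Because each row of the softmax weights spans all positions in the sequence, every token's representation can draw on every other token, in contrast to the local window covered by a CNN filter.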
(2) Research on Topic Mining Methods
Topic mining, as an important technique in natural language processing, aims to discover hidden semantic structures and thematic information from text data. Latent Dirichlet Allocation (LDA), the traditional topic model proposed by Blei et al., represents each document as a probabilistic mixture of topics and each topic as a probability distribution over the vocabulary. It has been widely applied to thematic analysis and document summarization tasks [19]. Zhao et al. used word frequency statistics and LDA methods to identify key terms related to tourism in Nanjing, thereby promoting tourism development [20]. Uthirapathy et al. used the LDA method to identify topics related to climate change in an existing Twitter dataset of public discussions [21]. Yoo et al. utilized LDA and Word2Vec algorithms to extract papers related to specific keywords from research on COVID-19 and identified detailed topics [22].
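The generative process assumed by LDA can be summarized as follows, here in its commonly used smoothed form. The notation is an assumption introduced for illustration and does not come from the cited works: K topics, D documents, N_d words in document d, Dirichlet hyperparameters α and β, topic-word distributions φ_k, document-topic distributions θ_d, topic assignments z_{d,n}, and observed words w_{d,n}.

```latex
\begin{aligned}
\varphi_k &\sim \mathrm{Dirichlet}(\beta), & k &= 1, \dots, K \\
\theta_d &\sim \mathrm{Dirichlet}(\alpha), & d &= 1, \dots, D \\
z_{d,n} \mid \theta_d &\sim \mathrm{Categorical}(\theta_d), & n &= 1, \dots, N_d \\
w_{d,n} \mid z_{d,n}, \varphi &\sim \mathrm{Categorical}(\varphi_{z_{d,n}})
\end{aligned}
```

Each word is generated independently given its topic assignment, so the model is a bag-of-words model that ignores word order and context.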
In summary, existing research has some limitations. Current mainstream topic mining methods rely mainly on LDA models and Word2Vec. However, these methods have limited ability to model complex semantic relationships, particularly in unstructured social media text, where deep semantic information is difficult to capture. Although deep learning-based methods provide various metrics to evaluate model performance, they often require cumbersome manual annotation, which is a massive engineering task for large-scale data and carries a significant degree of subjectivity. Additionally, relying solely on models for sentiment analysis lacks theoretical support and has lower credibility.
Therefore, this study develops a public opinion topic analysis framework based on the two-dimensional theory of emotion and lifecycle theory, using the Top2Vec topic mining method. In addition, it combines a sentiment dictionary with the RoBERTa model to perform sentiment polarity analysis on public opinion comments. The sentiment dictionary is used to calculate sentiment values and perform initial sentiment classification, while the RoBERTa model is used to evaluate the accuracy of sentiment classification.
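One possible reading of this evaluation step is to compare the dictionary-based labels with the labels predicted by a second classifier and report how often they agree. The sketch below illustrates only that comparison. It is an assumption about the procedure rather than a description of this study's implementation: the function name `agreement_report` and the example labels are hypothetical, and no actual RoBERTa model is invoked.

```python
from collections import Counter


def agreement_report(dictionary_labels, model_labels):
    """Return the agreement rate and a (dictionary, model) label-pair count."""
    if len(dictionary_labels) != len(model_labels):
        raise ValueError("both label lists must have the same length")
    if not dictionary_labels:
        raise ValueError("label lists must not be empty")
    pairs = list(zip(dictionary_labels, model_labels))
    matches = sum(d == m for d, m in pairs)
    # Counter keyed by (dictionary label, model label) acts as a confusion table.
    confusion = Counter(pairs)
    return matches / len(pairs), confusion


# Hypothetical labels for four comments; the model labels stand in for
# predictions that would come from a fine-tuned classifier.
DICTIONARY_LABELS = ["positive", "negative", "neutral", "positive"]
MODEL_LABELS = ["positive", "negative", "positive", "positive"]

RATE, CONFUSION = agreement_report(DICTIONARY_LABELS, MODEL_LABELS)
```

The pair counts show where the two sources disagree (here, one comment the dictionary labels neutral and the model labels positive), which is more informative than the single agreement rate.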