4.4. Sentiment Analysis
We then performed a sentiment analysis on all comments; the results are presented below.
Figure 17 presents a bar plot illustrating the distribution of sentiment in comments made on posts in the r/cryptocurrency subreddit. The sentiment analysis categorizes comments into three distinct groups: negative, neutral, and positive sentiments.
The bar plot shows that the largest proportion of comments are neutral, accounting for 43.65% of the total comments. Neutral comments likely include factual information, questions, and statements that do not convey strong emotional content or opinions. This predominance of neutral sentiment suggests that a significant portion of the community engages in discussions that are informational or inquisitive in nature.
Positive comments make up the second largest category, representing 36.9% of the total. These comments likely express optimism, agreement, enthusiasm, or other positive sentiments about the topics being discussed. The high proportion of positive sentiment indicates a generally favorable or hopeful attitude within the community towards the subjects of their discussions, which could include market trends, technological developments, or other relevant news.
Negative comments comprise 19.3% of the total, indicating that a smaller, yet substantial, portion of the community expresses criticism, disagreement, concern, or other negative sentiments. This segment reflects the presence of skepticism or dissatisfaction within the community, which could be in response to unfavorable market conditions, regulatory news, or other challenges facing the cryptocurrency space.
Overall, the sentiment analysis underscores the diverse range of opinions and attitudes within the r/cryptocurrency subreddit. Monitoring sentiment trends over time can provide valuable insights into shifts in community mood and potential impacts on market behavior.
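The percentages reported above are shares of labelled comments. As a minimal sketch of how such a distribution can be computed, the following assumes each comment already carries a polarity score in [-1, 1]; the scoring tool, the 0.05 threshold, and the sample scores are assumptions for illustration and are not taken from the study.

```python
from collections import Counter


def label_from_score(score, threshold=0.05):
    """Map a polarity score in [-1, 1] to a label (the threshold is an assumption)."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


def sentiment_shares(labels):
    """Return the percentage of comments carrying each sentiment label."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: 100.0 * n / total for label, n in counts.items()}


# Hypothetical polarity scores, one per comment (not data from the study).
scores = [0.0, 0.6, 0.01, -0.4, 0.3, -0.02, 0.0, 0.8, -0.7, 0.04]
labels = [label_from_score(s) for s in scores]
shares = sentiment_shares(labels)
print(shares)
```

The choice of threshold directly controls the size of the neutral category, so the reported shares should be read together with whatever cut-off the sentiment tool applies.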
Figure 18 presents a time series plot showing the distribution of sentiment in comments made on the r/cryptocurrency subreddit from January 1, 2021, to December 31, 2022. The plot illustrates how the proportions of positive, neutral, and negative comments have changed over this two-year period. The time series plot reveals distinct trends in the sentiment of comments corresponding with Bitcoin’s market performance.
Throughout 2021, positive comments were more prevalent, reflecting the generally bullish sentiment in the cryptocurrency market during this period. As Bitcoin’s price surged to new highs, optimism and enthusiasm within the community were high, leading to a greater number of positive comments. However, in 2022, the number of positive comments declined notably. This decrease coincides with the downturn in Bitcoin’s price, suggesting that the market’s bearish trend negatively impacted community sentiment.
On the other hand, negative comments increased slightly in 2022 compared to 2021. This uptick in negative sentiment aligns with the declining market conditions and the reduced price of Bitcoin. As challenges in the cryptocurrency market emerged, including regulatory issues and market crashes, the community’s sentiment shifted towards a more critical and concerned tone.
The proportion of neutral comments increased slightly over the time period, indicating a growing level of factual or unemotional discussions within the subreddit. This slight increase suggests that as the market conditions fluctuated, more community members focused on information sharing and analysis, providing a balanced perspective amidst the prevailing positive and negative sentiments.
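A time series such as the one described for Figure 18 can be built by grouping labelled comments by period and computing the share of each label within the period. The sketch below assumes monthly aggregation and a list of (date, label) pairs; both the aggregation level and the sample records are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import date


def monthly_sentiment_shares(records):
    """records: iterable of (date, label). Returns {(year, month): {label: pct}}."""
    counts = defaultdict(Counter)
    for day, label in records:
        counts[(day.year, day.month)][label] += 1
    shares = {}
    for month in sorted(counts):
        total = sum(counts[month].values())
        shares[month] = {
            label: 100.0 * n / total for label, n in counts[month].items()
        }
    return shares


# Hypothetical labelled comments (not data from the study).
records = [
    (date(2021, 1, 5), "positive"),
    (date(2021, 1, 9), "positive"),
    (date(2021, 1, 20), "neutral"),
    (date(2021, 1, 25), "negative"),
    (date(2022, 6, 3), "negative"),
    (date(2022, 6, 10), "neutral"),
]
monthly = monthly_sentiment_shares(records)
print(monthly)
```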
4.4.1. Correlations with BTC Market Data
Figure 19 presents an autocorrelation plot (strictly, a cross-correlation, since two different series are compared) depicting the relationship between the BTC closing price and the percentage of positive comments made on posts in the r/cryptocurrency subreddit. The plot reveals a significant positive correlation across a wide range of lags. The correlation peaks at lag = -1 day, indicating the strongest relationship between the two variables. This implies that an increase in the percentage of positive comments tends to be followed by a higher BTC closing price the next day, suggesting a near-immediate link between community sentiment and market performance. The correlation remains significant over a broad range of lags (from -40 to 40 days), indicating a sustained relationship between market performance and community sentiment. This extended period of significant correlation suggests that the community’s positive sentiment both influences and is influenced by Bitcoin’s price over a longer timeframe.
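The lag convention can be made concrete with a short sketch. It assumes, consistent with the interpretation given for lag = -1, that a negative lag pairs each closing price with the sentiment observed that many days earlier. The function names and the toy series are illustrative only and are not the study's data or code.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(vx * vy)


def lagged_correlation(price, sentiment, lag):
    """Correlate price[t] with sentiment[t + lag] over the overlapping days."""
    if lag < 0:
        return pearson(price[-lag:], sentiment[:lag])
    if lag > 0:
        return pearson(price[:-lag], sentiment[lag:])
    return pearson(price, sentiment)


# Toy series: price follows sentiment with a one-day delay (hypothetical values).
sentiment = [1, 3, 2, 5, 4, 6, 8, 7]
price = [5, 10, 30, 20, 50, 40, 60, 80]

by_lag = {lag: lagged_correlation(price, sentiment, lag) for lag in range(-3, 4)}
best_lag = max(by_lag, key=by_lag.get)
print(best_lag, round(by_lag[best_lag], 3))
```

Because each lag shortens the overlapping window, correlations at large absolute lags rest on fewer observations than those near zero.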
Figure 20 and Figure 21 present scatter plots illustrating the relationship between the BTC closing price and the percentage of positive comments made on posts in the r/cryptocurrency subreddit, at lag = -1 and lag = 1, respectively.
The scatter plot in Figure 20 shows a positive correlation between the BTC closing price and the percentage of positive comments one day prior (Pearson’s ). This suggests that higher percentages of positive comments are associated with higher BTC closing prices the following day. This pattern highlights the predictive power of positive sentiment within the subreddit, where increased optimism and positive discussions often precede a rise in Bitcoin’s market price.
The scatter plot in Figure 21 shows a positive correlation between the BTC closing price and the percentage of positive comments one day later (Pearson’s ). This suggests that higher BTC closing prices are associated with a greater percentage of positive comments the next day. This relationship indicates that not only does positive sentiment predict higher market prices, but strong market performance also tends to enhance positive community sentiment, creating a feedback loop where market performance and sentiment mutually reinforce each other.
We also performed similar autocorrelations between the BTC closing price and the percentage of neutral (Figure 22) and negative comments (Figure 23). Autocorrelations showed significant correlations in both cases. The correlations in Figure 22 peak at (Pearson’s ), while those in Figure 23 peak at (Pearson’s ). While both neutral and negative sentiments show significant negative correlations with Bitcoin’s price, the correlations are more pronounced when using the percentage of positive comments. This highlights the stronger influence of positive sentiment on Bitcoin’s market performance compared to neutral or negative sentiments. Positive sentiment tends to have a more immediate and robust impact on driving the price up, whereas neutral and negative sentiments are associated with subsequent declines in the price.
To investigate the relationship between BTC volume and the sentiment of user comments in the r/cryptocurrency subreddit, we performed the autocorrelations shown in Figure 24, Figure 25 and Figure 26. These show the autocorrelations between BTC volume and positive comments, neutral comments and negative comments, respectively.
As with BTC price, BTC volume exhibits a significant positive correlation with the percentage of positive comments, which peaks at (Pearson’s ). It also shows significant negative correlations with the percentage of neutral comments, which peaks at (Pearson’s ), and with the percentage of negative comments, which peaks at (Pearson’s ). The correlations of neutral and negative comments with BTC volume are very similar, although slightly more pronounced for neutral comments. Overall, the strongest correlations with BTC volume are obtained with the percentage of positive comments.
Figure 27 and Figure 28 present scatterplots illustrating the correlations between BTC trading volume and the percentage of positive comments made on the r/cryptocurrency subreddit at different time lags.
Figure 27 shows the relationship between BTC trading volume and the percentage of positive comments made six days earlier (Pearson’s ). This lagged effect implies that positive sentiment in the community can be a leading indicator of future trading activity.
Figure 28 shows the relationship between BTC trading volume and the percentage of positive comments made six days later (Pearson’s ). This indicates that significant trading activity can subsequently influence community sentiment, leading to increased positivity. In both figures, a notable outlier is present. It is caused by the same unusually high BTC trading volume that occurred on February 26, 2021, which is also shown in Figure 15 and Figure 16.
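Since Pearson's coefficient is sensitive to extreme values, one simple robustness check is to recompute it with the single highest-volume day excluded. The sketch below illustrates that check on made-up numbers; it is not an analysis performed in the study, and the function and variable names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def without_max(x, y):
    """Drop the observation with the largest x value from both sequences."""
    i = max(range(len(x)), key=x.__getitem__)
    return x[:i] + x[i + 1:], y[:i] + y[i + 1:]


# Hypothetical daily volume and share of positive comments; the last day is an outlier.
volume = [10, 12, 11, 13, 12, 100]
positive_pct = [30, 34, 32, 36, 35, 20]

r_all = pearson(volume, positive_pct)
r_trimmed = pearson(*without_max(volume, positive_pct))
print(round(r_all, 3), round(r_trimmed, 3))
```

In this toy example a single extreme day is enough to flip the sign of the coefficient, which is why reporting the correlation with and without such a point can be informative.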
4.5. Topic Modelling
In order to effectively perform Latent Dirichlet Allocation (LDA) for topic modeling, it is essential to select the appropriate number of topics [28]. The choice of the number of topics significantly impacts the coherence and interpretability of the resulting topics. To determine the optimal number of topics, we employed several evaluation metrics and visualized the results through a perplexity plot and a combined metrics plot.
Figure 29 presents a perplexity plot, which is a common metric used to evaluate the performance of topic models. The plot shows a distinct knee at 4 topics, suggesting that this number provides a good balance between model complexity and generalization capability. The knee point in the perplexity curve is a strong indicator that increasing the number of topics beyond this point does not substantially improve the model’s performance.
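A knee such as the one described above is often read off the plot by eye, but it can also be located with a simple heuristic: after scaling both axes to [0, 1], pick the point farthest from the straight line joining the first and last points of the curve. The sketch below implements that heuristic on hypothetical perplexity values; it is one possible approach and not necessarily how the knee was identified in the study.

```python
def find_knee(ks, values):
    """Return the k whose point lies farthest from the chord joining the curve's ends."""
    if len(ks) != len(values) or len(ks) < 3:
        raise ValueError("need at least three (k, value) pairs")
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        raise ValueError("curve is flat; no knee to detect")
    # Scale both axes to [0, 1] so neither dominates the distance.
    xs = [(k - ks[0]) / (ks[-1] - ks[0]) for k in ks]
    ys = [(v - v_min) / (v_max - v_min) for v in values]
    slope = ys[-1] - ys[0]  # chord runs from (0, ys[0]) to (1, ys[-1])
    # The perpendicular distance is proportional to this absolute deviation.
    deviations = [abs(slope * x - y + ys[0]) for x, y in zip(xs, ys)]
    best = max(range(len(ks)), key=deviations.__getitem__)
    return ks[best]


# Hypothetical perplexity values for 2 to 8 topics (not the study's results).
ks = [2, 3, 4, 5, 6, 7, 8]
perplexity = [1500, 1200, 1000, 980, 965, 955, 950]
print(find_knee(ks, perplexity))
```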
Figure 30 illustrates the results of the CaoJuan2008, Arun2010, Griffiths2004, and Deveaud2014 metrics. All four metrics indicate that 4 topics provide a good compromise, balancing the need for distinct, coherent topics with the overall model performance. Therefore, by choosing 4 topics, we ensure that the model achieves a good balance between complexity and interpretability, facilitating meaningful analysis and interpretation of the subreddit comments.
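The four metrics point in different directions: CaoJuan2008 and Arun2010 are to be minimized, while Griffiths2004 and Deveaud2014 are to be maximized. One simple way to look for a compromise, sketched below, is to min-max scale each metric, flip the two minimized ones, and sum the results. This aggregation rule and all metric values are assumptions for illustration; the study itself reports reading the compromise from the combined plot.

```python
def min_max(values):
    """Scale a list of numbers to [0, 1]; a constant list maps to zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def best_topic_count(ks, minimize, maximize):
    """Sum scaled 'goodness' across metrics and return (best k, totals)."""
    totals = [0.0] * len(ks)
    for series in minimize.values():
        for i, scaled in enumerate(min_max(series)):
            totals[i] += 1.0 - scaled  # lower raw value is better
    for series in maximize.values():
        for i, scaled in enumerate(min_max(series)):
            totals[i] += scaled  # higher raw value is better
    best = max(range(len(ks)), key=totals.__getitem__)
    return ks[best], totals


# Hypothetical metric values for 2 to 6 topics (not the study's results).
ks = [2, 3, 4, 5, 6]
minimize = {
    "CaoJuan2008": [0.30, 0.22, 0.12, 0.14, 0.18],
    "Arun2010": [40, 30, 22, 20, 19],
}
maximize = {
    "Griffiths2004": [-9000, -8700, -8500, -8450, -8430],
    "Deveaud2014": [1.2, 1.5, 1.9, 1.6, 1.3],
}
best_k, totals = best_topic_count(ks, minimize, maximize)
print(best_k)
```

Equal weighting of the four metrics is itself a design choice; a different weighting could favor a different number of topics.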
Figure 31 presents a plot illustrating the beta values for the top ten terms in each of the four topics identified through Latent Dirichlet Allocation (LDA) modeling. The beta value, also known as the term-topic probability, indicates the probability of a term belonging to a specific topic. Higher beta values suggest a stronger association between the term and the topic.
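Selecting the top terms per topic amounts to sorting each topic's term probabilities in descending order and keeping the first few. The sketch below assumes the beta values are available as a nested mapping from topic to term to probability; the data layout and the numeric values are hypothetical, and only the term names are borrowed from the text.

```python
def top_terms(beta, n=10):
    """Return the n terms with the highest beta for each topic, best first."""
    return {
        topic: sorted(term_probs.items(), key=lambda kv: kv[1], reverse=True)[:n]
        for topic, term_probs in beta.items()
    }


# Hypothetical beta values (illustrative only, not the fitted model's output).
beta = {
    1: {"scam": 0.031, "dip": 0.027, "pump": 0.024, "cash": 0.020, "fund": 0.012},
    2: {"bitcoin": 0.090, "doge": 0.021, "bank": 0.018, "govern": 0.015},
}
for topic, terms in top_terms(beta, n=3).items():
    print(topic, [term for term, _ in terms])
```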
The first topic is characterized mainly by the "scam", "dip", "pump" and "cash" terms. A detailed interpretation of each key term and their collective significance follows:
scam: The presence of this term with the highest beta value indicates that discussions often involve concerns about fraudulent activities in the cryptocurrency market. This term suggests that the community is vigilant about identifying and discussing potential scams.
dip: This term refers to a temporary decline in cryptocurrency prices. Its prominence suggests that community members frequently discuss price fluctuations and strategies for navigating market downturns.
pump: The term "pump" is associated with rapid increases in asset prices, often as a result of coordinated efforts. Discussions around "pump" suggest a focus on market manipulation tactics and their impacts.
cash: This term could refer to liquid assets or fiat currency in the context of cryptocurrency trading. Its inclusion indicates discussions about liquidity, cashing out, or converting crypto to cash.
fund: This term suggests topics related to investment funds, funding sources, or financial backing within the cryptocurrency space. It highlights conversations about financial strategies and investment opportunities.
elon: The presence of Elon Musk’s first name suggests that his influence on the cryptocurrency market, especially through tweets and public statements, is a significant topic of discussion.
meme: The term "meme" indicates the role of internet culture and humor in cryptocurrency discussions. Memes often reflect market sentiment and can influence trading behavior.
tweet: This term reinforces the influence of social media, particularly Twitter, on market movements. Tweets from influential figures can drive significant changes in market dynamics.
origin: This term may refer to the origin or beginnings of certain cryptocurrencies, projects, or movements within the market. It suggests historical discussions and tracing the roots of market trends.
pull: This term could refer to "rug pulls," a type of scam where developers abandon a project and take investors’ funds, or to pulling out investments. Its inclusion highlights concerns about exit strategies and potential scams.
Topic 1 appears to encapsulate the themes of market manipulation, significant market events, and the influence of key figures and social media on the cryptocurrency market. The terms suggest that the community is highly engaged in discussions about the risks, strategies, and impacts of influential personalities and market tactics. This topic highlights the interplay between social media, market sentiment, and the strategies employed by community members to navigate the volatile cryptocurrency landscape.
The most dominant term in the second topic is "bitcoin", with others having a smaller influence. A detailed interpretation of each key term and their collective significance follows:
bitcoin: As the dominant term with the highest beta value, "bitcoin" indicates that a significant portion of the discussion focuses on Bitcoin, the most well-known and widely discussed cryptocurrency.
doge: The inclusion of "doge" (referring to Dogecoin) suggests that another popular cryptocurrency is a frequent topic of conversation. Dogecoin’s meme origins and its community-driven popularity often make it a subject of interest.
bank: This term points to discussions about the role of traditional banking institutions in the cryptocurrency space. It may involve topics like banks’ interactions with cryptocurrencies, the impact of crypto on banking, or the adoption of blockchain technology by banks.
govern: The presence of this term indicates discussions about government policies, actions, and involvement in the cryptocurrency market. This could include regulatory frameworks, government-backed cryptocurrencies, or geopolitical influences.
countri: This term suggests that discussions often focus on how different countries are approaching cryptocurrencies. Topics may include national regulations, adoption rates, and international differences in crypto policies.
mine: The term "mine" refers to cryptocurrency mining, the process of validating transactions and generating new coins. Discussions may cover mining technologies, environmental impacts, profitability, and geographical distribution of mining operations.
currenc: This term likely represents "currency," highlighting the broader discussion about cryptocurrencies as a form of digital money. This includes debates on their viability as currency, comparison with fiat currencies, and their role in the financial system.
flat: This term is likely a misspelling or abbreviation of "fiat," referring to traditional government-issued currencies. Discussions might compare fiat currencies to cryptocurrencies, covering topics like stability, value, and adoption.
cap: The term "cap" likely refers to market capitalization, a common metric used to assess the value of cryptocurrencies. Discussions may involve the market cap rankings of different cryptocurrencies, trends, and their implications.
regul: Short for "regulation," this term signifies discussions about the regulatory environment surrounding cryptocurrencies. This includes laws, compliance requirements, regulatory challenges, and their impact on the market.
Topic 2 appears to encapsulate themes related to major cryptocurrencies, institutional and governmental involvement, and regulatory issues. The prominent presence of "bitcoin" and "doge" suggests a focus on popular cryptocurrencies, while terms like "bank," "govern," and "regul" highlight the interaction between cryptocurrencies and traditional institutions.
This topic indicates that the community is deeply engaged in understanding the implications of government policies, regulatory frameworks, and the role of traditional financial institutions in the evolving cryptocurrency landscape. Discussions often revolve around the comparison between cryptocurrencies and fiat currencies, the environmental and economic impact of mining, and the influence of market capitalization on investment strategies.
In Topic 3, the terms suggest that it revolves around investment strategies, market sentiment, and specific cryptocurrencies. A detailed interpretation of each key term and their collective significance follows:
long: The term "long" refers to a long-term investment strategy, indicating discussions about holding assets over an extended period to realize gains. This term suggests that a significant portion of the community engages in or discusses long-term investment approaches.
hodl: "Hodl" is a popular term in the cryptocurrency community, derived from a misspelling of "hold." It represents the strategy of holding onto cryptocurrency investments regardless of market volatility. Its presence indicates strong discussions around the hodling philosophy.
bear: This term refers to a bear market, characterized by declining prices. The inclusion of "bear" suggests that the community frequently discusses market downturns and strategies for navigating bearish conditions.
shitcoin: A derogatory term used to describe cryptocurrencies with little to no value or potential. The presence of this term suggests that community members are critical and discerning about the quality and viability of various cryptocurrencies.
space: This term likely refers to the broader cryptocurrency ecosystem or market space. Discussions around "space" may include market trends, developments, and the overall state of the cryptocurrency industry.
risk: The term "risk" highlights discussions about the inherent risks associated with cryptocurrency investments. Topics may include risk management strategies, volatility, and the factors contributing to investment risk.
bull: In contrast to "bear," the term "bull" refers to a bull market, characterized by rising prices. Discussions involving "bull" suggest that the community also focuses on bullish conditions and strategies for capitalizing on upward market trends.
drop: This term indicates price drops or market corrections. Its presence suggests that community members frequently discuss sudden declines in cryptocurrency prices and their implications.
ada: This term likely refers to Cardano’s cryptocurrency (ADA). The inclusion of "ada" indicates that specific cryptocurrencies, particularly Cardano, are a significant topic of discussion within this theme.
bit: Likely referring to Bitcoin or bits as a unit of Bitcoin. The term "bit" suggests discussions about Bitcoin in general or its fractional units.
Topic 3 appears to encapsulate themes related to investment strategies, market sentiment, and specific cryptocurrencies. The emphasis on terms like "long" and "hodl" suggests a strong focus on long-term investment philosophies within the community. The presence of "bear" and "bull" indicates discussions about market conditions and strategies for both bearish and bullish trends. The term "shitcoin" highlights a critical view of less valuable cryptocurrencies, while "risk" points to an awareness of the volatility and uncertainty inherent in the cryptocurrency market. The inclusion of specific cryptocurrencies like "ada" indicates that certain digital assets are particularly prominent in discussions.
In Topic 4, the most common term is "eth", followed by "exchang" and "moon". A detailed interpretation of each key term and their collective significance follows:
eth: The term "eth" (Ethereum) has the highest beta value, indicating that discussions frequently involve Ethereum. This suggests a significant focus on one of the most prominent and influential cryptocurrencies in the market.
exchang: This term likely refers to cryptocurrency exchanges, platforms where users can trade cryptocurrencies. The prominence of this term suggests extensive discussions about exchange-related topics, such as trading strategies, exchange reviews, and transaction experiences.
moon: In the cryptocurrency community, "moon" refers to significant price increases. Discussions involving "moon" suggest that community members are interested in and hopeful for substantial price surges and investment returns.
fee: The term "fee" indicates discussions about transaction costs associated with trading or transferring cryptocurrencies. This can include exchange fees, gas fees on Ethereum, and other costs that impact traders and investors.
nft: Non-fungible tokens (NFTs) are unique digital assets representing ownership of specific items or content. The presence of "nft" suggests that the community is actively discussing this burgeoning sector within the cryptocurrency space.
asset: The term "asset" points to discussions about cryptocurrencies as financial assets. Topics might include asset management, valuation, and the role of different cryptocurrencies in investment portfolios.
coinbas: Likely referring to Coinbase, one of the largest and most popular cryptocurrency exchanges. This term indicates that discussions frequently involve Coinbase, its services, and user experiences.
token: This term refers to various types of cryptocurrency tokens, which can represent assets, utility, or value within specific platforms. Discussions about tokens might cover new token offerings, token performance, and their utility within ecosystems.
predict: The term "predict" suggests discussions about price predictions, market forecasts, and analytical methods used to anticipate future market movements.
secur: Likely referring to "secure" or "security," this term indicates discussions about the security of cryptocurrency assets, exchanges, and transactions. Topics might include best practices for securing assets, security breaches, and regulatory measures.
Topic 4 appears to encapsulate themes related to Ethereum, cryptocurrency exchanges, NFTs, and various aspects of trading and security. The emphasis on "eth" indicates that Ethereum is a central focus within this topic, reflecting its significant role in the cryptocurrency market and its extensive ecosystem.
The term "exchang" highlights the importance of trading platforms and user interactions with these exchanges. "Moon" and "predict" suggest a strong interest in market dynamics, price predictions, and the potential for substantial returns. The inclusion of "fee" points to concerns about transaction costs and their impact on trading activities.
The presence of "nft" signifies active discussions about non-fungible tokens, reflecting their growing popularity and influence. Terms like "asset" and "token" indicate broader discussions about the nature of cryptocurrencies as financial assets and their various applications.
Finally, "coinbas" and "secur" highlight the importance of major exchanges like Coinbase and the critical issue of security within the cryptocurrency space. These discussions are vital for understanding user experiences, investment strategies, and the measures needed to protect assets.
Figure 32 illustrates the distribution of comments among the four identified topics. The percentages represent the proportion of total comments assigned to each topic, offering insights into the relative prominence and engagement levels associated with each thematic area.
The distribution of comments across these four topics provides a comprehensive view of the community’s primary areas of interest and concern. The distribution is relatively even, with Topic 1 slightly leading at 27.3% of the comments, which indicates balanced engagement across various critical aspects of the cryptocurrency market. This distribution highlights the multifaceted nature of cryptocurrency discussions, covering market dynamics, investment strategies, regulatory issues, and technological innovations.
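A common way to assign comments to topics is to give each comment the topic with the highest per-document topic probability and then count assignments. The sketch below assumes that rule and a list of per-comment probability vectors; the text does not state the assignment rule used, and the sample probabilities are hypothetical.

```python
from collections import Counter


def topic_shares(gamma):
    """gamma: list of per-comment topic probability lists. Returns {topic: pct}."""
    if not gamma:
        return {}
    n_topics = len(gamma[0])
    assigned = Counter()
    for probs in gamma:
        dominant = max(range(n_topics), key=probs.__getitem__)
        assigned[dominant + 1] += 1  # topics are numbered from 1
    total = len(gamma)
    return {t: 100.0 * assigned[t] / total for t in range(1, n_topics + 1)}


# Hypothetical per-comment topic probabilities for a four-topic model.
gamma = [
    [0.7, 0.1, 0.1, 0.1],
    [0.2, 0.5, 0.2, 0.1],
    [0.6, 0.2, 0.1, 0.1],
    [0.1, 0.1, 0.2, 0.6],
]
shares = topic_shares(gamma)
print(shares)
```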
Figure 33 illustrates how the distribution of comments across the four identified topics changes over time from January 2021 to December 2022. This temporal analysis highlights the dynamic nature of discussions within the cryptocurrency community and their responsiveness to significant market events.
Topic 1 shows a noticeable spike in July 2021, a period that coincides with a significant dip in Bitcoin’s price. The increase in discussions related to market manipulation, significant market events, and influential personalities reflects heightened community concern and interest during this volatile period. Concurrently, there is a sharp decline in Topic 2, which covers major cryptocurrencies, institutional involvement, and regulatory issues. This drop suggests a temporary shift in focus away from regulatory and institutional topics towards immediate market reactions and individual market events. Topic 3, which focuses on investment strategies and market sentiment, also declines. This suggests that during significant market downturns, discussions shift towards immediate market impacts and away from long-term strategies and sentiment.
During the summer of 2022, particularly around June, Topic 1 experiences a dramatic increase, accounting for more than 70% of comments. This period coincides with another significant dip in Bitcoin’s price. The overwhelming focus on Topic 1 indicates intense community engagement with discussions about market manipulation, crashes, and key influencers during this period of poor market performance. All other topics see a significant reduction in comments. The decrease in Topic 2 (major cryptocurrencies and regulatory issues) and Topic 4 (Ethereum, exchanges, NFTs, and security) suggests that during periods of significant market stress, the community’s attention is heavily drawn towards the immediate implications of market crashes and less towards regulatory, institutional, and technical discussions. While Topic 3 generally sees fluctuations, it tends to decrease during significant market downturns (July 2021 and June 2022), indicating a shift away from long-term investment discussions during these periods.
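Periods in which one topic dominates, such as the one described above, can be flagged programmatically once monthly topic shares are available. The sketch below assumes shares are stored per month as a mapping from topic to percentage and uses a 70% threshold mirroring the figure quoted in the text; the monthly values are hypothetical.

```python
def dominant_months(monthly_shares, threshold=70.0):
    """Return (month, topic, share) for every topic share above the threshold."""
    flagged = []
    for month in sorted(monthly_shares):
        for topic, share in sorted(monthly_shares[month].items()):
            if share > threshold:
                flagged.append((month, topic, share))
    return flagged


# Hypothetical monthly topic shares in percent (not the study's results).
monthly_shares = {
    "2021-07": {1: 45.0, 2: 15.0, 3: 18.0, 4: 22.0},
    "2022-05": {1: 30.0, 2: 24.0, 3: 22.0, 4: 24.0},
    "2022-06": {1: 72.0, 2: 9.0, 3: 8.0, 4: 11.0},
}
print(dominant_months(monthly_shares))
```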