A Hybrid Deep Learning Model to Predict the Impact of COVID-19 on Mental Health from Social Media Big Data

The novel coronavirus disease (COVID-19) pandemic is having a widespread effect on mental health because of reduced interaction among people, economic collapse, negativity, fear of losing jobs, and the deaths of near and dear ones. People often use social media as a preferred means of expressing their mental state. With outdoor activities reduced, people are spending more time than usual on social media and expressing their anxiety, fear, and depression there. About 2.5 quintillion bytes of data are generated on social media every day, and analyzing this big data can be an excellent means of evaluating the effect of COVID-19 on mental health. In this work, we analyzed data from the Twitter microblog (tweets) to determine the effect of COVID-19 on people's mental health, with a special focus on depression. We propose a novel pipeline based on a recurrent neural network (in the form of long short-term memory, or LSTM) and a convolutional neural network, which identifies depressive tweets with an accuracy of 99.42%. The tweets were preprocessed using various natural language processing techniques before depressive emotion was detected in them. Analyzing over 571 thousand tweets posted between October 2019 and May 2020 by 482 users, we observed a significant rise in depressive tweets between February and May 2020, which indicates an impact of the long-running COVID-19 pandemic.


Introduction
and Linear Support Vector (LSV) with the LSTM classification algorithm. Almeida et al. [38] proposed an NLP-based method for the early detection of mental illness from Reddit posts. They used user posting frequency, words from different dictionaries (feelings, medicine, drugs, and disease), n-grams, and parts of speech as input features, and found that information retrieval combined with ML is the best way to predict mental illness. Wolohan et al. [39] built a corpus of more than twelve thousand Reddit users and performed lexical and predictive analyses to find the language used by depressed persons; they obtained significant results using LIWC analysis. Katchapakirin et al. [40] proposed an NLP-based depression detection toolkit for Thai users. They worked with 35 users, and the users' Facebook posts, posting times, privacy settings, and social interactions were used for this model.
Hu et al. [41] tried to find hidden indicators of depression in social media posts using sentiment analysis, working with 19,000 posts from PTT (a social media platform in Taiwan). Choudhury et al. [42] collected, through crowd-sourcing, data from Twitter users diagnosed with depression and created a model for predicting the risk of depression. They used social engagement, mentions of prescribed medicines, the ego network, and the user's linguistic style as features. To avoid overfitting, they applied principal component analysis (PCA), and they used a support vector machine (SVM) classifier with a radial basis function (RBF) kernel for prediction. Coppersmith et al. [43] used Twitter data to classify different mental disorders using NLP. Based on the language of Twitter users, they tried to detect depression, bipolar disorder, post-traumatic stress disorder, and seasonal affective disorder. Guntuku et al. [44] reviewed works that used social media content to identify mental illness; these studies predicted depression among social media users and determined the risk of other users falling into depression. Yazdavar et al. [45] tried to find a correlation between medical questionnaires and linguistic patterns in the Twitter posts of depressed users, considering the users' writing patterns and the duration of their symptoms. They used a semi-supervised statistical model based on latent Dirichlet allocation (LDA) and achieved good accuracy. Wang et al. [46] tested the performance of different machine learning algorithms for predicting sentiment from Twitter big data.
Jelodar et al. [47] proposed an LSTM-based deep learning (DL) model to automatically extract COVID-19-related posts from popular social media such as Reddit. They used topic modeling with NLP-based sentiment analysis to reveal unnoticed issues related to COVID-19. Li et al. [48] proposed a pre-trained deep learning NLP model to classify public sentiment in COVID-19-related posts; using 1,000 tweets, they tried to find the causes of fear and sadness. Wolohan [49] predicted depression in Reddit posts using deep LSTM and fastText methods and showed that overall depression rates increased by up to 50% during COVID-19. Nooripour et al. [50] proposed combining ML methods with measures of human resilience and hope to infer stress levels from a survey in Iran. They also observed that spiritual well-being does not necessarily imply a lower stress level.
In this paper, the following contributions were made:
• The Twitter data of the Sentiment140 dataset and a scraped dataset of depressive tweets were used to build a depression detection model. This model, a combination of LSTM and a convolutional neural network (CNN), achieved high prediction accuracy on testing samples.
pandemic and the previous four months were examined, and a significant increase in the number of depressive posts was observed using this model.
In the next section, the working principles of the LSTM and CNN models are discussed. Section 3 explains the methodology, and Section 4 discusses the results. In Section 5, we present our concluding remarks and discuss future goals.

Working Principle of LSTM and CNN Models

Recurrent Neural Network (RNN)
A traditional feed-forward neural network does not consider previous inputs, but predicting the next element of a sequence often requires earlier information. An RNN retains information: the output of one step is fed as input to the next step, which is essential for sequential data processing. An RNN forms a chain of hidden states, where each step combines its own input with the hidden state produced by the previous step [51]. The network minimizes its loss function through backpropagation (through time), updating its weights and biases.
Before being fed into an RNN, the input must be converted to vectors. In an RNN, each hidden state passes information to the next one: the current input and the previous hidden state are concatenated, transformed by the layer's weights and bias, and passed through a tanh activation function, which squashes the values into the range -1 to 1. Thus, each state carries some information about the previous states. However, RNNs are prone to vanishing and exploding gradients when they must relate information across a long context, so they perform poorly when the gap between the current input and the relevant earlier information is large.
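The recurrence described above can be illustrated with a toy NumPy sketch. It assumes a single weight matrix `W` acting on the concatenated vector [h_{t-1}, x_t] plus a bias `b`; the sizes, the random initialization, and the names `rnn_step` and `rnn_forward` are illustrative choices, not taken from the paper.

```python
import numpy as np

def rnn_step(x_t, h_prev, W, b):
    """One vanilla RNN step: tanh(W [h_{t-1}, x_t] + b), with values in (-1, 1)."""
    combined = np.concatenate([h_prev, x_t])   # concatenate previous state and current input
    return np.tanh(W @ combined + b)

def rnn_forward(xs, W, b, hidden_size):
    """Unroll the cell over a sequence; each hidden state is fed into the next step."""
    h = np.zeros(hidden_size)
    states = []
    for x_t in xs:
        h = rnn_step(x_t, h, W, b)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
input_size, hidden_size, seq_len = 4, 3, 5
W = rng.normal(scale=0.5, size=(hidden_size, hidden_size + input_size))
b = np.zeros(hidden_size)
xs = rng.normal(size=(seq_len, input_size))   # e.g. five word vectors
H = rnn_forward(xs, W, b, hidden_size)
print(H.shape)  # (5, 3)
```

Because the same `W` is applied at every step, the gradient flowing back through many steps is repeatedly multiplied by `W` and by tanh derivatives (at most 1), which is where the vanishing and exploding gradients mentioned above come from.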

Long Short-Term Memory (LSTM)
LSTM was designed to mitigate the vanishing and exploding gradient problems, so that it can capture long-range dependencies and perform well when the gap between the current input and earlier relevant information is large. Like an RNN, it processes information during forward propagation, but its cell structure is quite different, enabling it to retain useful information and pass it along a long chain of sequence steps [52]. An LSTM layer is built from three gates (the input gate, the forget gate, and the output gate) and a cell state [52]. The cell state acts as the memory of the network and carries relevant information throughout the sequence. The gates are small neural network layers that learn which information is relevant to keep and which to discard: the forget gate erases irrelevant information, the input gate selects the relevant information to add, and the output gate determines the next hidden state.
The forget gate decides which information to keep or discard. The previous hidden state $h_{t-1}$ and the current input $x_t$ (see Figure 1) pass through a sigmoid function, which maps each value to the range 0 to 1: values close to 1 are kept, and values close to 0 are forgotten. The output of the forget gate, $f_t$, is calculated using Eq. (1).

$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \tag{1}$$

Here, $W_f$ and $b_f$ are the weights and bias of the forget gate. The cell state is the memory of the LSTM, and the input gate is used to update it. First, the current input and the previous hidden state pass through a sigmoid function, whose output between 0 and 1 indicates the importance of each value. The same inputs also pass through a tanh function, which produces candidate values between -1 and 1 and helps regulate the network. The two outputs are multiplied: where the sigmoid output is 0, the information is discarded, and where it is 1, it is kept in the cell. The sigmoid output $i_t$ and the tanh output $\tilde{C}_t$ are calculated using Eq. (2) and Eq. (3).

$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \tag{2}$$

$$\tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right) \tag{3}$$

Here, $W_i$ and $W_C$ are the weights of the input gate and the cell state, and $b_i$ and $b_C$ are their respective biases. The cell state is updated by multiplying the previous cell state $C_{t-1}$ point-wise by the forget gate's output and then adding the point-wise product of the input gate's output and the candidate values; any value multiplied by 0 is dropped. The current cell state $C_t$ is therefore given by Eq. (4).

$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \tag{4}$$

The last gate of the structure is the output gate, which determines the next hidden state. The previous hidden state and the current input pass through a sigmoid activation, which maps them to values between 0 and 1. The updated cell state passes through a tanh activation, and its output is multiplied by the output of the sigmoid activation; this product is the information that the hidden state will carry. If $o_t$ is the sigmoid output of this gate and $h_t$ is the gate's output, they are calculated by Eq. (5) and Eq. (6).

$$o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) \tag{5}$$

$$h_t = o_t \odot \tanh(C_t) \tag{6}$$

where $W_o$ and $b_o$ are the weights and bias of the output gate. To summarize the flow through the cell: the current input and the previous hidden state are first concatenated into a single vector, referred to as combine. The combine vector is fed to the forget gate layer, which selects the relevant information to keep. A candidate layer holds the values that may be added to the cell state, and the input gate, which also takes combine as input, selects which of these candidate values are written into the updated cell state. The cell state is then updated using the outputs of the forget and input gates, and the output gate uses the updated cell state to determine the new hidden state passed to the next step. In this way, the LSTM produces each output while taking previous states into account. Figure 1 shows the working principle of an LSTM cell.
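The gate equations (1) to (6) can be traced in a minimal NumPy sketch of a single LSTM step. The parameter dictionary keys (`W_f`, `b_f`, and so on) mirror the symbols in the text but are otherwise hypothetical, as are the sizes and the random initialization; libraries such as Keras store the four gate matrices fused into larger kernels rather than separately.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step; p holds the gate weights W_* and biases b_*."""
    z = np.concatenate([h_prev, x_t])            # [h_{t-1}, x_t], the "combine" vector
    f_t = sigmoid(p["W_f"] @ z + p["b_f"])       # forget gate, Eq. (1)
    i_t = sigmoid(p["W_i"] @ z + p["b_i"])       # input gate
    c_tilde = np.tanh(p["W_C"] @ z + p["b_C"])   # candidate values
    c_t = f_t * c_prev + i_t * c_tilde           # cell update, Eq. (4)
    o_t = sigmoid(p["W_o"] @ z + p["b_o"])       # output gate, Eq. (5)
    h_t = o_t * np.tanh(c_t)                     # new hidden state, Eq. (6)
    return h_t, c_t

def init_params(input_size, hidden_size, seed=0):
    """Small random weights and zero biases for the four gate layers (illustrative)."""
    rng = np.random.default_rng(seed)
    p = {}
    for gate in ("f", "i", "C", "o"):
        p[f"W_{gate}"] = rng.normal(scale=0.1, size=(hidden_size, hidden_size + input_size))
        p[f"b_{gate}"] = np.zeros(hidden_size)
    return p

# Run a short random sequence through the cell.
input_size, hidden_size = 4, 3
params = init_params(input_size, hidden_size)
h, c = np.zeros(hidden_size), np.zeros(hidden_size)
for x_t in np.random.default_rng(1).normal(size=(6, input_size)):
    h, c = lstm_step(x_t, h, c, params)
print(h.shape, c.shape)  # (3,) (3,)
```

With all weights set to zero, every gate outputs 0.5 and the candidate is 0, so the cell state is simply halved at each step; this makes the role of the forget gate easy to check by hand.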

Convolutional Neural Network (CNN)
A CNN combines convolutional layers, pooling layers, and dense layers. A convolutional layer extracts features from the data and passes them to the next layer. The more filters a convolutional layer has, the more feature maps it produces, that is, the deeper the extracted representation [53]. Pooling layers then reduce the spatial size of these feature maps; they can be of different types, such as max pooling and min pooling. Finally, a set of fully connected dense layers is stacked on top of the convolutional and pooling layers to classify or predict the output.
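A toy NumPy sketch makes the shape bookkeeping concrete: the number of filters sets the depth (last axis) of the output, while pooling shrinks the sequence axis. The 10-token, 4-dimensional input, the six width-3 filters, and the function names are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def conv1d_relu(x, kernels, bias):
    """'Valid' 1D convolution over a (steps, channels) input, followed by ReLU.

    kernels has shape (num_filters, width, channels); like most deep learning
    libraries, this computes a cross-correlation (the kernel is not flipped).
    Returns an array of shape (steps - width + 1, num_filters).
    """
    steps = x.shape[0]
    num_filters, width, _ = kernels.shape
    out = np.empty((steps - width + 1, num_filters))
    for t in range(steps - width + 1):
        window = x[t:t + width]                                   # (width, channels)
        out[t] = np.tensordot(kernels, window, axes=([1, 2], [0, 1])) + bias
    return np.maximum(out, 0.0)

def max_pool1d(x, pool_size):
    """Non-overlapping max pooling along the step axis (a ragged tail is dropped)."""
    steps = (x.shape[0] // pool_size) * pool_size
    return x[:steps].reshape(-1, pool_size, x.shape[1]).max(axis=1)

rng = np.random.default_rng(0)
embedded = rng.normal(size=(10, 4))    # 10 tokens, 4-dimensional embeddings
kernels = rng.normal(size=(6, 3, 4))   # 6 filters of width 3
features = conv1d_relu(embedded, kernels, np.zeros(6))
pooled = max_pool1d(features, 2)
print(features.shape, pooled.shape)    # (8, 6) (4, 6)
```

Doubling the number of filters would double the last axis of `features`, while a pool size of 2 halves the step axis regardless of depth.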

Methodology
The depression trend analysis for the COVID-19 period was performed in four stages. First, a dataset was constructed from two existing datasets to train and test the depression prediction model, and another dataset was prepared for depression trend prediction. Next, the texts of the datasets were pre-processed and converted into a form suitable for the proposed LSTM-CNN model. The model was then trained and tested on the labelled dataset. Finally, the trained model was used to find depressive posts and analyze the mental health of the 482 Twitter users before and during COVID-19.

Dataset Preparation
For training and testing, we used the Sentiment140 dataset [54] and the Depressive Tweets Processed dataset [55]. The Sentiment140 dataset contains about 1.6 million tweets labelled 0, 2, or 4, where 0 denotes negative, 2 neutral, and 4 positive texts. The Depressive Tweets Processed dataset contains 2,345 depressive tweets (53,748 words). We randomly selected 8,000 positive tweets (98,053 words, label 4) from the Sentiment140 dataset and relabelled them as 0 (not depressive), and we labelled the tweets of the Depressive Tweets Processed dataset as 1 (depressive). Merging the two gave 10,345 tweets (151,801 words). To check the robustness of the trained model, four further sets of 8,000 tweets were randomly selected from the Sentiment140 dataset in addition to the initial set; all sets gave similar results, as reported in the Result Analysis section. People generally tweet about recent topics or news rather than depressive topics, so depressive tweets are difficult to collect, and few datasets contain many of them. To reflect this general nature of social media posts, non-depressive tweets were kept in the majority, at a ratio of roughly 3.4:1 (8,000 to 2,345 tweets), which keeps the dataset reasonably balanced while preserving the social media trend and helped limit bias during training. Of the 10,345 tweets, 70% (7,241 tweets) were used for training and 30% (3,104 tweets) for testing. Figure 2(a) presents the process of building the combined dataset, and Figure 2(b) shows the percentage of data taken from each dataset. Big data is usually characterized by three V's: volume, variety, and velocity.
The 10,345 tweets used to build the depression prediction model were not treated as big data. Instead, the big data in this work consists of 571,946 tweets posted between October 2019 and May 2020 by 482 users in London, Bradford, Barnsley, and Huddersfield, which were collected to study the effect of COVID-19 on mental health. Trend prediction is a continuous process, and the model can be applied to the data of the following months, so the volume becomes massive; the tweets differ in length and contain emojis, which provides variety; and if monitoring continues for the following months, the velocity property is also met. The gender-wise distribution of the users is provided in Table 1. The COVID-19 pandemic began to spread worldwide from late December 2019, but in England it started in February 2020. Therefore, we tested the tweets of four months during the pandemic (February to May 2020) and compared them with those of the previous four months (October 2019 to January 2020).
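The labelling and 70/30 split described in this subsection can be sketched in plain Python. The function name, the seed, the shuffling step, and rounding the test size up are assumptions; the paper states only the label mapping and that 30% (3,104) of the 10,345 tweets were used for testing. Calling the function with different seeds is one way to draw the four additional robustness sets.

```python
import random

def build_dataset(sentiment140_rows, depressive_texts, n_positive=8000, test_pct=30, seed=42):
    """Assemble the labelled training/testing data (hypothetical helper).

    sentiment140_rows: (polarity, text) pairs, where polarity is 0, 2, or 4.
    depressive_texts: tweets from the Depressive Tweets Processed dataset.
    Returns (train, test) lists of (text, label) pairs; label 1 = depressive.
    """
    rng = random.Random(seed)
    positives = [text for polarity, text in sentiment140_rows if polarity == 4]
    data = [(text, 0) for text in rng.sample(positives, n_positive)]  # label 4 -> 0, not depressive
    data += [(text, 1) for text in depressive_texts]                   # depressive -> 1
    rng.shuffle(data)
    n_test = (len(data) * test_pct + 99) // 100   # test share, rounded up
    return data[n_test:], data[:n_test]

# With the paper's sizes, 10,345 tweets split into 7,241 training and 3,104 test tweets.
print((10_345 * 30 + 99) // 100)  # 3104
```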

Text Pre-processing
The Sentiment140 dataset [54] and the Depressive Tweets Processed dataset were merged and pre-processed before being fed to the LSTM-CNN model. First, all texts were converted to lowercase so that the same word in different cases is not treated as different tokens. Contractions such as can't and he's were then expanded to their full forms. Bad symbols, emojis, punctuation, and extra white space were removed to make the texts simple and tokenizable. Stop words such as he, is, and a were filtered out, as they carry little semantic value. The texts of the merged dataset were then tokenized, with the dictionary capped at 270,000 words. Next, each text was converted to a vector with a maximum sequence length of 140; these vectors are the input to the model. The training and testing data thus formed a tensor of shape (10345, 140), which was then divided into training and testing portions. For example, the tweet "What's the importance of this locked life ? I'm at a VeRy DepresSive , :)( situation!!!" became "what is the importance of this locked life ? i am at a very depressive , :)( situation!!!" after lowercasing and expanding the contractions. Removing the bad symbols and extra spaces gave "what is the importance of this locked life i am at a very depressive situation", and filtering the stop words gave "importance locked life depressive situation". After tokenization, these words were represented as the list of numeric values [[3408, 2506, 71, 12023, 855]], which was padded to a vector of length 140 as input to the model. All vectors were padded to the same size; since no tweet in the dataset was longer than 140 words, 140 was chosen as the vector size. Figure 2(c) shows our proposed pre-processing workflow.
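The pre-processing steps above can be sketched with the Python standard library. The contraction map and stop-word set are small hypothetical subsets chosen so that the worked example reproduces; left-padding and keeping the last tokens when truncating are assumptions (they mirror the defaults of Keras' `pad_sequences`). The token ids produced here come from a one-tweet vocabulary, so they will not match the paper's [[3408, 2506, 71, 12023, 855]].

```python
import re
from collections import Counter

# Hypothetical, abbreviated resources for illustration only; a real pipeline
# would use a complete contraction map and a full English stop-word list.
CONTRACTIONS = {"what's": "what is", "i'm": "i am", "can't": "can not", "he's": "he is"}
STOP_WORDS = {"a", "am", "at", "he", "i", "is", "of", "the", "this", "very", "what"}

def clean_tweet(text):
    text = text.lower()                                              # 1. lowercase
    text = " ".join(CONTRACTIONS.get(w, w) for w in text.split())    # 2. expand contractions
    text = re.sub(r"[^a-z0-9\s]", " ", text)                         # 3. drop symbols, emojis, punctuation
    return " ".join(w for w in text.split() if w not in STOP_WORDS)  # 4. squeeze spaces, drop stop words

def build_vocab(texts, num_words=270_000):
    """Give the most frequent words ids 1..num_words; id 0 is reserved for padding."""
    counts = Counter(w for t in texts for w in t.split())
    return {w: i for i, (w, _) in enumerate(counts.most_common(num_words), start=1)}

def vectorize(text, vocab, maxlen=140):
    ids = [vocab[w] for w in text.split() if w in vocab]  # out-of-vocabulary words are skipped
    ids = ids[-maxlen:]                                    # truncate, keeping the last maxlen ids
    return [0] * (maxlen - len(ids)) + ids                 # left-pad with zeros to a fixed length

tweet = "What's the importance of this locked life ? I'm at a VeRy DepresSive , :)( situation!!!"
cleaned = clean_tweet(tweet)
print(cleaned)                   # importance locked life depressive situation
vocab = build_vocab([cleaned])
vector = vectorize(cleaned, vocab)
print(len(vector), vector[-5:])  # 140 [1, 2, 3, 4, 5]
```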

LSTM-CNN Model construction, Training, and Testing
Our proposed model has an input layer, nine hidden layers, and an output layer. The first hidden layer is the embedding layer, which uses Google's pre-trained word2vec model, whose vocabulary covers three million words [56]. The maximum number of words in the dictionary was set to 270,000 for our model. The embedding layer outputs a tensor whose shape is the maximum sequence length (140) by the embedding dimension (300). This output is the input of the model's single bidirectional LSTM layer, which contains 100 neurons and uses a dropout ratio of 0.25; dropout helps the model avoid overfitting. Figure 2(d) depicts the proposed LSTM-CNN model.
After the bidirectional LSTM layer, two LSTM layers with 100 and 50 neurons were added consecutively with a 25% dropout ratio, converting the tensor to the shape (140, 50). A global max-pooling layer then outputs a tensor of dimension 50; this layer bridges the LSTM layers and the dense layers. Three dense layers were used in this model. The first and second dense layers each contain 50 neurons with the ReLU activation function, and a 50% dropout was applied between them. Another 50% dropout was applied between the second and third dense layers. The third dense layer is the final output layer of the model; it uses the sigmoid activation function to provide a probability score of whether the tweet is depressive. The model was trained with the Adam optimizer at a learning rate of 0.001 for 10 epochs and then tested on the held-out portion of the dataset, where it achieved high accuracy (reported in the Result Analysis section).
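The layer stack described above can be checked with a small shape and parameter tracer in plain Python. It assumes Keras-style conventions that the text does not state explicitly: every recurrent layer returns full sequences (implied by the (140, 50) shape), the LSTM parameter count is 4 × (units × (input + units) + units), the embedding matrix has one row per dictionary word (270,000), and the dropout inside the LSTM layers adds no weights. The function names are hypothetical.

```python
def lstm_params(units, input_dim):
    """Weights of one Keras-style LSTM layer: 4 gates x (input kernel + recurrent kernel + bias)."""
    return 4 * (units * (input_dim + units) + units)

def dense_params(units, input_dim):
    return units * input_dim + units

def trace_model(seq_len=140, vocab_size=270_000, emb_dim=300):
    """List (layer, output shape, parameter count) for the stack described above."""
    layers = [("embedding (word2vec)", (seq_len, emb_dim), vocab_size * emb_dim)]
    feat = emb_dim
    # Recurrent stack; every layer returns the full sequence of 140 steps.
    for name, units, directions in [("bidirectional LSTM", 100, 2), ("LSTM", 100, 1), ("LSTM", 50, 1)]:
        params = directions * lstm_params(units, feat)
        feat = directions * units                      # the two directions are concatenated
        layers.append((f"{name}({units})", (seq_len, feat), params))
    layers.append(("global max pooling", (feat,), 0))  # max over the time axis
    head = [("dense(50, relu)", 50), ("dropout(0.5)", None), ("dense(50, relu)", 50),
            ("dropout(0.5)", None), ("dense(1, sigmoid)", 1)]
    for name, units in head:
        if units is None:                               # dropout adds no weights, keeps the shape
            layers.append((name, (feat,), 0))
        else:
            layers.append((name, (units,), dense_params(units, feat)))
            feat = units
    return layers

for name, shape, params in trace_model():
    print(f"{name:<22} {str(shape):<11} {params:>11,}")
```

Under these assumptions, the nine hidden layers plus the output layer hold 476,551 weights outside the embedding, while the 270,000 × 300 embedding matrix alone holds 81,000,000 values, so whether the word2vec embeddings are frozen dominates the trainable-parameter count.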

Trend prediction of the depressive posts
To find the trend in depression before and during the COVID-19 pandemic, a dataset was constructed from all tweets posted between October 2019 and May 2020 by 182 users from London and 100 users each from Barnsley, Bradford, and Huddersfield. Figure 3 depicts a word cloud of the frequent words used in the depressive tweets. These tweets were pre-processed using the methods described in the Text Pre-processing subsection. The processed tweets were then fed to the model month by month, and the LSTM-CNN model labelled each tweet 1 (depressive) or 0 (not depressive). The depression percentage of each month was calculated as the ratio of depressive tweets to the total number of tweets in that month, and the trend of depression during the COVID-19 pandemic was analyzed from these percentages.
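The monthly depression percentage can be computed from the model's per-tweet outputs as follows. Converting the sigmoid probability to a 0/1 label with a 0.5 threshold is an assumption (the paper states only that tweets were labelled 1 or 0), and the month keys and probabilities below are made-up toy values.

```python
from collections import defaultdict

def to_label(probability, threshold=0.5):
    """Turn the sigmoid output into 1 (depressive) or 0; the 0.5 cut-off is an assumption."""
    return 1 if probability >= threshold else 0

def monthly_depression_pct(predictions):
    """predictions: iterable of (month, label) pairs, months as 'YYYY-MM' strings.

    Returns {month: percentage of that month's tweets labelled depressive}.
    """
    totals, depressive = defaultdict(int), defaultdict(int)
    for month, label in predictions:
        totals[month] += 1
        depressive[month] += label
    return {m: 100.0 * depressive[m] / totals[m] for m in sorted(totals)}

preds = [("2019-10", to_label(p)) for p in (0.9, 0.2, 0.1, 0.4)]
preds += [("2020-03", to_label(p)) for p in (0.8, 0.7, 0.3, 0.6)]
print(monthly_depression_pct(preds))  # {'2019-10': 25.0, '2020-03': 75.0}
```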

Result Analysis
With the growth of micro-blogging sites such as Twitter, the views of the public can be gauged with ease. To predict depression, we needed a model that classifies tweets as depressive or non-depressive with high accuracy. Since tweets are sentences whose words follow a meaningful sequence, we wanted a neural network that can handle such word sequences; therefore, we used the LSTM-CNN neural network. The model predicted depression with 99.42% accuracy and an F1-score of 0.9916. Of the 3,104 test tweets, it correctly predicted 2,401 non-depressive and 685 depressive tweets, and it wrongly predicted seven non-depressive and eleven depressive tweets, giving an error rate of 0.58%. To demonstrate robustness, the model was also trained and tested on four other sets, in which the depressive tweets were the same but the 8,000 non-depressive tweets were different samples from the Sentiment140 dataset. The proposed architecture showed similar performance in every case; its performance on the different sets is given in Table 2. Figure 4(a) depicts the confusion matrix of the LSTM-CNN model. To construct the proposed architecture, different configurations were tried to determine the optimal number of layers and neurons in the Bi-LSTM, LSTM, and dense layers, while the embedding, 1D global max pooling, and dropout layers were kept as in the proposed architecture. These candidate models were found using Keras Tuner. The proposed architecture gave better accuracy than the other models and greater sensitivity, which is crucial in this case. The performance of the best models is described in Table 3. We then compared the model with other DL and ML algorithms. The vanilla LSTM network with one LSTM layer achieved better accuracy (98.87%) than the ML algorithms: logistic regression (LR), linear support vector (LSV), naive Bayes (NB), and SVM.
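The reported test metrics can be recomputed from the confusion-matrix counts above with a short Python helper (a hypothetical function, not the authors' evaluation code). Which of the 18 errors are false positives and which are false negatives does not change accuracy or the per-class F1 scores, so splitting them as 7 and 11 below is arbitrary.

```python
def metrics_from_counts(tp, tn, fp, fn):
    """Accuracy, error rate, and F1 scores from binary confusion-matrix counts."""
    total = tp + tn + fp + fn
    f1_depressive = 2 * tp / (2 * tp + fp + fn)       # positive class = depressive
    f1_non_depressive = 2 * tn / (2 * tn + fp + fn)   # positive class = non-depressive
    return {
        "accuracy": (tp + tn) / total,
        "error_pct": 100.0 * (fp + fn) / total,
        "f1_depressive": f1_depressive,
        "f1_macro": (f1_depressive + f1_non_depressive) / 2,
    }

# Counts reported for the 3,104 test tweets; the 7 FP / 11 FN split of the 18 errors is arbitrary.
m = metrics_from_counts(tp=685, tn=2401, fp=7, fn=11)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

This prints an accuracy of 0.9942, an error of 0.5799%, a depressive-class F1 of 0.9870, and a macro-averaged F1 of 0.9916, so the reported F1-score of 0.9916 is consistent with the macro average over the two classes.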
The SVM classifier performed worst, with only 76.64% accuracy. Figure 4(b) compares the accuracy of the different algorithms on the combined dataset used in this work; the proposed model performed better than the other DL and ML models. A comparison of the proposed architecture with other existing models is provided in Table 4, which shows the advantage of the proposed architecture. The main goal of this study is to find the relation between COVID-19 and people's depression. Therefore, the proposed model was used to classify tweets in order to appraise mental health. The tweets from February 2020 to May 2020 were considered to be affected by the COVID-19 pandemic, and the tweets of the previous four months were considered unaffected. Figure 5 depicts the overall distribution of the classified tweets, the trend found by the proposed model, and the trend of new COVID-19 cases. In London, during the four months before COVID-19, the percentage of depressive posts was highest in Jan-20 (33.52%) and lowest in October-19 (30.92%). During the COVID-19 pandemic, the highest percentage of depressive posts was found in March-20, when the number of new COVID-19 cases rose from 14 in the previous month to 10,137. The lowest percentages of depressive tweets were noticed in February-20 and May-20, when new COVID-19 infections were comparatively lower than in March-20 and April-20.

[Figure 7: Network diagram of keywords found in depressive and non-depressive tweets of the inhabitants of the different cities studied, between October-19 and May-20. As COVID-19 cases increased from February-20, the keywords of depressive tweets changed considerably: "Lockdown", "Covid-19", and "Death" were found in almost every month between February-20 and May-20. Even non-depressive tweets showed the impact of COVID-19, as people tweeted about "hope" and "family time" during the pandemic.]
In London, 106,222 of 268,893 tweets (39.5%) posted during the COVID-19 pandemic were depressive, compared with 55,333 of 166,438 tweets (33.2%) by the same users before COVID-19 spread in the region. This indicates an average elevation of 18.07% in depressive tweets in London during COVID-19. The number of tweets by the same set of users also rose substantially (61.5%). Although London had a large percentage of depressive tweets in the pre-COVID stage, the increase was smaller than in the other cities. London is a big city with many opportunities and great diversity in race and livelihood. It has a very high cost of living, and people get less family time than in other cities. Because of this diversity and high cost of living, London can be expected to have a higher share of depressive tweets than other cities both before and during COVID. During the pandemic, the city council took significant steps to help people by providing necessary medical services and attending to mental health. Claims for unemployment-related benefits increased to 170% during COVID-19 [62]; the London city council managed this well, which helped to contain the depressive tendency. People also got more family time, which provided a break from monotonous routines. These might be the reasons behind the smaller increase in depressive tweets compared with the other cities.
In Barnsley, the highest percentage of depressive posts before the COVID-19 pandemic was in November-19 (24.93%), and the lowest was in Jan-20 (21.84%). During the pandemic, the highest percentage of depressive posts was observed in March-20, when COVID-19 infections rose suddenly from 0 to 356 cases. The percentage of depressive posts remained high in April-20 (24.90%) and May-20 (23.30%). During COVID-19, 6,823 of 26,470 tweets (25.77%) by 100 users in the Barnsley region were detected as depressive, compared with 24.1% (4,672 of 19,386) in the pre-pandemic phase, an average elevation of 6.9% in depressive tweets in the Barnsley region. The total number of tweets also rose by 36.54%, indicating more social media activity than in the pre-pandemic phase.
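The "elevation" figures read as relative changes: the change in the depressive share divided by the pre-pandemic share. That interpretation is an assumption, but applied to the Barnsley counts above, the small Python sketch below gives a relative elevation of about 6.96% (consistent with the reported 6.9%) and a tweet-volume growth of 36.54%. The function names are hypothetical.

```python
def share_pct(depressive, total):
    """Percentage of tweets classified as depressive."""
    return 100.0 * depressive / total

def relative_change_pct(before, after):
    """Change of `after` relative to `before`, in percent."""
    return 100.0 * (after - before) / before

# Barnsley counts reported above: pre-pandemic (Oct-19 to Jan-20) vs pandemic (Feb-20 to May-20).
pre = share_pct(4672, 19386)
during = share_pct(6823, 26470)
print(f"depressive share: {pre:.2f}% -> {during:.2f}%")
print(f"relative elevation: {relative_change_pct(pre, during):.2f}%")
print(f"tweet volume growth: {relative_change_pct(19386, 26470):.2f}%")
```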
Among the tweets of 100 random Twitter users in the Bradford region, the proposed model detected the highest pre-pandemic percentage of depressive posts in November-19 (19.39%) and the lowest in Jan-20 (18.30%). A sudden rise in depressive posts was noticed in February-20 (23.26%), when COVID-19 cases started to rise. However, comparatively few new cases were recorded in this region, which is consistent with the lower impact of COVID-19 on the percentage of depressive posts. In total, 4,534 of 23,880 tweets (18.98%) by the 100 users were classified as depressive during the pre-COVID stage, increasing to 6,979 of 32,918 (21.20%) during the selected four months of the COVID-19 phase. This shows a rise of 11.69% in the average detection of depressive tweets in Bradford. The number of tweets also rose by 37.84% during COVID-19, showing increased social media activity in this region.
In Huddersfield, the highest percentage of depressive posts detected during the COVID-19 pandemic was in May-20 (30.74%), and the lowest was in February-20 (21.83%). As COVID-19 cases increased, the percentage of depressive posts in this region also increased. In the pre-pandemic period, the highest percentage of depressive posts was in October-19 (28.99%). In total, 5,269 of 19,898 tweets (26.48%) in this region were detected as depressive during COVID-19, compared with 24.6% (3,468 of 14,063) in the four pre-pandemic months. This shows an elevation of 7.6% in the average share of depressive tweets and 41.49% more total tweets by the same set of Twitter users in Huddersfield during the COVID-19 pandemic.
Considering the gender of the participants, an increasing trend in depressive tweets from the pre-COVID to the COVID time frame was noticed among both male and female users in all four cities. During COVID, female participants posted a higher share of depressive tweets in London (41.38%) and Huddersfield (30.73%) than male participants (37.49% and 24.43%, respectively), whereas male participants posted a higher share in Barnsley (29.46%) and Bradford (22.09%) than female participants (20.38% and 19.95%, respectively). Detailed data are provided in Table 5, where "Dep." stands for "Depressive". There is also a significant rise in the number of people who shared depressive tweets. Fig. 6 shows a radar diagram of the study population whose depressive tweets exceeded 30% in each month for London, Barnsley, Bradford, and Huddersfield. To prepare these statistics, we first counted, for each month, the users for whom at least 30% of their tweets in that month were depressive. We then averaged these values over the four pre-COVID months and over the four COVID months, and compared the two averages. In Bradford, an average of 36% of Twitter users shared more than 30% depressive posts during the pandemic, compared with 24.5% in the pre-pandemic period. In London, the corresponding average was 56.25% during the pandemic and 53.25% before it. In Barnsley and Huddersfield, it was 36% and 35.75% during the pandemic, compared with 26.75% and 26.5% before it. This corresponds to increases of 46.93%, 5.63%, 34.57%, and 34.9% in the share of sampled Twitter users in Bradford, London, Barnsley, and Huddersfield, respectively, who posted more than 30% depressive posts of their total posts in a month.
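The per-month user statistic described above can be sketched in Python. Two details are assumptions: the denominator is taken to be all sampled users (users who did not tweet in a month count as below the threshold), and the threshold test uses "at least 30%", following the description of how the statistic was prepared. The four-user sample is made-up toy data, and the function names are hypothetical.

```python
def heavy_user_pct(user_months, month, threshold=0.30):
    """Percentage of all sampled users whose depressive share in `month` is at least `threshold`.

    user_months maps user -> {month: (depressive_count, total_count)}. Users who did not
    tweet in that month count as below the threshold (an assumption).
    """
    heavy = 0
    for per_month in user_months.values():
        dep, tot = per_month.get(month, (0, 0))
        if tot > 0 and dep / tot >= threshold:
            heavy += 1
    return 100.0 * heavy / len(user_months)

def phase_average(user_months, months, threshold=0.30):
    """Mean of the monthly percentages over one phase (e.g. four pre-COVID months)."""
    return sum(heavy_user_pct(user_months, m, threshold) for m in months) / len(months)

# Toy sample of four users with one pre-COVID and one COVID month.
users = {
    "u1": {"2020-01": (1, 10), "2020-03": (4, 10)},
    "u2": {"2020-01": (3, 10), "2020-03": (5, 10)},
    "u3": {"2020-01": (0, 5)},
    "u4": {"2020-03": (2, 4)},
}
pre, during = phase_average(users, ["2020-01"]), phase_average(users, ["2020-03"])
print(pre, during, 100.0 * (during - pre) / pre)  # 25.0 75.0 200.0
```

The last value is the relative increase between phases, the same quantity as the per-city increases quoted above (for example, Bradford's 24.5% to 36%).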
With the proposed model, 125,293 of 348,179 tweets (35.98%) posted during the four COVID-19 months by the 482 users residing in London, Barnsley, Bradford, and Huddersfield were detected as depressive. These figures summarize the overall situation across the sampled English cities. During the pre-COVID-19 phase, the percentage of depressive tweets was 30.4% (68,007 of 223,767), which indicates a remarkable elevation of 18.22% in the average share of depressive tweets during the COVID-19 pandemic. The proposed model detected that 43.36% of the sampled Twitter users posted an average of 30% depressive posts out of their total posts during the pandemic, compared with 35.94% in the pre-pandemic phase, an elevation of 20.64%. There was also a remarkable change in the keywords of depressive tweets: people started using "Covid", "Corona Virus", "Stigma", "fighting", "lockdown", and "death" in their depressive tweets during the COVID-19 situation, whereas in the pre-COVID-19 phase most depressive posts concerned their work or living conditions. Fig. 7 shows the network diagram of the keywords most used in the posts of the sample used in this research. This large increase in depressive social media content may be due to the mass deaths, the lockdown, and the fear of infection. The number of posts increased by 55.59%, which indicates that people have been using social media more to express their depression since the lockdown. These are alarming signs, and if no steps are taken regarding mental health, this issue may lead to a serious mental health crisis.

Conclusions
A global pandemic like COVID-19 takes millions of lives, but its impact does not end when the deaths cease. It leaves a substantial mark on the people of the affected area in the form of anxiety and depression, which can eventually result in suicide. Analysis of social media big data reveals this worsened situation during the COVID-19 pandemic, and this study found a similar result: during the pandemic, depressive tweets among the subject population increased profoundly. People use social media platforms to share their thoughts and views, and since most cities were in lockdown, the number of these posts increased considerably. In March, when the pandemic in London began rising to its peak, the subject population posted 96,000 more tweets than in the previous month, and more than 40% of the tweets were depressive. The other three cities also show similar impacts. This is an alarming sign for the mental health of these people. The proposed LSTM-CNN model performed exceptionally well, with an accuracy of 99.42% on unseen data for predicting depression. Because the LSTM uses a memory cell to retain information from previous states, the model can track the preceding words in a tweet. Although this work was based on more than 571,000 tweets, more data needs to be processed to assess the overall state of mental health. A distributed architecture for classifying tweets in real time is recommended. As more data becomes available, this work can be extended to assess the mental health of the world's population during the COVID-19 pandemic.