Evaluation of COVID-19 spread effect on the Commercial Instagram Posts (CIPs) using ANN: a case study on Holy Shrine, Mashhad, Iran

The widespread deployment of social media has helped researchers access an enormous amount of data in various domains, including the pandemic caused by the spread of COVID-19. This study presents a heuristic approach to classify Commercial Instagram Posts (CIPs) and explores how the businesses around the Holy Shrine – a sacred complex in Mashhad, Iran, surrounded by numerous shopping centers – were impacted by the pandemic. Two datasets of Instagram posts (one gathered from March 14th to April 10th, 2020, when the Holy Shrine and nearby shops were closed, and one covering the same period in 2019), two word embedding models, used to vectorize the caption associated with each post, and two neural networks, a multilayer perceptron (MLP) and a convolutional neural network (CNN), were employed to classify CIPs in 2019. Among the scenarios defined for the 2019 CIP classification, the combination of MLP and CBoW achieved the best performance and was then used for the 2020 CIP classification. It was found that the fraction of CIPs among all Instagram posts increased from 5.58% in 2019 to 8.08% in 2020, suggesting that business owners were using Instagram to increase their sales and continue their commercial activities to compensate for the closure of their stores during the pandemic. Moreover, the portion of non-commercial Instagram posts (NCIPs) among all posts decreased from 94.42% in 2019 to 91.92% in 2020, implying that, since the Holy Shrine was closed, Mashhad citizens and tourists could not visit it and take photos to post on their Instagram accounts.

English or Spanish, were discarded from the dataset in the preprocessing stage of the project, causing valuable information to be lost. In future work, the images and videos of each post can be scraped as well and classified in the same way using a CNN. The results from the caption and hashtag evaluation can then be combined with those from the photo and video assessment to provide a more accurate classification. Furthermore, the authors adopted two separate neural networks as means of classification (CNN and MLP), which can be combined into a hybrid neural network model to see whether a combined topology is better capable of classifying the text, photos, and videos of Instagram posts.

agencies, and media groups) and individuals (e.g., consumers, athletes, and journalists).' However, simply put, social media refers to reciprocal social interaction and communication among users to freely create and share knowledge in web-based applications and virtual networks (Rice et al. 2016; Ahmed et al. 2019).
Depending on the type of social media, various activities take place on each platform (e.g., live communication, watching online videos and playing online games, commenting on shared content, sending and receiving online messages, creating and sharing information and knowledge), leading to easily recognizable accounts, publicly accessible content, and, more importantly, vast quantities of user-generated data, which provide valuable information on users' interests and preferences on a variety of subjects (Skowron et al. 2016, Ahmed et al. 2019). Besides, social media facilitate sending and receiving feedback by creating two-way communication based on real-time data between the service provider and customers, which can significantly help decision-making and planning procedures (Schweitzer 2014, Giancristofaro and Panangadan 2016). Therefore, once they emerged, social media went far beyond the scope of personal use. In fact, many government organizations and private sectors have employed social networks for large-scale deployment and planning in different topics, both academically and commercially, including psychology, transportation, sociology, education, politics, tourism, marketing, and so forth. As a result, it is not surprising that researchers demonstrate a deep enthusiasm for studying different features of social media and their impacts on every aspect of people's lives (Hawi and Samaha 2017).
Numerous studies have been conducted on different social media platforms. For instance, (Blank and Lutz 2017) employed a representative survey in Great Britain to study the characteristics of the users of six major social media platforms (Facebook, Twitter, Instagram, LinkedIn, Google+, and Pinterest). They investigated twelve demographic measures and several related metrics (e.g., self-efficacy, skills, privacy concerns, and Internet accessing device) using logistic regression. From a psychological perspective, the relationship between addiction, self-esteem, and satisfaction with life in light of the massive global growth of social media users was investigated by (Hawi and Samaha 2017) through the Social Media Addiction Questionnaire (SMAQ), which collects demographic information and responses in terms of Rosenberg's Self-Esteem Scale and the Satisfaction with Life Scale. Concerning politics, (Bail et al. 2018) surveyed Republicans and Democrats who are active on Twitter (i.e., checking their Twitter feed at least three times a week) to explore the consequences of being exposed to opposing political views and found a slight increase in liberal attitudes among Democrats, while Republicans expressed significantly more conservative attitudes. Another study on the political aspect of social media was carried out by (Tambouris et al. 2011), who presented a literature review on the impacts of social media on elections in the Netherlands. (Tostes et al. 2014) explored whether data extracted from social networks, specifically Instagram and Foursquare, can be representative of traffic conditions. (Alves et al. 2016) conducted a systematic literature review focusing on social media from the marketing viewpoint, where 44 papers were collected and analyzed to investigate the relationship between companies, potential consumers, and social media marketing.
Regarding education, (Veletsianos 2012) studied scholars' practices on Twitter by qualitatively analyzing tweets from 45 scholars and indicated that they mostly share information and resources related to their professions, classrooms, or students and participate in online scientific communities where they can contribute to or engage in academic activities.
Although all social media platforms provide useful information thanks to users' extensive activities, Instagram is currently known as the second most notable online social media platform and the largest application for photo and video sharing (Fuciu 2019). Accordingly, more than any other social network, Instagram has recently become the focus of research attention in different fields of study. As one of the first studies, (Yang 2016) studied the relationship between loneliness and Instagram activities among undergraduate students in U.S. mid-southern universities with regard to Social Comparison Orientation (SCO). (Che et al. 2017) explored the motivation behind people's intention to purchase from Instagram stores, specifically studying three factors contributing to consumers' trust-building. The presence of UK universities on Instagram was explored by (Stuart et al. 2017) using webometric data extraction with the Instagram API and content analysis. The psychological aspect of Instagram's influence on users was studied by (Andalibi et al. 2017), especially self-disclosures under depression-tagged posts and the associated responses. Another study explored the relationship between the photo features that Instagram users apply and users' personality characteristics and devised an accurate personality regressor. (Zhan et al. 2018) performed a sentiment analysis on two million captions of Instagram posts related to reading and libraries by applying machine learning algorithms (i.e., Random Forests and Support Vector Machine) and classified the captions based on different combinations of polarities and emotions. (Bashari and Fazl-Ersi 2020) presented a novel approach for identifying influential Instagram posts by using the Support Vector Machine algorithm based on Natural Language Processing in combination with both the caption and the set of hashtags of each post.
It should be noted that most of the past research was conducted using statistical analysis, spatio-temporal analysis, and similar models and algorithms. However, among all the available methods for problem-solving, neural networks have been shown to surpass traditional statistical methods in a variety of applications and research areas (Paliwal and Kumar 2009). Therefore, a combination of Instagram post data (i.e., captions, hashtags, comments, photos, and videos) was commonly employed as the input features of neural networks in past research. In this regard, having considered the users' environment on Instagram (i.e., the content users are most likely to post), (Zhang et al. 2019) presented a dual-attention model based on neural networks to integrate image-caption pairs with the user environment on Instagram for predicting the post popularity of a particular user. The depiction of non-suicidal self-injury (NSSI) on Instagram was explored by (Scherr et al. 2019) by developing an image-recognition Convolutional Neural Network (CNN) that detects the presence or absence of NSSI in digital pictures. In the fashion domain, (Hammar et al. 2020) devised an innovative method to classify captions, comments, and tags of Instagram posts with weak supervision using NLTK (Natural Language Toolkit) (Loper and Bird 2002), CNN, and word embedding algorithms such as Word2Vec (Mikolov et al. 2013), GloVe (Pennington et al. 2014), and FastText. Detection of drug abuse and dealing in Instagram multimodal data (captions, hashtags, comments, and photos) was analyzed by (Yang and Luo 2017) by first exploring drug-related posts and then identifying drug dealers' accounts based on a Multi-Layer Perceptron (MLP) neural network.
From the psychological perspective, (Jabłońska and Zajdel 2020) studied the relationship between several factors such as life satisfaction, anxiety, depression, and the extent to which women use Instagram and employed artificial neural networks for the classification of results. (De et al. 2017) employed a deep neural network to predict the popularity of future posts of a famous Indian lifestyle magazine based on historical data from Instagram posts (e.g., the creation time and associated tags). (Gomez et al. 2019) proposed a heuristic approach based on CNNs and the Word2Vec algorithm to analyze image-caption pairs extracted from Instagram posts related to Barcelona in order to learn the connection between Instagram multimodal content and neighborhoods from a touristic perspective.
The global outbreak of Coronavirus in 2020, in addition to the unprecedented mortality rate, has caused several problems for people, the most important of which is the closure of businesses and commercial activities. This research aims at commercial content identification of Instagram posts to investigate the impacts of the COVID-19 outbreak on the stores and shopping centers around the Holy Shrine complex in Mashhad, Iran. In other words, this study tried to explore the changes that happened to business activities after the shutdown of stores and compare the CIPs in 2019 and 2020 to see how business owners responded to the unpredictable situation caused by the COVID-19 pandemic on Instagram.
This study is organized as follows. First, the overall process of CIP classification, the structure of the proposed neural networks (CNN and MLP) and the word embedding model (Word2Vec), the caption labeling criterion, and the analysis scenarios are described in the methodology section. Then, Mashhad and the Holy Shrine are introduced, followed by a brief look at the data gathering and web scraping technique and the preprocessing method using Natural Language Processing (NLP), all of which are discussed in the case study and data preprocessing section. Finally, the classification results are presented and examined against several metrics for measuring the accuracy of the classification scenarios. To the best of our knowledge, this is the first work concentrated on using captions and hashtags to classify Instagram posts and to evaluate the impacts of the coronavirus outbreak on commercial activities.

2-Methodology
This section presents an overview of word representations and neural networks. After a caption labeling criterion is defined, several scenarios are provided for using neural networks to classify CIPs. All of these are shown in figure 1, where the consecutive steps for CIP classification regarding the coronavirus outbreak (i.e., data collection, preprocessing, filtering, concatenating, and classification) are introduced, accompanied by a brief description in the left column.

-Word Representations
NLP and text mining are essential branches of Artificial Intelligence, which have several applications, including machine translation, named entity recognition, sentiment analysis, part-of-speech tagging, idiomaticity analysis, syntax analysis, and semantic analysis. They can also be used to solve classification, regression, and clustering problems (Li and Yang 2018). In many NLP applications, as a text corpus cannot be directly fed as input data, it needs to be converted to numeric values (i.e., vectors) so that it can be used as the input features of traditional machine learning algorithms (Roshanfekr et al. 2017), e.g., Support Vector Machine, Logistic Regression, and Naive Bayes, or deep learning models (Zhang et al. 2018). Fixed-length numeric vectors of converted text features are called word representations, which can be created in several ways, one of which is representing text documents as a Bag-of-Words (BoW). Regardless of word order, each document is considered a set of words, which is then transformed into a numeric vector whose length equals the size of the document vocabulary. Each vector element is calculated based on word occurrence, word frequency, or TF-IDF (Term Frequency - Inverse Document Frequency) (Zhang et al. 2018). Despite its popularity, efficiency, and simplicity, BoW has several drawbacks. In addition to suffering from sparsity, which results in the use of more memory and computational resources, BoW does not consider word order and sentence structure, which prevents it from capturing the semantic relationships among words. Some methods, such as N-grams, have been proposed to tackle the BoW flaws, but they were not powerful enough to fully overcome the shortcomings (Zhang et al. 2018, Almeida and Xexéo 2019).
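As an illustration of the BoW representation described above, the following minimal sketch builds term-frequency vectors for a toy corpus. The function names and the example tokens are hypothetical and are not taken from the study's data or code.

```python
from collections import Counter


def build_vocabulary(documents):
    """Map every distinct token in the corpus to a fixed column index."""
    vocab = sorted({token for doc in documents for token in doc})
    return {token: idx for idx, token in enumerate(vocab)}


def bow_vector(document, vocab):
    """Return a term-frequency vector whose length equals the vocabulary size."""
    counts = Counter(document)
    # Iterating over the dict follows insertion order, i.e. the column indices.
    return [counts.get(token, 0) for token in vocab]


# Toy, already-tokenized captions (hypothetical examples).
docs = [
    ["shop", "sale", "shop"],
    ["shrine", "visit"],
]
vocab = build_vocabulary(docs)
vectors = [bow_vector(doc, vocab) for doc in docs]
```

Each vector has one entry per vocabulary word, so most entries are zero for realistic vocabularies, and the word order inside each caption is lost. These are the two drawbacks noted in the text.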
To address the BoW problems, researchers have introduced another type of word representation, word embedding, based on the distributional hypothesis suggested by (Harris 1954), where similar words have closer representations (Li and Yang 2018). Instead of the high-dimensional sparse vectors of BoW, word embeddings produce low-dimensional dense vectors, which allows them to capture semantic relationships among words. They have recently been shown to perform remarkably well in NLP tasks in combination with neural networks (Zhang et al. 2018). Probably the most famous word embedding algorithm is Word2Vec, developed by (Mikolov et al. 2013), which is based on two log-linear models, namely Continuous Bag of Words (CBoW) and Skip-gram (Sg). The key difference between CBoW and Sg is that while the former tries to predict the target word based on the surrounding words (i.e., several words before and after), the latter works in reverse: the center word is used to predict the surrounding words (Camacho-Collados and Pilehvar 2018, Li and Yang 2018, Zhang et al. 2018, Almeida and Xexéo 2019). In this work, CBoW and Sg were chosen for training word representations of the captions and hashtags associated with Instagram posts, which is described in detail in the following sections.
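The difference between the two Word2Vec objectives can be made concrete by looking at the training examples each one derives from a sentence. The sketch below only generates those examples and does not train an embedding. The helper names, the window size, and the sample caption are illustrative assumptions.

```python
def context_window(tokens, center, window):
    """Tokens within `window` positions of index `center`, excluding the center."""
    left = tokens[max(0, center - window):center]
    right = tokens[center + 1:center + 1 + window]
    return left + right


def cbow_examples(tokens, window=2):
    """CBoW: the surrounding words are the input, the center word is the target."""
    return [(context_window(tokens, i, window), tokens[i])
            for i in range(len(tokens))]


def skipgram_examples(tokens, window=2):
    """Skip-gram: the center word is the input, each surrounding word is a target."""
    return [(tokens[i], ctx)
            for i in range(len(tokens))
            for ctx in context_window(tokens, i, window)]


# Hypothetical tokenized caption.
caption = ["visit", "holy", "shrine", "today"]
cbow = cbow_examples(caption, window=1)
sg = skipgram_examples(caption, window=1)
```

With a window of one, CBoW yields one example per word (context list, target), whereas Skip-gram yields one example per (center, neighbor) pair.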

-Neural Networks
Researchers have recently been attracted to deep learning, as a robust machine learning approach, due to its significant contributions to solving a wide range of problems such as computer vision, speech recognition, and NLP. Inspired by the activity of the human brain, deep learning is the primary application of Artificial Neural Networks (ANNs), which learn specific tasks by adjusting weights and biases inside multiple layers, each of which contains several processing units called neurons (Zhang et al. 2018). Lately, several architectures and topologies of neural networks have been developed, each of which has outstanding performance in specific applications. In this paper, an MLP and a CNN were employed to classify CIPs; both are briefly described below.
MLP is the simplest topology of feedforward neural networks and mostly contains three types of layers. First, an input layer is used to feed input vectors to the network. Second, the hidden layer(s) comprise simple neurons, called perceptrons, which do not produce visible outputs; instead, they apply weights, biases, and an activation function and pass the information forward to other perceptrons and hidden layers. Finally, the output layer makes the final decisions or predictions based on the input features and the weight adjustments of the perceptrons inside the hidden layers (Ramchoun et al. 2016, Rana et al. 2018, Zhang et al. 2018).

CNN is a particular neural network that is most useful in computer vision applications and has a different structure compared to other types of neural networks, which was initially inspired by the visual cortex of the human brain. At first, a kernel matrix strides over the image in the convolution layer(s), extracting several features from the input image. Afterward, there are one or more pooling layers, which are mainly responsible for decreasing the dimensions of the features extracted by the convolution layer(s). The output of the pooling layer(s) is then flattened and fed into a fully connected layer, which determines the class to which the image belongs in a similar feedforward process (Zhang and Wallace 2015, Zhang et al. 2018, Khan et al. 2020).
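The feedforward computations described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the study's implementation: the weights are hand-picked hypothetical values rather than trained ones, and the convolution is shown in its one-dimensional form, as is usual for text sequences.

```python
import math


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(w . x + b) for each neuron."""
    return [activation(sum(w * x for w, x in zip(neuron_w, inputs)) + b)
            for neuron_w, b in zip(weights, biases)]


def mlp_forward(x, hidden_w, hidden_b, out_w, out_b):
    """Input -> one hidden ReLU layer -> single sigmoid output."""
    hidden = dense(x, hidden_w, hidden_b, relu)
    return dense(hidden, out_w, out_b, sigmoid)[0]


def conv1d(sequence, kernel):
    """Slide the kernel over the sequence with stride 1 and no padding."""
    k = len(kernel)
    return [sum(kernel[j] * sequence[i + j] for j in range(k))
            for i in range(len(sequence) - k + 1)]


def max_pool1d(features, size):
    """Keep the maximum of each non-overlapping block of `size` features."""
    return [max(features[i:i + size])
            for i in range(0, len(features) - size + 1, size)]


# Hypothetical weights, chosen only so the arithmetic is easy to follow.
p = mlp_forward([1.0, 2.0],
                hidden_w=[[0.5, -1.0], [1.0, 1.0]], hidden_b=[0.0, -1.0],
                out_w=[[1.0, -1.0]], out_b=[2.0])
pooled = max_pool1d(conv1d([1, 2, 3, 4, 5], kernel=[1, 1]), size=2)
```

In the MLP, the first hidden neuron is switched off by ReLU and the second outputs 2, so the output neuron receives 0 and the sigmoid returns 0.5. In the convolutional path, the kernel produces four features that pooling reduces to two.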

-Caption Labeling
Solving every classification problem requires input features to have at least one label, based on which the neural network can classify. Therefore, a criterion must be defined to distinguish CIPs from the Non-Commercial Instagram posts (NCIPs) using a numeric label. Since valuable information about each post is provided in its associated caption and hashtags, they can be used for filtering different types of posts. As a result, the best measure was the integration of caption and hashtags to identify commercial content. Thus, a list of 20 words about business and commercial activities in Persian has been gathered so that each caption or hashtag containing at least one of the purchase-related words would be referred to as a commercial post, and its Commercial Index (CI) would be equal to one, whereas the rest of the posts (NCIPs) would have a zero CI.
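The labeling rule can be sketched as follows. The study's list of 20 Persian keywords is not reproduced here, so the English keywords below are hypothetical placeholders, and exact token matching (rather than substring matching) is an assumption.

```python
def commercial_index(caption_tokens, hashtags, keywords):
    """Return 1 if any purchase-related keyword occurs in the caption or hashtags, else 0."""
    tokens = set(caption_tokens) | set(hashtags)
    return int(any(word in tokens for word in keywords))


# Hypothetical stand-ins for the study's purchase-related Persian words.
KEYWORDS = {"sale", "price", "order", "discount"}

ci_shop = commercial_index(["new", "carpets", "in", "stock"], ["sale", "mashhad"], KEYWORDS)
ci_visit = commercial_index(["evening", "at", "the", "shrine"], ["mashhad"], KEYWORDS)
```

Here the first post receives a CI of one because one of its hashtags is a keyword, while the second receives a CI of zero.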

-Analysis scenarios
Now that the 2019 Instagram posts dataset has been labeled, two word embedding algorithms (CBoW and Sg) were trained on the captions and hashtags of the 2019 Instagram posts dataset, and the resulting embeddings were then used to train two neural networks (CNN and MLP). As a result, four scenarios were defined for the classification of CIPs, which are shown in table 4 and figure 6. The scenarios are defined based on the captions and hashtags of the 2019 Instagram posts dataset and the type of word embedding algorithm and neural network used for the classification problem. Several metrics are then determined to evaluate the performance of each scenario. The best-performing scenario is then chosen and used to classify the 2020 Instagram posts dataset.

-Case Study and Data Preprocessing
As mentioned earlier, this work aims to classify CIPs considering the spread of COVID-19 in Iran. Among all the cities, Mashhad was chosen as it is the second most populous city, which helped provide a large dataset of Instagram posts. In addition to its large population and vast area, the importance of Mashhad derives specifically from Imam Reza, who is buried in a complex called the Holy Shrine. Figure 6 shows the approximate location of Mashhad in the north-east of Iran, as well as where the Holy Shrine is located in Mashhad.

Figure 6 Location of Holy Shrine and Mashhad in Iran
Every year, millions of Muslims, both domestic and international, travel to visit the shrine complex, given the importance of pilgrimage in the Shia tradition. Besides, many stores, malls, and shopping centers around the shrine are active 24/7, and tourists regularly purchase commodities from the surrounding commercial units, especially at the time of the Iranian New Year, which starts on March 21st each year and lasts for about two weeks.
However, the spread of COVID-19 imposed a mandatory quarantine on the whole city beginning February 24th, 2020, and many changes followed in response to the coronavirus outbreak; for example, the Holy Shrine and the surrounding stores and businesses were ordered to close by the local authorities and government, and no traveler was allowed to enter Mashhad. As the Holy Shrine was temporarily closed on March 14th, 2020, Instagram posts with the Holy Shrine location were extracted between March 14th, 2020, and April 10th, 2020, and for the same period in 2019, for two main reasons. First, this provided the opportunity to examine how the Holy Shrine's shutdown was reflected on Instagram; that is, how Instagram posts related to the Holy Shrine differ from 2019 to 2020, considering the emergence of COVID-19. Second, the coronavirus impacts on business activities around the Holy Shrine can be assessed by analyzing the caption and set of hashtags associated with each Instagram post. Therefore, the Holy Shrine is a well-suited location on Instagram for this research, as many photos are posted with its location, which helps explore the impacts of the coronavirus on the Holy Shrine and the neighboring businesses.
Due to the Holy Shrine's popularity, multiple locations on Instagram have been dedicated to storing posts with the Holy Shrine location in different languages, some of which are outliers: the name of the Holy Shrine is used only as the location of the posts, while the image and caption are not related to the Holy Shrine at all. As a result, in the first step, 11 locations on Instagram, listed in table 1, with images and captions relevant to the Holy Shrine, were selected for data crawling.

The second step is to scrape the information of posts from the location pages on Instagram. The Python programming language was used for web scraping, where an API (Application Programming Interface) was developed from scratch, specially designed for extracting datasets of posts from an Instagram location page. JSON (JavaScript Object Notation), a lightweight semi-structured data format, was employed for this task due to its simplicity for humans to read and write and its clarity for machines to parse and generate (Peng et al. 2011). Therefore, the Instagram posts datasets were prepared by scraping data from the 11 locations listed in table 1 in the period between March 14th and April 10th of 2019 and 2020.

Before employing neural networks to classify CIPs, the datasets needed to be cleaned, since they contained much useless information. Therefore, the following items should be taken into account.
1. From the key-value pairs in the JSON files, only the ones that can be used to train neural networks (captions and hashtags) were needed, so useless data (e.g., the number of likes and comments, dimensions of photos, and Instagram post URLs) should be removed.
2. As most of the captions and hashtags were written in Persian, the few posts associated with non-Persian captions and hashtags should be manually discarded from the datasets to provide homogeneous, unilingual datasets of Instagram posts.
3. Most captions were not written continuously in a single line, meaning that there were many blank spaces between words and characters that needed to be removed, as well as emojis.
4. Punctuation marks and stopwords (i.e., words repeating several times in a text corpus that serve no practical meaning, like prepositions and transitional words) should be removed as well (Srividhya and Anitha 2010).
5. Each post's associated caption and hashtags were then separated and split into a list of constituent words.
All the text preprocessing steps were carried out using a Python function that was developed from scratch. Finally, Instagram posts not associated with any caption were removed from the datasets entirely. Cleaning captions and removing null values reduced the 2019 Instagram dataset to 2834 posts and the 2020 dataset to 6919 posts.
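The cleaning steps listed above can be approximated with a short function such as the one below. This is a minimal sketch and not the authors' function: the stopword set and the sample caption are hypothetical, and English text is used only for readability.

```python
import re


def preprocess(caption, stopwords):
    """Clean one caption and return its list of tokens."""
    # Replace everything that is neither a word character nor whitespace
    # (punctuation, emojis, the '#' of hashtags) with a space.
    text = re.sub(r"[^\w\s]", " ", caption)
    # split() with no argument also collapses repeated spaces and blank lines.
    tokens = text.split()
    return [t for t in tokens if t.lower() not in stopwords]


STOPWORDS = {"the", "to", "a"}  # hypothetical stopword list

tokens = preprocess("Visit   the shrine!! \U0001F60A\n\n#sale to-day", STOPWORDS)
empty = preprocess("\U0001F60A !!", STOPWORDS)  # nothing survives cleaning
```

A post whose token list comes back empty, like the second example, would then be dropped from the dataset. Note that this simple pattern also splits Persian words written with a zero-width non-joiner, so a production version would need to handle that character explicitly.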

-Results
5-fold cross-validation was used to evaluate each deep learning model's performance. Moreover, hyperparameter optimization was performed on each deep learning model using grid search to choose the best hyperparameters for MLP and CNN deep learning models. The structure of the deep learning models used in this study is summarized as follows.
• The input of both the MLP and the CNN is an embedding layer (i.e., the trained word representation model in a 100-dimensional vector space) with a size equal to the maximum vocabulary size of each dataset.
• The MLP consists of one hidden layer with 90 neurons, each with the ReLU (Rectified Linear Unit) activation function (Glorot et al. 2011), whereas the CNN consists of a one-dimensional convolution layer with 64 filters and a kernel of size 16, followed by a one-dimensional max-pooling layer of size 4.
• A dropout layer with a rate of 0.3 was added to the hidden layer of the MLP and to the convolution and pooling layers of the CNN to avoid overfitting.
• The output layer of the MLP and the fully-connected layer of the CNN consisted of one node with the sigmoid activation function to classify the two classes of Instagram posts, CIPs and NCIPs.
• Two callbacks were used in each model: an early stopping callback stopped the training process if the validation accuracy had not increased for three consecutive epochs, and a model checkpoint saved the model with the best performance during the training process.
• Both models used Adam (derived from adaptive moment estimation) (Kingma and Ba 2016) as the optimizer with a learning rate of 0.1 and binary cross-entropy as the loss function.
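The interplay of the two callbacks can be illustrated independently of any deep learning library. The sketch below replays a sequence of per-epoch validation accuracies and applies the same rule: remember the best epoch (the checkpoint) and stop once three consecutive epochs bring no improvement. The function name and the accuracy values are hypothetical.

```python
def early_stopping_replay(val_accuracies, patience=3):
    """Return (best_epoch, best_accuracy, last_epoch_run) for a list of
    per-epoch validation accuracies, with epochs numbered from 1."""
    best_acc, best_epoch, wait = float("-inf"), None, 0
    for epoch, acc in enumerate(val_accuracies, start=1):
        if acc > best_acc:
            # Improvement: this is where a checkpoint would be saved.
            best_acc, best_epoch, wait = acc, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return best_epoch, best_acc, epoch  # stop training early
    return best_epoch, best_acc, len(val_accuracies)


# Hypothetical validation accuracies for six epochs.
history = [0.80, 0.85, 0.84, 0.85, 0.83, 0.90]
result = early_stopping_replay(history, patience=3)
```

In this example training stops after epoch 5, because epochs 3 to 5 do not exceed the 0.85 reached at epoch 2, and the checkpoint from epoch 2 is the model that is kept. The later value of 0.90 is never reached, which shows the trade-off a small patience implies.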
Since 5-fold cross-validation was used for model evaluation, each scenario was run five times, during which one fold was held out for validation, and the rest were used to train each deep learning model with a batch size of 50 and 60 epochs.
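A minimal sketch of how the five train/validation splits can be generated is given below. It uses contiguous, unshuffled folds for clarity; whether the study shuffled or stratified the folds is not stated, so that choice is an assumption.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, validation_indices) pairs so that every sample
    is held out for validation exactly once."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, val
        start += size


folds = list(k_fold_indices(12, k=5))
```

Each of the five runs trains on roughly four fifths of the data and validates on the remaining fold, and the reported metrics are then averaged over the runs.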

-Implementation
After training, the neural networks have to be evaluated against several measures to quantify the extent to which they can distinguish CIPs from NCIPs. The first and probably most renowned criterion for evaluation is accuracy, which, in binary classification problems, is the total number of correct predictions divided by the total number of cases in the dataset (Ferri et al. 2009, Sokolova and Lapalme 2009, Branco et al. 2015). Accuracy is a suitable metric for predictive modeling assessment as long as the instances in the dataset are equally distributed, meaning that each class has an equal or nearly equal number of instances. However, in this case, the number of instances (i.e., Instagram posts) in each class of each dataset (2019 and 2020) differs significantly. That is, according to table 3, the number of Instagram posts in class 0 is 17 and 24 times greater than that in class 1 in the datasets of 2019 and 2020, respectively. As a result, the unequal distribution of instances has led to an imbalanced classification problem. Comparing the majority class (the class with more instances) and the minority class (the class with fewer instances), the latter is usually of the most interest. However, since most of the cases belong to the majority class, the algorithm cannot correctly learn the characteristics of the minority class and differentiate its cases from those of the majority class, which makes the classification of the minority class more challenging (Branco et al. 2015, Krawczyk 2016). The main problem is that using conventional metrics like accuracy for imbalanced dataset assessment can cause misinterpretations, as they behave indiscriminately toward skewed datasets (Haibo He and Garcia 2009, SUN et al. 2009, Branco et al. 2015). Therefore, other metrics have to be taken into consideration when dealing with imbalanced datasets. (Ferri et al. 2009) have gathered several useful metrics associated with sensitivity analysis, which are helpful for imbalanced problems, among which precision, recall, and F1-score were the most useful for evaluating the performance of deep learning models on imbalanced classification problems.

However, another solution to the imbalanced classification challenge is to artificially resample the training dataset, that is, to balance the class distribution and make the classes equal, or roughly equal, in size by undersampling the majority class (i.e., removing examples of the majority class from the training dataset), oversampling the minority class (i.e., duplicating examples of the minority class in the training dataset), or both. In this case, not only would accuracy give an unbiased picture of model performance, but the other metrics mentioned above can also be used simultaneously to provide a more precise overview of model efficacy. Several algorithms have been developed for undersampling and oversampling. To name a few, random undersampling, near-miss (Zhang and Mani 2003), the condensed nearest neighbors rule (Hart 1968), Tomek links (Tomek 1976), the edited nearest neighbors rule (Wilson 1972), one-sided selection (Kubat and Matwin 1997), and the neighborhood cleaning rule (Laurikkala 2001) were suggested for undersampling. On the other hand, random oversampling, SMOTE (Synthetic Minority Oversampling Technique) (Chawla et al. 2002), Borderline SMOTE (Han et al. 2005), Borderline SMOTE SVM (Nguyen et al. 2009), and Adaptive Synthetic Sampling (ADASYN) (He et al. 2008) can be utilized for minority class oversampling.

Multiple combinations of the aforementioned undersampling and oversampling techniques can be applied to balance the class distributions of a dataset. In this project, SMOTE and random undersampling were used together, as suggested by (Chawla et al. 2002): SMOTE was first applied to raise the minority class of the 2019 Instagram posts to 50% of the size of the majority class. Then, random undersampling was employed to reduce the majority class until the minority class was 90% of its size, resulting in 1338 Instagram posts with a CI equal to one and 1486 Instagram posts with a zero CI.
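The reported class sizes can be checked with a short calculation. The sketch below only reproduces the arithmetic of the two resampling steps and does not generate synthetic samples. It assumes that both percentages are minority-to-majority ratios, and the starting counts (158 CIPs and 2676 NCIPs) are inferred from the reported 5.58% share of 2834 posts rather than quoted from the study.

```python
def resampled_counts(n_minority, n_majority, over_ratio=0.5, under_ratio=0.9):
    """Class sizes after oversampling the minority class to `over_ratio` of the
    majority, then undersampling the majority until minority/majority equals
    `under_ratio` (both ratios are interpreted as minority divided by majority)."""
    minority_after = max(n_minority, int(over_ratio * n_majority))
    majority_after = min(n_majority, int(minority_after / under_ratio))
    return minority_after, majority_after


# Inferred 2019 class sizes: 5.58% of 2834 posts is about 158 CIPs.
n_cip, n_ncip = 158, 2834 - 158
cip_after, ncip_after = resampled_counts(n_cip, n_ncip)
```

Under this reading the calculation yields 1338 CIPs and 1486 NCIPs, which matches the figures reported in the text.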
Now that the class distributions of the dataset are balanced, accuracy, precision, recall, and F1-score were adopted for evaluating the performance of the neural networks in the binary classification problem. Precision (also called positive predictive value) is defined as the ratio of correctly predicted positive instances to all instances predicted as positive and indicates the extent to which positive predictions actually belong to the positive class. Recall (also known as sensitivity), on the other hand, is the ratio of correctly predicted positive instances to the total number of positive cases in the dataset and denotes the classifier's effectiveness in identifying the positive class. F1-score combines precision and recall into a single value (Sokolova and Lapalme 2009). Calculating precision, recall, and F1-score requires a clear understanding of the confusion matrix, which summarizes the results of the classification problem and, in the case of binary classification, contains four cells: True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN). TP and TN refer to the number of positive and negative instances whose classes were correctly predicted by the classifier, whereas FP denotes negative instances wrongly predicted as positive and FN denotes positive instances wrongly predicted as negative (Branco et al. 2015). From the confusion matrix, precision, recall, and F1-score are defined as follows (Ferri et al. 2009, Sokolova and Lapalme 2009, SUN et al. 2009, Branco et al. 2015, Krawczyk 2016):
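In terms of the confusion-matrix counts, the standard definitions of these metrics, together with accuracy as described earlier, can be written as below. The F1-score is given in its usual form as the harmonic mean of precision and recall.

```latex
\begin{align}
\text{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN} \\
\text{Precision} &= \frac{TP}{TP + FP} \\
\text{Recall}    &= \frac{TP}{TP + FN} \\
F_1 &= 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\end{align}
```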

-Evaluation
In this section, the best scenario for the classification of CIPs in 2019 is selected and then used to classify the 2020 CIPs, in order to see how the spread of COVID-19 affected the distribution of CIPs and commercial activities around the Holy Shrine in 2020.
The results of the evaluation metrics are summarized in Table 6. Averaging the evaluation metrics of the MLPs (scenarios 1 and 2) and those of the CNNs (scenarios 3 and 4), one can conclude that MLPs are more capable of CIP classification in 2019. Of scenarios 1 and 2, the former demonstrated higher values for all four evaluation metrics. Therefore, the weights of the first scenario, the one with MLP and CBoW, are used for CIP classification in 2020 to investigate how the COVID-19 outbreak impacted commercial activities.
⁹ Synthetic Minority Oversampling Technique
¹⁰ Support Vector Machine
Table 7 summarizes the overall statistics of Instagram posts for both the 2019 and 2020 datasets, including the total number of Instagram posts classified by the chosen scenario (before and after preprocessing), the number of users whose Instagram posts were scraped, the number of CIPs and NCIPs, and their ratios to the total number of posts in each year after preprocessing. It should be noted that the numbers of CIPs and NCIPs in 2020, marked with an asterisk, are the results of the 2020 CIP classification by scenario 1 (MLP and CBoW). The total number of Instagram posts with a caption or hashtags increased significantly from 2834 in 2019 to 6919 in 2020, which is consistent with the growth in the number of unique Instagram users posting photos with the Holy Shrine location during the period of this study. This can be explained by the fact that, since people were encouraged to stay at home except for essential needs or emergencies, they mostly preferred social media to other, less engaging activities. They used social media, specifically Instagram, for different purposes, such as keeping up their commercial activities or expressing their emotions regarding the Holy Shrine shutdown.
On the one hand, after the outbreak of the coronavirus and the temporary closure of the Holy Shrine and surrounding businesses, shopkeepers were expected to use social media (mainly Instagram, the most popular platform in Iran) to continue their activities and sales during the pandemic. Therefore, one would expect an increase in the share of CIPs in total Instagram posts with the Holy Shrine location from 2019 to 2020. As expected, the number of CIPs grew from 158 in 2019 to 559 in 2020, and the fraction of total posts with commercial content rose from 5.58% in 2019 to 8.08% in 2020. In other words, business owners were inclined to advertise and sell their products on Instagram.
On the other hand, since the Holy Shrine was closed, the citizens of Mashhad could not visit the Holy Shrine for pilgrimage, so they could not take photos of the Holy Shrine to post on their Instagram accounts. Therefore, the fraction of NCIPs in total posts decreased from 94.42% in 2019 to 91.92% in 2020. Besides, many tourists could not travel to Mashhad because of strict restrictions, so with fewer trips, one could expect fewer NCIPs with the Holy Shrine location.
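The shares discussed in this section follow directly from the post counts. As a short sketch (the helper name `class_shares` is hypothetical, and the NCIP counts are derived here by subtracting the reported CIP counts from the reported totals), the percentages can be recomputed as:

```python
# Post counts after preprocessing, as reported in the text.
counts = {
    2019: {"total": 2834, "cip": 158},
    2020: {"total": 6919, "cip": 559},
}


def class_shares(total, cip):
    """Return the derived NCIP count and the CIP/NCIP shares in percent."""
    ncip = total - cip  # derived by subtraction, not reported separately here
    return {
        "ncip": ncip,
        "cip_pct": round(100 * cip / total, 2),
        "ncip_pct": round(100 * ncip / total, 2),
    }


shares = {year: class_shares(**c) for year, c in counts.items()}
for year, s in shares.items():
    print(year, s)
```

Under these counts, the absolute number of NCIPs rises between the two years while their share of all posts falls, because the number of CIPs grows faster than the total.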

-Conclusion
Social media platforms have been used extensively worldwide over the past decade, allowing access to free, real-time information. Meanwhile, the coronavirus outbreak has brought substantial changes to the world and attracted researchers' attention to its impact on people's lives. This paper explored the classification of CIPs in the context of the closure of the Holy Shrine and neighboring commercial units, to determine how the spread of COVID-19 influenced the distribution of CIPs among Instagram posts from 2019 to 2020. Two datasets of Instagram posts (2019 and 2020) were prepared. In addition, two word embeddings (CBoW and Sg) and two neural networks (CNN and MLP) were employed to classify CIPs in 2019. The results showed that the 2019 dataset was best modeled and classified with MLP and CBoW, and this combination was then used to classify Instagram posts in 2020. Based on the results, the total number of posts, the number of CIPs, and the number of NCIPs all grew from 2019 to 2020, meaning that people tended to use Instagram intensively during the pandemic while they were confined at home; they probably found Instagram more amusing than other activities. Moreover, the fraction of CIPs in total posts increased from 2019 to 2020, demonstrating the desire of business owners to use Instagram to continue their sales during the shutdown of their stores. Furthermore, the fraction of NCIPs in total posts decreased, as people living in Mashhad and tourists could not visit the Holy Shrine in person and therefore did not post photos of it on their Instagram accounts.
This work employed the caption and set of hashtags associated with each scraped Instagram post to detect commercial content and classify the posts from the business and pilgrimage perspectives. However, Instagram posts that had no caption or hashtags at all, or whose captions or hashtags were in other languages, e.g., English or Spanish, were discarded from the dataset in the preprocessing stage, causing valuable information to be lost. In future work, the images and videos of each post can be scraped as well and classified in the same way using a CNN. The results of the caption and hashtag evaluation can then be combined with those of the photo and video assessment to provide a more accurate classification. Furthermore, the authors adopted two separate neural networks as means of classification (CNN and MLP); these can be combined into a hybrid neural network model to see whether a combined topology is better able to classify the text, photos, and videos of Instagram posts.

Appendix B
As mentioned in section 4, Instagram post captions were transformed into numeric vectors using Word2Vec algorithms (CBoW and Sg), which were then fed to the embedding layers of the deep learning models (MLP and CNN). The embedding layers had a 100-dimensional vector space, and their input sizes were equal to the maximum number of tokenized words in each Instagram posts dataset.
To be more specific, the 2019 Instagram posts dataset was an array of 2834 instances (posts) after preprocessing, each of which had 328 elements. Of course, not every Instagram post had 328 tokenized words. However, since the longest Instagram post had 328 words, the other posts were padded with the corresponding number of zeros so that all posts in the 2019 dataset had equal size. In other words, the 2019 dataset is a two-dimensional array of 2834 vectors, each with 328 values. For example, consider an Instagram post in the 2019 dataset that had 56 tokenized words after preprocessing: 272 zeros were added to its end to make it equal in length to the other elements in the dataset and compatible with the embedding layer. The same process was applied to the 2020 dataset, which had 6919 instances after preprocessing and 360 elements in each instance. For more clarification, a random vectorized Instagram post is presented below.
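The zero-padding step described in this appendix can be sketched as follows. This is an illustrative sketch, not the authors' code: the function name `pad_post` and the placeholder token ids are assumptions, and only the lengths (56 tokens padded to 328) come from the text.

```python
def pad_post(token_ids, max_len):
    """Right-pad a list of integer token ids with zeros up to max_len."""
    if len(token_ids) > max_len:
        raise ValueError("post is longer than max_len")
    return token_ids + [0] * (max_len - len(token_ids))


MAX_LEN_2019 = 328  # longest post in the 2019 dataset, per the text

# Hypothetical post with 56 tokens; ids 1..56 are placeholders,
# not real vocabulary indices.
post = list(range(1, 57))
padded = pad_post(post, MAX_LEN_2019)

print(len(padded))      # 328
print(padded.count(0))  # 272
```

Applying the same function with a maximum length of 360 would correspond to the 2020 dataset.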