Analysis of the effects of lockdown on staff and students at universities using Natural Language Processing Techniques: case study of Spain and Colombia

The review of previous works shows that this study is the first attempt to analyse the effect of lockdown using Natural Language Processing techniques, particularly sentiment analysis methods applied at large scale. It is also the first of its kind to analyse the impact of COVID-19 on the university community jointly for staff and students and with a multi-country perspective. The main findings show that the words most often mentioned were family, anxiety, house and life. Staff also showed a slightly less negative perception of the consequences of COVID-19 in their daily life. We used artificial intelligence models, namely a Swivel embedding and a Multilayer Perceptron, for classification. The accuracy reached was 88.8% for students and 88.5% for staff. The main conclusion of our study is that higher education institutions and policymakers around the world may benefit from these findings while formulating policy recommendations and strategies to support students during this and any future pandemics.


Introduction
Since its identification in December 2019 in Wuhan, China, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has provoked turbulence in societies and countries worldwide. Rigorous public health measures, such as quarantine, testing, and social distancing, were imposed around the world, although they differed from place to place. In Latin America, governments imposed confinement and social distancing measures in different ways, in some cases based on the reality of the moment and in others on what was happening in countries outside the continent. Premature and unplanned strategies thus unleashed a series of effects that added complexity to the pandemic situation, mostly related to inequity in health and the economy [1]. In Europe, the pandemic spread much faster because of the continent's high international traffic by air, land, and sea. As a result, governments quickly adopted confinement measures, with important economic, political, and social effects.
COVID-19 created unexpected challenges that affected each of us in one way or another, with disruptive effects and a significant impact on almost all sectors of society around the world: health [2,3], the economy [4], and education [5], which is no exception. In fact, the academic community is a population group that experienced dramatic effects during the pandemic, especially in the first five months of 2020, which affected its members' daily life and their prospects for the immediate and distant future.
According to UNESCO's monitoring, more than 160 countries implemented nationwide closures, which affected over 87% of the world's student population [6]. Physically closing or restricting access to educational institutions, combined with other policy measures, contributed to reducing the spread of COVID-19. However, the pandemic has posed challenges for the academic community and their families, friends, employers, and the global economy [5]. The closure of educational institutions at all levels forced learners to stay at home, reaching a peak of 1.598 billion students at home in 194 countries on 1 April 2020 [1]. It altered teaching patterns and brought a transition to online learning, changes in the communication channels among academic staff, teachers, and students, new assessment methods, different workloads and performance levels [5], and new uses of technology, among other effects. All these measures accelerated a global change that was already underway and that was leaving a gap of inequality and poverty in the world's poorest regions, thereby widening the gap of social injustice, poverty, and social inequity.
In Latin America, the guarantee of public, free, and accessible education was not a reality. In many cases, having children at home made it impossible for parents to attend to their daily jobs; even without having to travel to their workplace, they had to accompany their school-aged children during their schoolwork, especially in the early school years (Imanol, O.). Furthermore, the responsibility of providing each child or young adult with the technology needed to succeed in school falls upon the families, which further deepens inequity.
Investing resources to strengthen technology in the regions and countries that require it is a priority. Countries should invest significantly in enhancing connectivity networks in distant regions with high geographical dispersion. Beyond access to the internet and technology, a significant percentage of the population in most Central and South American countries still lacks the literacy needed to use them adequately, which must be overcome to reduce inequity in education [7]. However, according to the report "The current situation of education in Spain on the advice of the pandemic", the situation in Latin America was not very different from that in Spain, based on the Report of the State School Council in Spain as of January 2021.
Under this scenario, more extensive data are needed to understand how COVID-19 reshapes academic life in higher education. Although several papers have already been published that analyse various aspects of the COVID-19 pandemic in academic life and its consequences for physical and mental health, the economy, society, and the environment [5], most of them focus on students, disregard staff, and are limited to one country or a single higher education institution. Additionally, the literature review shows that previous works have not applied sentiment analysis techniques at a large scale to the analysis of the data.
According to the source of information used, sentiment analysis studies can be divided into three groups: social networks [8], especially Twitter [9]; refereed papers; and texts from interviews with family members [10]. During the COVID-19 pandemic, the most frequent source of information for sentiment analysis has been Twitter and other social network content, used to categorize motivations related to multiple infectious diseases, to support monitoring and data analysis, and to describe the challenges faced by researchers in terms of available information and in their relationship with social media platforms and the community [11]. Sentiment analysis of interviews with family members of COVID-19 patients who used virtual visits focused on the feelings experienced during the visit, barriers and concerns about this option, and opportunities for improving the method [12].
However, to date there have been no reports of sentiment analyses conducted on the responses expressed in online public opinion surveys.
The aim of our study is to understand students' and staff's perspectives and experiences of how the pandemic and the closure of universities affected the university community in Spain and Colombia, by applying sentiment analysis methods. This paper attempts to shed light on the impact of the COVID-19 pandemic on staff and students from universities in Spanish-speaking countries. It focuses on identifying how students and staff of universities in Colombia and Spain perceived the contingency measures taken by their governments, which combined lockdown and social distancing strategies. To this end, we used a customized questionnaire to understand how respondents perceived the impacts of the first wave of the COVID-19 crisis in early 2020 on various aspects of their lives at a global level, and applied sentiment analysis techniques.

Materials and Methods

Online Survey
The online questionnaire is part of a global initiative, The Lockdown Project, led by the London School of Economics (LSE). The objective of this project was to understand the experience of university staff and students around the world and to build a better factual map of what was happening to them during the COVID-19 situation.
The survey was distributed in more than 25 countries by LSE in collaboration with the project partners, 18 collaborators and 11 supporting organisations. It was initially designed in English and translated into 17 languages. This study is based on the Spanish version of the survey.
The data were obtained through a comprehensive web-based questionnaire composed of 68 questions. Of these, 67 were closed-ended questions covering different aspects: sociodemographic, geographic, and the impact of the COVID-19 pandemic on respondents' daily lives (studies, work, families, social life, habits…). The last question was an open-ended question inviting personal reflections, "Share any other thoughts / experiences about your life in lockdown", and it forms the basis for the analysis in this paper.
The scope of the study is limited by several factors, among them the scarce information provided by the governments of some countries, decisions that in several cases were hasty or insufficiently analysed, and media coverage that generated great uncertainty, together with lack of planning and false news leading to conspiracy theories, all of which ended up affecting young students' perception of confinement and of the true impact of the pandemic on their lives. As noted above, more studies of this type are needed to recognize these perceptions and to confirm or refute what the study participants initially reported, since this is a qualitative study.

Study Participants and Procedure
All participants consented to their participation and responded to the survey anonymously, so that no personal data were collected. All data collected were GDPR compliant and have been used for research purposes only.
The target population comprised university staff and students from Spanish-speaking countries who were at least 18 years old. The web-based survey was launched via an open-source web platform (https://research.healthbit.com/c/LockedDown-en) on 20 April and remained open until July. It was distributed by advertising on University of Deusto communication systems and social media, through dissemination in two university networks (Unijes and AUSJAL), and through targeted mailing to other Spanish and Latin American universities. We obtained responses from 12 countries. The distribution by country and respondent profile (student or staff) is shown in Table 1 and Figure 1.

Country Students Staff
Spain

Labelling
For the sentiment analysis, the open responses were labeled as positive, negative or neutral by two independent researchers. Disagreements and doubts were resolved by a third independent researcher.
The criteria we applied were the following: 1) If the sentences gave a positive testimony, the associated label was positive, e.g.: "There are some aspects of home-schooling which I like and value" or "The pandemic has shown us how to live better with less money".
2) At the other extreme, if the testimony was clearly negative, the associated label was negative, e.g.: "I was not able to progress with my research thesis and I felt anxious" or "Social relationships are not the same remotely; my friendships have deteriorated".
3) Neutral testimonies received a neutral tag, e.g.: "In the end, we will all have Covid, we will normalise it, and we'll learn to live with it" or "It's time to reflect".
If a single testimony contained inputs corresponding to more than one of our three labelling categories, we counted each input under the corresponding category.
The size of our hand-labeled data allows us to perform cross validation experiments and check for the variance in performance of the classifier across folds.

Histograms of labeled text by sentiment categories
The distribution of the data by category gave us a clear idea of the tendency of feelings in each country selected as a case study. For this reason, separate distribution charts were prepared for the student and staff data groups.

Histograms for Student group data
The following histogram shows the data distribution by country and by sentiment category for the student dataset. In general, for students in both countries the texts are mostly of Negative sentiment, with 185 labels in this category, in contrast to 35 labels for the Neutral category and 23 for the Positive category.

Histograms for the Staff dataset
The following histogram shows the data distribution by country and by sentiment category for the staff dataset. In general, for the staff group the texts are mostly of Negative sentiment, with 73 labels in this category, in contrast to 28 labels for the Neutral category and 41 for the Positive category.

NLP techniques
Natural language processing techniques let us automate the analysis of the content of a text with the aim of extracting knowledge. In this research, we are mainly interested in sentiment analysis and in the infographic presentation of the text written by the survey respondents. We therefore present two subsections with the techniques used to achieve this goal. To analyze the sentiment of a text, it must first be represented numerically; to do this, we used the word embedding technique explained in section 2.4.1.1. This method encodes words as numerical vectors without losing context information. It is important to note that this technique requires a large amount of data in the training stage. We therefore decided to use transfer learning from the IMDB database, which is appropriate [13] for this type of task and whose details are explained in section 2.4.1.2. Once the text has been encoded in a numeric domain, a classifier is trained to determine its sentiment category, in this case positive, negative or neutral. Details of the classification methods are also given in section 2.4.1.2.

Word embedding
Word embedding is a Natural Language Processing (NLP) approach that converts words into vectors, with the intention of capturing the semantic and syntactic relationships between words and thereby simulating human learning of vocabulary. Learning such representations is one of the central problems of the field; as the authors of [14] put it, "Representation learning is a long-standing problem in natural language processing (NLP)". To address this representation problem, we decided to go beyond the surface forms of a text (e.g., symbols, words, sentences, and actual documents) to meaningful similarities (e.g., semantic or syntactic) between two text fragments [15].
The main idea is to represent each word in a large body of text by a feature vector, so that the similarity between vectors (i.e., words) can be measured using linear algebra (e.g., using cosine similarity [16]).
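As an illustration of how similarity between word vectors can be measured, the following minimal sketch computes cosine similarity in plain Python. The three-dimensional vectors and the function name `cosine_similarity` are made up for the example; they are not the embeddings used in this study.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


# Toy 3-dimensional "embeddings" (invented values, for illustration only).
toy_vectors = {
    "family": [0.9, 0.1, 0.3],
    "home": [0.8, 0.2, 0.4],
    "exam": [0.1, 0.9, 0.2],
}

sim_family_home = cosine_similarity(toy_vectors["family"], toy_vectors["home"])
sim_family_exam = cosine_similarity(toy_vectors["family"], toy_vectors["exam"])
```

With these toy values, the vectors for "family" and "home" point in nearly the same direction, so their similarity is higher than that of "family" and "exam".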
There are two categories of word embedding models: models based on matrix factorization and models based on sampling from a sliding window [17].
This approach has proved very useful in tasks such as translation, analytics, and word similarity [18]. Currently, there are several successful and well-known word embeddings, such as GloVe [19] and Word2Vec [20], which have had a profound impact on NLP research and have inspired the construction of new word vectors based on stochastic gradient descent.

Swivel embedding
Swivel [21] (Submatrix-wise Vector Embedding Learner) is a model that proposes a hybrid approach addressing the shortcomings of the SkipGram Negative Sampling (SGNS) model [22] and GloVe [19]. It uses a co-occurrence matrix to calculate the PMI (pointwise mutual information) between pairs of words. The PMI, obtained by statistical counting of word co-occurrences within the corpus, serves as the optimization target: stochastic gradient descent reduces the error between it and the dot product of the weight vectors (embeddings) of core words and context words. One outstanding advantage is the possibility of distributed training, since the method divides the co-occurrence matrix into k sub-matrices, which allows parallelized training in a workers-and-central-server configuration. This in turn allows word embeddings to be trained on larger corpora, because the computational cost is proportional to the size of the co-occurrence matrix, whereas the cost of SGNS is proportional to the size of the corpus. One advantage over GloVe is that Swivel takes the weighting of unobserved co-occurrences into account, thus providing a better vector representation for rare words. The model outperforms previous models [23] [24] in different NLP tasks such as word similarity (WordSim) and semantic analysis.
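To make the role of the PMI concrete, the sketch below computes it from a small co-occurrence matrix and evaluates a squared error between a dot product and the PMI target for one word pair. It is a simplified illustration under stated assumptions: the counts are invented, the natural logarithm is assumed, the helper names `pmi_matrix` and `squared_error` are hypothetical, and the special treatment of unobserved co-occurrences and the sub-matrix sharding described above are left out.

```python
import math


def pmi_matrix(cooc):
    """PMI for every observed (count > 0) cell of a co-occurrence matrix.

    Unobserved cells are returned as None, because this sketch does not
    model how unobserved co-occurrences are weighted.
    """
    total = sum(sum(row) for row in cooc)
    row_sums = [sum(row) for row in cooc]
    col_sums = [sum(col) for col in zip(*cooc)]
    pmi = []
    for i, row in enumerate(cooc):
        pmi_row = []
        for j, count in enumerate(row):
            if count > 0:
                # PMI = log( p(i, j) / (p(i) * p(j)) ), written with raw counts.
                pmi_row.append(math.log(count * total / (row_sums[i] * col_sums[j])))
            else:
                pmi_row.append(None)
        pmi.append(pmi_row)
    return pmi


def squared_error(word_vec, context_vec, target_pmi):
    """Half squared difference between the embedding dot product and the PMI."""
    dot = sum(a * b for a, b in zip(word_vec, context_vec))
    return 0.5 * (dot - target_pmi) ** 2


# Invented co-occurrence counts for a three-word vocabulary.
toy_counts = [
    [0, 4, 1],
    [4, 0, 1],
    [1, 1, 0],
]
toy_pmi = pmi_matrix(toy_counts)
```

Gradient descent on such an error pushes the dot product of the two vectors towards the PMI of the pair, which is the relationship the paragraph above describes.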

Transfer Learning with English Google News and IMDB database
As mentioned above, a large amount of text is necessary to train stochastic gradient-based models in order to obtain the desired knowledge extraction. In the present case, the syntactic and semantic relations of the context are obtained through a pretrained embedding built on the English Google News dataset [25], a corpus of 130 GB. It provides a vector encoding of 20 features or dimensions.
The classifiers, in turn, were trained with the IMDB database, which consists of 50,000 movie reviews written by users from all over the world on the platform of the same name, split into 80% for training and 20% for validation. Each review is labeled 0 or 1 to distinguish negative from positive comments, respectively. It is important to highlight that the IMDB database [13] was used as input for the training and validation of the model, whereas the entire testing stage was performed on the open-ended survey responses about people's thoughts on quarantine due to COVID-19.

Multilayer Perceptron MLP
The proposed configuration for a system capable of analyzing the sentiment of a text fragment consists of a 20-dimensional Swivel word embedding [26], which provides the vector representation of the input text, followed by a neural network with an input layer of 20 neurons, a hidden layer of 16 neurons with ReLU activation, and an output layer of a single neuron with a hyperbolic tangent (tanh) activation function. The schematic of the proposal can be seen in Figure 10. The tanh activation at the output is chosen to obtain a continuous value representing the degree of positive sentiment in a text fragment; that is, a regressor with outputs between -1 and 1 is built from the dichotomous labels 0 and 1, which would usually be used for a classification task.
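The layer sizes described above can be sketched as a forward pass in plain Python. This is only an illustration of the 20-16-1 architecture: the weights are random rather than trained, the input vector stands in for a real Swivel embedding, the output activation is assumed to be tanh, and all function names are hypothetical.

```python
import math
import random


def dense(inputs, weights, biases):
    """Fully connected layer: one weight row and one bias per output neuron."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def forward(embedding, params):
    """20-dim embedding -> 16 ReLU units -> 1 tanh unit with output in [-1, 1]."""
    hidden = relu(dense(embedding, params["w_hidden"], params["b_hidden"]))
    (pre_activation,) = dense(hidden, params["w_out"], params["b_out"])
    return math.tanh(pre_activation)


def random_params(n_in=20, n_hidden=16, seed=0):
    """Untrained weights, used here only to exercise the architecture."""
    rng = random.Random(seed)
    return {
        "w_hidden": [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                     for _ in range(n_hidden)],
        "b_hidden": [0.0] * n_hidden,
        "w_out": [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]],
        "b_out": [0.0],
    }


params = random_params()
toy_embedding = [0.1] * 20  # placeholder for a real 20-dimensional text embedding
score = forward(toy_embedding, params)
```

In a trained model the returned score would be read as the degree of positive sentiment; here it only demonstrates that the output stays within the interval from -1 to 1.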

SVM Support Vector Machine
As in the previous section, the IMDB database is used to train and validate the model, the open-response database is used for testing, and a Swivel embedding is used for text encoding; here, however, the model is a support vector machine applied to the regression task. As in the previous model, the output is a value between -1 and 1 that indicates the degree of negativity or positivity of the input text. Figure 12 shows the proposed model.

Decision model based on interval comparison
Finally, an interval-based decision stage is added at the output of the proposed models, since the output values O, which lie in the interval from -1 to 1, must be assigned to the Positive, Neutral or Negative classes. The limits of the intervals were obtained through 5-fold cross-validation. The limits of the Negative and Positive classes are not searched for explicitly, since they follow from the two thresholds found; in addition, the Positive upper limit is always 1 and the Negative lower limit is always -1. To achieve this objective, the algorithm shown in the following diagram is applied.
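A minimal sketch of such an interval-based decision stage is given below. The threshold search is an assumption made for illustration (an exhaustive grid search that maximizes accuracy on labeled scores); it is not a reproduction of the algorithm in the diagram, and the names `classify`, `accuracy` and `search_thresholds` as well as the toy scores are hypothetical.

```python
def classify(score, lower, upper):
    """Map a score in [-1, 1] to a class using two interior thresholds."""
    if score < lower:
        return "Negative"
    if score <= upper:
        return "Neutral"
    return "Positive"


def accuracy(scores, labels, lower, upper):
    hits = sum(classify(s, lower, upper) == y for s, y in zip(scores, labels))
    return hits / len(labels)


def search_thresholds(scores, labels, step=0.01):
    """Grid search for the (lower, upper) pair with the best accuracy.

    Returns (best_accuracy, lower, upper); ties keep the first pair found.
    """
    grid = [round(-1 + k * step, 10) for k in range(int(round(2 / step)) + 1)]
    best = (0.0, -1.0, 1.0)
    for i, lower in enumerate(grid):
        for upper in grid[i:]:
            acc = accuracy(scores, labels, lower, upper)
            if acc > best[0]:
                best = (acc, lower, upper)
    return best


# Invented model outputs and ground-truth labels, for illustration only.
toy_scores = [-0.8, -0.3, 0.1, 0.3, 0.45, 0.7, 0.9]
toy_labels = ["Negative", "Negative", "Negative",
              "Neutral", "Neutral", "Positive", "Positive"]
best_acc, best_lower, best_upper = search_thresholds(toy_scores, toy_labels, step=0.1)
```

In a 5-fold setting, one plausible use is to run the search on each training fold and average the resulting thresholds, which would match the reporting of average thresholds per model and dataset.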

Performance measurement
Finally, the performance of the proposed models is measured against the ground truth for the test data, in this case all the open responses collected with the survey instrument. From the predictions obtained, the respective confusion matrices are constructed and accuracy is calculated as a representative metric to compare performance.
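The following sketch shows how a three-class confusion matrix and the accuracy derived from it can be computed. The labels are invented for the example and the function names are hypothetical; rows correspond to the ground truth and columns to the predictions.

```python
CLASSES = ("Negative", "Neutral", "Positive")


def confusion_matrix(true_labels, predicted_labels, classes=CLASSES):
    """Count matrix with rows = ground truth and columns = prediction."""
    index = {name: k for k, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for truth, pred in zip(true_labels, predicted_labels):
        matrix[index[truth]][index[pred]] += 1
    return matrix


def accuracy_from_matrix(matrix):
    """Share of records on the main diagonal (correctly classified)."""
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Invented ground truth and predictions, for illustration only.
toy_truth = ["Negative", "Negative", "Neutral", "Positive", "Positive"]
toy_pred = ["Negative", "Positive", "Neutral", "Positive", "Negative"]
toy_matrix = confusion_matrix(toy_truth, toy_pred)
toy_accuracy = accuracy_from_matrix(toy_matrix)
```

The off-diagonal cells identify the kinds of error, for example Negative records classified as Positive, that a misclassification analysis would inspect.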

Figure 13: Frequency-based infographic retrieval diagram

Word clouds
A word cloud is a graphical representation of the relative frequencies of words in a text, i.e., of the number of times each word is repeated within it [27]. To obtain a meaningful representation, words that are repeated routinely in the language of the analyzed text must be eliminated. For the present application, we therefore eliminated articles, as well as special characters and punctuation marks. A word appears in the word cloud only if it is repeated at least 20 times. Finally, each word is drawn in a font size directly proportional to the number of times it is repeated within the document or set of texts.
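The frequency counting behind such a word cloud can be sketched as follows. The stop-word set (English articles only), the tokenization rule, the maximum font size and the toy texts are assumptions chosen for illustration; only the minimum of 20 repetitions and the proportional font size come from the description above.

```python
import re
from collections import Counter

# Illustrative stop-word set (assumption): English articles only.
STOP_WORDS = {"the", "a", "an"}


def word_frequencies(texts, stop_words=STOP_WORDS, min_count=20):
    """Count words across texts, dropping stop words, digits and punctuation."""
    counts = Counter()
    for text in texts:
        # Keep runs of letters only; special characters and punctuation are dropped.
        tokens = re.findall(r"[^\W\d_]+", text.lower())
        counts.update(t for t in tokens if t not in stop_words)
    return {word: n for word, n in counts.items() if n >= min_count}


def font_sizes(frequencies, max_size=72):
    """Font size directly proportional to frequency, scaled to max_size."""
    if not frequencies:
        return {}
    top = max(frequencies.values())
    return {word: max_size * n / top for word, n in frequencies.items()}


# Invented responses, repeated to reach the frequency threshold.
toy_texts = ["The family and the anxiety!"] * 20 + ["The confinement."] * 5
toy_frequencies = word_frequencies(toy_texts)
toy_sizes = font_sizes(toy_frequencies)
```

In this toy input, "confinement" occurs only five times and is therefore excluded, while "family" and "anxiety" reach the threshold and receive the largest font size.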

Results
This article presents a novel data set on the perceptions and behaviors of university staff and students, collected after the beginning of the COVID-19 outbreak in two Spanish-speaking countries. We summarize the findings from a sample of 225 students and 140 staff from the two countries. The data set is part of a larger international survey, in which students and staff were consistently asked the same questions. The questionnaire was composed of 53 multiple-choice questions and one open-text question.
Our analysis can be used to uncover the impacts of the COVID-19 pandemic on the perceptions and behaviors (work/study, movement and travel) of university staff and students in Spain and Latin America, and to provide further insights into the sensitivity of students and staff towards confinement and online learning and teaching.

NLP techniques
To obtain the results presented below, 243 records were used for the students and 142 records for the staff, corresponding to the answers to the open question of the survey instrument. Four confusion matrices were obtained, corresponding to the two sentiment analysis models described in subsection 2.4.1.2.2 applied to each of the two groups, together with the infographic described in section 2.4.2.

Sentiment Analysis
Once the proposed classification models had been trained and validated, the following results were obtained in the test stage for the sentiment classification of the input texts, that is, the opinions given by the survey respondents.

Decision thresholds for each case
After applying 5-fold cross-validation, Table 2 presents the average decision thresholds for each proposed model and dataset, as well as the average accuracy obtained with the calculated thresholds. As can be seen, in all cases the MLP-based classifier performs better in terms of text classification accuracy for the three classes Positive, Neutral and Negative.

Model
The neutral interval for the staff lies at higher values (from 0.38 to 0.52), whereas for the students it runs from 0.22 to 0.49.

Confusion matrices for the MLP-based classifier model
With the thresholds obtained, we execute the decision stage at the output of the MLP model in order to determine the confusion matrix for the Student case.
In the following tables, we analyse the relevant misclassification results obtained by the NLP system, with the purpose of identifying its weaknesses and limitations in terms of text classification. To preserve the properties of the original text in the database, the phrases are kept in Spanish.
As can be seen, one case in which the system gets confused involves the word solidarity, which by itself does not express any feeling. The second sentence of the previous table is misclassified because of a writing mistake, which changes the idea that the person wanted to express.

Negative records classified as Positive
Most college students suffer greatly that give a good tuition discount
The response of the organizations (Government, University...) was insufficient, with little information and a great deal of uncertainty, which is what has caused the greatest source of stress, anxiety and discomfort. In addition to a feeling of helplessness and vulnerability.
The inequality of memories and the impossibility of many families without internet to continue their studies has not been taken into account.
I am a university student. The workload has been multiplied by 3 because the professors consider that "we have more time because we are all at home". The accumulation of work is embarrassing and unjustifiable.
The pandemic has contributed to the fact that the millennial generation is having a very difficult time finding job stability in line with their studies.
The second sentence in the table above that the system gets wrong can be understood, in isolation, as expressing positive sentiment; in the ground truth, however, it was labeled as negative, because the experts contextualized the sentence within the framework of a complaint.
The remaining misclassified sentences in the table above are a consequence of the bias of the training database: because of the limited amount of text available for the training stage, transfer learning was applied to obtain the sentiment predictions in this application.
With the thresholds obtained, we proceed to execute the decision stage at the output of the MLP model, in order to determine the confusion matrix in Table 6 for the Staff case.

Positive records classified as Negative
Countries have the opportunity to learn how to improve family reconciliation, the fight against environmental pollution, teleworking and non-face-to-face or mixed modality in education, the opportunity to decrease the use of paper money and reduce monetary fraud, use national labor in jobs imported by foreign workers, improve the population rate in villages and reduce the decline in rural population, increase the hours dedicated to exercise, etc.
I am an administration and services staff at my university. I work from home almost as before, using the family ADSL line. Working at home allows me to have sunlight, while my workstation is artificially illuminated. Now, my biggest concern is to be able to continue working from home in telecommuting mode. We are not allowed to bring our equipment home. I believe that teleworking would avoid commuting and contribute to the improvement of the environment. This experience has made me think that for some sectors, telework is a reliable and beneficial option for workers. I think it has to continue even when we come out of this pandemic. Giving the option to work from home has to be seriously contemplated not only as a preparedness measure for possible resurgences but also as a tool to improve the well-being of the workers.
As can be seen, the system makes errors when the input text is long and contains grammatical and lexical errors. That is why the previous examples were classified as negative although they were labeled as positive in the ground truth. It should be noted that the system is sensitive to the correct wording of the answers given by the users.

Negative records classified as Positive
It greatly reduces the quality of work, i.e., performance and concentration.
The misclassification of the previous sentence is a consequence of the bias of the database, as with the classification errors observed in the student group: although the sentence expresses a negative feeling, the Natural Language Processing system classifies it as Positive.
The graph was obtained from the frequency of occurrence of each word. In the analyzed text, confinement is clearly the word that appears most often, followed in relative frequency by university and pandemic. The words anxiety, classes, people and family, among others, also stand out.

Main limitations
The main limitations encountered, from the point of view of the technology used for data analysis, are detailed below:
1. The NLP classification model relies on automatic translation into English, since the original text is in Spanish, so errors could occur during the automatic translation of the texts. As future work, a Spanish-language database could be used to apply transfer learning both in the word encoding (embedding) stage and in the text classification stage.
2. The system is sensitive to spelling and grammar errors in the texts written by the users, which make it difficult for the NLP model to automatically translate the texts and interpret their sentiment.
3. The system is biased by the use of a general-purpose database, such as IMDB, in the application of the transfer learning technique [28], which was needed because of the limited amount of text for training from scratch. As future work, the questionnaire could be extended to more countries and more users, in order to increase the number of texts and thus be able to train a specific model for this application.
In summary, our study presents the following limitations. First, the use of an automatic Spanish-English translator by the NLP classification model may lead to considerable mistranslations. Further studies may use a Spanish-language database with transfer learning techniques in the encoding and classification stages. Second, the sensitivity of the system to spelling and grammatical mistakes hindered the automatic translation and therefore the sentiment analysis by the NLP model. Third, the use of the general-purpose IMDB database for transfer learning, required by the limited number of texts available for training, biased the results. Future surveys should include more countries and users to enlarge the training data. Finally, the selection bias due to the use of an online survey limited the sample to those with internet access and reduced the external validity of our study.

Discussion
Further analysis of the tagcloud results reveals some prominent elements regarding the expression of emotional status. Anxiety is reflected in the tagcloud. Like any other word in the tagcloud, it is neutral in itself. However, given the overall lockdown context, the expectation would be that the pandemic and the lockdown had a negative impact on anxiety levels and their management. The participants' open-text comments show that anxiety is often related to stress, discomfort, preoccupation and poor management: "the lack of information and the high levels of uncertainty was sufficient to provoke high levels of stress, anxiety". Others mention that coping with the high demands of home office in combination with home work caused anxiety peaks. This is aligned with numerous publications showing that students have been highly vulnerable to mental health issues during the COVID-19 pandemic, and researchers have shown that perceived stress and mental health problems have increased during the pandemic [29]. In summary, an expected finding of this study is the direct impact of Covid-19 and the lockdown on anxiety levels and their management among students and the wider university population.
Another term to highlight in the tagcloud is family. Further exploration is required to understand the dimension of its impact, since the comments show important nuances. Some participants described family as a positive asset: "The good thing is that I still have my internship job and my family has food and health" / "the time with my family has been excellent, being with them and feeling their care and love". This is aligned with the literature, where some recent studies reported that family income stability, living with parents and overall social support were protective factors against anxiety [29]. Others wrote about contagion: "On the other hand, Covid-19 has caused me anxiety about the possibility of contagion in my home, keeping my distance at home in case of contagion is being very hard". Other comments related more to the obligations of caring for their families: "The most complicated thing has been balancing work and family life" / "It is very stressful to be under four walls with all your family day and night, even more, when you are a person who likes solitude; not having an economic income to contribute to your family or home worsens your state of mind". In conclusion, the issue of family is related to both negative and positive effects, which is aligned with the current literature. For example, Solomou et al. have recently, and surprisingly, concluded that those who spent the pandemic alone were less affected than those who spent it in company: An unexpected finding of the present study was that those participants who were living with others had increased anxiety compared to those who lived alone. This finding is in contrast with previous studies, since living alone increased the risk for development of common mental disorders, such as anxiety and depression. This aspect may require further exploration.
The confinement resulting from government measures to contain the Covid-19 pandemic has had important repercussions on the lives of people in general. For university students and university staff, this situation accelerated a change that was already taking place, especially in universities with virtual education. In Colombia, according to Forbes [30], only 10% of university students received virtual classes before the pandemic; after the pandemic, the Ministry of Information and Communication Technologies reports that virtuality has increased by more than 70% [31].
This study allows us to confirm the reality of students and university staff in relation to the feelings that the Covid-19 pandemic has awakened in these groups. Regardless of the type of university in which the students pursue their higher education and of the tasks performed by the staff, negative feelings clearly predominate regarding the way the pandemic was faced through confinement and what this has meant for their lives.
On the other hand, sentiment analysis is the methodology used in this study to identify positive, negative or neutral feelings from the narratives in which university students and staff describe how they perceive or face the situation caused by Covid-19. Among the words that mark their expressions is "Confinement", which marks a determining condition in people's lives. The study conducted at the public Universidad Francisco de Paula Santander in Colombia [32] shows how the mental health of its students has been affected by the confinement, since it is one of the measures that has had the greatest impact on students' lives: it affects their personal relationships, their academic performance and their conditions of university life, thus generating accelerated changes in their way of life.
ASCUN, the Colombian Association of Universities, in Bulletin No. 3 of August 2020 [33], projects a decrease of more than 50% in enrolments due to the pandemic. One of the major concerns was the lack of technological infrastructure capacity to respond to the needs of low-income students, who do not have the technology and sufficient resources to turn to this new form of education. Another was the lack of employment in many families, who because of their socioeconomic conditions would have to postpone their children's undergraduate and graduate university studies.
Another of the discussions raised in the ASCUN report and confirmed by this study is the concern of students about the quality of the education received through virtual platforms, especially among students whose training has a practical component that involves research and training activities in contact with the community, with specific groups or in laboratories [33]. Students have negative feelings about the quality of this education and feel that they will not be sufficiently prepared for their professional future, especially students in degree programmes related to the exact sciences and social sciences, whose training involves internships and practical work.
In the study conducted at the Universidad Francisco de Paula Santander [32], a higher incidence of depression was identified in men, in the population aged between 16 and 35 years. In the present study it was not possible to differentiate feelings between men and women or among age groups, because the analyzed expressions do not allow these characteristics to be identified. The usefulness of this methodological tool for identifying the tendency of feelings (positive, negative or neutral) by sex or age therefore could not be determined.
The study of the Pedagogical and Technological University of Colombia [34], on how students' emotions were affected and how they coped with them amid uncertainty and fear of the unknown, shows how these circumstances increased stress in the face of academic tasks and commitments and how this negatively influenced their academic performance. On the other hand, the study also shows feelings that are not recognized as positive or negative and are therefore categorized as neutral, because they describe particular situations of confinement that cannot be classified, such as being with the family or sharing with family or household members the spaces and daily activities that students previously controlled autonomously.
We have demonstrated the feasibility of applying NLP techniques to the sentiment analysis of the perceptions and behaviors of university staff and students, using a dataset collected after the beginning of the Covid-19 outbreak in Colombia and Spain.
The lockdown is a government measure to contain the Covid-19 pandemic that has affected our lives unevenly; for university students and university staff, it accelerated a change that was already under way. Indeed, in Colombia only 10% of university students received virtual classes before the pandemic, while after the lockdown that number increased by more than 70% [31]. Measuring the impact of the Covid-19 pandemic on mental and physical health status may help to elucidate the risks of restrictions such as the lockdown and social distancing. Our results are aligned with publications showing that students have been highly vulnerable to mental health issues during the COVID-19 pandemic. Nevertheless, compared with the aforementioned studies, our approach is novel in using AI techniques to perform sentiment analysis of online survey responses.
We showed the feelings that the Covid-19 pandemic has awakened in Colombian and Spanish students. Unfortunately, negative feelings clearly predominate regarding the way the pandemic was faced through confinement and what it means for their lives, regardless of the university the students attended and of the duties of the university staff. In addition, we identified a considerable difference in feelings between students in Spain and Colombia. It could be expected that countries with greater development and economic infrastructure capacity, such as Spain, would show a higher number of positive responses than countries such as Colombia, which has fewer economic and infrastructure resources and a marked poverty and social inequality gap. Paradoxically, Spain's results showed negative feelings in more than 93% of the answers, while in Colombia negative feelings accounted for slightly more than 70% of the analyzed answers.
The staff's situation is very similar in the two analyzed countries: both showed 50% of negative responses. However, it should be noted that Spanish university staff reported positive rather than neutral feelings in the remaining 50%, while Colombian university staff distributed their positive and neutral responses in equal proportions. The lack of noticeable differences may be explained by the fact that staff in both countries carry out their tasks with, and are trained on, similar available technologies such as Zoom and Microsoft Teams. However, there is increasing concern about how the Covid-19 pandemic will modify education. In Colombia and other low- and middle-income countries, the concerns revolve around the lack of technological infrastructure capacity to respond to the needs of students and personnel, with a projected decrease of more than 50% in enrollment due to the Covid-19 pandemic. Another major concern is how degree programmes with a strong hands-on component, such as community service in the social sciences and laboratory practice in engineering, will be affected by virtual learning. Our methodology could be useful to identify students with negative feelings who may be more vulnerable when adapting to post-pandemic educational strategies.
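The shares of negative, positive and neutral responses per country and group discussed in the two preceding paragraphs could be tabulated along the following lines. This is only a sketch under assumptions: the record layout `(country, group, label)` and the function name are hypothetical, and the toy records are invented for illustration rather than taken from the survey.

```python
from collections import Counter, defaultdict

def sentiment_shares(records):
    """Map (country, group) -> {label: percentage of that group's responses}."""
    counts = defaultdict(Counter)
    for country, group, label in records:
        counts[(country, group)][label] += 1
    shares = {}
    for key, counter in counts.items():
        total = sum(counter.values())
        shares[key] = {label: 100.0 * n / total for label, n in counter.items()}
    return shares

# Toy records invented for illustration; they are not survey data.
records = [
    ("Spain", "staff", "negative"),
    ("Spain", "staff", "positive"),
    ("Colombia", "staff", "negative"),
    ("Colombia", "staff", "negative"),
    ("Colombia", "staff", "positive"),
    ("Colombia", "staff", "neutral"),
]

for (country, group), dist in sentiment_shares(records).items():
    summary = ", ".join(f"{label} {pct:.0f}%" for label, pct in sorted(dist.items()))
    print(f"{country} {group}: {summary}")
```

Keeping the country and the group together as the key makes it straightforward to compare students and staff within a country as well as the same group across countries.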
All in all, understanding the impact of Covid-19 on mental and physical health status can contribute to understanding the risks of severe restrictions such as lockdown and social distancing measures. Sentiment analysis has proven its validity as a tool for understanding the changes and patterns that such restrictions produce in overall health and wellbeing. Yet this research also sheds light on the limitations of current AI models in predicting further implications with precision, for the reasons explained above.

Conclusions
Our sentiment analysis has proven its validity for understanding the changes and patterns that Covid-19 pandemic restrictions produce in overall health and wellbeing. In future surveys, the suggested approach of using AI for sentiment analysis may be useful to uncover the impacts of events including, but not limited to, the Covid-19 pandemic on perceptions and behaviors, reducing the manual labor to quality control of misclassifications. Yet this research also sheds light on the limitations of current AI models in predicting further implications and in streamlining their implementation.
The methodology allows us to identify a paradox regarding the feelings that students express and shows the reality in Latin American countries such as Colombia and European countries such as Spain. The study presents a marked difference in feelings between students from Spain and Colombia. In Spain, the percentage of answers identified with negative feelings reaches a little more than 93%, leaving a small percentage for positive and neutral feelings; in Colombia, the negative answers are a little more than 70% of the analyzed answers. This result is striking, given that countries with greater development and economic infrastructure capacity would be expected to have a more positive response because of their ability to adapt to the contingency. Instead, countries such as Colombia, which have fewer economic and infrastructure resources and a marked poverty and social inequality gap, present a smaller proportion of negative responses, with neutral and positive responses accounting for almost 30% of the students' responses.
With respect to the staff, the situation is very similar in the two countries: both have 50% of negative responses to the situation of confinement due to the pandemic. What is striking about these responses is that the staff in Spain have positive rather than neutral feelings in the remaining 50%, while the staff in Colombia split their positive and neutral responses in equal proportions. This could be explained by the fact that the Colombian staff carry out their tasks in conditions very similar to those of the Spanish staff, and the changes that arose from the pandemic are comparable to the tasks that universities carry out in general. The use of technologies, the availability of technological tools and the possibilities for training are similar in both countries.

Positive records classified as Negative
Countries have an opportunity to learn to improve work-family balance, the fight against environmental pollution, teleworking and remote or blended education; an opportunity to gradually reduce the use of paper money and reduce monetary fraud, to use the national workforce in jobs for which foreign workers are brought in, to improve population rates in villages and slow the decline of the rural population, and to increase the hours devoted to exercise. I have learned to better manage my time, my financial resources and my relationships with the other members of the family. I was not working during the confinement period, which helped me to focus on the family. I find it very difficult to balance work and family life if I have to work with my children at home.
I am a member of the administration and services staff at my university. I work from home almost the same as before, using the family ADSL line. Working at home gives me sunlight, whereas my workstation is artificially lit. Now, my biggest concern is being able to continue working from home by teleworking. We are not allowed to bring the equipment home. I believe that teleworking would avoid commuting and contribute to improving the environment.

Negative records classified as Positive
It greatly reduces the quality of work, that is, performance and concentration.
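Misclassified records such as those listed above can be collected by comparing manually assigned labels with the labels predicted by the classifier. The following is a minimal sketch of such an audit and not the procedure used in the study; the function name and the manual labels are assumptions, and apart from the first text, which is the example above, the sample texts are invented for illustration.

```python
def audit_predictions(texts, manual_labels, predicted_labels):
    """Return (accuracy, mismatches); mismatches are (text, manual, predicted)."""
    if not (len(texts) == len(manual_labels) == len(predicted_labels)):
        raise ValueError("texts and label lists must have the same length")
    mismatches = [
        (text, manual, predicted)
        for text, manual, predicted in zip(texts, manual_labels, predicted_labels)
        if manual != predicted
    ]
    # Accuracy = share of records where the predicted label matches the manual one.
    accuracy = 1 - len(mismatches) / len(texts) if texts else 0.0
    return accuracy, mismatches

# First text: the example from this section. The rest are invented toy texts.
texts = [
    "It greatly reduces the quality of work, that is, performance and concentration.",
    "Working from home gives me more time with my family.",
    "I feel anxious about the uncertainty.",
    "I have learned to manage my time better.",
]
manual = ["negative", "positive", "negative", "positive"]      # assumed human labels
predicted = ["positive", "positive", "negative", "negative"]   # assumed model output

accuracy, mismatches = audit_predictions(texts, manual, predicted)
print(f"accuracy: {accuracy:.2f}")
for text, manual_label, predicted_label in mismatches:
    print(f"[{manual_label} -> {predicted_label}] {text}")
```

Listing the mismatches alongside both labels keeps the manual effort focused on quality control of the disputed records rather than on labelling every response.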