Submitted:
25 October 2023
Posted:
25 October 2023
Abstract
Keywords:
1. Introduction
2. Background
3. Data Preprocessing
- Only textual information of the tweet was retained (no additional metadata is considered).
- Only tweets with a minimum of 6 terms were considered.
- Retweets were discarded.
- Tweets containing links were discarded.
- Text was normalised: converted to lowercase, stripped of accents, and with runs of a repeated letter compressed to two occurrences.
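The filtering and normalisation steps listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline actually used: the function names, the whitespace-based term count, the `RT @user` heuristic for spotting retweets and the URL pattern are all hypothetical choices that the text does not specify.

```python
import re
import unicodedata

MIN_TERMS = 6  # minimum number of terms per tweet, as stated in the list above

# Assumption: a link is anything starting with http(s):// or www.
URL_RE = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)
# Assumption: retweets are recognised by the conventional "RT @user" prefix.
RETWEET_RE = re.compile(r"^\s*RT\s+@", re.IGNORECASE)
# A letter followed by two or more copies of itself (three or more in total).
REPEAT_RE = re.compile(r"([^\W\d_])\1{2,}")


def normalise(text):
    """Lowercase, strip accents, and compress letter runs to two occurrences."""
    text = text.lower()
    decomposed = unicodedata.normalize("NFD", text)
    # Dropping combining marks removes accents (and turns "ñ" into "n").
    text = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return REPEAT_RE.sub(r"\1\1", text)


def preprocess(tweets):
    """Return the normalised text of the tweets that pass every filter."""
    kept = []
    for text in tweets:
        if RETWEET_RE.match(text) or URL_RE.search(text):
            continue  # discard retweets and tweets with links
        if len(text.split()) < MIN_TERMS:
            continue  # discard tweets with fewer than six terms
        kept.append(normalise(text))
    return kept
```

Stripping combining marks after NFD decomposition also maps "ñ" to "n", which is consistent with the unaccented forms (for example "espana", "cataluna") that appear in the topic terms.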
4. Tools for Psycholinguistic Characterization
4.1. Psycholinguistic Features of LIWC
- Openness: Clifford and Jerit [14] indicate in their study that, of the Big Five personality traits (Neuroticism, Extraversion, Openness, Agreeableness and Conscientiousness), openness and conscientiousness correlate with ideology. They use the work of Yarkoni [15] to match LIWC with openness through its correlations with the LIWC categories. The categories whose correlation with openness is at least 0.2 in absolute value (the threshold that gives Zhang and Counts the best results) are: total pronouns (-0.21), articles (0.2), time (-0.22), motion (-0.22) and grooming (-0.2), although the last of these has been removed from current versions of LIWC. We do not know whether these correlations hold for Spanish, so we accept this possible bias and propose future research similar to Yarkoni's for the Spanish language.
- Emotions and feelings: They use LIWC to measure positive and negative emotion in the content of the texts, as well as the prevalence of specific emotions such as anger and anxiety and of swear words (Maldec in the Spanish LIWC).
- Certain: The certain and tentative measures are also included.
- Bipartisanship: In addition to the measures selected from the literature, we include the categories ppron and they as indicators of bipartisanship, since we understand that continuous references to other parties reflect a use of language oriented towards argumentation built around an opponent.
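As an illustration of how category-level measures of this kind can be obtained, the sketch below computes, for a set of texts, the share of tokens that fall into each category of a word-to-category dictionary. It rests on several assumptions: `TOY_LEXICON` is a made-up stand-in and not the Spanish LIWC dictionary, `category_rates` is a hypothetical helper, tokens are split on whitespace, and the score is taken to be a simple proportion of all tokens, which the text does not confirm. Real LIWC dictionaries also contain wildcard stems, which this sketch ignores.

```python
from collections import Counter

# Hypothetical mini-lexicon for illustration only (not the Spanish LIWC dictionary).
TOY_LEXICON = {
    "ellos": {"TotPron", "PronPer", "Ellos"},
    "ellas": {"TotPron", "PronPer", "Ellos"},
    "nosotros": {"TotPron", "PronPer"},
    "el": {"Articulo"},
    "la": {"Articulo"},
    "los": {"Articulo"},
    "siempre": {"Certeza"},
    "quizas": {"Tentat"},
}


def category_rates(texts, lexicon):
    """Return, per category, the fraction of all tokens assigned to it."""
    counts = Counter()
    total = 0
    for text in texts:
        for token in text.split():
            total += 1
            # A token may belong to several categories at once.
            counts.update(lexicon.get(token, ()))
    if total == 0:
        return {}
    return {category: n / total for category, n in counts.items()}


# Usage: one call per party, over that party's preprocessed tweets.
rates = category_rates(
    ["ellos siempre dicen la verdad", "quizas nosotros no"], TOY_LEXICON
)
```

Running the same function over each party's tweets yields one value per category and party, which is the shape of a per-party comparison table.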
5. Topic Modeling
6. Experiments
6.1. Linguistic Inquiry and Word Count (LIWC)
6.2. Topic Modeling
7. Analysis
8. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Abbreviations
| PP | Partido Popular |
| PSOE | Partido Socialista Obrero Español |
| UPyD | Unión Progreso y Democracia |
| IU | Izquierda Unida |
| C’s | Ciudadanos |
| API | Application Programming Interface |
| LIWC | Linguistic Inquiry and Word Count |
| LDA | Latent Dirichlet Allocation |
| NLP | Natural Language Processing |
| TDT | Topic Detection and Tracking |
| PCA | Principal Component Analysis |
References
- Gutiérrez-Rubí, V. El silencio en política, 2016. [Online; 29 March 2016].
- Ferri-Fuentevilla, E.; Ruiz-Jiménez, A.M. Entre patria y estado: Formas de nombrar España. Un recorrido por los discursos programáticos de PSOE y AP-PP entre 1977 y 2011. Empiria. Revista de metodología de ciencias sociales 2015, 32, 63–84.
- Easton, D. A systems analysis of political life; Wiley: New York, 1967.
- Anderson, B.; Suárez, E.L. Comunidades imaginadas: Reflexiones sobre el origen y la difusión del nacionalismo; Fondo de Cultura Económica: México, 1993.
- González de la Fé, T. Sociología y Big Data. Encrucijadas. Revista crítica de ciencias sociales 2014, 8, 51–53.
- Martínez-Cámara, E.; Martín-Valdivia, M.T.; Ureña-López, L.A.; Montejo-Ráez, A.R. Sentiment analysis in Twitter. Natural Language Engineering 2014, 20, 1–28.
- Troyano Jiménez, J.A.; Ureña López, L.A.; Maña López, M.J.; Cruz Mata, F.; Enríquez de Salamanca Ros, F. AORESCU: Análisis de opinión en redes sociales y contenidos generados por usuarios. Procesamiento del Lenguaje Natural 2015, 55, 153–156.
- Vilares, D.; Alonso, M.A. A review on political analysis and social media. Procesamiento del Lenguaje Natural 2016, 56, 13–24.
- Guy, I.; Zwerdling, N.; Ronen, I.; Carmel, D.; Uziel, E. Social media recommendation based on people and tags. In Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2010, pp. 194–201.
- Bosworth, A.G.; Cox, C.; Sanghvi, R.; Ramakrishnan, T.S.; D’Angelo, A. Generating a feed of stories personalized for members of a social network, 2010. US Patent 7,827,208.
- Zhang, A.X.; Counts, S. Modeling Ideology and Predicting Policy Change with Social Media: Case of Same-Sex Marriage. In Proceedings of the CHI, 2015, pp. 2603–2612.
- Pennebaker, J.; Chung, C.; Ireland, M.; Gonzales, A.; Booth, R. The development and psychological properties of LIWC2007, 2014.
- Ramírez-Esparza, N.; Pennebaker, J.W.; García, F.A.; Suriá Martínez, R.; et al. La psicología del uso de las palabras: Un programa de computadora que analiza textos en español. Revista Mexicana de Psicología 2007, 24.
- Clifford, S.; Jerit, J. How words do the work of politics: Moral foundations theory and the debate over stem cell research. The Journal of Politics 2013, 75, 659–671.
- Yarkoni, T. Personality in 100,000 words: A large-scale analysis of personality and word use among bloggers. Journal of Research in Personality 2010, 44, 363–373.
- Graham, J.; Haidt, J.; Nosek, B.A. Liberals and conservatives rely on different sets of moral foundations. Journal of Personality and Social Psychology 2009, 96, 1029.
- Wallach, H.M. Topic modeling: Beyond bag-of-words. In Proceedings of the 23rd International Conference on Machine Learning. ACM, 2006, pp. 977–984.
- Blei, D.M.; Ng, A.Y.; Jordan, M.I. Latent Dirichlet allocation. Journal of Machine Learning Research 2003, 3, 993–1022.
- Cotelo, J.M.; Cruz, F.L.; Troyano, J.A. Dynamic topic-related tweet retrieval. Journal of the Association for Information Science and Technology 2014, 65, 513–523.
- Hofmann, T. Probabilistic latent semantic indexing. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 1999, pp. 50–57.
- Mikolov, T.; Chen, K.; Corrado, G.; Dean, J. Efficient Estimation of Word Representations in Vector Space. CoRR 2013, abs/1301.3781.
| 1 | Available online: https://onclusive.com/ (accessed on 23 October 2023) |
| 2 | Available online: https://metricool.com/ (accessed on 23 October 2023) |
| 3 | Available online: https://developer.twitter.com/en/docs/twitter-api (accessed on 23 October 2023) |
| 4 | Available online: https://www.elastic.co/logstash (accessed on 23 October 2023) |
| 5 | Available online: https://www.elastic.co/ (accessed on 23 October 2023) |
| 6 | Available online: http://mallet.cs.umass.edu/topics.php (accessed on 23 October 2023) |



| Days | Tweets | Data Size | Vocabulary Size |
|---|---|---|---|
| 28 | 5,530,927 tweets | 554 MB | 159,587 terms |

| Feature | LIWC | Spanish LIWC |
|---|---|---|
| Openness | total pronouns | TotPron, -0.21 |
| | articles | Articulo, 0.2 |
| | time | Tiempo, -0.22 |
| | motion | Movim, -0.22 |
| Emotions and feelings | affect | Afect |
| | posemo | EmoPos |
| | negemo | EmoNeg |
| | anger | Enfado |
| | sad | Triste |
| | anx | Ansiedad |
| | swear | Maldec |
| Certain | certain | Certeza |
| | tentative | Tentat |
| Bipartisanship | ppron | PronPer |
| | they | Ellos(Ellas) |

| | C’s | IU | Podemos | PP | PSOE | UPyD |
|---|---|---|---|---|---|---|
| Openness | | | | | | |
| TotPron | 0.164228 | 0.171207 | 0.179671 | 0.197391 | 0.191635 | 0.193254 |
| Articulo | 0.197265 | 0.194188 | 0.191063 | 0.206423 | 0.196706 | 0.201728 |
| Tiempo | 0.010781 | 0.014057 | 0.025757 | 0.029475 | 0.021154 | 0.048754 |
| Movim | 0.010859 | 0.022507 | 0.024291 | 0.030905 | 0.022793 | 0.016398 |
| Emotions and feelings | | | | | | |
| Afect | 0.050269 | 0.045250 | 0.045106 | 0.031084 | 0.040244 | 0.030375 |
| EmoPos | 0.032568 | 0.023533 | 0.028228 | 0.021474 | 0.031701 | 0.021791 |
| EmoNeg | 0.017975 | 0.024086 | 0.015245 | 0.009431 | 0.011898 | 0.007484 |
| Enfado | 0.012570 | 0.011135 | 0.008879 | 0.004839 | 0.005457 | 0.001816 |
| Triste | 0.003460 | 0.006712 | 0.002220 | 0.001553 | 0.002507 | 0.000666 |
| Ansiedad | 0.001163 | 0.002764 | 0.002639 | 0.001663 | 0.001427 | 0.001101 |
| Maldec | 0.000479 | 0.002685 | 0.001759 | 0.000880 | 0.000829 | 0.000165 |
| Certain | | | | | | |
| Certeza | 0.008230 | 0.017373 | 0.019936 | 0.026616 | 0.011107 | 0.011115 |
| Tentat | 0.031336 | 0.021322 | 0.036521 | 0.023852 | 0.035327 | 0.012766 |
| Bipartisanship | | | | | | |
| PronPer | 0.104321 | 0.109295 | 0.114629 | 0.124060 | 0.133593 | 0.115776 |
| Ellos | 0.031424 | 0.018874 | 0.024291 | 0.034837 | 0.021057 | 0.019534 |

| Topic | Spanish Terms | Translated Terms | Parties |
|---|---|---|---|
| 0 | pueblo ahora cataluna soberania pues gente nunca nacion mayor voto politico dinero catalunya cosas elecciones corrupcion | people now catalonia sovereignty so people never nation biggest vote political money catalonia things elections corruption | Podemos |
| 1 | historia independencia espana mejor independentista espanol hablar mismo referendum parte gran alguien lengua catalanes hoy hombre proceso | history independence spain best independentist spanish speak same referendum part great someone language catalans today man process | C’s and IU |
| 2 | historia bandera cataluna patria pais frente partidos usted golpe viva catalana leyes parece pagar verguenza mayoria mismo patriotas declaracion | history flag catalonia homeland country front parties you coup long_live catalan laws seems pay shame majority same patriots declaration | PP |
| 3 | presidente espana ninguno instrumentos juridicos objetivos alcance utilizara lograran garantizo nacion juntos democracia espanoles europea preservaremos avanzada fractura logrado | president spain none instruments legal objectives scope will_use will_achieve guaranteed nation together democracy spanish_people european will_preserve advanced fracture achieved | PP |
| 4 | cataluna independencia rajoy acabo manana deje prensa rueda financiar personalmente anunciar constitucion autogobierno autonomia golpistas emana suspendido espanola votar | catalonia independence rajoy finish tomorrow let press conference finance personally announce constitution self-government autonomy coup_plotters emanate suspended spanish voting | UPyD |
| 5 | pais constitucion cultura partido politica siempre espanola queremos anos personas psoe menos gobierno quieren espanoles quiero ley votar | country constitution culture party politics always spanish we_want years people psoe less government they_want spanish_people I_want law vote | PSOE and Podemos |
| 6 | hijos lider gusta historia segun acuerdo pregunten tdt espana reforma religion laico bienestar vida constitucional psoe | children leader like history according_to agreement ask tdt spain reform religion secular welfare life constitutional psoe | PSOE |
| 7 | pais contigo gente independencia cultura gracias cambio favor proteger dia menor abrazos grupos democracia disfrute participe ministro | country with_you people independence culture thanks change please protect day lower hugs groups democracy enjoyment participate minister | Podemos |
| 8 | izquierdas programa eliminar impuestos defendiendo laico educacion electricas concertada progresivos nacionalizar rajoy generales elecciones gobierno patriotas pueblo gane empenado cerrazon | left-wing programme eliminating taxes defending secular education electricity concerted progressive nationalise rajoy general elections government patriots people win insist bloody-mindedness | IU |
| 9 | anos democracia negros dictadura franco representa cuidemos valoremos murio himno pais empresas armas saudi trilero vendera arabia esplendor maximo historia | years democracy blacks dictatorship franco represents care value died anthem country companies weapons saudi swindler will_sell arabia splendour maximum history | C’s |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
