Anderson, R., Scala, C., Samuel, J., Kumar, V., & Jain, P. (2024). Are emotions conveyed across machine translations? Establishing an analytical process for the effectiveness of multilingual sentiment analysis with Italian text. Journal of Big Data and Artificial Intelligence, 2(1).
Abstract
Efficient alignment and generation of textual data distributions (TDD) are open research problems in textual analytics and NLP. It is presently difficult to parsimoniously and methodologically confirm that two or more natural language datasets belong to similar distributions, and to identify the extent to which textual data possess alignment. This study addresses a segment of this broader problem by applying multiple supervised and unsupervised machine learning (ML) methods to explore the behavior of TDD by (i) topical alignment and (ii) sentiment alignment. Furthermore, we use multiple text generation methods, including fine-tuned GPT-2, to generate text by topic and by sentiment. Finally, we develop a unique process-driven variation of the Kullback-Leibler divergence (KLD) applied to TDD, named KL Textual Distributions Contrasts (KL-TDC), to identify the alignment of machine-generated textual corpora with naturally occurring textual corpora. This study thus identifies a unique approach for generating and validating TDD by topic and sentiment, which can be used to help address sparse data problems and other research, practice, and classroom situations in need of artificially generated topic-aligned or sentiment-aligned textual data.
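To make the idea of contrasting corpora with KLD concrete, the following is a minimal sketch of a standard Kullback-Leibler divergence computed over smoothed unigram (word frequency) distributions of two small corpora. It is not the KL-TDC process itself, whose steps are not specified in this abstract. The function names, the whitespace tokenization, the add-alpha smoothing, and the toy sentences are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: naive lowercase whitespace tokenization, for illustration only.
    return text.lower().split()


def unigram_distribution(texts, vocabulary, alpha=1.0):
    # Add-alpha smoothing (an assumption) keeps every probability non-zero,
    # so the logarithm in the KLD is always defined.
    counts = Counter(tok for t in texts for tok in tokenize(t))
    total = sum(counts[w] for w in vocabulary) + alpha * len(vocabulary)
    return {w: (counts[w] + alpha) / total for w in vocabulary}


def kl_divergence(p, q):
    # Standard definition: D_KL(P || Q) = sum over w of P(w) * log(P(w) / Q(w)).
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def corpus_kld(natural, generated, alpha=1.0):
    # Build a shared vocabulary so both distributions have the same support.
    vocab = sorted({tok for t in natural + generated for tok in tokenize(t)})
    p = unigram_distribution(natural, vocab, alpha)
    q = unigram_distribution(generated, vocab, alpha)
    return kl_divergence(p, q)


# Hypothetical toy corpora, not data from the study.
natural = ["vaccine rollout begins today", "vaccine doses arrive"]
aligned = ["vaccine doses arrive today", "vaccine rollout begins"]
unaligned = ["stock market falls", "stock prices rise"]

kld_aligned = corpus_kld(natural, aligned)
kld_unaligned = corpus_kld(natural, unaligned)
print(kld_aligned, kld_unaligned)
```

Under these assumptions, a lower value suggests that the generated corpus uses words in proportions closer to the natural corpus. KLD is asymmetric, so swapping the two arguments generally gives a different value.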
Keywords
Textual data distributions; supervised learning; unsupervised learning; Kullback-Leibler divergence; sentiment; textual analytics; text generation; vaccine; stock market
Subject
Computer Science and Mathematics, Computer Science
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.