Search | Preprints.org

Preprint ARTICLE | doi:10.20944/preprints202403.0147.v1

A Method to Classify Texts Based on Sentiment Analysis and Machine Learning

Claudia Corona López, Jesus Urias Piña, Rafael Lahoz-Beltra

Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Sentiment analysis; text classification; machine learning

Online: 5 March 2024 (05:10:29 CET)

Preprint ARTICLE | doi:10.20944/preprints202310.0286.v1

Natural Language Processing-based Method for Clustering and Analysis of Movie Reviews and Classification by Genre

Fernando González, Miguel Torres-Ruiz, Guadalupe Rivera-Torruco, Liliana Chanona-Hernández, Rolando Quintero

Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Text document clustering; K-means; TF-IDF; NLP; Text vectorization; machine learning; movie reviews

Online: 5 October 2023 (11:57:13 CEST)

Preprint ARTICLE | doi:10.20944/preprints201802.0001.v1

Building a Domain Ontology in the Process of Linguistic Analysis of Text Resources

Nadezhda Yarushkina, Aleksey Filippov, Vadim Moshkin, Yuri Egorov

Subject: Computer Science And Mathematics, Computer Science Keywords: domain ontology; semantic analysis; linguistics, text resources

Online: 1 February 2018 (03:08:47 CET)

Preprint ARTICLE | doi:10.20944/preprints202103.0380.v1

Issues and Agendas of Pandemic Crisis Management: A Text Analysis of World Economic Forum COVID-19 Reports

Hyundong Nam, Taewoo Nam

Subject: Business, Economics And Management, Accounting And Taxation Keywords: COVID-19; pandemic crisis; crisis management; text mining; network analysis

Online: 15 March 2021 (12:34:01 CET)

Preprint ARTICLE | doi:10.20944/preprints202309.0744.v1

Social Aspects in Energy Research & Social Science Journal Publications for 2019-2023. Bibliometric Analysis

Boris Chigarev

Subject: Social Sciences, Library And Information Sciences Keywords: subjects of publications; bibliometric analysis; text clustering; energy transition; social issues

Online: 12 September 2023 (10:54:09 CEST)

Preprint ARTICLE | doi:10.20944/preprints202403.1431.v1

Online Review Analysis from a Customer Behavior Observation Perspective for Product Development

YeongUn Lee, SeungHyun Chung, JoonYoung Park

Subject: Business, Economics And Management, Business And Management Keywords: online review analysis; customer journey map; customer observation; text mining; customer behavior

Online: 25 March 2024 (08:18:31 CET)

Working Paper DATA DESCRIPTOR

A Large-Scale Tweet Dataset for Urdu Text Sentiment Analysis

Rakhi Batra, Zenun Kastrati, Ali Shariq Imran, Sher Muhammad Daudpota, Abdul Ghafoor

Subject: Computer Science And Mathematics, Algebra And Number Theory Keywords: Urdu Twitter Dataset; Urdu Natural language processing (NLP); Urdu text Sentiments and Emoticons

Online: 24 March 2021 (12:03:46 CET)

Preprint ARTICLE | doi:10.20944/preprints202403.1364.v1

Enhancing Sentiment Analysis with Term Sentiment Entropy: Capturing Nuanced Sentiment in Text Classification

Suttipong Klongdee, Jatsada Singthongchai

Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: sentiment analysis; term weighting; text classification; TF-IDF; TFRF; natural language processing

Online: 22 March 2024 (14:35:34 CET)

Preprint ARTICLE | doi:10.20944/preprints202402.0693.v1

SFT For Improved Text-to-SQL Translation

Ankit Agrahari, Puneet Kumar Ojha, Abhishek Gautam, Parikshit Singh

Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Text-to-sql

Online: 13 February 2024 (03:07:37 CET)

Preprint ARTICLE | doi:10.20944/preprints202311.1241.v1

The Role of the Energy Sector in Contributing to Sustainability Development Goals: A Text Mining Analysis of Literature

Luísa Cagica Carvalho, Márcia R. C. Santos

Subject: Business, Economics And Management, Business And Management Keywords: Energy Sector; Circular Economy; Sustainable Development Goals; SDG; Text Mining; VOSviewer

Online: 20 November 2023 (13:56:34 CET)

Preprint ARTICLE | doi:10.20944/preprints202111.0208.v1

Patent Analysis Using Vector Space Model and Deep Learning Model: A Case of Artificial Intelligence Industry Technology

Yongmin Yoo, Dongjin Lim, Kyungsun Kim

Subject: Computer Science And Mathematics, Information Systems Keywords: Technology analysis; Trend analysis; Patent keyword analysis; Text mining; Natural language processing

Online: 10 November 2021 (15:25:21 CET)

Preprint ARTICLE | doi:10.20944/preprints202309.1664.v1

Analysis of Price Influencing Factors of Small-Scale Agricultural Products Based on Network Perspective

Lifang Fu, Huaxu Zhang

Subject: Business, Economics And Management, Econometrics And Statistics Keywords: small-scale agricultural products; price affecting factors; text analysis; sentiment analysis; TVP-VAR model

Online: 26 September 2023 (08:16:34 CEST)

Preprint REVIEW | doi:10.20944/preprints202304.1035.v1

A Review on Deriving Maintenance of Accommodations Via Text-based Feedback Analysis

Tharindu Wickramasinghe, Pumudu Fernando

Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Machine Learning; Classification; Natural Language Processing; Text-based Customer Review Analysis; Sentiment Analysis; Deep Learning

Online: 27 April 2023 (04:30:17 CEST)

Preprint REVIEW | doi:10.20944/preprints202010.0649.v2

Modern Clinical Text Mining: A Guide and Review

Bethany Percha

Subject: Computer Science And Mathematics, Information Systems Keywords: text mining; natural language processing; electronic health records; clinical text; machine learning

Online: 3 February 2021 (10:31:14 CET)

Preprint ARTICLE | doi:10.20944/preprints202305.0007.v1

An Advanced Approach on Enhancing Accommodation Maintenance Via Text-Based Customer Review Analysis

Tharindu Wickramasinghe, Pumudu Fernando

Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Machine Learning; Classification; Natural Language Processing; Text-based Customer Review Analysis; Sentiment Analysis; Deep Learning

Online: 1 May 2023 (03:21:05 CEST)

Preprint ARTICLE | doi:10.20944/preprints202403.1850.v1

Using Communicative Analysis to Evaluate Engagement in a Community-Based, Expert-Led Public-Health Intervention

Thomas Barker, Heather Allen, Karen Fulton, Nienke Klaver, Lori Motluk, Tanya Osborne, Edward Staples

Subject: Public Health And Healthcare, Primary Health Care Keywords: communicative analysis, communicative intervention, public health, alcohol use disorder, formative evaluation, summative evaluation, text data analytics

Online: 29 March 2024 (13:18:04 CET)

The focus of this article is the evaluation of the quality and the degree of community engagement in an expert-based, public-health communicative intervention for Alcohol Use Disorder (AUD). In 2023 the Canadian Alcohol Use Disorder Society collaborated with a grass-roots community health group, a regional health authority, and an academic participatory researcher to organize a working group to mount a social-marketing campaign to increase community support for AUD medical treatment in family-practice settings in a rural community in British Columbia. The partnership working group conducted a series of activities in the school system, the town council, and the community. This article covers the formative evaluation and the summative evaluation of the working group's effectiveness. The formative evaluation ("lessons learned" and "recommendations") consists of consensus evaluations of the quality of each activity using a variety of data sources. The formative evaluation followed improvement science methods and used consensus meetings to provide and reflect on formal and informal feedback. Formative results reflect the consensus of the participants in the working group and they record the reactions in the community. The summative evaluation measures evidence of community engagement found in transcribed text from meeting minutes. The summative evaluation used theory-based frameworks and text data visualization to assess the engagement dynamics of the working group. The summative evaluation helps to put the formative evaluation into perspective as communicative action. The summative evaluation validates the formative evaluation by revealing the communicative dynamics of engagement which resulted in the community’s increased capacity to understand and support Alcohol Use Disorder treatment options. The working group and the entire intervention were seen by the participants and the public as a success and capable of replication.
In a small community, it instilled new understandings and attitudes, bringing an emerging awareness of medication as an optional health mitigation strategy for Alcohol Use Disorder. Our study furthers this overall assessment by using theoretical constructs and analysis of communicative records to trace engagement characteristics of interest to researchers, societies, and community groups contemplating similar interventions in the future. The study also validates the effectiveness of communicative analysis as a measure of social-marketing for expert-based public health interventions.

Preprint ARTICLE | doi:10.20944/preprints202203.0329.v1

Plagiarism Detection in the Bengali Language: A Text Similarity-Based Approach

Satyajit Ghosh, Aniruddha Ghosh, Bittaswer Ghosh, Abhishek Roy

Subject: Computer Science And Mathematics, Analysis Keywords: Plagiarism Detection; Plagiarism checker for Bengali text; Bengali Literature Corpus; OCR in Bengali text

Online: 24 March 2022 (09:36:56 CET)

Preprint COMMUNICATION | doi:10.20944/preprints202309.1969.v1

A Comprehensive Analysis of the Public Discourse on Twitter about Exoskeletons from 2017 to 2023

Nirmalya Thakur, Kesha A. Patel, Audrey Poon, Rishika Shah, Nazif Azizi, Changhee Han

Subject: Computer Science And Mathematics, Computer Science Keywords: Twitter; Data Analysis; Big Data; Exoskeletons; Data Science; Text Analysis; Sentiment Analysis; Content Analysis; Natural Language Processing

Online: 28 September 2023 (13:25:30 CEST)

This paper presents multiple novel findings from a comprehensive analysis of about 150,000 tweets about exoskeletons posted between May 2017 and May 2023. First, findings from content analysis and temporal analysis of these tweets reveal the specific months per year when a significantly higher volume of tweets was posted and the time windows when the highest number of tweets, the lowest number of tweets, tweets with the highest number of hashtags, and tweets with the highest number of user mentions were posted. Second, the paper shows that there are statistically significant correlations between the number of tweets posted per hour and different characteristics of these tweets. Third, the paper presents a multiple linear regression model to predict the number of tweets posted per hour in terms of these characteristics of tweets. The R² score of this model was observed to be 0.9540. Fourth, the paper reports that the 10 most popular hashtags were #exoskeleton, #robotics, #iot, #technology, #tech, #innovation, #ai, #sci, #construction, and #news. Fifth, sentiment analysis of these tweets was performed using VADER and the DistilRoBERTa-base library. The results show that the percentages of positive, neutral, and negative tweets were 46.8%, 33.1%, and 20.1%, respectively. The results also show that in the tweets that did not express a neutral sentiment, the sentiment of surprise was the most common sentiment. It was followed by the sentiments of joy, disgust, sadness, fear, and anger. Furthermore, analysis of hashtag-specific sentiments revealed several novel insights; for instance, for almost all the months in 2022, the usage of #ai in tweets about exoskeletons was mainly associated with a positive sentiment. Sixth, text processing-based approaches were used to detect possibly sarcastic tweets and tweets that contained news.
Finally, a comparison of positive tweets, negative tweets, neutral tweets, possibly sarcastic tweets, and tweets that contained news, in terms of different characteristic properties of these tweets, is presented. The findings reveal multiple novel insights; for instance, the average number of hashtags used in tweets that contained news has considerably increased since January 2022.

Preprint ARTICLE | doi:10.20944/preprints202401.1927.v1

Changes in Social Media Big Data on Healing Forests: A Time-series Analysis on the Use Behavior of Healing Forests Before and After the COVID-19 Pandemic in South Korea

Sangwook Kim, Juyeong Youn

Subject: Environmental And Earth Sciences, Sustainable Science And Technology Keywords: healing forest; COVID-19; text mining; network analysis; quadratic assignment; procedure correlation analysis

Online: 26 January 2024 (13:41:21 CET)

Preprint ARTICLE | doi:10.20944/preprints202306.2163.v1

The Impact of Digital Media on Event-Related Perception

Stefano Calabrese

Subject: Arts And Humanities, Literature And Literary Theory Keywords: Event; transmedia storytelling; action; expanded text

Online: 30 June 2023 (03:08:15 CEST)

Preprint ARTICLE | doi:10.20944/preprints202110.0033.v1

KOMBAT: Knowledgebase of Microbes’ Battling Agents for Therapeutics

Anasuya Bhargav, Srijanee Gupta, Surabhi Seth, Sweety James, Firdaus Fatima, Pratibha Chaurasia, Srinivasan Ramachandran

Subject: Biology And Life Sciences, Immunology And Microbiology Keywords: Antibiotic resistance; text mining; therapy; database

Online: 4 October 2021 (08:58:52 CEST)

Preprint ARTICLE | doi:10.20944/preprints202011.0646.v1

Online Multilingual Hate Speech Detection: Experimenting with Hindi and English Social Media

Neeraj Vashistha, Arkaitz Zubiaga

Subject: Computer Science And Mathematics, Computer Science Keywords: social media; hate speech; text classification

Online: 25 November 2020 (14:12:07 CET)

Preprint ARTICLE | doi:10.20944/preprints201610.0012.v1

Bio-Resource Exchange: Study of Prevalence of Antibody Donation and Development of a Web Portal to Facilitate it

Sandeep Subramanian, Madhavi Ganapathiraju

Subject: Biology And Life Sciences, Biochemistry And Molecular Biology Keywords: data exchange; resource donations; text mining

Online: 5 October 2016 (15:08:32 CEST)

Preprint ARTICLE | doi:10.20944/preprints202304.0907.v1

Vocational Domain Identification with Machine Learning and Natural Language Processing on Wikipedia Text: Error Analysis and Class Balancing

Maria Nefeli Nikiforos, Konstantina Deliveri, Katia Lida Kermanidis, Adamantia Pateli

Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Natural Language Processing; Social Text Mining; Machine Learning; Vocational Domain Identification; Vocational Language; Error Analysis; Class Balancing

Online: 25 April 2023 (09:29:07 CEST)

Preprint ARTICLE | doi:10.20944/preprints202204.0303.v2

Linguistic Markers of Intercultural Competence in Student Blogs

Hilde Hanegreefs, Mark Pluymaekers, Ankie Hoefnagels

Subject: Social Sciences, Language And Linguistics Keywords: Blogging; intercultural competence; international learning outcomes; reflective writing; reflection; text analysis; text mining; psycholinguistics; linguistic markers

Online: 8 March 2023 (10:07:17 CET)

Preprint ARTICLE | doi:10.20944/preprints201811.0149.v1

A Mathematical Analysis of Maria Valtorta’s Mystical Writings

Emilio Matricciani, Liberato De Caro

Subject: Arts And Humanities, Religious Studies Keywords: Confidence tests, dictations, Jesus Christ, Maria Valtorta, mystics, punctuation marks, readability index, sentences, semantic index, syntactic index, text characters, Virgin Mary, visions, words, word interval.

Online: 7 November 2018 (09:06:01 CET)

Preprint ARTICLE | doi:10.20944/preprints202008.0033.v1

Detecting Suspicious Texts Using Machine Learning Techniques

Omar Sharif, Mohammed Moshiul Hoque, A. S. M. Kayes, Raza Nowrozy, Iqbal H. Sarker

Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Natural Language Processing; Suspicious Text Detection; Bengali Language Processing; Machine Learning; Text Classification; Feature Extraction; Suspicious Corpora

Online: 2 August 2020 (14:38:13 CEST)

Preprint ARTICLE | doi:10.20944/preprints202212.0478.v1

Implementing Computer Vision Techniques to Recognize American Sign Language (ASL) Hand Signals

Tauheed Khan Mohd, Alvaro Martin Grande, Rodrigo E. Ayala, Stuart Isteefano

Subject: Computer Science And Mathematics, Information Systems Keywords: Datasets, Neural Networks, Hand Detection, Text Tagging

Online: 26 December 2022 (07:30:24 CET)

Preprint ARTICLE | doi:10.20944/preprints202211.0017.v1

Performance Comparison of TTS Models for Brazilian Portuguese to Establish a Baseline

Wilmer Lobato, Felipe Farias, William Cruz, Marcellus Amadeus

Subject: Computer Science And Mathematics, Computer Science Keywords: text-to-speech; naturalness; intelligibility; Brazilian Portuguese

Online: 1 November 2022 (04:37:04 CET)

Working Paper ARTICLE

DASTEX: a New Readability Formula based on Semantic Complexity of Text

Mohammad Reza Besharati, Mohammad Izadi

Subject: Computer Science And Mathematics, Algebra And Number Theory Keywords: Semantic Complexity; Semantics; Text Complexity; Readability Formulae

Online: 6 September 2021 (13:33:34 CEST)

Preprint ARTICLE | doi:10.20944/preprints202010.0057.v1

Automatic electronic invoice classification using machine learning models

Chiara Bardelli, Alessandro Rondinelli, Ruggero Vecchio, Silvia Figini

Subject: Business, Economics And Management, Accounting And Taxation Keywords: multiclass classification; text mining; accounting control system

Online: 5 October 2020 (09:05:53 CEST)

Preprint ARTICLE | doi:10.20944/preprints202402.0332.v1

Switching Self-Attention Text Classification Model with Innovative Reverse Positional Encoding for Right-To-Left Languages: A Focus on Arabic Dialects

Laith H. Baniata, Sangwoo Kang

Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Switching Self-Attention; Reverse Positional Encoding (RPE) method; Text Classification (SA); Right-to-Left Text; five-polarity; ITL

Online: 6 February 2024 (05:20:40 CET)

Preprint ARTICLE | doi:10.20944/preprints202404.0153.v1

Trend Research of Maritime Autonomous Surface Ships (MASS) based on Shipboard Electronics: Focusing on Text Mining and Network Analysis

Jinsick Kim, Sungwon Han, Hyeyoung Lee, Byeongsoo Koo, Moonju Nam, Kukjin Jang, Jooyeoun Lee, Myoungsug Chung

Subject: Engineering, Transportation Science And Technology Keywords: Maritime Autonomous Surface Ships (MASS), Research trends, Text Mining, Latent Dirichlet allocation (LDA) topic modeling, Shipboard Electronics, Shipping industry

Online: 2 April 2024 (16:16:15 CEST)

Preprint ARTICLE | doi:10.20944/preprints202307.1177.v1

Machine Learning Analysis of Social Media Posts Written in Natural Language May Enhance Suicidal Ideation Detection in Romanian Adults with Major Depression

Eduard P. Drima MD, PhD

Subject: Medicine And Pharmacology, Psychiatry And Mental Health Keywords: Suicidal Ideation, major depression in adults, natural language written texts, Romanian depression support forum, machine learning text mining

Online: 18 July 2023 (07:44:06 CEST)

Preprint ARTICLE | doi:10.20944/preprints202404.1073.v1

The Effect of Ambient Illumination and Text Color on Visual Fatigue Under Negative Polarity

Qiangqiang Fan, Jinhan Xie, Yang Wang, Zhaoyang Dong

Subject: Engineering, Industrial And Manufacturing Engineering Keywords: visual fatigue; ambient illumination; Negative polarity; text color

Online: 17 April 2024 (11:26:44 CEST)

Preprint ARTICLE | doi:10.20944/preprints202307.0462.v1

WHORU: Improving Abstractive Dialogue Summarization with Personal Pronoun Resolution

Tingting Zhou

Subject: Computer Science And Mathematics, Computer Science Keywords: text summarization; abstractive dialogue summarization; personal pronoun resolution

Online: 7 July 2023 (10:15:01 CEST)

Preprint ARTICLE | doi:10.20944/preprints202305.1326.v1

Image E-book Guidance for Improving Urinary Catheter Discomfort, Self-Efficacy, and Pain in Postoperative Patients

Hsin-Shu Huang, Hsin-Yuan Fang

Subject: Biology And Life Sciences, Behavioral Sciences Keywords: image e-book guidance; text guidance; self-efficacy

Online: 18 May 2023 (10:23:37 CEST)

Preprint ARTICLE | doi:10.20944/preprints202304.0935.v1

A Study on Generating Webtoons using Multilingual Text-to-Image Models

Kyungho Yu, Hyungho Ju, Jeongin Kim, Chanjun Chun, Pankoo Kim

Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Multilingual BERT; Text-to-image; DCGAN; Webtoon; GAN

Online: 26 April 2023 (03:16:07 CEST)

Preprint ARTICLE | doi:10.20944/preprints202105.0601.v1

A Study on Ways to Improve Mobile RPG Using Big Data Text Mining

DongHyun Youm, JungYoon Kim

Subject: Computer Science And Mathematics, Algebra And Number Theory Keywords: Mobile RPG; Big Data; Text Mining; Topic Modeling

Online: 25 May 2021 (10:21:36 CEST)

Preprint ARTICLE | doi:10.20944/preprints202102.0120.v1

Integrating Text Mining and Balanced Scorecard Techniques to Investigate the Association between CEO Message of Homepage Words and Financial Status: Emphasis on Hospitals

Hyung Jong Na, Kun Chang Lee, Sung Tae Kim

Subject: Business, Economics And Management, Business And Management Keywords: Homepage words; Financial ratio; Text-mining; Balanced scorecard

Online: 3 February 2021 (15:07:40 CET)

(1) Background: The CEO message on a hospital homepage contains various content, such as the hospital's future vision, promises to customers, upgraded services, and public activities. The CEO's message on the homepage includes non-financial as well as financial information about the organization. It also provides useful information not only about the company's goals and vision but also about firm performance and strategies for the future. This study aims to investigate associations between the CEO's message on hospital homepages and financial status. We used the balanced scorecard framework to analyze what content on the hospital's homepage is related to the hospital's various financial ratios. (2) Methods: We adopt a text mining method to extract significantly repeated keywords from the CEO's message on hospital websites, and we classify these keywords using the balanced scorecard framework. To examine the relationship between keywords in the CEO's message on the hospital homepage and the hospital's financial ratios, a t-test is conducted on the difference in the mean TF-IDF (Term Frequency-Inverse Document Frequency) of the homepage contents and its relationship with the perspectives of the balanced scorecard framework. (3) Results: According to empirical results on 65 samples collected from local hospitals, there are some significant relationships between the qualitative content of the hospital's homepage and the quantitative financial ratios that indicate profitability, activity, leverage, liquidity, and transfer to essential business fund (EBF) income. (4) Conclusions: The introduction section of a homepage is the most accessible to customers, containing the aims and ideals of hospitals and reflecting their values and visions [1]. In addition, in view of their financial status, hospitals can either emphasize financial strength or focus on other areas to mask weaknesses in financial information.
This study reminds us of the importance of disclosure on hospital websites, which can be inferred from the financial status of the hospital. It also highlights the need for harmonization between quantitative data (financial statements) and qualitative data (CEO's messages). (5) Implications: To the best of our knowledge, this paper is the first to investigate the relation between the text of hospital homepages and hospitals' financial ratios using text mining and the balanced scorecard framework. Hospitals play a crucial part in a country's welfare and healthcare backbone industry. Nevertheless, in many countries, hospital organization sectors tend to remain a source of critical fiscal deficits due to ineffective and sloppy management. We expect that the results of this paper can provide hospital managers with useful information.
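As an illustration of the TF-IDF weighting mentioned in the Methods above, the following is a minimal sketch, not the authors' implementation. It assumes one common variant (relative term frequency multiplied by the natural log of N/df); the abstract does not state which variant the study used. The function name `tf_idf` and the toy messages are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: weight = (count / doc length) * ln(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical toy "CEO messages"; not data from the study.
messages = [
    "patient care and patient trust".split(),
    "financial growth and investment".split(),
    "community care and service".split(),
]
weights = tf_idf(messages)
```

A term that appears in every message (here "and") receives weight zero, while a term concentrated in one message (here "patient") is weighted most heavily. Per-group means of such weights are the kind of quantity a t-test could then compare.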

Preprint BRIEF REPORT | doi:10.20944/preprints201811.0527.v1

Mapping the Literature on Nutritional Interventions in Cognitive Health: A Data-Driven Approach

Erin I. Walsh, Nicolas Cherbuin

Subject: Biology And Life Sciences, Food Science And Technology Keywords: citation network analysis; text mining; nutrition intervention; cognition

Online: 21 November 2018 (13:50:28 CET)

Preprint ARTICLE | doi:10.20944/preprints201811.0206.v1

Towards identifying author confidence in biomedical articles

Daniela Gifu, Mihaela Onofrei, Diana Trandabat

Subject: Computer Science And Mathematics, Other Keywords: Biomedical libraries; author’s confidence; writing styles; text analysis

Online: 8 November 2018 (11:01:24 CET)

Preprint ARTICLE | doi:10.20944/preprints201810.0338.v1

Improving the Accuracy in Text Classification Methodology in Light of Modelling the Latent Semantic Relations

Nina Rizun, Yurii Taranenko, Wojciech Waloszek

Subject: Computer Science And Mathematics, Information Systems Keywords: text classification; topic modelling; latent semantic analysis; latent dirichlet allocation; hierarchical sentiment dictionary; contextually-oriented hierarchical corpus; text tonality; evaluation

Online: 16 October 2018 (07:55:35 CEST)

Preprint ARTICLE | doi:10.20944/preprints202208.0451.v1

Employing a Multilingual Transformer Model for Segmenting Unpunctuated Arabic Text

Abdullah M. Alshanqiti, Sami Albouq, Ahmad B. Alkhodre, Abdallah Namoun, Emad Nabil

Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: text splitting; text tokenization; transfer learning; mask-fill prediction; NLP linguistic rules; missing punctuations; cross-lingual BERT model; Masked Language Modeling

Online: 26 August 2022 (05:19:39 CEST)

Preprint ARTICLE | doi:10.20944/preprints202305.0990.v1

Research on Multilingual News Clustering Based on Cross-Language Word Embeddings

Lin Wu, Rui Li, Wong-Hing Lam

Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: news; cross-language word embedding; LDA model; text clustering

Online: 15 May 2023 (07:29:01 CEST)

Working Paper ARTICLE

WATS-SMS: A T5-based French Wikipedia Abstractive Text Summarizer for SMS

Jean Louis Ebongue Kedieng Fendji, Désiré Manuel Taira, Marcellin Atemkeng, Adam Musa Ali

Subject: Computer Science And Mathematics, Algebra And Number Theory Keywords: Text summarization; Fine-tuning; Transformers; SMS; Gateway; French Wikipedia.

Online: 14 September 2021 (10:48:55 CEST)

Working Paper ARTICLE

Learning by Injection: Attention Embedded Recurrent Neural Network for Amharic Text-image Recognition

Birhanu Belay, Tewodros Habtegebrial, Gebeyehu Belay, Million Mesheshsa, Marcus Liwicki, Didier Stricker

Subject: Computer Science And Mathematics, Algebra And Number Theory Keywords: Amharic script; Attention mechanism; OCR; Encoder-decoder; Text-image

Online: 15 October 2020 (13:42:28 CEST)

Preprint ARTICLE | doi:10.20944/preprints201812.0306.v1

Scene to Text Conversion and a Cymatics Based Configurable Text Perception

Saeed Mian Qaisa

Subject: Engineering, Electrical And Electronic Engineering Keywords: cymatics; text detection and recognition; optical character recognition (OCR)

Online: 25 December 2018 (13:52:31 CET)

Preprint ARTICLE | doi:10.20944/preprints202404.0744.v1

Analyzing the Possibilities of Using the Scilit Platform to Identify Current Energy Efficiency and Conservation Issues

Boris Chigarev

Subject: Engineering, Energy And Fuel Technology Keywords: energy efficiency, energy conservation, Scilit, bibliometric record analysis, text clustering

Online: 10 April 2024 (14:57:33 CEST)

Purpose of publication: - Preparation of bibliometric data exported from the Scilit platform on energy efficiency and conservation for further analysis to identify relevant research topics. - Identification of potential issues in the processing of data exported from the Scilit platform. - Providing colleagues with the opportunity to use the prepared data and examples of their analysis for independent research on topical issues of energy efficiency and energy conservation using materials provided by the Scilit platform. Research Materials: Files in CSV and RIS formats exported from Scilit for the query "energy conservation OR efficiency" in Common Fields [Title, Abstract, Keyword], using filters: Content Type → Journal Article; Year → 2021-2023; Subject → Industrial Engineering (29.8K), Energy and Fuel Technology (9.8K), Manufacturing Engineering (9.2K). A total of 30K records sorted by their relevance (10K for each year) were exported. Data are current as of 14-03-2024. Methods: Preprocessing of title, annotation, and keyword field texts using lemmatization dictionaries collected on GitHub, removal of keywords taken from GATE and spaCy, and "manual" editing. Using VOSviewer to analyze publication topics by clustering keywords based on their co-occurrence. Using Scimago Graphica to build bubble diagrams. Application of the GSDMM algorithm for clustering bibliometric records by title and annotation texts. Creation of a dictionary for this algorithm using the keyword field. Use of the Carrot2 demo version and the NMF algorithm for a more detailed analysis of the topics of the record clusters obtained from GSDMM. Results: Presented in the form of initial and interim tables and graphs obtained in the course of this study. The full tables are provided as references to the attached materials. Supplementary material for this preprint on figshare: Chigarev, Boris (2024).
Supplementary material for preprint "Analyzing the Possibilities of Using the Scilit Platform to Identify Current Energy Efficiency and Conservation Issues". figshare. Dataset. https://doi.org/10.6084/m9.figshare.25574058.v1
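The keyword co-occurrence idea mentioned in the Methods above can be sketched as follows. This is only the raw pair-counting step under simple assumptions (case-insensitive matching, each pair counted once per record); it is not VOSviewer's normalization or clustering algorithm, and the function name `cooccurrence` and the toy records are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(records):
    """Count how often each unordered keyword pair shares a record."""
    pairs = Counter()
    for keywords in records:
        # Normalize, deduplicate, and sort so (a, b) and (b, a) share a key.
        unique = sorted({kw.strip().lower() for kw in keywords if kw.strip()})
        pairs.update(combinations(unique, 2))
    return pairs

# Hypothetical keyword fields of three bibliometric records.
records = [
    ["energy efficiency", "energy conservation", "text clustering"],
    ["Energy Efficiency", "text clustering"],
    ["energy conservation", "energy efficiency"],
]
counts = cooccurrence(records)
```

Pairs with high counts are candidates for belonging to the same topic cluster; a clustering tool would typically normalize these counts before grouping keywords.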

Preprint REVIEW | doi:10.20944/preprints202403.0064.v1

Enabling Public Security Text-Based Analytics: A Survey to Outline Research Directions

Victor Diogho Heuer De Carvalho, Robério José Rogério Dos Santos, Thyago Celso Cavalcante Nepomuceno, Thiago Poleto

Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Text mining; Public security; Survey; Applications; Opportunities; Future Research Directions

Online: 1 March 2024 (18:30:13 CET)

Preprint ARTICLE | doi:10.20944/preprints202312.0093.v1

Analytical Experimentations with Midjourney Architectural Virtual Lab: Defining Some Major Current Limits in AI-generated Representations of Islamic Architectural Heritage

Ahmad Sukkar, Mohamed W. Fareed, Moohammed Wasim Yahia, Salem Buhashima Abdalla, Iman Ibrahim, Khaldoun Abdul Karim Senjab

Subject: Engineering, Architecture, Building And Construction Keywords: Islamic architecture; architectural visualization; intangible heritage; text-to-image generation

Online: 1 December 2023 (21:29:02 CET)

Preprint ARTICLE | doi:10.20944/preprints202306.0642.v1

Adverse Crosstalk Between Extracellular Matrix Remodeling and Ferroptosis in Basal Breast Cancer

Christophe Desterke, Emma Cosialls, Yao Xiang, Rima Elhage, Clémence Duruel, Yunhua Chang-Marchand, Ahmed Hamaï

Subject: Biology And Life Sciences, Cell And Developmental Biology Keywords: basal breast cancer; extracellular matrix remodeling; ferroptosis; transcriptome; text mining

Online: 8 June 2023 (11:21:03 CEST)

(1) Background: Breast cancer is a frequent, heterogeneous disease diagnosed in women and a major cause of mortality among them because of rapid metastasis and disease recurrence. Ferroptosis can inhibit breast cancer cell growth, improve sensitivity to chemotherapy and radiotherapy, and inhibit distant metastases, so it potentially acts on the tumor microenvironment; (2) Methods: Ferroptosis/extracellular matrix remodeling literature text-mining results were integrated into a breast cancer transcriptome cohort according to distant relapse-free survival (DRFS) under adjuvant therapy (anthracycline + taxanes), and also into MDA-MB-231 transcriptome functional experiments with ferroptosis activation (GSE173905); (3) Results: Ferroptosis/extracellular matrix remodeling text mining identified 910 associated genes in at least 10 articles. Univariate Cox analyses censored on breast cancer (GSE25066) selected 252 individually significant genes, 171 of which showed adverse expression. Functional enrichment of these 171 adverse genes predicted basal breast cancer signatures. In the text-mining results, some of the selected significant adverse ferroptosis genes shared citations with the domain of ECM remodeling, such as TNF, IL6, SET, CDKN2A, EGFR, HMGB1, KRAS, MET, LCN2, HIF1A, and TLR4. A molecular score based on the expression of these eleven genes was predictive of worse-prognosis breast cancer at the univariate level: basal subtype, short DRFS, high grades 3 and 4, negative estrogen and progesterone receptors, and nodal stages 2 and 3. This eleven-gene signature was validated as regulated by ferroptosis inducers (erastin and RSL3) in the triple-negative breast cancer cell model MDA-MB-231; (4) Conclusions: Crosstalk between ECM remodeling and ferroptosis functionalities made it possible to define a molecular score that was characterized as an independent adverse parameter in the prognosis of breast cancer patients. The gene signature of this molecular score was validated as regulated by the ferroptosis activators erastin and RSL3. This molecular score could be promising for evaluating the ECM impact of ferroptosis-targeting therapies in breast cancer.

Preprint ARTICLE | doi:10.20944/preprints202210.0247.v1

The New Version of the Anddigest Tool with Improved AI-Based Short Names Recognition

Timofey V Ivanisenko, Pavel S Demenkov, Nikolay A. Kolchanov, Vladimir A. Ivanisenko

Subject: Social Sciences, Library And Information Sciences Keywords: Text-mining; ANDDigest; ANDSystem; Named entity recognition; Machine learning; PubMedBERT

Online: 18 October 2022 (04:29:17 CEST)

Preprint ARTICLE | doi:10.20944/preprints202106.0482.v3

Fighting the COVID-19 Infodemic in News articles and False Publications: The NeoNet Text Classifier, a Supervised Machine Learning Algorithm

Mohammad AR Abdeen, Ahmed Abdeen Hamed, Xindong Wu

Subject: Computer Science And Mathematics, Algebra And Number Theory Keywords: COVID-19 Infodemic; Text Classification; TFIDF Features; Network Training modes; Supervised Learning; Misinformation; News Classification; False Publications; PubMed; Anomaly Detection

Online: 26 July 2021 (12:06:04 CEST)

Preprint ARTICLE | doi:10.20944/preprints202103.0738.v1

Impact of the Coronavirus Pandemic on Science and Society: Insights from Temporal Bibliometric Networks

Ramya Gupta, Abhishek Prasad, Suresh Babu, Gitanjali Yadav

Subject: Computer Science And Mathematics, Analysis Keywords: bibliometry; coronavirus; text and data mining; SARS; MERS; COVID-19

Online: 31 March 2021 (17:30:56 CEST)

Preprint ARTICLE | doi:10.20944/preprints201809.0466.v1

Topological Signature of 19th Century Novelists: Persistence Homology in Context-Free Text Mining

Shafie Gholizadeh, Armin Seyeditabari, Wlodek Zadrozny

Subject: Computer Science And Mathematics, Information Systems Keywords: topological data analysis; text mining; computational topology; style; persistent homology

Online: 24 September 2018 (15:33:02 CEST)

Preprint ARTICLE | doi:10.20944/preprints202404.0563.v1

Development of Context-based Sentiment Classification for Intelligent Stock Market Prediction

Nurmaganbet Smatov, Ruslan Kalashnikov, Amandyk Kartbayev

Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: sentiment analysis; neural networks; stock price prediction; text-mining; deep learning.

Online: 8 April 2024 (15:45:25 CEST)

Preprint ARTICLE | doi:10.20944/preprints202402.1432.v1

Differences in CEO Communication Strategies between High and Low Performing Firms in the Global Auto Parts Industry

Yunseok Hong, Keuntae Cho

Subject: Business, Economics And Management, Business And Management Keywords: CEO communication; innovation management; network analysis; text mining; auto parts industry

Online: 26 February 2024 (12:59:38 CET)

Preprint ARTICLE | doi:10.20944/preprints202311.0818.v1

Transformer Text Classification Model for Arabic Dialects that Utilizes Inductive Transfer

Laith H. Baniata, Sangwoo Kang

Subject: Computer Science And Mathematics, Computer Science Keywords: transformer; inductive transfer; text classification; Arabic dialects; positional encoding; 5-polarity

Online: 13 November 2023 (12:11:04 CET)

Preprint ARTICLE | doi:10.20944/preprints202309.1106.v1

Research on Safety Risk Transfer in Subway Construction Based on Text Mining and Complex Networks

Kunpeng Wu, Jianshe Zhang, Yanlong Huang, Hui Wang, Hujun Li, Huihua Chen

Subject: Engineering, Architecture, Building And Construction Keywords: Text mining; apriori algorithm; complex network model; subway construction; risk transfer

Online: 18 September 2023 (12:58:50 CEST)

Subway construction often takes place in a complex natural and human-machine operating environment, and that complicated setting makes subway construction more prone to safety accidents, which can cause substantial casualties and monetary losses. Thus, it is necessary to investigate the safety risks of subway construction. The existing literature on the identification and assessment of subway construction safety risks (SCSR) is susceptible to the influence of subjective factors. Moreover, although existing studies have explored the interrelationships between different risks, they usually analyze the interrelationships of single risks, do not study risk chain transfer relationships, and fail to find the key paths of risk transfer. Therefore, this paper combines text mining, association rules, and complex networks to mine subway construction safety incident reports in depth and explore the risk transfer process. First, it uses text mining technology to identify subway construction safety risks. Then, association rules are introduced to explore the causal relationships among safety risks. Finally, the key safety risks and important transfer paths of subway construction safety accidents (SCSA) are obtained based on the complex network model. The results show that (a) improper safety management, unimplemented safety subject responsibilities, violation of operation rules, an imperfect safety responsibility system, and insufficient safety education and training are the key safety risks in SCSA; (b) two shorter key risk transfer paths can be obtained in the subway construction safety network: insufficient safety education and training → lower safety awareness → violation of operation rules → safety accidents; and insufficient safety checks or hidden trouble investigations → violation of operation rules → safety accidents; (c) in the process of risk transfer, the risk can be controlled by controlling the key nodes or cutting off the transfer path. The results of the study provide new ideas and methods for SCSR identification and influence element mining, which help safety managers propose accurate subway construction safety risk control measures.
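The association-rule step mentioned in the abstract above rests on two standard measures, support and confidence. The following is a minimal sketch of those measures only, not the paper's Apriori pipeline; the function names and the four toy incident reports are hypothetical and were invented for illustration, not taken from the study's data.

```python
def support(reports, items):
    """Fraction of reports that contain every item in `items`."""
    items = set(items)
    return sum(items <= report for report in reports) / len(reports)


def confidence(reports, antecedent, consequent):
    """Estimated P(consequent | antecedent) over the reports."""
    base = support(reports, antecedent)
    if base == 0:
        return 0.0
    return support(reports, set(antecedent) | set(consequent)) / base


# Hypothetical incident reports, each reduced to a set of risk factors.
toy_reports = [
    {"insufficient training", "low safety awareness", "rule violation"},
    {"insufficient training", "low safety awareness"},
    {"insufficient checks", "rule violation"},
    {"insufficient training", "rule violation"},
]

rule = ({"insufficient training"}, {"low safety awareness"})
print("support:", support(toy_reports, rule[0] | rule[1]))
print("confidence:", confidence(toy_reports, *rule))
```

Rules whose support and confidence exceed chosen thresholds can then serve as directed edges between risk factors when a network is built; the thresholds themselves are a modelling choice not specified here.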

Preprint ARTICLE | doi:10.20944/preprints202302.0077.v1

aeroBERT-Classifier: Classification of Aerospace Requirements using BERT

Archana Tikayat Ray, Bjorn F. Cole, Olivia J. Pinon Fischer, Ryan T. White, Dimitri N. Mavris

Subject: Computer Science And Mathematics, Computer Science Keywords: Requirements Engineering; Natural Language Processing; NLP; BERT; Requirements Classification; Text Classification

Online: 6 February 2023 (02:26:56 CET)

Preprint REVIEW | doi:10.20944/preprints202212.0064.v1

Application of Natural Language Processing (NLP) in Detecting and Preventing Suicide Ideation: A Systematic Review

Abayomi Arowosegbe, Tope Oyelade

Subject: Medicine And Pharmacology, Psychiatry And Mental Health Keywords: Natural language processing; NLP; Text mining; Suicide; Suicide-Ideation; Mental Health

Online: 5 December 2022 (07:34:30 CET)

Introduction: Around a million people are reported to die by suicide every year, and due to the stigma associated with the nature of the death, this figure is usually assumed to be an underestimate. Suicide may be prevented if prompt intervention is taken to mitigate risk. Machine learning and artificial intelligence-based modelling, such as natural language processing (NLP) and other text analytics approaches, has the potential to become a major technique for the detection, diagnosis, and treatment of people who are suffering from mental health issues. The primary aims of this research are to determine whether NLP techniques have been utilised in the field of suicide prevention and, if so, whether they were effective and what their limitations were. Methods: PubMed, EMBASE, MEDLINE, PsycInfo, and Global Health databases were searched for studies that reported use of NLP for suicide ideation or self-harm. Thematic analysis was used to synthesise and analyse the included studies. Findings were reported using the Preferred Reporting Items for Systematic Reviews and Meta-Analysis (PRISMA) statement, and the Mixed Methods Appraisal Tool (MMAT) was used in assessing paper quality. Results: The preliminary search of five databases generated 387 results. Removal of duplicates resulted in 158 potentially suitable studies. Twenty papers were finally included in this review. Discussion: Studies show that combining structured and unstructured data in NLP data modelling yielded more accurate results than utilizing either alone. Also, to reduce suicides, people with mental problems must be continuously and passively monitored. Further, NLP and other machine learning/artificial intelligence technologies can be used to address health inequities, and electronic health records provide valuable data for creating suicide risk tools. Finally, online, social media, and smartphone applications can be leveraged to detect people with suicide ideation. Conclusion: The use of artificial intelligence and machine learning opens new avenues for considerably guiding risk prediction and advancing suicide prevention frameworks. The review's analysis of the included research revealed that the use of NLP may result in low-cost and effective alternatives to existing resource-intensive methods of suicide prevention. To summarise, there is substantial evidence that NLP is useful in identifying people who have suicide ideation.

Preprint ARTICLE | doi:10.20944/preprints202111.0344.v1

Extraction of the Relations between Significant Pharmacological Entities in Russian-Language Internet Reviews on Medications

Alexander Sboev, Anton Selivanov, Ivan Moloshnikov, Roman Rybka, Artem Gryaznov, Sanna Sboeva, Gleb Rylkov

Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: pharmacological text corpus; automatic relation extraction; natural language processing; deep learning

Online: 19 November 2021 (10:40:10 CET)

Preprint ARTICLE | doi:10.20944/preprints201904.0170.v1

Mapping Research in Assisted Reproduction Worldwide

D. García, Francesco Alessandro Massucci, Alessandro Mosca, Ismael Ràfols, A. Rodríguez, R. Vassena

Subject: Medicine And Pharmacology, Pediatrics, Perinatology And Child Health Keywords: topic modelling; latent dirichlet allocation; text mining; assisted reproduction; ART; IVF

Online: 15 April 2019 (12:25:12 CEST)

Study question: What are the current trends of research in Human Assisted Reproduction around the world? Summary answer: The USA is the leading country, followed by the UK, China, France and Italy. The largest research area is “laboratory techniques”, although other areas such as “public health”, “quality, ethics and law” and “female factor” are gaining ground worldwide. What is known already: Scientific research, especially in health and medical sciences, aims at addressing specific needs that society (and, especially, patients) perceives as pressing. One of the main challenges for policymakers and research funders alike is therefore to align research priorities to societal needs. We can thus think of research agendas in terms of a demand side (societal needs) and a supply side (research outputs). Research output in Human Assisted Reproduction has expanded in the past years, as indicated by the increasing number of scientific publications in indexed journals in this area. Nevertheless, no map of research related to assisted reproduction has been produced so far, hindering the identification of potential areas of improvement and need. Study design, size, duration: 26,000+ scientific publications (articles, letters, and reviews) on Human Assisted Reproduction produced worldwide between 2005 and 2016 were analyzed. These publications were indexed in PubMed or obtained from the reference lists of indexed publications included in the analysis. Participants/materials, setting, methods: The corpus of publications was obtained by combining the MeSH terms: “Reproductive techniques”, “Reproductive medicine”, “Reproductive health”, “Fertility”, “Infertility”, and “Germ cells”. Then it was analyzed by means of text mining algorithms (Topic Modeling (TM) based on Latent Dirichlet Allocation (LDA)), in order to obtain the main topics of interest. Finally, these categories were analyzed across world regions and time. Main results and the role of chance: We identified 44 main topics, which were further grouped into 11 macro categories, from largest to smallest: “laboratory techniques”, “male factor”, “quality, ethics and law”, “female factor”, “public health and infectious diseases”, “basic research and genetics”, “pregnancy complications and risks”, “general infertility and ART”, “psychosocial aspects”, “cancer”, and “research methodology”. The USA was the leading country in number of publications, followed by the UK, China, France and Italy. Interestingly, research content in high income countries is fairly homogeneous across macro-categories, and it is dominated by “laboratory techniques” in Western and Southern Europe, and by “quality, ethics and law” in North America, Australia and New Zealand. In middle income countries we observe that research is mainly performed on “male factor”, and noticeably less on “female factor”. Finally, research on “public health and infectious diseases” predominates in low-income countries. Regarding temporal evolution of research, “laboratory techniques” is the most abundant topic on a yearly basis, and relatively constant over time. However, since production in most of the other categories is increasing, the relative contribution of this research category is actually decreasing. Publication is especially increasing in “public health and infectious diseases” (in all world regions, but especially in low income countries), “quality, ethics and law” (high income countries), and “female factor” (middle income countries). Limitations, reasons for caution: Three main factors might limit the robustness of our work: the restriction of the textual corpus to abstracts and titles; the limited reproducibility of the stochastic algorithms applied, which may produce slightly differing results at each run; and the interpretation of the topics obtained. Wider implications of the findings: This study should prove beneficial in the design of research strategies and policies that foster the alignment between supply (assisted reproduction research) and demand (society). Study funding/competing interest(s): PTQ-14-06718 of the Spanish MINECO Torres Quevedo programme (FAM).

Preprint ARTICLE | doi:10.20944/preprints201810.0678.v1

Unstructured Text in EMR Improves Prediction of Death after Surgery in Children

Oguz Akbilgic, Ramin Homayouni, Kevin Heinrich, Max Raymond Langham, Jr., Robert Lowell Davis

Subject: Medicine And Pharmacology, Pediatrics, Perinatology And Child Health Keywords: post-operative death; unstructured data; logistic regression; text mining; surgery outcome

Online: 29 October 2018 (11:46:18 CET)

Preprint ARTICLE | doi:10.20944/preprints201708.0055.v1

A Survey of Data Processing of EMR (Electronic Medical Record) Based on Data Mining

Wencheng Sun, Fang Liu, Zhiping Cai, Shengqun Fang, Guoyan Wang

Subject: Computer Science And Mathematics, Information Systems Keywords: EMR; data preprocessing; text mining; information extraction; medical decision support system

Online: 15 August 2017 (05:46:43 CEST)

Preprint ARTICLE | doi:10.20944/preprints202312.1380.v1

WDANet: Exploring Style Feature via Dual Cross-Attention for Woodcut-Style Design

Yangchunxue Ou, Jingjun Xu

Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: woodcut-style design; diffusion model; computer-aided design; text-to-image model

Online: 19 December 2023 (09:22:12 CET)

Preprint ARTICLE | doi:10.20944/preprints202306.0323.v1

Utilizing Text Mining for Labeling Training Models from Futures Corpus in Generative AI

Hsien-Ming Chou, Tsai-Lun Cho

Subject: Computer Science And Mathematics, Information Systems Keywords: text mining; semantic analysis; labeling bull-bear words; futures corpus; generative AI

Online: 5 June 2023 (13:18:24 CEST)

Preprint CONCEPT PAPER | doi:10.20944/preprints202305.0287.v1

Training Natural Language Processing Models on Encrypted Text for Enhanced Privacy

Davut Emre TAŞAR, Ceren ÖCAL TAŞAR

Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Natural language processing; encrypted text; data privacy; cloud computing; Doc2Vec; XGBoost; LSTM

Online: 5 May 2023 (03:38:42 CEST)

Preprint ARTICLE | doi:10.20944/preprints202212.0495.v1

Automatic Generation of Literary Sentences in French

Luis-Gil Moreno-Jiménez, Juan-Manuel Torres-Moreno, Roseli Wedemann

Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: computational creativity; literary sentences; automatic text generation; shallow parsing and deep learning.

Online: 26 December 2022 (15:53:39 CET)

Preprint ARTICLE | doi:10.20944/preprints202209.0324.v1

Natural Language Processing and Cognitive Networks Identify UK Insurers’ Trends in Investor Day Transcripts

Claus Stefan, Massimo Stella

Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Insurance; natural language processing; topic modelling; text analysis; complex networks; risk ranking

Online: 21 September 2022 (10:25:26 CEST)

Preprint REVIEW | doi:10.20944/preprints202205.0114.v1

Blockchain Technology for Supply Chain Management: A Comprehensive Review

Aichih (Jasmine) Chang, Nesreen El-Rayes, Jim Shi

Subject: Business, Economics And Management, Business And Management Keywords: Blockchain Technology; Industry 4.0; Supply Chain Management; Text mining; Metaverse; Hashgraph; BaaS

Online: 9 May 2022 (10:01:43 CEST)

Preprint ARTICLE | doi:10.20944/preprints201812.0086.v4

The User-Pleasant Video Skimming by Multi-Modal Keywords Semantics

Yiqing Shen, Yingbo Li

Subject: Computer Science And Mathematics, Computer Science Keywords: multi-model information fusion; video skimming; audio and text classification; keyframe extraction

Online: 5 August 2019 (03:48:49 CEST)

Preprint ARTICLE | doi:10.20944/preprints201802.0108.v1

Punctuation Generation Inspired Linguistic Features for Mandarin Prosody Generation

Chen-Yu Chiang, Yu-Ping Hung, Han-Yun Yeh, I-Bin Liao, Chen-Ming Pan

Subject: Computer Science And Mathematics, Information Systems Keywords: Mandarin; prosody generation; linguistic feature; break prediction; text-to-speech; punctuation confidence

Online: 16 February 2018 (15:39:58 CET)

Preprint ARTICLE | doi:10.20944/preprints202308.2048.v1

Diffusion Denoising Process with Gated U-Net for High-Quality Document Binarization

Sangkwon Han, Seungbin Ji, Jongtae Rhee

Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: document binarization; deep learning; gated convolution; generative model; latent diffusion models; text stroke

Online: 30 August 2023 (08:23:05 CEST)

Preprint ARTICLE | doi:10.20944/preprints202307.0677.v1

Explore actual sustainable energy topics by using Yake!, Krovetz, GSDMM and short text summaries

Boris Chigarev

Subject: Engineering, Energy And Fuel Technology Keywords: sustainable energy topics; short text summaries; bibliometric records; Yake!; Krovetz stemmer; GSDMM algorithm

Online: 11 July 2023 (10:15:08 CEST)

Preprint ARTICLE | doi:10.20944/preprints202303.0319.v1

Probabilistic Hough Transform for Rectifying Industrial Nameplate Images: A Novel Strategy for Improved Text Detection and Precision in Difficult Environments

Han Li, Yan Ma, Hong Bao, Yuhao Zhang

Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: industrial image processing; feature amplification; image transformation strategy; text detection; Probabilistic Hough Transform

Online: 17 March 2023 (09:05:54 CET)

Preprint ARTICLE | doi:10.20944/preprints202207.0090.v1

Natural Language Processing Methods for Scoring Sustainability Reports – A Study of Nordic Listed Companies

Marcelo Gutierrez-Bustamante, Leonardo Espinosa-Leal

Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Text mining; natural language processing; sustainability; semantic similarity; corporate social responsibility; Machine Learning

Online: 6 July 2022 (08:53:02 CEST)

Preprint REVIEW | doi:10.20944/preprints202110.0184.v1

Self-Attention Based Models for the Extraction of Molecular Interactions from Biological Texts

Prashant Srivastava, Saptarshi Bej, Kristina Yordanova, Olaf Wolkenhauer

Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: text-mining; self-attention models; biological literature mining; relationship extraction; natural language processing

Online: 12 October 2021 (14:17:46 CEST)

Preprint COMMUNICATION | doi:10.20944/preprints202104.0575.v2

A Personalized Machine-Learning-enabled Method for Efficient Research in Ethnopharmacology. The case of Southern Balkans and Coastal zone of Asia Minor

Evangelos Axiotis, Andreas Kontogiannis, Eleftherios Kalpoutzakis, George Giannakopoulos

Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Ethnopharmacology; Artificial Intelligence; Web Crawling; Active Learning; Reinforcement Learning; Text Mining; Big Data

Online: 23 June 2021 (11:47:32 CEST)

Preprint ARTICLE | doi:10.20944/preprints202003.0249.v1

Machine Learning Algorithm’s Measurement and Analytical Visualization of User’s Reviews for Google Play Store

Abdul Karim, Azhari Azhari, Samir Brahim Belhaouri, Ali Adil Qureshi

Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: machine learning; preprocessing; semantic analysis; text mining; TF/IDF; scraping; Google Play Store

Online: 11 August 2020 (08:14:10 CEST)

Preprint ARTICLE | doi:10.20944/preprints201812.0114.v1

Simultaneous Recognition of Horizontal and Vertical Text in Natural Images

Chankyu Choi, Youngmin Yoon, Junsu Lee, Junseok Kim

Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: directional encoding mask; selective attention network; supervised learning; horizontal and vertical text recognition

Online: 11 December 2018 (07:24:04 CET)

Preprint ARTICLE | doi:10.20944/preprints202312.0144.v1

Redefining Textual Dynamics for Enhanced Text Style Transfer

Carlos Asanka, Conti Vatsalan, Rodolfo Patel

Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Syntax-Aware Text Transformation; Style Classifier Analysis

Online: 4 December 2023 (06:53:06 CET)

Preprint ARTICLE | doi:10.20944/preprints202311.0963.v1

Automated Text Annotation Using Semi-Supervised Approach with Meta Vectorizer and Machine Learning Algorithms for Hate Speech Detection

Shoffan Saifullah, Rafał Dreżewski, Felix Andika Dwiyanto, Agus Sasmito Aribowo, Yuli Fauziah, Nur Heri Cahyana

Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Hate Speech Detection; Machine Learning; Sentiment Analysis; Semi-Supervised Learning; Self-Learning; Text Mining

Online: 15 November 2023 (09:58:07 CET)

Text annotation is an essential element of natural language processing approaches. The manual annotation process performed by humans has several drawbacks, such as subjectivity, slowness, fatigue, and possibly carelessness. In addition, annotators may annotate ambiguous data. We therefore developed the concept of automated annotation to get the best annotations using several machine-learning approaches. The proposed approach is based on an ensemble algorithm of meta-learner and meta-vectorizer techniques. The approach employs a semi-supervised learning technique for automated annotation, aimed at detecting hate speech. This involves leveraging various machine learning algorithms, including Support Vector Machine (SVM), Decision Tree (DT), K-Nearest Neighbors (KNN), and Naive Bayes (NB), in conjunction with Word2Vec and TF-IDF text extraction methods. The annotation process is performed using data from 13,169 Indonesian YouTube comments. The proposed model used a stemming approach using data from Sastrawi and also new data of 2,245 words. Semi-supervised learning uses 5%, 10%, and 20% of labeled data, as compared to performing labeling based on 80% of the datasets. In semi-supervised learning, the model learns from the labeled data, which provides explicit information, and the unlabeled data, which offers implicit insights. This hybrid approach enables the model to generalize and make informed predictions even when limited labeled data is available, ultimately enhancing its ability to handle real-world scenarios with scarce annotated information. In addition, the proposed method uses a variety of thresholds for matching words labeled with hate speech: 0.6, 0.7, 0.8, and 0.9. The experiment showed that the KNN-Word2Vec model has the best accuracy value of 96.9% with a scenario of 5%:80%:0.9. However, several other methods also achieve accuracy above 90%, such as SVM and DT based on both text extraction methods in several test scenarios.
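The semi-supervised idea described in the abstract above (learn from a small labeled set, then accept only confident labels for unlabeled comments) can be sketched as a generic self-training loop. This is an illustration under stated assumptions, not the authors' ensemble: it swaps in a bag-of-words nearest-centroid classifier with cosine similarity in place of SVM/DT/KNN/NB with Word2Vec or TF-IDF, and the function names, the sample comments, and the use of the similarity score as the confidence threshold are all hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words vector as a word -> count mapping."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroids(labeled):
    """Sum the word counts of all texts sharing a label."""
    sums = {}
    for text, label in labeled:
        sums.setdefault(label, Counter()).update(vectorize(text))
    return sums


def self_train(labeled, unlabeled, threshold=0.6, max_rounds=5):
    """Repeatedly move confidently labeled texts from the pool to the labeled set."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        cents = centroids(labeled)
        accepted, remaining = [], []
        for text in pool:
            vec = vectorize(text)
            label, score = max(
                ((lab, cosine(vec, cent)) for lab, cent in cents.items()),
                key=lambda pair: pair[1],
            )
            if score >= threshold:
                accepted.append((text, label))
            else:
                remaining.append(text)
        if not accepted:  # nothing confident enough: stop early
            break
        labeled.extend(accepted)
        pool = remaining
    return labeled, pool


# Hypothetical seed labels and unlabeled comments, invented for this example.
seed = [("you are awful trash", "hate"), ("have a nice day", "neutral")]
new_labels, leftover = self_train(seed, ["awful trash", "nice day", "completely unrelated words"])
print(new_labels)
print(leftover)
```

Raising the threshold accepts fewer automatic labels but makes each accepted label more reliable, which is the trade-off that the 0.6 to 0.9 threshold range in the abstract explores.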

Preprint ARTICLE | doi:10.20944/preprints202309.0159.v1

The Impact of Reading Modalities and Text Types on Reading in School-Age Children: An Eye-tracking Study

Wi-Jiwoon Kim, Seo Rin Yoon, Seohyun Nam, Yunjin Lee, Dongsun Yim

Subject: Social Sciences, Education Keywords: eye-tracking; reading modality; audio-assisted reading; text type; reading comprehension; school-age children

Online: 5 September 2023 (02:26:49 CEST)

Preprint ARTICLE | doi:10.20944/preprints202308.1092.v1

The Design and Testing of a Text Message for use as an Informational Nudge in a Novel Food Insecurity Intervention

Michael F. Royer, Christopher Wharton

Subject: Public Health And Healthcare, Public Health And Health Services Keywords: food insecurity; food access; food assistance; SNAP; barriers; text message; information; nudge; qualitative; interview

Online: 15 August 2023 (08:57:59 CEST)

Preprint REVIEW | doi:10.20944/preprints202102.0447.v1

The Impacts of Covid-19 on Circular Economy: Gaining Insight across the SCOPUS and Web of Science Research Articles with Text Mining Techniques

Khoa Tran, Tuyet Anh Nguyen

Subject: Environmental And Earth Sciences, Atmospheric Science And Meteorology Keywords: circular economy; Covid-19; Voyant tools; environmental sustainability; social sustainability; economic sustainability; text mining

Online: 20 February 2021 (01:42:10 CET)

Preprint ARTICLE | doi:10.20944/preprints202008.0265.v2

Mining Stack Overflow: a Recommender Systems-Based Model

Fouzi Harrag, Mokdad Khamliche

Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Recommender system; learning to rank; Mining software repositories; Text Mining; Deep learning; Stack Overflow

Online: 4 September 2020 (11:20:33 CEST)

Preprint ARTICLE | doi:10.20944/preprints202007.0646.v1

Classification of Google Play Store Application Reviews Using Machine Learning

Abdul Karim, Azhari Azhari, Meshrif Alruily, Hamza Aldabbas, Samir Brahim Belhaouri, Ali Adil Qureshi

Subject: Computer Science And Mathematics, Computer Science Keywords: Machine Learning; Natural Language Processing; Text Mining; Semantic Analysis; Scraping; Google Play Store; Rating

Online: 26 July 2020 (17:11:09 CEST)

Preprint ARTICLE | doi:10.20944/preprints202311.1462.v1

Evaluation and Comparison of SVM, Deep Learning, and Naïve Bayes Performances for Natural Language Processing Text Classification Task

Destiny Ogaga, Abiodun Olalere

Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: text classification; natural language processing; multinomial naïve Bayes; support vector machine; deep learning

Online: 23 November 2023 (05:28:12 CET)

Preprint ARTICLE | doi:10.20944/preprints202310.1494.v1

Assessing the Tourism City from an ES Perspective: The Evaluation of Tourism Service in Liyang, China

Xiangnan Fan, Yuning Cheng

Subject: Environmental And Earth Sciences, Ecology Keywords: tourism service evaluation; ecosystem services (ES); AHP-entropy weight method; GIS spatial analysis; text mining

Online: 24 October 2023 (08:34:17 CEST)

Preprint ARTICLE | doi:10.20944/preprints202310.0011.v1

Improving Understanding of Consumer Attitudes toward Cultured Meat through the Lens of Online Media Framing

Béré Benjamin Kouarfaté, Fabien Durif

Subject: Business, Economics And Management, Marketing Keywords: Text mining; Cultured meat; Artificial meat; media framing; attitude; social acceptability; Consumer acceptance; consumer behavior

Online: 1 October 2023 (09:45:46 CEST)

Preprint ARTICLE | doi:10.20944/preprints202306.0078.v1

GenCo: A Generative Learning Model for Heterogeneous Text Classification Based on Collaborative Partial Classifications

Zie Eya Ekolle, Ryuji Kohno

Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Natural language processing; text classification; probabilistic models; machine learning; generative learning; collaborative learning; explainable AI

Online: 5 June 2023 (02:57:36 CEST)

Preprint ARTICLE | doi:10.20944/preprints202105.0449.v1

Explainable Hopfield Neural Networks by Using an Automatic Video Generation System

Clemente Rubio Manzano

Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Explainable Artificial Intelligence; Hopfield Neural Networks; Automatic Video Generation; Data-to-text systems; Software Visualization

Online: 19 May 2021 (14:07:48 CEST)

Preprint ARTICLE | doi:10.20944/preprints202009.0657.v1

Test@Work Texts: Mobile Phone Messaging to Increase Awareness of HIV and HIV Testing in UK Construction Employees During the COVID-19 Pandemic

Matthew Middleton, Sarah Somerset, Catrin Evans, Holly Blake

Subject: Medicine And Pharmacology, Immunology And Allergy Keywords: HIV; workplace intervention; SMS; HIV testing; construction; mobile phone; Covid-19; health promotion; text messaging

Online: 27 September 2020 (03:02:41 CEST)

Background: HIV poses a threat to global health. With effective treatment options available, education and testing strategies are essential in preventing transmission. Text messaging is an effective tool for health promotion and can be used to target higher risk populations. This study reports on the design, delivery and testing of a mobile text messaging SMS intervention for HIV prevention and awareness, aimed at adults in the construction industry and delivered during the COVID-19 pandemic. Method: Participants were recruited at Test@Work workplace health promotion events (21 sites, n=464 employees), including health checks with HIV testing. Message development was based on a participatory design and included a focus group (n=9) and message fidelity testing (n=291) with assessment of intervention uptake, reach, acceptability, and engagement. Barriers to HIV testing were identified and mapped to the COM-B behavioural model. 23 one-way push SMS messages (19 included short web links) were generated and fidelity tested, then sent via automated SMS to two employee cohorts over a 10-week period during the COVID-19 pandemic. The engagement metrics measured were: opt-outs, SMS delivered/read, number of clicks per web link, and four two-way pull messages exploring repeat HIV testing, learning new information, perceived usefulness and behaviour change. Results: 291 people participated (68.3% of eligible attendees). A total of 7,726 messages were sent between March and June 2020, with 91.6% successfully delivered (100% read). 12.4% of participants opted out over 10 weeks. Of delivered messages, links were clicked an average of 14.4%, max 24.1% for HIV-related links. The number of clicks on web links declined over time (r= -6.24, p=0.01). Response rate for two-way pull messages was 13.7% of participants. Since the workplace HIV test offer at recruitment, 21.6% reported having taken a further HIV test. Qualitative replies indicated behavioural influence of messaging on exercise, lifestyle behaviours and intention to HIV test. Conclusion: SMS messaging for HIV prevention and awareness is acceptable to adults in the construction industry, has high uptake, low attrition and good engagement with message content, when delivered during a global pandemic. Data collection methods may need refinement for audience, and the effect of COVID-19 on results is yet to be understood.

Preprint ARTICLE | doi:10.20944/preprints202301.0061.v1

Natural Language Processing Accurately Categorizes Indications, Findings, and Pathology Reports from Multicenter Colonoscopy

Shashank Reddy Vadyala

Subject: Medicine And Pharmacology, Pathology And Pathobiology Keywords: Neural Network; Machine Learning; Natural Language Processing (NLP); Text Mining; Sentence Classification; Colorectal Cancer; Clinical Information.

Online: 4 January 2023 (03:48:26 CET)

Preprint ARTICLE | doi:10.20944/preprints202106.0196.v3

A Hybrid Model for Similarity Measurement of Twitter Profiles

Niloufar Shoeibi, Nastaran Shoeibi, Pablo Chamoso, Zakie AlizadehSani, Juan M. Corchado

Subject: Computer Science And Mathematics, Information Systems Keywords: Twitter; Social Media; Social Networking; Social Network Analytic; DistilBERT; Text Similarity; Natural Language Processing; Character Computing

Online: 17 February 2022 (13:15:23 CET)

Preprint REVIEW | doi:10.20944/preprints201708.0003.v1

De-Anonymizing Authors of Electronic Texts: A Survey on Electronic Text Stylometry

Mahmoud Khonji, Youssef Iraqi

Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: stylometry; author identification; author verification; author profiling; stylistic inconsistency; text analysis; supervised learning; unsupervised learning; classification; forensics

Online: 2 August 2017 (12:38:17 CEST)

Electronic text stylometry is a collection of forensics methods that analyze the writing styles of input electronic texts in order to extract information about authors of the input electronic texts. Such extracted information could be the identity of the authors, or aspects of the authors, such as their gender, age group, ethnicity, etc. This survey paper presents the following contributions: 1) A description of all stylometry problems in probability terms, under a unified notation. To the best of our knowledge, this is the most comprehensive definition to date. 2) A survey of key methods, with particular attention to data representation (or feature extraction) methods. 3) An evaluation of 23,760 feature extraction methods, which is the most comprehensive evaluation of feature extraction methods in the literature of stylometry to date. The importance of this evaluation is twofold. First, it identifies the relative effectiveness of the features (since, currently, many are not evaluated jointly; e.g. syntactic n-grams are not evaluated against k-skip n-grams, and so forth). Second, thanks to our generalizations, we could evaluate novel grams, such as what we name compound grams. 4) The release of our associated Python feature extraction library, namely Fextractor. Essentially, the library generalizes all existing n-gram based feature extraction methods under the "at least l-frequent, dir-directed, k-skipped n-grams", and allows grams to be diversely defined, including definitions that are based on high-level grammatical aspects, such as POS tags, as well as lower-level ones, such as distribution of function words, word shapes, etc. This makes the library, by far, the most extensive in this domain to date. 5) The construction, evaluation, and release of the first dataset for Emirati social media text. This evaluation represents the first evaluation of author identification against Emirati social media texts. Interestingly, we find that, when using our models and feature extraction library (Fextractor), authors could be identified significantly more accurately than what is reported with similarly sized datasets. The dataset also contains sub-datasets that represent other languages (Dutch, English, Greek and Spanish), and our findings are consistent across them.
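
As a rough illustration of the "at least l-frequent, k-skipped n-grams" idea mentioned in the abstract above, the following is a minimal sketch in plain Python. It is not the Fextractor API: the function names `skip_ngrams` and `frequent_skip_ngrams` are hypothetical, the skip rule used here (at most k skipped tokens in total per gram) is one common definition and is assumed rather than taken from the paper, and the "dir-directed" aspect is left out entirely.

```python
from collections import Counter
from itertools import combinations


def skip_ngrams(tokens, n, k):
    """Return all n-grams that skip at most k tokens in total.

    Assumed definition: a gram starts at some position and picks its
    remaining n - 1 tokens, in order, from the next n - 1 + k positions.
    With k = 0 this reduces to ordinary contiguous n-grams.
    """
    grams = []
    for start in range(len(tokens)):
        # The last usable index is start + n - 1 + k (clipped to the text).
        window_end = min(start + n + k, len(tokens))
        for rest in combinations(range(start + 1, window_end), n - 1):
            grams.append((tokens[start],) + tuple(tokens[i] for i in rest))
    return grams


def frequent_skip_ngrams(tokens, n, k, min_count):
    """Keep only grams that occur at least min_count times (the 'l' threshold)."""
    counts = Counter(skip_ngrams(tokens, n, k))
    return {gram: c for gram, c in counts.items() if c >= min_count}


if __name__ == "__main__":
    text = "a b a b".split()
    print(skip_ngrams(text, n=2, k=1))
    print(frequent_skip_ngrams(text, n=2, k=0, min_count=2))
```

The resulting gram counts could then serve as a feature vector for an author-identification classifier; tokens could equally be POS tags or word shapes instead of words, which is the kind of generalization the abstract describes.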

Preprint REVIEW | doi:10.20944/preprints202212.0086.v1

Information Needs and Communication Strategies for People with Coronary Heart Disease: A Scoping Review

Clara Zwack, Carlie Smith, Vanessa Poulsen, Natalie Raffoul, Julie Redfern

Subject: Social Sciences, Behavior Sciences Keywords: Information; resources; coronary heart disease; digital health; education; cardiac rehabilitation; secondary prevention; text message; sensors; cardiovascular risk

Online: 6 December 2022 (02:09:28 CET)

Preprint ARTICLE | doi:10.20944/preprints202206.0050.v2

Harvesting Context and Mining Emotions Related to Olfactory Cultural Heritage

M. Besher Massri, Inna Novalija, Dunja Mladenić, Janez Brank, Sara Graça da Silva, Natasza Marrouch, Carla Murteira, Ali Hürriyetoğlu, Beno Šircelj

Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Emotions Mining; Context Mining; Sensory Mining; Artificial Intelligence; Information extraction; Text classification; Fairy tales; Olfactory Cultural Heritage

Online: 2 August 2022 (07:57:35 CEST)

Search Results

114 articles found