Version 1
Received: 3 May 2023 / Approved: 5 May 2023 / Online: 5 May 2023 (03:38:42 CEST)
How to cite:
TAŞAR, D.E.; ÖCAL TAŞAR, C. Training Natural Language Processing Models on Encrypted Text for Enhanced Privacy. Preprints 2023, 2023050287. https://doi.org/10.20944/preprints202305.0287.v1
APA Style
TAŞAR, D.E., & ÖCAL TAŞAR, C. (2023). Training Natural Language Processing Models on Encrypted Text for Enhanced Privacy. Preprints. https://doi.org/10.20944/preprints202305.0287.v1
Chicago/Turabian Style
TAŞAR, D.E. and Ceren ÖCAL TAŞAR. 2023. "Training Natural Language Processing Models on Encrypted Text for Enhanced Privacy." Preprints. https://doi.org/10.20944/preprints202305.0287.v1
Abstract
With the increasing use of cloud-based services for training and deploying machine learning models, data privacy has become a major concern. This is particularly important for natural language processing (NLP) models, which often process sensitive information such as personal communications and confidential documents. In this study, we propose a method for training NLP models on encrypted text data that mitigates data privacy concerns while maintaining performance similar to that of models trained on non-encrypted data. We demonstrate the method using two architectures, Doc2Vec+XGBoost and Doc2Vec+LSTM, and evaluate the models on the 20 Newsgroups dataset. Our results indicate that models trained on encrypted and non-encrypted text achieve comparable performance, suggesting that the encryption method preserves data privacy without sacrificing model accuracy. To support replication of our experiments, we provide a Colab notebook at https://t.ly/lR-TP
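The general idea of training on transformed text can be illustrated with a minimal sketch. The abstract does not specify the encryption scheme, so the code below is an assumption for illustration only: it substitutes a deterministic keyed pseudonymization of tokens (HMAC-SHA256, which is a keyed hash rather than reversible encryption) and shows that token frequency structure, which embedding models such as Doc2Vec rely on, survives the transformation. The key, function names, and sample documents are all hypothetical.

```python
import hashlib
import hmac
from collections import Counter
from typing import List

# Hypothetical demo key; a real deployment would manage keys securely.
SECRET_KEY = b"demo-key-not-for-production"


def encrypt_token(token: str, key: bytes = SECRET_KEY) -> str:
    """Map a token to a fixed pseudonym using HMAC-SHA256.

    Stand-in for the paper's (unspecified) encryption step: the mapping is
    deterministic, so equal tokens always yield equal pseudonyms.
    """
    digest = hmac.new(key, token.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # shortened for readability


def encrypt_document(text: str, key: bytes = SECRET_KEY) -> List[str]:
    """Lowercase, split on whitespace, and pseudonymize every token."""
    return [encrypt_token(tok, key) for tok in text.lower().split()]


def frequency_profile(tokens: List[str]) -> List[int]:
    """Sorted token counts, independent of what the tokens look like."""
    return sorted(Counter(tokens).values())


# Hypothetical sample documents.
plain_docs = [
    "the rocket launch was delayed by weather",
    "the hockey game went to overtime and the crowd cheered",
]
encrypted_docs = [encrypt_document(doc) for doc in plain_docs]

for plain, enc in zip(plain_docs, encrypted_docs):
    # The count structure is identical before and after the transformation.
    same = frequency_profile(plain.lower().split()) == frequency_profile(enc)
    print(len(enc), "tokens, frequency profile preserved:", same)
```

Under these assumptions, the pseudonymized token lists could be passed to an embedding model and classifier in place of the plaintext tokens, since the model only needs consistent token identities, not readable words. A deterministic mapping of this kind remains open to frequency analysis, so it should be read as an illustration of the training setup rather than as a security recommendation.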
Keywords
Natural language processing; encrypted text; data privacy; cloud computing; Doc2Vec; XGBoost; LSTM
Subject
Computer Science and Mathematics, Artificial Intelligence and Machine Learning
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.