Preprint, Article, Version 1. Preserved in Portico. This version is not peer-reviewed.

Towards Bengali Word Embedding: Corpus Creation, Intrinsic and Extrinsic Evaluations

Version 1: Received: 22 December 2020 / Approved: 23 December 2020 / Online: 23 December 2020 (17:26:11 CET)

How to cite: Hossain, M.R.; Hoque, M.M. Towards Bengali Word Embedding: Corpus Creation, Intrinsic and Extrinsic Evaluations. Preprints 2020, 2020120600. https://doi.org/10.20944/preprints202012.0600.v1

Abstract

Distributional word vector representation, or word embedding, has become an essential ingredient in many natural language processing (NLP) tasks such as machine translation, document classification, information retrieval, and question answering. Investigating embedding models helps to reduce the feature space and improve the capture of semantic and syntactic relations in text. This paper presents three embedding techniques (Word2Vec, GloVe, and FastText) with different hyperparameters, implemented on a Bengali corpus consisting of 180 million words. The performance of the embedding techniques is evaluated both extrinsically and intrinsically. Extrinsic performance is evaluated by text classification, which achieved a maximum accuracy of 96.48%. Intrinsic performance is evaluated by word similarity (semantic, syntactic, and relatedness) and analogy tasks. The maximum Pearson correlation (r̂) achieved was 60.66% (Ssr̂) for semantic similarity and 71.64% (Syr̂) for syntactic similarity, whereas relatedness obtained 79.80% (Rsr̂). The semantic word analogy tasks achieved 44.00% accuracy, while the syntactic word analogy tasks obtained 36.00%.
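
The two intrinsic evaluations named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two-dimensional vectors, the transliterated words (raja, rani, purush, nari, phol), the human scores, and the helper names are all hypothetical. It assumes that word similarity is scored as the Pearson correlation between cosine similarities and human ratings, and that an analogy a : b :: c : ? is answered by the nearest cosine neighbour of b - a + c, which is a common setup but not one confirmed by this page.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


def similarity_correlation(emb, pairs):
    """Correlate model similarities with human scores, skipping unknown words."""
    model_scores, human_scores = [], []
    for w1, w2, score in pairs:
        if w1 in emb and w2 in emb:
            model_scores.append(cosine(emb[w1], emb[w2]))
            human_scores.append(score)
    return pearson(model_scores, human_scores)


def solve_analogy(emb, a, b, c):
    """Answer a : b :: c : ? with the nearest cosine neighbour of b - a + c."""
    target = [y - x + z for x, y, z in zip(emb[a], emb[b], emb[c])]
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(emb[w], target))


# Hypothetical toy embeddings (dimension 0: royalty, dimension 1: gender).
toy_emb = {
    "raja": [0.9, 0.8],    # king
    "rani": [0.9, -0.8],   # queen
    "purush": [0.1, 0.8],  # man
    "nari": [0.1, -0.8],   # woman
    "phol": [-0.7, 0.1],   # fruit
}

# Hypothetical human similarity ratings on a 0-10 scale.
toy_pairs = [
    ("raja", "rani", 7.5),
    ("purush", "nari", 6.0),
    ("raja", "phol", 1.0),
]

r = similarity_correlation(toy_emb, toy_pairs)
answer = solve_analogy(toy_emb, "purush", "raja", "nari")
print(f"Pearson r on toy pairs: {r:.3f}")
print(f"purush : raja :: nari : {answer}")
```

In a real evaluation, the toy dictionary would be replaced by vectors from a trained Word2Vec, GloVe, or FastText model, and analogy accuracy would be the fraction of questions for which the predicted word matches the expected one.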

Keywords

Natural language processing; Extrinsic evaluation; Intrinsic evaluation; Word analogy; Word embedding

Subject

Engineering, Automotive Engineering
