Preprint, Article, Version 1. Preserved in Portico. This version is not peer-reviewed.

Topological Signature of 19th Century Novelists: Persistence Homology in Context-Free Text Mining

Version 1: Received: 23 September 2018 / Approved: 24 September 2018 / Online: 24 September 2018 (15:33:02 CEST)

A peer-reviewed article of this Preprint also exists.

Gholizadeh, S.; Seyeditabari, A.; Zadrozny, W. Topological Signature of 19th Century Novelists: Persistent Homology in Text Mining. Big Data Cogn. Comput. 2018, 2, 33.

Abstract

Topological Data Analysis (TDA) refers to a collection of methods that find the structure of shapes in data. Although TDA methods have recently been used in many areas of data mining, they have not been widely applied to text mining tasks. In most text processing algorithms, the order in which different entities appear or co-appear is lost. Assuming that this lost order is an informative feature of the data, TDA may play a significant role in filling the resulting gap in the text processing state of the art. Once extracted, the topology of different entities through a textual document may reveal additional information about the document that is not reflected in the features produced by traditional text processing methods. In this paper, we introduce a novel approach that employs TDA in text processing in order to capture and use the topology of different same-type entities in textual documents. First, we show how to extract topological signatures from text using persistent homology, i.e., a TDA tool that captures the topological signature of a data cloud. Then we show how to use these signatures for text classification.
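
For readers unfamiliar with persistent homology, the following is a minimal sketch of its simplest case: 0-dimensional persistence (connected components) of a point cloud. It is an illustration only, not the authors' method. The function name `h0_persistence` and the toy points are assumptions made for this example, and this page does not describe how the paper builds a point cloud from the entities in a novel. The sketch assumes Euclidean distances and uses only the Python standard library (Python 3.8 or later for `math.dist`).

```python
import math
from itertools import combinations


def h0_persistence(points):
    """Return 0-dimensional persistence intervals (birth, death) of a point cloud.

    Every point starts as its own connected component, born at scale 0.
    As the distance threshold grows, components merge, and each merge
    ends (kills) one interval.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Locate the representative of i's component (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, sorted by Euclidean length (the filtration order).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    intervals = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj                  # merge the two components
            intervals.append((0.0, length))  # one component dies at this scale
    if n > 0:
        intervals.append((0.0, math.inf))    # the last component never dies
    return intervals


# Hypothetical toy cloud: two nearby points and one distant point.
toy_points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
print(h0_persistence(toy_points))
```

Long intervals correspond to components that stay separate over a wide range of scales. A collection of such intervals is one kind of topological signature that could, in principle, be turned into features for a classifier.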

Keywords

topological data analysis; text mining; computational topology; style; persistent homology

Subject

Computer Science and Mathematics, Information Systems

Comments (1)

Comment 1
Received: 22 October 2018
Commenter: Kamillah
The commenter has declared there is no conflict of interests.
Comment: Dear Authors,

While this paper (article?) may indeed carry some academic value, we will never know it, since the quality of English used and the all-too-visible lack of proofreading makes it nigh-unreadable.
Please, please, for the love of God and all that is holy, PLEASE hire someone to proofread your theses. It's not that expensive.

Sincerely,
A student, who was forced to make a presentation on your paper.