Preprint Article Version 1 This version is not peer-reviewed

Implementing Naive Bayes Algorithm for Detecting Spam Emails on Datasets

Version 1 : Received: 13 May 2020 / Approved: 14 May 2020 / Online: 14 May 2020 (12:12:51 CEST)

How to cite: GHOSH, A.; SENTHILRAJAN, A. Implementing Naive Bayes Algorithm for Detecting Spam Emails on Datasets. Preprints 2020, 2020050241 (doi: 10.20944/preprints202005.0241.v1).

Abstract

Email is the most common and the fastest medium for communication around the globe. Every day, however, users receive large volumes of junk email, known as "spam". Spam emails mainly carry two types of content: commercial content such as advertisements and offers, and criminal content such as phishing website links, malware, and trojans. Spam containing advertisements and offers is known as Unsolicited Commercial Email, while spam containing phishing website links, malware, or trojans is known as Unsolicited Bulk Email. Those who send spam emails are known as spammers. Spammers mainly harvest the email addresses of target users from websites, junk sites, browser add-ons, etc. The Naïve Bayes algorithm is a probabilistic machine learning algorithm that is well known for classifying spam emails. It is derived from Bayes' Theorem, which uses conditional probability to express the probability of an event given that another event has occurred. In this research work, we perform feature extraction based on email characteristics and behavior, and we propose a detection approach for classifying spam emails using a Naïve Bayes classifier. We implement the classifier on multiple email datasets, namely Spam Corpus and Spambase. Based on the results from the WEKA (Waikato Environment for Knowledge Analysis) tool, we carry out an experimental analysis that measures the performance of the Naïve Bayes classifier using Accuracy, Recall, Precision, and F-measure. Finally, we compare the performance of the classifier across the datasets in terms of correctly and incorrectly classified email instances.
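To make the approach described in the abstract concrete, the following is a minimal Python sketch of a multinomial Naïve Bayes spam classifier together with the four evaluation measures named above. It is an illustration only, not the WEKA setup used in this work: the function names (train, predict, evaluate), the add-one (Laplace) smoothing, the whitespace tokenization, and the four toy messages are all assumptions introduced here, and the toy messages are not drawn from Spam Corpus or Spambase.

```python
import math
from collections import Counter

def train(docs):
    """Fit word counts per class from (text, label) pairs."""
    class_counts = Counter()   # number of training emails per class
    word_counts = {}           # per-class word frequencies
    vocab = set()
    for text, label in docs:
        class_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in text.lower().split():
            counts[word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab

def predict(model, text):
    """Return the class with the highest log posterior under Bayes' Theorem."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)            # log prior P(class)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue                                  # skip unseen words
            # log likelihood P(word | class) with add-one smoothing (assumed)
            score += math.log((word_counts[label][word] + 1)
                              / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def evaluate(y_true, y_pred, positive="spam"):
    """Accuracy, precision, recall, and F-measure for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}

# Hypothetical toy messages, for illustration only.
TRAIN = [
    ("win free money now", "spam"),
    ("free offer click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project report attached", "ham"),
]

model = train(TRAIN)
print(predict(model, "free money offer"))
print(predict(model, "project meeting monday"))
print(evaluate(["spam", "spam", "ham", "ham"],
               ["spam", "ham", "ham", "ham"]))
```

The "naive" part of the method is the assumption that words are conditionally independent given the class, which is why the per-word log likelihoods are simply summed. Working in log space avoids numerical underflow when many small probabilities are multiplied.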

Subject Areas

Bayes Theorem; Naïve Bayes Classifier; Spam Detection; Feature Extraction; Text Classification; WEKA
