An online subject-based spam filter using natural language features

Chih Ning Lee, Yi Ruei Chen, Wen-Guey Tzeng

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Scopus citations

Abstract

This paper proposes an online subject-based spam filter built upon an extended version of the weighted naive Bayesian (WNB) classifier. The spam filter checks email subjects only. It is faster than spam filters that scan the whole body of emails, and it remains useful even when spam senders tamper with email bodies to avoid filtering. In addition to the widely used bag-of-words feature, we further consider statistical and natural language features to discover new characteristics from email subjects. For online learning, we use an extended WNB classifier. It is not only computationally efficient but also more adaptive to changes in spam brought by new malicious campaigns. The proposed classifier is immune to spam from malicious campaigns that were not anticipated. We evaluate the performance of our spam filter on 8 well-known ham-spam email datasets from the TREC and Enron-Spam corpora. Our approach achieves 94.85% accuracy and a 95.8% F1-measure on the TREC datasets, and 95.74% accuracy and a 97.2% F1-measure on the Enron-Spam datasets. Compared with previous works along the same line, our approach has 2.43%, 2.3%, and 3.2% improvements in accuracy, true positive rate, and false positive rate, respectively.
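The idea of classifying subjects online with a naive Bayesian model can be illustrated with a minimal Python sketch. It is not the authors' extended WNB classifier: the abstract does not specify the feature weighting or the statistical and natural language features, so this example uses an unweighted multinomial naive Bayes over bag-of-words tokens only. The class name `OnlineSubjectNB`, the tokenizer, and the smoothing constant `alpha` are assumptions made for illustration.

```python
import math
import re
from collections import Counter


class OnlineSubjectNB:
    """Toy online multinomial naive Bayes over subject-line tokens.

    Illustrative only: unweighted, bag-of-words features, hypothetical names.
    """

    LABELS = ("spam", "ham")

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant (assumed value)
        self.doc_counts = {label: 0 for label in self.LABELS}
        self.token_counts = {label: Counter() for label in self.LABELS}
        self.vocab = set()

    @staticmethod
    def tokenize(subject):
        # Lowercase and keep runs of letters, digits, '$' and '!' as tokens.
        return re.findall(r"[a-z0-9$!]+", subject.lower())

    def update(self, subject, label):
        # Online step: fold one labeled subject into the running counts.
        tokens = self.tokenize(subject)
        self.doc_counts[label] += 1
        self.token_counts[label].update(tokens)
        self.vocab.update(tokens)

    def log_score(self, subject, label):
        # Smoothed log prior plus smoothed log likelihood of each token.
        total_docs = sum(self.doc_counts.values())
        prior = (self.doc_counts[label] + self.alpha) / (
            total_docs + self.alpha * len(self.LABELS)
        )
        score = math.log(prior)
        total_tokens = sum(self.token_counts[label].values())
        # The "+ 1" reserves probability mass for tokens never seen before.
        denom = total_tokens + self.alpha * (len(self.vocab) + 1)
        for token in self.tokenize(subject):
            count = self.token_counts[label][token]
            score += math.log((count + self.alpha) / denom)
        return score

    def predict(self, subject):
        return max(self.LABELS, key=lambda label: self.log_score(subject, label))


model = OnlineSubjectNB()
model.update("Win a free prize now!!!", "spam")
model.update("Cheap meds free shipping", "spam")
model.update("Meeting agenda for Monday", "ham")
model.update("Project status report", "ham")
print(model.predict("free prize inside"))
```

Because `update` only increments counters, each newly labeled subject can be absorbed without retraining from scratch, which is the property that makes this family of classifiers attractive for online filtering.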

Original language: English
Title of host publication: 2017 IEEE Conference on Dependable and Secure Computing
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 479-484
Number of pages: 6
ISBN (Electronic): 9781509055692
DOIs
State: Published - 18 Oct 2017
Event: 2017 IEEE Conference on Dependable and Secure Computing - Taipei, Taiwan
Duration: 7 Aug 2017 to 10 Aug 2017

Publication series

Name: 2017 IEEE Conference on Dependable and Secure Computing

Conference

Conference: 2017 IEEE Conference on Dependable and Secure Computing
Country: Taiwan
City: Taipei
Period: 7/08/17 to 10/08/17

Keywords

  • Email
  • Naive Bayesian
  • Natural language
  • Spam filter
  • Subject

