Accelerating web content filtering by the early decision algorithm

Po Ching Lin*, Ming Dao Liu, Ying-Dar Lin, Yuan Cheng Lai

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-review


Abstract

Real-time content analysis is typically a bottleneck in Web filtering. To accelerate the filtering process, this work presents a simple but effective early decision algorithm that analyzes only part of the Web content. The algorithm makes the filtering decision, either to block or to pass the Web content, as soon as it is confident, with high probability, that the content belongs to a banned or an allowed category. Experiments show that the algorithm needs to examine only about one-fourth of the Web content on average, while accuracy remains fairly good: 89% for banned content and 93% for allowed content. This algorithm can complement other Web filtering approaches, such as URL blocking, to filter Web content with high accuracy and efficiency. Text classification algorithms in other applications can also follow the principle of early decision to accelerate processing.
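The early decision principle can be illustrated with a minimal sketch. The abstract does not specify the paper's scoring function or confidence criterion, so this example swaps in a naive Bayes-style log-odds accumulation as a stand-in: tokens are scanned in order, evidence is summed, and scanning stops once the posterior probability of "banned" exceeds a confidence threshold or drops below its complement. The term weights in `LLR`, the function name `early_decision`, and the `threshold` parameter are all hypothetical and chosen only for illustration.

```python
import math
from typing import Dict, Iterable, Tuple

# Hypothetical per-term log-likelihood ratios, log P(term|banned) / P(term|allowed).
# A real filter would learn these from labeled pages; these values are made up.
LLR: Dict[str, float] = {
    "casino": 2.0,
    "poker": 1.5,
    "bet": 1.2,
    "homework": -1.5,
    "library": -1.0,
    "science": -1.2,
}


def early_decision(
    tokens: Iterable[str],
    prior_log_odds: float = 0.0,
    threshold: float = 0.95,
) -> Tuple[str, int]:
    """Scan tokens in order; stop once P(banned) leaves [1 - threshold, threshold].

    Returns (decision, number_of_tokens_examined). If the content ends before
    the confidence threshold is reached, the final score decides.
    """
    # Convert the probability threshold into a symmetric log-odds bound.
    upper = math.log(threshold / (1.0 - threshold))
    log_odds = prior_log_odds
    seen = 0
    for token in tokens:
        seen += 1
        log_odds += LLR.get(token.lower(), 0.0)  # unknown terms add no evidence
        if log_odds >= upper:
            return "block", seen  # confident the content is banned
        if log_odds <= -upper:
            return "pass", seen  # confident the content is allowed
    # No early decision: fall back to the sign of the accumulated evidence.
    return ("block" if log_odds > 0 else "pass"), seen
```

For example, `early_decision("online casino poker bet tonight".split())` stops after the third token with a "block" decision, leaving the rest of the page unread. Returning the token count makes it easy to measure what fraction of the content was examined, which is the quantity the abstract's one-fourth figure describes.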

Original language: English
Pages (from-to): 251-257
Number of pages: 7
Journal: IEICE Transactions on Information and Systems
Volume: E91-D
Issue number: 2
State: Published - 1 Jan 2008

Keywords

  • Early decision
  • Text classification
  • Web filtering
  • World Wide Web
