Malicious URL detection based on Kolmogorov complexity estimation

Hsing Kuo Pao, Yan Lin Chou, Yuh-Jye Lee

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

9 Scopus citations

Abstract

Malicious URL detection has drawn significant research attention in recent years. It is helpful if the URL string alone can be used to make a preliminary judgment about how dangerous a website is, because this saves the effort of analyzing the website content and the bandwidth needed to retrieve it. We propose a detection method based on an estimate of the conditional Kolmogorov complexity of URL strings. Because Kolmogorov complexity is incomputable, we approximate it with a compression-based method, which we call the conditional Kolmogorov measure. Used as a single feature for detection, this measure achieves decent performance that, to our knowledge, no other single feature achieves. Moreover, the proposed Kolmogorov measure can be combined with other features for successful detection. The experiments were conducted on a private dataset from a commercial company that collects more than one million unclassified URLs in a typical hour. On average, the proposed measure processes one hour of such data within a few minutes.
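The abstract does not spell out the compressor or the exact form of the conditional Kolmogorov measure. A common compression-based estimate, assumed here rather than taken from the paper, is K(x | y) ≈ C(y + x) - C(y): the number of extra compressed bytes needed to encode a URL x after a context y. The sketch below uses Python's standard `zlib` as the compressor C. The function names (`compressed_size`, `conditional_complexity`, `kolmogorov_score`), the benign-versus-malicious scoring rule, and the sample URLs are all hypothetical illustrations, not the authors' method or data.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of the zlib-compressed data.

    A real compressor gives a computable upper-bound proxy C(.) for the
    incomputable Kolmogorov complexity K(.).
    """
    return len(zlib.compress(data, 9))


def conditional_complexity(x: str, context: str) -> int:
    """Approximate K(x | context) as C(context + x) - C(context).

    The extra bytes needed to compress x after the context measure how much
    new information x carries beyond what the context already contains.
    Note: DEFLATE only looks back 32 KiB, so only the tail of a long context helps.
    """
    ctx = context.encode("utf-8")
    return compressed_size(ctx + x.encode("utf-8")) - compressed_size(ctx)


def kolmogorov_score(url: str, benign_urls: list[str], malicious_urls: list[str]) -> int:
    """Hypothetical detection score (an assumption, not the paper's formula).

    Positive when the URL is cheaper to describe given known malicious URLs
    than given known benign ones, i.e. it resembles the malicious set more.
    """
    benign_ctx = "\n".join(benign_urls) + "\n"
    malicious_ctx = "\n".join(malicious_urls) + "\n"
    return (conditional_complexity(url, benign_ctx)
            - conditional_complexity(url, malicious_ctx))


if __name__ == "__main__":
    # Made-up example URLs, for illustration only.
    benign = [
        "https://www.example.com/index.html",
        "https://www.example.com/about.html",
        "https://www.example.org/news/today.html",
    ]
    malicious = [
        "http://paypal-verify.account-update.xyz/login.php?session=8f3a9c",
        "http://secure-paypal.account-update.xyz/signin.php?id=77e1b2",
    ]
    candidate = "http://paypal-login.account-update.xyz/verify.php?session=91ab4d"
    print(kolmogorov_score(candidate, benign, malicious))
```

A URL that shares long substrings with known malicious URLs, such as the same bogus domain and path pattern, compresses cheaply against the malicious context and so receives a positive score. In keeping with the abstract, such a score would serve as one feature among others; a deployment over millions of URLs would also need a conditioning set and compressor chosen with DEFLATE's 32 KiB window in mind.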

Original language: English
Title of host publication: Proceedings - 2012 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2012
Pages: 380-387
Number of pages: 8
State: Published - 1 Dec 2012
Event: 2012 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2012 - Macau, China
Duration: 4 Dec 2012 - 7 Dec 2012

Publication series

Name: Proceedings - 2012 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2012

Conference

Conference: 2012 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2012
Country: China
City: Macau
Period: 4/12/12 - 7/12/12

Keywords

  • blacklist
  • compression
  • entropy
  • Kolmogorov complexity
  • malicious URL
