TY - GEN
T1 - Generalized aho-corasick algorithm for signature based anti-virus applications
AU - Lee, Tsern-Huei
PY - 2007/12/1
Y1 - 2007/12/1
N2 - Because of its accuracy, signature matching is considered an important technique in anti-virus/worm applications. Among some famous pattern matching algorithms, the Aho-Corasick (AC) algorithm can match multiple patterns simultaneously and guarantee deterministic performance under all circumstances and thus is widely adopted in various systems, especially when worst-case performance such as wire speed requirement is a design factor. However, the AC algorithm was developed only for strings while virus/worm signatures could be specified by simple regular expressions. In this paper, we generalize the AC algorithm to systematically construct a finite state pattern matching machine which can indicate the ending position in a finite input string for the first occurrence of virus/worm signatures that are specified by strings or simple regular expressions. The regular expressions studied in this paper may contain the following operators: * (match any number of symbols), ? (match any symbol), and {min, max} (match minimum of min, maximum of max symbols), which are defined in ClamAV, a popular open source anti-virus/worm software module, for signature specification.
AB - Because of its accuracy, signature matching is considered an important technique in anti-virus/worm applications. Among some famous pattern matching algorithms, the Aho-Corasick (AC) algorithm can match multiple patterns simultaneously and guarantee deterministic performance under all circumstances and thus is widely adopted in various systems, especially when worst-case performance such as wire speed requirement is a design factor. However, the AC algorithm was developed only for strings while virus/worm signatures could be specified by simple regular expressions. In this paper, we generalize the AC algorithm to systematically construct a finite state pattern matching machine which can indicate the ending position in a finite input string for the first occurrence of virus/worm signatures that are specified by strings or simple regular expressions. The regular expressions studied in this paper may contain the following operators: * (match any number of symbols), ? (match any symbol), and {min, max} (match minimum of min, maximum of max symbols), which are defined in ClamAV, a popular open source anti-virus/worm software module, for signature specification.
UR - http://www.scopus.com/inward/record.url?scp=40949162082&partnerID=8YFLogxK
U2 - 10.1109/ICCCN.2007.4317914
DO - 10.1109/ICCCN.2007.4317914
M3 - Conference contribution
AN - SCOPUS:40949162082
SN - 9781424412518
T3 - Proceedings - International Conference on Computer Communications and Networks, ICCCN
SP - 792
EP - 797
BT - Proceedings of 16th International Conference on Computer Communications and Networks 2007, ICCCN 2007
Y2 - 13 August 2007 through 16 August 2007
ER -