On the applicability of the longest-match rule in lexical analysis

Wuu Yang*, Chey Woei Tsay, Jien Tsai Chan

*Corresponding author for this work

Research output: Contribution to journal > Article > peer-review

3 Scopus citations

Abstract

The lexical analyzer of a compiler usually adopts the longest-match rule to resolve ambiguities when deciding the next token in the input stream. However, that rule is not always applicable. Because the rule is so widely used, language designers and compiler implementors frequently overlook its subtle implications, and the consequence is either a flawed language design or a deficient implementation. We propose a method that automatically checks the applicability of the longest-match rule and identifies precisely the situations in which the rule is not applicable. The method is useful to both language designers and compiler implementors. In particular, it is indispensable to automatic generators of language translation systems: without it, the generated lexical analyzers can only apply the longest-match rule blindly, which results in erroneous behavior. The crux of the method consists of two algorithms. The first computes the regular set of the sequences of tokens produced by a nondeterministic Mealy automaton when the automaton processes elements of an input regular set. The second uses a set of equations to determine whether a regular set and a context-free language have a nontrivial intersection.
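
As an illustration of the kind of situation the abstract describes, the following minimal Python sketch applies the longest-match rule to a small, hypothetical token set. The token names (`REAL`, `INT`, `DOTDOT`, `DOT`), their patterns, and the input `1..10` are assumptions chosen for this example and are not taken from the paper. The brute-force `all_tokenizations` helper simply enumerates every way to split the input into tokens; it is not the Mealy-automaton construction the paper proposes.

```python
import re

# Hypothetical token definitions, chosen only for illustration.
TOKEN_SPECS = [
    ("REAL", r"\d+\.\d*"),   # e.g. "1." or "1.5"
    ("INT", r"\d+"),
    ("DOTDOT", r"\.\."),     # range operator, as in "1..10"
    ("DOT", r"\."),
]
COMPILED = [(name, re.compile(pattern)) for name, pattern in TOKEN_SPECS]


def longest_match_tokens(text):
    """Tokenize by always taking the longest lexeme matching at the current position."""
    tokens = []
    pos = 0
    while pos < len(text):
        best = None  # (token name, end position) of the longest match so far
        for name, regex in COMPILED:
            m = regex.match(text, pos)
            # Strict '>' means the earlier-listed token wins a tie in length.
            if m and m.end() > pos and (best is None or m.end() > best[1]):
                best = (name, m.end())
        if best is None:
            raise ValueError(f"no token matches at position {pos}")
        tokens.append((best[0], text[pos:best[1]]))
        pos = best[1]
    return tokens


def all_tokenizations(text, pos=0):
    """Enumerate every split of text[pos:] into tokens (exponential; demo only)."""
    if pos == len(text):
        return [[]]
    results = []
    for name, regex in COMPILED:
        for end in range(pos + 1, len(text) + 1):
            if regex.fullmatch(text, pos, end):
                for rest in all_tokenizations(text, end):
                    results.append([(name, text[pos:end])] + rest)
    return results


if __name__ == "__main__":
    print(longest_match_tokens("1..10"))
    intended = [("INT", "1"), ("DOTDOT", ".."), ("INT", "10")]
    print(intended in all_tokenizations("1..10"))
```

On the input `1..10`, the longest-match scanner commits to the lexeme `1.` as a `REAL` and never produces the `INT DOTDOT INT` sequence, even though that sequence is among the possible tokenizations. If a grammar accepts only the latter, blindly applying the rule rejects a valid program, which is the kind of mismatch an applicability check is meant to detect.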

Original language: English
Pages (from-to): 273-288
Number of pages: 16
Journal: Computer Languages, Systems and Structures
Volume: 28
Issue number: 3
State: Published - 1 Oct 2002

Keywords

  • Compiler
  • Context-free grammar
  • Finite-state automaton
  • Lexical analyzer
  • Mealy automaton
  • Moore automaton
  • Parser
  • Regular expression
  • Scanner
