The Statistical Dictionary-Based String Matching Problem

M. Suri, S. Rini

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

In the Dictionary-based String Matching (DSM) problem, an Information Retrieval (IR) system has access to a source sequence and stores the position of a certain number of strings in a posting table. When a user inquires the position of a string, the IR system, instead of searching in the source sequence directly, relies on the the posting table to answer the query more efficiently. In this paper, the Statistical DSM problem is proposed as a statistical and information-theoretic formulation of the classic DSM problem in which both the source and the query have a statistical description while the strings stored in the posting sequence are described as a code. Through this formulation, we define the communication efficiency of the IR system as the average cost in retrieving the entries of the posting list from the posting table, in the limit of an infinitely long source sequence. This formulation is used to study the communication efficiency for the case in which the dictionary is composed of (i) all the strings of a given length, referred to as k-grams, and (ii) run-length codes.

Original languageEnglish
Title of host publicationIWCIT 2019 - Iran Workshop on Communication and Information Theory
PublisherInstitute of Electrical and Electronics Engineers Inc.
ISBN (Electronic)9781728105840
DOIs
StatePublished - Apr 2019
Event2019 Iran Workshop on Communication and Information Theory, IWCIT 2019 - Tehran, Iran, Islamic Republic of
Duration: 24 Apr 201925 Apr 2019

Publication series

NameIWCIT 2019 - Iran Workshop on Communication and Information Theory

Conference

Conference2019 Iran Workshop on Communication and Information Theory, IWCIT 2019
CountryIran, Islamic Republic of
CityTehran
Period24/04/1925/04/19

Keywords

  • Content based retrieval
  • Dictionary-based string matching
  • Indexing database
  • Information retrieval
  • Phrase searching

Fingerprint Dive into the research topics of 'The Statistical Dictionary-Based String Matching Problem'. Together they form a unique fingerprint.

Cite this