A statistics-based approach to incrementally update inverted files

Wann Yun Shieh*, Chung-Ping Chung

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

3 Scopus citations

Abstract

Many information retrieval systems use the inverted file as their indexing structure. The inverted file, however, is poorly suited to incremental updates, in which new documents are added to an existing collection. Most studies suggest dealing with this problem by reserving free space in the inverted file for future updates. In this paper, we propose a run-time, statistics-based approach to allocating this spare space. The approach estimates the space requirements of an inverted file using only a small amount of the most recent statistical data on space usage and the document update request rate. For the best indexing speed and space efficiency, the amount of spare space to allocate is determined by adaptively balancing the trade-off between reorganization count and space utilization. Simulation results show that the proposed space-sparing approach significantly reduces the reorganization needed when updating an inverted file while keeping unused free space well controlled, so that file access speed is not affected.
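The abstract does not give the allocation formula, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a single posting list whose spare space is sized from a moving average over the last few update batches. The class name `PostingList`, the `window` and `headroom` parameters, and the sizing rule are all hypothetical.

```python
import math
from collections import deque


class PostingList:
    """One term's posting list stored in a block with reserved spare slots.

    Hypothetical illustration: the sizing rule below is an assumption,
    not the formula from the paper.
    """

    def __init__(self, window=4, headroom=2.0):
        self.postings = []
        self.capacity = 0          # slots currently allocated to this list
        self.reorganizations = 0   # how often the block had to be rebuilt
        # Only the most recent batch sizes are kept as run-time statistics.
        self.recent_growth = deque(maxlen=window)
        # Assumed knob: larger values mean fewer reorganizations but lower
        # space utilization.
        self.headroom = headroom

    def estimate_spare(self):
        """Estimate spare slots from the recent average growth per batch."""
        if not self.recent_growth:
            return 1
        average = sum(self.recent_growth) / len(self.recent_growth)
        return max(1, math.ceil(average * self.headroom))

    def add_batch(self, doc_ids):
        """Append postings, reorganizing only when the block overflows."""
        self.recent_growth.append(len(doc_ids))
        needed = len(self.postings) + len(doc_ids)
        if needed > self.capacity:
            # Reorganize: allocate a new block with freshly estimated spare space.
            self.capacity = needed + self.estimate_spare()
            self.reorganizations += 1
        self.postings.extend(doc_ids)

    def utilization(self):
        """Fraction of allocated slots that actually hold postings."""
        return len(self.postings) / self.capacity if self.capacity else 1.0
```

In this sketch, raising `headroom` trades space utilization for fewer reorganizations, which is the kind of trade-off the abstract says the proposed approach balances adaptively.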

Original language: English
Title of host publication: Proceedings of the International Conference on Information and Knowledge Engineering 2003
Editors: N. Goharian, N. Goharian
Pages: 38-43
Number of pages: 6
State: Published - 1 Dec 2003
Event: Proceedings of the International Conference on Information and Knowledge Engineering 2003 - Las Vegas, NV, United States
Duration: 23 Jun 2003 to 26 Jun 2003

Publication series

Name: Proceedings of the International Conference on Information and Knowledge Engineering
Volume: 1

Conference

Conference: Proceedings of the International Conference on Information and Knowledge Engineering 2003
Country: United States
City: Las Vegas, NV
Period: 23/06/03 to 26/06/03

Keywords

  • Incremental update
  • Information retrieval
  • Inverted file
  • Statistical approach
