Complex document image segmentation using localized histogram analysis with multi-layer matching and clustering

Yen Lin Chen*, Chung Cheng Chiu, Bing-Fei Wu

*Corresponding author for this work

Research output: Contribution to journal (conference article)

2 Scopus citations

Abstract

This paper proposes a new segmentation method for separating text from complex document images. An automatic multilevel thresholding method, based on discriminant analysis, recursively segments a specified block region into several layered image sub-blocks. A multi-layer region-based clustering method then processes the layered image sub-blocks to form several object layers, so that character strings with different illuminations, non-text objects, and background components are segmented into separate object layers. After the text extraction process is performed, text objects with different sizes, styles, and illuminations are properly extracted. Experimental results on the extraction of text strings from complex document images demonstrate the effectiveness of the proposed region-based segmentation method.
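As a rough illustration of the thresholding step, the sketch below applies discriminant-analysis (Otsu-style) thresholding recursively to a gray-level histogram. It is not the authors' implementation: the abstract does not state the stopping rule, so the separability cutoff `min_sep`, the depth limit `max_depth`, and all function names are assumptions chosen for the example, and the clustering and text extraction stages are not covered.

```python
from bisect import bisect_left


def otsu_threshold(hist, lo, hi):
    """Best single threshold for gray levels lo..hi (inclusive).

    Returns (t, separability), where levels <= t form the dark class and
    separability = between-class variance / total variance, in [0, 1].
    Returns (None, 0.0) when the range is empty or has a single level.
    """
    total = sum(hist[lo:hi + 1])
    if total == 0:
        return None, 0.0
    weighted = sum(g * hist[g] for g in range(lo, hi + 1))
    mean_all = weighted / total
    var_all = sum(hist[g] * (g - mean_all) ** 2 for g in range(lo, hi + 1)) / total
    if var_all == 0:
        return None, 0.0

    best_t, best_between = None, 0.0
    w0, sum0 = 0, 0
    for t in range(lo, hi):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        m0 = sum0 / w0
        m1 = (weighted - sum0) / w1
        between = (w0 / total) * (w1 / total) * (m0 - m1) ** 2
        if between > best_between:
            best_t, best_between = t, between
    return best_t, best_between / var_all


def multilevel_thresholds(hist, lo=0, hi=255, min_sep=0.7, max_depth=3):
    """Recursively split lo..hi while the split is well separated.

    min_sep and max_depth are assumed stopping criteria for this sketch.
    Returns the thresholds in ascending order.
    """
    if max_depth == 0:
        return []
    t, sep = otsu_threshold(hist, lo, hi)
    if t is None or sep < min_sep:
        return []
    left = multilevel_thresholds(hist, lo, t, min_sep, max_depth - 1)
    right = multilevel_thresholds(hist, t + 1, hi, min_sep, max_depth - 1)
    return left + [t] + right


def layer_of(gray, thresholds):
    """Index of the layer a gray level falls into (0 = darkest layer)."""
    return bisect_left(thresholds, gray)


# Toy block histogram with three gray-level populations,
# e.g. dark text, mid-gray text, and a light background.
hist = [0] * 256
hist[20] = hist[120] = hist[220] = 100
thresholds = multilevel_thresholds(hist)
layers = [layer_of(g, thresholds) for g in (20, 120, 220)]
```

In this toy case the three gray-level populations end up in three distinct layers. On a real document the histogram would be computed per block region, which is what makes the analysis localized.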

Original language: English
Pages (from-to): 3063-3070
Number of pages: 8
Journal: Conference Proceedings - IEEE International Conference on Systems, Man and Cybernetics
Volume: 4
State: Published - 1 Dec 2004
Event: 2004 IEEE International Conference on Systems, Man and Cybernetics, SMC 2004 - The Hague, Netherlands
Duration: 10 Oct 2004 to 13 Oct 2004

Keywords

  • Document analysis
  • Image segmentation
  • Multilevel thresholding
  • Region-based segmentation
