Text extraction from complex document images using the multi-plane segmentation technique

Yen Lin Chen*, Bing Fei Wu

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-reviewed

6 Scopus citations

Abstract

This study presents a new method for extracting characters from a variety of real-life complex document images. The proposed method applies a multi-plane segmentation technique to separate homogeneous objects, including text blocks, non-text graphical objects, and background textures, into individual object planes. It consists of two stages: automatic localized multilevel thresholding, and multi-plane region matching and assembling. A text extraction process can then be performed on the resulting planes to detect and extract characters with different characteristics in their respective planes. Because the proposed method processes document images regionally and adaptively according to their local features, it preserves the detailed characteristics of extracted characters, especially small characters with thin strokes, as well as the gradational illumination of characters. It also allows background objects with uneven, gradational, and sharp variations in contrast, illumination, and texture to be handled effectively. Experimental results on real-life complex document images demonstrate that the proposed method is effective in extracting characters with various illuminations, sizes, and font styles from various types of complex document images.
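The abstract does not state which thresholding criterion the first stage uses. The sketch below is only an illustration of the general idea of localized multilevel thresholding. It assumes multilevel Otsu (maximizing between-class variance) is applied independently to fixed-size blocks, which is a swapped-in technique and not necessarily the authors' method.

The following are all hypothetical: the names `multilevel_otsu` and `segment_into_planes`, the block size, and the class count. The sketch covers only the thresholding stage. Its labels are local to each block, so it omits the multi-plane region matching and assembling that would link block-level classes into whole-page planes.

```python
from itertools import combinations


def multilevel_otsu(pixels, n_classes=3):
    """Return up to n_classes-1 thresholds for one block of gray levels.

    Assumption: multilevel Otsu, searched exhaustively. A threshold t means
    pixels with value >= t lie on the upper side of that cut.
    """
    values = sorted(set(pixels))
    if len(values) < n_classes:
        # Too few distinct levels: give every distinct level its own class.
        return values[1:]
    count = {v: 0 for v in values}
    for p in pixels:
        count[p] += 1
    # Prefix sums over the distinct levels: pixel counts and intensity sums.
    cum_n, cum_s = [0], [0]
    for v in values:
        cum_n.append(cum_n[-1] + count[v])
        cum_s.append(cum_s[-1] + v * count[v])
    best_score, best_cuts = -1.0, None
    # Each cut index i (1..len-1) separates values[:i] from values[i:],
    # so every class is guaranteed to be non-empty.
    for cuts in combinations(range(1, len(values)), n_classes - 1):
        bounds = (0,) + cuts + (len(values),)
        # sum(S_k^2 / N_k) equals the between-class variance up to
        # constants that do not depend on the cut positions.
        score = sum((cum_s[b] - cum_s[a]) ** 2 / (cum_n[b] - cum_n[a])
                    for a, b in zip(bounds, bounds[1:]))
        if score > best_score:
            best_score, best_cuts = score, cuts
    return [values[i] for i in best_cuts]


def segment_into_planes(image, block=8, n_classes=3):
    """Threshold each block on its own and label pixels by class index.

    image is a list of rows of integer gray levels. Labels are only
    meaningful within a block; no cross-block matching is performed.
    """
    h, w = len(image), len(image[0])
    labels = [[0] * w for _ in range(h)]
    for by in range(0, h, block):
        for bx in range(0, w, block):
            rows = range(by, min(by + block, h))
            cols = range(bx, min(bx + block, w))
            pixels = [image[y][x] for y in rows for x in cols]
            thresholds = multilevel_otsu(pixels, n_classes)
            for y in rows:
                for x in cols:
                    # Class index = number of thresholds the pixel reaches.
                    labels[y][x] = sum(image[y][x] >= t for t in thresholds)
    return labels


if __name__ == "__main__":
    # Two 2x2 blocks under different illumination, split into two classes.
    page = [[20, 60, 150, 220],
            [20, 60, 150, 220]]
    print(segment_into_planes(page, block=2, n_classes=2))
    # -> [[0, 1, 0, 1], [0, 1, 0, 1]]
```

In the usage example, gray level 60 lands in the upper class of the left block, while the brighter 150 lands in the lower class of the right block. This is the kind of local adaptivity that a single global threshold cannot provide. The exhaustive search is quadratic in the number of distinct gray levels per block for three classes, which is acceptable for a sketch but not for production use.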

Original language: English
Title of host publication: 2006 IEEE International Conference on Systems, Man and Cybernetics
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 3540-3547
Number of pages: 8
ISBN (Print): 1424401003, 9781424401000
State: Published - 1 Jan 2006
Event: 2006 IEEE International Conference on Systems, Man and Cybernetics - Taipei, Taiwan
Duration: 8 Oct 2006 to 11 Oct 2006

Publication series

Name: Conference Proceedings - IEEE International Conference on Systems, Man and Cybernetics
Volume: 4
ISSN (Print): 1062-922X

Conference

Conference: 2006 IEEE International Conference on Systems, Man and Cybernetics
Country: Taiwan
City: Taipei
Period: 8/10/06 to 11/10/06

