Text is commonly printed on a complex background. Segmenting text is an important part in document analysis. In the past some methods have been shown for the segmentation of texts with images. However, previous studies have not sufficiently addressed complex compound documents. This investigation presents an algorithm for the segmentation of text in various document images. The proposed segmentation algorithm applies a new multilayer segmentation method to separate the text from various compound document images, independent from the text and background overlapping or not. This method solves various problems associated with the complexity of background images. Experimental results obtained using various document images scanned from book covers, advertisements, brochures and magazines, reveal that the proposed algorithm can successfully segment Chinese and English text strings from various backgrounds, regardless of whether the texts are over a simple, slowly varying or rapidly varying background texture.
|Number of pages||29|
|Journal||International Journal of Pattern Recognition and Artificial Intelligence|
|State||Published - 1 Dec 2005|
- Complex compound document
- Document analysis
- Image segmentation
- Text extraction