TY - JOUR
T1 - Recognition-based character segmentation for multi-level writing style
AU - Inkeaw, Papangkorn
AU - Bootkrajang, Jakramate
AU - Charoenkwan, Phasit
AU - Marukatat, Sanparith
AU - Ho, Shinn-Ying
AU - Chaijaruwanich, Jeerayut
PY - 2018/6/1
Y1 - 2018/6/1
N2 - Character segmentation is an important task in optical character recognition (OCR). The quality of any OCR system is highly dependent on its character segmentation algorithm. Despite the availability of various character segmentation methods proposed to date, existing methods cannot satisfactorily segment characters belonging to some complex writing styles such as the Lanna Dhamma characters. In this paper, a new character segmentation method named graph partitioning-based character segmentation is proposed to address the problem. The proposed method can deal with multi-level writing style as well as touching and broken characters. It can be considered a generalization of existing approaches to multi-level writing style. The proposed method consists of three phases. In the first phase, a newly devised over-segmentation technique based on the morphological skeleton is used to obtain redundant fragments of a word image. The fragments are then used to form a segmentation hypotheses graph. In the last phase, the hypotheses graph is partitioned into subgraphs, each corresponding to a segmented character, using a partitioning algorithm developed specifically for character segmentation. Experimental results on handwritten Lanna Dhamma character datasets showed that the proposed method achieved a high correct segmentation rate and outperformed existing methods for the Lanna Dhamma alphabet.
AB - Character segmentation is an important task in optical character recognition (OCR). The quality of any OCR system is highly dependent on its character segmentation algorithm. Despite the availability of various character segmentation methods proposed to date, existing methods cannot satisfactorily segment characters belonging to some complex writing styles such as the Lanna Dhamma characters. In this paper, a new character segmentation method named graph partitioning-based character segmentation is proposed to address the problem. The proposed method can deal with multi-level writing style as well as touching and broken characters. It can be considered a generalization of existing approaches to multi-level writing style. The proposed method consists of three phases. In the first phase, a newly devised over-segmentation technique based on the morphological skeleton is used to obtain redundant fragments of a word image. The fragments are then used to form a segmentation hypotheses graph. In the last phase, the hypotheses graph is partitioned into subgraphs, each corresponding to a segmented character, using a partitioning algorithm developed specifically for character segmentation. Experimental results on handwritten Lanna Dhamma character datasets showed that the proposed method achieved a high correct segmentation rate and outperformed existing methods for the Lanna Dhamma alphabet.
KW - Character segmentation
KW - Graph partitioning
KW - Multi-level writing style
KW - Optical character recognition
KW - Touching and broken characters
UR - http://www.scopus.com/inward/record.url?scp=85047623457&partnerID=8YFLogxK
U2 - 10.1007/s10032-018-0302-5
DO - 10.1007/s10032-018-0302-5
M3 - Article
AN - SCOPUS:85047623457
VL - 21
SP - 21
EP - 39
JO - International Journal on Document Analysis and Recognition
JF - International Journal on Document Analysis and Recognition
SN - 1433-2833
IS - 1-2
ER -