LINGUISTIC CRITERIA FOR ARABIC TEXT OPTICAL RECOGNITION

Redkin, O.
Abstract:
Any written text can be considered a sequence of symbols grouped in a certain order and assembled in a linear sequence, which may be arranged horizontally or vertically. Existing methods of optical character recognition (OCR) aim at linear and vertical segmentation of written texts, using interword spaces and intervals between letters as the demarcation markers of lexical units and characters respectively. This strategy is effective for many languages, but building robust OCR techniques for Arabic remains an extremely challenging task. The problem lies primarily in the nature of the Arabic script, in which letters vary in shape depending on their position in a word and differ in length and height, to say nothing of the script's cursiveness. In this case, purely mathematical methods of OCR have limited efficiency. We suggest a methodology which, along with 'traditional' approaches, takes into consideration linguistic data such as the character occurrence frequency index, the compatibility of characters within words, and the word frequency index for Arabic. These data are among the indicators that facilitate the interpretation and identification of written text and should be used as components in building a robust OCR method for Arabic-based scripts.
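As an illustration only, the idea of combining visual recognition with linguistic data might be sketched as follows. This is not the author's method: the toy corpus, the add-one smoothing, the scoring formula, and the names `linguistic_score` and `rerank` are all assumptions introduced for the example. It re-ranks candidate readings of a word image by word frequency and by how compatible adjacent characters are.

```python
import math
from collections import Counter

# Hypothetical toy corpus; a real system would use a large Arabic corpus.
CORPUS = ["كتاب", "كتاب", "كتاب", "كتب", "كاتب", "باب"]

# Word frequency index and character-pair (bigram) counts from the corpus.
word_freq = Counter(CORPUS)
bigram_freq = Counter(w[i:i + 2] for w in CORPUS for i in range(len(w) - 1))
total_words = sum(word_freq.values())
total_bigrams = sum(bigram_freq.values())


def linguistic_score(word):
    """Log-score from word frequency plus mean character-pair compatibility.

    Add-one smoothing (an assumption) penalises unseen words and pairs
    instead of excluding them outright.
    """
    word_lp = math.log((word_freq[word] + 1) / (total_words + len(word_freq) + 1))
    pairs = [word[i:i + 2] for i in range(len(word) - 1)]
    pair_lp = sum(
        math.log((bigram_freq[p] + 1) / (total_bigrams + len(bigram_freq) + 1))
        for p in pairs
    )
    return word_lp + pair_lp / max(len(pairs), 1)


def rerank(candidates):
    """Pick the best reading from (word, visual_confidence) pairs.

    visual_confidence is assumed to lie in (0, 1] and to come from a
    conventional shape-based recogniser.
    """
    best = max(candidates, key=lambda c: math.log(c[1]) + linguistic_score(c[0]))
    return best[0]


# Two readings that differ only in dot placement (ن versus ت).
candidates = [("كناب", 0.55), ("كتاب", 0.45)]
print(rerank(candidates))
```

In this sketch the shape-based recogniser slightly prefers the first reading, while the linguistic evidence favours the second because both the word and its character pairs occur in the corpus. How the two sources of evidence should be weighted in practice is left open here.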
Year:
2018
Type of Publication:
In Proceedings
Keywords:
language; Arabic; script; recognition; OCR.
Volume:
18
SGEM Book title:
5th International Multidisciplinary Scientific Conference on Social Sciences and Arts SGEM 2018
Book number:
3.1
SGEM Series:
International Multidisciplinary Scientific Conference on Social Sciences and Arts-SGEM
Pages:
277-282
Publisher address:
51 Alexander Malinov blvd, Sofia, 1712, Bulgaria
SGEM supporters:
Bulgarian Acad Sci; Acad Sci Czech Republ; Latvian Acad Sci; Polish Acad Sci; Russian Acad Sci; Serbian Acad Sci & Arts; Slovak Acad Sci; Natl Acad Sci Ukraine; Natl Acad Sci Armenia; Sci Council Japan; World Acad Sci; European Acad Sci, Arts & Letters; Ac
Period:
19-21 March, 2018
ISBN:
978-619-7408-32-4
ISSN:
2367-5659
Conference:
5th International Multidisciplinary Scientific Conference on Social Sciences and Arts SGEM 2018, 19-21 March, 2018
DOI:
10.5593/sgemsocial2018H/31/S10.035