Abstract
With the establishment of commercial OCR systems for Latin text, recent research efforts have turned to the design of recognition systems for non-Latin scripts such as Japanese, Cyrillic, Chinese, Hindi, Tibetan and, in particular, Arabic. The Unicode 4.0 standard supports 50 scripts used across the world, and many of them, such as Amharic (Ethiopic), have attracted virtually no attention from researchers: an extensive literature review reveals no papers that report an OCR system for Amharic. This paper describes a normalised technique for recognising isolated Arabic, Amharic and Latin characters. Two approaches to identifying the characters are considered: comparing each character against a series of templates, and a signature template scheme. The degrees of similarity between pairs of Amharic, Arabic and typical Latin characters are presented in a confusion matrix, and the performance of the two approaches is compared for each of the three character sets.
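The abstract does not specify how characters are normalised, which similarity measure is used, or how the signature template scheme works. The sketch below is therefore only an illustration of the general template-matching idea, not the authors' method. It assumes binary bitmaps, normalisation by cropping to the ink bounding box and nearest-neighbour resampling to a fixed square grid, and a simple pixel-agreement score; the names `normalise`, `similarity`, `classify` and `similarity_matrix` are hypothetical. It does not attempt the signature template scheme.

```python
from typing import Dict, List, Tuple

# Assumption: a character image is a binary bitmap (1 = ink, 0 = background).
Bitmap = List[List[int]]


def bounding_box(img: Bitmap) -> Tuple[int, int, int, int]:
    """Return (top, bottom, left, right) of the ink pixels, inclusive."""
    rows = [r for r, row in enumerate(img) if any(row)]
    if not rows:
        raise ValueError("image contains no ink pixels")
    cols = [c for c in range(len(img[0])) if any(row[c] for row in img)]
    return rows[0], rows[-1], cols[0], cols[-1]


def normalise(img: Bitmap, size: int = 16) -> Bitmap:
    """Hypothetical normalisation: crop to the ink bounding box, then resample
    to size x size by nearest-neighbour lookup (aspect ratio is not kept)."""
    top, bottom, left, right = bounding_box(img)
    h, w = bottom - top + 1, right - left + 1
    return [[img[top + (r * h) // size][left + (c * w) // size]
             for c in range(size)]
            for r in range(size)]


def similarity(a: Bitmap, b: Bitmap) -> float:
    """Assumed measure: fraction of pixel positions where two equal-size
    bitmaps agree (matching background counts as well as matching ink)."""
    total = sum(len(row) for row in a)
    agree = sum(pa == pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return agree / total


def classify(img: Bitmap, templates: Dict[str, Bitmap]) -> str:
    """Return the label of the template most similar to the normalised input.
    Templates must already be normalised to the same default size."""
    norm = normalise(img)
    return max(templates, key=lambda label: similarity(norm, templates[label]))


def similarity_matrix(
    templates: Dict[str, Bitmap],
) -> Tuple[List[str], List[List[float]]]:
    """Pairwise similarities between templates; high off-diagonal entries
    flag character pairs that are easy to confuse."""
    labels = sorted(templates)
    matrix = [[similarity(templates[a], templates[b]) for b in labels]
              for a in labels]
    return labels, matrix
```

Building `templates` by passing one reference bitmap per character through `normalise` and then calling `similarity_matrix(templates)` gives a symmetric matrix with 1.0 on the diagonal, so visually close glyph pairs stand out as large off-diagonal values. The pixel-agreement score is a deliberately simple choice; a measure restricted to ink pixels (for example, intersection over union) would be an equally plausible assumption.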
Field | Value |
---|---|
Original language | English |
Pages (from–to) | 213–232 |
Number of pages | 20 |
Journal | Machine Graphics and Vision |
Volume | 14 |
Issue number | 2 |
Publication status | Published (2005) |
Externally published | Yes |
Keywords
- Amharic
- Arabic
- Confusion matrix
- Fonts
- OCR
- Optical character recognition
- Script
- Template matching
- Unicode