Two template matching approaches to Arabic, Amharic and Latin isolated characters recognition

John Cowell*, Fiaz Hussain

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

With commercial OCR systems for Latin text now well established, recent research has turned to recognition systems for non-Latin scripts such as Japanese, Cyrillic, Chinese, Hindi, Tibetan and, in particular, Arabic. The Unicode 4.0 standard supports 50 scripts used across the world, and many of them, such as Amharic (Ethiopic), have attracted virtually no attention from researchers: an extensive literature review found no papers reporting an OCR system for Amharic. This paper describes a normalised technique that can be used to recognise isolated Arabic, Amharic and Latin characters. Two approaches to identifying the characters are considered: comparing each character with a series of templates, and a signature template scheme. The degree of similarity between pairs of Amharic, Arabic and typical Latin characters is presented as a confusion matrix, and the performance of the two approaches is compared for each of the three character sets.
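The abstract does not specify the paper's normalisation, template size, similarity measure or the form of its signature templates, so the following is only a minimal sketch of the general idea under stated assumptions. Each character is assumed to be a binary bitmap that is normalised by cropping to its bounding box and rescaling to 16×16 with nearest-neighbour sampling. The first approach scores a match by the fraction of agreeing pixels; for the second, a "signature" is *assumed* here to be the per-row and per-column ink counts (projection profiles), which may differ from the authors' scheme. All names (`normalise`, `pixel_similarity`, `signature_similarity`, `confusion_matrix`, `recognise`) are hypothetical.

```python
from typing import Callable, Dict, List

Bitmap = List[List[int]]  # rows of 0/1 pixels, 1 = ink
Similarity = Callable[[Bitmap, Bitmap], float]

def crop(bitmap: Bitmap) -> Bitmap:
    """Trim blank rows and columns so only the character's bounding box remains."""
    rows = [r for r, row in enumerate(bitmap) if any(row)]
    cols = [c for c in range(len(bitmap[0])) if any(row[c] for row in bitmap)]
    if not rows:
        return [[0]]  # blank input: a single background pixel
    return [row[cols[0]:cols[-1] + 1] for row in bitmap[rows[0]:rows[-1] + 1]]

def normalise(bitmap: Bitmap, size: int = 16) -> Bitmap:
    """Assumed normalisation: crop, then rescale to size x size (nearest neighbour)."""
    box = crop(bitmap)
    h, w = len(box), len(box[0])
    return [[box[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]

def pixel_similarity(a: Bitmap, b: Bitmap) -> float:
    """Approach 1 (assumed measure): fraction of pixel positions that agree."""
    total = sum(len(row) for row in a)
    same = sum(pa == pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return same / total

def signature(bitmap: Bitmap) -> List[int]:
    """Hypothetical signature: ink count of each row, then of each column."""
    return [sum(row) for row in bitmap] + [sum(col) for col in zip(*bitmap)]

def signature_similarity(a: Bitmap, b: Bitmap) -> float:
    """Approach 2 (assumed): 1 minus the normalised L1 distance of signatures."""
    sa, sb = signature(a), signature(b)
    size = len(a)  # each count lies in [0, size] for a square bitmap
    diff = sum(abs(x - y) for x, y in zip(sa, sb))
    return 1.0 - diff / (len(sa) * size)

def confusion_matrix(templates: Dict[str, Bitmap],
                     similarity: Similarity) -> Dict[str, Dict[str, float]]:
    """Pairwise similarity between every pair of normalised templates."""
    norm = {k: normalise(v) for k, v in templates.items()}
    return {a: {b: similarity(norm[a], norm[b]) for b in norm} for a in norm}

def recognise(glyph: Bitmap, templates: Dict[str, Bitmap],
              similarity: Similarity) -> str:
    """Return the label of the template most similar to the glyph."""
    g = normalise(glyph)
    return max(templates, key=lambda k: similarity(g, normalise(templates[k])))

# Toy 3x3 "templates" standing in for real character images.
templates = {
    "T": [[1, 1, 1], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
    "O": [[1, 1, 1], [1, 0, 1], [1, 1, 1]],
}
# An "L" drawn at twice the size with a blank border.
glyph = [[0] * 8] + [[0, 1, 1, 0, 0, 0, 0, 0] for _ in range(4)] \
      + [[0, 1, 1, 1, 1, 1, 1, 0] for _ in range(2)] + [[0] * 8]

print(recognise(glyph, templates, pixel_similarity))      # L
print(recognise(glyph, templates, signature_similarity))  # L
```

Because cropping and rescaling remove position and scale, the enlarged, offset "L" normalises to exactly the same bitmap as its template. High off-diagonal entries in `confusion_matrix` would flag character pairs that either approach is likely to confuse.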

Original language: English
Pages (from-to): 213-232
Number of pages: 20
Journal: Machine Graphics and Vision
Volume: 14
Issue number: 2
Publication status: Published - 2005
Externally published: Yes

Keywords

  • Amharic
  • Arabic
  • Confusion matrix
  • Fonts
  • OCR
  • Optical character recognition
  • Script
  • Template matching
  • Unicode
