Thinning Arabic characters for feature extraction

John Cowell, Fiaz Hussain

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

19 Citations (Scopus)

Abstract

A successful approach to the recognition of Latin characters is to extract features from each character, such as the number of strokes, stroke intersections and holes, and to use ad hoc tests to differentiate between characters which have similar features. The first stage in this process is to produce thinned, one-pixel-thick representations of the characters to simplify feature extraction. This approach works well with high-quality printed Latin characters. With poor-quality characters, however, the thinning process itself is not straightforward and can introduce errors which show up in the later stages of the recognition process. The recognition of poor-quality Arabic characters is a particular problem because the characters are calligraphic: printed characters have widely varying stroke thicknesses to simulate drawing the character with a calligraphy pen or brush. This paper describes the problems encountered when thinning large, poor-quality Arabic characters prior to the extraction of their features and submission to a syntactic recognition system.
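The abstract does not say which thinning algorithm the authors used. As a rough illustration of what reducing a character to a one-pixel-thick representation involves, the sketch below implements the widely used Zhang-Suen thinning scheme on a small binary image. The function name `zhang_suen_thin`, the list-of-lists image format and the sample bar are assumptions made for this example only, and it should not be read as the method described in the paper.

```python
def zhang_suen_thin(image):
    """Thin a binary image (rows of 0/1, 1 = ink) with Zhang-Suen thinning.

    Illustrative sketch only; not the algorithm from the paper.
    """
    # Pad with a one-pixel background border so every pixel has 8 neighbours.
    width = len(image[0]) + 2
    img = [[0] * width]
    img += [[0] + list(row) + [0] for row in image]
    img += [[0] * width]
    rows, cols = len(img), width

    def neighbours(r, c):
        # P2..P9, clockwise, starting from the pixel directly above.
        return [img[r - 1][c], img[r - 1][c + 1], img[r][c + 1],
                img[r + 1][c + 1], img[r + 1][c], img[r + 1][c - 1],
                img[r][c - 1], img[r - 1][c - 1]]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_clear = []
            for r in range(1, rows - 1):
                for c in range(1, cols - 1):
                    if img[r][c] != 1:
                        continue
                    n = neighbours(r, c)
                    p2, p3, p4, p5, p6, p7, p8, p9 = n
                    ink = sum(n)  # number of ink neighbours
                    # Number of 0 -> 1 transitions around the pixel.
                    transitions = sum(
                        1 for i in range(8)
                        if n[i] == 0 and n[(i + 1) % 8] == 1
                    )
                    if not (2 <= ink <= 6 and transitions == 1):
                        continue
                    if step == 0:
                        ok = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        ok = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if ok:
                        to_clear.append((r, c))
            # Delete all flagged pixels at once (parallel sub-iteration).
            for r, c in to_clear:
                img[r][c] = 0
            if to_clear:
                changed = True

    # Strip the padding before returning.
    return [row[1:-1] for row in img[1:-1]]


if __name__ == "__main__":
    # A hypothetical 3-pixel-thick horizontal stroke.
    bar = [[0] * 10] + [[0] + [1] * 8 + [0] for _ in range(3)] + [[0] * 10]
    for row in zhang_suen_thin(bar):
        print("".join("#" if p else "." for p in row))
```

On clean, uniform strokes a scheme like this yields a tidy skeleton. The widely varying stroke thicknesses and noisy outlines that the abstract describes are the kind of input on which simple neighbourhood rules can produce spurious branches or distorted skeletons.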

Original language: English
Title of host publication: Proceedings - 5th International Conference on Information Visualisation, IV 2001
Editors: F. Khosrowshahi, E. Banissi, M. Sarfraz, A. Ursyn
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 181-185
Number of pages: 5
ISBN (Electronic): 0769511953
Publication status: Published - 2001
Externally published: Yes
Event: 5th International Conference on Information Visualisation, IV 2001 - London, United Kingdom
Duration: 25 Jul 2001 to 27 Jul 2001

Publication series

Name: Proceedings of the International Conference on Information Visualisation
Volume: 2001-January
ISSN (Print): 1093-9547

Conference

Conference: 5th International Conference on Information Visualisation, IV 2001
Country/Territory: United Kingdom
City: London
Period: 25/07/01 to 27/07/01

Keywords

  • Arabic
  • Characters
  • OCR
  • Optical character recognition
  • Thinning
  • Urdu
