TY - JOUR
T1 - Computer Vision Based Automatic Margin Computation Model for Digital Document Images
AU - Guha, Abhijit
AU - Samanta, Debabrata
AU - Sengar, Sandeep Singh
N1 - Publisher Copyright:
© 2023, Crown.
PY - 2023/3/7
Y1 - 2023/3/7
AB - Margin, in typography, is the space between the text content and the edges of a document, and it is often essential information for the consumer of the document, whether digital or physical. In the present age of digital disruption, it is customary to store and retrieve documents digitally and to extract information from them automatically when necessary. Margin is one such piece of non-textual information that matters for some business processes, and the demand for computing margins algorithmically is growing as a way to facilitate robotic process automation (RPA). We propose a computer vision-based text localization model that uses classical digital image processing (DIP) techniques, such as smoothing, thresholding, and morphological transformation, to programmatically compute the top, left, right, and bottom margins of a digital document image. The model was tested with different noise filters and with structuring elements of various shapes and sizes, and the bilateral filter and line-shaped structuring elements were selected for removing the noise most commonly introduced by scanning. The proposed model targets text document images rather than natural scene images; consequently, existing benchmark models developed for text localization in natural scene images did not achieve the expected accuracy on this task. The model was validated on 485 document images from a real-time business process of a reputed TI company. The results show that 91.34% of the document images achieved an intersection over union (IoU) above 90%, well beyond the accuracy range set by the company for that specific process.
KW - Computer vision
KW - Digital image processing
KW - Margin detection
KW - Text localization
UR - http://www.scopus.com/inward/record.url?scp=85150171922&partnerID=8YFLogxK
U2 - 10.1007/s42979-023-01693-5
DO - 10.1007/s42979-023-01693-5
M3 - Article
AN - SCOPUS:85150171922
SN - 2662-995X
VL - 4
JO - SN Computer Science
JF - SN Computer Science
IS - 3
M1 - 253
ER -