BRAIN. Broad Research in Artificial Intelligence and Neuroscience

Volume: 16 | Issue: 2 | Paper number: 29.

Pixel-Wise Method for Enhanced Tesseract OCR Accuracy Using Colour and Spatial Distances

Published June 10, 2025
Mihai-Lucian Voncilă - National University of Science and Technology Politehnica Bucharest (RO), Nicolae Tarbă - National University of Science and Technology Politehnica Bucharest (RO), Cosmin-Dumitru Oprea - National University of Science and Technology Politehnica Bucharest (RO), Costin Anton Boiangiu - National University of Science and Technology Politehnica Bucharest (RO), Nicolae Goga - National University of Science and Technology Politehnica Bucharest (RO)

Abstract

Digital images often contain noise introduced during acquisition, storage, or transmission, which can degrade the performance of Optical Character Recognition (OCR) systems. Effective noise reduction is essential for improving the accuracy of these systems, as noise can obscure text and reduce recognition rates. Noise removal is widely studied in computer vision but remains challenging due to the variety of noise types and the risk of introducing artifacts or blurring. In this work, we propose a new preprocessing algorithm, used in conjunction with the Tesseract engine, to improve its overall accuracy. We test this method on the SmartDoc dataset, which contains images taken with mobile devices, and obtain a 6.5% improvement over the original accuracy. The method is also compared with several classical algorithms, including the Mean Filter, Median Filter, Bilateral Filter, and Adaptive Smoothing, and shows improved results over each of them.
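
To make the idea of weighting pixels by colour and spatial distance concrete, the sketch below implements a plain bilateral filter, one of the classical baselines named in the abstract. It is not the authors' proposed algorithm, which the abstract does not describe. The function name `bilateral_filter`, its parameters (`radius`, `sigma_space`, `sigma_colour`), and the list-of-rows grayscale image representation are assumptions chosen for illustration.

```python
import math


def bilateral_filter(image, radius=1, sigma_space=1.0, sigma_colour=25.0):
    """Smooth a grayscale image (list of rows of numbers) while keeping edges.

    Illustrative baseline only: each neighbour is weighted by a Gaussian of
    its spatial distance from the centre pixel multiplied by a Gaussian of
    its intensity (colour) difference from the centre pixel.
    """
    height, width = len(image), len(image[0])
    output = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            centre = image[y][x]
            weighted_sum = 0.0
            weight_total = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ny, nx = y + dy, x + dx
                    if not (0 <= ny < height and 0 <= nx < width):
                        continue  # skip neighbours that fall outside the image
                    neighbour = image[ny][nx]
                    spatial = (dx * dx + dy * dy) / (2.0 * sigma_space ** 2)
                    colour = (neighbour - centre) ** 2 / (2.0 * sigma_colour ** 2)
                    weight = math.exp(-(spatial + colour))
                    weighted_sum += weight * neighbour
                    weight_total += weight
            # The centre pixel always contributes weight 1, so no division by zero.
            output[y][x] = weighted_sum / weight_total
    return output


# Small intensity deviations are pulled towards their surroundings,
# while a sharp dark/bright boundary is left almost untouched.
noisy = [[100, 100, 100], [100, 110, 100], [100, 100, 100]]
edge = [[0, 0, 255, 255] for _ in range(4)]
smoothed_noisy = bilateral_filter(noisy)
smoothed_edge = bilateral_filter(edge)
```

In a preprocessing pipeline of the kind the abstract describes, the filtered image would then be passed to the OCR engine; that step is omitted here because it depends on the specific Tesseract bindings in use.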


DOI: http://dx.doi.org/10.70594/brain/16.2/29
