Mihai-Lucian Voncilă
Computer Science and Engineering Department, Faculty of Automatic Control and Computers, National University of Science and Technology Politehnica Bucharest, Bucharest, Romania.
Nicolae Tarbă
Computer Science and Engineering Department, Faculty of Automatic Control and Computers, National University of Science and Technology Politehnica Bucharest, Bucharest, Romania. nicolae.tarba@upb.ro https://orcid.org/0000-0002-7769-8289
Cosmin-Dumitru Oprea
Computer Science and Engineering Department, Faculty of Automatic Control and Computers, National University of Science and Technology Politehnica Bucharest, Bucharest, Romania. cosmin_dumitru.oprea@upb.ro
Costin Anton Boiangiu
Computer Science and Engineering Department, Faculty of Automatic Control and Computers, National University of Science and Technology Politehnica Bucharest, Bucharest, Romania.
Nicolae Goga
The Faculty of Engineering in Foreign Languages, Faculty of Automatic Control and Computers, National University of Science and Technology Politehnica Bucharest, Bucharest, Romania.
Abstract
Digital images often contain noise introduced during acquisition, storage, or transmission, which can degrade the performance of Optical Character Recognition (OCR) systems. Effective noise reduction is essential for OCR accuracy, because noise can obscure text and lower recognition rates. Image denoising is widely studied in computer vision but remains challenging due to the variety of noise types and the risk of introducing artifacts or blurring. In this work, we propose a new preprocessing algorithm that is applied before the Tesseract OCR engine to improve its overall accuracy. We evaluate the method on the SmartDoc dataset, which contains images captured with mobile devices, and obtain a 6.5% improvement over the accuracy of Tesseract without preprocessing. We also compare the method with several classical algorithms, including the Mean Filter, Median Filter, Bilateral Filter, and Adaptive Smoothing, among others, and it outperforms each of them individually.
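The classical baselines named above are standard neighborhood filters. For reference, the following is a minimal pure-Python sketch of two of them, the mean and median filters. It assumes an 8-bit grayscale image stored as a list of rows, a square window of radius `radius`, and edge replication at the borders; the helper names (`apply_filter`, `mean_filter`, `median_filter`) are illustrative only. This is neither the authors' implementation nor the proposed method.

```python
from statistics import median
from typing import Callable

# Assumed representation: 8-bit grayscale image as a list of rows of ints.
Image = list[list[int]]


def _window(img: Image, r: int, c: int, radius: int) -> list[int]:
    """Collect the (2*radius+1)^2 neighborhood of pixel (r, c).

    Out-of-bounds coordinates are clamped to the nearest edge pixel
    (edge replication), so every window has the same size.
    """
    h, w = len(img), len(img[0])
    vals = []
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            rr = min(max(r + dr, 0), h - 1)
            cc = min(max(c + dc, 0), w - 1)
            vals.append(img[rr][cc])
    return vals


def apply_filter(img: Image, reducer: Callable[[list[int]], int], radius: int = 1) -> Image:
    """Replace each pixel with reducer(neighborhood)."""
    return [
        [reducer(_window(img, r, c, radius)) for c in range(len(img[0]))]
        for r in range(len(img))
    ]


def mean_filter(img: Image, radius: int = 1) -> Image:
    # Average of the window, rounded back to an integer intensity.
    return apply_filter(img, lambda v: round(sum(v) / len(v)), radius)


def median_filter(img: Image, radius: int = 1) -> Image:
    # The window size is always odd, so the median is an actual pixel value.
    return apply_filter(img, median, radius)


if __name__ == "__main__":
    # White 5x5 "page" with one dark impulse ("pepper") pixel in the center.
    page = [[200] * 5 for _ in range(5)]
    page[2][2] = 0
    print(mean_filter(page)[2][2])    # impulse is smeared, not removed
    print(median_filter(page)[2][2])  # impulse is removed
```

On this example the median filter restores the white background exactly, while the mean filter spreads the single dark pixel into a grey 3×3 patch. This illustrates the trade-off mentioned above between removing noise and introducing blurring, which matters for thin character strokes.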