Digitization of Health Records using Spark OCR

Authors

  • Dia Alina TRAMBITAS-MIRON, “George Emil Palade” University of Medicine, Pharmacy, Sciences and Technology, Faculty of Medicine
  • Andrei Marian FEIER, “George Emil Palade” University of Medicine, Pharmacy, Sciences and Technology, Faculty of Medicine
  • Marius MĂRUŞTERI, “George Emil Palade” University of Medicine, Pharmacy, Sciences and Technology, Faculty of Medicine
  • Andrei IOANOVICI, “George Emil Palade” University of Medicine, Pharmacy, Sciences and Technology, Faculty of Medicine

Keywords

Health record digitization, Spark OCR, Transformers

Abstract

Processing data in the healthcare domain often involves extracting information from documents with complex and heterogeneous formats, such as forms, lab results, academic papers, receipts, genomic sequencing reports, signed legal agreements, clinical trial documents, application forms, and invoices. These documents are usually available in paper format, and digitizing and analyzing them in a secure, integrated, and accurate manner remains a challenge. In this presentation we explain our approach to digitizing patient charts using the Spark OCR library and the transformers it offers, with concrete code examples and the results obtained. The Spark OCR library processes documents privately, without uploading them to a cloud service, and, most importantly, provides state-of-the-art accuracy for a variety of common use cases. A primary method of maximizing accuracy is to apply a set of pre-built image pre-processing transformers for noise reduction, skew correction, object removal, automated scaling, erosion, binarization, and dilation. These transformers can be combined into OCR pipelines that resolve the common 'document noise' issues that reduce OCR accuracy.
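
The idea of chaining pre-processing steps ahead of text recognition can be illustrated with a minimal sketch. The code below does not use the Spark OCR API and is not the authors' pipeline; it is a plain-Python stand-in, under the assumption that a scanned page is a small grid of grayscale values with dark ink on a light background. All names (binarize, erode_ink, dilate_ink, pipeline) are hypothetical and chosen for illustration only. It composes binarization, erosion, and dilation into one pipeline and shows how an isolated speck of scanner noise is removed while a larger stroke survives.

```python
from typing import Callable, List

# Assumption: a page is a grid of grayscale pixels (0 = black ink, 255 = white).
Image = List[List[int]]
Transformer = Callable[[Image], Image]


def binarize(threshold: int = 128) -> Transformer:
    """Map every pixel to pure ink (0) or pure background (255)."""
    def apply(img: Image) -> Image:
        return [[0 if px < threshold else 255 for px in row] for row in img]
    return apply


def _window_filter(img: Image, pick: Callable[[List[int]], int]) -> Image:
    """Replace each pixel by pick() over its 3x3 neighborhood (clipped at edges)."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [img[ny][nx]
                      for ny in range(max(0, y - 1), min(h, y + 2))
                      for nx in range(max(0, x - 1), min(w, x + 2))]
            row.append(pick(window))
        out.append(row)
    return out


def erode_ink() -> Transformer:
    """Shrink ink regions: a pixel stays ink only if its whole window is ink."""
    return lambda img: _window_filter(img, max)


def dilate_ink() -> Transformer:
    """Grow ink regions: a pixel becomes ink if any pixel in its window is ink."""
    return lambda img: _window_filter(img, min)


def pipeline(*stages: Transformer) -> Transformer:
    """Chain transformers so that each stage receives the previous stage's output."""
    def run(img: Image) -> Image:
        for stage in stages:
            img = stage(img)
        return img
    return run


# Toy scan: a 3x3 ink stroke (rows 1-3, columns 1-3) plus one noise speck at (2, 6).
scan = [
    [230, 230, 230, 230, 230, 230, 230, 230],
    [230,  40,  50,  45, 230, 230, 230, 230],
    [230,  30,  20,  35, 230, 230,  60, 230],
    [230,  45,  40,  50, 230, 230, 230, 230],
    [230, 230, 230, 230, 230, 230, 230, 230],
]

# Erosion followed by dilation removes ink smaller than the 3x3 window.
clean_page = pipeline(binarize(128), erode_ink(), dilate_ink())
cleaned = clean_page(scan)

for row in cleaned:
    print("".join("#" if px == 0 else "." for px in row))
```

In a real deployment each stage would be a library transformer operating on full-resolution page images, and the final stage would be the text recognizer; the sketch only shows why the order of the stages matters, since erosion must precede dilation for the speck to disappear.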

Published

05.09.2021

How to Cite

1.
TRAMBITAS-MIRON DA, FEIER AM, MĂRUŞTERI M, IOANOVICI A. Digitization of Health Records using Spark OCR. Appl Med Inform [Internet]. 2021 Sep. 5 [cited 2024 Apr. 22];43(Suppl. S1):30. Available from: https://ami.info.umfcluj.ro/index.php/AMI/article/view/827

Section

Special Issue - RoMedINF