Spark NLP: A Versatile Solution for Structuring Data from Endoscopy Reports


  • Andrei Constantin IOANOVICI UMFST Targu Mures
  • Stefan Marius MĂRUŞTERI UMFST Targu Mures
  • Andrei Marian FEIER UMFST Targu Mures
  • Alina Dia TRAMBITAS-MIRON UMFST Targu Mures


Health record digitization, Spark NLP, Gastroenterology, Structured data


Artificial intelligence (AI) can be applied in gastroenterology practice to acquire and analyze information. Besides speed and reproducibility, AI also has the potential to offer insight, with results that surpass those of medical specialists. Natural language processing (NLP) is used to extract information from text and to organize and categorize documents. Processing unstructured data with NLP yields structured data, from which medical codes (ICD-10, medical procedure codes, etc.) can be extracted more easily, for reimbursement and other purposes. Recent research has examined the use of AI for the automated interpretation of text from endoscopy reports and other medical documents, with the aims of improving quality monitoring and patient phenotyping and of enhancing the detection and description of endoscopic lesions such as colon polyps. In this paper, we present a method of extracting medical data using Spark NLP (John Snow Labs, DE, USA): we annotate endoscopy reports and train a model to automatically extract labels, in order to obtain structured medical data. These data can be combined with other forms of structured data for optimal, novel patient profiling.
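The step that turns the trained model's labels into structured data can be illustrated with a minimal sketch. It assumes the model tags each token of a report with IOB-style labels (B- begins an entity, I- continues it, O marks tokens outside any entity), a scheme commonly used by Spark NLP's NER annotators. The label names (SIZE, MORPHOLOGY, FINDING, LOCATION, PROCEDURE) and the example sentence are hypothetical and are not the annotation schema or data used in this study. In Spark NLP itself, this grouping is usually done by the NerConverter annotator; the plain-Python version below only reproduces the idea and does not use the Spark NLP API.

```python
from collections import defaultdict


def bio_to_entities(tokens, tags):
    """Group IOB-tagged tokens into (label, text) spans."""
    entities = []
    current_label, current_tokens = None, []

    def close_span():
        # Emit the currently open span, if any.
        if current_label is not None:
            entities.append((current_label, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        # A B- tag, or an I- tag whose label differs from the open span,
        # starts a new span.
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_label):
            close_span()
            current_label, current_tokens = tag[2:], [token]
        elif tag.startswith("I-"):
            # Continuation of the open span.
            current_tokens.append(token)
        else:
            # An "O" tag closes any open span.
            close_span()
            current_label, current_tokens = None, []
    close_span()
    return entities


def to_record(entities):
    """Collect span texts per label into one structured record (dict of lists)."""
    record = defaultdict(list)
    for label, text in entities:
        record[label].append(text)
    return dict(record)


# Hypothetical tagged sentence (illustrative only, not from the study's data).
tokens = ["A", "6", "mm", "sessile", "polyp", "in", "the", "sigmoid", "colon",
          "was", "removed", "by", "cold", "snare", "."]
tags = ["O", "B-SIZE", "I-SIZE", "B-MORPHOLOGY", "B-FINDING", "O", "O",
        "B-LOCATION", "I-LOCATION", "O", "O", "O", "B-PROCEDURE", "I-PROCEDURE", "O"]

record = to_record(bio_to_entities(tokens, tags))
print(record)
# {'SIZE': ['6 mm'], 'MORPHOLOGY': ['sessile'], 'FINDING': ['polyp'],
#  'LOCATION': ['sigmoid colon'], 'PROCEDURE': ['cold snare']}
```

Collecting the spans per label gives one record per report, the kind of structured output that can then be combined with other structured data about the same patient.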




How to Cite

IOANOVICI AC, MĂRUŞTERI SM, FEIER AM, TRAMBITAS-MIRON AD. Spark NLP: A Versatile Solution for Structuring Data from Endoscopy Reports. Appl Med Inform [Internet]. 2021 Sep. 5 [cited 2023 Mar. 26];43(Suppl. S1):26. Available from:



Special Issue - RoMedINF