Enhancing the Accuracy of Large Language Models in Medical Coding through Retrieval-Based Approaches
Keywords:
Natural Language Processing, Retrieve-Rank system, Automated diagnosis coding, Machine learning in healthcare, Medical informatics, ICD-10-CM coding
Abstract
Purpose: Medical coding, essential for healthcare administration and research, requires significant expertise and resources. Although Large Language Models (LLMs) have shown promise in automating this task, recent studies have highlighted their limitations: even advanced models such as GPT-4 achieve only moderate accuracy. This study presents a novel Retrieve-Rank system that combines a ColBERT-V2 retriever with GPT-3.5-turbo for automated medical coding.
Methods: We evaluated our Retrieve-Rank system against a Vanilla LLM approach on a dataset of 100 single-term medical conditions with their corresponding International Classification of Diseases, 10th edition, Clinical Modification (ICD-10-CM) codes. ICD-10-CM is the version of the standardized system currently used in the United States to code diseases and medical conditions. The system follows a two-step process: first, ColBERT-V2 retrieves the 15 most relevant codes; then GPT-3.5-turbo reranks these candidates and selects the most appropriate code. The experiment was conducted on 1 June 2024. Performance was measured as top-one accuracy on normalized ICD-10-CM codes.
Results: Our Retrieve-Rank system achieved 100% accuracy in code identification, compared with 6% for the Vanilla LLM approach. This improvement is particularly noteworthy because it was achieved with GPT-3.5, a more accessible model than GPT-4, demonstrating that LLMs equipped with appropriate retrieval mechanisms can overcome their inherent limitations in medical coding tasks.
Conclusions: Although our study was limited to single-term conditions, the results suggest significant potential for broader applications in healthcare administration. This research helps bridge the gap between AI capabilities and clinical implementation, offering a promising approach to automating medical coding while maintaining high accuracy. Future research should focus on validating these findings with more complex, real-world medical cases and unstructured clinical notes.
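The two-step process and the top-one accuracy metric described in the Methods can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the retriever is a simple token-overlap stand-in for ColBERT-V2, the reranker is a plain callable standing in for GPT-3.5-turbo, the three-entry catalog is a toy example, and the normalization rule (uppercase, dots and spaces removed) is assumed, since the abstract does not define it. All function names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Option = Tuple[str, str]  # (code, description)


def normalize_code(code: str) -> str:
    """Assumed normalization: uppercase, with dots and spaces removed."""
    return code.upper().replace(".", "").replace(" ", "")


def retrieve_top_k(term: str, catalog: Dict[str, str], k: int = 15) -> List[str]:
    """Stand-in retriever (NOT ColBERT-V2): rank codes by token overlap."""
    query = set(term.lower().split())
    ranked = sorted(
        catalog,
        # Highest overlap first; ties broken alphabetically by code.
        key=lambda c: (-len(query & set(catalog[c].lower().split())), c),
    )
    return ranked[:k]


def rerank(
    term: str,
    candidates: List[str],
    catalog: Dict[str, str],
    choose: Callable[[str, List[Option]], str],
) -> str:
    """Pass the candidates to a chooser (an LLM call in the study)."""
    return choose(term, [(c, catalog[c]) for c in candidates])


def exact_match_chooser(term: str, options: List[Option]) -> str:
    """Stand-in for the LLM: pick an exact description match, else the top hit."""
    for code, description in options:
        if description.lower() == term.lower():
            return code
    return options[0][0]


def top_one_accuracy(predictions: List[str], gold: List[str]) -> float:
    """Fraction of predictions equal to the gold code after normalization."""
    hits = sum(normalize_code(p) == normalize_code(g) for p, g in zip(predictions, gold))
    return hits / len(gold)


# Toy catalog for illustration only; the study used the full ICD-10-CM code set.
catalog = {
    "E11.9": "Type 2 diabetes mellitus without complications",
    "I10": "Essential (primary) hypertension",
    "J45.909": "Unspecified asthma, uncomplicated",
}
terms = ["Essential (primary) hypertension", "Unspecified asthma, uncomplicated"]
gold = ["i10", "J45909"]  # deliberately unnormalized to exercise normalize_code

predictions = [
    rerank(t, retrieve_top_k(t, catalog), catalog, exact_match_chooser) for t in terms
]
accuracy = top_one_accuracy(predictions, gold)
```

Restricting the chooser to retrieved candidates means it can only return a code that exists in the catalog, which is one plausible reason a retrieval step helps over asking a model to produce codes from memory.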
License
Copyright (c) 2026 Keith KWAN, Hao CHEN, Ho Hung Billy CHEUNG

All papers published in Applied Medical Informatics are licensed under a Creative Commons Attribution (CC BY 4.0) International License.