Fine-tuning Llama-2-13B with AI-generated medical diagnoses: A novel strategy for optimizing ICD coding in gynecologic oncology

Research output: Contribution to journal, Article, peer-review

Abstract

Objective: Given the substantial advancements in Large Language Models (LLMs), this study aimed to explore the effectiveness of using AI-generated medical diagnoses, based on ICD10 descriptors, to fine-tune the Llama-2-13B model and optimize the ICD10 coding process, focusing on gynecologic oncology for initial validation.

Materials and methods: AI-generated diagnostic texts were rigorously confirmed to ensure medical coherence and reliability for fine-tuning. Four models were established: the original Llama-2-13B (Model 1); a model fine-tuned with basic ICD10 codes (Model 2); a model trained with an additional set of 10 AI-generated diagnosis statements per ICD10 code (Model 3); and a fourth model trained with an additional set of 20 AI-generated statements per code (Model 4). Validation involved a set of 83 discharge records related to gynecologic oncology, derived from 2415 discharge records collected between January 1, 2020, and June 30, 2023.

Results: Validation results showed significant improvement in accuracy and Kappa scores across the models: Model 1 (native Llama-2-13B) had an accuracy of 0.06 and a Kappa score of 0.04, Model 2 achieved 0.24 and 0.19, Model 3 reached 0.90 and 0.89, and Model 4 improved further to 0.95 and 0.94.

Conclusion: The use of prompts to generate diagnostic descriptions, coupled with AI-generated data for model fine-tuning, substantially enhanced the Llama-2-13B model's ability to accurately determine ICD diagnostic codes from medical records. This methodology offers a cost-effective strategy, optimizes model accuracy, and underscores the potential for broader applications due to the LLM's generative capabilities.
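The accuracy and Kappa figures reported in the Results can be illustrated with a minimal Python sketch. It assumes the Kappa score is Cohen's kappa between the reference ICD codes and the model's predicted codes; the abstract does not state which variant was used. The function names and the four toy labels below are hypothetical and are not taken from the study's data.

```python
from collections import Counter


def accuracy(reference, predicted):
    """Fraction of records whose predicted code equals the reference code."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(r == p for r, p in zip(reference, predicted))
    return matches / len(reference)


def cohen_kappa(reference, predicted):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(reference)
    p_observed = accuracy(reference, predicted)
    ref_counts = Counter(reference)
    pred_counts = Counter(predicted)
    # Chance agreement: probability that both label sources pick the same
    # code if each drew independently from its own label distribution.
    p_expected = sum(ref_counts[c] * pred_counts[c] for c in ref_counts) / (n * n)
    if p_expected == 1.0:
        return float("nan")  # undefined when only one code ever appears
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical toy labels (assumption, for illustration only).
reference_codes = ["C53.9", "C54.1", "C56.9", "C53.9"]
predicted_codes = ["C53.9", "C54.1", "C53.9", "C53.9"]

print(f"accuracy = {accuracy(reference_codes, predicted_codes):.2f}")
print(f"kappa    = {cohen_kappa(reference_codes, predicted_codes):.2f}")
```

Because kappa subtracts the agreement expected by chance, it is lower than raw accuracy whenever the two label distributions overlap, which matches the pattern in the reported results (for example, 0.95 accuracy against 0.94 Kappa for Model 4).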

Original language: English
Pages (from-to): 978-984
Number of pages: 7
Journal: Taiwanese Journal of Obstetrics and Gynecology
Volume: 64
Issue number: 6
DOIs
Publication status: Published - 2025 Nov

All Science Journal Classification (ASJC) codes

  • Obstetrics and Gynaecology
