It may one day be possible to use large language models (LLMs) to automatically read clinical notes in medical records and reliably and efficiently extract relevant information to support patient care or research. But a recent study from Columbia University's Mailman School of Public Health, which used ChatGPT-4 to read emergency department medical notes and determine whether injured scooter and bicycle riders were wearing helmets, finds that LLMs cannot yet do this reliably. The results are published in JAMA Network Open.
In a study of 54,569 emergency department visits among patients injured while riding a bicycle, scooter, or other micromobility device from 2019 to 2022, the LLM struggled to replicate the results of a text-string search approach to extracting helmet status from clinical notes.
The LLM performed well only when the prompt included the entire set of text strings used in the text-string search approach. The LLM also struggled to reproduce its work across trials on each of five successive days, reproducing its hallucinations more consistently than its accurate work. It particularly struggled with negated sentences, such as reading "no helmet" and reporting that the patient was wearing a helmet.
Electronic medical records contain large amounts of relevant medical data in the form of written clinical notes, a type of unstructured data. Efficient methods for reading and extracting information from these notes would be extremely useful for research.
Currently, the information contained in these clinical notes can be extracted using simple text searches based on string matching, or with more sophisticated artificial intelligence (AI) approaches such as natural language processing. The hope was that a newer LLM such as ChatGPT-4 could extract the information more quickly and reliably.
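To make the comparison concrete, the traditional baseline can be pictured as a simple string search over the note text with a check for nearby negation cues. The sketch below is illustrative only: the search terms and negation phrases are assumptions, not the study's actual keyword lists, and real clinical-NLP pipelines are considerably more careful about negation scope.

```python
import re

# Hypothetical term lists -- NOT the study's actual search strings.
HELMET_TERMS = ["helmet", "hlmt"]
NEGATION_CUES = ["no", "not wearing", "without", "w/o"]

def helmet_status(note: str) -> str:
    """Naive text-string search: report 'helmet', 'no helmet', or 'unknown'."""
    text = note.lower()
    for term in HELMET_TERMS:
        match = re.search(re.escape(term), text)
        if match:
            # Look a short window back for a negation cue; this is the
            # step the article says ChatGPT-4 often got wrong.
            window = text[max(0, match.start() - 20):match.start()]
            if any(cue in window for cue in NEGATION_CUES):
                return "no helmet"
            return "helmet"
    return "unknown"
```

Even this toy version shows why negation matters: "without helmet" and "wearing a helmet" contain the same keyword but imply opposite facts, and the study found the LLM frequently inverted such cases.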
"While we see potential efficiency gains in using generative AI LLMs for information extraction tasks, issues with reliability and hallucinations currently limit their utility," said Andrew Rundle, DrPH, professor of epidemiology at Columbia Mailman School and senior author.
"When we used very detailed prompts that included all the text strings related to helmets, ChatGPT-4 was able to extract accurate data from the clinical notes on some days. But the time it took to define and test all the text that needed to be included in the prompt, and ChatGPT-4's inability to reproduce its work day after day, tells us that ChatGPT-4 was not yet up to the task."
Using publicly available 2019–2022 data from the U.S. Consumer Product Safety Commission's National Electronic Injury Surveillance System, a sample of 96 U.S. hospitals, Rundle and colleagues analyzed emergency department records of patients injured in e-bike, bicycle, hoverboard, and e-scooter crashes. They compared the results of ChatGPT-4's chart analyses to data generated using a traditional text-string search, and for 400 charts, they compared ChatGPT-4's analyses to their own reading of the clinical notes in the charts.
This research builds on their work investigating how to prevent injuries among micromobility users (i.e., cyclists, e-bike riders, scooter riders). “Helmet use is a key predictor of injury severity, but in most medical records and emergency department incident reports, information about helmet use is buried in the clinical notes written by the physician or EMS responder. There is a significant need for research to reliably and efficiently access this information,” said Kathryn Burford, lead author of the paper and a postdoctoral researcher in the Department of Epidemiology at the Mailman School.
“Our study looked at the potential of an LLM to extract information from clinical notes, a rich source of information for healthcare professionals and researchers,” Rundle said. “But at the time we were using ChatGPT-4, it couldn’t provide us with reliable data.”
Co-authors are Nicole G. Itzkowitz of the Columbia Mailman School of Public Health; Ashley G. Ortega of the Columbia Population Research Center; and Julien O. Teitler of the Columbia School of Social Work.
More information:
Kathryn G. Burford et al., Using Generative AI to Identify Helmet Status in Patients with Micromobility Injuries from Unstructured Clinical Notes, JAMA Network Open (2024). DOI: 10.1001/jamanetworkopen.2024.25981
Provided by Columbia University Mailman School of Public Health
Citation: Generative AI cannot yet reliably read and extract information from clinical notes in medical records, study finds (2024, August 19) retrieved August 19, 2024
This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without written permission. The content is provided for informational purposes only.