Artificial intelligence (AI) has virtually limitless applications in healthcare, from automatically composing patient messages in MyChart to optimizing organ transplantation and improving the accuracy of tumor removal. Despite their potential benefits for doctors and patients, these tools have been met with skepticism due to concerns about patient privacy, the possibility of bias, and device accuracy.
In response to the rapid evolution of the use and approval of AI medical devices in healthcare, a multi-institutional team of researchers from UNC School of Medicine, Duke University, Ally Bank, Oxford University, Columbia University, and the University of Miami set out to build public trust and assess how exactly AI and algorithmic technologies are approved for use in patient care.
Together, Sammy Chouffani El Fassi, MD candidate at the UNC School of Medicine and investigator at the Duke Heart Center, and Gail E. Henderson, PhD, professor in the UNC Department of Social Medicine, conducted an in-depth analysis of clinical validation data from more than 500 medical AI devices, revealing that about half of the tools cleared by the U.S. Food and Drug Administration (FDA) lacked reported clinical validation data.
Their findings were published in Nature Medicine.
“While AI device manufacturers boast about the credibility of their technology with FDA clearance, clearance does not mean that the devices have been adequately evaluated for clinical effectiveness using real-world patient data,” said Chouffani El Fassi, first author of the paper.
“With these results, we hope to encourage the FDA and industry to strengthen the credibility of device authorization by conducting clinical validation studies on these technologies and making the results of these studies publicly available.”
Since 2016, the average number of FDA AI medical device approvals per year has increased from 2 to 69, indicating significant growth in the commercialization of AI medical technologies. The majority of approved AI medical technologies are used to help physicians diagnose abnormalities in radiological imaging, analyze pathology slides, dose medications, and predict disease progression.
Artificial intelligence learns to perform human-like tasks using combinations of algorithms. The technology is fed large amounts of data along with sets of rules to follow, so that it can “learn” to detect patterns and relationships.
From there, device makers must ensure that the technology does more than simply memorize the data used to train it, and that it can produce accurate results on novel data.
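The memorization-versus-generalization point above is the core idea behind held-out validation. A minimal sketch, using a toy dataset and a hypothetical 1-nearest-neighbour model (not any device's actual algorithm): perfect accuracy on training data only shows memorization, while accuracy on held-out data tests generalization.

```python
import random

random.seed(0)

# Toy dataset (hypothetical): the label is 1 when the feature exceeds 0.5.
data = [(x, int(x > 0.5)) for x in [random.random() for _ in range(200)]]
random.shuffle(data)
train, test = data[:150], data[150:]

def predict(x, reference):
    # 1-nearest-neighbour: copy the label of the closest reference point.
    return min(reference, key=lambda pair: abs(pair[0] - x))[1]

# Scoring on the training set only shows that the model has memorized it...
train_acc = sum(predict(x, train) == y for x, y in train) / len(train)
# ...while held-out data tests whether the model generalizes.
test_acc = sum(predict(x, train) == y for x, y in test) / len(test)
print(train_acc, test_acc)
```

Training accuracy here is trivially perfect (each point is its own nearest neighbour); only the held-out score says anything about performance on unseen cases.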
Regulation in a context of rapid proliferation of AI medical devices
Following the rapid proliferation of these devices and of applications to the FDA, Chouffani El Fassi, Henderson, and their colleagues wondered to what extent the authorized devices were clinically effective and safe. Their team analyzed all submissions available in the FDA’s official database, “Artificial Intelligence and Machine Learning (AI/ML)-Based Medical Devices.”
“Many of the devices that came out after 2016 were new products, or were perhaps similar to a product that was already on the market,” Henderson said. “Using these hundreds of devices in this database, we wanted to determine what it actually means for an AI medical device to be cleared by the FDA.”
Of the 521 device authorizations, 144 were classified as “retrospectively validated,” 148 as “prospectively validated,” and 22 were validated using randomized controlled trials. Notably, 226 of the 521 FDA-cleared medical devices, or approximately 43%, had no published clinical validation data.
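The headline figure follows directly from the counts reported, as a quick arithmetic check shows:

```python
# Reproducing the headline arithmetic from the counts reported above.
total_cleared = 521
no_published_validation = 226

share = no_published_validation / total_cleared * 100
print(f"{share:.1f}% of cleared devices lacked published validation data")
# 226 / 521 ≈ 43.4%, i.e. "approximately 43%"
```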
Some devices used “phantom images,” or computer-generated images not derived from a real patient, which did not technically meet clinical validation requirements.
Additionally, the researchers found that the latest draft guidance, released by the FDA in September 2023, does not clearly distinguish between different types of clinical validation studies in its recommendations to manufacturers.
Types of clinical validation and new standard
In the field of clinical validation, there are three different methods by which researchers and device manufacturers validate the accuracy of their technologies: retrospective validation, prospective validation, and a subset of prospective validation called randomized controlled trials.
Retrospective validation involves feeding the AI model with image data from the past, such as chest X-rays of patients before the COVID-19 pandemic.
Prospective validation, in contrast, typically produces stronger scientific evidence because the AI device is validated against real-time patient data. This method is more realistic, the researchers say, because it allows the AI to account for data variables that did not exist when it was trained, such as chest X-rays of patients affected by the virus during the COVID-19 pandemic.
Randomized controlled trials are considered the gold standard for clinical validation. This type of prospective study uses randomization to control for confounding variables that could differentiate the experimental and control groups, thereby isolating the effect of the device.
For example, researchers could evaluate a device’s performance by randomly assigning patients to have a radiologist (control group) or an AI (experimental group) read their scans.
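The randomization step in a trial like the one described can be sketched as follows: each patient is assigned by chance to have their scan read by a radiologist (control arm) or by the AI device (experimental arm), so that the two arms differ only in the reader. The patient identifiers below are hypothetical.

```python
import random

random.seed(42)  # fixed seed so the assignment is reproducible

# Hypothetical patient roster.
patients = [f"patient_{i:03d}" for i in range(1, 101)]

# Randomly assign each patient to the control or experimental arm.
assignments = {p: random.choice(["radiologist", "ai_reader"]) for p in patients}

control = [p for p, arm in assignments.items() if arm == "radiologist"]
experimental = [p for p, arm in assignments.items() if arm == "ai_reader"]
print(len(control), len(experimental))  # roughly balanced arms
```

Because assignment is random, patient characteristics that could confound the comparison are, on average, distributed evenly across the two arms.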
Because retrospective studies, prospective studies, and randomized controlled trials produce different levels of scientific evidence, the study’s authors recommend that the FDA clearly distinguish between these types of clinical validation studies in its recommendations to device manufacturers.
In their Nature Medicine publication, Chouffani El Fassi, Henderson, and colleagues present definitions for clinical validation methods that can be used as a standard in the field of medical AI.
“We have shared our findings with FDA directors who oversee medical device regulation, and we hope our work will inform their regulatory decision-making,” Chouffani El Fassi said.
“We also hope that our publication will inspire researchers and universities around the world to conduct clinical validation studies on medical AI to improve the safety and effectiveness of these technologies. We look forward to the positive impact this project will have on patient care on a large scale.”
Algorithms can save lives
Chouffani El Fassi is currently working with UNC cardiothoracic surgeons Aurelie Merlo and Benjamin Haithcock and the UNC Health leadership team to implement an algorithm into their electronic medical records system that automates the organ donor evaluation and referral process.
Unlike the rapid production of artificial intelligence devices, medicine lacks basic algorithms, such as software to diagnose patients from simple laboratory values in electronic medical records. Chouffani El Fassi explains that this is because implementation is often costly and requires interdisciplinary teams that have expertise in both medicine and computer science.
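As a loose illustration of the kind of “basic algorithm” described above, a rule-based screen over laboratory values in an electronic record might look like the sketch below. The field names and thresholds are invented for illustration only; they are not clinical criteria, and this is not the UNC team’s actual referral logic.

```python
def flag_for_review(labs: dict) -> bool:
    """Return True if a (hypothetical) record should be routed for manual review.

    `labs` maps lab-test names to numeric results; missing tests default to 0.
    """
    rules = [
        labs.get("creatinine_mg_dl", 0) > 1.5,  # example threshold, not clinical guidance
        labs.get("bilirubin_mg_dl", 0) > 2.0,   # example threshold, not clinical guidance
    ]
    # Flag the record if any rule fires.
    return any(rules)

print(flag_for_review({"creatinine_mg_dl": 2.1, "bilirubin_mg_dl": 0.8}))  # True
```

Even a simple screen like this, run automatically inside an electronic medical records system, can surface candidates that a manual workflow might miss, which is the point Chouffani El Fassi makes about the referral process.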
Despite the challenge, UNC Health is on a mission to improve the organ transplant space.
“Finding a potential organ donor, evaluating their organs, and then engaging the organ procurement organization to coordinate an organ transplant is a long and complicated process,” Chouffani El Fassi said.
“If this very basic computer algorithm works, we could optimize the organ donation process. Just one additional donor means many lives saved. With such a low threshold for success, we hope to give more people a second chance at life.”
More information:
Not all AI tools for healthcare that have regulatory approval are clinically validated, Nature Medicine (2024). DOI: 10.1038/s41591-024-03203-3
Provided by University of North Carolina Health Care
Citation: Nearly Half of FDA-Approved AI Medical Devices Aren’t Trained on Real Patient Data, Study Finds (2024, August 26), retrieved August 26, 2024