Overview of participant selection and RF model performance. a, From UCSF EHRs and the UCSF Memory and Aging Center (MAC) database, clinical and participant information was extracted, filtered and prepared for time points before time indexing. All extracted clinical features were one-hot encoded and used to train random forest (RF) models to predict future risk of AD diagnosis. The models were evaluated on a 30% held-out evaluation set to calculate AUROC/AUPRC, and were interpreted on the basis of feature importance and with a heterogeneous knowledge network (SPOKE). The main findings were then validated in external databases. b, Filtering for a consistent set of individuals with AD and controls from the UCSF EHR for model training and testing. The filtered participant cohorts are presented in Table 1 and were split with 30% held out for testing. c, Bootstrapped performance of RF models on the held-out evaluation set (n = 300 bootstrap iterations of 1,000 participants; AD prevalence in the held-out set = 0.003). Bootstrapped AUROC performance for models trained and tested on the female and male strata is also shown. The box shows the quartiles (25th, 50th and 75th percentiles), the whiskers extend up to 1.5 times the interquartile range and the remaining points are outliers. Credit: Nature Aging (2024). DOI: 10.1038/s43587-024-00573-8
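Panel c summarizes model performance as AUROC over 300 bootstrap draws of 1,000 held-out participants. The sketch below illustrates that kind of evaluation, assuming the RF model's predicted risk scores and the true labels for the held-out set are already available. The function names (`auroc`, `bootstrap_auroc`), the synthetic toy data, and the choice to redraw any resample that lacks either class are illustrative assumptions, not the study's code. AUROC is computed with the rank-based (Mann-Whitney) formula rather than a library call.

```python
import random
import statistics


def auroc(labels, scores):
    """Rank-based AUROC: the probability that a random positive outscores
    a random negative, with tied scores counted as half a win."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the block of tied scores and give each the average 1-based rank.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    # Mann-Whitney U for the positives, normalized to [0, 1].
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_auroc(labels, scores, n_iter=300, sample_size=1000, seed=0):
    """Resample participants with replacement and compute AUROC per draw.
    Assumption: draws containing only one class are discarded and redrawn."""
    if sum(labels) == 0 or sum(labels) == len(labels):
        raise ValueError("input must contain both classes")
    rng = random.Random(seed)
    idx = range(len(labels))
    results = []
    while len(results) < n_iter:
        draw = rng.choices(idx, k=sample_size)
        y = [labels[i] for i in draw]
        if 0 < sum(y) < sample_size:
            results.append(auroc(y, [scores[i] for i in draw]))
    return results


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical toy held-out set at ~0.3% prevalence; scores are synthetic.
    labels = [1 if rng.random() < 0.003 else 0 for _ in range(20000)]
    scores = [rng.gauss(1.0 if y else 0.0, 1.0) for y in labels]
    aucs = bootstrap_auroc(labels, scores)
    q1, median, q3 = statistics.quantiles(aucs, n=4)
    print(f"AUROC median {median:.3f} (IQR {q1:.3f} to {q3:.3f})")
```

At a prevalence of 0.003, a 1,000-person draw holds only about three cases on average, so some draws contain no positives at all; redrawing them is one simple way to keep every AUROC defined, while stratified resampling would be another.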
UC San Francisco scientists have found a way to predict Alzheimer’s disease up to seven years before symptoms appear by analyzing patient records using machine learning.
The conditions that most influenced the prediction of Alzheimer’s disease were high cholesterol and, in women, osteoporosis, a disease that weakens bones.
The work demonstrates the promise of using artificial intelligence (AI) to spot patterns in clinical data that can then be used to search large genetic databases for what is driving that risk. The researchers hope that one day this approach will speed the diagnosis and treatment of Alzheimer’s and other complex diseases.
“This is a first step towards using AI on routine clinical data, not only to identify risk as early as possible, but also to understand the biology behind it,” said the study’s lead author, Alice Tang, an MD/PhD student in the Sirota Lab at UCSF. “The power of this AI approach lies in identifying risks based on combinations of diseases.”
The results appear in Nature Aging.
Clinical data and predictive power
Scientists have long sought to discover the biological factors and early predictors of Alzheimer’s disease, a progressive and ultimately fatal form of dementia that destroys memory. Alzheimer’s disease affects an estimated 6.7 million Americans, nearly two-thirds of whom are women. The risk of developing the disease increases with age, and women tend to live longer than men, but this does not fully explain why more women than men have it.
Researchers used the UCSF clinical database of more than 5 million patients to look for co-occurring conditions in patients diagnosed with Alzheimer’s disease at the UCSF Memory and Aging Center, compared them with individuals without Alzheimer’s, and found that they could identify who would go on to develop the disease up to seven years in advance with a predictive power of 72%.
Several factors, including hypertension, hypercholesterolemia, and vitamin D deficiency, were predictive in both men and women. Erectile dysfunction and an enlarged prostate were also predictive for men. But for women, osteoporosis was a particularly strong predictor.
This does not mean that everyone with osteoporosis, a bone disease common in older women, will develop Alzheimer’s disease.
“It is the combination of diseases that allows our model to predict the onset of Alzheimer’s disease,” Tang said. “Our finding that osteoporosis is a predictor for women highlights the biological interplay between bone health and dementia risk.”
A precision medicine approach
To understand the biology underlying the model’s predictive power, the researchers turned to public molecular databases and a specialized UCSF tool called SPOKE (Scalable Precision Medicine Oriented Knowledge Engine), which was developed in the laboratory of Sergio Baranzini, PhD, professor of neurology and a member of the UCSF Weill Institute for Neurosciences.
SPOKE is essentially a database of databases that researchers can use to identify patterns and potential molecular targets for therapy. It picked up the well-known link between Alzheimer’s disease and hypercholesterolemia through a variant of the apolipoprotein E gene, APOE4. Combined with genetic databases, it also identified a link between osteoporosis and Alzheimer’s disease in women through a variant of a lesser-known gene, MS4A6A.
Ultimately, researchers hope this approach can be used with other difficult-to-diagnose diseases like lupus and endometriosis.
“This is a great example of how we can leverage patient data with machine learning to predict which patients are most likely to develop Alzheimer’s disease, and also to understand why that is so,” said the study’s senior author, Marina Sirota, PhD, associate professor at UCSF’s Bakar Computational Health Sciences Institute.
More information:
Alice S. Tang et al, Leveraging electronic health records and knowledge networks for Alzheimer’s disease prediction and sex-specific biological insights, Nature Aging (2024). DOI: 10.1038/s43587-024-00573-8
Provided by University of California, San Francisco
Citation: How AI can help detect early risk factors for Alzheimer’s disease (February 21, 2024) retrieved February 21, 2024 from