A new study shows that it is possible to use machine learning and statistics to solve a problem that has long hampered the field of metabolomics: wide variations in data collected from different sites.
“We don’t always know the source of the variation,” said Daniel Raftery, professor of anesthesiology and pain medicine at the University of Washington School of Medicine in Seattle. “This could be because the subjects are different with different genetics, diet and environmental exposure. Or it could be because of how the samples were collected and processed.”
Raftery and his fellow researchers wanted to see if machine learning – a form of artificial intelligence that uses computer algorithms to process large volumes of historical data and identify data patterns – could reduce this variation between data from different sites without hiding important differences.
“We wanted to bring these incompatible data sets together so that results from different studies could be compared or combined for further analysis,” Raftery said.
He led the project with Dabao Zhang and Min Zhang, formerly at Purdue University and now professors of epidemiology and biostatistics at the University of California, Irvine Public Health. Danni Liu, a Ph.D. student at Purdue, was the lead author of the paper, which appears in the Proceedings of the National Academy of Sciences.
Raftery is a researcher in the UW Mitochondria and Metabolism Center, based at UW Medicine South Lake Union in Seattle.
The term metabolomics refers to metabolism, a word that describes the chemical reactions our cells perform to maintain life. These include reactions that break down food to harvest energy and obtain the raw materials cells need for growth and repair, reactions that assemble the cellular components necessary for life, and reactions that dismantle damaged or unnecessary components so that they can be recycled, thrown away or used as fuel.
The small chemicals produced by these metabolic processes are called metabolites. Metabolite levels reveal what chemical reactions are occurring in a cell, tissue, organ, or organism at any given time and how these reactions may change over time.
Metabolomics is the study of metabolites and the processes that produce them.
This information helps medical scientists better understand not only how cells maintain normal function, but also what might go wrong when people get sick. This knowledge could lead to new ways to diagnose, prevent and treat diseases, Raftery said.
In the new study, researchers built machine learning models to identify factors causing differences between data sets. The models accounted for demographic differences in the study populations, such as age and gender, and used information from other metabolites to explain the observed differences.
The researchers found that their approach reduced the variation between data sets by more than 95% without hiding significant differences, such as those that occur naturally between men and women.
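As a rough illustration of the general idea, and not the researchers' actual models, the sketch below uses ordinary least squares on synthetic data. It assumes two hypothetical sites whose participants differ in age and in the level of a second metabolite, fits one pooled model of a target metabolite on age, sex and that second metabolite, and compares the gap between sites before and after adjustment. All names, effect sizes and sample sizes are invented for the example.

```python
import numpy as np

def simulate_site(rng, n, age_mean, other_mean):
    """Synthetic cohort (hypothetical): a target metabolite driven by
    age, sex and one other metabolite, plus measurement noise."""
    age = rng.normal(age_mean, 8.0, n)
    sex = rng.integers(0, 2, n).astype(float)   # 0/1 indicator
    other = rng.normal(other_mean, 1.0, n)      # a related metabolite
    target = 0.05 * age + 0.8 * sex + 1.5 * other + rng.normal(0.0, 0.3, n)
    return np.column_stack([age, sex, other]), target

def fit_linear(X, y):
    """Least-squares fit with an intercept; returns [intercept, slopes...]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    return coef[0] + X @ coef[1:]

rng = np.random.default_rng(0)
# Two sites that differ only in who was enrolled (assumed for this sketch).
X_a, y_a = simulate_site(rng, 500, age_mean=40.0, other_mean=0.0)
X_b, y_b = simulate_site(rng, 500, age_mean=60.0, other_mean=1.0)

# Fit one pooled model, then look at what it leaves unexplained per site.
coef = fit_linear(np.vstack([X_a, X_b]), np.concatenate([y_a, y_b]))
res_a = y_a - predict(coef, X_a)
res_b = y_b - predict(coef, X_b)

raw_gap = abs(y_a.mean() - y_b.mean())            # site gap before adjustment
adjusted_gap = abs(res_a.mean() - res_b.mean())   # site gap after adjustment
sex_effect = coef[2]                              # sex difference, estimated explicitly

print(f"raw gap: {raw_gap:.2f}, adjusted gap: {adjusted_gap:.2f}, "
      f"estimated sex effect: {sex_effect:.2f}")
```

In this toy setup the gap between sites shrinks sharply because it was built to come entirely from the modeled factors, while the difference between sexes remains visible as a fitted coefficient rather than being averaged away. Real cohorts also differ in sample collection and processing, which a sketch like this does not capture.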
“We have shown that our approach has the potential to reduce unwanted variances observed in metabolomics data while retaining the metabolomics signals of interest,” Raftery said.
The group plans to expand its studies with the goal of better understanding normal metabolism and identifying biomarkers of abnormal metabolism that may be a sign of disease.
More information:
Danni Liu et al., Modeling homeostatic levels of blood metabolites reduces sample heterogeneity between cohorts, Proceedings of the National Academy of Sciences (2024). DOI: 10.1073/pnas.2307430121. doi.org/10.1073/pnas.2307430121
Provided by the University of Washington School of Medicine
Citation: Machine learning promises to accelerate metabolism research (February 12, 2024) retrieved February 12, 2024 from