The predisposition to certain diseases depends largely on the countless variants in our genome. So far, however, it has been difficult to determine how these variants influence the presentation of pathological traits, especially for genetic variants that occur only rarely in the population.
Researchers from the German Cancer Research Center (DKFZ), the European Molecular Biology Laboratory (EMBL) and the Technical University of Munich have introduced a deep learning-based algorithm that can predict the effects of rare genetic variants.
The paper, “Integration of variant annotations using deep set networks boosts rare variant association testing,” was published in Nature Genetics.
The method makes it possible to identify people at high risk of disease more accurately and facilitates the identification of genes involved in the development of diseases.
Each person’s genome differs from that of other people in millions of individual building blocks. These differences in the genome are called variants. Many of these variants are associated with particular biological traits and diseases. Such associations are usually identified using genome-wide association studies.
But the influence of rare variants, which occur with a frequency of only 0.1% or less in the population, is often statistically overlooked in association studies.
“Rare variants in particular often have a significantly greater influence on the presentation of a biological trait or disease,” says Brian Clarke, one of the first authors of the current study.
“They can therefore help identify genes that play a role in the development of a disease and which can then guide us towards new therapeutic approaches,” adds Eva Holtkamp, co-first author.
To better predict the effects of rare variants, teams led by Oliver Stegle and Brian Clarke from DKFZ and EMBL and Julien Gagneur from the Technical University of Munich have developed a risk assessment tool based on machine learning. “DeepRVAT” (rare variant association testing), as the researchers have named the method, is the first to use artificial intelligence (AI) in genome-wide association studies to decipher rare genetic variants.
The model was initially trained using sequence data (exome sequences) from 161,000 individuals from the UK Biobank. In addition, the researchers incorporated information about individuals’ genetically influenced biological traits as well as the genes involved in those traits.
The sequences used for training included about 13 million variants. For each of these, detailed “annotations” are available, providing quantitative information on the possible effects that the variant in question may have on cellular processes or protein structure. These annotations were also a central element of the training.
After training, DeepRVAT can predict, for each individual, which genes are impaired in their function by rare variants. To do this, the algorithm uses the individual variants and their annotations to calculate a numerical value that describes the extent to which a gene is impaired and its potential impact on health.
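The idea of turning a gene's rare variants and their annotations into a single number can be illustrated with a toy sketch. It is not the published DeepRVAT model: the weights, the three-value annotation vectors and the function names `embed_variant` and `gene_impairment_score` are hypothetical, chosen only to show an aggregation that does not depend on the order or number of variants.

```python
import math

# Hypothetical, hand-picked weights for three annotation values per variant.
# These are NOT trained DeepRVAT parameters; they only illustrate the idea.
ANNOTATION_WEIGHTS = [0.8, 1.5, -0.3]
SCALE = 2.0  # assumed scaling applied before squashing the pooled value


def embed_variant(annotations):
    """Map one variant's annotation vector to a non-negative scalar."""
    z = sum(w * a for w, a in zip(ANNOTATION_WEIGHTS, annotations))
    return max(0.0, z)  # ReLU: variants with no predicted effect contribute 0


def gene_impairment_score(variants):
    """Pool all rare variants in one gene into a single score in [0, 1].

    Max pooling makes the result independent of variant order and count.
    A gene without rare variants gets a score of 0.
    """
    pooled = max((embed_variant(v) for v in variants), default=0.0)
    return math.tanh(SCALE * pooled)


# Two hypothetical rare variants observed in one gene of one individual.
variants_in_gene = [[0.9, 0.7, 0.1], [0.1, 0.0, 0.5]]
score = gene_impairment_score(variants_in_gene)
```

In a trained model the embedding and the final mapping would be learned from data rather than fixed by hand; the sketch only shows how many annotated variants can be reduced to one value per gene and individual.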
The researchers validated DeepRVAT on genomic data from the UK Biobank. For the 34 traits tested, i.e., disease-relevant blood test results, the method found 352 associations with implicated genes, far outperforming all previously existing models. The results obtained with DeepRVAT proved to be very robust and more reproducible in independent data than results from alternative approaches.
Another important application of DeepRVAT is the assessment of genetic predisposition to certain diseases. The researchers combined DeepRVAT with a polygenic risk score based on more common genetic variants. This significantly improved the accuracy of predictions, especially for high-risk variants.
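How such a combination might look can be sketched with a simple additive logistic model. The article does not describe the actual combination procedure, so the model form, the coefficients and the name `predicted_risk` are assumptions made purely for illustration.

```python
import math


def predicted_risk(prs, rare_score, intercept=-2.0, beta_prs=0.5, beta_rare=1.2):
    """Toy disease-risk estimate from two genetic inputs.

    prs        -- polygenic risk score from common variants (standardized)
    rare_score -- gene impairment score derived from rare variants
    The intercept and coefficients are hypothetical defaults, not fitted values.
    """
    logit = intercept + beta_prs * prs + beta_rare * rare_score
    return 1.0 / (1.0 + math.exp(-logit))


# Same common-variant background, with and without an impaired gene.
risk_common_only = predicted_risk(prs=1.0, rare_score=0.0)
risk_with_rare = predicted_risk(prs=1.0, rare_score=1.0)
```

In this sketch, a carrier of a damaging rare variant receives a higher predicted risk than a person with the same polygenic score and no such variant, which is the kind of refinement the combined approach aims for.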
In addition, DeepRVAT recognized genetic associations for many diseases (including various cardiovascular diseases, types of cancer, and metabolic and neurological diseases) that had not been found with existing tests.
“DeepRVAT has the potential to significantly advance personalized medicine. Our method works independently of the trait type and can be flexibly combined with other testing methods,” says physicist and data scientist Oliver Stegle. His team now wants to test the risk assessment tool further in large-scale trials as soon as possible and bring it into application.
The scientists are already in contact with the organizers of INFORM, for example. The aim of this study is to use genomic data to identify personalized treatments for children with cancer who experience a relapse. DeepRVAT could help uncover the genetic basis of some childhood cancers.
“I am excited about the potential impact of DeepRVAT on rare disease applications. One of the major challenges in rare disease research is the lack of large-scale, systematic data. By leveraging the power of AI and the half-million exomes in the UK Biobank, we have objectively identified the genetic variants that most significantly alter gene function,” says Julien Gagneur from the Technical University of Munich.
The next step is to integrate DeepRVAT into the infrastructure of the German Human Genome-Phenome Archive (GHGA) to facilitate applications in diagnostics and basic research.
Another advantage of DeepRVAT is that the method requires much less computing power than comparable models. DeepRVAT is available as a user-friendly software package that can be used either with pre-trained risk assessment models or with researchers’ own datasets for specialized purposes.
More information:
Integration of variant annotations using deep set networks boosts rare variant association testing, Nature Genetics (2024). DOI: 10.1038/s41588-024-01919-z. www.nature.com/articles/s41588-024-01919-z
Provided by the German Cancer Research Center
Citation: How do rare genetic variants affect health? AI provides more accurate predictions (2024, September 25) retrieved September 25, 2024 from