A new statistical tool developed by University of Chicago researchers improves the ability to find disease-causing genetic variants. The tool, described in a new article published on January 26, 2024 in Nature Genetics, combines data from genome-wide association studies (GWAS) and gene expression predictions to limit the number of false positives and more accurately identify the causal genes and variants for a disease.
GWAS is a commonly used approach to identify genes associated with a range of human traits, including the most common diseases. Researchers compare genome sequences from a large group of people with a specific disease, for example, with another set of sequences from healthy individuals. The differences identified in the disease group could indicate genetic variants that increase the risk of this disease and would warrant further study.
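As a rough illustration of the case-control comparison described above, the sketch below tests a single variant with a standard 2x2 chi-square test on allele counts. The counts are invented for illustration and the function name `allele_chi_square` is hypothetical. Real GWAS pipelines typically fit regression models with covariates rather than this bare test.

```python
import math


def allele_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Chi-square test (1 degree of freedom) on a 2x2 table of allele counts.

    Returns (statistic, p_value). Hypothetical helper for illustration only.
    """
    a, b, c, d = case_alt, case_ref, control_alt, control_ref
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("every row and column of the table needs a nonzero total")
    stat = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


if __name__ == "__main__":
    # Invented counts: the alternate allele is more common in cases (30%)
    # than in controls (20%).
    stat, p = allele_chi_square(300, 700, 200, 800)
    print(f"chi-square = {stat:.2f}, p = {p:.2e}")
```

A small p-value here only says the variant is associated with disease status. It says nothing about whether that variant is the one driving the risk.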
However, most human diseases are not caused by a single genetic variation. Instead, they are the result of a complex interaction of multiple genes, environmental factors, and a multitude of other variables. As a result, GWAS often identifies many variants in many disease-associated regions of the genome.
A limitation of GWAS, however, is that it identifies only association, not causation. In a typical genomic region, many variants are highly correlated with each other, due to a phenomenon called linkage disequilibrium. This is because DNA is passed from one generation to the next in entire blocks, not individual genes, so variants located close together tend to be correlated.
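The effect of linkage disequilibrium on association signals can be sketched with a small simulation. In the toy model below, only one variant affects the trait, while a nearby "tag" variant is usually inherited together with it. All parameters (allele frequency, effect size, the probability that the tag variant travels with the causal one) are assumptions chosen for illustration, not values from the study.

```python
import random


def pearson(x, y):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def simulate_ld(n=5000, allele_freq=0.3, ld_copy_prob=0.95, effect=0.5, seed=1):
    """Simulate a causal variant, a correlated non-causal variant, and a trait.

    Hypothetical toy model: the tag variant copies the causal variant with
    probability ld_copy_prob and is otherwise drawn independently.
    """
    rng = random.Random(seed)
    causal, tag, trait = [], [], []
    for _ in range(n):
        c = 1 if rng.random() < allele_freq else 0
        if rng.random() < ld_copy_prob:
            t = c  # inherited together with the causal variant
        else:
            t = 1 if rng.random() < allele_freq else 0
        y = effect * c + rng.gauss(0, 1)  # only the causal variant shifts the trait
        causal.append(c)
        tag.append(t)
        trait.append(y)
    return causal, tag, trait


if __name__ == "__main__":
    causal, tag, trait = simulate_ld()
    print("correlation between the two variants:", round(pearson(causal, tag), 3))
    print("causal variant vs trait:", round(pearson(causal, trait), 3))
    print("tag variant vs trait:   ", round(pearson(tag, trait), 3))
```

Both variants end up correlated with the trait to a similar degree, even though the tag variant has no effect of its own in the simulation.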
“You can have many genetic variants in a block that are all correlated with disease risk, but you don’t know which one is actually the causal variant,” said Xin He, Ph.D., associate professor of human genetics and senior author of the new study. “This is the fundamental challenge of GWAS, which is how to move from association to causation.”
To make the problem even more difficult, most genetic variants are found in non-coding regions of the genome, making their effects difficult to interpret. A common strategy to address these challenges is to use gene expression levels. Expression quantitative trait loci, or eQTLs, are genetic variants associated with gene expression.
The rationale for using eQTL data is that if a disease-associated variant is an eQTL of gene X, then X is likely the link between the variant and the disease. The problem with this reasoning, however, is that nearby variants and eQTLs of other genes can be correlated with the eQTL of gene X while directly affecting the disease themselves, leading to a false positive for gene X.
Many methods have been developed to nominate risk genes from GWAS using eQTL data, but they all suffer from this fundamental problem of confounding by nearby associations. In fact, existing methods can generate false-positive genes more than 50% of the time.
In the new study, Professor He and Matthew Stephens, Ph.D., Ralph W. Gerard Professor and chair of the Department of Statistics and professor of human genetics, developed a new method called causal transcriptome-wide association studies, or cTWAS, which uses advanced statistical techniques to reduce false positive rates. Instead of focusing on a single gene at a time, the new cTWAS model considers multiple genes and variants together. Using a Bayesian multiple regression model, it can eliminate confounding genes and variants.
“If you look one at a time, you’ll get false positives, but if you look at all nearby genes and variants together, you’re much more likely to find the causative gene,” He said.
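The difference between testing one gene at a time and modeling nearby genes together can be sketched with a toy regression. The example below uses ordinary least squares with two predictors as a simplified stand-in. It is not the Bayesian multiple regression model used by cTWAS, and the gene names, correlation and effect size are all assumptions made for illustration.

```python
import random


def _centered(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def _dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def simulate_genes(n=5000, rho=0.9, effect=0.5, seed=7):
    """Predicted expression of two nearby genes and a trait (toy data).

    Gene A affects the trait; gene B is only correlated with gene A.
    """
    rng = random.Random(seed)
    gene_a, gene_b, trait = [], [], []
    for _ in range(n):
        a = rng.gauss(0, 1)
        b = rho * a + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1)
        y = effect * a + rng.gauss(0, 1)  # gene B has no direct effect
        gene_a.append(a)
        gene_b.append(b)
        trait.append(y)
    return gene_a, gene_b, trait


def marginal_slope(x, y):
    """Slope from regressing y on a single predictor (one gene at a time)."""
    xc, yc = _centered(x), _centered(y)
    return _dot(xc, yc) / _dot(xc, xc)


def joint_slopes(x1, x2, y):
    """Slopes from regressing y on both predictors together (closed-form OLS)."""
    a, b, t = _centered(x1), _centered(x2), _centered(y)
    s11, s22, s12 = _dot(a, a), _dot(b, b), _dot(a, b)
    s1y, s2y = _dot(a, t), _dot(b, t)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


if __name__ == "__main__":
    gene_a, gene_b, trait = simulate_genes()
    print("gene B alone:   ", round(marginal_slope(gene_b, trait), 3))
    slope_a, slope_b = joint_slopes(gene_a, gene_b, trait)
    print("gene A, jointly:", round(slope_a, 3))
    print("gene B, jointly:", round(slope_b, 3))
```

Tested alone, gene B looks strongly associated with the trait. Once gene A is in the same model, gene B's estimated effect shrinks toward zero, which is the intuition behind considering nearby genes and variants together.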
The article demonstrates the usefulness of this new technique by studying the genetics of LDL cholesterol levels. As an example, existing eQTL methods nominated a gene involved in DNA repair, but the new cTWAS approach pointed to a different variant in the gene targeted by statins, drugs commonly used to treat hypercholesterolemia. In total, cTWAS identified 35 putative causal genes for LDL, more than half of which have not been previously reported. These results point to novel biological pathways and potential treatment targets for LDL.
The cTWAS software is now available for download from the He Lab website. He hopes to continue working on it to expand its capabilities to incorporate other types of “omics” data, such as splicing and epigenetics, and to use eQTLs from multiple tissue types.
“The software will allow people to perform analyses linking genetic variations to phenotypes. That’s really the main challenge facing the whole field,” He said. “We now have a much better tool to make those connections.”
Other authors of the study, “Adjusting for genetic confounders in transcriptome-wide association studies leads to reliable detection of causal genes,” include Siming Zhao, Wesley Crouse, Sheng Qian and Kaixuan Luo of the University of Chicago.
More information:
Adjusting for genetic confounders in transcriptome-wide association studies improves discovery of risk genes of complex traits, Nature Genetics (2024). DOI: 10.1038/s41588-023-01648-9
Provided by the University of Chicago