The 3 billion base pairs that make up the human genome – the corresponding puzzle pieces, adenine pairing with thymine and cytosine pairing with guanine – are not just the body’s instruction manual. Rearrangements in the order of these base pairs are markers of the origins of diseases and our evolutionary history. They can be simple, when a handful of base pairs change places. They can also be complex, such as when a stretch of tens of thousands of base pairs flips and several sections are missing.
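To make that last case concrete, here is a minimal Python sketch, not taken from the study, of a complex rearrangement applied to a toy sequence. The function names, the 10-letter sequence and the coordinates are all assumptions chosen for illustration. The flip is modeled as a reverse complement, the usual convention for an inversion in double-stranded DNA.

```python
# Illustrative toy model only; real structural variants span hundreds
# to tens of thousands of base pairs and are not detected this way.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def invert(seq: str, start: int, end: int) -> str:
    """Replace seq[start:end] with its reverse complement (an inversion)."""
    flipped = seq[start:end].translate(COMPLEMENT)[::-1]
    return seq[:start] + flipped + seq[end:]


def delete(seq: str, start: int, end: int) -> str:
    """Remove seq[start:end] (a deletion)."""
    return seq[:start] + seq[end:]


reference = "ATGCCGTAAC"                    # hypothetical reference sequence
inverted = invert(reference, 2, 7)          # flip a five-base stretch
complex_variant = delete(inverted, 7, 9)    # then drop two neighboring bases

print(reference, inverted, complex_variant)
```

A single substitution would change one letter and leave the length alone; here one event both reorders and shortens the sequence, which is what makes such variants harder to reconstruct from short sequencing reads.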
Current state-of-the-art techniques for reading the genome, called whole-genome sequencing, are suitable for finding simple variations, but they fall short when it comes to finding complex structural variations. Now, researchers in a new study led by Stanford Medicine have developed an artificial intelligence-based method capable of identifying complex structural variants from whole-genome sequencing data.
The study, published September 30 in Cell, created a catalog of complex structural variants using more than 4,000 human genomes from around the world. These variants often appear in genes governing the brain and have been found in regions of the genome linked to human evolution.
The researchers also showed that some of the complex structural variants affected how instructions in brain-related genes were read in the brains of people diagnosed with schizophrenia or bipolar disorder.
“This work represents a major advance in understanding the genetic and molecular bases of psychiatric disorders and suggests that brain-related diseases and, in general, disorders that have a strong genetic component should be the subject of complex structural variant analysis,” said study co-senior author Alexander Urban, Ph.D., associate professor of psychiatry and behavioral sciences and of genetics.
“Any whole-genome sequence should be analyzed via this new algorithm; this will allow us to uncover important answers in currently overlooked data.”
Urban and Wing Wong, Ph.D., the Stephen R. Pierce Family Goldman Sachs Professor in Science and Human Health and professor of statistics and of biomedical data science, were co-senior authors.
The genome in wide angle
Almost all variations discovered so far in the human genome are simple. But the results of the new algorithm showed that each genome also has between 80 and 100 complex structural variations.
“Looking only for simple variations is like proofreading a book manuscript and looking exclusively for typos that change individual letters,” Urban said. “You overlook words that are garbled, duplicated, or in the wrong order. You may not even notice that half a chapter is missing. All of these things need to be caught before the manuscript is sent to the printer.”
The Automated Reconstruction of Complex Structural Variants algorithm, ARC-SV for short, detects all kinds of DNA rearrangements and has a 95% accuracy rate in finding complex structural variants. The algorithm uses an AI model and was trained on dozens of complete human genomes, called pangenomes, from people of diverse ancestry.
The algorithm found more than 8,000 distinct complex structural variants, varying in length between 200 and 100,000 base pairs. Many variants were located in regions of the genome that regulate brain development and function. The researchers took a closer look at whether these variants were associated with psychiatric illness.
Genetics and psychiatric illnesses
The ability to easily find and study complex structural variations could help explain which genome alterations lead to inherited psychiatric diseases. The study looked at two of these illnesses, schizophrenia and bipolar disorder. Genome-wide association studies, called GWAS, have identified many locations in the genome associated with the risk of being diagnosed with a psychiatric illness. But the GWAS results do not explain genetic risk in enough detail to act on.
“We have made incredible progress in identifying the genetic components of psychiatric illnesses, but something important is still missing,” Urban said. “The GWAS results tell us where in the genome certain DNA changes linked to a disorder are located. But the GWAS information is somewhat vague. It’s like knowing there are errors somewhere on pages 118, 237 and 304 of a book. But we do not know what kind of errors they are or which words they affect.”
Urban explained that while the GWAS results might prompt researchers to look for something wrong on page 118, knowing the sequence of complex structural variants is like having a yellow highlighter over the actual 10-word sentence on that page which contains a scrambled word and another duplicated word.
“That’s exactly it,” he said.
The researchers tested the results of the ARC-SV algorithm. They used whole genome sequences combined with gene expression measurements from more than 100 postmortem brain tissue samples from healthy individuals and people diagnosed with schizophrenia or bipolar disorder to study complex structural variations.
The variants tended to lie near, or to overlap with, GWAS locations known to be associated with the risk of developing schizophrenia or bipolar disorder. The complex structural variants also affected how neighboring genes were expressed, changing how the instructions contained in the DNA were read, which suggests that the variants could contribute to the disease.
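As a rough illustration of what "near or overlapping" means in genomic coordinates, the following Python sketch checks whether a variant falls within a fixed window of a GWAS locus. It is not the study's method: the `Interval` type, the 10,000-base-pair window and every coordinate are hypothetical values chosen only to show the comparison.

```python
from typing import NamedTuple, Optional


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def gap(a: Interval, b: Interval) -> Optional[int]:
    """Base pairs separating a and b; 0 if they overlap or touch, None if on different chromosomes."""
    if a.chrom != b.chrom:
        return None
    return max(0, max(a.start, b.start) - min(a.end, b.end))


def near_or_overlapping(variant: Interval, locus: Interval, window: int = 10_000) -> bool:
    """True if the variant overlaps the locus or lies within `window` bp of it."""
    d = gap(variant, locus)
    return d is not None and d <= window


# Hypothetical coordinates, for illustration only.
variant = Interval("chr1", 1_000, 5_000)
loci = [
    Interval("chr1", 4_000, 4_001),    # inside the variant
    Interval("chr1", 12_000, 12_001),  # 7,000 bp away
    Interval("chr1", 20_000, 20_001),  # 15,000 bp away
    Interval("chr2", 4_000, 4_001),    # different chromosome
]
hits = [near_or_overlapping(variant, locus) for locus in loci]
print(hits)
```

The choice of window matters: a wider one links more variants to known risk locations but weakens the claim that any given variant is the relevant one.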
“Identifying and studying complex structural variants will allow us to better understand how DNA can vary and provide molecular clues that will map the trajectory of biological function leading to disease and disease treatment,” said Bo Zhou, Ph.D., professor of psychiatry and behavioral sciences and first author of the study.
More information:
Bo Zhou et al, Detection and analysis of complex structural variations in human genomes across populations and in the brains of donors with psychiatric disorders, Cell (2024). DOI: 10.1016/j.cell.2024.09.014
Provided by Stanford University
Citation: Complex genomic variants are linked to psychiatric illnesses, study finds (October 21, 2024) retrieved October 21, 2024 from