A fish on land still flaps its fins, but the results are markedly different when it is in water. Attributed to the renowned computer scientist Alan Kay, this analogy illustrates the power of context to illuminate the questions under study.
For the first time in the field of artificial intelligence (AI), a tool called PINNACLE embodies Kay’s vision of understanding how proteins behave in their own context, as determined by the tissues and cells in which those proteins act and interact. Notably, PINNACLE overcomes some of the limitations of current AI models, which tend to analyze how proteins function and malfunction, but do so in isolation, one cell and tissue type at a time.
The development of the new AI model, described in Nature Methods, was led by researchers at Harvard Medical School.
“The natural world is interconnected, and PINNACLE helps identify these connections, which we can use to gain more detailed knowledge about proteins and safer, more effective drugs,” said study senior author Marinka Zitnik, assistant professor of biomedical informatics in the Blavatnik Institute at HMS. “It overcomes the limitations of current, context-free models and suggests the future direction for improving protein interaction analyses.”
This breakthrough, the researchers note, could advance current understanding of the role of proteins in health and disease and shed light on new drug targets for designing more precise and better tailored therapies.
PINNACLE is available free of charge to scientists worldwide.
A big step forward
Disentangling interactions between proteins and the effects of their contiguous biological neighbors is challenging. Current analysis tools play a crucial role in providing information about the structural properties and shapes of individual proteins. However, these tools are not designed to address the contextual nuances of the overall protein environment. Instead, they produce protein representations that are context-independent, meaning they lack contextual information about cell type and tissue type.
Proteins, however, play different roles depending on the cellular and tissue context in which they operate, and also depending on whether the tissue or cell is healthy or diseased. Single-protein representation models cannot identify protein functions that vary across the multitude of contexts.
When it comes to protein behavior, it’s location, location, location.
Made up of twenty different amino acids, proteins are the building blocks of cells and tissues and are essential to a range of life-sustaining biological functions, from transporting oxygen throughout the body and contracting muscles for breathing and walking to digesting food and fighting infections.
Scientists estimate that the number of proteins in the human body ranges from 20,000 to hundreds of thousands.
Proteins interact with each other but also with other molecules, such as DNA and RNA.
The interplay among proteins gives rise to intricate protein interaction networks. Located within and among cells, these networks in turn interact with other proteins and other protein networks.
The advantage of PINNACLE is its ability to recognize that protein behavior can vary across cells and tissue types. The same protein may have a different function in a healthy lung cell, a healthy kidney cell, or a diseased colon cell.
PINNACLE sheds light on how these cells and tissues influence the same proteins differently, something that is not possible with current models. Based on the specific cell type in which a protein network resides, PINNACLE can determine which proteins are participating in certain conversations and which are silent. This allows PINNACLE to better decode the dialogue between proteins and behavioral patterns and, ultimately, predict drug targets that are closely tailored to the defective proteins that give rise to disease.
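The difference between a context-free and a context-aware representation can be sketched with a toy lookup table. This is only an illustration of the idea described above, not PINNACLE's actual code or API: the protein name, cell-type labels, vector values, and the helper functions `lookup` and `active_contexts` are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Vector = List[float]

# Context-free model: exactly one vector per protein, whatever the cell type.
# (Hypothetical protein name and made-up values.)
context_free: Dict[str, Vector] = {
    "PROTEIN_A": [0.2, 0.7],
}

# Context-aware model: one vector per (protein, cell type) pair.
# A protein that is "silent" in a cell type simply has no entry for it.
contextual: Dict[Tuple[str, str], Vector] = {
    ("PROTEIN_A", "lung cell"): [0.9, 0.1],
    ("PROTEIN_A", "kidney cell"): [0.1, 0.8],
}


def lookup(protein: str, cell_type: str) -> Optional[Vector]:
    """Return the protein's vector in this cell type, or None if it has none."""
    return contextual.get((protein, cell_type))


def active_contexts(protein: str) -> List[str]:
    """List the cell types in which the protein has a representation."""
    return sorted(ct for (p, ct) in contextual if p == protein)


print(lookup("PROTEIN_A", "lung cell"))    # differs from the kidney vector
print(lookup("PROTEIN_A", "colon cell"))   # None: no entry in this context
print(active_contexts("PROTEIN_A"))
```

Keying the table on the protein and cell type together is what lets the same protein carry different representations in different contexts, while a missing key stands in for a protein that takes no part in that cell type's network.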
PINNACLE does not replace but complements single-representation models, the researchers noted, because it can analyze protein interactions in diverse cellular contexts.
Thus, PINNACLE could enable researchers to better understand and predict protein function and help elucidate vital cellular processes and disease mechanisms.
This ability can help identify “druggable” proteins that could serve as targets for particular drugs, as well as predict the effects of various drugs on different cell types. For this reason, PINNACLE could become a valuable tool for scientists and drug developers, allowing them to home in on potential targets much more efficiently.
Such optimization of the drug discovery process is desperately needed, said Zitnik, who is also an associate faculty member at Harvard University’s Kempner Institute for the Study of Natural and Artificial Intelligence.
It can take 10 to 15 years and cost up to $1 billion to bring a new drug to market. The path from discovery to drug is notoriously bumpy, and the end result is often unpredictable. In fact, nearly 90% of drug candidates do not become drugs.
Building and training PINNACLE
Using human cell data from a comprehensive multiorgan atlas, combined with multiple protein-protein interaction networks, cell-type interactions, and tissue interactions, the researchers trained PINNACLE to produce panoramic graphical representations of proteins spanning 156 cell types and 62 tissues and organs.
PINNACLE has generated nearly 395,000 multidimensional representations to date, compared to about 22,000 representations possible in current single-protein models. Each of its 156 cell types includes context-rich protein interaction networks of about 2,500 proteins.
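The figures quoted above are roughly consistent with one another, as a back-of-the-envelope check shows. The sketch assumes each cell-type network contributes about one representation per protein. The inputs are the article's rounded numbers, so the product is an estimate rather than an exact count.

```python
# Rounded figures taken from the article; none of these are exact counts.
CELL_TYPES = 156
PROTEINS_PER_NETWORK = 2_500            # "about 2,500" proteins per cell type
REPORTED_TOTAL = 395_000                # "nearly 395,000" representations
CONTEXT_FREE_REPRESENTATIONS = 22_000   # "about 22,000" in single-protein models


def estimated_contextual_representations(cell_types: int, proteins_per_network: int) -> int:
    """Estimate the total as one representation per protein per cell type."""
    return cell_types * proteins_per_network


estimate = estimated_contextual_representations(CELL_TYPES, PROTEINS_PER_NETWORK)
ratio = REPORTED_TOTAL / CONTEXT_FREE_REPRESENTATIONS

print(estimate)         # 390000, close to the reported ~395,000
print(round(ratio, 1))  # 18.0, i.e. roughly 18 times more representations
```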
The current numbers of cell types, tissues, and organs are not the upper limits of the model. The cell types assessed to date are from living human donors and cover most, but not all, cell types in the human body. In addition, many cell types have not yet been identified, while others are rare or difficult to probe, such as neurons in the brain.
To diversify PINNACLE’s cellular repertoire, Zitnik plans to use a data platform comprising tens of millions of cells sampled from across the human body.
More information:
Michelle M. Li et al., Contextual AI models for single-cell protein biology, Nature Methods (2024). DOI: 10.1038/s41592-024-02341-3
Provided by Harvard Medical School
Citation: New AI tool captures protein behavior in context (2024, August 17) retrieved August 17, 2024 from