Researchers have created an artificial intelligence tool that uses sequences of life events, such as medical history, education, employment and income, to predict everything from a person’s personality to their mortality.
Built using transformer models, which power large language models (LLMs) like ChatGPT, the new tool, life2vec, is trained on a dataset covering Denmark’s entire population of 6 million people. The dataset was made available only to researchers by the Danish government.
The tool that the researchers built based on this complex set of data is able to predict the future, including the lifespan of individuals, with an accuracy that exceeds state-of-the-art models. But despite its predictive power, the team behind the research says it’s best used as a basis for future work, not as an end in itself.
“Even though we use prediction to evaluate the quality of these models, the tool should not be used to predict on real people,” says Tina Eliassi-Rad, professor of computer science and the inaugural Joseph E. Aoun Professor at Northeastern University. “It is a prediction model based on a specific data set of a specific population.”
Eliassi-Rad brought her expertise in AI ethics to the project. “These tools allow you to look at your society in a different way: the policies you have, the rules and regulations you have,” she says. “We can see it as an analysis of what is happening on the ground.”
By involving social scientists in the process of creating this tool, the team hopes it will bring a human-centered approach to AI development that does not lose sight of the humans in the middle of the massive dataset their tool was trained on.
“This model offers a much more complete reflection of the world as it is experienced by human beings than many other models,” says Sune Lehmann, an author of the paper, which was recently published in Nature Computational Science. A summary of the research is presented in the same issue of the journal.
At the heart of life2vec is the massive data set that the researchers used to train their model. The data is held by Statistics Denmark, the central authority for Danish statistics, and, although strictly regulated, can be accessed by certain members of the public, including researchers. The reason it is so tightly controlled is that it includes a detailed registry of every Danish citizen.
The data describes the many events and elements that make up a life, from health factors and education to income. The researchers used this data to create long sequences of recurring life events to feed into their model, taking the transformer model approach used to train LLMs on language and adapting it to a human life represented as a sequence of events.
“The entire story of a human life, in a way, can also be thought of as a giant long sentence about the many things that can happen to a person,” says Lehmann, professor of network and complexity sciences at DTU Compute, Technical University of Denmark and previously a postdoctoral researcher at Northeastern.
The model uses information learned from observing millions of sequences of life events to construct so-called vector representations in embedding spaces, where it begins to categorize and make connections between life events such as income, education or health factors. These embedding spaces serve as the basis for the predictions that the model ends up making.
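To make the idea concrete, here is a minimal sketch of how a life could be written as a "sentence" of event tokens and turned into a single vector. It is not the researchers' code: the event records, the helper names (`to_tokens`, `build_vocab`, `init_embeddings`, `embed_sequence`) and the token format are all hypothetical, the vectors are random rather than learned, and simple averaging stands in for the transformer that life2vec actually uses.

```python
import random

# Hypothetical life-event records: (age, event type, detail).
# Invented for illustration; not drawn from the Danish registry data.
life = [
    (22, "employment", "retail_assistant"),
    (18, "education", "high_school_diploma"),
    (23, "income", "bracket_3"),
    (30, "health", "gp_visit"),
]

def to_tokens(events):
    """Order events by age and flatten them into a 'sentence' of tokens."""
    ordered = sorted(events, key=lambda e: e[0])
    return [f"{kind}:{detail}" for _, kind, detail in ordered]

def build_vocab(sequences):
    """Map each distinct token to an integer id, in order of first appearance."""
    vocab = {}
    for seq in sequences:
        for tok in seq:
            vocab.setdefault(tok, len(vocab))
    return vocab

def init_embeddings(vocab_size, dim, seed=0):
    """Random vectors standing in for embeddings a trained model would learn."""
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(vocab_size)]

def embed_sequence(tokens, vocab, table):
    """Average the token vectors into one vector for the whole sequence.

    Averaging is a deliberate simplification; a transformer would instead
    let each event's representation depend on the events around it.
    """
    vectors = [table[vocab[t]] for t in tokens]
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

tokens = to_tokens(life)
vocab = build_vocab([tokens])
table = init_embeddings(len(vocab), dim=4)
person_vector = embed_sequence(tokens, vocab, table)

print(tokens)
print(person_vector)
```

In a trained model, the positions of these vectors would carry meaning, so that people with similar life trajectories end up close together in the embedding space. Here the numbers are arbitrary and only the shape of the pipeline is illustrated.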
One of the life events predicted by the researchers was a person’s likelihood of mortality.
“When we visualize the space the model uses to make predictions, it looks like a long cylinder that takes you from a low probability of death to a high probability of death,” says Lehmann. “Then we can show that at the end where there is a high probability of death, many of these people actually died, and at the end where there is a low probability of dying, the causes of death are something we can’t predict, like car accidents.”
The paper also illustrates how the model is able to predict individual responses to a standard personality questionnaire, particularly regarding extroversion.
Eliassi-Rad and Lehmann note that although the model makes very accurate predictions, these are based on very specific correlations, cultural and societal contexts, and the types of biases that exist in each data set.
“This type of tool is like an observatory of society, and not of all societies,” explains Eliassi-Rad. “This study was done in Denmark, and Denmark has its own culture, its own laws and its own societal rules. Whether this can be done in America is another story.”
Given all these caveats, Eliassi-Rad and Lehmann view their predictive model less as a final product and more as the start of a conversation. Lehmann says big tech companies have probably been creating these kinds of predictive algorithms for years in locked rooms. He hopes this work can begin to create a more open public understanding of how these tools work, what they are capable of, and how they should and should not be used.
More information:
Germans Savcisens et al., Using sequences of life events to predict human lives, Nature Computational Science (2023). DOI: 10.1038/s43588-023-00573-5
A transformer method that predicts human lives from sequences of life events, Nature Computational Science (2023). DOI: 10.1038/s43588-023-00586-0
Provided by Northeastern University
This story is republished courtesy of Northeastern Global News news.northeastern.edu.
Citation: New AI model can predict human lifespan, researchers say. They want to make sure it is used wisely (December 23, 2023) retrieved December 25, 2023 from