AI-powered chatbots could potentially expand access to mental health support, but high-profile stumbles have cast doubt on their reliability in high-stakes scenarios. Credit: Sadjad/Figma and Alex Ouyang/MIT Jameel Clinic
With the cover of anonymity and the company of strangers, the digital world's appeal as a place to seek mental health support is growing. That appeal is fueled by the fact that more than 150 million people in the United States live in areas designated by the federal government as having a shortage of mental health professionals.
“I really need your help because I’m too scared to talk to a therapist and I can’t get through to one anyway.”
“Am I overreacting, being hurt by my husband making fun of me to his friends?”
“Could strangers influence my life and decide my future for me?”
The quotes above are real user posts on Reddit, a social media news and discussion website where users can share content or ask for advice in smaller, interest-based forums called “subreddits.”
Using a dataset of 12,513 posts with 70,429 responses from 26 mental health-related subreddits, researchers from MIT, New York University (NYU), and the University of California at Los Angeles (UCLA) designed a framework to help assess the equity and overall quality of mental health support chatbots based on large language models (LLMs) like GPT-4.
To do this, the researchers asked two licensed clinical psychologists to evaluate 50 randomly sampled Reddit posts seeking mental health support, pairing each post with either a Redditor’s actual response or a response generated by GPT-4. Without knowing which responses were real and which were AI-generated, the psychologists were asked to rate the level of empathy in each response.
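To make that setup concrete, the sketch below shows one way such a blinded comparison could be organized in code. It is a minimal illustration under assumed conditions: the function name, field names, and data layout are hypothetical, not the study's actual pipeline.

```python
import random

def build_blinded_batch(posts, human_replies, gpt4_replies, seed=0):
    """Pair each post with either its human reply or a GPT-4 reply,
    hiding the source from raters. Illustrative sketch only."""
    rng = random.Random(seed)
    batch = []
    for post_id, post_text in posts.items():
        # Randomly choose which response type this post will carry.
        source = rng.choice(["human", "gpt4"])
        reply = human_replies[post_id] if source == "human" else gpt4_replies[post_id]
        batch.append({
            "post": post_text,
            "response": reply,
            "hidden_source": source,  # kept for later analysis, never shown to raters
        })
    rng.shuffle(batch)  # remove any ordering cues
    return batch

# Raters would see only "post" and "response" and assign an empathy score;
# "hidden_source" is revealed only when the scores are analyzed.
```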
Their work was recently presented at the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP 2024). It is available on the arXiv preprint server.
Mental health support chatbots have long been explored as a way to improve access to mental health support, but powerful LLMs like OpenAI’s ChatGPT are transforming human-AI interaction, with AI-generated responses becoming harder to distinguish from those of real humans.
Despite this remarkable progress, the unintended consequences of AI-provided mental health support have drawn attention to its potentially life-threatening risks: in March of last year, a Belgian man died by suicide following an exchange with ELIZA, a chatbot designed to imitate a psychotherapist and powered by an LLM called GPT-J. A month later, the National Eating Disorders Association suspended its chatbot Tessa after it began giving dieting advice to patients with eating disorders.
Saadia Gabriel, a recent postdoctoral fellow at MIT who is now an assistant professor at UCLA and first author of the paper, admitted that she was initially very skeptical that mental health support chatbots would actually work.
Gabriel conducted this research during her postdoctoral fellowship at MIT in the Healthy Machine Learning Group, under the direction of Marzyeh Ghassemi, an MIT associate professor in the Department of Electrical Engineering and Computer Science and the MIT Institute for Medical Engineering and Science, who is affiliated with the MIT Abdul Latif Jameel Clinic for Machine Learning in Health and the Computer Science and Artificial Intelligence Laboratory.
What Gabriel and the team of researchers found was that GPT-4 responses were not only more empathetic overall, but they were 48% more effective at encouraging positive behavior changes than human responses.
However, in an assessment of bias, the researchers found that the empathy levels of GPT-4’s responses were reduced for Black (2% to 15% lower) and Asian (5% to 17% lower) posters compared with white posters or posters whose race was unknown.
To assess bias in GPT-4 responses and human responses, researchers included different types of posts with explicit demographic leakage (e.g., gender, race) and implicit demographic leakage.
An explicit demographic leak would look like: “I am a 32-year-old Black woman.”
An implicit demographic leak, by contrast, would look like: “Being a 32-year-old girl wearing my natural hair,” in which keywords signal certain demographics to GPT-4.
With the exception of Black female posters, GPT-4’s responses were found to be less affected by explicit and implicit demographic leaking than those of human responders, who tended to be more empathetic when responding to posts containing implicit demographic cues.
“The structure of the input you give [the LLM] and some information about the context, such as whether you want [the LLM] to act in the style of a clinician or the style of a social media post, or whether you want it to use the patient’s demographic attributes, has a major impact on the response you get,” says Gabriel.
The paper suggests that explicitly instructing LLMs to use demographic attributes can effectively mitigate bias, as this was the only condition in which the researchers did not observe a significant difference in empathy across demographic groups.
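As a rough illustration of what such an instruction might look like in practice, the sketch below sends a post to GPT-4 through the OpenAI Python client with a system message that explicitly asks the model to account for the poster’s demographic attributes. The prompt wording and the example post are assumptions made for illustration, not the researchers’ actual instruction text.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical example post with an implicit demographic cue, echoing the
# article's example; not drawn from the study's dataset.
post = "Being a 32-year-old girl wearing my natural hair, I feel ignored at work."

# Illustrative instruction only: a guess at the kind of demographic-aware
# prompt the article describes, not the paper's actual wording.
system_msg = (
    "You are responding to a social media post seeking mental health support. "
    "Explicitly take the poster's stated or implied demographic attributes into "
    "account, and respond with the same level of empathy you would offer anyone."
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": system_msg},
        {"role": "user", "content": post},
    ],
)
print(response.choices[0].message.content)
```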
Gabriel hopes this work can help ensure a more comprehensive and thoughtful evaluation of LLMs deployed in clinical settings across demographic subgroups.
“LLMs are already being used to provide patient-facing support and have been deployed in medical settings, in many cases to automate inefficient human systems,” says Ghassemi. “Here, we demonstrated that while state-of-the-art LLMs are generally less affected by demographic leaking than humans in peer-to-peer mental health support, they do not provide equitable mental health responses across inferred patient subgroups … we have a lot of opportunity to improve models so they provide better support when used.”
More information: Saadia Gabriel et al, Can AI Relate: Testing Large Language Model Response for Mental Health Support, arXiv (2024). DOI: 10.48550/arXiv.2405.12021
Provided by the Massachusetts Institute of Technology
This story is republished courtesy of MIT News (web.mit.edu/newsoffice/), a popular site that covers news in MIT research, innovation and education.
This document is subject to copyright. Except for fair use for private study or research purposes, no part may be reproduced without written permission. The content is provided for informational purposes only.