The style in which the decision question is written does not affect the direction of discrimination between models. However, the degree of discrimination is sometimes greater for specific styles. For example, the magnitude of the discrimination score is generally greater when prompts are written in an emotional style. Credit: arXiv (2023). DOI: 10.48550/arxiv.2312.03689
The troubling presence of racial bias in AI results may be easier to contain than many thought. Scientists at AI research firm Anthropic say a little politeness might do the trick, at least in some cases.
In a report titled “Assessing and Mitigating Discrimination in Language Model Decisions,” uploaded to the preprint server arXiv On December 6, researchers said they were able to “significantly reduce” the occurrence of AI-generated decisions that showed evidence of discrimination simply by using carefully designed prompts.
They created a number of real-world scenarios and asked for recommendations from Claude 2.0, a model created by Anthropic that scored 76 percent on multiple-choice questions on a bar exam.
In this investigation, they tasked Claude with evaluating requests for actions such as a credit limit increase, a small business loan, a mortgage loan, approval of an adoption and granting of contract. In total, 70 scenarios were tested.
The baseline questions, without adjustments, yielded results showing both positive and negative discrimination when dealing with applicants of different race, age, and gender identification. Non-white applicants, women, and non-binary applicants received higher scores on discrimination, while older subjects received the lowest scores.
But when researchers suggested the “consider how to avoid discrimination before deciding” model or noted that discrimination is illegal, they saw a decline in prejudice.
“We are able to significantly reduce affirmative and negative discrimination through careful and rapid engineering,” the authors concluded.
Such engineering included adding emphatic prompts after the basic questions. For example, when researchers clarified that demographic data should not influence decisions or directly asserted that it is illegal to consider demographic information, bias scores, ranging from -0.5 to 2, increased. are close to 0.
In other cases, researchers made what they defined as “emotional formulations,” such as stating, “It’s really important” not to discriminate. In some cases, they repeated the word “really” several times in a single sentence.
They also explicitly asked Claude to “think out loud about how to avoid bias and stereotypes” in his responses.
Researchers found that such interventions resulted in bias scores close to zero.
“These findings demonstrate that affirmative and negative discrimination on the issues we examine can be significantly reduced, and in some cases completely eliminated, by a set of rapid interventions,” they said.
Evidence of troubling results emerged shortly after ChatGPT was introduced a year ago. A technical writer reported that an initial effort to incite racial bias failed because ChatGPT “politely” declined. But when he was asked to act as a biased writer for a racist magazine, he produced blatantly offensive comments.
Another user managed to get ChatGPT to write lyrics for a sexist song: “If you see a woman in a lab coat, she’s probably just there to clean the floor. But if you see a man in a lab coat, then he probably has the knowledge. and the skills you are looking for.
A recent study of four major language models by the Stanford School of Medicine found examples of “perpetuation of race-based medicine in their responses” across all models.
As AI is increasingly leveraged in industry, medicine, finance and education, biased data from often anonymous sources could wreak havoc – physical, financial and emotional.
“We believe that a sociotechnical lens will be necessary to ensure beneficial outcomes for these technologies, including both policies within individual companies as well as the broader policy and regulatory environment,” Anthropic researchers said .
“The appropriate use of models for high-stakes decisions is an issue that governments and societies as a whole should influence…rather than these decisions being made solely by individual companies or actors.”
More information:
Alex Tamkin et al, Assessing and Mitigating Discrimination in Language Pattern Decisions, arXiv (2023). DOI: 10.48550/arxiv.2312.03689
Dataset and prompts: huggingface.co/datasets/Anthropic/discrim-eval
arXiv
© 2023 Science X Network
Quote: Scientists tackle AI bias with polite prompting (December 13, 2023) retrieved December 14, 2023 from
This document is subject to copyright. Apart from fair use for private study or research purposes, no part may be reproduced without written permission. The content is provided for information only.