Lesson: AI chatbots provide less accurate information to vulnerable users | MIT News

Large-scale linguistic models (LLMs) have been advocated as tools that can democratize access to information worldwide, providing information in a user-friendly format regardless of a person’s background or location. However, new research from MIT’s Center for Constructive Communication (CCC) suggests that these artificial intelligence systems may actually be working very poorly for the users who would benefit most from them.
A study by researchers at CCC, based at the MIT Media Lab, found that modern AI chats – including OpenAI’s GPT-4, Anthropic’s Claude 3 Opus, and Meta’s Llama 3 – sometimes give inaccurate and unrealistic answers to users with little knowledge of English, or in the United States from low English education, or in the United States. Models also refuse to answer questions at high rates from these users, and in some cases, respond in derogatory or encouraging language.
“We were inspired by the hope that LLMs help address unequal access to knowledge around the world,” said lead author Elinor Poole-Dayan SM ’25, a technical fellow at the MIT Sloan School of Management who led the research as a CCC affiliate and a master’s student in media arts and sciences. “But that vision cannot become a reality without ensuring that model biases and harmful tendencies are safely reduced for all users, regardless of language, ethnicity, or other type of population.”
A paper describing the work, “Under-Oriented LLM Disproportionately Impacts Vulnerable Users,” was presented at the AAAI Conference on Artificial Intelligence in January.
Systematic underperformance across multiple dimensions
In this study, the team examined how three LLMs responded to questions from two datasets: TruthfulQA and SciQ. TruthfulQA is designed to measure the validity of a model (by relying on common misconceptions and practical facts about the real world), while SciQ contains scientific test questions that test the accuracy of the truth. The researchers prepared short user texts for each question, three different aspects: level of education, knowledge of English, and country of origin.
In all three models and both data sets, the researchers found a significant drop in accuracy when questions came from users defined as having little formal education or non-native English speakers. The results were particularly striking for users at the intersection of these categories: those with informal education and non-native English speakers saw a significant drop in response quality.
The study also examined how country of origin affected the performance of the model. Examining users from the United States, Iran, and China with similar educational backgrounds, the researchers found that Claude 3 Opus in particular performed worse for users from Iran in both data sets.
“We see a significant drop in accuracy for a non-native and illiterate English-speaking user,” said Jad Kabbara, a research scientist at CCC and co-author of the paper. “These results show that the negative effects of modeling behavior in relation to these characteristics of users combine in related ways, thus suggesting that such models are used at a greater risk of spreading harmful behavior or incorrect information to those who cannot identify it.”
Refusal and derogatory language
Perhaps most striking was the difference in how often the models refused to answer the questions completely. For example, Claude 3 Opus refused to answer about 11 percent of the questions of the less educated, non-native English-speaking users – compared to only 3.6 percent in the control condition with no user history.
When the researchers manually analyzed these rejections, they found that Claude responded with derogatory, encouraging, or humorous language 43.7 percent of the time for less educated users, compared to less than 1 percent for more educated users. In some cases, the model imitates broken English or adopts an exaggerated dialect.
The model also refused to provide information on certain topics specifically for less educated users from Iran or Russia, including questions about nuclear power, anatomy, and historical events – although it answered similar questions correctly for other users.
“This is another indicator that suggests that the alignment process may encourage models to withhold information from certain users to avoid misleading them, even though the model clearly knows the correct answer and provides it to other users,” Kabbara said.
Echoes of human bias
The findings show documented patterns of social bias. Sociological research has shown that native English speakers tend to perceive non-native speakers as less educated, smarter, and more competent, regardless of their actual expertise. Similar biases have been documented among teachers who assess non-English speaking students.
“The value of major language models is reflected in their extraordinary adoption by individuals and the large investment that flows into technology,” said Deb Roy, professor of arts and media science, director of CCC, and co-author of the paper. “This study is a reminder of how important it is to continually examine the systematic biases that can silently creep into these systems, causing unfair harm to certain groups without any of us being fully aware.”
The implications are particularly concerning given that personalization features – such as ChatGPT Memory, which tracks user information across conversations – are becoming more common. Such features are at risk of managing already neglected groups.
“LLMs are marketed as tools that will promote equal access to knowledge and transform personalized learning,” says Poole-Dayan. “But our findings suggest that it may actually increase existing inequalities by providing false information or refusing to answer questions for certain users. People who rely too much on these tools may receive less, false, or harmful information.”


