A recently published study by Stanford Law Professor Julian Nyarko and co-authors finds that racial and other biases exhibited by large language models (LLMs) can be “pruned” away. But because the biases are highly context-specific, there are limits to holding AI model developers (like OpenAI or Google) liable for harmful behavior: those companies cannot come up with a one-size-fits-all solution.
Instead, the researchers found, it would be more effective from a legal and policy perspective to hold accountable the companies that are deploying the models in a particular use case – for example, an online retailer that uses OpenAI’s models to make product recommendations.
Numerous studies over the last several years, including research from Stanford Law School and Stanford University, have demonstrated that LLMs exhibit racial biases in their responses. These biases often manifest in ways that reinforce stereotypes or produce systematically different outputs based on racial markers, such as names or dialects. In 2024, for example, Nyarko and co-authors published a widely discussed paper, “What’s in a Name? Auditing Large Language Models for Race and Gender Bias,” which analyzed how AI-generated responses differ based on implicit racial and gender cues in user queries.
In their latest paper, “Breaking Down Bias: On The Limits of Generalizable Pruning Strategies,” Nyarko and his co-authors probed the internal mechanisms of LLMs to identify and mitigate the sources of biased outputs. They established that selectively removing, or pruning, specific computational units – the model’s artificial “neurons” – reduces bias without compromising the model’s overall utility. But a bias mitigation strategy developed for financial decision-making, for example, does not necessarily work for commercial transactions or hiring decisions, they found.

Julian Nyarko | Courtesy Stanford Law School
“The real challenge here is that bias in AI models doesn’t exist in a single, fixed location – it shifts depending on context,” Nyarko said. “There are good reasons to hold developers accountable for some of the negative consequences exhibited by their models. But in order to design effective mitigation strategies, we really need to think about regulatory and legal frameworks that focus on the companies actually using these models in real-world scenarios.”

Nyarko, an expert in empirical legal studies and computational law, focuses his research at the intersection of AI, machine learning, and legal accountability. He is also an associate director and senior fellow at the Stanford Institute for Human-Centered AI (HAI).
The paper’s co-authors are Stanford Law research fellows Sibo Ma and Alejandro Salinas, along with Princeton computer science professor Peter Henderson.
A novel approach
According to Nyarko, his latest study takes a novel approach to identifying and mitigating racial bias in LLMs. The researchers began by dissecting the internal structure of LLMs, which are essentially vast networks of artificial neurons, loosely analogous to neurons in the brain. These artificial neurons process information and contribute to the generation of responses, including, at times, biased responses.
To mitigate these biases, the team used a method known as model pruning, which involves selectively deactivating or removing specific neurons identified as contributing to biased behavior. To decide which neurons to prune, the researchers analyzed which neurons activate when the input prompt involves a racial minority but remain inactive otherwise. They then applied their pruning strategy across a range of contexts – including financial decision-making, commercial transactions, and hiring decisions – to see how well it reduced bias in each specific setting. This allowed them to pinpoint and remove neurons that consistently contributed to biased responses across different situations.
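To make the idea concrete, the following Python sketch shows activation-contrast pruning on a toy PyTorch network. It illustrates the general technique rather than the paper’s exact procedure; the two prompt groups, the small network, and the number of pruned neurons are hypothetical stand-ins.

import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy feed-forward block standing in for one MLP layer of an LLM.
hidden = 64
block = nn.Sequential(nn.Linear(32, hidden), nn.ReLU(), nn.Linear(hidden, 32))

# Hypothetical embedded prompts: identical queries that differ only in a demographic cue.
group_a = torch.randn(128, 32)                   # e.g., majority-coded names
group_b = group_a + 0.1 * torch.randn(128, 32)   # same prompts, minority-coded names

def hidden_activations(x):
    # Capture the post-ReLU activations of the middle layer via a forward hook.
    captured = {}
    def hook(module, inputs, output):
        captured["h"] = output.detach()
    handle = block[1].register_forward_hook(hook)
    block(x)
    handle.remove()
    return captured["h"]

# Score each neuron by how differently it fires, on average, across the two groups.
gap = (hidden_activations(group_b).mean(0) - hidden_activations(group_a).mean(0)).abs()

# "Prune" the k most group-sensitive neurons by zeroing their weights and biases.
k = 5
to_prune = gap.topk(k).indices
with torch.no_grad():
    block[0].weight[to_prune, :] = 0.0   # incoming weights of the pruned neurons
    block[0].bias[to_prune] = 0.0
    block[2].weight[:, to_prune] = 0.0   # their outgoing contribution to the next layer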
In addition to neuron pruning, they also experimented with attention-head pruning. Attention heads are part of the mechanism that helps LLMs focus on specific parts of the input when generating a response. By selectively pruning these attention heads, the team assessed whether this method could also effectively reduce bias without significantly disrupting the model’s overall performance.
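A similarly minimal sketch of attention-head pruning follows, again an assumed illustration rather than the authors’ implementation: it masks one head in a toy PyTorch attention layer by zeroing the slice of the output projection that carries that head’s contribution, with the head index standing in for one scored as group-sensitive.

import torch
import torch.nn as nn

embed_dim, num_heads = 64, 8
head_dim = embed_dim // num_heads
attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

head_to_prune = 3  # hypothetical index of a head scored as group-sensitive
start, end = head_to_prune * head_dim, (head_to_prune + 1) * head_dim
with torch.no_grad():
    # out_proj mixes the concatenated head outputs; zeroing this column slice
    # removes the pruned head's contribution to the layer's output.
    attn.out_proj.weight[:, start:end] = 0.0

x = torch.randn(2, 10, embed_dim)   # (batch, sequence length, embedding)
out, _ = attn(x, x, x)              # the layer still runs, minus one head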
They found that neuron-level pruning was more effective than attention-head pruning at reducing bias while maintaining the model’s utility. However, the effectiveness of these pruning techniques varied significantly across contexts.
Legal and policy implications
The study’s conclusions resonate with ongoing legal debates about AI governance. Regulatory proposals, such as the European Union’s AI Act, take a risk-based approach that places additional compliance obligations on companies using AI for high-risk applications. Similarly, recent U.S. lawsuits, such as Mobley v. Workday, raise questions about whether AI service providers should face the same legal scrutiny as the businesses using their tools to make hiring decisions.
The research underscores the need for policymakers to clarify responsibility for AI-related harms, Nyarko said. If bias is inherently context-dependent, as the study suggests, then imposing broad liability on AI developers will not be very effective. Instead, regulators might consider requiring companies that deploy AI models to conduct rigorous bias audits, maintain transparency about their AI usage, and ensure compliance with anti-discrimination laws.
For more information
This story was originally published by Stanford Law School.