The Inherited Prejudice

Human judges exhibit cognitive biases that affect sentencing: the virtuous victim effect (victims perceived as morally good receive more favorable treatment) and prestige-based halo effects (defendants from prestigious backgrounds receive lighter sentences). If LLMs are used for judicial decision support, do they inherit these biases?

In tests of five frontier models on modified vignettes from the cognitive bias literature, the virtuous victim effect is present and larger than in humans, and adjacent consent, a manipulation that should reduce the effect, carries no statistically significant penalty. The prestige-based halo effect is only slightly reduced relative to humans for occupation and company prestige, whereas credential-based prestige shows a large reduction. Overall, LLMs are modestly better than human judges, but the improvement is uneven across bias types.
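To make "larger than in humans" concrete, an effect like this can be expressed as a standardized mean difference between two vignette conditions. The sketch below assumes Cohen's d as the effect-size measure, which the text does not specify. The function name, variable names, and scores are hypothetical placeholders, not data from the study.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(treatment, control):
    """Standardized mean difference using a pooled sample standard deviation."""
    n1, n2 = len(treatment), len(control)
    s1, s2 = stdev(treatment), stdev(control)
    pooled = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (mean(treatment) - mean(control)) / pooled


# Hypothetical placeholder scores (NOT results from the study): one score per
# model response to a vignette with a virtuous victim vs. a neutral victim.
virtuous_victim = [8.0, 7.5, 9.0, 8.5, 7.0]
neutral_victim = [6.0, 6.5, 7.0, 5.5, 6.0]

effect = cohens_d(virtuous_victim, neutral_victim)
print(f"virtuous victim effect (Cohen's d): {effect:.2f}")
```

Running the same computation on human ratings and on model outputs for matched vignettes would put both effects on one scale, which is what a comparison such as "larger than in humans" requires.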

The through-claim: LLMs do not have cognitive biases in the mechanistic sense. They have statistical biases inherited from training data that happen to produce the same observable patterns. The distinction matters for intervention. Human cognitive biases arise from psychological heuristics (representativeness, anchoring, affect); LLM biases arise from distributional properties of the training corpus. Debiasing techniques that operate on statistical distributions (re-weighting, adversarial training) should therefore be more effective on LLMs than techniques aimed at human cognition (awareness training, structured decision aids). The bias looks the same from outside but has a different causal structure, and the causal structure determines what fixes are possible.
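As an illustration of what re-weighting means at the level of a data distribution, the sketch below assigns each training example a weight so that a prestige attribute and the outcome become statistically independent in the weighted sample. This is one common form of re-weighting, chosen here as an assumption; the text does not say which variant is intended. The function name, labels, and toy data are hypothetical.

```python
from collections import Counter


def reweighing_weights(attributes, outcomes):
    """Weight each example by P(a) * P(y) / P(a, y), estimated from counts."""
    n = len(attributes)
    attr_counts = Counter(attributes)
    outcome_counts = Counter(outcomes)
    joint_counts = Counter(zip(attributes, outcomes))
    return [
        (attr_counts[a] * outcome_counts[y]) / (n * joint_counts[(a, y)])
        for a, y in zip(attributes, outcomes)
    ]


def weighted_rate(attributes, outcomes, weights, group, target):
    """Weighted share of `target` outcomes within one attribute group."""
    total = sum(w for a, w in zip(attributes, weights) if a == group)
    hits = sum(
        w for a, y, w in zip(attributes, outcomes, weights)
        if a == group and y == target
    )
    return hits / total


# Hypothetical toy corpus: prestige correlates with lenient outcomes.
prestige = ["high", "high", "high", "low", "low", "low"]
outcome = ["lenient", "lenient", "harsh", "lenient", "harsh", "harsh"]

weights = reweighing_weights(prestige, outcome)
high_rate = weighted_rate(prestige, outcome, weights, "high", "lenient")
low_rate = weighted_rate(prestige, outcome, weights, "low", "lenient")
print(weights, high_rate, low_rate)
```

In the unweighted toy data, lenient outcomes are twice as frequent for the high-prestige group. After weighting, both groups have the same lenient rate, so a learner trained on the weighted sample no longer sees the correlation. No comparable operation exists for a human judge, which is the asymmetry the paragraph above points to.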

