The Bias Head
Standard fairness audits quantify whether a model is biased. They don’t say where. A mechanistic fairness audit — combining projected residual-stream decomposition, zero-shot Concept Activation Vectors, and bias-augmented TextSpan analysis — localizes demographic bias to individual attention heads in CLIP’s vision encoder.
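As an illustrative sketch of the zero-shot CAV step (the prompt pair, dimensions, and random stand-in activations here are assumptions, not the audit's actual code): a concept direction is taken from CLIP's own text encoder rather than a trained probe, and each head's residual-stream contribution is scored by its projection onto that direction.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_heads = 768, 16  # hypothetical ViT-L/14 vision-encoder width and head count

# Stand-ins: in practice these would be text embeddings of a contrast pair,
# e.g. embed("a photo of a man") and embed("a photo of a woman").
concept_pos = rng.normal(size=d_model)
concept_neg = rng.normal(size=d_model)
cav = concept_pos - concept_neg
cav /= np.linalg.norm(cav)  # zero-shot CAV: no probe training needed

# Per-head contributions to the residual stream, one row per image.
# In practice these come from hooking each head's output projection.
head_outputs = rng.normal(size=(32, n_heads, d_model))  # (batch, head, d_model)

# Heads whose outputs project strongly onto the CAV are the candidates
# for carrying the demographic signal.
scores = np.abs(head_outputs @ cav).mean(axis=0)
candidates = np.argsort(scores)[::-1][:4]
print(candidates)
```

The point of the zero-shot construction is that the concept direction costs nothing to obtain: no labeled activation dataset, just a pair of prompts through the text encoder.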
The bias concentrates in terminal-layer heads. Ablating four attention heads in the final layers of ViT-L/14 reduces global gender bias. A single head accounts for most of the reduction in the most stereotyped profession classes: ablating it drops Cramér’s V from 0.381 to 0.362 while slightly improving accuracy (+0.42%). The bias was not distributed across the network — it was concentrated in a component that could be removed without performance cost.
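Both operations above are simple to state in code. A minimal sketch, assuming a standard contingency-table definition of Cramér’s V and a concatenated multi-head attention output (the toy table and the layer/head indices are made up for illustration):

```python
import numpy as np

def cramers_v(table: np.ndarray) -> float:
    """Cramér's V for a contingency table (e.g. perceived gender x predicted profession)."""
    table = np.asarray(table, dtype=float)
    n = table.sum()
    row = table.sum(axis=1, keepdims=True)
    col = table.sum(axis=0, keepdims=True)
    expected = row @ col / n                      # independence baseline
    chi2 = ((table - expected) ** 2 / expected).sum()
    k = min(table.shape) - 1
    return float(np.sqrt(chi2 / (n * k)))

def ablate_head(attn_out: np.ndarray, head: int, n_heads: int) -> np.ndarray:
    """Zero one head's slice of the concatenated attention output
    (tokens x d_model), i.e. a zero-ablation of that head's write."""
    d_head = attn_out.shape[1] // n_heads
    out = attn_out.copy()
    out[:, head * d_head:(head + 1) * d_head] = 0.0
    return out

# Toy contingency table, not the audit's data.
table = np.array([[80, 20],
                  [30, 70]])
print(round(cramers_v(table), 3))

x = np.ones((4, 768))                 # 4 tokens, width 768
y = ablate_head(x, head=3, n_heads=16)
print(y[:, 3 * 48:4 * 48].sum())      # ablated slice is all zeros
```

Re-running `cramers_v` on predictions before and after the ablation is what produces a number like the 0.381 to 0.362 drop reported above.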
Age bias tells a different story. The same pipeline identifies candidate heads, but ablation effects are weaker and less consistent. Age bias is encoded more diffusely than gender bias in this model — no single head carries it.
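The focal-versus-diffuse distinction can be made quantitative: measure what share of the total bias reduction the single strongest head accounts for. A sketch with entirely hypothetical per-head effect sizes (the arrays below are invented to show the shape of the comparison, not measured values):

```python
import numpy as np

def concentration(effects: np.ndarray) -> float:
    """Share of total bias reduction carried by the single strongest head.
    Near 1.0 means focal encoding; near 1/n_heads means diffuse."""
    effects = np.clip(effects, 0.0, None)  # keep only bias-reducing ablations
    return float(effects.max() / effects.sum())

# Hypothetical per-head drops in Cramér's V under ablation (illustrative only).
gender = np.array([0.001, 0.002, 0.019, 0.001, 0.003])
age    = np.array([0.004, 0.005, 0.003, 0.006, 0.004])

print(round(concentration(gender), 2), round(concentration(age), 2))  # → 0.73 0.27
```

A focal bias shows a concentration score far above the uniform baseline; a diffuse one hugs it, which is exactly the gender/age contrast described here.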
The through-line: different biases have different architectures. Gender bias in CLIP is focal — concentrated in identifiable components that can be surgically removed. Age bias is distributed — spread across the network in ways that resist localized intervention. This means there is no single “debiasing” technique that works for all demographic dimensions. The intervention must match the architecture of the bias, and the architecture varies by category. A mechanistic audit is necessary not just to confirm that bias exists but to determine what kind of intervention is even possible.