The Preserved Bias

Differential privacy protects individuals by adding noise. But noise is symmetric — it obscures signal and bias equally. When a differentially private synthesizer faithfully reproduces the structure of a dataset, it reproduces the spurious correlations along with the real ones. The bias in the original data survives the privacy mechanism intact.
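This is easy to see numerically. The sketch below (illustrative, not from any particular paper) adds Laplace noise to a hypothetical contingency table of sensitive group versus outcome. The group-level disparity survives the noise essentially unchanged, because the noise is small relative to the population-level pattern:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2x2 contingency table: sensitive group x outcome,
# with a strong spurious dependency baked in.
true_counts = np.array([[900.0, 100.0],
                        [100.0, 900.0]])

# Laplace mechanism: symmetric noise with scale sensitivity/epsilon.
epsilon = 1.0
noisy_counts = true_counts + rng.laplace(scale=1.0 / epsilon,
                                         size=true_counts.shape)

def disparity(table):
    """Absolute difference in positive-outcome rate across groups."""
    rates = table[:, 1] / table.sum(axis=1)
    return abs(rates[0] - rates[1])

print(disparity(true_counts))   # 0.8: strong group/outcome dependency
print(disparity(noisy_counts))  # still ~0.8: the bias passes through
```

The noise protects any individual row of the underlying data, but the 0.8 disparity is a property of the population, and it comes out the other side nearly intact.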

This is not a bug in differential privacy. Privacy is about what the data reveals about individuals; fairness is about what the data implies about groups. A perfectly private synthetic dataset can be perfectly biased, because the bias is a population-level pattern that noise addition doesn’t address. The synthesizer preserves all statistical dependencies — including the ones you don’t want.

PrivCI attacks this by enforcing conditional independence constraints during the measurement phase. Rather than synthesizing from the full dependency graph and hoping the bias washes out, it builds the graph with specified independence requirements baked in. A CI-aware greedy algorithm integrates feasibility checks into the graph construction under the exponential mechanism — ensuring that the synthetic data never learns the dependency between sensitive attributes and outcomes that the original data happened to contain.
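PrivCI's exact algorithm is not reproduced here, but the shape of the idea can be sketched. Assume candidate marginals (cliques) are scored for utility and one is privately selected via the exponential mechanism; the CI-aware step filters out any clique that would connect the sensitive attribute to the outcome before selection ever happens. All names and scores below are hypothetical:

```python
import math
import random

random.seed(0)

SENSITIVE, OUTCOME = "S", "Y"  # assumed attribute names

def feasible(clique):
    """Feasibility check: reject cliques that would link S and Y."""
    return not (SENSITIVE in clique and OUTCOME in clique)

def exponential_mechanism(candidates, scores, epsilon, sensitivity=1.0):
    """Sample a candidate with probability proportional to exp(eps*score/2)."""
    weights = [math.exp(epsilon * s / (2 * sensitivity)) for s in scores]
    r = random.random() * sum(weights)
    for cand, w in zip(candidates, weights):
        r -= w
        if r <= 0:
            return cand
    return candidates[-1]

# Candidate marginals with made-up utility scores (e.g. model-fit gain).
candidates = [("S", "Y"), ("S", "X1"), ("X1", "Y"), ("X1", "X2")]
scores = {("S", "Y"): 5.0, ("S", "X1"): 3.0,
          ("X1", "Y"): 4.0, ("X1", "X2"): 2.0}

# CI-aware step: infeasible cliques are removed BEFORE the private
# selection, so the (S, Y) dependency can never enter the graph,
# no matter how high its utility score is.
allowed = [c for c in candidates if feasible(c)]
chosen = exponential_mechanism(allowed,
                               [scores[c] for c in allowed],
                               epsilon=1.0)
print(chosen)  # always a feasible clique, never ("S", "Y")
```

The key design point is where the filter sits: feasibility is enforced inside the selection loop, not as a post-hoc repair, so the privacy budget is spent only on dependencies the constraints permit.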

The result: improved fidelity and predictive accuracy alongside enforced fairness. Not a tradeoff but a correction — removing the spurious dependency frees capacity for real signal.

The through-claim: privacy mechanisms that faithfully reproduce data structure will faithfully reproduce data bias, because bias IS structure. The fix is not more privacy or less fidelity but selective structure — choosing which dependencies to preserve and which to sever. The synthesizer must be told what not to learn. This is the opposite of the usual machine learning goal (learn everything), and it requires specifying the unwanted pattern explicitly. You cannot remove bias you haven’t named.

