The Oversized Estimate

Standard variance estimators under two-way clustering, a workhorse of empirical economics, systematically underestimate the true variance when means are heterogeneous across clusters. Tests with a nominal 5% level reject a true null hypothesis 8% or 10% of the time, or more. The inference is oversized: it looks more precise than it is.
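
Oversizing can be measured directly by simulation. You generate many datasets in which the null hypothesis is true, run the nominal 5% test on each, and count how often it rejects. The sketch below does this in Python. It assumes a normal critical value and a toy i.i.d. data-generating process. Both are hypothetical choices made for illustration, not the setting the post analyzes. Any standard-error function can be passed as `se_fn`, including a two-way clustered one. The helper names `empirical_size`, `iid_normal` and `iid_se` are hypothetical.

```python
import numpy as np

# Two-sided 5% critical value of the standard normal (assumed large-sample test).
Z_975 = 1.959963984540054


def empirical_size(simulate, se_fn, true_mean, reps=2000, seed=0):
    """Share of simulated datasets in which a nominal 5% test of
    H0: mean == true_mean rejects, although H0 holds by construction."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        y = simulate(rng)
        t = (y.mean() - true_mean) / se_fn(y)
        rejections += int(abs(t) > Z_975)
    return rejections / reps


def iid_normal(rng, n=200):
    """Hypothetical toy DGP: i.i.d. N(0, 1) draws, so the true mean is 0."""
    return rng.normal(loc=0.0, scale=1.0, size=n)


def iid_se(y):
    """Correct standard error of the mean for i.i.d. data."""
    return y.std(ddof=1) / np.sqrt(y.size)


# Correct standard error: the rejection rate should sit near the nominal 5%.
size_ok = empirical_size(iid_normal, iid_se, true_mean=0.0)
# Standard error understated by 20%: the same datasets now over-reject.
size_small = empirical_size(iid_normal, lambda y: 0.8 * iid_se(y), true_mean=0.0)
print(size_ok, size_small)
```

With the correct standard error, the rejection rate should land close to 5%. Shrinking the standard error by 20% raises it to about 2·(1 − Φ(1.96·0.8)) ≈ 12% in large samples. A too-small variance estimate produces exactly this kind of failure.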

The mechanism is specific. Suppose cluster means differ, for example because group A has a higher average outcome than group B. Then residuals measured around each cluster's own mean are smaller than residuals measured around the common mean, because the cluster-specific mean absorbs variation that the estimator does not account for. The variance estimator works from these artificially small residuals and reports a variance that is too low. As a result, standard errors shrink, confidence intervals narrow, and p-values fall below thresholds they should not cross.
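
The "absorbed variation" point rests on the textbook within/between decomposition. The total sum of squares around the grand mean equals the sum of squares around the cluster means plus a between-cluster term. The sketch below uses a small made-up dataset and shows that residuals around cluster means are smaller by exactly that between-cluster amount. It illustrates the decomposition only and does not compute any clustered variance estimator. The function name `decompose_ss` is hypothetical.

```python
import numpy as np


def decompose_ss(y, groups):
    """Split the total sum of squares of y into within-cluster and
    between-cluster parts: total == within + between."""
    y = np.asarray(y, dtype=float)
    codes = np.unique(groups, return_inverse=True)[1].ravel()  # cluster index per observation
    counts = np.bincount(codes)
    group_means = np.bincount(codes, weights=y) / counts
    total = float(np.sum((y - y.mean()) ** 2))              # residuals around the grand mean
    within = float(np.sum((y - group_means[codes]) ** 2))   # residuals around cluster means
    between = float(np.sum(counts * (group_means - y.mean()) ** 2))  # variation the cluster means absorb
    return total, within, between


# Made-up example: cluster B has a much higher mean than cluster A.
total, within, between = decompose_ss([1, 2, 3, 10, 11, 12], ["A", "A", "A", "B", "B", "B"])
print(total, within, between)  # 125.5 4.0 121.5
```

Almost all of the variation here sits in the between-cluster term. Residuals around the cluster means therefore capture only 4.0 of the 125.5 total.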

This isn't a small-sample problem that disappears with more data. The bias is structural: it exists whenever means vary across clusters and the estimator does not fully account for that variation. More observations per cluster just make each cluster mean more precisely wrong. The mean is pinned down better, but the variation it absorbs is still missing from the variance estimate.

The fix is a conservative estimator with proven asymptotic validity. It does not try to estimate the bias precisely and remove it. Instead, it adds an explicit correction term that is guaranteed to be non-negative, so the corrected estimate is never smaller than the uncorrected one. Under homogeneous means the correction adds zero, and it becomes increasingly important as heterogeneity grows.
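
The post does not spell out the estimator, so the sketch below uses a stand-in to show the structure "standard estimate plus a non-negative term." For the variance of a sample mean, it computes the Cameron-Gelbach-Miller-style combination V_G + V_H − V_GH, without small-sample factors. As a hypothetical conservative variant, it then adds back V_GH. Because V_GH is a sum of squares, it can never be negative. This stand-in shares the "never smaller" property described above. Unlike the estimator the post describes, however, its added term is not zero under homogeneous means. The function names are hypothetical.

```python
import numpy as np


def _cluster_sum_sq(scores, codes):
    """Sum over clusters of (sum of scores within the cluster) squared."""
    sums = np.bincount(codes, weights=scores)
    return float(np.sum(sums ** 2))


def two_way_variances(y, g, h):
    """Variance of the sample mean of y under two-way clustering on (g, h).

    Returns (cgm, conservative):
      cgm          = V_G + V_H - V_GH  (CGM-style, no small-sample factors)
      conservative = cgm + V_GH        (hypothetical stand-in: adds a term >= 0)
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    e = y - y.mean()  # scores for the mean are the demeaned outcomes
    gi = np.unique(g, return_inverse=True)[1].ravel()
    hi = np.unique(h, return_inverse=True)[1].ravel()
    cell = gi * (hi.max() + 1) + hi  # one integer code per (g, h) intersection
    v_g = _cluster_sum_sq(e, gi) / n**2
    v_h = _cluster_sum_sq(e, hi) / n**2
    v_gh = _cluster_sum_sq(e, cell) / n**2
    cgm = v_g + v_h - v_gh
    return cgm, cgm + v_gh


# Tiny 2x2 example: every observation is its own (g, h) cell.
cgm, conservative = two_way_variances([1, 2, 3, 4], g=[0, 0, 1, 1], h=[0, 1, 0, 1])
print(cgm, conservative)  # 0.3125 0.625
```

Because the added term is a sum of squares, the variant is always at least as large as the CGM-style value and can never be negative. The CGM-style value itself carries no such guarantee.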

The central claim: one of the most widely used variance estimators in applied economics has a systematic directional bias that makes results look more significant than they are. The bias isn't random, because it always points toward rejection. Every empirical paper that uses two-way clustering with heterogeneous group means may therefore have oversized tests. The correction exists. Whether it changes published conclusions is an empirical question that nobody has yet asked at scale.
