The Oversized Estimate
Standard variance estimators under two-way clustering, the bread and butter of empirical economics, systematically underestimate the true variance when means are heterogeneous across clusters. Tests with a nominal 5% level reject a true null 8%, 10%, or more of the time. The inference is oversized: it looks precise but isn’t.
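For concreteness, here is a minimal sketch of the kind of estimator in question, assuming "standard" means the familiar inclusion-exclusion form for two-way clustering (variance clustered on the first dimension, plus variance clustered on the second, minus variance clustered on their intersection), applied to a simple sample mean. The function names are illustrative, no small-sample adjustment is applied, and nothing here is taken from a particular library.

```python
import numpy as np

def cluster_sum_squares(resid, ids):
    """Sum over clusters of the squared within-cluster residual total."""
    _, inv = np.unique(np.asarray(ids), return_inverse=True)
    sums = np.bincount(inv, weights=resid)
    return float(np.sum(sums ** 2))

def twoway_variance_of_mean(y, g, h):
    """Two-way clustered variance estimate for the sample mean of y.

    Illustrative sketch: V = V_g + V_h - V_(g and h), using residuals
    around the grand mean and no finite-sample correction.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    resid = y - y.mean()
    # Label each observation by its (g, h) cell to form intersection clusters.
    gh = np.array([f"{a}|{b}" for a, b in zip(g, h)])
    v_g = cluster_sum_squares(resid, g)
    v_h = cluster_sum_squares(resid, h)
    v_gh = cluster_sum_squares(resid, gh)
    return (v_g + v_h - v_gh) / n ** 2

# Tiny 2x2 example: one observation per (g, h) cell.
y = [1.0, 2.0, 3.0, 4.0]
g = [0, 0, 1, 1]
h = [0, 1, 0, 1]
print(twoway_variance_of_mean(y, g, h))  # 0.3125
```

Because the intersection term is subtracted, an estimate of this form is not guaranteed to be positive in a finite sample.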
The mechanism is specific. When cluster means differ (group A has a higher average outcome than group B), the residuals within each cluster are smaller than they should be — the cluster-specific mean absorbs variation that the estimator doesn’t account for. The variance estimator, computing from these artificially small residuals, reports a variance that is too low. Standard errors shrink. Confidence intervals narrow. P-values drop below thresholds they shouldn’t cross.
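The arithmetic behind "the cluster-specific mean absorbs variation" can be seen in a toy one-way example. The sketch below is only an illustration of that arithmetic, with made-up numbers and a hypothetical helper name; it is not the two-way estimator itself. It splits the total sum of squares into a within-cluster part (residuals around each cluster's own mean) and a between-cluster part (how far cluster means sit from the grand mean).

```python
import numpy as np

def sum_of_squares_split(y, cluster):
    """Return (total, within, between) sums of squares for one clustering."""
    y = np.asarray(y, dtype=float)
    _, inv = np.unique(np.asarray(cluster), return_inverse=True)
    counts = np.bincount(inv)
    means = np.bincount(inv, weights=y) / counts
    grand = y.mean()
    total = float(np.sum((y - grand) ** 2))             # around the grand mean
    within = float(np.sum((y - means[inv]) ** 2))       # around cluster means
    between = float(np.sum(counts * (means - grand) ** 2))
    return total, within, between

# Two clusters whose means differ by 10 (made-up data).
y = [1, 2, 3, 11, 12, 13]
cluster = ["A", "A", "A", "B", "B", "B"]
total, within, between = sum_of_squares_split(y, cluster)
print(total, within, between)  # 154.0 4.0 150.0
```

Here almost all of the variation lives between the clusters. Any calculation that sees only the within-cluster residuals works with 4 out of 154 units of squared variation, which is the sense in which heterogeneous means make residuals look artificially small.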
This isn’t a small-sample problem that disappears with more data. The bias is structural: it exists whenever means vary across clusters and the estimator doesn’t fully account for that variation. More observations per cluster only make each cluster mean more precisely wrong.
The fix is a conservative estimator with proven asymptotic validity. It doesn’t try to estimate the bias and subtract it. Instead, it adds an explicit correction term that is guaranteed to be non-negative, ensuring the estimate is never too small. Under homogeneous means the correction adds zero, so nothing is lost in that case; as heterogeneity grows, the correction becomes increasingly important.
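In symbols, the structure described above can be written as follows. The notation is introduced here purely for illustration and is an assumption, not the source's: \hat{V}_{std} stands for the standard two-way estimate, \hat{\Delta} for the correction term, and \hat{V}_{cons} for the conservative estimate. The form of \hat{\Delta} is not specified in this text and is left abstract.

```latex
\hat{V}_{\mathrm{cons}} = \hat{V}_{\mathrm{std}} + \hat{\Delta},
\qquad \hat{\Delta} \ge 0,
\qquad \hat{\Delta} = 0 \ \text{under homogeneous means}
\quad\Longrightarrow\quad
\hat{V}_{\mathrm{cons}} \ge \hat{V}_{\mathrm{std}} .
```

Read this way, the conservative estimate can only widen standard errors relative to the standard one, never narrow them.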
The through-claim: the most widely used variance estimator in applied economics has a systematic directional bias that makes results look more significant than they are. The bias isn’t random — it always points toward rejection. Every empirical paper using two-way clustering with heterogeneous group means has potentially oversized tests. The correction exists. Whether it changes published conclusions is an empirical question nobody has yet asked at scale.