The Factor Sweet Spot
Conditional diffusion models for equity return prediction generate the complete distribution of future returns, conditioned on firm characteristics — size, momentum, value, profitability, and dozens more. The question is how many characteristics to condition on.
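To fix notation, conditional generation of this kind can be sketched as a DDPM-style reverse process whose noise predictor takes the characteristic vector as an extra input. The following is a minimal NumPy sketch under assumed details, not the actual model: the linear `denoiser` is a stand-in for a trained network eps_theta(x_t, t, z), and the factor count, step count, and noise schedule are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: condition return generation on K firm characteristics.
K = 5                      # factor dimensionality (size, momentum, value, ...)
T = 50                     # diffusion steps
betas = np.linspace(1e-4, 0.05, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def denoiser(x_t, t, z, w):
    """Stand-in for a trained noise predictor eps_theta(x_t, t, z).

    Here just a linear function of the state and the conditioning vector z;
    t is unused by this toy stand-in.
    """
    return w[0] * x_t + w[1:].dot(z)

def sample_return(z, w, n=1000):
    """DDPM-style reverse process: draw n return samples conditioned on z."""
    x = rng.standard_normal(n)
    for t in reversed(range(T)):
        eps_hat = denoiser(x, t, z, w)
        coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
        x = (x - coef * eps_hat) / np.sqrt(alphas[t])
        if t > 0:                                   # no noise on the last step
            x += np.sqrt(betas[t]) * rng.standard_normal(n)
    return x

z = rng.standard_normal(K)            # one stock's characteristic vector
w = 0.1 * rng.standard_normal(K + 1)  # toy denoiser "weights"
samples = sample_return(z, w)         # full conditional return distribution
print(samples.mean(), samples.std())
```

The point of the sketch is only the shape of the machinery: the output is a whole sampled distribution per stock, not a point forecast, and the conditioning vector z is where factor dimensionality enters.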
Too few factors and the model underfits: the generated return distributions are too broad, the resulting portfolios over-diversified, the signal lost in noise. Too many factors and the model overfits: the portfolios concentrate on spurious patterns in the training data, becoming unstable and weak out-of-sample. Between these extremes lies the sweet spot: portfolios built on an intermediate number of factors outperform all the baseline strategies.
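The U-shape in out-of-sample error as conditioning dimensionality grows can be reproduced in a toy setting. This sketch is not the paper's experiment: it uses plain OLS on synthetic data where returns truly load on only 3 of 30 candidate characteristics, with all sample sizes and coefficients invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: true returns load on the first 3 of 30 candidate
# characteristics; the remaining 27 columns are pure noise factors.
n_train, n_test, p = 60, 1000, 30
beta_true = np.zeros(p)
beta_true[:3] = [0.5, -0.3, 0.2]

X_tr = rng.standard_normal((n_train, p))
X_te = rng.standard_normal((n_test, p))
y_tr = X_tr @ beta_true + rng.standard_normal(n_train)
y_te = X_te @ beta_true + rng.standard_normal(n_test)

def oos_mse(k):
    """Fit OLS on the first k characteristics; score out-of-sample MSE."""
    b, *_ = np.linalg.lstsq(X_tr[:, :k], y_tr, rcond=None)
    resid = y_te - X_te[:, :k] @ b
    return float(resid @ resid / n_test)

for k in (1, 3, 10, 30):
    print(k, round(oos_mse(k), 3))
```

With only one factor the model misses real signal (bias); with all 30 it fits noise in the small training sample (variance), so the intermediate k wins out-of-sample. The same logic applies when k indexes the conditioning dimension of a generative model rather than a regression.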
This is the classical bias-variance tradeoff, but in a generative context rather than a discriminative one. The diffusion model doesn’t predict a point estimate — it generates samples from a learned conditional distribution. Over-conditioning doesn’t just produce noisy point predictions; it produces entire distributions that are confidently wrong. The model generates plausible-looking returns for factor combinations it has memorized from training, and these plausible fictions drive concentrated bets on illusions.
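The "confidently wrong distribution" failure mode can be made concrete with a toy conditional density estimator, which is much simpler than a diffusion model but fails the same way. Everything here is an invented illustration: the true conditional return distribution given a scalar characteristic z is N(0.1 z, 1), and we fit one Gaussian per conditioning bin.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy data: return r given characteristic z is truly N(0.1 * z, 1).
n = 200
z = rng.uniform(-1, 1, n)
r = 0.1 * z + rng.standard_normal(n)

def fit_binned(n_bins):
    """Fit a Gaussian per conditioning bin.

    Returns (mean fitted std, mean center error), where center error is
    |fitted mean - true conditional mean at the bin center|, averaged over
    bins with at least 2 observations.
    """
    edges = np.linspace(-1, 1, n_bins + 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    idx = np.clip(np.digitize(z, edges) - 1, 0, n_bins - 1)
    stds, errs = [], []
    for b in range(n_bins):
        pts = r[idx == b]
        if len(pts) >= 2:
            stds.append(pts.std(ddof=1))
            errs.append(abs(pts.mean() - 0.1 * centers[b]))
    return float(np.mean(stds)), float(np.mean(errs))

# Coarse conditioning recovers std ~ 1 with small center error; very fine
# conditioning memorizes noise: the fitted per-bin distributions tend to
# get narrower (more "confident") while their centers drift from the truth.
print(fit_binned(4))
print(fit_binned(80))
```

With 80 bins over 200 points, each conditional fit sees only two or three observations: it still emits a perfectly smooth Gaussian, but one centered on memorized noise. That is the generative analogue of over-conditioning, with no visible instability to warn you.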
The through-claim: the bias-variance tradeoff in generative models is more dangerous than in discriminative ones because the overfit model produces confident, internally consistent outputs rather than noisy ones. A discriminative model that overfits gives volatile predictions — the instability is visible. A diffusion model that overfits generates smooth, detailed distributions that happen to describe a world that doesn’t exist. The sweet spot in factor dimensionality is where the model knows enough to be specific but not enough to be fictional.