The Factor Sweet Spot

Conditional diffusion models for equity return prediction generate the complete distribution of future returns, conditioned on firm characteristics — size, momentum, value, profitability, and dozens more. The question is how many characteristics to condition on.
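To make the conditioning mechanism concrete, here is a minimal sketch in numpy, not the paper's actual model: a toy setting where a stock's return given its characteristics is Gaussian, so the ideal denoiser is available in closed form and can stand in for the learned network. The factor loadings, characteristic vector, and step count are all illustrative assumptions.

```python
import numpy as np

# Toy conditional diffusion sampler. Hypothetical setup: the true conditional
# distribution of a return given characteristics x is r | x ~ N(beta·x, s^2),
# so the exact denoiser E[r_0 | r_t, x] is known in closed form and plays the
# role of the trained conditional denoising network.
rng = np.random.default_rng(0)

T = 200                                   # number of diffusion steps
betas = np.linspace(1e-4, 0.02, T)        # standard linear noise schedule
alphas = 1.0 - betas
abar = np.cumprod(alphas)                 # cumulative product \bar{alpha}_t

beta_coef = np.array([0.4, -0.3, 0.2])    # illustrative factor loadings
s = 1.0                                   # conditional return volatility

def denoise(r_t, t, mu):
    """Exact E[r_0 | r_t, x] for Gaussian data -- the stand-in for the
    learned conditional denoiser."""
    a = abar[t]
    prec = 1.0 / s**2 + a / (1.0 - a)
    return (mu / s**2 + np.sqrt(a) * r_t / (1.0 - a)) / prec

def sample_returns(x, n):
    """Ancestral DDPM-style sampling of n returns conditioned on x."""
    mu = beta_coef @ x
    r = rng.standard_normal(n)            # start from pure noise
    for t in range(T - 1, -1, -1):
        r0_hat = denoise(r, t, mu)
        eps_hat = (r - np.sqrt(abar[t]) * r0_hat) / np.sqrt(1.0 - abar[t])
        r = (r - betas[t] / np.sqrt(1.0 - abar[t]) * eps_hat) / np.sqrt(alphas[t])
        if t > 0:
            r += np.sqrt(betas[t]) * rng.standard_normal(n)
    return r

x = np.array([1.2, 0.5, -0.8])            # e.g. size, momentum, value scores
draws = sample_returns(x, 5000)           # full conditional return distribution
```

The point of the sketch is the interface: every sampling step is conditioned on the characteristic vector `x`, so the output is not a point forecast but a distribution of returns specific to that firm profile.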

Too few factors and the model underfits: the generated return distributions are too broad, the resulting portfolios over-diversified, the signal lost in noise. Too many factors and the model overfits: the portfolios concentrate on spurious patterns in the training data, becoming unstable and weak out-of-sample. Between the two extremes sits a sweet spot: an intermediate factor dimensionality that outperforms all baseline portfolio strategies.
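The underfit/sweet-spot/overfit curve can be traced in a simulation that uses ordinary least squares as a deliberately simple stand-in for the generative model (all numbers here are hypothetical): returns load on 8 true characteristics, 32 more are pure noise, and the training sample is small enough that conditioning on everything hurts.

```python
import numpy as np

# Illustrative sweep over factor dimensionality. Returns load on the first
# k_true characteristics; the remaining columns are noise. Fitting with an
# increasing number of characteristics on a small sample traces the
# underfit -> sweet spot -> overfit curve out of sample.
rng = np.random.default_rng(1)

n_train, n_test, k_true, k_max = 60, 2000, 8, 40
X_tr = rng.standard_normal((n_train, k_max))
X_te = rng.standard_normal((n_test, k_max))
b = np.ones(k_true)                        # true loadings on the first 8 factors
y_tr = X_tr[:, :k_true] @ b + rng.standard_normal(n_train)
y_te = X_te[:, :k_true] @ b + rng.standard_normal(n_test)

oos_mse = []
for k in range(1, k_max + 1):
    coef, *_ = np.linalg.lstsq(X_tr[:, :k], y_tr, rcond=None)
    oos_mse.append(np.mean((y_te - X_te[:, :k] @ coef) ** 2))

best_k = int(np.argmin(oos_mse)) + 1       # the minimum sits at intermediate k
```

With these settings the out-of-sample error is large at k = 1 (most of the signal is missed), near its floor around k = 8, and rises again toward k = 40 as the fit starts chasing noise, which is the same U-shape the factor sweep exhibits.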

This is the classical bias-variance tradeoff, but in a generative context rather than a discriminative one. The diffusion model doesn’t predict a point estimate — it generates samples from a learned conditional distribution. Over-conditioning doesn’t just produce noisy point predictions; it produces entire distributions that are confidently wrong. The model generates plausible-looking returns for factor combinations it has memorized from training, and these plausible fictions drive concentrated bets on illusions.
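The "entire distributions that are confidently wrong" failure can be isolated in a stripped-down sketch, again with a Gaussian generator standing in for the diffusion model and all parameters hypothetical: each model generates returns from N(mean(x), sigma), with sigma estimated from its own training residuals. The over-conditioned model interpolates the training set, so its estimated sigma collapses and its generated distributions are narrow bands around out-of-sample errors.

```python
import numpy as np

# Sketch of confident-but-wrong generative output. The over-conditioned model
# has as many characteristics as training observations, fits the training
# returns exactly, and therefore believes its conditional distributions are
# nearly deterministic -- tight, internally consistent, and mislocated.
rng = np.random.default_rng(2)

n_train, n_test, k_true, k_all = 60, 2000, 8, 60
X_tr = rng.standard_normal((n_train, k_all))
X_te = rng.standard_normal((n_test, k_all))
b = np.ones(k_true)
y_tr = X_tr[:, :k_true] @ b + rng.standard_normal(n_train)
y_te = X_te[:, :k_true] @ b + rng.standard_normal(n_test)

def interval_coverage(k):
    """Fit on the first k characteristics; return the estimated conditional
    vol and the share of test returns inside the generated 95% band."""
    coef, *_ = np.linalg.lstsq(X_tr[:, :k], y_tr, rcond=None)
    sigma_hat = np.std(y_tr - X_tr[:, :k] @ coef)
    pred = X_te[:, :k] @ coef
    covered = np.abs(y_te - pred) <= 1.96 * sigma_hat
    return sigma_hat, covered.mean()

sig_good, cov_good = interval_coverage(k_true)   # well-specified model
sig_over, cov_over = interval_coverage(k_all)    # interpolates training data
```

The well-specified model's 95% bands cover roughly 95% of realized test returns; the over-conditioned model's bands cover almost none, because its generated distributions are precise descriptions of a world that exists only in its training sample.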

The through-claim: the bias-variance tradeoff in generative models is more dangerous than in discriminative ones because the overfit model produces confident, internally consistent outputs rather than noisy ones. A discriminative model that overfits gives volatile predictions — the instability is visible. A diffusion model that overfits generates smooth, detailed distributions that happen to describe a world that doesn’t exist. The sweet spot in factor dimensionality is where the model knows enough to be specific but not enough to be fictional.
